What Are Variational Autoencoders?
Variational Autoencoders (VAEs) are a class of machine learning models that belong to the family of autoencoders. They are primarily used for generating complex data samples, like images or music, and for learning meaningful data representations. VAEs stand out due to their foundation in the principles of Bayesian inference, which allows them to model the underlying probability distribution of data.
Key Concepts of VAEs
Autoencoder Architecture
Autoencoders, the broader category to which VAEs belong, are neural networks designed for unsupervised learning. They work by encoding input data into a compressed representation and then reconstructing the original input from that representation. The goal is to capture the most relevant features of the data in the compressed representation.
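To make this concrete, here is a minimal sketch of a plain autoencoder in PyTorch; the 784-dimensional input (e.g., flattened 28×28 images) and 32-dimensional code are illustrative choices, not fixed requirements:

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    """A plain autoencoder: compress the input, then reconstruct it."""
    def __init__(self, input_dim=784, latent_dim=32):  # illustrative sizes
        super().__init__()
        # Encoder: input -> compressed representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: compressed representation -> reconstruction
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # compressed code
        return self.decoder(z)   # reconstruction of x
```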
Variational Aspect
Unlike standard autoencoders, VAEs introduce a probabilistic twist: they model the encoder's outputs as distributions (typically Gaussian) over the latent space. This means each input data point is mapped to a distribution over latent variables rather than a single point. Each distribution is characterized by two parameters, the mean (μ) and the variance (σ²), which the encoder learns to predict.
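In code, this amounts to the encoder emitting two vectors per input instead of one. A minimal PyTorch sketch, reusing the illustrative sizes from above (implementations usually predict log σ² rather than σ² for numerical stability):

```python
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Maps each input to the parameters of a Gaussian over the latent space."""
    def __init__(self, input_dim=784, latent_dim=32):  # illustrative sizes
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)      # predicts the mean μ
        self.fc_logvar = nn.Linear(256, latent_dim)  # predicts log σ² (stabler than σ²)

    def forward(self, x):
        h = self.hidden(x)
        return self.fc_mu(h), self.fc_logvar(h)
```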
Reparameterization Trick
A key component of training VAEs is the reparameterization trick, which allows gradients to be backpropagated through stochastic nodes. Instead of sampling from the learned distribution directly, VAEs sample noise from a standard normal distribution and transform it using the learned mean and variance. This makes the network trainable end to end with ordinary gradient descent.
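Concretely, the trick rewrites a sample z ~ N(μ, σ²) as z = μ + σ·ε with ε ~ N(0, I), so the randomness lives entirely in ε and gradients flow through μ and σ. A minimal sketch:

```python
import torch

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    All randomness lives in eps, so gradients can backpropagate
    through mu and logvar as usual.
    """
    std = torch.exp(0.5 * logvar)  # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)    # eps ~ N(0, I)
    return mu + eps * std
```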
Loss Function
The loss function in VAEs consists of two terms:
Reconstruction Loss: This measures how effectively the decoder reconstructs the original input from the latent variables. It encourages the decoded samples to be as close as possible to the original inputs.
KL Divergence: This component acts as a regularizer: it is the Kullback-Leibler divergence between the learned latent variable distributions and the prior distribution (typically a standard normal). This term keeps the latent distributions close to the prior, promoting generalization and a smooth latent space; a sketch of the combined loss follows this list.
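Together, the two terms form the training objective (the negative evidence lower bound, or ELBO). A sketch of the combined loss, assuming a decoder that outputs values in [0, 1] (so binary cross-entropy is a valid reconstruction loss) and using the closed-form KL divergence between N(μ, σ²) and N(0, I):

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    """Reconstruction term plus KL regularizer (the negative ELBO)."""
    # Reconstruction loss: how well the decoder reproduces the input.
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I):
    # -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```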
Applications of VAEs
Data Generation: VAEs can generate new data samples that are similar to the training data, useful in fields such as synthetic data generation for training machine learning models where data may be limited or sensitive.
Feature Extraction: They are effective for dimensionality reduction, similar to PCA, but with the ability to capture non-linear relationships in data.
Anomaly Detection: By learning to reconstruct normal patterns in data, VAEs can identify data points that deviate from those patterns, as sketched below.
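As an illustration of the anomaly detection use case, a common recipe is to train a VAE on normal data only and flag inputs whose reconstruction error exceeds a threshold. In this hypothetical sketch, `model` is assumed to be a trained VAE that returns (reconstruction, μ, log σ²), and `threshold` would be calibrated on held-out normal data:

```python
import torch

def flag_anomalies(model, x, threshold):
    """Flag inputs whose reconstruction error exceeds a threshold."""
    model.eval()
    with torch.no_grad():
        x_recon, mu, logvar = model(x)  # assumed VAE interface
        # Per-sample reconstruction error (mean squared error over features).
        errors = ((x - x_recon) ** 2).mean(dim=1)
    return errors > threshold  # boolean mask: True = anomalous
```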