Road to ML Engineer #17

Autoencoder

An autoencoder combines two neural networks: an encoder, which compresses the input into a lower-dimensional latent space, and a decoder, which expands the latent vector back to the original size. The output of the encoder is fed into the decoder, and the whole model is trained with backpropagation to reproduce its input by minimizing the mean squared error between the output image and the original input image. In the example used throughout this article, the autoencoder reduces the dimensions from 784 to 32 and then expands them back from 32 to 784.

Due to this structure, the encoder is trained to reduce the dimensionality while retaining the important relationships between features, so that the decoder can reproduce the original image. Because both halves are neural networks, those relationships can be non-linear. After training, the decoder can be discarded and the encoder used on its own for dimensionality reduction. Since PCA (Principal Component Analysis) can only capture linear relationships (as it relies on eigenvectors of the covariance matrix), one might argue that an autoencoder is a more flexible technique for dimensionality reduction. (If you're not familiar with PCA, I recommend checking out the article Road to ML Engineer #8 - PCA.)
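To make the contrast with PCA concrete, here is a minimal NumPy sketch in which both the "encoder" and the "decoder" of PCA are a single matrix multiplication, which is why it can only capture linear structure. The function names (pca_fit, pca_encode, pca_decode) and the random toy data are assumptions made for illustration; they do not come from the PCA article.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the feature mean and the top eigenvectors of the covariance matrix."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, eigvecs[:, order]

def pca_encode(X, mean, components):
    # Linear "encoder": project onto the principal axes
    return (X - mean) @ components

def pca_decode(Z, mean, components):
    # Linear "decoder": map back to the original feature space
    return Z @ components.T + mean

# Toy data standing in for real features (illustrative only)
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 5))

mean, components = pca_fit(toy, n_components=2)
codes = pca_encode(toy, mean, components)
recon = pca_decode(codes, mean, components)
```

An autoencoder follows the same encode-then-decode pattern, but with non-linear layers in place of the two matrix products.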

Code Implementation

Although building an autoencoder might seem complex at first glance, it is relatively simple to create using machine learning frameworks like PyTorch and TensorFlow.

Step 1 & 2. Data Exploration & Preprocessing

Before building the autoencoder, it is essential to preprocess the data appropriately, as shown below.

import numpy as np
from tensorflow import keras
from sklearn.model_selection import train_test_split

(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

# Flatten each 28x28 image into a 784-dimensional vector
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)

# Normalize with a z-score; axis=None uses one mean and one std over all pixels
def zscore(X, axis=None):
    X_mean = X.mean(axis=axis, keepdims=True)
    X_std = X.std(axis=axis, keepdims=True)
    return (X - X_mean) / X_std

# Cast to float32, the default dtype of PyTorch and TensorFlow layers
X_train = zscore(X_train).astype(np.float32)
X_test = zscore(X_test).astype(np.float32)

# Hold out 10,000 training images as a validation set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=10000, random_state=101)

Next, we will create a Dataset and DataLoader for PyTorch. Keep in mind that the output of the autoencoder is the image itself, not a class label.

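import torch

# TensorDataset expects tensors, not NumPy arrays; the target is the input itself
def to_tensor(a):
    return torch.as_tensor(a, dtype=torch.float32)

train_dataset = torch.utils.data.TensorDataset(to_tensor(X_train), to_tensor(X_train))
val_dataset = torch.utils.data.TensorDataset(to_tensor(X_val), to_tensor(X_val))
test_dataset = torch.utils.data.TensorDataset(to_tensor(X_test), to_tensor(X_test))

# Only the training data needs shuffling; the test set fits in a single batch
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(dataset=val_dataset, batch_size=32, shuffle=False)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=10000, shuffle=False)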

Step 3. Models

The following is the implementation of the example autoencoder using PyTorch and TensorFlow.
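As a rough illustration of the PyTorch side, a minimal sketch of a 784 to 32 to 784 autoencoder might look like the following. The class name Autoencoder, the single-layer encoder and decoder, the ReLU activation, the Adam optimizer, the learning rate, and the helper train_one_epoch are all assumptions made for this sketch, not necessarily the exact model used in this article. Random tensors stand in for the MNIST loaders so that the block runs on its own.

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    """Fully connected 784 -> 32 -> 784 autoencoder (layer choices are assumptions)."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, latent_dim), nn.ReLU())
        # Linear output, because z-scored pixels can be negative
        self.decoder = nn.Linear(latent_dim, input_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_one_epoch(model, loader, optimizer):
    """Run one pass over the loader and return the average MSE per sample."""
    loss_fn = nn.MSELoss()
    model.train()
    total = 0.0
    for X, target in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(X), target)   # the target is the input image itself
        loss.backward()
        optimizer.step()
        total += loss.item() * X.size(0)
    return total / len(loader.dataset)

autoencoder = Autoencoder()
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

# Smoke test on random data standing in for the real train_loader
demo_x = torch.randn(64, 784)
demo_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(demo_x, demo_x), batch_size=32, shuffle=True
)
demo_loss = train_one_epoch(autoencoder, demo_loader, optimizer)
```

With the real data, the same helper would be called once per epoch on train_loader, for example three times for a three-epoch run.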

Step 4. Model Evaluation

After training the models, we can use the test dataset and plot the decoded images to compare them with the original ones, to see if the autoencoder has effectively learned to reduce the dimensions. To do this, we can get the prediction from the models and reshape both the predictions and the test dataset from 784 back to (28, 28).

# TensorFlow predictions (X_test is already a NumPy array)
preds = autoencoder.predict(X_test)
 
# PyTorch predictions (the test loader yields the whole test set as one batch)
autoencoder.eval()
with torch.no_grad():
    for X, y in test_loader:
        X_test = X.numpy()
        preds = autoencoder(X).numpy()
 
# Reshape them for visualization
X_test = X_test.reshape(X_test.shape[0], 28, 28)
preds = preds.reshape(preds.shape[0], 28, 28)

Then, we can use the function below to visualize 10 samples.

import matplotlib.pyplot as plt

def plotImgs(X):
    # Show the first 10 images in a 2x5 grid
    plt.figure(figsize=(10, 4))
    for i in range(10):
        plt.subplot(2, 5, i + 1)
        plt.imshow(X[i], cmap='gray')
        plt.axis('off')
    plt.tight_layout()
    plt.show()
 
plotImgs(X_test)
plotImgs(preds)
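Alongside the visual comparison, a numeric check can help confirm what the plots suggest. The sketch below computes the mean squared error per image, the same quantity the autoencoder is trained to minimize. The helper name reconstruction_mse and the toy arrays are assumptions for illustration; in practice you would pass the reshaped X_test and preds.

```python
import numpy as np

def reconstruction_mse(originals, reconstructions):
    """Mean squared error per image, computed over all of its pixels."""
    originals = np.asarray(originals, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    originals = originals.reshape(len(originals), -1)
    reconstructions = reconstructions.reshape(len(reconstructions), -1)
    return ((originals - reconstructions) ** 2).mean(axis=1)

# Toy stand-ins for X_test and preds: every pixel is off by exactly 0.1
rng = np.random.default_rng(0)
toy_originals = rng.normal(size=(5, 28, 28))
toy_preds = toy_originals + 0.1

errors = reconstruction_mse(toy_originals, toy_preds)
```

Sorting the images by this error is a quick way to find the digits the autoencoder reconstructs worst.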

The following is the result after training the autoencoder (in TensorFlow) for 3 epochs.

You can already observe that the autoencoder is properly learning to reconstruct the images from the smaller latent representations. We can access the encoder to get the image vectors in the latent space.

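# TensorFlow Encoding (flatten to 784 in case the arrays were reshaped for plotting)
encoded_train = encoder.predict(X_train.reshape(len(X_train), -1))
encoded_test = encoder.predict(X_test.reshape(len(X_test), -1))
 
# PyTorch Encoding (collect every batch instead of keeping only the last one)
# Note: train_loader shuffles, so these rows are not aligned with y_train
with torch.no_grad():
    encoded_train = torch.cat([autoencoder.encoder(X) for X, y in train_loader])
    encoded_test = torch.cat([autoencoder.encoder(X) for X, y in test_loader])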

Afterward, you can use these latent representations as inputs for a smaller classifier model to classify handwritten digits.
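As a rough sketch of that idea, the following trains a small classifier on 32-dimensional latent vectors. The layer sizes, optimizer, number of steps, and the random stand-in data are all assumptions for illustration. With real data, the latent vectors would come from the encoder and the labels from y_train, and the two must be kept in the same order.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the encoded training images and their digit labels
latent_train = torch.randn(256, 32)
labels_train = torch.randint(0, 10, (256,))

# A small classifier that takes latent vectors instead of raw 784-pixel images
classifier = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(20):
    optimizer.zero_grad()
    loss = loss_fn(classifier(latent_train), labels_train)
    loss.backward()
    optimizer.step()

# Predicted digit for each latent vector
predicted_digits = classifier(latent_train).argmax(dim=1)
```

Because the input has 32 features rather than 784, the first layer needs far fewer weights than a classifier working on raw pixels.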

Decoder

Some of you might be wondering if it's possible to utilize the decoder to generate images by providing slightly modified vectors in the latent space. Let's try doing that to see if we can use the decoder for image generation. (Here, we will use TensorFlow.)

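import numpy as np

# Flatten to 784 in case X_test was reshaped to (28, 28) for plotting
encoded = encoder.predict(X_test.reshape(len(X_test), -1))
# Sample around each latent vector with a standard deviation of 1
latent = np.random.normal(encoded, 1)
decoded = decoder.predict(latent)
 
decoded = decoded.reshape(decoded.shape[0], 28, 28)
plotImgs(decoded)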

Although a few images are mildly legible, most are not. The size of the perturbation may play a part, but the deeper reason is that we are not enforcing any rules on how the encoder organizes handwritten digits in the latent space. The decoder is only trained to map the latent vectors the encoder actually produces back to images, so nothing guarantees that nearby points decode to meaningful digits.
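One way to probe how much the extent of modification matters is to sweep the noise scale rather than fixing it at 1. The sketch below is a NumPy-only illustration. The helper perturb_latents, the chosen scales, and the random stand-in for the encoder output are assumptions, not part of the original experiment.

```python
import numpy as np

def perturb_latents(encoded, scale, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `scale` to each latent vector."""
    rng = np.random.default_rng(seed)
    return encoded + rng.normal(0.0, scale, size=encoded.shape)

# Hypothetical stand-in for encoder.predict(X_test)
toy_encoded = np.random.default_rng(1).normal(size=(10, 32))

shifts = {}
for scale in (0.1, 0.5, 1.0):
    latent = perturb_latents(toy_encoded, scale)
    # Average distance each latent coordinate moved at this noise level
    shifts[scale] = np.abs(latent - toy_encoded).mean()
```

In the article's setting, each perturbed array would be passed to decoder.predict and plotted with plotImgs, which shows at what noise level the digits stop being legible.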

How can we enforce rules on the autoencoder so that we know how to pick a vector from the latent space for the decoder to generate handwritten digits? I encourage you to brainstorm this until we cover it in the next article.