Generating Images with Generative Adversarial Networks (GANs)¶
The purpose of the project is to test the ability of Generative Adversial Networks (GANs) in generating realistic-looking images.
Dataset¶
The dataset used will be FashionMNIST. It contains low resolution ($28 \times 28$) grey-scale images representing different kind of clothes. The dataset is available on keras and accessable in $\texttt{tf.keras.datasets.fashion\_mnist}$. Note that the pixel values for the images are initially in the interval $[0, 255]$. It is required to normalize them since all of the algorithm we will use require them to be in that format. To be fair, you will find the dataset already normalized, do not modify that part of the code.
Metrics¶
Measuring the quality of newly generated images is a non-trivial task. Indeed, there is no label associated to each image, and thus it is impossible to measure the quality image-by-image. For that reason, common metrics uses statistical consideration on a generated dataset to test how well the network recovered the statistics of the original data. One of the most common is the Fréchet Inception Distance (FID). The idea of FID is that in a realistic-looking dataset of images, the statistics of the activation of the last hidden layer in a well-trained classificator should be similar to that of a dataset containing real images. Specifically, regarding FID, the Inception-v3 network is used as a classificator. A real dataset $\mathbb{D}_r$ and a generated dataset $\mathbb{D}_g$ are processed by the network, and the activation of the last hidden layer has mean and variance $(\mu_r, \Sigma_r)$, $(\mu_g, \Sigma_g)$ respectively. Then, FID is computed as:
$$ FID(\mathbb{D}_r, \mathbb{D}_g) = || \mu_r - \mu_g ||^2 + Tr(\Sigma_r + \Sigma_g - 2(\Sigma_r \ast \Sigma_g)^{\frac{1}{2}}) $$
A Python implementation of FID can be found in the file $\texttt{fid.py}$ that you find attached on Virtuale. Its usage is very simple, just generate $10k$ fake images with your GAN, and with the command $\texttt{fid.get\_fid(x\_test, x\_gen)}$, where $\texttt{x\_test}$ is the test set, containing $10k$ real images, you get the value for the FID of your network. Remember that, when passed through that function, $\texttt{x\_gen}$ must be a dataset of $10k$ images, in the interval $[0, 1]$. The number of $10k$ images is fundamental, since the value of FID strongly depends on the number of input images.
Limitations¶
You are required to implement a vanilla Generative Adversarial Network (GAN), not a variant of it (e.g. PixelGAN, CycleGAN, ... are not accepted). The maximum number of parameters is 15 million, and every pre-trained network can be used as an add-on (the number of parameters for pre-trained network does not count). Clearly, only the training set can be used to train the network, no additional images (Data Augmentation is ok).
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.datasets import fashion_mnist
import numpy as np
from matplotlib import pyplot as plt
The images are normalized in $[0, 1]$. For simplicity, images are padded to have dimension $32 \times 32$.
# Load the data. Note that the labels y_train and y_test are not loaded since not required.
(x_train, _), (x_test, _) = fashion_mnist.load_data()
# Normalize and pad the datasets
x_train = np.pad(x_train, ((0,0), (2,2), (2,2)))
x_train = np.reshape(x_train, x_train.shape + (1, ))
x_train = x_train / 255.
x_test = np.pad(x_test, ((0,0), (2,2), (2,2)))
x_test = np.reshape(x_test, x_test.shape + (1, ))
x_test = x_test / 255.
print(f"Training shape: {x_train.shape}, Training pixel values: {x_train.min(), x_train.max()}")
print(f"Test shape: {x_test.shape}, Test pixel values: {x_test.min(), x_test.max()}")
Now, we import the functions for the computation of the FID, and we test that FID(x_train, x_test) is low.
Note: Computing the FID function requires some minutes. Consequently, it is suggested to comment this cell after you tested once, to reduce the execution time of the notebook. To speed-up the process, after a first use, the function will generate a file containing the value of the activations of the test set, so that it does not have to compute it again every time.
Remember that, when you use the FID function, the first input MUST be the test set, while the second will be the generated images set.
"""
Do not modify this code. This is just for utilities.
"""
import os
from tensorflow.keras.applications.inception_v3 import InceptionV3
# prepare the inception v3 model
model = InceptionV3(include_top=False, pooling='avg', input_shape=(299, 299, 3), weights='imagenet')
def get_inception_activations(inps, batch_size=100):
"""
Compute the activation for the model Inception v3 for a given input 'inps'.
Note: inps is assumed to be normalized in [0, 1].
"""
n_batches = inps.shape[0] // batch_size
act = np.zeros([inps.shape[0], 2048], dtype=np.float32)
for i in range(n_batches):
# Load a batch of data
inp = inps[i * batch_size:(i + 1) * batch_size]
# Resize each image to match the input shape of Inception v3
inpr = tf.image.resize(inp, (299, 299))
# Resize images in the interval [-1, 1], given that inpr is in [0, 1].
inpr = inpr * 2 - 1
# Predict the activation
act[i * batch_size:(i + 1) * batch_size] = model.predict(inpr, steps=1)
print(f"Processed {str((i + 1) * batch_size)} images.")
return act
def get_fid(images1, images2):
"""
Compute the FID between two sets of images.
Note: it can take several minutes.
"""
from scipy.linalg import sqrtm
shape = np.shape(images1)[1]
print("Computing FID for {} dimensional images".format(images1.shape))
# Inception v3 requires the input to have 3 channel. If this is not the
# case, just copy the same channel three times.
if images1.shape[-1] == 1:
images1 = np.concatenate([images1, images1, images1], axis=-1)
images2 = np.concatenate([images2, images2, images2], axis=-1)
# activation for true images is always the same: we just compute it once
if os.path.exists("act_mu.npy"):
mu1 = np.load("act_mu.npy")
sigma1 = np.load("act_sigma.npy")
else:
act1 = get_inception_activations(images1)
mu1, sigma1 = act1.mean(axis=0), np.cov(act1, rowvar=False)
np.save("act_mu.npy", mu1)
np.save("act_sigma.npy", sigma1)
print('Done stage 1 of 2')
act2 = get_inception_activations(images2)
mu2, sigma2 = act2.mean(axis=0), np.cov(act2, rowvar=False)
print('Done stage 2 of 2')
# calculate sum squared difference between means
ssdiff = np.sum((mu1 - mu2) ** 2.0)
# compute sqrt of product between cov
covmean = sqrtm(sigma1.dot(sigma2))
# check and correct imaginary numbers from sqrt
if np.iscomplexobj(covmean):
covmean = covmean.real
# calculate score
fid = ssdiff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
return fid
# Compute the FID between the Test set and (the first 10k images of) Train set (should be low)
train_fid = get_fid(x_test, x_train[:10_000])
# Print out the results
print(f"FID(x_test, x_train) = {train_fid}")