Fooling neural networks¶

Here we show how to fool a neural network using a gradient ascent technique over the input.

In [11]:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
from tensorflow.keras import backend as K
from tensorflow.keras import losses 
import numpy as np
import matplotlib.pyplot as plt

Let us start importing the VGG16 model.

In [7]:
model = VGG16(weights='imagenet', include_top=True)
#model.summary()

Now, we load an image (in our case, an elephant)

In [9]:
from google.colab import files
uploaded = files.upload()
Upload widget is only available when the cell has been executed in the current browser session. Please rerun this cell to enable.
Saving elephant2.jpg to elephant2 (1).jpg

Next, we classify it.

VGG16 is higly confident it is an elephant.

In [12]:
img = image.load_img('elephant2.jpg', target_size=(224, 224))

x0 = image.img_to_array(img)
x = np.expand_dims(x0, axis=0)
preds = model.predict(x)
print("label = {}".format(np.argmax(preds)))
print('Predicted:', decode_predictions(preds, top=3)[0])

xd = image.array_to_img(x[0])
imageplot = plt.imshow(xd)
plt.show()
label = 385
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
40960/35363 [==================================] - 0s 0us/step
49152/35363 [=========================================] - 0s 0us/step
('Predicted:', [(u'n02504013', u'Indian_elephant', 0.5068745), (u'n02504458', u'African_elephant', 0.27114946), (u'n01871265', u'tusker', 0.21195689)])
No description has been provided for this image

Now we try to convert the image into something different: a tiger shark, with label 3.

In [55]:
output_index = 3 #tiger shark

expected_output = np.zeros(1000)
expected_output[output_index] = 1
expected_output = K.variable(np.reshape(expected_output,(1,1000)))

Now we simply iterate the gradient ascent technique for a sufficent number of steps, working on a copy of the original image

In [71]:
input_img_data = np.copy(x)

# run gradient ascent for 50 steps
for i in range(50):
    print("iteration n. {}".format(i))
    with tf.GradientTape() as g:
      x = K.variable(input_img_data)
      y = model(x)
      loss = tf.keras.losses.categorical_crossentropy(y,expected_output)
    res = y[0]
    print("elephant prediction: {}".format(res[386]))
    print("tiger shark prediction: {}".format(res[3]))
    grads_value = g.gradient(loss, x)[0]
    print(grads_value.shape)
    ming = np.min(grads_value)
    maxg = np.max(grads_value)
    #print("min grad = {}".format(ming))
    #print("max grad = {}".format(maxg))
    scale = 1/(maxg -ming)
    #brings gradients to a sensible value
    input_img_data -= grads_value * scale
iteration n. 0
elephant prediction: 0.271149456501
tiger shark prediction: 6.29209928604e-09
(224, 224, 3)
iteration n. 1
elephant prediction: 0.281902700663
tiger shark prediction: 1.5599056491e-09
(224, 224, 3)
iteration n. 2
elephant prediction: 0.269811719656
tiger shark prediction: 6.18359363713e-09
(224, 224, 3)
iteration n. 3
elephant prediction: 0.28002473712
tiger shark prediction: 1.58007817941e-09
(224, 224, 3)
iteration n. 4
elephant prediction: 0.291066497564
tiger shark prediction: 3.82965881229e-10
(224, 224, 3)
iteration n. 5
elephant prediction: 0.295582085848
tiger shark prediction: 3.7838754352e-10
(224, 224, 3)
iteration n. 6
elephant prediction: 0.295692920685
tiger shark prediction: 4.01752936474e-10
(224, 224, 3)
iteration n. 7
elephant prediction: 0.288921773434
tiger shark prediction: 1.42619627219e-09
(224, 224, 3)
iteration n. 8
elephant prediction: 0.2917060256
tiger shark prediction: 1.58619251067e-09
(224, 224, 3)
iteration n. 9
elephant prediction: 0.297342151403
tiger shark prediction: 4.2315012605e-10
(224, 224, 3)
iteration n. 10
elephant prediction: 0.295875966549
tiger shark prediction: 5.48357081964e-10
(224, 224, 3)
iteration n. 11
elephant prediction: 0.293566584587
tiger shark prediction: 8.2218165609e-10
(224, 224, 3)
iteration n. 12
elephant prediction: 0.287314027548
tiger shark prediction: 2.980256264e-09
(224, 224, 3)
iteration n. 13
elephant prediction: 0.283039689064
tiger shark prediction: 4.31411040225e-09
(224, 224, 3)
iteration n. 14
elephant prediction: 0.290171891451
tiger shark prediction: 1.17616960615e-09
(224, 224, 3)
iteration n. 15
elephant prediction: 0.296473622322
tiger shark prediction: 3.13109649319e-10
(224, 224, 3)
iteration n. 16
elephant prediction: 0.290288418531
tiger shark prediction: 1.20128740289e-09
(224, 224, 3)
iteration n. 17
elephant prediction: 0.286366939545
tiger shark prediction: 1.9773198634e-09
(224, 224, 3)
iteration n. 18
elephant prediction: 0.281111091375
tiger shark prediction: 3.47145623358e-09
(224, 224, 3)
iteration n. 19
elephant prediction: 0.277514845133
tiger shark prediction: 6.03332495075e-09
(224, 224, 3)
iteration n. 20
elephant prediction: 0.274204820395
tiger shark prediction: 1.23049765932e-08
(224, 224, 3)
iteration n. 21
elephant prediction: 0.269815891981
tiger shark prediction: 2.52712180071e-08
(224, 224, 3)
iteration n. 22
elephant prediction: 0.264236658812
tiger shark prediction: 4.91879994513e-08
(224, 224, 3)
iteration n. 23
elephant prediction: 0.252512007952
tiger shark prediction: 2.27731902669e-07
(224, 224, 3)
iteration n. 24
elephant prediction: 0.243171066046
tiger shark prediction: 1.09528605208e-06
(224, 224, 3)
iteration n. 25
elephant prediction: 0.224495723844
tiger shark prediction: 6.56170777802e-06
(224, 224, 3)
iteration n. 26
elephant prediction: 0.200678929687
tiger shark prediction: 2.92676668323e-05
(224, 224, 3)
iteration n. 27
elephant prediction: 0.170531451702
tiger shark prediction: 0.000120037228044
(224, 224, 3)
iteration n. 28
elephant prediction: 0.138410657644
tiger shark prediction: 0.000439859926701
(224, 224, 3)
iteration n. 29
elephant prediction: 0.107356064022
tiger shark prediction: 0.00153460924048
(224, 224, 3)
iteration n. 30
elephant prediction: 0.0794935971498
tiger shark prediction: 0.00462667644024
(224, 224, 3)
iteration n. 31
elephant prediction: 0.0558861494064
tiger shark prediction: 0.0129528734833
(224, 224, 3)
iteration n. 32
elephant prediction: 0.0374869778752
tiger shark prediction: 0.0319199636579
(224, 224, 3)
iteration n. 33
elephant prediction: 0.0238489937037
tiger shark prediction: 0.0695150569081
(224, 224, 3)
iteration n. 34
elephant prediction: 0.0149186002091
tiger shark prediction: 0.130459263921
(224, 224, 3)
iteration n. 35
elephant prediction: 0.00892047211528
tiger shark prediction: 0.218916848302
(224, 224, 3)
iteration n. 36
elephant prediction: 0.00608869967982
tiger shark prediction: 0.313300192356
(224, 224, 3)
iteration n. 37
elephant prediction: 0.00404852442443
tiger shark prediction: 0.418670773506
(224, 224, 3)
iteration n. 38
elephant prediction: 0.00275968248025
tiger shark prediction: 0.515644490719
(224, 224, 3)
iteration n. 39
elephant prediction: 0.00165919016581
tiger shark prediction: 0.614631593227
(224, 224, 3)
iteration n. 40
elephant prediction: 0.000994459958747
tiger shark prediction: 0.695274949074
(224, 224, 3)
iteration n. 41
elephant prediction: 0.000552096113097
tiger shark prediction: 0.76745647192
(224, 224, 3)
iteration n. 42
elephant prediction: 0.000285832385998
tiger shark prediction: 0.828008115292
(224, 224, 3)
iteration n. 43
elephant prediction: 0.000141570984852
tiger shark prediction: 0.878301978111
(224, 224, 3)
iteration n. 44
elephant prediction: 6.99893789715e-05
tiger shark prediction: 0.915232658386
(224, 224, 3)
iteration n. 45
elephant prediction: 3.54357543983e-05
tiger shark prediction: 0.938627004623
(224, 224, 3)
iteration n. 46
elephant prediction: 1.53456494445e-05
tiger shark prediction: 0.958389043808
(224, 224, 3)
iteration n. 47
elephant prediction: 6.75061437505e-06
tiger shark prediction: 0.971303224564
(224, 224, 3)
iteration n. 48
elephant prediction: 2.81107531919e-06
tiger shark prediction: 0.98096114397
(224, 224, 3)
iteration n. 49
elephant prediction: 1.21003927234e-06
tiger shark prediction: 0.987086653709
(224, 224, 3)

At the end, VGG16 is extremely confident he is looking at a tiger shark

In [72]:
preds = model.predict(input_img_data)
print("label = {}".format(np.argmax(preds)))
print('Predicted:', decode_predictions(preds, top=3)[0])
label = 3
('Predicted:', [(u'n01491361', u'tiger_shark', 0.9913406), (u'n01494475', u'hammerhead', 0.003008757), (u'n01484850', u'great_white_shark', 0.0024799022)])

Let us look at the resulting image (we both print the original and the processed image)

In [73]:
nimg = input_img_data[0]
nimg = image.array_to_img(img)

plt.figure(figsize=(10,5))
ax = plt.subplot(1, 2, 1)
plt.title("elephant")
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.imshow(xd)
ax = plt.subplot(1, 2, 2)
plt.imshow(nimg)
plt.title("tiger shark")
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

imageplot = plt.imshow(img)
plt.show()
No description has been provided for this image

We just fooled the neural network!