Fooling neural networks
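Here we show how to fool a neural network with gradient ascent over the input: the network's weights stay fixed, and we repeatedly nudge the pixels of an image in the direction that increases the probability of a class of our choice.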
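To make "gradient ascent over the input" concrete, here is a minimal NumPy sketch on a hypothetical three-class linear softmax model. The weights `W` and the helpers `predict` and `fool` are illustrative assumptions, not part of the notebook. Only the input `x` is updated; the model itself never changes.

```python
import numpy as np

# Hypothetical toy classifier: 3 classes, 2 input features, fixed weights.
# It only illustrates the idea; it has nothing to do with VGG16.
W = np.array([[1.0, -1.0],
              [-1.0, 1.0],
              [0.5, 0.5]])

def predict(x):
    """Softmax probabilities of the toy linear model."""
    z = W @ x
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def fool(x, target, steps=50, lr=0.5):
    """Gradient ascent on log p(target | x); only the input x is updated."""
    x = np.array(x, dtype=float)  # work on a copy of the input
    for _ in range(steps):
        p = predict(x)
        # For a softmax, d log p_target / d z = onehot(target) - p.
        dz = -p
        dz[target] += 1.0
        # Chain rule through z = W @ x gives the gradient w.r.t. the input.
        x += lr * (W.T @ dz)
    return x

x0 = np.array([2.0, 0.0])  # the toy model assigns this input to class 0
x_adv = fool(x0, target=1)
print(np.argmax(predict(x0)), np.argmax(predict(x_adv)))
```

The notebook below does the same thing at scale: `tf.GradientTape` replaces the hand-derived gradient, and VGG16 replaces the toy model.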
In [11]:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input, decode_predictions
from tensorflow.keras import backend as K
from tensorflow.keras import losses
import numpy as np
import matplotlib.pyplot as plt
Let us start by loading the VGG16 model with weights pre-trained on ImageNet.
In [7]:
model = VGG16(weights='imagenet', include_top=True)
#model.summary()
Now we load an image (in our case, an elephant).
In [9]:
from google.colab import files
uploaded = files.upload()
Saving elephant2.jpg to elephant2 (1).jpg
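Next, we classify it.
VGG16 is highly confident it is an elephant: its three top classes (Indian elephant, African elephant, tusker) together receive about 99% of the probability.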
In [12]:
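img = image.load_img('elephant2.jpg', target_size=(224, 224))
x0 = image.img_to_array(img)    # (224, 224, 3) array, RGB values in [0, 255]
x = np.expand_dims(x0, axis=0)  # add a batch dimension: (1, 224, 224, 3)
# preprocess_input is imported but not applied: the network sees raw pixel values,
# which is also the space in which the gradient loop below perturbs the image
preds = model.predict(x)
print("label = {}".format(np.argmax(preds)))
print('Predicted:', decode_predictions(preds, top=3)[0])
xd = image.array_to_img(x[0])
imageplot = plt.imshow(xd)
plt.show()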
label = 385
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
('Predicted:', [(u'n02504013', u'Indian_elephant', 0.5068745), (u'n02504458', u'African_elephant', 0.27114946), (u'n01871265', u'tusker', 0.21195689)])
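The cell above feeds raw `[0, 255]` RGB pixels to VGG16 without calling the imported `preprocess_input`, and the prediction is still an elephant. For reference, the Keras documentation describes VGG16's `preprocess_input` as converting RGB to BGR and zero-centering each channel with respect to ImageNet, without scaling. The NumPy sketch below reproduces that behavior. `caffe_preprocess` is a hypothetical helper, and the channel means are assumed to match the ones in your TensorFlow version. In real code, prefer the library function.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by the "caffe"
# preprocessing mode for VGG16 (assumed; check against your Keras version).
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(batch_rgb):
    """Sketch of VGG16-style preprocessing for an (N, H, W, 3) RGB batch in [0, 255]."""
    bgr = batch_rgb[..., ::-1].astype(np.float32)  # RGB -> BGR
    return bgr - BGR_MEAN                          # zero-center each channel, no rescaling

pixel = np.array([[[[255.0, 128.0, 0.0]]]])        # a single pixel, shape (1, 1, 1, 3)
print(caffe_preprocess(pixel))
```

Applying this transform would change both the predictions and the space in which the adversarial perturbation is computed, so the notebook's choice to stay in raw pixel space also keeps the later display code simple.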
Now we try to turn the image into something quite different: a tiger shark, ImageNet class 3. We encode this target as a one-hot vector over the 1000 classes.
In [55]:
output_index = 3  # ImageNet class 3: tiger shark
# one-hot target over the 1000 ImageNet classes, shape (1, 1000)
expected_output = np.zeros(1000)
expected_output[output_index] = 1
expected_output = K.variable(np.reshape(expected_output, (1, 1000)))
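With a one-hot target, categorical cross-entropy keeps a single term, -log p[3], the negative log-probability of the tiger-shark class. Driving this loss down is therefore the same as pushing that probability up. A small NumPy check of that identity (the helpers `one_hot` and `categorical_crossentropy` and the uniform prediction are hypothetical):

```python
import numpy as np

def one_hot(index, num_classes=1000):
    """One-hot row vector of shape (1, num_classes), like expected_output above."""
    v = np.zeros((1, num_classes), dtype=np.float32)
    v[0, index] = 1.0
    return v

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """-sum(y_true * log(y_pred)) per row; eps only guards against log(0)."""
    return -np.sum(y_true * np.log(y_pred + eps), axis=-1)

probs = np.full((1, 1000), 1.0 / 1000)  # hypothetical uniform prediction
target = one_hot(3)                      # tiger shark
print(categorical_crossentropy(target, probs), -np.log(probs[0, 3]))
```

Both printed values equal log(1000), about 6.91: every class other than the target is multiplied by zero and drops out.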
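Now we run gradient ascent for a sufficient number of steps (50), working on a copy of the original image. At each step we compute the gradient, with respect to the input pixels, of the cross-entropy between the prediction and the tiger-shark target, and move the image against it: lowering that loss is the same as raising the log-probability of class 3. The gradient is divided by its range (max minus min) to keep the step size sensible.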
In [71]:
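input_img_data = np.copy(x)
# run gradient ascent for 50 steps
for i in range(50):
    print("iteration n. {}".format(i))
    with tf.GradientTape() as g:
        # wrap the current image in a variable so that the tape tracks it
        x = K.variable(input_img_data)
        y = model(x)
        # cross-entropy with the one-hot target as y_true and the prediction as y_pred,
        # i.e. -log p(tiger shark); the small constant only guards against log(0)
        loss = -tf.reduce_sum(expected_output * tf.math.log(y + 1e-12))
    res = y[0]
    # class 386 is African_elephant (the top-1 class above was 385, Indian_elephant)
    print("elephant prediction: {}".format(res[386]))
    print("tiger shark prediction: {}".format(res[3]))
    grads_value = g.gradient(loss, x)[0].numpy()
    print(grads_value.shape)
    ming = np.min(grads_value)
    maxg = np.max(grads_value)
    #print("min grad = {}".format(ming))
    #print("max grad = {}".format(maxg))
    # bring the gradient to a sensible scale: its values now span a range of 1
    scale = 1/(maxg - ming)
    # step against the gradient of the loss, i.e. ascent on log p(tiger shark)
    input_img_data -= grads_value * scale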
iteration n. 0 elephant prediction: 0.271149456501 tiger shark prediction: 6.29209928604e-09 (224, 224, 3)
iteration n. 1 elephant prediction: 0.281902700663 tiger shark prediction: 1.5599056491e-09 (224, 224, 3)
iteration n. 2 elephant prediction: 0.269811719656 tiger shark prediction: 6.18359363713e-09 (224, 224, 3)
iteration n. 3 elephant prediction: 0.28002473712 tiger shark prediction: 1.58007817941e-09 (224, 224, 3)
iteration n. 4 elephant prediction: 0.291066497564 tiger shark prediction: 3.82965881229e-10 (224, 224, 3)
iteration n. 5 elephant prediction: 0.295582085848 tiger shark prediction: 3.7838754352e-10 (224, 224, 3)
iteration n. 6 elephant prediction: 0.295692920685 tiger shark prediction: 4.01752936474e-10 (224, 224, 3)
iteration n. 7 elephant prediction: 0.288921773434 tiger shark prediction: 1.42619627219e-09 (224, 224, 3)
iteration n. 8 elephant prediction: 0.2917060256 tiger shark prediction: 1.58619251067e-09 (224, 224, 3)
iteration n. 9 elephant prediction: 0.297342151403 tiger shark prediction: 4.2315012605e-10 (224, 224, 3)
iteration n. 10 elephant prediction: 0.295875966549 tiger shark prediction: 5.48357081964e-10 (224, 224, 3)
iteration n. 11 elephant prediction: 0.293566584587 tiger shark prediction: 8.2218165609e-10 (224, 224, 3)
iteration n. 12 elephant prediction: 0.287314027548 tiger shark prediction: 2.980256264e-09 (224, 224, 3)
iteration n. 13 elephant prediction: 0.283039689064 tiger shark prediction: 4.31411040225e-09 (224, 224, 3)
iteration n. 14 elephant prediction: 0.290171891451 tiger shark prediction: 1.17616960615e-09 (224, 224, 3)
iteration n. 15 elephant prediction: 0.296473622322 tiger shark prediction: 3.13109649319e-10 (224, 224, 3)
iteration n. 16 elephant prediction: 0.290288418531 tiger shark prediction: 1.20128740289e-09 (224, 224, 3)
iteration n. 17 elephant prediction: 0.286366939545 tiger shark prediction: 1.9773198634e-09 (224, 224, 3)
iteration n. 18 elephant prediction: 0.281111091375 tiger shark prediction: 3.47145623358e-09 (224, 224, 3)
iteration n. 19 elephant prediction: 0.277514845133 tiger shark prediction: 6.03332495075e-09 (224, 224, 3)
iteration n. 20 elephant prediction: 0.274204820395 tiger shark prediction: 1.23049765932e-08 (224, 224, 3)
iteration n. 21 elephant prediction: 0.269815891981 tiger shark prediction: 2.52712180071e-08 (224, 224, 3)
iteration n. 22 elephant prediction: 0.264236658812 tiger shark prediction: 4.91879994513e-08 (224, 224, 3)
iteration n. 23 elephant prediction: 0.252512007952 tiger shark prediction: 2.27731902669e-07 (224, 224, 3)
iteration n. 24 elephant prediction: 0.243171066046 tiger shark prediction: 1.09528605208e-06 (224, 224, 3)
iteration n. 25 elephant prediction: 0.224495723844 tiger shark prediction: 6.56170777802e-06 (224, 224, 3)
iteration n. 26 elephant prediction: 0.200678929687 tiger shark prediction: 2.92676668323e-05 (224, 224, 3)
iteration n. 27 elephant prediction: 0.170531451702 tiger shark prediction: 0.000120037228044 (224, 224, 3)
iteration n. 28 elephant prediction: 0.138410657644 tiger shark prediction: 0.000439859926701 (224, 224, 3)
iteration n. 29 elephant prediction: 0.107356064022 tiger shark prediction: 0.00153460924048 (224, 224, 3)
iteration n. 30 elephant prediction: 0.0794935971498 tiger shark prediction: 0.00462667644024 (224, 224, 3)
iteration n. 31 elephant prediction: 0.0558861494064 tiger shark prediction: 0.0129528734833 (224, 224, 3)
iteration n. 32 elephant prediction: 0.0374869778752 tiger shark prediction: 0.0319199636579 (224, 224, 3)
iteration n. 33 elephant prediction: 0.0238489937037 tiger shark prediction: 0.0695150569081 (224, 224, 3)
iteration n. 34 elephant prediction: 0.0149186002091 tiger shark prediction: 0.130459263921 (224, 224, 3)
iteration n. 35 elephant prediction: 0.00892047211528 tiger shark prediction: 0.218916848302 (224, 224, 3)
iteration n. 36 elephant prediction: 0.00608869967982 tiger shark prediction: 0.313300192356 (224, 224, 3)
iteration n. 37 elephant prediction: 0.00404852442443 tiger shark prediction: 0.418670773506 (224, 224, 3)
iteration n. 38 elephant prediction: 0.00275968248025 tiger shark prediction: 0.515644490719 (224, 224, 3)
iteration n. 39 elephant prediction: 0.00165919016581 tiger shark prediction: 0.614631593227 (224, 224, 3)
iteration n. 40 elephant prediction: 0.000994459958747 tiger shark prediction: 0.695274949074 (224, 224, 3)
iteration n. 41 elephant prediction: 0.000552096113097 tiger shark prediction: 0.76745647192 (224, 224, 3)
iteration n. 42 elephant prediction: 0.000285832385998 tiger shark prediction: 0.828008115292 (224, 224, 3)
iteration n. 43 elephant prediction: 0.000141570984852 tiger shark prediction: 0.878301978111 (224, 224, 3)
iteration n. 44 elephant prediction: 6.99893789715e-05 tiger shark prediction: 0.915232658386 (224, 224, 3)
iteration n. 45 elephant prediction: 3.54357543983e-05 tiger shark prediction: 0.938627004623 (224, 224, 3)
iteration n. 46 elephant prediction: 1.53456494445e-05 tiger shark prediction: 0.958389043808 (224, 224, 3)
iteration n. 47 elephant prediction: 6.75061437505e-06 tiger shark prediction: 0.971303224564 (224, 224, 3)
iteration n. 48 elephant prediction: 2.81107531919e-06 tiger shark prediction: 0.98096114397 (224, 224, 3)
iteration n. 49 elephant prediction: 1.21003927234e-06 tiger shark prediction: 0.987086653709 (224, 224, 3)
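The `scale` line in the loop rescales each gradient so that its values span a range of exactly 1. As long as the gradient has both positive and negative entries, which is the typical case, no pixel moves by more than 1 unit (on the 0 to 255 scale) in a single iteration. A minimal NumPy sketch of that update on random stand-in data (`normalized_step`, the image, and the gradient are all hypothetical):

```python
import numpy as np

def normalized_step(image, grad):
    """One update as in the loop above: rescale the gradient so that
    max(grad) - min(grad) == 1, then move the image against it."""
    scale = 1.0 / (grad.max() - grad.min())
    return image - grad * scale  # grad (H, W, 3) broadcasts over the batch axis

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(1, 224, 224, 3))   # hypothetical image batch
grad = rng.normal(scale=1e-4, size=(224, 224, 3))  # hypothetical tiny gradient
new_img = normalized_step(img, grad)
print(np.abs(new_img - img).max())  # at most 1.0 when grad has both signs
```

Because the raw gradient magnitude is discarded, the step size stays the same whether the gradient is tiny (as early on, when the tiger-shark probability is around 1e-9) or large.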
At the end, VGG16 is extremely confident it is looking at a tiger shark.
In [72]:
preds = model.predict(input_img_data)
print("label = {}".format(np.argmax(preds)))
print('Predicted:', decode_predictions(preds, top=3)[0])
label = 3
('Predicted:', [(u'n01491361', u'tiger_shark', 0.9913406), (u'n01494475', u'hammerhead', 0.003008757), (u'n01484850', u'great_white_shark', 0.0024799022)])
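`decode_predictions` ranks the 1000 probabilities and returns the top entries together with their WordNet IDs and class names. The ranking step on its own can be sketched with `np.argsort`. Here `top_k` is a hypothetical helper and the five-class probability vector is made up.

```python
import numpy as np

def top_k(probs, k=3):
    """Return (class_index, probability) pairs for the k most likely classes,
    most likely first."""
    idx = np.argsort(probs)[::-1][:k]  # ascending sort, reversed, first k
    return [(int(i), float(probs[i])) for i in idx]

probs = np.array([0.05, 0.10, 0.02, 0.80, 0.03])  # hypothetical 5-class output
print(top_k(probs))  # [(3, 0.8), (1, 0.1), (0, 0.05)]
```

The label printed above (`np.argmax(preds)`) is simply the first entry of this ranking.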
Let us look at the resulting image, shown side by side with the original.
In [73]:
# convert the adversarial array back to a displayable image
nimg = image.array_to_img(input_img_data[0])
plt.figure(figsize=(10,5))
ax = plt.subplot(1, 2, 1)
plt.title("elephant")
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.imshow(xd)    # original image
ax = plt.subplot(1, 2, 2)
plt.imshow(nimg)  # adversarial image
plt.title("tiger shark")
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
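Whether the change is visible depends on how large it is. The small NumPy helper below is hypothetical (not part of the notebook) and reports the largest, the Euclidean, and the average per-value difference between two arrays of the same shape, in pixel units.

```python
import numpy as np

def perturbation_stats(original, adversarial):
    """Size of the change between two same-shaped images, in pixel units."""
    delta = adversarial.astype(np.float64) - original.astype(np.float64)
    return {
        "linf": float(np.abs(delta).max()),        # largest single-value change
        "l2": float(np.sqrt(np.sum(delta ** 2))),  # overall Euclidean size
        "mean_abs": float(np.abs(delta).mean()),   # average change per value
    }

a = np.zeros((2, 2, 3))
b = a.copy()
b[0, 0, 0] = 3.0
b[1, 1, 2] = -4.0
print(perturbation_stats(a, b))
```

In the notebook, `perturbation_stats(x0[np.newaxis], input_img_data)` would compare the original array (`x0` is never modified) with the adversarial one.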
We just fooled the neural network!