In this notebook we develop a neural network from scratch, using only the numerical primitives of numpy.

In [ ]:
import math
import numpy as np

np.set_printoptions(precision=2, suppress=True)

Let us define a couple of activation functions (sigmoid and relu) and their derivatives.

In [ ]:
##############################################
# activation functions
##############################################

def sigmoid(x): return 1 / (1 + math.exp(-x))

def sigderiv(x): return (sigmoid(x)*(1-sigmoid(x)))

def relu(x):
  if x >= 0: return x
  else: return 0

def reluderiv(x):
  if x >= 0: return 1
  else: return 0

# switch both lines together to use relu instead of sigmoid
def activate(x): return sigmoid(x)  #relu(x)
def actderiv(x): return sigderiv(x) #reluderiv(x)
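
As a quick sanity check, we can compare sigderiv with a finite-difference approximation of the derivative of sigmoid; the two should agree up to numerical error.

In [ ]:
# illustrative sanity check: sigderiv should agree with a numerical derivative of sigmoid
eps = 1e-6
for x in [-2.0, 0.0, 2.0]:
  num = (sigmoid(x+eps) - sigmoid(x-eps)) / (2*eps)
  print("x = %5.1f  sigderiv = %.6f  numerical = %.6f" % (x, sigderiv(x), num))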

A neural network is just a collection of numerical arrays describing the weights of the links at each layer. For instance, a dense layer between n input neurons and m output neurons is defined by a matrix w of dimension n×m for the weights and a vector b of dimension m for the biases.

Supposing the network is dense, its architecture is fully specified by the number of neurons at each layer. For our example, we define a shallow network with 8 input neurons, 3 hidden neurons, and 8 output neurons, hence with dimension [8,3,8].

We initialize weights and biases with random values.

In [ ]:
##############################################
# net parameters
##############################################

dim = [8,3,8]
l = len(dim)

w,b = [],[]

for i in range(1,l):
  w.append(np.random.rand(dim[i-1],dim[i]))
  b.append(np.random.rand(dim[i]))
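
As a small check, we can inspect the shapes of the parameters just created: w[i] connects layer i to layer i+1.

In [ ]:
# illustrative: weight and bias shapes, one pair per layer of links
for i in range(l-1):
  print("layer %d -> %d: w %s, b %s" % (i, i+1, w[i].shape, b[i].shape))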

For the backpropagation algorithm we also need to compute, at each layer, the weighted sum z (the input to the activation), the activation a, and the partial derivative d of the error with respect to z.

We define a version of the backpropagation algorithm working online, processing a single training sample (x,y) at a time and updating the network parameters at each iteration. The backpropagation function also returns the current error relative to (x,y).

An epoch is a full pass of this update over all training data; it returns the cumulative error over all samples.
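
For reference, the cell below implements the following update rules (written with δ for the arrays d and μ for the learning rate mu); they are the standard online gradient-descent steps for the half squared error (the function itself returns sum((y - a)^2), i.e. twice that quantity, as its error measure):

$$\delta^{\mathrm{out}} = (y - a^{\mathrm{out}}) \odot \sigma'(z^{\mathrm{out}}), \qquad \delta^{i} = \big(w^{i+1}\,\delta^{i+1}\big) \odot \sigma'(z^{i}),$$

$$w^{i}_{jk} \leftarrow w^{i}_{jk} + \mu\, a^{i}_{j}\, \delta^{i}_{k}, \qquad b^{i}_{k} \leftarrow b^{i}_{k} + \mu\, \delta^{i}_{k},$$

where w^i is the dim[i]×dim[i+1] matrix w[i], so no transpose appears in the backward pass.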

In [ ]:
##############################################
# training - on line, one input data at a time
##############################################

mu = 1  # learning rate

# a[i]: activations of layer i; z[i]: weighted input to layer i+1; d[i]: delta (error derivative w.r.t. z[i])
z,a,d=[],[],[]

for i in range(0,l): 
  a.append(np.zeros(dim[i]))

for i in range(1,l):
  z.append(np.zeros(dim[i]))
  d.append(np.zeros(dim[i]))

def update(x,y):
  #input                
  a[0] = x
  #feed forward
  for i in range(0,l-1):
    z[i] = np.dot(a[i],w[i])+b[i]
    a[i+1] = np.vectorize(activate)(z[i])
  #output error
  d[l-2] = (y - a[l-1])*np.vectorize(actderiv)(z[l-2])
  #back propagation
  for i in range(l-3,-1,-1):
    d[i]=np.dot(w[i+1],d[i+1])*np.vectorize(actderiv)(z[i])
  #updating
  for i in range(0,l-1):
    for k in range (0,dim[i+1]):
      for j in range (0,dim[i]):
        w[i][j,k] = w[i][j,k] + mu*a[i][j]*d[i][k]
      b[i][k] = b[i][k] + mu*d[i][k]
    if False:
      print("d[%i] = %s" % (i,(d[i],)))
      print("b[%i] = %s" % (i,(b[i],)))
  #print("error = {}".format(np.sum((y-a[l-1])**2)))  
  return np.sum((y-a[l-1])**2)

def epoch(data):
  e = 0
  for (x,y) in data:
    e += update(x,y)
  return e
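
Before fitting real data, we can run a quick smoke test: repeated updates on a single sample should typically make the reported error decrease (note that this already moves the randomly initialized parameters).

In [ ]:
# illustrative smoke test: repeated updates on one sample should (typically) decrease the error
# note: this modifies the freshly initialized weights and biases
x0 = np.zeros(8); x0[0] = 1.0
print([round(update(x0, x0), 4) for _ in range(5)])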

Now we define some data and fit the network on them.

We want to define a simple example of an autoencoder, taking as input a one-hot representation of the numbers between 0 and 7 and trying to compress them into a boolean internal representation over 3 bits.

In [ ]:
X = [[1,0,0,0,0,0,0,0],
     [0,1,0,0,0,0,0,0],
     [0,0,1,0,0,0,0,0],
     [0,0,0,1,0,0,0,0],
     [0,0,0,0,1,0,0,0],
     [0,0,0,0,0,1,0,0],
     [0,0,0,0,0,0,1,0],
     [0,0,0,0,0,0,0,1]]

def data(): return zip(X,X)  

final_error = .002
dist = epoch(data()) 

while dist > final_error:
  print("distance= %f" % dist)
  dist = epoch(data())

print("distance= %f" % dist)
for x in X:
  print("input = %s" % (x,))
  a[0] = x
  #feed forward
  for i in range(0,l-2):
    z[i] = np.dot(a[i],w[i])+b[i]
    a[i+1] = np.vectorize(activate)(z[i])
  print("hidden level = %s" % (a[i+1],))
  z[l-2] = np.dot(a[l-2],w[l-2])+b[l-2]
  a[l-1] = np.vectorize(activate)(z[l-2])
  #print("output = %s" % (a[l-1],))
distance= 0.002000
input = [1, 0, 0, 0, 0, 0, 0, 0]
hidden level = [0.02 0.   0.64]
input = [0, 1, 0, 0, 0, 0, 0, 0]
hidden level = [0.98 0.03 1.  ]
input = [0, 0, 1, 0, 0, 0, 0, 0]
hidden level = [0.  0.7 1. ]
input = [0, 0, 0, 1, 0, 0, 0, 0]
hidden level = [0.99 0.01 0.12]
input = [0, 0, 0, 0, 1, 0, 0, 0]
hidden level = [0.98 0.95 0.  ]
input = [0, 0, 0, 0, 0, 1, 0, 0]
hidden level = [0.99 1.   0.99]
input = [0, 0, 0, 0, 0, 0, 1, 0]
hidden level = [0.01 0.99 0.21]
input = [0, 0, 0, 0, 0, 0, 0, 1]
hidden level = [0.07 0.19 0.  ]
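
The hidden activations are not exactly boolean, but rounding them reads off a 3-bit code for each input; in the run shown above the eight rounded codes happen to be pairwise distinct.

In [ ]:
# illustrative: round the hidden activations to read off a 3-bit code for each one-hot input
for x in X:
  h = np.vectorize(activate)(np.dot(x, w[0]) + b[0])
  print("%s -> %s" % (x, np.round(h).astype(int)))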

Exercises.

  1. change the specification of the network to allow a different activation function for each layer;
  2. modify the backpropagation algorithm to work on a minibatch of samples.