1. Motivation

Artificial Neural Networks (ANN) are networks of artificial neurons, inspired by the principles of biological neural networks. Biological neural networks like the human brain are particularly powerful information processing systems when it comes to tasks like learning or pattern recognition. Take the example of a text recognition task: it is usually not a problem for humans to read written language, even if the letters are blurred or written sloppily, whereas a computer may often return unusable results in that case. So when inputs are ambiguous or impaired, artificial neural networks often prove to be a good alternative. 

If we look at current speech recognition systems, artificial neural networks are used in many different ways (see Artificial Neural Networks for Feature Extraction and Artificial Neural Networks for SR). As a basis for further reading on the topic, this article will give an introduction to the working principle of artificial neural networks. Core components will be explained, the idea of training a neural network will be established, and some simple examples will be provided. 

2. Principles of Artificial Neural Networks

2.1 The Artificial Neuron

Biological neural networks consist of countless neurons connected by synapses. In simplified terms, a biological neuron consists of several input synapses and an output synapse. If the summed input to a neuron surpasses a certain threshold, the neuron transmits an electrical signal through its output synapse to connected neurons. Artificial neural networks imitate this behavior.

The foundation for artificial neural networks is the artificial neuron, also called unit. Every unit consists of

  • A state of activation yk, which is equivalent to the output of the unit
  • The inputs yj from connected units. Each connection between units is weighted with a factor wjk, determining the influence the unit j has on unit k
  • An offset parameter θk
  • A propagation rule to determine the effective input sk from the weighted inputs
  • An activation function Fk to determine the new state of activation based on the effective input sk(t) and the current state of activation yk(t)
Figure 1: Artificial neuron with weighted summation as propagation function [1]

2.2 Propagation Function and Activation Function

In many cases the weighted sum of the inputs, plus an offset term θk, is used as the propagation rule:

sk(t) = Σj wjk(t) yj(t) + θk(t)

The activation function uses the effective input sk(t) and the current state of activation yk(t) to calculate the new state of activation:

yk(t+1) = Fk(yk(t), sk(t))

Common activation functions are binary (sgn), linear or semi-linear, and sigmoid functions. Usually a threshold is applied: with the signum function (figure 2), for example, the output of a unit is -1 if the input to the activation function is below zero and +1 otherwise.
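As a sketch, the propagation rule and two of these activation functions might look as follows in Python (the function names are illustrative, not taken from the references):

```python
import math

def propagate(weights, inputs, theta):
    """Propagation rule: weighted sum of the inputs plus the offset theta."""
    return sum(w * y for w, y in zip(weights, inputs)) + theta

def sgn(s):
    """Binary (signum) activation: -1 below zero, +1 otherwise."""
    return -1 if s < 0 else 1

def sigmoid(s):
    """Sigmoid activation: smooth output, limited to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

s = propagate([0.5, -1.0], [1, 1], 0.2)  # effective input sk
print(sgn(s), sigmoid(s))
```

Note how the sigmoid keeps the output bounded, which matters for the overflow issue discussed below.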

 

Figure 2: Different activation functions [1]

 

It is common to introduce some sort of limit to the output of an activation function. In that way a possible overflow in recurrent networks (see section 3.3) is prevented.

2.3 The Perceptron

The simplest form of a neural network is the perceptron. The perceptron consists of only one artificial neuron with adjustable weights and a threshold. This artificial neural network is already able to implement the logical AND function (see figure 3).

Figure 3: Binary AND function implemented with a perceptron [5]

 

The line g divides the sample space into two distinct areas. Given the binary inputs x1, x2 ∈ {0, 1}, the weights w1, w2 = 1, an offset of -1.5, weighted summation as propagation rule, and the Heaviside step function as activation function, the output of the unit is defined as:

y = σ (-1.5 + x1 + x2)

with σ (x) = 0 ∀x < 0 and σ (x) = 1 ∀x ≥ 0

Thus y will be 1 only if both x1 and x2 are equal to 1.
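The unit above can be written out directly; this is a minimal sketch of the perceptron from figure 3:

```python
def heaviside(x):
    # Heaviside step function: sigma(x) = 0 for x < 0, 1 for x >= 0
    return 0 if x < 0 else 1

def and_perceptron(x1, x2):
    # weights w1 = w2 = 1, offset -1.5, weighted summation as propagation rule
    return heaviside(-1.5 + 1 * x1 + 1 * x2)

# truth table of the logical AND function
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', and_perceptron(x1, x2))
```

Only the input (1, 1) pushes the weighted sum past the offset, so only that case yields 1.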

3 Multilayer Neural Networks

3.1 The Layer Model

Single layer artificial neural networks like the perceptron are limited in their application. While the logical AND function is implementable using the perceptron, the XOR function for example is not. To solve more complex problems with neural networks, we need to introduce the layer architecture.

Every unit (or artificial neuron) can be classified into one of these three groups:

  • Input units - Units that receive input signals from the outside world
  • Hidden units - Units in between input and output units
  • Output units - Units that output signals to the outside world
Figure 4: Multilayer ANN: input / hidden / output units (left to right)

 

According to their hierarchical structure, the units can be grouped into layers. A multilayer neural network consists of one input layer, zero or more hidden layers, and one output layer. Figure 4 shows a simple neural network with two hidden layers. It has been shown that every problem which is solvable with a multilayer ANN can be solved with only one hidden layer, as long as the number of units in each layer is sufficiently large.
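A forward pass through such a layered network can be sketched as follows (a simplified illustration; the layer sizes and weights below are arbitrary):

```python
def heaviside(x):
    # step activation: 0 for x < 0, 1 for x >= 0
    return 0 if x < 0 else 1

def layer_forward(weights, offsets, inputs):
    # one layer: each row of weights, together with its offset, feeds one unit
    return [heaviside(sum(w * y for w, y in zip(row, inputs)) + theta)
            for row, theta in zip(weights, offsets)]

def forward(network, inputs):
    # propagate the inputs layer by layer, strictly feed-forward
    for weights, offsets in network:
        inputs = layer_forward(weights, offsets, inputs)
    return inputs

# an example network: one hidden layer of two units, one output unit
network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.5]),  # hidden layer
    ([[1.0, 1.0]], [-0.5]),                    # output layer
]
print(forward(network, [1, 0]))
```

Each layer's output list becomes the input list of the next layer; this is the feed-forward data flow described in section 3.3.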

3.2 The XOR Problem

As mentioned before, the logical XOR function cannot be implemented using the simplest form of a perceptron. The discovery of this limitation even led to a stagnation of research on artificial neural networks in the 1960s. Only when a solution to this problem was found did research gather pace again. The solution lay in the use of multiple layers of perceptrons.

Figure 5: Logical XOR function implemented with two layers of perceptrons [5]

 

Figure 5 illustrates the implementation of a logical XOR function with a simple neural network. In this case two lines are necessary to divide the sample space accordingly. The properties of each perceptron or unit are as in section 2.3, with only the input weights and offsets adjusted.

Thus y will be 1 if, and only if, exactly one of x1 and x2 is 1.
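The two-layer construction can be sketched in code; the particular weights below are one standard choice (an OR unit and a NAND unit feeding an AND unit) and may differ from the exact values in figure 5:

```python
def heaviside(x):
    # sigma(x) = 0 for x < 0, 1 for x >= 0, as in section 2.3
    return 0 if x < 0 else 1

def xor_net(x1, x2):
    h1 = heaviside(-0.5 + x1 + x2)    # hidden unit: logical OR
    h2 = heaviside(1.5 - x1 - x2)     # hidden unit: logical NAND
    return heaviside(-1.5 + h1 + h2)  # output unit: logical AND

# truth table of the logical XOR function
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor_net(x1, x2))
```

Each hidden unit realizes one of the two dividing lines, and the output unit combines the two half-spaces.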

3.3 Network Topologies

Up until now all the artificial neural networks we have considered were strictly feed forward. That means information flows from the input units, through hidden units, to the output units in a sequential manner. In many cases it is however necessary to introduce feedback connections. In consequence we distinguish two kinds of neural network topologies:

  • Feedforward networks, where the data flow from input to output units is strictly feed forward. The data processing can extend over multiple (layers of) units, but no feedback connections are present, that is, connections extending from outputs of units to inputs of units in the same layer or previous layers.
  • Recurrent networks that do contain feedback connections. Contrary to feed-forward networks, the dynamical properties of the network are important. In some cases, the activation values of the units undergo a relaxation process such that the network evolves to a stable state in which these activations do not change anymore. In other applications, the changes of the activation values of the output neurons are significant, such that the dynamical behaviour constitutes the output of the network [1, page 17]

4 Training of Artificial Neural Networks

Through training processes, artificial neural networks learn how to react to certain input patterns. Usually the purpose of training is to adjust the weights between connected units. There are two ways of training a neural network:

  • Supervised training: In supervised training a predefined pattern is presented to the neural network. The weights between the units of the network are then adjusted such that the output layer produces the desired result.
  • Unsupervised training: In unsupervised training there are no predefined input/output pairs. The network reacts to statistically salient features of the input and creates its own pattern representations of the input stimuli.

In the following sections two basic training algorithms will be introduced. The Delta rule is a representative of supervised training, while the Hebbian theory can also be used in an unsupervised approach.

4.1 The Delta Rule

The Delta rule is based on the comparison between the desired and the observed activation of an output unit:

δi = ai(desired) - ai(observed) [2]

If the desired activation is greater than the observed one (δ > 0), the weights to the connected sending units need to be increased. If the desired activation is smaller than the observed one (δ < 0), the weights to the connected sending units need to be decreased (assuming positive input values in all cases):

Δwij = ε * δi * aj [2]

Δwij - change of weight between connected units i and j
ε - predefined learning factor 
δi - delta value of output unit i 
aj - activation of sending unit j 
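A minimal sketch of delta-rule training for a single output unit; the helper names and the treatment of the offset as a trainable bias are illustrative choices, not prescribed by [2]:

```python
def train_delta(samples, epochs=20, eps=0.1):
    """Delta-rule training of one output unit with a step activation.
    samples: list of (inputs, desired activation) pairs."""
    n = len(samples[0][0])
    w = [0.0] * n       # weights from the sending units
    theta = 0.0         # offset, trained like a weight on a constant input 1
    for _ in range(epochs):
        for inputs, desired in samples:
            s = sum(wi * a for wi, a in zip(w, inputs)) + theta
            observed = 1 if s >= 0 else 0
            delta = desired - observed                              # delta_i
            w = [wi + eps * delta * a for wi, a in zip(w, inputs)]  # eps * delta_i * a_j
            theta += eps * delta
    return w, theta

# learn the logical AND function from section 2.3
samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, theta = train_delta(samples)
```

Since AND is linearly separable, the weights settle after a few epochs into a configuration that reproduces the desired outputs.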

The delta rule may only be applied to neural networks without hidden layers. An extension of the delta rule to multiple layers and recurrent networks is the Backpropagation algorithm (see Training Restricted Boltzmann Machines).

4.2 The Hebbian Theory

The Hebbian theory is a very basic approach which still contains a lot of biological plausibility. In simplified terms, its message can be reduced to "what fires together, wires together". According to Hebb, the weight between two units should be increased if they are commonly active at the same time. The simple formula for the Hebbian theory is:

Δwij = ε * ai * aj [2]

Δwij - change of weight between connected units i and j
ε - predefined learning factor
ai, aj - activations of units i and j
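As a sketch, a single Hebbian weight update looks like this (the function name is illustrative):

```python
def hebbian_update(w, a_i, a_j, eps=0.1):
    # Hebbian rule: strengthen w only when both units are active together
    return w + eps * a_i * a_j

w = 0.0
for a_i, a_j in [(1, 1), (1, 0), (1, 1)]:  # three presentations, two co-active
    w = hebbian_update(w, a_i, a_j)
print(w)  # the weight grew only in the two steps where both activations were 1
```

No desired output appears anywhere in the update, which is why the rule also works in an unsupervised setting.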

References

[1] B. Krose and P. van der Smagt, "An Introduction to Neural Networks," The University of Amsterdam, November 1996.

[2] G. D. Rey and K.F. Wender, Neuronale Netze. Huber, April 2008.

[3] C.M. Bishop, Pattern recognition and machine learning. Springer, October 2007.

[4] Einführung in Neuronale Netze [Online]. Available: http://wwwmath.uni-muenster.de:8010/Professoren/Lippe/lehre/skripte/wwwnnscript/, 2007.

[5] Künstliche Neuronale Netze [Online]. Available: http://wiki.ldv.ei.tum.de/tiki-index.php?page=K%C3%BCnstliche+Neuronale+Netze, 2010.

