Artificial Neural Networks


Artificial Neural Networks, known affectionately as "networks", constitute a class of signal-processing algorithms [1] that bear some, however remote, resemblance to wetware neural networks, such as the nervous systems of animals (like the human brain). Still, this is not really artificial intelligence, at least not on its own, nor is it a good mathematical model of actual physico-chemical brains.

Several scientific communities [2] contribute to the theory of artificial neural networks, and most of them have their own viewpoint on it.

Artificial neural networks have proven to be practical, robust tools that are used in many applications: distinguishing bombs and weapons from alarm clocks in semi-automatic airport x-ray machines, translating spoken words into computer commands and controlling autonomous robots, to mention a few. Some of the network theory helps by defining a conceptual vocabulary that enables scientists to describe more accurately the vastly more complex phenomena that we observe in, e.g., our own brains.

As usual, there are problems as well. Even if you have a nice network that does its job, it is almost impossible to tell just how it does it. This goes along the same lines as asking a natural talent how she does whatever she is good at: she just does it. Artificial neural networks also typically involve the use of non-linear optimisation (explained later), and are then largely dependent on the performance of this rather difficult procedure.

Architecture and training


A neural network is composed of individual, locally connected units termed neurons. Typically, each of these weights the signals arriving on its input connections according to its own fashion, sums them up, and transforms this weighted sum with a non-linear function. The latter is often termed the activation function, in analogy with the biological neuron.
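To make this concrete, here is a minimal sketch of such a neuron in Python, assuming a logistic (sigmoid) activation function; the function name and all the numbers are purely illustrative.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weight each input, sum up, squash the sum."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))  # logistic 'activation'

# Three inputs, three weights and a bias; the output always lies in (0, 1).
print(neuron([0.5, -1.2, 3.0], [0.4, 0.1, -0.6], 0.2))
```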

Connecting several layers in succession is a great idea. One can show that a network with only two layers of adaptive weights suffices to approximate any reasonable function, given "enough" neurons. This is just theory, however, and one should note that it only grants the existence of such a network solution - not the ability to actually find it! That requires some kind of learning procedure.
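Such a two-layer arrangement might look as follows in Python: a sketch only, with placeholder weights, but it shows how the output of the first layer of adaptive weights becomes the input of the second.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each neuron weights and sums all inputs, then squashes."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

def two_layer_net(x, w_hid, b_hid, w_out, b_out):
    hidden = layer(x, w_hid, b_hid)     # first layer of adaptive weights
    return layer(hidden, w_out, b_out)  # second layer of adaptive weights

# Two inputs, three hidden neurons, one output; all weights are placeholders.
print(two_layer_net([1.0, 0.5],
                    [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.9]], [0.0, 0.1, -0.1],
                    [[0.5, -0.8, 0.3]], [0.2]))
```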

Changing the network parameters is termed network training. This procedure is also referred to as weight adaptation or, in short, "learning", since that's really what's going on. One can think of learning as attempting to store data in a way that allows generalisation.

Training of a network can be done by most types of standard, non-linear optimisation algorithms such as gradient descent or BFGS [3]. To understand this, picture the network parameters as latitude and longitude in a large [4] rolling landscape where the altitude represents how far from the desired answer the output is. The optimisation algorithm then strolls along on the surface trying to find as low a valley as possible. A very neat feature developed in this context is "error backpropagation". This solves the problem of assigning the blame for a bad prediction to individual neurons [5]. Neurons are very local creatures, remember, but using differentiable non-linearities means that we can use the chain rule [6] to determine who did what to the final result. In the landscape analogy this corresponds to computing how steep the terrain around the current location is.
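Here is a toy illustration of that downhill stroll: plain gradient descent teaching a single sigmoid neuron the logical OR function. The squared-error cost, learning rate and epoch count are arbitrary choices made for this example; the line computing delta is the chain rule at work.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny made-up training set: logical OR of two inputs.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

w, b = [0.1, -0.1], 0.0  # start somewhere on the error landscape
rate = 0.5               # how big a step to take downhill

for epoch in range(2000):
    for x, target in samples:
        y = sigmoid(x[0] * w[0] + x[1] * w[1] + b)
        # Chain rule: dE/dw = dE/dy * dy/dz * dz/dw, with E = (y - t)^2 / 2
        delta = (y - target) * y * (1.0 - y)
        w = [wi - rate * delta * xi for wi, xi in zip(w, x)]
        b -= rate * delta

# The predictions should now be close to the targets 0, 1, 1, 1.
print([round(sigmoid(x[0] * w[0] + x[1] * w[1] + b), 2) for x, _ in samples])
```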

Supervision


Roughly, one can divide the learning procedures of most learning systems, including artificial neural networks, into supervised and unsupervised learning. In the case of supervised learning, a set of training samples with known answers is used. When no book-of-answers is present, training is unsupervised.

For instance, attempting to predict tomorrow's Dow Jones index, using as input the variations in interest rates, certain key numbers of the largest companies and the amount of perspiration present on Mr Greenspan's hands, would be a case of supervised learning. We have both input data and target data.

If we instead present a large number of spoken Finnish words to a network, and modify the network slightly according to some local, competitive rules with each new word, we end up with neurons that each recognise one Finnish phoneme. This is a case of unsupervised learning: no-one told the network what phonemes there are - it found them itself. The unsupervised case includes the many interesting self-organisational techniques, such as Kohonen's self-organising feature maps. Self-organisation, although mathematically rather simple, is robust and seems to be frequently employed in carving the layout of several systems in at least mammalian wetware networks, including hearing, vision and language processing. We may have discovered these principles, but they probably played a key role in turning us into what we are, opposing the forces of increasing entropy by creating global order out of local interactions [7].
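A bare-bones sketch of such local, competitive rules, in the spirit of Kohonen's maps but omitting the topological neighbourhood that makes a proper feature map: the unit nearest each input "wins" and is pulled a little towards it. The data and the number of units are made up for illustration.

```python
import random

random.seed(1)
# Unlabelled data drawn from three hidden clusters; no answers anywhere.
data = [[random.gauss(c, 0.2), random.gauss(c, 0.2)]
        for c in (0.0, 1.0, 2.0) for _ in range(30)]

units = [[random.random() * 2, random.random() * 2] for _ in range(3)]
rate = 0.1

for _ in range(50):
    for x in data:
        # The unit whose weight vector lies nearest the input wins...
        winner = min(units, key=lambda u: sum((a - b) ** 2
                                              for a, b in zip(u, x)))
        # ...and is nudged towards the input. No teacher is involved.
        for i in range(len(winner)):
            winner[i] += rate * (x[i] - winner[i])

print(units)  # the units end up sitting on the three clusters
```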

Generalisation


The most powerful property of an artificial neural network is the ability to generalise: to not only reproduce previously seen data, but also provide correct predictions in similar "situations". For instance, feeding a network with a few words, pronounced by a few speakers, may allow it to recognise these words when spoken by a previously unheard speaker.
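Generalisation is usually measured by holding some data back during training and checking the network's answers on it afterwards. A sketch, where train_network and predict stand for any trainer and model you care to plug in (they are hypothetical placeholders, not standard functions):

```python
def generalisation_error(samples, train_network, predict, holdout=0.25):
    """Train on part of the data, measure the average error on the rest."""
    cut = int(len(samples) * (1.0 - holdout))
    train, test = samples[:cut], samples[cut:]
    model = train_network(train)  # the network only ever sees these samples
    errors = [abs(predict(model, x) - target) for x, target in test]
    return sum(errors) / len(errors)  # error on never-before-seen data
```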

History


In the early days (the 1960s) guys like Rosenblatt and Widrow built fascinating, linear and mostly single-layer networks using lots and lots of transistors. The development came to an embarrassing halt shortly afterwards, following a proof that this type of network was fun, but rather useless. It was not until the 1980s that they became popular again, much due to a paper Rumelhart published in a very influential book [8]. This paper brought attention to the things that made artificial neural networks surpass their believed limitations: differentiable non-linearities [9] and multi-layer networks. These things had really been discovered by some guys already in the 1970s. Nobody really remembers those guys, though [10].

Dynamics


Although the feed-forward network architectures are the ones most employed in engineering applications, allowing some down-stream neurons to connect to up-stream ones adds an interesting feature: dynamics. Such recurrent networks exhibit the characteristics of complex, adaptive systems (see, e.g., complexity theory). This can be used for various useful things, but is mainly invoked to de-blur noisy images or recognise the handwriting of engineering students [11]. This might also be a possible link to artificial intelligence, since complex dynamical systems are more or less prerequisites for sentience. But, hey, a waterfall is also a complex dynamical system, so don't hope too much.
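To show what dynamics buys you, here is a toy recurrent network of the Hopfield flavour, used the way the text describes: store a pattern, then let the dynamics clean up a noisy copy of it. Patterns are lists of +1/-1, the update is synchronous for brevity, and none of this is production-grade de-blurring.

```python
def train_hopfield(patterns):
    """Hebbian outer-product rule: units that fire together wire together."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w

def recall(w, state, steps=5):
    for _ in range(steps):  # let the dynamics settle into an attractor
        state = [1 if sum(wij * s for wij, s in zip(row, state)) >= 0 else -1
                 for row in w]
    return state

stored = [1, 1, -1, -1, 1, -1]
w = train_hopfield([stored])
noisy = [1, -1, -1, -1, 1, -1]     # the stored pattern with one bit flipped
print(recall(w, noisy) == stored)  # True: the dynamics restore the pattern
```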

Bayesian interpretation


Bayesian statistics can be used to explain and improve much of artificial neural network theory and practice. Notably, a standard feed-forward network can, under certain circumstances, be shown to approximate the a posteriori class probabilities (how likely is this guy to belong to that class, given all that we know about him?). Bayesian methods also allow keeping track of errors in the data, and estimating the likelihood of error in the network's estimates.
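For instance, with a softmax output layer (and a suitable error function) the outputs sum to one and can be read as estimated posterior probabilities, one per class. The raw scores below are made up.

```python
import math

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]  # numerically stable
    total = sum(exps)
    return [e / total for e in exps]

# Pretend these are the raw output-layer activations for three classes:
print(softmax([2.0, 0.5, -1.0]))  # sums to 1; read as P(class | input)
```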

Learning more


If you read only one book on neural networks, I'd recommend Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press.

For more online info, visit the artificial neural network newsgroup FAQ, e.g. the ANN-FAQ.

[1] One might describe an algorithm as an ordered set of mathematical operations; kind of like a recipe for solving a specific problem. Typically, they are translated into some programming language and run on a computer. Alternative approaches involve the use of hordes of graduate students or designing VLSI circuitry. The former is slow and rather error-prone, whereas the latter is amazingly fast and interesting.
[2] Communities such as electrical engineering, signal processing, mathematical statistics, computational science, complexity theory, artificial intelligence and even some quantitative neurobiologists. This has the effect that if you speak with an artificial intelligence researcher, she might tell you that networks can be seen as one building block that can be used in forging novel intelligence, whereas the guys down at electrical engineering will argue that this is just a case of non-parametric function approximation with adaptive basis functions. A frequentist statistician might mumble about Bayesian non-stringency, and a quantitative neurobiologist could speak for hours and hours on how unfathomably complex the real wetware networks actually are. Take your pick; most of these people can tell really cool stories!
[3] A non-linear optimisation method named after its creators Broyden, Fletcher, Goldfarb and Shanno. The acronym is easier to remember, really. Oh, and it is quite nearly as powerful as Quake players might assume.
[4] And often insanely multi-dimensional.
[5] Also known as the credit assignment problem.
[6] More basic calculus: if a function depends upon another, the slope of it also does, and typically in a predictable way.
[7] See, e.g., the entry on complexity theory.
[8] Rumelhart, Hinton and Williams (1986), 'Learning Internal Representations by Error Propagation', in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Cambridge, MA: MIT Press.
[9] OK, time to recall some simple calculus: if you can determine the slope of a curve in most positions, mathemagicians term this curve, or function, differentiable.
[10] For instance Werbos (1974), but he did it in his dissertation, and no-one really reads dissertations either.
[11] The systems used in mail-sorting are typically non-dynamical, though. And, no, so far I haven't heard of any artificial neural network able to read MD handwriting. That's part of why they invented the handheld recording machine and the handheld computer, remember?
