NEuroNet: Roadmap: Neural Networks Hardware

a) Hardware Implementations

Daniele Caviglia, Univ. Genova - DIBE, May 1999

A questionnaire has been developed in order to collect the actual and common perception of interested researchers on these topics.

The questionnaire has been distributed at the following conferences/conventions:

1. Wirn '99 (Vietri sul Mare, 20-22 May 1999)
2. IWANN '99 (Alicante, 2-4 June 1999)
3. SOCO '99 - IIA '99 (Genoa, 3-5 June 1999)
4. IJCNN'00 (Como, 24-27 May 2000)

It is expected that the collected responses will constitute the background on which the Technological Roadmap will be developed, or, as minimal result, could act as stimuli for subsequent deeper understanding of the two topics. As soon as the responses are collected, a comprehensive report will be generated.

A World Wide Web site is being developed which will contain and help to disseminate all the information about the Italian NEuroNet Node and allow the questionnaire to be filled in online. This will result in a continuous updating of data that will be visible to the other Nodes and contribute to

b) Neural Networks Hardware

J R Dorronsoro and E Chiozza, ADIC-IIC
Revised and updated information originating from the SIENA 9811 ESPRIT Project

1. Hardware Neural Networks

1.1 Digital Implementations:
Slice Architectures - Multi-Processor Chips - Radial Basis Functions
- Other Digital Designs
1.2 Analogue Implementations
1.3 Hybrid Designs

2. Reference Links
3. Links to Relevant Sites in Neural Network Hardware

1. Hardware Neural Networks
The majority of ANN applications in commercial use are implemented in software, and run on a conventional single processor general purpose computer. This fact is mainly due to software's flexibility; particularly important in using comparatively new and unknown technology, where conditions may be somewhat experimental. However, specialised hardware (which can either support or replace the software) offers appreciable advantages in some situations. It is worth looking in greater depth at the features of hardware ANNs because of their potential to change market opportunities.

The most common reasons for using specialised ANN hardware are as follows:

Speed. Most applications could be speeded up by the use of specialised hardware. Faster processing of repetitive calculations is one approach. ANNs have the additional feature that they are intrinsically parallel, and hardware implementations can make use of this. Hardware ANNs can therefore offer a considerable potential for speed improvements.
Cost. A hardware implementation can provide margins for reducing system costs by lowering the total component count, decreasing power requirements, and so on. This can be important in some high-volume applications such as consumer products, which are very price-sensitive.
Reliability. For very similar reasons to cost reduction, a hardware implementation can offer greater reliability in operation, in the sense of reduced probability of equipment failure.
Special operating conditions. In applications, which impose constraints such as limited physical size or weight, a hardware implementation may be essential.
Secrecy. Specialised hardware can offer better protection against 'reverse engineering' by potential competitors than the equivalent functions implemented in software.

Hardware ANN components are available in a number of different forms. The choice between these is governed by the nature of the application. The main types are:

Neurocomputers, which provide a complete system based on neural techniques. These are generally aimed at solving problems, which demand significant processing power: for example, those with large networks and/or high speed requirements.
PC accelerators and other cards, which are generally made for a common bus standard such as ISA (for PCs). These offer some of the advantages of the neurocomputer, but in a lower performance/price band.
Chips, which can be used to build either of the preceding forms or can be included along with other devices to make a complete application unit. The latter are typically used in applications where the unit is not perceived primarily as a computing device: for example, a domestic appliance.
Cell libraries, which are now becoming available to allow an appropriate level of neural functionality to be included within a dedicated chip alongside other necessary functions. This is well suited to high-volume applications.
Embedded microcomputers, which can be thought of as general purpose computers implementing a software ANN on dedicated hardware without the normal computer peripherals (screen, keyboard, disk and so on.)

Some of the accelerator cards also contain general-purpose programmable processors. Their increased performance is gained by speeding up the repetitive multiply-and-add steps, which are required in software simulations of parallel operations. For the remainder of this section we concentrate on devices in which the ANN functionality itself is directly implemented in hardware.

Hardware ANN implementations can be divided into three broad categories: digital, analogue and hybrid. Within these categories various architectures are used to implement the necessary learning and/or recognition functions.

1.1 Digital Implementations
In a digital implementation of a neural network, all the values passed around the network are represented by binary words with a characteristic word length. Digital technology offers many advantages over analogue circuitry, such as freedom from noise, the ability to use RAM to store coefficient weights for an indefinite length of time; off-the-shelf fabrication technologies, exact precision in the multiplication and addition stages and easy incorporation into existing systems. Against these advantages must be put several constraints imposed by digital technology. Principally, the speed of operation is slower, especially in the multiply and add step, and also inputs from the real world are typically analogue, and must be converted to a digital representation before processing may be carried out.

1.1.1 Slice Architectures
Slice architectures are the neural network analogue of the bit slice approach to constructing conventional digital processors. They provide building blocks from which neural networks of arbitrary size and word length may be constructed. Examples of slice architectures are the Philips Lneuro chip, the Micro Devices MD1220 and the Neuralogix NLX-420 Neural Processor.

1.1.2 Multi-Processor Chips
The multi-processor approach is to put many simple processors on a single chip. These solutions may be divided into two groups known as SIMD (Single Instruction, Multiple Data) and systolic arrays. In an SIMD design, all processors execute the same instruction in parallel on different data. In a systolic array, on the other hand, each processor repeatedly performs one step of a calculation before passing its result onto the next processor in the array.

SIMD chips include the Inova N64000, which contains 64 processing elements (PEs), the HNC 100NAP chip containing 4 PEs, and the Siemens MA-16 chip, which is designed to perform fast matrix operations.

1.1.3 Radial Basis Functions
Radial Basis Function (RBF) networks operate by manipulating prototype vectors, which define regions of influence around training data inputs. Each vector can be thought of as a sphere in a hyper-dimensional space defined by the size of the vector. RBF networks provide fast learning and straight-forward interpretation. The comparison of input vectors to stored training vectors can be very fast if some simplifications are made; for example, treating the region of influence as a hypercube instead of a hypersphere.

Commercial RBF products include the IBM ZISC (Zero Instruction Set Computer) chip and the Nestor Ni1000 chip. Although manufactured by a nominally American company, the ZISC chip family was designed in Europe.

1.1.4 Other Digital Designs
Some digital neural network chips cannot be easily classified using the categories defined above. For instance, the Micro Circuit Engineering MT19003 NISP is essentially a RISC processor, which implements seven instructions optimised for implementing multi-layer networks. The Hitachi Wafer Scale Integration chips represent another approach to implementation. Whole wafers have been designed to implement Hopfield and back-propagation networks.

1.2 Analogue Implementations
Analogue hardware neural networks have the capability of achieving high speeds and high density of implementation. Against these advantages must be set the difficulty in obtaining high precision because of differences in components due to manufacturing tolerances, variations in temperature and thermal noise, which will limit the smallest practical signal in the system. Other problems are caused by the difficulty in long-term storage of analogue weighting coefficients and implementation of analogue multipliers, which are linear over a wide range of operation.

One approach to implementing an analogue network is a neuromorphic design, where the circuitry attempts to mimic the behaviour of biological neurons and synapses as closely as possible. Living neural networks are constructed from neurons, which are less than perfect, with matching problems and non-linearities, and rely on the interconnections between them to compensate for these deficiencies. Similarly, designs such as the Synaptics Silicon Retina perform useful image processing functions by emulating the function of the cells in a biological retina. The Intel 8017NW ETANN is an Electrically Trainable Analogue Neural Network containing 64 neurons and 10 280 weights, which are stored as trapped charge on floating gates.

1.3 Hybrid Designs
By combining digital and analogue techniques, hybrid designs attempt to get the best of both worlds. In many designs external communication is achieved via digital signals, allowing easy integration with other computing systems, whilst internal processing is done wholly or partly in analogue circuitry.

The Bellcore CLNN-32 chip stores the weighting coefficients digitally but carries out simulated annealing on the network using analogue circuitry. Similarly, the AT&T ANNA chip uses charge stored on a capacitor, which is refreshed periodically from a DAC to store the weighting coefficients. The Neuroclassifier, developed at the University of Twente in the Netherlands uses digital storage of weighting coefficients together with analogue processing to achieve a performance of 20G connections per second. Another technique is to use pulse rates or pulse widths instead of voltage levels in the network. The Neural Semiconductor chip set, comprising the SU3232 Synapse unit and the NU32 Neuron unit, together with the Ricoh RN-100 are examples of hybrid systems.

2. Reference Links
Clark S Lindsey and Thomas Lindblad, "Review of Hardware Neural Networks: A User's Perspective"
Clark S Lindsey, "Neural Networks in Hardware: Architecture, Products and Applications", Lecture Pages
Jan N H Heemskerk, "Overview of Neural Hardware"
SIENA 9811 ESPRIT Project

3. Links to Relevant Sites in Neural Network Hardware
Neural Network Hardware
Status of NNW's in HEP Experiments
FAQ on NN-Usenet newsgroup coordinated by Lutz Prechelt

c) RAM-based Neural Networks or Weightless Neural Models

Thomas M Jørgensen, Risø National Laboratory

An interesting type of artificial neural networks that is generally not so widely exposed as for instance the multilayer perceptron is the so-called RAM based neural nets [1,2]. With this kind of architecture the neuron functions are stored into look-up tables. Instead of adjusting weights in the conventional sense the nets are trained by changing the contents of the look-up tables. This is the main reason why these nets have also been denoted weightless neural models. The truth is however, that these nets have weights, although they often have been constrained to take on binary values.

The fundamental type of a RAM based neural net is the n-tuple classifier proposed as early as 1959 by Bledsoe and Browning [3]. This classifier can be viewed as a distributed memory that stores information on which sub-patterns are linked to which classes. When used to classify new examples/patterns the output class is found as the class whose training examples have most sub-patterns in common with the presented example. It is basically a one-shot learning training algorithm that is used with this original architecture, and this is a notably feature compared with most other types of learning algorithms for artificial neural networks. In the seventies Alexander and Stonham [4] examined the architecture in more detail and with the advances made in integrated circuit technology they were able to construct a hardware implementation, WISARD [5], based on RAM circuits. The suitability for hardware implementations is a main characteristic of these kind of neural networks.

The revived interest in neural networks that occurred in the mid-eighties also led to an increased interest in RAM-based neural networks and many modifications of the architecture has been proposed since [6-8]. A pyramid based multilayer architecture denoted PLN [9] was suggested and this architecture also introduced a probabilistic output generator. In addition reinforcement training algorithms was introduced. Although the original n-tuple classifier was originally proposed on an engineering basis the probabilistic RAM (pRAM) proposed by Taylor was shown to correspond to noisy neurons with properties equivalent to many known properties of living neurons. Today the pRAM architectures are also key representatives of the so-called spiking neurons [10]. Like the RAM neural nets the pRAM are well suited for hardware implementations [11].

As with other types of artificial neural networks it became clear during the last decade that a deeper understanding of the mechanism behind the n-tuple based networks would involve an interpretation based on statistical concepts [12-13]. Recently, such a statistical framework was presented [14], which showed that by allowing for a more flexible decision scheme, the n-tuple classifier can be made to operate as a close approximation to the maximum á posteriori estimator.

Recently it has also been noted that the RAM based architectures have a close relationship to decision tree and ensembles of these [15-16]. With this observation, n-tuple based architectures might be seen as a bridge between Machine Learning techniques and artificial neural nets.

[1] J. Austin, RAM-Based Neural Networks, World Scientific, York, UK, 1998.

[2] Teresa B. Ludermir et al., Weightless Neural Models: A Review of Current and Past Works, Neural Computing Surveys, vol. 2, 41-61.

[3] W. Bledsoe and I. Browning, Pattern recognition and reading by machine. In Proceedings of Eastern Joint Computer Conference, pages 225-232, Boston, 1959.

[4] I. Aleksander and T. J. Stonham, Guide to pattern recognition using random-access memories, Computers and Digital Techniques, 2:29-40, 1979.

[5] I. Aleksander, W. V Thomas, and P. A. Bowden, WISARD: a radical step forward in image recognition, Sensor Review, 4(3):120-124, 1984.

[6] E. Filho, M. C. Fairhurst, and D. L. Bisset, Adaptive pattern recognition using goal-seeking neurons, Pattern Recognition Letters, 12:131-138, March 1991.

[7] K N. Gurney, Training nets of hardware realizable sigma-pi units, Neural Networks, 5:289-303, 1992.

[8] A. Kolcz and N. M. Allinson, N-tuple Regression Network, Neural Networks, vol. 9, no. 5, pp. 855-869, 1996.

[9] I. Aleksander, Canonical neural nets based on logic nodes, In Proceedings of the IEE International Conference on Artificial Neural Network, pages 110-114, London, 1989.

[10] D. Gorse and J. G. Taylor, On the equivalence properties of noisy neural and probabilistic RAM nets, Physics Letters A, 131(6):326-332, 1988.

[11] Clarkson T G, Ng C K & Bean J, Review of hardware pRAMS, Neural Network World, No 5, 551-564, 1993

[12] R. J. Rohwer, Two bayesian treatments of the n-tuple recognition method, In Proceedings of the IEE International Conference on Neural Networks, pages 171-176, UK, 1995.

[13] R. Rohwer and M. Morciniec, The theoretical and experimental status of the n-tuple classifier, Neural Networks, 11(1):1-14, 1998.

[14] T.M. Jørgensen and C. Linneberg, Theoretical Analysis and Improved Decision Criteria for the n-Tuple Classifier, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, 1999.

[15] A. Kolcz, N-tuple network, CART and bagging, in Proceedings of the WNNW'99 workshop, York.

[16] C. Linneberg and T.M. Jørgensen, Discretization Methods for Encoding of Continuous Input Variables for Boolean Neural Networks, In Proceedings of IJCNN'99, Washington

d) Technology Transfer

Contributions from: King's College London, UK; Eindhoven University of Technology, NL University of Paderborn, D; University of Turin, I

Introduction

The near and long-term future implementations of hardware-based neural networks will be shaped in three ways: (1) by developing advanced techniques for mapping neural networks onto FPGAs, (2) by developing innovative learning algorithms which are hardware-realisable, (3) by defining high-level descriptions of the neural algorithms in an industry standard to allow full simulations to be carried out and to allow fabrication by the most appropriate technique and (4) to produce demonstrators of the technology for industry.

Such designs will be of use to industry if the cost of adopting this new technology is sufficiently low for the company and if the technology is made accessible to them. The cost of implementing new technology in an application specific integrated circuit (ASIC) falls each year. Europe lags behind Japan and the USA in the application of intelligent techniques, especially in consumer electronics.

This document is intended to cover all implementations, but concentrates mainly on digital techniques.

Industrial Applications

Considerable expertise in the design of neural networks and their application to industry is available in universities throughout the European Union. Strong collaboration exists in this field, especially between universities, as expressed through existing ESPRIT programmes such as NEuroNet.

The benefits of neural networks have been recognised especially in Japan where a number of consumer goods are making use this technology. The most prominent product recently has been a microwave oven (Sharp) which uses a neural module developed in the UK. Other consumer applications of related technology include fuzzy logic modules in cameras and in vacuum cleaners.

Solutions should be tailored to the needs of industry by providing a choice of implementations from software modules, through FPGAs and semi-custom chips to full-custom VLSI. A library of neural functions should be made available in software and libraries of cells (digital, mixed and analogue) for hardware. Software libraries exist for the traditional neural network models, for example for use with MATLAB.

Hardware-based neural networks are important to industry as they offer a small-size and low-power consumption compared to software running on a workstation. Therefore such neural network controllers can be embedded in a wide range of systems both large and small.

For industry to take up university-based designs, these designs must be in an industry-standard form, for example VHDL or C++ functional code, they should be modular and they should be parameterised to allow customisation to the industry's needs.

The following European companies are known to have investigated the use of hardware-based neural network: Ericsson (UK and Sweden), Philips Research (NL), Siemens AG Munich, Siemens / Nixdorf Bonn, 3M Laboratories (Europe) GmbH Neuss, XIONICS Document Technologies GmbH Dortmund, Robert Bosch GmbH Reutlingen, Spectrum Microelectronics Siek (Germany), Fiat (Italy), Domain Dynamics Ltd (UK).

Specific application areas include the control of telecommunications networks, speech processing and recognition, speaker identification and microelectromechanical systems.

Means of Delivery

European industry should be well-placed to exploit innovative neural techniques. As these techniques are comparatively recent, they are not well-understood by many current designers in industry. Technology-transfer (TT) from the universities to industry will be achieved through the Industrial Clubs set up by NEuroNet and CoIL.

The nature of the technology to be transferred must be more than the mature research results already obtained and should include ongoing original research topics such as on-line learning and novel architectures. There will be a direct benefit to universities and industry in designing these new ideas using an industry-standard language as the new designs can be more readily compared and benchmarked with competing systems. In this way, this effort will not quickly become dated and therefore can be applied to future problems as well.

The industry which already applies neural technology, or is likely to benefit from it, is already pan-European. For example, Siemens has activities in both Germany and the UK, Ericsson has activities in Sweden and in most European states and the UK hosts Ericsson's VLSI Design Centre.

Relevant past European funding actions include GALATEA, PROMETHEUS and NERVES.

Requirements A family of neural controllers will be required in industry-standard form, for example VHDL or C++ functional code.

The controllers should be modular so that they can be readily integrated with an existing design or incorporated into a new design.

The modules should be parameterised to allow customisation to a specific need. That is, the number of inputs, outputs, and nodes can be defined later. Other features such as the architecture or learning algorithm are options, although full orthogonality is not achievable.

A number of specific tasks could be addressed using the above modular approach. In this way, case studies can be produced as a means of informing industry of the possible advantages of neural techniques. This could be developed in a similar way to the CoIL competitions, where different teams tackle the same problem.

Future Aims

Future research should aim to provide:

1) Formal Performance Criteria

For the qualitative as well as quantitative evaluation of the properties and performance of different NN implementations it is important to have formal criteria. The criteria will be application and/or model dependent, such as power consumption, speed (real-time), area, fault-tolerance (reducing test problems), scalability and robustness. Simple, objective and universal cost-functions for performance evaluation should be used.

The following criteria should be used to select the appropriate technology:

synaptic weight precision
fault-tolerance
complexity of the transfer-function and
the learning rule
signal format (time-continuous, analogue, pulse-width modulated or binary signals)

2) VLSI-friendly Learning Algorithms

Because smart controllers should be adaptive, continuous learning algorithms should be developed and enhanced, and which can be efficiently implemented in silicon.

Optimal VLSI implementations of neural networks should address:

optimal learning of particular classes of functions;
innovative learning strategies for minimising the hardware requirements (minimum area and/or delay), and
satisfying constraints on fan-in and range of weights and thresholds;

3) Embedding of NNs into Systems

The information processing chain may start with analogue sensors and end with analogue effectors, or the system may be entirely digital. The information processing in between input and output has to be optimised at the system level rather than optimising each processing step by itself. In this context the efficient integration of NNs into hybrid systems should be addressed.

The performance of a neural chip itself has been found to be less important than the data flow into and out from the chip. Structures which ease this bottleneck should be evaluated and implemented.

4) Realisations

Simple realisations such as 2-transistor circuits for analogue processing up to functional digital modules should be designed in industry-standard forms. They will be generic in size, i.e. the design should be parameterized so that a user of the design should be able to instantiate a particular neural network with the required number of synapses/neurons.

A data sheet with all relevant characteristics such as speed, size and power consumption should be produced and verified versus the VHDL model for efficient evaluation by the end user. A list of relevant Neuro-building-blocks should be compiled covering the needs of the industrial users.

Software packages which will map a high level description of a neural network onto an FPGA may not need to be written specifically, as such tools are available from silicon vendors.

The realisations may be placed into 3 categories: FPGA, analogue and digital ASICs.

European Expertise

The following expertise is available within the EU, related to hardware and neural networks as an example of the application areas. It is not an exhaustive list. It is sorted into two areas, applications and implementations.

Applications

Communications systems, demodulators, intelligent antennas, semiconductors for the space environment.
Object identification, image compression, HDTV, medical and biometric image analysis, thermal image processing systems, materials analysis.
Character recognition, speaker identification, speech recognition, speech enhancement, handwriting recognition.
Information retrieval, exploratory data analysis, quality control, function learning, automatic control, economic prediction, electrical consumption prediction, knowledge extraction, intelligent controls, automatic verification of VLSI and WSI circuits.
Stochastic learning algorithms, Content Addressable Memory, massively parallel processors, pulse-stream computation.

Implementations

VLSI digital hardware, VLSI analogue hardware
On-chip learning, reinforcement training, feed-forward training, stochastic training
Distributed and heterogeneous processor architecturfses, fault tolerant systems
Analogue implementation of neural networks, pulse-stream systems, on-chip weight perturbation algorithms
Optical neural techniques
Analogue and mixed hardware implementations of neural networks using time-continuous or coherent pulse-width modulation techniques,
Massively parallel computers, silicon implementations of neural networks, neuro-fuzzy systems.