INTRODUCTION TO THE TECHNOLOGY OF NEURAL DIALOGUE
The input to the encoder is the sequence of words spoken or typed by the user. However, before the encoder can process these words, they have to be converted into numbers. The simplest scheme is one-hot encoding, which is widely used in machine learning to encode categorical data such as the names of different countries: each word is represented by a sparse vector with a single 1 at that word's index in the vocabulary. A richer scheme is word embedding, in which each word is represented by a unique dense real-number vector that captures its meaning and its relationship to the other words in the vocabulary.
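The contrast between the two encodings can be sketched as follows. This is a minimal illustration: the toy vocabulary is invented for the example, and the embedding table is filled with random numbers standing in for the vectors a real model would learn during training.

```python
import numpy as np

# Toy vocabulary; the words and indices are purely illustrative.
vocab = {"<pad>": 0, "book": 1, "flight": 2, "to": 3, "london": 4}
vocab_size = len(vocab)

def one_hot(word):
    """Sparse one-hot vector: a single 1 at the word's vocabulary index."""
    vec = np.zeros(vocab_size)
    vec[vocab[word]] = 1.0
    return vec

# A word embedding is a learned lookup table mapping each word index to a
# dense real-valued vector (random here, in place of trained values).
embedding_dim = 4
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

def embed(word):
    return embedding_table[vocab[word]]

print(one_hot("flight"))  # sparse: [0. 0. 1. 0. 0.]
print(embed("flight"))    # dense 4-dimensional vector
```

Note that the one-hot vector grows with the vocabulary and says nothing about word similarity, whereas the embedding is low-dimensional and, once trained, places related words near each other.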
RECURRENT NEURAL NETWORKS (RNNS)
There are many different neural network architectures. Recurrent Neural Networks (RNNs) have been used widely in speech and language applications because, in contrast to a traditional feed-forward network, they can handle sequential inputs of variable length. Unlike a standard feed-forward neural net, RNNs operate over sequences of vectors, which gives them several advantages for processing sequences of text. First, because information cycles through a loop, RNNs are able to capture information about previous inputs, so that they maintain a sort of memory that is useful for subsequent processing. Second, by processing inputs as a sequence, they are able to capture information about the ordering of the input that may be relevant to the processing of the input sequence as a whole.
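The recurrence loop can be made concrete with a minimal vanilla (Elman-style) RNN cell. This is a sketch with arbitrary sizes and random, untrained weights; the point is only to show the hidden state being carried across time steps.

```python
import numpy as np

# Illustrative dimensions and random weights (a real model learns these).
input_dim, hidden_dim = 4, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden (the loop)
b_h = np.zeros(hidden_dim)

def rnn_forward(inputs):
    """Process a variable-length sequence of input vectors, carrying
    a hidden state (the network's 'memory') from step to step."""
    h = np.zeros(hidden_dim)
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h  # final hidden state summarises the whole sequence

sequence = rng.normal(size=(5, input_dim))  # 5 time steps, any length works
final_state = rnn_forward(sequence)
```

Because each new hidden state depends on the previous one, reversing the input sequence generally yields a different final state: the network is sensitive to word order, which is exactly the second advantage described above.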
LONG SHORT-TERM MEMORY UNITS
Long Short-term Memory (LSTM) units use two methods to tackle the issue of context in RNNs: they provide a mechanism for the network to forget information that is no longer needed, and a second mechanism to add information that is likely to be needed later. This is done by adding an additional context layer to the network, called the cell state, together with gates that control the flow of information into and out of the cell state. An LSTM has three gates: the forget gate controls what information is removed from the cell state, the input gate what information is stored, and the output gate what information is passed on as output.
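The three gates can be sketched as a single LSTM step. The weights below are random placeholders for values a trained model would learn, and the dimensions are chosen arbitrarily; the gate equations themselves follow the standard LSTM formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes; each gate has its own weight matrix and bias.
input_dim, hidden_dim = 4, 3
rng = np.random.default_rng(0)

def make_weights():
    return (rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim)),
            np.zeros(hidden_dim))

(W_f, b_f), (W_i, b_i), (W_c, b_c), (W_o, b_o) = (
    make_weights(), make_weights(), make_weights(), make_weights())

def lstm_step(x, h, c):
    z = np.concatenate([h, x])         # previous hidden state + new input
    f = sigmoid(W_f @ z + b_f)         # forget gate: what to remove from c
    i = sigmoid(W_i @ z + b_i)         # input gate: what to store in c
    c_tilde = np.tanh(W_c @ z + b_c)   # candidate new information
    c = f * c + i * c_tilde            # updated cell state
    o = sigmoid(W_o @ z + b_o)         # output gate: what to expose
    h = o * np.tanh(c)                 # new hidden state
    return h, c

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x, h, c)
```

Because the gates are sigmoid-valued (between 0 and 1), they act as soft switches: a forget-gate value near 0 erases that component of the cell state, while a value near 1 preserves it across many time steps.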
THE ENCODER-DECODER NETWORK
Encoding involves the use of neural networks, usually RNNs, LSTMs, or GRUs, and more recently Transformer networks. Usually, stacked networks are used and the output representation is taken from the top layer of the stack. Decoding produces the output sequence one element at a time, using the context vector that represents the final hidden state of the encoder. While the encoder-decoder network has been applied successfully to machine translation, dialogue applications are more difficult: there can be a wide range of appropriate responses to a given input in a dialogue, as opposed to the fairly close alignment between the source and target sequences in machine translation. Also, dialogue responses can be conditional on information from a background database, API, or other contextual information.
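The overall encoder-decoder flow can be sketched end to end with two vanilla RNN cells. Everything here is an assumption made for illustration: the vocabulary size, the token ids, the choice of token 0 as an end-of-sequence marker, and the random untrained weights; a real system would use trained LSTM or Transformer layers.

```python
import numpy as np

# Illustrative sizes and random (untrained) weights.
vocab_size, emb_dim, hid_dim = 6, 4, 5
EOS = 0  # assumed end-of-sequence token id for this sketch
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(vocab_size, emb_dim))            # embeddings
W_enc = rng.normal(scale=0.1, size=(hid_dim, emb_dim + hid_dim)) # encoder cell
W_dec = rng.normal(scale=0.1, size=(hid_dim, emb_dim + hid_dim)) # decoder cell
W_out = rng.normal(scale=0.1, size=(vocab_size, hid_dim))        # hidden -> vocab logits

def encode(token_ids):
    """Run the encoder over the input; the final hidden state is the
    context vector summarising the whole input sequence."""
    h = np.zeros(hid_dim)
    for t in token_ids:
        h = np.tanh(W_enc @ np.concatenate([E[t], h]))
    return h

def decode(context, max_len=10):
    """Greedy decoding: start from the context vector and emit one token
    at a time, feeding each prediction back in as the next input."""
    h, t, out = context, EOS, []
    for _ in range(max_len):
        h = np.tanh(W_dec @ np.concatenate([E[t], h]))
        t = int(np.argmax(W_out @ h))  # most probable next token
        if t == EOS:
            break
        out.append(t)
    return out

response = decode(encode([3, 1, 4, 2]))  # token ids of the user's input
```

With untrained weights the output tokens are of course meaningless; the sketch only shows the division of labour the text describes: the encoder compresses the input into a context vector, and the decoder expands that vector back into a sequence, one element at a time.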