Isolated spoken digit recognition using a leaky integrator reservoir

 This code is available as examples/analog_speech.py.

In this example, we use a reservoir of leaky integrator neurons to classify isolated spoken digits (zero to nine, i.e. ten classes). The speech has already been pre-processed with R.F. Lyon's cochlear model, which yields a 77-dimensional input signal for each digit.

We start by constructing the dataset:

import mdp, Oger, scipy as sp
[inputs, outputs] = Oger.datasets.analog_speech(indir="/afs/elis/group/snn/speech_corpora/ti46_subset/Lyon_decimation_128")

Here, outputs is a ten-dimensional signal, with +1 encoding the current digit class and -1 for all the remaining classes.
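
To make the target encoding concrete, here is a minimal sketch of how such a signal could be built for a single utterance. The helper make_target is hypothetical and not part of Oger; it only illustrates the +1/-1 layout described above, assuming one row per time step and one column per class.

```python
import numpy as np

def make_target(digit, n_timesteps, n_classes=10):
    """Return an (n_timesteps, n_classes) signal: +1 for `digit`, -1 elsewhere.

    Hypothetical helper for illustration only; not part of Oger.
    """
    target = -np.ones((n_timesteps, n_classes))
    target[:, digit] = 1.0
    return target

# Illustrative example: the digit "three" spoken over 5 time steps.
target = make_target(digit=3, n_timesteps=5)
```

Every row of target is identical, because the class label does not change during an utterance.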

Next, we construct the nodes and create a flow:

input_dim = inputs[0].shape[1]
reservoir = Oger.nodes.LeakyReservoirNode(input_dim=input_dim, output_dim=100, input_scaling=1, leak_rate=0.1)
readout = Oger.nodes.RidgeRegressionNode(0.001)
mnnode = Oger.nodes.MeanAcrossTimeNode()
flow = mdp.Flow([reservoir, readout, mnnode])

Note how we pass a leak_rate=0.1 argument to the constructor. The argument passed to the ridge regression node is the regularization parameter, to ensure good generalization on the test data. This should really be optimized per reservoir and dataset, but to reduce computation time we pre-specify it here (so the result will be sub-optimal).
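
To illustrate what these two parameters control, the following NumPy sketch runs one common form of the leaky integrator update and fits a readout with closed-form ridge regression. It is a sketch under stated assumptions and not Oger's implementation: the weight scaling, the tanh nonlinearity, the absence of a bias term and the random stand-in input are all choices made here for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
n_in, n_res, n_out, n_steps = 77, 100, 10, 200
leak_rate, ridge_param = 0.1, 0.001

# Random weights; this scaling is an illustrative assumption.
w_in = rng.uniform(-1, 1, size=(n_res, n_in))
w = rng.randn(n_res, n_res)
w *= 0.9 / np.max(np.abs(np.linalg.eigvals(w)))  # rescale to spectral radius 0.9

def run_reservoir(u):
    """Leaky integrator update: x <- (1 - a) * x + a * tanh(W x + W_in u)."""
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        x = (1 - leak_rate) * x + leak_rate * np.tanh(w.dot(x) + w_in.dot(u_t))
        states[t] = x
    return states

def ridge_fit(states, targets, ridge_param):
    """Closed-form ridge regression: solve (X^T X + lambda I) W = X^T Y."""
    xtx = states.T.dot(states) + ridge_param * np.eye(states.shape[1])
    return np.linalg.solve(xtx, states.T.dot(targets))

u = rng.randn(n_steps, n_in)      # random stand-in for one pre-processed utterance
y = -np.ones((n_steps, n_out))    # +1/-1 target signal for class 3
y[:, 3] = 1.0

states = run_reservoir(u)
w_out = ridge_fit(states, y, ridge_param)
```

A small leak_rate makes each state a slowly moving average of its past, and a larger ridge_param shrinks the readout weights, trading training fit for smoothness.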

We also use an additional utility node, the MeanAcrossTimeNode, which takes the mean value over time of each of its input signals. This is necessary because classification requires a single output vector per utterance.
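
The effect of averaging over time can be sketched with plain NumPy. The readout values below are made up for illustration, and the code only mimics the idea of the node rather than reproducing its implementation.

```python
import numpy as np

# Hypothetical readout output for one utterance: 4 time steps, 3 classes.
readout = np.array([[ 0.2, -0.5, -0.9],
                    [ 0.6, -0.4, -1.0],
                    [-0.1,  0.3, -0.8],
                    [ 0.5, -0.2, -0.9]])

mean_output = readout.mean(axis=0)             # one averaged value per class
predicted_class = int(np.argmax(mean_output))  # class with the largest mean
```

At the third time step the second class momentarily has the largest output, but after averaging the first class wins, so a single noisy frame does not decide the label.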

We then determine how many samples to use for training and testing, and train the flow:

train_frac = .9
n_samples = len(inputs)
n_train_samples = int(round(n_samples * train_frac))
n_test_samples = int(round(n_samples * (1 - train_frac)))
# Slice ends are exclusive, so [0:n_train_samples] selects exactly n_train_samples items.
flow.train([None,
            zip(inputs[0:n_train_samples],
                outputs[0:n_train_samples]),
            [None]])

Note how we pass a list of three iterables now, one for each node in the flow.
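
The split arithmetic can be checked in isolation with a small sketch. The helper split_train_test and the list of 500 stand-in samples are hypothetical; the point is only that Python slices exclude their end index, so the two parts are disjoint and together cover every sample.

```python
def split_train_test(samples, train_frac=0.9):
    """Split a list into a leading training part and a trailing test part."""
    n_train = int(round(len(samples) * train_frac))
    # samples[:n_train] has exactly n_train items; the rest form the test set.
    return samples[:n_train], samples[n_train:]

samples = list(range(500))  # stand-in for 500 utterances
train, test = split_train_test(samples)
```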

Finally, we apply the flow to the test data and compute the test error:

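ytest = []
for xtest in inputs[n_train_samples:]:
    ytest.append(flow(xtest))

# Predicted label: index of the largest time-averaged output.
ytestmean = sp.array([sp.argmax(sample) for sample in ytest])
# Desired label: index of the +1 channel in each test target signal.
ymean = sp.array([sp.argmax(sample.mean(axis=0)) for sample in outputs[n_train_samples:]])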

We can now use the ConfusionMatrix class to compute several metrics and error measures that are typically used in classification problems.

First, we construct the 10-class confusion matrix using the output of the system and the desired output labels.

confusion_matrix = ConfusionMatrix.from_data(10, ytestmean, ymean) # 10 classes

Many commonly used error measures and metrics can now be obtained as properties of the ConfusionMatrix:

print "Error rate: %.4f" % confusion_matrix.error_rate # this comes down to 0-1 loss
print "Balanced error rate: %.4f" % confusion_matrix.ber
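
For reference, here is a minimal NumPy sketch of how these two measures are commonly defined. It is not Oger's implementation: the function names are hypothetical, and the layout (rows index the desired class, columns the predicted class) is an assumption made for this example.

```python
import numpy as np

def confusion_counts(n_classes, predicted, desired):
    """Count matrix with rows = desired class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for p, d in zip(predicted, desired):
        cm[d, p] += 1
    return cm

def error_rate(cm):
    """Fraction of misclassified samples (0-1 loss)."""
    return 1.0 - np.trace(cm) / float(cm.sum())

def balanced_error_rate(cm):
    """Mean of the per-class error rates; assumes every class occurs at least once."""
    per_class_recall = np.diag(cm) / cm.sum(axis=1).astype(float)
    return 1.0 - per_class_recall.mean()

# Imbalanced toy data: eight samples of class 0, two of class 1.
desired   = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
cm = confusion_counts(2, predicted, desired)
```

In this toy case one sample in ten is wrong, yet half of the rare class is misclassified, so the balanced error rate is noticeably higher than the plain error rate.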

We can also reduce the 10-class problem to 10 binary classification problems, where each class in turn is chosen to be the positive class, and all the others are grouped into the negative class. The 'binary' method of the ConfusionMatrix class generates the corresponding binary confusion matrices. Binary confusion matrices have some additional methods to compute measures like precision, recall, etc. which do not generalise to multiclass-classification problems.

print "Per-class precision and recall"
binary_confusion_matrices = confusion_matrix.binary()
for c in range(10):
    m = binary_confusion_matrices[c]
    print "label %d - precision: %.2f, recall %.2f" % (c, m.precision, m.recall)    
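
The one-versus-rest reduction can be sketched as follows. The helpers are hypothetical and assume a count matrix whose rows index the desired class and whose columns index the predicted class; Oger's own layout may differ.

```python
import numpy as np

def one_vs_rest(cm, c):
    """Collapse a multiclass count matrix to (TP, FP, FN, TN) for class c."""
    tp = cm[c, c]
    fn = cm[c, :].sum() - tp   # class c samples assigned to another class
    fp = cm[:, c].sum() - tp   # other samples assigned to class c
    tn = cm.sum() - tp - fn - fp
    return tp, fp, fn, tn

def precision_recall(cm, c):
    """Precision and recall with class c as the positive class."""
    tp, fp, fn, _ = one_vs_rest(cm, c)
    return tp / float(tp + fp), tp / float(tp + fn)

# Made-up 3-class counts for illustration.
cm = np.array([[5, 1, 0],
               [2, 3, 1],
               [0, 0, 4]])
precision_0, recall_0 = precision_recall(cm, 0)
```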

It is also possible to create functions that compute error measures based on confusion matrices using the 'error_measure' method of the ConfusionMatrix class. This method returns a function that takes observed outputs and desired outputs as input, and computes the desired error measure.

ber = ConfusionMatrix.error_measure('ber', 10) # 10-class balanced error rate function
print "Balanced error rate: %.4f" % ber(ytestmean, ymean)
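
The pattern of a method that returns a function can be sketched in plain Python. The factory make_error_measure below is hypothetical and only mirrors the idea described above; it is not Oger's code, and it computes the measures directly from label lists.

```python
def make_error_measure(name, n_classes):
    """Return a function f(predicted, desired) that computes the named measure."""
    def error_rate(predicted, desired):
        wrong = sum(1 for p, d in zip(predicted, desired) if p != d)
        return wrong / float(len(desired))

    def ber(predicted, desired):
        per_class = []
        for c in range(n_classes):
            idx = [i for i, d in enumerate(desired) if d == c]
            if idx:  # skip classes that do not occur in the desired labels
                wrong = sum(1 for i in idx if predicted[i] != c)
                per_class.append(wrong / float(len(idx)))
        return sum(per_class) / len(per_class)

    measures = {'error_rate': error_rate, 'ber': ber}
    return measures[name]

ber_fn = make_error_measure('ber', 2)
score = ber_fn([0, 0, 0, 1], [0, 0, 1, 1])
```

A callable of this shape is convenient wherever a generic error function of two label sequences is expected.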

Finally, we can visualise the confusion matrix using the plot_conf function. In most cases it is advisable to balance (row-normalise) a confusion matrix before visualising it, particularly when the classes do not all contain the same number of samples. The ConfusionMatrix class has a 'balance' (or 'normalise_per_class') method for this purpose.

plot_conf(confusion_matrix.balance())
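
Row-normalisation itself is a one-line operation. The function below is a hypothetical stand-in for the idea behind the 'balance' method, assuming that each row of the count matrix corresponds to one desired class and contains at least one sample.

```python
import numpy as np

def balance(cm):
    """Divide each row by its sum so that every class row adds up to 1."""
    cm = np.asarray(cm, dtype=float)
    return cm / cm.sum(axis=1, keepdims=True)

# Made-up counts: 100 samples of class 0 but only 10 of class 1.
cm = np.array([[90, 10],
               [ 2,  8]])
balanced = balance(cm)
```

After balancing, each row reads as the fraction of that class assigned to each label, so a rare class is as visible in the plot as a frequent one.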