Learning algorithms for classification: A comparison on handwritten digit recognition

Keywords: LeNet, Convolutional Neural Networks, Handwritten Digit Recognition

All the figures in this post come from ‘Learning algorithms for classification: A comparison on handwritten digit recognition’[1].

Basic Works

  1. Convolutional network[2]
  2. Raw accuracy, training time, recognition time, and memory requirements should all be considered in classification.
  3. From the experiments and comparisons, the results illuminate which classifier is better.
    1. Selected competitors (baselines)
      • Linear Classifier
      • Nearest Neighbor Classifier
      • Large Fully Connected Multi-Layer Neural Network
      • LeNet(1,4,5)
      • Boosted LeNet 4
      • Tangent Distance Classifier(TDC)
      • LeNet 4 with K-Nearest Neighbors
      • Local Learning with LeNet 4
      • Optimal Margin Classifier(OMC)


  4. The data sets used for recognition are listed.
  5. Details of the comparison:
    1. large fully connected multi-layer neural network
      • It is over-parameterized but still works well, thanks to some built-in “self-regularization” mechanism. This is due to the nature of the error surface: gradient-descent training invariably goes through a phase where the weights are small. Small weights cause the sigmoid (activation function) to operate in its quasi-linear region, making the network essentially equivalent to a low-capacity, single-layer network. (This needs more empirical evidence.)
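The quasi-linearity claim above is easy to check numerically. A minimal sketch (the interval and tolerance below are illustrative choices, not from the paper): near zero, the sigmoid is well approximated by its tangent line at the origin, sigmoid(x) ≈ 0.5 + x/4, so a network whose weights keep pre-activations small behaves almost like a linear model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# With small weights, pre-activations stay near zero, where the
# sigmoid is well approximated by its tangent line at the origin:
#   sigmoid(x) ≈ 0.5 + x/4
x = np.linspace(-0.1, 0.1, 101)          # small pre-activations
linear_approx = 0.5 + x / 4.0
max_dev = np.max(np.abs(sigmoid(x) - linear_approx))
print(max_dev)  # tiny deviation: the unit is effectively linear here
```

The leading error term is cubic (−x³/48), so for |x| ≤ 0.1 the deviation from the line is on the order of 2e-5.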
    2. LeNet 1
      • Convolutional neural network
      • first few layers:
        • local ‘receptive field’
        • the output of the convolution is called ‘feature map’
        • followed by a squashing function
      • share a single weight vector(weight sharing technique)
        • reduce the number of free parameters
        • shift-invariance
        • need multiple feature maps, extracting different features types from the same image
        • weights are trained by gradient descent
      • local, convolutional feature maps in hidden layers
        • increasing complexity and abstraction
        • higher-level features require less precise coding of their location
        • local averaging and subsampling are used to reduce the resolution of the feature map
        • invariance to distortions
        • the resulting architecture is a ‘bi-pyramid’
      • 1.7% error on the MNIST test set
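The convolution/weight-sharing/subsampling pipeline described above can be sketched in plain numpy. This is not LeNet 1 itself, just a minimal illustration of the three ingredients (the 5x5 kernel size and tanh squashing are assumptions for the sketch): one shared kernel slides over the image to produce a feature map, a squashing function is applied, and local averaging halves the resolution.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (weight sharing): every
    output position reuses the same weights, so the map responds to
    its feature wherever it appears (shift-invariance)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(fmap, factor=2):
    """Local averaging: reduce resolution so higher layers need less
    precise location coding (tolerance to small distortions)."""
    H, W = fmap.shape
    H, W = H - H % factor, W - W % factor
    return fmap[:H, :W].reshape(H // factor, factor,
                                W // factor, factor).mean(axis=(1, 3))

image = np.random.rand(28, 28)                # dummy digit image
kernel = np.random.randn(5, 5)                # one shared weight vector per map
fmap = np.tanh(conv2d_valid(image, kernel))   # convolution + squashing
reduced = subsample(fmap)                     # 24x24 -> 12x12
print(fmap.shape, reduced.shape)
```

A real network would train the kernel by gradient descent and keep several such feature maps per layer, each extracting a different feature type from the same image.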
    3. LeNet 4
      • expanded version of LeNet 1 with a larger input (28x28 to 32x32)
      • 1.1% error on the MNIST test set
    4. LeNet 5
      • more feature maps
      • a large fully connected layer
      • a distributed representation to encode the categories at the output layer, rather than “1 of N”
      • 0.9% error on the MNIST test set
    5. Boosted LeNet 4
      • the training set was too small to train three separate networks
      • affine transformations and line-thickness variations were used to augment the training set
      • 0.7% error on the MNIST test set
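Affine augmentation of the kind used for the boosted ensemble can be sketched with `scipy.ndimage.affine_transform`. The rotation/scale ranges below are hypothetical parameter choices, and the paper's line-thickness variation is not reproduced here; this only illustrates generating several distorted copies of one digit image.

```python
import numpy as np
from scipy.ndimage import affine_transform

def random_affine(img, rng, max_angle=15.0, max_scale=0.1):
    """Apply a small random rotation and scaling about the image
    centre (illustrative ranges, not the paper's exact settings)."""
    angle = np.deg2rad(rng.uniform(-max_angle, max_angle))
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    c, s = np.cos(angle) * scale, np.sin(angle) * scale
    matrix = np.array([[c, -s],
                       [s,  c]])
    # Choose the offset so the transform is centred on the image.
    centre = np.array(img.shape) / 2.0
    offset = centre - matrix @ centre
    return affine_transform(img, matrix, offset=offset, order=1)

rng = np.random.default_rng(0)
digit = rng.random((28, 28))                           # dummy digit image
augmented = [random_affine(digit, rng) for _ in range(5)]
print(len(augmented), augmented[0].shape)
```

Each call yields a slightly distorted copy of the input, so a fixed-size training set can be expanded arbitrarily; the paper reports this was enough to train the three networks of the boosted ensemble.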

Experiment and Result

Personal Summary

This is another of LeCun's papers, following his 1990 work training a convolutional neural network with back-propagation. The experiments are the core of this paper, and some of the techniques it mentions are still employed today, such as training-data augmentation. By 1995, LeCun had tried to improve CNNs with different activation functions, different numbers of feature maps per layer, different training data, and different combining methods. He concluded that as training databases grow, CNNs will become ever more striking.


  1. LeCun, Yann, L. D. Jackel, Léon Bottou, Corinna Cortes, John S. Denker, Harris Drucker, Isabelle Guyon et al. “Learning algorithms for classification: A comparison on handwritten digit recognition.” Neural networks: the statistical mechanics perspective 261 (1995): 276. ↩︎

  2. LeCun, Yann, Bernhard E. Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne E. Hubbard, and Lawrence D. Jackel. “Handwritten digit recognition with a back-propagation network.” In Advances in neural information processing systems, pp. 396-404. 1990. ↩︎