cs231n - Lecture 2. Image Classification

Image Classification: A Core Task in Computer Vision

  • The Problem: Semantic Gap
    To the computer, an image is only a tensor of integers in [0, 255] with 3 channels (RGB), far removed from the semantic concept it depicts
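
    A minimal sketch of that representation, assuming NumPy and a made-up 4 x 6 image of random pixels (not real data):

```python
import numpy as np

# Hypothetical image: height 4, width 6, 3 colour channels (R, G, B).
# The pixel values are random and only illustrate the data layout.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

print(image.shape)  # (height, width, channels)
print(image.dtype)  # 8-bit unsigned integers, so every value lies in [0, 255]

# Flattening gives the D-dimensional vector that a classifier operates on.
x = image.reshape(-1)
print(x.shape)
```

    Nothing in these numbers says "cat" or "dog"; that is the gap a classifier has to bridge.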

  • Challenges:
    Viewpoint variation
    Background clutter
    Illumination
    Occlusion
    Deformation
    Intraclass variation

  • An image classifier

def classify_image(image):
	# Some magic here?
	return class_label
  • ML: Data-Driven Approach
    1. Collect a dataset of images and labels
    2. Use ML algorithms to train a classifier
    3. Evaluate the classifier on new images
def train(images,labels):  
	# Machine Learning!  
	return model  

def predict(model, test_images):  
	# Use model to predict labels  
	return test_labels  
  • Nearest Neighbor Classifier
    Predict the label of the most similar training image
    Training data with labels \((x, y)\) $\leftrightarrow$ query data \(x^*\)
    Distance metric \(d(x, x^*) \in \mathbb{R}\)
    L1 distance \(d_1(I_1,I_2) = \sum_p |I_1^p-I_2^p|\)
    Take pixel-wise absolute differences $\rightarrow$ sum them to get the score
import numpy as np
class NearestNeighbor:  
	def __init__(self):  
		pass

	def train(self, X, y):
		"""Memorize the training data. X is N x D (one example per row); y is 1-D of size N."""
		self.Xtr = X
		self.ytr = y

	def predict(self,X):  
		num_test = X.shape[0]  
		Ypred = np.zeros(num_test, dtype = self.ytr.dtype)
		
		for i in range(num_test):
			# Find the closest training image (L1 distance) to the i-th test image and predict its label
			distances = np.sum(np.abs(self.Xtr - X[i,:]), axis=1)
			min_index = np.argmin(distances)  
			Ypred[i] = self.ytr[min_index]
		return Ypred
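
    The per-image loop can also be written with broadcasting. The sketch below is one possible variant, not the lecture's code; the function name `nn_predict_l1` and the toy data are made up for illustration.

```python
import numpy as np

def nn_predict_l1(Xtr, ytr, Xte):
    """Return (predicted labels, distance matrix) for 1-nearest-neighbour with L1 distance."""
    # Cast to float so that uint8 pixel arrays do not wrap around on subtraction.
    Xtr = np.asarray(Xtr, dtype=np.float64)
    Xte = np.asarray(Xte, dtype=np.float64)
    ytr = np.asarray(ytr)
    # dists[i, j] = sum_p |Xte[i, p] - Xtr[j, p]|, shape (num_test, N)
    dists = np.abs(Xte[:, None, :] - Xtr[None, :, :]).sum(axis=2)
    return ytr[np.argmin(dists, axis=1)], dists

# Toy data (hypothetical): two training vectors with labels 3 and 7.
Xtr = np.array([[0, 0, 0], [10, 10, 10]])
ytr = np.array([3, 7])
Xte = np.array([[1, 2, 0], [9, 9, 12]])

pred, dists = nn_predict_l1(Xtr, ytr, Xte)
print(dists)  # [[ 3. 27.] [30.  4.]]
print(pred)   # [3 7]
```

    The intermediate array has shape (num_test, N, D), so this trades memory for speed and only suits small inputs.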
  • Q: With N examples, how fast are training and prediction?
    Answer: training is O(1), prediction is O(N)
    $\rightarrow$ This is the wrong way round: we want prediction to be fast, whereas slow training is acceptable.

  • KNN with majority vote
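    Take a majority vote among the K closest training points instead of copying the label of the single nearest one
    Distance metric: L1 (Manhattan) or L2 (Euclidean)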
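
    A minimal sketch of KNN with majority vote and a selectable metric. The function name `knn_predict`, its `metric` argument and the toy points are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(Xtr, ytr, Xte, k=3, metric="l1"):
    """Label each row of Xte by majority vote among its k nearest training rows."""
    Xtr = np.asarray(Xtr, dtype=np.float64)
    Xte = np.asarray(Xte, dtype=np.float64)
    ytr = np.asarray(ytr)
    preds = []
    for x in Xte:
        diff = Xtr - x
        if metric == "l1":    # Manhattan distance
            dists = np.abs(diff).sum(axis=1)
        elif metric == "l2":  # Euclidean distance
            dists = np.sqrt((diff ** 2).sum(axis=1))
        else:
            raise ValueError(f"unknown metric: {metric}")
        nearest = np.argsort(dists)[:k]         # indices of the k closest rows
        votes = Counter(ytr[nearest].tolist())  # count labels among neighbours
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)

# Toy data (hypothetical): two well-separated clusters in 2-D.
Xtr = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]])
ytr = np.array([0, 0, 0, 1, 1, 1])
Xte = np.array([[1, 1], [9, 9]])

pred_l1 = knn_predict(Xtr, ytr, Xte, k=3, metric="l1")
pred_l2 = knn_predict(Xtr, ytr, Xte, k=3, metric="l2")
print(pred_l1, pred_l2)  # [0 1] [0 1]
```

    How ties in the vote are broken is a design choice; here it is whatever `Counter.most_common` returns first.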

  • Hyperparameters
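    To find the best value of k and the best distance metric, split the data into train, validation and test sets: choose hyperparameters on the validation set and evaluate on the test set only once, at the very end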
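
    The train-val-test procedure could be sketched as follows. The synthetic two-cluster data, the split sizes and the candidate values of k are all assumptions made for this example.

```python
import numpy as np

def knn_predict(Xtr, ytr, Xte, k, metric):
    """Vectorised KNN with majority vote; labels must be non-negative integers."""
    diff = Xte[:, None, :] - Xtr[None, :, :]
    if metric == "l1":
        dists = np.abs(diff).sum(axis=2)
    else:  # "l2"
        dists = np.sqrt((diff ** 2).sum(axis=2))
    nearest = np.argsort(dists, axis=1)[:, :k]  # (num_test, k) neighbour indices
    return np.array([np.bincount(v).argmax() for v in ytr[nearest]])

# Synthetic data (hypothetical): two Gaussian blobs, 60 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(3, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

# Split: 80 train, 20 validation, 20 test.
Xtr, ytr = X[:80], y[:80]
Xval, yval = X[80:100], y[80:100]
Xte, yte = X[100:], y[100:]

# Select hyperparameters on the validation set only.
best = None
for metric in ("l1", "l2"):
    for k in (1, 3, 5, 7):
        acc = np.mean(knn_predict(Xtr, ytr, Xval, k, metric) == yval)
        if best is None or acc > best[0]:
            best = (acc, k, metric)
val_acc, best_k, best_metric = best

# Touch the test set once, with the chosen setting.
test_acc = np.mean(knn_predict(Xtr, ytr, Xte, best_k, best_metric) == yte)
print(best_k, best_metric, val_acc, test_acc)
```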

  • However, KNN on raw pixels is rarely used in practice
    Pixel distances are not informative about image content, prediction is very slow at test time, and the curse of dimensionality applies

Linear Classifier

  • Parametric Approach
    $f(x,W)=Wx+b$, where $W$ holds the parameters (weights) and $b$ is the bias
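
    A minimal sketch of the score function with explicit shapes. The sizes (10 classes, a flattened 32 x 32 x 3 image, so D = 3072) and the random weights are assumptions for illustration; in practice W and b are learned from the training data.

```python
import numpy as np

num_classes, D = 10, 32 * 32 * 3  # assumed sizes, D = 3072

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((num_classes, D))  # one row of weights per class
b = np.zeros(num_classes)                         # one bias per class
x = rng.integers(0, 256, size=D).astype(np.float64)  # stand-in flattened image

def f(x, W, b):
    """Linear score function: returns one score per class."""
    return W @ x + b

scores = f(x, W, b)
predicted_class = int(np.argmax(scores))  # the highest score wins
print(scores.shape, predicted_class)
```

    Unlike nearest neighbor, prediction here needs only W and b, not the training set.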

  • Interpreting a linear classifier: Geometric Viewpoint
    Hard cases: classes that cannot be separated by a linear decision boundary
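
    One standard illustration of such a hard case, chosen here as an assumption rather than taken from the notes, is an XOR-style labelling of four points. The short derivation below shows why no linear score can classify all four correctly.

```latex
% Assumed illustration: XOR-style labels on four points in the plane.
% Class 1: (0,1), (1,0).   Class 0: (0,0), (1,1).
% Suppose s(x) = w_1 x_1 + w_2 x_2 + b classifies all four points correctly,
% with s > 0 for class 1 and s \le 0 for class 0.
\begin{align*}
(0,0):\quad & b \le 0 \\
(1,1):\quad & w_1 + w_2 + b \le 0 \\
(0,1):\quad & w_2 + b > 0 \\
(1,0):\quad & w_1 + b > 0
\end{align*}
% Adding the first two lines gives  w_1 + w_2 + 2b \le 0.
% Adding the last two lines gives   w_1 + w_2 + 2b > 0.
% These contradict each other, so no such (w_1, w_2, b) exists.
```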