cs231n - Lecture 2. Image Classification
Image Classification: A Core Task in Computer Vision
The Problem: Semantic Gap
A computer sees an image as a tensor of integers in [0, 255] with 3 channels (RGB), not as the semantic concept a human perceives
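As a minimal sketch of that representation, the snippet below builds a tiny made-up image as a NumPy array. The 2 x 3 size and the pixel values are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical 2 x 3 RGB image (height x width x channels); values are made up.
image = np.array(
    [[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
     [[0, 0, 0], [128, 128, 128], [255, 255, 255]]],
    dtype=np.uint8,
)

print(image.shape)   # (height, width, channels)
print(image.dtype)   # uint8, so every value lies in [0, 255]
print(image[0, 0])   # RGB triple of the top-left pixel
```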
Challenges:
Viewpoint variation
Background Clutter
Illumination
Occlusion
Deformation
Intraclass variation
An image classifier
def classify_image(image):
    # Some magic here?
    return class_label
- ML: Data-Driven Approach
- Collect a dataset of images and labels
- Use ML algorithms to train a classifier
- Evaluate the classifier on new images
def train(images, labels):
    # Machine Learning!
    return model

def predict(model, test_images):
    # Use model to predict labels
    return test_labels
- Nearest Neighbor Classifier
Predict the label of the most similar training image
Compare each labeled training example $x$ with the query \(x^*\)
Distance metric: \(d(x,x^*) \rightarrow \mathbb{R}\)
L1 distance: \(d_1(I_1,I_2) = \sum_p |I_1^p-I_2^p|\)
Take the pixel-wise absolute differences $\rightarrow$ sum them to get the distance
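The L1 distance above can be worked through on two tiny arrays. This is a sketch only: the helper name `l1_distance` and the 2 x 2 pixel values are made up for illustration.

```python
import numpy as np

def l1_distance(img_a, img_b):
    """Sum of pixel-wise absolute differences between two same-shaped images."""
    # Cast to a signed type first: subtracting uint8 arrays would wrap around.
    a = np.asarray(img_a, dtype=np.int64)
    b = np.asarray(img_b, dtype=np.int64)
    return int(np.sum(np.abs(a - b)))

I1 = np.array([[56, 32], [90, 23]], dtype=np.uint8)
I2 = np.array([[10, 20], [8, 10]], dtype=np.uint8)
print(l1_distance(I1, I2))  # 46 + 12 + 82 + 13 = 153
```

The cast to a signed integer type matters when images are stored as `uint8`, since unsigned subtraction silently wraps around instead of going negative.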
import numpy as np

class NearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        """Memorize the training data. X is N x D (one example per row); y is 1-dim of size N."""
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """Predict a label for each row of X."""
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # Find the closest training image (L1 distance) and predict its label
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)
            Ypred[i] = self.ytr[min_index]
        return Ypred
Q: With N training examples, how fast are training and prediction?
Answer: Training is O(1), prediction is O(N) per query
$\rightarrow$ Bad: we want prediction to be fast; slow training is acceptable.
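k-Nearest Neighbors (KNN): take a majority vote among the k closest training points
Distance metrics: L1 (Manhattan) \(d_1(I_1,I_2) = \sum_p |I_1^p-I_2^p|\), L2 (Euclidean) \(d_2(I_1,I_2) = \sqrt{\sum_p (I_1^p-I_2^p)^2}\)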
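A minimal sketch of KNN with a majority vote and a choice of L1 or L2 distance follows. The function name `knn_predict`, its parameters, and the toy 2-D points are assumptions for illustration, not code from the lecture.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3, metric="l1"):
    """Predict labels by majority vote among the k nearest training points."""
    X_train = np.asarray(X_train, dtype=np.float64)
    X_test = np.asarray(X_test, dtype=np.float64)
    y_train = np.asarray(y_train)
    preds = np.empty(X_test.shape[0], dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        diff = X_train - x
        if metric == "l1":
            dists = np.sum(np.abs(diff), axis=1)          # Manhattan
        elif metric == "l2":
            dists = np.sqrt(np.sum(diff ** 2, axis=1))    # Euclidean
        else:
            raise ValueError("metric must be 'l1' or 'l2'")
        nearest = np.argsort(dists, kind="stable")[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds[i] = labels[np.argmax(counts)]  # ties go to the smallest label
    return preds

# Toy data: two well-separated clusters (made-up values).
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.5, 0.5], [5.5, 5.5]])
print(knn_predict(X_train, y_train, X_test, k=3, metric="l2"))
```

With k = 1 this reduces to the nearest neighbor classifier; larger k smooths the decision by letting several neighbors vote.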
Hyperparameters
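To find the best value of k and the best distance metric, use a train/validation/test split: choose hyperparameters on the validation set and evaluate on the test set only once, at the end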
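The train-val-test idea can be sketched as below. Everything here is assumed for illustration: the synthetic two-class data, the 80/20/20 split sizes, the candidate values of k, and the helper `knn_predict` (L1 distance only).

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """k-NN with L1 distance and majority vote (labels are non-negative ints)."""
    preds = np.empty(len(X_query), dtype=y_train.dtype)
    for i, x in enumerate(X_query):
        dists = np.sum(np.abs(X_train - x), axis=1)
        nearest = np.argsort(dists)[:k]
        preds[i] = np.bincount(y_train[nearest]).argmax()
    return preds

# Synthetic two-class data (an assumption for illustration, not real images).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(2.0, 1.0, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

# Shuffle, then split into train / validation / test.
order = rng.permutation(len(X))
X, y = X[order], y[order]
X_tr, y_tr = X[:80], y[:80]
X_val, y_val = X[80:100], y[80:100]
X_te, y_te = X[100:], y[100:]

# Choose k using the validation set only.
candidate_ks = [1, 3, 5, 7]
val_acc = {k: float(np.mean(knn_predict(X_tr, y_tr, X_val, k) == y_val))
           for k in candidate_ks}
best_k = max(val_acc, key=val_acc.get)

# Touch the test set once, with the chosen k.
test_acc = float(np.mean(knn_predict(X_tr, y_tr, X_te, best_k) == y_te))
print(best_k, val_acc, test_acc)
```

The key design point is that the test split never influences the choice of k, so the final accuracy is an honest estimate on unseen data.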
However, KNN is not used on raw pixels in practice: pixel distances are not informative
It is also very slow at test time and suffers from the curse of dimensionality
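A rough way to see the curse of dimensionality: covering the input space at a fixed density needs a number of training points that grows exponentially with the dimension. The density of 4 points per axis and the 32 x 32 x 3 image size below are assumptions picked only to make the arithmetic concrete.

```python
# Training points needed to keep a fixed sampling density as dimension grows.
points_per_axis = 4  # illustrative density, an assumption

for dims in (1, 2, 3, 10):
    print(dims, points_per_axis ** dims)

# A flattened 32 x 32 x 3 image (a hypothetical size) has 3072 dimensions.
image_dims = 32 * 32 * 3
needed = points_per_axis ** image_dims
print(len(str(needed)))  # number of decimal digits in the required count
```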
Linear Classifier
Parametric Approach
$f(x,W)=Wx+b$, where W holds the parameters (weights) and b is the bias
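To make the shapes concrete, here is a small sketch of the score function for a flattened image with 4 pixel values and 3 classes. The values of `W`, `b`, and `x` are illustrative assumptions, not learned weights.

```python
import numpy as np

# Toy setup: 4 pixel values, 3 classes. All numbers are illustrative.
W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5, 1.3, 2.1, 0.0],
              [0.0, 0.25, 0.2, -0.3]])   # shape (num_classes, num_pixels)
b = np.array([1.1, 3.2, -1.2])            # one bias per class
x = np.array([56.0, 231.0, 24.0, 2.0])    # flattened image

def f(x, W, b):
    """Linear score function: returns one score per class."""
    return W @ x + b

scores = f(x, W, b)
predicted_class = int(np.argmax(scores))  # class with the highest score
print(scores, predicted_class)
```

Each row of `W` acts as a template for one class, and prediction is a single matrix-vector product, so its cost does not depend on the number of training examples.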
Interpreting a linear classifier: Geometric Viewpoint
Hard cases: classes that are not linearly separable, which a single linear decision boundary cannot split