Lec 2: Image Classification¶
Challenges¶
Take cat classification as an example (the task may be to distinguish cats from other animals, or to distinguish between different breeds of cats).
- Semantic Gap: There's no obvious way to convert the RGB map into the semantically meaningful category label "cat".
- Viewpoint Variation: All pixels change (dramatically) when the camera moves!
- Intraclass Variation: Different breeds of cats have distinct RGB maps, but we must find the features they have in common.
- Fine-Grained Categories: Different breeds of cats are still cats, so they share similar features as well. We must extract more detailed features to distinguish them.
- Background Clutter: Sometimes, the objects we want to recognize somehow blend into the background.
- Illumination Changes, Deformation & Occlusion (i.e. the object is blocked by something):
    - For example, the animal under a cushion might actually be a raccoon. However, our common sense tells us that
        - cats are likely to appear in homes,
        - cats can sometimes hide under cushions, and
        - raccoons are very unlikely to appear in homes.
 
Naive Approaches¶
We can use the nearest neighbor approach. That is,
- use the \(L^1\) norm to calculate the "distance" between the test image and all training images,
- find the training image that has the smallest distance, and
- predict that the test image is of the same category as that nearest training image (a sketch follows below).
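
Below is a minimal sketch of this procedure in NumPy. The \(L^1\) distance between two images is \(d_1(I_1, I_2) = \sum_p \lvert I_1^p - I_2^p \rvert\), summing over pixels \(p\). All names here (`train_images`, `nearest_neighbor_predict`, etc.) are made up for illustration, and images are assumed to be flattened into 1-D vectors.

```python
import numpy as np

def nearest_neighbor_predict(train_images, train_labels, test_image):
    """Predict the label of one test image by L1 nearest neighbor.

    train_images: (N, D) array, one flattened image per row
    train_labels: (N,) array of integer class labels
    test_image:   (D,) flattened test image
    """
    # L1 distance between the test image and every training image
    distances = np.abs(train_images - test_image).sum(axis=1)
    # The closest training image determines the prediction
    nearest = np.argmin(distances)
    return train_labels[nearest]
```

Note that all the work happens at prediction time: "training" is just memorizing the dataset, so prediction is slow when the training set is large.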
To enhance robustness, we might instead use the \(k\) nearest neighbors and take a majority vote among their labels.
Actually, the \(k\)-nearest neighbor algorithm is practical if you choose the right metric / the right data representation (a sketch follows after the examples below).
For example,
- consider an arXiv paper recommendation system: it uses a text-similarity metric called tf-idf, and
- using feature vectors instead of raw pixels in KNN can also make good predictions.
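
Here is a sketch of the \(k\)-neighbor variant (again, all names are illustrative). The query vectors can be raw pixels, tf-idf vectors, or any other feature representation.

```python
import numpy as np

def knn_predict(train_X, train_y, test_x, k=1):
    """Predict the label of one query vector by majority vote
    among its k nearest training vectors under the L1 norm."""
    distances = np.abs(train_X - test_x).sum(axis=1)
    nearest = np.argsort(distances)[:k]   # indices of the k closest vectors
    votes = train_y[nearest]              # labels of those neighbors
    return np.bincount(votes).argmax()    # the most common label wins
```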
Setting Hyperparameters¶
Always divide your dataset into three disjoint parts (a sketch of the workflow follows the list):
- training set: where you fit your model.
- validation set: where you test your model and tune your hyperparameters (e.g. the \(k\) in \(k\)-nearest neighbors and the metric we use).
    - NOTE: the only purpose of the validation set is to let you compare the performance of models based on different hyperparameters.
- test set: you can only use it to test your model once. If the result is bad, you're out of luck; otherwise, congratulations!
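
A minimal sketch of this tuning loop, reusing the hypothetical `knn_predict` from above; the split arrays (`train_X`, `val_X`, `test_X`, and their labels) and the candidate values of \(k\) are assumptions, not prescriptions.

```python
import numpy as np

def accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation examples classified correctly by k-NN."""
    preds = np.array([knn_predict(train_X, train_y, x, k) for x in eval_X])
    return (preds == eval_y).mean()

# Tune k on the validation set only...
best_k = max([1, 3, 5, 7],
             key=lambda k: accuracy(train_X, train_y, val_X, val_y, k))

# ...then touch the test set exactly once, at the very end.
test_accuracy = accuracy(train_X, train_y, test_X, test_y, best_k)
```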
Also, you can do cross-validation: split the data into folds, use each fold in turn as the validation set, and average the results (see the sketch below).

- We do this because the average over folds gives a more reliable estimate of performance than any single (possibly unlucky) split.
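
A sketch of this under the same assumptions (`num_folds`, the candidate \(k\) values, and the reuse of `knn_predict` are all illustrative):

```python
import numpy as np

def cross_validate(X, y, k, num_folds=5):
    """Mean validation accuracy of k-NN averaged over num_folds splits."""
    fold_X = np.array_split(X, num_folds)
    fold_y = np.array_split(y, num_folds)
    accuracies = []
    for i in range(num_folds):
        # Fold i is the validation set; the rest form the training set.
        train_X = np.concatenate(fold_X[:i] + fold_X[i + 1:])
        train_y = np.concatenate(fold_y[:i] + fold_y[i + 1:])
        preds = np.array([knn_predict(train_X, train_y, x, k)
                          for x in fold_X[i]])
        accuracies.append((preds == fold_y[i]).mean())
    return np.mean(accuracies)

# Pick the k with the best average accuracy across folds
# (X and y hold all non-test data, assumed loaded).
best_k = max([1, 3, 5, 7], key=lambda k: cross_validate(X, y, k))
```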
Summary¶
- In image classification, we start with a training set of images and labels, and must predict labels on the test set.
- Image classification is challenging due to the semantic gap: we need invariance to occlusion, deformation, lighting, intraclass variation, etc.
- Image classification is a building block for other vision tasks.
- The \(k\)-nearest neighbors classifier predicts labels based on the nearest training examples.
- The distance metric and \(k\) are hyperparameters.
- Choose hyperparameters using the validation set; only run on the test set once, at the very end!