Lec 2: Image Classification¶

Challenges¶

Take cat classification (the images are chosen either between cats and other animals, or between different breeds of cats)

Semantic Gap: There's no obvious way to convert the RGB map into the semantically meaningful category label of cat
Viewpoint Variation: All pixels change (dramatically) when the camera moves!
Intraclass Variation: Different breeds of cats have distinct RGB map. But we must find the common feature in them.
Fine-Grained Categories: Different breeds of cats are still cats, so have similar features as well. We must extract more detailed features to distinguish them.
Background Clutter: Sometimes, the images we want to recognize somehow blend into the background
Illumination Changes & Deformation & Occlusion (i.e. The Object is Blocked By Something):
- So, actually, the animal under a cushion might be a racoon. However, our common sense tell us that
  - cats are likely to appear in homes
  - cats can sometimes hide under cushions
  - racoons are very unlikely to appear in homes

We can use nearest neighbor approach.

That is,

we use \(L^1\) norm to calculate the "distance" between the test image and all training images,
find the training image that has the smallest distance, and
give the prediction that the test image is of the same category as the nearest training image.

To enhance the robustness, we might use the nearest k-th neighbor.

Actually, k-th nearest neighbor algorithm is practical if you choose the right metric / right data.

For example,

considering this arXiv paper recommendation system. It uses a metric called tf-idf.
Also, using feature vectors instead of raw pixels in KNN can make good predictions.

Always divide your dataset to three disjoint parts:

training set: where you get your model
validation set: where you test your model on and tune your hyperparameters (e.g. the \(k\) in \(k\)-th nearest neighbors and the metric we use)
- NOTE: the only purpose of the validation set is to let you compare the performance of models based on different hyperparameters.
test set: you can only use it to test once on your model. If the result is bad, you fucked you; otherwise, congratulations!

Also, you can do cross validation. That is, split data into folds, try each fold as validation and average the results

In Image classification we start with a training set of images and labels, and must predict labels on the test set.
Image classification is challenging due to the semantic gap: we need invariance to occlusion, deformation, lighting, intraclass variation, etc
Image classification is a building block for other vision tasks
The K-Nearest Neighbors classifier predicts labels based on nearest training examples
Distance metric and K are hyperparameters
Choose hyperparameters using the validation set; only run on the test set once at the very end!