Active Learning
...under construction...
Idea bag
- Use prediction variance as a measure of information? Train with random images until the variance is sufficiently low?
Problem Statement
We restrict our attention to the following problem:
Given:
- a training set of N images X:=\{x^{(i)}\},
- each of which falls into one of m classes C = \{c_1, \ldots, c_m\},
- an oracle able to determine the label of a specified image x \in X.
Our goal is to determine the optimal training strategy for an image classifier F.
Entropy
At the i-th training step we have:
- a model f = f_i,
- a set L = L_i of images labelled by the oracle,
- a set U = U_i of unlabelled images such that X = L \sqcup U.
Given an unlabelled image x \in U, the model produces a probability vector f(x) = (p_1(x), \ldots, p_m(x)), where p_j(x) is the predicted probability that x belongs to class c_j. Given this we define the entropy of x (with respect to the model) as

H_f(x) := -\sum_{j=1}^{m} p_j(x) \log p_j(x).
We define the total entropy (again, with respect to f) as

H(f) := \sum_{x \in U} H_f(x).
Claim. The entropy of an image H_f(x) is a measure of the model's uncertainty over that image.
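As a minimal sketch of the two quantities above, the helper names `entropy` and `total_entropy` are my own; the definitions follow the formulas directly.

```python
import numpy as np

def entropy(probs):
    """H_f(x) = -sum_j p_j(x) log p_j(x) for one probability vector."""
    probs = np.asarray(probs, dtype=float)
    # Clip inside the log to avoid log(0); zero-probability classes
    # contribute 0 * log(p) = 0 to the sum.
    p = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(probs * np.log(p)))

def total_entropy(prob_matrix):
    """H(f) = sum over unlabelled images of H_f(x)."""
    return float(sum(entropy(row) for row in prob_matrix))

# A uniform prediction maximises entropy; a one-hot prediction gives zero.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # log 4 ≈ 1.386
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0
```

The clipping constant 1e-12 is an arbitrary numerical guard, not part of the definition.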
Categorical Cross Entropy
In the case that the image labels are known, we can define a related quantity, the (categorical) cross-entropy. Let the image x_i \in X have label c_j \in C, and let y_i \in \{0,1\}^m be the one-hot encoding of that label.
Define the cross-entropy of x_i (with respect to f) as

J_f(x_i) := -\sum_{k=1}^{m} y_{ik} \log p_k(x_i) = -\log p_j(x_i),

where x_i has label c_j.
We define the (total) cross-entropy as

J(f) := \sum_{x_i \in L} J_f(x_i).
Claim. The cross-entropy of an image J_f(x) is a measure of the model's error on that image: it is small exactly when the model assigns high probability to the true class.
Note. Computing the cross-entropy requires the ground-truth label, so it can only be evaluated on the labelled set L.
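A sketch of the cross-entropy, using the simplified one-hot form J_f(x_i) = -log p_j(x_i); the function names are again my own.

```python
import numpy as np

def cross_entropy(probs, label_index):
    """J_f(x) = -log p_j(x), where c_j is the true class of x."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return float(-np.log(p[label_index]))

def total_cross_entropy(prob_matrix, labels):
    """J(f) = sum of J_f(x_i) over the labelled images."""
    return float(sum(cross_entropy(p, y) for p, y in zip(prob_matrix, labels)))

# Confident and correct -> near zero; confident and wrong -> large.
print(cross_entropy([0.9, 0.05, 0.05], 0))
print(cross_entropy([0.9, 0.05, 0.05], 1))
```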
Strategies for image selection
The overall goal is to simultaneously reduce uncertainty (entropy) and increase accuracy (i.e. reduce cross-entropy).
Idea. Order the images presented to the oracle so that low-entropy images with high cross-entropy take priority over high-entropy images:
high-entropy < (low-entropy + high cross-entropy)
High-entropy images
Given a training schedule, we must choose which images to present to the oracle at each step. This strategy selects the unlabelled images with the highest entropy first.
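The highest-entropy-first selection can be sketched as follows; `select_highest_entropy` is a hypothetical helper operating on the model's predicted probability matrix for the unlabelled pool.

```python
import numpy as np

def entropies(prob_matrix):
    """Per-row entropy H_f(x) of a matrix of probability vectors."""
    probs = np.asarray(prob_matrix, dtype=float)
    p = np.clip(probs, 1e-12, 1.0)  # guard against log(0)
    return -np.sum(probs * np.log(p), axis=1)

def select_highest_entropy(prob_matrix, k):
    """Indices of the k unlabelled images with the highest entropy."""
    h = entropies(prob_matrix)
    return [int(i) for i in np.argsort(-h)[:k]]

probs = [
    [0.98, 0.01, 0.01],  # confident -> low entropy
    [0.34, 0.33, 0.33],  # near-uniform -> high entropy
    [0.70, 0.20, 0.10],  # in between
]
print(select_highest_entropy(probs, 2))  # [1, 2]
```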
Multi-arm strategy
We define two phases: explore and exploit. In the exploration phase we present low-entropy images to the oracle in order to assess the model's performance: a confident but wrong prediction (low entropy, high cross-entropy) indicates the model is performing poorly. While performance is poor we continue presenting low-entropy images, on the assumption that low-entropy + high cross-entropy images are more informative than high-entropy ones. Once the model is performing well, we switch to the exploitation phase and present high-entropy images.
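The phase switch above can be sketched as a simple threshold rule. The function names, the scalar `recent_cross_entropy` signal, and the `threshold` value are all assumptions for illustration, not a fixed design.

```python
import numpy as np

def choose_phase(recent_cross_entropy, threshold=0.5):
    """Explore while the model still errs on easy images; exploit once
    its recent cross-entropy on low-entropy queries drops below threshold."""
    return "exploit" if recent_cross_entropy < threshold else "explore"

def next_query(prob_matrix, recent_cross_entropy, threshold=0.5):
    """Pick the next image index to send to the oracle."""
    probs = np.asarray(prob_matrix, dtype=float)
    p = np.clip(probs, 1e-12, 1.0)
    h = -np.sum(probs * np.log(p), axis=1)  # per-image entropy
    if choose_phase(recent_cross_entropy, threshold) == "exploit":
        return int(np.argmax(h))  # hardest image for the model
    return int(np.argmin(h))      # easy image: check the model is right

probs = [
    [0.98, 0.01, 0.01],  # low entropy
    [0.34, 0.33, 0.33],  # high entropy
]
print(next_query(probs, recent_cross_entropy=0.1))  # exploit -> image 1
print(next_query(probs, recent_cross_entropy=2.0))  # explore -> image 0
```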