Active Learning
...under construction...
Idea bag
- Use prediction variance as a measure of information? Train with random images until the variance is sufficiently low?
Problem Statement
We restrict our attention to the following problem:
Given:
- a training set of N images X:=\{x^{(i)}\},
- each of which falls into one of m classes C = \{c_1, \ldots, c_m\},
- an oracle able to determine the label of a specified image x \in X.
Our goal is to determine the optimal training strategy for an image classifier F.
Entropy
At the i-th training step we have:
- a model f = f_i,
- a set L = L_i of images labelled by the oracle,
- a set U = U_i of unlabelled images such that X = L \sqcup U.
Given an unlabelled image x \in U, the model produces a probability vector f(x) = (p_1(x), \ldots, p_m(x)), where p_j(x) is the predicted probability that x belongs to class c_j. Given this we define the entropy of x (with respect to the model) as

H_f(x) := -\sum_{j=1}^{m} p_j(x) \log p_j(x).
We define the total entropy (again, with respect to f) as

H(f) := \sum_{x \in U} H_f(x).
Claim. The entropy of an image H_f(x) is a measure of the model's uncertainty over that image.
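As a minimal sketch of the two quantities above, the helper names `entropy` and `total_entropy` are my own; the definitions follow the formulas directly.

```python
import numpy as np

def entropy(probs):
    """H_f(x) = -sum_j p_j(x) log p_j(x) for one probability vector."""
    probs = np.asarray(probs, dtype=float)
    # Clip inside the log to avoid log(0); zero-probability classes
    # contribute 0 * log(p) = 0 to the sum.
    p = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(probs * np.log(p)))

def total_entropy(prob_matrix):
    """H(f) = sum over unlabelled images of H_f(x)."""
    return float(sum(entropy(row) for row in prob_matrix))

# A uniform prediction maximises entropy; a one-hot prediction gives zero.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # log 4 ≈ 1.386
print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0
```

The clipping constant 1e-12 is an arbitrary numerical guard, not part of the definition.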
Categorical Cross Entropy
In the case that the image labels are known, we can define a related quantity, the (categorical) cross-entropy. Let the image x_i \in X have label c_j \in C, and let y_i \in \{0,1\}^m be the one-hot encoding of that label.
Define the cross-entropy of x_i (with respect to f) as

J_f(x_i) := -\sum_{k=1}^{m} y_{ik} \log p_k(x_i) = -\log p_j(x_i),

where x_i has label c_j.
We define the (total) cross-entropy as

J(f) := \sum_{x_i \in L} J_f(x_i).
Claim. The cross-entropy of an image J_f(x) is a measure of the model's error on that image: it is small exactly when the model assigns high probability to the true class.
Note. Computing the cross-entropy requires the ground-truth label, so it can only be evaluated on the labelled set L.
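A sketch of the cross-entropy, using the simplified one-hot form J_f(x_i) = -log p_j(x_i); the function names are again my own.

```python
import numpy as np

def cross_entropy(probs, label_index):
    """J_f(x) = -log p_j(x), where c_j is the true class of x."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return float(-np.log(p[label_index]))

def total_cross_entropy(prob_matrix, labels):
    """J(f) = sum of J_f(x_i) over the labelled images."""
    return float(sum(cross_entropy(p, y) for p, y in zip(prob_matrix, labels)))

# Confident and correct -> near zero; confident and wrong -> large.
print(cross_entropy([0.9, 0.05, 0.05], 0))
print(cross_entropy([0.9, 0.05, 0.05], 1))
```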
Strategies for image selection
The overall goal is to simultaneously reduce uncertainty (entropy) and increase accuracy (i.e. reduce cross-entropy).
Idea. Order the images presented to the oracle so that low-entropy images with high cross-entropy take priority over high-entropy images:
high-entropy < (low-entropy + high cross-entropy)
High-entropy images
Given a training schedule, we must choose which images to present to the oracle at each step. This strategy selects the unlabelled images with the highest entropy first.
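The highest-entropy-first selection can be sketched as follows; `select_highest_entropy` is a hypothetical helper operating on the model's predicted probability matrix for the unlabelled pool.

```python
import numpy as np

def entropies(prob_matrix):
    """Per-row entropy H_f(x) of a matrix of probability vectors."""
    probs = np.asarray(prob_matrix, dtype=float)
    p = np.clip(probs, 1e-12, 1.0)  # guard against log(0)
    return -np.sum(probs * np.log(p), axis=1)

def select_highest_entropy(prob_matrix, k):
    """Indices of the k unlabelled images with the highest entropy."""
    h = entropies(prob_matrix)
    return [int(i) for i in np.argsort(-h)[:k]]

probs = [
    [0.98, 0.01, 0.01],  # confident -> low entropy
    [0.34, 0.33, 0.33],  # near-uniform -> high entropy
    [0.70, 0.20, 0.10],  # in between
]
print(select_highest_entropy(probs, 2))  # [1, 2]
```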
Multi-arm strategy
We define two phases: explore and exploit. In the exploration phase we present low-entropy images to the oracle in order to assess the model's performance: a confident but wrong prediction (low entropy, high cross-entropy) indicates the model is performing poorly. While performance is poor we continue presenting low-entropy images, on the assumption that low-entropy + high cross-entropy images are more informative than high-entropy ones. Once the model is performing well, we switch to the exploitation phase and present high-entropy images.
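The phase switch above can be sketched as a simple threshold rule. The function names, the scalar `recent_cross_entropy` signal, and the `threshold` value are all assumptions for illustration, not a fixed design.

```python
import numpy as np

def choose_phase(recent_cross_entropy, threshold=0.5):
    """Explore while the model still errs on easy images; exploit once
    its recent cross-entropy on low-entropy queries drops below threshold."""
    return "exploit" if recent_cross_entropy < threshold else "explore"

def next_query(prob_matrix, recent_cross_entropy, threshold=0.5):
    """Pick the next image index to send to the oracle."""
    probs = np.asarray(prob_matrix, dtype=float)
    p = np.clip(probs, 1e-12, 1.0)
    h = -np.sum(probs * np.log(p), axis=1)  # per-image entropy
    if choose_phase(recent_cross_entropy, threshold) == "exploit":
        return int(np.argmax(h))  # hardest image for the model
    return int(np.argmin(h))      # easy image: check the model is right

probs = [
    [0.98, 0.01, 0.01],  # low entropy
    [0.34, 0.33, 0.33],  # high entropy
]
print(next_query(probs, recent_cross_entropy=0.1))  # exploit -> image 1
print(next_query(probs, recent_cross_entropy=2.0))  # explore -> image 0
```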