Losses

We have now presented a first family of models: the MLP family. In order to train these models (i.e., tune their parameters to fit the data), we need to define a loss function to be optimized. Once this loss function is picked, training consists in tuning the model parameters so as to minimize it.

In this section, we present two standard losses: the mean squared error (mainly used for regression) and the logistic loss (used in classification settings).

In the following, we assume that we are given a dataset \(\mathcal{D}\) made of \(n\) annotated samples \((x_i, y_i)\), and we denote the model’s output:

\[ \forall i, \hat{y}_i = m_\theta(x_i) \]

where \(m_\theta\) is our model and \(\theta\) is the set of all its parameters (weights and biases).

Mean Squared Error

The Mean Squared Error (MSE) is the most commonly used loss function in regression settings. It is defined as:

\[\begin{align*} \mathcal{L}(\mathcal{D} ; m_\theta) &= \frac{1}{n} \sum_i \|\hat{y}_i - y_i\|^2 \\ &= \frac{1}{n} \sum_i \|m_{\theta}(x_i) - y_i\|^2 \end{align*}\]
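As a quick sanity check, this loss is straightforward to compute with NumPy. In the sketch below, y_true and y_pred are made-up arrays standing in for the targets \(y_i\) and the model outputs \(\hat{y}_i\):

import numpy as np

# Made-up targets y_i and model outputs y_hat_i, for illustration only
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.8, 3.5])

# Mean squared error: average of the squared residuals
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # ≈ 0.1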

Its quadratic formulation strongly penalizes large errors:

import numpy as np

%config InlineBackend.figure_format = 'svg'
%matplotlib inline
import matplotlib.pyplot as plt
from notebook_utils import prepare_notebook_graphics
prepare_notebook_graphics()

# Squared error as a function of the residual
x = np.linspace(-4, 4, 50)

plt.plot(x, x ** 2)
plt.grid(True)
plt.xlabel(r"$\hat{y} - y$")
plt.ylabel(r"$\|\hat{y} - y\|^2$");

Logistic loss

The logistic loss is the most widely used loss to train neural networks in classification settings. It is defined as:

\[ \mathcal{L}(\mathcal{D} ; m_\theta) = \frac{1}{n} \sum_i - \log p(\hat{y}_i = y_i ; m_\theta) \]

where \(p(\hat{y}_i = y_i ; m_\theta)\) is the probability predicted by model \(m_\theta\) for the correct class \(y_i\).
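To make this notation concrete, here is a minimal NumPy sketch, assuming the model outputs one row of class probabilities per sample (e.g., through a softmax output layer); the probas array and the labels y are made up for the example:

import numpy as np

# Made-up class probabilities (one row per sample, rows sum to 1)
probas = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])
# True class indices y_i
y = np.array([0, 1, 2])

# Probability assigned by the model to the correct class of each sample
p_correct = probas[np.arange(len(y)), y]

# Logistic loss: average negative log-probability of the correct class
loss = np.mean(-np.log(p_correct))
print(loss)  # ≈ 0.4987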

As expected, this formulation favors models that assign a probability close to 1 to the correct class: the loss vanishes when that probability is 1 and grows without bound as it approaches 0:

import numpy as np

%config InlineBackend.figure_format = 'svg'
%matplotlib inline
import matplotlib.pyplot as plt

plt.ion();

# Negative log-probability assigned to the correct class
x = np.linspace(0.01, 1, 50)

plt.plot(x, -np.log(x))
plt.grid(True)
plt.xlabel(r"$p(\hat{y} = y)$")
plt.ylabel(r"$- \log p(\hat{y} = y)$");