Defining Custom Loss Functions
In our examples, we have thus far used the logistic loss on data that is either defined manually or read from a dataset. However, we can also define our own loss functions and pass them as the first argument to the solve member function of the algorithms. To demonstrate this, we focus on the following simple loss function:

\[ f(x) = \sum_{n=1}^{N} \frac{\left(x_{n} - n\right)^{2}}{n} \]

for some \(N \geq 1\). As can be observed, the loss is a convex quadratic function of the \(N\)-dimensional decision vector, and the optimizer of Problem (1) with this loss is

\[ x^{\star} = \left(1, 2, \ldots, N\right)\,. \]

Furthermore, it can easily be verified that the gradient and the Hessian of the loss are

\[ \left[\nabla f(x)\right]_{n} = \frac{2 \left(x_{n} - n\right)}{n}\,, \qquad \nabla^{2} f(x) = \operatorname{diag}\left(\frac{2}{1}, \frac{2}{2}, \ldots, \frac{2}{N}\right)\,. \]
In polo, a smooth loss is any object that, when called with two input arguments, x and g, returns the loss value associated with x and writes the gradient in g. Because the loss objects are not allowed to modify the decision vector, and for compatibility with other (high-level) languages, the types of x and g are const value_t * and value_t *, respectively, where value_t is the type (e.g., double) of the values that the decision vector and the gradient contain. Hence, one way to define the loss mentioned above is to create a new struct as follows:
struct simple_loss {
  simple_loss(const int N) : N{N} {}

  double operator()(const double *x, double *g) const {
    double loss{0};
    for (int n = 1; n <= N; n++) {
      const double residual{x[n - 1] - n};
      loss += residual * residual / n;
      g[n - 1] = 2 * residual / n;
    }
    return loss;
  }

private:
  const int N;
};
where we keep the data (N) of the loss private, and define the operator() member function appropriately. Note that, because C++ has zero-based numbering, we use n-1 when indexing x and g.
To use a custom loss object of type simple_loss, we first construct loss with, e.g., N = 10. Thanks to the closed-form representation of this simple loss function, we can easily verify that the loss is a \(\mu\)-strongly convex function with \(\mu = \lambda_{\text{min}} = 2/N\) and \(L = \lambda_{\text{max}} = 2\), where \(\lambda_{\text{min}}\) and \(\lambda_{\text{max}}\) are the minimum and maximum eigenvalues of the Hessian, respectively. We know from [2004-Nesterov] that the vanilla gradient descent algorithm with constant step size \(\gamma = \frac{2}{\mu + L}\) converges linearly to the optimum. We set the step size of the algorithm accordingly, and define a value terminator with absolute and relative tolerances of \(10^{-8}\) and \(10^{-13}\), respectively. The resulting code is given in Listing 7.
/* include system libraries */
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

/* include polo */
#include <polo/polo.hpp>
using namespace polo;

struct simple_loss {
  simple_loss(const int N) : N{N} {}

  double operator()(const double *x, double *g) const {
    double loss{0};
    for (int n = 1; n <= N; n++) {
      const double residual{x[n - 1] - n};
      loss += residual * residual / n;
      g[n - 1] = 2 * residual / n;
    }
    return loss;
  }

private:
  const int N;
};

int main(int argc, char *argv[]) {
  /* define the smooth loss */
  const int N{10};
  simple_loss loss(N);
  const double mu{2. / N};
  const double L{2.};

  /* select and configure the desired solver */
  algorithm::gd<double, int> alg;
  alg.step_parameters(2 / (mu + L));

  /* pick a state logger */
  utility::logger::value<double, int> logger;

  /* pick a terminator */
  terminator::value<double, int> terminator(1E-8, 1E-13);

  /* provide an initial vector to the solver, and solve the problem */
  const vector<double> x0(N);
  alg.initialize(x0);
  alg.solve(loss, logger, terminator);

  /* open a csv file for writing */
  ofstream file("loss.csv");
  if (file) { /* if successfully opened for writing */
    file << "k,t,f\n";
    for (const auto &log : logger)
      file << fixed << log.getk() << ',' << log.gett() << ',' << log.getf()
           << '\n';
  }

  /* print the result */
  cout << "Optimum: " << fixed << alg.getf() << '\n';
  cout << "Optimizer: [";
  for (const auto val : alg.getx())
    cout << val << ',';
  cout << "].\n";

  return 0;
}
We append the following lines to CMakeLists.txt:
add_executable(loss loss.cpp)
target_link_libraries(loss polo::polo)
and build the project. Running the executable should give the output:
Optimum: 0.000000
Optimizer: [1.000036,2.000000,3.000000,4.000000,5.000000,6.000000,6.999998,7.999984,8.999910,9.999641,].
Here, we observe that the optimum value is attained (up to the fixed, 6-digit precision) by the algorithm, even though the optimizer is not. This is because the value terminator only checks the absolute and relative changes in the loss value. If we want a termination condition based on the changes in the decision vector instead, we can replace

terminator::value<double, int> terminator(1E-8, 1E-13);

with

terminator::decision<double, int> terminator(1E-8);

This decision terminator with the tolerance \(\epsilon = 10^{-8}\) will stop the algorithm when the change in the decision vector between consecutive iterations falls below the tolerance, i.e., when \(\left\Vert x_{k+1} - x_{k} \right\Vert \leq \epsilon\).
Rebuilding the project and rerunning the executable should give the following:
Optimum: 0.000000
Optimizer: [1.000000,2.000000,3.000000,4.000000,5.000000,6.000000,7.000000,8.000000,9.000000,10.000000,].