Logging and Post-Processing
So far, we have seen how to define or load data, pick a loss, select and
configure an algorithm, and run it to minimize a smooth loss,
\(\operatorname{f}(x)\). We have used the getx and getf member functions of
gd to retrieve the last decision vector (i.e., iterate) generated by the
algorithm and the smooth loss value at that iterate.
Generally, we are interested not only in the last iterate and the loss value
generated by the algorithm but also in the sequence of states (e.g., iterates,
(partial) gradients, loss values, iteration counts, wall-clock times) the
algorithm generates. To support logging these states while the algorithm is
running, polo provides different State Loggers. Here, we briefly show how to
log iteration counts, wall-clock times and function values to a
comma-separated values (csv) file.
Revisiting the example in Listing 2, we need to pick a proper state logger, pass the logger to the algorithm, and finally save the (in-memory) logged states to a csv file. We provide the resulting code in Listing 3, with the necessary changes highlighted.
/* include system libraries */
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

/* include polo */
#include <polo/polo.hpp>
using namespace polo;

int main(int argc, char *argv[]) {
  /* define the problem data */
  auto data =
      utility::reader<double, int>::svm({"../data/australian_scale"}, 690, 14);

  /* define the smooth loss */
  loss::logistic<double, int> loss(data);

  /* estimate smoothness of the loss */
  double rowmax{0};
  for (int row = 0; row < data.nsamples(); row++) {
    double rowsquare{0};
    for (const auto val : data.matrix()->getrow(row))
      rowsquare += val * val;
    if (rowsquare > rowmax)
      rowmax = rowsquare;
  }
  const double L = 0.25 * data.nsamples() * rowmax;

  /* select and configure the desired solver */
  algorithm::gd<double, int> alg;
  alg.step_parameters(2 / L);

  /* pick a state logger */
  utility::logger::value<double, int> logger;

  /* provide an initial vector to the solver, and solve the problem */
  const vector<double> x0(data.nfeatures());
  alg.initialize(x0);
  alg.solve(loss, logger);

  /* open a csv file for writing */
  ofstream file("logger.csv");
  if (file) { /* if successfully opened for writing */
    file << "k,t,f\n";
    for (const auto &log : logger)
      file << log << '\n';
  }

  /* print the result */
  cout << "Optimum: " << alg.getf() << '\n';
  cout << "Optimizer: [";
  for (const auto val : alg.getx())
    cout << val << ',';
  cout << "].\n";

  return 0;
}
First, we include the standard C++ <fstream> header to be able to open
a csv file. Then, we pick a value logger, which logs the iteration counts,
wall-clock times and the loss values generated by the algorithm, and we provide
the logger to the solve method of our algorithm as the second argument.
Last, for post-processing purposes, we open a csv file, named logger.csv,
and write each log line by line. Note that the value logger, by default,
outputs the iteration count, wall-clock time (in milliseconds) and the
loss value in the given order, delimited by a comma.
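To make the idea of an in-memory state logger concrete, here is a minimal conceptual sketch in Python. It is not polo's implementation: ValueLogger and gradient_descent are hypothetical names, the quadratic objective is a toy problem, and logging after each update is an assumption that need not match the ordering polo uses internally. The sketch only shows the pattern of handing a logger to the solver and iterating over its records afterwards.

```python
import time


class ValueLogger:
    """Toy stand-in for a state logger: stores (k, t_ms, f) per iteration."""

    def __init__(self):
        self.records = []
        self._start = time.perf_counter()

    def __call__(self, k, fval):
        # Elapsed wall-clock time in milliseconds since the logger was created.
        t_ms = (time.perf_counter() - self._start) * 1e3
        self.records.append((k, t_ms, fval))

    def __iter__(self):
        return iter(self.records)


def gradient_descent(f, grad, x0, step, iterations, logger):
    """Plain gradient descent on a scalar variable that reports to `logger`."""
    x = x0
    for k in range(1, iterations + 1):
        x = x - step * grad(x)
        logger(k, f(x))  # assumption: log after the update
    return x


logger = ValueLogger()
x = gradient_descent(
    f=lambda x: (x - 3.0) ** 2,
    grad=lambda x: 2.0 * (x - 3.0),
    x0=0.0,
    step=0.25,
    iterations=5,
    logger=logger,
)

# Write the records in the same "k,t,f" layout used for logger.csv.
print("k,t,f")
for k, t, fval in logger:
    print(f"{k},{t:.5f},{fval:.6g}")
```

Keeping the records in memory and writing them out only after the solver returns keeps file I/O out of the timed loop, which is presumably why Listing 3 saves the csv file after solve has finished.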
We append the following lines to CMakeLists.txt
add_executable(logger logger.cpp)
target_link_libraries(logger polo::polo)
and build the project. Running the executable should give the same output as before:
Optimum: 229.222
Optimizer: [0.0110083,0.162899,0.0832372,0.627515,0.968077,0.328978,0.257715,1.69923,0.556535,0.157199,-0.143509,0.328954,-0.358702,0.179352,].
However, this time, our executable has created an artifact, named
logger.csv. We can check, for instance, the last 5 lines of the file:
# assuming that we are already in $HOME/examples/build
tail -n 5 logger.csv
96,5.61734,229.408
97,5.66521,229.37
98,5.70951,229.332
99,5.75266,229.295
100,5.79627,229.258
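As a quick sanity check on these numbers, the following Python sketch parses the five lines shown above and computes the average wall-clock time and the average loss decrease per iteration over that window. The helper names parse_rows and summarize are illustrative only, and the values depend on the machine and run that produced the log.

```python
import csv
import io

# Last five lines of logger.csv as shown above (columns: k, t [ms], f).
TAIL = """\
96,5.61734,229.408
97,5.66521,229.37
98,5.70951,229.332
99,5.75266,229.295
100,5.79627,229.258
"""


def parse_rows(text):
    """Parse 'k,t,f' lines into (int, float, float) tuples, skipping blanks."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if row:
            rows.append((int(row[0]), float(row[1]), float(row[2])))
    return rows


def summarize(rows):
    """Return (mean time per iteration in ms, mean loss decrease per iteration)."""
    steps = rows[-1][0] - rows[0][0]
    dt = (rows[-1][1] - rows[0][1]) / steps
    df = (rows[0][2] - rows[-1][2]) / steps
    return dt, df


rows = parse_rows(TAIL)
dt, df = summarize(rows)
print(f"mean time per iteration: {dt:.4f} ms")
print(f"mean loss decrease per iteration: {df:.4f}")
```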
Moreover, we can use a plotting script such as that given in Listing 4 to plot the loss values with respect to iteration counts and wall-clock times.
import csv  # for reading a csv file
from matplotlib import pyplot as plt  # for plotting

k = []
t = []
f = []
with open("logger.csv") as csvfile:
    csvReader = csv.reader(csvfile, delimiter=",")
    next(csvReader)  # skip the header
    for row in csvReader:
        k.append(int(row[0]))
        t.append(float(row[1]))
        f.append(float(row[2]))

w, h = plt.figaspect(0.5)  # figaspect returns (width, height)
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(w, h))

# f vs k
axes[0].plot(k, f)
axes[0].set_xlabel(r"$k$")
axes[0].set_ylabel(r"$f(\cdot)$")
axes[0].grid()

# f vs t
axes[1].plot(t, f)
axes[1].set_xlabel(r"$t$ [ms]")
axes[1].grid()

plt.tight_layout()
plt.savefig("logger.svg")
plt.savefig("logger.pdf")
The resulting figure should look similar to Fig. 1. There, we observe the loss values plotted against the iteration counts (left) and the wall-clock times (right).
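Curves like these are often read for the iteration or time at which the loss first drops below some level. The Python sketch below does this directly on the logged columns. The function name first_below and the target values are hypothetical choices for illustration, and the sample data are just the last five rows of logger.csv shown earlier rather than the full log.

```python
def first_below(k, t, f, target):
    """Return (iteration, time) of the first logged loss <= target, or None."""
    for ki, ti, fi in zip(k, t, f):
        if fi <= target:
            return ki, ti
    return None


# Sample values: the last five rows of logger.csv shown earlier.
k = [96, 97, 98, 99, 100]
t = [5.61734, 5.66521, 5.70951, 5.75266, 5.79627]
f = [229.408, 229.37, 229.332, 229.295, 229.258]

hit = first_below(k, t, f, target=229.3)  # 229.3 is an arbitrary example level
print(hit)

# A level that is never reached in the sample yields None.
print(first_below(k, t, f, target=229.0))
```

In the plotting script, the same function can be applied to the full k, t and f lists after they are read from the csv file.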
Note
For this example, we have used matplotlib as the
plotting library in Python. If pip is available on the system, the library
can be installed by issuing
pip install --user --upgrade matplotlib