Logging and Post-Processing

So far, we have seen how to define or load data, pick a loss, select and configure an algorithm, and run it to minimize a smooth loss, \(\operatorname{f}(x)\). We have used the getx and getf member functions of gd to retrieve the last decision vector (i.e., iterate) generated by the algorithm and the smooth loss value at that iterate.

Generally, we are interested not only in the last iterate and the loss value generated by the algorithm but also in the sequence of states (e.g., iterates, (partial) gradients, loss values, iteration counts, wall-clock times) the algorithm generates. To support logging these states while the algorithm is running, polo provides different State Loggers. Here, we briefly show how to log iteration counts, wall-clock times and loss values to a comma-separated values (csv) file.

Revisiting the example in Listing 2, we need to pick a proper state logger, pass the logger to the algorithm, and finally save the (in-memory) logged states to a csv file. We provide the resulting code in Listing 3, with the necessary changes highlighted.

Listing 3 getting-started/logger.cpp
/* include system libraries */
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

/* include polo */
#include <polo/polo.hpp>
using namespace polo;

int main(int argc, char *argv[]) {
  /* define the problem data */
  auto data =
      utility::reader<double, int>::svm({"../data/australian_scale"}, 690, 14);

  /* define the smooth loss */
  loss::logistic<double, int> loss(data);

  /* estimate smoothness of the loss */
  double rowmax{0};
  for (int row = 0; row < data.nsamples(); row++) {
    double rowsquare{0};
    for (const auto val : data.matrix()->getrow(row))
      rowsquare += val * val;
    if (rowsquare > rowmax)
      rowmax = rowsquare;
  }
  const double L = 0.25 * data.nsamples() * rowmax;

  /* select and configure the desired solver */
  algorithm::gd<double, int> alg;
  alg.step_parameters(2 / L);

  /* pick a state logger */
  utility::logger::value<double, int> logger;

  /* provide an initial vector to the solver, and solve the problem */
  const vector<double> x0(data.nfeatures());
  alg.initialize(x0);
  alg.solve(loss, logger);

  /* open a csv file for writing */
  ofstream file("logger.csv");
  if (file) { /* if successfully opened for writing */
    file << "k,t,f\n";
    for (const auto &log : logger)
      file << log << '\n';
  }

  /* print the result */
  cout << "Optimum: " << alg.getf() << '\n';
  cout << "Optimizer: [";
  for (const auto val : alg.getx())
    cout << val << ',';
  cout << "].\n";

  return 0;
}

First, we include the standard C++ <fstream> library to be able to open a csv file. Then, we pick a value logger, which logs the iteration counts, wall-clock times and the loss values generated by the algorithm, and we provide the logger to the solve method of our algorithm as the second argument. Last, for post-processing purposes, we open a csv file, named logger.csv, and write each log line by line. Note that the value logger, by default, outputs the iteration count, wall-clock time (in milliseconds) and the loss value in the given order, delimited by a comma.

We append the following lines to CMakeLists.txt

add_executable(logger logger.cpp)
target_link_libraries(logger polo::polo)

and build the project. Running the executable should give the same output as before:

Optimum: 229.222
Optimizer: [0.0110083,0.162899,0.0832372,0.627515,0.968077,0.328978,0.257715,1.69923,0.556535,0.157199,-0.143509,0.328954,-0.358702,0.179352,].

However, this time, our executable has created an artifact, named logger.csv. We can check, for instance, the last 5 lines of the file:

# assuming that we are already in $HOME/examples/build
tail -n 5 logger.csv
96,5.61734,229.408
97,5.66521,229.37
98,5.70951,229.332
99,5.75266,229.295
100,5.79627,229.258
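
As a quick check that does not require plotting, the rows above can be summarized directly. The following is a minimal sketch, not part of polo: the helper names parse_rows and summarize are hypothetical, and the five rows are copied verbatim from the output above so that the snippet is self-contained. It computes the mean wall-clock time per iteration and the loss decrease between consecutive iterations.

```python
import csv
import io

# The five rows shown above, copied verbatim (columns: k, t [ms], f).
sample = """\
96,5.61734,229.408
97,5.66521,229.37
98,5.70951,229.332
99,5.75266,229.295
100,5.79627,229.258
"""


def parse_rows(text):
    """Parse header-less 'k,t,f' rows into a list of (int, float, float)."""
    reader = csv.reader(io.StringIO(text))
    return [(int(k), float(t), float(f)) for k, t, f in reader]


def summarize(rows):
    """Return (mean ms per iteration, list of consecutive loss decreases)."""
    ks, ts, fs = zip(*rows)
    ms_per_iter = (ts[-1] - ts[0]) / (ks[-1] - ks[0])
    decreases = [prev - cur for prev, cur in zip(fs, fs[1:])]
    return ms_per_iter, decreases


rows = parse_rows(sample)
ms_per_iter, decreases = summarize(rows)
print(f"mean time per iteration: {ms_per_iter:.4f} ms")
print("loss decrease per iteration:", [round(d, 3) for d in decreases])
```

To run the same summary on the full file, one could read logger.csv, skip the header line, and pass the remaining text to parse_rows.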

Moreover, we can use a plotting script such as that given in Listing 4 to plot the loss values with respect to iteration counts and wall-clock times.

Listing 4 getting-started/logger.py
import csv  # for reading a csv file
from matplotlib import pyplot as plt  # for plotting

k = []
t = []
f = []

with open("logger.csv") as csvfile:
    csvReader = csv.reader(csvfile, delimiter=",")
    next(csvReader)  # skip the header
    for row in csvReader:
        k.append(int(row[0]))
        t.append(float(row[1]))
        f.append(float(row[2]))

w, h = plt.figaspect(0.5)  # figaspect returns (width, height)
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(w, h))

# f vs k
axes[0].plot(k, f)
axes[0].set_xlabel(r"$k$")
axes[0].set_ylabel(r"$f(\cdot)$")
axes[0].grid()

# f vs t
axes[1].plot(t, f)
axes[1].set_xlabel(r"$t$ [ms]")
axes[1].grid()

plt.tight_layout()
plt.savefig("logger.svg")
plt.savefig("logger.pdf")

The resulting figure should look similar to Fig. 1. There, we observe the loss values plotted against the iteration counts (left) and the wall-clock times (right).

../_images/logger.svg

Fig. 1 Loss values generated by the algorithm in Listing 3.
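
Besides plotting, the logged values can be queried directly, for instance to find when the loss first reaches a given level. The sketch below is illustrative only: first_reach is a hypothetical helper, the target 229.3 is an arbitrary choice, and the rows are the last five lines of logger.csv shown earlier.

```python
def first_reach(rows, target):
    """Return the first (k, t, f) row whose loss f is at or below target.

    Returns None if no logged loss value reaches the target.
    """
    for k, t, f in rows:
        if f <= target:
            return (k, t, f)
    return None


# Rows copied from the tail of logger.csv shown earlier (k, t [ms], f).
rows = [
    (96, 5.61734, 229.408),
    (97, 5.66521, 229.37),
    (98, 5.70951, 229.332),
    (99, 5.75266, 229.295),
    (100, 5.79627, 229.258),
]

hit = first_reach(rows, target=229.3)  # 229.3 is an arbitrary example target
if hit is None:
    print("target not reached in the logged iterations")
else:
    k, t, f = hit
    print(f"loss {f} reached at iteration {k} after {t} ms")
```

Reporting both the iteration count and the wall-clock time mirrors the two panels of the figure.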

Note

For this example, we have used matplotlib as the plotting library in Python. If pip is available on the system, the library can be installed by running pip install --user --upgrade matplotlib.