Loading Data

In Anatomy of a POLO Program, we have seen how to define problem data manually (cf. matrix \(A\) and vector \(b\) in Listing 1). This approach is appropriate for small data sets; however, it becomes impractical for the larger data sets that many machine-learning problems involve. To facilitate convenient experimentation on machine-learning problems, polo includes functionality [1] for reading data from common formats such as LIBSVM [2011-Chang].
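
For reference, a file in the LIBSVM format stores one sample per line: the label first, followed by index:value pairs for the sample's nonzero features (feature indices start at 1). The two lines below are illustrative only and are not taken from the data set used later:

+1 1:0.5 3:-0.25 14:1
-1 2:0.125 7:1 13:-0.5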

Now, we briefly demonstrate how polo can be used to solve a logistic regression problem on data in the LIBSVM format. To this end, we revisit the example in Listing 1. Instead of working on the \(3\times3\) data matrix, we download the australian_scale file from the australian data set to the data folder under $HOME/examples. The data set contains 690 samples, each of which has 14 features and belongs to one of two classes: -1 or +1. Next, we change the data-related parts of Listing 1 to obtain Listing 2 (modifications are highlighted).

Listing 2 getting-started/svmdata.cpp
/* include system libraries */
#include <iostream>
#include <vector>
using namespace std;

/* include polo */
#include <polo/polo.hpp>
using namespace polo;

int main(int argc, char *argv[]) {
  /* define the problem data */
  auto data =
      utility::reader<double, int>::svm({"../data/australian_scale"}, 690, 14);

  /* define the smooth loss */
  loss::logistic<double, int> loss(data);

  /* estimate smoothness of the loss */
  double rowmax{0};
  for (int row = 0; row < data.nsamples(); row++) {
    double rowsquare{0};
    for (const auto val : data.matrix()->getrow(row))
      rowsquare += val * val;
    if (rowsquare > rowmax)
      rowmax = rowsquare;
  }
  const double L = 0.25 * data.nsamples() * rowmax;

  /* select and configure the desired solver */
  algorithm::gd<double, int> alg;
  alg.step_parameters(2 / L);

  /* provide an initial vector to the solver, and solve the problem */
  const vector<double> x0(data.nfeatures());
  alg.initialize(x0);
  alg.solve(loss);

  /* print the result */
  cout << "Optimum: " << alg.getf() << '\n';
  cout << "Optimizer: [";
  for (const auto val : alg.getx())
    cout << val << ',';
  cout << "].\n";

  return 0;
}

As can be observed, we first call the svm reader on the data file, providing the number of samples and features the file contains, and assign its output to the automatic variable data. Then, we approximate the Lipschitz smoothness constant of the logistic loss defined on data. This is necessary because we are still using the vanilla gradient descent algorithm with a constant step size, and it is known that the iterates of this algorithm converge to the optimum (for convex functions) if the step size satisfies \(\gamma < 2/L\), where \(L\) is the smoothness parameter [2004-Nesterov]. Because the logistic loss is defined as

\[f(x) := \sum_{n=1}^{N} f_{n}(x) = \sum_{n=1}^{N} \log \left( 1 + \exp \left( -b_{n} \left\langle a_{n}, x \right\rangle \right) \right) \,,\]

and each \(f_{n}(x)\) is \(L_{n}\)-smooth with \(L_{n} = {\Vert a_{n} \Vert}_{2}^{2} / 4\) [2014-Xiao] (so that \(f\) is at most \(\left( \sum_{n} L_{n} \right)\)-smooth), a computationally cheap (albeit conservative) estimate for \(L\) is

\[L = 0.25 N L_{\text{max}} \,, \qquad L_{\text{max}} = \max_{n} {\Vert a_{n} \Vert}_{2}^{2} \,.\]
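
For reference, the vanilla gradient descent iteration with constant step size \(\gamma\) referred to above is

\[x_{k+1} = x_{k} - \gamma \nabla f(x_{k}) = x_{k} - \gamma \sum_{n=1}^{N} \nabla f_{n}(x_{k}) \,, \qquad 0 < \gamma < 2/L \,.\]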

Finally, we set the step size of the algorithm, initialize it with a zero vector of appropriate dimension, and run it on the loss. To compile Listing 2, we add the following lines to our previous CMakeLists.txt:

add_executable(svmdata svmdata.cpp)
target_link_libraries(svmdata polo::polo)

Building the executable using CMake and running the resulting program give:

Optimum: 229.222
Optimizer: [0.0110083,0.162899,0.0832372,0.627515,0.968077,0.328978,0.257715,1.69923,0.556535,0.157199,-0.143509,0.328954,-0.358702,0.179352,].
2004-Nesterov

Yurii E. Nesterov. Introductory Lectures on Convex Optimization. Springer US, 2004. DOI: 10.1007/978-1-4419-8853-9.

2011-Chang

Chih-Chung Chang and Chih-Jen Lin. “LIBSVM: A Library for Support Vector Machines.” ACM Transactions on Intelligent Systems and Technology 2.3 (Apr. 2011), pp. 1–27. DOI: 10.1145/1961189.1961199.

2014-Xiao

Lin Xiao and Tong Zhang. “A Proximal Stochastic Gradient Method with Progressive Variance Reduction.” SIAM Journal on Optimization 24.4 (Jan. 2014), pp. 2057–2075. DOI: 10.1137/140961791.

Footnotes

[1]

We will cover this functionality in more detail in Data Handling.