Today I will write a story about how I saved myself a lot of writing with just a few lines of code (that doesn’t mean it took little time :P).

Usually, when you want to iterate over something, you write a loop. But what if you want to use the loop counter as a template argument, like this?

#include <array>
int main() {
    for(int i=0; i<10; ++i) {
        std::array<int, i> a;
    }
}

Compiling the code above fails with the following errors:

$ g++ -std=c++14 m.cpp 
m.cpp: In function ‘int main()’:
m.cpp:4:25: error: the value of ‘i’ is not usable in a constant expression
         std::array<int, i> a;
                         ^
m.cpp:3:13: note: ‘int i’ is not const
     for(int i=0; i<10; ++i) {
             ^
m.cpp:4:26: error: the value of ‘i’ is not usable in a constant expression
         std::array<int, i> a;
                          ^
m.cpp:3:13: note: ‘int i’ is not const
     for(int i=0; i<10; ++i) {
             ^
m.cpp:4:26: note: in template argument for type ‘long unsigned int’ 
         std::array<int, i> a;

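To solve this problem we can build the loop out of templates: a recursive struct calls a functor for the current value and then instantiates itself for the next one. Note that ForLoop<4> counts down from 4 to 0, so the functor is called five times. The code may look like this: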

#include <iostream>
#include <array>
 
template<size_t c>
struct ForLoop {
    template<template <size_t> class Func>
    static void iterate() {
        Func<c>()();
        ForLoop<c-1>::template iterate<Func>();
    }
};
 
template<>
struct ForLoop<0> {
    template<template <size_t> class Func>
    static void iterate() {
        Func<0>()();
    }
};

template <size_t size>
struct Foo {
    void operator()() {
        std::array<int, size> arr;
        std::cout << "Array size: " << arr.size() << std::endl;
    }
};

int main() {
    ForLoop<4>::iterate<Foo>();
}

And… everything is fine with this technique until you use a macro that takes a template with its arguments as a token and prints that token to the screen. This is what the BENCHMARK_TEMPLATE macro does in the Google Benchmark framework:

EDIT: This was valid around December 2015. Currently, things can be done better. For more information see my issue: https://github.com/google/benchmark/issues/167.

// above: the ForLoop structure and the includes for the Google Benchmark framework
// ...
template <size_t size>
struct Foo {
    void operator()() {
        // the arguments are: template function to benchmark, template argument
        BENCHMARK_TEMPLATE(BM_SimulationSplit, DodSimulation<size, size>);
    }
};

// normally we would use this macro for main but since we need to use 
// our template for loop, we have to write main manually
//BENCHMARK_MAIN();

int main(int argc, char** argv) {
    ForLoop<10>::iterate<Foo>();
    ::benchmark::Initialize(&argc, argv);
    ::benchmark::RunSpecifiedBenchmarks();
}


// It turns out that the Google Benchmark output is not so cool
// launched with --color_print=false --benchmark_format=csv
//
/*name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
Run on (8 X 3380.81 MHz CPU s)
2015-12-01 00:35:12
"BM_SimulationSplit<DodSimulation<size, size>>",833,475924,619448,,,
"BM_SimulationSplit<DodSimulation<size, size>>",1167,726188,589546,,,
"BM_SimulationSplit<DodSimulation<size, size>>",1496,355430,465241,,,
"BM_SimulationSplit<DodSimulation<size, size>>",1509,357407,466534,,,
"BM_SimulationSplit<DodSimulation<size, size>>",1471,659624,464990,,,
"BM_SimulationSplit<DodSimulation<size, size>>",1509,543499,461233,,,
"BM_SimulationSplit<DodSimulation<size, size>>",1823,290969,379594,,,
"BM_SimulationSplit<DodSimulation<size, size>>",1842,581460,384365,,,
"BM_SimulationSplit<DodSimulation<size, size>>",1823,435162,379594,,,
"BM_SimulationSplit<DodSimulation<size, size>>",1842,295108,384365,,,
"BM_SimulationSplit<DodSimulation<size, size>>",2397,433502,290363,,,
*/

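As you can see, the output from Google Benchmark is not so good - we have no idea which parameters were used for which run, because the macro turns the literal token size into text before the template is ever instantiated. It would be much better to see lines like this: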

"BM_SimulationSplit<DodSimulation<0, 0>>",2188,312266,297989,,,

And that’s what you get when you write every BENCHMARK_TEMPLATE invocation by hand.

And… this is actually achievable by using Boost.Preprocessor. This library allows you to do a lot of magic with the preprocessor.

One of the things you can do with it is a preprocessor loop.

Let’s look at a modified example from the documentation for BOOST_PP_SEQ_FOR_EACH:

#include <iostream>

#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/seq/for_each.hpp>

#define SEQ (foo)(bar)
#define MACRO(r, data, elem) std::cout << BOOST_PP_CAT(data, elem)();

const char* prefix_foo() { return "foo"; }
const char* prefix_bar() { return "bar"; }

int main() {
    BOOST_PP_SEQ_FOR_EACH(MACRO, prefix_, SEQ)
    std::cout << std::endl;
}

And the compilation and execution:

$ g++ -std=c++14 -Wall -Wextra -Wpedantic foo.cpp
$ ./a.out
foobar
$

As you can see, the preprocessor iterated over the elements of SEQ (foo and bar) and generated calls to prefix_foo() and prefix_bar().

Back to the title of the post. The above trick allowed me to change 800 lines of code into 8. I wanted to benchmark a template function over a range of template arguments. Google Benchmark itself has no mechanism for that: it lets you iterate over a range of run-time arguments, but not over template arguments, since you can’t use a normal loop for those. So I had to use either the template loop or Boost.Preprocessor. I decided on the latter, since it also lets me see the template arguments in the output.

Below is the code I finally used for my benchmark:

#include "benchmark/benchmark.h"

#include "dod/dod_simulation.cpp"
#include "oop/oop_simulation.cpp"

template <typename SimulationType>
static void BM_SimulationSplit(benchmark::State &state) {
    SimulationType s{100000};

    while (state.KeepRunning()) {
        s.iterationX();
        s.iterationY();
        s.iterationZ();
    }
}

template <typename SimulationType>
static void BM_SimulationNoSplit(benchmark::State &state) {
    SimulationType s{100000};

    while (state.KeepRunning())
        s.iteration();
}


#include <boost/preprocessor/seq/for_each.hpp>

#define FILL_SEQ (0)(4)(8)(12)(16)(20)(24)(28)(32)(36)(40)(44)(48)(52)(56)(60)(64)(68)(72)(76)(80)(84)(88)(92)(96)(100)(104)(108)(112)(116)(120)(124)(128)(132)(136)(140)(144)(148)(152)(156)(160)(164)(168)(172)(176)(180)(184)(188)(192)(196)(200)(204)(208)(212)(216)(220)(224)(228)(232)(236)(240)(244)(248)(252)(256)(260)(264)(268)(272)(276)(280)(284)(288)(292)(296)(300)(304)(308)(312)(316)(320)(324)(328)(332)(336)(340)(344)(348)(352)(356)(360)(364)(368)(372)(376)(380)(384)(388)(392)(396)(400)(404)(408)(412)(416)(420)(424)(428)(432)(436)(440)(444)(448)(452)(456)(460)(464)(468)(472)(476)(480)(484)(488)(492)(496)(500)(504)(508)(512)
#define SIMULATION_SPLIT(_dummy, SIMULATION_TYPE, FILL) BENCHMARK_TEMPLATE(BM_SimulationSplit, SIMULATION_TYPE<FILL, FILL>);
#define SIMULATION_NO_SPLIT(_dummy, SIMULATION_TYPE, FILL) BENCHMARK_TEMPLATE(BM_SimulationNoSplit, SIMULATION_TYPE<FILL, FILL>);

BOOST_PP_SEQ_FOR_EACH(SIMULATION_SPLIT, DodSimulation, FILL_SEQ)
BOOST_PP_SEQ_FOR_EACH(SIMULATION_SPLIT, OopSimulation, FILL_SEQ)

BOOST_PP_SEQ_FOR_EACH(SIMULATION_NO_SPLIT, DodSimulation, FILL_SEQ)
BOOST_PP_SEQ_FOR_EACH(SIMULATION_NO_SPLIT, OopSimulation, FILL_SEQ)

BENCHMARK_MAIN();

…and in the end I get the output I expected:

"BM_SimulationSplit<DodSimulation<0, 0>>",2134,313182,313027,,,
"BM_SimulationSplit<DodSimulation<4, 4>>",1716,396065,398601,,,
"BM_SimulationSplit<DodSimulation<8, 8>>",1346,524012,523031,,,
"BM_SimulationSplit<DodSimulation<12, 12>>",1094,676046,680073,,,
"BM_SimulationSplit<DodSimulation<16, 16>>",795,840854,840252,,,
...

To sum up:

  • Templates can do a lot (and even more: they are Turing complete).
  • There is an interesting library called Boost.Preprocessor, which may save you from typing the code out or from generating it with e.g. a Python script.
  • As people say on the internet, “use the right tool for the job” - don’t use preprocessor loops just because you can. In the end they are not easy to maintain, so avoid them unless you have a really good reason.
