Separate Preprocess and Compile Performance

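Unlike most other build systems, build2 performs an explicit header dependency extraction step (-M, /showIncludes, etc.). The conventional approach is to extract dependencies as a byproduct of the compilation itself (for example, with -MD). It is a clever trick, but unfortunately it doesn't work for auto-generated headers: if a source includes a header that has not been generated yet, the compilation fails before the build system can learn that it needs to generate that header. And in build2 we do support auto-generated headers.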
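For a sense of what the explicit extraction step hands back: with GCC and Clang, -M output is a make rule whose prerequisites are the source file followed by every header it includes. Below is a minimal Python sketch of reading that rule. The function name parse_make_deps is hypothetical (it is not build2's API), and the sketch only handles line continuations and backslash-escaped spaces, not every escape the compilers may emit.

```python
def parse_make_deps(text: str) -> list[str]:
    """Return the prerequisites of the first rule in make-style -M output.

    Simplified sketch: handles backslash-newline continuations and
    backslash-escaped spaces, but not other make escapes (e.g. $$).
    """
    # Join continuation lines so the first rule sits on one logical line.
    first_rule = text.replace("\\\n", " ").split("\n", 1)[0]
    # Everything after the first ": " is the prerequisite list.
    _, _, prereqs = first_rule.partition(": ")

    deps: list[str] = []
    cur: list[str] = []
    i = 0
    while i < len(prereqs):
        c = prereqs[i]
        if c == "\\" and i + 1 < len(prereqs) and prereqs[i + 1] == " ":
            cur.append(" ")  # an escaped space is part of the file name
            i += 2
            continue
        if c.isspace():
            if cur:  # unescaped whitespace ends the current file name
                deps.append("".join(cur))
                cur = []
        else:
            cur.append(c)
        i += 1
    if cur:
        deps.append("".join(cur))
    return deps


if __name__ == "__main__":
    out = "foo.o: foo.cxx foo.hxx \\\n  bar/baz.hxx\n"
    print(parse_make_deps(out))  # ['foo.cxx', 'foo.hxx', 'bar/baz.hxx']
```

Only the first rule is read, so any extra phony rules that follow it (as produced by -MP) are ignored.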

One notable aspect of the dependency extraction step is that it is essentially a full preprocessor run. In fact, with /showIncludes one has to either preprocess or compile, so if you only need the header dependencies, you have to send the preprocessed output to /dev/null. This is a waste, of course: why not save the preprocessed output and then compile that?
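On the VC side, /showIncludes reports each header as a "Note: including file:" line, indented to show nesting, mixed in with the compiler's other output. Here is a minimal Python sketch of separating the two. It assumes the English-locale prefix (the message text is localized), and split_show_includes is a hypothetical helper name.

```python
# Assumption: English-locale message prefix; other locales differ.
PREFIX = "Note: including file:"


def split_show_includes(output: str) -> tuple[list[str], list[str]]:
    """Split /showIncludes output into header paths and all other lines."""
    headers: list[str] = []
    other: list[str] = []
    for line in output.splitlines():
        if line.startswith(PREFIX):
            # Spaces after the prefix encode nesting depth; drop them.
            headers.append(line[len(PREFIX):].lstrip(" "))
        elif line:
            other.append(line)  # e.g. source file name echo, diagnostics
    return headers, other


if __name__ == "__main__":
    out = (
        "foo.cxx\n"
        "Note: including file: C:\\inc\\a.h\n"
        "Note: including file:  C:\\inc\\b.h\n"
    )
    print(split_show_includes(out))
```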

In fact, if we can pull this off, it opens up some intriguing possibilities. For starters, we can ignore comment-only changes by hashing the preprocessed output. More importantly, we will be all set for distributed compilation and caching since we now have a self-contained translation unit that we can ship to a remote host. Finally, the build system can analyze the preprocessed output, for example, to extract C++ module dependencies, which in many ways are not unlike header dependencies, only at the language rather than the preprocessor level.
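To illustrate the comment-only case, here is a toy Python sketch. A self-contained example cannot run a real compiler, so strip_comments is a hypothetical stand-in for the preprocessor that only removes comments (keeping newlines so line numbers don't shift). A real implementation would hash the compiler's actual preprocessed output instead. Real preprocessed output also carries line information, so comment edits that add or remove lines would still change the hash.

```python
import hashlib


def strip_comments(src: str) -> str:
    """Toy stand-in for the preprocessor's comment removal.

    Replaces each comment with a space (keeping the newlines a block
    comment spans) and copies string/char literals verbatim. Raw strings,
    line splices, and everything else a real preprocessor does are ignored.
    """
    out: list[str] = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c in "\"'":
            # Copy a string or character literal, honoring backslash escapes.
            j = i + 1
            while j < n and src[j] != c and src[j] != "\n":
                j += 2 if src[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(src[i:j])
            i = j
        elif src.startswith("//", i):
            # Line comment: one space, then resume at the newline.
            j = src.find("\n", i)
            out.append(" ")
            i = n if j == -1 else j
        elif src.startswith("/*", i):
            # Block comment: one space plus the newlines it spanned.
            j = src.find("*/", i + 2)
            end = n if j == -1 else j + 2
            out.append(" " + "\n" * src.count("\n", i, end))
            i = end
        else:
            out.append(c)
            i += 1
    return "".join(out)


def tu_hash(src: str) -> str:
    """SHA-256 of the comment-free translation unit text."""
    return hashlib.sha256(strip_comments(src).encode()).hexdigest()


if __name__ == "__main__":
    old = "int x = 1; // old comment\n/* a\n b */ int y;\n"
    new = "int x = 1; // new, longer comment\n/* c\n d */ int y;\n"
    print(tu_hash(old) == tu_hash(new))  # True: only comments changed
```

With a stored hash per translation unit, a build system could skip recompiling when the hash of the new preprocessed output matches the old one.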

Note that the separate preprocess and compile setup is not without challenges. For details, see these GCC mailing list, Clang mailing list, and r/cpp discussions. But I believe we managed to pull it off in build2, with the separate preprocess and compile mode now being the default for the big three. GCC and Clang are pretty solid, with VC and its broken preprocessor being the iffy one, so we will have to wait and see. So far we haven't seen any issues in our builds (which means that the standard libraries and system headers they use are all good). Plus, this mode can be disabled for specific translation units, project directories, and entire projects.

Ok, finally we are getting to the interesting part: the performance. While in build2 extracting dependencies as part of compilation is not an option, it would be interesting to see how it compares. So we went one better: we disabled dependency extraction altogether. The first column is therefore the time it takes to just compile, which is our baseline (100%). We expect it to be the fastest.

We would expect the new mode, which compiles the preprocessed output (saved as a side effect of dependency extraction), to be the second fastest. This is our second column.

The old mode, which extracts dependencies without producing the preprocessed output and then compiles the original source, should be the slowest. This is our third column. Note that we don't have a measurement for VC since with /showIncludes there is no way to extract dependencies without also producing the preprocessed output.

Finally, we can add a column for the "reprocess" mode: save the preprocessed output but force the use of the original source during compilation (this uses the disabling mechanism mentioned above). Compared to the second column, this gives us the cost of preprocessing our sources.

The test was done on a Linux machine with a 4-core/8-thread i7-6820HQ 2.70GHz CPU, 64GB of RAM, and a Samsung 950 PRO NVMe SSD. It involved rebuilding the build2 build system and package manager from scratch without optimization. The numbers (time taken, relative to the first column) are only comparable for the same compiler. Lower is better.

compiler       src      dep+pre    dep+src    dep+pre+src
---------------------------------------------------------
GCC   5.4      100%      99.2%     104.8%     104.0%
Clang 3.8      100%     107.1%     107.9%     112.1%
VC    15u0     100%     122.3%       --       128.5%

A couple of observations: if we have to have a separate dependency extraction step, then compiling the preprocessed output is faster than compiling the source. The cost of preprocessing the source (fourth column minus second) is about 5% of the (non-optimized) build time for all three compilers, ranging from 4.8% for GCC to 6.2% for VC.
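The derived figures above can be checked directly against the table. Here is a small Python sketch (the dictionary layout and function names are just for illustration) that computes the preprocessing cost (fourth column minus second) and, where available, the saving of the new mode over the old one (third column minus second). Both are in percentage points of each compiler's own baseline.

```python
# Relative build times from the table (percent of each compiler's "src" time).
results = {
    "GCC 5.4":   {"src": 100.0, "dep+pre": 99.2,  "dep+src": 104.8, "dep+pre+src": 104.0},
    "Clang 3.8": {"src": 100.0, "dep+pre": 107.1, "dep+src": 107.9, "dep+pre+src": 112.1},
    "VC 15u0":   {"src": 100.0, "dep+pre": 122.3, "dep+src": None,  "dep+pre+src": 128.5},
}


def preprocess_cost(row):
    """Extra time spent re-preprocessing the source: column four minus two."""
    return round(row["dep+pre+src"] - row["dep+pre"], 1)


def new_vs_old_saving(row):
    """Saving of compiling preprocessed output over the old mode, if measured."""
    if row["dep+src"] is None:
        return None
    return round(row["dep+src"] - row["dep+pre"], 1)


for name, row in results.items():
    print(f"{name:10} preprocess cost: {preprocess_cost(row)}  "
          f"new vs old saving: {new_vs_old_saving(row)}")
```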

Also, keep in mind that the first column is not the traditional way of extracting dependencies: it does not extract them at all, does not get their modification times, and so on. The "true" traditional time will be somewhere between the first and second columns.