Separate Preprocess and Compile Performance
Unlike most other build systems, build2 performs an explicit header dependency extraction step (-M, /showIncludes, etc.). The conventional approach is to perform this as part of the compilation itself. It is a clever trick but unfortunately it doesn't work for auto-generated headers. And in build2 we do support auto-generated headers.
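To make the extraction step a bit more concrete: with GCC and Clang, -M prints the headers a translation unit depends on as a make-style rule. Below is a minimal sketch of reading such a rule. The function name parse_make_deps and the sample text are made up for illustration, the code assumes paths without spaces or colons, and it is not meant to reflect how build2 itself parses this output.

```python
def parse_make_deps(text: str) -> tuple[str, list[str]]:
    """Split a make-style rule into its target and prerequisite list.

    Assumption: paths contain neither spaces nor colons, so escaped
    spaces and Windows drive letters are not handled.
    """
    # A backslash followed by a newline continues the rule on the next line.
    joined = text.replace("\\\n", " ")
    target, _, prerequisites = joined.partition(":")
    return target.strip(), prerequisites.split()


# Hypothetical output of a -M run for hello.cxx.
sample = "hello.o: hello.cxx hello.hxx \\\n  /usr/include/stdio.h\n"
target, headers = parse_make_deps(sample)
```

The first prerequisite is normally the source file itself, so a build system would treat the remaining entries as the header dependencies.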
One notable aspect of the dependency extraction step is that it is essentially a full preprocessor run. In fact, for /showIncludes, one has to either preprocess or compile, so if you only need the header dependencies, you have to send the preprocessed output to /dev/null. Which is a waste, of course: why not save the preprocessed output and then compile that?
If we can pull this off, it opens some intriguing possibilities. For starters, we can ignore comment-only changes by hashing the preprocessed output. More importantly, we will be all set for distributed compilation and caching since we now have a self-contained translation unit that we can ship to a remote host. Finally, the build system can analyze the preprocessed output, for example, to extract C++ module dependencies, which in many ways are not unlike header dependencies, except that they exist at the language rather than the preprocessor level.
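The comment-only change idea can be illustrated with a small sketch. In a real build the input would be the compiler's preprocessed output; here a toy regex tokenizer stands in for the preprocessor, which is purely an assumption made to keep the example self-contained (it ignores raw string literals, line continuations, and so on). The name token_hash is hypothetical.

```python
import hashlib
import re

# Toy tokenizer: comments, string and character literals, runs of other
# non-space characters, and a lone slash (for example, division).
_TOKEN = re.compile(
    r"""
    //[^\n]*                 # line comment
  | /\*.*?\*/                # block comment
  | "(?:\\.|[^"\\])*"        # string literal
  | '(?:\\.|[^'\\])*'        # character literal
  | [^\s"'/]+                # run of other non-space characters
  | /                        # lone slash
    """,
    re.VERBOSE | re.DOTALL,
)


def token_hash(source: str) -> str:
    """Hash the token sequence of source, ignoring comments and spacing."""
    digest = hashlib.sha256()
    for match in _TOKEN.finditer(source):
        token = match.group()
        if token.startswith(("//", "/*")):
            continue  # comments do not contribute to the hash
        digest.update(token.encode("utf-8"))
        digest.update(b"\0")  # separator so "a b" and "ab" hash differently
    return digest.hexdigest()


old = "int f () { return 1; } // old comment\n"
new = "int f () { return 1; } /* new, longer comment */\n"
changed = "int f () { return 2; }\n"
```

With these inputs, old and new produce the same hash while changed does not, so a build system comparing hashes could skip recompilation in the first case.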
Note that the separate preprocess and compile setup is not without challenges. For details, see the discussions on the GCC mailing list, the Clang mailing list, and r/cpp. But I believe we managed to pull it off in build2, with the separate preprocess and compile mode now being the default for the big three. GCC and Clang are pretty solid, with VC and its broken preprocessor being the iffy one, so we will have to wait and see. So far we haven't seen any issues on our builds (which means standard libraries and system headers are all good). Plus, this can be disabled for specific translation units, project directories, and entire projects.
Ok, finally we are getting to the interesting part: the performance. While in build2 dependency extraction during compilation is not an option, it would still be interesting to see how it compares. So we went one better: we disabled the dependency extraction altogether. The first column is thus the time it takes to just compile, which is our baseline (100%). We expect it to be the fastest.
We would expect the new mode, which compiles the preprocessed output (saved as the side effect of dependency extraction), to be the second fastest. This is our second column.
The old mode, which extracts dependencies without producing the preprocessed output and then compiles the original source, should be the slowest. This is our third column. Note that we don't have a measurement for VC since there is no way not to produce preprocessed output in this mode.
Finally, we can add a column for the "reprocess" mode: save the preprocessed output but force the use of the original source during compilation (this uses the disabling mechanism mentioned above). Compared to column two this will give us the cost of preprocessing our sources.
The test was done on a Linux machine with a 4-core/8-thread i7-6820HQ 2.70GHz CPU, 64GB of RAM, and a Samsung 950 PRO NVMe SSD. It involved rebuilding from scratch the build2 build system and package manager without optimization. The numbers (time taken) are only comparable for the same compiler. Less is better.
compiler     src     dep+pre   dep+src   dep+pre+src
---------------------------------------------------------
GCC 5.4      100%    99.2%     104.8%    104.0%
Clang 3.8    100%    107.1%    107.9%    112.1%
VC 15u0      100%    122.3%    --        128.5%
A couple of observations. First, if we have to have a separate dependency extraction step, then compiling the preprocessed output is faster than compiling the source. Second, the cost of preprocessing the source (the difference between the last and the second columns) is about 5% of the (non-optimized) build time for all three compilers, ranging from 4.8 points for GCC to 6.2 for VC.
Also, keep in mind that the first column is not the traditional way of extracting dependencies – it does not extract them at all, does not get their modification times, and so on. The "true" traditional time will be somewhere between the first and the second columns.
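For readers who want to check the "about 5%" figure, the arithmetic can be reproduced from the table. The sketch below subtracts the dep+pre column from the dep+pre+src column for each compiler. The dictionary layout and the function name preprocessing_cost are just illustrative choices.

```python
# Relative build times (percent of the "src" column), copied from the table.
# None marks the measurement that is not available for VC.
TIMES = {
    "GCC 5.4":   {"src": 100.0, "dep+pre": 99.2,  "dep+src": 104.8, "dep+pre+src": 104.0},
    "Clang 3.8": {"src": 100.0, "dep+pre": 107.1, "dep+src": 107.9, "dep+pre+src": 112.1},
    "VC 15u0":   {"src": 100.0, "dep+pre": 122.3, "dep+src": None,  "dep+pre+src": 128.5},
}


def preprocessing_cost(row: dict) -> float:
    """Extra time, in points of the baseline, spent preprocessing again."""
    return round(row["dep+pre+src"] - row["dep+pre"], 1)


for compiler, row in TIMES.items():
    print(f"{compiler}: {preprocessing_cost(row)}")
# GCC 5.4: 4.8
# Clang 3.8: 5.0
# VC 15u0: 6.2
```

The same data also confirms the first observation: for both compilers that have a dep+src measurement, the dep+pre time is lower.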