[webkit-dev] Growing tired of long build times? Check out this awesome new way to speed up your build... soon (HINT: It's not buying a new computer)

Konstantin Tokarev annulen at yandex.ru
Tue Aug 29 14:31:34 PDT 2017



29.08.2017, 14:44, "Alicia Boya García" <aboya at igalia.com>:
> On 08/29/2017 06:20 AM, Daniel Bates wrote:
>
>>  Do we know the cause(s) of the slow clean builds? I am assuming that much of the speedup from the "unified source" approach comes from clang being able to use an in-memory #include cache to avoid disk I/O and re-parsing of already-seen headers. Have we exhausted all efforts (or have we given up?) to remove extraneous #includes? Do we think pursuing this effort would be more time consuming or have results that pale in comparison to the "unified source" approach?
>
> Whilst having an in-process-memory #include cache is not a bad thing,
> it's not the greatest gain, as the operating system should already cache
> file reads just fine.

In my experience, Windows is particularly bad at this, even when given
plenty of memory for caching.

>
> The greatest gain comes from reducing the number of times C++ headers
> are parsed. If you are building a certain .cpp file and include a .h
> file, the compiler has to parse it, which can take quite a while because
> C++ is a really complex monster, especially when templates are used.
> Doing this more often than necessary raises build times really quickly.
>
> Header files are almost always include-guarded (either with #pragma once
> or traditional #ifndef guards), so including the same header twice
> within the same .cpp file (or any of its included files) has no cost. On
> the other hand, if you then start building a different .cpp file that
> also includes the same header, you have to parse it again: as far as
> C++ is concerned, every inclusion could add different symbols to the AST
> the compiler is building, so the output can't be reused*. In turn we
> end up parsing most headers many more times than actually needed (i.e.
> for every .cpp file that includes Node.h the compiler has to parse
> Node.h and all its dependencies from source; that's a lot of wasted
> effort!).
>
> *Note that including the same .h twice within the same .cpp file is
> fast not because the output is cached in any way, but because the
> entire .h file is skipped the second time, adding no additional nodes
> to the AST.
>
> The goal of C++ modules is to fix this problem at its root: instead
> of literally including textual header files, .cpp files declare
> dependencies on module files that can be compiled, stored, loaded and
> referenced from .cpp files in any order, so you would only parse the
> Node module source code once for the entire project, whilst the
> compiler could load the AST directly from a cached module object file
> every other time.
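>
> (Roughly how that looks under the Modules TS draft; the syntax is
> experimental and still in flux, and the file names are made up:)
>
>     // Node.cppm: a module interface, compiled once for the project
>     export module Node;
>     export namespace WebCore { class Node { /* ... */ }; }
>
>     // Document.cpp: loads the cached module AST instead of
>     // re-parsing Node.h and all of its dependencies
>     import Node;
>     WebCore::Node* firstChild();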
>
> Note that the great advantage of modules comes from the fact that they
> can be imported in different contexts and their content is still
> semantically equivalent, whereas with plain C/C++ includes every header
> file may act differently depending on the preprocessor variables defined
> by the includer and by previous inclusions. In the worst case, when
> headers are not include-guarded (luckily this is not too common, but it
> still happens), even including the same file twice in the same .cpp
> could add different symbols each time!
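>
> (A contrived sketch of that worst case: an intentionally unguarded
> "X-macro" header; everything here is made up for illustration:)
>
>     // events.def: intentionally NOT include-guarded
>     EVENT(Load)
>     EVENT(Click)
>
>     // Events.cpp: each inclusion expands to different symbols
>     #define EVENT(name) void handle##name();
>     #include "events.def"   // declares handleLoad(), handleClick()
>     #undef EVENT
>     #define EVENT(name) const char* name##Name = #name;
>     #include "events.def"   // defines LoadName, ClickName
>     #undef EVENT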
>
> Unfortunately, C++ modules are a work in progress... There are two
> competing proposals with implementations, one from Clang and another
> from Microsoft, and the C++ technical specification is at a very early
> stage too:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4681.pdf
>
> We know for sure modules are very important for the future of C++, but
> maybe it's still a bit too early to bet a big project like WebKit on them.
>
> So how else can we avoid parsing the same header files so many times and
> speed up our builds? Enter unified builds.
>
> A requirement for unified builds to work correctly is that header files
> are coded in such a way that they work as independent units, much like
> C++ modules, i.e. including headers should work no matter in what order
> you place them, and in each case they must define the same symbols. On
> July 31 I wrote about some issues we currently have because of not
> doing exactly this in WebKit (particularly, our #include "config.h"
> lines are ambiguous). They can be worked around so they will not become
> blockers for unified builds, but I still think we should fix them at
> some point.
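>
> (A made-up sketch of the kind of order dependence that breaks this
> rule; USE_GSTREAMER and the file names are invented for illustration:)
>
>     // VideoElement.h: relies on the includer having pulled in config.h
>     #if USE_GSTREAMER            // silently 0 if config.h wasn't
>     class GStreamerPlayer;       // included first, so this declaration
>     #endif                       // quietly vanishes
>
>     // OK: config.h defines USE_GSTREAMER before the header is parsed
>     #include "config.h"
>     #include "VideoElement.h"
>
>     // Not OK: the same header now defines different symbols,
>     // purely because of inclusion order
>     #include "VideoElement.h"
>     #include "config.h"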
>
> Once you have a set of .cpp files whose includes all (1) are guarded
> (e.g. by #pragma once) and (2) are independent units according to the
> above rule, you can take advantage of unified builds:
>
> Instead of invoking the compiler once for each .cpp file, you create a
> new artificial "unified" or "bundle" .cpp file that concatenates (or
> #include's) a number of different .cpp files. This way, headers included
> within the bundle are parsed only once, even if they are used by
> different individual .cpp files, as long as they are within the same
> bundle. This can often result in a massive build speed gain.
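>
> (A sketch of what such a generated bundle could look like; the file
> name and the grouping are hypothetical:)
>
>     // UnifiedSource1.cpp: generated, not written by hand
>     #include "Node.cpp"
>     #include "Element.cpp"
>     #include "Document.cpp"
>     // Node.h and everything it pulls in is now parsed once for all
>     // three files instead of once per file.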
>
> Unfortunately, there are some pitfalls, as there is a dissonance between
> what the programmer thinks the compilation unit is (the individual .cpp
> file) and the actual compilation unit used by the compiler (the bundle
> .cpp file).
>
> * `static` variables and functions are intended to be scoped to the
> individual .cpp file, but the compiler has no way to know this, so they
> are scoped to the bundle instead. This can lead to the kind of
> non-intuitive name clashes we are trying to avoid (e.g. with
> `namespace FILENAME`; see the sketch after this list).
>
> * Header files that don't work as independent units as they should may
> still happen to work, or may fail in hard-to-diagnose ways.
>
> * Editing a .cpp file that is part of a bundle will trigger
> recompilation of the entire bundle. The more files are grouped per
> bundle, the slower changes to small, independent files become.
>
> * Similarly, editing a .h file depended on by a .cpp file will trigger
> recompilation of the entire bundle, not just the individual file.
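>
> (A sketch of the `static` clash and of the per-file-namespace
> workaround mentioned above; the file and function names are made up:)
>
>     class Node;
>
>     // A.cpp: fine on its own
>     static bool isValid(const Node&) { return true; }
>
>     // B.cpp: also fine on its own
>     static bool isValid(const Node&) { return false; }
>
>     // A bundle that #include's both files now holds both definitions
>     // in one compilation unit: error, redefinition of isValid().
>
>     // Wrapping each file's helpers in a per-file namespace keeps them
>     // distinct within the bundle:
>     namespace A { static bool isValid(const Node&) { return true; } }
>     namespace B { static bool isValid(const Node&) { return false; } }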
>
> It's desirable to bundle .cpp files in a way that minimizes the impact
> of the last two issues, e.g. by bundling by feature: changing a header
> used by all the files implementing that feature then triggers the
> recompilation of that single feature bundle, rather than of many
> scattered bundles each containing a few .cpp files using that feature.
>
> Even with these issues, editing files that many others depend on will
> usually become much faster than before, because although more
> individual .cpp files will be rebuilt, the number of actual compilation
> units (bundles) will be much lower, and so will be the number of times
> header files are re-parsed.
>
> Compared to modules, unified builds are really a dirty hack. Modules
> don't have any of these issues: they are potentially faster and more
> reliable. If only they existed now as a standard rather than as
> experimental implementations with uncertain tooling, we would
> definitely use them.
>
> In the absence of modules, unified builds still provide really good
> speedups for our money.
>
> -- Alicia

-- 
Regards,
Konstantin

