Wednesday, August 20, 2014

Devirtualization in C++, part 6 - asking user for help

In previous posts of the series I described most of the new GCC devirtualization machinery. To cut the story short, GCC implements two kinds of devirtualization: the full and speculative one.

Full devirtualization replaces given polymorphic by a direct call (that can get inlined later).  Speculative devirtualization is a weaker form turning polymorphic call to a conditional testing whether the target is the predicted one and going by direct call path if it happens to be so and doing the indirect call otherwise.

Whole performing speculative devirtualization the compiler can play somehwat unsafe and make assumptions that are not valid 100% of cases. The main assumption made is that there are no new derived types introduced by code not visible to compiler. With link time optimization this is usually true, but not necessarily so - new derived types can be introduced by libraries or plugins. While compiler can generally attempt to track what instances may not be introduced by external code, this analysis is difficult and it is a question whether it can be implemented with reasonable precision within statically optimizing compiler.

The following example shows speculative devirtualization:
struct A {virtual void foo() {}};
void test (struct A *a)
{
  a->foo();
}
 Here GCC 4.9 produces:
test(A*):
        movq    (%rdi), %rax
        movq    (%rax), %rax
        cmpq    $a::foo(), %rax
        jne     .L5
        ret
.L5:
        jmp     *%rax
Instead of:
test(A*):
        .cfi_startproc
        movq    (%rdi), %rax
        movq    (%rax), %rax
        jmp     *%rax
If the target of polymorphic call happens to be a::foo(), the speculatively devirtualized code is significantly faster saving the call overhead (and enabling further optimizations).



As I measured in part 5, the speculative devirtualization can capture significant percentage of calls full devirtualization can not and is correct about its prediction in wast majority of cases.

C++ provides for ways that may be used to inform compiler that it has a complete information on derived types:
  1. declaring a type in an anonymous namespace,
  2. declaring a type in local scope (within function body),
  3. using the final specifier on a type, or,
  4. using the final specifier on a method.
Among these hints the first one is the most powerful. Declaring a type in the anonymous namespace not only ensures that no new derived types will appear in other translation units, but also makes it impossible for code in the other compilation units to access the data stored in a given type. This enables other optimizations, such as changing the memory layout. This is however not implemented yet in GCC.

In this post I will focus on C++11 final specifier and a way how compiler can aid user to annotate the code to improve code quality.

New warnings

The example given can be "improved" either by annotating struct A as final:
struct A final {virtual void foo() {}};
void test (struct A *a)
{
  a->foo();
Or by annotating the method foo as final:
struct A {virtual void foo() final {}};
void test (struct A *a)
{
  a->foo();
}
Both allows full devirtualization. Compiling both examples with GCC 4.9 will lead into an empty function test (that is a result of the full devirtualization and inlining the empty function A::foo):
test(A*):
       ret
For this reason I added two warnings: -Wsuggest-final-types and -Wsuggest-final-methods.

As name suggest, the warnings tells users about types or methods where final specifier would improve code quality. Both warnings are selective and do not buzz about all types with no derived types nor about all virtual methods with no override. They also do not warn in the cases where the compiler can easily determine the optimization itself (thus they are necessarily compiler version and optimization setting sensitive and thus not good candidates for -Werror).
$ gcc -O2 -Wsuggest-final-types -Wsuggest-final-methods t.C
t.C:1:8: warning: Declaring type ‘struct A’ final would enable devirtualization [-Wsuggest-final-types]
 struct A {virtual void foo() {}};
        ^
t.C:1:24: warning: Declaring method ‘virtual void A::foo()’ final would enable devirtualization [-Wsuggest-final-methods]
 struct A {virtual void foo() {}};
                        ^
The implementation on the top of the existing devirtualization machinery is quite straighforward. A (Speculative) devirtualization is implemented by filling in a polymorphic call context for a given call.  This context consists of outer type that is the type of instance the virtual table belongs to, offset within the outer type where the virtual table pointer is located and two flags.  First (maybe_in_construction) specify that type may be in construction and thus virtual functions of base types needs to be considered while the second (maybe_derived_type) specify that the instance actually may be of a type derived from the outer type. In most cases maybe_derived_type is set, because the type is determined from references in the source code.

Once filled in, the polymorphic call context is then turned into a list of methods that may be called by a given polymorphic call. This is done by recursive walk of the type inheritance tree as described in part 4 of the series. The resulting list is either complete or incomplete. Incomplette lists contains all possible targets within the current compilation unit, but the call may lead to virtual functions defined outside. Complete lists contains all possible targets. Incomplete lists are useful for speculation and profile prediction. Complete lists can drive full devirtualization as well as optimization propagating information across the callgraph.

When the new warnings are requested, the speculative devirtualization pass looks for contextes that have maybe_derived_type set, that contain precisely one method but they are incoplete. If such context is found then there are two options:
  1. If outer type has no derivations, -Wsuggest-final-types will output a warning that the type may be declared final
  2. If the method has no override, -Wsuggest-final-methods will suggest declaring it final.

How useful are the warnings?

The main purpose of the final keyword is not to assist code optimization. It is meant to be an syntactic sugar making APIs more robust. Unlike in Java codebases, the use of final keyword in C++ is not very widespread because it was introduced to C++ only recently. There are large code bases that was developed in pre-C++11 era and programmers are not used to the new feature.

I do not expect users to blindly annotate every type or method suggested. Instead I would advise to first try -Wsuggest-final-types and inspect individual types found.  Once suitable ones are annotated -Wsuggest-final-methods can be used to annotate suitable methods: annotating a type final renders final specifiers on its methods pointless.

The warnings can be used with both normal compilation and with link time optimization.  The second makes them a lot more precise: during the link time optimization the compiler has more information about derivations of a given type. Moreover more devirtualization and inlining happens in link time optimization builds.

Prioritizing the importantce

As an experiment I produced warnings from compilation of Firefox's libxul
library.  -Wsuggest-final-types identified 1772 types and -Wsuggest-final-methods 7215 methods (because of a non-linearity in GCC's line information lookup it takes about 5 minutes to just output them out of compiler and it would take ages to inspect all of them by hand!).  It is not a surprise that Firefox developers was unimpressed and asked me whether I can prioritize it somehow.

For this reason I implemented simple accounting into devirtualization pass and made it to output the warnings in the priority order, where priority is given by number of calls that would be fully devirtualized by the annotation either statically or dynamically when profile feedback is available.  The Firefox warnings now looks as follows:
/aux/hubicka/firefox/editor/libeditor/html/nsHTMLEditor.h:66:0: warning: Declaring type ‘struct nsHTMLEditor’ final would enable devirtualization of 575 calls [-Wsuggest-final-types]
 class nsHTMLEditor : public nsPlaintextEditor,
 ^
/aux/hubicka/firefox/docshell/base/nsDocShell.h:124:0: warning: Declaring type ‘struct nsDocShell’ final would enable devirtualization of 216 calls [-Wsuggest-final-types]
 class nsDocShell : public nsDocLoader,
 ^
/aux/hubicka/firefox/intl/icu/source/common/unicode/uniset.h:276:0: warning: Declaring type ‘struct UnicodeSet’ final would enable devirtualization of 146 calls [-Wsuggest-final-types]
 class U_COMMON_API UnicodeSet : public UnicodeFilter {
 ^
../../dist/include/mozilla/Selection.h:48:0: warning: Declaring type ‘struct Selection’ final would enable devirtualization of 131 calls [-Wsuggest-final-types]
/aux/hubicka/firefox/intl/icu/source/common/unicode/unistr.h:245:0: warning: Declaring type ‘struct UnicodeString’ final would enable devirtualization of 115 calls [-Wsuggest-final-types]
 class U_COMMON_API UnicodeString : public Replaceable
 ^
/aux/hubicka/firefox/intl/icu/source/i18n/unicode/decimfmt.h:663:0: warning: Declaring type ‘struct DecimalFormat’ final would enable devirtualization of 92 calls [-Wsuggest-final-types]
 class U_I18N_API DecimalFormat: public NumberFormat {
 ^
/aux/hubicka/firefox/netwerk/protocol/http/nsHttpChannel.h:40:7: warning: Declaring type ‘struct nsHttpChannel’ final would enable devirtualization of 65 calls [-Wsuggest-final-types]
 class nsHttpChannel : public HttpBaseChannel
 ^
...
 Followed by method warnings:
/aux/hubicka/firefox/dom/bindings/CallbackObject.cpp:32:0: warning: Declaring method ‘AddRef’ final would enable devirtualization of 148 calls [-Wsuggest-final-methods]
 NS_IMPL_CYCLE_COLLECTING_ADDREF(CallbackObject)
 ^
/aux/hubicka/firefox/content/media/MediaDecoder.cpp:1440:0: warning: Declaring method ‘GetReentrantMonitor’ final would enable devirtualization of 108 calls [-Wsuggest-final-methods]
 ReentrantMonitor& MediaDecoder::GetReentrantMonitor() {
 ^
/aux/hubicka/firefox/dom/base/nsGlobalWindow.cpp:9294:0: warning: Declaring method ‘GetExistingListenerManager’ final would enable devirtualization of 103 calls [-Wsuggest-final-methods]
 nsGlobalWindow::GetExistingListenerManager() const
 ^
/aux/hubicka/firefox/dom/base/nsGlobalWindow.cpp:9281:0: warning: Declaring method ‘GetOrCreateListenerManager’ final would enable devirtualization of 101 calls [-Wsuggest-final-methods]
 nsGlobalWindow::GetOrCreateListenerManager()
 ^
/aux/hubicka/firefox/dom/bindings/CallbackObject.cpp:33:0: warning: Declaring method ‘Release’ final would enable devirtualization of 76 calls [-Wsuggest-final-methods]
 NS_IMPL_CYCLE_COLLECTING_RELEASE(CallbackObject)
 ^
/aux/hubicka/firefox/intl/icu/source/common/unistr.cpp:398:0: warning: Declaring virtual destructor of ‘struct UnicodeString’ final would enable devirtualization of 68 calls [-Wsuggest-final-methods]
 UnicodeString::~UnicodeString()
 ^
/aux/hubicka/firefox/intl/icu/source/common/uvector.cpp:86:0: warning: Declaring virtual destructor of ‘struct UVector’ final would enable devirtualization of 63 calls [-Wsuggest-final-methods]
 UVector::~UVector() {
 ^
/aux/hubicka/firefox/editor/libeditor/html/nsHTMLEditor.cpp:3189:0: warning: Declaring method ‘DeleteNode’ final would enable devirtualization of 49 calls [-Wsuggest-final-methods]
 nsHTMLEditor::DeleteNode(nsIDOMNode* aNode)
 ^
Overall 9270 polymorphic calls (out of 68162 in libxul binary) can be devirtualized by declaring types final, while declaring top 100 calls will handle 3876 virtual calls.

13952 calls (23% of all polymorphic calls in libxul binary) can be devirtualized by declaring methods final. The distribution is however significantly more flat - declaring first 500 methods would devirtualize only 4976 calls.  As the quoted warning suggest, however many cases arise from macros and template declarations, so there is some hope this can be handled more centrally.

Firefox experiment

Wiill developers follow the warnings? Fortunately I did not need to wait too long to figure out :)

Based on warnings I posted to GCC IRC, Trevor Saunders prepared a patch adding couple 145 finals into Firefox sources and filled in PR1047696. I found interesting to read the reactions of other developers who wonder why the annotation is a good idea. Trevor found that roughly 3600 indirect calls are saved by his patch - not far from my estimate above. Reportedly the patch is on its way to upstream. 

Using profile feedback

Can we do better? Well, with profile feedback we can. I implemented this in this patch. Firefox has profiledbuild mode where the browser train itself on renderring various pages and rebuilds with precise execution counts information on every basic blocks.  With this GCC can prioritize warnings by execution counts rather by static counts. With this, only 10 types are identified as how candidates:
/aux/hubicka/trunk-install/bin/ld: warning: hidden symbol 'pixman_add_triangles' in /aux/hubicka/build-firefox5-inst7-marxin-profile-nolt/toolkit/library/build/../../../gfx/cairo/libpixman/src/pixman-trap.o is referenced by DSO /usr/lib/x86_64-linux-gnu/libcairo.so
/aux/hubicka/firefox-martin/gecko-dev/docshell/base/nsDocShell.h:125:0: warning: Declaring type ‘struct nsDocShell’ final would enable devirtualization of 576 calls executed 8840 times [-Wsuggest-final-types]
 class nsDocShell : public nsDocLoader,
 ^
/aux/hubicka/firefox-martin/gecko-dev/content/base/src/nsScriptLoader.h:38:0: warning: Declaring type ‘struct nsScriptLoader’ final would enable devirtualization of 64 calls executed 608 times [-Wsuggest-final-types]
 class nsScriptLoader : public nsIStreamLoaderObserver
 ^
/aux/hubicka/firefox-martin/gecko-dev/xpcom/base/nsCycleCollector.cpp:2019:7: warning: Declaring type ‘struct GCGraphBuilder’ final would enable devirtualization of 20 calls executed 262 times [-Wsuggest-final-types]
 class GCGraphBuilder : public nsCycleCollectionTraversalCallback,
       ^
/aux/hubicka/firefox-martin/gecko-dev/layout/base/nsCaret.h:23:0: warning: Declaring type ‘struct nsCaret’ final would enable devirtualization of 130 calls executed 228 times [-Wsuggest-final-types]
 class nsCaret : public nsISelectionListener
 ^
/aux/hubicka/firefox-martin/gecko-dev/netwerk/protocol/http/nsHttpConnectionInfo.h:32:7: warning: Declaring type ‘struct nsHttpConnectionInfo’ final would enable devirtualization of 2 calls executed 100 times [-Wsuggest-final-types]
 class nsHttpConnectionInfo
       ^
/aux/hubicka/firefox-martin/gecko-dev/layout/base/../../content/base/src/nsXMLHttpRequest.h:183:0: warning: Declaring type ‘struct nsXMLHttpRequest’ final would enable devirtualization of 124 calls executed 78 times [-Wsuggest-final-types]
 class nsXMLHttpRequest : public nsXHREventTarget,
 ^
/aux/hubicka/firefox-martin/gecko-dev/parser/html/nsHtml5StreamListener.h:31:0: warning: Declaring type ‘struct nsHtml5StreamListener’ final would enable devirtualization of 8 calls executed 70 times [-Wsuggest-final-types]
 class nsHtml5StreamListener : public nsIStreamListener,
 ^
/aux/hubicka/firefox-martin/gecko-dev/xpfe/appshell/src/nsWebShellWindow.h:26:0: warning: Declaring type ‘struct nsWebShellWindow’ final would enable devirtualization of 62 calls executed 12 times [-Wsuggest-final-types]
 class nsWebShellWindow : public nsXULWindow,
 ^
/aux/hubicka/firefox-martin/gecko-dev/dom/base/nsScreen.h:20:0: warning: Declaring type ‘struct nsScreen’ final would enable devirtualization of 16 calls executed 2 times [-Wsuggest-final-types]
 class nsScreen : public mozilla::DOMEventTargetHelper
 ^
and 31 methods with one few executed over 100 times:
/aux/hubicka/firefox-martin/gecko-dev/docshell/base/nsDocShell.h:157:0: warning: Declaring method ‘_ZThn424_N10nsDocShell17GetIsBrowserOrAppEPb’ final would enable devirtualization of 4 calls executed 7878 times [-Wsuggest-final-methods]
     NS_DECL_NSIDOCSHELL
 ^
/aux/hubicka/firefox-martin/gecko-dev/image/src/imgRequestProxy.cpp:93:0: warning: Declaring method ‘AddRef’ final would enable devirtualization of 70 calls executed 4924 times [-Wsuggest-final-methods]
 NS_IMPL_ADDREF(imgRequestProxy)
 ^
/aux/hubicka/firefox-martin/gecko-dev/dom/base/nsGlobalWindow.h:386:0: warning: Declaring method ‘_ZThn32_N14nsGlobalWindow10GetRealTopEPP12nsIDOMWindow’ final would enable devirtualization of 6 calls executed 1590 times [-Wsuggest-final-methods]
   NS_DECL_NSIDOMWINDOW
 ^
/aux/hubicka/firefox-martin/gecko-dev/layout/base/nsPresContext.cpp:332:0: warning: Declaring method ‘Release’ final would enable devirtualization of 302 calls executed 1112 times [-Wsuggest-final-methods]
 NS_IMPL_CYCLE_COLLECTING_RELEASE_WITH_LAST_RELEASE(nsPresContext, LastRelease())
 ^
/aux/hubicka/firefox-martin/gecko-dev/dom/base/nsGlobalWindow.h:386:0: warning: Declaring method ‘_ZThn32_N14nsGlobalWindow13GetRealParentEPP12nsIDOMWindow’ final would enable devirtualization of 8 calls executed 826 times [-Wsuggest-final-methods]
   NS_DECL_NSIDOMWINDOW
 ^
/aux/hubicka/firefox-martin/gecko-dev/dom/bindings/CallbackObject.cpp:33:0: warning: Declaring method ‘Release’ final would enable devirtualization of 3262 calls executed 670 times [-Wsuggest-final-methods]
 NS_IMPL_CYCLE_COLLECTING_RELEASE(CallbackObject)
 ^
/aux/hubicka/firefox-martin/gecko-dev/content/base/src/nsScriptLoader.cpp:179:0: warning: Declaring method ‘AddRef’ final would enable devirtualization of 14 calls executed 608 times [-Wsuggest-final-methods]
 NS_IMPL_ISUPPORTS(nsScriptLoader, nsIStreamLoaderObserver)
 ^
The execution counts are actually surprisingly small. Partly because developers still tends to try to keep polymorphic calls out of the hot paths and partly because the train run is actually not very computationally intensive. I would be very interested if the benefits can be measured in some practical benchmarks.

Next time I plan to write on determing dynamic types of heap allocated memory.

4 comments:

  1. -Wsuggest-final-types and -Wsuggest-final-methods will be available first in GCC 4.10?

    ReplyDelete
    Replies
    1. Next GCC version will be 5.0. But yes, it will be available there.

      Delete
  2. Replies
    1. The release candidate will be hopfully out in a week or two. Once the current mainline meets the release criteria (no P1 regressions and less than 100 regressions) https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=SUSPENDED&bug_status=WAITING&bug_status=REOPENED&bugidtype=include&known_to_fail_type=allwordssubstr&known_to_work_type=allwordssubstr&list_id=114632&priority=P1&priority=P2&priority=P3&query_format=advanced&short_desc=\[%28[%200-9.%2F]*[%20%2F]%29*5[%20%2F][%200-9.%2F]*[Rr]egression%20*\]&short_desc_type=regexp&target_milestone=4.8.5&target_milestone=4.9.3&target_milestone=5.0

      Delete