Sunday, February 9, 2014

Devirtualization in C++, part 3 (building the type inheritance graph)

Having covered the basic optimizations done by the C++ frontend and the ways generic middle-end optimizations may devirtualize, it is time to dive into the wonderful world of special-purpose middle-end devirtualization techniques.

The topic of optimizing polymorphic calls is seemingly well covered by the literature, with active research since the advent of object-oriented programming (see, for example, DF Bacon, PF Sweeney: Fast static analysis of C++ virtual function calls and the numerous papers citing it). Pretty much all of the techniques are based on constructing and analyzing the type inheritance graph.

Unfortunately most of the research papers make assumptions that are not met in modern C++ programs. Here the situation is complicated by many C++ features, including interesting implementation details of multiple and virtual inheritance partly described in this post, and by the ability to make dynamic type changes. Important is also the fact that the compiler hardly ever sees the whole program at once. Even with link-time optimization one has to take into account the presence of shared libraries. Because shared libraries are supposed to be upgradeable without recompiling the programs that use them, the compiler cannot make any analysis beyond what it is given in the header files, and as explained last time, sometimes not even that. Moreover, shared libraries are also often dynamically loaded as plugins. Contemporary C++ programs are really a lot more dynamic than most of the research assumes.

Today I thus cover the seemingly simple topic of building and walking the type inheritance graph and show the details one has to think about. It may seem exhausting, but so was my way to gaining all this wisdom. :) I promise that next time I finally get to some real optimization fun.

 

Type inheritance graph

The type inheritance graph (or class hierarchy graph, or class hierarchy diagram) is a directed acyclic graph where vertices are types (classes) and oriented edges represent the direct bases of a type.

This is a common educational and source-documentation tool for object-oriented programming and there are many tools to visualize it (for example doxygen), so it probably does not need more introduction.

Existing middle-end implementations

The type inheritance graph is a new concept in GCC's middle-end. While there was an implementation submitted in 2005 (see the 2006 GCC Summit paper Mircea Namolaru: Devirtualization in GCC), it was not quite production ready. It was used for analysis of polymorphic calls (based on Bacon's and Sweeney's research), making very strong whole-program assumptions a long time before link-time optimization was implemented. For those reasons it never made it to the mainline.

Last July I implemented a new type inheritance graph builder (in trunk/gcc/ipa-devirt.c), mostly motivated by Firefox developers complaining about the lack of devirtualization in GCC. GCC 4.9 will thus be the first release of GCC doing optimizations based on type inheritance analysis. I implemented only the basic analysis and transformations so far, but even those seem to perform pretty well. This will be the topic of the next post. Once the implementation gets production ready with the release of GCC 4.9, I plan to add new features.

Open64 has contained type inheritance graph construction and analysis since 2006 (implemented in osprey/ipa/main/analyze/ipa_chg.cxx and osprey/ipa/main/analyze/ipa_devirtual.cxx), including quite fancy virtual call optimization logic.

LLVM does not have a middle-end concept of type inheritance yet. People at RAS implemented devirtualization based on similar research papers as Mircea's GCC patch. Their work is not in the mainline yet.

Clang provides a way to visualize the graph via GraphViz in InheritViz.cpp, but this information is not explicitly passed down nor used for any non-trivial optimizations within the front-end. The RAS implementation extended the IR to pass the information, though I think it can also be partly reconstructed from the debug info.

Building the type inheritance graph in GCC

The main challenge of obtaining the type inheritance graph in a language-independent middle-end is designing a reasonable way to pass the information down from the language front-end without polluting the middle-end with too many language-specific notions.

GCC has an elaborate representation of types that stores virtually all the information you would ever want to know about them (and also a lot of information you do not want, causing memory usage and maintainability issues). The way the middle-end is informed about the bases of a given type is a structure called BINFO. Records and unions produced by the C++, Objective-C++ and Java frontends attach a BINFO to types. Traditionally the main consumer of BINFOs is the debug info output machinery.

BINFOs can be used to obtain the type inheritance graph, but their representation does not allow the walks and analysis I am interested in: all links are unidirectional, from derived type to bases. For that purpose I implemented my own data structure that translates BINFOs into an explicitly represented type inheritance graph.

What information available in BINFO is relevant to me?
  1. VTABLE points to the virtual table. It is an expression of the form &vtable+offset. The reason for the offsets will be explained in greater detail later.
  2. TYPE links back to the type the BINFO corresponds to.
  3. BASE_BINFOS provides a list of all direct bases. The bases are not types, but other BINFOs. These "duplicated" BINFOs actually hold additional information, such as a pointer to the base's virtual table, which differs from the base type's virtual table.

    This means that one type may have many associated BINFOs, depending on how many times it was derived. One has to keep in mind whether one wants the base's BINFO (to look up, for example, an additional virtual table used by the type being examined) or the base type's BINFO that links to the virtual table of the base type.
  4. OFFSET is non-zero only for a BINFO representing a base, and it specifies where the base's data are located within the memory layout of an instance of the type in question. For single inheritance this is always 0.
Turning this information into a directed acyclic graph is a simple exercise. The only problem is where to start. GCC does not really have a central table of the types used by the current compilation unit. It lists all top-level type declarations seen during parsing (for debug info purposes), but building the graph from this would be expensive, since most of the types actually come from headers, are not reached by the current unit, and get optimized out. Also, types declared within other types or in function bodies would be missed.

Instead, my type inheritance graph builder walks the symbol table and looks for defined virtual methods. For every virtual method found, it takes its type and uses it to populate the graph. This leads to a pruned type inheritance graph. I do not need the graph to contain every type defined in the source code; it needs to track only those types that have reachable virtual methods associated with them. When the program itself deals with a derived type that does not contain a virtual method, I add it into the graph later just to make walking possible. Those nodes serve no other purpose.

This idea turned out to be wrong in the case of virtual inheritance; here virtual methods come in two flavours (the second is a thunk, described later) and the type itself uses the normal functions, while derived types use the thunk. It may happen that the derived type does not define any virtual method of its own. For this I had to extend the builder to also consider the types of virtual tables in the unit.

Going inter-module

From a single compilation unit in isolation it is often impossible to derive the information we want. We do not know all derivations of a given type and thus we cannot list all methods overriding a given virtual method. The main motivation for implementing middle-end analysis of type inheritance is to make devirtualization happen for the whole program as part of link-time optimization.

Inter-module type inheritance graph construction brings an interesting problem: how to decide whether two types from different compilation units are the same?

This problem is in general harder than it seems. Most language standards do not really cover what happens with types between modules. For example, in C, two sources like this:
  1. a.c:
    typedef int foo;
    foo *ptr;
    void foo_write (int a)
    { *ptr = a; }
  2. b.c:
    typedef int bar;
    extern bar *ptr;
    int bar_read (void)
    { return *ptr; }

can be linked into a valid program, and foo_write and bar_read have to cooperate even though ptr has type foo in one compilation unit and bar in the other. This differs from a single compilation unit, where writing one memory location via a pointer of type foo * and reading it via bar * would be undefined by the ANSI C aliasing rules.

Things get a lot sloppier when multiple languages are linked together. No one really formally defines what happens when you share the same data between C, Ada and Fortran. This is an engineering challenge the developers of every compiler have to face when going inter-module.

For this reason, at link time the GCC (and LLVM) optimizers really make no difference between type names and work on the memory layout only. Doing so while building the type inheritance graph would be disastrous, because different classes with the same memory layout would get unified even if their virtual tables are not the same at all. The directed acyclic graph would become a directed graph with cycles, wrecking any attempt to analyze it.

The GCC link-time optimizer has a second merging algorithm that is used, for example, for the debug info. Here two types are merged only if they are really equivalent in all respects, including the type names. Even this equivalence does not fit the bill. Consider this example:
  1. c.C:
    struct A;
    struct B {struct A *a;};
  2. d.C:
    struct A {int a;};
    struct B {struct A *a;};  
Here the types of B are obviously the same, but the fact is not so obvious to the compiler. In c.C, structure A is not a complete type, and thus its in-memory representation differs from the one in unit d.C. Consequently, the representation of B is considered different, too. This is a desirable property for C, where in a complete C program one actually may have two different definitions of struct A (for example, coming from two independently developed libraries) and struct B is the same only by accident. Unifying the types would confuse the debug info, where exploring values stored in struct A via a field of struct B would give wrong results.

Using this weak type equivalence would again lead to a wrong type inheritance graph. In some cases types would end up being duplicated, with their derived types attached randomly to one of the copies, and thus the walks would again be incomplete.

Fortunately, the C++ standard actually speaks about inter-module type equivalence and introduces the One Definition Rule. As Wikipedia puts it:
  1. In any translation unit, a template, type, function, or object can have no more than one definition. Some of these can have any number of declarations. A definition provides an instance.
  2. In the entire program, an object or non-inline function cannot have more than one definition; if an object or function is used, it must have exactly one definition. You can declare an object or function that is never used, in which case you don't have to provide a definition. In no event can there be more than one definition.
  3. Some things, like types, templates, and extern inline functions, can be defined in more than one translation unit. For a given entity, each definition must be the same. Non-extern objects and functions in different translation units are different entities, even if their names and types are the same.
By the one definition rule, the a.c and b.c I gave above cannot coexist in a single C++ program. This is an important difference between C and C++.

I therefore implemented one-definition-rule based type merging (odr_type). My first attempt actually compared type names and, recursively, their contexts. This failed miserably because of templates: one template may have multiple different instantiations, and in that case one has to compare template arguments, too. Doing so is not a trivial task, and it is something that does not belong in the language-independent part of the compiler.

Instead of duplicating all this logic, it is possible to use C++ name mangling. For polymorphic types one can compare the mangled names of their associated virtual tables (and this is what GCC does now). For non-polymorphic types I currently have no use for this kind of equivalence, but for GCC 4.10 I want to use mangled type names for this purpose. (Currently these are not computed and I will need to work out how expensive it is to store them.) This equivalence is useful beyond the devirtualization machinery. Another important consumer can be type-based alias analysis.

Assigning virtual methods to types

The main use of the type inheritance graph in the code optimizer is to get reasonable answers about the possible destinations of a polymorphic call. For that, one needs to associate virtual methods with the types in the type inheritance graph.

High-level tools assign methods to types by recording all methods declared for a given type. For an instance of known type it is then usually easy to see what method is called by looking up the type inheritance graph and obtaining the first method of a given name. This however may lead to incorrect results, not least because C++ allows two methods with the same name to be defined, and the decision which one will be used is non-trivial.

The GCC implementation is more low-level. I do not use method names, but a token that is an index into the virtual table, which is used to perform the virtual call in the low-level representation. Based on the knowledge of the type of the instance, I look up its BINFO, and via that I get into the corresponding virtual table to obtain the target.

This shifts the problem to associating virtual tables with a given type and a given polymorphic call. In a naive implementation this is easy: every polymorphic type would have a virtual table pointer, and every one of its base classes would have a virtual table pointer, too.

A naive implementation would be wasteful, and thus C++ implementations do quite a lot of additional magic.

Consider the single inheritance case where struct A is a base of struct B. There is no need for two virtual table pointers; instead, the memory layout of the virtual table of B can start with the virtual table of A, followed by entries for the virtual functions introduced in B. The C++ Application Binary Interface makes this sharing of virtual tables mandatory. For example:
struct A
{
  virtual void foo(void);
};
struct B: A
{
  virtual void foo(void);
  virtual void bar(void);
};
This will lead to two virtual tables. The virtual table of A will consist of
{0, typeinfo_of_a, &A::foo}
That is, the offset to top and the RTTI type info, followed by an array of pointers to virtual methods. See part 1 for details. The virtual table of B will consist of
{0, typeinfo_of_b, &B::foo, &B::bar}.
Instances of B will have one virtual table pointer, pointing to the first virtual function in the table. The data stored in instances are organized in a similar way: the virtual table pointer comes first, followed by the data of A (so that the initial segment of the memory layout can be considered an instance of A), and the additional data of B come next.

Multiple inheritance

In the case of multiple inheritance the first base is special (called the primary base) and is handled as in the case of single inheritance. (This is oversimplified; the rules about what base comes first involve several optimizations to improve the layout, see the C++ ABI.) The other bases follow and have their own virtual table pointers. The virtual table is now really a sequence of virtual tables, and the additional bases have virtual table pointers pointing into the middle of it. Again, the memory layout of instance data is similar: things come in DFS postorder. A nice rationale for the design appears in Bjarne Stroustrup: Multiple Inheritance for C++ (1999).

Consider:
struct A
{
  virtual void foo(void);
};
 
struct B
{
  public:
  virtual void bar(void);
};
struct C: A,B
{
  virtual void foo(void);
  virtual void barbar(void);
};
This will give a virtual table of A the same as in the previous case. B's virtual table is also obvious. C's virtual table will be:
{ 0, typeinfo_for_C, C::foo, C::barbar,
 -8, typeinfo_for_C, B::bar}
That is technically two virtual tables in sequence. One is shared by A and C, while the second is for the B contained within C. Notice the offset to top of -8, specifying how to get from B to the whole instance. The memory layout of C will have two virtual table pointers: one pointing to C::foo within the virtual table (used for A and C), and the other to B::bar (used for B).

This is all explicitly represented in GCC's BINFO structures. The BINFO representing C will have the virtual table pointer as described, the BINFO representing base A in C will have no virtual table pointer of its own (because it is shared with the outer type), and the BINFO representing B will have a pointer into the virtual table with offset 16. While walking down the type hierarchy one thus has to keep track of the stack of BINFOs defining the virtual table, and if one wants to associate a virtual table with a BINFO without any, one has to look in the stack and find one with an equivalent offset.

And now for something completely different

Things go really crazy in the case of virtual inheritance. First, the base class data are no longer at a fixed offset within the instance. For this reason, the virtual tables need to be extended to contain an additional offset that can be used to access the data. Next, it is really challenging to get the layout of the virtual tables right, and to get construction and destruction right.

Let's consider an example:
struct A
{
  virtual void foo(void);
};
struct B: virtual A
{
  virtual void foo(void);
};
struct C: virtual A
{
  virtual void bar(void);
};
struct D: B,C
{
  virtual void foo(void);
  virtual void barbar(void);
};
And let's again look into the virtual tables and memory layout.

The virtual table and memory layout of A are easy.

The memory layout of B is the same as in the case of single inheritance: there is one virtual table pointer, followed by the data of A and the data of B.

The virtual table of B (and analogously of C) gets two extra offsets per virtual base, added to the left of the virtual table pointer (just after the typeinfo and offset to top):
  1. Virtual call offset. This offset makes it possible to override a virtual method of a virtual base. Such a method gets called with the this pointer pointing to the virtual base, and it needs to use the offset to get to the pointer to the whole object.
  2. Virtual base offset. This offset says how to get from the pointer to the object to the pointer to the virtual base.
So the virtual table of B looks like this:
{0, 0, 0, typeinfo_of_B, B::foo}
The new offsets are both 0 because nothing special is going on here; the memory layout of the object is exactly as it would be in the case of single inheritance.

Virtual table for D is:
{ 0,  0,  0, typeinfo_of_D, D::foo, D::barbar,
 -8, -8, -8, typeinfo_of_D}
In addition to that there is a so-called construction virtual table for B in D:
{0, 0, 0, typeinfo_of_B, B::foo}
This table just happens to be identical to the virtual table of B, because the offsets of the virtual base A are the same in a standalone B and in the B inside D. GCC will completely pointlessly produce the duplicate though; something Martin Liška will hopefully address with a code and data unification pass in GCC 4.10.

There is also a construction virtual table for C in D:
{-8, -8, 0, typeinfo_of_C, A::foo, C::bar, 0, 8, typeinfo_of_C, A::foo}
And finally there is a table called the virtual table table:
{&vtable_for_D[4],
 &construction_vtable_for_B_in_D[4],
 &construction_vtable_for_B_in_D[4],
 &construction_vtable_for_C_in_D[4],
 &construction_vtable_for_C_in_D[9],
 &vtable_for_D[4],
 &vtable_for_D[10] }
OK, why do we need so many tables? They come into play during construction and destruction. After a base is constructed but before the object itself is, the virtual table pointers need to be set up so that a virtual call to one of the base's virtual methods works and lands in the base's method, not in an overrider whose instance is not constructed yet. The constructor of D looks like this:
D::D() (struct D * const this)
{
  struct A * a;
  struct C * c;

  a = &this->vptr;
  A::A (a);
  B::B (a, &virtual_table_table_for_d[1]);
  c = &this->c.vptr;
  C::C (c, &virtual_table_table_for_d[3]);
  this_1(D)->b.a._vptr.A = &virtual_table_for_d[4];
  this_1(D)->c.a._vptr.A = &virtual_table_for_d[10];
  return;
}
One thing to observe is that the virtual table pointer this->b.a.vptr is actually shared by D, B and the virtual base A. There is another virtual table pointer (this->c.vptr) for C. The construction tables are passed via the virtual table table into the constructors of B and C, which now exist in alternative versions taking the virtual table table pointer as a parameter.
B::B() (struct B * const this, const void * * __vtt_parm)
{
  struct A *a;

  this->vptr = __vtt_parm[0];
  a = (char *)this + __vtt_parm[0][-4];
  a->vptr = __vtt_parm[1];
  return;
}
The constructor first initializes B's virtual table pointer to the virtual table passed via the virtual table table at index 0. Then it uses that virtual table to look up where A is really located and initializes A's virtual table pointer to the virtual table passed via the virtual table table at index 1. It does not construct A, assuming this has already been done by the outer constructor.

If you look into how that works in the context of the constructor of D, you will see that B writes both virtual table pointers into the same location, and both times the value is the construction virtual table of B in D.

Now the constructor of C works analogously, but it is given two different virtual tables. It will write the construction virtual table for C in D to the virtual pointer of C within D, and then it will look up exactly the same copy of A and rewrite its virtual table pointer with the second part of the construction virtual table for C in D.

Now B and C are constructed. It remains to write proper pointers to the virtual table of D into both virtual table pointers contained in the instance of D.

With optimization enabled, the constructors will get inlined and compiled into two memory stores initializing the virtual table pointers to their final values. The optimizer however needs to do quite some legwork to actually get there.

Easy! Want something more complicated?

Thunks

A thunk is something that people hardly know about (and hardly want to know about), yet it makes C++ programs work.

All methods take a hidden parameter pointing to the instance of the object (the this pointer). With only single inheritance involved, the situation is easy: the this pointer is the same for the instance's type as it is for its base class.

The situation changes with multiple inheritance. In the following example:
struct A
 {
   int a;
   virtual int bar(void) {return a;}
 };
struct B
 {
   virtual int foo(void) {return b;}
   int b;
 };
struct C: A,B
 {
   virtual int foo(void) {return a;}
 };

struct C c, *d=&c;
int test(void)
{
  struct B *b=d;
  return d->C::foo()+d->foo()+b->foo();
}
all three calls lead to C::foo, but each in a different way. The first one is a direct call; the this pointer passed to the call is d, which points to c. The second is a polymorphic call; here the lookup happens from the virtual table that is at the beginning of the memory layout of c, and the this pointer again points to c. In the last case, however, the assignment b=d implicitly offsets the pointer to the beginning of the base B within C. The lookup thus happens from the second virtual table pointer in c, and foo is called with a this pointer equivalent to b instead of d. How can C::foo work when it may get called in two incompatible ways?

For this reason the second virtual table does not contain a pointer to C::foo, but a pointer to a so-called non-virtual thunk, which in x86-64 assembly looks as follows:
_ZThn16_N1C3fooEv:
.LFB15:
        subq    $16, %rdi
        jmp     _ZN1C3fooEv
This is a small wrapper function that offsets the this pointer back and makes the program work.

Thunks again come in several flavours: virtual thunks make the offset based on a virtual table lookup to make virtual inheritance work, while covariant thunks do the same compensation on the return value when the method returns a pointer to a class (such as this).

When using the results of the type inheritance graph to turn a polymorphic call into a direct call, one cannot afford to mix up a method and its thunk. This is the main reason why one really wants to use the low-level representation via virtual tables instead of associating methods by their names. One also has to keep in mind that instances may be in a construction or destruction phase, when their virtual tables are temporarily changed.

Walking the type inheritance graph

Now we are set to collect the possible targets of a polymorphic call.

The trivial case

In the simplest case:
class A *a;
...
a->foo()
One first needs to figure out the token of foo in A's virtual table. In GIMPLE the token is associated with the call via an explicit wrapper called OBJ_TYPE_REF. Then one walks the type inheritance graph down recursively, starting from A. For every type one recursively walks all its bases and looks for bases of type A. For each of those one can inspect the virtual table and see if there is an override.

Using knowledge about the type containing the instance

GCC does a more elaborate analysis of call sites. In the following case:
class B *b;
...
class A *a=b;...
a->foo()
one can easily work out that the instance of a is actually a base of the instance of b at a known offset (the implicit cast a=b offsets the pointer in the case of multiple inheritance). When listing possible targets of the call, one does not need to walk all derived types of A; instead one can start the walk at B: recursively walk its derived types and look for bases of type B within them. Within those copies of B one then has to look for the base of type A that appears at the given offset.

Considering types in construction and destruction

The logic taking the outer type into account will break if the sequence above is found within a constructor or destructor. If *b is in construction, the virtual table pointer is temporarily set to the virtual table of one of its bases. For this case one needs to also add into the list the virtual methods of all corresponding instances of A in the bases of B.

Combining it together

In GCC I implemented a structure polymorphic_call_context that contains
  1. an outer_type; this is the type where the walk starts. If the program analysis determined that the virtual call is contained in a bigger object, this is the type of the bigger object; in our example it is B,
  2. an offset in bits specifying where the class whose method is called is located within the outer_type,
  3. a flag maybe_in_construction specifying whether we want to include the base types, and
  4. a flag maybe_derived_type specifying whether we want to include derivations.

These queries seem to describe all the problems we need to solve on the type inheritance graph for now. The precise algorithm is as follows:

Every polymorphic call has a given type (that is, the type of the pointer that was used to produce the call) and a token (an index into the virtual table). This is explicitly represented in GIMPLE via the OBJ_TYPE_REF wrapper. We determine the polymorphic_call_context by analyzing the function body. The lookup of possible targets is performed in several steps:
  1. If outer_type is specified and it is a structure or class, its fields are walked recursively into their types until there is no field that would overlap with the position specified by offset. This gives the class whose method is called.
    For example when outer_type is:
    struct A {struct B b; struct C c;}
    Polymorphic call analysis really only cares about the virtual methods of B or C, because A has none.

    After this, outer_type and offset are updated to refer to this type. If the update happened, the maybe_derived_type flag can be cleared, since structure fields cannot contain derived types.

    (see get_class_context in  gcc/ipa-devirt.c)
  2. If maybe_in_construction is set, then all bases of outer_type that in the memory layout overlap with position offset are walked recursively, and for each of them the virtual table lookup is performed in the base type's virtual tables (as opposed to the base's virtual tables, which are different) and the method specified by token is added to the list of possible targets.

    Because I care only about possible destinations of the call, I do not really need to walk the construction virtual tables. When a type is derived, no new thunks of the base's methods are produced, so they will all be present in the virtual tables of the bases.

    (see record_targets_from_bases in  gcc/ipa-devirt.c)
  3. The method specified by token is looked up in outer_type's virtual table.
  4. If maybe_derived_type is set, the derived types of outer_type are walked, and for each appearance of outer_type as a base (it may appear many times in every derivation) the lookup is performed and the method is added to the list.

    (see possible_polymorphic_call_targets_1 in  gcc/ipa-devirt.c)
In all steps the virtual table lookup actually takes into account the offset specifying the position of the table in the memory layout of the instance, and further dives into the bases to find the first base with a virtual table at the given position.

To make things go faster, the answers are cached, so if the program contains many similar virtual calls, we do not keep reconstructing the list.

GCC 4.9+ can dump the results of polymorphic call analysis with -fdump-ipa-devirt. Analyzing the example:
struct A
 {
   int a;
   virtual int bar(void) {return a;}
 };
struct B
 {
   virtual int foo(void) {return b;}
   int b;
 };
struct C: A,B
 {
   virtual int foo(void) {return a;}
 };

struct C c, *d=&c;
int test(void)
{
  struct B *b=d;
  return d->foo()+b->foo();
}
One gets:
  Targets of polymorphic call of type: struct C token 1
    Contained in type:struct C at offset 0
    This is partial list;
    extra targets may be defined in other units.
    (derived types included)
       virtual int C::foo()
  Targets of polymorphic call of type: struct B token 0
    Contained in type:struct B at offset 0
    This is partial list;
    extra targets may be defined in other units.
    (derived types included)
       virtual int B::foo()
       virtual int C::_ZThn16_N1C3fooEv()
The first call is correctly analyzed to land in C::foo() unless some other compilation unit derives the type.

The second call has two alternatives: B::foo() and the aforementioned non-virtual thunk to C::foo(). Here GCC makes an overly weak assumption, allowing for the possibility that b does not point to the instance d points to. This is because in C++ one cannot rely on the fact that pointer d does not hold a pointer to a completely different type. The program becomes invalid only when d is dereferenced, which happens on the first use of d->C::foo(), but GCC does not understand this yet.

What happens when we make the instance explicit?
struct C c;
int test(void)
{
  struct C *d=&c;
  struct B *b=d;
  return d->foo()+b->foo();
}

here the analysis of the second call improves to:

  Targets of polymorphic call of type: struct B token 0
    Contained in type:struct C at offset 128
    This is full list. (base types included)
       virtual int C::_ZThn16_N1C3fooEv()
       virtual int B::foo()

B is now correctly contained in C, but why is B::foo() still on the list? Well, because GCC thinks that the instance of C may be in construction or destruction. Again, this is something that is not valid by the C++ standard (one cannot use a pointer to an instance before it is fully built), but it is lost in the translation from C++ to the intermediate language.

Are my answers complete?

Most papers about polymorphic call optimization assume that the complete type inheritance graph of the whole program is known. This is definitely not met in practice. C++ allows deriving a type defined in one compilation unit in another compilation unit and introducing new overrides without making this explicit in either unit. This makes C++ type inheritance hard to analyze, since one needs to assume that new derived types can be added to the type inheritance graph during linking, dynamic linking, or via dlopen.

C++ now however provides a way to explicitly declare that a type will not be derived in another unit: the anonymous namespace. C++11 also introduces the final keyword, which can be used to annotate classes and methods. In the first case it forbids any derivation; in the second it prevents any override in a derived class.

Since GCC 4.9 these are used to make completeness assumptions about the list of possible targets. Sadly, anonymous namespaces and the final keyword are not very widely used. Let's hope this changes in the longer run. I also plan to add a warning to help users annotate their sources.

Now we are set, and next time: some real optimization fun!

3 comments:

  1. Sir, I don't know how much time you've spent studying compilers and C++ but this article is amazing!.
    I envy the depth of your knowledge. I only aspire to be a decent C++ developer someday

    1. Thank you! I would not really call myself an advanced C++ developer, but by working on the compiler one has to learn the low-level part very well. One of the motivations for writing this series is to get things organized. I am happy you appreciate it!

  2. if your aim is to get organised, I would appreciate if you are not at all at the moment and need to spend more efforts to become. very well written and enlightening indeed. a breeze to have a guided glimpse under the hoods, masking out all the things that are gratuitous at this specific moment.
