Thursday, September 18, 2014

Devirtualization in C++, part 7 (Enforcing One Definition Rule)

This time I am going to write about a feature that I implemented as a side effect of the devirtualization machinery: the verification of One Definition Rule on types.

One Definition Rule (ODR) is one of more interesting cases where C and C++ differs in very subtle way. Wikipedia describes it as follows:
In short, the ODR states that:
  1. In any translation unit, a template, type, function, or object can have no more than one definition. Some of these can have any number of declarations. A definition provides an instance.
  2. In the entire program, an object or non-inline function cannot have more than one definition; if an object or function is used, it must have exactly one definition. You can declare an object or function that is never used, in which case you don't have to provide a definition. In no event can there be more than one definition.
  3. Some things, like types, templates, and extern inline functions, can be defined in more than one translation unit. For a given entity, each definition must be the same. Non-extern objects and functions in different translation units are different entities, even if their names and types are the same.
Some violations of the ODR must be diagnosed by the compiler. Other violations, particularly those that span translation units, are not required to be diagnosed.[1]
This means that C++ type names are global and should not be re-used in different ways in different source files. (In C doing so is perfectly valid and common, types across units are considered equivalent if their structure is.)

One of motivations for ODR is to make name mangling possible. Type names are used, for example, as program wide identifiers to distinguish functions and methods of same name but taking different types of arguments (function overloading):
struct A {int a;};
int getval (struct A a) {return a.a;}
struct B {char a;};
int getval (struct B a) { return a.a;}
This compilation unit will result in exporting two function symbols:  _Z6getval1A and _Z6getval1B. A and B comes from name of the argument's type.

Now, obviously, if another unit defines completely different structure A and completely different getval taking it as argument, the function will also be called _Z6getval1A and things will fail miserably. General violations of ODR can not be diagnosed by compiler working on single translation unit in isolation. If one is lucky, ODR violation turns into linker error about duplicated symbol name. With presence of inline functions however this will not happen and one of the two functions will be eliminated from program. This may result in invocation of a function on wrong data crashing program in random ways.

Tuesday, September 9, 2014

Linktime optimization in GCC, part 3 - LibreOffice

After Firefox, I decided to look into LibreOffice performance with link time optimizations (LTO) and profile feedback (FDO). This post was in works since April, so my setup is not exactly bleeding edge - GCC 4.9.1 prerelease, LLVM 3.5 and LibreOffice checkout from April. Some real world issues (like a wedding) interrupted my work, however the data presented should be still relevant and interesting.