One of the most useful and interesting features in C++ is virtual function. With the feature we are allowed to postpone determining the behavior of an object till run time. This gives us great flexibilities as we now are able to think of an object in terms of what category of types it belongs to, and the virtual functions will be evaluated respecting the object’s actual type while referring to it by category.
Under the hood C++ implement this by constructing a data structure called ‘virtual table’ at compile time, and adds an pointer called ‘virtual table pointer’ (aka. vptr) to each instances of the class. For each virtual function call, at run time we will first need to find proper virtual function table through vptr, and query the table with an offset which is filled by compiler. A call then can be made after the entry is found in the virtual table.
So what do we have to pay for this magic?
- a vptr in each instance – class size gets bigger.
- vptr need to be initialized in ctor of the class – this requires extra time.
- virtual function call is indirect – a query to vtable has to be made before the call.
Now let’s see how expensive the pay is. As long as class size is not a major problem in a modern computer, time is all what we concern about. In this post we are going to take a look at the second bullet, that is, to see how virtual function impacts performance of ctor/dtor of a class. We will discuss about the performance impact of indirect function call in next post and see if we have any alternative approach to improve it.
First of all let’s make a poor man’s profiler, which looks like below. We are going to use the class to do couple of experiments. I am using Visual C++ 9.0 SP1 to do the test. All code is compiled in release mode but have optimization turned off because the compiler optimization will screw our analysis result, and analyze the effect itself is another huge story.
class Timer
{
public:
Timer() : m_start(clock()) {}
~Timer()
{
printf("duration: %d\n", clock() - m_start);
}
private:
clock_t m_start;
};
My testing environment is 32bit Vista with SP1 on Intel Core(TM)2 Quad CPU 2.4GHz, 4.00GB RAM
Performance Impact: Constructor and Destructor
In order to test the performance of constructor and destructor, we need to implement a small helper function like below. The function basically just create and destroy instances of a certain type of class for many times to make the time difference obvious.
template <typename t>
void test_ctor_dtor(unsigned int N)
{
Timer timer;
for (unsigned int i = 0 ; i < N; ++i)
{
T c;
}
}
Experiment 1: Flat Inheritance Hierarchy Case
We have to initialize the vptr in the constructor of a class. Extra time have to be spent to do the the work. In this experiment we have two classes, one has virtual table and one does not. We will construct and destroy each of these classes to exam the time difference. Code shown as below:
struct ClassHasVTable
{
virtual void foo() {}
};
struct ClassDontHaveVTable
{
void foo() {};
};
int main()
{
const unsigned int N = 1000000000;
for(unsigned int i = 1000; i <= N; i *= 2)
{
printf ("------- N = %d ---------\n", i);
test_ctor_dtor<ClassHasVTable>(i);
test_ctor_dtor<ClassDontHaveVTable>(i);
}
}
So if we take a look at the generated assembly code, The construct of ClassHasVTable would be like below. You can see clearly how the vptr is initialized. ClassDontHaveVTable doesn’t have a constructor at all. (For the reason, see Stanley B. Lippman’s book: Inside C++ Object Model).
PUBLIC ??_7ClassHasVTable@@6B@ ; ClassHasVTable::`vftable'
; COMDAT ??_7ClassHasVTable@@6B@
CONST SEGMENT
??_7ClassHasVTable@@6B@ DD FLAT:?foo@ClassHasVTable@@UAEXXZ ; ClassHasVTable::`vftable'
CONST ENDS
??0ClassHasVTable@@QAE@XZ PROC ; ClassHasVTable::ClassHasVTable, COMDAT
; _this$ = ecx
push ebp
mov ebp, esp
push ecx
mov DWORD PTR _this$[ebp], ecx
mov eax, DWORD PTR _this$[ebp]
mov DWORD PTR [eax], OFFSET ??_7ClassHasVTable@@6B@
mov eax, DWORD PTR _this$[ebp]
mov esp, ebp
pop ebp
ret 0
??0ClassHasVTable@@QAE@XZ ENDP ; ClassHasVTable::ClassHasVTable
Calling test_ctor_dtor() with different N and I got the testing result as below. For this trivial case, ctor/dtor of a class has vptr is almost twice as slow as class that doesn’t have vptr. In my machine construct and destroy ~0.5 billion objects will cause about 1 second time loss.
Experiment 2: Object Construction and Destruction – Multiple Inheritance Hierarchy
Take a look at the class hierarchy below.
struct ClassHaveVTable
{
virtual void foo() {}
};
struct SubClassHaveVTable : public ClassHaveVTable {};
struct SubSubClassHaveVTable : public SubClassHaveVTable {};
struct SubSubSubClassHaveVTable : public SubSubClassHaveVTable {};
int main()
{
const unsigned int N = 1000000000;
for(unsigned int i = 1000; i <= N; i *= 2)
{
printf ("------- N = %d ---------\n", i);
test_ctor_dtor<ClassHaveVTable>(i);
test_ctor_dtor<SubClassHaveVTable>(i);
test_ctor_dtor<SubSubClassHaveVTable>(i);
test_ctor_dtor<SubSubSubClassHaveVTable>(i);
}
}
Since SubSubSubClassHaveVTable does nothing more than its ancient ClassHaveVTable, we would expect that creating a SubSubSubClassHaveVTable instance should not more expensive than creating a ClassHaveVTable instance. But wait! If we look back to see how an object is constructed, it would be pretty similar like human growing from baby to adult. That is, we have to start with initializing a ClassHaveVTable instance, including initialize its vptr. Then initialize SubClassHaveVTable specific class members as well as its vptr. After that we do the same thing for SubSubClassHaveVTable and SubSubSubClassHaveVTable is the last one we have to initialize. So if we counted it correctly the vptr is set 4 times when initialize a SubSubSubClassHaveVTable instance. The deeper class inheritance hierarchy is, the more extra work have to be done.
We can verify this by taking a look at SubSubSubClassHaveVTable’s constructor which is generated by the compiler automatically:
PUBLIC ??_7SubSubSubClassHaveVTable@@6B@ ; SubSubSubClassHaveVTable::`vftable'
; COMDAT ??_7SubSubSubClassHaveVTable@@6B@
CONST SEGMENT
??_7SubSubSubClassHaveVTable@@6B@ DD FLAT:?foo@ClassHaveVTable@@UAEXXZ ; SubSubSubClassHaveVTable::`vftable'
CONST ENDS
; 104 : {
; 105 : T c;
mov DWORD PTR _c$4297[ebp], OFFSET ??_7ClassHaveVTable@@6B@
mov DWORD PTR _c$4297[ebp], OFFSET ??_7SubClassHaveVTable@@6B@
mov DWORD PTR _c$4297[ebp], OFFSET ??_7SubSubClassHaveVTable@@6B@
mov DWORD PTR _c$4297[ebp], OFFSET ??_7SubSubSubClassHaveVTable@@6B@
; 106 : }
The testing result is illustrated as the chart below. As we can see for classes that have virtual functions, the performance of ctor/dtor may have 10~40% decreases for every Inheritance level.

October 3rd, 2008 | Tags: C++ | Category: C++ | Comments (5)