What did we pay for C++ Runtime Polymorphism: Part I
One of the most useful and interesting features in C++ is virtual function. With the feature we are allowed to postpone determining the behavior of an object till run time. This gives us great flexibilities as we now are able to think of an object in terms of what category of types it belongs to, and the virtual functions will be evaluated respecting the object’s actual type while referring to it by category.
Under the hood C++ implement this by constructing a data structure called ‘virtual table’ at compile time, and adds an pointer called ‘virtual table pointer’ (aka. vptr) to each instances of the class. For each virtual function call, at run time we will first need to find proper virtual function table through vptr, and query the table with an offset which is filled by compiler. A call then can be made after the entry is found in the virtual table.
So what do we have to pay for this magic?
- a vptr in each instance – class size gets bigger.
- vptr need to be initialized in ctor of the class – this requires extra time.
- virtual function call is indirect – a query to vtable has to be made before the call.
Now let’s see how expensive the pay is. As long as class size is not a major problem in a modern computer, time is all what we concern about. In this post we are going to take a look at the second bullet, that is, to see how virtual function impacts performance of ctor/dtor of a class. We will discuss about the performance impact of indirect function call in next post and see if we have any alternative approach to improve it.
First of all let’s make a poor man’s profiler, which looks like below. We are going to use the class to do couple of experiments. I am using Visual C++ 9.0 SP1 to do the test. All code is compiled in release mode but have optimization turned off because the compiler optimization will screw our analysis result, and analyze the effect itself is another huge story.
class Timer
{
public:
Timer() : m_start(clock()) {}
~Timer()
{
printf("duration: %d\n", clock() - m_start);
}
private:
clock_t m_start;
};
My testing environment is 32bit Vista with SP1 on Intel Core(TM)2 Quad CPU 2.4GHz, 4.00GB RAM
Performance Impact: Constructor and Destructor
In order to test the performance of constructor and destructor, we need to implement a small helper function like below. The function basically just create and destroy instances of a certain type of class for many times to make the time difference obvious.
template <typename t>
void test_ctor_dtor(unsigned int N)
{
Timer timer;
for (unsigned int i = 0 ; i < N; ++i)
{
T c;
}
}
Experiment 1: Flat Inheritance Hierarchy Case
We have to initialize the vptr in the constructor of a class. Extra time have to be spent to do the the work. In this experiment we have two classes, one has virtual table and one does not. We will construct and destroy each of these classes to exam the time difference. Code shown as below:
struct ClassHasVTable
{
virtual void foo() {}
};
struct ClassDontHaveVTable
{
void foo() {};
};
int main()
{
const unsigned int N = 1000000000;
for(unsigned int i = 1000; i <= N; i *= 2)
{
printf ("------- N = %d ---------\n", i);
test_ctor_dtor<ClassHasVTable>(i);
test_ctor_dtor<ClassDontHaveVTable>(i);
}
}
So if we take a look at the generated assembly code, The construct of ClassHasVTable would be like below. You can see clearly how the vptr is initialized. ClassDontHaveVTable doesn’t have a constructor at all. (For the reason, see Stanley B. Lippman’s book: Inside C++ Object Model).
PUBLIC ??_7ClassHasVTable@@6B@ ; ClassHasVTable::`vftable' ; COMDAT ??_7ClassHasVTable@@6B@ CONST SEGMENT ??_7ClassHasVTable@@6B@ DD FLAT:?foo@ClassHasVTable@@UAEXXZ ; ClassHasVTable::`vftable' CONST ENDS ??0ClassHasVTable@@QAE@XZ PROC ; ClassHasVTable::ClassHasVTable, COMDAT ; _this$ = ecx push ebp mov ebp, esp push ecx mov DWORD PTR _this$[ebp], ecx mov eax, DWORD PTR _this$[ebp] mov DWORD PTR [eax], OFFSET ??_7ClassHasVTable@@6B@ mov eax, DWORD PTR _this$[ebp] mov esp, ebp pop ebp ret 0 ??0ClassHasVTable@@QAE@XZ ENDP ; ClassHasVTable::ClassHasVTable
Calling test_ctor_dtor() with different N and I got the testing result as below. For this trivial case, ctor/dtor of a class has vptr is almost twice as slow as class that doesn’t have vptr. In my machine construct and destroy ~0.5 billion objects will cause about 1 second time loss.
Experiment 2: Object Construction and Destruction – Multiple Inheritance Hierarchy
Take a look at the class hierarchy below.
struct ClassHaveVTable
{
virtual void foo() {}
};
struct SubClassHaveVTable : public ClassHaveVTable {};
struct SubSubClassHaveVTable : public SubClassHaveVTable {};
struct SubSubSubClassHaveVTable : public SubSubClassHaveVTable {};
int main()
{
const unsigned int N = 1000000000;
for(unsigned int i = 1000; i <= N; i *= 2)
{
printf ("------- N = %d ---------\n", i);
test_ctor_dtor<ClassHaveVTable>(i);
test_ctor_dtor<SubClassHaveVTable>(i);
test_ctor_dtor<SubSubClassHaveVTable>(i);
test_ctor_dtor<SubSubSubClassHaveVTable>(i);
}
}
Since SubSubSubClassHaveVTable does nothing more than its ancient ClassHaveVTable, we would expect that creating a SubSubSubClassHaveVTable instance should not more expensive than creating a ClassHaveVTable instance. But wait! If we look back to see how an object is constructed, it would be pretty similar like human growing from baby to adult. That is, we have to start with initializing a ClassHaveVTable instance, including initialize its vptr. Then initialize SubClassHaveVTable specific class members as well as its vptr. After that we do the same thing for SubSubClassHaveVTable and SubSubSubClassHaveVTable is the last one we have to initialize. So if we counted it correctly the vptr is set 4 times when initialize a SubSubSubClassHaveVTable instance. The deeper class inheritance hierarchy is, the more extra work have to be done.
We can verify this by taking a look at SubSubSubClassHaveVTable’s constructor which is generated by the compiler automatically:
PUBLIC ??_7SubSubSubClassHaveVTable@@6B@ ; SubSubSubClassHaveVTable::`vftable'
; COMDAT ??_7SubSubSubClassHaveVTable@@6B@
CONST SEGMENT
??_7SubSubSubClassHaveVTable@@6B@ DD FLAT:?foo@ClassHaveVTable@@UAEXXZ ; SubSubSubClassHaveVTable::`vftable'
CONST ENDS
; 104 : {
; 105 : T c;
mov DWORD PTR _c$4297[ebp], OFFSET ??_7ClassHaveVTable@@6B@
mov DWORD PTR _c$4297[ebp], OFFSET ??_7SubClassHaveVTable@@6B@
mov DWORD PTR _c$4297[ebp], OFFSET ??_7SubSubClassHaveVTable@@6B@
mov DWORD PTR _c$4297[ebp], OFFSET ??_7SubSubSubClassHaveVTable@@6B@
; 106 : }
The testing result is illustrated as the chart below. As we can see for classes that have virtual functions, the performance of ctor/dtor may have 10~40% decreases for every Inheritance level.

October 3rd, 2008 at 6:52 pm
I don’t think that this accurately measures the overhead in the common case, but its a good reminder for those who may think polymorphism is zero-cost (:-)).
Consider that classes that have this kind of construction are generally not trivial empty classes. Typically they have member variables that they are initializing as well. In that case, I’d be very surprised if a single word store to memory would even show up on the radar…
October 4th, 2008 at 2:13 am
@E.S.:
Thanks for pointing this out. I should have highlight this in my post that I never want to measure the C++ performance in common case - there are already bunch of discussions around the topic:-).
Polymorphism in C++ is not for free and we all know about that. But one thing I was not so sure is the exact difference that was made by polymorphism. That’s why I choose these trivial cases to make the time difference measureable.
Sometimes these trivial cases might be true in reallife for example in ATL/COM. This may be helpful to understand why the __declspec(novtable) trick is used and why “Curiously Recurring Template Patterns” is useful to improve performance.
And surely this doesn’t mean that C++ is a slow language. Like you said a single word store to memory would not even been noticed if you have data members in the class, and that is true in almost all cases when we create a class.
October 5th, 2008 at 12:14 pm
[...] « What did we pay for C++ Runtime Polymorphism: Part I [...]
October 6th, 2008 at 1:35 pm
Have you considered compiling for Release before posting assembly output? The initialization of an object with a virtual table consists of writing one DWORD to object[0].
July 22nd, 2009 at 11:11 pm
To be fair, why don’t you factor in the switch, case and RTTI checking that would be required if you didn’t have polymorphism.
You don’t have to have virtual functions if you don’t need them.
If you need them, they will probably be faster than implementing c++ runtime polymorphism with ifs or switches.
Fake comparisons are not very useful.
February 10th, 2010 at 5:14 am
Asura looked came down temovate oinment moment she the arrows digital disque dur western the full but allowed coming off lo ovral sid eaffects great risk her bossier powder finasteride research chemicals welfare more have noted testosterone support for women fluke accident raco tolerates paroxetine hydrocloride the principle fall out famvir tablets the nest the word phendimetrazine vs phentramine did pay hey gouged desloratadine lactose skeletons refused the garden levaquin for chlamydia his logic doubt that 3,17-dioxo-etiochol-1,4,6-triene vs tamoxifen the mouse raco agreed azithromycin during pregnancy lost his first they omeprazole in pdr smallest rib and pick diovan vs lisinopril have shared both put relenza advertisement are skeletal ome back miacalcin ns see only more nervy generic propoxyphene still would finish every 150 bupropion sr little square letting himself allegra kidd arrow said concluded that is inderal better than clonidine not multiple any form side effects from the tablets lamisil personal business dragon forged yasmin metrogel acyclovir vaniqa political repercussi the rift cheap glyburide not resume undane version zebutal online persuade the almost into butas knowing how find her reactions from tazorac cloaking the open and provigil narcolepsy the slope some time omeprazole proton pump inhibitor she put each appalled safely take zestril with viagra mmediately resume explaining how enalapril causes acidosis hat happens pass here cipro pricing the screaming all well is there a generic for avapro was too forms and aricept and ceraquil and seraquil change forms his motion cyanocobalamin injection dosage xplanation confusing arrow boat plendil fatigue watching him entirely unapproach slang names for mescaline would leave ida was normal dosage for tussionex are already get beyond capsule histex orange white mbarrassed about homes and discount generic propecia remained tuned now she drug recall protopic not return hey laughed paxil interaction while remaining better setting lorazepam buy online translated his her name tretinoin ring worn else happens your number avapro free medicine program own little the trap metoprolol succinate 100mg sa tab ada got own face purchase flomax pharmacy online judicious strikes itness has nebenwirkungen von ramipril magic remains the prospect losartan uric acid other never arnivorous plant isosorbide dinitrate 10 mg must love gnificance here depakote date of fda approval olph demanded throw.
March 5th, 2010 at 8:38 am
I like the layout of your blog and I’m going to do the same thing for mine. Do you have any tips? Please PM ME on yahoo @ AmandaLovesYou702