Code Gem

Code Gem

Knowledge was arriving in dribs and drabs

Code Gem RSS Feed
 
 
 
 

What did we pay for C++ Runtime Polymorphism: Part II

It seems that there is some misunderstanding about my previous post. The purpose was never for measuring C++ performance but for better understanding of how polymorphism is implemented in C++, and make people aware that those extra power is not come for free. This is why I choose trivial cases and disable compiler optimization. Trivial cases for make the time penalty observable, disabling compiler optimization for ensure that assembly code is generated from C++ source code directly so it could be easier for analysis. These time penalty actually may not be a problem in real life as a considerable part of the overhead will be eliminated by compiler optimization.

Let’s back to our topic. This time we are going to discuss the overhead of indirect function call by comparing it with compile time polymorphism.

Compile Time Polymorphism

If we think about scenarios that we leverage polymorphism, a large part of it doesn’t need runtime binding actually. We can pretty much tell what the exact type is even if the object is referenced by base type. Take a look at the code snippet as below:

struct IBase
{
    virtual void go() const = 0;
};

class Derived : public IBase
{
public:
    virtual void go() const
    {
        doSomething();
        doSomethingElse();
    }

protected:
    virtual void doSomething() const
    {
        // Do something specific for class Derived
    }

    virtual void doSomethingElse() const
    {
        // Do something else specific for class Derived
    }
};

This is actually a very typical pattern that you could find everywhere. But if we take a second look, when we use the class in the following way:

const IBase& b = Derived();
b.go()

Apparently Derived::go() will be invoked at this time because of the indirect call. Inside Derived::go() it calls doSomething()and doSomethingElse(). You can easily figure out that it will need to go through indirect function call process again for these function calls. Can we save some overhead in this case? The answer is positive: we can use the compile time polymorphism.

Runtime polymorphism is all about figure out type information at run time and call the right function based on the real object type. The assumption here is that you do not have a way to determine what exact type of a object at compile time. We do not need bother to call a function indirectly if we can put the type information somewhere in the base class. Yes, template can do the dirty work.

Take a look at the new implementation below:

template <class T>
class IBaseImpl : public IBase
{
public:
    virtual void go() const
    {
        doSomething();
        doSomethingElse();
    }

protected:
    void doSomething() const
    {
        return static_cast<const T*>(this)->doSomething();
    }

    void doSomethingElse() const
    {
        return static_cast<const T*>(this)->doSomethingElse();
    }
};

class DerivedT : public IBaseImpl<DerivedT>
{
    friend IBaseImpl<DerivedT>;

protected:
    void doSomething() const
    {
        // Do something specific for class DerivedT
    }

    void doSomethingElse() const
    {
        // Do something else specific for class DerivedT
    }
};

OK, now let’s take a look at the code. There are couple of things you might notice:

  • First of all a template class, IBaseImpl is created. The template argument carries type of derived class. So it could down-cast it self to the sub-type
  • The sub-class now derived from IBaseImpl<DerivedT>. It passes itself to its base class so it could do the down-casting.
  • virtual functions like doSomething() and doSomethingElse() are not virtual any more. By down-casting the type we know exactly what function call. So there is no vtable query and indirect function call needed.

This pattern is used widely to reduce overhead of virtual function calls. You can find a lot of examples from ATL/WTL. The technique is mentioned for the first time in C++ Report, Feb. 1995 by James O. Coplien with a name “Curiously Recurring Template Patterns”. With the pattern we could

  • Save indirection calls at run time.
  • type cast is at compile time, so no runtime overhead out there.
  • static methods can be overridden in this way. Remember that you can never define a static virtual function.

Actually by doing this you only can gain a pseudo polymorphism (or static polymorphism, simulated dynamic binding, ATL style inheritance, upside down inheritance, whatever). I am calling it pseudo because it only have part of the magic that runtime polymorphism (virtual function) has:

  • go() have to be virtual otherwise we are not able to reference to the object by IBase. But we can use C++ extension __declspec(novtable) to eliminate the overhead of initializing vptr of base classes somehow.
  • Classes derived from class DerivedT cannot gain different behavior by override doSomething() and doSomethingElse().
  • Template classes have a code size deficit. If we have 10 different classes try to implement interface IBase, we’ll get 10 extra middle-level classes IBaseImpl<T> under the hood.

So the pattern definitely cannot replace virtual functions at all. We have to think really carefully and use the pattern only in necessary places.

Compile-time Polymorphism vs. Runtime Polymorphism

Now let’s keep distance with theory and see some numbers about the differences about two type of polymorphisms. Use the poor man’s profiler I have testing code as below. Note that I have added __declspec(novtable) to avoid creating and initialize vtable/vptr unnecessarily for classes that never have instance created.

#define NO_VTABLE __declspec(novtable)

struct NO_VTABLE IBase
{
    virtual void go() const = 0;
};

class Derived : public IBase
{
public:
    virtual void go() const
    {
        doSomething();
        doSomethingElse();
    }

protected:
    virtual void doSomething() const
    {
        // Do something specific for class Derived
    }

    virtual void doSomethingElse() const
    {
        // Do something else specific for class Derived
    }
};

template <class T>
class NO_VTABLE IBaseImpl : public IBase
{
public:
    virtual void go() const
    {
        doSomething();
        doSomethingElse();
    }

protected:
    void doSomething() const
    {
        return static_cast<const T*>(this)->doSomething();
    }

    void doSomethingElse() const
    {
        return static_cast<const T*>(this)->doSomethingElse();
    }
};

class DerivedT : public IBaseImpl<DerivedT>
{
    friend IBaseImpl<DerivedT>;
protected:
    void doSomething() const
    {
        // Do something specific for class DerivedT
    }

    void doSomethingElse() const
    {
        // Do something else specific for class DerivedT
    }
};

template <typename T>
void test_polymorphism(unsigned int N)
{
    const IBase& b = T();

    Timer timer;
    for (unsigned int i = 0; i < N; ++i)
        b.go();
}

int main()
{
    const unsigned int N = 1048576000;

    for(unsigned int i = 1000; i <= N; i *= 2)
    {
        printf ("--------- N:%d ----------\n", i);
        test_polymorphism<Derived>(i);
        test_polymorphism<DerivedT>(i);
    }

    return 0;
}

And the testing result looks like below in my machine:

 image

From the chart we can see the conclusions are:

  • The overhead of runtime polymorphism can be twice as expensive as compile time polymorphism
  • Optimization works :-)

Share/Save/Bookmark

5 Responses to “What did we pay for C++ Runtime Polymorphism: Part II”

  1. 1
    Paul:

    Amazing, I never thought we can get the tremendous performance improvement through static polymorphism! You got my vote!

  2. 2
    Jay:

    Well actually the performance improvement is not that huge although it looks like so in the post. Becasue I am using some trivial cases to do the test. In real life you got have some code in your methods which will make the function call overhead unnoticeable.:-)

  3. 3
    Bryan E:

    You should see the same performance improvement by simply removing “virtual” from the definitions of the two helper functions. Since these functions are defined as “specific to Derived”, this should not cause problems, and eliminates the unnecessary complexity of using templates.

    This may be considered a lesson in “being aware of what should be virtual” versus “make everything virtual, just in case”…

  4. 4
    Mark:

    You could have gotten all the gains by just not declaring doSomething() and doSomethingElse() as non-virtual. This technique is useful for strategy pattern, where the base class defines a generic algorithm whose specific steps are executed by derived classes. In such a case, we can get away with virtual for the implementation steps.
    Eg:

    class Base
    {
    void algo()
    {
    this->init();
    //some common stuff
    this->print();
    }

    virtual void init() = 0;
    virtual void print() = 0;
    };

    class Derived: public Base
    {
    //implement the specific functions
    };

    In this case, CRTP can be used to eliminate the need to declare the two implementation functions as virtual.

  5. 5
    Mark:

    Sorry, pressed submit accidentally :(
    Using CRTP, you can eliminate _all_ virtual function calls. Your example, provides a common base class for all the Derived objects, so you had to implement go() as virtual. CRTP is not intended for such cases and is intended where you want to reuse the functionality of base class at most points, but want to customize specific steps in the algo. Consider:

    class Base
    {
    virtual void magic()
    {
    chant();
    wave();
    profit();
    }

    virtual chant(){}
    virtual wave(){}
    virtual profit(){}
    };

    Assume there is a derived class, Derived, which overrides only wave(). [we want to use default implementations for all else]
    Now, if you are wondering why chant(), wave() and profit() were declared virtual, you cannot customize them unless you declare them as virtual. This is because, once your derived class calls magic(), it calls the Base::magic() [we have not overridden magic() for Derived, we have overridden only wave()]. Then, when it calls wave(), it would have called Base::wave() if it had not been declared virtual. Here, we know that we are using Derived object, but just want to reuse the functionality of Base and we are paying the penalty of virtual due to above mentioned problem. CRTP comes to the rescue in precisely these situation, and you can get rid of _all_ virtual functions. Yes sir, all of them :)

Leave a Reply

February 2010
M T W T F S S
« Jun «-»  
1234567
891011121314
15161718192021
22232425262728

Blogroll