Hares and Tortoises

An Empirical Study of the Overheads Carried by Static and Dynamic Implementations of the Decorator Pattern

Much has been written about the Decorator pattern and its benefits in software development. From the client's viewpoint, the essence of decoration — invariance of type across object relationships — permits augmentation of functionality without change in interface. However, while many treatments of the Decorator present it solely as a dynamically constructed entity, this in fact need not be the case. In languages such as C++ the use of recursive template-instantiation allows the static construction of a decorative chain, with essentially the same properties as its dynamic counterpart; and while this approach loses the ability to configure implementation on the fly, it preserves the other characteristics of the pattern. This minimises run-time overheads, and maximises the potential for compile-time optimisations, and in certain contexts these advantages may be critical; but how much smaller, precisely, are the run-time costs of the static approach? This article is an empirical comparison of the overheads that accompany dynamic and static decoration.

Thumbnail image of the front cover of the July 2006 issue of Dr Dobbs Journal of Programming A version of this article appeared online in association with the July 2006 edition of Dr Dobbs Journal of Programming.

The Benefits of Decoration

Whatever the implementation, the principle of decoration turns upon the reification of complex functionality into separate classes of the same type. Linking instances of these classes together allows a given method to be called on the lead object, which performs its processing, before calling the same method on the next object, and so on. The invariance in interface throughout the chain means that objects need not be concerned about the precise nature of their siblings. This allows arbitrary sequences of objects to be constructed, of arbitrary length, but with minimal inter-object dependencies and the problems these entail, and the object-diagram on the right illustrates the principle.

In essence, this approach is an alternative to augmentation of functionality through sub-classing, but without the conflation of responsibility, and the implicit dependencies, that this use of derivation would bring. This delivers clear design-time benefits: each object can do one thing, and one thing alone, which promotes good, clean, abstractions. This, in turn, allows independent and incremental development of different areas of functionality.

On the downside, a criticism is that the high degree of granularity (lots of little classes) can result in designs that are hard to understand. However, software development is dominated by the conservation of complexity[1]. That is to say: whichever way you do it, there is always a minimum challenge, therefore any approach that reduces the net complexity to that minimum possible is to be favoured. However, high granularity may also bring performance overheads that are disproportionate to the net functionality of the structure, and to the demands on it, thus bringing the means of implementation into question.

Dynamic Costs

The most flexible way of implementing decoration is to allocate each object dynamically, and then link one to the other by means of pointers (as the next diagram illustrates), which forms, in essence, a linked list. The indirection allows objects to be added and removed inexpensively at run-time, which is ideal when functionality must change arbitrarily and frequently, especially where the overheads are low in relation to the real costs of the technique.

However, in situations where functionality need change only rarely, if ever, at run time, the dynamic approach may be inappropriate, especially where the overheads approach or surpass the real costs of the approach. For example, the propagation of a method call down the chain requires a pointer de-reference for each call, and this carries a small but real cost in time, which increases with the number of decorating objects. In addition, the use of indirection mandates the derivation of each decorating class from a common base. This, in turn, requires all methods that conform to the interface invariant to be virtual, and to be overridden in the derived types. The run-time overhead for these compounds the overhead of the indirection. Moreover, the virtual nature of method calls precludes inlining, and thereby precludes a raft of other optimisations that the compiler could perform if only it had simultaneous access to the method bodies.

These costs are exacerbated by the costs of creation and destruction. The high granularity of decoration fosters fragmentation of the free store, and each individual allocation and de-allocation takes time, especially with a general-purpose allocator. This is compounded by a separate v-table pointer (re)initialisation during construction/destruction of each object, and these penalties accrue as the number of decorating objects increases. Given that the majority of objects are relatively short-lived, the costs in creation and destruction may be disproportionate to the costs and benefits of usage.

Mostly, the benefits of decoration trade off very well against these costs. However, there will be instances when design pressures mandate decoration, yet at a point in the system where the construct occupies a performance hotspot — what is gained on the swings of good design can be lost on the roundabouts of poor performance.

Static Savings

An alternative, less flexible, approach to decoration is to parameterise a member object of a class template, with a class that supports the same interface as the parent. This enables the parent to forward method calls to the same methods in the child. Moreover, because the parameterising type can also be a class template, the principle can be repeated recursively, thus allowing a decorative chain of method calls. The listing illustrates one-stage static decoration:

As with a dynamic implementation, this promotes clean abstraction, and therefore allows disparate areas of functionality to be developed independently and incrementally. Different numbers and permutations of decorating objects can be combined, at compile time. What is lost is the ability to change the decorative structure at run-time, but with that go the overheads of the dynamic approach. Only a single object is created at runtime, which minimises fragmentation of the free store, and minimises the total time for allocation and de-allocation.

Moreover, because the object relationships are subsumptive rather than in-directed, there are no pointers to assign, making the structure faster to construct. The lack of indirection also obviates sub-classing, which in turn obviates virtual functions and their overheads. This means that all function-bodies are visible simultaneously to the compiler, which allows the elision of function-call overhead through inlining, and (importantly) opens the door to many other optimisations. The overheads for the creation, destruction and use of a static decorator should therefore be minimal.


 template <typename NextType>
 class Type_1
    {
    private: NextType Next;

    public: void Method ( ... )
       {

       // Processing...

       Next.Method ( ... );

       }

    };

 class Type_2
    {
    public: void Method ( ... ) { ... }

    };

 ...

 Type_1 <Type_2> MyDecorator;

 MyDecorator.Method ( ... );
            

A Fair Contest

Against this theoretical backdrop, is the challenge of ascertaining the relative overheads that the two approaches carry. In testing for these the code should be as 'bland' as possible, meaning that it should embody the principle of decoration without actually doing anything of interest. It should also allow the costs for creation, method dispatch, and destruction to be sampled. However, the multi-stage nature of decoration also demands that structures with different numbers of objects be tested, so as to expose performance effects that may emerge as the number of layers increases.

More difficult is the question of which variants on the static and dynamic form to test. The static case offers minimal latitude in implementation, but dynamic Decorators present more options, and the more flexible variants carry greater overheads. When racing hares and tortoises, one must select the fastest animals in the stable — the best-case scenarios — as failure to do so constitutes experimental bias. Similarly, it would be easy to test feature-laden Decorators, but this would disguise their essential performance profile — in pitting implementations against each other, both subjects must be as fast as possible.

A second consideration is that non-experimental variables must be controlled for, again in the interests of avoiding bias. Given this, the static and dynamic implementations that are tested should operate in as similar a manner as possible, so as not to confer unfair advantage. Non-experimental variables must also be minimised, therefore run-time creation and destruction should avoid the allocator that underlies the defaults for new and delete. This is to avoid polluting the test data with the overheads it carries (which stem from its general-purpose nature), and the fastest alternative is to overload new and delete on a per-class basis, such that objects can be returned from a pre-allocated area of storage.

In terms of constructing the chain, the static form carries a zero run-time cost, because the layering 'process' occurs at a compile-time. In the dynamic form, however, appending each object to the chain carries a run-time cost. Here, the fastest way to append one object to another is to pass the first object into the second's constructor, which avoids a two-stage construct-and-append process. Eschewing the ability to insert and remove objects arbitrarily also yields the fastest design.

The third consideration is the endmost, or terminating object. The last object in the chain cannot forward method calls to a non-existent sibling, therefore its functions should do nothing — making it an instance of the Null Object pattern. In the static form one can paramaterise the final template-instantiation with a 'NullDecorator' class, and in the dynamic a NullDecorator sub-class can be appended at construction time. However, the intent is to minimise allocation overheads, therefore the fastest implementation here is to allocate the Null Decorator statically, and pass it by reference.

Running the Race

Full listings for the dynamic and static test classes are provided as appendices, along with further explanations of the design choices made. The test code was compiled using Digital Mars v8.38 — employing the RDTSC machine-instruction to measure the number of machine cycles consumed[2] — and was run on a Pentium III system. Cache-based operations alone were sampled, with RTTI and exception handling disabled, and function inlining being the only optimisation. The results, adjusted for the test-harness overhead, are shown in the chart on the right-hand side.

What is clear is that the performance of the static form is streets ahead of the dynamic in all aspects. So much so that a number of the fewer-staged objects returned cycle counts that were less than the overhead of the test harness alone. This is because the CPU 'parallelised', and applied out-of-order execution to, both the code under test, and to the timing code[3]. This is also evident in the erratic cycle-counts for method dispatch in the static implementation, in that certain numbers of stages map well to the number of instructions fetched at once, whereas other layer-counts are 'out of phase' spatially with the breadth of the processor's 'maw'. Presumably, this artefact persists as the number of layers increases beyond 25, although it should be said that even 25-stage decoration is probably unlikely in production code.

Note also that the dramatic increase in destruction times for the dynamic form arises, in part, because of the overhead of the virtual destructor, which is necessary in the test code because the overloaded delete-operator must know the size of the object being destroyed. However, and although using the default implementation of delete does not require this, a virtual destructor is still likely in most real applications of the dynamic form, because of the sub-classing that most dynamic Decorators employ.

Finally, this study, in essence, extends the tricky problem of measuring performance in the fundamental relationships[3] — after all, a static Decorator is simply an instance of recursive subsumption. It is therefore tempting to work backwards from the method-dispatch datum for a 25-stage dynamic implementation (by dividing 249 by 25), to arrive at a hypothetical cycle-count of 9.96 for a single polymorphic-call. While this figure falls within the correct ballpark, it really is hypothetical, given the indeterminacy that the fundamentals exhibit on modern platforms.

On Balance

Bear in mind that the test regimen used here is extremely conservative. In a real application the overhead (for example) of using a default allocator could widen the gap between the dynamic and static implementations considerably. In addition, the overloaded method() function in the test suite does nothing significant, therefore the potential for optimisations that recursive inlining holds cannot be reflected in the data. This is rather unfair to the static implementation because the optimisations possible in real applications are immense.

However, and despite its speedy profile, a downside of many-stage static decoration is that passing parameters to constructors becomes rather awkward. Additionally, a fly in the coding-ointment is that many stages entail verbose template-instantiations, although typedefs do help considerably, and can be tucked away in a header. Finally, and for some developers, the loss of dynamic configuration renders the matter academic. However, nothing prevents the approaches from being combined, such that trains of statically decorating objects are interlinked with sequences of the dynamic variety.

References

[1] To Finity Where Else? — Exploring and Refuting the Principles Embodied in Initiatives such as Model Driven Architecture
Richard Vaughan
Application Development Advisor January/February 2005

[2] Testing Times — Resolving Temporal Costs at the Level of Individual Machine-Cycles
Richard Vaughan
Dr Dobbs Journal of Programming (online) June 2006

[3] Dear Relations — An Empirical Study of the Temporal Costs of Basic Object-Relationships
Richard Vaughan
Dr Dobbs Journal of Programming (online) June 2006

Appendix 1 — Dynamic Decorator Test Listing


 // ///////////////////////////////////////////////////////
 //
 // Listing for test dynamic-Decorator. Note that this is
 // not a general-purpose implementation.
 //
 // Note: DecoratorBase dtor must be virtual because of
 // overloaded delete in derived class. Note also that only
 // a single instance of the NullDecorator is needed for
 // testing, hence the static definition.
 //

 class NullDecorator;  // Forward declaration - mandated by compilation dependencies

 class DecoratorBase
    {
    public:  static  NullDecorator  Tail;
    private:         DecoratorBase *Next;

    // -------------------------------------------------

    public:  virtual               ~DecoratorBase () { }

    public:  inline                 DecoratorBase ();
    public:  inline                 DecoratorBase (DecoratorBase &Prev);

    private:         void           SetNext       (DecoratorBase *Next_) { Next = Next_; }
    public:          DecoratorBase *GetNext       ()                     { return Next;  }

    public:  virtual void           Method        (unsigned Value) = 0;

    };


 // ///////////////////////////////////////////////////////
 //

 class NullDecorator : public DecoratorBase
    {
    private: virtual void Method (unsigned Value) { }

    };


 // ///////////////////////////////////////////////////////
 //
 // Instantiation of static terminating-object, plus out of
 // line DecoratorBase ctors. Compilation dependencies
 // mandate they be defined after the NullDecorator
 // instantiation.
 //

 NullDecorator DecoratorBase::Tail;

 DecoratorBase::DecoratorBase (DecoratorBase &Prev) : Next (&Tail) { Prev.SetNext (this); }
 DecoratorBase::DecoratorBase ()                    : Next (&Tail) { }


 // ///////////////////////////////////////////////////////
 //

 class DerivedDecorator : public DecoratorBase
    {
    private: unsigned Attrib;

    // -------------------------------------------------

    public:  virtual ~DerivedDecorator () { }

    public:           DerivedDecorator ()                    : Attrib        (0) { }
    public:           DerivedDecorator (DecoratorBase &Prev) : DecoratorBase (Prev),
                                                               Attrib        (0) { }
    // -------------------------------------------------

    public: virtual void Method (unsigned Value)
       {

       Attrib = Value;

       GetNext ()->Method (Value);

       }

    // -------------------------------------------------

    private: static unsigned char  AllocationArray[];
    private: static int            ByteCount;

    public:  inline void          *operator new    (size_t Bytes);
             inline void           operator delete (void *);

    };


 // ///////////////////////////////////////////////////////
 //
 // Instantiation of static members.
 //

 const int LAYER_MAX = 25;

 unsigned char DerivedDecorator::AllocationArray[sizeof (DerivedDecorator) * LAYER_MAX];
 int           DerivedDecorator::ByteCount = 0 - sizeof (DerivedDecorator);


 // ///////////////////////////////////////////////////////
 //
 // New and delete must be defined here because of
 // compilation dependencies
 //

 inline void   DerivedDecorator::operator delete (void *) { ByteCount -= sizeof (DerivedDecorator); }
 inline void  *DerivedDecorator::operator new    (size_t Bytes)
 {

 ByteCount += sizeof (DerivedDecorator);

 return (void *)(AllocationArray + ByteCount);

 }


 // ///////////////////////////////////////////////////////
 //
 // The wrapper class simplifies creation of a decorator
 // during tests, and must be instantiated only once.
 //

 class DecoratorWrapper
     {
     private: DerivedDecorator Head;

    // -------------------------------------------------

     public:  DecoratorWrapper (unsigned NumLayers)
        {

        DecoratorBase *CurDecorator = &Head;

        for (unsigned Count = 1;
                      Count < NumLayers;
                      Count ++) CurDecorator = new DerivedDecorator (*CurDecorator);

        }

    // -------------------------------------------------

    public: ~DecoratorWrapper ()
       {

       DecoratorBase *Current = Head.GetNext ();

       while (Current != &DecoratorBase::Tail)
          {
          DecoratorBase *Next = Current->GetNext ();

          delete Current;

          Current = Next;

          }

       }

    // -------------------------------------------------

    private: static unsigned char  AllocationObj[];

    public:  inline void          *operator new    (size_t Bytes);
             inline void           operator delete (void *) { }

    public:  void                  Method          (unsigned Value) { Head.Method (Value); }

    };


 // ///////////////////////////////////////////////////////
 //
 // Instantiation of static members
 //

 unsigned char  DecoratorWrapper::AllocationObj[sizeof (DecoratorWrapper)];
 inline void   *DecoratorWrapper::operator new (size_t Bytes)
 {

 return static_cast<void *>(AllocationObj);

 }
         

Appendix 2 — Static Decorator Test Listing


 // ///////////////////////////////////////////////////////
 //
 // Listing for test static-decorator. Note the custom
 // memory-management renders this unsuitable as a general
 // purpose implementation.
 //

 template <typename NextType>
 class Decorator
    {
    private: NextType Next;
    private: unsigned Attrib;

    // -------------------------------------------------

    public: Decorator () : Attrib (0) { }

    public: void Method (unsigned Value)
            {

            Attrib = Value;

            Next.Method (Value);

            }

    // -------------------------------------------------

    public:  inline void *operator new    (size_t Bytes);
             inline void  operator delete (void *);

    private: static unsigned char  AllocationArray[];

    private: static int            ByteCount; // Note: ByteCount is unnecessary but needed to maintain
                                              // equivalence with the dynamic implementation
    };


 // ///////////////////////////////////////////////////////
 //

 class NullDecorator
    {
    public:      NullDecorator () { }
    public: void Method        (unsigned Value) { }

    };


 // ///////////////////////////////////////////////////////
 //
 // Typedef eases use of a particular instantiation when
 // testing. Also of use in overloading new.
 //

 typedef Decorator
        <Decorator
        <Decorator       // To add more layers, copy and paste this line...
        <NullDecorator>
        >                // ...and this line too
        > TestType;


 // ///////////////////////////////////////////////////////
 //
 // Note: Your compiler may not like the AllocationArray
 // statement. If so, try a dummy instantiation of TestType
 // just before that line.

 template <typename NextType>
 unsigned char Decorator<NextType>::AllocationArray[sizeof (TestType)];

 template <typename NextType>
 int Decorator<NextType>::ByteCount = 0 - sizeof (TestType);


 // ///////////////////////////////////////////////////////
 //
 // Note: ByteCount is unnecessary but needed to keep
 // equivalence with the dynamic implementation
 //

 template <typename NextType>
 inline void Decorator<NextType>::operator delete(void *) { ByteCount -= sizeof (Decorator<NextType>);}

 template <typename NextType>
 inline void *Decorator<NextType>::operator new (size_t Bytes)
 {

 ByteCount += sizeof (Decorator<NextType>);

 return static_cast<void *>(AllocationArray + ByteCount);

 }
            
Copyright © Richard Vaughan 2004