Fresh Fuel

An Exploration of Managed C++ on the .NET Platform

Those of you with culinary leanings may have noticed a recipe for Warm Spider Crab Linguine in the March 2002 edition of the BBC's Good Food magazine[BBC]. The instructions for eviscerating a Spider Crab read like a cross between The Spanish Inquisition's Staff Handbook and the memoirs of Josef Mengele, and one gets a similar feeling upon consulting the Managed C++ Reference on the Microsoft web site. Many changes have been made to traditional C++, in the name of .NET, in order to bring us this variant on the standard, and this article examines the major thrust of the offering.

This article appeared originally in issue 7 (Autumn 2003) of Ratio Group's journal Objective View.

First Impressions

Considering .NET generally, there are some eyebrow-raising claims being made for the technology. The following from the Wrox Press .NET support site is an example:

'This idea of making programming much much easier by putting the really complex stuff "under the hood" while giving the programmer an intuitive and easy interface to work with is leading to the possibility of non-computer professionals taking up the task of authoring computer programs for their non-information technology knowledge specialties (medicine, civil engineering, etc.). Time will tell if we are at a turning point in the history of electronic digital information technology in which the numbers of persons capable of competently authoring a computer program increases by one or two or more orders of magnitude.'

This is worrying because precisely the same claims were made for COBOL when that was introduced and (as ever) Object Orientation too. However, the development of non-trivial software is complex and challenging because the systems that we attempt to model on our machines are themselves very complex and challenging. To apply the logic espoused above to other areas of human pursuit is to imply that one day everybody could be a brain surgeon. Yet no amount of nifty tools (and languages are tools as well) will ever deliver us from the conservation of complexity.

Modus Operandi

While there are superficial similarities between the Java model and .NET, they are, in fact, fundamentally different approaches. In the case of Java, high-level source code is compiled into virtual-machine instructions. This allows a single executable to run on any machine, provided that a VM is available for that machine. .NET parallels Java, in that high-level source code is compiled into Microsoft Intermediate Language (MSIL), which is similar to Java byte code. From here, however, the technologies diverge.

Firstly, the intermediate language (IL) does not run on a VM but is Just In Time compiled (jitted) to native machine code prior to execution on the target platform. (The jitting process is a temporal equivalent to the spatial solution that a VM represents.) The second difference is that IL code can be generated from a variety of common programming languages. (Note that while it is also possible to compile Java byte code from non-Java sources, Sun did not intend this.)

However, all programming languages are not the same and .NET stipulates the Common Language Specification (CLS) to which existing languages must be bent in order to comply with the .NET framework. In essence, the CLS is to these as the XML Infoset is to XML syntax. That is to say XML documents are composed, semantically, of elements but other syntax could be used in place of angle brackets — XML documents could, hypothetically, be coded using C-like syntax.

Given this, we are not actually dealing with cross-language programming, where object code from different compilers is linked to a common executable, but cross-syntax programming. The syntax may appear to be that of your favourite programming language (C++, Eiffel, Smalltalk etc.) but in reality you are conforming to CLS semantics. Given that one is coding to a single underlying object model, this will necessarily place restrictions on the syntactic model that one is using.

Managed code is compiled into 'assemblies', which contain procedural code, versioning information and metadata that describes the types used in the assembly. While traditional C++ development is possible with Visual Studio .NET, sources can be compiled into IL by using the /clr compiler switch. However, this does not automatically transform traditional C++ syntax into a .NET assembly. Firstly, you must include the following at the top of a C++ source file:

   #using <mscorlib.dll>
   using namespace System;

The #using pre-processor directive makes the resources in the core library assembly available to code in the translation unit. The other using directive makes the .NET class libraries available. That, however, is only the start, as MC++ code differs significantly from unmanaged C++. Let's explore those features and differences.

The MC++ Lexicon

MC++ adds fourteen new keywords to C++ which must be used to qualify otherwise normal C++ syntax. These are:

   __abstract
   __box
   __delegate
   __event
   __gc
   __identifier
   __interface
   __nogc
   __pin
   __property
   __sealed
   __try_cast
   __typeof
   __value

Some of these are fairly simple and innocuous in their effect, as in the case of the __identifier keyword. This allows one to use other language types where the names of those types are normally C++ keywords. For example, you may wish to use a class called 'switch', which has been declared using another language and which would normally constitute a syntax error were one to use it as the name of a C++ class. Others however have far more profound implications and this is partly because of the .NET dynamic-allocation model, which we shall examine next.

Storage Management

Given that .NET is about interoperability, all .NET applications will hold (at least some) objects in a common heap. It is the characteristics of this that feed back into C++ to yield MC++. The CLR heap is a garbage-collected, compacting heap. Garbage collection absolves the application of the responsibility of deallocating unwanted objects, while 'compacting' means that free space between allocated blocks is coalesced into a single block. This is accomplished by shifting blocks towards the beginning of the storage arena; a technique that was introduced in the storage management policies of early operating systems. The diagram shows the deallocation of an object (Object 2), the gap that it leaves and the subsequent compaction of the free space that remains.

Garbage collection has the advantage that it prevents deallocated storage from being re-accessed, thus precluding rogue pointers. Memory leaks are also impossible, because unreachable objects are recovered automatically, and the developer is freed from managing object lifetimes. This does incur a cost, however: the collector runs as a background thread, which impinges on performance, and the compaction process incurs a time penalty of its own.

All of this has implications for MC++ because the normal C++ storage management model conflicts with the CLS semantics demanded by .NET. Type declarations that are compiled for the .NET framework must therefore be adorned with MC++ keywords, which makes them known to the CLR.

The next listing shows a simple managed C++ class, where the __gc keyword means that instances of Simple are garbage collected. Were this unmanaged C++ then the loop would eventually exhaust the free store because each object that is created would be lost upon the next iteration. Instead the heap is never depleted because the garbage collector recovers the unreachable objects.


 #using <mscorlib.dll>
 using namespace System;

 __gc class Simple
     {
     public: int i;

     };

 int main ()
     {

     while (true)
        {
        Simple *SimplePtr = new Simple; // Runs forever

        }

     return 0;

     }
      

Clearly this has a systemic effect on other aspects of normal C++ and this is demonstrated in the next diagram. Here we can see a set of objects that reside in the managed heap. The first points to the third, which is also pointed to by a stack-based pointer in the application's runtime space. The fourth is pointed to by an object, which resides in the application's non-managed free store.

Because compaction necessarily changes the address of an object, any pointers to that object must be updated accordingly. A pointer to a CLR heap-object is therefore not an ordinary pointer, as the compactor must be aware of it in order to update it upon compaction. MC++ pointers must consequently be declared with the __gc keyword, as well as the objects that they point to. Note that the same holds for references. Note also that a given object's address can be fixed (thus preventing full compaction) using the __pin keyword.
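The fix-up of pointers during compaction can be illustrated with a toy model. The following is a minimal sketch in standard C++ (not MC++) and makes no claim to reflect how the CLR collector is actually implemented; the names ToyHeap, Handle, Allocate, Release and Compact are invented for the illustration. Objects are reached through a handle table, which plays the part of the __gc pointers that the compactor must know about.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: an arena of ints plus a handle table. All names here
// are hypothetical and exist purely for this sketch.
class ToyHeap
    {
    public:
        typedef std::size_t Handle;          // Index into the handle table

        // Place a new object at the end of the used region.
        Handle Allocate (int Value)
            {
            Arena.push_back (Value);
            Owner.push_back (Positions.size ());   // Handle owning this slot
            Positions.push_back (Arena.size () - 1);
            return Positions.size () - 1;
            }

        // Treat the object as garbage; its slot becomes a gap.
        void Release (Handle H)
            {
            Owner [Positions [H]] = Dead ();
            Positions [H]         = Dead ();
            }

        // Slide live objects towards the start of the arena and patch
        // each owning handle so that it refers to the new position.
        void Compact ()
            {
            std::size_t Next = 0;

            for (std::size_t Old = 0; Old < Arena.size (); ++Old)
                {
                if (Owner [Old] == Dead ()) continue;   // Skip gaps

                Arena [Next]             = Arena [Old];
                Owner [Next]             = Owner [Old];
                Positions [Owner [Next]] = Next;        // Update the handle
                ++Next;
                }

            Arena.resize (Next);
            Owner.resize (Next);
            }

        int &Deref (Handle H)
            {
            assert (Positions [H] != Dead ());
            return Arena [Positions [H]];
            }

        std::size_t PositionOf (Handle H) const { return Positions [H]; }
        std::size_t Used () const               { return Arena.size (); }

    private:
        static std::size_t Dead () { return static_cast<std::size_t> (-1); }

        std::vector<int>         Arena;      // The storage itself
        std::vector<std::size_t> Owner;      // Slot -> handle, or Dead ()
        std::vector<std::size_t> Positions;  // Handle -> slot, or Dead ()
    };
```

If three objects are allocated and the second is released, the third occupies slot 2 before Compact is called and slot 1 afterwards, yet its handle still dereferences to the same value. An ordinary C++ pointer into the arena would have been left pointing at the wrong slot, which is the reason for the special pointer type.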

The following example therefore shows a class that contains a __gc pointer to objects of class Simple:


 #using <mscorlib.dll>
 using namespace System;

 __gc class Simple
     {
     public: int i;

     };

 __gc class OtherSimple
     {
     public: __gc Simple *SimplePtr;

     };
      

In addition to the above, stack-based objects of user-defined type are disallowed (every instance of a __gc class must be allocated dynamically). This in turn means that __gc objects cannot be passed or returned by value. The following example demonstrates this:


 #using <mscorlib.dll>
 using namespace System;

 __gc class Simple
     {
     public: int i;

     };

 void SomeFunc (Simple SimpleParameter) { }

 int main ()
     {
     Simple       InstanceOfSimple;       // Will not compile

     Simple __gc *SimplePtr = new Simple; // Fine

     SomeFunc (InstanceOfSimple);         // Will not compile

     return 0;

     }
      

There is, however, an alternative to __gc classes and that is the concept of __value classes. The __value keyword is intended for small objects that generally have short lifetimes and for which garbage collection would be too costly. As opposed to __gc classes, __value classes can be instantiated on the stack and can be passed by value to functions. They cannot however be allocated on the CLR heap unless they are embedded within a __gc class. For example:


 __value struct Simple { int i; };

 __gc class Enclosing
    {

    Simple SimpleMember;

    };

 Simple SomeFunc (Simple SimpleParam)  // Passed by value
    {

    SimpleParam.i += 1;        // Value at call site unaffected
    return SimpleParam;        // Returned by value

    }

 int main ()
    {

    Simple SimpleInst_1          = {10};
    Simple SimpleInst_2          = SomeFunc (SimpleInst_1);

    Enclosing __gc *EnclosingPtr = new Enclosing;   // SimpleMember is
                                                    // allocated as part of
                                                    // Enclosing instance

    EnclosingPtr->SimpleMember    = SimpleInst_1;   // Copy value
    EnclosingPtr->SimpleMember.i += SimpleInst_2.i;

    Console::WriteLine (SimpleInst_1.i);
    Console::WriteLine (SimpleInst_2.i);
    Console::WriteLine (EnclosingPtr->SimpleMember.i);

    }

 --------------------------------------

 Output:

 10
 11
 21
      

This may appear to return us to more familiar unmanaged C++ territory. However, it creates a new problem in that one must choose whether a class is going to be __gc or __value; a type cannot be both. Note also that while one can still allocate a __value object dynamically, this can only occur on the normal C++ free store and you must qualify operator new with the __nogc keyword to achieve this.

Finally, there is another issue lurking behind all of this. Studies indicate that there is no single, optimum, storage-allocation policy, yet .NET takes a 'one size fits all' approach, which cannot suit every application. Traditional C++ gives the ability to customise allocation strategies on a per-class basis, thus yielding up opportunities for significant performance optimisations. Overloads of new are not allowed in __gc classes however and this restriction could compromise many applications.

Storage management aside, let us examine the inheritance model in MC++.

Inheritance

Once again, and in order to get all .NET languages singing from the same hymn sheet, the MC++ inheritance model differs significantly from traditional C++. Firstly, if a class has no super class then it is implicitly derived from System::Object. This is to provide compatibility with the C# libraries where everything is derived, cosmically, from a single, universal base. Secondly, all inheritance must be public; private and protected inheritance are not allowed. Thirdly, managed classes cannot declare friend classes or functions.

Fourthly, one cannot mix __gc and __nogc classes in an inheritance relationship, although it is possible to mix __gc classes with classes declared using other languages. For example, a C# class can form the base for an MC++ class and that class could then form the base for an Eiffel# class.

Implementation inheritance is available in the single form only — multiple inheritance being available only for interfaces. I.e. a class can have many base classes but only one of these may have attributes; the rest must be populated with member functions alone. How troublesome this is depends upon your view of multiple inheritance. Some believe that full-blown MI, as supported by C++, Eiffel etc. is an entirely good and desirable thing and that for it to be curtailed in .NET is to deny developers their freedom. Meyer, for one, takes this view and feels that .NET's restrictions on MI should (and eventually will) be lifted[Meyer].

In reality, multiple implementation-inheritance will always incur a small performance penalty and can introduce name conflicts. Moreover, the same effect can always be accomplished using a mix of Is A and Has A relationships with no greater loss of performance. In addition, as Alexandrescu points out, MI is simply a syntactic mechanism for combining classes and does not implicitly orchestrate the collection of bases; that is something that the derived class must be made to do[Alexandrescu 6]. Many developers will therefore be unperturbed by the restrictions that .NET imposes in terms of building new applications from scratch, although it does have significant implications for porting existing systems because a proportion of their code will not compile.
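The mix of Is A and Has A relationships might look something like the following. This is a sketch in standard C++ rather than MC++, and every type in it (Logger, Shape, Printable, LoggedSquare) is hypothetical. The derived class has one base that carries attributes, one base that is purely an interface, and embeds the second implementation as a member, with forwarding functions standing in for what would otherwise have been inherited.

```cpp
#include <cassert>
#include <string>

// All types below are invented for this illustration.

class Logger                          // An implementation with state
    {
    public:
        Logger () : Count (0) { }
        void Log (const std::string &) { ++Count; }
        int  Logged () const           { return Count; }

    private:
        int Count;
    };

class Shape                           // Another implementation with state
    {
    public:
        explicit Shape (double A) : AreaValue (A) { }
        virtual ~Shape () { }
        double Area () const { return AreaValue; }

    private:
        double AreaValue;
    };

class Printable                       // Interface: member functions only
    {
    public:
        virtual ~Printable () { }
        virtual std::string Print () const = 0;
    };

// Is A Shape, Is A Printable (interface only), Has A Logger.
class LoggedSquare : public Shape, public Printable
    {
    public:
        explicit LoggedSquare (double Side) : Shape (Side * Side) { }

        std::string Print () const { return "square"; }

        // Forwarding functions replace the inherited Logger members.
        void Log (const std::string &Message) { Log_.Log (Message); }
        int  Logged () const                  { return Log_.Logged (); }

    private:
        Logger Log_;
    };
```

The cost is the handful of forwarding functions, which must be written and maintained by hand. Code that relied on converting a LoggedSquare to a Logger would also need rethinking, which is part of the porting burden noted above.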

Aside from restrictions on the normal C++ inheritance model, what additions does .NET make in MC++? Firstly, the subclassing of a type can be mandated by the use of the __abstract keyword, although an __abstract class is not required to have pure virtual functions. Secondly, __interface classes can be declared where no data members are allowed (apart from a __value enum) and all member functions are implicitly pure virtual. Subclassing of __interface classes is therefore also mandatory.

The next diagram summarises many of the above points.

In contrast with the __interface and __abstract classes, further derivation can be prevented by use of the __sealed keyword, although this cannot be applied to an __abstract or __interface class. __sealed can also be applied to virtual member functions thus preventing them from being overridden. Note that value classes are __sealed implicitly.

Member Functions

Member functions also differ in a number of ways from unmanaged C++. Firstly, the garbage collection issue holds important implications for constructors and destructors. One of the great things about these is that they are an excellent way of ensuring that something happens automatically. In the case of destructors, this is true even when an exception is raised and this lends itself to a variety of attractive and useful techniques. However, with garbage collection, destructors become finalisers, which means that they are called just before the garbage collector reclaims the storage, not when the object goes out of scope. Given that one cannot normally guarantee when garbage-collection will occur, it is therefore impossible to know precisely when the finaliser will be called and thus a useful technique is lost.
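For readers less familiar with the technique at stake, the following sketch shows deterministic destruction in standard (unmanaged) C++. The Guard class and the trace string are invented for the illustration. The destructor runs as the stack unwinds, before the handler is entered, so the release step cannot be forgotten.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical guard type: records its own lifetime in a trace string.
class Guard
    {
    public:
        explicit Guard (std::string &Trace) : Trace_ (Trace)
            {
            Trace_ += "acquire;";
            }

        ~Guard () { Trace_ += "release;"; }   // Runs on scope exit

    private:
        std::string &Trace_;

        Guard (const Guard &);                // Not copyable
        Guard &operator = (const Guard &);
    };

void Risky (std::string &Trace)
    {
    Guard G (Trace);
    Trace += "work;";
    throw std::runtime_error ("failure");     // G is still destroyed
    }

std::string Demonstrate ()
    {
    std::string Trace;

    try                                { Risky (Trace); }
    catch (const std::runtime_error &) { Trace += "caught;"; }

    return Trace;
    }
```

Demonstrate returns "acquire;work;release;caught;": the release is recorded before the handler runs. With a finaliser there is no such ordering guarantee, because the release would happen whenever the collector happened to get round to it.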

It is possible to work around this by calling operator delete on a __gc object, which will force execution of the destructor there and then. (The CLS guarantees that the destructor will not be called again when the garbage collector subsequently recovers the object.) Alternatively, you can call the destructor explicitly, as in unmanaged C++, and this would appear to reinstate the technique outlined above. In reality, you now have to think explicitly about the death of the object, whereas the original idea was to let the language rules and the compiler take care of everything.

There are other changes to construction semantics to consider as well, such as the fact that user-defined destructors are always virtual. Other changes, however, are more radical. For example, in traditional C++, a call to a virtual function from within a base-class constructor will result in execution of the override that is visible at that class's position in the hierarchy. This occurs irrespective of any overrides that are present in derived classes. In MC++, however, the same call will result in execution of the most-derived override of the function. Note that member objects are zero initialised before the execution of the constructor and the overriding function is therefore guaranteed not to execute on an object that contains garbage (although you may not actually want the defaults either).
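The traditional C++ half of this comparison can be demonstrated directly. The sketch below is standard C++, with hypothetical class names; it records which override the base-class constructor actually reaches.

```cpp
#include <cassert>
#include <string>

class Base
    {
    public:
        Base () { Seen = Name (); }      // Virtual call from a constructor
        virtual ~Base () { }

        virtual std::string Name () const { return "Base"; }

        std::string SeenByConstructor () const { return Seen; }

    private:
        std::string Seen;
    };

class Derived : public Base
    {
    public:
        virtual std::string Name () const { return "Derived"; }
    };
```

Constructing a Derived and calling SeenByConstructor yields "Base", even though calling Name on the finished object yields "Derived". According to the description above, the equivalent __gc classes would reach the Derived override from the constructor instead, so code that depends on the traditional behaviour needs to be reviewed when it is ported.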

In addition to this, managed types cannot have a user-defined copy constructor. This implies that copying an object that has a pointer to another dynamically allocated object will result in the CLR creating a copy of that object as well, and so on recursively. (Although the Microsoft reference does not seem to mention this.)

Functions with default arguments are not permitted; therefore one must use a wrapping function to achieve the same effect. I.e. for a function with a single default argument, one must create a new function that takes the original's non-default arguments as tramp variables and which calls that function, supplying a hard-coded default parameter.
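The wrapping technique is simply overloading, and can be sketched in standard C++ as follows. The Scaler class and the default value of 10 are invented for the illustration.

```cpp
#include <cassert>

// Traditional C++ would declare:  int Scale (int Value, int Factor = 10);
// The pair of overloads below gives callers the same two call forms
// without using a default argument.
class Scaler
    {
    public:
        // The original function, with every argument explicit.
        int Scale (int Value, int Factor) { return Value * Factor; }

        // The wrapper: Value is passed straight through (a tramp
        // variable) and the default for Factor is hard-coded.
        int Scale (int Value) { return Scale (Value, 10); }
    };
```

Callers may then write either Scale (3), giving 30, or Scale (3, 2), giving 6. The drawback is that the default now lives in a function body rather than in the declaration, so each defaulted parameter adds another overload to maintain.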

Operator Overloading

Operator overloading is also affected by the demands of the CLS and, while it is (mostly) still possible, the operator keyword cannot be used. For example the following constitutes a syntax error:


 __gc class Simple
    {
    public: bool operator == (Simple &RHS); // Error
                                            // Note: const cannot
                                            // be used

    };
      

Instead one must use a static function with a distinguished name, and operator overloads in __gc classes must have at least one parameter that is a pointer to the defining class. This function can then be called explicitly by quoting the distinguished name, or it can be invoked implicitly using conventional infix notation. For example, the equality operator is overloaded using op_Equality:


 __gc class Simple
    {
    public: static bool op_Equality (Simple *LHS, Simple &RHS) {...}

    };

 bool Compare (Simple P1, Simple P2)
    {

    if ( Simple::op_Equality (&P1, P2) ) return false;  // Explicit

    if ( P1 == P2 )                          return false;  // Implicit

    return true;

    };
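By way of analogy only, the relationship between the distinguished name and the infix form can be mimicked in standard C++, where the mapping that the MC++ compiler performs has to be written out by hand. The sketch below is not MC++; the constructor and the const qualifiers are additions that a __gc class would not carry.

```cpp
#include <cassert>

class Simple
    {
    public:
        explicit Simple (int Value) : i (Value) { }

        int i;

        // Static function with the distinguished name.
        static bool op_Equality (const Simple *LHS, const Simple &RHS)
            {
            return LHS->i == RHS.i;
            }
    };

// Standard C++ has no knowledge of op_Equality, so the infix form is
// forwarded explicitly. MC++ is described above as doing this itself.
inline bool operator == (const Simple &LHS, const Simple &RHS)
    {
    return Simple::op_Equality (&LHS, RHS);
    }
```

Both Simple::op_Equality (&A, B) and A == B then reach the same code, which is the effect that the distinguished names are intended to give across .NET languages.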
            

The following is a list of the operators that can be overloaded in MC++ and their conventional symbols:

   Unary operators

   op_Decrement            --
   op_Increment            ++
   op_Negation             !
   op_UnaryNegation        -
   op_UnaryPlus            +

   Binary operators

   op_Addition             +
   op_Assign               =
   op_BitwiseAnd           &
   op_BitwiseOr            |
   op_Division             /
   op_Equality             ==
   op_ExclusiveOr          ^
   op_GreaterThan          >
   op_GreaterThanOrEqual   >=
   op_Inequality           !=
   op_LeftShift            <<
   op_LessThan             <
   op_LessThanOrEqual      <=
   op_LogicalAnd           &&
   op_LogicalOr            ||
   op_Modulus              %
   op_Multiply             *
   op_RightShift           >>
   op_Subtraction          -

Note that __gc classes cannot overload the address-of operator.

Serious Restrictions

Some features of traditional C++ are preserved in MC++. For example, exception handling is still implemented using the normal try/throw/catch mechanism. Alternatively, one can use Structured Exception Handling (SEH) by means of the __try and __except keywords. SEH also adds the __finally keyword, which marks code that is executed whenever control leaves the associated __try block, whether normally or because an exception has been thrown. MC++ also adds the __try_cast keyword, which is similar in operation to the dynamic_cast mechanism in normal C++ and is intended for use within a try block, since an exception is raised if the cast cannot be performed.
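The nearest standard C++ counterpart to a cast that raises an exception on failure is dynamic_cast applied to a reference, which throws std::bad_cast. The sketch below uses hypothetical Animal, Dog and Cat classes and is offered as an analogy for the behaviour of __try_cast, not as MC++ code.

```cpp
#include <cassert>
#include <typeinfo>

class Animal { public: virtual ~Animal () { } };
class Dog : public Animal { };
class Cat : public Animal { };

// Returns true if the object really is a Dog; a failed reference cast
// throws std::bad_cast, which is caught and reported as false.
bool IsDog (Animal &A)
    {
    try
        {
        Dog &D = dynamic_cast<Dog &> (A);   // Throws on failure
        (void) D;                           // Silence unused warning
        return true;
        }
    catch (const std::bad_cast &)
        {
        return false;
        }
    }
```

IsDog returns true for a Dog and false for a Cat. The pointer form of dynamic_cast, by contrast, yields a null pointer on failure and so has to be checked by hand.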

However, there are features of C++ that are simply not supported in MC++ because they would conflict with the CLS. The most onerous of these is the rough treatment that templates receive. While it is possible to instantiate a template with a managed type, templates cannot have a managed type as a parameter type and, worst of all, there is no such thing as a managed template. For example the following code is not allowed:


 template <typename T>
 __gc class SimpleTemplate  // Error
   {


   };
      

Given that genericity is not supported then there can be no support for cross-language genericity. For example you cannot use a C++ template in a C# assembly, nor can you derive a C# class from a C++ template. The official word from Microsoft is that templates are not supported in this release; however, for various technical reasons, it is hard to see how they could be supported even in future releases. This will create problems for the adoption of .NET into the C++ community because a lot of very powerful techniques will be impossible; a managed version of the STL, for example, is out of the question. Certainly the suppression of templates in MC++ refutes the idea that .NET signals the end of C++ because the need for templates and the STL is too strong to overcome by other existing means.

In addition, RTTI cannot be used. In itself this may not be so bad because the CLR supports a much richer reflection model but it does impinge on porting existing code. A trivial bit of editing and some recompilation will not suffice and many points in existing systems will have to be rewritten.

Finally, __gc classes cannot have const member functions. Nor is volatile permitted. While volatile is not such a problem as its role has always been much more in low-level systems development (something that .NET, by definition, detracts from), the lack of const is pretty serious because it denies us the benefits of const correctness and denies optimisation opportunities to the compiler.

Conclusion

Managed C++ cannot be covered exhaustively here and, for example, Delegates, Events and Properties (a formalisation of accessor/mutator functions) have not been explored, nor has 'Boxing' received an airing. However, core issues are clear. Firstly, anyone who thinks that one can develop for the .NET platform in C++ as if nothing had happened is mistaken — syntax aside, the semantics are, in places, wildly different. Given this, the term 'Managed' in MC++ will be viewed by many as a euphemism for 'Bastardised'. With MC++ (or any other .NET language) one is actually working with C# semantics (i.e. the CLS), therefore systems that are developed from scratch may as well be coded in C# from the start. In that way you can at least enjoy clean syntax that is devoid of the extra (messy) keywords mandated by MC++.

Secondly, Microsoft cannot walk away from C++ and the principal reason, therefore, for them to support it on .NET is to allow the porting of existing code to the platform. It also wishes (political interests to the fore) to create a sense of 'old favourite on new platform'. It therefore positions things by saying that you can create managed code that bridges between existing (legacy, in their terms) code and other .NET components, so that unmanaged code can be moved over incrementally.

Some of the overviews of .NET support this by implying that there is virtually seamless interoperability between C++ and MC++. However, many existing applications will not port by the simple addition of a few keywords, because of semantic impedance. Given this, fresh design-decisions will have to be made or a wrapping approach will have to be taken (same thing). This can be accomplished by either embedding unmanaged classes within managed classes or by using managed pointers to managed objects, which wrap unmanaged objects in turn. However, these workarounds only serve to complicate an already complicated business — good software development seeks to simplify, and MC++ runs contrary to this.

Considering .NET generally, there is another issue, which is the 'cultural' impedance that exists between developers with different language-experience. The experience gap between same-language developers creates sufficient difficulties as it is, while language-choice can influence the way that developers think about a problem. .NET combines these factors, so that we end up with heterogeneity in both syntax and conception, and this can only add fresh fuel to the language wars — to unify semantics whilst dividing syntax does not seem favourable.

These negative conclusions aside, there is a thin silver-lining to all this, which is the vicarious benefit that comes from understanding the compilation issues that apply as a whole to C++/C#/Java etc. and the .NET runtime. The John Gough book mentioned below is thoroughly recommended for those who wish to plumb these fascinating depths.

References

Bertrand Meyer's .NET Training Course
2001 Prentice Hall, PTR.
ISBN 0-13-03315-5

Modern C++ Design: Generic Programming and Design Patterns Applied
Andrei Alexandrescu
Addison Wesley
ISBN 0 201 70431 5

BBC Good Food Magazine — March 2002
BBC Worldwide Publishing

Further Reading

Compiling for the .NET Common Language Runtime (CLR)
John Gough
Prentice Hall
ISBN 0 13 062296 6

Copyright © Richard Vaughan 2002