As software developers we enjoy a curious luxury over those who build other kinds of system. Fundamentally, all software is composed of discrete elements that occupy a finite number of states and that determine changes in state. The potential for recursive application of this principle is, in theory, infinite, and underlies the proliferation and success of computing systems across the world. Yet we pay for our luxury; software construction is challenging and characterised by high rates of failure. In response, periodic initiatives, such as Model Driven Architecture (MDA), seek resolution through ever-greater automation. However, just as the notion that heavy objects fall faster than lighter ones is obvious and wrong, such convergent strategies are based on a specious premise, and may even exacerbate the problem. This article explores and refutes the reasoning behind such approaches.
A version of this article appeared in the Jan/Feb 2005 issue of Application Development Advisor magazine, under the title 'The Devil is in the Detail' (a title that captures the matter only generally, and which is nowhere near as much fun as implying that proponents of any kind of MDA approach are as well-meaning but daft as Buzz Lightyear).
It is a fundamental tenet of computer science that a set of machine-based instructions and data is entirely finite and deterministic in operation. This allows the construction of systems as collections of such sets, where the behaviour of a larger-scale component arises solely from that of its smaller-scale components. This permits the abstraction of behaviour, which, in turn, is the essence of programming languages. We need not, for example, specify a loop in assembler when a compiler can map 'while...' to a corresponding machine representation. Moreover, such abstractions can themselves be automated — to extend the example, one loop can be nested within another.
This principle implies systems of arbitrary size that will operate perfectly. However, it can also seduce one into thinking that the more a concept is abstracted, the greater its universal potential. Greater automation, goes the argument, captures a greater number of systems using less detail; therefore the complexities that bedevil software development can be reduced significantly by automating at a higher level. This is the argument behind MDA, where functionality is abstracted further than in current approaches, with the goal of greater productivity and reliability.
Yet first principles also show a flaw in this argument. Within the discrete/deterministic scheme there is the question of efficiency, in that some elements of a system incur a cost, but do not contribute directly to its functionality. These costs can take the form of overheads, or just pure deadweight; and minimising such redundancy is the principle by which compiler optimisations and lossless data-compression algorithms operate. Run-Length Encoding, for example, replaces contiguous blocks of a particular value with a single token representing that value, along with a token stating how many times it repeats. Critically, this is a form of automation, and is also the static equivalent of a procedural loop.
This principle, however, cannot be applied infinitely. Although there is no upper limit to the inefficiency in a system, there is a lower limit for RLE and the like, beyond which increasing loss of functionality would occur. If this were not the case a string could be squirreled away into zero bits, only to be resurrected in full subsequently, as the diagram illustrates.
Clearly, this is impossible; one can automate the inefficiency out of a system, but not the details of the system itself. This implies a 'conservation of complexity', and applies equally to spatial and temporal representations — procedural code is subject to this limitation as much as static data. In physics, energy and matter can only be converted from one to the other, or moved from place to place; they cannot be created or destroyed. Similarly, complexity is always conserved in software. In plain language: one must, sooner or later, confront the fiddly details.
Moreover, this shows that automation does not yield something for nothing. Automation is the very crystallisation of regularity, but in describing a general characteristic, it suppresses specifics. Any use of regularity must therefore be salted with appropriate irregularity to deliver the specificity we seek. Obviously, automation (and therefore abstraction) allows transaction of design concepts at a higher level, which is convergence. But this just replaces an existing challenge with the question of how to combine those abstractions in original and interesting ways — divergence is thus re-introduced, and so our craft progresses.
Applying this argument to MDA shows that while high-level, abstract modelling of a system may capture its generalities, the specifics must still appear in the equation at some point. Taking a translative approach simply pushes the complexity to a higher level, meaning that 'action-semantic language' descriptions of procedure must capture the specifics in the same way that coding in C++/Java/Smalltalk etc. does currently. Proponents may argue that abstraction at this level precludes much of this specification, but this just forces re-emergence of complexity at a lower level, which is the implicit concession in elaborative MDA.
Similarly, platform-independent design still mandates platform-specific code; therefore the complexities of implementing a given feature in a given way must still be fielded. The counter-argument is that transformation tools can encapsulate this, yet adapters, virtual machines, and JIT compilation of intermediate-language files are established approaches to working with platform dependencies. In this sense MDA offers nothing new, while compromising its own philosophy by using metadata to annotate PIMs. This is 'specified independence', which is a contradiction — to echo Dan Haywood's comment[Haywood(1)] regarding action-semantic languages, the platform comes back to haunt us.
There is a persistent perception that software development is an engineering discipline with an appalling track record[Haywood(2)]. The parallel is that if we built bridges with the same reliability as software then a large proportion would remain uncompleted, be too dangerous to use, or fail outright. However, while the track record is fact, programming is not engineering. This is because physical systems may be deterministic (remove a girder, and the bridge will fall), but their components are not truly discrete at the Newtonian level. A crankshaft, for example, is composed of a population of atoms and molecules, and it is the statistical behaviour of that population that yields the crankshaft's properties.
This is why it is possible to build physical systems using components that possess minor flaws — in the same way that the loss of minor detail that may accompany JPEG encoding does not affect image quality unduly. As long as the magnitude of a flaw remains within certain limits then the component will not fail. Even if it does then the overall system may still operate properly, despite lower-level compromise, and thus one can use high-level concepts reliably in the design of physical systems — 'tolerance' absorbs disjuncture between abstraction and implementation, especially when applied recursively.
Contrive, however, to pass a signed integer to a routine that expects an unsigned one (a difference involving a single bit), and the 'if' and 'when' of (often systemic) failure resolve to simple Russian roulette. Conservation of complexity distinguishes logic from physicality; therefore we should never apply the term 'engineering' to programming, no matter how alluring the comparison may be.
Ultimately, it may be that the mean rates of software reliability and developer productivity will remain much the same. In some respects, therefore, approaches such as MDA can be seen as more damaging than useful. If developers promise to resolve the intrinsic challenges of software construction and then fail to deliver, the perception will be that we persist in getting it wrong. The more this is reinforced, the more the emergence of palpably useful principles and techniques will be dismissed as yet another cry of wolf. In the meantime, the true and un-encodable mediators of convergence and divergence, namely aptitude, experience and creativity, remain a constant.
[Haywood(1)] Dan Haywood, 'MDA: Nice Idea. Shame About the...', May 2004. www.theserverside.com/articles/article.tss?l=MDA_Haywood
[Haywood(2)] Dan Haywood, 'Evaluating the Model Driven Architecture (OMG Response)', Application Development Advisor, January/February 2003.
ADA also printed a response from the magazine's review board. This was as follows:
An interesting argument, which may provoke interesting reactions from both supporters and detractors of MDA. Richard's argument, that MDA is bad because there is a point beyond which we cannot compress information without loss, holds only if we are currently at a point where our representation of systems is without redundancy.
This morning I was working on some code for a J2EE system and we generated hundreds of lines of code (not using MDA I hasten to add) without any enhancement of the fundamental model. It seems clear to me that such code could be compressed out of the picture leaving us with the same 'essential complexity' but in a much smaller representations [sic]. There are many approaches for doing this, of which MDA is just one and perhaps not the best one, but that there is value in such compression seems self-evident.
I would have more doubt over whether MDA will succeed in being the most widely accepted approach, than about whether such compression is possible or effective. Some approach or other is required for combining the patterns associated with specific platforms with the design models of required systems, and I await with interest to see which schemes emerge as the most effective.
In response to this:
Firstly, the argument does not assert that 'MDA is bad because we cannot compress information beyond a certain point without loss'. It uses the fact of the lower limits to loss-less compression to demonstrate that automation cannot deliver something for nothing, and from this, that it is dangerous for initiatives such as MDA to promise solutions to the intrinsic challenges of software development.
Moreover, the assertion that 'the argument holds only if we are currently at a point where our representation of systems is without redundancy' is tangential. If the potential for greater automation of an existing approach offers no greater advantage then that potential is immaterial, and the argument therefore holds without qualification. Only developers or automation can manage complexity in software, and the more that MDA-style approaches demand decisions from programmers, the more they fail in their promise, and the less they challenge established techniques. On the other hand, the more they automate, the more they shift the responsibility for specification to a higher level, whereupon the argument is invoked recursively.
Given this, the question of 'which MDA' cannot take precedence over the question of validity; if an idea holds nothing over established techniques then pondering which flavour to try is superfluous. The fact is, however, that there have always been those who will try MDA-style approaches, wherein certain variants may gain favour over others, and here I share Andy's curiosity. However, the challenge of specifying a set of symbols so as to generate a given system correctly will always remain, and thus the 'cry wolf' argument is invoked.