Large Systems

Many times over the years we have encountered an attitude towards code cleanliness that is summed up in the assertion that as systems grow in size, code quality will necessarily degrade. Often the argument is backed by references to ‘programming in the large’ as opposed to ‘programming in the small’. We have consistently found these assertions to be unfounded and this article is an attempt to explain why.

To do so, we will break a recent client’s confidence and show you the whole system we built. This system was an ‘enterprise’ system in that it was brought to life within an ecosystem that contained the usual array of multiple messaging technologies, databases (legacy and new) and bizarre mandated in-house technologies that performed functions we didn’t have a need for.

The system in question is an order management system, with a back end written in Java and a front end written in C#. The back end takes orders in the form of messages from multiple sources. Each source has its own message format and sometimes its own messaging technology. Examples of the different message formats and messaging technologies are: the C# GUI, which uses REST over HTTP to exchange XML messages; an electronic sales system that sends messages over JMS using a proprietary message format; and another electronic system that sends FIXML messages over a proprietary messaging system.

In addition, we had to back onto a legacy database for reference data, but used our own new database for storing our specific data.

I think it’s reasonable to say that this is not a toy problem. It is exactly the kind of problem that might set an expectation of low code quality, high defect rates, etc.

Now, without further ado, I will show you the whole system. Brace yourselves:

public class MessageConsumer<T> {
    private MessageTranslator<T> translator;
    private OrderCommandProcessor processor;
    private Responder<T> responder;

    public MessageConsumer(MessageTranslator<T> translator,
                           OrderCommandProcessor processor,
                           Responder<T> responder) {
        this.translator = translator;
        this.processor = processor;
        this.responder = responder;
    }

    public void consume(T message) {
        try {
            processor.process(translator.translate(message));
            responder.success(message);
        } catch (Exception ex) {
            responder.error(message, ex);
        }
    }
}

There it is. Nine months of work!

This class represents the entire function of the system. It accepts messages, translates them into a canonical form, processes them and informs the sender of the result.

So where is the complexity? This code is pretty clean; there’s no duplication, no excessive cyclomatic complexity, no global variables, no spooky action at a distance, etc.

What’s that? You’re saying I’m cheating. That all of the complexity must be in the MessageTranslator, OrderCommandProcessor and Responder. Yes, OK, there’s some more code there. As an irrelevant side note, there are actually multiple instances of the MessageConsumer, each configured with different MessageTranslators and Responders but the same OrderCommandProcessor. This is because, as stated earlier, there were multiple messaging technologies and message protocols in play and the abstractions represented by MessageTranslators and Responders adapt each of those to the one internal form that we want to handle.
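To make that side note concrete, here is a minimal sketch of the wiring, not the client's actual code. The interface shapes, the recording processor and responder, the CreateOrder command and the Wiring class are all assumptions made for illustration; only the MessageConsumer shape comes from the listing above.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed shapes for the collaborators; the article does not show these types.
interface OrderCommand { }

interface MessageTranslator<T> {
    OrderCommand translate(T message) throws Exception;
}

interface Responder<T> {
    void success(T message);
    void error(T message, Exception ex);
}

// Stand-in for the real processor: it only records the commands it receives.
class OrderCommandProcessor {
    final List<OrderCommand> processed = new ArrayList<>();

    void process(OrderCommand command) {
        processed.add(command);
    }
}

// Same shape as the class shown earlier, parameterised on the message type.
class MessageConsumer<T> {
    private final MessageTranslator<T> translator;
    private final OrderCommandProcessor processor;
    private final Responder<T> responder;

    MessageConsumer(MessageTranslator<T> translator,
                    OrderCommandProcessor processor,
                    Responder<T> responder) {
        this.translator = translator;
        this.processor = processor;
        this.responder = responder;
    }

    void consume(T message) {
        try {
            processor.process(translator.translate(message));
            responder.success(message);
        } catch (Exception ex) {
            responder.error(message, ex);
        }
    }
}

// Hypothetical responder that records outcomes instead of replying to a sender.
class RecordingResponder<T> implements Responder<T> {
    final List<String> outcomes = new ArrayList<>();

    @Override
    public void success(T message) {
        outcomes.add("success");
    }

    @Override
    public void error(T message, Exception ex) {
        outcomes.add("error: " + ex.getMessage());
    }
}

// Hypothetical canonical command produced by every translator.
class CreateOrder implements OrderCommand {
    final String source;

    CreateOrder(String source) {
        this.source = source;
    }
}

final class Wiring {
    // One shared processor; the translator and responder vary per message source.
    static OrderCommandProcessor demo() {
        OrderCommandProcessor shared = new OrderCommandProcessor();
        MessageConsumer<String> restConsumer = new MessageConsumer<String>(
                xml -> new CreateOrder("rest-xml"), shared, new RecordingResponder<String>());
        MessageConsumer<byte[]> jmsConsumer = new MessageConsumer<byte[]>(
                bytes -> new CreateOrder("jms"), shared, new RecordingResponder<byte[]>());
        restConsumer.consume("<order/>");
        jmsConsumer.consume(new byte[] {1, 2, 3});
        return shared;
    }
}
```

Each source gets its own consumer instance, but every command ends up in the same processor, which is all the 'multiple messaging technologies' requirement amounts to at this level.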

But since you insisted, let’s look at the OrderCommandProcessor to see if we can find some of these ‘large system’ problems:

public class OrderCommandProcessor {
    private OrderCommandLogger logger;
    private OrderBook orderBook;
    private ProductCatalog productCatalog;
    private ClientBook clientBook;
    private OrderValidator validator;

    public OrderCommandProcessor(OrderBook orderBook,
                                 ProductCatalog productCatalog,
                                 ClientBook clientBook,
                                 OrderValidator validator,
                                 OrderCommandLogger logger) {
        this.orderBook = orderBook;
        this.productCatalog = productCatalog;
        this.clientBook = clientBook;
        this.validator = validator;
        this.logger = logger;
    }

    public void process(OrderCommand orderCommand)
                throws ValidationException {
        orderCommand.validate(validator, productCatalog, clientBook);
        orderCommand.execute(orderBook);
        logger.success(orderCommand);
    }
}

Hmmm…. That doesn’t look too bad either. One question the astute reader might ask is why we have the OrderCommandProcessor at all. Why don’t we just have those three lines of code and the necessary dependencies in MessageConsumer? The Single Responsibility Principle notwithstanding, it looks like the introduction of an unnecessary class. Right?

Well, the MessageConsumer is one of many entry points into the system. All but one of the entry points are message queues of one kind or another. The C# GUI is the other one, and that sends XML over HTTP. Our corresponding servlet turns the XML into OrderCommands and then dispatches to the OrderCommandProcessor for the rest. But messages received from the GUI also required some additional processing, so we couldn’t simply have a MessageTranslator and Responder suitable for HTTP messages (actually, we could have done that and then decorated the OrderCommandProcessor with the extra stuff – this would have required an extra class anyway). In any case, separating out this functionality lets us process the OrderCommands uniformly, regardless of where they come from.

But still, it’s not complicated; we have a bunch of instances of MessageConsumer and our servlet, all appropriately configured to translate messages and hand them over to the OrderCommandProcessor to do the rest. That’s all fairly straightforward, so far.

However, validation always makes things more complicated. Maybe we can find some complexity in there. OrderCommand is an interface, because there are different kinds of command (create, update, cancel, execute, book, etc). So to see the validation we’ll have to look at a particular implementation:

public class CreateOrderCommand implements OrderCommand {
    private Integer salesPersonId;
    private Integer clientId;
    private SourceSystem sourceSystem;
    private Integer productId;
    private Quantity quantity;
    private Price price;
    private Side clientSide;

    public void validate(OrderValidator validator,
                         ProductCatalog productCatalog,
                         ClientBook clientBook)
                throws ValidationException {
        validateRequiredFields(validator);
        validateClient(validator, clientBook);
        validateSalesPerson(validator);
        validateProduct(validator, productCatalog);
        validateQuantity(validator, productCatalog);
        validatePrice(validator, productCatalog);
    }

    private void validateRequiredFields(OrderValidator validator)
                 throws ValidationException {
        validator.notNull(salesPersonId, "salesPersonId");
        validator.notNull(clientId, "clientId");
        validator.notNull(sourceSystem, "sourceSystem");
        validator.notNull(productId, "productId");
        validator.notNull(quantity, "quantity");
        validator.notNull(clientSide, "clientSide");
    }

    private void validateClient(OrderValidator validator,
                                ClientBook clientBook)
                 throws ValidationException {
        validator.validate(clientBook.hasClient(clientId), "client does not exist");
    }

    ...
}

Oh well, not too much complexity there either.

OK. I have shown enough of this system. What is my point? Well, it all looks the same! Yes… code at each level of this system looks the same. It is very hard to tell by looking at the code that one class is ‘enterprisey’, whereas another is ‘domainy’, other than by looking at their names. Where is the ‘large-system’ code? This code all looks ‘small-system’ to me. The code in each location is no more or less complex than the code in any other location, and it’s all pretty simple stuff.

All code, no matter what size of system it appears in, consists of the same language constructs to which is added some library usage plus whatever application specific abstractions are required. There is no difference between the code in ‘high-level’ classes and that in ‘low-level’ classes. A class doesn’t need to know that it is in a large system or a small system. The level of complexity at any point can be about the same, if you choose to distribute it appropriately.

What is true is that systems usually (but not always) become large over a longer period of time than small systems, giving them more time to accumulate bad decisions. There are however some practices that make it more likely that a system will suffer in this way. For example, ubiquitous use of language primitives to represent rich concepts invites complexity, subsequent attempts to work around the complexity, and introduction of ‘clever hacks’ to avoid disturbing too much code. These ‘clever hacks’ are often themselves the source of need for future workarounds.

The ‘large-system’ problems start as choices about the small stuff. As an example, consider the Side type in the CreateOrderCommand above. This represents the notion that an order can be a ‘buy’ or a ‘sell’ order. When we introduced this type, we had a choice. We could represent the concept of side with a language primitive (a char with value ‘B’ or ‘S’ maybe, a String with value ‘buy’ or ‘sell’, a boolean named ‘buy’ implying a sell if its value was false, etc.) or we could introduce something whose meaning was unequivocal. We chose the latter, and in so doing we aided future development: we left little room for misunderstanding, and we allowed safe extension of the notion of side. In this system, the concept did end up being extended with two more cases. Dealing with those cases at all the decision points in the code (some polymorphic, some explicit) simply became a matter of looking at the compiler errors and understanding what the business wanted in each case, rather than having to track down each decision point, often in response to a defect found in production.
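The article does not show how Side is defined, so the following is only a sketch of the idea, assuming an enum and a current Java compiler (switch expressions need Java 14 or later). The SideExample class and both method names are hypothetical.

```java
// Hypothetical definition; the real Side type is not shown in the article.
enum Side {
    BUY, SELL
}

final class SideExample {
    // A switch expression over an enum must cover every constant, so adding
    // a constant to Side makes this method fail to compile until the new
    // case is handled.
    static int signedQuantity(Side side, int quantity) {
        return switch (side) {
            case BUY -> quantity;
            case SELL -> -quantity;
        };
    }

    // The primitive alternative: any char is accepted, and an unexpected
    // value is silently treated as a sell.
    static int signedQuantityFromChar(char side, int quantity) {
        return side == 'B' ? quantity : -quantity;
    }
}
```

With the enum, extending the concept produces a compiler error at each explicit decision point. With the char, a new value such as 'X' compiles fine and quietly takes the sell branch.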

Another example of a ‘big system problem’ is that in order to add a new feature, a new piece of data needs to be passed through a large number of interfaces before it is finally used. Sometimes, developers decide to circumvent the need to touch a large number of classes by using some shared mutable state. This is ‘chewing-gum and string’ programming. While it preserves the appearance of the interfaces, it profoundly changes their semantics, introduces non-locality, hides something essential to understanding and will most likely result in defects (even without concurrency). It certainly results in something that has to be remembered or documented. Often this problem is due to a reluctance to add new abstractions – ‘primitive obsession’ and concern about ‘proliferation of classes’. It is frequently the case that if various domain specific abstractions are being passed as parameters, then the data required for new features will fit into existing abstractions. If the new data does not fit, then something significant has happened, and a team that thinks in terms of abstractions will not hesitate to make use of that new information to make their code more accepting of changes in the future.
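As a rough illustration of new data fitting into an existing abstraction, consider the sketch below. None of these types (OrderDetails, AccountType, OrderRouter, FeeCalculator) come from the real system, and the fee rule is made up; they are assumptions chosen only to show the shape of the change.

```java
// Hypothetical new concept needed by a new feature.
enum AccountType {
    RETAIL, INSTITUTIONAL
}

// Hypothetical domain abstraction that already flows through the interfaces.
final class OrderDetails {
    final int productId;
    final int quantity;
    final AccountType accountType; // the new data travels inside the abstraction

    OrderDetails(int productId, int quantity, AccountType accountType) {
        this.productId = productId;
        this.quantity = quantity;
        this.accountType = accountType;
    }
}

// Only the class that needs the new data reads it.
final class FeeCalculator {
    long feeFor(OrderDetails details) {
        long base = details.quantity * 2L;
        return details.accountType == AccountType.INSTITUTIONAL ? base / 2 : base;
    }
}

// Intermediate layer: its signature and body are untouched by the new field.
final class OrderRouter {
    private final FeeCalculator fees;

    OrderRouter(FeeCalculator fees) {
        this.fees = fees;
    }

    long route(OrderDetails details) {
        return fees.feeFor(details);
    }
}
```

Because OrderRouter passes an OrderDetails rather than a list of primitives, adding accountType changes one class that supplies it and one class that uses it. Nothing in between needs shared mutable state to smuggle it through.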

When we maintain ‘big’ systems, we are often involved with third party components. For example, the order management system above uses messaging systems, has an embedded HTTP server, persistence layers, etc. The degree to which these components pollute the application specific code is entirely a matter of choice. These components could have been allowed to run amok through the codebase above but instead are tightly contained and as a result, the code that deals with them amounts to a few dozen lines at the most, expressed in one or two classes in each case.
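A minimal sketch of what 'tightly contained' can look like follows, assuming a persistence component. The OrderStore interface, the Order class and this OrderBook are hypothetical, and a ConcurrentHashMap stands in for whatever vendor client the real adapter would wrap.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical domain object.
final class Order {
    final int id;
    final int quantity;

    Order(int id, int quantity) {
        this.id = id;
        this.quantity = quantity;
    }
}

// Application-owned abstraction: the rest of the codebase depends only on this.
interface OrderStore {
    void save(Order order);
    Optional<Order> find(int orderId);
}

// The one class that would know about the third party API. The map is a
// stand-in for a vendor client; no real persistence library is used here.
final class VendorBackedOrderStore implements OrderStore {
    private final Map<Integer, Order> vendorClient = new ConcurrentHashMap<>();

    @Override
    public void save(Order order) {
        vendorClient.put(order.id, order);
    }

    @Override
    public Optional<Order> find(int orderId) {
        return Optional.ofNullable(vendorClient.get(orderId));
    }
}

// Domain code sees only OrderStore, never the vendor types.
final class OrderBook {
    private final OrderStore store;

    OrderBook(OrderStore store) {
        this.store = store;
    }

    void add(Order order) {
        store.save(order);
    }

    boolean contains(int orderId) {
        return store.find(orderId).isPresent();
    }
}
```

Swapping the persistence component then means writing another small OrderStore implementation, while OrderBook and everything above it stay as they are.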

Collectively, the amount of code represented by these components completely dwarfs the order management system codebase. So the vast majority of the system is code that wasn’t even written by us. But none of those components requires us to deal with ‘big system’ problems. Why? Because most developers would rightly shy away from using a third party component if it invaded the rest of their codebase – consider the reaction of developers to using an ‘ORM’ framework that requires all persistent classes to implement a framework interface – so the implementers of those components have taken great care to make sure that their stuff can be used in simple, non-invasive ways.

Given that at any one time a developer sees a very small fraction of most codebases, and it hardly matters if that fraction is 0.01% or 0.0001%, the difference between small systems and large systems is simply that we deem a system to be large when we can no longer keep track of its idiosyncrasies. And that can happen in very small codebases indeed. These ‘large system’ problems actually have nothing to do with the size of the system, and everything to do with its quality.
