Each of the teams was putting a great deal of effort into resolving the issues that arise from this: difficulties pulling together a release because of conflicting changes, various strategies for avoiding the conflicting changes, tightly controlled ownership of modules producing a concentration of knowledge in one or a few people and bottlenecks in the process, building incredibly complex traceability matrices, etc.
Each of these streams was managed by different project managers (sometimes one project manager had two or more streams), but if there was management above that level (not always the case), it was advocating multiple streams so that ‘progress can be seen’ in each area.
In addition, the business and project management teams tended to view the issues that arose from this approach as technical problems or as failures of the developers. This is a classic example of the kind of problem that W. Edwards Deming argued against throughout most of the last century: management prescribes working practices that cripple those who do the work, and then berates those same people for failing to achieve the results predicted by the project management theory.
It seems that the principal reason that the business and management continued to expect results that were repeatedly not forthcoming was that they simply did not understand how the streams necessarily interact in the one essential product of development – the codebase. In one case, development process ‘experts’ were advocating an estimation method which depended not on the experience of the developers, but at least partially on developers’ estimates of the complexity of the class that must be developed in order to solve the problem. Any developer with the smallest experience will tell you that most development in object-oriented systems is achieved by modifying existing classes, not by creating new ones. The same is true for every programming paradigm, if ‘class’ is replaced with its equivalent module concept in that paradigm.
This blog is an attempt to provide an example to business and management of why this approach will not work, why there is no general technical solution that does not dramatically increase complexity, and therefore, why the problem should be solved by a process change that starts with the business and management.
To achieve this, I will provide a series of examples of the inevitable problems that arise. As most of my target audience will not understand any programming language, the examples will be in the form of a paragraph of text from a detective novel that has been ‘in production’ for some time, and which is still in active development. Each paragraph represents a unit that a developer would manipulate (a ‘class’ in object-oriented programming, for example). So imagine that the business provides requirements for how the novel should proceed, and the developers work on the necessary paragraphs to produce the desired result. Also imagine that this book, like software, will be released frequently and that, for example, bugs sometimes need emergency releases to correct.
Example
[TABLE=2]
As an example of a single stream of development, a new requirement has come in. The business wants to enhance the descriptions in this paragraph, and there have been some complaints that the language seems a little ‘stilted’. The developers get to work straight away and produce the following, revised version.
[TABLE=4]
Although it’s still a little stilted, we’re up against the deadline and needs must; the business agrees to put this into production. Everybody is very pleased, some money is made available by senior management and a party is thrown.
Now, the business has decided that they don’t want Bob to be blindfolded, but instead want the room to be almost dark so that sinister shadows can be seen. Work has started on this, and the paragraph – now a work in progress – looks like this:
[TABLE=5]
Now, a production bug has been found in V2; the tense is inconsistent in the first sentence. This is considered critical by the business, so they want an emergency bug fix. Clearly, the developers can’t simply fix the bug in V3 because it contains incomplete work for the next release. So the fix has to be based on version 2. Fortunately, developers have tools called ‘version control systems’ or sometimes, erroneously, ‘configuration management systems’, that allow us to deal with this very problem. Every time a developer ‘commits’ or ‘checks in’ the changes from their ‘working copy’ (the one on their hard drive) to the version control system, the old version is superseded, but still retained by the version control system. When other developers ‘update’ their local working copy from the version control system, they will see the changes that other developers have made. The version control system is therefore a way of retaining history and also allows developers to share their changes with each other in a controlled way.
Critically, for this emergency bug fix, it is possible to go back to earlier versions. So, the unfortunate developer chosen to fix the bug (the one who will be asked ‘is it done yet?’ every 5 minutes by the worried project manager), can retrieve version 2 of the paragraph and ‘branch’ it to create a version 2 bug fix branch. Now, side by side in the version control system we have these relevant versions of the paragraph (the bug fix is emboldened):
[TABLE=6]
This is great! The bug fix can be, and is, released from the bug fix branch as version 2.1. Everybody is a bit ashamed that such an obvious bug made it into production, so no party is thrown this time.
However, if we look at the version 3 branch (this is sometimes called the ‘trunk’ or the ‘head’ but I will refer to it here as just another branch), we can still see the bug. If nothing is done to correct it, we will have a regression when version 3 is released. As our developer is competent he will know that he has to merge his bug fix into the version 3 branch, or if version 3 has changed too much from version 2, figure out again how to fix the bug in version 3.
Because the version 3 branch is still very similar to the version 2 bug fix branch, it is easy to merge the fix in, and this is duly done:
[TABLE=7]
If any more bug fixes are required in the released 2.x version, they can be made in the Version 2 bug fix branch and released without inadvertently releasing incomplete version 3 work.
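For readers who do read code, the branch, fix and merge sequence above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how any real version control system works internally: the `Repo` class, its method names and the sample sentences are all invented for illustration and are not the novel's actual text.

```python
"""Toy model of commit, branch and merge for a single paragraph."""


class Repo:
    def __init__(self, first_text):
        # Each branch is a list of committed versions; nothing is ever deleted.
        self.branches = {"main": [first_text]}

    def commit(self, branch, text):
        """Supersede the latest version on a branch while keeping the old ones."""
        self.branches[branch].append(text)

    def latest(self, branch):
        return self.branches[branch][-1]

    def branch(self, source, index, new_branch):
        """Start a new branch from an earlier version of an existing branch."""
        self.branches[new_branch] = [self.branches[source][index]]

    def merge_fix(self, buggy, fixed, target):
        """Apply a fix to another branch, but only if the buggy wording is still there."""
        text = self.latest(target)
        if buggy not in text:
            raise ValueError("cannot merge automatically: wording has diverged")
        self.commit(target, text.replace(buggy, fixed))


# Invented stand-ins for the paragraph; "hits" is the tense bug.
V2 = "Ice cold water hits Bob in the face and he woke up with a start. He was blindfolded."
V3_WIP = "Ice cold water hits Bob in the face and he woke up with a start. The room was almost dark."

repo = Repo(V2)                # index 0 on main: the released version 2
repo.commit("main", V3_WIP)    # index 1 on main: version 3 work in progress

# Go back to version 2, branch it, and fix the bug there.
repo.branch("main", 0, "v2-bugfix")
repo.commit("v2-bugfix", repo.latest("v2-bugfix").replace("hits", "hit"))

# Carry the same fix into the version 3 work so the bug does not come back.
repo.merge_fix("hits", "hit", "main")

print(repo.latest("v2-bugfix"))
print(repo.latest("main"))
```

The bug fix branch never sees the unfinished ‘almost dark’ wording, and the version 3 branch picks up the fix only because its first sentence still matches the one that was fixed.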
Although simplified somewhat, the situation described above is a schematic of the common working practice of millions of developers worldwide. It works very well. The mechanisms used to achieve it are described in more detail by Mike Hogan here.
However, this approach starts to break down if we introduce another significant stream of development that works on a different timescale. Suppose that a new senior manager has arrived with the firm conviction that detective novels work best when they are written in the first person. This is viewed as quite a significant change that will take quite a while to achieve, so a new ‘project’ is launched to effect this change. A new team is formed to do this work, because the business demands ongoing enhancements to the book, and these will occupy the current team. Also, there are bug fixes. So now, we have three streams with three different release cycles.
Now the developers have a choice. They can create a new branch for the ‘first person’ work or they can try to do both sets of work in the main development branch (the Version 3 branch). Now some developers may relish this challenge. ‘Maybe,’ they will say, ‘we can achieve this through configuration, so that the new stuff remains hidden until we turn it on’.
If we try that approach, those developers need to either have conditional logic everywhere to switch between the first person and third person forms, or they need to find some abstraction of the first person / third person parts of the sentence.
Lots of conditional logic is a recipe for bugs, so they choose to abstract the ‘personness’ away. In this case they will need to understand the parts of speech represented by the third person ‘Bob’ and the equivalent first person ‘me’ in the first sentence and ‘call a function’ or ‘define some constants’ that will provide the right value, depending on what is configured. I am not an expert grammarian, but the version 3 branch below represents an attempt at this:
[TABLE=8]
Clearly, now there is a lot more ‘code’ required to support this decision. This is all stuff that developers working on version 3 or on the first person project have to think about whenever they do something. This is also stuff that obscures the ‘business logic’ of the paragraph. It is, as Fred Brooks termed it, ‘accidental complexity’.
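For the programmers in the audience, here is roughly what the same decision looks like in code. It is a minimal sketch, assuming a single sentence and a single configuration switch; the names `PERSON_FORMS`, `TEMPLATE` and `render` are hypothetical, and the sentence is a stand-in rather than the paragraph from the novel.

```python
# Hypothetical stand-in for one sentence of the novel.
PLAIN = "Ice cold water hit Bob in the face and he woke up with a start."

# Configurable version: every word that depends on the narrative person
# becomes a lookup that has to be maintained alongside the sentence.
PERSON_FORMS = {
    "third": {"object": "Bob", "subject": "he"},
    "first": {"object": "me", "subject": "I"},
}
TEMPLATE = "Ice cold water hit {object} in the face and {subject} woke up with a start."


def render(person):
    """Return the sentence for the configured narrative person."""
    return TEMPLATE.format(**PERSON_FORMS[person])


print(render("third"))
print(render("first"))
```

One line of plain text has become a template, a lookup table and a function, and anybody who edits the sentence now has to keep all three consistent.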
In addition, if we look at the text as it would appear in the first person, it doesn’t actually work very well; surely, I would ‘wake up with a start because ice cold water hit me in the face’, rather than ‘Ice cold water hit me in the face and I woke up with a start’.
We could also say ‘Bob woke up with a start as ice cold water hit him in the face’ but now, the first person project is imposing changes on the version 3 work – i.e. the longer timescale project is affecting the work of the shorter timescale project. In the end, those changes imposed by the first person (longer timescale) project will become too many and too heavy for the version 3, 4, 5, etc stream to manage. In addition, the version 3, etc stream will continuously be introducing changes that the first person team will have to deal with, slowing down their efforts. So, clearly, we have not actually managed to separate the projects; they are both causing work for, and influencing, each other.
Now think about what would happen to this approach if the business decides that detective novels are far more exciting if they are phrased in the present tense. Because this is another big effort, another project is kicked off that is going to work with the same material as the original Version 3 project. We will end up with something like this:
[TABLE=9]
Actually, there isn’t enough in the above abstractions to correctly deal with verb agreement, and that would further complicate the paragraph – maybe a good grammarian could do better, but it is unquestionable that we can no longer understand the paragraph easily. The central point is that complexity is being introduced not because of complexity in a business domain, but because of complexity in the process.
By the way, did you notice at least two bugs in the Version 3 stream above? One of them was actually introduced in the previous example and one was introduced in this example. No? Exactly! This approach is not going to work.
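The verb agreement problem is easy to show in code. The sketch below is an assumption-laden illustration, not the abstraction used in the example paragraph: the word tables and the `render` function are invented, and they cover only one sentence. The point to notice is that one of the verbs now depends on person and tense together, so the two ‘independent’ projects have to share a table.

```python
from itertools import product

# Hypothetical word tables; every entry is an assumption made for illustration.
PRONOUNS = {
    "third": {"object": "Bob", "subject": "he"},
    "first": {"object": "me", "subject": "I"},
}
# "hit" agrees with "water", so it depends on tense only.
HIT = {"past": "hit", "present": "hits"}
# "wake" agrees with the subject, so it depends on tense AND person.
WAKE = {
    ("past", "third"): "woke",
    ("past", "first"): "woke",
    ("present", "third"): "wakes",
    ("present", "first"): "wake",
}


def render(person, tense):
    """Build the sentence for one combination of person and tense."""
    forms = PRONOUNS[person]
    return (
        f"Ice cold water {HIT[tense]} {forms['object']} in the face "
        f"and {forms['subject']} {WAKE[(tense, person)]} up with a start."
    )


# Two switches already mean four variants of every sentence to check.
variants = {(p, t): render(p, t) for p, t in product(PRONOUNS, HIT)}
for key, sentence in variants.items():
    print(key, sentence)
```

Each further switch doubles the number of variants that somebody has to read and test for every sentence in the book.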
What if we tried the same approach as we used for the bug fix branch earlier? Although it may seem logical, given what I have said above, to separate the two major streams of development by branching again, there are significant problems associated with branching over long time scales. The main one is that as more changes are made to each branch, it becomes increasingly difficult to see the similarities between the branches, and this makes merging of changes more and more difficult until it simply ceases to be possible. All work must be done twice, and the inevitable bugs that arise from this will need to be fixed.
To see this, consider what would happen if the ‘first person’ project was launched before the 2.1 bug had been spotted and fixed:
[TABLE=10]
Now, upon reading the First Person Branch version, the events appear to be in the wrong order and the business requests some revisions (in italics):
[TABLE=11]
Now, the production bug fix is required, and applied to the bug fix branch and merged into the version 3 branch as before:
[TABLE=12]
But, because of the revision to the first sentence in the first person branch, it is far harder to see how to merge the bug fix. We must now understand the restructuring of the sentence and identify where to place the equivalent, not identical change. If we do not do this, then we will have a regression when the first person branch finally goes into production for a bug that was eliminated several versions earlier.
But of course, we can actually correct the First Person Branch too; it just takes more effort (or, if you prefer, more time, money and risk).
[TABLE=13]
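In code terms, the difference between the two merges looks something like the following. This is a deliberately crude sketch: real merge tools compare lines of text rather than searching for a phrase, the `apply_fix` helper is hypothetical, and the sentences are invented stand-ins for the branches above.

```python
def apply_fix(text, buggy, fixed):
    """Apply a textual fix, refusing if the buggy wording cannot be found."""
    if buggy not in text:
        raise ValueError("fix does not apply: wording has diverged")
    return text.replace(buggy, fixed)


# Invented stand-ins; "hits" is the tense bug in both branches.
V3 = "Ice cold water hits Bob in the face and he woke up with a start."
FIRST_PERSON = "I woke up with a start as ice cold water hits me in the face."

# The fix exactly as it was made on the bug fix branch.
BUGGY, FIXED = "water hits Bob", "water hit Bob"

# Version 3 still contains the original wording, so the fix carries over as is.
merged_v3 = apply_fix(V3, BUGGY, FIXED)

# The first person branch has been restructured, so the same fix is rejected.
try:
    merged_first_person = apply_fix(FIRST_PERSON, BUGGY, FIXED)
    needed_manual_fix = False
except ValueError:
    # Somebody has to read the sentence and work out the equivalent change.
    merged_first_person = FIRST_PERSON.replace("water hits me", "water hit me")
    needed_manual_fix = True

print(merged_v3)
print(merged_first_person, "(manual fix)" if needed_manual_fix else "")
```

The manual step is a single line here only because the example is one sentence long and we already know where the bug is.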
Furthermore, once version 3 has been released, version 4 will be required, and then version 5. The first person branch must include all of those enhancements too, suitably modified to suit the restructured sentences. Now add the complexity of the ‘present tense’ work on top of this. I don’t think another example is required to show how intricate (i.e. risky and defect-ridden) this is going to become.
So, getting the teams to work with exactly the same material doesn’t really work, and neither does branching the material and getting the teams to keep them synchronised.
In addition, consider the fact that we have looked at a single paragraph in these examples, but in a moderately sized project, it is not unusual to find thousands to hundreds of thousands of ‘paragraphs’ that must be kept consistent, not only internally, but with each other too. If this does not happen, defects will abound, regressions will occur and there will be (and is) a wailing and a gnashing of teeth. This is an indication that the word ‘risk’ that I have used several times is too weak; it is almost a certainty that this risk will materialise and the problems encountered will overwhelm the developers, and the result will be massive schedule slips for all streams.
Summary of the Problem
The essential point is that running projects in this way is an utter project management and business fantasy. It cannot be done efficiently and will lead to problems that cost time and money and increase the risks associated with the project.
These are not technical problems because there is no sensible technical solution. Even though developers can work in these unproductive ways and, after a lot of pain for everybody involved, produce a viable result, choosing to do so is a project management or business decision. So even when the problem is ‘handled’ by the development team, the problem and its ‘solution’ are chosen by project management or the business: spend much more money and much more time, and deal with the fallout, which will be significant. Developers cannot be held accountable for these consequences; they are entirely of the making of the project management and the business. The time, costs and risks associated with this approach must therefore be accepted by project management and the business.
If this is not considered acceptable then a simple fact must be understood: whenever you see a problem at one stage of a process that has no known solution at that stage, you must either be prepared to invest a lot of time (money) trying to find a solution in that or subsequent stages and accept the risk of failure, or you must look at a previous stage to prevent the problem from occurring.
Given that most business and project management teams are averse to spending the project budget by asking technical teams to research such problems (and in this case, there is very little hope of finding a technical solution), the only remaining option is to look at the approach chosen by project management and the business.
Avoiding the Problem
One important step to take is to stop slicing the ‘resource pool’ and to start slicing the work in the right way. Obviously, arbitrary slicing of the work will not yield benefit; it has to be sliced in ways that result in deliverable pieces of functionality.
As an example of the kind of dysfunctional ‘allocation of resources’ we have seen, a recent project involved one ‘team’ of 7 developers. One of the developers was assigned to a piece of work that was estimated to take 9 elapsed months to complete, another 3 were assigned to another piece of work that was estimated at 7 months elapsed time, and the final three to another piece that was estimated at 3 months elapsed time.
Any piece of work that is estimated at 9 months of developer time can easily and efficiently be worked on by 7 developers at once, and the result will be that, assuming the original estimate is good, the time to completion will reduce to just over a month. There will not be massive ‘communication overhead’ and developers will not be tripping over each other as a result of this. Our experience has been that the focus generated by this kind of approach actually keeps the whole effort on track, and can help to reduce the amount of work required. Once this piece of work has been completed, it can be released (and can start to produce some return on investment) and the team – now a genuine team – can move onto the next piece of work together.
In addition, it is almost always the case that when a development team is handed a set of ‘requirements’, no matter how pared down the business thinks it is, more can be done to trim off work: by slimming each requirement down, by selecting a core subset that can be released and which can be rapidly followed by the remainder, by challenging the business on their understanding of the system as it stands, and many other approaches.
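The arithmetic behind that claim can be checked with the figures from the project described above. The sketch below is only as good as its assumptions: it takes the original estimates at face value and assumes, as argued here, that effort divides evenly across the whole team. The function names and the labels A, B and C are mine.

```python
TEAM_SIZE = 7

# (label, developers assigned, estimated elapsed months), as in the project above.
PIECES = [("A", 1, 9), ("B", 3, 7), ("C", 3, 3)]


def parallel_finish(pieces):
    """Each piece runs alongside the others with its own developers."""
    return {name: float(months) for name, _devs, months in pieces}


def serial_finish(pieces, team_size):
    """The whole team takes the pieces one at a time, smallest effort first.

    Assumes developer-months of effort divide evenly across the team.
    """
    finish, clock = {}, 0.0
    for name, devs, months in sorted(pieces, key=lambda p: p[1] * p[2]):
        clock += devs * months / team_size
        finish[name] = clock
    return finish


parallel = parallel_finish(PIECES)
serial = serial_finish(PIECES, TEAM_SIZE)

for name in parallel:
    print(f"{name}: parallel {parallel[name]:.1f} months, serial {serial[name]:.1f} months")
```

Under these assumptions the three pieces are released after roughly 1.3, 2.6 and 5.6 months when the whole team works through them in turn, against 9, 3 and 7 months when they run side by side.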
All of this amounts to a simple notion. In order to avoid the problems described above, the amount of work in progress needs to be reduced and the team organised around that work to get it done. It is poor project management and a poor process that requires several plates to be kept spinning at once.
The measure that we should value is the amount of work done, not the amount of work in progress. But what do we mean by ‘done’? If it is in production and providing business value, it is ‘done’. Otherwise, it is decidedly not done. Work that is not done is better not started, until it is possible to get it done quickly and efficiently.
In my experience, it’s worth taking that parenthetical on ROI and making it a major argument.
If you ask the businesspeople to estimate the return for each item, you can put together a simple spreadsheet showing two scenarios: many projects in parallel, or each project serially. Show costs, returns, and net, both per project and for the whole set. What you’ll see is a decreased need for capital and increased return on investment.
As a bonus, you get to release things sooner, so your company looks like a vigorous market leader.
Even if there were no additional cost to spinning multiple plates, the business case for minimum work in process and focusing as much as possible is pretty compelling.
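The spreadsheet described above can also be mocked up in a few lines of Python. Every number here is invented purely to show the shape of the comparison: three equal projects, a team that costs 60 a month for three months, and a return of 40 a month from each project once it is released. The function names are hypothetical too.

```python
def cash_flow(finish_months, months_of_work, horizon, monthly_cost, monthly_return):
    """Return the cumulative net cash position at the end of each month.

    finish_months lists the month at whose end each project is released.
    """
    balance, history = 0, []
    for month in range(1, horizon + 1):
        if month <= months_of_work:
            balance -= monthly_cost
        # A project earns in every month after the one in which it is released.
        balance += monthly_return * sum(1 for f in finish_months if f < month)
        history.append(balance)
    return history


def capital_needed(history):
    """Deepest point of the cumulative deficit, i.e. the money to be found up front."""
    return max(0, -min(history))


# Hypothetical figures: same team, same total work, same one-year horizon.
COMMON = dict(months_of_work=3, horizon=12, monthly_cost=60, monthly_return=40)

parallel = cash_flow([3, 3, 3], **COMMON)  # all three released together
serial = cash_flow([1, 2, 3], **COMMON)    # one released each month

print("parallel: capital", capital_needed(parallel), "net after a year", parallel[-1])
print("serial:   capital", capital_needed(serial), "net after a year", serial[-1])
```

In this invented scenario the serial plan needs 80 of capital rather than 180 and ends the year at 1020 rather than 900, for exactly the same spend.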