Burndown, Prediction, Confidence and Risk

One of Casual Miracles’ clients had a long-running project with a big specification up front. This is not the mode in which we usually like to work. It is, however, an opportunity for an experiment with burndown charts.

With estimates obtained for all of the work and contracts duly signed, the work began, and we tracked the burndown:

The chart above shows the first 43 days of burndown.

One common question that we try to answer with burndown charts is: ‘What day do we expect the burndown to reach 0?’ That is: ‘When do we expect to finish?’ The assumption is that historical performance predicts future performance. In our case, the question is more like: ‘Is there any reason to suspect we might not finish by the contracted date?’

One simple way of answering this question is to look at ratios: we have completed (551 - 302) = 249 estimated days of work in 43 days, so we can expect to take 43 * 551 / (551 - 302) ≈ 95.2 days, or 96 days once rounded up, to finish everything.
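The ratio calculation above can be sketched in a few lines of Python. The function name `ratio_projection` is only illustrative; the figures (551, 302, 43) are the ones quoted in the text.

```python
import math

def ratio_projection(initial_estimate, remaining_estimate, days_elapsed):
    """Project the total duration, assuming the average burn rate so far continues."""
    completed = initial_estimate - remaining_estimate
    if completed <= 0:
        raise ValueError("no work completed yet; cannot project a finish day")
    return days_elapsed * initial_estimate / completed

projected = ratio_projection(551, 302, 43)
print(round(projected, 1))   # 95.2
print(math.ceil(projected))  # 96, rounding up to a whole day
```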

A more ‘sophisticated’ approach is to put a linear regression line through the burndown and extrapolate. This tells us that we will reach 0 remaining estimate on day 85, but it fails to give an indication of the confidence we can have in this prediction. We can use the correlation coefficient to give some idea of this, but it is difficult to know what to do with it.

In fact, the questions above are not really the right ones. The question we should be asking is: ‘How confident can we be that the burndown will reach zero on or before a given date?’
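As a rough sketch of the regression approach, the following fits an ordinary least-squares line and extrapolates it to zero. The data series is hypothetical (the project's daily figures are not reproduced in this article), and the helper names `fit_line` and `zero_crossing` are illustrative only.

```python
import math

def fit_line(days, remaining):
    """Ordinary least-squares fit; returns (slope, intercept, correlation r)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(remaining) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    syy = sum((y - mean_y) ** 2 for y in remaining)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, remaining))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def zero_crossing(slope, intercept):
    """Day on which the fitted line reaches zero remaining estimate."""
    if slope >= 0:
        raise ValueError("burndown is not decreasing; the line never reaches zero")
    return -intercept / slope

# Hypothetical data, NOT the project's real burndown.
days = [0, 1, 2, 3, 4, 5]
remaining = [100, 97, 91, 90, 84, 80]
slope, intercept, r = fit_line(days, remaining)
print(f"finish day ~ {zero_crossing(slope, intercept):.1f}, r = {r:.3f}")
```

Note that the correlation coefficient says how well a straight line fits the history; it does not translate directly into a confidence about the finish day.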

In order to answer this, we need a different approach. One such approach is based on the Monte Carlo Method, which in this case translates to asking: ‘Assuming the future burndown is similar to the historical data, what fraction of the possible futures result in the burndown reaching zero on or before a given date (or between two dates)?’

We can apply this method by starting with the estimate on the last burndown day for which we have an estimate (day 43), choosing a day from the history randomly, determining the change in burndown between this day and the preceding one, and then applying that change to the estimate we have for day 43. This then becomes the estimate for day 44 and we repeat the process, until the estimate reaches 0. This then represents one simulated burndown, and we repeat the whole process thousands of times to obtain a distribution of estimates for each future day.
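The resampling procedure just described might be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the charts: the history is hypothetical, the function names are invented, the remaining estimate is clamped at zero, and a `max_days` cap is added so that a history with little net progress cannot loop forever.

```python
import random

def daily_changes(history):
    """Change in remaining estimate between each day and the preceding one."""
    return [today - yesterday for yesterday, today in zip(history, history[1:])]

def simulate_once(history, rng, max_days=10_000):
    """One simulated future burndown: remaining estimate for each future day."""
    changes = daily_changes(history)
    if not changes:
        raise ValueError("need at least two days of history")
    remaining = history[-1]
    future = []
    while remaining > 0 and len(future) < max_days:
        # Pick a random historical day and apply its change (assumption: clamp at 0).
        remaining = max(0, remaining + rng.choice(changes))
        future.append(remaining)
    return future

def simulate_many(history, runs, seed=0):
    """Repeat the simulation `runs` times with a seeded generator."""
    rng = random.Random(seed)
    return [simulate_once(history, rng) for _ in range(runs)]

# Hypothetical history, NOT the project's real burndown.
history = [60, 58, 58, 55, 51, 52, 48, 45]
futures = simulate_many(history, runs=1000)
print(len(futures), "simulated burndowns; the first lasts", len(futures[0]), "days")
```

Each entry of `futures` is one possible future; the number of days it takes to reach zero is simply its length.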

The chart below shows the first result from applying this method:

Burndown With Median Line

The line extending the historical burndown is the ‘Median Line’. This is obtained by determining the estimate for each day for which half of the simulations yield a higher estimate and half yield a lower estimate. Where this line intersects the Day Number axis (day 97), it tells us that half of the simulated burndowns have reached a zero estimate on or before day 97 and half have not. Based on this information (and speaking somewhat informally), we have a half chance of delivering on or before day 97 and a half chance of delivering afterwards.

This is interesting information, but it is reasonable to ask what the spread of simulation results is. After all, if all simulations terminate within a very tight spread centred around day 97, we can still predict the end day reasonably accurately. If, however, the spread is very wide, some other action might be suggested.
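One way to compute such a median line from a set of simulated burndowns is sketched below. The three tiny simulations are made up for illustration, and treating a finished simulation as staying at zero afterwards is an assumption of this sketch.

```python
from statistics import median

def median_line(futures):
    """Per-day median of the remaining estimate across all simulations."""
    horizon = max(len(f) for f in futures)
    # Assumption: a finished simulation stays at zero for the rest of the horizon.
    padded = [f + [0] * (horizon - len(f)) for f in futures]
    return [median(day_values) for day_values in zip(*padded)]

def first_zero_day(line, last_known_day):
    """Day number on which the line first reaches zero, or None if it never does."""
    for offset, value in enumerate(line, start=1):
        if value <= 0:
            return last_known_day + offset
    return None

# Three made-up simulations, each listing the remaining estimate per future day.
futures = [[5, 0], [6, 3, 0], [7, 4, 1, 0]]
line = median_line(futures)
print(line)                                     # [6, 3, 0, 0]
print(first_zero_day(line, last_known_day=43))  # 46
```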

We can obtain an idea of the spread in the same way as we obtained the median line. For each simulated day, we ask: ‘What range of estimates (centred around the median value) contains x% of the simulated results?’ This will give us ‘confidence intervals’ around the median:

Burndown With Confidence Intervals

The chart above shows the 50%, 90% and 99% confidence intervals. Consider the 90% confidence interval, which is enclosed by the dark green curves starting at the final known burndown day (43, 302) and terminating at (74, 0) and (124, 0). On each simulated day, the set of estimates between these two curves contains 90% of the simulated burndowns. Because we placed the intervals symmetrically around the median line, the space underneath the lower curve represents 5% of the simulated estimates (those that represent the team performing above expectations), and the space above the upper curve represents the final 5% of the simulated estimates (those that represent the team performing below expectations).
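A central interval of this kind can be read off from percentiles. The sketch below uses `statistics.quantiles` with 0.5% steps so that the 50%, 90% and 99% intervals all land on exact cut points; the function names are illustrative, and padding finished simulations with zeros is an assumption of this sketch.

```python
from statistics import quantiles

def central_interval(values, coverage_pct):
    """Return (low, high) bounding the central coverage_pct% of values."""
    tail = (100 - coverage_pct) / 2                      # e.g. 5 for a 90% interval
    cuts = quantiles(values, n=200, method="inclusive")  # cut points every 0.5%
    low_idx = round(tail * 2) - 1
    high_idx = round((100 - tail) * 2) - 1
    return cuts[low_idx], cuts[high_idx]

def interval_band(futures, coverage_pct):
    """Central interval of the remaining estimate for each simulated day."""
    horizon = max(len(f) for f in futures)
    # Assumption: a finished simulation stays at zero for the rest of the horizon.
    padded = [f + [0] * (horizon - len(f)) for f in futures]
    return [central_interval(day_values, coverage_pct) for day_values in zip(*padded)]

# Evenly spread made-up values 0..200: the 90% interval leaves 5% in each tail.
print(central_interval(list(range(201)), 90))  # (10.0, 190.0)
```

`interval_band` takes a list of simulated burndowns (each a list of remaining estimates per future day) and returns one `(low, high)` pair per day, which is what the envelope curves plot.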

Clearly the spread is quite wide! If we wanted to give a date that represented 95% confidence, we should choose day 124, which is 27 days after the median line predicts, or a difference of about 28%. It is also 28 days (about 29%) after our simple ratio-based estimate given at the beginning of this article and 39 days (about 46%) after the linear regression estimate.
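Given the finish day of each simulation, a date for a required confidence (and, conversely, the confidence for a given date) can be read off directly. This sketch uses the nearest-rank percentile; the finish days are a made-up, evenly spread set, and the function names are illustrative.

```python
import math

def fraction_finished_by(finish_days, day):
    """Fraction of simulations that reach zero on or before `day`."""
    return sum(1 for d in finish_days if d <= day) / len(finish_days)

def day_for_confidence(finish_days, confidence_pct):
    """Earliest day by which at least confidence_pct% of simulations have finished."""
    ordered = sorted(finish_days)
    rank = math.ceil(confidence_pct * len(ordered) / 100)  # nearest-rank percentile
    return ordered[max(rank, 1) - 1]

# Made-up finish days, one per simulation, NOT results from the real project.
finish_days = list(range(1, 101))
print(day_for_confidence(finish_days, 95))    # 95
print(fraction_finished_by(finish_days, 50))  # 0.5
```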

The astute reader might notice two important things:

  1. The day at which the confidence interval curves reach a Remaining Estimate of 0 spreads out alarmingly as the required confidence increases. That is, the 50%, 90% and 99% upper confidence envelopes require 10%, 28% and 45% more days respectively to reach 0 than the median line does.
  2. The confidence intervals are distributed symmetrically around the median line when a vertical slice is taken through the chart. Indeed, the histograms giving rise to these confidence intervals are near-Gaussian (courtesy of the Central Limit Theorem), at least until some of the simulations are terminated when they reach 0 remaining estimate. This is not true when a horizontal slice is taken. In particular, when those curves reach the Remaining Estimate of 0, it is plain to see that, for example, the lower envelope for the 90% confidence interval is 97 - 74 = 23 days below the median, whereas the corresponding upper curve is 124 - 97 = 27 days above the median.
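The asymmetry in the second point can be checked with simple arithmetic on the day numbers quoted for the 90% interval (74, 97 and 124). The variable names here are illustrative only.

```python
# Day numbers quoted in the article for the 90% interval and the median line.
median_day, lower_day, upper_day = 97, 74, 124

early = median_day - lower_day  # days the lower envelope finishes before the median
late = upper_day - median_day   # days the upper envelope finishes after the median

print(early, late)                 # 23 27
print(f"{late / median_day:.0%}")  # 28%
```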

The first of these points suggests that, even at this advanced state of the project, requiring high confidence about the delivery date demands significant ‘contingency’, with the 99% upper confidence envelope prediction being more than a factor of 2 larger than the corresponding lower confidence envelope prediction. Similarly, the ratio between the upper 99% confidence envelope prediction and the median prediction is just under 1.5.

The second point suggests that it is easier to run late than it is to run early, which will come as no surprise to development teams and their customers everywhere. Indeed, the skew also implies that when things are running late, it is possible for even more tardiness to result. This is intuitively true since, with less work remaining to be done, it is less likely that significant estimation ‘errors’ will occur.

Of course, we should not take all of this too seriously; a great deal has been made of very little historical data. Attention to a burndown chart often causes teams to adjust scope and other factors to bring the project in on time. The date is far more likely to be missed because of some unmitigated risk, such as the absence of a team member for an extended period of time.

In a subsequent article, once the delivery is complete, I will return to this topic with the remainder of the burndown data and see how well the confidence intervals modelled the future.


Update: About a week after I wrote this article, a series of events occurred that made it impossible to complete the graph in the way that I’d hoped.

First, one of the ‘key’ developers left the team. The notion of ‘key’ developer (due to functional silos) is something that I’d been trying to eliminate since starting with the organisation, but the team had not had sufficient time to get to grips with each other’s work.

Since the team was not large, the developer leaving implied that timescales would change. This caused the client to panic somewhat, leading to a review of the planning. During this review, it was discovered that the original requirements were hopelessly inadequate in terms of both ‘completeness’ and ‘correctness’. After more analysis, the total scope required was estimated at six times (!) the original estimate and much of the remaining original work envisaged was de-prioritised or deemed to be no longer required.

Given the significantly extended timescales, a couple of the developers who had been intending to hang on until the delivery decided to resign, since the delivery was now clearly much more remote.

Shortly after this, a member of the executive team who I had a good relationship with was replaced by someone else (not because of this project). It very quickly became obvious that this new executive had a strong belief in process over people, big requirements up front and all manner of things I have railed against for years. This clearly spelled the end of my time with this client.

Although I had warned of much of this risk early in the project, I take no pleasure in its materialisation. It could have all been completely different.

As I said in the first paragraph above: This is not the mode in which we usually like to work.
