Wednesday, 4 May 2011

Theory of Constraints in Software Development

I finally got around to reading The Goal recently, and let me tell you, if you haven't already read it, you must do so; it's simply brilliant. I really mean that:
It's simple (TOC is incredibly obvious once you know it).
It's brilliant (our internal LRQA auditor gave me a story yesterday of a company doubling its turnover in two years by applying it).

So what's it got to do with software development? EVERYTHING! Basically, practically every human endeavor to make something (anything) new is a sequence of linked operations with statistical variation in them. This means that TOC applies. Simple as, no arguments, no buts, no "we're different": it applies. End of.

So what do we do as software project managers? Well, as the book goes, we "find Herbie" (the slow-coach in the pack). It's a vital message of the book: your total throughput is the throughput of your bottleneck. There are then five focusing steps involved in dealing with it:

  1. Locate the bottleneck
  2. Maximally exploit the bottleneck
  3. Subordinate everything else to exploiting the bottleneck
  4. Elevate the bottleneck (add capacity some other way)
  5. Avoid inertia and keep checking your bottleneck hasn't moved
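
To put step 1 into concrete terms, here's a minimal Python sketch using invented weekly throughput figures for a hypothetical pipeline; the point is simply that the whole system can never deliver faster than its slowest stage:

    # Step 1, "Locate the bottleneck", as arithmetic. All the
    # throughput figures (items/week) are made up for illustration.
    stages = {
        "specification": 12,
        "development": 9,
        "code review": 10,
        "qa": 5,
        "release": 15,
    }

    bottleneck = min(stages, key=stages.get)
    print(f"Bottleneck: {bottleneck} at {stages[bottleneck]} items/week")
    # -> Bottleneck: qa at 5 items/week: the system ships at most
    #    5 items/week, whatever the other stages manage.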

And what have I done? Well, I don't know for sure yet, but I have a strong suspicion that QA is our bottleneck, and my numbers are piling up to allow me to make a proper judgement. Testing and QA have traditionally been held to the end of a project in the waterfall development lifecycle (the worst place for Herbie to be; he should be at the front, controlling the pace!). In addition, QA has the terrible problem of its throughput being a slave to the shipping decision. It's like the average pregnancy term in the UK: unknowable, because we prevent pregnancies from running more than two weeks over! In the software gestation period, the worst offenders cut their losses and call all of the bugs "known issues"!

With all of this considered, what should I do about the test team? Basically, make sure they're never sat idle! Now I first learned this from Kanban by David Anderson, but it's a TOC thing really. By applying the teachings of Anderson and Goldratt, I've been through a very enjoyable learning curve, passing through a number of phases on the way:

First, I simply loaded the testers up with all of the bugs that we'd addressed so far and a few feature tests for code we'd completed. That went through quickly and generated a few new development items, but the testers were pretty much left twiddling their thumbs again almost immediately. The lesson here was that at this stage in the process, test is not a constraint; coding is. Overall, though, I know test is likely to become the constraint, because of the QA push at the end, when the developers sit idle waiting for bugs.

OK, I thought to myself, we'd better load them up with some feature tests for the stuff we're developing now; after all, we have finished specs for the tests to be written against, and this is all good Agile practice. On balance, this seemed to get us closer to keeping the test team busy, but they were still capable of emptying the test buffer pretty quickly if they were all in the office and working well. The lesson I learned at this stage was a basic Agile one, as already stated: that a bunch of user stories and a detailed spec are best expressed as a set of tests. There's still more to do here, and I've been talking with people only today about whether my Product Manager should sign off a feature after it's been tested, or whether he should sign off tests as representative of his requirements up-front.

The interesting thing that I now felt I could see on my Kanban board was a test buffer "vacuum" which just wanted to consume some bugs or features. The clock was also ticking in my head: every day that goes by conjures up fears of delivering a day late, or of quality ending up the equivalent of a "test day" lower. The natural thing to release to the "Testing Hoover" seemed to be bugs, so we stopped feature development and addressed a load of legacy bugs. In fact, we do this on a regular basis now: not just the bugs in the feature we're writing (that's plain Agile good practice to keep quality up and maintain regular delivery of release-quality code), but historical bugs and new issues found in features not under development.

This was pretty good at feeding the hungry testers, but still, they seemed to need more. Although almost everything I'd done so far to keep the test team busy had been about "exploiting" the bottleneck, this step involved me realizing that a bug backlog is a type of inventory in the process, and TOC tells me that I should reduce inventory as well as increase throughput. Although bugs aren't items we've invested time and money in but haven't yet sold (a very simple definition of inventory), a bug backlog does represent a pile of work-in-process in front of the bottleneck: it has to go through at some point, so why wait until the bottleneck is maximally loaded?!
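
Here's a sketch of that "Testing Hoover" policy in Python; the limits and item names are invented, and of course the real decision is made on the board rather than in a script:

    from collections import deque

    BUFFER_MIN = 3   # refill trigger: don't let the testers run dry
    BUFFER_MAX = 6   # Kanban-style WIP cap on the test column

    test_buffer = deque(["feature-A tests", "bug #1042"])
    bug_backlog = deque(["bug #87", "bug #203", "bug #311", "bug #512"])

    def replenish(buffer, backlog):
        """Top up the test buffer from the legacy bug backlog when it runs low."""
        if len(buffer) >= BUFFER_MIN:
            return
        while len(buffer) < BUFFER_MAX and backlog:
            buffer.append(backlog.popleft())

    replenish(test_buffer, bug_backlog)
    print(list(test_buffer))   # the testers now have six items queued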

The last thing I have tried was to enter bugs into test before they are fixed. This might seem odd to some, as testers are used to verifying that something's behaving the way it should, rather than writing a test that they know is going to fail. I was a bit worried about this, if I'm honest, as a broken feature prevents the exploratory testing around it that would add to the tests. However, the great advantage of this approach was that once a developer did a bug fix, they could have almost immediate feedback about whether it was fixed or not. What did this teach me? That TDD can "reach outside" development and extend to the test team, such that they create independent, "QA-minded" tests to support development.
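
To picture what such a fail-first test looks like, here's a minimal sketch to run under pytest; the bug, the function, and the figures are all invented, and the buggy code is inlined so the example is self-contained:

    # In real life the buggy function lives in the product, not the test file.
    def total_with_vat(net, rate=0.20):
        return int(net * (1 + rate) * 100) / 100   # truncates pennies: the bug

    def test_bug_1042_vat_rounding():
        # Written by QA while the bug is still open, so it fails today.
        # The moment a developer rounds to the nearest penny instead of
        # truncating, it goes green: near-instant feedback on the fix.
        assert total_with_vat(99.99) == 119.99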

An overall lesson I have learned is one that Kanban can teach us all: that I need an appropriately sized buffer for my test team, particularly as I'm in the UK, the testers are in the US, and the test manager's in Australia. It's also worth noting that - for me - test is not only a bottleneck constraint, it's a non-instant-availability constraint too: the test engineers also do customer support and site installs, so they can disappear for days or weeks at a time. Basically, as soon as their backside is back in their chair and they think of me, I need them to be able to start work straight away!
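
Here's a back-of-envelope sizing sketch, with invented figures; the buffer has to cover the testers' consumption rate for as long as it takes upstream to refill it, plus a margin for the statistical variation TOC keeps reminding us about:

    test_rate_per_day = 4       # items the team clears on a normal working day
    refill_lead_time_days = 8   # how long dev takes to produce more test-ready work
    safety_factor = 1.25        # margin for variation (and transatlantic handoffs)

    buffer_size = round(test_rate_per_day * refill_lead_time_days * safety_factor)
    print(f"Keep roughly {buffer_size} test-ready items queued")   # -> 40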

So, what's left to sort out? Well, one huge flaw in all of this is that I don't know for sure that testing is the bottleneck, so I need to gather and scrutinize some data to root out where Herbie really is. The other thing I need to do is to bring in the remaining three focusing steps: all I've done so far is try to maximally exploit the bottleneck. I'll need time to see if this has a positive effect on our QA process beyond code complete, but if there's more to be done, then I need to look at other strategies. However, I suspect that the next thing to do will be to sniff out a new "Herbie" within the process. That shouldn't be too difficult in principle, as the process is very linear and is composed of very few operations (compared to a complex manufacturing plant). However, I can see that gathering good statistical data to support this learning process will be a significant challenge. I'm looking forward to it.
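
Here's a minimal sketch of what that data gathering might look like, given timestamped column transitions from the Kanban board; the record format and the numbers are invented, but the idea is that the stage where items linger longest is the prime suspect for Herbie:

    from collections import defaultdict
    from datetime import date

    # (item, stage, entered, left): illustrative board history only
    history = [
        ("PROJ-1", "dev",  date(2011, 4, 1), date(2011, 4, 6)),
        ("PROJ-1", "test", date(2011, 4, 6), date(2011, 4, 20)),
        ("PROJ-2", "dev",  date(2011, 4, 4), date(2011, 4, 8)),
        ("PROJ-2", "test", date(2011, 4, 8), date(2011, 4, 26)),
    ]

    days_in_stage = defaultdict(list)
    for item, stage, entered, left in history:
        days_in_stage[stage].append((left - entered).days)

    for stage, days in sorted(days_in_stage.items()):
        print(f"{stage}: {sum(days) / len(days):.1f} days on average")
    # -> dev: 4.5 days, test: 16.0 days: test looks like Herbie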

6 comments:

  1. Nice article.
    I'm talking about this and related issues today at #lssc11; see my slides at http://www.slideshare.net/mobile/yyeret/using-flow-approaches-to-effectively-manage-agile-testing-at-the-enterprise-level

    One comment is that your next step might be to continue to reduce the batch size in order to drive for earlier and earlier testing. That makes sense, since earlier quality is cheaper in itself, not just a way to avoid a bottleneck at the end.

    If you're looking for more inspiration, see Don Reinertsen's Principles of Product Development Flow.

  2. I could have told you that about 20 years ago... wait a moment, I did tell them that - repeatedly, on many occasions - and demonstrated it by sending in lists of bugs within a couple of days of receiving new software releases.

    Because the "testers are used to verifying that something's behaving the way it should be", it was always awesomely easy to crash OML apps by simply asking myself, "what happens IF?"

    The problem is that the testers are always trying to please you, the code-master, by showing you that the software works - what you need are testers who piss you off by actively trying to crash your code and show you that it doesn't work, or works badly.

    The trouble with this approach is that that type of tester is seen as a troublemaker, so they get eased out of the company.

  3. Interesting comments @Anonymous: I think that we (as an industry and as a company) are getting past the "tester is a pest" mantra. I share your opinion that they simply have to be an order of magnitude better at finding bugs than customers, after all, the company is paying the testers and the customers are paying the company! Getting the attitude towards testing right is part of my goal, as well as giving them as much testing to do as is physically possible.

    @Yuval: I'll certainly look up Don's work and add it to the reading list. I also like your slide about the bug pile being smoothed out through the project, and I'll use it in future to illustrate one of my points: that we have to FIND bugs early, as well as fix them quickly, which isn't compatible with the traditional "start testing when you're code complete" paradigm.

  4. How do you make the decision to release to test? Are you arguing that dev would perform their tests exactly as always but the next step would be pushing the same code through QA prior to allowing dev to fix anything? It seems some level of functionality must be achieved before it is released to test. Am I missing something?

  5. Curtis,

    Forgive me if I have misunderstood your question. We release to test (and by that I mean the QA department, not just to the build/test farm: that happens continuously) when we have run whatever tests are available to the developer (bugs or features). Having the bug or feature assigned to the testers long before the developers have finished their work means that the likelihood of churn is really low: any tests the tester has created that are automated prevent the developer from releasing to QA until they pass. Even manual tests can be quickly run through by the developer to check off the basics before they deploy.
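
    Roughly, the gate looks like this; a sketch only, with invented names, rather than our actual tooling:

        # The release-to-QA step runs whatever automated tests the
        # testers have attached to the ticket, and refuses to promote
        # the build while any of them fail.
        import subprocess
        import sys

        def release_to_qa(ticket: str) -> None:
            result = subprocess.run(["pytest", "-m", ticket, "tests/qa/"])
            if result.returncode != 0:
                sys.exit(f"{ticket}: tester-written tests failing; not releasing")
            print(f"{ticket}: tester-written tests pass; promoting build to QA")

        release_to_qa("bug_1042")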

  6. A bit late, but you should really read Critical Chain... (same author)
