Tuesday, 1 May 2012

How Bad Code Commit Habits Hide Problems

Commit habits, it seems, are highly personal. I know people who commit incredibly fine grained work, and I know people who commit X thousand changed lines regularly. Who am I kidding, I just did that yesterday. And last Friday... There's good and bad in everyone's behaviour, and I'm interested to think about what's what in this particular case. Most importantly, it seems to me (and others), that bad code commit habits can cause terrible de-coupling of the inputs and outputs of the development process from the coding itself.

I was initially inspired to write a few words on these matter by this post, and I've been reminded to finish it by the new tester on my team, Sona, who's come from a class IV medical devices background, where their auditor insisted that every single commit was linked to a defect or change request ID.

Part of me wants to pursue this at Vicon, and part of me knows that I'll get a hard time doing so, so where's the middle ground?

So, the original post shows how a particular team is disciplined enough to link each commit to a user story (or, I'm guessing, a range of commits sometimes) and a test to each user story. Now, linking tests to user stories seems fine, this seems to happen a lot (in fact our auditor just insisted on it, albeit gently), but linking to commits to a user story? No, not yet, only to defects and change requests.

And under what auspices do we commit re-factoring? If legacy code is really crumbly, you just have to do a clean sweep through. What do you do here? Perhaps get a ticket raised for the re-factoring? Thinking about it, maybe this would be a good thing: you're going to be confronted on the necessity of the work you're proposing to do. The downside I can see here though is that the people you're probably going to have to approach are going to be weighing up re-factoring against other defect and change request tickets they'd rather see in your queue, and may not see your argument. If we have to do this, then maybe the technical lead or chief architect should have the power to approve, and perhaps there has to be some up-front agreement about how much dev time you're going to spend on re-factoring.

But getting back to the subject of simply "bad habits" when it comes to code commits, let's go through a few:

  • Committing everything you've been working on at the end of the day. This is simply lazy backup and will potentially break the build. Use a good tool like Mercurial, where you can commit to your own repo, rather than the trunk.
  • Committing infrequently. OK, I'll caveat this with "where it can be avoided". I just spent a whole week re-factoring a small system (well, it would have been two days if all I did was write code), and there was no way I was pushing to the trunk until I'd completely finished, and test driven all of the applications that used the sub-system. But anyone that only commits once a week routinely needs, to read Continuous Integration. Right now. I'll lend you my copy.
  • Committing "mixed" work: This has to be one of my biggest bug-bears, but also one of the things I understand is hard to avoid. You're working on a specific user story or defect, and you notice a simple fix to a different bug, so you address it. It all goes in the same commit, and the message says so, so what's the problem? My issue is, that it's incredibly hard for me to disambiguate the work from the code's perspective: Yes, rev 4036 contains the fixes to issues #345 and #397, but which change to which file does which? We have to fight the urge to deal with the second job while the first is in flight, and finish what we've started. You could even shelve your current changes by dumping a diff or patch and reverting if you want to get the new thing out of the way first. I was thinking for a moment that committing a secondary fix, but commented out, then making a second commit removing the comments might work for a second there, but then I was reminded that I'd have to take myself outside and shoot myself for committing commented out code (or worse, compile-excluded code using #if). (Note to self, take myself outside and shoot myself tomorrow...)
  • Committing changes and not even attempting to link it to some specified work. At least if you're committing a re-factor to some sub-system, state what it is you're trying to achieve ("to improve disk write performance" perhaps - maybe even the ID of that user story). If a commit message doesn't say anything, then I've no idea whether you're fixing a new bug, an old bug, re-factoring, implementing a new feature or just making an arbitrary change.

A fundamental problem that seems to have emerged for me here is that problems arise as soon as we start doing more than one thing at a time! The problems don't start and end with the codebase of course: we're not singly focussed on one work item and we're multi-tasking, and therefore losing time in context switching. Since working with Mercurial, I'm beginning to like the constraint it provides of having no outstanding commits before doing a push: at first it can seem like a constraint you can do without because of those little tweaks you have in for your local build and some tentative optimisations, but it's a very simple and effective constraint that forces a singular focus on finishing a task.

But when all said and done, I seem to have talked myself in to three important points about code commit habits:

  1. Infrequent commits damage my ability to estimate.
  2. Mixed commits damage my ability to connect fixes and features to actual code.
  3. Commits that aren't linked to fixes or features prevent me from figuring out what work is value adding, and what work is churn.

No comments:

Post a comment