This blog contains reflections and thoughts on my work as a software engineer

Thursday, December 25, 2008

TDD on untested code - why does the first test take so long?

Most software developers have at some point been assigned a task like this one: "We need to extend report X with a new column which sums up all the other columns in each row". Even if you don't know anything about the codebase, tasks like this usually don't cause much trouble, because you are extending something which is already working and in production. Even if you want to deliver the new functionality with shiny tests and 100% coverage of the new code, it isn't a big deal, because you don't have to touch the existing armageddon of untested code.

Worse - much worse - is the task assigned to you the next day: "Well - we have this bug in our invoicing system which causes all sales on the 1st of January to never show up in our reports. We need you to fix it". Even if you know that the invoicing system is a pretty decent, working piece of software that doesn't need a lot of maintenance, you also know that its code coverage is a mere 2% or so... When asked "How long will it take?", what do you respond?

If you and your team are on the TDD side of things, you have probably noticed what has always astonished me: how long it takes for someone to take the first step towards TDD-enabling untested software. After reading the first ten or so chapters of Michael Feathers' Working Effectively with Legacy Code, I am beginning to put words to my gut feeling of "This will probably take a while..." whenever I see a change request to a system I know doesn't have a decent test suite attached.

You've been assigned the task and you need to write your first test - why does it take forever to write? Michael Feathers goes deep with numerous code examples, but from a bird's-eye view it all comes down to this: until you have refactored the dependencies in your untested code, you can't write a suite of good unit tests. Unit tests must be kept as short as possible - I've heard the rule "No test should be longer than 15 lines of code", and it is a good practice to follow. Of course it doesn't always happen, but if you have to write 20 lines of code just to set up your test, you need to look for design smells - most notably violations of the Single Responsibility Principle in the class or classes you are testing.

What I've recognized after reading these chapters is that code which has been developed without testing in mind has lots of dependencies on other classes. I've whipped up quite a few of those myself - no doubt about it. When all you do is write code and never actually use your own code in a test, you never get to see how your code "looks" from the outside. You basically just make things work - and whoever is left to use your code after you have signed off on a task is pretty much on their own. This is how, for example, constructors and methods with 10+ parameters are born. I've seen them throughout my professional career, wherever I've been: if you are "coding away" and need an extra parameter, you change the signature of your method, change the caller to use the new signature, and everything is fine - from your perspective. What actually happens is that the more parameters you pass into a method or constructor, the more state you have to set up in the "Arrange" section of a unit test, and the longer the test becomes. Long "Arrange" sections in your tests are as much of a code smell as any other code smell you can think of.
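To make the point about "Arrange" sections concrete, here is a minimal sketch in Python. Everything in it is made up for illustration (invoice_total_cents, InvoiceTerms and the numbers are not from any real invoicing system); it only shows how each extra parameter turns into an extra line of setup, and how grouping related parameters with sensible defaults shrinks that setup again.

```python
from dataclasses import dataclass


# Hypothetical function that grew one parameter at a time.
def invoice_total_cents(net_cents, vat_pct, discount_pct, shipping_cents):
    discounted = net_cents * (100 - discount_pct) // 100
    return discounted * (100 + vat_pct) // 100 + shipping_cents


def test_total_needs_long_arrange():
    # Arrange: one line of setup per parameter
    net_cents = 10000
    vat_pct = 25
    discount_pct = 10
    shipping_cents = 500
    # Act
    total = invoice_total_cents(net_cents, vat_pct, discount_pct, shipping_cents)
    # Assert
    assert total == 11750


# One possible refactoring: group the related parameters and give them defaults,
# so a test only has to mention the values it actually cares about.
@dataclass(frozen=True)
class InvoiceTerms:
    vat_pct: int = 25
    discount_pct: int = 0
    shipping_cents: int = 0


def invoice_total_v2(net_cents, terms=InvoiceTerms()):
    discounted = net_cents * (100 - terms.discount_pct) // 100
    return discounted * (100 + terms.vat_pct) // 100 + terms.shipping_cents


def test_discount_is_applied_before_vat():
    total = invoice_total_v2(10000, InvoiceTerms(discount_pct=10))
    assert total == 11250
```

With four parameters the difference is small; with the 10+ parameter signatures mentioned above, the first style of test is mostly setup.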

The first test you write on untested code has to include the time spent refactoring all the entangled dependencies you dig up while trying to get to those precious three lines of code you want to test. It's a fact of life - get used to it. You might have to split up large classes, move object creation to constructors, extract interfaces to be able to mock dependent libraries, etc. before you can actually make the single test that solves your problem turn green... The next time somebody has to write a test on your refactored code it will be easier - but the first test takes the longest. The first test is also the one people fear the most, because there are no other tests around to back you up when you begin to alter working code.
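As a rough sketch of what "extract an interface and move object creation to the constructor" can look like, here is a small Python example. The names (SalesSource, SalesReport, FakeSalesSource) are invented for illustration and do not come from Feathers' book or from any real invoicing system; the point is only that once the dependency is passed in, the test can replace it with an in-memory fake.

```python
from datetime import date
from typing import Iterable, Protocol, Tuple


class SalesSource(Protocol):
    """Extracted interface: anything that can list (date, amount) sales."""

    def sales(self) -> Iterable[Tuple[date, int]]: ...


class SalesReport:
    # The source is handed in through the constructor instead of being
    # created somewhere inside a method, so a test can substitute a fake.
    def __init__(self, source: SalesSource) -> None:
        self._source = source

    def total_for(self, day: date) -> int:
        return sum(amount for sold_on, amount in self._source.sales() if sold_on == day)


class FakeSalesSource:
    """In-memory stand-in for a real, database-backed source."""

    def __init__(self, rows):
        self._rows = list(rows)

    def sales(self):
        return self._rows


def test_sales_on_first_of_january_are_reported():
    source = FakeSalesSource([(date(2008, 1, 1), 100), (date(2008, 1, 2), 50)])
    report = SalesReport(source)
    assert report.total_for(date(2008, 1, 1)) == 100
```

The test itself is three lines; the time goes into getting the production code into a shape where those three lines are possible.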

I agree with Feathers on most of what he writes - not everything though. He actually promotes writing "characterization tests" for already existing code "to be able to test for the existing behaviour". I disagree with him. The time you spend writing tests on existing code could be better used focusing on solving the problem you have before you. It's a mere waste of time in my opinion - how do you know when you have written enough tests to conclude that the core existing behaviour is covered and documented? No sir - Keep It Simple Stupid, just get the job done. Use some of the many useful techniques he has documented, but please - don't write tests for existing code. He means well, but honestly - it's just not valuable work to write tests for existing functionality.
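For anyone who hasn't met the term: a characterization test simply records what the code does today, quirks included. Here is a minimal sketch in Python, with a made-up legacy function standing in for existing untested code:

```python
def legacy_format_amount(cents):
    # Stand-in for existing, untested code whose behaviour nobody has written down.
    return "%d.%02d" % (cents // 100, cents % 100)


def test_characterize_format_amount():
    # The expected values come from running the code, not from a specification.
    assert legacy_format_amount(12345) == "123.45"
    assert legacy_format_amount(5) == "0.05"
    # Surprising for a negative amount, but it is what the code does today.
    assert legacy_format_amount(-5) == "-1.95"
```

The open question is the one raised above: nothing in the technique itself tells you when you have pinned down enough of these.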

Even though I'm not finished with the book, I can highly recommend it to anyone who just wants to be able to navigate through untested code without making too many waves in the pond. It's a good book with quality stuff in it - definitely one I will use to plan my attack on codebases that I've never touched before.

Regards K.

1 comment:

Anonymous said ...

My experience with TDD has given me two opinions that aren't too popular these days:

1. Unit testing is only one way to code up a test, and although it's automated, that doesn't make it the best fit everywhere. Where it's appropriate (integrated systems, algorithms with many variables such as a physics simulation), you can and should instead make "toy interfaces" that let you generate the complex/corner-case situations encountered in production use.

2. The best time to start writing the test is the moment you realize the code you're writing will be hard and painful to debug. That's kind of a judgment call, but I find it to work out well because the moment you realize that, you're deep into the implementation details and can quickly come up with the deepest unknowns "lurking" in the code you are writing or about to write. Writing the test at any other time, you're a bit distanced from those unknowns and won't think of as many corner cases.