This blog contains reflections and thoughts on my work as a software engineer

onsdag den 14. oktober 2009

Definition of software quality #3

I have from time to time struggled hard to define the term "software quality". I have written about the subject before

...and on and off it has come back to us at work - we've discussed the subject for hours without any conclusions we could all back up 100%. The discussions were mainly taken whenever we had delivered incomplete features which proved to be - less than adequate tested...  :o)   At my current job we have no testers employed and our QA process is driven by developers so we do our best and take responsibility for our actions knowing that testing our own code is an antipattern. I had a small breakthrough last week - I think I've managed to figure out metrics for our business which actually is watertight - it was during a meeting when I thought it up and asked the others for feedback. They bought it immediatly and we've started gathering data for the first time... What we're measuring? First a little background story:

At work we handle a lot of asynchronous processing of online enrollments. An enrollment made by a user because he or she wants to attend an arrangement or a course (summer camps, soccer schools, gymnastics every Thursday throughout the winter etc.) ends up in our backend system as a "job" which will be processed asynchronously. A job could be an enrollment but it can also be updating an address change in various databases and 3rd party systems. Another job could be sending Emails to everybody attending a summercamp. We have a lot of business processes ending up as different kinds of jobs in our database. 

Processing jobs succesfully are highly critical for our business to function so if a job fails the business needs to attend it and fix the error. A likely cause for an error could be that a 3rd party webservice didn't respond in a timely fashion or that our support staff has blocked someone from attending more courses because the person in question didn't pay for the last course. That person is by design still able to make another enrollment but the job created will fail because there is a business rule somewhere blocking further money transactions on that particular person. Enter a developer who makes the necessary phonecalls to clarify if the person should be unblocked or not. The developer can then restart the job if our support staff unblocks that person in our backend systems.

Why do I tell you all this? Because I figured during a meeting last week that we could easily get a clear indication of our software quality by measuring the amount of failed jobs during a period of time. We are interested in measuring the quality of code. What code? That really was the question... And how to measure if that code was OK? You can get some of the way by using tools to analyze code and reviewing the codebase but I have realized that the main interest of the business is to know that whenever they use our backend as part of their business process they need to know that everything "works". How do we know that it doesn't? If a job fails... The only people who care about TDD, IoC, mocking frameworks, Cyclomatic Complexity and code coverage are developers - the business just wants to know that when they click "Send Email" that email was actually sent to the recipients. The quality of the code itself really comes second as long as your users are able to do their job. I had two other developers and my boss think about it for a few seconds and they bought it instantly. As of last week we've started saving an entry in a database table whenever a job fails us so we can track over time how many failed jobs we have under the current load. Then we also know which parts of the codebase and which business processes that needs improvement if the same kind of job fails repeatly under the same conditions. Sweeeeet........

If you (as a developer) are in doubt about the quality of your codebase ask yourself these questions:

  • Can you in any way as the software developer with insight measure when the software fails a user?
  • Can you in any way as the software developer with insight measure when the software fails a user - when the user doesn't know that an error occured?
  • Can you as a developer fix any problem in the codebase knowing all sideeffects of your changes?
  • Can you easily test and deploy a bugfix/patch to the production environment?

There are many ways to measure code quality but start focusing on the user experience and the shortcuts they make because the software doesn't work as expected. Software quality should be measured in

  1. Happy users being able to do what your software promises them they can do.
  2. Happy programmers feeling comfortable with the current state of the codebase

Until next time...

3 kommentarer:

gsempe sagde ...

Hello,
your thoughts are common and interesting.
Perhaps could you be inspired by EN9126 (http://en.wikipedia.org/wiki/ISO_9126) which defined software quality from 3 point of view: internal, external and quality in use.
I hope this will help you.

Kristian Erbou sagde ...

@gsempe: Thanks - I'll check it out :o)

Felciano Guarracino sagde ...

In retrospect, there are a lot of ways to look at the definition of software qualities. It's almost to say that it's subjective and depends on the developer's individual skill and attitude towards the project. There exist a number of quality management systems for our perusal, like the one the commenter above had mentioned. Software quality is a very interesting discussion that can make turns into more equally-important aspects of development.