Tuesday, November 4, 2008

Extreme Measures

  • Shorten iterations to force priority
  • SME can only help others complete tasks
  • Require 40% stories 100% done at midpoint
  • Revert/discard work over three weeks old
  • Random weekly team roster to force closure
  • Stir pairs twice daily
  • Eliminate individual tasks
Sometimes one has to take some extreme measures to help a team over the hump in their agile transition. It is hard to adjust work habits without having a work environment that depends on new behaviors. These extreme measures may stick, or may be training wheels for extreme programming.

Shorten Iterations
Shorten iterations to force priority. Cause the Customer role to pick fewer things to do, more often. This also should force developers to reach closure on cases more quickly. If the team is used to letting things lag and stack up for some future day, shortening the iteration can help them get into the habit of finishing things more quickly and taking on less work.

SME Has No Tasks
SME can only help others complete tasks. This rule forces collective code ownership. If a subject matter expert is not allowed to "do his own work" then he/she must do it through other people. This means more people with their hands in the same code, and also means a higher "truck number" for the team.

40 At The 50
Require 40% of stories to be 100% done at the midpoint if the team is still trying to assign work to individuals. If work is split up among individuals, it is normal that much of the work completes (if it completes) on the last day of the iteration. If the team is expected to organize around completing stories every day or two, then they have to work together in a new way. Normally tracking velocity will take care of the problem, since having 100% of the work 90% done means a velocity of zero. In the cases where velocity is not motivation enough, you may need to enforce the "forty by the fifty" (40% completely done by the 50% mark of the iteration) rule.
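
The arithmetic behind that rule can be sketched in a few lines of Python. This is only an illustration: the function names, the (points, fraction done) tuples, and the sample data are made up, and it assumes the 40% is measured by story count rather than by points.

```python
def velocity(stories):
    """Sum the points of stories that are 100% done; partial work earns nothing."""
    return sum(points for points, fraction_done in stories if fraction_done >= 1.0)


def meets_forty_at_fifty(stories):
    """True if at least 40% of the stories (by count) are completely done."""
    done = sum(1 for _, fraction_done in stories if fraction_done >= 1.0)
    # Integer arithmetic avoids floating-point surprises at the boundary.
    return done * 100 >= 40 * len(stories)


# (points, fraction done) at the iteration midpoint; the data is invented.
everything_almost_done = [(3, 0.9), (5, 0.9), (2, 0.9), (8, 0.9), (1, 0.9)]
two_of_five_done = [(3, 1.0), (5, 1.0), (2, 0.0), (8, 0.0), (1, 0.0)]

print(velocity(everything_almost_done), meets_forty_at_fifty(everything_almost_done))
print(velocity(two_of_five_done), meets_forty_at_fifty(two_of_five_done))
```

The first team has touched everything and finished nothing, so it scores zero and fails the check; the second has finished two stories out of five and passes.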

Destroy Unfinished Work
Revert/discard work over three weeks old. Nothing proves sincerity like throwing work away. If you roll up your sleeves and delete some of the old tasks that have been pecked at over the course of weeks or months but never completed, it actually helps your team focus on the things that *really* need to be done, which improves your velocity. Incomplete work product is defined in Lean processes as "waste". If it were really all that important, it would have been driven to completion. It's trash. Take it out.

Random Roster
Randomize the team roster weekly to force closure of stories. Divide the team in half. Perhaps have half of them fix bugs while the other half works on new features. In most non-agile teams, people are used to having work slop over the edges of the iteration, and so they claim "done" when they're not really "done done". So randomize the teams. Now nobody can count on having next week to finish up the work they've committed to this week.

In addition, moving from team to team means that they will have the opportunity/obligation to work on parts of the system that are unfamiliar to them. This motivates the cleaning of ugly code and the shoring up of weak tests. It costs velocity, but improves truck number and code.

It is uncomfortable to live with change and uncertainty this way, but it will push people to rely on tests for features they don't know and to ensure the tests pass before they hand off the code to a peer.

Stir Pairs
Stir pairs twice daily if you find people migrating to "pair marriages". You want to avoid having the same people partner up over and over. Things go stale that way, and people tend to partner with people at their own skill level rather than learning from people who are more skilled and sharing the burden of teaching those less skilled. If the partners are stirred occasionally, there is no undue burden and no hiding.

No Individual Tasks
Eliminate individual tasks by requiring that all production code have two sets of eyes at a minimum. Require pairing and TDD for all code. If this sounds extreme to you, you haven't been working in a shop that truly practices the XP style. This is actually part of the original process, and has been taught as-given for quite a long time now.

I don't generally advocate heavy-handed measures, but sometimes you have to create a system that teaches the practices you want people to learn... if only as a temporary measure.

Python Pimpl Pattern

A classic unit test blunder is to make use of the system time freely in your code. Another blunder is to monkey-patch your preferred time function.

I was working with some ATs which failed because they were written with a date in mind, and the calendar has marched on since those days. The answer is fairly obvious: override the date function. With a little searching, I found a utility fixture for forcing a given date/time. It worked as long as I ran the test in isolation, but failed when I ran the test in its suite.

Code in the system performed imports as "from mx.DateTime import now", so 'now' became a stable reference to whatever mx.DateTime.now happened to be at that moment. If you later change the reference in mx.DateTime, it doesn't affect your stable reference. The name is bound at the time the module that imports mx.DateTime is loaded.

Now, Python does some nice optimization. When you import a module, it doesn't necessarily read the file from disk. If the module is already loaded, it merely binds the requested names from the loaded module into the current namespace (as requested by the import statement).

So the file Importer.py imports using "from mx.DateTime import now". If that happens after the fixture has monkey-patched mx.DateTime.now to some silly lambda method, then 'now' in Importer points to the lambda. If, on the other hand, it was imported prior to the monkey patch, 'now' points to the original function. If mx.DateTime.now is changed after Importer imported it, it has no effect. That's true even if the change is to set it back to mx.DateTime.now's original value.
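
A minimal sketch of that binding behavior follows. It uses a throwaway module named fake_datetime, a hypothetical stand-in for mx.DateTime, so that it runs without mx installed, and it is written in Python 3 syntax. The names early_now and late_now are likewise invented for the illustration.

```python
import sys
import types

# Hypothetical stand-in for mx.DateTime: a throwaway module with a now().
fake_datetime = types.ModuleType("fake_datetime")
fake_datetime.now = lambda: "real time"
sys.modules["fake_datetime"] = fake_datetime

# Early importer: binds its own name to the function object that exists now.
from fake_datetime import now as early_now

# The fixture monkey-patches the module attribute.
original = fake_datetime.now
fake_datetime.now = lambda: "patched time"

# Late importer: binds to whatever the attribute is at this moment.
from fake_datetime import now as late_now

# The fixture puts the original back.
fake_datetime.now = original

print(early_now())  # real time: this reference never saw the patch
print(late_now())   # patched time: this reference never sees the restore
```

Which value a given importer sees depends only on when its import statement ran, which is why a test can pass in isolation and fail inside a suite.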

Now let's say that Importer did "import mx.DateTime" and didn't bind 'now' to mx.DateTime.now but instead called the function as mx.DateTime.now(). Now the monkey patch is fine. The reference is indirect, via lookup, and not via a bound reference. If we always called mx.DateTime.now(), then monkey-patching ("mx.DateTime.now = lambda: DateTime(blah)") will work, and un-patching it will work too. Some would say "problem solved". I suppose that would do it. But in Python, we consider this kind of patching to be evil. We try to respect module boundaries and not make implicit changes.
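
For comparison, here is a small sketch of the module-path lookup just described. The module object clock is a hypothetical stand-in for mx.DateTime, built on the standard library's datetime, and timestamp() and FIXED are made-up names. It is written in Python 3 syntax.

```python
import datetime
import types

# Hypothetical stand-in for mx.DateTime.
clock = types.ModuleType("clock")
clock.now = datetime.datetime.now


def timestamp():
    # Looks up 'now' on the module at call time, not at import time.
    return clock.now()


FIXED = datetime.datetime(2008, 11, 4, 12, 0, 0)

original = clock.now
clock.now = lambda: FIXED   # patch
patched_result = timestamp()
clock.now = original        # un-patch
restored_result = timestamp()

print(patched_result == FIXED)   # the patch took effect
print(restored_result == FIXED)  # the un-patch took effect, so this differs
```

The patch and the restore are both visible because timestamp() never holds its own reference to the function.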

We can write our own function in a module and have it call mx.DateTime.now() and replace it to force the current date, but that puts us back in the same trouble if anyone writes "from TimsModule import now". That stable reference problem comes back for TimsModule as it did for mx.DateTime.

So we need a function that can be used with a bound reference or called via the module path, and still give us the results we want. Back in C++ days, James Coplien wrote up the envelope/letter pattern (aka pImpl). You need a function that delegates its implementation (like a 'strategy'). This is easy since all functions in Python are objects:
------ NowFunction.py

from mx.DateTime import now as originalNowFunction

def now():
    # Delegate to whatever implementation is currently installed
    return now.implementation()

# Default to the real clock
now.implementation = originalNowFunction

Now we need an example of a program which imports now() and calls it repeatedly, so that we can prove that it is affected dynamically by changes to the implementation:

----- Importer.py

from NowFunction import now

def lookNow():
    "Watch how now() changes implementation"
    for i in xrange(35):
        yield now()

What's left is a program that manipulates the now function and demonstrates that the first file is getting the full benefit of setting and unsetting the implementation. Something that will set it to various values and back. Maybe based on some well-known programming example (with no attempt at optimizing or playing code golf):
----- test.py

import Importer
from NowFunction import now, originalNowFunction

for n,value in enumerate(Importer.lookNow()):
    if (n % 3) == 0:
        now.implementation = lambda: "fizz"
    if (n % 5) == 0:
        now.implementation = lambda: "buzz"
    if (n % 5) == 0 and (n % 3) == 0:
        now.implementation = lambda: "fizzbuzz"
    if (n % 7) == 0:
        now.implementation = originalNowFunction
    print n, "Got",value, ", next sample ", now()
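
For readers without mx.DateTime at hand, the same idea can be sketched in a single file. This version is only an illustration: it uses the standard library's datetime as the default implementation and Python 3 syntax. The frozen_at() helper and the bound_now name are hypothetical and are not part of the original fixture.

```python
import datetime
from contextlib import contextmanager


def now():
    # Delegate to whatever implementation is currently installed.
    return now.implementation()


now.implementation = datetime.datetime.now


@contextmanager
def frozen_at(moment):
    """Hypothetical test helper: force now() to return `moment`, then restore."""
    previous = now.implementation
    now.implementation = lambda: moment
    try:
        yield
    finally:
        now.implementation = previous


# A "stable reference", as 'from NowFunction import now' would create.
bound_now = now

fixed = datetime.datetime(2008, 11, 4)
with frozen_at(fixed):
    inside = bound_now()
outside = bound_now()

print(inside == fixed)   # the bound reference saw the forced date
print(outside == fixed)  # and it sees the real clock again afterwards
```

Because the function object itself never changes, it does not matter whether callers hold a bound reference or go through the module path; only the implementation attribute is swapped.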

Monday, November 3, 2008

Acceptance Test Qualities

I'm involved in writing a new agile guide with Jeff Langr. We are taking agile concepts and trying to boil them down to the simplest forms that cover the bases reasonably well.

It is rather like playing The Three Things (AKA "the Two Things") game for Agile software development. An example:

Acceptance Tests

  • Define “done done” for stories
  • Must be automated
  • Document all uses of the system
  • Should be usable as the basis for system documentation
  • Do not replace exploratory tests
  • Run in as-close-as-possible-to-production environment

This list is intended as a small set of reminders, so that when one is in the midst of a project, one might find some guidance. Is the test really fit for use as documentation or written as programmer-ese? Is it describing the feature well enough to guide development? Is the Continuous Integration environment running it in a naive or unusual system configuration? Should we run these tests manually?

The bullet list should speak to you. If not, then read through the explanation below.

Define “done done” for stories

Clearly some of the greatest value in ATs is that they are executable specifications. No work should be assigned for completion without some ATs first being created that describe the feature fairly fully. I tend to not require fully comprehensive coverage for all ATs, but I find that sometimes I am wrong not to. This point is as important as it is difficult. We are frequently finding "missed requirements" or "unexpected interactions." The answer for these is probably not to have full Big Design Up-Front (BDUF) but to find a more agile way to deal with corrections and changes.

Must be automated

ATs really have to be automated. Manual testing simply cannot scale. We can expect to run every automated test we've ever written a few times a day, but could hardly expect to run all of the manual tests we could have written even once every two weeks. Automation doesn't just make testing convenient, it makes continual testing possible.

Document all uses of the system

Even uses of a system that pre-exist the team's agile transition still need tests. This is because the second value of acceptance tests is in preventing regressions or detecting brokenness. It is never a good time to be ignorant of the fact that you've broken your system.

Should be usable as the basis for system documentation

The third value of the ATs is that they document the system. That should make it easier for people whose job is also to document the system. Often this power of testing is overlooked, especially when the tests are written in a non-literate style.

Do not replace exploratory tests

Of course, automated tests are never complete and features are prone to have unintended interactions or consequences. Professional testers are valuable teammates. Their exploratory testing may uncover things that programmers, intimate with the workings of their code, might not.

Run in as-close-as-possible-to-production environment

Finally, tests need to run on their target platform. Tests that pass on a development machine can still fail in production, so it's better to find any platform issues earlier in the process. If the tests include a database, it ought to be the same kind of database you'll see in production. Likewise file systems, network hardware & software, etc. It might be handy to have a CI system run the tests once on a development-like system and then install and run again on a production-like environment.

Agile Progress and Branching

This week, and last, we are doing our work in the release candidate (RC) branch, which will eventually be merged to trunk. We maintain a "stable trunk" system, with the RC as our codeline (for now). This is an intermediate step on our way to continuous integration.

Partly because of the change in version control, the team has learned to rely more upon the tests, and is writing them quickly. We have had a noticeable increase in both unit tests (UTs) and automated user acceptance tests (UATs) in only one week. There were some problems with people checking in code for which some tests did not pass, but they have learned very quickly that this is quite unwelcome.

We are painfully aware of the time it takes to run both test suites. The UTs suffer from a common testability problem, in that they were written to use the database and they sometimes tend to be subsystem tests rather than true unit tests. When they are scoped down and mocking is applied, they should be much faster. Sadly, we are using one of those ORM frameworks that wants to own our objects and bind them tightly to the database, so we will have to go through some more machinations to get our objects free of the database. This is common, but always troublesome. The features that make a framework convenient can be the same ones that frustrate all attempts at building moderately comprehensive test suites. Our unit tests take over 10 minutes on my computer, and the UATs take much longer. =8-o

We have been closing down old branches for a while now (releasing backlogged work), which can only increase our productivity by decreasing the "drag" of branch maintenance and troublesome integrations. We have not outlawed development branches, but we will start committing to a small amount of work to always be done in the RC, with larger tasks or those with uncertain results branched for now.

We have a nosetest-based harness for gathering coverage information from unit tests, and I hooked up coverage.py to collect the same data for our UATs. It's not a perfect system yet, but we can at least start to chart some trends.

Our Continuous Integration effort is nascent. I'm going to try to find a way to set up buildbot to run all our unit tests (at least) and then to launch the UATs through FitNesse (always a pain to automate). I'm expecting a lot of fun here.

Our informative workspace initiative is coming along. We have UT counts and timing graphs, the same for UATs, working card-walls for our tasks, simple process information, etc. Some of our programmers have been producing monthly production charts to track the amount of money moving through the system, etc.

Overall, we're doing a pretty good job of transitioning. We have challenges, but we've come a long way.