Tuesday, October 12, 2010

C# Curiously Nested Production Class

We tackled some code for refactoring, and I found an unusual construct. To wit:

public class Base: Many, Varied, Bases {
    private int x;
    // ...

    public class Derived: Base {
        public void setX(int value) {
            x = value;
        }
        // ...
    }
}

So here we have the derived class nested in its base class. I am quite comfortable with nested anythings in python or a few other languages, so that's not a problem. I nest 'testable' classes in tests all the time so I can override or stub a few methods. In python it's a normal way of life to nest functions and classes.

This one is odd because it is a production class, nested inside a production class, and deriving from its outer class. It makes me think of a pregnant mother, carrying a child whose DNA derives from her.

So what powers does this odd nesting convey? My first thought was that it is some kind of closure, but this turns out to be false. There is no real closure involved. The idea of a closure that derives from its parent made my head swim just a bit, but it was a wild goose chase. Nothing to it.

What does happen is that the nesting grants access to the private variables inherited from the enclosing parent. The x in setX is the Derived.x inherited from Base. The method is referencing its own class's variable, even though that variable is private rather than protected in the parent. Indeed, ReSharper refused to move the class to an outer scope because its own variables would become unreachable to its own methods. To do that extraction, we made the variables protected in the parent.

The other thing that it did was hide the class from an automated test program. These classes are both code-behind for web pages. We have an automated tool that searches the namespace for all classes derived from our base web component class and ensures that they all have page generation tests. This one did not.

I am wondering if there is some interesting way to use this like a private interface class in C++ or Java. It was curious, but I'm wondering if it can't also be useful.

Friday, October 8, 2010

Explaining Refactoring

I recently collected up some information from Jira and git about our software process.

I am mulling over an idea, and a little feedback is welcome. To help understand the context, among our products we have one codebase that was originally written pre-agile in a non-TDDed manner but greatly improved during and after the company's agile transition. It is pretty good stuff, constantly improving in quality and functionality, and consistently delivering real value to customers. We're pretty proud of it and its future. There is a great team working on it. Sometimes, though, we find the need for a large-scale refactoring somewhere in the code base, and we have the courage to tackle these.

Basically the problem is that refactoring sounds bad to product managers. When it's done well, there is no visible difference in the code. When it isn't done perfectly, it injects defects. The obvious knee-jerk reaction is to resist any large-scale refactoring efforts.

As an aside: refactoring ought to be invisible. It should be part of every task, and not a task on its own. If a system is built ground-up using TDD then no "refactoring tasks" should appear in release notes and no "refactoring tickets" need be filed. Sometimes, however, there is legacy code with significant technical debt or significant code cruft in new code. That ugliness might be in key areas of the application. Sometimes it is worthwhile to dedicate time to bringing a piece of code under control of tests. Even so, if people outside of your development team are aware of refactoring it is because something is wrong somewhere. It is a process smell.

Of course the product manager is right about the risk of large-scale refactorings and also about the lack of end-user value from effective efforts. What is missing is an understanding of the value of improvements to code virtues, maintainability, and leverage. Product managers don't see code.

Refactoring done well speeds code changes. The reward is that we are able to sustainably produce more features faster, with better quality. The punishment for not refactoring is that the code becomes continually worse and changes become slower and more defective. We need continual refactoring and testing, and sometimes we need an initial "boost" to areas that are hard to test. This is only obvious to people who are elbows-deep in the code, and product managers don't see code.

Since indeed cleaning up ugly code is somewhat risky, we technical people should have a solid means to judge whether any large-scale effort is truly worth doing. Otherwise we may waste many weeks on a crazy "ugly code" witch hunt without significantly improving the fluidity of development. The product managers don't want that, and neither do we.

Product managers see a bug here, a flaw there, a new bit of functionality, a slipped date, a funky side-effect here and there, all disconnected. They can't see the problems in the code because they can't see code. Some of them don't even believe that the technical problems really exist because it sounds like programmers are just making excuses.

I hacked together a simple similarity comparison of the text of our Jira tickets and compared them to code change records for those tickets. My efforts met with great disappointment. Two similarly-described bugs may not involve the same code at all. Conversely, if I show the tickets whose resolutions do involve the same file, non-programmers strain to see any commonality between them.
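For the record, the comparison was nothing fancier than this sketch, which uses Jaccard similarity over word tokens; the ticket text below is invented, not from our real Jira:

```python
import re

def tokens(text):
    """Lowercase word tokens from a ticket summary or description."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / float(len(a | b))

# hypothetical ticket text; the real data came from the Jira shelf
t1 = tokens("Report page throws error when date range is empty")
t2 = tokens("Error on report page for empty date range")
t3 = tokens("Add CSV export to the invoice screen")

print(jaccard(t1, t2))  # similar descriptions score high
print(jaccard(t1, t3))  # unrelated descriptions score zero
```

The disappointment is exactly what this toy example hides: two tickets can score high on text similarity and touch completely different files, and vice versa.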

While product managers see only holes, programmers see only gophers. If product managers could see which set of bugs are connected to which source code files then they might demand remediation, but only for the code that really affects delivery and quality. At least in a rational world there would be such a level of data-based decision-making.

One answer may be to have the product managers work in the code, but I doubt that coaches will often be successful with this strategy. Instead, we may need a better way to illustrate the underlying connections.

This is where the heat map comes in, but it was written from a programmer's point of view. As a step in a new direction, I coded up a report to list the tickets for any given file. Maybe in the coming weeks I can convert the data to a form that product managers can see and digest without becoming programmers first.

If you have any interesting ideas for further research or credible presentation of this data, drop me a comment below. I'd love to walk in some day with a new way for my partners who are product managers to understand our work so that technical debt remediation can be properly assessed and prioritized. I'll settle for better criteria on which to base my choice of large-scale refactoring efforts.

It's a good dream. Waiting for your comments.

Heatmap: The new hotness

I had some fun recently after agonizing over the problem of bug prevention.  I put out the observation that bugs have a tendency to cluster, and that the more often code is edited the greater the need for the code to be frightfully virtuous, clean, and thoroughly tested. The problem was with determining which code should be cleaned up in order to reduce the injection of bugs.

I considered various metrics and measurements, including cyclomatic complexity, code duplication, and various design details, but ultimately dismissed them for fear that they would be unconvincing. I didn't want my audience to think the heat map is just "coach Tim lecturing."  In keeping with our description of effective information radiators, we needed the information to be simple and stark.

Our SCM is based on git, and our tracking software is Jira. We have to include a recognizable, open, Jira ticket id in every code commit. Many of the tickets are enhancements, improvements, or investigations. Some are problems with legacy data or data import, and some are defects. Tickets are associated with releases, and the release dates are in the database as custom fields. I decided to count tickets, and not commits. We commit frequently, and some of us more frequently than others. Counting commits would be mostly noise, but counting tickets seemed like a pretty good idea.

I fired up python and grabbed a partner. By using the PyGit library, we were able to parse all the git commits, collect ticket numbers, and parse the diffs to find the names of all files touched in the commit. I tossed together a shelf (bdb) of file names to ticket numbers.
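I don't have the PyGit code handy, but the idea reduces to this: feed it `git log --name-only --pretty=format:@@%s` output, pull JIRA-style ticket ids out of the commit subjects, and map each file to the tickets that touched it. The ticket pattern and the sample log below are made up:

```python
import re
from collections import defaultdict

# JIRA-style ticket id, e.g. PROJ-1234; the exact pattern is an assumption
TICKET_RE = re.compile(r"\b[A-Z]+-\d+\b")

def parse_log(log_text):
    """Map each touched file to the set of ticket ids whose commits touched it.

    Expects `git log --name-only --pretty=format:@@%s` output: each commit
    contributes one '@@<subject>' line followed by its changed file paths.
    """
    mapping = defaultdict(set)
    tickets = set()
    for line in log_text.splitlines():
        if line.startswith("@@"):          # a commit subject line
            tickets = set(TICKET_RE.findall(line))
        elif line.strip():                 # a file touched by that commit
            for ticket in tickets:
                mapping[line.strip()].add(ticket)
    return mapping

sample = """@@PROJ-42 fix null date on report page
src/report.py
tests/test_report.py
@@PROJ-57 add invoice export
src/invoice.py
src/report.py
"""
print(sorted(parse_log(sample)["src/report.py"]))  # ['PROJ-42', 'PROJ-57']
```

In the real run, the resulting mapping is what went into the file-to-tickets shelf.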

Next I pulled down SOAPpy and my Jira-smart partner helped me build a Jira query. We pulled down the tickets and created another handy shelf, including the ticket number as a key and key facts like the descriptions, summaries, a defect indicator for defect tickets, and release numbers with dates.
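A shelf is just a persistent dictionary, so the structure is roughly this; the field names are illustrative, not our real Jira schema, and I'm using a temp directory here rather than a real path:

```python
import os
import shelve
import tempfile

# illustrative ticket facts; the real ones came from the SOAPpy Jira query
facts_by_ticket = {
    "PROJ-42": {"summary": "fix null date on report page",
                "is_defect": True,
                "release": "2010.10"},
    "PROJ-57": {"summary": "add invoice export",
                "is_defect": False,
                "release": "2010.11"},
}

path = os.path.join(tempfile.mkdtemp(), "tickets.shelf")
db = shelve.open(path)
try:
    for ticket_id, facts in facts_by_ticket.items():
        db[ticket_id] = facts
finally:
    db.close()

# reopen to show the facts persist between runs
db = shelve.open(path)
print(db["PROJ-42"]["is_defect"])  # True
db.close()
```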

Finally we walked through all of the tickets and ordered them by release date. I guess that's pretty coarse-grained, but it makes sense to keep it pegged to recognizable events. Releases have more data available (BOMs, release notes, etc) in case we need to dive deeper.

So then we get the fudgy part. A file that doesn't get any activity is not very interesting. It may be good or bad, tested or untested, clean or filthy, DRY or very non-dry, but nobody is messing with it so we can ignore it for a while. On the other hand, the more a file is touched, the more important it is that the file is easy to work with.

We don't have any information about which file(s) contained a defect, only the set of files that were touched while resolving the defect. That rightfully includes any tests, flawed files, test utilities improved, and any other source files touched by renames or other refactorings.  The more often a file is updated in the resolution of a defect, the closer it probably is to the problem file. I needed to weight activity related to defects much more highly than activity related to non-defect tickets.

As a rough heat map metric, I took the total tickets for a period and added back the total defects * 2, effectively counting each defect three times. That just feels about right to me and my partners. If I have two updates to one file with no defects, the code might not need much preventative maintenance. If I have two defects in one release, then it definitely needs some love. I came up with a ranking and graphed the top five files.
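In code, the metric is a one-liner; the sample counts below are hypothetical:

```python
def heat(ticket_count, defect_count):
    """Heat score for one file in one period: every ticket counts once,
    and each defect counts twice more, i.e. three times in total."""
    return ticket_count + 2 * defect_count

# two quiet updates vs. a release where both tickets were defects
print(heat(2, 0))  # 2: probably fine for now
print(heat(2, 2))  # 6: each defect counts triple, so this file needs love
```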

That left me with a jittery graph, so I decided to use a longer period, which I settled at three releases. The January release "heat index" (heh) is based on November through January, and the February heat index is December through February. It gives us a smoother graph, but may not be the smartest way to smooth it.  It is simple, though, and that counts for something.
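The smoothing is just a trailing sum over the last three releases. A quick sketch with made-up per-release heat numbers:

```python
def rolling_heat(per_release_heat, window=3):
    """Smooth per-release heat by summing a trailing window of releases.

    per_release_heat: list of (release_label, heat) in chronological order.
    Returns (release_label, windowed_heat) for each release that has a
    full window of history behind it.
    """
    smoothed = []
    for i in range(window - 1, len(per_release_heat)):
        label = per_release_heat[i][0]
        total = sum(h for _, h in per_release_heat[i - window + 1:i + 1])
        smoothed.append((label, total))
    return smoothed

# hypothetical heat per release for one file
releases = [("Nov", 3), ("Dec", 5), ("Jan", 2), ("Feb", 4)]
print(rolling_heat(releases))  # [('Jan', 10), ('Feb', 11)]
```

So the January index covers November through January, and February covers December through February, just as described above.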

The top five files were pretty interesting. When I looked over it with a production support (triage) person, the files were very familiar to them. A manager-type remembered the defects associated with them quite well. A features programmer told me that these files really were troublesome. Some files had been remedied by dedicated refactoring and testing time, and we could see how those files had lost 'heat' over the period of a year. Other files were trending upward in heat, because of a combination of discovered defects and being in a functional area that is seeing a lot of expansion.

We checked out the top two files which were trending hotter and found that they had low unit test coverage. We suspect a connection there.

The interesting parts of this experiment (to me) were:
1) We closed a feedback loop (request->code->release->defect report->code)
2) The metric is not based on my ideas of quality, just historical fact, so the answers are credible
3) There is so much more we can do with this data. More on that later.

This is the new hotness, and it is presented as an information radiator in the bullpen area. We are all "hot" on the idea of testing and refactoring these files, so hopefully we'll see some marked improvement over the next few releases.

Friday, October 1, 2010

Tests Are Code

I write an automated acceptance test (AT) and I find that I would like to perform certain actions against the business objects.  I create a method on the business objects and my test stays slim while my model builds up more functions.

So, there is potentially code in the production system that is only used in the tests. The debate today is whether that is "unnecessary bulk that solves no business need" or is instead "first-class code."

I am betting that the code you need in the AT is code you probably also need in the UI and/or in APIs for external clients. I suspect it is needed in the test because the same logic is likely written into the UIs instead of the model, so the tests' version of it is duplication. I also think that even a reasonably poorly-written test is written to support business rules.  Therefore, the tests express business logic, and methods required by the tests are methods required by the business.  While some worry that a method on a business object might be picked up and used by a programmer in the production code, I hope that this happens.

Your opinion matters, so feel free to leave a comment.