Wednesday, January 31, 2018

A little signal-to-noise



WARNING: the blogger "WYSIWYG" editor is really not very good about the "WYSIWYG" bit... so this article looks great in the editor but is a real crapshow in the actual post. I'm fixing it. Be kind, and bear with me.

In our eLearning, we publish problems and solutions. Sometimes people contribute other solutions and we show those as well. Today's sample comes from our Test-Driven Development album.




GeePaw Hill tells us "everything matters," so today I'm going to nitpick at something that (in this case) is tiny and that you might consider insignificant. So be it.

But just the same, I would like to introduce you to a process that can improve your code and design in ways subtle and profound.

In this case, it stays a little on the subtle side, but that's okay for a blog.

Here is a source code example:

AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.GOLD, 900, 1), 49.95);
AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.GOLD, 900, 2), 64.45);
AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.SILVER, 490, 1), 29.95);
AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.SILVER, 490, 3), 72.95);



This test is checking a hypothetical phone billing calculation.

Let's look at these lines and figure out how much unique content exists per line.

De-Noise-Ify



Duplicated:
AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.


56 characters of every line are duplicated. You probably didn't even read them after the first time.

Unique:
GOLD, 900, 1), 49.95);
GOLD, 900, 2), 64.45);
SILVER, 490, 1), 29.95);
SILVER, 490, 3), 72.95);




Only 22 to 24 characters of each line fall outside the duplicated prefix.

Of these, even fewer are unique (if you drop punctuation).
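If you would rather not count characters by hand, the tally can be checked mechanically. Here is a small Java sketch that finds the prefix shared by all four lines and reports how much of each line it covers. The class and method names (NoiseRatioDemo, commonPrefix) are mine, invented for illustration; they are not part of the album's code.

```java
import java.util.Arrays;
import java.util.List;

class NoiseRatioDemo {
    // Returns the longest prefix shared by every line in the list.
    static String commonPrefix(List<String> lines) {
        String prefix = lines.get(0);
        for (String line : lines) {
            int max = Math.min(prefix.length(), line.length());
            int i = 0;
            while (i < max && prefix.charAt(i) == line.charAt(i)) {
                i++;
            }
            prefix = prefix.substring(0, i);
        }
        return prefix;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.GOLD, 900, 1), 49.95);",
            "AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.GOLD, 900, 2), 64.45);",
            "AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.SILVER, 490, 1), 29.95);",
            "AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.SILVER, 490, 3), 72.95);");

        String prefix = commonPrefix(lines);
        System.out.println("Shared prefix: " + prefix.length() + " characters");

        // For each line, show the duplicated share and the part that is left over.
        for (String line : lines) {
            long percent = Math.round(100.0 * prefix.length() / line.length());
            System.out.println(percent + "% duplicated, remainder: "
                + line.substring(prefix.length()));
        }
    }
}
```

A shared prefix is a crude measure of noise (it ignores the repeated punctuation at the end of each line), but it is enough to make the point.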

With 2/3 of every line being noise, it's pretty obvious that this code is inviting you to copy and paste. Heck, it's practically demanding it.

How many times would you want to type those first 56 characters (plus indentation)?

Most of the time when people copy and paste, it's because the code asks them to do that.

I'm willing to wager a pleasant adult beverage that the four-line test was written by copying the first line three times.

If we were to get minimal noise, it might look like this:

SILVER 490 3 72.95


Now we've got it to four points of data, and that's pretty noiseless. Do you know what it means?

Nope. I didn't think so.

This has all the noise removed, but also all the information.


If you have near-zero signal, then having little noise doesn't help.


But if you have little signal, having a lot of noise doesn't make it any better either.


Find Significance





There is a violation of the fidelity rule here. The fidelity rule tells us this:
One reads the tests to understand the code. One does not read the code to understand what the test does.

The first three numbers describe facts about a simple phone bill.

public static double calculateRate(int plan, int minutes, int numLines)


The fourth number is the expected result of the calculation (here a plain double rather than a proper money type, because it's just a teaching example).

So when the plan is type=SILVER, and billing is for 3 lines and 490 minutes of use, the expected result is $72.95.

Now the question is how to phrase this. We are a little stymied because there are two different kinds of plans being tested here for two different conditions each. We're not going to come up with a test name that reflects that because it's a number of different ideas.

Maybe the tests are too big.

We could divide the tests into GOLD tests and SILVER tests. We could make four different tests.

This seems like a good idea since test naming is a classic way of making code make sense.

When we look at the code we see that the algorithm is the same regardless of plan. Only some numerical values change per plan. That's interesting.


Exalt the Significant



Possibly we could rework the code a bit. I'm going to take some liberties and not actually build and run this, but just examine some different organization.

var baseRate = 29.95;
var included = 500;
var extraMinutesRate = 0.54;
var extraLines = 21.50;

var silver = new Plan(baseRate, extraLines, included, extraMinutesRate);
assertEqual(72.95, silver.calculate(lines=3, minutes=490));
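For readers who want something that actually compiles, here is a minimal Java sketch of what such a Plan might look like. The four silver figures come from the snippet above. The class layout and the billing formula (base rate, plus a charge per line beyond the first, plus a charge per minute beyond the included minutes) are my assumptions, chosen so that the silver test values work out. They are not the album's actual solution.

```java
class PlanSketch {
    // Hypothetical Plan: holds the per-plan numbers; the algorithm is shared.
    static final class Plan {
        private final double baseRate;        // monthly charge, covers the first line
        private final double extraLineRate;   // charge for each line beyond the first
        private final int includedMinutes;    // minutes covered by the base rate
        private final double extraMinuteRate; // charge for each minute over the quota

        Plan(double baseRate, double extraLineRate,
             int includedMinutes, double extraMinuteRate) {
            this.baseRate = baseRate;
            this.extraLineRate = extraLineRate;
            this.includedMinutes = includedMinutes;
            this.extraMinuteRate = extraMinuteRate;
        }

        // Assumed formula: base + extra lines + overage minutes.
        double calculate(int lines, int minutes) {
            int extraLines = Math.max(0, lines - 1);
            int extraMinutes = Math.max(0, minutes - includedMinutes);
            return baseRate
                + extraLines * extraLineRate
                + extraMinutes * extraMinuteRate;
        }
    }

    public static void main(String[] args) {
        Plan silver = new Plan(29.95, 21.50, 500, 0.54);
        double bill = silver.calculate(3, 490); // 3 lines, 490 minutes
        System.out.printf("%.2f%n", bill);
    }
}
```

Java has no named arguments, so the call site loses the lines= and minutes= labels from the snippet above. That is a small loss of signal in its own right.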


There is more to this, though.

  • The significance of 490 minutes is merely that it is less than 500.
  • The significance of 3 is that it's two more lines than the 1 included in the plan.

In this test, the extraMinutesRate is insignificant. It's a shame we have to provide it.

I'm not even going to talk about the primitive obsession, using floats for money, or any of the other obvious issues here.

Especially not having small classes for minutes, and for money, and type-safe function parameters to keep us from shooting ourselves in the foot via mishandling of variables.

Far be it from me to mention that. This is, after all, a training exercise.
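Still, if I were to mention it, a rough Java sketch of those small classes might look like the one below. Everything here is hypothetical: the names Minutes, Lines, Money and silverBill are mine, and the overage formula is an assumption. Only the silver figures (29.95, 21.50, 0.54, 500) come from the example above.

```java
import java.math.BigDecimal;

class TinyTypesSketch {
    // A count of minutes that cannot be passed where a count of lines is expected.
    static final class Minutes {
        final int count;
        Minutes(int count) { this.count = count; }
    }

    static final class Lines {
        final int count;
        Lines(int count) { this.count = count; }
    }

    // Money held as an exact decimal instead of a double.
    static final class Money {
        final BigDecimal amount;
        Money(String amount) { this(new BigDecimal(amount)); }
        private Money(BigDecimal amount) { this.amount = amount; }
        Money plus(Money other) { return new Money(amount.add(other.amount)); }
        Money times(int factor) {
            return new Money(amount.multiply(BigDecimal.valueOf(factor)));
        }
    }

    // Silver-plan bill; the formula is assumed, the figures are from the post.
    static Money silverBill(Lines lines, Minutes minutes) {
        Money base = new Money("29.95");
        Money perExtraLine = new Money("21.50");
        Money perExtraMinute = new Money("0.54");
        int extraLines = Math.max(0, lines.count - 1);
        int extraMinutes = Math.max(0, minutes.count - 500);
        return base
            .plus(perExtraLine.times(extraLines))
            .plus(perExtraMinute.times(extraMinutes));
    }

    public static void main(String[] args) {
        Money bill = silverBill(new Lines(3), new Minutes(490));
        System.out.println(bill.amount);
        // silverBill(new Minutes(490), new Lines(3)) would be a compile error,
        // which is exactly the foot-shooting the types are there to prevent.
    }
}
```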


Avoid Duplication



Now we're getting closer to something that can be understood from the test. The signal is increased considerably. That's a good thing. Sadly, these numbers are going to be all over the tests and duplicated in the production code.

That violates the Single Point Of Truth (SPOT) principle, and also damages our signal-to-noise ratio.

Now we'll have numbers all over the place duplicating numbers in other places, and we'll have to be careful to ensure that they all agree when they should.

Maybe what we need now is to create a record type to hold the variables for different rates. Let's call them GOLD_RATE_PACKAGE and SILVER_RATE_PACKAGE for now.


var silver = new Plan(SILVER_RATE_PACKAGE);
var underQuota = SILVER_RATE_PACKAGE.minutes_quota - 1;
assertEqual(29.95, silver.calculate(minutes=underQuota, lines=1));
assertEqual(72.95, silver.calculate(minutes=underQuota, lines=3));


This could be taken further, but compare this example with the original.

AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.SILVER, 490, 1), 29.95);
AreEqualWithPrecision(PhoneBill.calculateRate(PhoneBill.SILVER, 490, 3), 72.95);




On one hand, they are almost exactly the same. On the other hand, there is a huge difference in the signal-to-noise ratio and the places one has to look to research why the first answer should be 29.95.
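To make the rate-package idea concrete, here is one way it might be written in Java. Treat it as a sketch under assumptions: RatePackage, its field names (minutesQuota stands in for minutes_quota), the billing formula, and the hand-rolled assertEqual helper are all mine. The silver figures are the ones used earlier in this post. I have left GOLD out because the tests above do not tell us its quota or its extra-minute rate.

```java
class RatePackageSketch {
    // Hypothetical record type: one place that holds a plan's numbers.
    static final class RatePackage {
        final double baseRate;
        final double extraLineRate;
        final int minutesQuota;
        final double extraMinuteRate;

        RatePackage(double baseRate, double extraLineRate,
                    int minutesQuota, double extraMinuteRate) {
            this.baseRate = baseRate;
            this.extraLineRate = extraLineRate;
            this.minutesQuota = minutesQuota;
            this.extraMinuteRate = extraMinuteRate;
        }
    }

    // Shared by production code and tests, so the numbers live in one spot.
    static final RatePackage SILVER_RATE_PACKAGE =
        new RatePackage(29.95, 21.50, 500, 0.54);

    static final class Plan {
        private final RatePackage rates;

        Plan(RatePackage rates) { this.rates = rates; }

        // Assumed formula: base + extra lines + overage minutes.
        double calculate(int minutes, int lines) {
            int extraLines = Math.max(0, lines - 1);
            int extraMinutes = Math.max(0, minutes - rates.minutesQuota);
            return rates.baseRate
                + extraLines * rates.extraLineRate
                + extraMinutes * rates.extraMinuteRate;
        }
    }

    // Stand-in for a test framework's assertion with a tolerance.
    static void assertEqual(double expected, double actual) {
        if (Math.abs(expected - actual) > 0.005) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        Plan silver = new Plan(SILVER_RATE_PACKAGE);
        int underQuota = SILVER_RATE_PACKAGE.minutesQuota - 1;
        assertEqual(29.95, silver.calculate(underQuota, 1));
        assertEqual(72.95, silver.calculate(underQuota, 3));
        System.out.println("both assertions passed");
    }
}
```

The name underQuota carries the reason the minutes value was chosen, which a bare 490 never did.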


So Friggin What?


The point of this is not "my code is better than yours" or "I'm cooler than you" (which is almost certainly false).

What I'm suggesting is that there are subtle but real improvements to be found, even in simple code, if we consider its signal-to-noise ratio:
  • De-Noise-Ify
  • Find and Exalt the Significant
  • Avoid Duplication

As a result, you end up with code that is more obvious at a glance and likely has a better design as well.

This matters to me because I care about the code rather deeply.

Maybe you don't like it as well as the original.

That's okay, but what do you come up with when you follow the same process?