Monday, April 4, 2016

Naming is still hard.

Ideally, we would have excellent and obvious names for all variables, classes, packages, methods, etc. But it's hard to be excellent all the time. The people who built the Java libraries struggled with names, I can tell, and though I don't always appreciate their choices, I recognize that this is a multifaceted problem.

Maybe a moment of agonizing over one bad name will help me share a mental model, along with some patterns and smells for naming methods.

Let's give it a shot.

A Tiny Example

Here is a tiny, dull, dumb snippet:
 // assumes: import static java.text.DateFormat.LONG;
 Date now = new Date();
 DateFormat df = DateFormat.getDateInstance(LONG, Locale.FRENCH);

Java has a method called getDateInstance().

Does it return an instance of a date? No, it does not.

If it lived in a class called Date then it would be a terribly wrong name, but it doesn't live there.

It returns an instance of a date formatter, whose class is called DateFormat.
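
To make the mismatch concrete, here is a minimal sketch that asks the returned object what it is. The class name ReturnTypeDemo and the helper frenchLongFormat are made up for illustration; only DateFormat.getDateInstance, DateFormat.LONG, and Locale.FRENCH come from the JDK.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

class ReturnTypeDemo {
    // Hypothetical helper: hands back whatever getDateInstance() returns.
    static DateFormat frenchLongFormat() {
        return DateFormat.getDateInstance(DateFormat.LONG, Locale.FRENCH);
    }

    public static void main(String[] args) {
        Object result = frenchLongFormat();
        System.out.println(result instanceof DateFormat); // true
        System.out.println(result instanceof Date);       // false
        // The formatter turns a Date into text; it is not a Date itself.
        System.out.println(frenchLongFormat().format(new Date()));
    }
}
```

The declared return type already tells the compiler all of this. The point is that the method name tells the reader something else.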

DateFormat doesn't seem like such a good name either, because it is an object with a method named format. A DateFormat with a format method seems odd and redundant. Is it one, or does it make one, or what? But let's not worry about that for a minute.

It lives in DateFormat, so technically its name is DateFormat.getDateInstance. That's better, but it's possibly both misleading (not a date instance) and redundant ('Date' appearing in both the class name and the method name). Would DateFormat.getInstance() be a better name?

Well, that depends on how you import it. If you import DateFormat, then DateFormat.getInstance() seems harmless enough, but you might statically import DateFormat.getInstance, and then the method appears at the call site without the context it needs:

      System.out.println(getInstance(LONG, Locale.US).format(now));

Instance of a LONG? Instance of a US? Should the name be getInstanceOfClassDateFormat? Ew. getInstanceOfFormatter?

Well, the call to format seems to help give context so we know more about it, and we can hover over the getInstance() call in an IDE to help us see where it comes from. It is harmless and survivable.
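
The two-argument getInstance above is my hypothetical rename (the real DateFormat.getInstance() takes no arguments and returns a SHORT date/time formatter), so it won't compile as written. Here is a rough sketch of the same experiment using the real getDateInstance with static imports. The class name StaticImportDemo and the method describe are invented for the example.

```java
import static java.text.DateFormat.LONG;
import static java.text.DateFormat.getDateInstance;

import java.util.Date;
import java.util.Locale;

class StaticImportDemo {
    static String describe(Date when) {
        // With the class name gone, nothing here says "formatter"
        // until the trailing .format() call shows up.
        return getDateInstance(LONG, Locale.US).format(when);
    }

    public static void main(String[] args) {
        System.out.println(describe(new Date()));
    }
}
```

Read cold, that call site looks like it fetches a date. The static import strips away exactly the context the name was leaning on.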

But it doesn't seem excellent.

Identifying a Noise Word

Trying to finesse or expand the word Instance is not productive here.

The problem with Instance is that it is a noise word.  It is like Data, Manager, Information, and so many other space-consuming bits of non-meaning we often assign to variables and classes.

So maybe the question that helps with naming is to ask again why this function exists.

It seems to me that it exists to provide the date formatter that the user requested from among the many date formats that may exist in different locales.

That suggests that the name should probably be something like getLocaleSpecificDateFormatter, but that's a real handful to type and most of the interesting words are near the end. Eww.

Perhaps getLocaleFormatter() is sufficient, since it's in the DateFormat class to begin with.

I prefer that, but I'm not crazy about it. I don't even like starting with 'get', which leads me to ...

Working with the Audience

I don't care for the getter/setter standard in Java. I would prefer to see something like DateFormat.for(LONG, Locale.FRENCH) (or a near spelling of it, since for is a reserved word in Java). However, I have to balance that preference with the idioms and habits of the Java community.

A programmer working in an IDE will type the word get, and then ask the IDE for completion. That's a powerfully useful habit in Java IDEs, and it behooves us to comply.  Therefore 'get' is mandatory.

Now the next-most important thing is the word following get, because the programmer needs to quickly select the method they want from the completion list.

The next important word seems to be Locale. Let's make that word #2.

Now we see DateFormat.getLocale -- and that's still misleading. We don't really want a locale object/package/class here, but rather an object that will format a date for us. Drats.

Do we drop "Locale" from the name, or elaborate further?

Locale seems important, so we don't want to move it later in the name or drop it entirely. We'll elaborate a little further and see where it takes us.

So if we're not getting a locale, but a formatter, let's append the word formatter.

We suck up the redundancy issue. DateFormat.getLocaleFormatter() seems like the best we can do without agonizing over this for weeks.

Looking at an example of usage (in a test, of course) I see something like this:
 DateFormat df = getLocaleFormatter(LONG, Locale.FRENCH);
 String formattedDate = df.format(now);

This seems to reveal intent so much better than getInstance().
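
To try the proposed name at a real call site, here is a minimal sketch. It assumes a hypothetical helper class, LocaleFormatters, which is not part of the JDK and does nothing but delegate to the real getDateInstance.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK.
class LocaleFormatters {
    // Same behavior as DateFormat.getDateInstance, under the proposed name.
    static DateFormat getLocaleFormatter(int style, Locale locale) {
        return DateFormat.getDateInstance(style, locale);
    }

    public static void main(String[] args) {
        DateFormat df = getLocaleFormatter(DateFormat.LONG, Locale.FRENCH);
        String formattedDate = df.format(new Date());
        System.out.println(formattedDate);
    }
}
```

Whether a one-line wrapper earns its keep in a real codebase is a separate question. Here it only lets us read the name in context.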

Of course, the Java libraries are in wide use, and people have already formed habits and written programs that would break if we renamed the library method now. So we don't do anything about it, and go back to calling getDateInstance() while gritting our teeth.

Oh, look. The variable df has a horribly bad name. I bet I could fix that...

What do we learn from this exercise?

In this (perhaps silly) example, we picked apart a name that we cannot change, but along the way you and I have examined:

  • the direct honesty of a name (and found it wanting)
  • the context of the name (the class it lives in and its usage)
  • the existence of noise words in the name
  • the way in which other programmers use names
  • the idioms of the programming language
  • using a compound name to progressively reveal the intent of the method
  • the difficulty of changing APIs whose users are unknown to us
  • the fact that naming is an ongoing battle (see remark about df, above)

I welcome your comments and criticisms.