Monday, March 14, 2022

What does Tim have against "private" methods?

 A big thread erupted, all full of misunderstandings and miscommunications, about the idea of "private" methods. 

It all started quite innocently (I maintain) when someone asked how we felt about testing private methods. Some people jumped in with "Absolutely Not! Never! That's wrong! Test via public interfaces."

I thought a little longer, and said "I'm not sure" and then later "I'm not sure that 'private' is even needed."

This is where the problems started, and maybe here I can clarify what I meant by it all. 

People assume (and insist) that I could only possibly mean that they should substitute 'public' for all protected and private members, polluting the interface, and inviting the violation of a class' internal state. 

That was never my intention and still isn't. Still, this is what people insist that I must have meant from the start. I suppose this is because that's what they imagined me to mean and it's hard to admit that you're wrong, or perhaps because this is social media so people never miss an opportunity to double-down and pile on if at all possible. It's like a sport. 

It's also entirely possible that I didn't communicate well at all in short tweets. Maybe that's all on me. I can't say.

Well, you've come this far, so let's see if I can't communicate better in blog than in twitter, since I can write in peace without people accusing me of meaning the wrong thing while I try to explain what I actually mean.

If I've managed to make sense with this post, let me know. 

If I've not, the forum is open for clarifying questions below.

Autocomplete is a big deal.

The good thing about 'private' and 'protected' -- the thing that actually makes programming easier/better, is that the methods aren't offered by auto-complete. The idea of locking people out of calling the method? Not even remotely a second-degree concern.

People program by autocomplete. You type the name of a thing, press the dot, and you get a list of things you can do. You may never look at the documentation, or read the example code, but you will see the autocomplete hundreds of times a day.

When you want to disable an account, there is likely to be a 'disable' method attached to an account. You choose that method, run the tests, ship it. That's how you normally will navigate.

When you don't see a method you want, you might pop up a level and use the module or some management API interface. Again, you type the name and a dot, and you look for 'disable' in the recommended names. 

Private and Protected are two of the ways to keep functions out of that list, and if they're not in the list, you're not likely to even consider calling them.

Arbitrarily Located Functions

A developer is implementing a class and realizes that they need to compare date ranges. They write that up, drop it into a private method, and all is well.

Or is it? 

Why is that method located in this class? Does it have high cohesion? No. Does it support the purpose of this class directly? No. 

Why then? Because this class calls the function, and this is the file I was editing when I wrote the function.

In other words, the function doesn't belong here according to any rule of design, but it wasn't found anywhere else either. The developer drops it into the class with 'private' and moves on.

What I've found is that often the private methods in one class are repeatedly reinvented in private methods of other classes. Sometimes they're even copied and pasted directly from one class to another.

Why? Because this is the class they had open in the editor when they realized they needed the function, and it's not available for calling because it's private in the other class.  So they copied it.

Arbitrarily located functions, hidden from view, repeated by invention and copying, likely hiding the opportunity for Single Point Of Truth (SPOT) abstractions and libraries that would make everyone's job a little easier.

This isn't a rare occurrence. 

By being hidden and arbitrarily located, they invite duplication and reinvention. 

Testability of Private Methods? None

When we talk about testing private methods, it's a non-starter. If you put the word 'private' on the front of a method in most languages, you cannot call that method from a test. That means you can't possibly test private methods.

Okay, there is a way, but it involves reflection and that's an obscenity. If you ever use reflection to crack into some legacy code and test hidden methods, make sure you remove that hack as soon as humanly possible (or sooner) because it's a travesty. It's awful. It will fail you if the method is renamed or moved. Reflection is generally a bad idea, and I wouldn't do that (much, or for long).

If you are doing TDD (and why wouldn't you?) you test the public methods, and if they're using private methods then the private methods are being exercised. If you have meaningful tests and assertions, then the private methods are being tested through the public interface and that's probably okay.

If you're not doing TDD (and why not?) then you may not have code that tests all the private behaviors. You may have to read all the code in order to exercise it properly, which is a pain.

If you're doing test-after development (but why?) you've written the code and now you have to write tests around it. Every private method is a called from some public method and you're going to have to figure out how to get down into that private method code and recognize if it's working correctly or not.

By being private, those methods have cut off the easiest route to testability -- calling the method directly.


Should You Test Implementation?

You should: implementation is how code behaves and you have to test behavior.

There are caveats here:

  • If you're structure-aware in your tests, they'll resist structural refactoring
  • If you're time-sequence-aware in your tests, they'll resist time-order refactoring
  • If you're using reflection, you're poisoning our future and must be stopped (half-grin)

There are rules, of course. You don't lock down things that you want to change, and you don't leave unspoken the things that must be true. Your tests specify how your world behaves at a certain level.

Sometimes that's too high a level and there are mini-behaviors you need to check too.

I've seen cases where testing only at the interface required hundreds of tests, but testing at the internal level requires barely a dozen. This is because at the higher level, you need all the combinations of all the paths of all the subordinate functions. At lower levels, you need one for each path in a function.

High-level end-to-end and integration tests take a long time to run, and microtests are cheap and fast.

We tend to do microtesting (or unit testing) so that we can afford to run the tests in a tight TDD cycle. 

To argue against testing implementation functions is to argue against unit testing and microtesting on principle.

If you should only use the most-public interface, then shouldn't you only test at the UI level? Yeah, this is not a good idea. 

So how can we get at the lower-level functions? If they're all private, we can't but it is totally possible if they are public methods on lower-level classes that are composed ("encapsulated") by the API classes. 


Interfaces and Implementations

In many languages, you can declare a public interface, and that interface can be implemented by other classes.

It's accepted that one should always program to the interface in those languages. This way, one can only use the declared, public interface of the implementations. This keeps the implementations substitutable (via Liskov Substitution Principle). 

Non-interface methods of the implementers are already hidden: callers have no access to them via the interface. 

If you're using the interface, and you press the dot, then only methods in the public interface are presented. The other methods of the implementer may as well not exist, because they are not available here.

If you have segregated interfaces, this is even nicer. It may be that two or three public interfaces are the right way to think and design interactions, but combining a few of those interfaces together is the better way to implement the behaviors. No user of one interface has any awareness that the other interfaces exist on the object, nor of the public methods of the other interfaces, nor any of the public methods of the implementers.

In order to reach the public methods of the implementer, a caller would have to down-cast from the interface to the implementation (a no-no that justifies a sharp crack across the knuckles with a yardstick) in order to even know that other methods exist. 

This is convenient. This means that all methods of the implementers can be public without exposing those methods to callers.  Since they're public, you can write tests directly against them. Because the implementation is being tested also through the interface, it's easy to ensure that it behaves as a whole implementation.

Why not make them private? Because it's redundant and limits testing. 

Without polluting the public interface, one has a fully open and testable class.


Thinking a little more deeply

Private methods, when we choose to use them, may signal a need for us to think more deeply about our situation and strategies.

  1. In a Rat's Nest:
    sometimes people are doing test-last (after coding) and they have a complex method. You shouldn't have big, complex methods if you have any other choice. To try to manage the complexity, they may extract some private methods.
    In order to test through the API, you have to navigate the rat's nest. You will have to set up deep data structures and some combination of boolean conditions that will allow you to get to the private method's function call, and then to exercise the code inside that method.
    It's easier to make the method's access less restricted and test it directly than to have dozens of long and complicated tests. Perhaps package-public, perhaps protected so you can cheat with inherit-to-expose. 
    It's better to test the code easily than to write awful, fragile, internals-aware tests.
  2. Missed Abstractions
    Where one class has a bunch of private methods, often there is some cohesion between those methods. If one were to set the private methods side-by-side, one might recognize that there have been missed abstractions.
    Maybe some of the methods are generic string, date, or math functions. These could be public methods in a utility package or more-primitive type. If you move them to the "right" place, they can be fully testable and can be reused within the codebase. They would not be public methods of the class you're working on.
    Perhaps some of those represent a lower-level concept that is munged into the current class. If they were pulled out, they could become testable, public methods on the new class. They would still not appear in the public interface of the class you're working on.
  3. In Languages without Private
    In Python, smalltalk, and similar languages there's no 'private' and we've been okay with that for a long time (since the 70s for Smalltalk). 
    While people say that encapsulation is a core feature of OOD (and it is) it isn't the "private" keyword that causes encapsulation. It's done via composition.
    In a Python module, you can choose what classes you expose and which you don't. You have a class with a public API, which is composed of classes/functions that aren't part of the public API. They're tested directly within the module and don't have a "private" keyword associated with them. They might not have underscore-decorated names or any other semblance of access protections. 
  4. Other APIs
    In any language, the Model class or the API class may have a simple interface into the module (as described above) but the module may have a lot of complicated functions and logic divided into multiple classes and functions. 
    The model class doesn't have any 'private' methods at all - it just calls the public methods of other classes in the module. Those classes have copious tests and may not have any methods declared Private or Protected, because only the API and tests call those methods - they're not visible to the outside world (protection from Hyrum's Law)

So What Do You Do Instead

  • Don't pollute the public interface of a class with private methods. If you simply flip the private methods to be public, you'll lose understandability of the interface and create unwanted dependencies. This is what I recommend not doing. Keep interfaces clear and clean.
  • Move your methods to the places where they belong (where they have cohesion) and can be easily found by others.
  • Try not to need private methods. They should be rare. Remember, many OO languages don't even have the concept of private and they're just fine without them.
  • in some legacy code test-after situations, you might raise accessibility of a method to public, protected, or module-private in order to support refactoring via better tests (in the short term)
  • Consider using encapsulation properly: by composing behaviors under an API, rather than by housing all behaviors in the API's class.
  • In some languages, you have 'interface' or 'protocol' classes that declare an interface to use. Do that when there is a clear public interface. 
And, of course, I would be remiss if I didn't admit that I do sometimes create private methods. I try not to, and when I find a better way than settling on private, I am usually happier.  I sometimes leave a little cruft like private methods until I have more information and can see a better design form; it may take weeks or months. 

So yeah, there are private methods in my code. I just don't see that as "good coding" and a "solution." It's temporary.

3 comments:

  1. In C#, I add a class called Friends.cs to the project, mark my methods I want to actually unit test as internal and put inside my Friend.cs file an InternalsVisibleTo attribute that points to my unit test project. This allows just my unit test project to test my methods while keeping other calling code outside the assembly at bay.

    ReplyDelete
  2. Smaller, focused classes, even if they have only one public method have great benefits.
    Easier to test, mock, and re-use.

    Efforts to make web APIs using a single character of code, esoteric interface features, the race to create the latest javascript framework or abandon out-right, or implement the pattern out of this months box of lego coding parts all serve to hide the fact that most classes should be like simple sentences and not complex run ons that are confusing. <- For example, NOT this one.

    ReplyDelete
  3. One benefit of private methods is that they allow us to separate out a small piece of code and give it a name. That makes the calling method easier to read.
    We still need to understand that for purposes of complexity and testing, the public method and any private methods it calls are effectively all one method. If there are 100 lines of code in a public method we can't meaningfully improve readability or make testing easier by moving some of that code into private methods.

    ReplyDelete