Thursday, April 17, 2014

The revamped C++11 auto keyword: blessing and curse

In 2011, the C++11 (aka C++0x) standard was accepted and subsequently implemented at least in part by most major compilers. One of the more talked-about changes to the language was the expansion of the auto keyword.

auto has been in C since the beginning as a way to denote automatic (as opposed to static, register, or external) storage for a variable. But since auto is implied by default when no storage class is specified, no one ever bothered to actually write it. Until 2011. With C++11, auto lost its old storage-class meaning and was repurposed: it is now written in place of a type name when declaring a variable, as a way to tell the compiler to deduce the type from the initializer. This change was nominally motivated by a need to support template programming techniques where the type can't be known until the template is instantiated (and may be impractical to spell out even then), but what auto really gets used for 99% of the time is saving keystrokes. No longer must you, weary coder, suffer wrist strain from typing out travesties like:

Some::Long::Namespace::SomeLongClassName<SomeTemplateType>* myVar =
   new Some::Long::Namespace::SomeLongClassName<SomeTemplateType>();

Instead, you can write:

auto myVar = new Some::Long::Namespace::SomeLongClassName<SomeTemplateType>();

Voila! auto has nearly halved the number of characters needed to construct a new instance of this particular object. And if that's as far as things went, I doubt anyone would have a problem with it. But like any good thing, auto can be taken too far. The existence of auto means that authors in principle need never write a type name on the left side of an initialization again, and more than a few do exactly that. This certainly makes things more convenient for you as a writer. The problem is that (hate to break it to you) the writer's convenience doesn't matter. Code should always be written for the convenience of the reader. The crucial question, then, is: does auto make code easier or harder to read? And the answer to that question, as usual, is that it depends.

But before going further, let's address a couple of pro-auto arguments I simply don't buy. You'll hear that if you really want to know what an auto is, any modern IDE will have IntelliSense-style features that can tell you with little fuss. With C++ that may or may not be true. C++ is a notoriously difficult language to parse in real time, and while there are a bunch of tools that can make reasonable guesses and be right most of the time as long as you don't try to do anything too tricky, only the compiler is able to make a final, authoritative judgement. Further, code is not always read in an IDE. If I'm doing a diff, or getting a snippet in an email, or reading something that's been posted online, there are no tools available to help figure out what that auto actually is, and I just have to hope it doesn't matter.

Next, some people will tell you that you shouldn't be overly concerned with types when you're reading code anyway, and that it's better to create good variable names and have the types abstracted away so you can reason about the code at a higher level. And if a high-level overview is what I'm after, that may be a valid point. But what if it's not? What if the code doesn't work, and the whole reason I'm reading it is to figure out where it's wrong? Bugs and the devil are in the details, and they're that much harder to find when the details are hidden. Now, if I've got decent IntelliSense (again, by no means a given with C++), I can hover over a variable name and get a popup telling me what its type is. But I'll probably forget as soon as I start thinking about something else, especially if there are multiple auto variables I'm trying to track at the same time. When you're debugging, you want to maximize the amount of information you can access at a glance. auto is your enemy here.

Others will claim that auto makes refactoring easier. If GetValue() returns int, but everyone who calls it stores the result in an auto, I can later change it to return double without breaking compilation all over the place. My response to that is if you're going to change something as fundamental as a function's return type, you darn well better take the time to examine each and every caller and make sure it will still function correctly after the change. The fact that the code won't compile until you do so helps enforce that sort of diligence, and using auto to get around the "problem" just makes it that much easier to write bugs.
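To make that concrete, here is a minimal sketch of the kind of bug I have in mind. The names GetValueBefore, GetValueAfter, HalfBefore, and HalfAfter are made up for illustration; they stand for the same GetValue() and the same caller before and after the return type changes from int to double.

```cpp
#include <cassert>

// Hypothetical "before" and "after" versions of the GetValue() from the text.
int GetValueBefore() { return 7; }
double GetValueAfter() { return 7.0; }

// A caller written against the int version. It quietly relies on
// integer division truncating the result.
double HalfBefore() {
    auto value = GetValueBefore();  // deduced as int
    auto half = value / 2;          // int / int, so half is the int 3
    return half;
}

// The identical caller after the return type changes. Nothing in the
// body was edited and nothing fails to compile, but the arithmetic
// is no longer the same.
double HalfAfter() {
    auto value = GetValueAfter();   // now deduced as double
    auto half = value / 2;          // double / int, so half is 3.5
    return half;
}

// Small check showing that the two callers disagree.
void CheckHalves() {
    assert(HalfBefore() == 3.0);
    assert(HalfAfter() == 3.5);
}
```

Had the caller written int value = GetValue(); instead, most compilers could at least be asked to warn about the narrowing conversion, which gives you a list of call sites to go and inspect.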

So when is it ok to use auto? My opinions are always evolving and I've gotten quite a bit more permissive as time has gone by, but here are the cases where I currently allow myself the convenience:

1) The type already appears on the right-hand side of the expression.

auto somethingConcrete = dynamic_cast<LongConcreteName*>(somethingAbstract);

We're typing enough here already without repeating LongConcreteName. I don't feel the slightest bit of ambivalence about using auto in these situations.

Sometimes the type isn't strictly quoted on the right side, but can be confidently deduced from what is there.

auto mySuperWidget = SuperWidget::Create();

SuperWidget::Create() had better return a SuperWidget. True, we don't know if we're getting a SuperWidget*, a shared_ptr<SuperWidget>, or a stack SuperWidget. This used to bother me and I can contrive situations where making the wrong assumption would cause a bug, but I've learned to be a little flexible here. These days I tend to always return shared_ptr from my factory functions, so I typically use auto in those cases and only write the type out when it needs to be something different.
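If the ambiguity does bother you, one option is to let the compiler check the assumption for you. The sketch below is only an illustration: this SuperWidget is a made-up stand-in, its Size() member is hypothetical, and the shared_ptr return type is my own convention rather than anything the language guarantees.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for the SuperWidget in the text.
class SuperWidget {
public:
    // Assumption: the factory returns shared_ptr, per my usual convention.
    static std::shared_ptr<SuperWidget> Create() {
        return std::make_shared<SuperWidget>();
    }
    int Size() const { return 42; }  // made-up member, for illustration
};

int UseWidget() {
    auto mySuperWidget = SuperWidget::Create();

    // Pin down what auto deduced. If Create() is ever changed to return
    // a raw pointer or a value, this line stops compiling instead of the
    // surrounding code silently changing meaning.
    static_assert(
        std::is_same<decltype(mySuperWidget),
                     std::shared_ptr<SuperWidget>>::value,
        "SuperWidget::Create() is expected to return shared_ptr");

    assert(mySuperWidget != nullptr);
    return mySuperWidget->Size();
}
```

Of course, by the time you've written the static_assert you've typed far more than the type name itself, so this is only worth it where a wrong guess would really hurt.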

A dicier example is something like this:

auto mySuperWidget = myWidgetFactory.CreateSuperWidget();

Am I getting a concrete SuperWidget? An ISuperWidget? An IWidget whose concrete type is SuperWidget? I go back and forth on cases like this. It comes down to how much ambiguity there really is given the classes and interfaces that really exist in the codebase, and what kind of mood I'm in.

2) You're capturing the return value from some function, and you're not doing anything with it except passing it off to someone else.

auto result = ComputeResult();
return FilteredResult(result);

This is not fundamentally different from:

return FilteredResult(ComputeResult());

3) You're capturing the return value from a well-known library function that any likely reader will be familiar with.

auto it = myStlCollection.begin();

I think it's safe to assume that anyone who's working with a C++/STL codebase will know that begin() returns some sort of iterator. In addition, the full typenames for STL iterators are really ugly if you have to write them out completely. Which leads to...

4) The actual typename is really long and ugly. A project I'm working on right now has a function that returns:

std::shared_ptr<Reactive::IObservable<std::shared_ptr<std::vector<double>>>> 

(it actually has quite a lot of functions that return things of that sort). Although having that all in front of you does communicate a lot of important information, this is a case where the inconvenience of having to look the type up may be less than the inconvenience of having a nearly 80-character typename filling up the screen. In situations like these it's usually better to use a typedef to communicate the gist of the type with a smaller number of characters (although many of the complaints I have about auto work against typedef too), but if you have a whole bunch of functions all returning ugly types and they're all just a little bit different from each other, that may not be practical.
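When the ugly types differ only in a template argument, a C++11 alias template can sometimes stand in for a whole family of typedefs. Here is a rough sketch, with the caveat that this Reactive::IObservable is an empty stand-in I made up for the example, and ObservableVectorPtr and GetSamples are hypothetical names rather than anything from the real project.

```cpp
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the project's Reactive::IObservable.
namespace Reactive {
template <typename T>
class IObservable {
public:
    virtual ~IObservable() {}
};
}  // namespace Reactive

// One alias template covers every "shared pointer to an observable of a
// shared pointer to a vector of T" type.
template <typename T>
using ObservableVectorPtr =
    std::shared_ptr<Reactive::IObservable<std::shared_ptr<std::vector<T>>>>;

// The alias is only a new name; the underlying type is unchanged.
static_assert(
    std::is_same<
        ObservableVectorPtr<double>,
        std::shared_ptr<Reactive::IObservable<
            std::shared_ptr<std::vector<double>>>>>::value,
    "the alias must name the original type");

// Hypothetical function using the shortened return type.
ObservableVectorPtr<double> GetSamples() {
    return nullptr;  // placeholder body, for illustration only
}
```

This only helps when the variations follow a pattern. If each function's return type is ugly in its own special way, you're back to choosing between a pile of one-off typedefs and auto.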

What all of these except #4 boil down to is simply this: only use auto when the reader will know what the type is anyway. If I only got one rule, that's what it would be.

Used properly, auto is great for both writer and reader. After getting used to having it, I've been surprised how frustrating it is to work in languages like Java that lack an equivalent, and I certainly wouldn't want to go back to the days before it existed. But used willy-nilly by authors who don't want to type ten characters instead of four, or (worse) who can't be bothered to check what the result of some expression is, it can lead to incomprehensible, bug-prone code that can't be debugged. Enjoy responsibly.
