Streaming Consciousness: C++: Hiding the implementation

C++ is a powerful language, but it does have a few aggravations (some would quibble with "a few"; others would quibble with "some"). Large among them is the header/source file dichotomy. If you have a class that you want to make visible to the rest of your application, you need to write an outline-like definition of it in one file (an #include-able header), and put the actual code for its methods in another (a source file).

For example, the header for a class representing a monster in an RPG might look like this:

Standard header file

// Revenant.h

#pragma once

#include "Armor.h"

class Revenant
{
public:
 Revenant();

 int GetHp() const;
 void SetHp(int newHp);

 void TakeAttack(int attackPower);

private:
 int GetDamageFromAttack(int attackPower) const;

private:
 static const int STARTING_HP = 20;

 int m_hp;
 Armor m_armor;
};

The corresponding source file might look like this:

Standard source file

// Revenant.cpp

#include "Revenant.h"
#include "ArmorFactory.h"

Revenant::Revenant()
 : m_hp(STARTING_HP)
 , m_armor(ArmorFactory::CreatePlateArmor())
{
}

int Revenant::GetHp() const
{
 return m_hp;
}

void Revenant::SetHp(int newHp)
{
 m_hp = newHp;
}

void Revenant::TakeAttack(int attackPower)
{
 int damage = GetDamageFromAttack(attackPower);
 SetHp(GetHp() - damage);
}

int Revenant::GetDamageFromAttack(int attackPower) const
{
 int damage = attackPower;
 damage -= m_armor.MitigateDamage(damage);
 // apply more mitigations, etc.
 return damage;
}

The problem with header files

Right away you should notice one of the problems with the header file approach: you're forced to repeat a great deal of information. Luckily you don't need to worry about the two copies gradually drifting out of sync because, with a couple of exceptions, the compiler won't let you get away with that. But it's still extra work. Say you're writing code in the source file and decide you want to add a private function. You'll need to switch to the header, scroll to the correct position, write the new function's signature, switch back to the implementation and write the same signature again (with a syntax that is just different enough to prevent you from using straight copy/paste), and only then start writing the actual code. There are some good tools that can automate much of this, but it's still a hassle, and you can't depend on automatic tools to keep things organized as well as you could by hand.

The other, bigger problem is more subtle. The need to declare all your member variables and functions in the header forces you to expose the private workings of your class to the entire outside world. Anyone reading Revenant.h will see that a Revenant starts with 20 HP, and may come to depend on that detail. More damningly, because Revenant has a private member variable of type Armor, any file that wants to use Revenant will need to #include Armor.h, even if it never actually uses Armor directly. In addition to making more work for the preprocessor, this means that any change to Armor.h will require all consumers of Revenant to be recompiled. Something similar happens if you decide to change Revenant's implementation details, perhaps by splitting GetDamageFromAttack() into several functions that each apply a different type of mitigation. This would mean that everyone using Revenant would need to be recompiled, even though no one should need to care about the details of how you calculate damage. Thanks to transitive dependencies, over time this sort of arrangement can easily lead to a situation where changing a single "hot" header file causes a good chunk of the source files in the project to be recompiled.

pimpl: A partial solution

A big part of the reason we need to fuss around with all this header stuff is that the compiler needs to know the size of any object that you pass around by value. This means if you declare a parameter or variable of type MyAwesomeObject, the compiler will need to read all of MyAwesomeObject's details from its header file so it can determine how much space to set aside for it. But there is a loophole: if we replace all those MyAwesomeObjects with pointers to MyAwesomeObjects, and we never try to dereference any of those pointers, then we no longer need to tell the compiler anything about MyAwesomeObject except that it is the name of a class. This is because a pointer to a class is the same size no matter which class it points to. The same goes for smart pointers, since they are just glorified wrappers around raw pointers when all is said and done.
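To make the loophole concrete, here is a minimal single-file sketch. Holder, PassThrough and kHolderSize are names I made up for illustration; the point is only that everything above the full class definition compiles while MyAwesomeObject is still just a name.

```cpp
#include <cstddef>
#include <memory>

// Forward declaration only: the compiler knows the name, not the layout.
class MyAwesomeObject;

// Hypothetical holder type. It is legal because it stores only pointers
// (raw and smart) to the still-incomplete class.
struct Holder
{
    MyAwesomeObject* raw = nullptr;
    std::shared_ptr<MyAwesomeObject> shared;
};

// Passing a pointer through never requires the full definition,
// as long as nothing dereferences it.
MyAwesomeObject* PassThrough(MyAwesomeObject* object)
{
    return object;
}

// The size of Holder is already known here, before MyAwesomeObject is defined.
const std::size_t kHolderSize = sizeof(Holder);

// The full definition, which would normally live in its own header.
class MyAwesomeObject
{
public:
    int payload[64];
};
```

Writing sizeof(MyAwesomeObject), or declaring a MyAwesomeObject by value, anywhere above the full definition would be a compile error, which is exactly the dependency the pointer lets us avoid.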

This fact is exploited in the idiom known as pimpl (either "private implementation" or "pointer to implementation"), which solves some of the header file issues by hiding implementation details in the source file. The basic idea is to define a second, private "impl" class within the source file to hold all the private methods and members of the public "interface" class. Your public class then declares a private pointer to an instance of the impl and no other private functions or variables. You still need to forward-declare the impl in the header so the compiler can make sense of the symbol, but includers don't need to know anything about it beyond its name. Revenant might look like this after being pimpled:

pimpl example

// Revenant.h

#pragma once

#include <memory> // std::shared_ptr

// forward declaration for the impl
// #includers don't get to know anything about it except its name
class RevenantImpl;

class Revenant
{
public:
 Revenant();
 Revenant(const Revenant& other);
 Revenant& operator =(const Revenant& other);
 
 int GetHp() const;
 void SetHp(int newHp);

 void TakeAttack(int attackPower);

private:
 // all (other) private implementation contained within
 std::shared_ptr<RevenantImpl> m_impl;
};



// Revenant.cpp

#include "Revenant.h"
#include "Armor.h"
#include "ArmorFactory.h"

class RevenantImpl
{
private:
 static const int STARTING_HP = 20;

 int m_hp;
 Armor m_armor;

public:
 RevenantImpl()
  : m_hp(STARTING_HP)
  , m_armor(ArmorFactory::CreatePlateArmor())
 {
 }

 int GetHp() const
 {
  return m_hp;
 }

 void SetHp(int newHp)
 {
  m_hp = newHp;
 }

 void TakeAttack(int attackPower)
 {
  int damage = GetDamageFromAttack(attackPower);
  SetHp(GetHp() - damage);
 }

private:
 int GetDamageFromAttack(int attackPower) const
 {
  int damage = attackPower;
  damage -= m_armor.MitigateDamage(damage);
  // apply more mitigations, etc.
  return damage;
 }
};



Revenant::Revenant()
 : m_impl(std::make_shared<RevenantImpl>())
{
}

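Revenant::Revenant(const Revenant& other)
 // build a fresh impl from the other one; delegating to operator= here
 // would dereference an m_impl that is still null
 : m_impl(std::make_shared<RevenantImpl>(*other.m_impl))
{
}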

Revenant& Revenant::operator =(const Revenant& other)
{
 *m_impl = *other.m_impl;
 return *this;
}

int Revenant::GetHp() const
{
 return m_impl->GetHp();
}

void Revenant::SetHp(int newHp)
{
 m_impl->SetHp(newHp);
}

void Revenant::TakeAttack(int attackPower)
{
 m_impl->TakeAttack(attackPower);
}

This has solved our biggest problems—other coders won't see our starting HP anymore unless they deliberately go looking for it, Revenant's consumers don't need to know anything about Armor, and we can modify the impl all day long without forcing anyone else to recompile.

But we've only solved the repetition problem for the private functions. For the public ones we've actually made things worse since we now need to write each one's signature three times. We also had to jump through some hoops to add deep-copy semantics to the class (another option would have been to make the object noncopyable). Finally, we've got all these stub functions that do nothing but delegate to identical functions in the impl. The performance impact is minimal, but it's that much extra code to read and write. This final point can be ameliorated somewhat by making everything in the impl public and dividing the logic between the two classes:

A more freeform pimpl example

// Revenant.cpp

#include "Revenant.h"
#include "Armor.h"
#include "ArmorFactory.h"

class RevenantImpl
{
public:
 static const int STARTING_HP = 20;

 int m_hp;
 Armor m_armor;

public:
 RevenantImpl()
  : m_hp(STARTING_HP)
  , m_armor(ArmorFactory::CreatePlateArmor())
 {
 }

 int GetDamageFromAttack(int attackPower) const
 {
  int damage = attackPower;
  damage -= m_armor.MitigateDamage(damage);
  // apply more mitigations, etc.
  return damage;
 }
};



Revenant::Revenant()
 : m_impl(std::make_shared<RevenantImpl>())
{
}

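Revenant::Revenant(const Revenant& other)
 // build a fresh impl from the other one; delegating to operator= here
 // would dereference an m_impl that is still null
 : m_impl(std::make_shared<RevenantImpl>(*other.m_impl))
{
}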

Revenant& Revenant::operator =(const Revenant& other)
{
 *m_impl = *other.m_impl;
 return *this;
}

int Revenant::GetHp() const
{
 return m_impl->m_hp;
}

void Revenant::SetHp(int newHp)
{
 m_impl->m_hp = newHp;
}

void Revenant::TakeAttack(int attackPower)
{
 int damage = m_impl->GetDamageFromAttack(attackPower);
 SetHp(GetHp() - damage);
}

This works, but it's ugly. The code is scattered across two locations. Sometimes you need to prefix a member's name with m_impl-> and sometimes you don't, and if you move code from one class to the other, you'll need to either add or remove those prefixes. Further, although I don't show it here, in a real class it's likely that at some point the impl will need to call a function in the outer class, which means it must be provided with a reference to it. All this adds up to one more piece of state information you need to keep in your head when you're reading and writing code, and there's enough of that already.
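For the curious, here is roughly what that back-reference looks like. This is a single-file sketch under my own assumptions: Lich, its nested Impl and ApplyDamage are hypothetical names, the HP clamp is invented for the demo, and I've taken the noncopyable route mentioned earlier (with std::unique_ptr) so the stored reference never has to be re-pointed after a copy.

```cpp
#include <memory>

// ---- would live in Lich.h (hypothetical class) ----
class Lich
{
public:
    Lich();
    ~Lich();                                // defined below, where Impl is complete

    Lich(const Lich&) = delete;             // noncopyable: no deep-copy hoops
    Lich& operator=(const Lich&) = delete;

    int GetHp() const;
    void SetHp(int newHp);
    void TakeAttack(int attackPower);

private:
    class Impl;
    std::unique_ptr<Impl> m_impl;
};

// ---- would live in Lich.cpp ----
class Lich::Impl
{
public:
    explicit Impl(Lich& owner)
        : m_owner(owner)
        , m_hp(20)
    {
    }

    void ApplyDamage(int damage)
    {
        // The impl calls back into the outer class through the stored reference.
        m_owner.SetHp(m_owner.GetHp() - damage);
    }

    Lich& m_owner;
    int m_hp;
};

Lich::Lich() : m_impl(new Impl(*this)) {}
Lich::~Lich() = default;

int Lich::GetHp() const { return m_impl->m_hp; }

// Clamps at zero so the callback above has some real logic to go through.
void Lich::SetHp(int newHp) { m_impl->m_hp = newHp < 0 ? 0 : newHp; }

void Lich::TakeAttack(int attackPower) { m_impl->ApplyDamage(attackPower); }
```

Note the destructor declared in the "header" half and defined in the "source" half: std::unique_ptr needs the complete Impl type at the point where the destructor is generated, so it can't be left implicit.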

Interface-based programming: a better solution

A technique called interface-based programming takes things one step further in an attempt to solve the same problems pimpl addresses without introducing the new ones it creates. (I am not the biggest fan of the name "interface-based programming" since that term already has a similar but broader definition, but that's what people call it.) With interface-based programming, the header file contains nothing but a pure, abstract definition of the class's public interface and a free or static factory function for creating concrete instances. The concrete class itself lives entirely within the source file. Let's take a look at Revenant again:

Interface-based programming example

// Revenant.h

#pragma once

#include <memory> // std::shared_ptr

class Revenant
{
public:
 // static factory function instead of ctor
 static std::shared_ptr<Revenant> Create();
 
 // virtual dtor is essential!
 // (an inline body on a pure virtual dtor is a compiler extension, so keep it plain)
 virtual ~Revenant() {}
 
 // interface requires clone semantics instead of copy
 virtual std::shared_ptr<Revenant> Clone() const = 0;

 virtual int GetHp() const = 0;
 virtual void SetHp(int newHp) = 0;

 virtual void TakeAttack(int attackPower) = 0;

private:
 // copy not implemented; hide the assignment operator
 Revenant& operator =(const Revenant&);
};



// Revenant.cpp

#include "Revenant.h"
#include "Armor.h"
#include "ArmorFactory.h"

namespace
{
 class RevenantImpl : public Revenant
 {
 private:
  static const int STARTING_HP = 20;

  int m_hp;
  Armor m_armor;

 public:
  RevenantImpl()
   : m_hp(STARTING_HP)
   , m_armor(ArmorFactory::CreatePlateArmor())
  {
  }

  virtual std::shared_ptr<Revenant> Clone() const override
  {
   auto clone = std::make_shared<RevenantImpl>();
   clone->m_hp = m_hp;
   clone->m_armor = m_armor;
   return clone;
  }

  virtual int GetHp() const override
  {
   return m_hp;
  }

  virtual void SetHp(int newHp) override
  {
   m_hp = newHp;
  }

  virtual void TakeAttack(int attackPower) override
  {
   int damage = GetDamageFromAttack(attackPower);
   SetHp(GetHp() - damage);
  }

 private:
  int GetDamageFromAttack(int attackPower) const
  {
   int damage = attackPower;
   damage -= m_armor.MitigateDamage(damage);
   // apply more mitigations, etc.
   return damage;
  }
 };
}

std::shared_ptr<Revenant> Revenant::Create()
{
 return std::make_shared<RevenantImpl>();
}

This really gives us the best of both worlds. As with pimpl, the header only communicates the class's public interface. But unlike pimpl, the entire implementation is now contained within a single class. We've also gotten rid of pimpl's stub functions and triple-repeats, though we still had to write the signatures of the public methods twice (that's as good as we're going to do). As an added bonus, Revenant is now a true interface in the technical sense of the word, which means we gain the ability to mock and stub it in tests for free.
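As a rough illustration of the mocking point, here is a sketch of a hand-rolled test double. The interface is a trimmed copy of the one above (no factory, no Clone) so the snippet stands alone; FakeRevenant and StrikeUntilDead are hypothetical names invented for the example, not part of the original design.

```cpp
// Trimmed copy of the Revenant interface, just enough for the example.
class Revenant
{
public:
    virtual ~Revenant() = default;

    virtual int GetHp() const = 0;
    virtual void SetHp(int newHp) = 0;
    virtual void TakeAttack(int attackPower) = 0;
};

// Hypothetical test double: no armor, no factory, and it records
// how many times it was attacked so a test can inspect that.
class FakeRevenant : public Revenant
{
public:
    int hp = 10;
    int attacksReceived = 0;

    int GetHp() const override { return hp; }
    void SetHp(int newHp) override { hp = newHp; }

    void TakeAttack(int attackPower) override
    {
        ++attacksReceived;
        hp -= attackPower;
    }
};

// Hypothetical code under test. It only sees the interface, so it
// can't tell the fake from the real thing.
bool StrikeUntilDead(Revenant& target, int attackPower, int maxStrikes)
{
    for (int i = 0; i < maxStrikes && target.GetHp() > 0; ++i)
    {
        target.TakeAttack(attackPower);
    }
    return target.GetHp() <= 0;
}
```

A test can hand a FakeRevenant to StrikeUntilDead and then check attacksReceived, without ever linking against Armor or ArmorFactory.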

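A couple of details to note. First, since client code can only access RevenantImpl via a pointer to the base class Revenant, it is essential that Revenant have a virtual destructor; otherwise, deleting a RevenantImpl through a raw Revenant pointer (or a std::unique_ptr<Revenant>) would have undefined behavior, and in practice it's likely that the Armor member variable's destructor would not be called and any resources it held would be leaked. The std::shared_ptr returned by Create() happens to remember how to destroy the concrete type it was created with, so the code above would survive the omission, but that is not something to rely on.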

Second, since Revenant is an interface now, it no longer makes sense to copy instances of it—we need to use clone semantics instead. In addition to adding Clone() to the interface, this means we should hide the assignment operator. If we didn't, someone could get away with writing "*revenantPointer1 = *revenantPointer2". The compiler would accept this, but the call would do nothing since the Revenant interface itself doesn't contain anything to copy.
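Both details can be seen in one small single-file sketch. It is a trimmed variant of the example above, with a few assumptions of my own: it returns std::unique_ptr instead of std::shared_ptr so that destruction really does go through the base-class destructor, it uses = delete rather than a private declaration to hide assignment, and g_implsDestroyed is a counter that exists only so the destructor calls can be observed.

```cpp
#include <memory>

int g_implsDestroyed = 0; // demo-only counter, not part of the design

class Revenant
{
public:
    static std::unique_ptr<Revenant> Create();

    virtual ~Revenant() = default;                   // virtual dtor is essential
    virtual std::unique_ptr<Revenant> Clone() const = 0;

    virtual int GetHp() const = 0;
    virtual void SetHp(int newHp) = 0;

    Revenant& operator=(const Revenant&) = delete;   // "*a = *b" won't compile

protected:
    Revenant() = default;
    Revenant(const Revenant&) = default;             // lets the impl copy itself
};

namespace
{
    class RevenantImpl : public Revenant
    {
    public:
        RevenantImpl() = default;
        RevenantImpl(const RevenantImpl&) = default;
        ~RevenantImpl() override { ++g_implsDestroyed; }

        std::unique_ptr<Revenant> Clone() const override
        {
            // Copy-construct a brand new, independent object.
            return std::unique_ptr<Revenant>(new RevenantImpl(*this));
        }

        int GetHp() const override { return m_hp; }
        void SetHp(int newHp) override { m_hp = newHp; }

    private:
        int m_hp = 20;
    };
}

std::unique_ptr<Revenant> Revenant::Create()
{
    return std::unique_ptr<Revenant>(new RevenantImpl());
}
```

Cloning one revenant and changing the clone's HP leaves the original untouched, and when both pointers go out of scope the RevenantImpl destructor runs for each of them even though the caller only ever held a Revenant.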

Inheritance

If I've neglected to mention inheritance in any of the above, it's because I practically never use it—I've been pretty thoroughly converted to the composition camp. But if you want to hide your private implementation and retain the ability for other classes to derive from you, all is not lost. You're pretty much out of luck with interface-based programming, but pimpl can still get the job done. You'll just need to add any protected methods or members you want to expose to your descendants to your header—anything in your private impl will be invisible to them (they'll be able to see the m_impl pointer itself if you make it protected, but they won't know anything about the structure of the object it points to).
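Here is a minimal sketch of that arrangement, squeezed into one file. Monster, Zombie, ApplyDamage and TakeBite are hypothetical names I picked for the example, and I've made the base noncopyable just to keep the sketch short.

```cpp
#include <memory>

// ---- would live in Monster.h (hypothetical base class) ----
class Monster
{
public:
    Monster();
    virtual ~Monster();

    Monster(const Monster&) = delete;
    Monster& operator=(const Monster&) = delete;

    int GetHp() const;

protected:
    // Deliberately exposed to descendants, so it has to appear in the header.
    void ApplyDamage(int damage);

private:
    class Impl;                    // layout stays hidden, even from descendants
    std::shared_ptr<Impl> m_impl;
};

// ---- would live in Monster.cpp ----
class Monster::Impl
{
public:
    int m_hp = 20;
};

Monster::Monster() : m_impl(std::make_shared<Impl>()) {}
Monster::~Monster() = default;

int Monster::GetHp() const { return m_impl->m_hp; }
void Monster::ApplyDamage(int damage) { m_impl->m_hp -= damage; }

// ---- would live in Zombie.h: a descendant that never sees Impl ----
class Zombie : public Monster
{
public:
    // Zombies shrug off half of every bite (an invented rule for the demo).
    void TakeBite(int bitePower) { ApplyDamage(bitePower / 2); }
};
```

Zombie can call ApplyDamage because it is declared protected in the header, but it has no way to reach m_hp directly.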

pimpl vs. interface-based programming

I've described two different techniques for hiding implementation details in C++ here; how do you decide which to use when? Or when to use them at all? A lot of it is personal preference, but I follow an algorithm that goes something like this:

I don't bother hiding implementation at all if:

It's a simple object where any of this would be overkill
I need all the object's data to be contiguous in memory
I don't want to allocate anything on the heap
Performance is so critical that I can't afford either interface-based-programming's virtual calls or pimpl's extra delegations (very, very rare)

Otherwise, I use pimpl if:

I need to be able to allocate instances on the stack
I can't afford the extra overhead of the virtual function calls
I'm working with some monster legacy class that can't easily be converted to interface-based programming. In this case, I'll add an impl to hold new private data and functionality while leaving everything else the way it is.
I need to be able to inherit from the class

Otherwise, the default choice is interface-based programming

Wrapup

Implementation-hiding is one of those game-changing concepts that can radically alter how you design and think about your code. I use it for virtually every class I write these days, and can hardly imagine doing things any other way (every time I have to go back and read something I wrote n years ago before I learned about these concepts, I die a little inside). Interface-based programming in particular has a way of forcing you to design your code in a way that makes it more modular and testable almost by accident. And anything that makes you think about your class's public interface up-front and enforces not just a conceptual but also a physical separation between interface and implementation can only be a good thing.


Tuesday, April 22, 2014
