Data structures first

A Recipe

A recipe for writing simple, maintainable, robust code:

  1. Think about the program you're trying to write as a state machine.

  2. Articulate the actual state space of the program.

  3. Encode the state space into your data structures as restrictively as possible.

    • This often involves the use of algebraic data types.

  4. Simply write the state transition functions.

    • In a statically typed language, provided that you've encoded the state space as restrictively as possible, the code should be obvious, and there should really be only one way to write it. [1]
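
The four steps above can be sketched in Rust for a hypothetical network connection. Every name here (`LooseConnection`, `Connection`, `Event`, `step`, `MAX_ATTEMPTS`) is invented for illustration, and the retry policy is an assumption rather than anything the recipe prescribes. The loose struct is shown only for contrast: its three fields admit combinations that correspond to no real state, while the enum admits exactly the three states that exist.

```rust
/// Loose encoding, for contrast only: nothing stops `connected == true`
/// from being paired with `session_id == None`.
#[allow(dead_code)]
struct LooseConnection {
    connected: bool,
    attempt: u32,
    session_id: Option<u64>,
}

/// Restrictive encoding: each variant carries only the data that is
/// meaningful in that state.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Connection {
    Disconnected,
    Connecting { attempt: u32 },
    Connected { session_id: u64 },
}

/// Inputs that can drive a transition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Dial,
    Established { session_id: u64 },
    TimedOut,
    Closed,
}

/// Assumed retry limit for this sketch.
const MAX_ATTEMPTS: u32 = 3;

/// The state transition function: a single match over (state, event).
fn step(state: Connection, event: Event) -> Connection {
    use Connection::*;
    match (state, event) {
        (Disconnected, Event::Dial) => Connecting { attempt: 1 },
        (Connecting { .. }, Event::Established { session_id }) => Connected { session_id },
        // Retry while under the limit, then give up.
        (Connecting { attempt }, Event::TimedOut) if attempt < MAX_ATTEMPTS => {
            Connecting { attempt: attempt + 1 }
        }
        (Connecting { .. }, Event::TimedOut) => Disconnected,
        (_, Event::Closed) => Disconnected,
        // Every remaining pairing is treated as a no-op in this sketch.
        (state, _) => state,
    }
}
```

Once the enum is written, most arms of `step` follow directly from the shapes of the variants. For example, the only way to build a `Connected` value is to have a `session_id` in hand, and only `Established` supplies one. The final catch-all arm is a convenience; listing the remaining pairs explicitly would let the compiler's exhaustiveness check flag any state or event added later.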

As articulated by others

Linus Torvalds:

git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful.

I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.


Eric Raymond:

Fold knowledge into data, so program logic can be stupid and robust.

Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data.
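
As one small illustration of shifting complexity from code to data, here is a sketch of Roman numeral formatting in Rust. The names `NUMERALS` and `to_roman` are hypothetical, chosen for this example. All the knowledge about numerals, including the subtractive pairs such as CM and IV, sits in the table; the loop that consumes it knows nothing about Roman numerals at all.

```rust
/// The knowledge lives here: every numeral and subtractive pair,
/// ordered from largest to smallest.
const NUMERALS: [(u32, &str); 13] = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
];

/// The logic is deliberately simple: walk the table, emit the symbol
/// while it still fits, and subtract its value.
fn to_roman(mut n: u32) -> String {
    let mut out = String::new();
    for &(value, symbol) in NUMERALS.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}
```

Changing the behavior means editing the table, not the control flow, which is the trade the quote recommends.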


Rob Pike:

Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.



[1] This idea of restricting the number of ways the code can be written, and relying on the type system to prove that the code stays on one of those paths, is similar to the idea of parametricity.
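
A rough sketch of what parametricity buys, in Rust, with the hypothetical names `swap` and `identity`. Because the type parameters carry no trait bounds, the bodies cannot inspect, compare, or construct values of those types. Setting aside panics, infinite loops, and unsafe code, the signatures leave essentially one implementation each.

```rust
/// With no bounds on `A` or `B`, the only values of those types available
/// are the ones passed in, so the result can only be the pair reversed.
fn swap<A, B>(pair: (A, B)) -> (B, A) {
    let (a, b) = pair;
    (b, a)
}

/// The only `T` this function can return is the one it was given.
fn identity<T>(x: T) -> T {
    x
}
```

A restrictive state encoding aims for the same effect with concrete types: the fewer values a type admits, the fewer functions can be written against it.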