Grey Hill

On Nested Loops

Nested loops are inevitable. Whenever you have a collection of something you want to apply to another collection, you will need a nested loop. While Computer Science theory suggests the order of the loops doesn't matter, it does matter in practice. We'll look at an example I encountered while working at Demonware as a co-op student. It has been a while since I worked at Demonware, so I'll use a fictitious example to illustrate the situation.

The Context

Say we have an online multiplayer game with lots of players, and we want to reward players based on their performance. For example, if a player logged in within the past 10 days, they'll get some in-game currency. If a player logged in within the past 3 days, they'll get premium in-game currency.

Each player is stored in a relational database (like PostgreSQL), while the rules are stored on a hard-drive. The hard-drive is attached to our computer, but the relational database is somewhere else. We'll say it takes 100 times longer to read from the relational database than from the hard-drive, which isn't an unreasonable assumption.
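
To make the setup concrete, here is a minimal sketch of the fictitious rules and players. The names (Player, Rule, the reward strings) are my own illustration of the situation, not anything from Demonware; only rule.apply matches the code used later.

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Player:
    # In the real setup, each player is a row in the remote relational database.
    name: str
    last_login: datetime


@dataclass
class Rule:
    # In the real setup, the rules live in a file on the local hard-drive.
    max_days_since_login: int
    reward: str

    def apply(self, player: Player) -> None:
        cutoff = datetime.now(timezone.utc) - timedelta(days=self.max_days_since_login)
        if player.last_login >= cutoff:
            print(f"{player.name} gets {self.reward}")


rules = [
    Rule(max_days_since_login=10, reward="in-game currency"),
    Rule(max_days_since_login=3, reward="premium in-game currency"),
]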

The Theory

A graduate of Computer Science would know the run time of looping through two collections of size n and m is O(nm). They would either loop through the rules and then the players, or loop through the players and then the rules, and apply the rule to the player. Either option you pick is O(nm), so it doesn't matter which one you use.
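
To see the theory's point concretely, here is a quick sketch (with made-up sizes) showing that both orderings perform exactly n * m rule applications; only the order of the work changes.

n, m = 1_000, 10  # made-up sizes

applications_rules_first = sum(1 for _rule in range(m) for _player in range(n))
applications_players_first = sum(1 for _player in range(n) for _rule in range(m))

assert applications_rules_first == applications_players_first == n * m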

The Reality

In Computer Science theory, big-O notation hides a constant at the front of the expression. The constant disappears from the asymptotic classification, but it still matters for concrete inputs, even when those inputs are very large. An algorithm with a running time of 5n is faster than one with a running time of 10n, even though they are both O(n).

Another simplification computer science makes concerns the time it takes to access data. Generally, computer science theory assumes accessing data is O(1). However, as mentioned above, big-O notation hides a constant at the front of the expression, and that constant changes depending on whether the data is in memory, on a hard-drive, or in a remote database. Where data resides is a practical concern, not a theoretical one.

The Analysis

For our analysis of the two approaches, we'll let n be the number of players and m be the number of rules. We know from the context that reading a player takes 100 times longer than reading a rule, so say reading a rule costs 1 unit of time and reading a player costs 100. We'll also assume that applying a rule to a player already in memory is cheap enough to ignore next to the reads.
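
Here is that cost model written out as a small sketch. The unit costs are the assumptions we just made, not measurements, and the function names are mine.

READ_PLAYER = 100  # reading one player from the remote relational database
READ_RULE = 1      # reading one rule from the local hard-drive
# Applying a rule to data already in memory is treated as free.


def cost_rules_first(n: int, m: int) -> int:
    # For each of the m rules: read the rule, then read all n players.
    return m * (READ_RULE + n * READ_PLAYER)


def cost_players_first(n: int, m: int) -> int:
    # For each of the n players: read the player, then read all m rules.
    return n * (READ_PLAYER + m * READ_RULE)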

Rules-first

When iterating through the rules then the players, the code will look like this:

for rule in rules:              # each rule is read from the local hard-drive
    for player in players:      # every player is read from the remote database, once per rule
        rule.apply(player)

The running time is

number of rules * time to read a rule:
    number of players * time to read a player:
        time to execute rule on player

which translates to m(1 + n * 100) = m + 100mn, which is O(nm).

Players-first

When iterating through the players then the rules, the code will look like this:

for player in players:          # each player is read from the remote database, once
    for rule in rules:          # each rule is read from the local hard-drive, once per player
        rule.apply(player)

The running time is

number of players * time to read a player:
    number of rules * time to read a rule:
        time to execute rule on player

This works out to n(100 + m * 1) = 100n + nm, which is O(nm).

Comparison

The coefficient on the nm term is 100 for the rules-first iteration and 1 for players-first. This means that, as n and m grow large, the rules-first iteration takes roughly 100 times longer than the players-first iteration. For massively multiplayer games, the number of players will always dwarf the number of rules, making the players-first iteration the better choice.
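
Plugging made-up but plausible sizes into the two formulas shows the gap; these are illustrative numbers, not measurements from any real system.

n, m = 1_000_000, 1_000  # a million players, a thousand rules

rules_first = m * (1 + n * 100)    # m + 100mn = 100_000_001_000 time units
players_first = n * (100 + m * 1)  # 100n + nm  =   1_100_000_000 time units

print(rules_first / players_first)  # ≈ 91, approaching the factor of 100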

Takeaway

The initial takeaway from this problem is that you should read data from the slowest resource in the outermost loop of a nested loop. This extends to any number of collections and resources. The higher-level lesson is that the less you read from slow resources, the faster the algorithm will be.
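
One way to picture the generalization (my own sketch, with made-up resources and costs): if you have a rough per-read cost for each collection, put the most expensive one in the outermost loop.

# Hypothetical collections with rough per-read costs; slowest goes outermost.
collections = {
    "rules_on_hard_drive": 1,
    "players_in_remote_database": 100,
    "thresholds_in_memory": 0.01,
}

loop_order = sorted(collections, key=collections.get, reverse=True)
print(loop_order)
# ['players_in_remote_database', 'rules_on_hard_drive', 'thresholds_in_memory']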

Some people might get the impression that this is simply a case of

In theory, theory and practice are the same.
In practice, they are not.

but that misses the point of theory and practice. Theory and practice are not opposed to one another; they inform one another. Theory lets practitioners reason about how things work without having to know everything, and practitioners tell theorists how reality actually behaves. Sound theory is derived from reality, and practice builds the intuition that rises into theory. Physics is a great example of this interplay.

A more pragmatic takeaway from theory and practice not aligning is to remember that theory hides information. You must know the limits and assumptions of a theory in order to apply it in practice. If you forget them, you'll get the wrong results.

How did you learn this?

My insight into this problem came from watching "Data-Oriented Design" by Mike Acton. I watched it back in either high school or early university, since I worked at Demonware in my third year of university. I don't remember the details of the talk, but my takeaway was to lay out data the way you access it. If you need to look through the keys of a hash table, you should lay the keys out next to each other.

I also reversed my takeaway into "access your data the way you laid it out". In the example above, the players live in a database, which is much slower to read than the rules on the hard-drive. In terms of the talk, this is like storing each key right next to its value: reading through the keys takes longer when the values are interleaved with them than when the keys are packed together. Therefore, we should avoid the expensive read.
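
For a rough picture of the key-layout point: Python hides the actual memory layout, so treat this purely as an illustration of the two shapes. In a language with flat arrays, scanning layout B touches only the keys, while layout A drags every value along with them.

# Layout A: each key stored right next to its value.
pairs = [("alice", "2024-01-01"), ("bob", "2024-01-05")]

# Layout B: keys packed together, values in a parallel list.
keys = ["alice", "bob"]
logins = ["2024-01-01", "2024-01-05"]

# Both layouts let you look through the keys; the difference is what else
# you have to read past while doing it.
assert any(key == "bob" for key, _login in pairs)
assert any(key == "bob" for key in keys)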

I do have a better example of applying data-oriented design, or at least thinking that way, but that will have to be another article.