A lot of people seem to struggle with the idea that stochastic processes can produce deterministic outcomes on grand scales. A prime example is genetic drift and how its results (and its interactions with silent mutations) can be used to assess past population structures and sizes.
As I’m increasingly finding that a good way to understand a concept is to build it into a little model whose parameters you can tweak at will (and also as an exercise in try-except constructions and classes), I’ve written some Python code that simulates neutral evolution(0). It doesn’t show anything new, but it may be instructive. The bottom line is the following graph, which shows how the diversity of a population (the number of different alleles present) comes to oscillate around an equilibrium between mutations introducing new variants and existing ones drifting out of the population after a while. For a given mutation rate, the level at which that equilibrium is reached is a function of the population size and is independent of the initial diversity.

The total diversity (number of different genotypes) of a population over time under 3×2 conditions – two levels of initial diversity and three population sizes; log scale on the y-axis.
What this little toy of mine cannot measure is different types of diversity. In my model, all variants are created equal and don’t stand in any special relation to each other, so when I assume a possibility space of 100000 and a carrier of variant #73489 undergoes a mutation, the result can be any of the remaining 99999 possible variants. In reality, variants form a network of possible transformations, with some variants closer to each other and others more distant.
So my model (even if it allowed the size of a population to change over time, which would currently require some ugly hacks) is insufficient to distinguish a large population that is the result of a recent expansion from a medium-sized population from a large population that is the result of a not-so-recent expansion from a small population. Both will show less diversity than is expected for their size, and from that fact, absent a way to tell different types of diversity apart, all we could conclude (assuming we know the mutation rate and reproduction patterns) is that one of these things must have happened. In reality, one would have (say) a hundred different variants, most of which are close to each other (converging on just a few right before the expansion started), while the other might have the same number of variants, but those would be more distinct from each other.
Another feature of real genomes absent from this simple model is that you can track the variants of multiple genetic loci individually. This makes it possible to diagnose subdivisions of the population when the limits of the ranges of individual variants correlate, which they shouldn’t if sheer distance in an otherwise uniform population is all that’s at work.
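To make the "all variants are created equal" assumption concrete, here is a minimal sketch of what such a mutation step could look like. The function name `mutate`, the constant `POSSIBILITY_SPACE` and the integer encoding of variants are my assumptions for illustration, not necessarily how the actual code does it.

```python
import random

# Size of the possibility space used as the example in the text.
POSSIBILITY_SPACE = 100_000


def mutate(variant, space=POSSIBILITY_SPACE, rng=random):
    """Return a variant different from `variant`, drawn uniformly
    from the remaining `space - 1` possibilities.

    Variants are assumed to be integers in range(space); there is no
    notion of "nearby" variants, which is exactly the limitation
    described above.
    """
    # Draw from space - 1 slots, then skip over the current variant
    # so that every *other* variant is equally likely.
    new = rng.randrange(space - 1)
    return new if new < variant else new + 1


if __name__ == "__main__":
    # A carrier of variant #73489 mutates into one of the other 99999.
    print(mutate(73489))
```

A more realistic model would restrict the result to neighbours of `variant` in some mutation network, which is what would let you tell "many similar variants" apart from "many distinct variants".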
These shortcomings notwithstanding, the model is sufficient to see the effect of population size on gross diversity.
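For readers who want to play along, the following is a rough sketch of a drift-plus-mutation loop that tracks gross diversity per generation. It assumes a haploid population of constant size in which each individual of the next generation copies a randomly chosen parent (Wright-Fisher-style resampling); the function name `simulate_diversity`, its parameters and the default mutation rate are placeholders of mine, not the values or the structure of the actual code.

```python
import random


def simulate_diversity(pop_size, generations, mutation_rate=0.001,
                       initial_variants=1, space=100_000, seed=0):
    """Return the number of distinct variants present in each generation.

    Assumptions (not taken from the original code): constant population
    size, non-overlapping generations, and uniform mutation across a
    possibility space of `space` integer-coded variants.
    """
    rng = random.Random(seed)
    # Start with `initial_variants` different variants, spread evenly.
    population = [i % initial_variants for i in range(pop_size)]
    history = []
    for _ in range(generations):
        # Drift: every child copies the variant of a random parent.
        population = [rng.choice(population) for _ in range(pop_size)]
        # Mutation: each child may switch to any *other* variant.
        for i, variant in enumerate(population):
            if rng.random() < mutation_rate:
                new = rng.randrange(space - 1)
                population[i] = new if new < variant else new + 1
        history.append(len(set(population)))
    return history


if __name__ == "__main__":
    for size in (50, 500):
        final = simulate_diversity(size, generations=200)[-1]
        print(f"population {size}: {final} variants after 200 generations")
```

Running it for a few population sizes and a few values of `initial_variants` should give curves of the same general kind as in the graph above, though the exact numbers depend on the assumed mutation rate and reproduction scheme.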
The baseline
Let’s start with a very small population, i.e. 50 smurfs.(1)







