On smurfs – playing around with genetic drift in a spatially structured population

A lot of people seem to struggle with the idea that stochastic processes can produce deterministic outcomes on grand scales. A prime example is genetic drift and how its results (and its interactions with silent mutations) can be used to assess past population structures and sizes.

As I’m increasingly finding that a good way to better understand a concept is to build it into a little model where you can tweak the parameters at will (and also as an exercise in try-except constructions and classes), I’ve written some Python code that simulates neutral evolution(0). It doesn’t show anything new, but it may be instructive. The bottom line is the following graph. It shows how the diversity of a population (the number of different alleles present) starts to oscillate around a point of equilibrium between mutations introducing new variants and existing ones drifting out of the population after a while. The level at which that equilibrium is reached is a function of the population size and independent of the initial diversity.

The total diversity (number of different genotypes) of a population over time under 3×2 conditions – different initial diversity and three different population sizes; log-scale on y-axis.

What this little toy of mine cannot measure is different types of diversity. In my model, all variants are created equal and don’t stand in any special relation to each other: when I assume a possibility space of 100000 and a carrier of variant #73489 undergoes a mutation, the result can be any of the remaining 99999 possible variants. In reality, variants form a network of possible transformations, with some variants closer to and others more distant from each other. So my model (even if it allowed changing the size of a population over time, which would currently require some ugly hacks) is insufficient to distinguish a large population that is the result of a recent expansion from a medium-sized population from a large population that is the result of a not-so-recent expansion from a small population. Both will show less diversity than is expected for their size. From that fact, absent a way to tell different types of diversity apart, all we could conclude (assuming we know the mutation rate and reproduction patterns) is that one of these two things must have happened. In reality, one would have, say, a hundred different variants, most of which are close to each other (converging on just a few right before the expansion started), while the other might have the same number of variants, but those would be more distinct from each other.

Another feature of real genomes absent from this simple model is that you can track the variants of multiple genetic loci individually. This makes it possible to diagnose subdivisions of the population when the limits of the ranges of individual variants correlate, which they shouldn’t if sheer distance in an otherwise uniform population is all that’s at work.

These shortcomings notwithstanding, the model is sufficient to see the effect of population size on gross diversity.
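For readers who want to play along, here is a minimal sketch of the kind of model described above. It is not the code behind the graph; it assumes a haploid population of fixed size with non-overlapping generations, where every child copies a randomly chosen parent (the drift) and then, with a small probability, jumps to any of the other variants in the possibility space (the mutation). The names `step`, `simulate`, `mutation_rate` and `n_variants`, as well as the default values, are made up for this illustration.

```python
import random


def step(population, mutation_rate, n_variants, rng):
    """Produce the next generation from the current one."""
    size = len(population)
    children = []
    for _ in range(size):
        # Drift: each child inherits the variant of a random parent.
        allele = population[rng.randrange(size)]
        if rng.random() < mutation_rate:
            # Mutation: any of the other n_variants - 1 variants, all equally likely.
            new = rng.randrange(n_variants - 1)
            allele = new if new < allele else new + 1
        children.append(allele)
    return children


def simulate(size, initial_diversity, generations,
             mutation_rate=0.001, n_variants=100_000, seed=0):
    """Return the number of distinct variants present in each generation."""
    rng = random.Random(seed)
    # Spread `initial_diversity` distinct variants evenly over the population
    # (assumes initial_diversity <= size).
    population = [i % initial_diversity for i in range(size)]
    diversity = [len(set(population))]
    for _ in range(generations):
        population = step(population, mutation_rate, n_variants, rng)
        diversity.append(len(set(population)))
    return diversity


if __name__ == "__main__":
    # Two starting points (uniform vs. maximally diverse) for two population sizes.
    for size in (50, 500):
        for start in (1, size):
            trace = simulate(size, start, generations=500)
            print(f"size={size} initial={start} final diversity={trace[-1]}")
```

With `mutation_rate=0.0` the trace can only go down, since drift removes variants and nothing brings new ones in; with mutation switched on, the trace should settle around a level that depends on the population size rather than on where it started.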

The baseline

Let’s start with a very small population, i.e. 50 smurfs.(1)

The transition from quantity to quality, in multicolor pictures

In many natural systems, we observe phase transitions, or the sudden emergence of qualitatively different behaviour once a certain threshold is reached through gradual, quantitative changes. This insight opens some interesting doors for conceptualising (the evolutionary roots of) human language, but this isn’t the post to elaborate on these. Here, I just want to offer a graphical illustration, using a rather simpler model, of how small, barely perceptible changes in the local properties of a system can drastically change its global properties.

Below, you see black and white pictures of a 2-dimensional random matrix of 0s and 1s, with the 1s drawn in black. The three pictures represent the results for three different probabilities `P` for a dot to become a 1 when the matrix is generated: 57%, 59%, and 61%. I dare you to guess which is which without enlarging the images to read what it says in the title bar! I know I couldn’t for the life of me, they all look the same to me.

[Images: three black-and-white renderings of the random matrix, one each for `P` = 57%, 59%, and 61%]
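Generating such a matrix takes only a few lines. The following is a sketch using nothing but the standard library, not necessarily how the pictures above were produced; `random_grid`, `density` and `render` are names invented for this illustration, and the text rendering with `#` and `.` merely stands in for the black and white pixels.

```python
import random


def random_grid(rows, cols, p, seed=None):
    """Build a rows x cols matrix where each cell is 1 with probability p, else 0."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(cols)]
            for _ in range(rows)]


def density(grid):
    """Fraction of cells that are 1."""
    cells = sum(len(row) for row in grid)
    return sum(sum(row) for row in grid) / cells


def render(grid):
    """Crude text picture: '#' for a 1, '.' for a 0."""
    return "\n".join("".join("#" if cell else "." for cell in row)
                     for row in grid)


if __name__ == "__main__":
    for p in (0.57, 0.59, 0.61):
        grid = random_grid(200, 200, p, seed=1)
        print(p, round(density(grid), 3))
    print(render(random_grid(10, 40, 0.59, seed=1)))
```

The measured densities differ by only a couple of percentage points, which is why the three pictures are so hard to tell apart by eye.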

But exactly within this range of values of `P`, the global property of connectivity or permeability of the system changes in dramatic ways. Suppose that, instead of blackening all 1s, we sort them into clusters of mutually connected spots (connected through paths that only use the four main directions) and color-code those clusters. We’ll see that with values of `P` as high as 0.585, we still get a haphazard assemblage of clusters of various sizes. In this and the following pictures, the single largest cluster is coded black and the second largest red; for the rest, the other colors are recycled as often as necessary. So when you see a large patch of, say, orange, it doesn’t necessarily mean that it is one and the same cluster, but for black and red you can be sure it is:

[Image: color-coded clusters at `P` = 0.585]

However, once we move up to 0.595, the global structure has changed: we’re no longer looking at a multitude of independent clusters of roughly comparable size, but rather at a supercluster that alone covers a clear majority of the 1s, with all the others essentially just islands within the sea of points that are connected to the supercluster:

[Image: color-coded clusters at `P` = 0.595]

Not a lot changes when we go further up to 0.6 – the islands just become smaller:

[Image: color-coded clusters at `P` = 0.6]


By just increasing the ratio of ones so slightly that you won’t even notice the difference in a black-and-white representation, we’ve come to the point where you can walk almost anywhere from any starting point without ever stepping on the zeros.
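The sorting into clusters can be sketched as a breadth-first flood fill over the four main directions. This is one possible way to do it, written for illustration and independent of the original code; `label_clusters` and `largest_cluster_share` are hypothetical names, and the grid is assumed to be a non-empty rectangular list of lists of 0s and 1s.

```python
from collections import deque


def label_clusters(grid):
    """Return all clusters of 4-connected 1s as lists of (row, col), largest first."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Start a new cluster and flood outwards from this cell.
            seen[r][c] = True
            queue, cells = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                # Only up, down, left, right count as connected; diagonals do not.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(cells)
    clusters.sort(key=len, reverse=True)
    return clusters


def largest_cluster_share(grid):
    """Fraction of all 1s that belong to the single largest cluster."""
    clusters = label_clusters(grid)
    ones = sum(len(cells) for cells in clusters)
    return len(clusters[0]) / ones if ones else 0.0


if __name__ == "__main__":
    demo = [
        [1, 1, 0, 1],
        [0, 1, 0, 1],
        [1, 0, 0, 0],
        [1, 0, 1, 1],
    ]
    print([len(cells) for cells in label_clusters(demo)])  # [3, 2, 2, 2]
    print(largest_cluster_share(demo))
```

Because the clusters come back sorted by size, the first one would be painted black and the second red, with the remaining colors cycled over the rest. Running `largest_cluster_share` on large random matrices for values of `P` on either side of 0.59 is a simple way to put a number on the jump the pictures show.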

(Code below fold)