• Ad Astra
  • Posts
  • Infinite Monkeys attempt writing the first line of Hamlet - a Genetic Evolution Experiment

Infinite Monkeys attempt writing the first line of Hamlet - a Genetic Evolution Experiment

Who Knew AI was just an over-glorified while loop that sucks your RAM away?

If you are part of this niche community of people that like to obsess on the weird hypothesis and imaginative daydreaming - you must have definitely come across the infinite monkey theorem.

The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type any given text, including the complete works of William Shakespeare. In fact, the monkey would almost surely type every possible finite text an infinite number of times.

In general, any sequence of events that has a non-zero probability of happening will almost certainly occur an infinite number of times, given an infinite amount of time or a universe that is infinite in size.

Well, that's a lot of infinites - but basically, given enough time, anything that's possible will be possible (Heck, our own evolution is a good example of that!)

But hitting keys at random is so - time consuming. It's no different than sorting an array of 10000 elements using bogo sort - very less chance of happening!

What if the monkeys knew if they were close to typing the hamlet - and like real life - the more closer / stronger they are in society, more their chance of spreading their seed (cough cough Genghis Khan) - and to carry on their legacy. Basically push their genes forward and undergo evolution?

Let's not throw around random math equations here - but bear with me for a second. For once, let the total number of Monkeys be n. The entire purpose of their existence is to type a set of keys and then they die before they make babies with other monkeys. The closer they are to typing the entire hamlet - the more likely their genes are to be transferred to the next generation - mimicking Darwin's Natural Selection. The rest just exist. It's a cruel world. Sheesh.

Now this cycle is one generation.

Now that the monkeys learn what's more fitting to pass their genes on, they will begin to mimick more and more of that - yet, subtly still do random stuff out of the ordinary - little mutations peeking out like nature.

Eventually, they would succeed. Faster than random. Eventually.

What you read is basically a simple instance of Deep Reinforcement Learning. The algorithm that mimics the natural selection - known as Genetic Evolution Algorithm. Let's try to implement this in C++.

Let's first create a Monkey class.


class Monkey {
    public:
    string chromosome;
    int fitness;

    Monkey(string chromosome){
        this->chromosome = chromosome;
        this->fitness = calculateFitness();
    }

And let's initialise these 'genes', i.e, the total set of letters in their universe.

#include <iostream>
#include <cstdlib>
#include <ctime>

#define POPULATION_SIZE 10000

using namespace std;

const std::string GENES = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890,.-;:_!\\\\"'#%&/()=?@${[]} ";
string TARGET;

int randInt(int start, int end){ 
    int range = end - start + 1;
    int randInt = start + (rand()%range);
    return randInt;
}

char mutatedGene(){
    return GENES[randInt(0, GENES.size() - 1)];
}

string createGenome(){
    int len = TARGET.size();
    string genome = "";
    for(int i = 0; i < len; i++){
        genome += mutatedGene();
    }
    return genome;
}

Let's add a few methods inside Monkey that help the Monkeys make babies and spread their genes - leaving some luck for mutations.


	int Monkey::calculateFitness(){
        int len = TARGET.size();
        int fitness = 0;
        for(int i = 0; i < len; i++)
        {
            if(chromosome[i] != TARGET[i]) {
                fitness += 1;
            }
        }
        return fitness;
    };

	Monkey Monkey::mate(Monkey parent2){
        string childChromosome = "";
        int len = chromosome.size();
        for(int i = 0; i < len; i++){
            float p = randInt(0, 100) / 100.0;
            if(p < 0.44){
                childChromosome += chromosome[i];
            } else if (p < 0.88){
                childChromosome += parent2.chromosome[i];
            } else {
                childChromosome += mutatedGene();
            }
        }
        return Monkey(childChromosome);
    }

And finally, implementing the main() where the the main while loop iterates through every generation until we get the desired string.

int main(){

    srand((unsigned) (time(0)));
    int generation = 0;
    vector<Monkey> population;
    bool found = false;

    cout << "Enter output string:\\\\n>> ";
    getline(cin, TARGET);

    for(int i = 0; i < POPULATION_SIZE; i++){
        population.push_back(Monkey(createGenome()));
    }

    while(!found){
        sort(population.begin(), population.end(),[](Monkey &p1, Monkey &p2){ return p1.fitness < p2.fitness;});

        if(population[0].fitness <= 0){
            found = true;
            break;
        }
        vector<Monkey> newGeneration;
        int elite = (10 * POPULATION_SIZE) / 100;
        for(int i = 0; i < elite; i++){
            newGeneration.push_back(population[i]);
        }
        elite = (90 * POPULATION_SIZE) / 100;
        for(int i = 0; i < elite; i++){
            int len = population.size();
            int r = randInt(0, 50);
            Monkey parent1 = population[r];
            r = randInt(0, 50);
            Monkey parent2 = population[r];
            newGeneration.push_back(parent1.mate(parent2));
        }

        population = newGeneration;

        cout << "Generation: " << generation << "\t";
        cout << "String: " << population[0].chromosome << "\t";
        cout << "Fitness: " << population[0].fitness << "\n";

        generation++;
    }

        cout << "Generation: " << generation << "\t";
        cout << "String: " << population[0].chromosome << "\t";
        cout << "Fitness: " << population[0].fitness << "\n";
}

Here, in the elite variable, we set the inital positions to pass the top 10% of the population to next generation as it is. The rest 90% have babies with each other (at random). And in the mate method, there is a 10% chance of mutation. These values can be tweaked to get the desired output - but ehh I'm lazy to test it out (and I have to study for my internals so nope).

Finally, on running the program, we are prompted with a desired string for the monkeys to write. Here, I will leave it to your imagination - but this is what I went with.

and....

It took 61 generations of 10,000 monkeys each to achieve the feat. A whole Hamlet would definitely take a lot of Monkeys (and GPU power).

On High Insight, there are probably ways to optimize it, by removing the already solved words from the string - or something like creating a better parameter of mutation. But I'm too lazy.

Interestingly, on implementing the same code in python - because everyone loves it easy (right?), we do get a different output…

import random

GENES = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "1234567890,.-;:_!\\"'#%&/()=?@${[]}"
TARGET = ""
POPULATION_SIZE = 1000

def mutatedGene():
    return GENES[random.randint(0, len(GENES) - 1)]

def createGenome():
    return [GENES[random.randint(0, len(GENES) - 1)] for _ in range(len(TARGET))]

class Monkey:
    chromosome : list = []
    fitness = 0

    def __init__(self, genome=None):
        if(genome == None):
            self.chromosome = createGenome()
        else:
            self.chromosome = genome
        self.fitness = self.calulateFitness()
    
    def calulateFitness(self):
        ftns = 0
        for i in range(len(TARGET)):
            if self.chromosome[i] != TARGET[i]:
                ftns += 1
        return ftns

    def mate(self, partner):
        childChromosome = []
        for i in range(len(TARGET)):
            p = random.randint(0, 100) / 100.0
            if p < 0.45:
                childChromosome.append(self.chromosome[i])
            elif p < 0.90:
                childChromosome.append(partner.chromosome[i])
            else:
                childChromosome.append(mutatedGene())
        return Monkey("".join(childChromosome))
    def __lt__(self, other):
        return self.fitness < other.fitness

def main():
    global TARGET
    TARGET = input("Enter Output String: ")

    population = [Monkey(createGenome()) for _ in range(POPULATION_SIZE)]
    found = False

    for i, monkey in enumerate(population):
        if monkey is None:
            print(f"Invalid monkey at index {i}")

    generation = 0
    while not found:
        population.sort()

        if population[0].fitness <= 0:
            found = True
            break

        newGeneration = []
        elite = (10 * POPULATION_SIZE) //  100
        for i in range(elite):
            newGeneration.append(population[i])
        elite = (90 * POPULATION_SIZE) // 100
        for i in range(elite):
            p1 : Monkey = population[random.randint(0, 50)]
            p2 : Monkey = population[random.randint(0, 50)]
            newGeneration.append(p1.mate(p2))

        population = newGeneration
        generation += 1
        
        print(f"Generation: {generation}\tString: {"".join(population[0].chromosome)}\tFitness:{population[0].fitness}")

    print(f"Generation: {generation}\tString: {"".join(population[0].chromosome)}\tFitness:{population[0].fitness}")

main()

This is EXACTLY the same code as in C++, however, it does yield a way different output.

Python surprisingly took 119 times to get the same output, and it spent the last 60 generations trying to just find one or two letters. Now this might just be luck - but spending 50% more generations than C++? This had to be looked into more.

On running this same string on both C++ and Python 10 times in a row, we get the following table:

Population per generation

C++ Generations

Python Generations

10000

61

119

1000

30

571

1000

36

3238

1000

53

676

1000

78

695

1000

57

766

1000

256

475

1000

68

845

1000

37

5052

1000

31

858

Average

70.7 generations

1329.5 Generations

AVERAGE OF 1229.5 GENERATIONS? WTF? FOR A TWELVE WORD SENTENCE!? DUDE HOW EXCITED ARE THESE MONKEYS!????

Conclusion: Python sucks and Monkeys can type.

Much of my frustration with the python version comes with the fitness of the generation - which comes down quickly to around 2 or 3 within the range of 80 - 150 generations, but it spends like, the next 400 (or worse, 4000) generations fixating on the last two characters. There must be something that python does wrong - that C++ doesn’t.

MAYBE - and it’s just my guess (and lack of will to research because I’ve already spent way too much time on this and need to get back to important stuff like scroll reels) - it could have something to do with the ways random numbers are generated in C++ and Python. The random library in python is clearly giving out a more distributed set of numbers which struggle during the final edge case scenarios. C++ on the other hand uses rand() function and is set to a seed of the current time in seconds. That might be the case why the results differ. Maybe. And I wish I got to throw a bunch of jargon there because it might sound cool - but you get the point.

I mean, it’s the same exact code, and even if not entirely deterministic - we should be getting similar results - which we clearly aren’t.

In any which way, this serves as a sense of reminder that if we ever decide to make infinite monkeys undergo evolution to write the hamlet for us - we shall avoid the use of python.

But damn, apart from acquiring another reason for me to shit on python, Genetic Evolution in C++ is such a fun implementation, and who knew Artificial Intelligence is just over-glorified While Loops that take up a lot of RAM. The more you know.

/

Reply

or to participate.