Tuesday 2 January 2018

Mating Sudoku

Natural Selection: the process whereby organisms better adapted to their environment tend to survive and produce more offspring.

Given that this age-old strategy of cultivating the fit and eliminating the weak has shaped everything from predatorial leopards to the complexity of human brains, it'd be safe to assume that it too could be employed to undertake menial tasks for lazy people, say, solve a sudoku for me. Right?

Well, let's give it a try!

Now the first step would be to understand the mechanisms of evolution. According to Wikipedia, there are 4 major aspects that concern the evolution of species in the natural world. 
1. Population
2. Sex & Recombination
3. Mutation
4. Selection

Step 1: Population


For natural selection and evolution to work, we first need to start with an initial population. In the case of my sudoku-solving quest, I will have to start with a randomized solution that fills up the sudoku puzzle. 

First, I load up a problem set as demonstrated to the left. Then, the program generates a list of 15 randomized solution, as shown below:

Alright, that fulfills the step one of evolution. As you can see from the solution above, there are many violations to the sudoku rule, such as repeating digits in the same row/column or in the same 3x3 tile. We need to start pairing these random solutions with each other to produce offsprings that fare better than their randomized parents.




Step 2: Sex & Recombination

Here comes the fun part. To decide which pairs of randomized solutions get to 'mate' with each other and produce combined results, I opted to learn from nature. As natural selection prefers organisms with higher fitness index, I had to introduce an artificially-conceived Fitness Index to differentiate between the population. 

This is how the fitness index is calculated:

TL;DR: I took the maximum fitness score (243) and subtract 1 for every time a tile violates the row/column/tile non-repetition rule. So a row with the digits 6,5,6,4,9,1,2,1,6 would lead to deduction of 3 scores due to repeating 6s and 1s

Whew. Now that we got that sorted, let's see just how fit our first generation of sudoku solutions are. 

A whopping 92/243! That's horrible, and as you can tell from how the numbers are arranged, there are so many errors and repetitions and digits across rows and columns.

Luckily, we don't have to panic yet. Despite starting off as blind and deaf unicellular organisms, we ended up faring pretty well after billion years of evolution. What we need is for our ancestors to have ample time and a lot of sex. A whole lot of 'em.

Cross-over mechanism:
1. Lottery to pick 2 sudoku-solutions for sexual reproduction(Chances are weighed by their Fitness Index)
2. Convert solution tables from both parties into lists of numbers. Then split the list of numbers at a randomized position and join the two lists together
3. Convert the joined list back into table form
4. A new baby is born

(For detailed understanding of the mechanism, refer to lines 41-67 in the source code)

Step 3: Mutation

If the population were left to intermix with each other, they will gradually converge into identical copies over time. Unless you are a fan of the dystopian clone-ridden future, we need to prevent this situation from occurring. If our sudoku-solutions were to converge too quickly, we will end up with suboptimal solutions with no room for improvement.

A working contingency plan, as demonstrated by nature, is to create random mutations. In nature, this happens because of DNA cloning errors. In the program, we artificially induce mutations on random tiles on the table.  





Note that mutation is not always good for the offspring. In 80% of the case, the mutation will lead to disastrous results, causing the offpsrings to shed off 20-30 fitness score. In the long run, mutation does lead to better diversity and more potential for growth.

 Step 4: Selection

This step is made easy for us. All sudoku solutions simply die off (or shall I say, executedafter each iteration. Their evolutionary success is judged solely by their reproductive ability, and that in turns is artificially determined by their Fitness Index. (Step 2)

With all four steps set in place, let's run our program!



As you can see, the average fitness and best fitness indexes are slowly climbing up after multiple generations of recombination and selection.

The process starts to slow down after 2-3 minutes as individuals of the population become more and more similar to each other. Here's the results after 4 hours of the repeated process, as taken from my previous run:



Pretty neat, eh? Admittedly, there's still room for improvements, as the solution hasn't reached the optimal solution of 240. For that to happen, better algorithms is needed, as my program hits a plateau at around the 200 region and never seemed to go over it.

Hope you enjoyed the read. All ideas and suggestions are greatly welcomed!

Source Code (Python): https://github.com/ImSandwich/AI-Code-Projects/blob/master/genetic.py

To test the code for yourself:
1. Download and install Python 3.6.x from https://www.python.org/downloads/
2. Download the source code
3. Open up Command Prompt
4. type cd [the folder you downloaded the source code to]
For example: C:\Users\User\Desktop\Code Projects\
5. type python genetic.py
6. Insert the desired population size (Recommended: 15)
7. Insert the desired mutation rate (Recommended: 0.04)
8. Insert the desired cutoff  (Recommended: 60, this is the minimum fitness a solution must have to be considered for survival)
9. Insert the desired scatter rate (Recommended: 0.15, this is a simulation of natural disasters that will occur leading to random selection of population)
10. Let it run, set back and enjoy =)

No comments:

Post a Comment