A simple genetic algorithm
Thomas Munro
Back to my software page
I saw IBM Alphaworks' ATOMIK writing system in a Slashdot post - created using "a Metropolis optimization algorithm" - and tried it out straight away. I am quite interested in minimalist user-interfaces and small computing devices. Over the years I have owned a HP DOS-powered hand-held (still works), an Apple Newton (a friend left it on the roof of his car and drove off... oops), and four different models of Palm. The Palm V is my current model, and it gets a lot of use, as a French/English dictionary, a newspaper browser, a note taker, a life organiser, a calculator, a game toy, and one of the platforms I write software for at my job.
ATOMIK is excellent - it is a really fast way to enter data into your PalmOS device. They sent me a sticker for free, you just have to cut it out and attach it to the front of you Palm V or similar, and install the 5k hack that patches the text input system.
It's a keyboard layout that has been optimised with three goals in mind: staying as close to alphabetical order as possible so that new users can find letters fairly quickly; putting commonly occuring letters next to each other; and using an 'atom' metaphor, with the space at the center (since it occurs once per word) and putting the letters around it.
So I started wondering how I would have gone about such a thing, and also wondering about how to deal with other Western European languages. I set out to make my own layout optimiser, so I could make keyboards for English and French. Here are my results.
This seemed to be a clearcut case for genetic selection. I used the following simple method:
I simplified the problem in several ways: accents are ignored (there probably aren't any in an English language text anyway), capital letters are ignored (there is no shift key), punctuation is ignored --- in fact everything is ignored except letters and the space character.
I wrote a small program in C which you can download here:
The included Makefile fetches the text of The Cathedral and the Bazaar by Eric Raymond, and tries it on different keyboards, the results of which are shown below. I chose this text because it's fairly long (15163 words at the time of writing, but it could be different when you read this, he's still working on it), it's written in fairly casual style, and it uses American spelling (I use British spelling myself, but IBM presumably used American spelling in their analysis, not that there is a particularly big difference in the scheme of things).
Using a straight alphabetically-sorted keyboard:
munro@spitfire:~/projects/mutantscript$ make abc
./mutantscript 1 1 "abcdefghijklmnopqrstuvwxyz " < cathedral-bazaar.txt
Generation 0 / 1 (best score 14.365348)
+---+---+---+---+
| a | b | c | d |
+-+-+-+-+-+-+-+-+-+---+---+
| e | f | g | h | i | j |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| k | l | m | n | o | p | q |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| r | s | t | u | v | w |
+---+---+-+-+-+-+-+-+-+-+-+
| x | y | z | |
+---+---+---+---+
Distance per letter: 14.365348 millimetres
Alphabetic order rating: 0
Weighted score: 14.365348
Generations: 1
You can see that the pen travels about 14.4mm per letter on average, in order to write out Mr Raymond's book. Note that I chose the arbitrary distance of 5mm for the distance to move between one letter and the letter immediately to its left or right - that's roughly the distance on the ATOMIK sticker - and to go from one letter to the character up and to the left it is slightly further. The alphabetic order rating is the lowest possible.
Next, let's see how the ATOMIK layout fares:
munro@spitfire:~/projects/mutantscript$ make atomik
./mutantscript 1 1 "bkdgcanimqfle syxjhtopvruwz" < cathedral-bazaar.txt
Generation 0 / 1 (best score 10.016061)
+---+---+---+---+
| b | k | d | g |
+-+-+-+-+-+-+-+-+-+---+---+
| c | a | n | i | m | q |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| f | l | e | | s | y | x |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| j | h | t | o | p | v |
+---+---+-+-+-+-+-+-+-+-+-+
| r | u | w | z |
+---+---+---+---+
Distance per letter: 9.066493 millimetres
Alphabetic order rating: 118
Weighted score: 10.016061
Generations: 1
Much better - only about 9mm per letter. Now let's try genetically selecting a keyboard using 100 generations, with a population of 100:
munro@spitfire:~/projects/mutantscript$ make breed100
./mutantscript 100 100 < cathedral-bazaar.txt
Generation 99 / 100 (best score 9.916789)
+---+---+---+---+
| k | h | t | d |
+-+-+-+-+-+-+-+-+-+---+---+
| c | s | | i | f | b |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| m | a | e | n | o | g | j |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| v | r | l | p | u | z |
+---+---+-+-+-+-+-+-+-+-+-+
| y | w | x | q |
+---+---+---+---+
Distance per letter: 8.907460 millimetres
Alphabetic order rating: 124
Weighted score: 9.887808
Generations: 100
Not bad at about 9.1mm per letter. Notice that the space is not quite in the centre. Not as good as the ATOMIK layout though - let's try 300 generations:
The distance per letter is down to about 8.3mm per letter, less than ATOMIK's, and it is also "closer" to alphabetical order (lower is better). The space is now in the centre, like in the ATOMIK layout. Time to see what a larger number of generations will do...
munro@spitfire:~/projects/mutantscript$ make breed10000
gcc -g -lm mutantscript.c -o mutantscript
./mutantscript 200 10000 < cathedral-bazaar.txt
Generation 9999 / 10000 (best score 9.282173)
+---+---+---+---+
| b | g | d | c |
+-+-+-+-+-+-+-+-+-+---+---+
| f | a | n | i | h | k |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| m | l | e | | t | w | q |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| p | r | o | s | v | x |
+---+---+-+-+-+-+-+-+-+-+-+
| u | y | z | j |
+---+---+---+---+
Distance per letter: 8.812849 millimetres
Alphabetic order rating: 60
Weighted score: 9.282173
Generations: 10000
What happens if we use the ATOMIK keyboard as a starting point?
munro@spitfire:~/projects/mutantscript$ make atomik400
./mutantscript 100 400 "bkdgcanimqfle syxjhtopvruwz" < cathedral-bazaar.txt
Generation 399 / 400 (best score 9.258436)
+---+---+---+---+
| j | b | c | d |
+-+-+-+-+-+-+-+-+-+---+---+
| g | a | n | i | l | f |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| k | h | e | | o | m | q |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| v | r | t | s | u | p |
+---+---+-+-+-+-+-+-+-+-+-+
| w | y | z | x |
+---+---+---+---+
Distance per letter: 8.775523 millimetres
Alphabetic order rating: 62
Weighted score: 9.258436
Generations: 400
Genetic algorithms are fun! Using a very simple program it's possible to get very interesting results. I feel that if this program were made a little bit more sophisticated (crossing successful keyboards?), and modified to include some of the details I have left out (such as punctuation, numbers and capital letters), this might possibly be a very good way to make an optimal layout for minimising pen movement.
Almost exactly one year later, I have to admit that I am not the ATOMIK layout OR my own "home grown" layouts. What can I say, Grafitti is quite likeable :-) However I was inspired to tidy up this experiment and put it on my webserver when I saw something similar on Slashdot recently: This person has applied the same type of thinking to standard computer keyboard layouts.
I am currently trying to get my head around genetic programming which "create[s] computer programs in the lisp or scheme computer languages as the solution" (as opposed to genetic algorithms like mutantscript which "create[...] a string of numbers that represent the solution").