Building dictionaries from professional games

One of the features of my Go program is that I build and use a "joseki dictionary" based on my collection of professional games. I use only professional games, and assume that all lines of play found in them are good ones.

Here's a first order description of my algorithm. My dictionary is normalized with the first move black, in the upper half of upper-right corner of the board. To incorporate a new game, I start four trackers, one for each corner of the board. The zones covered by the trackers are sawtooth shaped, and overlap at the middle of the sides. As I play through the game, and each tracker notices when a stone is played in its zone and uses it to extend the growing joseki tree.

Trackers notice when adjacent moves are not in the same zone, and record a possible tenuki, which can become a real tenuki if the next stone played in the zone is the same color.

Trackers get shut off when a stone is played immediately adjacent to their zone of concern. This exclusion zone prevents the joseki from incorporating completely silly moves; (silly because their meaning is heavily influenced by stones just outside the zone covered by the tracker).

After the fact, I order the tree by frequency and do a little manual pruning to eliminate really silly openings, but for the most part I use it unaltered.

Based on about 6000 pro games, I have a tree with about 10,000 terminal nodes and 100,000 total nodes. Each terminal is annotated with game it came from, and approximately which page of Ishida's joseki dictionary is relevant.

The current cutoff algorithm, based on playing into exclusion zones adjacent to each tracker, is pretty crude. I'd be interested to hear any ideas to make the joseki dictionary better than it is; to include more good lines and fewer bad ones.

A The fuseki dictionary is also built, approximately the same way. The termination is arbitrarily set to 30 moves (It used to be 50, but the tree got too big), and there are no exclusion zones.

The dictionaries and the full set of game records are available as SGF files, for research purposes, to bona fide Go programmers. BTW,many of the commonly available viewing programs have a lot of trouble handling trees of this size, but MGT seems to handle it pretty well..

comments/suggestions to:

my home page