Sunday, November 22, 2009

Writing a recursive Java puzzle solver

A couple years ago I wrote a puzzle solver in Java. The puzzle in question goes by many names, and you can play various implementations of it on several different social networking sites and elsewhere. It probably began as a board game but I'm not familiar with its name. I think this is a nice, simple example of how to use a recursive algorithm to solve a problem using brute force.

I do not condone cheating and I ask that you not use my code for that purpose.

In order to "solve" a puzzle one must know what a solved puzzle looks like. The puzzle is a grid of letters, typically with dimensions between 4x4 and 5x5, but that is not a fast rule (our solver will allow the dimensions to be configured at runtime). Typically there is a time limit to find as many words as possible in the grid, after which point the round ends and all players' scores are tallied.

Here's an example 4x4 game grid (the numbers are just row/column labels and are not part of the board itself):

 1 2 3 4
[A E K C] 1
[M R F S] 2
[B D C I] 3
[L P V R] 4

The challenge is to find as many words as possible, using each space in the grid only once per word, and building words by hopping from space to contiguous space. The player can start at any place on the grid, and a contiguous space includes horizontal, vertical and diagonal moves. For example, in the grid above one word we could build is 'RISK' -- It starts on (r4, c4): R -> (r3, c4): I -> (r2, c4): S -> (r1, c3): K.

The first question we need to answer when designing our solver: what is a valid word? I know 'Risk' is a word, but how would the computer know that? The answer is, we need a dictionary of valid words which the program can refer to when solving the puzzle. This dictionary of allowed words varies in practice, but so long as our dictionary matches the game provider's dictionary, everything will work out nicely. I will not be providing a word list, as the actual words are irrelevant to the algorithm.

Our program will assume the dictionary is a flat text file with one word per line. It need not be alphabetical or sorted in any way.

Here's an example of how our file might look:

---------------- BEGIN FILE --------------------
ant
car
tree
risk
annual
etc
----------------- END FILE ---------------------

Now that we know the rules for solving the puzzle and we have a list of words which are valid, we can write our solver!

I will break our solution into three different files/classes:

  • Board.java (This will be an in-memory representation of the game board)
  • Dictionary.java (This will be the in-memory dictionary)
  • Solver.java (This is where the algorithm itself will be implemented, and will also contain the main method)

First I'll start with the Dictionary. This is the data structure where we'll store the list of valid words. The reason we store them here rather than just access them directly from the text file above, is speed. If we had to open the flat file and search the entire list of words every time we were checking if a word was valid, the program would run extremely slowly.

Our goal here is to make word lookups very speedy since we'll be doing a lot of them every time we solve a puzzle. The data structure we use here will either make or break our implementation.

Here's the Dictionary class:

import java.io.FileReader;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;

/**
 * 
 * @author Ryan LaHue
 * This is our data structure to hold the dictionary of valid words.
 *
 */

public class Dictionary extends HashSet {
 public Dictionary(String filename) throws FileNotFoundException, IOException {
    super();
    buildDictionary(filename);
  }
  
 public void buildDictionary(String textFile) throws FileNotFoundException, IOException {
    //Takes in the name of a file containing words -- one per line.
    //Builds the dictionary in memory from this file.
    BufferedReader reader = new BufferedReader(new FileReader(textFile));
    String word = reader.readLine();

  while(word != null) {
    word = word.toLowerCase();
    add(word);
    word = reader.readLine();
  }
 }
}

As you can see, the Dictionary is a subclass of HashSet, which I picked as a reasonable data structure to hold the dictionary words. HashSet builds on top of the hash table data structure. Our Dictionary doesn't take long to build, has very quick access times (very important), and doesn't allow duplicates.


Here's the Board class:
import java.io.BufferedReader;
import java.io.InputStreamReader;

/**
 * 
 * @author Ryan LaHue
 * This is a representation of the puzzle board.
 * It is a grid consisting of rows and columns, with one letter per slot.
 */

public class Board {
 private String [][] board;
 private int rows;
 private int cols;
 private BufferedReader keyboard;
 
 public Board() {
  keyboard = new BufferedReader(new InputStreamReader(System.in));
    buildBoard();
    printBoard();
 }
 
 public int getRows() {
  return rows;
 }
 
 public int getCols() {
  return cols;
 }
 
 public String getLetterAt(int column, int row) {
  return board[column][row];
 }
 
 public void buildBoard() {
  //Prompts the user for the dimensions and letters on the board
  rows = iQuery("Enter the number of rows: ");
  cols = iQuery("Enter the number of columns: ");

  board = new String[cols][rows];
  for(int i=0;i<rows;i++) {
   String rawRow = sQuery("Enter the first row: ");
   rawRow = rawRow.replace(" ","");
   for(int j=0;j<rawRow.length();j++) {
    //Convert each row of letters into an array
    board[i][j] = String.valueOf(rawRow.charAt(j));
   }
  } 
 }
 
 public void printBoard() {
  //Prints a representation of the board entered by the user
  System.out.println("---The Board---");
  for(int i=0;i<cols;i++)
  {
   for(int j=0;j<rows;j++) {
    System.out.print( board[i][j] + " ");
   }
   System.out.println("");
  }
  System.out.println("---------------");
 }
 
 private String sQuery(String query) {
  //Returns a String typed by the user
  String line;
  System.out.print(query);
  try {
   line = keyboard.readLine();
  }catch(Exception e) {
   System.out.println("Invalid entry");
   return sQuery(query); //Keep prompting until a string is entered 
  }
  return line;
 }
 
 private int iQuery(String query) {
  //Returns an int typed by the user
  int answer = 0;
  System.out.print(query);
  try {
   answer = Integer.parseInt(keyboard.readLine());
  }catch(Exception e) {
   System.out.println("Invalid entry");
   return iQuery(query); //Keep prompting until a valid number is entered 
  }
  return answer;
 }
}

The Board class is the puzzle board. It also has some methods which are used to construct the board from user input.


Here's the Solver class:
import java.util.HashSet;
import java.util.Set;

/**
 * 
 * @author Ryan LaHue
 * This class contains the main method, which takes as an argument the dictionary filename
 * After the dictionary is created it prompts the user for the board dimensions and board letters.
 * After the board is set up, it prints out unique words as it finds them while traversing the board recursively.
 *
 * The rules and parameters:
 * 1) Build words between 3 and 9 letters long
 * 2) Use each slot on the board once per word
 * 3) Build words by moving to a letter in a slot adjacent to the last letter added to the current word.
 */
public class Solver {
 private Dictionary dictionary;
 private Board board;
 private static final int MAX_WORD_LENGTH = 9;
 private static final int MIN_WORD_LENGTH = 3;
 private Set foundWords = new HashSet();
  private int numIterations = 0;
 
 public static void main(String[] args) throws Exception {
    Solver solver = new Solver();
    solver.dictionary = new Dictionary(args[0]); //args[0] = File containing all valid words
    solver.board = new Board();
    solver.solve();
 }
 
 public void solve() {
  //This method solves the puzzle, utilizing the SBoard and SDictionary in memory

  for(int i=0;i<board.getCols();i++) {
   for(int j=0;j<board.getRows();j++) {
        //For each space on the board

        //Create a temporary map of the board; each space defaults to false (i.e. unused)
    boolean memoryBoard [][] = new boolean [board.getCols()][board.getRows()];

    //Solve the game, attempting to use each slot as the first letter
    solve("", memoryBoard, i, j);
   }
  }
  System.out.println("Number of iterations: " + numIterations);
 }
 
 public void solve(String currentWord, boolean memoryBoard[][], int currentColumn, int currentRow) {
    numIterations++;

  //Immediately back up the memoryBoard and pass a new one in recursion;
  //Otherwise multiple recursions will be changing the same memoryBoard
  boolean [][] newMemoryBoard = new boolean [board.getCols()][board.getRows()];
  for(int i=0;i<board.getCols();i++) {
   for(int j=0;j<board.getRows();j++) {
    newMemoryBoard[i][j] = memoryBoard[i][j];
      }
    }

    //Exit if the potential word grows too long
  if(currentWord.length() == MAX_WORD_LENGTH) {
   return;
  }

    if(!spaceExists(currentRow, currentColumn) || memoryBoard[currentColumn][currentRow]) {
      //If this space does not exist or it has already been used, end this recursion
      return;
    }

    //Mark this space as used
    newMemoryBoard[currentColumn][currentRow] = true;

  //Add this slot's letter to the pre-existing word
  currentWord = currentWord + board.getLetterAt(currentColumn, currentRow);
  
  if(dictionary.contains(currentWord) && currentWord.length() >= MIN_WORD_LENGTH) {
   if(foundWords.add(currentWord)) {
        //Print this word out since it hasn't previously been discovered
    System.out.println(currentWord);
      }
  }
   
  //Try building on this word by using every neighboring letter
  solve(currentWord, newMemoryBoard, currentColumn - 1, currentRow - 1); //lower-left
  solve(currentWord, newMemoryBoard, currentColumn, currentRow - 1); //left
  solve(currentWord, newMemoryBoard, currentColumn + 1, currentRow - 1); //upper-left
  solve(currentWord, newMemoryBoard, currentColumn - 1, currentRow); //below
  solve(currentWord, newMemoryBoard, currentColumn + 1, currentRow); //right
  solve(currentWord, newMemoryBoard, currentColumn - 1, currentRow + 1); //lower-right
  solve(currentWord, newMemoryBoard, currentColumn, currentRow + 1); //above
  solve(currentWord, newMemoryBoard, currentColumn + 1, currentRow + 1); //upper-right
 }

  private boolean spaceExists (int row, int col) {
    //Determines whether this row/column exists on the board
    if(row < 0
    || col < 0
    || row >= board.getRows()
    || col >= board.getCols()) {
   //No such space exists
   return false;
  }
    return true;
  }
}

The Solver class is where all the logic exists. It's all built around the solve method, which is called recursively to exhaust all possibilities as it looks for words.


The limit to word length (set to 3-9 letters in the code) is useful for larger boards. If we didn't cap it at a reasonable length then boards as small as 5x5 would lead to a phenomenal number of recursions. On my computer, with a 4x4 board and no cap, it took about a minute to solve the puzzle with 96237136 recursions. A 5x5 board went several minutes before I stopped it. I do not know the formula to calculate recursions based on a grid of size (X,Y), but it appears to be exponential.

The word cap is a simple heuristic I implemented to stop recursions that are unlikely to reveal undiscovered words. While it means we'll miss any word longer than 9 letters, for larger boards it dramatically increases the speed with which words are discovered. The fact that our program misses 10-letter and longer words means that it doesn't guarantee complete solutions for boards larger than 3x3.

Is this a good solution? That depends on your definition. If you're a cheater who is trying to score the most points, this is a perfectly good solution as it starts printing out words almost immediately and fairly quickly (depending on letter distribution, of course). If you need every matching word in a puzzle, and your dictionary contains words 10 letters in length or longer, then it is a bad solution with the cap; but if you remove the cap then it will find these long words.

The trade-off here is speed vs completeness. Memory is a third sliding variable, which could be a factor if this needed to run in a low-resource environment.

Can we improve upon this solution? Absolutely. If memory was not a concern then we could store every substring of every dictionary word, in our dictionary (along with a flag stating whether the word is a substring or a full word). At each point if the word we're building isn't found in the dictionary as either a full word or a substring, we can stop that recursion short.

Another possibility would be to explicitly search for each word in the dictionary rather than to let the puzzle drive the search. This would probably call for a different data structure.

Is there a perfect solution that improves in the areas of speed, completeness and memory? Almost certainly. Is there a "best" solution that wins against all other solutions in all three areas? I doubt it, as these things tend to be trade offs.

No comments:

Post a Comment