×

Search anything:

# Solve Crossword using Backtracking

#### backtracking Algorithms Get this book -> Problems on Array: For Interviews and Competitive Programming

In this article, we have covered the Backtracking Algorithm for Crossword and compared with the Brute Force approach. We have presented the Time and Space Complexity for various cases.

1. Introduction
2. Naive Approach
3. Backtracking
4. Time and Space Complexity
5. Conclusion

## Introduction

A crossword is a word puzzle that usually consists of a square or rectangular grid of white and black-shaded squares. By solving clues that lead to the solutions, the goal is to fill the white squares with letters, producing words or phrases. The solution words and phrases are inserted in the grid from left to right ("across") and from top to bottom in languages that are written left to right ("down"). The words or phrases are separated by shaded squares.

Crossword Puzzle Solution  The goal is to fill all of the empty cells in a 6x6 grid with words such that the grid becomes valid. The constraint here is that the character at the intersection point have to be the same.

## Naive Approach

The process in the naive approach fills in the empty cells with no logic and then examines whether it was a valid placement. This can take a long time and be quite inefficient. The Backtracking Algorithm would be an improvement to this naive approach.

## Backtracking

Backtracking is a form of brute-force approach that comes into play when addressing a problem that involves evaluating several options since we don't know which one is accurate and we try to solve the problem using the trial and error method, one decision at a time, until we get the desired answer.

The method described above can be used to solve the crossword puzzle. This backtracking algorithm traverses all the vacant cells, incrementally fills the cells with possible words retrieved from a dictionary file, then backtracks when a word filled does not comply with the constraint. This process is repeated until all of the cells are filled.

###### Steps
• We begin by looking for a continous row/column of empty cells.
• A valid crossword is obtained when all of the cells are filled.
• We try to fill the continous row/column of empty cells with values retrieved from the dictionary file.
• We examine the intersection point between the horizontal and vertical words for contraint before placing.
• If we can satisfy the contraints, we will place that word at that location, start-coordinates(a,b), end-coordinates(x,y) and then restart the procedure by looking for the next continous row/column of empty cells.
• If none of the words can be placed, we'll have to backtrack and alter the values for previously visited cells.

Let's take a closer look at the algorithm.

###### Pseudocode
``````backtrack_solver(crossword)
if(no more cells are there to explore)
return true
for(all available possibilities)
try one possibility p
if(backtrack_solver(crossword with possibility p made) is true)
return true
unmake possibility p
return false
``````
###### Implementation

Implementaion of the above backtracking algorithm :

The `copy` library is imported to enable the usage of the `deepcopy()` function. In the case of deep copy, an object is copied in another object. It means that any changes made to a copy of an object are not reflected in the original. The `shapely` library is imported to allow us to find the intersection point(constraint) in the crossword.

The class is used to represents each word in the crossword puzzle. When we apply backtracking later, we use the value variable to assign values to each word.

``````import copy
from shapely.geometry import LineString

class Word:

#coordinates of the starting and ending point
start_coord = ()
end_coord = ()

#horizontal word = 0, vertical word = 1
orientation = 0

#word length
length = 0

#value assigned to this word
value = ''
``````

The `load_crossword_puzzle` and `load_dictionary` function take in one input (filename), read content of the file, clean it up by removing the `tabs`, `newlines` and `spaces` and return the result as a `list`.

``````def load_crossword_puzzle(filename):
crossword = []
with open(filename, 'r') as cfile:
for line in puzzle:
replaced = line.replace("\t", "")
replaced = replaced.replace("\n", "")
replaced = replaced.replace(" ", "")
crossword.append(list(replaced))
return crossword

dictionary = []
with open(filename, 'r') as dfile:
for word in wordslist:
replaced = word.replace("\n", "")
dictionary.append(replaced)
return dictionary
``````

The function `find_horizontal_words(crossword)` verify whether the orientation of the words in the crossword puzzle are horizontal. It achieve this by iterating each row of the puzzle and check if the character is `0`. If this is the first instance of `0` is found, i.e. the previous character is `#`, the function will save the starting coordinates (row, column) and track the length of the word. The function will continue to iterate to the next column and check for the `0` character. If the next character is `#`, the function will save the previous column position as the ending coordinates of the word.

The function `find_vertical_words(crossword)` verify whether the orientation of the words in the crossword puzzle are vertical. The function is similar to the `find_horizontal_words(crossword)`, except that it is iterating through each column of the puzzle to locate the words.

``````def find_horizontal_words(crossword):
horizontal_words = []

for row in range(len(crossword)):

column = 0
word = Word()
finished = False
prev = '#' #prev mean the previous char in the word

while column <= len(crossword[row])-1:

if crossword[row][column] == '0':

if prev == '0':
word.length += 1
prev = '0'
if column == len(crossword[row])-1:
if not finished:
finished = True
word.end_coord = (row, column)
prev = '#'

elif prev == "#":
if finished:
finished = False
word.start_coord = (row, column)
word.length += 1
prev = '0'

elif crossword[row][column] == '#':

if prev == '0':
if not finished:
finished = True
if word.length > 1:
word.end_coord = (row, column-1)
else:
word = Word()
prev = '#'

if word.length > 1 and finished:
word.orientation = 0
horizontal_words.append(word)
word = Word()
finished = False

column += 1

return horizontal_words

def find_vertical_words(crossword):
vertical_words = []
word = Word()
started = False

for column in range(0, len(crossword)):
started = False
for row in range(0, len(crossword)-1):
if crossword[row][column] == '0' and crossword[row+1][column] == '0':
if started == False:
started = True
word.start_coord = (row, column)

if row == len(crossword)-2 and started:
word.end_coord = (row+1, column)
word.length = word.end_coord - word.start_coord + 1
word.orientation = 1
vertical_words.append(word)
word = Word()
started = False
else:
if started:
word.end_coord = (row, column)
word.length = word.end_coord - word.start_coord + 1
word.orientation = 1
vertical_words.append(word)
word = Word()
started = False
return vertical_words
``````

Next we defined the backtracking algorithm function. The function take in three variable, `assigned_variable_list`, `not_assigned_variable_list`, `dict`. The `not_assigned_variable_list` consist of all the horizontal and vertical words pending to be filled in the crossword. The `dict` variable is the value returned by the load_dictionary() function. The `assigned_variable_list` hold all the values that satisfy the contraint in the crossword.

Next the `get_possible_values()` is called to return all possible words(values) that fit the word length of the crossword. The `check_constraint()` function is called to ensure the value assigned satisfy the contraint of the crossword.

If all possible values are unable to satisfy the constraint, it means that the previous "word" assigned was wrong and hence the algorithm will backtrack and leave the word cell unassigned to try another possibilities.

``````def backtracking(assigned_variable_list, not_assigned_variable_list, dict):

#theres are no variables to assign a value so we are done
if len(not_assigned_variable_list) == 0:
return assigned_variable_list

var = not_assigned_variable_list

possible_val = get_possible_values(var, assigned_variable_list, dict)

for val in possible_val:
# we create the variable check_var to do the checking and avoid assigning values which do not comply with the constraint
check_var = copy.deepcopy(var)
check_var.value = val
if check_constraint(check_var, assigned_variable_list):
var.value = val
result = backtracking(assigned_variable_list+[var], not_assigned_variable_list[1:], dict)
if result != None:
return result
# we've reached here because the choice we made by putting some 'word' here was wrong
# hence now leave the word cell unassigned to try another possibilities
var.value = ''

return None
``````

The `get_possible_values` function is to go through the dictionary and return all possible value that fit the word length of the crossword that we are solving. The function also check values already assigned to the crossword to avoid duplication

``````#returns all possible values for the desired variable
def get_possible_values(var, assigned_variable_list, dict):
possibles_values = []

for val in dict:
if len(val) == var.length:
possibles_values.append(val)

for item in assigned_variable_list:
if item.value in possibles_values:
possibles_values.remove(item.value)

return possibles_values
``````

The function `check_constraint()` check the proposed value to be assigned (`var`) against the `assigned_variable_list`. If they are the same orientation (both are horizontal or both are vertical), no further check is needed. If otherwise (one is horizontal and the other is vertical), the `check_intersections()` will be called to help check for the intersection point (if any).

The `check_intersections()` make use of the `LineString` function from the `shapely` module which we imported earlier. As we treat words here like lines, we can find the intersection point of horizontal and vertical words (the character position - intersection point is the constraints which the algorithm must apply to get a valid solution)

``````#checks var against assigned variable list
def check_constraint(var, assigned_variable_list):
if assigned_variable_list != None:
for word in assigned_variable_list:
#if orientation is equal they will never interesect!
if var.orientation != word.orientation:
intersection = check_intersections(var, word)
if len(intersection) != 0:
if var.orientation == 0: #horizontal
if var.value[int(intersection-var.start_coord)] != word.value[int(intersection-word.start_coord)]:
return False
else: #vertical
if var.value[int(intersection-var.start_coord)] != word.value[int(intersection-word.start_coord)]:
return False
return True

#treat words here like lines so we find the intersection point of horizontal and vertical words (the character position - intersection point is the constraints which the algorithm must apply to get a valid solution)
def check_intersections(w1, w2):
line1 = LineString([w1.start_coord, w1.end_coord])
line2 = LineString([w2.start_coord, w2.end_coord])

intersection_point = line1.intersection(line2)

if not intersection_point.is_empty:
return [intersection_point.coords] #result(float)
else:
return []
``````

The function `insert_word_to_puzzle()` will now insert the word found by our solver into the crossword base on their orientation, starting coordinates and ending coordinates.

``````def insert_word_to_puzzle(crossword, word, coord, orientation):
pos_count = 0
for char in word:
if orientation == 0: #horizontal if orientation == 0
crossword[coord][coord+pos_count] = char
else:
crossword[coord+pos_count][coord] = char
pos_count += 1
return crossword
``````

The below codes will attempt to run our crossword solver. `puzzle.txt` is our crossword layout we are trying to solve. An example of the layout is as follow, `#` represent the dark shaded area of the crossword and `0` represent the word cell to be filled.

``````#	0	#	#	#	#
#	0	#	#	#	#
#	0	#	#	#	0
#	0	0	0	0	0
#	0	#	#	#	0
#	0	#	#	#	0
``````

The file `words.txt` is a list of word separated by a new line for our solver to use as possible values to fill in the word cells. An example is as follow

``````ability
able
above
accept
according
``````
``````cw_puzzle = load_crossword_puzzle("puzzle.txt")
horizontal_word = find_horizontal_words(cw_puzzle)
vertical_word = find_vertical_words(cw_puzzle)
total_words = horizontal_word + vertical_word
assign_var_list = []
suggested_solution = backtracking(assign_var_list, total_words, dict)

print("---------- Crossword ---------")
for line in cw_puzzle:
print(line)
print("------------------------------")

print("---------- Solution ----------")

if suggested_solution is None:
print("No solution found")
else:
for word in suggested_solution:
insert_word_to_puzzle(cw_puzzle, word.value, word.start_coord, word.orientation)

for line in cw_puzzle:
print(line)

print("------------------------------")
``````

Once we run the code, we should see the following output

``````---------- Crossword ---------
['#', '0', '#', '#', '#', '#']
['#', '0', '#', '#', '#', '#']
['#', '0', '#', '#', '#', '0']
['#', '0', '0', '0', '0', '0']
['#', '0', '#', '#', '#', '0']
['#', '0', '#', '#', '#', '0']
------------------------------
---------- Solution ----------
['#', 'a', '#', '#', '#', '#']
['#', 'l', '#', '#', '#', '#']
['#', 'w', '#', '#', '#', 'i']
['#', 'a', 'b', 'o', 'u', 't']
['#', 'y', '#', '#', '#', 'e']
['#', 's', '#', '#', '#', 'm']
------------------------------
``````

## Time and Space Complexity

#### Time Complexity

Time Complexity of our Backtracking approach: O((M * P)^D)

where:

• N is the number of continous row/columns of empty cell (word to be filled) in the grid
• P is the lists of possible word to be tested for the crossword constraint.
• For the backtracking function, the depth(D) of the this recursive function which will be equal to the crossword constraint. D = intersection point(s) between the horizontal and vertical words.
• M is the average length of word

The Time Complexity will be O((M * P)^D) as each continuous cells will have only one word so D+1 words will be used. At each intersection, we need to check P words. M is the average length of word and this is needed as we need to check constraints (reading time of a string).

#### Space Complexity

O(L) : where L is the length of the given word. This space is used for recursion stack.

## Conclusion

Other ways to solve crossword include:

• Forward Checking
• Dynamic Variable Ordering
• Conflict-Directed Backjumping
• Arc Consistency

With this article at OpenGenus, you must have the complete idea of Solving Crossword using Backtracking.