Designing an algorithm

In this blog post I want you to show how I design my algorithms. It’s a simply and structured approach to write short, fast and easy to read code (although you can not always fulfill all of the three requirement).

Example

Count the islands in 2 dimensional map. The values of the fields on the map can be 0 = water or 1 = land. An island is land surrounded by water fields.

* This example is from a Google Job Interview posted on Glassdoor.com.

 

My approach for designing this algorithm

1) Visualize the problem

At first I visualize some examples of the problem in a spreadsheet. You will see that some questions arise…

algorithm_visualization

 

2) Ask questions to stakeholders and specify the problem

Next I ask questions (to stakeholders) to find out more about the problem. Examples:

  • “Are there any constraints to the size or shape of the 2d map?”,
  • “Do I also need to detect islands with more than 1 fields?”,
  • “Do I need to detect islands within islands?”,
  • “Are islands which are directly at the border of the map really islands or peninsulas?”

I write down the answers, so the the requirements specification is available for later purposes (e.g. when new requirements should be added, the code should be refactored, a bug is reported, the algorithm should be integrated into an other project, …).

For this example the following spec is valid:

  • Count the islands in 2d-map. There are 2 field values: 0 = water, 1 = land
    • The map can be of any size m times n
    • Islands can have multiple fields
    • Islands can have islands within them
    • Constraint: fields which border to the edge of the map are peninsulas

 

3) Write unit tests

Now I do not start coding right away. Following a Test Driven Development (TDD) it’s time to write unit tests based on the examples I’ve figured out in 1). My tests also focus on special cases.

Here you can see the unit tests for this example based on the (excel file):

def test():
 field1 = [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
           [1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
           [0, 0, 1, 0, 0, 0, 0, 0, 1, 0],
           [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]]
 field2 = [[1, 0, 0, 0, 0, 0],
           [1, 1, 0, 1, 0, 1],
           [1, 0, 0, 0, 0, 1],
           [0, 0, 1, 0, 0, 0],
           [0, 0, 1, 0, 0, 0]]
 field3 =  [[0, 0, 0, 0, 0],
           [1, 1, 0, 1, 0],
           [1, 0, 0, 1, 0],
           [0, 0, 0, 1, 0],
           [0, 0, 0, 0, 0]]
 field4 = [[1],[1],[1],[1],[1],[1]]
 field5 = [[1,1,1,1,1,1,1,1]]
 field6 = [[0],[0],[0],[0],[0],[0]]
 field7 = [[0,0,0,0,0,0,0,0]]
 field8 = [[0],[0],[0],[1],[0],[0]]
 field9 = [[0,0,0,0,1,0,0,0]]
 field10 =[[0, 0, 0, 0, 0],
           [1, 1, 1, 1, 1],
           [0, 0, 0, 0, 0]]
 field11 = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 0]]
 field12 = [[0]*250]*500 + [[0]*5 + [1]*240 + [0]*5] + [[0]*250]*500
 field13 = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0]]
 island_on_island = [[0, 0, 0, 0, 0, 0, 0],
                     [0, 1, 1, 1, 1, 1, 0],
                     [0, 1, 0, 0, 0, 1, 0],
                     [0, 1, 0, 1, 0, 1, 0],
                     [0, 1, 0, 0, 0, 1, 0],
                     [0, 1, 1, 1, 1, 1, 0],
                     [0, 0, 0, 0, 0, 0, 0]]
 assert(count_islands(field1) == 3)
 assert(count_islands(field2) == 1)
 assert(count_islands(field3) == 1)
 assert(count_islands(field4) == 0)
 assert(count_islands(field5) == 0)
 assert(count_islands(field6) == 0)
 assert(count_islands(field7) == 0)
 assert(count_islands(field8) == 0)
 assert(count_islands(field9) == 0)
 assert(count_islands(field10) == 0)
 assert(count_islands(field11) == 1)
 assert(count_islands(field12) == 1)
 assert(count_islands(field13) == 1)
 assert(count_islands(island_on_island) == 2)

 return "*** all tests pass! ***"

print(test())

4) Write pseudo-code

So far I didn’t care about the design of the algorithm. That changes in this step. Because trying to figure out a solution by writing pseudo-code it easier than directly implementing the algorithm, I write down my thoughts on paper (or in a virtual document).

It is a good idea to bear the efficiency of the algorithm in mind. You can write down the estimated runtime of every line with the Big-O-Notation.

 

5) Start coding

Once I’ve figured out a good solution, I start coding. Because I’ve already written my unit tests in step 3 it’s very easy for me to verify if my implemented algorithm works properly.

I’ve come to the following first draft of the solution for the problem (python):

class FieldIsWaterException(Exception): pass

def count_islands(_map):
 islands = 0
 width = len(_map[0])
 height = len(_map)
 
 r = 1
 while r < height:
   f = 1
     while f < width:
       if _map[r][f] != 2 and _map[r][f] != 0:
         if check_island(_map, r, f, width, height) == True:
           islands += 1
       f += 1
   r += 1
 
 return islands

# should only be called on a field which isn't water
def check_island(_map, r, f, width, height, prev=None):
 
 # if this field is water then something went wrong
 if(_map[r][f] == 0): FieldIsWaterException("Error: called on water")
 
 _map[r][f] = 2 
 
 # check lower
 if prev != "LO":
   if r+1 < height:
     if(_map[r+1][f] == 1): 
       if check_island(_map, r+1, f, width, height, "UP") == False:
         return False
 else:
   return False     # edge of world
 
 # check left
 if prev != "LE": 
   if (f-1) >= 0:
     if(_map[r][f-1] == 1): 
       if check_island(_map, r, f-1, width, height, "RI") == False:
         return False
 else:
   return False     # edge of world
 
 # check right
 if prev != "RI":
   if (f+1) < width:
     if(_map[r][f+1] == 1): 
       if check_island(_map, r, f+1, width, height, "LE") == False:
         return False
 else:
   return False     # edge of world
 
 # check upper
 if prev != "UP":
   if (r-1) >= 0:
     if(_map[r-1][f] == 1): 
       if check_island(_map, r-1, f, width, height, "LO") == False:
         return False
 else:
   return False     # edge of world
 
 return True

6) Test efficiency and refactoring

Once I am finished with the first draft and the tests pass, I start refactoring my solution. I want my code to be fast and easy to understand.

In python you can easily profile your code by using the cProfile class:

def test2():
 island_on_island = [[0, 0, 0, 0, 0, 0, 0],
                     [0, 1, 1, 1, 1, 1, 0],
                     [0, 1, 0, 0, 0, 1, 0],
                     [0, 1, 0, 1, 0, 1, 0],
                     [0, 1, 0, 0, 0, 1, 0],
                     [0, 1, 1, 1, 1, 1, 0],
                     [0, 0, 0, 0, 0, 0, 0]] 
 assert(count_islands(island_on_island) == 2)
 
import cProfile
cProfile.run("test2()")

In the test2() function I created a map with an island on an island. The map consists of 7 columns and 7 rows. O(n*m) would be if the check_island() function would be called around 49 times. Executing cProfile.run() on the test2() function gives the following result:

algorithm_profiling

We can see that check_island is only called 17 times. This is because the algorithms does not check “water-fields” and no “land-field” is checked twice.

In first draft above from point 5 I violated the “Don’t repeat yourself” principle by writing down the same checks for the left, upper, lower and right field in the check_island() function. Therefore I refactor the code a little bit. Because I’ve written my unit tests I can be confident that I do not destroy the algorithm by refactoring it.

...
# should only be called on a field which isn't water
def check_island(_map, r, f, width, height, prev=None):
 
 # if this field is water then something went wrong
 if(_map[r][f] == 0): FieldIsWaterException("Error: called on water")
 
 # if a land field has been checked once, no further need to check it
 _map[r][f] = 2 
 
 _checks = [["LO", r+1, f, "UP"], # check lower field
            ["LE", r, f-1, "RI"], # check left field
            ["RI", r, f+1, "LE"], # check right field
            ["UP", r-1, f, "LO"]] # check upper field
 
 for [_prev, _r, _f, _next] in _checks:
   if prev != _prev:                                         # if not previous field
     if _r < height and _r >= 0 and _f < width and _f >= 0:  # if in range
       if(_map[_r][_f] == 1):                                # if land
         if check_island(_map, _r, _f, width, height, _next) == False:
           return False
     else:
       return False         # edge of world 
 
 return True

Summary

This is it. This algorithms took me about 2 hours incl. specification, tests and refactoring. And believe me: Writing down your requirements and unit tests is definitely worth it in future. Regression tests prevent you from including bugs during implementation of further features and allows you to refactor your code. Specification allows you to write proper tests and to understand your code in future (comments in code also help, but the do not show you the goal of the code). Only if you have the requirements, you can check in future if new requirements conflict with old ones.

At the end I want to summarize the 6 steps shown in this blog post:

  1. Visualize the problem
  2. Ask questions to stakeholders and specify the problem
  3. Write unit tests
  4. Write pseudo-code
  5. Start coding
  6. Test efficiency and refactoring

If you are working in a team with other developers and testers you can split the work (e.g steps 1 – 3 is doing an other person than steps 4 – 6) and you can also add a 7th step called “review” where a senior is reviewing the code of the other team members. In this way the code get’s better and the others can learn from the senior developer.