19. Searching

Searching is an important and very common operation that computers do all the time. Searches are used every time someone does a ctrl-f for “find”, when a user uses “type-to” to quickly select an item, or when a web server pulls information about a customer to present a customized web page with the customer’s order.

../../_images/search.png

There are a lot of ways to search for data. Google has based an entire multi-billion dollar company on this fact. This chapter introduces the two simplest methods for searching, the linear search and the binary search.

19.1. Reading From a File

Before discussing how to search we need to learn how to read data from a file. Reading in a data set from a file is way more fun than typing it in by hand each time.

Let’s say we need to create a program that will allow us to quickly find the name of a super-villain. To start with, our program needs a database of super-villains. To download this data set, download and save this file:

super_villains.txt

These are random names generated by the nine.frenchboys.net website, although last I checked they no longer have a super-villain generator. They have other cool random name generators though.

Save this file and remember which directory you saved it to.

In the same directory as super_villains.txt, create, save, and run the following python program:

1
2
3
4
file = open("super_villains.txt")

for line in file:
    print(line)

There is only one new command in this code: open. Because it is a built-in function like print, there is no need for an import. Full details on this function can be found in the Python documentation but at this point the documentation for that command is so technical it might not even be worth looking at.

The above program has two problems with it, but it provides a simple example of reading in a file. Line 1 opens a file and gets it ready to be read. The name of the file is in between the quotes. The new variable file is an object that represents the file being read. Line 3 shows how a normal for loop may be used to read through a file line by line. Think of file as a list of lines, and the new variable line will be set to each of those lines as the program runs through the loop.

Try running the program. One of the problems with the it is that the text is printed double-spaced. The reason for this is that each line pulled out of the file and stored in the variable line includes the carriage return as part of the string. Remember the carriage return and line feed introduced back in Chapter 1? The print statement adds yet another carriage return and the result is double-spaced output.

The second problem is that the file is opened, but not closed. This problem isn’t as obvious as the double-spacing issue, but it is important. The Windows operating system can only open so many files at once. A file can normally only be opened by one program at a time. Leaving a file open will limit what other programs can do with the file and take up system resources. It is necessary to close the file to let Windows know the program is no longer working with that file. In this case it is not too important because once any program is done running, the Windows will automatically close any files left open. But since it is a bad habit to program like that, let’s update the code:

1
2
3
4
5
6
7
file = open("super_villains.txt")

for line in file:
    line = line.strip()
    print(line)

file.close()

The listing above works better. It has two new additions. On line 4 is a call to the strip method built into every String class. This function returns a new string without the trailing spaces and carriage returns of the original string. The method does not alter the original string but instead creates a new one. This line of code would not work:

line.strip()

If the programmer wants the original variable to reference the new string, she must assign it to the new returned string as shown on line 4.

The second addition is on line 7. This closes the file so that the operating system doesn’t have to go around later and clean up open files after the program ends.

19.2. Reading Into an Array

It is useful to read in the contents of a file to an array so that the program can do processing on it later. This can easily be done in python with the following code:

Read in a file from disk and put it in an array
1
2
3
4
5
6
7
8
9
# Read in a file from disk and put it in an array.
file = open("super_villains.txt")

name_list = []
for line in file:
    line = line.strip()
    name_list.append(line)

file.close()

This combines the new pattern of how to read a file, along with the previously learned pattern of how to create an empty array and append to it as new data comes in, which was shown back in Chapter 7. To verify the file was read into the array correctly a programmer could print the length of the array:

print( "There were",len(name_list),"names in the file.")

Or the programmer could bring the entire contents of the array:

for name in name_list:
    print(name)

Go ahead and make sure you can read in the file before continuing on to the different searches.

19.4. Linear Search Algorithm

Linear search
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# --- Linear search
key = "Morgiana the Shrew"

list_position = 0
while list_position < len(name_list) and name_list[list_position] != key:
    list_position += 1

if list_position < len(name_list):
    print("The name is at position", list_position)
else:
    print("The name was not in the list.")

The linear search is rather simple. Line 4 sets up an increment variable that will keep track of exactly where in the list the program needs to check next. The first element that needs to be checked is zero, so list_position is set to zero.

The next line is a bit more complex. The computer needs to keep looping until one of two things happens. It finds the element, or it runs out of elements. The first comparison sees if the current element we are checking is less than the length of the list. If so, we can keep looping. The second comparison sees if the current element in the name list is equal to the name we are searching for.

This check to see if the program has run out of elements must occur first. Otherwise the program will check against a non-existent element which will cause an error.

Line 6 simply moves to the next element if the conditions to keep searching are met in line 5.

At the end of the loop, the program checks to see if the end of the list was reached on line 8. Remember, a list of n elements is numbered 0 to n-1. Therefore if i is equal to the length of the list, the end has been reached. If it is less, we found the element.