Imagine going to a bookstore and finding an interesting book. How does a person do such a thing? One possibility would be for this person to pick up the first book ey sees, read it cover to cover, and then do the same with the next book, and so on. If you’re like most people, you’ll consider this an absurd algorithm.
This algorithm is an example of a depth-first search. The idea is as follows: go as far as you can along the first path you find; when you can’t go any further, back up until a new unexplored path shows up. Keep going until you’ve explored all possible paths, or at least as many as necessary.
The depth-first search doesn’t work very well when looking at large data sets because it takes a long time to explore each path in its entirety.
A better plan is to look at a lot of data in a cursory manner and try to generate clues about which avenues are worthy of further exploration. Returning to the bookstore example, we might try first looking for a section that interests us. If one wishes to read a popular science book, for example, ey would be advised to start in the popular science section of the bookstore.
So, the first clue would be to start in the right section. But there are probably still many books in that section, far too many to read them all cover to cover, one at a time. So ey would look for further clues. Some of the books have titles that sound intriguing, so ey might take a few of those books off the shelf. Still, there are likely to be too many to read them all cover to cover. To get further clues, ey will probably read a few pages in each of several books to determine which ones have a writing style consistent with eir own tastes. At that point, if ey is determined to buy exactly one, the selection may be somewhat random. But at least ey has narrowed the selection down to a few reasonable possibilities and has selected what is probably a pretty good fit, if not necessarily the optimal one
This process is an example of a parallel terraced scan, as introduced by Douglas Hofstadter and John Rehling for use in artificial intelligence projects. I read about it for the first time in Hofstadter’s essay "The Architecture of Jumbo," which I’ll probably write more about soon. (Interesting piece of trivia: The book containing this essay, Fluid Concepts and Creative Analogies, was the first book sold on Amazon.) The parallel terraced scan is a process that humans find easy and do naturally, without thinking about it. The way it works is just as described above: in the first pass, the scanner (who I will think of as being a person, although it could be a computer as well) very quickly looks through a lot of options and eliminates most of them. On the second pass, the scanner looks at the remaining options a bit more carefully and eliminates most of those. The process continues until the number of options remaining is small enough to give every option a fairly careful analysis. Finally, one very promising option is selected.