
Hack 38. Perform Proximity Searches

GAPS performs a proximity check between two words.

Sometimes it is useful to search for words in either order. For example, if you're doing genealogy research, you might find your uncle John Smith listed as either "John Smith" or "Smith John." Similarly, some pages might include John's middle initial: "John Q Smith" or "Smith John Q."

If all you're after is query permutations, [Hack #28] might do the trick.


You might also need to find concepts that exist near each other but don't make up a phrase. For example, you might want to learn about keeping squirrels out of your bird feeder. No single quoted phrase is likely to match every page on the topic, yet searching for the individual words alone might not return specific enough results.

GAPS, created by Kevin Shay, allows you to run searches both forward and backward and within a certain number of words of each other. GAPS stands for Google API Proximity Search, and that's exactly what this application is: a way to search for terms within a few words of each other without having to run several queries in a row. The program runs the queries and automatically organizes the results.

You enter two terms (there is an option to add more terms that will not be searched for in proximity) and specify how far apart you want them (1, 2, or 3 words). You can specify that the words be found only in the order you request (wordA, wordB) or in either order (wordA, wordB or wordB, wordA). You can specify how many results you want and how they are sorted (by title, URL, ranking, or proximity).
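To make the options above concrete, here is a minimal sketch of how a proximity search could be expanded into a series of ordinary phrase queries. It is not taken from the GAPS source. It assumes that proximity is approximated with Google's full-word wildcard (`*`) inside a quoted phrase, and that a distance of 1 means the two words are adjacent. The function name `proximity_queries` and its parameters are hypothetical.

```python
def proximity_queries(term_a, term_b, max_distance=3, either_order=True):
    """Return (distance, query) pairs, nearest first.

    Assumption (not confirmed by the GAPS source): distance 1 means the
    terms are adjacent, and each extra step of distance adds one "*"
    full-word wildcard between them.
    """
    if not 1 <= max_distance <= 3:
        raise ValueError("max_distance must be 1, 2, or 3")

    # Always search in the requested order; optionally add the reverse.
    orders = [(term_a, term_b)]
    if either_order:
        orders.append((term_b, term_a))

    queries = []
    for distance in range(1, max_distance + 1):
        gap = ["*"] * (distance - 1)  # one wildcard per intervening word
        for first, second in orders:
            phrase = " ".join([first, *gap, second])
            queries.append((distance, f'"{phrase}"'))
    return queries


# Example: "google" and "hacks" within two words, order intact.
for distance, query in proximity_queries("google", "hacks",
                                         max_distance=2, either_order=False):
    print(distance, query)
```

With both orders enabled and a maximum distance of 3, this sketch produces six queries, which is the kind of repetitive work a tool like GAPS saves you from doing by hand.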

Search results are formatted much like regular Google results, except that a distance ranking appears beside each title. The distance ranking, between one and three, specifies how far apart the two query words were on the page. Figure 2-12 shows a GAPS search for google and hacks within two words of one another, order intact.

Figure 2-12. GAPS search for "google" and "hacks" within two words of one another


Click the distance ranking link to pass the generated query on to Google directly.
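The result handling described above can be sketched in a few lines. This is an illustration under stated assumptions, not the GAPS implementation: each hit is assumed to be a dictionary with hypothetical `title`, `url`, `rank`, and `distance` keys, a page found by more than one query keeps its smallest distance, and the link is built from the standard Google search URL. The sample hits use made-up example.com addresses.

```python
from urllib.parse import urlencode


def organize(hits, sort_by="proximity"):
    """Deduplicate hits by URL, keeping the closest match, then sort."""
    best = {}
    for hit in hits:
        kept = best.get(hit["url"])
        if kept is None or hit["distance"] < kept["distance"]:
            best[hit["url"]] = hit

    # The four sort orders the text mentions; key names are hypothetical.
    sort_keys = {
        "title": lambda h: h["title"].lower(),
        "url": lambda h: h["url"],
        "ranking": lambda h: h["rank"],
        "proximity": lambda h: (h["distance"], h["rank"]),
    }
    return sorted(best.values(), key=sort_keys[sort_by])


def google_link(query):
    """Build a link that hands one generated query to Google directly."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Made-up sample data: the same page matched at distance 2 and distance 1.
sample_hits = [
    {"title": "Hacks Index", "url": "http://example.com/a", "rank": 1, "distance": 2},
    {"title": "Google Tips", "url": "http://example.com/b", "rank": 2, "distance": 1},
    {"title": "Hacks Index", "url": "http://example.com/a", "rank": 3, "distance": 1},
]

for hit in organize(sample_hits):
    print(hit["distance"], hit["title"], hit["url"])
print(google_link('"google * hacks"'))
```

Sorting by proximity first and original rank second is a design choice made for this sketch. It keeps Google's own ordering among pages that match at the same distance.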

2.20.1. Making the Most of GAPS

GAPS works best when you have words on the same page that are ambiguously or not at all related to one another. For example, if you're looking for information on Google and search engine optimization (SEO), you might find that searching for the words Google and SEO doesn't find the results that you want, while using GAPS to search for the words Google and SEO within three words of each other finds material focused much more on search engine optimization for Google.

GAPS also works well when you're searching for information about two famous people who often appear on the same page, though not necessarily in proximity to each other. For example, you might want information on Bill Clinton and Alan Greenspan, but find that you're getting too many pages that merely happen to mention both of them. By searching for their names in proximity to each other, you'll get better results.

Finally, you might find GAPS useful in medical research. Search results often include index pages that list many symptoms. Requiring symptoms or other medical terms to appear within a few words of each other can help you find more relevant results. Note that this technique takes some experimentation: many pages about medical conditions contain long lists of symptoms and effects, and there's no guarantee that one symptom will appear within a few words of another.

2.20.2. The Code

The GAPS source code is rather lengthy, so we're not making it available here. You can, however, get it online at http://www.staggernation.com/gaps/readme.html.

2.20.3. See Also

If you like GAPS, you might want to try a couple of other scripts from Staggernation:


GAWSH (http://www.staggernation.com/gawsh)

Stands for Google API Web Search by Host. This program allows you to enter a query and get a list of domains that contain information on that query. If you click on the triangle beside any domain name, you'll get a list of pages in that domain that match your query. This program uses DHTML, which means that it'll only work with Internet Explorer or Mozilla/Netscape.


GARBO (http://www.staggernation.com/garbo)

Stands for Google API Relation Browsing Outliner. Like GAWSH, this program uses DHTML, so it'll work only with Mozilla/Netscape and Internet Explorer. When you enter a URL, GARBO will do a search for either pages that link to the URL you specify or pages related to that URL. Run a search and you'll get a list of URLs with triangles beside them. If you click on a triangle, you'll get a list of pages that either link to the URL you chose or are related to the URL you chose, depending on what you chose in the initial query.
