
Hack 36. Search Google Topics

Run queries against some of the available Google API specialty topics.

Google doesn't talk about it much, but it does make specialty web searches available. And I'm not just talking about searches limited to a certain domain. I'm talking about searches that are devoted to a particular topic (http://www.google.com/options/specialsearches.html). The Google API makes four of these searches available: the U.S. Government, Linux, BSD, and Macintosh.
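The script in this hack picks a topic through the restrict argument of doGoogleSearch, the sixth of the ten positional arguments it passes. It uses the keywords unclesam, linux, mac, and bsd, and an empty string searches all of Google. The minimal sketch below builds that argument list for one topic. The topic_search_args helper is hypothetical and exists only for illustration. The argument order is the one gootopic.cgi uses.

```perl
use strict;
use warnings;

# Restrict keywords for the four specialty topics, as used by gootopic.cgi.
# An empty restrict string ('') means "search all of Google".
my %topic_restrict = (
  'U.S. Government' => 'unclesam',
  'Linux'           => 'linux',
  'Macintosh'       => 'mac',
  'FreeBSD'         => 'bsd',
);

# Hypothetical helper: build the positional argument list passed to
# doGoogleSearch, restricted to a single topic.
sub topic_search_args {
  my ($key, $query, $restrict) = @_;
  return (
    $key,        # developer's key
    $query,      # query string
    0,           # start: index of the first result
    10,          # maxResults: number of results to return
    'false',     # filter similar results?
    $restrict,   # restrict: topic keyword, or '' for all of Google
    'false',     # safeSearch
    '',          # lr: language restrict
    'latin1',    # ie: input encoding
    'latin1',    # oe: output encoding
  );
}

my @args = topic_search_args('insert key here', 'open source',
                             $topic_restrict{'Linux'});
print "restrict = $args[5]\n";
```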

In this hack, we'll look at a program that takes a query from a web form, runs it against all of Google and against each specialty topic, and reports the estimated number of results and the top result for each.

2.18.1. Why Topic Search?

Why would you want to search by topic? Because Google currently indexes over eight billion pages. Unless your query is very specific, you may find yourself with far too many results. Narrowing your search to a topic can get you useful results without forcing you to pin your query down so precisely.

You can also use topic search for some decidedly unscientific research. Which topic contains more occurrences of the phrase "open source"? Which contains the most pages from .edu (educational) domains? Which topic, Macintosh or FreeBSD, has more on user interfaces? Which topic holds the most for Monty Python fans?

2.18.2. The Code

Save the following code as a CGI script ["How to Run the Hacks" in the Preface] named gootopic.cgi in the cgi-bin directory on your web server:

#!/usr/local/bin/perl
# gootopic.cgi
# Queries across Google Topics (and All of Google), returning
# number of results and top result for each topic.
# gootopic.cgi is called as a CGI with form input.

use strict;
use warnings;

use SOAP::Lite;
use CGI qw/:standard *table/;

# Your Google API developer's key.
my $google_key = 'insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

# Google Topics: restrict keyword => display name.
# The empty restrict ('') searches all of Google.
my %topics = (
  ''       => 'All of Google',
  unclesam => 'U.S. Government',
  linux    => 'Linux',
  mac      => 'Macintosh',
  bsd      => 'FreeBSD',
);

# Display the query form.
print
  header(),
  start_html("GooTopic"),
  h1("GooTopic"),
  start_form(-method => 'GET'),
  'Query: ', textfield(-name => 'query'), ' &nbsp; ',
  submit(-name => 'submit', -value => 'Search'),
  end_form(), p();

my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Perform the queries, one for each topic area.
if (param('query')) {
  print
    start_table({-cellpadding => '10', -border => '1'}),
    Tr([th({-align => 'left'}, ['Topic', 'Count', 'Top Result'])]);

  # Sorting puts "All of Google" ('') first and keeps the row order stable.
  foreach my $topic (sort keys %topics) {

    # doGoogleSearch arguments: key, query, start, maxResults, filter,
    # restrict (the topic), safeSearch, lr, ie, oe.
    my $results = $google_search->doGoogleSearch(
      $google_key, param('query'), 0, 10, "false", $topic, "false",
      "", "latin1", "latin1"
    );

    my $result_count = $results->{estimatedTotalResultsCount} || 0;
    my $top_result   = 'no results';

    # Take the first result element, if there is one.
    my $t = $result_count ? $results->{resultElements}[0] : undef;
    if ($t) {
      $top_result =
        b($t->{title} || 'no title') . br() .
        a({href => $t->{URL}}, $t->{URL}) . br() .
        i($t->{snippet} || 'no snippet');
    }

    # Output one table row for this topic.
    print Tr([td([$topics{$topic}, $result_count, $top_result])]);
  }

  print end_table();
}

print end_html();

Be sure to replace insert key here with your Google API key.
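To check the table logic without a live API key, one approach is to swap in a stub that returns a hash shaped like the doGoogleSearch result the script reads: estimatedTotalResultsCount, plus a resultElements array whose entries have title, URL, and snippet fields. In the sketch below, the fake_search stub and its canned data are hypothetical. Only the field names and the per-topic logic come from gootopic.cgi, and the rows are plain text instead of HTML.

```perl
use strict;
use warnings;

my %topics = (
  ''       => 'All of Google',
  unclesam => 'U.S. Government',
  linux    => 'Linux',
  mac      => 'Macintosh',
  bsd      => 'FreeBSD',
);

# Hypothetical stub standing in for $google_search->doGoogleSearch.
# It returns one canned hit for every topic except 'mac'.
sub fake_search {
  my ($restrict) = @_;
  return { estimatedTotalResultsCount => 0, resultElements => [] }
    if $restrict eq 'mac';
  return {
    estimatedTotalResultsCount => 1,
    resultElements => [
      { title   => "Stub title ($restrict)",
        URL     => 'http://example.com/',
        snippet => 'stub snippet' },
    ],
  };
}

# Same per-topic logic as gootopic.cgi, producing tab-separated rows.
my @rows;
foreach my $topic (sort keys %topics) {
  my $results = fake_search($topic);
  my $count   = $results->{estimatedTotalResultsCount} || 0;
  my $top     = 'no results';
  my $t       = $count ? $results->{resultElements}[0] : undef;
  $top = ($t->{title} || 'no title') . " <$t->{URL}>" if $t;
  push @rows, join("\t", $topics{$topic}, $count, $top);
}
print "$_\n" for @rows;
```

Because the stub only has to mimic the returned hash, the same trick works for testing any change to the rendering code before spending real queries on it.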

2.18.3. Running the Hack

Point your web browser at gootopic.cgi.

Provide a query and the script will search for your query in each special topic area, providing you with an overall ("All of Google") count, topic area count, and the top result for each. Figure 2-11 shows a sample run for "user interface", with Macintosh (surprisingly) not coming out on top.
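To decide which topic "comes out on top" without eyeballing the table, you could collect each topic's count into a hash and sort it. The rank_topics helper below is hypothetical, since the script only prints counts as it goes. You would normally leave All of Google out of the hash, because it covers every topic.

```perl
use strict;
use warnings;

# Hypothetical helper: given (topic name => estimated result count) pairs,
# return the topic names ordered from most to fewest results.
# Ties are broken alphabetically so the order is deterministic.
sub rank_topics {
  my (%counts) = @_;
  return sort { $counts{$b} <=> $counts{$a} or $a cmp $b } keys %counts;
}
```

For example, rank_topics(%count_by_topic), where %count_by_topic is a hash you fill inside the script's foreach loop, returns the topic names in descending order of count.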

Figure 2-11. Topic search for "user interface"


2.18.4. Search Ideas

Trying to figure out how many pages each topic finds for particular top-level domains (e.g., .com, .edu, .uk) is rather interesting. You can query for inurl:xx site:xx, where xx is the top-level domain you're interested in. For example, inurl:va site:va searches for any of the Vatican's pages in the various topics; there aren't any. inurl:mil site:mil finds an overwhelming number of results in the U.S. Government special topic—no surprise there.
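If you want to try this for several top-level domains, a small helper can generate the inurl:xx site:xx queries to paste into the GooTopic form one at a time. The tld_queries name and the sample domain list are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper: build an "inurl:xx site:xx" query for each
# top-level domain, accepting forms like ".EDU" or "mil".
sub tld_queries {
  my @tlds = @_;
  return map {
    my $t = lc $_;     # normalize case
    $t =~ s/^\.//;     # drop a leading dot, if any
    "inurl:$t site:$t";
  } @tlds;
}

print "$_\n" for tld_queries(qw(.com .edu .uk va mil));
```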

If you are in the mood for a party game, try to find the weirdest possible searches that appear in all the special topics.
