
Hack 28. Permute a Query

Run all permutations of query keywords and phrases to squeeze the last drop of results from the Google index.

Google, ah, Google. Search engine of over eight billion pages and zillions of possible results. If you're a search engine geek like I am, few things are more entertaining than trying various tweaks with your Google search to see what exactly makes a difference to the results.

It's amazing what makes a difference. For example, you wouldn't think that word order would make much of an impact, but it does. In fact, buried in Google's documentation is the admission that the word order of a query will impact search results.

While that's an interesting thought, who has time to generate and run every possible ordering of a multiword query? Google API to the rescue! This hack takes a query of up to four keywords or "quoted phrases" (special syntaxes are supported too) and runs every permutation, showing the result count for each permutation and the top results across all of them.
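To get a feel for how many queries that means, here is a minimal sketch that lists every ordering of a three-keyword query. It is not part of the hack: the permutations subroutine is a hypothetical, plain-Perl stand-in for the Algorithm::Permute module the script uses, written so the example runs without extra modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every ordering of the given list as a list of array refs.
# Hypothetical helper for illustration; the hack itself relies on
# Algorithm::Permute instead.
sub permutations {
    my @items = @_;
    return ( [] ) unless @items;    # one way to order nothing
    my @result;
    for my $i ( 0 .. $#items ) {
        my @rest = @items;
        my ($picked) = splice( @rest, $i, 1 );    # pull out item $i
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

my @keywords = ( 'applescript', 'google', 'api' );
my @queries  = map { join ' ', @$_ } permutations(@keywords);

print scalar(@queries), " permutations\n";
print "$_\n" for @queries;
```

Three keywords produce 6 queries and four produce 24, since the count grows as the factorial of the number of keywords. That growth is a good reason for the four-keyword cap: every permutation costs one call against your Google API key.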

2.10.1. The Code

Save the following code as a CGI script ["How to Run the Hacks" in the Preface] named order_matters.cgi in your web site's cgi-bin directory. As you type in the script, be sure to replace insert key here with your Google API key.

You'll need to have the Algorithm::Permute Perl module for this program to work correctly (http://search.cpan.org/search?query=algorithm%3A%3Apermute&mode=all).


#!/usr/local/bin/perl
# order_matters.cgi
# Queries Google for every possible permutation of up to 4 query keywords,
# returning result counts by permutation and top results across permutations.
# order_matters.cgi is called as a CGI with form input

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

use strict;

use SOAP::Lite;
use CGI qw/:standard *table/;
use Algorithm::Permute;

print
  header( ),
  start_html("Order Matters"),
  h1("Order Matters"),
  start_form(-method=>'GET'),
  'Query: &nbsp; ', textfield(-name=>'query'),
  ' &nbsp; ',
  submit(-name=>'submit', -value=>'Search'), br( ),
  '<font size="-2" color="green">Enter up to 4 query keywords or "quoted phrases"</font>',
  end_form( ), p( );

if (param('query')) {

  # Glean keywords: quoted phrases (optionally prefixed with + or -)
  # stay intact; everything else is split on whitespace.
  my @keywords = grep !/^\s*$/, split /([+-]?".+?")|\s+/, param('query');

  # Too many keywords: report the error, close the page, and stop.
  if (@keywords > 4) {
    print
      '<font color="red">Only 4 query keywords or phrases allowed.</font>',
      end_html( );
    exit;
  }

  my $google_search = SOAP::Lite->service("file:$google_wsdl");

  print
    start_table({-cellpadding=>'10', -border=>'1'}),
    Tr([th({-colspan=>'2'}, ['Result Counts by Permutation' ])]),
    Tr([th({-align=>'left'}, ['Query', 'Count'])]);

  my $results = {}; # keep track of what we've seen across queries

  # Iterate over every possible permutation.
  my $p = new Algorithm::Permute( \@keywords );
  while (my $query = join(' ', $p->next)) {

    # Query Google.
    my $r = $google_search ->
      doGoogleSearch(
        $google_key,
        $query,
        0, 10, "false", "",  "false", "", "latin1", "latin1"
      );
    print Tr([td({-align=>'left'}, [$query, $r->{'estimatedTotalResultsCount'}] )]);
    @{$r->{'resultElements'} || []} or next;

    # Assign a rank: 10 points for the first result, 9 for the second,
    # and so on, accumulated per URL across permutations.
    my $rank = 10;
    foreach (@{$r->{'resultElements'}}) {
      $results->{$_->{URL}} = {
        title => $_->{title},
        snippet => $_->{snippet},
        seen => ($results->{$_->{URL}}->{seen} || 0) + $rank
      };
      $rank--;
    }
  }

  print
    end_table( ), p( ),
    start_table({-cellpadding=>'10', -border=>'1'}),
    Tr([th({-colspan=>'2'}, ['Top Results across Permutations' ])]),
    Tr([th({-align=>'left'}, ['Score', 'Result'])]);

  # List results from highest accumulated score to lowest.
  foreach ( sort { $results->{$b}->{seen} <=> $results->{$a}->{seen} } keys %$results ) {
    print Tr(td([
      $results->{$_}->{seen},
      b($results->{$_}->{title}||'no title') . br( ) .
      a({href=>$_}, $_) . br( ) .
      i($results->{$_}->{snippet}||'no snippet')
    ]));
  }

  print end_table( );
}

print end_html( );
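The keyword-gleaning line is the densest part of the script, so here is a small standalone sketch of what it does. It wraps the same split pattern in a hypothetical glean_keywords subroutine (that name is not in the original) and adds an explicit defined check so the example runs cleanly under warnings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a query into keywords and "quoted phrases" using the script's
# pattern. The capturing group makes split return each quoted phrase as
# an element; whitespace separators leave undefined or empty elements
# behind, which grep discards.
sub glean_keywords {
    my ($query) = @_;
    return grep { defined($_) && !/^\s*$/ }
           split /([+-]?".+?")|\s+/, $query;
}

my @keywords = glean_keywords('applescript "google api" -java');
print scalar(@keywords), " keywords:\n";
print "  $_\n" for @keywords;
```

The sample query yields three elements: applescript, "google api" (quotes included), and -java. Because a quoted phrase counts as a single element, it moves as one unit when the permutations are generated.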

2.10.2. Running the Hack

Point your web browser at the CGI script order_matters.cgi on your web server. Enter the query you want to check (up to four words or phrases). The script first searches for every possible permutation of the search words and phrases and reports the result count for each, as shown in Figure 2-4.

Figure 2-4. Permutations for applescript google api


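The script then displays the top results across all permutations of the query, as shown in Figure 2-5. Each permutation's top 10 results earn points by position, and the combined list is sorted from highest score to lowest.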

Figure 2-5. Top results for permutations of applescript google api
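The Score column comes from a simple tally: within each permutation's results, the first hit earns 10 points, the second 9, and so on, and the points for a URL add up across permutations. The sketch below reproduces that tally on mock data. The example.com URLs and the @result_sets variable are placeholders invented for illustration, not real search output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mock result lists standing in for the results of two permutations.
# Placeholder URLs only; a real run would take these from the API response.
my @result_sets = (
    [ 'http://example.com/a', 'http://example.com/b', 'http://example.com/c' ],
    [ 'http://example.com/b', 'http://example.com/d', 'http://example.com/a' ],
);

my %score;
for my $urls (@result_sets) {
    my $rank = 10;    # first result earns 10 points, second 9, ...
    for my $url (@$urls) {
        $score{$url} += $rank;
        $rank--;
    }
}

# Highest combined score first; ties are broken by URL for stable output.
for my $url ( sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score ) {
    printf "%3d  %s\n", $score{$url}, $url;
}
```

Here page b ends up on top with 19 points (9 + 10) and page a follows with 18 (10 + 8). A page that ranks well under every word order outscores one that ranks first under a single order and drops out of the top 10 under the others.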


At first blush, this hack looks like a novelty with few practical applications. But if you're a regular researcher or a web wrangler, you might find it of interest.

If you're a regular researcher—that is, there are certain topics that you research on a regular basis—you might want to spend some time with this hack and see if you can detect a pattern in how your regular search terms are impacted by changing word order. You might need to revise your searching so that certain words always come first or last in your query.

If you're a web wrangler, you need to know where your page appears in Google's search results. If your page loses a lot of ranking ground because of a shift in a query arrangement, maybe you want to add some more words to your text or shift your existing text.
