Hack 91. Remove Your Materials from Google

Remove your content from Google's various web properties.

Some people are more than thrilled to have Google index their sites. Other folks don't want the GoogleBot anywhere near them. If you fall into the latter category and the bot's already done its worst, there are several things you can do to remove your materials from Google's index. Each part of Google—Web Search, Google Images, and Google Groups—has its own set of methodologies.

8.12.1. Google Web Search

Here are several tips to avoid being listed.

8.12.1.1 Making sure your pages never get there to begin with

While you can take steps to remove your content from the Google index after the fact, it's always much easier to make sure the content is never found and indexed in the first place.

Google's crawler obeys the robots exclusion protocol, a set of instructions you put on your web site that tells the crawler how to treat your content. You can implement these instructions in two ways: via a META tag that you put on each page (handy when you want to restrict access to only certain pages or certain types of content) or via a robots.txt file that you place in your root directory (handy when you want to block some spiders completely or restrict access to certain kinds or directories of content). You can get more information about the robots exclusion protocol and how to implement it at http://www.robotstxt.org/.
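
As a rough illustration of the robots.txt approach, the following Python sketch feeds a small rule set to the standard library's urllib.robotparser and asks which crawlers may fetch which URLs. The spider name BadBot, the /private/ directory, and the www.example.com URLs are made up for the example, and Python's parser is only a stand-in: a real crawler may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut one spider out completely and keep every
# other crawler out of a single directory.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("BadBot", "http://www.example.com/index.html"),
        ("Googlebot", "http://www.example.com/index.html"),
        ("Googlebot", "http://www.example.com/private/notes.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent:10} {url} -> {verdict}")
```

Running a check like this against your own rules before you upload them is a cheap way to catch a misplaced slash or a misspelled crawler name.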

8.12.1.2 Removing your pages after they're indexed

There are several things you can have removed from Google's results.

These instructions are for keeping your site out of Google's index only. For information on keeping your site out of all major search engines, you'll have to work with the robots exclusion protocol.



Removing the whole site

Use the robots exclusion protocol, probably with robots.txt.


Removing individual pages

Use the following META tag in the HEAD section of each page you want to remove:

<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">


Removing snippets

A snippet is the short excerpt of a page that Google displays in its search results. To remove snippets, use the following META tag in the HEAD section of each page for which you want to prevent snippets:

<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET">


Removing cached pages

To prevent Google from keeping cached versions of your pages in its index, use the following META tag in the HEAD section of each page for which you want to prevent caching:

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
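
To double-check that a page really carries the tags described above, a short script can list the GOOGLEBOT directives it finds. The sketch below uses Python's standard html.parser module; the class name, function name, and sample page are invented for the example, and it only reads the tags, so it says nothing about how Google will act on them.

```python
from html.parser import HTMLParser


class GooglebotMetaParser(HTMLParser):
    """Collect the directives from every <META NAME="GOOGLEBOT"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "googlebot":
            return
        for part in attr.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def googlebot_directives(html: str) -> set:
    """Return the lowercased GOOGLEBOT directives found in html."""
    parser = GooglebotMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    # Hypothetical page combining two of the tags shown above.
    page = """<html><head>
    <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
    <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
    </head><body>Hello</body></html>"""
    print(sorted(googlebot_directives(page)))
```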

8.12.1.3 Removing that content now

Once you implement these changes, the next time GoogleBot crawls your web site (usually within a few weeks), it will remove or limit your content according to your META tags and robots.txt file. If you want your materials removed right away, you can use the automatic remover at http://services.google.com:8882/urlconsole/controller. You'll have to sign in with an account (requires an email address and a password). Using the remover, you can request that Google crawl your newly created robots.txt file, or you can enter the URL of a page that contains exclusionary META tags.

Make sure that you have your exclusion tags all set up before you use this service. Going to all the trouble of getting Google to pay attention to a robots.txt file or exclusion rules that you've not yet set up will simply be a waste of your time.


8.12.1.4 Reporting pages with inappropriate content

While you may like your own content fine, you might find that, even if you have filtering activated, you're getting search results with explicit content. Or you might find a site with a misleading title tag and content completely unrelated to your search.

You have two options for reporting these sites to Google. Bear in mind that there's no guarantee that Google will remove the sites from the index, but it will investigate them. At the bottom of each page of search results, you'll see a "Dissatisfied? Help Us Improve" link; follow it to a form for reporting inappropriate sites. You can also send the URLs of explicit sites that slip through SafeSearch filtering to safesearch@google.com. If you have more general complaints about a search result, you can send an email to search-quality@google.com.

8.12.2. Google Images

Google Images keeps its own index, separate from the main web search index. To remove items from Google Images, use robots.txt to tell the Googlebot-Image crawler to stay away from your site. Add these lines to your robots.txt file:

User-agent: Googlebot-Image
Disallow: /

You can use the automatic remover mentioned in the web search section to have Google remove the images from its index database quickly.
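
Before requesting a quick removal, it can help to confirm that the image rule blocks only what you intend. This sketch again leans on Python's urllib.robotparser as an approximation of a crawler; the function name and the photo URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from this section, kept on adjacent lines.
IMAGE_RULES = """\
User-agent: Googlebot-Image
Disallow: /
"""

# Hypothetical image URL used only for the check.
PHOTO = "http://www.example.com/photos/cat.jpg"


def can_crawl(rules: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot-Image", "Googlebot"):
        state = "allowed" if can_crawl(IMAGE_RULES, agent, PHOTO) else "blocked"
        print(f"{agent}: {state}")
```

Under these rules the image crawler is shut out while the ordinary web crawler is untouched, so your pages can stay in Web Search even though your images leave Google Images.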

There may be cases where someone has put images on their server for which you own the copyright. In other words, you don't have access to their server to add a robots.txt file, but you need to stop Google from indexing your content there. In this case, you need to contact Google directly. Google has instructions for situations just like this at http://www.google.com/remove.html; look at Option 2, "If you do not have any access to the server that hosts your image."

8.12.3. Google Groups

As with Google Web Search, you can both prevent material from being archived in Google Groups and remove it after the fact.

8.12.3.1 Preventing your material from being archived

To prevent your material from being archived on Google, add the following line to the headers of your Usenet posts:

X-No-Archive: yes

If you do not have the option of editing the headers of your post, make that line the first line of the post itself.
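
If you post from a script, both variants are easy to produce. The sketch below builds a message with Python's standard email.message module; the function name, the can_set_headers flag, the newsgroup, and the address are all invented for the example, and actually sending the post to a news server is left out.

```python
from email.message import EmailMessage


def build_post(newsgroup: str, subject: str, body: str, sender: str,
               can_set_headers: bool = True) -> EmailMessage:
    """Build a Usenet-style post that asks not to be archived."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = subject
    if can_set_headers:
        # Preferred: a real header on the post.
        msg["X-No-Archive"] = "yes"
        msg.set_content(body)
    else:
        # Fallback: the same line as the first line of the body.
        msg.set_content("X-No-Archive: yes\n" + body)
    return msg


if __name__ == "__main__":
    post = build_post("misc.test", "Hello", "Just testing.\n",
                      "poster@example.com")
    print(post.as_string())
```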

8.12.3.2 Removing materials after the fact

If you want materials removed after the fact, you have a couple of options:

  • If the materials that you want removed were posted under an address to which you still have access, you can use the automatic removal tool mentioned earlier in this hack.

  • If the materials that you want removed were posted under an address to which you no longer have access, you'll need to send an email to groups-support@google.com with the following information:

    • Your full name and contact information, including a verifiable email address.

    • The complete Google Groups URL or message ID for each message you want removed.

    • A statement that says, "I swear under penalty of civil or criminal laws that I am the person who posted each of the foregoing messages or am authorized to request removal by the person who posted those messages."

    • Your electronic signature.

8.12.4. Google Phonebook

You might not want your contact information made available via the phonebook searches on Google. You'll have to follow one of two procedures, depending on whether the listing you want removed is for a business or for a residential number.

If you want to remove a business phone number, you'll need to send a request on your business letterhead to:

Google PhoneBook Removal
1600 Amphitheatre Parkway
Mountain View, CA 94043

Be sure to include a phone number so that Google can reach you to verify your request.

Removing a residential phone number is much simpler. Fill out the form at http://www.google.com/help/pbremoval.html. The form asks for your name, city and state, phone number, email address, and reason for removal. The reason is multiple choice: incorrect number, privacy issue, or "other."
