How to Scrape a Search Engine Results Page for Your SEO Project

Posted June 29th, 2009 by Dale Stokdyk

Have you ever wanted to capture search results and import them into Excel (or your favorite database) for a quick-and-dirty SEO analysis?

I’ll give you a couple of examples from my own experience. Sometimes I want to capture:

  • search engine results pages: how well does a site rank for keywords? Who are the competitors? How appealing are the page titles?
  • a web site’s indexed pages: which pages are indexed? How many? Any duplicate pages? What keywords are being used?

If I’m interested in only a few keywords or I’m examining a fairly small web site, I can usually jot down some observations and I’m good to go.

But that approach rapidly becomes unwieldy when I’m interested in, say, a dozen or more keywords. Or when I’m examining a site with more than 50 pages. That’s when it would be nice to download the (Google/Yahoo/Bing) search results and import them into a spreadsheet so I can sort, count, and arrange things to my heart’s content!

Harvesting the Web with OutWit Hub

My (some might say anal retentive) desire to capture everything in an Excel spreadsheet led me to OutWit Hub, a free extension for use with the Firefox web browser. Here’s how they describe it:

With OutWit Hub you can find, grab and organize all kinds of data and media from online sources. Automatically explore series of Web pages or search engine results to extract contacts, links, images, data, news, etc.

With a little tinkering, I found that I can use OutWit Hub to grab search results I’m interested in and save them to Excel (the two other export options are CSV and SQL). Let’s look at a few examples so you can judge for yourself how this tool might be useful.

Using OutWit Hub with “Site:” Search

With over 100 pages, the new e-commerce site I examined in my SEO Audit of Gemstone Designs was a great opportunity to use OutWit Hub in combination with the site: command to inspect the pages indexed by the major search engines.

To follow along, visit OutWit Technologies to download and then install OutWit Hub for Firefox. After restarting Firefox, you’ll see a new menu button for OutWit Hub in your browser menu:

[Screenshot: the new OutWit Hub button in the Firefox browser menu]

We’ll start on Google’s home page and use the “Advanced Search” option to select “100” for “Results per page.” Then we’ll use the “site:” command to find all the indexed pages for a domain by searching for “site:your-domain-name-here.com.” For Gemstone Designs, Google displays 100 results on the first page and an additional 15 on the second, for a total of 115 indexed pages.
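If you like to work things out before you start clicking, here is a minimal Python sketch of the two pieces above: the text of the “site:” query and the number of results pages you will need to step through. The function names are my own invention, and the URL-encoded form is shown only for illustration; how each search engine expects the query in its address bar is outside the scope of this post.

```python
import math
from urllib.parse import quote_plus


def site_query(domain):
    """Build the text typed into the search box for a site: search."""
    return "site:" + domain.strip()


def pages_needed(total_results, results_per_page=100):
    """Number of results pages to visit to see every indexed page."""
    return math.ceil(total_results / results_per_page)


query = site_query("your-domain-name-here.com")
print(query)                 # site:your-domain-name-here.com
print(quote_plus(query))     # site%3Ayour-domain-name-here.com
print(pages_needed(115))     # 2
```

With 115 indexed pages and 100 results per page, that works out to the two pages described above.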

With the first 100 search results displayed, launch OutWit Hub by clicking on the button in your Firefox menu bar. I’ve already created a scraper for Google searches (more on this in a minute). Here’s how my Google scraper displays the Source URL, Page Title, Page Description, and Page Link (showing the first 11 of 100 lines):

[Screenshot: the OutWit Hub scraper window]

We can select the desired rows and use the “Catch” button to capture the first 100 items. Navigate to the next search results page using the “next in series” button to pick up the remaining 15 search results, and then export these results using File > Export Catch As > Excel.
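Once the catch is exported, the sorting and counting can also be scripted. The sketch below assumes you chose the CSV export rather than Excel and that the file has a “Page Title” column; the exact header names in a real export are an assumption on my part, and the sample rows are invented. It counts the rows and lists any page titles that appear more than once.

```python
import csv
import io
from collections import Counter


def duplicate_titles(rows, column="Page Title"):
    """Return {title: count} for titles that appear on more than one row."""
    counts = Counter(row[column].strip() for row in rows)
    return {title: n for title, n in counts.items() if n > 1}


# Stand-in for an exported CSV file; these rows are invented sample data.
sample = io.StringIO(
    "Page Title,Page Link\n"
    "Necklaces,http://example.com/necklaces\n"
    "Necklaces,http://example.com/necklaces?page=2\n"
    "Earrings,http://example.com/earrings\n"
)
rows = list(csv.DictReader(sample))
print(len(rows))               # 3
print(duplicate_titles(rows))  # {'Necklaces': 2}
```

To try it on a real export, swap the sample for a file opened with open("catch.csv", newline=""), where catch.csv stands for whatever you named your export.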

Investigate Site Rankings with OutWit Hub

Next, let’s use OutWit Hub with Microsoft’s Bing.com search engine to scrape search results for a keyword we’re interested in – let’s say “shoelaces” – and save them in Excel.

Once again, let’s collect more than just the top 10 results. In the upper right-hand corner of the Bing home page, select Extras > Preferences > Web settings (middle of page) > Results setting. Let’s select “50” and then save this new preference.

After clicking on the OutWit Hub button, the scraper I’ve created for Bing does a good job of collecting the Page Title, Description and Link, but it also captured 5 paid ads with the organic search results. Not a problem! I’ll just exclude the last 5 lines from my selection when using “Catch” or when exporting directly to Excel.

[Screenshot: the scraper for Bing.com search results]
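The same clean-up can be sketched in Python if you are working from a list of captured links. This is only an illustration under my own assumptions: the helper names are hypothetical, the links are invented, and it assumes (as in the Bing example above) that the ads sit at the end of the captured rows. It drops the trailing ad rows and then reports where a domain first appears in what is left.

```python
from urllib.parse import urlparse


def drop_trailing_rows(rows, n_ads):
    """Remove the last n_ads rows (the paid ads in the example above)."""
    return rows[:-n_ads] if n_ads > 0 else list(rows)


def rank_of(domain, links):
    """1-based position of the first link whose host matches domain, else None."""
    domain = domain.lower()
    for position, link in enumerate(links, start=1):
        host = (urlparse(link).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Invented links for illustration; the last one plays the part of a paid ad.
links = [
    "http://www.example-laces.com/",
    "http://shop.example.org/shoelaces",
    "http://ads.example.net/buy-now",
]
organic = drop_trailing_rows(links, 1)
print(len(organic))                      # 2
print(rank_of("example.org", organic))   # 2
print(rank_of("example.net", organic))   # None
```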

Create Your Own Scraper

[Screenshot: the “guess” and “source” buttons]

Scrapers are specific to the page displaying the information, so you’ll need to create separate OutWit Hub scrapers for each search engine — one each for Google search results, Yahoo! Maps searches, and image searches on Bing, for example.

Fortunately, that’s not difficult. Sometimes using OutWit Hub’s “guess,” “list” or “table” data extractors will provide exactly what you’re looking for.

If not, select “source” to view the source code for a search engine’s results page and begin building your own custom scraper.

You do this by identifying unique markers in the source code that appear before and after the data you want.

Use the Editor, in the window to the right of the source code, to assemble your scraper. Here’s a picture of my Google scraper:

[Screenshot: the Editor window for creating custom scrapers]
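To make the “marker before, marker after” idea concrete, here is a rough Python sketch of the same technique. It is not how OutWit Hub works internally (I don’t know that), just an illustration of pulling out whatever sits between two known strings. The function name and the sample markup are invented; real results pages use different tags.

```python
def extract_between(source, before, after):
    """Return every piece of text found between the two markers, in order."""
    found = []
    start = 0
    while True:
        begin = source.find(before, start)
        if begin == -1:
            break
        begin += len(before)
        end = source.find(after, begin)
        if end == -1:
            break
        found.append(source[begin:end])
        start = end + len(after)
    return found


# Invented markup for illustration; real results pages use different tags.
html = (
    '<li class="hit"><h3>First Title</h3><cite>example.com/a</cite></li>'
    '<li class="hit"><h3>Second Title</h3><cite>example.com/b</cite></li>'
)
print(extract_between(html, "<h3>", "</h3>"))
# ['First Title', 'Second Title']
print(extract_between(html, "<cite>", "</cite>"))
# ['example.com/a', 'example.com/b']
```

The more distinctive the markers, the less likely they are to match something you didn’t want, which is why it pays to look for strings that appear only around the data you are after.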

For more information on how to create a custom scraper, see OutWit Technologies’ Create Your First Scraper.

Basic Scrapers for Google, Yahoo and Bing

OutWit Hub makes it easy to download and share scrapers as XML files. But for these basic scrapers, I’m providing snapshots of the Editor window so you can quickly see the snippets I’ve used to create them.

Google Scraper

[Screenshot: snippets used to create a Google scraper with OutWit Hub]

Yahoo! Scraper

[Screenshot: snippets used to create a Yahoo! scraper with OutWit Hub]

Bing Scraper

[Screenshot: snippets used to create a Bing scraper with OutWit Hub]

A Caveat or Two to Wrap This Up

Because search engines sometimes format certain results differently (for example, Wikipedia.org and YouTube entries in Google search results), these scrapers will not provide perfect results in every instance.

The Google scraper also stumbles on search results pages with ads. But there are work-arounds, including copying Google’s source code (View > Source Code) into a text file and deleting the ads before using OutWit Hub. Alternatively, use the Scroogle Scraper to get ad-free search results (but you will also need to copy the source code for these results into a text file before scraping).
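If deleting ads by hand from a saved source file gets tedious, the job can be roughed out in Python. This sketch assumes the ad blocks are wrapped in a pair of recognizable markers; the markers and markup shown are invented, so you would need to inspect the real page source to find what actually surrounds the ads.

```python
import re


def strip_blocks(source, start_marker, end_marker):
    """Delete every span from start_marker through end_marker, inclusive."""
    pattern = re.escape(start_marker) + r".*?" + re.escape(end_marker)
    return re.sub(pattern, "", source, flags=re.DOTALL)


# Invented markers and markup; inspect the real page source to find yours.
page = (
    "<!--ads-->Buy laces now<!--/ads-->"
    "<h3>Organic result</h3>"
    "<!--ads-->Another ad<!--/ads-->"
)
print(strip_blocks(page, "<!--ads-->", "<!--/ads-->"))
# <h3>Organic result</h3>
```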

But for not-a-lot-of-money (did I mention that OutWit Hub is free?!?) and just a little effort, you too can harvest the web for SEO projects that you want to catch in a spreadsheet!

View Comments to: “How to Scrape a Search Engine Results Page for Your SEO Project”

  1. How to Export Google Search Results to Excel responds:
    Posted: July 8th, 2009 at 8:11 am

    [...] here’s an absolutely fantastic post on scraping Google results teaching how to export search listings titles, descriptions and [...]

  2. How to Extract Any Web Page Information and Export it to Excel responds:
    Posted: July 17th, 2009 at 11:31 am

    [...] Here is a detailed info on creating your first scraper as well as the post where I found this cool tip. [...]

  3. Top Positions - How To Extract Any Web Page Information And Export It To Excel « TopPositions.org responds:
    Posted: July 18th, 2009 at 1:26 am

    [...] Here is a detailed info on creating your first scraper as well as the post where I found this cool tip. [...]

  4. How to scrape Search Engine Result Pages with OutWit Hub for SEO Audit (Video) | OutWitters' Blog responds:
    Posted: September 17th, 2009 at 8:41 am

    [...] Read the full tutorial on Marketing2OH [...]

  5. mrique responds:
    Posted: November 30th, 2009 at 6:32 am

    yes you could use Dapper.net, yahoo pipes or yahoo yql console to

  6. chris responds:
    Posted: December 11th, 2009 at 9:12 am

    spent bloody ages looking for an easy scraper, thanks!

  7. Mike Slowakei responds:
    Posted: January 16th, 2010 at 4:26 pm

    I do adore your style of writing, and the thing I love most of all is the tips that you give, especially about scrapers!
    Thanks for the really useful material of high quality published!

  8. Anonymous responds:
    Posted: August 18th, 2010 at 4:05 pm

    Thanks for great information. So far only managed to find a few sources with information concerning optimization tips for Chinese search market. Seems like the biggest concern for most of the people will be relevant and readable by humans Chinese content…

  9. Ajay responds:
    Posted: August 20th, 2010 at 4:44 am

    The programs and instructions that run a computer, as opposed to the actual physical machinery and devices that compose the hardware.

  10. rickey gupta responds:
    Posted: August 21st, 2010 at 7:02 pm

    Thanks for sharing this application here. I needed it for my project. The reason i liked it because it is very simple and easy as you told above. I admire you for making very simple and useful application.

  11. Optimator responds:
    Posted: August 23rd, 2010 at 1:02 pm

    Great tools, thx a lot!

