Have you ever wanted to capture search results and import them into Excel (or your favorite database) for a quick-and-dirty SEO analysis?
I’ll give you a couple of examples from my own experience. Sometimes I want to capture:
- search engine results pages — how well does a site rank for keywords? who are competitors? how appealing are the page titles?
- a web site’s indexed pages — which pages are indexed? how many? any duplicate pages? what keywords are being used?
If I’m interested in only a few keywords or I’m examining a fairly small web site, I can usually jot down some observations and I’m good to go.
But that approach rapidly becomes unwieldy when I’m interested in, say, a dozen or more keywords. Or when I’m examining a site with more than 50 pages. That’s when it would be nice to download the (Google/Yahoo/Bing) search results and import them into a spreadsheet so I can sort, count, and arrange things to my heart’s content!
Harvesting the Web with OutWit Hub
My (some might say anal-retentive) desire to capture everything in an Excel spreadsheet led me to OutWit Hub, a free extension for the Firefox web browser. Here’s how they describe it:
With OutWit Hub you can find, grab and organize all kinds of data and media from online sources. Automatically explore series of Web pages or search engine results to extract contacts, links, images, data, news, etc.
With a little tinkering, I found that I can use OutWit Hub to grab search results I’m interested in and save them to Excel (the two other export options are CSV and SQL). Let’s look at a few examples so you can judge for yourself how this tool might be useful.
Using OutWit Hub with “Site:” Search
A new e-commerce site I examined for SEO Audit of Gemstone Designs has over 100 pages, which made it a great opportunity to use OutWit Hub in combination with the site: command to inspect the pages indexed by the major search engines.
To follow along, visit OutWit Technologies to download and then install OutWit Hub for Firefox. After restarting Firefox, you’ll see a new menu button for OutWit Hub in your browser menu:
We’ll start on Google’s home page and use the ‘Advanced Search’ option to select “100” for “Results per page.” Then we’ll use the “site:” command to find all the indexed pages for a domain by searching for “site:your-domain-name-here.com.” For Gemstone Designs, Google displays 100 results on the first page and an additional 15 on the second, for a total of 115 indexed pages.
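If you’d rather type the query straight into the address bar, here is a minimal Python sketch that builds one results-page URL per page of a site: search. It assumes Google reads the query from the `q` parameter, the page size from `num` (what the old “Results per page” setting controlled) and the zero-based result offset from `start`; Google may have changed or dropped these since, so treat it as an illustration, not a reference. `site_query_urls` is a hypothetical helper, `example.com` is a placeholder, and 115 is the Gemstone Designs total from above.

```python
from math import ceil
from urllib.parse import urlencode


def site_query_urls(domain, total_results, per_page=100):
    """Build one Google results-page URL per page of a site: search.

    Assumption: Google takes the query from `q`, the page size from
    `num`, and the zero-based offset of the first result from `start`.
    """
    query = f"site:{domain}"
    pages = ceil(total_results / per_page)  # 115 results -> 2 pages
    return [
        "https://www.google.com/search?"
        + urlencode({"q": query, "num": per_page, "start": page * per_page})
        for page in range(pages)
    ]


for url in site_query_urls("example.com", 115):
    print(url)
```

With 115 indexed pages and 100 results per page, this yields two URLs, one starting at result 0 and one at result 100, matching the two results pages described above.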
With the first 100 search results displayed, launch OutWit Hub by clicking on the button in your Firefox menu bar. I’ve already created a scraper for Google searches (more on this in a minute). Here’s how my Google scraper displays the Source URL, Page Title, Page Description, and Page Link (showing the first 11 of 100 lines):
We can select the desired rows and use the “Catch” button to capture the first 100 items. Navigate to the next search results page using the “next in series” button to pick up the remaining 15 search results, and then export these results using File > Export Catch As > Excel.
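OutWit Hub does the catching and exporting for you, but if you ever need to stitch caught pages together yourself, the steps boil down to something like this Python sketch. It concatenates the rows from each results page, flags page titles that appear more than once (a quick way to spot possible duplicate pages), and writes CSV, one of OutWit Hub’s export formats, which Excel can open. The column names follow the Google scraper’s output listed above; `merge_catches`, `duplicate_titles`, `to_csv` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Column names as listed for the Google scraper's output.
FIELDS = ["Source URL", "Page Title", "Page Description", "Page Link"]


def merge_catches(*pages):
    """Concatenate the rows caught from each results page, in order."""
    merged = []
    for page in pages:
        merged.extend(page)
    return merged


def duplicate_titles(rows):
    """Return titles used by more than one indexed page, with counts."""
    counts = Counter(row["Page Title"] for row in rows)
    return {title: n for title, n in counts.items() if n > 1}


def to_csv(rows):
    """Serialize rows as CSV text (header first) that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_row(i):
    # Hypothetical data standing in for one caught search result.
    title = "Example Title" if i in (3, 40) else f"Page {i}"
    return {
        "Source URL": "https://www.google.com/search?q=site%3Aexample.com",
        "Page Title": title,
        "Page Description": f"Description {i}",
        "Page Link": f"https://example.com/page-{i}",
    }


first_page = [fake_row(i) for i in range(100)]
second_page = [fake_row(i) for i in range(100, 115)]
rows = merge_catches(first_page, second_page)
print(len(rows))               # 115
print(duplicate_titles(rows))  # {'Example Title': 2}
```

Saving the output of `to_csv(rows)` to a `.csv` file gives you the same kind of sortable, countable sheet described at the top of this post.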
Investigate Site Rankings with OutWit Hub
Next, let’s use OutWit Hub with Microsoft’s Bing.com search engine to scrape search results for a keyword we’re interested in – let’s say “shoelaces” – and save them in Excel.
Once again, let’s collect more than just the top 10 results. In the upper right-hand corner of the Bing home page, select Extras > Preferences > Web settings (middle of page) > Results setting. Let’s select “50” and then save this new preference.
When I click the OutWit Hub button, the scraper I’ve created for Bing does a good job of collecting the Page Title, Description and Link, but it also captures 5 paid ads along with the organic search results. Not a problem! I’ll just exclude the last 5 lines from my selection when using “Catch” or when exporting directly to Excel.
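Dropping those ads and checking where a site ranks can be sketched in a few lines of Python. This assumes, as in the Bing example, that the five ads land at the end of the scraped rows; `organic_only`, `rank_of` and the sample links are made up for illustration.

```python
from urllib.parse import urlparse


def organic_only(rows, trailing_ads=5):
    """Drop the paid ads, assumed to be the last `trailing_ads` rows."""
    if trailing_ads <= 0:
        return list(rows)  # rows[:-0] would be empty, so handle 0 here
    return list(rows[:-trailing_ads])


def rank_of(domain, links):
    """Return the 1-based position of the first link on `domain` (or one
    of its subdomains), or None if the domain doesn't appear at all."""
    for position, link in enumerate(links, start=1):
        host = urlparse(link).hostname or ""
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Hypothetical scrape for "shoelaces": 50 organic links, then 5 ads.
links = [f"https://site{i}.example/shoelaces" for i in range(1, 51)]
links += [f"https://ad{i}.example/buy" for i in range(1, 6)]

organic = organic_only(links)
print(len(organic))                       # 50
print(rank_of("site7.example", organic))  # 7
print(rank_of("ad1.example", organic))    # None
```

Slicing by position mirrors the “exclude the last 5 lines” approach; if the ads ever move elsewhere on the page, you’d need to filter on something in the ad markup instead.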
Create Your Own Scraper
Scrapers are specific to the page displaying the information, so you’ll need to create a separate OutWit Hub scraper for each search engine and type of results page: one each for Google search results, Yahoo! Maps searches, and image searches on Bing, for example.
Fortunately, that’s not difficult. Sometimes using OutWit Hub’s “guess,” “list” or “table” data extractors will provide exactly what you’re looking for.
If not, select “source” to view the source code for a search engine’s results page and begin building your own custom scraper.
You do this by identifying unique markers in the source code that appear before and after the data you want.
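To make the marker idea concrete, here is a minimal Python sketch of “grab whatever sits between this marker and that one.” It only mirrors the concept; it is not how OutWit Hub implements its Editor, and `extract_between`, the HTML snippet and the markers are hypothetical rather than taken from a real search engine’s source.

```python
def extract_between(source, before, after):
    """Return each substring of `source` found between a `before` marker
    and the next `after` marker, scanning left to right."""
    if not before or not after:
        raise ValueError("markers must be non-empty")
    results = []
    pos = 0
    while True:
        start = source.find(before, pos)
        if start == -1:
            break  # no more records
        start += len(before)  # skip past the opening marker
        end = source.find(after, start)
        if end == -1:
            break  # opening marker with no closing marker: stop
        results.append(source[start:end].strip())
        pos = end + len(after)  # resume after the closing marker
    return results


# Hypothetical results-page snippet; real markers come from the page source.
html = (
    '<h3><a href="https://example.com/a">Page A</a></h3>\n'
    '<h3><a href="https://example.com/b">Page B</a></h3>'
)

print(extract_between(html, '<h3><a href="', '"'))  # ['https://example.com/a', 'https://example.com/b']
print(extract_between(html, '">', '</a>'))           # ['Page A', 'Page B']
```

Because each field gets its own pass here, links and titles only line up if every record contains both, which is a good reason to let the Editor assemble all the fields of a scraper together.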
Use the Editor, in the window to the right of the source code, to assemble your scraper. Here’s a picture of my Google scraper:
For more information on how to create a custom scraper, see OutWit Technologies’ Create Your First Scraper.
Basic Scrapers for Google, Yahoo and Bing
OutWit Hub makes it easy to download and share scrapers as xml files. But for these basic scrapers, I’m providing snapshots of the Editor window so you can quickly see the snippets I’ve used to create these scrapers.
A Caveat or Two to Wrap This Up
Because search engines sometimes format certain results differently (for example, Wikipedia.org and YouTube entries in Google search results), these scrapers will not provide perfect results in every instance.
The Google scraper also stumbles on search results pages with ads. But there are workarounds, including copying Google’s source code (View > Source Code) into a text file and deleting the ads before using OutWit Hub. Alternatively, use the Scroogle Scraper to get ad-free search results (though you’ll also need to copy the source code for these results into a text file before scraping).
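If you go the text-file route, deleting the ads can also be scripted once you’ve found markers that reliably wrap each ad block. This Python sketch assumes such start and end markers exist and that ad blocks don’t nest; `remove_blocks` is a hypothetical helper, and the `<!--ad-->` comments in the example are placeholders, not what Google actually uses.

```python
def remove_blocks(source, start_marker, end_marker):
    """Delete every span running from `start_marker` through the next
    `end_marker` (both markers included), e.g. an ad container."""
    if not start_marker or not end_marker:
        raise ValueError("markers must be non-empty")
    pieces = []
    pos = 0
    while True:
        start = source.find(start_marker, pos)
        if start == -1:
            pieces.append(source[pos:])  # keep everything after the last ad
            break
        end = source.find(end_marker, start + len(start_marker))
        if end == -1:
            # Unterminated block: keep the rest untouched rather than guess.
            pieces.append(source[pos:])
            break
        pieces.append(source[pos:start])  # keep text before this ad
        pos = end + len(end_marker)       # skip the ad itself
    return "".join(pieces)


# Hypothetical saved source with one ad between two organic results.
saved = "<p>organic 1</p><!--ad--><p>ad</p><!--/ad--><p>organic 2</p>"
print(remove_blocks(saved, "<!--ad-->", "<!--/ad-->"))
```

The cleaned text can then go through your scraper as usual.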
But for not-a-lot-of-money (did I mention that OutWit Hub is free?!?) and just a little effort, you too can harvest the web for SEO projects that you want to catch in a spreadsheet!