articlescradle.com

Data Discovery vs. Data Extraction

 

Author: Todd Wilson

At a simplified level, screen-scraping involves two primary stages: data discovery and data extraction. Data discovery deals with navigating a web site to arrive at the pages containing the data you want, and data extraction deals with actually pulling that data off of those pages. When people think of screen-scraping they generally focus on the data extraction portion of the process, but my experience has been that data discovery is often the more difficult of the two.

The data discovery step in screen-scraping might be as simple as requesting a single URL. For example, you might just need to go to the home page of a site and extract the latest news headlines. At the other end of the spectrum, data discovery may involve logging in to a web site, traversing a series of pages in order to get needed cookies, submitting a POST request on a search form, traversing the search results pages, and finally following all of the details links within those pages to get to the data you're actually after. In the former case a simple Perl script would often work just fine. For anything much more complex than that, though, a commercial screen-scraping tool can be an incredible time-saver. Especially for sites that require logging in, writing your own screen-scraping code can become a nightmare once you have to manage cookies and session state by hand.
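To make the more involved end of that spectrum concrete, here is a minimal Python sketch of a login followed by a search-form POST, using only the standard library. It is an illustration under stated assumptions, not a recipe for any particular site: the `/login` and `/search` paths and the `user`, `pass`, and `q` field names are hypothetical, and a real site may also require hidden form fields or extra page visits to pick up cookies.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def form_post(url, fields):
    """Build a POST request whose body is the URL-encoded form fields."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def discover(opener, base_url, username, password, query):
    """Log in, submit a search, and return the HTML of the first results page."""
    # Hypothetical paths and field names; inspect the real forms to find yours.
    login = form_post(base_url + "/login", {"user": username, "pass": password})
    opener.open(login).close()  # any session cookie set here is kept in the jar

    search = form_post(base_url + "/search", {"q": query})
    with opener.open(search) as response:
        return response.read().decode("utf-8", errors="replace")
```

Because every request goes through the same opener, cookies set during login are sent automatically with the search request, which is the bookkeeping that tends to be painful when done by hand. Following the details links on each results page would be a loop over further `opener.open(...)` calls.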

In the data extraction phase you've already arrived at the page containing the data you're interested in, and you now need to pull it out of the HTML. Traditionally this has involved creating a series of regular expressions that match the pieces of the page you want (e.g., URLs and link titles). Regular expressions can be a bit complex to deal with, so most screen-scraping applications will hide these details from you, even though they may use regular expressions behind the scenes.
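As a rough sketch of the regular-expression approach, the Python snippet below pulls URLs and link titles out of a fragment of HTML. The pattern is an assumption about how the page is written: it expects double-quoted `href` attributes and non-nested links, so it would need adjusting for markup that differs.

```python
import re

# Matches <a ... href="URL" ...>TITLE</a>; assumes double-quoted href values.
LINK_RE = re.compile(
    r'<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

# Matches any tag, used to strip inline markup from the link title.
TAG_RE = re.compile(r"<[^>]+>")


def extract_links(html):
    """Return a list of (url, title) pairs for each link found in the HTML."""
    return [
        (url, TAG_RE.sub("", title).strip())
        for url, title in LINK_RE.findall(html)
    ]


sample = (
    '<p><a href="/news/1">First <b>headline</b></a> and '
    '<a class="x" href="/news/2">Second</a></p>'
)
print(extract_links(sample))
# [('/news/1', 'First headline'), ('/news/2', 'Second')]
```

Even this small pattern shows why tools tend to hide the details: a change in attribute order, quoting style, or nesting is enough to break a hand-written expression.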

As an addendum, I should probably mention a third phase that is often ignored: what do you do with the data once you've extracted it? Common examples include writing the data to a CSV or XML file, or saving it to a database. In the case of a live web site you might even scrape the information and display it in the user's web browser in real time. When shopping around for a screen-scraping tool you should make sure that it gives you the flexibility you need to work with the data once it's been extracted.
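For the CSV case, a minimal Python sketch might look like the following. The `url` and `title` column names are assumptions chosen for illustration; the function returns the CSV text so that the caller can decide whether to write it to a file or send it elsewhere.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped records.
records = [
    {"url": "/news/1", "title": "First, headline"},
    {"url": "/news/2", "title": "Second"},
]
print(rows_to_csv(records, ["url", "title"]))
```

The `csv` module quotes fields that contain commas, so a title such as "First, headline" survives the round trip, which is easy to get wrong when joining strings by hand.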

Author Bio:
Todd Wilson is a well-known scripter. Todd likes to create articles about this industry.
© 2006-2008 www.articlescradle.com All Rights Reserved Worldwide.