Table of Contents for Spidering Hacks
Chapter/Section Title | Page # | Page Count
Credits | ix |
Preface | xv |
Walking Softly | 1 | 20
A Crash Course in Spidering and Scraping | 1 | 2
Best Practices for You and Your Spider | 3 | 4
Anatomy of an HTML Page | 7 | 3
Registering Your Spider | 10 | 2
Preempting Discovery | 12 | 3
Keeping Your Spider Out of Sticky Situations | 15 | 3
Finding the Patterns of Identifiers | 18 | 3
Assembling a Toolbox | 21 | 78
Perl Modules | 22 | 1
Resources You May Find Helpful | 23 | 1
Installing Perl Modules | 24 | 3
Simply Fetching with LWP::Simple | 27 | 2
More Involved Requests with LWP::UserAgent | 29 | 1
Adding HTTP Headers to Your Request | 30 | 2
Posting Form Data with LWP | 32 | 2
Authentication, Cookies, and Proxies | 34 | 4
Handling Relative and Absolute URLs | 38 | 2
Secured Access and Browser Attributes | 40 | 2
Respecting Your Scrapee's Bandwidth | 42 | 4
Respecting robots.txt | 46 | 1
Adding Progress Bars to Your Scripts | 47 | 6
Scraping with HTML::TreeBuilder | 53 | 3
Parsing with HTML::TokeParser | 56 | 3
WWW::Mechanize 101 | 59 | 3
Scraping with WWW::Mechanize | 62 | 5
In Praise of Regular Expressions | 67 | 3
Painless RSS with Template::Extract | 70 | 4
A Quick Introduction to XPath | 74 | 4
Downloading with curl and wget | 78 | 2
More Advanced wget Techniques | 80 | 2
Using Pipes to Chain Commands | 82 | 4
Running Multiple Utilities at Once | 86 | 3
Utilizing the Web Scraping Proxy | 89 | 4
Being Warned When Things Go Wrong | 93 | 3
Being Adaptive to Site Redesigns | 96 | 3
Collecting Media Files | 99 | 42
Detective Case Study: Newgrounds | 99 | 6
Detective Case Study: iFilm | 105 | 3
Downloading Movies from the Library of Congress | 108 | 3
Downloading Images from Webshots | 111 | 4
Downloading Comics with dailystrips | 115 | 3
Archiving Your Favorite Webcams | 118 | 4
News Wallpaper for Your Site | 122 | 3
Saving Only POP3 Email Attachments | 125 | 7
Downloading MP3s from a Playlist | 132 | 5
Downloading from Usenet with nget | 137 | 4
Gleaning Data from Databases | 141 | 208
Archiving Yahoo! Groups Messages with yahoo2mbox | 141 | 2
Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups | 143 | 4
Gleaning Buzz from Yahoo! | 147 | 3
Spidering the Yahoo! Catalog | 150 | 7
Tracking Additions to Yahoo! | 157 | 3
Scattersearch with Yahoo! and Google | 160 | 4
Yahoo! Directory Mindshare in Google | 164 | 4
Weblog-Free Google Results | 168 | 3
Spidering, Google, and Multiple Domains | 171 | 5
Scraping Amazon.com Product Reviews | 176 | 2
Receive an Email Alert for Newly Added Amazon.com Reviews | 178 | 2
Scraping Amazon.com Customer Advice | 180 | 2
Publishing Amazon.com Associates Statistics | 182 | 3
Sorting Amazon.com Recommendations by Rating | 185 | 3
Related Amazon.com Products with Alexa | 188 | 5
Scraping Alexa's Competitive Data with Java | 193 | 1
Finding Album Information with FreeDB and Amazon.com | 194 | 9
Expanding Your Musical Tastes | 203 | 4
Saving Daily Horoscopes to Your iPod | 207 | 2
Graphing Data with RRDTOOL | 209 | 4
Stocking Up on Financial Quotes | 213 | 4
Super Author Searching | 217 | 15
Mapping O'Reilly Best Sellers to Library Popularity | 232 | 3
Using All Consuming to Get Book Lists | 235 | 6
Tracking Packages with FedEx | 241 | 2
Checking Blogs for New Comments | 243 | 5
Aggregating RSS and Posting Changes | 248 | 7
Using the Link Cosmos of Technorati | 255 | 4
Finding Related RSS Feeds | 259 | 11
Automatically Finding Blogs of Interest | 270 | 3
Scraping TV Listings | 273 | 4
What's Your Visitor's Weather Like? | 277 | 4
Trendspotting with Geotargeting | 281 | 6
Getting the Best Travel Route by Train | 287 | 3
Geographic Distance and Back Again | 290 | 6
Super Word Lookup | 296 | 4
Word Associations with Lexical Freenet | 300 | 3
Reformatting Bugtraq Reports | 303 | 5
Keeping Tabs on the Web via Email | 308 | 6
Publish IE's Favorites to Your Web Site | 314 | 8
Spidering GameStop.com Game Prices | 322 | 3
Bargain Hunting with PHP | 325 | 6
Aggregating Multiple Search Engine Results | 331 | 4
Robot Karaoke | 335 | 4
Searching the Better Business Bureau | 339 | 3
Searching for Health Inspections | 342 | 3
Filtering for the Naughties | 345 | 4
Maintaining Your Collections | 349 | 14
Using cron to Automate Tasks | 349 | 2
Scheduling Tasks Without cron | 351 | 4
Mirroring Web Sites with wget and rsync | 355 | 4
Accumulating Search Results Over Time | 359 | 4
Giving Back to the World | 363 | 28
Using XML::RSS to Repurpose Data | 364 | 4
Placing RSS Headlines on Your Site | 368 | 3
Making Your Resources Scrapable with Regular Expressions | 371 | 7
Making Your Resources Scrapable with a REST Interface | 378 | 3
Making Your Resources Scrapable with XML-RPC | 381 | 4
Creating an IM Interface | 385 | 4
Going Beyond the Book | 389 | 2
Index | 391 |