In our previous tutorial, we showed how to set up the extraction rules. In this tutorial, we will show you how to define the source list.
The source list is what AnyPicker uses to identify the links to the detail pages you would like to scrape. It can be a search results page or a product listing page. The source list is not required if you do not intend to extract data from different page hierarchies.
If you are only extracting data from the product listing page itself, you can leave the source list blank.
Once you have picked the data to scrape on the product detail page, click the last page button to return to the previous page, since the source list is usually the page you used to navigate to the product detail page.
Next, tell AnyPicker which links lead to the product detail pages you would like to scrape. Click the LINK LIST SUGGESTION button to get a list of suggested links. AnyPicker builds this list automatically, recognizing the links related to the data page by grouping links that follow a similar pattern.
All the links that might lead to the detailed data page will be highlighted.
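To make the idea of grouping similarly patterned links more concrete, here is a minimal sketch in Python. It is not AnyPicker's actual implementation, which is not described here. The function names (`link_pattern`, `group_links`, `suggest_link_list`) and the example URLs are hypothetical, and the approach (masking digit runs in the URL path and picking the largest group) is only one simple assumption about how such a suggestion could work.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def link_pattern(url: str) -> str:
    """Reduce a URL to a coarse pattern: host plus path with digit runs masked."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    # Replace every run of digits with a placeholder so that
    # /product/101 and /product/102 share the same pattern.
    masked = [re.sub(r"\d+", "{n}", s) for s in segments]
    return parts.netloc + "/" + "/".join(masked)


def group_links(urls):
    """Group URLs that share the same coarse pattern."""
    groups = defaultdict(list)
    for url in urls:
        groups[link_pattern(url)].append(url)
    return dict(groups)


def suggest_link_list(urls):
    """Return the largest group of similarly patterned links (assumed heuristic)."""
    groups = group_links(urls)
    return max(groups.values(), key=len) if groups else []


# Hypothetical links found on a source list page.
sample_links = [
    "https://shop.example.com/product/101",
    "https://shop.example.com/product/102",
    "https://shop.example.com/about",
    "https://shop.example.com/product/205",
    "https://shop.example.com/cart",
]

suggested = suggest_link_list(sample_links)
for link in suggested:
    print(link)
```

In this sketch the three product links end up in one group because they differ only in the numeric ID, while the "about" and "cart" links fall into their own single-item groups and are left out of the suggestion.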
Reviewing the suggested links is optional, because AnyPicker is smart enough to grab the related data in your source list. However, if you keep finding unwanted data in your scraping result file, you can come back here and delete the links that do not lead to the detail or product pages you want to scrape.
To remove links that do not look correct, move your mouse to the left panel and click the “-” button. AnyPicker will highlight the links to help you remove them manually.