How to Extract Product Information from Amazon

(post from https://www.octoparse.com/tutorial/extract-amazon-data/)


Click HERE to download the .otd file before you get started. The extraction rule for this task is stored in this .otd file.

Step 1. Download Octoparse and install it. Register a new account at www.octoparse.com, or click the "Sign up" option on the login interface.

Step 2. Click "Start" to build a new task, or hit the "Quick start" button in the Navigation Panel to create one. (Here we use Advanced Mode.)

Step 3. Complete basic information. ➜ Click "Next".

Step 4. Design the workflow to configure the extraction rule. If something goes wrong, you can check your configuration in the Workflow Designer.

Wait until the page has loaded, then click the first subcategory. ➜ Choose "Create a list of items".

Select "Add current item to the list" ➜ "Continue to edit the list" ➜ Click the second subcategory.

Select "Add current item to the list" again.

Once you have all the subcategory links, click "Finish Creating List". ➜ Select "Loop" to process the list.

Step 7. Now you can see Octoparse automatically enter the first category page.

Click "Next Page" ➜ "Loop click next page" to create a loop action over the result pages. The pagination action has now been added to the extraction rule.
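For readers who think in code, the "Loop click next page" action behaves roughly like the loop below. This is only a sketch of the idea, not how Octoparse is implemented: the `PAGES` dictionary, the `iter_pages` function and the `max_pages` safety limit are all hypothetical names made up for illustration, and the "site" is just in-memory data.

```python
# Hypothetical in-memory "site": each page lists its items and the key of
# the next page (None when there is no "Next Page" button any more).
PAGES = {
    "page-1": {"items": ["A", "B"], "next": "page-2"},
    "page-2": {"items": ["C"], "next": "page-3"},
    "page-3": {"items": ["D", "E"], "next": None},
}


def iter_pages(start, pages, max_pages=100):
    """Yield page keys by following 'next' links until none is left."""
    current = start
    seen = 0
    # max_pages is an assumed safety limit so a broken site cannot loop forever.
    while current is not None and seen < max_pages:
        yield current
        seen += 1
        current = pages[current]["next"]


visited = list(iter_pages("page-1", PAGES))
print(visited)
```

The loop stops on its own once a page has no next link, which is the same reason the pagination action ends when the "Next Page" button disappears.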

Choose "Create a list of items" again.

Then click "Add current item to the list" ➜ "Continue to edit the list".

➜ Click the second product ➜ "Add current item to the list" ➜ "Finish Creating List".

As you can see, all the detail links on the first page are now listed here. Click "Loop" to process the list.

Step 8. Now extract any information you need. Click on the product title to extract it.

Click "Extract Text".

Next, click on the price and choose "Extract Text" again. You now have the product title and price in the Customize Current Action box.

You can change your field name right here.

The same goes for any other information. Select whatever you want to extract!
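If it helps to picture what "Extract Text" plus a custom field name amounts to, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration: the sample markup, the element ids `title` and `price`, and the output field names `product_title` and `product_price` are invented, and real product pages are structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a product page.
SAMPLE_HTML = """
<div>
  <span id="title">Example Widget</span>
  <span id="price">$19.99</span>
</div>
"""

# Maps an (assumed) element id to the field name we want in the output,
# similar in spirit to renaming a field in the tool.
FIELD_NAMES = {"title": "product_title", "price": "product_price"}


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose id appears in field_names."""

    def __init__(self, field_names):
        super().__init__()
        self.field_names = field_names
        self.current_field = None
        self.record = {}

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id in self.field_names:
            self.current_field = self.field_names[element_id]
            self.record[self.current_field] = ""

    def handle_data(self, data):
        if self.current_field is not None:
            self.record[self.current_field] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field,
        # so nested markup inside a field is not handled.
        self.current_field = None


extractor = FieldExtractor(FIELD_NAMES)
extractor.feed(SAMPLE_HTML)
record = {name: text.strip() for name, text in extractor.record.items()}
print(record)
```

Each selected element becomes one named field in the record, which is the shape of data you end up with after picking fields in the tool.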

Step 9. Now look at the Workflow designer.

Drag the second "Loop Item" so that it sits before the "Click to paginate" action.
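The reason for this ordering is easier to see as nested loops: every product on the current page should be processed before the workflow moves on to the next page. The sketch below shows that ordering with made-up data. The function name `run_workflow` and the sample pages are hypothetical, and this is an analogy for the workflow order, not Octoparse's actual engine.

```python
def run_workflow(pages):
    """pages: a list of pages, each page being a list of product dicts."""
    rows = []
    # Outer loop: one pass per result page (the pagination loop).
    for page_number, products in enumerate(pages, start=1):
        # Inner loop: the second "Loop Item", visiting each product first.
        for product in products:
            rows.append(
                {
                    "page": page_number,
                    "title": product["title"],
                    "price": product["price"],
                }
            )
        # Only at this point would "Click to paginate" run.
    return rows


# Hypothetical sample data: two pages of products.
sample_pages = [
    [{"title": "Widget A", "price": "$1.00"}, {"title": "Widget B", "price": "$2.00"}],
    [{"title": "Widget C", "price": "$3.00"}],
]

rows = run_workflow(sample_pages)
print(rows)
```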

Step 10. Now we are done configuring the extraction rule! Click "Next" to process the configured rule. If you do not need images, you can choose not to load them to speed up the extraction.

Now the task is complete! Choose "Local extraction" to run the task on your computer.

The extracted data will be shown in the "Data Extracted" pane. Click the button to export the results to an Excel file, a database or another format, and save the file to your computer.
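If you later want to post-process the data yourself, the export step boils down to writing rows of named fields to a file. Here is a small sketch with Python's standard `csv` module, which Excel can also open. The rows and the field names `product_title` and `product_price` are invented for the example, and `rows_to_csv` is a hypothetical helper, not part of Octoparse.

```python
import csv
import io

# Hypothetical extracted rows; in practice these come from your export.
rows = [
    {"product_title": "Example Widget", "product_price": "$19.99"},
    {"product_title": "Another Widget", "product_price": "$5.00"},
]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, ["product_title", "product_price"])
print(csv_text)
```

To save to disk instead, the same writer can be pointed at a file opened with `open(path, "w", newline="")`.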


Happy Data Hunting!


Author: The Octoparse Team


by octoparse | 2017-09-22 19:33