
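Last time, I introduced a tool called Octoparse and explained how to use it, from signing up, downloading, and installing it to extracting data. (For details on the previous post, please see here.) This time, to help you understand Octoparse better, I will introduce its main features, show how to use it with a concrete example, and present some of its extended features.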

https://nelog.jp/octoparse#Octoparse


Contents
1. Overview
2. Main features of Octoparse
3. How to use Octoparse: a concrete example
4. Octoparse's extended features
5. Summary



1. Overview

Octoparse is a simple, highly visual, easy-to-understand web scraper that lets even people with little programming knowledge collect and extract data from the web.


Brand: Octoparse
Customer support: Facebook community, phone, e-mail, Skype
Price: from $75 (a free version is available)
Trial period: 5 days (Pro version)
Operating systems: Windows XP, 7, 8, 10
Data export formats: CSV, Excel, TXT, HTML, databases (SQL Server, MySQL, Oracle)
Multithreading: Yes (unlimited)
API (application programming interface): Yes
Scheduling: Yes
Cloud service: Yes

2. Main features of Octoparse

(1) Simple web scraping by clicking and dragging

Octoparse is a tool that makes web scraping accessible to any user. Its interface is divided into panes (areas) of the operating screen that are very easy to understand visually. Basically, just by pointing, clicking, and dragging, you can build a capable workflow that scrapes 98% of existing websites.


(2) Support for dynamic websites

For more complex scraping, for example when the data on an interactive website is loaded with JavaScript, Octoparse can provide a solution in all of the following cases:

Capturing data hidden in the HTML

etc.

Octoparse is designed so that every user can crawl data. With its built-in XPath and RegEx tools, developers and non-developers alike can easily verify individual elements on a web page. (For details, please see the extensions page.)
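The RegEx tool is used from Octoparse's GUI, but the kind of pattern it works with is easy to try outside it. Here is a minimal Python sketch, assuming (hypothetically) that an extracted rent field looks like "12.5万円", where 万 means 10,000 yen; the pattern and the function name rent_in_yen are illustrative, not part of Octoparse.

```python
import re

# Hypothetical pattern: a decimal number followed by 万円 (units of 10,000 yen).
RENT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)万円")


def rent_in_yen(text):
    """Return the rent in yen parsed from text like '12.5万円', or None if no match."""
    match = RENT_PATTERN.search(text)
    if match is None:
        return None
    # 1万 = 10,000 yen; round() removes floating-point noise such as 124999.99.
    return round(float(match.group(1)) * 10_000)


print(rent_in_yen("賃料 12.5万円"))  # 125000
print(rent_in_yen("要問合せ"))  # None
```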

(3) Support

Users of the free version can ask for help in the Octoparse Facebook group, where I think community members will gladly help each other out. You can also contact Octoparse support directly, although a response may take a long time.

Users of the paid version receive priority support from the Octoparse team by phone, e-mail, and Skype.

3. How to use Octoparse: a concrete example

Above, I briefly introduced the main features of Octoparse. To show how it works in more detail, here is a concrete example built around a scenario.

Imagine that you are a young employee who has just moved to Tokyo. The first thing to sort out is finding an apartment, right? There is so much information about rental apartments on the Internet that it is hard to decide which one to choose. If you had an organized list of rental apartments, you could compare them much more easily. In a case like this, I think Octoparse is one of the most useful tools.

suumo.jp is the largest comprehensive information site for real estate and rental housing, offering a wealth of listings for investors, new employees, and anyone else looking for a home. Suppose you are looking for an apartment within 15 minutes of Shibuya, Shinjuku, or Harajuku Station with a rent of 150,000 yen or less. Let's scrape the listings with Octoparse.
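Written out as code, the criteria in this scenario amount to a simple filter. Here is a minimal Python sketch, assuming hypothetical records whose walking time and rent are already numbers; in the tutorial itself the filtering is done by suumo's search conditions, and the Listing fields below are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical record shape; field names are illustrative only.
    name: str
    station: str
    walk_minutes: int
    rent_yen: int


TARGET_STATIONS = {"Shibuya", "Shinjuku", "Harajuku"}
MAX_WALK_MINUTES = 15
MAX_RENT_YEN = 150_000  # 15万円


def matches(listing):
    """True if the listing meets all three criteria of the scenario."""
    return (
        listing.station in TARGET_STATIONS
        and listing.walk_minutes <= MAX_WALK_MINUTES
        and listing.rent_yen <= MAX_RENT_YEN
    )


listings = [
    Listing("Apartment A", "Shibuya", 8, 125_000),
    Listing("Apartment B", "Shinjuku", 20, 98_000),
    Listing("Apartment C", "Harajuku", 5, 180_000),
]
print([item.name for item in listings if matches(item)])  # ['Apartment A']
```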

Step 1. Set up the basic information.

Click "Quick Start", then click "New Task (Advanced Mode)". Fill in the Basic Information.


Step 2. Navigate to the website you want to search in the built-in browser.

Enter the URL in the built-in web browser and click "Go" to open the site.

Example URL:

http://suumo.jp/jj/chintai/ichiran/FR301FC005/?ar=030&bs=040&pc=100&smk=&po1=00&po2=99&shkr1=03&shkr2=03&shkr3=03&shkr4=03&rn=0005&ek=000517640&ek=000531250&ek=000519670&ra=013&cb=0.0&ct=15.0&et=15&mb=0&mt=9999999&cn=9999999&fw2=
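Because the search conditions are encoded in the query string, you can inspect what the example URL filters on before loading it. A minimal Python sketch using the standard library's urllib.parse; reading ek as station codes (three of them, one per station), ct as the upper rent limit in units of 10,000 yen, and et as walking minutes is an inference from this scenario, not documented suumo behavior.

```python
from urllib.parse import parse_qs, urlsplit

SEARCH_URL = (
    "http://suumo.jp/jj/chintai/ichiran/FR301FC005/"
    "?ar=030&bs=040&pc=100&smk=&po1=00&po2=99"
    "&shkr1=03&shkr2=03&shkr3=03&shkr4=03&rn=0005"
    "&ek=000517640&ek=000531250&ek=000519670"
    "&ra=013&cb=0.0&ct=15.0&et=15&mb=0&mt=9999999&cn=9999999&fw2="
)

# keep_blank_values=True keeps empty parameters such as smk= and fw2=.
params = parse_qs(urlsplit(SEARCH_URL).query, keep_blank_values=True)

# Repeated keys (ek) come back as a list with one entry per occurrence.
print(params["ek"])  # ['000517640', '000531250', '000519670']
# Assumption: ct = upper rent limit (x10,000 yen), et = walking minutes.
print(params["ct"], params["et"])  # ['15.0'] ['15']
```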


Step 3. Set up pagination.

Click the "Next" pagination link, then choose "Loop click the element".


Step 4. Create a list of items.

Drag "Loop Item" into the Workflow and select "Variable list". Paste the following XPath into the blank field next to "Variable list" below it, then click "Save".

XPath: //div[@class='property_group']/div (For more information about XPath, please visit here.)
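To see what the list XPath //div[@class='property_group']/div selects, here is a minimal Python sketch using the standard library's ElementTree, which supports a subset of XPath. The markup is a hypothetical, simplified stand-in for a results page; the real page is larger and may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is larger and may differ.
SAMPLE_PAGE = """
<html><body>
  <div class="property_group">
    <div><h2>Apartment A</h2><span>12.5万円</span></div>
    <div><h2>Apartment B</h2><span>9.8万円</span></div>
  </div>
  <div class="ad"><div>Not a listing</div></div>
</body></html>
"""

root = ET.fromstring(SAMPLE_PAGE.strip())

# './/' searches all descendants; [@class='...'] matches the exact attribute value.
# Each child <div> of the property_group block is one row of the item list.
items = root.findall(".//div[@class='property_group']/div")

for item in items:
    # Pull the title and rent text out of each row.
    print(item.findtext("h2"), item.findtext("span"))
```

The advertising block is skipped because its class is not property_group, which is the same reason the "Variable list" XPath yields one loop item per listing.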


Step 5. Extract the search results.

First, extract the title: click the title ➜ select "Extract text". Other content can be extracted the same way.


Step 6. Rename the extracted data fields.

Once all the data fields have been extracted, they are named automatically. To change a name, click "Field Name" and edit it.



Step 7. Fix the pagination XPath.

The XPath that Octoparse generates by default cannot correctly locate the "Next" link, so you need to modify it. The modified XPath is as follows:

//p[@class='pagination-parts']/a[contains(text(),'次へ')] (次へ is the "Next" link text on the Japanese site. For more information about XPath, please visit here.)
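Python's built-in ElementTree has no contains() function, so this sketch applies the same two conditions as the pagination XPath (a <p> with class pagination-parts, and a link whose text contains 次へ, "Next") by filtering in Python. It assumes the link text is 次へ; the markup and the helper find_next_link are hypothetical, and the markup includes a "previous" link (前へ) with the same class to show why matching on the text is useful.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup: both links sit in <p class="pagination-parts">.
PAGINATION = """
<div>
  <p class="pagination-parts"><a href="/page/1">前へ</a></p>
  <p class="pagination-parts"><a href="/page/3">次へ</a></p>
</div>
"""


def find_next_link(root, label="次へ"):
    """Emulate //p[@class='pagination-parts']/a[contains(text(), label)].

    Returns the href of the first matching link, or None (e.g. on the last page).
    """
    for link in root.findall(".//p[@class='pagination-parts']/a"):
        # link.text is None for an empty <a>, so fall back to "".
        if label in (link.text or ""):
            return link.get("href")
    return None


root = ET.fromstring(PAGINATION.strip())
print(find_next_link(root))  # /page/3
```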



Step 8. Run the extraction.

Click "Next", then click "Next" again. Click "Local Extraction", then click "OK" to run the task on your computer. Octoparse automatically extracts all of the data you specified.



Once all of the above steps are complete, you will get the structured data shown below.

[Screenshot: the extracted rental listing data]


4. Summary

Octoparse is a feature-rich, visually intuitive web scraping tool. In particular, I can definitely recommend it because even non-technical users can easily scrape the web with it. The software is excellent and versatile, so you can scrape most dynamic sites quite easily. The pricing is also clearly wallet-friendly, with a free plan that supports scraping an unlimited number of web pages. For all these reasons, Octoparse is definitely worth a try.


# by octoparse | 2017-10-18 17:07

(post from https://www.octoparse.com/tutorial/extract-amazon-data/)


Click HERE to download the .otd file before you get started. The extraction rule for this task is stored in this .otd file.

Step 1. Download Octoparse and install it. Register a new account at www.octoparse.com, or click the "Sign up" option on the login screen.

Step 2. Click "Start" to build a new task, or hit the "Quick Start" button in the navigation panel to create one. (Here we use Advanced Mode.)

Step 3. Complete basic information. ➜ Click "Next".

Step 4. Design the workflow to configure the extraction rule. If something goes wrong, you can check your rule in the Workflow Designer.

Wait until the page has loaded, then click the first subcategory ➜ choose "Create a list of items".

Select "Add current item to the list" ➜ "Continue to edit the list" ➜ Click the second subcategory.

Select "Add current item to the list" again.

Once you have added all the subcategory links, click "Finish Creating List" ➜ select "Loop" to process the list.

Step 7. You can now see that it automatically opens the first category page.

Click "Next Page" ➜ "Loop click next page" to create a loop action over the web pages. The pagination action is now added to the extraction rule.

Choose "Create a list of items" again.

Then click "Add current item to the list" ➜ "Continue to edit the list".

➜ Click "Add current item to the list" ➜ "Finish Creating List".

As you can see, all the detail links on the first page are now in the list. Click "Loop" to process the list.

Step 8. Now extract any information you need. Click on the product title to extract it.

Click "Extract Text".

Next, click the price ➜ "Extract Text". You now have the product title and price in the "Customize Current Action" box.

You can rename the fields right here.

Other information can be extracted the same way. Select whatever you want to extract!

Step 9. Now look at the Workflow Designer.

Drag the second "Loop Item" so that it comes before the "Click to paginate" action.
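This step matters because of the order in which the loops run. Here is a minimal Python sketch of that nesting, using a hypothetical in-memory site instead of real pages (the page keys, item names, and run_workflow are invented for illustration and are not Octoparse internals): for each subcategory, every item on the current page is processed first, and only then does the workflow move to the next page.

```python
# Hypothetical in-memory site: each page lists item titles and the next page (or None).
SITE = {
    "cat-a/1": {"items": ["A1", "A2"], "next": "cat-a/2"},
    "cat-a/2": {"items": ["A3"], "next": None},
    "cat-b/1": {"items": ["B1"], "next": None},
}


def run_workflow(category_urls):
    """Loop over categories; on each page, extract items BEFORE paginating."""
    extracted = []
    for url in category_urls:  # outer loop: the subcategory list
        page = url
        while page is not None:  # pagination loop
            # Loop Item first: process every item on the current page...
            extracted.extend(SITE[page]["items"])
            # ...then "Click to paginate" to move on to the next page.
            page = SITE[page]["next"]
    return extracted


print(run_workflow(["cat-a/1", "cat-b/1"]))  # ['A1', 'A2', 'A3', 'B1']
```

Swapping the two statements inside the while loop would move to the next page before reading the current one, so the first page of every category would be missed.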

Step 10. We are now done configuring the extraction rule! Click "Next" to process the configured rule. If you don't need images, you can choose not to load them to speed up the extraction.

The task is now complete! Choose "Local Extraction" to run the task on your computer.

The extracted data will be shown in the "Data Extracted" pane. Click the export button to export the results to an Excel file, a database, or another format and save the file to your computer.
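If you post-process the results yourself, writing them to CSV needs only the standard library. A minimal sketch, assuming the rows are dictionaries keyed by your field names; the field names and values here are hypothetical, not real Amazon data.

```python
import csv
import io

# Hypothetical rows, shaped like the title and price fields extracted in Step 8.
rows = [
    {"title": "Sample product 1", "price": "$1,299.00"},
    {"title": "Sample product 2", "price": "$24.99"},
]

# An in-memory buffer keeps the example self-contained; for a real file use
# open("products.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
writer.writeheader()
# Values containing commas (like "$1,299.00") are quoted automatically.
writer.writerows(rows)
print(buffer.getvalue())
```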


Happy Data Hunting!


Author: The Octoparse Team


# by octoparse | 2017-09-22 19:33