Crawling Bilibili (B站) video titles and links with a Scrapy crawler

After an afternoon of study, I finally managed to crawl video data and hyperlinks from Bilibili (B站), although my method is rather clumsy. It was still a lot of fun, so I am recording the process here.

The program uses Scrapy. For installation details, see /p/d2c8b1496949. You can create a Scrapy project directly from CMD: enter scrapy startproject followed by the project name, and a new folder with that name is created in the current directory. After cd-ing into that folder, enter scrapy genspider followed by the crawler name and the target URL to create your crawler file (for example: scrapy genspider sample /v/douga).

Once the crawler has been created, you can edit it. Open the generated crawler file (here, sample.py) to see its contents.

Next, define the parse() function and use XPath selectors to extract the content of the tags in the web page. The XPath Helper browser extension can be used here to improve efficiency.

/v/digital (digital area)

/v/music (music area)

Although the method is clumsy, the crawler finally ran successfully, and I enjoyed working on it. I will keep studying crawlers, because clumsy methods are not a good long-term approach.