Tag archive: scrapy

Using Splash to crawl pages that need JavaScript executed

While writing a Scrapy spider, I ran into pages that redirect via JavaScript. Does anyone have a good way to handle this?

The answer is:

splash

Splash is a JavaScript rendering service with an HTTP API. It’s a lightweight browser with an HTTP API, implemented in Python 3 using Twisted and Qt 5.

It’s fast, lightweight, and stateless, which makes it easy to distribute.

Documentation is available here: https://splash.readthedocs.io/
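
Since Splash is driven over HTTP, a quick way to try it is to ask its render.html endpoint for a page after its JavaScript has run. The following is a minimal sketch, not the only way to call Splash: it assumes a Splash instance listening on http://localhost:8050 (the default port), and the helper names splash_render_url and fetch_rendered_html are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def splash_render_url(page_url, splash_base="http://localhost:8050", wait=0.5):
    """Build a URL for Splash's render.html endpoint.

    splash_base is an assumption: a Splash instance on its default port.
    wait is the number of seconds Splash waits after the page loads,
    which gives JavaScript redirects a chance to run.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return splash_base.rstrip("/") + "/render.html?" + query


def fetch_rendered_html(page_url, **kwargs):
    """Fetch the rendered HTML. Needs a running Splash instance."""
    with urlopen(splash_render_url(page_url, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only builds the request URL; no network access is needed for this.
    print(splash_render_url("http://example.com/"))
```

To actually fetch a page, Splash has to be running first, for example via the Docker image described in the Splash documentation. After that, fetch_rendered_html("http://example.com/") should return the HTML as the browser sees it once the scripts have executed.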

scrapy-splash

This library provides Scrapy and JavaScript integration using Splash. The license is BSD 3-clause.
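
To use it from a Scrapy project, the scrapy-splash README asks for a few entries in settings.py. The sketch below follows that README as I remember it. The SPLASH_URL value is an assumption (a local Splash instance on the default port), so adjust it to wherever your Splash is running and check the README for the version you install.

```python
# settings.py (fragment) -- a sketch based on the scrapy-splash README.

# Assumption: Splash runs locally on its default port.
SPLASH_URL = "http://localhost:8050"

# Route requests through Splash and keep cookies in sync.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoid storing duplicate Splash arguments with every queued request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With these settings in place, a spider can yield scrapy_splash.SplashRequest(url, callback, args={"wait": 0.5}) instead of a plain scrapy.Request, and the callback receives the page after its JavaScript has run.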

References:

Scrapy crawler resources

Web Scraping and Crawling With Scrapy and MongoDB

django-dynamic-scraper

Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of the features of Scrapy, it lets you dynamically create and manage spiders via the Django admin interface.

Indexing web sites in Solr with Python

Scrapy at a glance

Building a Web Crawler with Scrapy