Symfony bundle for Roach PHP

roach-php-bundle

Roach is a complete web scraping toolkit for PHP. It is heavily inspired by (read: a shameless clone of) the popular Scrapy package for Python.

The Symfony bundle mostly provides the necessary container bindings for the various services Roach uses, as well as making certain configuration options available via a config file. To learn about how to actually start using Roach itself, check out the rest of the documentation.

Installing the Symfony bundle

Add nelexa/roach-php-bundle to your composer.json file:

composer require nelexa/roach-php-bundle

Register the bundle in config/bundles.php (Symfony Flex does this automatically):

return [
    //...
    \Nelexa\RoachPhpBundle\RoachPhpBundle::class => ['all' => true],
];

Available Commands

The Roach Symfony bundle registers a few console commands to make our development experience as pleasant as possible.

Run spider

php bin/console roach:run

When run without arguments, the command lists all available spiders:

 Choose a spider class:
  [0] App\Spider\GoogleSpider
  [1] App\Spider\FacebookSpider
  [2] App\Spider\TwitterSpider

Simply select the desired spider (▼ or ▲) or enter its number and press Enter.

You can also pass the spider's class name, or one of its aliases, as the first argument. For example, for a class App\Spider\GoogleSpider, the following aliases are accepted: GoogleSpider, google_spider or google.

php bin/console roach:run google
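
The alias forms listed above can be illustrated with a small sketch. The spiderAliases() function below is a hypothetical helper written for this example; it is not part of the bundle, and the bundle's actual resolution logic may differ. It only assumes the three alias forms named above: the short class name, its snake_case form, and the snake_case form without the _spider suffix.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the bundle): derives the alias forms
 * described above from a fully qualified spider class name.
 *
 * @return list<string>
 */
function spiderAliases(string $className): array
{
    // Short class name: everything after the last namespace separator.
    $pos = strrpos($className, '\\');
    $short = $pos === false ? $className : substr($className, $pos + 1);

    // CamelCase to snake_case: prefix every capital except the first with "_".
    $snake = strtolower((string) preg_replace('/(?<!^)[A-Z]/', '_$0', $short));

    // Drop a trailing "_spider" suffix, if present.
    $base = (string) preg_replace('/_spider$/', '', $snake);

    // Remove duplicates (e.g. when the class name has no "Spider" suffix).
    return array_values(array_unique([$short, $snake, $base]));
}
```

For App\Spider\GoogleSpider this sketch yields GoogleSpider, google_spider and google, matching the aliases listed above.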

Sometimes it is useful to override the number of concurrent requests and the request delay. To do this, pass the --concurrency and --delay options.

php bin/console roach:run google --concurrency 8 --delay 2

These options override the $concurrency and $requestDelay public properties of your spider.
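
For reference, here is a minimal sketch of a spider that declares these two properties. It assumes the BasicSpider base class and Response type from the core Roach package; the class name, start URL, CSS selector and property values are illustrative only.

```php
<?php

declare(strict_types=1);

namespace App\Spider;

use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

// Illustrative spider; the name, URL and selector are assumptions for this example.
final class GoogleSpider extends BasicSpider
{
    /** @var string[] */
    public array $startUrls = ['https://roach-php.dev/docs/introduction'];

    // Values used when no --concurrency / --delay option is passed.
    public int $concurrency = 2;
    public int $requestDelay = 1;

    public function parse(Response $response): \Generator
    {
        // Extract the first <h1> of the page and emit it as an item.
        yield $this->item([
            'title' => $response->filter('h1')->text(),
        ]);
    }
}
```

With this class, running roach:run google --concurrency 8 --delay 2 would use 8 and 2 in place of the values declared on the spider.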

Starting the REPL

Roach ships with an interactive shell (often called a Read-Evaluate-Print Loop, or REPL for short) which makes prototyping our spiders a breeze. We can use the provided roach:shell command to launch a new REPL session.

php bin/console roach:shell "https://roach-php.dev/docs/introduction"

Generator classes

First install Symfony MakerBundle.

composer require --dev symfony/maker-bundle

Create a new roach spider class

php bin/console make:roach:spider

Create a new roach extension class

php bin/console make:roach:extension

Create a new roach item processor class

php bin/console make:roach:item:processor

Create a new roach downloader request middleware class

php bin/console make:roach:middleware:downloader:request

Create a new roach downloader response middleware class

php bin/console make:roach:middleware:downloader:response

Create a new roach spider item middleware class

php bin/console make:roach:middleware:spider:item

Create a new roach spider request middleware class

php bin/console make:roach:middleware:spider:request

Create a new roach spider response middleware class

php bin/console make:roach:middleware:spider:response

Credits

Changelog

Changes are documented in the releases page.

License

The MIT License (MIT). Please see LICENSE for more information.

Releases(1.1.1)
Owner
Pisarev Alexey
Fullstack developer (PHP, JavaScript, React, Java, Bash)