QueryList

QueryList is a simple, elegant, extensible PHP Web Scraper (crawler/spider) ,based on phpQuery.

API Documentation

中文文档

Features

Have the same CSS3 DOM selector as jQuery
Have the same DOM manipulation API as jQuery
Have a generic list crawling program
Have a strong HTTP request suite, easy to achieve such as: simulated landing, forged browser, HTTP proxy and other complex network requests
Have a messy code solution
Have powerful content filtering, you can use the jQuey selector to filter content
Has a high degree of modular design, scalability and strong
Have an expressive API
Has a wealth of plug-ins

Through plug-ins you can easily implement things like:

Multithreaded crawl
Crawl JavaScript dynamic rendering page (PhantomJS/headless WebKit)
Image downloads to local
Simulate browser behavior such as submitting Form forms
Web crawler
.....

Requirements

PHP >= 7.1

Installation

By Composer installation:

composer require jaeger/querylist

Usage

DOM Traversal and Manipulation

Crawl「GitHub」all picture links

QueryList::get('https://github.com')->find('img')->attrs('src');

Crawl Google search results

$ql = QueryList::get('https://www.google.co.jp/search?q=QueryList');

$ql->find('title')->text(); //The page title
$ql->find('meta[name=keywords]')->content; //The page keywords

$ql->find('h3>a')->texts(); //Get a list of search results titles
$ql->find('h3>a')->attrs('href'); //Get a list of search results links

$ql->find('img')->src; //Gets the link address of the first image
$ql->find('img:eq(1)')->src; //Gets the link address of the second image
$ql->find('img')->eq(2)->src; //Gets the link address of the third image
// Loop all the images
$ql->find('img')->map(function($img){
	echo $img->alt;  //Print the alt attribute of the image
});

More usage

$ql->find('#head')->append('<div>Append content</div>')->find('div')->htmls();
$ql->find('.two')->children('img')->attrs('alt'); // Get the class is the "two" element under all img child nodes
// Loop class is the "two" element under all child nodes
$data = $ql->find('.two')->children()->map(function ($item){
    // Use "is" to determine the node type
    if($item->is('a')){
        return $item->text();
    }elseif($item->is('img'))
    {
        return $item->alt;
    }
});

$ql->find('a')->attr('href', 'newVal')->removeClass('className')->html('newHtml')->...
$ql->find('div > p')->add('div > ul')->filter(':has(a)')->find('p:first')->nextAll()->andSelf()->...
$ql->find('div.old')->replaceWith( $ql->find('div.new')->clone())->appendTo('.trash')->prepend('Deleted')->...

List crawl

Crawl the title and link of the Google search results list:

$data = QueryList::get('https://www.google.co.jp/search?q=QueryList')
	// Set the crawl rules
    ->rules([ 
	    'title'=>array('h3','text'),
	    'link'=>array('h3>a','href')
	])
	->query()->getData();

print_r($data->all());

Results:

Array
(
    [0] => Array
        (
            [title] => Angular - QueryList
            [link] => https://angular.io/api/core/QueryList
        )
    [1] => Array
        (
            [title] => QueryList | @angular/core - Angularリファレンス - Web Creative Park
            [link] => http://www.webcreativepark.net/angular/querylist/
        )
    [2] => Array
        (
            [title] => QueryListにQueryを追加したり、追加されたことを感知する | TIPS ...
            [link] => http://www.webcreativepark.net/angular/querylist_query_add_subscribe/
        )
        //...
)

Encode convert

// Out charset :UTF-8
// In charset :GB2312
QueryList::get('https://top.etao.com')->encoding('UTF-8','GB2312')->find('a')->texts();

// Out charset:UTF-8
// In charset:Automatic Identification
QueryList::get('https://top.etao.com')->encoding('UTF-8')->find('a')->texts();

HTTP Client (GuzzleHttp)

Carry cookie login GitHub

//Crawl GitHub content
$ql = QueryList::get('https://github.com','param1=testvalue & params2=somevalue',[
  'headers' => [
      // Fill in the cookie from the browser
      'Cookie' => 'SINAGLOBAL=546064; wb_cmtLike_2112031=1; wvr=6;....'
  ]
]);
//echo $ql->getHtml();
$userName = $ql->find('.header-nav-current-user>.css-truncate-target')->text();
echo $userName;

Use the Http proxy

$urlParams = ['param1' => 'testvalue','params2' => 'somevalue'];
$opts = [
	// Set the http proxy
    'proxy' => 'http://222.141.11.17:8118',
    //Set the timeout time in seconds
    'timeout' => 30,
     // Fake HTTP headers
    'headers' => [
        'Referer' => 'https://querylist.cc/',
        'User-Agent' => 'testing/1.0',
        'Accept'     => 'application/json',
        'X-Foo'      => ['Bar', 'Baz'],
        'Cookie'    => 'abc=111;xxx=222'
    ]
];
$ql->get('http://httpbin.org/get',$urlParams,$opts);
// echo $ql->getHtml();

Analog login

// Post login
$ql = QueryList::post('http://xxxx.com/login',[
    'username' => 'admin',
    'password' => '123456'
])->get('http://xxx.com/admin');
// Crawl pages that need to be logged in to access
$ql->get('http://xxx.com/admin/page');
//echo $ql->getHtml();

Submit forms

// Get the QueryList instance
$ql = QueryList::getInstance();
// Get the login form
$form = $ql->get('https://github.com/login')->find('form');

// Fill in the GitHub username and password
$form->find('input[name=login]')->val('your github username or email');
$form->find('input[name=password]')->val('your github password');

// Serialize the form data
$fromData = $form->serializeArray();
$postData = [];
foreach ($fromData as $item) {
    $postData[$item['name']] = $item['value'];
}

// Submit the login form
$actionUrl = 'https://github.com'.$form->attr('action');
$ql->post($actionUrl,$postData);
// To determine whether the login is successful
// echo $ql->getHtml();
$userName = $ql->find('.header-nav-current-user>.css-truncate-target')->text();
if($userName)
{
    echo 'Login successful ! Welcome:'.$userName;
}else{
    echo 'Login failed !';
}

Bind function extension

Customize the extension of a myHttp method:

$ql = QueryList::getInstance();

//Bind a `myHttp` method to the QueryList object
$ql->bind('myHttp',function ($url){
	// $this is the current QueryList object
    $html = file_get_contents($url);
    $this->setHtml($html);
    return $this;
});

// And then you can call by the name of the binding
$data = $ql->myHttp('https://toutiao.io')->find('h3 a')->texts();
print_r($data->all());

Or package to class, and then bind:

$ql->bind('myHttp',function ($url){
    return new MyHttp($this,$url);
});

Plugin used

Use the PhantomJS plugin to crawl JavaScript dynamically rendered pages:

// Set the PhantomJS binary file path during installation
$ql = QueryList::use(PhantomJs::class,'/usr/local/bin/phantomjs');

// Crawl「500px」all picture links
$data = $ql->browser('https://500px.com/editors')->find('img')->attrs('src');
print_r($data->all());

// Use the HTTP proxy
$ql->browser('https://500px.com/editors',false,[
	'--proxy' => '192.168.1.42:8080',
    '--proxy-type' => 'http'
])

Using the CURL multithreading plug-in, multi-threaded crawling GitHub trending :

$ql = QueryList::use(CurlMulti::class);
$ql->curlMulti([
    'https://github.com/trending/php',
    'https://github.com/trending/go',
    //.....more urls
])
 // Called if task is success
 ->success(function (QueryList $ql,CurlMulti $curl,$r){
    echo "Current url:{$r['info']['url']} \r\n";
    $data = $ql->find('h3 a')->texts();
    print_r($data->all());
})
 // Task fail callback
->error(function ($errorInfo,CurlMulti $curl){
    echo "Current url:{$errorInfo['info']['url']} \r\n";
    print_r($errorInfo['error']);
})
->start([
	// Maximum number of threads
    'maxThread' => 10,
    // Number of error retries
    'maxTry' => 3,
]);

Plugins

jae-jae/QueryList-PhantomJS:Use PhantomJS to crawl Javascript dynamically rendered page.
jae-jae/QueryList-CurlMulti : Curl multi threading.
jae-jae/QueryList-AbsoluteUrl : Converting relative urls to absolute.
jae-jae/QueryList-Rule-Google : Google searcher.
jae-jae/QueryList-Rule-Baidu : Baidu searcher.

View more QueryList plugins and QueryList-based products: QueryList Community

Contributing

Welcome to contribute code for the QueryList。About Contributing Plugins can be viewed:QueryList Plugin Contributing Guide

Author

Jaeger [email protected]

If this library is useful for you, say thanks buying me a beer 🍺 !

Lisence

QueryList is licensed under the license of MIT. See the LICENSE for more details.

挂机采集时出现错误报告Document with ID '67ca6e7b6472494a20aff43a4739ab59' isn't loaded. Use phpQuery::newDocument($html) or phpQuery::newDocumentFile($file) first.请问是什么原因造成的？

Error 500: Internal Server Error { "message": "Argument 1 passed to QL\Dom\Query::handleData() must be an instance of Tightenco\Collect\Support\Collection, instance of Illuminate\Support\Collection given, called in /home/vagrant/code/yishang/vendor/jaeger/querylist/src/Dom/Query.php on line 142", "status_code": 500 }

代码 $url = 'https://it.ithome.com/ityejie/'; // 元数据采集规则 $rules = [ // 采集文章标题 'title' => ['h2>a','text'], // 采集链接 'link' => ['h2>a','href'], // 采集缩略图 'img' => ['.list_thumbnail>img','src'], // 采集文档简介 'desc' => ['.memo','text'] ]; // 切片选择器 $range = '.content li'; $rt = QueryList::get($url)->rules($rules)->range($range)->query()->getData(); print_r($rt->all());die();

之前也遇到过一些在高版本laravel上出现一些莫名其妙的结果，还以为自己的代码写得有问题。今天试了下laravel7，结果遇到更加莫名其妙的问题。获取列表时，如果是href，只能获取到第一条数据。如果有text，可以全部获取到，但是被全部合并成一条数据了。 laravel7框架执行结果： laravel5.8框架r执行结果：后来我只好把代码拷贝到laravel5.8的框架下，执行后得到的数据是正常的。这个框架其实本来还挺好用的，但是遇到这种问题，心里挺烦的。

用这个框架开发效率还是挺高的。希望作者多多优化。谢谢！

$search_url = 'http://so.iqiyi.com/so/q_' . $keyword; $rules = [ //div[@class='mod_search_result']/div/ul/li[1]/h3[@class="result_title"] 'title' => ['div>h3','text'], //div[@class='mod_search_result']/div/ul/li[1]/a/img/@src 'image' => ['a>img','src'] ]; $range = '.mod_search_result>div>ul'; $data = QueryList::get($search_url)->rules($rules)->range($range)->query()->getData(); //打印结果 print_r($data->all());

phpQuery有个bug，那就是当HTML中有它无法识别的特殊字符时，HTML就会被截断，导致最终的采集结果不正确，此时可以尝试使用正则或其它方式获取到要采集的内容的HTML片段，把这个HTML片段传给QueryList，从而可以解决这种场景下的问题。

请问这个BUG会修复吗?特殊字符,表情会中断,对我的业务需求来说.问题比较大.

版本：3.1.2
php版本：5.3.27 处理：GB2312 转 UTF-8

报错截图： qq 20170906123551

不影响正常执行

相关代码：

private function _arrayConvertEncoding($arr, $toEncoding, $fromEncoding)
{
    eval('$arr = '.iconv($fromEncoding, $toEncoding.'//IGNORE', var_export($arr,TRUE)).';');
    return $arr;
}

composer 升级到新版4.2.8后， QueryList::get($cai_url)->rules([ 'title'=>array('h3','text'), 'link'=>array('h3>a','href') ])->query()->getData(); 只能得到：Array ( [title] => Angular - QueryList [link] => https://angular.io/api/core/QueryList )

没法得到多维数组了，回退到之前使用的4.0.1 就正常了

比如这里：http://xiaohua.zol.com.cn/lengxiaohua/34.html 我setHtml后，发现html的内容被破坏了，少了一个

好像是guzzlehttp的问题。 http://xiaohua.zol.com.cn/ 不管怎样都是乱码，后来只能临时用file_get_contents再加上手动转码。但是我现在希望写一个通用的采集器，如果总是会遇到特殊情况的话，就很难用这个框架实现了。。

太痛苦。有没有曾经研究过这个问题的老鸟们？怎样直接修改guzzlehttp来永久性解决乱码问题呢？

"title" => " 想念的星星不说话"，比如这里，标题被加了一个空格。这个空格很奇怪，我用trim居然不能去除，只能复制后用str_replace。增：也可以用$x['title'] = ltrim($v['title'][0],' ');这种方式去除这个字符。这个符号我复制到notepad++里面，也搞不清具体是什么符号？但实际上网页源码中被采集元素的开头是不带空格的。。原文是这样：class="tooltip">想念的星星不说话我的laravel版本是：5.8

{"exception":"[object] (ErrorException(code: 0): Array and string offset access syntax with curly braces is deprecated at /app/vendor/jaeger/phpquery-single/phpQuery.php:2170)

我在使用QueryList::rules($rules)->multiGet是总是会得到这个报错，但为什么使用try catch却无法捕获到它，这样会导致程序直接崩溃。 Fatal error: Uncaught TypeError: Argument 1 passed to QL\Services\MultiRequestService::QL\Services{closure}() must be an instance of GuzzleHttp\Exception\RequestException, instance of GuzzleHttp\Exception\ConnectException given in D:\phpstudy\WWW\wpblog\wp-content\plugins\seekhub-collector\vendor\jaeger\querylist\src\Services\MultiRequestService.php:56 Stack trace: #0 [internal function]: QL\Services\MultiRequestService->QL\Services{closure}(Object(GuzzleHttp\Exception\ConnectException), 7, Object(GuzzleHttp\Promise\Promise)) #1 D:\phpstudy\WWW\wpblog\wp-content\plugins\seekhub-collector\vendor\guzzlehttp\promises\src\EachPromise.php(192): call_user_func(Object(Closure), Object(GuzzleHttp\Exception\ConnectException), 7, Object(GuzzleHttp\Promise\Promise)) #2 D:\phpstudy\WWW\wpblog\wp-content\plugins\seekhub-collector\vendor\guzzlehttp\promises\src\Promise.php(204): GuzzleHttp\Promise\EachPromise->GuzzleHttp\Promise{closure}(Object(GuzzleHttp\Exception\ConnectException)) #3 D:\phpstudy\WWW\wpblog\wp-content\plugins\seek in D:\phpstudy\WWW\wpblog\wp-content\plugins\seekhub-collector\vendor\jaeger\querylist\src\Services\MultiRequestService.php on line 56

% composer require jaeger/querylist                          
Using version ^4.2 for jaeger/querylist
./composer.json has been updated
Running composer update jaeger/querylist
Loading composer repositories with package information
Updating dependencies
Your requirements could not be resolved to an installable set of packages.

  Problem 1
    - jaeger/querylist[V4.2.0, ..., V4.2.8] require jaeger/g-http ^1.1 -> satisfiable by jaeger/g-http[V1.1, ..., V1.7.2].
    - jaeger/g-http V1.7.2 requires cache/filesystem-adapter ^1 -> satisfiable by cache/filesystem-adapter[1.0.0, 1.1.0, 1.1.x-dev (alias of dev-master), 1.2.0].
    - jaeger/g-http[V1.7.0, ..., V1.7.1] require cache/filesystem-adapter ^1.0 -> satisfiable by cache/filesystem-adapter[1.0.0, 1.1.0, 1.1.x-dev (alias of dev-master), 1.2.0].
    - cache/filesystem-adapter 1.1.x-dev is an alias of cache/filesystem-adapter dev-master and thus requires it to be installed too.
    - cache/filesystem-adapter[dev-master, 1.2.0] require psr/cache ^1.0 || ^2.0 -> found psr/cache[1.0.0, 1.0.1, 2.0.0] but the package is fixed to 3.0.0 (lock file version) by a partial update and that version does not match. Make sure you list it as an argument for the update command.
    - cache/filesystem-adapter 1.0.0 requires php ^5.6 || ^7.0 -> your php version (8.0.11) does not satisfy that requirement.
    - cache/filesystem-adapter 1.1.0 requires psr/cache ^1.0 -> found psr/cache[1.0.0, 1.0.1] but the package is fixed to 3.0.0 (lock file version) by a partial update and that version does not match. Make sure you list it as an argument for the update command.
    - jaeger/g-http[V1.1, ..., V1.6.0] require guzzlehttp/guzzle ^6.2 -> found guzzlehttp/guzzle[6.2.0, ..., 6.5.x-dev] but it conflicts with your root composer.json require (^7.0.1).
    - Root composer.json requires jaeger/querylist ^4.2 -> satisfiable by jaeger/querylist[V4.2.0, ..., V4.2.8].

Use the option --with-all-dependencies (-W) to allow upgrades, downgrades and removals for packages currently locked to specific versions.

Installation failed, reverting ./composer.json and ./composer.lock to their original content.

guzzle 冲突，依赖的 symfony/cache 和 psr/cache 版本冲突。

Added

rules add attributes:
- attr()
- attrs()

Source code(tar.gz)
Source code(zip)

Added

rules add attributes:
- texts: get the text of multiple elements
- htmls: get the html of multiple elements
- htmlOuter: get the element's outer html
- htmlOuters: get the outer html of multiple elements
destructDocuments()：destroy all documents
Elements class add htmlOuters() method

Changed

destruct(): will destroy the current object
range: when range is not set, the data structure returned changes
Elements::each(): callback function parameters changed

Source code(tar.gz)
Source code(zip)

Added

postJson(): Send POST JSON Request
multiGet(): Concurrent GET Request
multiPost(): Concurrent Post Request
pipe(): data flow pipeline method
Add HTTP Cache

Changed

Static calls are no longer in single mode

Source code(tar.gz)
Source code(zip)

Rewrite the entire framework
Have an expressive API
Fully composer, no longer support manual installation
The version of PHP must be larger than PHP7
More modular and easier to expand
Built in a powerful HTTP plug-in and code conversion plug-in
Have almost all the same API as the jQuery operation DOM

Source code(tar.gz)
Source code(zip)

错误报告Document with ID '67ca6e7b6472494a20aff43a4739ab59' isn't loaded. Use phpQuery::newDocument($html) or phpQuery::newDocumentFile($file) first.

挂机采集时出现错误报告Document with ID '67ca6e7b6472494a20aff43a4739ab59' isn't loaded. Use phpQuery::newDocument($html) or phpQuery::newDocumentFile($file) first.请问是什么原因造成的？

opened by windpursuer 13
更新了composer就一直报这个错，写法跟文档一样

Error 500: Internal Server Error { "message": "Argument 1 passed to QL\Dom\Query::handleData() must be an instance of Tightenco\Collect\Support\Collection, instance of Illuminate\Support\Collection given, called in /home/vagrant/code/yishang/vendor/jaeger/querylist/src/Dom/Query.php on line 142", "status_code": 500 }

代码 $url = 'https://it.ithome.com/ityejie/'; // 元数据采集规则 $rules = [ // 采集文章标题 'title' => ['h2>a','text'], // 采集链接 'link' => ['h2>a','href'], // 采集缩略图 'img' => ['.list_thumbnail>img','src'], // 采集文档简介 'desc' => ['.memo','text'] ]; // 切片选择器 $range = '.content li'; $rt = QueryList::get($url)->rules($rules)->range($range)->query()->getData(); print_r($rt->all());die();

opened by smiaoO712 6
高版本laravel会出现很多莫名其妙的问题

之前也遇到过一些在高版本laravel上出现一些莫名其妙的结果，还以为自己的代码写得有问题。今天试了下laravel7，结果遇到更加莫名其妙的问题。获取列表时，如果是href，只能获取到第一条数据。如果有text，可以全部获取到，但是被全部合并成一条数据了。 laravel7框架执行结果： laravel5.8框架r执行结果：后来我只好把代码拷贝到laravel5.8的框架下，执行后得到的数据是正常的。这个框架其实本来还挺好用的，但是遇到这种问题，心里挺烦的。

用这个框架开发效率还是挺高的。希望作者多多优化。谢谢！

opened by aboutboy 5
列表采集数据不对

$search_url = 'http://so.iqiyi.com/so/q_' . $keyword; $rules = [ //div[@class='mod_search_result']/div/ul/li[1]/h3[@class="result_title"] 'title' => ['div>h3','text'], //div[@class='mod_search_result']/div/ul/li[1]/a/img/@src 'image' => ['a>img','src'] ]; $range = '.mod_search_result>div>ul'; $data = QueryList::get($search_url)->rules($rules)->range($range)->query()->getData(); //打印结果 print_r($data->all());

opened by navysummer 5
phpQuery有个bug，那就是当HTML中有它无法识别的特殊字符时，HTML就会被截断，导致最终的采集结果不正确

phpQuery有个bug，那就是当HTML中有它无法识别的特殊字符时，HTML就会被截断，导致最终的采集结果不正确，此时可以尝试使用正则或其它方式获取到要采集的内容的HTML片段，把这个HTML片段传给QueryList，从而可以解决这种场景下的问题。

请问这个BUG会修复吗?特殊字符,表情会中断,对我的业务需求来说.问题比较大.

opened by ghost 4

转换编码格式报错

opened by storyflow 4

4.2.8 版本没法用啊

composer 升级到新版4.2.8后， QueryList::get($cai_url)->rules([ 'title'=>array('h3','text'), 'link'=>array('h3>a','href') ])->query()->getData(); 只能得到：Array ( [title] => Angular - QueryList [link] => https://angular.io/api/core/QueryList )

没法得到多维数组了，回退到之前使用的4.0.1 就正常了

opened by cokyhe 3
在某些情况下，html(setHtml)会有问题

比如这里：http://xiaohua.zol.com.cn/lengxiaohua/34.html 我setHtml后，发现html的内容被破坏了，少了一个

V4.2.8(Jul 5, 2021)

Source code(tar.gz)
Source code(zip)
V4.2.5(Apr 3, 2020)
Added

rules add attributes:

attr()

attrs()

Source code(tar.gz)
Source code(zip)
V4.2.0(Mar 20, 2020)
Added

rules add attributes:

texts: get the text of multiple elements

htmls: get the html of multiple elements

htmlOuter: get the element's outer html

htmlOuters: get the outer html of multiple elements

destructDocuments()：destroy all documents

Elements class add htmlOuters() method

Changed

destruct(): will destroy the current object

range: when range is not set, the data structure returned changes

Elements::each(): callback function parameters changed

Source code(tar.gz)
Source code(zip)
V4.1.0(Dec 17, 2018)
Added

postJson(): Send POST JSON Request

multiGet(): Concurrent GET Request

multiPost(): Concurrent Post Request

pipe(): data flow pipeline method

Add HTTP Cache

Changed

Static calls are no longer in single mode

Source code(tar.gz)
Source code(zip)
V4.0.1(Dec 6, 2017)
Rewrite the entire framework

Have an expressive API

Fully composer, no longer support manual installation

The version of PHP must be larger than PHP7

More modular and easier to expand

Built in a powerful HTTP plug-in and code conversion plug-in

Have almost all the same API as the jQuery operation DOM

Source code(tar.gz)
Source code(zip)
V3.1(Dec 28, 2015)

use composer
Source code(tar.gz)
Source code(zip)

:spider: The progressive PHP crawler framework! 优雅的渐进式PHP采集框架。

Related tags

Overview

QueryList

Features

Requirements

Installation

Usage

DOM Traversal and Manipulation

List crawl

Encode convert

HTTP Client (GuzzleHttp)

Submit forms

Bind function extension

Plugin used

Plugins

Contributing

Author

Lisence

Comments

Releases(V4.2.8)

V4.2.8(Jul 5, 2021)

V4.2.5(Apr 3, 2020)

Added

V4.2.0(Mar 20, 2020)

Added

Changed

V4.1.0(Dec 17, 2018)

Added

Changed

V4.0.1(Dec 6, 2017)

V3.1(Dec 28, 2015)

Owner

Jaeger(黄杰)

On-Page SEO Crawler Tool with Interface

Packagist crawler

Library for Rapid (Web) Crawler and Scraper Development

Extractor (scraper, crawler, parser) of products from Allegro

Crawlzone is a fast asynchronous internet crawling framework aiming to provide open source web scraping and testing solution.

PHP Scraper - an highly opinionated web-interface for PHP

Goutte, a simple PHP Web Scraper

A browser testing and web crawling library for PHP and Symfony

🕷 CrawlerDetect is a PHP class for detecting bots/crawlers/spiders via the user agent

Goutte, a simple PHP Web Scraper

PHP Discord Webcrawler to log all messages from a Discord Chat.

PHP DOM Manipulation toolkit.

This Project is for digikala.com scrapping challenge of 2021 blackfriday using php/laravel/horizon

Beanbun 是用 PHP 编写的多进程网络爬虫框架，具有良好的开放性、高可扩展性，基于 Workerman

Symfony bundle for Roach PHP

PHP scraper for ZEE5 Live Streaming URL's Using The Channel ID and Direct Play Anywhere

PHP library to Scrape website into entity easily

Roach is a complete web scraping toolkit for PHP

🔥High Performance PHP Progressive Framework.

Framework for building extensible server-side progressive applications for modern PHP.