Simple and fast HTML parser

Overview

DiDOM

README in Russian

DiDOM - simple and fast HTML parser.

Installation

To install DiDOM run the command:

composer require imangazaliev/didom

Quick start

use DiDom\Document;

$document = new Document('http://www.news.com/', true);

$posts = $document->find('.post');

foreach($posts as $post) {
    echo $post->text(), "\n";
}

Creating new document

DiDOM can load HTML in several ways:

With constructor
// the first parameter is a string with HTML
$document = new Document($html);

// file path
$document = new Document('page.html', true);

// or URL
$document = new Document('http://www.example.com/', true);

The second parameter specifies whether the first parameter is a file path or URL to load rather than an HTML string. The default is false.

Signature:

__construct($string = null, $isFile = false, $encoding = 'UTF-8', $type = Document::TYPE_HTML)

$string - an HTML or XML string or a file path.

$isFile - indicates that the first parameter is a path to a file.

$encoding - the document encoding.

$type - the document type (HTML - Document::TYPE_HTML, XML - Document::TYPE_XML).

With separate methods
$document = new Document();

$document->loadHtml($html);

$document->loadHtmlFile('page.html');

$document->loadHtmlFile('http://www.example.com/');

There are two methods available for loading XML: loadXml and loadXmlFile.

These methods accept additional options:

$document->loadHtml($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$document->loadHtmlFile($url, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$document->loadXml($xml, LIBXML_PARSEHUGE);
$document->loadXmlFile($url, LIBXML_PARSEHUGE);

Search for elements

DiDOM accepts a CSS selector or an XPath expression for searching. Pass the expression as the first parameter and specify its type in the second one (the default type is Query::TYPE_CSS):

With method find():
use DiDom\Document;
use DiDom\Query;

...

// CSS selector
$posts = $document->find('.post');

// XPath
$posts = $document->find("//div[contains(@class, 'post')]", Query::TYPE_XPATH);

If elements matching the expression are found, the method returns an array of DiDom\Element instances; otherwise it returns an empty array. To get an array of DOMElement objects instead, pass false as the third parameter.

With magic method __invoke():
$posts = $document('.post');

Warning: using this method is undesirable because it may be removed in the future.

With method xpath():
$posts = $document->xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]");
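
The longer XPath expression above matches post as a whole class token, whereas contains(@class, 'post') also matches class names such as postscript. A minimal sketch of the difference, using PHP's built-in DOM extension rather than DiDOM (the sample markup is made up for illustration):

```php
<?php
// Illustration with PHP's built-in DOM extension, not DiDOM itself.
// The markup below is invented sample data.
$html = '<div class="post">A</div>'
      . '<div class="postscript">B</div>'
      . '<div class="featured post">C</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring test: also matches "postscript".
$looseCount = $xpath->query("//div[contains(@class, 'post')]")->length;

// Token test: pads the normalized class list with spaces and looks for " post ".
$exactCount = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' post ')]"
)->length;

echo "substring match: {$looseCount}, token match: {$exactCount}\n";
```

The token test is the safer choice whenever one class name can be a prefix or suffix of another.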

You can also search inside an element:

echo $document->find('nav')[0]->first('ul.menu')->xpath('//li')[0]->text();

Verify if element exists

To check whether an element exists, use the has() method:

if ($document->has('.post')) {
    // code
}

If you need to check whether an element exists and then get it, you can write:

if ($document->has('.post')) {
    $elements = $document->find('.post');
    // code
}

but it would be faster like this:

if (count($elements = $document->find('.post')) > 0) {
    // code
}

because in the first case it makes two queries.

Search in element

Methods find(), first(), xpath(), has(), count() are available in Element too.

Example:

echo $document->find('nav')[0]->first('ul.menu')->xpath('//li')[0]->text();

Method findInDocument()

If you change, replace, or remove an element that was found inside another element, the document will not be changed. This happens because the find() method of the Element class (and, accordingly, the first() and xpath() methods) creates a new document to search in.

To search for elements in the source document, you must use the methods findInDocument() and firstInDocument():

// nothing will happen
$document->first('head')->first('title')->remove();

// but this will do
$document->first('head')->firstInDocument('title')->remove();

Warning: the findInDocument() and firstInDocument() methods work only for elements that belong to a document and for elements created via new Element(...). If an element does not belong to a document, a LogicException is thrown.
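
The effect described in this section can be reproduced with PHP's built-in DOM classes. The sketch below only mimics the behaviour (searching in a copy versus searching in the source document); it is an assumption-based illustration, not DiDOM's actual implementation:

```php
<?php
// Mimics "search in a copy" vs "search in the source document"
// with PHP's built-in DOM extension. Not DiDOM's real code.
$source = new DOMDocument();
$source->loadHTML('<html><head><title>Hello</title></head><body></body></html>');
$head = $source->getElementsByTagName('head')->item(0);

// 1) Copy <head> into a new document and remove <title> there.
$copy = new DOMDocument();
$copy->appendChild($copy->importNode($head, true));
$copiedTitle = $copy->getElementsByTagName('title')->item(0);
$copiedTitle->parentNode->removeChild($copiedTitle);

// The source document still contains its <title>.
$titlesAfterCopyRemoval = $source->getElementsByTagName('title')->length;

// 2) Search relative to <head> inside the source document and remove the match.
$xpath = new DOMXPath($source);
$title = $xpath->query('.//title', $head)->item(0);
$title->parentNode->removeChild($title);

// Now the <title> is gone from the source document.
$titlesAfterSourceRemoval = $source->getElementsByTagName('title')->length;

echo "after copy removal: {$titlesAfterCopyRemoval}, ",
     "after source removal: {$titlesAfterSourceRemoval}\n";
```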

Supported selectors

DiDom supports search by:

  • tag
  • class, ID, name and value of an attribute
  • pseudo-classes:
    • first-, last-, nth-child
    • empty and not-empty
    • contains
    • has
// all links
$document->find('a');

// any element with id = "foo" and "bar" class
$document->find('#foo.bar');

// any element with attribute "name"
$document->find('[name]');
// the same as
$document->find('*[name]');

// input field with the name "foo"
$document->find('input[name=foo]');
$document->find('input[name=\'bar\']');
$document->find('input[name="baz"]');

// any element that has an attribute starting with "data-" and the value "foo"
$document->find('*[^data-=foo]');

// all links starting with https
$document->find('a[href^=https]');

// all images with the extension png
$document->find('img[src$=png]');

// all links containing the string "example.com"
$document->find('a[href*=example.com]');

// text of the links with "foo" class
$document->find('a.foo::text');

// address (href) and title of all the links with "bar" class
$document->find('a.bar::attr(href|title)');
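
The attribute operators used above (^=, $=, *=) correspond to prefix, suffix and substring tests. As a rough illustration, the sketch below writes comparable XPath 1.0 expressions by hand and runs them with PHP's built-in DOM extension. The markup is made up, and the expressions are assumptions for illustration, not necessarily what DiDOM generates internally:

```php
<?php
// Hand-written XPath 1.0 counterparts of three CSS attribute selectors.
// Uses PHP's built-in DOM extension, not DiDOM; the markup is sample data.
$html = '<a href="https://example.com/a">1</a>'
      . '<a href="http://example.com/b">2</a>'
      . '<a href="https://other.org/">3</a>'
      . '<img src="logo.png"><img src="photo.jpg">';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// a[href^=https]: the value starts with "https"
$httpsLinks = $xpath->query("//a[starts-with(@href, 'https')]")->length;

// a[href*=example.com]: the value contains "example.com"
$exampleLinks = $xpath->query("//a[contains(@href, 'example.com')]")->length;

// img[src$=png]: XPath 1.0 has no ends-with(), so compare the tail of the value
$pngImages = $xpath->query(
    "//img[substring(@src, string-length(@src) - string-length('png') + 1) = 'png']"
)->length;

echo "https: {$httpsLinks}, example.com: {$exampleLinks}, png: {$pngImages}\n";
```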

Output

Getting HTML

With method html():
$posts = $document->find('.post');

echo $posts[0]->html();
Casting to string:
$html = (string) $posts[0];
Formatting HTML output
$html = $document->format()->html();

An element does not have format() method, so if you need to output formatted HTML of the element, then first you have to convert it to a document:

$html = $element->toDocument()->format()->html();

Inner HTML

$innerHtml = $element->innerHtml();

Document does not have the method innerHtml(), therefore, if you need to get inner HTML of a document, convert it into an element first:

$innerHtml = $document->toElement()->innerHtml();

Getting XML

echo $document->xml();

echo $document->first('book')->xml();

Getting content

$posts = $document->find('.post');

echo $posts[0]->text();

Creating a new element

Creating an instance of the class

use DiDom\Element;

$element = new Element('span', 'Hello');

// Outputs "<span>Hello</span>"
echo $element->html();

The first parameter is the name of the element (the tag name), the second is its value (optional), and the third is an array of element attributes (optional).

An example of creating an element with attributes:

$attributes = ['name' => 'description', 'placeholder' => 'Enter description of item'];

$element = new Element('textarea', 'Text', $attributes);

An element can be created from an instance of the class DOMElement:

use DiDom\Element;
use DOMElement;

$domElement = new DOMElement('span', 'Hello');

$element = new Element($domElement);

Using the method createElement

$document = new Document($html);

$element = $document->createElement('span', 'Hello');

Getting the name of an element

$element->tag;

Getting parent element

$document = new Document($html);

$input = $document->find('input[name=email]')[0];

var_dump($input->parent());

Getting sibling elements

$document = new Document($html);

$item = $document->find('ul.menu > li')[1];

var_dump($item->previousSibling());

var_dump($item->nextSibling());

Getting the child elements

$html = '<div>Foo<span>Bar</span><!--Baz--></div>';

$document = new Document($html);

$div = $document->first('div');

// element node (DOMElement)
// string(3) "Bar"
var_dump($div->child(1)->text());

// text node (DOMText)
// string(3) "Foo"
var_dump($div->firstChild()->text());

// comment node (DOMComment)
// string(3) "Baz"
var_dump($div->lastChild()->text());

// array(3) { ... }
var_dump($div->children());

Getting document

$document = new Document($html);

$element = $document->find('input[name=email]')[0];

$document2 = $element->getDocument();

// bool(true)
var_dump($document->is($document2));

Working with element attributes

Creating/updating an attribute

With method setAttribute:
$element->setAttribute('name', 'username');
With method attr:
$element->attr('name', 'username');
With magic method __set:
$element->name = 'username';

Getting value of an attribute

With method getAttribute:
$username = $element->getAttribute('value');
With method attr:
$username = $element->attr('value');
With magic method __get:
$username = $element->name;

Returns null if attribute is not found.

Verify if attribute exists

With method hasAttribute:
if ($element->hasAttribute('name')) {
    // code
}
With magic method __isset:
if (isset($element->name)) {
    // code
}

Removing attribute:

With method removeAttribute:
$element->removeAttribute('name');
With magic method __unset:
unset($element->name);

Comparing elements

$element  = new Element('span', 'hello');
$element2 = new Element('span', 'hello');

// bool(true)
var_dump($element->is($element));

// bool(false)
var_dump($element->is($element2));

Appending child elements

$list = new Element('ul');

$item = new Element('li', 'Item 1');

$list->appendChild($item);

$items = [
    new Element('li', 'Item 2'),
    new Element('li', 'Item 3'),
];

$list->appendChild($items);

Adding a child element

$list = new Element('ul');

$item = new Element('li', 'Item 1');
$items = [
    new Element('li', 'Item 2'),
    new Element('li', 'Item 3'),
];

$list->appendChild($item);
$list->appendChild($items);

Replacing element

$element = new Element('span', 'hello');

$document->find('.post')[0]->replace($element);

Warning: you can replace only those elements that were found directly in the document:

// nothing will happen
$document->first('head')->first('title')->replace($title);

// but this will do
$document->first('head title')->replace($title);

More about this in section Search for elements.

Removing element

$document->find('.post')[0]->remove();

Warning: you can remove only those elements that were found directly in the document:

// nothing will happen
$document->first('head')->first('title')->remove();

// but this will do
$document->first('head title')->remove();

More about this in section Search for elements.

Working with cache

The cache is an array of XPath expressions that were converted from CSS selectors.

Getting from cache

use DiDom\Query;

...

$xpath    = Query::compile('h2');
$compiled = Query::getCompiled();

// array('h2' => '//h2')
var_dump($compiled);

Cache setting

Query::setCompiled(['h2' => '//h2']);
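
To make the idea of a selector cache concrete, here is a toy sketch of the memoization pattern. The class SelectorCache, its conversions counter and its tag-only converter are hypothetical and exist only for this illustration; this is not DiDOM's Query class:

```php
<?php
// Hypothetical sketch of a CSS-to-XPath cache. NOT DiDOM's Query class.
final class SelectorCache
{
    /** @var array<string, string> CSS selector => XPath expression */
    private static array $compiled = [];

    /** Number of real conversions performed (for demonstration only). */
    public static int $conversions = 0;

    public static function compile(string $selector): string
    {
        // Convert only when the selector has not been seen before.
        if (!isset(self::$compiled[$selector])) {
            self::$compiled[$selector] = self::convert($selector);
        }

        return self::$compiled[$selector];
    }

    public static function getCompiled(): array
    {
        return self::$compiled;
    }

    public static function setCompiled(array $compiled): void
    {
        self::$compiled = $compiled;
    }

    private static function convert(string $selector): string
    {
        self::$conversions++;

        // Toy converter: supports plain tag names only.
        if (!preg_match('/^[a-z][a-z0-9]*$/i', $selector)) {
            throw new InvalidArgumentException("Unsupported selector: {$selector}");
        }

        return '//' . $selector;
    }
}

$first  = SelectorCache::compile('h2');
$second = SelectorCache::compile('h2'); // served from the cache

echo $first, ' (conversions: ', SelectorCache::$conversions, ")\n";
```

Pre-filling such a cache with setCompiled() lets a long-running process skip conversions for selectors it already knows.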

Miscellaneous

preserveWhiteSpace

By default, whitespace preserving is disabled.

You can enable the preserveWhiteSpace option before loading the document:

$document = new Document();

$document->preserveWhiteSpace();

$document->loadXml($xml);
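
PHP's built-in DOMDocument has a property with the same name. The sketch below shows what that property changes when XML is loaded; that DiDOM's option has the same effect is an assumption here, and the helper countChildNodes() and the sample XML are made up for this illustration:

```php
<?php
// Effect of preserveWhiteSpace in PHP's built-in DOMDocument.
// countChildNodes() is a hypothetical helper for this demo.
function countChildNodes(string $xml, bool $preserve): int
{
    $dom = new DOMDocument();
    // Must be set before loading: it controls how the parser treats blank text.
    $dom->preserveWhiteSpace = $preserve;
    $dom->loadXML($xml);

    return $dom->documentElement->childNodes->length;
}

$xml = "<list>\n  <item>Foo</item>\n  <item>Bar</item>\n</list>";

// Whitespace-only text nodes between the <item> elements are dropped.
$withoutWhitespace = countChildNodes($xml, false);

// Whitespace-only text nodes are kept as children of <list>.
$withWhitespace = countChildNodes($xml, true);

echo "disabled: {$withoutWhitespace}, enabled: {$withWhitespace}\n";
```

This matters for methods such as firstChild() or nextSibling(), which may return a whitespace text node instead of the element you expect.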

count

The count() method counts the elements that match the selector:

// prints the number of links in the document
echo $document->count('a');
// prints the number of items in the list
echo $document->first('ul')->count('li');

matches

Returns true if the node matches the selector:

$element->matches('div#content');

// strict match
// returns true if the element is a div with the id "content" and nothing else
// if the element has any other attributes the method returns false
$element->matches('div#content', true);

isElementNode

Checks whether a node is an element (DOMElement):

$element->isElementNode();

isTextNode

Checks whether an element is a text node (DOMText):

$element->isTextNode();

isCommentNode

Checks whether the element is a comment (DOMComment):

$element->isCommentNode();
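
These three checks mirror the node classes of PHP's DOM extension. A minimal sketch, using the built-in DOM classes directly rather than DiDOM, that classifies the children of the same div markup used in the child elements example:

```php
<?php
// Classifies the children of a <div> with PHP's built-in DOM extension.
$dom = new DOMDocument();
$dom->loadHTML('<div>Foo<span>Bar</span><!--Baz--></div>');

$div = $dom->getElementsByTagName('div')->item(0);

$kinds = [];
foreach ($div->childNodes as $node) {
    if ($node instanceof DOMElement) {
        $kinds[] = 'element';
    } elseif ($node instanceof DOMComment) {
        $kinds[] = 'comment';
    } elseif ($node instanceof DOMText) {
        $kinds[] = 'text';
    }
}

echo implode(', ', $kinds), "\n";
```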

Comparison with other parsers

Comments
  • How do I insert an element that is not a child?

    There is a list of paragraphs:

    <p>... <p>
    <p>...<p>
    ..

    After the N-th paragraph I need to insert an element (div) with the given content and save the result:

    $dom = new \DiDom\Document($text);
    $child = new \DiDom\Element('div', '[inread=100]');
    $selector = $dom->find('p:nth-child(4)')[0]->appendChild($child);
    return $dom->html();

    This creates the structure:

    <p>... <div>[inread=100]</div></p>

    But I need:

    <p>...</p>
    <div>[inread=100]</div>

    I found no hints in the README.

    Is it possible to do what I need?

    opened by KarelWintersky 19
  • Cannot Modify The Element From Document

    Sample Code:

    $document = new Document('<div id="content"><div class="a">b</div><div class="a">c</div></div>');
    $contentDom = $document->find('#content')[0];
    $contentDom->find('.a')[0]->remove();
    echo $contentDom->innerHtml();
    

    Expected Result:

    <div class="a">c</div>
    

    Result:

    <div class="a">b</div><div class="a">c</div>
    
    opened by shtse8 14
  • DiDom\Document::load expects parameter 4 to be integer, string given

    Hello,

    I developed a plugin for WordPress using DiDom as its DOM parser. It reads the content of the generated HTML page and modifies some parts of it.

    It works great, but for some people, it doesn't, and it is always the same error message: DiDom\Document::load expects parameter 4 to be integer, string given.

    It's very hard for me to propose a solution to those users, even for me, this error message is quite cryptic. So basically, would it be possible to enhance this error message, to understand exactly what is wrong? This error is internal to DiDom, so it would be good to know what is wrong with the input, basically that the HTML markup is wrong or something else. Right now I have no idea except trying another parser, but since DiDom works fast and nicely, I would rather spend a bit more time here understanding how to go around this issue.

    Thanks a lot :)

    opened by jordymeow 11
  • file_get_contents(https://.....com): failed to open stream !!

    Hey, I'm trying to parse this URL 'eloquentbyexample.com' and it's sucks with this exception: file_get_contents(https://eloquentbyexample.com): failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found ' in /..../vendor/imangazaliev/didom/src/DiDom/Document.php:252. Have an idea to solve this? thx!

    opened by atefBB 11
  • Multiple Elements

    Hi,

    I looking to try and pull down images in SVG and also other elements from the page and loop through them. How do i go about this?

        $names = ['a', 'b', 'c', 'd'];
    
        foreach ($names as $name) {
            $name = $name;
        }
    
        $document = new Document('http://www.website.com/link/' . $name, true);
    
        $posts = $document->find('.classname');
        // $title = $document->find('.title'); ????????
        // Does it need to go here e.g $icon = $document->find('img[src$=svg]');
    
        foreach($posts as $post) {
            echo $post->text(), "\n";
            // How do i loop through the images & titles?
        }
    

    Any help is much appreciated.

    Thanks Jake.

    opened by JakeHenshall 11
  • lastChild(), nextSibling() and previousSibling() return nothing

    Here is your example:

    $html = '
    <ul>
        <li>Foo</li>
        <li>Bar</li>
        <li>Baz</li>
    </ul>
    ';
    $document = new Document($html);
    $list = $document->first('ul');
    // string(3) "Baz"
    echo '<br><b>$list->child(2)->text():</b><br>'.highlight_string(print_r($list->child(2)->text(), true), true).'<br>';
    // string(3) "Foo"
    echo '<br><b>$list->firstChild()->text():</b><br>'.highlight_string(print_r($list->firstChild()->text(), true), true).'<br>';
    // string(3) "Baz" - nothing is returned
    echo '<br><b>$list->lastChild()->text():</b><br>'.highlight_string(print_r($list->lastChild()->text(), true), true).'<br>';
    
    $document = new Document($html);
    $item = $document->find('ul > li')[1];
    echo '<br><b>$item->previousSibling():</b><br>'.highlight_string(print_r($item->previousSibling()->text(), true), true).'<br>';
    echo '<br><b>$item->nextSibling():</b><br>'.highlight_string(print_r($item->nextSibling()->text(), true), true).'<br>';
    

    $list->lastChild()->text() returns nothing; nextSibling() and previousSibling() return nothing as well.

    opened by Grafs 9
  • Catchable fatal error

    $item = $document->find('ul.menu > li')[1];
    // previous element
    var_dump($item->previousSibling());
    
    Catchable fatal error: Argument 1 passed to DiDom\Element::setNode() must be an instance of DOMElement, instance of DOMText given, called in \vendor\imangazaliev\didom\src\DiDom\Element.php on line 32 and defined in \vendor\imangazaliev\didom\src\DiDom\Element.php on line 452
    
    opened by Grafs 9
  • Encoding issue

    HI

    I'm using your lib to parse some HTML pages. When the task is run as crontab job, everything is OK. But once I try to parse the same page via interactive action in browser, it parses the source page in wrong encoding. Any recommendations to fix it? Attaching an example of print_r of some piece of parsed page capture

    opened by Andrewkha 9
  • Can't set attribute in loop

    Code example:

    $html = new Document($file_name, true);
    foreach ( $html->find($selector)[0]->find('img') as $element ) {
        $element->src = self::embed($path);
    }
    

    Expected: <img src="'data:image/jpg;base64,base64_encode_output" /> Got old src attr value.

    opened by DarkPreacher 8
  • Select by ng-atrr

    There is the following list:

    <div class="panel-heading">
    	<select class="form-control"
    		ng-init="rId = 'H255RC51833'"
    		ng-model="rId">
    			<option value="H255RC51833">Полулюкс 2местный</option>
    			<option value="H255RC51834">Люкс 2местный</option>
    			<option value="H255RC51829">Стандартный 1местный</option>
    			<option value="H255RC51830">Стандартный 2местный</option>
    			<option value="H255RC51831">Стандартный Улучшенный 1 местный</option>
    			<option value="H255RC51832">Стандартный Улучшенный 2 местный</option>
    			<option value="H255RC51835">Коттедж 2 местный с 1 спальней</option>
    			<option value="H255RC51836">Коттедж 3 местный с 2 спальнями</option>
    			<option value="H255RC51837">Коттедж 5 местный с 3 спальнями</option>
    	</select>
    </div>
    

    whose unique attribute is ng-model="rId".

    I tried to select it in different ways, but nothing works.

    $document()->find('select.form-control option'); // this returns all selects on the page
    $document()->find('select[ng-model="rId"] option'); // selects nothing
    $document()->find('select::attr(ng-model=rId) option'); // selects all selects

    Could it be failing because the attribute name ng-model contains a hyphen?

    opened by loveorigami 6
  • I cant installed many times

    I was try installed on my XAMPP but always error

    Problem 1 - The requested package imangazaliev/didom No version set (parsed as 1.0.0) is satisfiable by imangazaliev/didom[No version set (parsed as 1.0.0)] but these conflict with your requirements or minimum-stability.

    Installation failed, reverting ./composer.json to its original content.

    can you help me how to fix it ?

    opened by HeriAzhar 6
  • trim() expects parameter 1 to be string, boolean given (0)

    [TypeError] trim() expects parameter 1 to be string, boolean given (0) /vendor/imangazaliev/didom/src/DiDom/Query.php:108 #0: trim(boolean) /vendor/imangazaliev/didom/src/DiDom/Query.php:108 #1: DiDom\Query::parseAndConvertSelector(string, string) /vendor/imangazaliev/didom/src/DiDom/Query.php:74 #2: DiDom\Query::cssToXpath(string) /vendor/imangazaliev/didom/src/DiDom/Query.php:53 #3: DiDom\Query::compile(string, string) /vendor/imangazaliev/didom/src/DiDom/Document.php:402 #4: DiDom\Document->find(string)

    opened by vaajnur 1
  • find does not work in CML files with Cyrillic tags

    For example, this selector does not work: $document->find("Номер1С"); Sample file for testing: https://dropmefiles.com/GpWXl Code:

    <КоммерческаяИнформация xmlns="urn:1C.ru:commerceml_3" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ВерсияСхемы="3.1" ДатаФормирования="2022-08-31T18:42:51" Ид="1"> <Контейнер> <Документ> <Ид>64749</Ид> <НомерВерсии>AAAAAAAkbfQ=</НомерВерсии> <ПометкаУдаления>false</ПометкаУдаления> <Номер>64749</Номер> <Номер1С>0000-064749</Номер1С> <Дата>2022-08-23</Дата> <Дата1С>2022-08-23</Дата1С> <Время>00:00:00</Время> <ХозОперация>Заказ товара</ХозОперация> </Документ> </Контейнер> </КоммерческаяИнформация>

    opened by lyrmin 0
  • Parsing a string inside an element

    Hi everyone, maybe someone can help. There is a string like this:

    Link1 и Link2

    How can I parse the string with find to get an array of objects like this: [ 0 => 'Link1', 1 => 'и', 2 => 'Link2' ]
    opened by staixe 0
  • How do I get only direct child elements (without a recursive search)? (an analogue of children(selector) from jQuery?)

    Imagine we have a table whose cells contain further tables.

    $table = new \DiDom\Element('table');
    
    $table->setInnerHtml('
    <tr>
        <td>Строка 1 Ячейка 1</td>
        <td>Строка 1 Ячейка 2</td>
        <td>
            <table>
                <tr>
                    <td>Субтаблица: Строка 1 Ячейка 1</td>
                    <td>Субтаблица: Строка 1 Ячейка 2</td>
                </tr>
                <tr>
                    <td>Субтаблица: Строка 2 Ячейка 1</td>
                    <td>Субтаблица: Строка 2 Ячейка 2</td>
                </tr>
            </table>
        </td>
    </tr>
    <tr>
        <td>Строка 2 Ячейка 1</td>
        <td>Строка 2 Ячейка 2</td>
    </tr>
    ');
    

    How can we get only the <td> elements of $table itself, but not those of its nested tables?

    The find('tr > td') method gives us all nested <td> elements:

    $tds = $table->find('tr > td');
    
    $result = [];
    
    foreach ($tds as $td) {
        $result[] = $td->text();
    }
    

    $result:

    Array
    (
        [0] => Строка 1 Ячейка 1
        [1] => Строка 1 Ячейка 2
        [2] => 
                    Субтаблица: Строка 1 Ячейка 1
                    Субтаблица: Строка 1 Ячейка 2
                
                    Субтаблица: Строка 2 Ячейка 1
                    Субтаблица: Строка 2 Ячейка 2
        
        [3] => Субтаблица: Строка 1 Ячейка 1
        [4] => Субтаблица: Строка 1 Ячейка 2
        [5] => Субтаблица: Строка 2 Ячейка 1
        [6] => Субтаблица: Строка 2 Ячейка 2
        [7] => Строка 2 Ячейка 1
        [8] => Строка 2 Ячейка 2
    )
    

    But what is needed:

    Array
    (
        [0] => Строка 1 Ячейка 1
        [1] => Строка 1 Ячейка 2
        [2] => 
                    Субтаблица: Строка 1 Ячейка 1
                    Субтаблица: Строка 1 Ячейка 2
                
                    Субтаблица: Строка 2 Ячейка 1
                    Субтаблица: Строка 2 Ячейка 2
    
        [3] => Строка 2 Ячейка 1
        [4] => Строка 2 Ячейка 2
    )
    
    opened by rusproject 0
Blackfire Player is a powerful Web Crawling, Web Testing, and Web Scraper application. It provides a nice DSL to crawl HTTP services, assert responses, and extract data from HTML/XML/JSON responses.

Blackfire Player Blackfire Player is a powerful Web Crawling, Web Testing, and Web Scraper application. It provides a nice DSL to crawl HTTP services,

Blackfire 485 Dec 31, 2022
Crawlzone is a fast asynchronous internet crawling framework aiming to provide open source web scraping and testing solution.

Crawlzone is a fast asynchronous internet crawling framework aiming to provide open source web scraping and testing solution. It can be used for a wide range of purposes, from extracting and indexing structured data to monitoring and automated testing. Available for PHP 7.3, 7.4, 8.0.

null 68 Dec 27, 2022
This script scrapes the HTML from different web pages to get the information from the video and you can use it in your own video player.

XVideos PornHub RedTube API This script scrapes the HTML from different web pages to get the information from the video and you can use it in your own

null 57 Dec 16, 2022
Goutte, a simple PHP Web Scraper

Goutte, a simple PHP Web Scraper Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extrac

null 9.1k Jan 1, 2023
A browser testing and web crawling library for PHP and Symfony

A browser testing and web scraping library for PHP and Symfony Panther is a convenient standalone library to scrape websites and to run end-to-end tes

Symfony 2.7k Dec 31, 2022
A configurable and extensible PHP web spider

Note on backwards compatibility break: since v0.5.0, Symfony EventDispatcher v3 is no longer supported and PHP Spider requires v4 or v5. If you are st

Matthijs van den Bos 1.3k Dec 28, 2022
Libraries and scripts for crawling the TYPO3 page tree. Used for re-caching, re-indexing, publishing applications etc.

Libraries and scripts for crawling the TYPO3 page tree. Used for re-caching, re-indexing, publishing applications etc.

AOE 0 Sep 14, 2021
A small example of crawling another website and extracting the required information from it to save the website wherever we need it

A small example of crawling another website and extracting the required information from it to save the website wherever we need it Description This s

Mohammad Qasemi 9 Sep 24, 2022
It can Scrap ZEE5 Live Streaming URL's Using The Channel ID and Direct Play Anywhere

It can Scrap ZEE5 Live Streaming URL's Using The Channel ID and Direct Play Anywhere

Techie Sneh 21 Nov 19, 2021
A program to scrape online web-content (APIs, RSS Feeds, or Websites) and notify if search term was hit.

s3n Search-Scan-Save-Notify A program to scrape online web-content (APIs, RSS Feeds, or Websites) and notify if search term was hit. It is based on PH

Aamer 11 Nov 8, 2022
PHP scraper for ZEE5 Live Streaming URL's Using The Channel ID and Direct Play Anywhere

It can scrape ZEE5 Live Streaming URL's Using The Channel ID and Direct Play Anywhere

null 1 Mar 24, 2022
Library for Rapid (Web) Crawler and Scraper Development

Library for Rapid (Web) Crawler and Scraper Development This package provides kind of a framework and a lot of ready to use, so-called steps, that you

crwlr.software 60 Nov 30, 2022
DBML parser for PHP8. It's a PHP parser for DBML syntax.

DBML parser written on PHP8 DBML (database markup language) is a simple, readable DSL language designed to define database structures. This page outli

Pavel Buchnev 32 Dec 29, 2022
📜 Modern Simple HTML DOM Parser for PHP

📜 Simple Html Dom Parser for PHP A HTML DOM parser written in PHP - let you manipulate HTML in a very easy way! This is a fork of PHP Simple HTML DOM

Lars Moelleken 665 Jan 4, 2023
php-crossplane - Reliable and fast NGINX configuration file parser and builder

php-crossplane Reliable and fast NGINX configuration file parser and builder ℹ️ This is a PHP port of the Nginx Python crossplane package which can be

null 19 Jun 30, 2022
Efficient, easy-to-use, and fast PHP JSON stream parser

JSON Machine Very easy to use and memory efficient drop-in replacement for inefficient iteration of big JSON files or streams for PHP 5.6+. See TL;DR.

Filip Halaxa 801 Dec 28, 2022
A super fast, highly extensible markdown parser for PHP

A super fast, highly extensible markdown parser for PHP What is this? A set of PHP classes, each representing a Markdown flavor, and a command line to

Carsten Brandt 989 Dec 16, 2022
Lightning Fast, Minimalist PHP User Agent String Parser.

Lightning Fast, Minimalist PHP User Agent String Parser.

Jesse Donat 523 Dec 21, 2022