📜 Modern Simple HTML DOM Parser for PHP

Overview

📜 Simple Html Dom Parser for PHP

An HTML DOM parser written in PHP that lets you manipulate HTML in a very easy way! This is a fork of the PHP Simple HTML DOM Parser project, but instead of string manipulation it uses DOMDocument and modern PHP classes such as "Symfony CssSelector".

  • PHP 7.0+ & 8.0 Support
  • PHP-FIG Standard
  • Composer & PSR-4 support
  • PHPUnit testing via Travis CI
  • PHP-Quality testing via SensioLabsInsight
  • UTF-8 Support (more support via "voku/portable-utf8")
  • Invalid HTML Support (partly ...)
  • Find tags on an HTML page with selectors just like jQuery
  • Extract contents from HTML in a single line
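
As a rough illustration of the DOMDocument-based approach mentioned above, the sketch below uses only PHP's built-in DOM extension. It is not the library's own code: the XPath expression is a hand-written equivalent of the CSS selector "#a", and the sample markup is made up for the example.

```php
<?php
// Sample markup invented for this sketch.
$doc = new DOMDocument();
$doc->loadHTML('<!DOCTYPE html><html><body><div id="a">an apple</div></body></html>');

$xpath = new DOMXPath($doc);

// Hand-written XPath equivalent of the CSS selector "#a".
$node = $xpath->query('//*[@id="a"]')->item(0);

$text = trim($node->textContent);
echo $text, PHP_EOL;
```

A CSS-to-XPath translation step like this is what lets jQuery-style selectors run against a real DOM tree rather than against raw strings.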

Install via "composer require"

composer require voku/simple_html_dom
composer require voku/portable-utf8 # if you need e.g. UTF-8 fixed output

Quick Start

use voku\helper\HtmlDomParser;

require_once 'vendor/autoload.php'; // Composer's default autoloader location

...
$dom = HtmlDomParser::str_get_html($str);
// or 
$dom = HtmlDomParser::file_get_html($file);

$element = $dom->findOne('#css-selector'); // "$element" === instance of "SimpleHtmlDomInterface"

$elements = $dom->findMulti('.css-selector'); // "$elements" === instance of SimpleHtmlDomNodeInterface<int, SimpleHtmlDomInterface>

$elementOrFalse = $dom->findOneOrFalse('#css-selector'); // "$elementOrFalse" === instance of "SimpleHtmlDomInterface" or false

$elementsOrFalse = $dom->findMultiOrFalse('.css-selector'); // "$elementsOrFalse" === instance of SimpleHtmlDomNodeInterface<int, SimpleHtmlDomInterface> or false
...

Examples

github.com/voku/simple_html_dom/tree/master/example

API

github.com/voku/simple_html_dom/tree/master/README_API.md

Support

For support and donations please visit Github | Issues | PayPal | Patreon.

For status updates and release announcements please visit Releases | Twitter | Patreon.

For professional support please contact me.

Thanks

  • Thanks to GitHub (Microsoft) for hosting the code and a good infrastructure including issue management, etc.
  • Thanks to IntelliJ as they make the best IDEs for PHP and they gave me an open source license for PhpStorm!
  • Thanks to Travis CI for being the most awesome, easiest continuous integration tool out there!
  • Thanks to StyleCI for the simple but powerful code style check.
  • Thanks to PHPStan && Psalm for really great static analysis tools and for discovering bugs in the code!

License

FOSSA Status

Comments
  • Wrong return value

    After update to latest version i am getting this error:

    Return value of voku\helper\HtmlDomParser::findOne() must be an instance of voku\helper\SimpleHtmlDom, instance of voku\helper\SimpleHtmlDomNodeBlank returned

    Probably a missed change in core?

    opened by DariusIII 7
  • Update phpunit/phpunit requirement from ~6.0 to ~6.0 || ~7.0

    Updates the requirements on phpunit/phpunit to permit the latest version.

    Changelog

    Sourced from phpunit/phpunit's changelog.

    7.5.1 - 2018-12-12

    Fixed

    • Fixed #3441: Call to undefined method DataProviderTestSuite::usesDataProvider()

    [7.5.0] - 2018-12-07

    Added

    • Implemented #3340: Added assertEqualsCanonicalizing(), assertEqualsIgnoringCase(), assertEqualsWithDelta(), assertNotEqualsCanonicalizing(), assertNotEqualsIgnoringCase(), and assertNotEqualsWithDelta() as alternatives to using assertEquals() and assertNotEquals() with the $delta, $canonicalize, or $ignoreCase parameters
    • Implemented #3368: Added assertIsArray(), assertIsBool(), assertIsFloat(), assertIsInt(), assertIsNumeric(), assertIsObject(), assertIsResource(), assertIsString(), assertIsScalar(), assertIsCallable(), assertIsIterable(), assertIsNotArray(), assertIsNotBool(), assertIsNotFloat(), assertIsNotInt(), assertIsNotNumeric(), assertIsNotObject(), assertIsNotResource(), assertIsNotString(), assertIsNotScalar(), assertIsNotCallable(), assertIsNotIterable() as alternatives to assertInternalType() and assertNotInternalType()
    • Implemented #3391: Added a TestHook that fires after each test, regardless of result
    • Implemented #3417: Refinements related to test suite sorting and TestDox result printer
    • Implemented #3422: Added assertStringContainsString(), assertStringContainsStringIgnoringCase(), assertStringNotContainsString(), and assertStringNotContainsStringIgnoringCase()

    Deprecated

    • The methods assertInternalType() and assertNotInternalType() are now deprecated. There is no behavioral change in this version of PHPUnit. Using these methods will trigger a deprecation warning in PHPUnit 8 and in PHPUnit 9 these methods will be removed.
    • The methods assertAttributeContains(), assertAttributeNotContains(), assertAttributeContainsOnly(), assertAttributeNotContainsOnly(), assertAttributeCount(), assertAttributeNotCount(), assertAttributeEquals(), assertAttributeNotEquals(), assertAttributeEmpty(), assertAttributeNotEmpty(), assertAttributeGreaterThan(), assertAttributeGreaterThanOrEqual(), assertAttributeLessThan(), assertAttributeLessThanOrEqual(), assertAttributeSame(), assertAttributeNotSame(), assertAttributeInstanceOf(), assertAttributeNotInstanceOf(), assertAttributeInternalType(), assertAttributeNotInternalType(), attributeEqualTo(), readAttribute(), getStaticAttribute(), and getObjectAttribute() are now deprecated. There is no behavioral change in this version of PHPUnit. Using these methods will trigger a deprecation warning in PHPUnit 8 and in PHPUnit 9 these methods will be removed.
    • The optional parameters $delta, $maxDepth, $canonicalize, and $ignoreCase of assertEquals() and assertNotEquals() are now deprecated. There is no behavioral change in this version of PHPUnit. Using these parameters will trigger a deprecation warning in PHPUnit 8 and in PHPUnit 9 these parameters will be removed.
    • The annotations @expectedException, @expectedExceptionCode, @expectedExceptionMessage, and @expectedExceptionMessageRegExp are now deprecated. There is no behavioral change in this version of PHPUnit. Using these annotations will trigger a deprecation warning in PHPUnit 8 and in PHPUnit 9 these annotations will be removed.
    • Using the methods assertContains() and assertNotContains() on string haystacks is now deprecated. There is no behavioral change in this version of PHPUnit. Using these methods on string haystacks will trigger a deprecation warning in PHPUnit 8 and in PHPUnit 9 these methods cannot be used on string haystacks anymore.
    • The optional parameters $ignoreCase, $checkForObjectIdentity, and $checkForNonObjectIdentity of assertContains() and assertNotContains() are now deprecated. There is no behavioral change in this version of PHPUnit. Using these parameters will trigger a deprecation warning in PHPUnit 8 and in PHPUnit 9 these parameters will be removed.

    Fixed

    • Fixed #3428: TestSuite setup failures are not logged correctly
    • Fixed #3429: Inefficient loop in getHookMethods()
    • Fixed #3437: JUnit logger skips PHPT tests

    [7.5.0]: https://github.com/sebastianbergmann/phpunit/compare/7.4.5...7.5.0

    Commits

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Note: This repo was added to Dependabot recently, so you'll receive a maximum of 5 PRs for your first few update runs. Once an update run creates fewer than 5 PRs we'll remove that limit.

    You can always request more updates by clicking Bump now in your Dependabot dashboard.

    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot ignore this [patch|minor|major] version will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language
    • @dependabot badge me will comment on this PR with code to add a "Dependabot enabled" badge to your readme

    Additionally, you can set the following in your Dependabot dashboard:

    • Update frequency (including time of day and day of week)
    • Automerge options (never/patch/minor, and dev/runtime dependencies)
    • Pull request limits (per update run and/or open at any time)
    • Out-of-range updates (receive only lockfile updates, if desired)
    • Security updates (receive only security updates, if desired)

    Finally, you can contact us by mentioning @dependabot.


    dependencies 
    opened by dependabot-preview[bot] 7
  • "Not valid HTML fragment!" on default WordPress theme HTML

    What is this feature about (expected vs actual behaviour)?

    I was using @patrickposner's simply-static WordPress plugin and came across a bug when generating static files through his plugin using the 'twentytwentytwo' theme. You can see the related issue here: https://github.com/patrickposner/simply-static/issues/27 @patrickposner believed it was an issue with the theme not producing valid HTML according to W3, but when you remove the <style id='wp-block-image-inline-css'>...</style> tag, the static file generation works. Also, the CSS inside of the style tag is valid, according to W3's CSS validator (you'll need to plug it in).

    How can I reproduce it?

    I created this repository with the bare minimum code that reproduces this bug: https://github.com/Inclushe/voku-simple-html-dom-style-bug test.html comes from https://twentytwentytwodemo.wordpress.com/

    It should show this error when run:

    Fatal error: Uncaught RuntimeException: Not valid HTML fragment!
    .wp-block-image%7Bmargin%3A001em%7D.wp-block-imageimg%7Bvertical-align%3Abottom%7D.wp-block-image%3Anot%28.is-style-rounded%29%3Ea%2C.wp-block-image%3Anot%28.is-style-rounded%29img%7Bborder-radius%3Ainherit%7D.wp-block-image.aligncenter%7Btext-align%3Acenter%7D.wp-block-image.alignfullimg%2C.wp-block-image.alignwideimg%7Bheight%3Aauto%3Bwidth%3A100%25%7D.wp-block-image.aligncenter%2C.wp-block-image.alignleft%2C.wp-block-image.alignright%7Bdisplay%3Atable%7D.wp-block-image.aligncenter%3Efigcaption%2C.wp-block-image.alignleft%3Efigcaption%2C.wp-block-image.alignright%3Efigcaption%7Bcaption-side%3Abottom%3Bdisplay%3Atable-caption%7D.wp-block-image.alignleft%7Bfloat%3Aleft%3Bmargin%3A.5em1em.5em0%7D.wp-block-image.alignright%7Bfloat%3Aright%3Bmargin%3A.5em0.5em1em%7D.wp-block-image.aligncenter%7Bmargin-left%3Aauto%3Bmargin-right%3Aauto%7D.wp-block-imagefigcaption%7Bmargin-bottom%3A1em%3Bmargin-top%3A.5em%7D.wp-block-image.is-style-circle-maskimg%2C.wp-block-imag in C:\Users\ejw98\Projects\voku-simple-html-dom-style-bug\vendor\voku\simple_html_dom\src\voku\helper\SimpleHtmlDom.php on line 196
    

    Removing the <style id='wp-block-image-inline-css'>...</style> tag from test.html and running the script again produces no errors.

    PHP Version: 7.4.27

    Does it take minutes, hours or days to fix?

    No clue.

    Any additional information?

    Nope.

    opened by Inclushe 6
  • Encoding problem while writing to database

    Hi, I'm trying to build a web scraper with this library in Laravel, and everything works great until I save the result to the database. There seems to be an encoding problem: for example, when HtmlDomParser reads "Léon" into $film['title'], "L&eacute;on" is saved in the database, but echo displays "Léon".

    Do you have any idea how to fix this problem?

    code snippet:

        $dom = HtmlDomParser::file_get_html('url');
        $film['title'] = $dom->find('selector', 0)->innertext;
        ...
        $film_db = new Movie_info;
        foreach ($film as $k => $v) {
            $film_db->$k = $v;
            echo $k .": ". $v ."<br>";
        }
        $film_db->save();
    

    my database settings:

    'mysql' => [
                'driver' => 'mysql',
                'url' => env('DATABASE_URL'),
                'host' => env('DB_HOST', '127.0.0.1'),
                'port' => env('DB_PORT', '3306'),
                'database' => env('DB_DATABASE', 'forge'),
                'username' => env('DB_USERNAME', 'forge'),
                'password' => env('DB_PASSWORD', ''),
                'unix_socket' => env('DB_SOCKET', ''),
                'charset' => 'utf8mb4',
                'collation' => 'utf8mb4_unicode_ci',
                'prefix' => '',
                'prefix_indexes' => true,
                'strict' => true,
                'engine' => null,
                'options' => extension_loaded('pdo_mysql') ? array_filter([
                    PDO::MYSQL_ATTR_SSL_CA => env('MYSQL_ATTR_SSL_CA'),
                ]) : [],
            ],
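
One possible explanation, offered as an assumption rather than a confirmed diagnosis: innertext may return the markup with HTML entities intact, which a browser renders as "Léon" when echoed but which is stored verbatim in the database. A minimal sketch using PHP's built-in html_entity_decode() follows; the $title value is a hypothetical stand-in for $film['title'].

```php
<?php
// Hypothetical stand-in for $film['title'] as read from the parsed page.
$title = 'L&eacute;on';

// Convert named and numeric entities back to UTF-8 characters before saving.
$decoded = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL;
```

Decoding once, just before the value is assigned to the model, keeps the stored text free of entities while the utf8mb4 connection settings stay as they are.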
    
    question 
    opened by stanislawsk 6
  • about find(). search someone out from html tag,it return blank

    find() cannot locate a tag that sits outside the html element.

    $html    = '<!DOCTYPE HTML>
    <html>
    <head>
        <title>title</title>
    </head>
    
    <body>
    <div id="a">
        an apple
    </div>
    </body>
    
    </html>
    <div id="b">
        body
    </div>';
    $domTree = \voku\helper\HtmlDomParser::str_get_html($html);
    var_dump($domTree->findOne('#a')->text()); // an apple
    
    var_dump($domTree->findOne('#b')->text()); // empty
    
    bug 
    opened by maoSting 6
  • Single first-level element is stripped

    What is this feature about (expected vs actual behaviour)?

    If the string contains only one parent element, the parser strips that element from the result.

    $d = new voku\helper\HtmlDomParser;
    
    $d->loadHtml("<p>p1</p><p>p2</p>");
    echo (string)$d; // <p>p1</p><p>p2</p> - correct
    
    $d->loadHtml("<div><p>p1</p></div>");
    echo (string)$d; // <p>p1</p> - incorrect
    

    I don't know if this is expected behavior, but it seems broken to me. Right now I have to manually wrap the initial HTML in a div element, and if the HTML contains several root elements, this div stays in the generated HTML.

    wait for response 
    opened by Thanty 6
  • How to obtain the div content of bid red in the screenshot

    I can't get it in the following ways: $html_meal->findOne('#tab_show_1 table')->nextSibling()->innertext

    opened by kl521516 6
  • Can't remove anchor tag from within p tag?

    What is this feature about (expected vs actual behaviour)?

    I don't know if I've misunderstood how to use this or whether this is a bug.

    Any additional information?

    I'm trying to parse this HTML to display only the content in the p tag, stripping the a tags out completely:

    <section class="section" id="section-47" data-period="#period1">
       <div class="content">
          <p>The quick brown fox
             <a class="noteRef commentary F" href="#c563672">F1</a>
             jumps over 
             <a class="noteRef commentary F" href="#c563672">F1</a> 
             the lazy
             <a class="noteRef commentary F" href="#c1844523">F5</a>
             dog.
          </p>
       </div>
    </section>
    

    This is what I have tried so far with no luck and by following the examples and API docs:

    $html = HtmlDomParser::file_get_html($url); // Gets full HTML - above is just a snippet.
    
    foreach ($html->find('section .content p') as $section) {
       foreach($section->find('p a') as $a) {
          $a->outertext = '';
       }
       $content[] = $section->save;
    }
    

    My result is a blank. I have tried following the example but adding parenthesis to the save $content[] = $section->save(); throws an error:

    BadMethodCallException
    Method does not exist
    

    I'm using Laravel.
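
For comparison, the same goal can be sketched with PHP's built-in DOMDocument and DOMXPath alone. This is an illustration under stated assumptions, not the library's API: the markup is a shortened, hypothetical version of the snippet above, and the XPath queries are hand-written.

```php
<?php
// Shortened, hypothetical version of the markup from the question.
$html = '<div class="content"><p>The quick brown fox <a class="noteRef" href="#c1">F1</a>'
    . ' jumps over <a class="noteRef" href="#c1">F1</a> the lazy'
    . ' <a class="noteRef" href="#c5">F5</a> dog.</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Snapshot the matched anchors into an array, then detach each one from the tree.
foreach (iterator_to_array($xpath->query('//p//a')) as $a) {
    $a->parentNode->removeChild($a);
}

$content = [];
foreach ($xpath->query('//p') as $p) {
    // Collapse the runs of whitespace left behind by the removed anchors.
    $content[] = trim(preg_replace('/\s+/', ' ', $p->textContent));
}

print_r($content);
```

Removing the nodes rather than blanking their outer text means the paragraph can be read back as plain text in one step.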

    opened by rgbaman 5
  • Added getTag() and removeAttributes() and a test for both

    In some cases the name of the tag is needed, and tools such as Psalm and PHPStan sometimes prefer a method over a property for that. Sometimes you also need to remove all attributes; a "removeAttributes" method is better than writing a foreach loop each time.

    I also added a method delete() that serves as an alias of $this->outertext="";


    opened by marioquartz 4
  • Get recommendation rating from sidebar in Yahoo Finance

    I am trying to scrape the recommendation rating that Yahoo Finance provides. Example page: https://finance.yahoo.com/quote/BYND/analysis?p=BYND The recommendation rating appears in the sidebar of the page. When I try to get $html->find('.YDC-Col2'), I get some JS code instead. When I inspect the value of $html->plaintext, I can see that the value I am looking for is under "recommendationMean".

    Hence, I want to know how I could get this value out of $html. Thank you so much in advance.
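
If the value only appears inside embedded JavaScript, one option is to pull it out of the text with a regular expression. The sketch below is a guess at the approach, not a tested scraper: the shape of the payload in $plaintext is an assumption made up for the example, and the real page may differ.

```php
<?php
// Hypothetical stand-in for $html->plaintext; the real payload shape is an assumption.
$plaintext = 'var data = {"financialData":{"recommendationMean":{"raw":2.4,"fmt":"2.40"}}};';

$rating = null;

// Capture the number that follows "recommendationMean":{"raw":
if (preg_match('/"recommendationMean":\{"raw":([0-9.]+)/', $plaintext, $m)) {
    $rating = (float) $m[1];
}

var_dump($rating);
```

If the surrounding JSON object can be isolated reliably, decoding it with json_decode() would be more robust than matching a single key.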

    opened by bte234 4
  • Update phpunit/phpunit requirement from ~6.0 || ~7.0 to ~6.0 || ~7.0 || ~8.0

    Updates the requirements on phpunit/phpunit to permit the latest version.

    Changelog

    Sourced from phpunit/phpunit's changelog.

    8.0.1 - 2019-02-03

    Fixed

    • Fixed #3509: Process Isolation does not work with phpunit.phar

    [8.0.0] - 2019-02-01

    Changed

    • Implemented #3060: Cleanup PHPUnit\Framework\Constraint\Constraint
    • Implemented #3133: Enable dependency resolution by default
    • Implemented #3236: Define which parts of PHPUnit are covered by the backward compatibility promise
    • Implemented #3244: Enable result cache by default
    • Implemented #3288: The void_return fixer of php-cs-fixer is now in effect
    • Implemented #3439: Improve colorization of TestDox output
    • Implemented #3444: Consider data provider that provides data with duplicate keys to be invalid
    • Implemented #3467: Code location hints for @requires annotations as well as --SKIPIF--, --EXPECT--, --EXPECTF--, --EXPECTREGEX--, and --{SECTION}_EXTERNAL-- sections of PHPT tests
    • Implemented #3481: Improved --help output

    Deprecated

    • Implemented #3332: Deprecate annotation(s) for expecting exceptions
    • Implemented #3338: Deprecate assertions (and helper methods) that operate on (non-public) attributes
    • Implemented #3341: Deprecate optional parameters of assertEquals() and assertNotEquals()
    • Implemented #3369: Deprecate assertInternalType() and assertNotInternalType()
    • Implemented #3388: Deprecate the TestListener interface
    • Implemented #3425: Deprecate optional parameters of assertContains() and assertNotContains() as well as using these methods with string haystacks
    • Implemented #3494: Deprecate assertArraySubset()

    Removed

    • Implemented #2762: Drop support for PHP 7.1
    • Implemented #3123: Remove PHPUnit_Framework_MockObject_MockObject

    [8.0.0]: https://github.com/sebastianbergmann/phpunit/compare/7.5...8.0.0

    dependencies 
    opened by dependabot-preview[bot] 4
  • Replace content error.

    $elementsOrFalse = $dom->findMultiOrFalse($selector);
    if ($elementsOrFalse !== false) {
    	 foreach ($elementsOrFalse as $element) {
    		$element->outerhtml = "<p>".$element->innerhtml."</p>";  // does not work: the p tag is removed
    		$element->outerhtml = "<div>".$element->innerhtml."</div>";  // works fine
    	}
    }
    

    Thank you.

    opened by kbzwxq 0
  • delete function has bug

    What is this feature about (expected vs actual behaviour)?

    delete function has bug

    How can I reproduce it?

    $dom = HtmlDomParser::str_get_html($html);
    $body = $dom->findOne("body");
    $body->findOne("img")->delete();

    The image is not deleted. It is deleted only when delete() is called on an element found directly from $dom:

    $dom->findOne("img")->delete();

    Does it take minutes, hours or days to fix?

    minutes

    Any additional information?

    opened by chief725 5
  • "Not valid HTML fragment!" using svg in style tag

    What is this feature about (expected vs actual behaviour)?

    We are using the package (https://github.com/CarbonPackages/Carbon.Compression) on a larger project to minify the generated HTML. There is a case, where we have an svg in the CSS which causes an exception thrown here.

    I noticed, there was already an issue and some kind of a fix. So, I'm not sure if this is unrelated to the previous fixed issue #81.

    The previous issue mentions that the problem is that PHP does not support HTML5 or SVG by default. Does that mean this cannot be fixed? What exactly is the issue here with PHP and HTML5?

    How can I reproduce it?

    I've created a test case below which just replicates the failing lines from SimpleHtmlDom. For the sake of simplicity I just copied the original normalizeStringForComparison.

    Testcase
    <?php
    
    use \voku\helper\HtmlDomParser;
    
    require __DIR__ . '/vendor/autoload.php';
    
    $string = '<style>lite-youtube>.lty-playbtn{background-color:transparent;background-image:url(\'data:image/svg+xml;utf8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 68 48"><path d="M66.52 7.74c-.78-2.93-2.49-5.41-5.42-6.19C55.79.13 34 0 34 0S12.21.13 6.9 1.55c-2.93.78-4.63 3.26-5.42 6.19C.06 13.05 0 24 0 24s.06 10.95 1.48 16.26c.78 2.93 2.49 5.41 5.42 6.19C12.21 47.87 34 48 34 48s21.79-.13 27.1-1.55c2.93-.78 4.64-3.26 5.42-6.19C67.94 34.95 68 24 68 24s-.06-10.95-1.48-16.26z" fill="red"/><path d="M45 24 27 14v20" fill="white"/></svg>\');-webkit-filter:grayscale(100%)}</style>';
    
    $newDocument = new HtmlDomParser($string);
    
    $tmpDomOuterTextString = normalizeStringForComparison($newDocument);
    $tmpStr = normalizeStringForComparison($string);
    
    if ($tmpDomOuterTextString !== $tmpStr) {
        var_dump("FAILED", $tmpDomOuterTextString, $tmpStr);
    }
    else{
        var_dump("SUCCESS");
    }
    
    function normalizeStringForComparison($input): string
    {
        if ($input instanceof HtmlDomParser) {
            $string = $input->html(false, false);
    
            if ($input->getIsDOMDocumentCreatedWithoutHeadWrapper()) {
                /** @noinspection HtmlRequiredTitleElement */
                $string = \str_replace(['<head>', '</head>'], '', $string);
            }
        } else {
            $string = (string) $input;
        }
    
        return
            \urlencode(
              \urldecode(
                   \trim(
                        \str_replace(
                             [
                                ' ',
                                "\n",
                                "\r",
                                '/>',
                            ],
                            [
                                '',
                                '',
                                '',
                                '>',
                            ],
                            \strtolower($string)
                        )
                    )
                )
             );
    }
    

    Does it take minutes, hours or days to fix?

    :shrug:

    Any additional information?

    PHP 8.1.4

    opened by gjwnc 0
  • Question: parsing Laravel blade

    A Laravel Blade file is not really an HTML file. It includes program directives for conditions and loops. Because these directives start with @ (like @foreach), they get escaped by domReplaceHelpers.

    For example,

                     @foreach ($company->members as $m)
                        @if ($m->checkbox)
                          {{ $m->checkbox }}
                        @else
                          <span class="" title="{{ $m->profile }}">{{ $m->name }}</span>
                        @endif
                      @endforeach
    

    will be parsed to:

                    ____SIMPLE_HTML_DOM__VOKU__AT____foreach ($company-&gt;members as $m)\n
                             ____SIMPLE_HTML_DOM__VOKU__AT____if ($m-&gt;checkbox)\n
                               {{ $m-&gt;checkbox }}\n
                             ____SIMPLE_HTML_DOM__VOKU__AT____else\n
                               <span class="" title="{{ $m-&gt;profile }}">{{ $m-&gt;name }}</span>\n
                             ____SIMPLE_HTML_DOM__VOKU__AT____endif\n
                           ____SIMPLE_HTML_DOM__VOKU__AT____endforeach\n
    

    So, this works well for me to manipulate html parts of the blade file.

    However, a case below messes it up:

           @for ($i = 2; $i <= 6; $i++)
               <div class="form-group">
                 <label for="file{{ $i }}" class="col-md-2 control-label">添付
                   {{ $i }}</label>
                 <div class="col-md-10">
                   <input type="file" name="file{{ $i }}" id="file{{ $i }}">
                   <input id="filename{{ $i }}" type="hidden" name="filename{{ $i }}"
                     value="">
                   {{ HTML::error($errors, 'file{$i}') }}
                 </div>
               </div>
             @endfor
    

    You see below, the for part gets truncated.

                ____SIMPLE_HTML_DOM__VOKU__AT____for ($i = 2; $i \n
                        <label for="file{{ $i }}" class="col-md-2 control-label">添付\n
                          {{ $i }}</label>\n
                        <div class="col-md-10">\n
                          <input type="file" name="file{{ $i }}" id="file{{ $i }}"/>\n
                          <input id="filename{{ $i }}" type="hidden" name="filename{{ $i }}" value=""/>\n
                          <x-error name="file{$i}"/>\n
                        </div>\n
                      </x-form></div>\n
                    ____SIMPLE_HTML_DOM__VOKU__AT____endfor\n
    

    I believe this happens because '$i <= 6' contains the character that opens an HTML tag: <.

    I wonder if there is any way to prevent this from happening.

    Thanks in advance.
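
One workaround to consider is swapping the "<" characters inside Blade directive arguments for a placeholder before parsing and restoring them afterwards. The sketch below is only an illustration under stated assumptions: LT_TOKEN and both function names are hypothetical helpers, not part of the library, and the pattern only handles single-line directives without nested parentheses.

```php
<?php
// Hypothetical placeholder; any string that cannot occur in the template would do.
const LT_TOKEN = '____BLADE_LT____';

// Replace "<" inside directive arguments such as "@for (...)" with the placeholder.
function protectDirectiveOperators(string $blade): string
{
    return preg_replace_callback(
        '/@\w+\s*\([^()\n]*\)/',
        static function (array $m): string {
            return str_replace('<', LT_TOKEN, $m[0]);
        },
        $blade
    );
}

// Put the original "<" characters back after the HTML has been processed.
function restoreDirectiveOperators(string $html): string
{
    return str_replace(LT_TOKEN, '<', $html);
}

$blade = '@for ($i = 2; $i <= 6; $i++)' . "\n"
    . '<div class="form-group"></div>' . "\n"
    . '@endfor';

$protected = protectDirectiveOperators($blade);
// ... parse and manipulate $protected with the HTML parser here ...
$restored = restoreDirectiveOperators($protected);

echo $restored, PHP_EOL;
```

Directives with nested calls, for example a condition that wraps count(...), would need a more careful matcher than this pattern.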

    opened by lotsofbytes 1
  • Link selector in XML-Documents does not seem to work.

    The example "Scraping Lebensmittelwarnung" does not seem to work. I tried accessing similar feeds and got the same error. In your example, neither the "Link" nor the date works. It turns out that the date can be fixed with $return[$title]['DatumTime'] = date('Y-m-d H:i:s', strtotime($item->find('pubDate')->text()[0]));. I cannot find any fix for the link attribute.

    Any idea on that? Should be an easy fix.

    opened by luggesexe 0
Owner
Lars Moelleken
Webdeveloper & Sysadmin | egrep '#php|#js|#html|#css|#sass'
Lars Moelleken
Simple URL parser

urlparser Simple URL parser This is a simple URL parser, which returns an array of results from url of kind /module/controller/param1:value/param2:val

null 1 Oct 29, 2021
This is a simple, streaming parser for processing large JSON documents

Streaming JSON parser for PHP This is a simple, streaming parser for processing large JSON documents. Use it for parsing very large JSON documents to

Salsify 685 Nov 23, 2022
Better Markdown Parser in PHP

Parsedown Better Markdown Parser in PHP - Demo. Features One File No Dependencies Super Fast Extensible GitHub flavored Tested in 5.3 to 7.3 Markdown

Emanuil Rusev 14.3k Dec 3, 2022
Highly-extensible PHP Markdown parser which fully supports the CommonMark and GFM specs.

league/commonmark league/commonmark is a highly-extensible PHP Markdown parser created by Colin O'Dell which supports the full CommonMark spec and Git

The League of Extraordinary Packages 2.4k Nov 24, 2022
A super fast, highly extensible markdown parser for PHP

A super fast, highly extensible markdown parser for PHP What is this? A set of PHP classes, each representing a Markdown flavor, and a command line to

Carsten Brandt 988 Nov 29, 2022
An HTML5 parser and serializer for PHP.

HTML5-PHP HTML5 is a standards-compliant HTML5 parser and writer written entirely in PHP. It is stable and used in many production websites, and has w

null 1.1k Dec 6, 2022
Advanced shortcode (BBCode) parser and engine for PHP

Shortcode Shortcode is a framework agnostic PHP library allowing to find, extract and process text fragments called "shortcodes" or "BBCodes". Example

Tomasz Kowalczyk 358 Nov 26, 2022
Parsica - PHP Parser Combinators - The easiest way to build robust parsers.

Parsica The easiest way to build robust parsers in PHP.

null 0 Feb 22, 2022
This is a php parser for plantuml source file.

PlantUML parser for PHP Overview This package builds AST of class definitions from plantuml files. This package works only with php. Installation Via

Tasuku Yamashita 5 May 29, 2022
Efficient, easy-to-use, and fast PHP JSON stream parser

JSON Machine Very easy to use and memory efficient drop-in replacement for inefficient iteration of big JSON files or streams for PHP 5.6+. See TL;DR.

Filip Halaxa 790 Dec 1, 2022
A PHP hold'em range parser

mattjmattj/holdem-range-parser A PHP hold'em range parser Installation No published package yet, so you'll have to clone the project manually, or add

Matthias Jouan 1 Feb 2, 2022
Parser for Markdown and Markdown Extra derived from the original Markdown.pl by John Gruber.

PHP Markdown PHP Markdown Lib 1.9.0 - 1 Dec 2019 by Michel Fortin https://michelf.ca/ based on Markdown by John Gruber https://daringfireball.net/ Int

Michel Fortin 3.3k Dec 4, 2022
A New Markdown parser for PHP5.4

Ciconia - A New Markdown Parser for PHP The Markdown parser for PHP5.4, it is fully extensible. Ciconia is the collection of extension, so you can rep

Kazuyuki Hayashi 359 Sep 14, 2022
A lightweight lexical string parser for BBCode styled markup.

Decoda A lightweight lexical string parser for BBCode styled markup. Requirements PHP 5.6.0+ Multibyte Composer Contributors "Marten-Plain" emoticons

Miles Johnson 196 Nov 10, 2022
Convert HTML to Markdown with PHP

HTML To Markdown for PHP Library which converts HTML to Markdown for your sanity and convenience. Requires: PHP 7.2+ Lead Developer: @colinodell Origi

The League of Extraordinary Packages 1.5k Nov 23, 2022
HTML sanitizer, written in PHP, aiming to provide XSS-safe markup based on explicitly allowed tags, attributes and values.

TYPO3 HTML Sanitizer ℹ️ Common safe HTML tags & attributes as given in \TYPO3\HtmlSanitizer\Builder\CommonBuilder still might be adjusted, extended or

TYPO3 GitHub Department 19 Nov 2, 2022
A simple PHP library for handling Emoji

Emoji Emoji images from unicode characters and names (i.e. :sunrise:). Built to work with Twemoji images. use HeyUpdate\Emoji\Emoji; use HeyUpdate\Emo

null 54 May 23, 2022
A simple PHP library for handling Emoji

Emoji Emoji images from unicode characters and names (i.e. :sunrise:). Built to work with Twemoji images. use HeyUpdate\Emoji\Emoji; use HeyUpdate\Emo

null 51 Jan 15, 2021
A simple Atom/RSS parsing library for PHP.

SimplePie SimplePie is a very fast and easy-to-use class, written in PHP, that puts the 'simple' back into 'really simple syndication'. Flexible enoug

SimplePie 1.5k Nov 26, 2022