Provides tools for working with DOM documents and structures

Overview

laminas-dom

This package is considered feature-complete, and is now in security-only maintenance mode, following a decision by the Technical Steering Committee. If you have a security issue, please follow our security reporting guidelines. If you wish to take on the role of maintainer, please nominate yourself.

If you are looking for an actively maintained package alternative, we recommend:


The Laminas\Dom component provides tools for working with DOM documents and structures. Currently, we offer Laminas\Dom\Query, which provides a unified interface for querying DOM documents utilizing both XPath and CSS selectors.

Comments
  • No possibility to use libxml options in loadHTML

    This is a feature request.

    According to: https://github.com/zendframework/zend-dom/blob/245d75d1cce819cb8da8726cf9c9ba563fa5d8f0/src/Document.php#L254

    loadHTML is called without any options and there is no way to configure them.

    In my case specifically, I'm missing LIBXML_HTML_NOIMPLIED, which I need in order to work with partial HTML fragments.
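
For context, the sketch below shows what the requested option does when passed straight to PHP's native `DOMDocument::loadHTML()`, outside of laminas-dom. The fragment and variable names are illustrative assumptions, and the sketch says nothing about how the component itself would expose such options.

```php
<?php
// A partial HTML fragment with a single root element (illustrative input).
$fragment = '<p>Hello <b>world</b></p>';

// Collect libxml warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// LIBXML_HTML_NOIMPLIED: do not add implied <html>/<body> wrappers.
// LIBXML_HTML_NODEFDTD:  do not add a default doctype.
$doc->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// With the options above, the root element is the fragment's own <p>.
echo $doc->documentElement->nodeName, PHP_EOL;
```

Note that fragments with several top-level elements may still be restructured by libxml, so this sketch deliberately uses a single root element.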


    Originally posted by @step307 at https://github.com/zendframework/zend-dom/issues/15

    Awaiting Author Updates 
    opened by weierophinney 7
  • Regression in Query since 2.9.0, TypeError when querying on XHTML


    BC Break Report

    | Q       | A      |
    |---------|--------|
    | Version | 2.12.0 |

    Summary

    Laminas\Dom\Query automatically attempts to register namespaces it finds in the root tag of an XHTML document, by regexing the tag and appending the namespace URL it finds to $this->xpathNamespaces. When running the query, getNodeList() walks through the array of namespaces and registers each one using the DOMXPath object's registerNamespace() method, using the array key as the prefix.

    The problem is that since the "auto" namespace from the root tag is just appended to the array with [], it always gets an integer key, and registerNamespace() takes a string.

    With the strict_types declaration introduced by #16, this now results in an error. Since that was merged for the 2.9.0 release, I believe this is present in all versions 2.9.0 and up.
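
The mechanism described above can be reproduced without laminas-dom. The following is a minimal sketch using only PHP's native DOM classes; the `$namespaces` array stands in for the component's internal namespace list and is an assumption made for illustration.

```php
<?php
declare(strict_types=1);

// Appending with [] assigns an integer key (0), not a string prefix.
$namespaces = [];
$namespaces[] = 'http://www.w3.org/1999/xhtml';

$doc = new DOMDocument();
$doc->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"></html>');
$xpath = new DOMXPath($doc);

$error = null;
try {
    foreach ($namespaces as $prefix => $uri) {
        // $prefix is int(0) here; under strict_types an int is not
        // coerced to the string parameter, so a TypeError is thrown.
        $xpath->registerNamespace($prefix, $uri);
    }
} catch (TypeError $e) {
    $error = $e;
}

var_dump($error instanceof TypeError);
```

Without the `declare(strict_types=1)` line, PHP would silently coerce the integer key to the string "0", which matches the report that earlier versions executed without error.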

    Previous behavior

    Previously this executed with no error.

    Current behavior

    TypeError: DOMXPath::registerNamespace() expects parameter 1 to be string, int given in .../vendor/laminas/laminas-dom/src/Query.php:325
    Stack trace:
    #0 .../vendor/laminas/laminas-dom/src/Query.php(325): DOMXPath->registerNamespace()
    #1 .../vendor/laminas/laminas-dom/src/Query.php(287): Laminas\Dom\Query->getNodeList()
    ...
    

    How to reproduce

    To trigger this behavior, the document must begin with the <?xml XML prolog, and have an <html> tag with an xmlns attribute.

    $xhtml = '<?xml version="1.0"?><html xmlns="http://www.w3.org/1999/xhtml"></html>';
    $query = new \Laminas\Dom\Query($xhtml);
    $query->queryXpath('/'); // the specific query doesn't matter
    
    Bug 
    opened by zerocrates 6
  • Drop `laminas/laminas-zendframework-bridge` and `zendframework/*` compatibility


    | Q             | A   |
    |---------------|-----|
    | Documentation | no  |
    | Bugfix        | yes |
    | BC Break      | no  |
    | New Feature   | no  |
    | RFC           | no  |
    | QA            | no  |

    Description

    Increase performance by removing a compatibility layer while not introducing breaking changes.

    This follows the process described in detail in:

    https://github.com/laminas/technical-steering-committee/blob/main/meetings/minutes/2021-08-02-TSC-Minutes.md#remove-laminaslaminas-zendframework-bridge-dependency-from-our-packages

    Enhancement 
    opened by PowerKiKi 2
  • Configure Renovate


    Mend Renovate

    Welcome to Renovate! This is an onboarding PR to help you understand and configure settings before regular Pull Requests begin.

    🚦 To activate Renovate, merge this Pull Request. To disable Renovate, simply close this Pull Request unmerged.


    Detected Package Files

    • composer.json (composer)
    • .github/workflows/auto-close.yml (github-actions)
    • .github/workflows/continuous-integration.yml (github-actions)
    • .github/workflows/docs-build.yml (github-actions)
    • .github/workflows/release-on-milestone-closed.yml (github-actions)

    Configuration Summary

    Based on the default config's presets, Renovate will:

    • Start dependency updates only once this onboarding PR is merged
    • Enable Renovate Dependency Dashboard creation.
    • Ignore node_modules, bower_components, vendor and various test/tests directories.
    • Automerge patch and minor upgrades if they pass tests.
    • If automerging, push the new commit directly to the base branch (no PR).
    • Wait for branch tests to pass or fail before creating the PR.
    • Rebase existing PRs any time the base branch has been updated.
    • Separate major versions of dependencies into individual branches/PRs.
    • Do not separate patch and minor upgrades into separate PRs for the same dependency.
    • Raise PR when vulnerability alerts are detected.
    • Evaluate schedules according to timezone UTC.
    • Append Signed-off-by: to signoff Git commits.
    • Apply label renovate to PRs.
    • Group all minor and patch updates together.
    • Default configuration for repositories in the Laminas organisation

    🔡 Would you like to change the way Renovate is upgrading your dependencies? Simply edit the renovate.json in this branch with your custom config and the list of Pull Requests in the "What to Expect" section below will be updated the next time Renovate runs.


    What to Expect

    With your current configuration, Renovate will create 2 Pull Requests:

    Update actions/checkout action to v3
    • Schedule: ["at any time"]
    • Branch name: renovate/actions-checkout-3.x
    • Merge into: 2.13.x
    • Upgrade actions/checkout to v3
    Lock file maintenance
    • Schedule: ["before 2am"]
    • Branch name: renovate/lock-file-maintenance
    • Merge into: 2.13.x
    • Regenerate lock files to use latest dependency versions

    ❓ Got questions? Check out Renovate's Docs, particularly the Getting Started section. If you need any further assistance then you can also request help here.


    Read more information about the use of Renovate Bot within Laminas.

    renovate 
    opened by renovate[bot] 1
  • Fix `TypeError` when using `Query` on XHTML, and namespace is implicitly used (#20)


    | Q             | A   |
    |---------------|-----|
    | Documentation | no  |
    | Bugfix        | yes |
    | BC Break      | yes |
    | New Feature   | no  |
    | RFC           | no  |
    | QA            | no  |

    Description

    Fix #20 by casting the namespace prefix to a string before registering it.

    Note, this change is purposely limited only to fixing the TypeError, and does not include doing something saner with the namespace (of which the most sane option would probably be to just not extract/register it at all since it's actually not used).
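
As a rough illustration of the cast described above (a sketch against PHP's native `DOMXPath`, not the actual patch to `Query.php`), converting the integer key to a string before registration avoids the `TypeError` under strict types. The `$namespaces` array is a hypothetical stand-in for the component's internal list.

```php
<?php
declare(strict_types=1);

// The namespace URI ends up under the implicit integer key 0.
$namespaces = ['http://www.w3.org/1999/xhtml'];

$doc = new DOMDocument();
$doc->loadXML('<html xmlns="http://www.w3.org/1999/xhtml"></html>');
$xpath = new DOMXPath($doc);

$error = null;
try {
    foreach ($namespaces as $prefix => $uri) {
        // Explicit cast: int(0) becomes the string "0".
        $xpath->registerNamespace((string) $prefix, $uri);
    }
} catch (TypeError $e) {
    $error = $e;
}

var_dump($error === null);
```

The resulting prefix "0" is not useful in real queries, which is consistent with the note above that the extracted namespace is not actually used.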

    Bug 
    opened by zerocrates 1
  • Restore continuous integration (removes file header, provides type safety)


    | Q             | A   |
    |---------------|-----|
    | Documentation | no  |
    | Bugfix        | no  |
    | BC Break      | no  |
    | New Feature   | no  |
    | RFC           | no  |
    | QA            | yes |

    Description

    Restore CI status so that other PRs can be merged.

    Enhancement 
    opened by PowerKiKi 1
  • Improve assertion usages in test suite


    | Q             | A   |
    |---------------|-----|
    | Documentation | no  |
    | Bugfix        | no  |
    | BC Break      | no  |
    | New Feature   | no  |
    | RFC           | no  |
    | QA            | no  |
    | Tests         | yes |

    Description

    • Use assertCount to assert that the result count matches the expected count.
    • Use assertSame to assert the exception message when the catch block is triggered in some test methods.
    • Remove variables that are declared but never used.
    Enhancement 
    opened by peter279k 1
  • Psalm integration


    Feature Request

    | Q  | A   |
    |----|-----|
    | QA | yes |

    Summary

    As decided during the Technical-Steering-Committee Meeting on August 3rd, 2020, Laminas wants to implement vimeo/psalm in all packages.

    Implementing psalm is quite easy.

    Required

    • [ ] Create a .psalm.xml.dist in the project root
    • [ ] Copy and paste the contents from this psalm.xml.dist
    • [ ] Run $ composer require vimeo/psalm
    • [ ] Run $ vendor/bin/psalm --set-baseline=psalm-baseline.xml
    • [ ] Add a composer script static-analysis with the command psalm --shepherd --stats
    • [ ] Add a new line to script: in .travis.yml: - if [[ $TEST_COVERAGE == 'true' ]]; then composer static-analysis ; fi
    • [ ] Remove phpstan from the project (phpstan.neon.dist, .travis.yml entry, composer.json require-dev and scripts)
    Optional
    • [ ] Fix as many psalm errors as possible.
    Enhancement Help Wanted hacktoberfest-accepted 
    opened by boesing 1
  • Zend\Dom\Query and special UTF-8 characters


    This issue has been moved from the zendframework repository as part of the bug migration program as outlined here - http://framework.zend.com/blog/2016-04-11-issue-closures.html


    Original Issue: https://api.github.com/repos/zendframework/zendframework/issues/7618
    User: @mtrippodi
    Created On: 2015-08-26T13:51:12Z
    Updated At: 2015-11-06T22:17:32Z
    Body

    use Zend\Dom\Query;
    use Zend\Debug\Debug;
    
    $html = '<div><h1>ßüöä</h1></div>';
    $dom = new Query($html);
    $nodes = $dom->execute('h1');
    Debug::dump($nodes->current()->nodeValue);
    

    ...will result in something like:

    ßüöä

    $html = '<div><h1>ßüöä</h1></div>';
    $dom = new Query(utf8_decode($html));
    $nodes = $dom->execute('h1');
    Debug::dump($nodes->current()->nodeValue);
    

    ... will solve the problem and result in correct rendering.

    For convenience I extended Zend\Dom\Query:

    <?php
    
    namespace MyNamespace\Dom;
    
    use Zend\Dom\Query as ZF2Query;
    
    class Query extends ZF2Query
    {
    
        /**
         * Set document to query. If is UTF-8: decode.
         *
         * @param  string $document
         * @param  null|string $encoding Document encoding
         * @return Query
         */
        public function setDocument($document, $encoding = null)
        {
            if (0 === strlen($document)) {
                return $this;
            }
    
            $_encoding = empty($encoding) ? $this->getEncoding() : $encoding;
            if($_encoding == 'UTF-8')
                $document = utf8_decode($document);
    
            return parent::setDocument($document, $encoding);
        }
    }
    

    Now I wonder whether this could be implemented in Zend\Dom\Query. Or am I missing something, and there's a better solution? Thanks m.


    Comment

    User: @mtrippodi
    Created On: 2015-08-26T18:15:20Z
    Updated At: 2015-08-26T19:17:05Z
    Body

    OK, forget my first "solution". It's bad because e.g. ...

    $html = '<div><h1>€</h1></div>';
    $dom = new Query(utf8_decode($html));
    $nodes = $dom->execute('h1');
    Debug::dump($nodes->current()->nodeValue); 
    

    ...will result in:

    ?
    

    This is because all utf8_decode() does is convert a string encoded in UTF-8 to ISO-8859-1. That is of course not good, because UTF-8 can represent many more characters than ISO-8859-1. See this comment at PHP Man.

    The real problem is that DOMDocument::loadHTML() by default always treats the source string as ISO-8859-1 encoded. Unfortunately, you can only change this behavior by specifying the encoding in the HTML head at the beginning of the source string. This comment at PHP Man still seems to apply, even though it is 10 years old and UTF-8 is so common nowadays!
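
To make the "specify the encoding at the beginning of the source string" idea concrete, here is a minimal sketch using PHP's native `DOMDocument` and a prepended `<meta>` declaration. This is one common variant of the approach, chosen for illustration; it is not the code from the referenced comment, and it assumes the script file itself is saved as UTF-8.

```php
<?php
// UTF-8 input containing characters outside ASCII, including one
// (the euro sign) that ISO-8859-1 cannot represent.
$html = '<div><h1>ßüöä €</h1></div>';

// Encoding hint placed ahead of the markup (illustrative choice).
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

// Collect libxml warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($meta . $html);
libxml_clear_errors();

// DOM string values are always returned as UTF-8.
$value = $doc->getElementsByTagName('h1')->item(0)->nodeValue;
echo $value, PHP_EOL;
```

The declaration only tells the parser how to decode the input bytes; it does not convert anything, so the input must really be UTF-8 for this to help.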

    So, based on this comment I again extended Zend\Dom\Query as follows:

    <?php
    
    namespace MyNamespace\Dom;
    
    use Zend\Dom\Query as ZF2Query;
    
    class Query extends ZF2Query
    {
    
        /**
         * Set document to query
         *
         * @param  string $document
         * @param  null|string $encoding Document encoding
         * @return Query
         */
        public function setDocument($document, $encoding = null)
        {
            if (0 === strlen($document)) {
                return $this;
            }
    
            $prepend = '';
            $_encoding = empty($encoding) ? $this->getEncoding() : $encoding;
            if (!empty($_encoding) && strtolower($_encoding) != 'iso-8859-1') {
                $prepend = sprintf('<?xml encoding="%s">', $_encoding);
            }
    
            // breaking XML declaration to make syntax highlighting work
            if ('<' . '?xml' == substr(trim($document), 0, 5)) {
                if (preg_match('/<html[^>]*xmlns="([^"]+)"[^>]*>/i', $document, $matches)) {
                    $this->xpathNamespaces[] = $matches[1];
                    return $this->setDocumentXhtml($prepend . $document, $encoding);
                }
                return $this->setDocumentXml($document, $encoding);
            }
            if (strstr($document, 'DTD XHTML')) {
                return $this->setDocumentXhtml($prepend . $document, $encoding);
            }
            return $this->setDocumentHtml($prepend . $document, $encoding);
        }
    }
    

    Still, two questions remain:

    • Is this the best solution?
    • Should a solution be implemented in Zend\Dom\Query?

    Comment

    User: @croensch
    Created On: 2015-08-28T14:15:05Z
    Updated At: 2015-08-28T14:15:05Z
    Body

    AFAIK, if no header is present the passed encoding is used; if the header is present, the passed encoding is ignored. So if your documents are always in iso-8859-1, then just try setDocument() as it is?



    Originally posted by @GeeH at https://github.com/zendframework/zend-dom/issues/10

    opened by weierophinney 1
  • CSS query improperly converted to xpath


    This issue has been moved from the zendframework repository as part of the bug migration program as outlined here - http://framework.zend.com/blog/2016-04-11-issue-closures.html


    Original Issue: https://api.github.com/repos/zendframework/zendframework/issues/7470
    User: @fcheslack
    Created On: 2015-04-29T21:02:26Z
    Updated At: 2015-11-06T21:29:02Z
    Body

    In Zend\Dom\Document\Query, CSS selector strings with classes and attributes in the same segment are not properly converted to XPath.

    "input.classname[name='inputname']"

    is converted to

    input.classname[@name='inputname']

    instead of

    input[contains(concat(' ', normalize-space(@class), ' '), ' classname ')][@name='inputname']

    This seems to be solved by simply moving the 'Classes' section of Query._tokenize to the top of the function before the "[@" it checks for is replaced into the expression by other steps. I'm not sure if it was placed as the last step for a reason though.
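
The ordering argument above can be sketched with a small standalone function. `segmentToXpath()` is a hypothetical helper written for this illustration, not the library's tokenizer; it only handles a single simple segment and assumes attribute values contain no dots.

```php
<?php
/**
 * Hypothetical helper: converts one simple CSS segment to XPath,
 * expanding classes BEFORE attributes, as the report suggests.
 */
function segmentToXpath(string $segment): string
{
    // Step 1: classes. At this point "." can only mean a class,
    // because no "[@" has been introduced into the expression yet.
    $xpath = preg_replace(
        '/\.([a-z][a-z0-9_-]*)/i',
        "[contains(concat(' ', normalize-space(@class), ' '), ' \$1 ')]",
        $segment
    );

    // Step 2: attributes. [name='value'] becomes [@name='value'].
    // The class predicate from step 1 is left alone because
    // "contains" is followed by "(" rather than "=".
    return preg_replace(
        '/\[([a-z0-9_-]+)=([\'"])(.*?)\2\]/i',
        '[@$1=$2$3$2]',
        $xpath
    );
}

echo segmentToXpath("input.classname[name='inputname']"), PHP_EOL;
```

For the selector from the report, this yields the expected expression with both the class predicate and the `@name` predicate attached to `input`.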



    Originally posted by @GeeH at https://github.com/zendframework/zend-dom/issues/11

    Bug 
    opened by weierophinney 1
  • Fix encoding when document does not have correct data

    Provide a narrative description of what you are trying to accomplish:

    • [x] Are you fixing a bug?

      • [x] Detail how the bug is invoked currently.
      • [x] Detail the original, incorrect behavior.
      • [x] Detail the new, expected behavior.
      • [x] Base your feature on the master branch, and submit against that branch.
      • [ ] Add a regression test that demonstrates the bug, and proves the fix.
      • [ ] Add a CHANGELOG.md entry for the fix.
    • [ ] Are you creating a new feature?

      • [ ] Why is the new feature needed? What purpose does it serve?
      • [ ] How will users use the new feature?
      • [ ] Base your feature on the develop branch, and submit against that branch.
      • [ ] Add only one feature per pull request; split multiple features over multiple pull requests
      • [ ] Add tests for the new feature.
      • [ ] Add documentation for the new feature.
      • [ ] Add a CHANGELOG.md entry for the new feature.
    • [ ] Is this related to quality assurance?

    • [ ] Is this related to documentation?


    Originally posted by @ian-patel at https://github.com/zendframework/zend-dom/pull/27

    opened by weierophinney 1
Releases(2.13.0)
Owner
Laminas Project
Laminas components and MVC.