This library provides HTML5 element definitions for HTML Purifier, compliant with the WHATWG spec.

Overview

HTML5 Definitions for HTML Purifier

This library is the most complete HTML5-compliant solution among those based on HTML Purifier. Besides providing the most extensive set of element definitions, it supplies tidy/sanitization rules that transform the input into valid HTML5 output.

Installation

Install with Composer by running the following command:

composer require xemlock/htmlpurifier-html5

Usage

The most basic usage is similar to that of the original HTML Purifier. Create an HTML5-compatible config using the HTMLPurifier_HTML5Config::createDefault() factory method, then pass it to an HTMLPurifier instance:

$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html5 = $purifier->purify($dirty_html5);

To modify the config, you can either instantiate it with a configuration array passed to HTMLPurifier_HTML5Config::create(), or call the set method on an existing config instance.

For example, to allow iframes with YouTube videos you can do the following:

$config = HTMLPurifier_HTML5Config::create(array(
  'HTML.SafeIframe' => true,
  'URI.SafeIframeRegexp' => '%^//www\.youtube\.com/embed/%',
));

or equivalently:

$config = HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^//www\.youtube\.com/embed/%');
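
The pattern in the example above is anchored at a protocol-relative `//`, so it is worth checking which iframe URLs it would actually accept. The following is a minimal sketch using plain preg_match, on the assumption that HTML Purifier matches the pattern against the full src URI. The helper matchesIframePattern() and the relaxed pattern are illustrative and not part of the library.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: reports whether a URI
// satisfies a given URI.SafeIframeRegexp-style pattern.
function matchesIframePattern(string $pattern, string $uri): bool
{
    return preg_match($pattern, $uri) === 1;
}

// Pattern from the example above: only protocol-relative URLs match.
$strict = '%^//www\.youtube\.com/embed/%';
// Assumed variant that also accepts explicit http/https schemes.
$relaxed = '%^(https?:)?//www\.youtube\.com/embed/%';

var_dump(matchesIframePattern($strict, '//www.youtube.com/embed/VIDEO_ID'));        // bool(true)
var_dump(matchesIframePattern($strict, 'https://www.youtube.com/embed/VIDEO_ID'));  // bool(false)
var_dump(matchesIframePattern($relaxed, 'https://www.youtube.com/embed/VIDEO_ID')); // bool(true)
var_dump(matchesIframePattern($relaxed, 'https://example.com/embed/VIDEO_ID'));     // bool(false)
```

If user content embeds videos with an explicit https:// scheme, a pattern along the lines of the relaxed one may be the better fit.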

Configuration

Apart from HTML Purifier's built-in configuration directives, the following new directives are also supported:

  • HTML.Forms

    Version added: 0.1.12
    Type: Boolean
    Default: false

    Whether or not to permit form elements in the user input, regardless of %HTML.Trusted value. Please be very careful when using this functionality, as enabling forms in untrusted documents may allow for phishing attacks.

  • HTML.IframeAllowFullscreen

    Version added: 0.1.11
    Type: Boolean
    Default: false

    Whether or not to permit allowfullscreen attribute on iframe tags. It requires either %HTML.SafeIframe or %HTML.Trusted to be true.

  • HTML.Link

    Version added: 0.1.12
    Type: Boolean
    Default: false

    Permit the link tags in the user input, regardless of %HTML.Trusted value. This effectively allows link tags without allowing other untrusted elements.

    If enabled, URIs in link tags will not be matched against a whitelist specified in %URI.SafeLinkRegexp (unless %HTML.SafeLink is also enabled).

  • HTML.SafeLink

    Version added: 0.1.12
    Type: Boolean
    Default: false

    Whether to permit link tags in untrusted documents. This directive must be accompanied by a whitelist of permitted URIs via %URI.SafeLinkRegexp, otherwise no link tags will be allowed.

  • HTML.XHTML

    Version added: 0.1.12
    Type: Boolean
    Default: false

    While deprecated in HTML 4.01 / XHTML 1.0 context, in HTML5 it's used for enabling support for namespaced attributes and XML self-closing tags.

    When enabled it causes xml:lang attribute to take precedence over lang, when both attributes are present on the same element.

  • URI.SafeLinkRegexp

    Version added: 0.1.12
    Type: String
    Default: null

    A PCRE regular expression that will be matched against a URI. This directive only has an effect if %HTML.SafeLink is enabled. Here are some example values:

      • %^https?://localhost/% - Allow localhost URIs

    Use Attr.AllowedRel to control permitted link relationship types.
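
As a rough illustration of how the directives above might be combined, the sketch below enables link tags for localhost URIs only and permits allowfullscreen on whitelisted iframes. It assumes a library version that supports these directives and a Composer autoloader at vendor/autoload.php. The input string and the choice of 'stylesheet' as the only permitted rel value are made up for the example.

```php
<?php
require 'vendor/autoload.php';

$config = HTMLPurifier_HTML5Config::createDefault();

// Permit link tags, but only for URIs matching the whitelist.
$config->set('HTML.SafeLink', true);
$config->set('URI.SafeLinkRegexp', '%^https?://localhost/%');
// Assumed choice for this example: restrict rel to stylesheets.
$config->set('Attr.AllowedRel', array('stylesheet'));

// allowfullscreen requires %HTML.SafeIframe (or %HTML.Trusted).
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^//www\.youtube\.com/embed/%');
$config->set('HTML.IframeAllowFullscreen', true);

$purifier = new HTMLPurifier($config);

// Illustrative input only.
$dirty = '<link rel="stylesheet" href="http://localhost/style.css">'
       . '<iframe src="//www.youtube.com/embed/VIDEO_ID" allowfullscreen></iframe>';

echo $purifier->purify($dirty);
```

HTML.SafeLink is used here rather than HTML.Link because it enforces the URI whitelist.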

Supported HTML5 elements

Aside from HTML elements supported originally by HTML Purifier, this library adds support for the following HTML5 elements:

Comments
  • The a element in HTML5

    The a element may be wrapped around entire paragraphs, lists, tables, and so forth, even entire sections, so long as there is no interactive content within (e.g. buttons or other links). https://html.spec.whatwg.org/dev/text-level-semantics.html#the-a-element

    Example:

    <a><table></table></a>
    

    becomes

    <a></a><table></table>
    
    opened by bytestream 9
  • Audio block not handled correctly when surrounded by <strong> tags

    With the code below, the <strong> tag is stripped from the audio block and replaced with empty <strong></strong> elements (see the actual output further down).

    <?php
    require_once('vendor/autoload.php');
    
    $html = '<p><strong><audio controls="controls"><source type="audio/mp3" src="myaudiofile.mp3" /></audio></strong></p>';
    
    echo "In: " . $html . PHP_EOL;
    
    $config = HTMLPurifier_HTML5Config::createDefault();
    $purifier = new HTMLPurifier($config);
    
    
    echo "out: " . $purifier->purify($html);
    

    Expected output:

    <p><strong><audio controls><source type="audio/mp3" src="myaudiofile.mp3" /></audio></strong></p>

    (Unless I'm reading it wrong, the spec says <strong> can contain "Phrasing content" which includes <audio>) http://w3c.github.io/html/single-page.html#phrasing-content-2

    Actual output:

    <p><strong></strong></p><audio controls><strong></strong></audio><strong></strong>

    I think it's because the <strong> tag in the base library is set to allow contents of type "Inline", whereas <audio> is defined as a block in this library.

    Will follow up with a PR if I find a fix today

    bug 
    opened by mattford 9
  • Added masterminds/html5

    Adds the ability to switch to the masterminds/html5 lexer for better HTML5 support:

    $config = \HTMLPurifier_HTML5Config::createDefault();
    $config->set('Core.LexerImpl', new \HTMLPurifier_Lexer_HTML5());
    
    opened by bytestream 8
  • ezyang comments

    I'm looking at switching to this from ezyang/htmlpurifier due to growing need for HTML5 support.

    Several years ago, lukusw tried to add HTML5 support to htmlpurifier for Drupal but I think the idea dropped priority and was never implemented. ezyang made some comments on lukusw's attempt which is probably what slowed the whole thing down: https://www.drupal.org/project/htmlpurifier/issues/1321490#comment-9509073

    I've been comparing lukusw's code and yours based on ezyang's comments:

    • https://github.com/lukusw/php-htmlpurfier-html5/blob/master/htmlpurifier_html5.php#L40
    • https://github.com/xemlock/htmlpurifier-html5/blob/e843e771f778618137f7e3fefbe2015621056c47/library/HTMLPurifier/HTMLModule/HTML5/Text.php#L23

    With this in mind, I'm hoping you can answer the below questions:

    All of the HTML5 content needs to be gated, so it is only available when a user specifies an HTML5 doctype. You could try to put all of the HTML5 definitions in a new HTMLModule.

    ✔️ looks good

    section/nav/aside/article are not Block content but Sectioning content. Flow should be redefined to include Sectioning (similar to how HTMLPurifier/HTMLModule/Text.php does Flow)

    ❌ Doesn't look to have changed?

    header and footer need to exclude header/footer/main descendants; see the 'excludes' attribute; also an example in Text.php (pre)

    ❌ Doesn't look to have changed?

    Ditto with address, use the same technique

    ❌ Doesn't look to have changed?

    hgroup got removed from the HTML5 spec, so doesn't belong here.

    ✔️ seems fine to keep it

    The figure specification doesn't look right; I think you need an asterisk after the Flow. A plain spec 'Flow' is special-cased. I suspect your specifications also exclude plain text.

    ❔ not sure if you've done this?

    figcaption is not Inline, give it false instead.

    ✔️ seems fine

    I'm a little worried about video tag, but the definition you've given is probably OK. I'm not sure if it should be allowed by default. Definitely autoplay should not be allowed. The contents has the same problem as figure.

    ✔️ allows autoplay, but otherwise seems ok

    We should already have the inline elements; are the existing definitions buggy?

    ✔️ not sure that this is relevant... Existing definitions are gated to XHTML 1.1, so would need gated definition for html5 spec (http://htmlpurifier.org/phorum/read.php?3,8291,8514#msg-8514)

    For ins/del datetime, ideally we would apply the HTML5 parse a date or time string and validate it, see http://www.w3.org/TR/html5/infrastructure.html#parse-a-date-or-time-string

    ✔️ seems fine

    iframe allowfullscreen isn't an HTML5 attribute. And it shouldn't be allowed by default anyway, should be gated by Tricky at least.

    ❌ Not gated by tricky?

    question 
    opened by bytestream 7
  • Add contenteditable to CommonAttributes

    This copies CommonAttributes changes in v4.15.0 from upstream, otherwise contenteditable="false" is not considered valid on HTML5 doc types.

    It also bumps the requirements to match ezyang/htmlpurifier in v4.15.0

    opened by bytestream 4
  • HTML.Forms not working?

    Hello, I want to pass an HTML form through the purify process. This should be possible in vanilla htmlpurifier since 4.13.0 via "HTML.Forms", but it doesn't seem to work in this HTML5 extension. Example:

    <?php
    require 'vendor/autoload.php';
    
    $config = HTMLPurifier_HTML5Config::createDefault();
    
    $config->set('HTML.Trusted', FALSE);
    $config->set('HTML.Forms', TRUE);
    
    $purifier = new HTMLPurifier($config);
    
    $dirty_html5 = '<form mnethod="post" action="#"><input></form>';
    $clean_html5 = $purifier->purify($dirty_html5);
    var_dump(htmlspecialchars($clean_html5));
    ----------------------
    string(0) ""
    

    from composer.lock:

    "name": "ezyang/htmlpurifier",
    "version": "v4.13.0",
    
    "name": "xemlock/htmlpurifier-html5",
    "version": "v0.1.11",
    

    In the case of vanilla "ezyang/htmlpurifier:v4.13.0":

    <?php
    
    require 'vendor/autoload.php';
    
    $config = HTMLPurifier_Config::createDefault();
    
    $config->set('HTML.Trusted', FALSE);
    $config->set('HTML.Forms', TRUE);
    
    $purifier = new HTMLPurifier($config);
    
    $dirty_html5 = '<form mnethod="post" action="#"><input></form>';
    $clean_html5 = $purifier->purify($dirty_html5);
    var_dump(htmlspecialchars($clean_html5));
    --------------------------
    string(61) "<form action="#"><input /></form>" 
    

    Not sure what's wrong there?

    question 
    opened by etnyx-efun 4
  • tr@bgcolor removed

    htmlpurifier has support for deprecated attributes and will convert them to their style equivalent

    <table>
    <tr bgcolor="#edeeef">
    <td width="3"></td>
    <td bgcolor="#f9fafa" width="1"></td>
    <td bgcolor="#edeeef" width="1"></td>
    <td bgcolor="#dbdee0" width="1"></td>
    </tr></table>
    

    When using this lib, bgcolor seems to get nuked. I've tried adding "HTML.TidyLevel" => "heavy", but it doesn't seem to do anything. http://htmlpurifier.org/docs/enduser-tidy.html makes reference to the doctype, so I'm wondering whether the HTML5 doctype has something to do with it not working?

    opened by bytestream 4
  • The definition is ignored in case of HTMLPurifier_Config::inherit() usage

    Hi.

    Thanks for the great library which saves developer lifetime :smile:

    I'm using this config to override the Purifier configuration in a Symfony project with the exercise/htmlpurifier-bundle

    The bundle creates the parent configuration, then uses the HTMLPurifier_Config::inherit() method to create the child one. The method implementation is taken from the parent class, not HTML5Config, as follows.

        /**
         * Creates a new config object that inherits from a previous one.
         * @param HTMLPurifier_Config $config Configuration object to inherit from.
         * @return HTMLPurifier_Config object with $config as its parent.
         */
        public static function inherit(HTMLPurifier_Config $config)
        {
            return new HTMLPurifier_Config($config->def, $config->plist);
        }
    

    As a result all the child configurations are HTMLPurifier_Config instances instead of HTMLPurifier_HTML5Config which causes errors as they don't support HTML5 tags.

    I'm using a workaround inheriting the base class like:

    class HTMLPurifier_AltHTML5Config extends \HTMLPurifier_HTML5Config
    {
        public static function inherit(HTMLPurifier_Config $config)
        {
            return new static($config->def, $config->plist);
        }
    }
    

    But the 'inherit()' method should be overridden as well, I suppose.

    Thanks again. Best wishes.

    enhancement 
    opened by xgrn 4
  • contenteditable attribute

    Would you accept a PR which adds contenteditable attribute support?

    https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/contenteditable

    opened by bytestream 3
  • (Table) Captions with Heading content results in a stripped table

    In HTML5, captions may contain any flow content, excluding descendant table elements: https://html.spec.whatwg.org/multipage/tables.html#the-caption-element

    Meaning:

    <table>
      <caption><h3>Monthly savings</h3></caption>
      <tr>
        <th>Month</th>
        <th>Savings</th>
      </tr>
      <tr>
        <td>January</td>
        <td>$100</td>
      </tr>
    </table>
    

    is perfectly valid HTML, but it results in the entire table being stripped out instead:

    <h3>Monthly savings</h3>
      Month
        Savings
      January
        $100
      
    

    Related issue: https://github.com/ezyang/htmlpurifier/issues/131

    opened by codebymikey 3
  • Border-radius getting removed

    Hi,

    I am using a combination of wkhtmltopdf and htmlpurifier-html5 to generate PDFs.

    The problem that I am facing at the moment is that the inline style border-radius gets removed after passing the purify() function.

    Any thoughts on why it could be doing this?

    Thanks.

    question 
    opened by Dimrak 3
  • HTML5 input types

    Currently the set of allowed <input> types doesn't include HTML5 values. Also, it would be useful to be able to narrow the set of allowed input types (as requested in ezyang/htmlpurifier#213).

    WIP 
    opened by xemlock 1
Releases(v0.1.11)
  • v0.1.11(Aug 7, 2019)

    Notable changes since last release:

    • Implemented spec compliant <address>, <header>, <footer> (#39)
    • Implemented spec compliant <form> and <blockquote> (#46)
    • Added spec compliant datetime attribute to <time>, <ins>, <del> (#35)
    • Defined <article>, <aside>, <nav>, <section> as Sectioning content
    • Defined <hgroup> as Heading content
    • Empty <figure>s are no longer removed
    • Made allowfullscreen attribute of <iframe> guarded by %HTML.IframeAllowFullscreen setting (#38)
    • Dropped obsolete elements: <basefont>, <center>, <dir>, <font>, <menu>, <strike>
    • Dropped obsolete <iframe> attributes: scrolling, frameborder, longdesc, marginheight, marginwidth

    Changes to internal APIs:

    • Removed deprecated class HTMLPurifier_AttrDef_Regexp
    • Removed deprecated class HTMLPurifier_AttrTransform_Progress
    • Removed deprecated HTMLPurifier_ChildDef_ classes: Details, Figure, Media, Picture
    • Removed helper class HTMLPurifier_ChildDef_HTML5
    • Removed deprecated method HTMLPurifier_HTML5Definition::setup()
  • v0.1.10(Apr 26, 2019)

  • v0.1.9(Apr 17, 2019)

    DO NOT INSTALL THIS RELEASE!!! It contains a bug in <a> element definition, preventing it from being treated as an inline element. Use v0.1.10 instead.


    Notable changes since last release:

    • HTML5 doctype and modularization
    • Added <bdi> and <dialog>
    • Added datetime attribute to <del> and <ins>
    • Added async and charset attributes to <script> (issue #26)
    • Added auto value to dir global attribute
    • Added validation for rel attribute of <a> element

    The following classes were marked as deprecated:

    • HTMLPurifier_AttrDef_Regexp as unused
    • HTMLPurifier_AttrTransform_Progress, use HTMLPurifier_AttrTransform_HTML5_Progress instead
    • HTMLPurifier_ChildDef_Details, use HTMLPurifier_ChildDef_HTML5_Details instead
    • HTMLPurifier_ChildDef_Figure, use HTMLPurifier_ChildDef_HTML5_Figure instead
    • HTMLPurifier_ChildDef_Media, use HTMLPurifier_ChildDef_HTML5_Media instead
    • HTMLPurifier_ChildDef_Picture, use HTMLPurifier_ChildDef_HTML5_Picture instead
    • HTMLPurifier_ChildDef_Progress, use HTMLPurifier_ChildDef_HTML5_Progress instead
  • v0.1.8(Aug 14, 2018)

  • v0.1.7(Jun 8, 2018)

    Notable changes since last release:

    • Fix content model definition of <figure> and <details> elements
    • Fix content category of <audio>, <video> and <picture> elements (#17)
    • Make HTML5Config::inherit() return an instance of HTML5Config (#16)
  • v0.1.6(Mar 4, 2018)

  • v0.1.5(Feb 12, 2018)

    Changes since last release:

    • Add support for details, picture, and track elements
    • Add support for HTML5 attributes to a element
    • Allow flow content inside a element
    • Fix content sanitization in audio and video elements
    • Fix definition of source element
  • v0.1.4(Feb 8, 2018)

    Changes since last release:

    • Update readme
    • Make HTML5Config::create() default arguments the same as Config::create()

    Important: previous releases allowed HTMLPurifier_HTML5Config::create() to be called without any arguments. In this release the first argument is required, just as it is in HTMLPurifier_Config::create(). If it's not provided a warning will be issued, but apart from that the library will work as before.

  • v0.1.3(Feb 6, 2018)

  • v0.1.2(Aug 14, 2017)

    Changes since last release:

    • fix boolean attribute validation
    • simplify definitions for audio|video|figure elements
    • use config schema singleton instance instead of creating one, if none was provided
  • v0.1.1(May 7, 2017)
