html-sanitizer is a library aiming at handling, cleaning and sanitizing HTML sent by external users

Overview

html-sanitizer

html-sanitizer is a library aiming at handling, cleaning and sanitizing HTML sent by external users (who you cannot trust), allowing you to store it and display it safely. It has sensible defaults to provide a great developer experience while still being entirely configurable.

Internally, the sanitizer has a deep understanding of HTML: it parses the input and creates a tree of DOMNode objects, which it uses to keep only the safe elements from the content. By using this technique, it is safe (it works with a strict whitelist), fast and easily extensible.

It also provides useful features such as the possibility to transform images or iframes URLs to HTTPS.
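
To make the whitelist idea concrete, here is a minimal sketch built on PHP's built-in DOM extension. It is not the library's actual implementation: the whitelist contents, the function names (`sketchSanitize`, `renderElement`, `renderChildren`, `isSafeUrl`) and the URL policy are all assumptions chosen for illustration.

```php
<?php
// Illustrative sketch only, NOT html-sanitizer's real code. The whitelist,
// the function names and the URL policy are assumptions for this example.

const ALLOWED_TAGS = [           // tag name => allowed attribute names
    'p'      => [],
    'strong' => [],
    'em'     => [],
    'a'      => ['href'],
    'img'    => ['src', 'alt'],
];
const VOID_TAGS = ['img'];                       // rendered without a closing tag
const DROPPED_WITH_CONTENT = ['script', 'style']; // removed together with their content

// Accept only absolute http(s) URLs; javascript:, data: and relative URLs are rejected.
function isSafeUrl(string $url): bool
{
    $scheme = parse_url(trim($url), PHP_URL_SCHEME);

    return in_array(strtolower((string) $scheme), ['http', 'https'], true);
}

// Render the children of a node, escaping text and filtering elements.
function renderChildren(DOMNode $parent): string
{
    $out = '';
    foreach ($parent->childNodes as $child) {
        if ($child instanceof DOMText) {
            $out .= htmlspecialchars($child->nodeValue, ENT_QUOTES, 'UTF-8');
        } elseif ($child instanceof DOMElement) {
            $out .= renderElement($child);
        }
        // Comments and other node types are silently dropped.
    }

    return $out;
}

// Render one element: keep it only if whitelisted, with whitelisted attributes.
function renderElement(DOMElement $el): string
{
    $tag = strtolower($el->nodeName);
    if (in_array($tag, DROPPED_WITH_CONTENT, true)) {
        return '';
    }
    if (!array_key_exists($tag, ALLOWED_TAGS)) {
        return renderChildren($el); // unknown tag: keep only its filtered children
    }

    $attrs = '';
    foreach ($el->attributes as $attr) {
        $name = strtolower($attr->name);
        if (!in_array($name, ALLOWED_TAGS[$tag], true)) {
            continue;
        }
        if (in_array($name, ['href', 'src'], true) && !isSafeUrl($attr->value)) {
            continue;
        }
        $attrs .= sprintf(' %s="%s"', $name, htmlspecialchars($attr->value, ENT_QUOTES, 'UTF-8'));
    }

    if (in_array($tag, VOID_TAGS, true)) {
        return "<{$tag}{$attrs} />";
    }

    return "<{$tag}{$attrs}>" . renderChildren($el) . "</{$tag}>";
}

// Parse untrusted HTML into a DOM tree and re-render only the whitelisted parts.
function sketchSanitize(string $html): string
{
    if (trim($html) === '') {
        return '';
    }
    $previous = libxml_use_internal_errors(true); // tolerate malformed markup quietly
    $doc = new DOMDocument();
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);

    return $body === null ? '' : renderChildren($body);
}
```

With this sketch, an input such as `<p>Hello</p><img src="javascript:evil();" onload="evil();" />` should come back as `<p>Hello</p><img />`: the output is rebuilt from the parsed tree, so anything that is not explicitly whitelisted never reaches it.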

Symfony integration

This library is also available as a Symfony bundle.

Documentation

  1. Getting started
  2. Creating an extension to allow custom tags
  3. Configuration reference
  4. Comparison with HTMLPurifier

Security Issues

If you discover a security vulnerability within the sanitizer, please follow our disclosure procedure.

Backward Compatibility promise

This library follows the same Backward Compatibility promise as the Symfony framework: https://symfony.com/doc/current/contributing/code/bc.html

Note: many classes in this library are marked either @final or @internal. @internal classes are excluded from any Backward Compatibility promise (you should not use them in your code), whereas @final classes can be used but should not be extended (use composition instead).

Thanks

Many thanks to:

Comments
  • Add support for u (unarticulated) tag

    Although u (as in underline) tag has been deprecated in HTML 4.01, u tag has been redefined in HTML 5 ("u" as in unarticulated). Thus, I think it should be included in the basic extension.

    opened by fbastien 8
  • How to prevent HTML encode (e.g. @ => &#64;)

    Thank you for the library.

    I just have one question that I could not find an answer to anywhere (so far). Is there any way to prevent HTML encoding of characters (e.g. prevent < from being transformed to &lt;, > to &gt;, @ to &#64;, etc.)? I use the Symfony bundle (if it matters).

    Thanks.

    opened by BackNot 5
  • Should empty tags be removed

    If we provide the following HTML:

    <p>Hello</p>
    <img src="javascript:evil();" onload="evil();" />
    

    Then pass it through html-sanitizer we get:

    <p>Hello</p>
    <img />
    

    Should the <img> tag be taken out entirely now it has no attributes left?

    opened by jonnybarnes 5
  • <u> tag not supported

    Hi, I am using html/sanitizer version 1.4.0 in a Symfony project, and the Sanitizer filters out the <u> tags even though it is explicitly configured in the tags section as in the configuration reference file.

          $sanitizer = Sanitizer::create([
              'extensions' => ['basic', 'code', 'image', 'list', 'table', 'details', 'extra'],
              'tags' => [
                  'u' => [
                      'allowed_attributes' => [],
                  ],
              ],
          ]);
    

    I noticed there was a PR (#61) for adding <u> tags into the basic extension but it's not been released yet. Are tags only available from within extensions? So until that commit is released there will be no support for <u> tags? Could someone please advise?

    Thanks a lot!

    opened by adriflorence 3
  • How can I allow "target=_blank" ?

    Hi @tgalopin & all

    This package is extremely useful, thanks for all the hard work. I am currently using it in a site where I need to be able to set a target for certain links. Is there a simple way to configure this?

    I'm already using an extension provided by @olegatro to allow relative URIs. I thought perhaps it would be possible to modify that extension?

    Thanks for any help, PhR

    opened by Phroggy78 2
  • Added MathML Extension + Added sanitizer for xlink:href in maction tag

    This is a very big commit, sorry for that. I created all these files by running a foreach loop, which generated the multiple Node and NodeVisitor files for me in the appropriate directory.

    Sanitizer added because of this previous reported XSS in MathML

    opened by rohitcoder 2
  • Out of memory on malformed string

    Hello,

    I'm having out of memory exceptions when using HTML sanitizer (version 1.3.0) on a malformed string.

    php.CRITICAL: Fatal Error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 352256 bytes) {"exception":"[object] (Symfony\\Component\\ErrorHandler\\Error\\OutOfMemoryError(code: 0): Error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 352256 bytes) at /srv/web/vendor/masterminds/html5/src/HTML5/Parser/Tokenizer.php:1054)"
    

    Here is a snippet to reproduce the issue:

    The message comes from a chat which truncates messages when they are too long, leading to some invalid html content. (Fun fact, this message comes from production).

    <?php
    /** @var SanitizerInterface $sanitizer   */
    $sanitizer->sanitize("<p>Apr\u00e8s s&#039;il y a un gros bug et que tout le monde en profite, mon avis l\u00e0 dessus peut changer. Mais normalement non, pas de reset pour les joueurs arriv\u00e9s avec la beta publique.<\/p>\n\n<p>Par contre certains \u00e9quilibrages changeront, c&#");
    
    opened by Stoakes 2
  • Images src attribute removed

    Hi, I'm trying to use html-sanitizer to allow users to create articles in a Blog style application I'm building. I can't figure out why sanitizer is removing src attribute from images tags.

    The config I'm using is this one:

    $this->sanitizerConfig = [
        'extensions' => ['basic', 'code', 'image', 'list', 'table'],
        'tags' => [
            'a' => [
                'allowed_hosts' => null,
                'allow_mailto' => true,
            ],
            'img' => [
                'allowed_attributes' => ['src', 'alt', 'title', 'width', 'height'],
                'allowed_hosts' => null,
                'allow_data_uri' => true,
                'force_https' => false,
            ],
            'div' => [
                'allowed_attributes' => ['class'],
            ],
            'span' => [
                'allowed_attributes' => ['class'],
            ],
            'table' => [
                'allowed_attributes' => ['class'],
            ],
            'p' => [
                'allowed_attributes' => ['class'],
            ],
            'h1' => [
                'allowed_attributes' => ['class'],
            ],
            'h2' => [
                'allowed_attributes' => ['class'],
            ],
            'h3' => [
                'allowed_attributes' => ['class'],
            ],
            'h4' => [
                'allowed_attributes' => ['class'],
            ],
        ],
    ];

    This is the HTML before sanitizing:

    "<p><img src="/images/uploaded/articles/1b75dd06bf92c5e04e1491af441491fe9a7d7bab.png" alt="Test image" width="960" height="638" /></p>"

    And this is what I get from the sanitize method:

    "<p><img alt="Test image" width="960" height="638" /></p>"

    Thanks for your help.

    opened by alartigue 2
  • Fix allow data uri on img

    When allow_data_uri is set to true on img and allowed_hosts is left at the default null like below, allowed_hosts is effectively overwritten by an array containing only null causing all other images to be blocked.

    "img" => [
        "allowed_hosts" => null, // Example: ["trusted1.com", "google.com"]
        "allow_data_uri" => true
    ]

    I propose this small change to only add the null value if allowedHosts is already an array.

    opened by martijnve 2
  • Create a static method to pass extensions and `$html`

    In my project at the moment I need to purify 2 properties in different entities with the same extensions, so I create a simple helper.

    class SanitizerHelper
    {
        private const SANITIZER_WHITE_LIST = ['basic', 'code', 'image', 'list', 'table', 'iframe', 'extra'];
    
        public static function sanitize(string $html)
        {
            $sanitizer = Sanitizer::create(['extensions' => self::SANITIZER_WHITE_LIST]);
    
            return $sanitizer->sanitize($html);
        }
    }
    

    The idea is to create a static method that takes the extensions as its first argument and the content as its second argument.

    Current:

    $sanitizer = Sanitizer::create(['extensions' => self::SANITIZER_WHITE_LIST]);
    $safeHtml = $sanitizer->sanitize($untrustedHtml);

    After:

    $safeHtml = HtmlSanitizer\Sanitizer::sanitize(
        ['extensions' => ['basic', 'code', 'image', 'list', 'table', 'iframe', 'details', 'extra']],
        $untrustedHtml
    );
    

    WDYT ?

    opened by ismail1432 2
  • Comparison with HTMLPurifier

    It would be great to have a comparison between this package and the HTMLPurifier library, which has been around for a long time (what are the differences in the features they support, etc.).

    opened by stof 2
  • General replacements are breaking URLs

    https://github.com/tgalopin/html-sanitizer/blob/82da21fbb6ca4ed6ac3458391612529958b0a69f/src/Sanitizer/StringSanitizerTrait.php#L23

    This replacement is breaking URLs and replacing valid =

    opened by wlasnapl 0
  • PHP Notice: "Undefined offset: 2" if allowed_hosts contains subdomains

    Hi,

    I recently got the following error:

    Notice: Undefined offset: 2
    

    here: https://github.com/tgalopin/html-sanitizer/blob/d2dd64cf1a3739167802aa00217df0f17e67372a/src/Sanitizer/UrlSanitizerTrait.php#L83

    I got this error after adding a subdomain to allowed_hosts like sub.example.org and this causes the function to be called with the following parameters if a url like https://example.org in a href attribute is sanitized:

    $uriParts = ['org', 'example']; 
    $trustedParts = ['org', 'example', 'sub'];
    

    Is a subdomain not allowed in allowed_hosts or is this a bug?

    opened by codegain 0
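
    The two arrays quoted in the issue above suggest that hosts are compared label by label, starting from the top-level domain. As a hedged illustration (the function name `hostMatches` is hypothetical and this is not the code behind the linked line), a comparison of that kind can avoid the undefined offset by checking the label counts first:

```php
<?php
// Hypothetical helper, not the library's code: returns true when $host is
// $trustedHost itself or one of its subdomains, comparing reversed labels.
function hostMatches(string $host, string $trustedHost): bool
{
    $uriParts = array_reverse(explode('.', strtolower($host)));
    $trustedParts = array_reverse(explode('.', strtolower($trustedHost)));

    // A host with fewer labels than the trusted host can never match it.
    // Checking this first avoids reading an offset that does not exist.
    if (count($uriParts) < count($trustedParts)) {
        return false;
    }

    // Every label of the trusted host must match the label at the same
    // position in the candidate host (positions count from the TLD).
    foreach ($trustedParts as $i => $part) {
        if ($uriParts[$i] !== $part) {
            return false;
        }
    }

    return true;
}
```

    Under this sketch, `hostMatches('example.org', 'sub.example.org')` is false without raising a notice, while `hostMatches('a.sub.example.org', 'sub.example.org')` is true. Whether a parent domain should be rejected when only a subdomain is trusted is a policy choice; the sketch assumes it should.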
  • Unable to use case sensitive attributes and allow empty string attribute's value ( ="" )

    Concern 01

    Is there any possibility of allowing case-sensitive attributes with this package, for example through a config setting?

    I have HTML like the example below, and the sanitizer removes case-sensitive attributes even when I add the attribute named categoryType under allowed_attributes for the div tag.

    Ex:

    <div class="custom" categoryType="books"></div>

    For the above HTML, the sanitizer returns:

    <div class="custom"></div>

    I debugged your library and found that it identifies the attribute as "categorytype" (all letters in lower case). Could we have case-sensitive attributes?

    Concern 02

    The sanitizer removes the empty string value of an attribute (=""), as shown below.

    Ex:

    <a rel="" href="https://github.com/tgalopin/html-sanitizer/">HTML Sanitizer</a>

    Sanitizer returns

    <a rel href="https://github.com/tgalopin/html-sanitizer/">HTML Sanitizer</a>

    Could an empty string be kept as the attribute value? Maybe this could also be allowed through a config setting:
    <a rel="" href="https://github.com/tgalopin/html-sanitizer/">HTML Sanitizer</a>

    opened by dulmi-j 2
Releases(1.5.0)
Owner
Titouan Galopin