Standards-compliant HTML filter written in PHP

Overview

HTML Purifier

HTML Purifier is an HTML filtering solution that uses a unique combination of robust whitelists and aggressive parsing to ensure that not only are XSS attacks thwarted, but the resulting HTML is standards-compliant.

HTML Purifier is oriented towards richly formatted documents from untrusted sources that require CSS and a full tag-set. This library can be configured to accept a more restrictive set of tags, but it won't be as efficient as more bare-bones parsers. It will, however, do the job right, which may be more important.

Places to go:

  • See INSTALL for a quick installation guide
  • See docs/ for developer-oriented documentation, code examples and an in-depth installation guide.
  • See WYSIWYG for information on editors like TinyMCE and FCKeditor

HTML Purifier can be found on the web at: http://htmlpurifier.org/

Installation

Package available on Composer.

If you're using Composer to manage dependencies, you can use:

$ composer require ezyang/htmlpurifier
Comments
  • Cannot retrieve value of undefined directive HTML.TargetNoreferrer

    I've upgraded HTML Purifier from v4.6 to v4.8, and two warnings started appearing in error_log:

    [Tue Sep 27 15:20:43.328747 2016] [:error] [pid 28496] [client ::1:33748] PHP Warning:  Cannot retrieve value of undefined directive HTML.TargetNoreferrer invoked on line 6316 in file /var/www/include/HTMLPurifier.standalone.php in /var/www/include/HTMLPurifier.standalone.php on line 2634
    
    [Tue Sep 27 14:49:31.483688 2016] [:error] [pid 29895] [client ::1:33464] PHP Warning:  Cannot retrieve value of undefined directive CSS.AllowDuplicates invoked on line 10104 in file /var/www/include/HTMLPurifier.standalone.php on line 2634
    

    I tried setting them via the config, but it didn't help:

    $config->set( 'CSS.AllowDuplicates', TRUE );
    $config->set( 'HTML.TargetNoreferrer', TRUE );
    

    I just got "Cannot set undefined directive CSS.AllowDuplicates to value invoked on line 14 in file..."

    I couldn't find anything about it. I use PHP 5.4.16 (cli) (built: Aug 11 2016 21:24:59) and the standalone version. Could you help me?

    Thanks

    opened by istana 20
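Both warnings name directives that appear to be newer than 4.6. That suggests the 4.8 code is reading a configuration schema that predates them. One plausible cause, assumed here and not confirmed in the thread, is that HTMLPurifier.standalone.php was replaced but its companion standalone/ directory, which holds the serialized schema, was not. The sketch below is a hypothetical diagnostic that checks whether a schema file mentions a given directive. Both the schema path and the assumption that the file is plain serialize() output are unverified.

```php
<?php
// Hypothetical diagnostic: does a serialized config schema mention a directive?
// Assumption: the schema file is plain serialize() output, so directive names
// such as "HTML.TargetNoreferrer" appear verbatim as quoted string keys.
function schemaMentionsDirective(string $schemaPath, string $directive): bool
{
    if (!is_readable($schemaPath)) {
        return false; // missing or unreadable file: report "not found"
    }
    $contents = file_get_contents($schemaPath);
    if ($contents === false) {
        return false;
    }
    // Serialized strings look like s:21:"HTML.TargetNoreferrer";
    return strpos($contents, '"' . $directive . '"') !== false;
}

// Assumed location of the schema in the standalone distribution.
$schemaPath = __DIR__ . '/standalone/HTMLPurifier/ConfigSchema/schema.ser';
foreach (['HTML.TargetNoreferrer', 'CSS.AllowDuplicates'] as $name) {
    echo $name, ': ', schemaMentionsDirective($schemaPath, $name) ? 'present' : 'missing', PHP_EOL;
}
```

If the check reports the directives as missing, replacing the whole standalone distribution (the .php file and the standalone/ directory together) with the 4.8 release is a reasonable first step.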
  • Class HTMLPurifier_Language_en_x_test does not comply with psr-0 autoloading standard

    I get this from composer: Deprecation Notice: Class HTMLPurifier_Language_en_x_test located in ./vendor/ezyang/htmlpurifier/library/HTMLPurifier/Language/classes/en-x-test.php does not comply with psr-0 autoloading standard. It will not autoload anymore in Composer v2.0. in phar:///composer.phar/src/Composer/Autoload/ClassMapGenerator.php:185

    opened by xpetter 19
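Under PSR-0, every underscore in the class-name part becomes a directory separator. Composer therefore expects HTMLPurifier_Language_en_x_test at HTMLPurifier/Language/en/x/test.php under the autoload root, not at HTMLPurifier/Language/classes/en-x-test.php. Here is a minimal sketch of that mapping. The helper name is hypothetical, and it uses / instead of DIRECTORY_SEPARATOR for readability.

```php
<?php
// Map a class name to the relative file path PSR-0 expects.
// Namespace separators and underscores in the class name both become "/".
function psr0Path(string $class): string
{
    $class = ltrim($class, '\\');
    $pos = strrpos($class, '\\');
    if ($pos === false) {
        $namespacePath = '';
        $className = $class;
    } else {
        $namespacePath = str_replace('\\', '/', substr($class, 0, $pos)) . '/';
        $className = substr($class, $pos + 1);
    }
    // Underscores are only special in the class-name part, not in the namespace.
    return $namespacePath . str_replace('_', '/', $className) . '.php';
}

echo psr0Path('HTMLPurifier_Language_en_x_test'), PHP_EOL;
// HTMLPurifier/Language/en/x/test.php
```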
  • PSR-2 Reformatting and PHPDoc corrections

    This is a bit of a monster...

    The PSR-2 reformatting was done with a mixture of Sensio's php-cs-fixer followed by more automated formatting and hand-editing of layout and phpdocs in PHPStorm.

    With all the phpdoc changes, everything springs to life in IDEs that do static analysis, auto-completion and type checking. In a few places it took quite a bit of detective work to figure out what types things should be. Sometimes I couldn't work it out, and I've added 'fix type' todos in those places.

    There are some very minor code changes: a couple of unused local variables were deleted.

    I've not attempted to address PSR-0 issues with namespacing, Composer, etc.; there are plenty of forks that have.

    There's an awful lot of code in here, and I've learned a lot by going through it in such detail.

    Tests are passing. Time for some sleep!

    opened by Synchro 19
  • It does not work on PHP 7.1.5

    Hi. Thank you for a wonderful library.

    Unfortunately, it does not work in my environment, so I opened this issue.

    I'm running PHP 7.1.5, and processing never gets past the following line in library\HTMLPurifier\HTMLModule\List.php (line 32):

    $ol = $this->addElement('ol', 'List', new HTMLPurifier_ChildDef_List(), 'Common');

    Commenting out this line makes it work.

    With the line in place, it runs endlessly with the CPU at 100%.

    Please help me.

    Thank you.

    opened by git-kurara 16
  • Fix autoloading

    Hi,

    The new DebugClassLoader used by Symfony (https://github.com/symfony/Debug/blob/master/DebugClassLoader.php) seems to disagree with your autoload definition.

    The autoloader expected class "HTMLPurifier\Bootstrap" to be defined in file "/My/Home/Project/vendor/ezyang/htmlpurifier/library/HTMLPurifier/Bootstrap.php". The file was found but the class was not in it, the class name or namespace probably has a typo.

    Here is the fix.

    opened by tyx 15
  • PHP 8.1 Deprecated funcs

    Deprecated: Return type of HTMLPurifier_PropertyListIterator::accept() should either be compatible with FilterIterator::accept(): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in C:\Xampp\htdocs\libs\formr\HTMLPurifier.standalone.php on line 8340

    Deprecated: Return type of HTMLPurifier_StringHash::offsetGet($index) should either be compatible with ArrayObject::offsetGet(mixed $key): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in C:\Xampp\htdocs\libs\formr\HTMLPurifier.standalone.php on line 8458

    opened by xkpx64 13
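Both notices come from subclasses of built-in SPL classes whose methods gained return types in PHP 8.1. The message itself names the two fixes: declare a compatible return type, or mark the method with #[\ReturnTypeWillChange]. Here is a minimal sketch of each. The classes are hypothetical, loosely modeled on HTMLPurifier_PropertyListIterator and HTMLPurifier_StringHash; they are not the library's actual code.

```php
<?php
// Option 1: declare the same return type as the parent (FilterIterator::accept(): bool).
// Scalar return types need PHP 7.0+.
class OnlyEvenFilter extends FilterIterator
{
    public function accept(): bool
    {
        return $this->current() % 2 === 0;
    }
}

// Option 2: keep the untyped signature and silence the notice with the attribute.
// PHP versions before 8.0 parse the "#[...]" line as a comment.
class TrackingHash extends ArrayObject
{
    /** @var array<string, bool> keys that have been read */
    public $accessed = [];

    #[\ReturnTypeWillChange]
    public function offsetGet($index)
    {
        $this->accessed[$index] = true;
        return parent::offsetGet($index);
    }
}

foreach (new OnlyEvenFilter(new ArrayIterator([1, 2, 3, 4])) as $n) {
    echo $n, PHP_EOL; // 2, then 4
}
```

Option 1 is the long-term fix. A library that still supports PHP 5 can only use Option 2, because PHP 5 cannot declare return types at all.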
  • Fix PHP 8.2 deprecated utf8_encode & utf8_decode

    Removed usages of deprecated functions utf8_decode & utf8_encode

    This definitely isn't the way to solve the issue in the long run, but would at least keep backwards compatibility with minimal changes apart from dropping PHP 5.2 support. Removing utf8_encode and utf8_decode completely would require quite a bit of changes to related tests and would also break backwards compatibility.

    opened by SharkMachine 11
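utf8_encode() only ever converted ISO-8859-1 (Latin-1) to UTF-8. Where the mbstring extension is available, mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1') is a common replacement. Without any extension, the conversion is small enough to write by hand. Here is a minimal sketch; the helper name is hypothetical and this is not the code from this PR.

```php
<?php
// Replacement for utf8_encode(): convert ISO-8859-1 bytes to UTF-8
// without relying on mbstring or iconv.
function latin1ToUtf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($s[$i]);
        if ($byte < 0x80) {
            $out .= $s[$i]; // ASCII bytes are identical in both encodings
        } else {
            // U+0080..U+00FF becomes a two-byte UTF-8 sequence
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

echo bin2hex(latin1ToUtf8("caf\xE9")), PHP_EOL; // 636166c3a9
```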
  • namespaced alias class

    Presuming a sane pre-existing autoloader for namespaced modules, this is immediately deployable with a nice namespaced alias class, e.g.:

    $purifier = new \HTMLPurifier\Processor();
    $html = '<strong>a string<br></strong>'; // (notice the dirty <br> that should be <br />)
    echo $purifier->purify($html); // outputs: <strong>a string<br /></strong>
    
    opened by Evan-R 11
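The PR's exact mechanism isn't shown here. One common way to expose a legacy global class under a namespaced name, assumed for this sketch, is class_alias(). The class below is a hypothetical stand-in for HTMLPurifier.

```php
<?php
// Hypothetical legacy class in the global namespace (stand-in only).
class Legacy_Purifier
{
    public function purify(string $html): string
    {
        // Placeholder behavior for the demo; the real library filters HTML.
        return trim($html);
    }
}

// Register a namespaced alias: both names now refer to the same class.
class_alias('Legacy_Purifier', 'Vendor\Purifier\Processor');

$p = new \Vendor\Purifier\Processor();
echo get_class($p), PHP_EOL; // prints Legacy_Purifier
```

An alias is the same class, not a subclass, so get_class() still reports the original name. If a distinct class name is wanted, a thin subclass declared in the target namespace is the usual alternative.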
  • feat: add semantic release

    Requires a GitHub auth token to be added to the repository secrets under the name GITHUB_TOKEN: https://docs.github.com/en/actions/security-guides/encrypted-secrets

    How does it work? https://semantic-release.gitbook.io/semantic-release/#how-does-it-work

    Release notes / changelog are generated from fix: / feat: commit messages, so the first automatic release won't include historic commits that don't match that format.

    • NEWS is generated automatically per above using markdown format, so will not look the same as before...
    • WHATSNEW has been removed as it's difficult to automatically generate

    Closes #304

    released 
    opened by bytestream 10
  • YouTube not working (iframe)

    First, awesome package! I began using it to eliminate all of the garbage from MSWord copy & paste.

    Everything is working great, but I simply cannot figure out how to allow YouTube videos. I have tried multiple code examples; no matter what, the iframe tag is always removed. What am I missing here?

    Code:

    // HTMLPurifier
    // ($string holds the untrusted HTML being cleaned)
    $config = \HTMLPurifier_Config::createDefault();

    $config->set('HTML.Doctype', 'HTML 4.01 Transitional');
    $config->set('AutoFormat.RemoveEmpty.Predicate', [
        'colgroup' => [],
        'th' => [],
        'td' => [],
        'o:p' => [],
    ]);
    $config->set('AutoFormat.RemoveEmpty', true);
    $config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
    $config->set('HTML.Allowed', 'p,span[style|class],a[href|title],abbr[title],acronym[title],b,strong,blockquote[cite],code,em,i,iframe[src|width|height],img[alt|title|class|src|height|width],h1,h2,h3,ol,ul,li,table[class|style],tr,td,hr');
    $config->set('HTML.SafeIframe', true);
    $config->set('URI.SafeIframeRegexp', '%^(\/\/www\.youtube(?:-nocookie)?\.com\/embed\/|\/\/player\.vimeo\.com\/)%');

    $def = $config->getHTMLDefinition(true);
    $def->addAttribute('iframe', 'allowfullscreen', 'Bool');

    $purifier = new \HTMLPurifier($config);

    return $purifier->purify($string);
    
    opened by opheliadesign 10
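The configuration above has one thing worth checking. The reported URI.SafeIframeRegexp only accepts protocol-relative sources such as //www.youtube.com/embed/..., so an embed URL that starts with https: would not match. This assumes, as the documented example patterns suggest, that the pattern is tested against the full src URI including its scheme. The standalone sketch below compares the reported pattern with a scheme-tolerant variant modeled on the URI.SafeIframeRegexp documentation. VIDEO_ID is a placeholder.

```php
<?php
// Pattern from the report: only matches sources that begin with "//".
$reportedRegexp = '%^(\/\/www\.youtube(?:-nocookie)?\.com\/embed\/|\/\/player\.vimeo\.com\/)%';
// Variant that also accepts an optional "http:" or "https:" prefix.
$schemeTolerantRegexp = '%^(https?:)?//(www\.youtube(?:-nocookie)?\.com/embed/|player\.vimeo\.com/video/)%';

$sources = [
    '//www.youtube.com/embed/VIDEO_ID',
    'https://www.youtube.com/embed/VIDEO_ID',
    'https://www.youtube-nocookie.com/embed/VIDEO_ID',
];

foreach ($sources as $src) {
    printf(
        "%-50s reported: %s  tolerant: %s\n",
        $src,
        preg_match($reportedRegexp, $src) ? 'match' : 'no match',
        preg_match($schemeTolerantRegexp, $src) ? 'match' : 'no match'
    );
}
```

The variant is stricter for Vimeo: it requires /video/. Widen that branch if other Vimeo paths are needed.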
  • Missing spaces after upgrade from 4.6.0 to 4.7.0

    On 4.6.0 we had the following:

    <b>Vetgedrukt</b> <i>Schuingedrukt</i> <span>Hou</span><iframe></iframe><script></script> jij ook zo van vakjesdenken?
    

    became:

    <b>Vetgedrukt</b> <i>Schuingedrukt</i> Hou jij ook zo van vakjesdenken?
    

    Since 4.7.0, it becomes:

    <b>Vetgedrukt</b><i>Schuingedrukt</i>Hou jij ook zo van vakjesdenken?
    

    I checked the changeset (https://github.com/ezyang/htmlpurifier/compare/v4.6.0...v4.7.0) but can't really find what's causing this.

    Any thoughts?

    opened by fieg 10
  • Some csstidy declarations are not handled properly in ExtractStyleBlocks

    In csstidy, IMPORTANT_COMMENT ($this->_tidy->css['!'] in ExtractStyleBlocks) is declared as a string. But when it goes through ExtractStyleBlocks, it is transformed into an empty array, while it should stay a string.

    • ExtractStyleBlocks.php lines 317-319, where IMPORTANT_COMMENT is transformed:
        $new_decls[$selector] = $style;
    }
    $new_css[$k] = $new_decls;
    

    This matters because csstidy later reuses the value, expecting a string, and fails because it is now an array.

    • class.csstidy_print.php lines 352-355, where csstidy reuses IMPORTANT_COMMENT:
    if (isset($this->css['!'])) {
        $this->parser->_add_token(IMPORTANT_COMMENT, rtrim($this->css['!']), true);
        unset($this->css['!']);
    }
    

    Example:

    <style>
    /*! important comment */
    h1 {
      color: white;
      text-align: center;
    }
    
    /*! another important comment */
    p {
      font-family: verdana;
      font-size: 20px;
    }
    </style>
    

    After being parsed by csstidy (ExtractStyleBlocks.php line 141), IMPORTANT_COMMENT looks like this: $this->_tidy->css['!'] = "important comment\nanother important comment"

    After being transformed by ExtractStyleBlocks, IMPORTANT_COMMENT looks like this: $this->_tidy->css['!'] = []

    This then causes an argument error, rtrim(): Argument #1 ($string) must be of type string, array given, when csstidy reuses it.

    opened by Wolfrank1149 1
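Here is a minimal sketch of the guard the report implies. When rebuilding the parsed CSS, copy through any top-level entry that is not an array of selectors (such as the '!' string) instead of iterating over it. The function name is hypothetical. The selector and property validation that the real ExtractStyleBlocks loop performs is abstracted into the $cleanDeclarations callback.

```php
<?php
// Rebuild csstidy's parsed CSS array, cleaning each selector's declarations
// but copying through top-level entries that are not arrays. In the report,
// '!' (IMPORTANT_COMMENT) holds a plain string and must stay a string.
function rebuildCss(array $css, callable $cleanDeclarations): array
{
    $newCss = [];
    foreach ($css as $key => $decls) {
        if (!is_array($decls)) {
            // e.g. $css['!'] = "important comment\n...": leave it untouched
            $newCss[$key] = $decls;
            continue;
        }
        $newDecls = [];
        foreach ($decls as $selector => $style) {
            $newDecls[$selector] = $cleanDeclarations($style);
        }
        $newCss[$key] = $newDecls;
    }
    return $newCss;
}

// Usage with a hypothetical parsed structure (real csstidy keys may differ).
$parsed = [
    '!' => "important comment\nanother important comment",
    'screen' => [
        'h1' => ['color' => 'white', 'text-align' => 'center'],
        'p' => ['font-family' => 'verdana', 'font-size' => '20px'],
    ],
];
$cleaned = rebuildCss($parsed, function (array $style) {
    return $style; // identity here; the real code would validate properties
});
var_dump(is_string($cleaned['!'])); // bool(true)
```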
  • How to keep CSS variables

    Hello,

    How can I configure the library to sanitize the style attribute (remove duplicates, etc.) but keep CSS variables?
    Given the following HTML, it removes the color property, presumably because it doesn't consider the value a valid color.

    <span style="color: var(--color-c16);">Hello World !</span>
    

    Thanks!

    opened by nmorel 0
  • "Trying to get property 'browsable' of non-object" in HTMLPurifier_AttrTransform_* methods, because parse or getSchemeObj of HTMLPurifier_URIParser can return false

    Hi! I get a PHP error, "Trying to get property 'browsable' of non-object", when code in HTMLPurifier_AttrTransform_TargetBlank and HTMLPurifier_AttrTransform_Nofollow runs, because they don't check the results of

    (new HTMLPurifier_URIParser())->parse (can return false)
    (new HTMLPurifier_URIParser())->parse->getSchemeObj (also can return false)
    

    for false. The code assumes that only an object can be returned:

    $url = $this->parser->parse($attr['href']);
    $scheme = $url->getSchemeObj($config, $context);
    
    if ($scheme->browsable && !$url->isLocal($config, $context)) {
    

    It needs to be replaced with something like this:

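    $url = $this->parser->parse($attr['href']);
    if ($url === false) {
        return $attr; // unparseable href: leave the attributes untouched
    }
    $scheme = $url->getSchemeObj($config, $context);
    if ($scheme === false) {
        return $attr; // unknown or disallowed scheme: nothing to transform
    }
    if ($scheme->browsable && !$url->isLocal($config, $context)) {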
    

    You can reproduce it by purifying HTML like this:

    <a href="javascript:">some text</a>
    

    My config for new HTMLPurifier:

    $config = HTMLPurifier_Config::createDefault();
    $config->set("HTML.Nofollow", true);
    $config->set("HTML.TargetNoreferrer", true);
    $config->set("HTML.TargetNoopener", true);
    $config->set("HTML.TargetBlank", true);
    $config->set('Attr.EnableID', true);
    $def = $config->getHTMLDefinition(true);
    $def->addAttribute('img', 'src', new ParameterURIDefinition($this->whiteListedResources));
    $def->addAttribute('div', 'data-react', new \HTMLPurifier_AttrDef_Text());
    $def->addAttribute('a', 'href', new \HTMLPurifier_AttrDef_Text());
    
    opened by alexander-xyz 0
  • Telegram and Viber links (href=tg://) don't work

    Input:

    <a href="http://www.example.com/">inline URL</a> <a href="tg://user?id=123456789">inline mention of a user</a> <a href="tg://msg?text=text&to=+79851112233">msg with text</a> <a href="callto:79851112233">call</a> <a href="tel:+496170961709" >call</a>

    Result:

    <a href="http://www.example.com/">inline URL</a> <a>inline mention of a user</a> <a>msg with text</a> <a>call</a> <a href="tel:+496170961709">call</a>

    Help!
    
    opened by mazai 4
  • Lost case of text when using Core.EscapeInvalidTags

    Hello!

    Consider the following code:

    $allowedTagsList = [
        'a',
        'p',
        'div',
    ];
    
    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core.EscapeInvalidTags', true);
    $config->set('HTML.AllowedElements', $allowedTagsList);
    $purifier = new HTMLPurifier($config);
    $htmlBody = $purifier->purify($htmlBody);
    

    $htmlBody before purifying: <p>Hello! I want attach following xml:</p><p><someGreatTag>someGreatValue</someGreatTag></p>

    Expected after purifying: <p>Hello! I want attach following xml:</p><p>&lt;someGreatTag&gt;someGreatValue&lt;/someGreatTag&gt;</p>

    Actual result: <p>Hello! I want attach following xml:</p><p>&lt;somegreattag&gt;someGreatValue&lt;/somegreattag&gt;</p>

    So any disallowed tag loses its capitalization (someGreatTag -> somegreattag).

    PHP 7.4.27, HTML Purifier 4.14.0

    opened by andreybatalof 0
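A likely explanation, not confirmed in the thread: HTML parsers normalize element names to lowercase. By the time Core.EscapeInvalidTags escapes the disallowed tag, its original casing is already gone. The sketch below shows that normalization with PHP's DOM extension (libxml's HTML parser), which HTML Purifier's DOM-based lexer relies on when it is available. It assumes ext-dom is installed, and the helper name is hypothetical.

```php
<?php
// Returns the element names libxml's HTML parser produced for $html.
// Requires the DOM extension (ext-dom).
function parsedTagNames(string $html): array
{
    // libxml reports unknown tags such as <someGreatTag> as errors; collect
    // them internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $names = [];
    foreach ($doc->getElementsByTagName('*') as $element) {
        $names[] = $element->nodeName;
    }
    return $names;
}

// The list contains "somegreattag", not "someGreatTag".
print_r(parsedTagNames('<p><someGreatTag>someGreatValue</someGreatTag></p>'));
```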
Releases: v4.16.0
On International Talk Like a Pirate Day (September 19th), this filter changes all appropriate English phrases and words into pirate-speak.

Pirate This module is a simple filter that, when enabled, will change your posts to "Pirate talk" on September 19th for Talk like a Pirate Day Install

Backdrop CMS contributed projects 3 Oct 26, 2021
Let's Encrypt/ACME Command Line client written in PHP

Acme PHP Acme PHP is a simple yet very extensible CLI client for Let's Encrypt that will help you get and renew free HTTPS certificates. Acme PHP is a

Acme PHP 539 Dec 30, 2022
HTML/PHP/CSS website that tracks two API data

Detailed instructions on how to build and run Step 1: download XAMPP for a live web server XAMPP download 1 XAMP download 2 Step 2: Download all files

Winsor Tse 0 Jun 2, 2022
Quickly and easily secure HTML text.

Larasane Quickly sanitize text into safe-HTML using fluid methods. Requirements PHP 7.4, 8.0 or later. Laravel 7.x, 8.x or later. Installation Just fi

Italo 40 Jul 20, 2021
PHPIDS (PHP-Intrusion Detection System) is a simple to use, well structured, fast and state-of-the-art security layer for your PHP based web application

PHPIDS PHPIDS (PHP-Intrusion Detection System) is a simple to use, well structured, fast and state-of-the-art security layer for your PHP based web ap

null 752 Jan 3, 2023
php-chmod is a PHP library for easily changing permissions recursively.

PHP chmod php-chmod is a PHP library for easily changing the permissions recursively. Versions & Dependencies Version PHP Documentation ^1.1 ^7.4 curr

Mathias Reker ⚡️ 5 Oct 7, 2022
PHP 5.x support for random_bytes() and random_int()

random_compat PHP 5.x polyfill for random_bytes() and random_int() created and maintained by Paragon Initiative Enterprises. Although this library sho

Paragon Initiative Enterprises 8k Jan 5, 2023
PHP Secure Communications Library

phpseclib - PHP Secure Communications Library Supporting phpseclib Become a backer or sponsor on Patreon One-time donation via PayPal or crypto-curren

null 4.9k Jan 7, 2023
Simple Encryption in PHP.

php-encryption composer require defuse/php-encryption This is a library for encrypting data with a key or password in PHP. It requires PHP 5.6 or new

Taylor Hornby 3.6k Jan 3, 2023
A database of PHP security advisories

PHP Security Advisories Database The PHP Security Advisories Database references known security vulnerabilities in various PHP projects and libraries.

null 1.9k Dec 18, 2022
A php.ini scanner for best security practices

Scanner for PHP.ini The Iniscan is a tool designed to scan the given php.ini file for common security practices and report back results. Currently it

psec.io 1.5k Dec 5, 2022
🤖 Id obfuscation based on Knuth's multiplicative hashing method for PHP.

Optimus id transformation With this library, you can transform your internal id's to obfuscated integers based on Knuth's integer hash. It is similar

Jens Segers 1.2k Jan 2, 2023
㊙️ AntiXSS | Protection against Cross-site scripting (XSS) via PHP

㊙️ AntiXSS "Cross-site scripting (XSS) is a type of computer security vulnerability typically found in Web applications. XSS enables attackers to inje

Lars Moelleken 570 Dec 16, 2022
An experimental object oriented SSH api in PHP

PHP SSH (master) Provides an object-oriented wrapper for the php ssh2 extension. Requirements You need PHP version 5.3+ with the SSH2 extension. Insta

Antoine Hérault 355 Dec 6, 2022
TCrypto is a simple and flexible PHP 5.3+ in-memory key-value storage library

About TCrypto is a simple and flexible PHP 5.3+ in-memory key-value storage library. By default, a cookie will be used as a storage backend. TCrypto h

timoh 57 Dec 2, 2022
Fetches random integers from random.org instead of using PHP's PRNG implementation

TrulyRandom Composer-compatible library to interact with random.org's API in order to generate truly random lists of integers, sequences of integers,

Erik Wurzer 46 Nov 25, 2022
PHPGGC is a library of PHP unserialize() payloads along with a tool to generate them, from command line or programmatically.

PHPGGC: PHP Generic Gadget Chains PHPGGC is a library of unserialize() payloads along with a tool to generate them, from command line or programmatica

Ambionics Security 2.5k Jan 4, 2023
PHP Malware Finder

PHP Malware Finder Webshell finder,

NBS System 205 Dec 24, 2022
Compatibility with the password_* functions that ship with PHP 5.5

password_compat This library is intended to provide forward compatibility with the password_* functions that ship with PHP 5.5. See the RFC for more d

Anthony Ferrara 2.2k Dec 30, 2022