A lightweight lexical string parser for BBCode styled markup.

Overview

Decoda

Requirements

  • PHP 5.6.0+
    • Multibyte (mbstring) extension
  • Composer

Contributors

Framework Integrations

Features

  • Parses custom code to valid (X)HTML markup
  • Setting to make links and emails auto-clickable
  • Setting to use shorthand text for links and emails
  • Filters to parse markup and custom code
  • Hooks to execute callbacks during the parsing cycle
  • Loaders to load resources and files for configuration
  • Engines to render complex markup using a template system
  • Can censor offensive words
  • Can convert smiley faces into images
  • Basic support for localized messages
  • Parser result caching
  • Supports a wide range of tags
  • Parent-child node hierarchy
  • Fixes incorrectly nested tags by removing the broken/unclosed tags
  • Self-closing tags
  • Logs errors for validation
  • Tag and attribute aliasing

Filters

The following filters and supported tags are available.

  • Default - b, i, u, s, sup, sub, br, hr, abbr, time
  • Block - align, float, hide, alert, note, div, spoiler, left, right, center, justify
  • Code - code, source, var
  • Email - email, mail
  • Image - image, img
  • List - list, olist, ol, ul, li, *
  • Quote - quote
  • Text - font, size, color, h1-h6
  • Url - url, link
  • Video - video, youtube, vimeo, veoh, liveleak, dailymotion, myspace, wegame, collegehumor
  • Table - table, thead, tbody, tfoot, tr, td, th, row, col

Hooks

The following hooks are available.

  • Censor - Censors all words found within config/censored
  • Clickable - Converts all non-tag wrapped URLs and emails into clickable links
  • Emoticon - Converts all smilies found within config/emoticons into emoticon images
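To illustrate roughly what a censor hook does, here is a minimal PHP sketch. It is not Decoda's CensorHook implementation. The function name `censorWords` is hypothetical, and the sketch assumes the word list from config/censored has already been loaded into an array.

```php
<?php
// Hypothetical sketch, not Decoda's CensorHook: mask whole-word matches of a
// word list, case-insensitively, with one asterisk per character.
function censorWords($text, array $words)
{
    // An empty list would produce a pattern that matches empty strings.
    if (!$words) {
        return $text;
    }
    $alternatives = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);
    // \b on both sides keeps "darn" from matching inside "darning".
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')\b/iu';
    return preg_replace_callback($pattern, function ($m) {
        return str_repeat('*', mb_strlen($m[0], 'UTF-8'));
    }, $text);
}

echo censorWords('Darn it, DARN! darning', array('darn')), PHP_EOL;
```

The `\b` word boundaries are a design choice. Without them, censoring short words would also mangle longer, harmless words that contain them.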

Storage Engines

The following caching layers are supported.

  • In-Memory
  • Memcache
  • Redis

Installation and Usage

See docs.

Comments
  • Emoticons are not parsed within quote tag

    Code:

    $code = new Decoda\Decoda('[quote=milesj]Hello, my name is [b]Miles Johnson[/b] :)[/quote] [b]Hello[/b] ;)');
    $code->defaults();
    $code->addHook(new \Decoda\Hook\EmoticonHook());
    echo $code->parse();
    

    Output:

    <blockquote class="decoda-quote">
        <div class="decoda-quote-body">
            Hello, my name is <b>Miles Johnson</b> :)
        </div>
    </blockquote>
    <b>Hello</b> <img src="/images/wink.png" alt="">
    

    As you can see, the :) inside the quote is rendered as text instead of an image. This seems to be caused by persistContent being false in QuoteFilter.

    opened by ErikMinekus 16
  • Speed up emoticon detection

    | Q               | A   |
    | --------------- | --- |
    | Bug fix?        | no  |
    | New feature?    | no  |
    | BC breaks?      | yes |
    | Deprecations?   | no  |
    | Tests pass?     | yes |
    | Related tickets |     |
    | License         | MIT |

    With PCRE assertions, a second pass is no longer needed.

    The callback can also be cleaned up.

    BC Breaks

    • The first parameter of the Decoda\Hook\EmoticonHook::_emoticonCallback() method has changed:

      Before:

      class MyEmoticonHook extends \Decoda\Hook\EmoticonHook
      {
          // ...
          protected function _emoticonCallback($matches) {
               $matches[0];       // contains the text that matches the full pattern
               $matches['left'];  // contains the text to the left of the smiley
               $matches[1];       // contains the text to the left of the smiley
               $matches['right']; // contains the text to the right of the smiley
               $matches[2];       // contains the text to the right of the smiley
          }
          // ...
      }
      

      After:

      class MyEmoticonHook extends \Decoda\Hook\EmoticonHook
      {
          // ...
          protected function _emoticonCallback($matches) {
               $matches[0];        // contains the full match, which is now only the smiley
               $matches[1];        // contains the text of the smiley
               $matches['smiley']; // contains the text of the smiley
          }
          // ...
      }
      
    opened by alquerci 14
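The PR above relies on PCRE assertions. Lookbehind and lookahead check the context around a smiley without consuming it, so one pass is enough. Here is a minimal sketch of that idea. It is not the PR's actual code. The function `replaceSmileys` and the chosen boundaries (whitespace, a tag bracket, or the string start/end) are assumptions. The callback receives the new `$matches` shape, where `$matches[0]` and `$matches['smiley']` are both just the smiley.

```php
<?php
// Hypothetical sketch, not Decoda's EmoticonHook: replace smileys in a single
// preg_replace_callback pass using lookaround assertions.
function replaceSmileys($text, array $images)
{
    $smileys = array_keys($images);
    // Try longer smileys first so ":-)" is not cut short by a shorter entry.
    usort($smileys, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    $alternatives = array_map(function ($s) {
        return preg_quote($s, '/');
    }, $smileys);

    // The lookarounds require whitespace, a tag bracket or a string boundary
    // around the smiley without consuming it, so the space in ":) :)" stays
    // available as left context for the next smiley.
    $pattern = '/(?<=^|\s|\])(?P<smiley>' . implode('|', $alternatives) . ')(?=$|\s|\[)/';

    return preg_replace_callback($pattern, function ($matches) use ($images) {
        $src = htmlspecialchars($images[$matches['smiley']], ENT_QUOTES, 'UTF-8');
        return '<img src="' . $src . '" alt="">';
    }, $text);
}

echo replaceSmileys(':) :) :)', array(':)' => '/images/happy.png')), PHP_EOL;
```

A pattern that matched the surrounding spaces instead of asserting them would consume the space between ":) :)". That explains why only the first of several adjacent smileys gets converted, as reported in "Problem emoticons". Allowing `]` and `[` as boundaries also covers the "[center]Hello :)[/center]" case from "Smileys not parsed".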
  • Strip bbcodes

    Can we get a way to strip all the tags so that:

    test [b]some [i]text[/i] here[/b] with a [url]http://google.com[/url] tag
    

    could become the following if we include content:

    test some text here with a http://google.com tag
    

    or if we exclude the content:

    test tag
    

    I am not sure how you would want to expose this: something like $code->parse(array('strip_tags'=>true, 'exclude_content'=>true)) or $code->stripTags(array('exclude_content'=>true)).

    I was also wondering, instead of that last option, whether to add another parameter to FilterAbstract, 'stripContent' => false. Running $code->stripTags() would then by default keep the text within a tag and remove only the tag itself. Setting 'stripContent' => true would also strip everything within the tag, which would be useful for [code]blah blah[/code].

    This functionality would allow us to get back to the original content for further parsing. For example, building search functionality against our content would not index the tags or the content within the tags that are not useful for our parsing.

    Feature 
    opened by patrickheeney 13
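For the "include content" case in the request above, a single regular expression is enough as a rough sketch. The function `stripBbcodeTags` is hypothetical and not part of Decoda. It treats any bracketed word as a tag, so it would also remove text like "[1]". It does not handle the "exclude content" variant, which needs real tag matching to cope with nesting.

```php
<?php
// Hypothetical helper, not part of Decoda: remove BBCode tags but keep the
// text between them.
function stripBbcodeTags($text)
{
    // Removes [tag], [/tag], [tag=value], [tag attr=value] and [*].
    // Limitation: any bracketed word such as "[1]" is treated as a tag too.
    return preg_replace('/\[\/?[a-z0-9*]+(?:[= ][^\]]*)?\]/i', '', $text);
}

echo stripBbcodeTags('test [b]some [i]text[/i] here[/b] with a [url]http://google.com[/url] tag'), PHP_EOL;
```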
  • Template engine

    Added the possibility to switch to other template engines instead of plain PHP templates. You could, for example, implement a template engine that uses Twig or Smarty files. Removed the constant for the templates and changed it to a method of the template engine, which allows changing the template path if necessary.

    opened by sckoop 10
  • Smileys not parsed

    Hey,

    For instance, if you center the text "Hello :)" in a WYSIWYG editor, it will look like this when passed to Decoda: "[center]Hello :)[/center]"

    The smiley will not get parsed because it is right next to the [/center] tag.

    opened by Mewgood 9
  • Php 7.3 compatibility

    I just updated the code to be compatible with PHP 7.3.

    It seems to work correctly. However, the PHP 7.3 test job seems to hang on Travis, while it works without issues on our machines.

    Please comment if you see changes to be made.

    opened by gignonje 8
  • Links and ClickableHook

    Hello,

    I have a problem with links and ClickableHook.

    Prev: http://google.com/example/links => <a href="http://google.com/example/links">http://google.com/example/links</a>

    Now: http://google.com/example/links => <a href="http://google.com">http://google.com</a>/example/links

    opened by dimak08 8
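The expected behaviour in the report above is that the whole URL, including its path, ends up in both the href and the link text. Here is a minimal sketch of that. It is not Decoda's ClickableHook regex. The name `linkifyUrls` is hypothetical, and the set of characters that ends a URL is an assumption.

```php
<?php
// Hypothetical sketch, not Decoda's ClickableHook: turn bare http(s) URLs into
// links, keeping the full path in both the href and the link text.
function linkifyUrls($text)
{
    // A URL runs until whitespace, a quote, an angle bracket or a BBCode bracket.
    return preg_replace_callback('~\bhttps?://[^\s<>"\[\]]+~i', function ($m) {
        $url = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        return '<a href="' . $url . '">' . $url . '</a>';
    }, $text);
}

echo linkifyUrls('http://google.com/example/links'), PHP_EOL;
```

One known limitation of this approach: trailing punctuation, such as a sentence-ending "." or ")", becomes part of the link.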
  • parse performance on big Posts

    We have a few threads with quite big posts. The problem is that parsing these texts exceeds the maximum PHP execution time. I stripped Decoda down to the bare minimum and then tested it on my local machine with one of the problematic threads.

    // $postId replaces $esPostData['id'], which is undefined inside this method.
    public function bbDecodeForum($text, $postId = null)
    {
        $textlen = strlen($text);
        $label = 'bbDecodeForum: decode Post ' . $postId . ' with ' . $textlen . ' chars.';
        if ($this->stopwatch) {
            $this->stopwatch->start($label);
        }
        $config = array(
            'xhtmlOutput' => false,
            'strictMode' => false,
            'escapeHtml' => true
        );
        $codedText = new Decoda($text, $config);
        $engine = new \Decoda\Engine\PhpEngine();
        $codedText
            ->setEngine($engine)
            ->setLocale('de-de')
            ->addFilter(new BBCode\BasicFilter())
            ->setMaxNewlines(3)
            ->setLineBreaks(true);
        $return = $codedText->parse();
        if ($this->stopwatch) {
            $this->stopwatch->stop($label);
        }
        return $return;
    }
    

    But it still needs a big chunk of CPU time (see the attached chart, "Parsing time for big Posts").

    Is there anything I can do to increase performance? Thanks in advance :)

    -Florian

    opened by frathe 8
  • Wrong parse result for [image]

    The following BBCode

    [image]/some/url.png[/image]
    

    should be parsed as

    <img src="/some/url.png" alt="">
    

    but it is parsed to this instead

    <img src="/some/url.png" alt="">/some/url.png</img>
    

    Meanwhile, the [img] tag works as expected.

    opened by hilobok 8
  • Improve emoticons detection

    | Q            | A   |
    | ------------ | --- |
    | Bug fix?     | yes |
    | New feature? | no  |
    | BC breaks?   | no  |
    | Tests pass?  | yes |
    | License      | MIT |

    TODO:

    • [x] Add some tests
    • [x] Fix bugs found

    This PR tries to improve emoticon detection.

    opened by alquerci 8
  • Problem emoticons

    When I use the emoticon hook to convert the string ':) :) :)', the parser converts only the first ":)".

    But if I add spaces between the smileys, it works.

    opened by Nelrann 8
  • Url filter problems

    [url=https://example.net/]Dies soll ein Link sein[/url]
    
    [url=https://example.net/]Dies soll ein Link[/url]
    
    [url=https://example.net/]Dies soll ein[/url]
    
    [url=https://example.net/]Dies soll[/url]
    
    [url=https://example.net/]Dies[/url]
    
    https://example.net/ 
    

    becomes

    <div class="body bbcode">
    Dies soll ein Link sein<br><br>
    Dies soll ein Link<br><br>
    Dies soll ein<br><br>
    Dies soll<br><br>
    <a href="http://Dies">Dies</a><br><br>
    <a href="https://example.net">https://example.net</a>/
    </div>
    

    So only the last two (with no space in the link text) work. The German link texts are variations of "This should be a link".

    The trailing slash being removed from the last URL is only a minor issue. The main problem is that linking does not work at all for the first four.

    opened by dereuromark 1
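The examples above show that link text containing spaces can be matched with a lazy `(.*?)` group between `[url=...]` and `[/url]`. Here is a minimal sketch of that. It is not Decoda's UrlFilter, and `renderUrlTags` is a hypothetical name. The sketch ignores nested tags and accepts only http(s) targets, a restriction chosen here so that values like `javascript:` are left untouched.

```php
<?php
// Hypothetical sketch, not Decoda's UrlFilter: render [url=...]text[/url] where
// the link text may contain spaces. Only http(s) targets are accepted.
function renderUrlTags($text)
{
    return preg_replace_callback(
        '~\[url=(https?://[^\]\s]+)\](.*?)\[/url\]~is',
        function ($m) {
            $href = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
            $label = htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8');
            return '<a href="' . $href . '">' . $label . '</a>';
        },
        $text
    );
}

echo renderUrlTags('[url=https://example.net/]Dies soll ein Link sein[/url]'), PHP_EOL;
```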
  • Normal "quotes" are not escaped

    Not sure if that's a problem for normal use cases, but using HTML escaping or h() in CakePHP you get

    Some "<b>demo</b>" string
    

    transformed into

    Some &quot;&lt;b&gt;demo&lt;/b&gt;&quot; string
    

    with >, < and " all escaped.

    But with this BBCode parser, the " characters seem to remain:

    Some "&lt;b&gt;demo&lt;/b&gt;" string
    

    Depending on where those characters appear, this could break some layouts, but I'm not sure.

    I tried to escape the text with h($text) before passing it to the converter, but then tags that need " characters no longer parse.

    Bug 
    opened by dereuromark 4
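The two outputs quoted in the issue above correspond to two `htmlspecialchars()` flag settings. The snippet below reproduces both. Whether Decoda actually escapes with `ENT_NOQUOTES` is an assumption; the snippet only shows that its observed output matches that flag.

```php
<?php
$input = 'Some "<b>demo</b>" string';

// ENT_NOQUOTES escapes &, < and > but leaves quotes alone.
$withoutQuotes = htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8');
// ENT_QUOTES additionally turns " into &quot; and ' into &#039;.
$withQuotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $withoutQuotes, PHP_EOL; // Some "&lt;b&gt;demo&lt;/b&gt;" string
echo $withQuotes, PHP_EOL;    // Some &quot;&lt;b&gt;demo&lt;/b&gt;&quot; string
```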
  • Use MediaEmbed for video filter

    The standalone Video filter is nice for starters. The issue is that you also need to maintain those video links/URLs/code all the time. It could be beneficial to let a dedicated library take care of that, e.g. https://github.com/dereuromark/media-embed

    To keep BC and offer a choice, we could add an optional MediaEmbedVideoFilter and add the dependency only as require-dev for testing. If you want to use it, you add the real dependency; otherwise there is no harm.

    The working code can be seen here: https://github.com/dereuromark/cakephp-markup/blob/master/src/Bbcode/Decoda/VideoFilter.php

    What do you think? Is it worth porting in, or should this be kept outside and shipped as its own package?

    Feature 
    opened by dereuromark 1
  • Pass string as method argument

    Is there a reason the string has to be passed into the constructor, while the actual configuration happens at runtime? It makes it difficult for DI and libraries/config to adjust the object once when constructing it and then use it.

    If you look at, e.g., https://commonmark.thephpleague.com/1.5/extensions/github-flavored-markdown/ or https://github.com/kzykhys/Ciconia/blob/master/src/Ciconia/Ciconia.php#L42

    you will see that the builder itself is usually stateless, with any state wrapped internally.

    $env = .. // Can contain all the filters and stuff
    $options =  [
        'xhtmlOutput' => true,
        'strictMode' => false,
        'escapeHtml' => true
    ];
    
    // We can pass defaults here if we want
    $instance = new Decoda($env, $options);
    
    // We can also set options here per conversion
    $html = $instance->convert($bbcodeText, $options);
    $html2 = $instance->convert($bbcodeText2, $otherOptions);
    

    What do you think?

    There will also be no need for reset() then as it keeps the wrapper stateless and reuses the objects here.

    major 
    opened by dereuromark 3
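The proposal above can be prototyped without changing Decoda itself. A wrapper can hold default options plus a factory, and build a fresh parser on every `convert()` call. The class `StatelessConverter` is hypothetical, as is the factory signature. In real use, the factory might call `new \Decoda\Decoda($text, $options)` followed by `defaults()`, both of which appear earlier on this page. Any object with a `parse()` method works, which keeps the sketch testable with a stub.

```php
<?php
// Hypothetical wrapper, not part of Decoda: stores a parser factory and default
// options, and creates a fresh parser per call so no state leaks between texts.
final class StatelessConverter
{
    private $factory;
    private $defaults;

    public function __construct(callable $factory, array $defaults = array())
    {
        $this->factory = $factory;
        $this->defaults = $defaults;
    }

    public function convert($text, array $options = array())
    {
        // Per-call options override the defaults key by key.
        $parser = call_user_func($this->factory, $text, array_merge($this->defaults, $options));
        return $parser->parse();
    }
}
```

Because each call builds its own parser, a `reset()` step is no longer needed. The trade-off is that filter and hook objects are re-registered on every conversion unless the factory caches them.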
  • Force img tag to be https on https site

    Is there any way to force the img tag to use https when the site is accessed over https? A few of my users are using non-secure links, and I'd rather the images just not work on https than transfer insecure images.

    Feature 
    opened by NicholasJohn16 3
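One way to get the behaviour requested above is to rewrite absolute `http://` image URLs to `https://` before rendering. The image then either loads securely or fails, and it is never fetched insecurely. This is a minimal sketch with a hypothetical function name. It is not an existing Decoda option, and where to hook it into the image filter is left open.

```php
<?php
// Hypothetical helper, not a Decoda option: force absolute image URLs onto
// https. Hosts without HTTPS then simply fail to load.
function forceHttpsImageUrl($url)
{
    // Relative ("/a.png") and protocol-relative ("//cdn/a.png") URLs already
    // follow the page's scheme and are returned unchanged.
    return preg_replace('~^http://~i', 'https://', $url);
}

echo forceHttpsImageUrl('http://example.com/a.png'), PHP_EOL;
```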
Releases: 6.13.0
Owner: Miles Johnson