Advanced shortcode (BBCode) parser and engine for PHP

Overview

Shortcode

Shortcode is a framework-agnostic PHP library that finds, extracts, and processes text fragments called "shortcodes" or "BBCodes". Examples of their usual syntax are shown below:

[user-profile /]
[image width=600]
[link href="http://google.pl" color=red]
[quote="Thunderer"]This is a quote.[/quote]
[text color="red"]This is a text.[/text]

The library is divided into several parts, each of them containing logic responsible for different stages and ways of processing data:

  • parsers extract shortcodes from text and transform them to objects,
  • handlers transform shortcodes into desired replacements,
  • processors use parsers and handlers to extract shortcodes, compute replacements, and apply them in text,
  • events alter the way processors work to provide better control over the whole process,
  • serializers convert shortcodes from and to different formats like Text, XML, JSON, and YAML.

Each part is described in its dedicated section of this document.

Installation

There are no required dependencies, and all PHP versions from 5.3 up to 7.4 are tested and supported. This library is available through Composer/Packagist as thunderer/shortcode. To install it, execute:

composer require thunderer/shortcode=^0.7

or manually update your composer.json with:

(...)
"require": {
    "thunderer/shortcode": "^0.7"
}
(...)

and run composer install or composer update afterwards. If you're not using Composer, download sources from GitHub and load them as required. But really, please use Composer.

Usage

Facade

To ease usage of this library, the ShortcodeFacade class comes configured for the most common needs. It contains shortcut methods for all features described in the sections below:

  • addHandler(): adds shortcode handlers,
  • addHandlerAlias(): adds shortcode handler alias,
  • process(): processes text and replaces shortcodes,
  • parse(): parses text into shortcodes,
  • setParser(): changes processor's parser,
  • addEventHandler(): adds event handler,
  • serialize(): serializes shortcode object to given format,
  • unserialize(): creates shortcode object from serialized input.
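
The shortcut methods listed above can be combined as in the following minimal sketch. It assumes the library was installed with Composer, that addHandlerAlias() takes the alias first and the existing handler name second, and that parsed shortcodes expose getOffset(); the [hi] alias and the sample texts are made up for illustration.

```php
<?php
require 'vendor/autoload.php'; // assumes the library was installed with Composer

use Thunder\Shortcode\Shortcode\ShortcodeInterface;
use Thunder\Shortcode\ShortcodeFacade;

$facade = new ShortcodeFacade();
$facade->addHandler('hello', function(ShortcodeInterface $s) {
    // the second argument is the default used when the parameter is missing
    return sprintf('Hello, %s!', $s->getParameter('name', 'stranger'));
});
// [hi] now reuses the handler registered for [hello]
$facade->addHandlerAlias('hi', 'hello');

echo $facade->process('[hi name="Anna"] [hello]'), "\n";

// parse() only extracts shortcodes, it does not call any handler
foreach($facade->parse('[hello name="Anna"] and [quote]text[/quote]') as $shortcode) {
    printf("%s at offset %d\n", $shortcode->getName(), $shortcode->getOffset());
}
```

Note that parse() also returns shortcodes without a registered handler, such as [quote] here, because parsing and handling are separate stages.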

Processing

Shortcodes are processed using Processor, which requires a parser and handlers. The example below implements a shortcode that greets the person whose name is passed as a parameter:

use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('hello', function(ShortcodeInterface $s) {
    return sprintf('Hello, %s!', $s->getParameter('name'));
});
$processor = new Processor(new RegularParser(), $handlers);

$text = '
    <div class="user">[hello name="Thomas"]</div>
    <p>Your shortcodes are very good, keep it up!</p>
    <div class="user">[hello name="Peter"]</div>
';
echo $processor->process($text);

Facade example:

use Thunder\Shortcode\ShortcodeFacade;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$facade = new ShortcodeFacade();
$facade->addHandler('hello', function(ShortcodeInterface $s) {
    return sprintf('Hello, %s!', $s->getParameter('name'));
});

$text = '
    <div class="user">[hello name="Thomas"]</div>
    <p>Your shortcodes are very good, keep it up!</p>
    <div class="user">[hello name="Peter"]</div>
';
echo $facade->process($text);

Both result in:

    <div class="user">Hello, Thomas!</div>
    <p>Your shortcodes are very good, keep it up!</p>
    <div class="user">Hello, Peter!</div>

Configuration

Processor has several configuration options available as with*() methods. Each returns a new, changed instance, which keeps the object immutable.

  • withRecursionDepth($depth) controls the nesting level, that is, how many levels of shortcodes are actually processed. Once this limit is reached, all shortcodes deeper than that level are ignored. If $depth is null (the default), the nesting level is not checked. If it is zero, nesting is disabled and only topmost shortcodes are processed. Any integer greater than zero sets the nesting level limit,
  • withMaxIterations($iterations) controls how many times the source text is processed. The text is processed repeatedly until the limit is reached or no shortcodes are left. If $iterations is null, there is no iteration limit; any integer greater than zero sets the limit. Defaults to one iteration,
  • withAutoProcessContent($flag) controls automatic processing of a shortcode's content before its handler is called. If $flag is true, the handler receives the shortcode with already processed content. If it is false, the handler must process nested shortcodes itself (or leave them for the remaining iterations). This is turned on by default,
  • withEventContainer($events) registers an event container which provides handlers for the events fired at various stages of processing text. Read more about events in the section dedicated to them.
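
As a rough sketch of how these options chain together, the snippet below builds a restricted processor from a default one. The [c] handler and the input text are made up for illustration, and the sketch assumes the library was installed with Composer. The processed output is printed rather than stated, since it depends on the options chosen.

```php
<?php
require 'vendor/autoload.php'; // assumes the library was installed with Composer

use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('c', function(ShortcodeInterface $s) { return $s->getContent(); });

$processor = new Processor(new RegularParser(), $handlers);

// Each with*() call returns a new instance; $processor itself is unchanged.
$limited = $processor
    ->withRecursionDepth(0)          // only topmost shortcodes are processed
    ->withMaxIterations(1)           // a single pass over the text (the default)
    ->withAutoProcessContent(false); // handlers receive unprocessed content

var_dump($limited !== $processor); // bool(true), the original instance is not modified

echo $processor->process('[c]outer [c]inner[/c][/c]'), "\n"; // default configuration
echo $limited->process('[c]outer [c]inner[/c][/c]'), "\n";   // restricted configuration
```

Because the methods do not mutate the receiver, the result of each call has to be assigned; calling $processor->withMaxIterations(2) on its own line has no effect on $processor.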

Events

If the processor is configured with an event container, there are several ways to alter how shortcodes are processed:

  • Events::FILTER_SHORTCODES uses the FilterShortcodesEvent class. It receives the current parent shortcode and the array of shortcodes from the parser. Its purpose is to allow modifying that array before the shortcodes are processed,
  • Events::REPLACE_SHORTCODES uses the ReplaceShortcodesEvent class and receives the parent shortcode, the currently processed text, and the array of replacements. It can alter the way shortcode handler results are applied to the source text. If none of the listeners sets the result, the default method is used.

There are several ready-to-use event handlers in the Thunder\Shortcode\EventHandler namespace:

  • FilterRawEventHandler handles FilterShortcodesEvent and lets you declare any number of "raw" shortcodes whose content is not processed,
  • ReplaceJoinEventHandler handles ReplaceShortcodesEvent and applies shortcode replacements by discarding the surrounding text and returning just the replacements.
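
A minimal sketch of registering the ready-made raw filter follows. It assumes that FilterRawEventHandler takes an array of shortcode names in its constructor and can be passed to addListener() as a callable; treat both as assumptions to verify against the installed version. The [raw], [n], and [c] handlers are illustrative.

```php
<?php
require 'vendor/autoload.php'; // assumes the library was installed with Composer

use Thunder\Shortcode\EventContainer\EventContainer;
use Thunder\Shortcode\EventHandler\FilterRawEventHandler;
use Thunder\Shortcode\Events;
use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('raw', function(ShortcodeInterface $s) { return $s->getContent(); });
$handlers->add('n', function(ShortcodeInterface $s) { return $s->getName(); });
$handlers->add('c', function(ShortcodeInterface $s) { return $s->getContent(); });

$events = new EventContainer();
// assumption: the constructor receives the names of shortcodes treated as raw
$events->addListener(Events::FILTER_SHORTCODES, new FilterRawEventHandler(array('raw')));

$processor = new Processor(new RegularParser(), $handlers);
$processor = $processor->withEventContainer($events);

// shortcodes nested inside [raw] should be left untouched
echo $processor->process('[raw] [n /] [c]cnt[/c] [/raw]'), "\n";
```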

The example below shows how to manually implement a [raw] shortcode that returns its verbatim content without calling any handler for nested shortcodes:

use Thunder\Shortcode\Event\FilterShortcodesEvent;
use Thunder\Shortcode\EventContainer\EventContainer;
use Thunder\Shortcode\Events;
use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('raw', function(ShortcodeInterface $s) { return $s->getContent(); });
$handlers->add('n', function(ShortcodeInterface $s) { return $s->getName(); });
$handlers->add('c', function(ShortcodeInterface $s) { return $s->getContent(); });

$events = new EventContainer();
$events->addListener(Events::FILTER_SHORTCODES, function(FilterShortcodesEvent $event) {
    $parent = $event->getParent();
    if($parent && ($parent->getName() === 'raw' || $parent->hasAncestor('raw'))) {
        $event->setShortcodes(array());
    }
});

$processor = new Processor(new RegularParser(), $handlers);
$processor = $processor->withEventContainer($events);

assert(' [n /] [c]cnt[/c] ' === $processor->process('[raw] [n /] [c]cnt[/c] [/raw]'));
assert('n true  [n /] ' === $processor->process('[n /] [c]true[/c] [raw] [n /] [/raw]'));

Facade example:

use Thunder\Shortcode\Event\FilterShortcodesEvent;
use Thunder\Shortcode\Events;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;
use Thunder\Shortcode\ShortcodeFacade;

$facade = new ShortcodeFacade();
$facade->addHandler('raw', function(ShortcodeInterface $s) { return $s->getContent(); });
$facade->addHandler('n', function(ShortcodeInterface $s) { return $s->getName(); });
$facade->addHandler('c', function(ShortcodeInterface $s) { return $s->getContent(); });

$facade->addEventHandler(Events::FILTER_SHORTCODES, function(FilterShortcodesEvent $event) {
    $parent = $event->getParent();
    if($parent && ($parent->getName() === 'raw' || $parent->hasAncestor('raw'))) {
        $event->setShortcodes(array());
    }
});

assert(' [n /] [c]cnt[/c] ' === $facade->process('[raw] [n /] [c]cnt[/c] [/raw]'));
assert('n true  [n /] ' === $facade->process('[n /] [c]true[/c] [raw] [n /] [/raw]'));

Parsing

This section discusses the available shortcode parsers. Regardless of the parser you choose, remember that:

  • shortcode names can contain only alphanumeric characters and the dash (-), so they must conform to the [a-zA-Z0-9-]+ regular expression,
  • unsupported shortcodes (no registered handler or default handler) will be ignored and left as they are,
  • a mismatching closing shortcode ([code]content[/codex]) will be ignored, and the opening tag will be interpreted as a self-closing shortcode, e.g. [code /],
  • for overlapping shortcodes ([code]content[inner][/code]content[/inner]) the inner opening tag will be interpreted as self-closing, e.g. [code]content[inner /][/code], and the second closing tag will be ignored.
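
The name rule can be checked before registering a handler with plain PHP. The helper below is hypothetical and not part of the library; it only applies the regular expression quoted above to a candidate name.

```php
<?php
// Hypothetical helper, not part of the library: checks a candidate name
// against the [a-zA-Z0-9-]+ rule quoted above.
function isValidShortcodeName($name)
{
    // anchors and the D modifier make sure the whole string matches
    return 1 === preg_match('/^[a-zA-Z0-9-]+$/D', $name);
}

var_dump(isValidShortcodeName('user-profile')); // bool(true)
var_dump(isValidShortcodeName('test_tag'));     // bool(false), underscore is not allowed
var_dump(isValidShortcodeName('*'));            // bool(false)
```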

There are three parsers included in this library:

  • RegularParser is the most powerful and correct parser available in this library. It contains an actual parser designed to handle all the issues with shortcodes, like proper nesting or detecting invalid shortcode syntax. It is slightly slower than the regex-based parser described below,
  • RegexParser uses a handcrafted regular expression dedicated to handling shortcode syntax as far as the regex engine allows. It is the fastest of the parsers included in this library, but it can't handle nesting properly, which means that nested shortcodes with the same name are also considered overlapping. Assuming that shortcode [c] returns its content, the string [c]x[c]y[/c]z[/c] will be interpreted as xyz[/c] (the first closing tag was matched to the first opening tag). This can be solved by aliasing the handler name, because for example [c]x[d]y[/d]z[/c] will be processed correctly,
  • WordpressParser contains code copied from the latest WordPress available at the time of writing (4.3.1). It is also a regex-based parser, but the included regular expression is quite weak; for example, it won't support BBCode syntax ([name="param" /]). By default this parser follows the shortcode name rule, but it can break the rule when created with one of the named constructors (createFromHandlers() or createFromNames()), which change its behavior to catch only the configured names. All of this is intentional, to stay compatible with what WordPress is capable of if you need that compatibility.
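
The nesting difference described above can be tried out with a sketch like the one below. It assumes the library was installed with Composer and that HandlerContainer offers addAlias($alias, $name); the [d] alias mirrors the workaround mentioned in the RegexParser description. Outputs are printed rather than asserted here.

```php
<?php
require 'vendor/autoload.php'; // assumes the library was installed with Composer

use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegexParser;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('c', function(ShortcodeInterface $s) { return $s->getContent(); });
// [d] behaves like [c], which lets the regex parser tell the two levels apart
$handlers->addAlias('d', 'c');

$regex = new Processor(new RegexParser(), $handlers);
$regular = new Processor(new RegularParser(), $handlers);

// same-name nesting: the regex parser matches the first closing tag
// to the first opening tag, as described above
echo $regex->process('[c]x[c]y[/c]z[/c]'), "\n";

// aliased inner shortcode: the workaround for the regex parser
echo $regex->process('[c]x[d]y[/d]z[/c]'), "\n";

// the regular parser is designed to handle same-name nesting
echo $regular->process('[c]x[c]y[/c]z[/c]'), "\n";
```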

Syntax

All parsers (except WordpressParser) support configurable shortcode syntax, set by passing a SyntaxInterface object as the first constructor parameter. The convenience class CommonSyntax contains the default syntax. Usage is shown in the examples below:

use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegexParser;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;
use Thunder\Shortcode\Syntax\CommonSyntax;
use Thunder\Shortcode\Syntax\Syntax;
use Thunder\Shortcode\Syntax\SyntaxBuilder;

$builder = new SyntaxBuilder();

Default syntax (called "common" in this library):

$defaultSyntax = new Syntax(); // without any arguments it defaults to common syntax
$defaultSyntax = new CommonSyntax(); // convenience class
$defaultSyntax = new Syntax('[', ']', '/', '=', '"'); // created explicitly
$defaultSyntax = $builder->getSyntax(); // builder defaults to common syntax

Syntax with doubled tokens:

$doubleSyntax = new Syntax('[[', ']]', '//', '==', '""');
$doubleSyntax = $builder // actually using builder
    ->setOpeningTag('[[')
    ->setClosingTag(']]')
    ->setClosingTagMarker('//')
    ->setParameterValueSeparator('==')
    ->setParameterValueDelimiter('""')
    ->getSyntax();

Something entirely different just to show the possibilities:

$differentSyntax = new Syntax('@', '#', '!', '&', '~');

Verify that each syntax works properly:

$handlers = new HandlerContainer();
$handlers->add('up', function(ShortcodeInterface $s) {
    return strtoupper($s->getContent());
});

$defaultRegex = new Processor(new RegexParser($defaultSyntax), $handlers);
$doubleRegex = new Processor(new RegexParser($doubleSyntax), $handlers);
$differentRegular = new Processor(new RegularParser($differentSyntax), $handlers);

assert('a STRING z' === $defaultRegex->process('a [up]string[/up] z'));
assert('a STRING z' === $doubleRegex->process('a [[up]]string[[//up]] z'));
assert('a STRING z' === $differentRegular->process('a @up#string@!up# z'));

Serialization

This library supports several (un)serialization formats: XML, YAML, JSON, and Text. The examples below show how to serialize and unserialize the same shortcode in each format:

use Thunder\Shortcode\Serializer\JsonSerializer;
use Thunder\Shortcode\Serializer\TextSerializer;
use Thunder\Shortcode\Serializer\XmlSerializer;
use Thunder\Shortcode\Serializer\YamlSerializer;
use Thunder\Shortcode\Shortcode\Shortcode;

$shortcode = new Shortcode('quote', array('name' => 'Thomas'), 'This is a quote!');

Text:

$text = '[quote name=Thomas]This is a quote![/quote]';
$textSerializer = new TextSerializer();

$serializedText = $textSerializer->serialize($shortcode);
assert($text === $serializedText);
$unserializedFromText = $textSerializer->unserialize($serializedText);
assert($unserializedFromText->getName() === $shortcode->getName());

JSON:

$json = '{"name":"quote","parameters":{"name":"Thomas"},"content":"This is a quote!","bbCode":null}';
$jsonSerializer = new JsonSerializer();
$serializedJson = $jsonSerializer->serialize($shortcode);
assert($json === $serializedJson);
$unserializedFromJson = $jsonSerializer->unserialize($serializedJson);
assert($unserializedFromJson->getName() === $shortcode->getName());

YAML:

$yaml = "name: quote
parameters:
    name: Thomas
content: 'This is a quote!'
bbCode: null
";
$yamlSerializer = new YamlSerializer();
$serializedYaml = $yamlSerializer->serialize($shortcode);
assert($yaml === $serializedYaml);
$unserializedFromYaml = $yamlSerializer->unserialize($serializedYaml);
assert($unserializedFromYaml->getName() === $shortcode->getName());

XML:

$xml = '<?xml version="1.0" encoding="UTF-8"?>
<shortcode name="quote">
  <bbCode/>
  <parameters>
    <parameter name="name"><![CDATA[Thomas]]></parameter>
  </parameters>
  <content><![CDATA[This is a quote!]]></content>
</shortcode>
';
$xmlSerializer = new XmlSerializer();
$serializedXml = $xmlSerializer->serialize($shortcode);
assert($xml === $serializedXml);
$unserializedFromXml = $xmlSerializer->unserialize($serializedXml);
assert($unserializedFromXml->getName() === $shortcode->getName());

Facade also supports serialization in all available formats:

use Thunder\Shortcode\Shortcode\Shortcode;
use Thunder\Shortcode\ShortcodeFacade;

$facade = new ShortcodeFacade();

$shortcode = new Shortcode('name', array('arg' => 'val'), 'content', 'bbCode');

$text = $facade->serialize($shortcode, 'text');
$textShortcode = $facade->unserialize($text, 'text');
assert($shortcode->getName() === $textShortcode->getName());

$json = $facade->serialize($shortcode, 'json');
$jsonShortcode = $facade->unserialize($json, 'json');
assert($shortcode->getName() === $jsonShortcode->getName());

$yaml = $facade->serialize($shortcode, 'yaml');
$yamlShortcode = $facade->unserialize($yaml, 'yaml');
assert($shortcode->getName() === $yamlShortcode->getName());

$xml = $facade->serialize($shortcode, 'xml');
$xmlShortcode = $facade->unserialize($xml, 'xml');
assert($shortcode->getName() === $xmlShortcode->getName());

Handlers

There are several built-in shortcode handlers available in the Thunder\Shortcode\Handler namespace. The descriptions below assume that the given handler was registered under the name xyz:

  • NameHandler always returns the shortcode's name. [xyz arg=val]content[/xyz] becomes xyz,
  • ContentHandler always returns the shortcode's content. It discards the opening and closing tags. [xyz]code[/xyz] becomes code,
  • RawHandler returns unprocessed shortcode content. Its behavior differs from FilterRawEventHandler: if content auto-processing is turned on, the handlers of nested shortcodes are still called and only their results are discarded,
  • NullHandler completely removes the shortcode with all nested shortcodes,
  • DeclareHandler allows you to dynamically create a shortcode handler, named by the first parameter, that replaces all placeholders in its text with the arguments passed to it. Example: [declare xyz]Your age is %age%.[/declare] creates a handler for the shortcode xyz, and when used like [xyz age=18] the result is Your age is 18.,
  • EmailHandler replaces the email address or shortcode content with a clickable mailto: link,
  • PlaceholderHandler replaces all placeholders in the shortcode's content with the values of the passed arguments. [xyz year=1970]News from year %year%.[/xyz] becomes News from year 1970.,
  • SerializerHandler replaces the shortcode with its serialized value, using the serializer passed as an argument to the class constructor. If configured with JsonSerializer, [xyz /] becomes {"name":"xyz","parameters":[],"content":null,"bbCode":null}. This can be useful for debugging your shortcodes,
  • UrlHandler replaces its content with a clickable link:
    • [xyz]http://example.com[/xyz] becomes <a href="http://example.com">http://example.com</a>,
    • [xyz="http://example.com"]Visit my site![/xyz] becomes <a href="http://example.com">Visit my site!</a>,
  • WrapHandler allows you to specify the values that should be placed before and after the shortcode content. If configured with <strong> and </strong>, the text [xyz]Bold text.[/xyz] becomes <strong>Bold text.</strong>.
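
A minimal sketch of registering a few of these handlers follows. It assumes the handler classes are invokable objects that can be passed to HandlerContainer::add(), and that WrapHandler takes the "before" and "after" strings as constructor arguments; the shortcode names [b], [name], and [hide] are chosen only for illustration.

```php
<?php
require 'vendor/autoload.php'; // assumes the library was installed with Composer

use Thunder\Shortcode\Handler\NameHandler;
use Thunder\Shortcode\Handler\NullHandler;
use Thunder\Shortcode\Handler\WrapHandler;
use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;

$handlers = new HandlerContainer();
// assumption: WrapHandler receives the text placed before and after the content
$handlers->add('b', new WrapHandler('<strong>', '</strong>'));
$handlers->add('name', new NameHandler());
$handlers->add('hide', new NullHandler());

$processor = new Processor(new RegularParser(), $handlers);

echo $processor->process('[b]Bold text.[/b]'), "\n";         // wrapped in <strong> tags
echo $processor->process('[name arg=val]content[/name]'), "\n"; // replaced with the shortcode name
echo $processor->process('kept [hide]removed[/hide]'), "\n";  // shortcode and content removed
```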

Contributing

Want to contribute? Perfect! Submit an issue or pull request and explain what you would like to see in this library.

License

See LICENSE file in the main directory of this library.

Comments
  • Shortcodes ignored with multiple nested levels

    Hi,

    First, thank you for making this library available. It appears very comprehensive, but I think I've found an issue that I have been banging my head against for the past 3 days. In short, I believe that this code starts to omit shortcodes when there are several levels of nesting. Please see my input text with shortcodes below:

    [la-row] [la-column width="100%"] [la-text format="h1"]Welcome![/la-text] [la-text]This page allows you to send commands to LightAct. These commands are simple strings that can be read with LightAct Layer Layouts and acted upon. This page uses standard web technologies such as html, Jquery, and AJAX so, if you can use these frameworks, you can write your own page. You can also use our own page builder, which you can access at[/la-text] [/la-column] [/la-row] [la-row] [la-column width="25%"] [la-text format="h4"]Sample heading 3[/la-text] [la-text]Sample text[/la-text] [/la-column] [la-column width="25%"] [la-text format="h4"]Sample heading 3[/la-text] [la-text]Sample text[/la-text] [/la-column] [la-column width="25%"] [la-text format="h4"]Sample heading 3[/la-text] [la-text]Sample text[/la-text] [/la-column] [la-column width="25%"] [la-text format="h4"]Sample heading 3[/la-text] [la-text]Sample text[/la-text] [/la-column] [/la-row]

    I've written PHP code using your library which transforms the above text into this webpage. Up to here it all works fine. (image: example1)

    But if I multiply the above shortcodes 4 times, the last couple of columns start to get omitted, as shown here. (image: example2)

    Now this is only one manifestation of this issue. From my experience, the more shortcodes there are, especially if they are nested, the sooner this problem appears.

    I was wondering if there is anything you can do to help?

    issue 
    opened by visiblegroup 43
  • Please provide working examples

    Your documentation seems hard to understand. Can you provide working examples for each of the base functions so that one can get a better understanding?

    Thanks

    issue 
    opened by NishantSolanki 27
  • [*] Asterisk not allowed/working handle name?

    Hi, thanks for the project! I like it very much so far. Nonetheless, I'm having a hard time registering some handlers. Trying to register [*] as a handler name doesn't result in an exception, but neither does it seem to result in a working handler. It is not parsed from the input. For example:

    $facade = new ShortcodeFacade();
    $facade->addHandler('*', function(ShortcodeInterface $s) {
        return '<li>' .$s->getContent() .'</li>';
    });
    echo $facade->process('[*]Hello World[/*]');
    

    results in:

    [*]Hello World[/*]
    

    I tried escaping the asterisk as well. Is it me doing something wrong? I mean [*] is a pretty standard BBCode Element, isn't it? Would be a pain to replace it in WYSIWYG editors for this reason.

    Thanks, Mark.

    patch 
    opened by mark-win 16
  • Skip processing handler if this is not a valid shortcode

    I was getting some very strange content corruption when I enabled my shortcode plugin that uses your fantastic little shortcode library. This was caused by my having some square bracketed content that did not match up with any of my defined shortcode handlers:

    ##### html([title][, alt][, classes])
    
    !! In Markdown this method is implicitly called when using the `![]` syntax.
    
    The `html` action will output a valid HTML tag for the media based on the current display mode.
    

    This [title], [, alt] and [, classes] text is used in my documentation to display optional method params. However, it was corrupting that title to something like:

    <h5>html([t[title] alt][, classes])</h5>
    

    I debugged this in the Shortcode library and discovered that you were first collecting all the potential shortcodes in the entire page (quite a lot for me), and then iterating over them. However, when you retrieved the handler for every shortcode, it would continue to try to process the shortcode even if the handler was null. This ultimately leads to all kinds of corruption of the end content.

    Simply checking to see if this handler is null, and then continuing without doing any more processing ensures there is no corruption, and also speeds up the page parsing considerably.

    FYI - I created a core shortcode plugin for Grav CMS, and a subsequent shortcode plugin that adds some UI specific shortcodes in case you are interested.

    • https://getgrav.org
    • https://github.com/getgrav/grav-plugin-shortcode-core
    • https://github.com/getgrav/grav-plugin-shortcode-ui
    opened by rhukster 16
  • Nested shortcode issue

    Fails

    <?php
    include 'vendor/autoload.php';
    
    use Thunder\Shortcode\ShortcodeFacade;
    use Thunder\Shortcode\Shortcode\ShortcodeInterface;
    
    $facade = new ShortcodeFacade();
    $facade->addHandler('hello', function(ShortcodeInterface $s) {
        return sprintf('Hello, %s!' . $s->getContent(), $s->getParameter('name'));
    });
    
    $text = '
        <p>Start</p>
        [hello name="Thomas"]
            [hello name="Peter"]
        [/hello]
        <p>End</p>
    ';
    echo $facade->process($text);
    

    Result

    <p>Start</p>
    Hello, Thomas!
      [hello name="Peter"]
    [/hello]
    <p>End</p>
    

    Works

    $text = '
        [hello name="Thomas"]
            [hello name="Peter"]
        [/hello]
    ';
    

    Result

    Hello, Thomas!
      Hello, Peter!
    
    issue 
    opened by jenstornell 15
  • Unicode character breaks shortcode replacement

    Hey,

    I'm testing following assertion based on README example:

            $handlers = new HandlerContainer();
            $handlers->add('sample', function(ShortcodeInterface $s) {
               return (new JsonSerializer())->serialize($s);
               });
            $processor = new Processor(new RegexParser(), $handlers);
    
            $text = 'x [sample arg=val]cnt[/sample] y';
            $result = 'x {"name":"sample","args":{"arg":"val"},"content":"cnt"} y';
            assert($result === $processor->process($text));
    

    it works fine unless I put some unicode characters inside shortcode.

    In case of polish character ń it returns 'x {"name":"sample","parameters":{"arg":"val"},"content":"\u0144","bbCode":null}] y' (notice closing ]). In case of 4 polish characters żółć it returns 'x {"name":"sample","parameters":{"arg":"val"},"content":"\u017c\u00f3\u0142\u0107","bbCode":null}ple] y' (notice ple] after shortcode replacement).

    I'm using PHP Version 5.5.9-1ubuntu4.11 with Multibyte Support enabled.

    bug 
    opened by michaloo 15
  • Strip out <p> elements

    For example, TinyMCE wraps everything in <p> tags. Block elements, such as embedded elements, should be stripped of this tag for HTML5 compliance. So if the block element is <p>[embedcode]</p>, the resulting output should be just the embedcode, without the <p> elements.

    issue 
    opened by Firesphere 13
  • On large HTML files, RegularParser does not fire any handler.

    I've been implementing this library on a project in my company, which scrapes a whole local WordPress site with httrack and then modifies the files to fit our needs.

    To make some parts dynamic, we're implementing shortcodes (which we translate to custom PHP code). I've been testing your library without any problems, but when I gave it a definitive file, it didn't trigger any handler, and the parser pushed PHP memory consumption over 800MB.

    After several tests, I decided to try the RegexParser. It parsed the files correctly, using a ridiculous amount of time and memory compared with the standard one.

    I can understand the extra time and memory consumption (within certain limits), but I don't get why the parser didn't see any of my tags. Am I the only one who has suffered from this issue?

    BTW, thank you very much for your work, your library is awesome and I'm enjoying it a lot!

    issue 
    opened by devnix 12
  • Question: Best way to handle a `[raw][/raw]` shortcode?

    I'm trying to find the best way to write a [raw][/raw] shortcode that stops the Shortcode library from processing anything between these raw tags. My current implementation fakes it by setting the 'text' back to the original unmodified text. However, this doesn't stop Shortcode from processing that inner stuff first.

        private function addRawHandler()
        {
            $this->handlers->add('raw', function(ShortcodeInterface $shortcode) {
                $raw = trim(preg_replace('/\[raw\](.*?)\[\/raw\]/is','${1}', $shortcode->getShortcodeText()));
                return $raw;
            });
        }
    

    This reared its head when I ran into the [0] bug in some JavaScript example code on my page. While a fix was quickly found, a situation could come up where a fix is impossible or not practical. Turning off shortcode processing in parts of a page is therefore an important function to have. What is the better way of doing this?

    Cheers!

    issue 
    opened by rhukster 11
  • Underscores in tag names not matched

    Wasn't sure if this was a deliberate decision, but when parsing, tags containing underscores aren't matched.

    eg: [testtag] works but [test_tag] does not.

    Both the regular parser here: https://github.com/thunderer/Shortcode/blob/master/src/Parser/RegularParser.php#L81 and the Wordpress parser here: https://github.com/thunderer/Shortcode/blob/master/src/Parser/WordpressParser.php#L27

    use a slightly different match syntax but neither contain the _ character.

    Reason I ask is because I'm trying to use this library to make a content importer from Wordpress and Wordpress does seem to parse them correctly.

    bug 
    opened by rossriley 10
  • Wrong parent shortcode returned in nested shortcodes

    Hello, I'm dealing with nested shortcodes and I faced the following issue. Given this shortcode structure:

    [shortcode1]
        [shortcode2]
            [shortcode3]
            [/shortcode3]
        [/shortcode2]
    [/shortcode1]
    

    calling getParent() on shortcode3 returns the shortcode1 and not the expected shortcode2.

    bug 
    opened by giansi 10
  • (question) Also retrieve content outside shortcodes when parsing

    Hello,

    I discovered this library while searching for a solution to parse content with Wordpress shortcodes. It is fantastic, thank you for all the efforts you put into this project :)

    I'm trying to make sure I'm using it the right way. What I need to achieve is transforming this initial content:

    Block of regular text
    [my-shortcode id="1"]Shortcode content[/my-shortcode]
    Other block of regular text
    

    into an array of blocks

    [
      0 => [
        'type' => 'text',
        'content' => 'Block of regular text'
      ],
      1 => [
        'type' => 'my-shortcode',
        'id' => 1,
        'content' => 'Shortcode content'
      ],
      2 => [
        'type' => 'text',
        'content' => 'Other block of regular text'
      ]
    ]
    
    

    So everything is working great to parse and extract content from shortcodes, but I am wondering whether there is a way to also extract the content from outside the shortcodes?

    issue 
    opened by thdebay 2
  • Modifying shortcode replacements via event handler?

    Is it possible to use either Events::FILTER_SHORTCODES or Events::REPLACE_SHORTCODES to modify/extend the replacement string provided by the handler?

    I am looking for a way to insert additional HTML (say, a </div>...<div>) around shortcodes. When determining this additional HTML, I'd need to know at least the shortcode name. And I would like to keep this out of the handler, because it is context/situation specific.

    Any pointers?

    issue 
    opened by mpdude 4
  • BBCode shortcode parameter value chokes on the tag closing character

    Using CommonSyntax, the following text won't be picked up by the regular parser: [url=http://example.com]link[/url]

    The parser correctly recognizes the opening shortcode, continues to get the parameter value, then finds a / which is a marker token, then looks for a closing token, doesn't find it, and returns false.

    Using parameter delimiter like this [url="http://example.com"]link[/url] yields the expected result.

    bug 
    opened by MrPetovan 4
  • Ideas

    Just a list of issues to remember:

    • [ ] custom exceptions for specific cases rather than using generic ones,
    • [ ] recipes for specific tasks (like find all shortcode names in text),
    • [ ] upgrade RegularParser to not split text into tokens at once but iterate over characters,
    • [ ] shortcodes with children to represent their structure in text,
    • [ ] shortcode's getTextUntilNext() which returns the content up to the next shortcode opening tag,
    • [ ] shortcode name wildcard,
    • [ ] more complex positional parameters [x=http://x.com], [x x=http://x.com],
    • [ ] shortcode escaping by doubling opening / closing token [[x /]], [[y a=b]],
    • [ ] tools for text analysis (nesting depth, count, name stats, etc.)
    • [ ] configurable shortcode name validation rules,
    • [ ] solve BBCode ambiguity issues in [x=] and [x= arg=val],
    • [ ] add ProcessedShortcodeInterface and Processor::withShortcodeFactory() (think of a better name) to allow creating custom shortcode objects using ProcessorContext that are compatible with ProcessedShortcode. This will allow users to put their own information inside while still preserving how the library works,
    • [ ] fix inconsistency between getText and getShortcodeText between ParsedShortcode and ProcessedShortcode,
    • [x] make ShortcodeFacade a mutable class with all the shortcuts to ease library usage (#36),
    • [ ] new regex parser that catches only opening/closing tags and handles nesting manually,
    • [ ] repeated shortcodes, i.e. the ability to repeat inner content with collections of data: [list]- [item/],[/list] renders multiple "item" elements, and the children of list (item shortcodes) receive context from the data passed to list,
    • [x] better documentation than it is now in README (#34),
    • [x] extract several event listener classes with __invoke() to ease events usage (#33),
    • [x] find a way to strip content outside shortcodes in given shortcode content without losing tree structure (apply results event),
    • [x] ~~configurable handler for producing Processor::process() return value using array of replacements~~ (events FTW),
    • [x] issue with losing shortcode parent when many same shortcodes are on the same level,
    • [x] ~~HUGE BC (just an idea, no worries): Change ProcessorInterface::process() to receive array of parsed shortcodes to allow greater flexibility (eg. filtering by parent)~~,
    • [x] find a way to retrieve original shortcode content even when auto-processing,
    • [x] prevent the ability to create shortcode with falsy name (require non-empty string value),
    • [x] resolve naming problem for shortcode parameters (current) vs. arguments,
    • [x] suggest symfony/yaml in composer.json for YAML serializer,
    • [x] allow escaping with double open/close tags like in WordPress ([[code]value[/code]]),
    • [x] add WordPress parser (regex parser with code from WP).

    Regular parser:

    • [x] fix BBCode parsing (unused variable).

    BBCode:

    • [x] decide whether bbcode value should be:
      • merged with parameters as parameter with shortcode name, no API change, easier version,
      • a separate element, fourth parameter in Shortcode constructor, and separate getter,
      • a separate shortcode type (requires changes in both Parsed and Processed as well).

    Built-in handlers:

    • [x] ~~DeclareHandler should typehint interface in constructor,~~ (needs add() method)
    • [x] EmailHandler could be a BBCode,
    • [x] PlaceholderHandler should have configurable placeholder braces,
    • [x] UrlHandler could be a BBCode,
    • [x] WrapHandler could have several most common variants (eg. bold) created as named constructors.
    issue 
    opened by thunderer 13
Releases(v0.7.5)
  • v0.7.5(Jan 13, 2022)

  • v0.7.3(Dec 3, 2019)

  • v0.7.2(Apr 20, 2019)

  • v0.7.1(Feb 3, 2019)

  • v0.7.0(Dec 18, 2018)

    • [x] many RegularParser improvements and fixes:
      • backtracks now rely on their offsets only, which improves performance and memory usage by more than 10x and brings RegularParser on par with the other parsers while keeping its feature advantage,
      • subsequent non-token text fragments are now reported as single T_STRING tokens,
      • fixed #70, where preg_match_all() with large inputs sometimes failed silently and returned only a subset of matches, which reduced the number of reported shortcodes,
      • fixed #58 where invalid token sequences in shortcode content may confuse the parser,
      • inlined the content() method, effectively halving the call nesting level,
      • disabled xdebug.max_nesting_level during parse() to prevent development environment parsing errors,
    • [x] added support for PHPUnit 6.x with fallback translation for PHP 5.x compatibility,
    • [x] removed PHP 5.3 from the Travis matrix (the version is still supported) and added PHP 7.2 to it,
    • [x] asterisk * is now a valid shortcode name,
    • [x] minor internal Processor improvements,
    • [x] minor README updates.
  • v0.6.5(Jan 8, 2017)

  • v0.6.4(Dec 13, 2016)

  • v0.6.3(Aug 10, 2016)

  • v0.6.2(May 19, 2016)

  • v0.6.1(Feb 25, 2016)

    Fixed a bug where the text length was not recalculated after applying a shortcode replacement, which caused replacements to be applied only up to the length of the source text. Thanks to @Jonatanmdez for reporting it in issue #37.
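
    The class of bug described above can be illustrated with a minimal, self-contained sketch. This is not the library's code: applyReplacements() and the array shape it takes are hypothetical names chosen for this example, and it assumes PHP 7+ and byte offsets. It shows one common way to avoid stale lengths and offsets, which is to apply replacements from the end of the text towards the start.

```php
<?php
// Hypothetical helper for illustration only; not part of thunderer/shortcode.
// Each replacement is ['offset' => int, 'length' => int, 'text' => string],
// where offset and length are byte positions in the ORIGINAL text.
function applyReplacements(string $text, array $replacements): string
{
    // Sort by offset, highest first. Splicing from the end means a longer or
    // shorter replacement never shifts the offsets that are still pending.
    usort($replacements, function (array $a, array $b): int {
        return $b['offset'] <=> $a['offset'];
    });

    foreach ($replacements as $r) {
        $text = substr_replace($text, $r['text'], $r['offset'], $r['length']);
    }

    return $text;
}

$source = 'Hi [a /] and [b /]!';
$result = applyReplacements($source, [
    // "[a /]" starts at byte 3 and is 5 bytes long.
    ['offset' => 3, 'length' => 5, 'text' => 'a much longer replacement'],
    // "[b /]" starts at byte 13 and is 5 bytes long.
    ['offset' => 13, 'length' => 5, 'text' => 'B'],
]);

echo $result, PHP_EOL;
```

    The result is longer than the source text, which is the situation the release note refers to. Because substr_replace() works on bytes, multibyte text would need offsets computed and applied consistently (for example with the mb_* functions).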

  • v0.6.0(Feb 15, 2016)

    Events subsystem, built-in event handlers, built-in shortcode handlers, rewritten README, standardized shortcode name validation, ReplacedShortcode to represent handler result instead of internal array, getBaseOffset() and better WordpressParser compatibility with WordPress.

  • v0.5.3(Jan 27, 2016)

    Massive performance improvements in RegularParser, fixed problem with multibyte characters in parsed texts, fixed matching shortcodes with invalid names.

  • v0.5.2(Jan 20, 2016)

  • v0.5.1(Nov 12, 2015)

  • v0.5.0(Oct 28, 2015)

    Removed all deprecated features from v0.4.0, added XML and YAML serializers, added RegularParser and WordpressParser, added HandlerContainer abstraction. This was meant to be a 1.0.0 release, but I'm still working on several ideas and want to test them before the first stable release (should be released soon).

  • v0.4.0(Jul 15, 2015)

    Intermediate release with many fixes and improvements. BC was kept intact, though nearly all classes were moved into new places. Introduced UPGRADE document which describes possible breaking changes in future releases. Next release will be BC breaking and will clean everything.

  • v0.3.0(May 8, 2015)

  • v0.2.2(Apr 26, 2015)

  • v0.2.1(Apr 23, 2015)

  • v0.2.0(Apr 17, 2015)

  • v0.1.0(Apr 7, 2015)

Owner
Tomasz Kowalczyk
Working hard to bring back *engineering* in software engineering.
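PHP HTML parser, similar to PHP Simple HTML DOM Parser but several times faster

HtmlParser A PHP HTML parsing tool, similar to PHP Simple HTML DOM Parser. Because it is built on PHP's dom extension, it parses HTML several times faster than PHP Simple HTML DOM Parser. Note: the HTML must be UTF-8 encoded; if it is not, convert it to UTF-8 first.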

俊杰jerry 521 Dec 6, 2022
Highly-extensible PHP Markdown parser which fully supports the CommonMark and GFM specs.

league/commonmark league/commonmark is a highly-extensible PHP Markdown parser created by Colin O'Dell which supports the full CommonMark spec and Git

The League of Extraordinary Packages 2.4k Nov 24, 2022
An HTML5 parser and serializer for PHP.

HTML5-PHP HTML5 is a standards-compliant HTML5 parser and writer written entirely in PHP. It is stable and used in many production websites, and has w

null 1.1k Dec 6, 2022
Efficient, easy-to-use, and fast PHP JSON stream parser

JSON Machine Very easy to use and memory efficient drop-in replacement for inefficient iteration of big JSON files or streams for PHP 5.6+. See TL;DR.

Filip Halaxa 790 Dec 1, 2022
Parser for Markdown and Markdown Extra derived from the original Markdown.pl by John Gruber.

PHP Markdown PHP Markdown Lib 1.9.0 - 1 Dec 2019 by Michel Fortin https://michelf.ca/ based on Markdown by John Gruber https://daringfireball.net/ Int

Michel Fortin 3.3k Dec 4, 2022
Better Markdown Parser in PHP

Parsedown Better Markdown Parser in PHP - Demo. Features One File No Dependencies Super Fast Extensible GitHub flavored Tested in 5.3 to 7.3 Markdown

Emanuil Rusev 14.3k Dec 3, 2022
A super fast, highly extensible markdown parser for PHP

A super fast, highly extensible markdown parser for PHP What is this? A set of PHP classes, each representing a Markdown flavor, and a command line to

Carsten Brandt 988 Nov 29, 2022
📜 Modern Simple HTML DOM Parser for PHP

📜 Simple Html Dom Parser for PHP A HTML DOM parser written in PHP - lets you manipulate HTML in a very easy way! This is a fork of PHP Simple HTML DOM

Lars Moelleken 650 Nov 28, 2022
Parsica - PHP Parser Combinators - The easiest way to build robust parsers.

Parsica The easiest way to build robust parsers in PHP.

null 0 Feb 22, 2022
This is a php parser for plantuml source file.

PlantUML parser for PHP Overview This package builds AST of class definitions from plantuml files. This package works only with php. Installation Via

Tasuku Yamashita 5 May 29, 2022
A PHP hold'em range parser

mattjmattj/holdem-range-parser A PHP hold'em range parser Installation No published package yet, so you'll have to clone the project manually, or add

Matthias Jouan 1 Feb 2, 2022
A New Markdown parser for PHP5.4

Ciconia - A New Markdown Parser for PHP The Markdown parser for PHP5.4, it is fully extensible. Ciconia is the collection of extension, so you can rep

Kazuyuki Hayashi 359 Sep 14, 2022
Simple URL parser

urlparser Simple URL parser This is a simple URL parser, which returns an array of results from url of kind /module/controller/param1:value/param2:val

null 1 Oct 29, 2021
This is a simple, streaming parser for processing large JSON documents

Streaming JSON parser for PHP This is a simple, streaming parser for processing large JSON documents. Use it for parsing very large JSON documents to

Salsify 685 Nov 23, 2022
UpToDocs scans a Markdown file for PHP code blocks, and executes each one in a separate process.

UpToDocs UpToDocs scans a Markdown file for PHP code blocks, and executes each one in a separate process. Include this in your CI workflows, to make s

Mathias Verraes 56 Nov 26, 2022
HTML sanitizer, written in PHP, aiming to provide XSS-safe markup based on explicitly allowed tags, attributes and values.

TYPO3 HTML Sanitizer ℹ️ Common safe HTML tags & attributes as given in \TYPO3\HtmlSanitizer\Builder\CommonBuilder still might be adjusted, extended or

TYPO3 GitHub Department 19 Nov 2, 2022
A simple PHP scripting application which fetch emails from your Gmail account according to a filter and parses them for information.

A simple PHP scripting application which fetch emails from your Gmail account according to a filter and parses them for information.

Haitham Sweilem 1 Jan 18, 2022
Plug and play flat file markdown blog for your Laravel-projects

Ampersand Plug-and-play flat file markdown blog tool for your Laravel-project. Create an article or blog-section on your site without the hassle of se

Marcus Olsson 22 Dec 5, 2022
Convert HTML to Markdown with PHP

HTML To Markdown for PHP Library which converts HTML to Markdown for your sanity and convenience. Requires: PHP 7.2+ Lead Developer: @colinodell Origi

The League of Extraordinary Packages 1.5k Nov 23, 2022