Advanced shortcode (BBCode) parser and engine for PHP

Tomasz Kowalczyk

Last update: Nov 26, 2022

Overview

Shortcode

Shortcode is a framework-agnostic PHP library that finds, extracts, and processes text fragments called "shortcodes" or "BBCodes". Examples of their usual syntax and usage are shown below:

[user-profile /]
[image width=600]
[link href="http://google.pl" color=red]
[quote="Thunderer"]This is a quote.[/quote]
[text color="red"]This is a text.[/text]

The library is divided into several parts, each of them containing logic responsible for different stages and ways of processing data:

parsers extract shortcodes from text and transform them to objects,
handlers transform shortcodes into desired replacements,
processors use parsers and handlers to extract shortcodes, compute replacements, and apply them in text,
events alter the way processors work to provide better control over the whole process,
serializers convert shortcodes from and to different formats like Text, XML, JSON, and YAML.

Each part is described in the dedicated section in this document.

Installation

There are no required dependencies, and all PHP versions from 5.3 up to 7.4 are tested and supported (the v0.7.5 release notes add support for PHP 8.1). The library is available on Composer/Packagist as thunderer/shortcode. To install it, run:

composer require thunderer/shortcode=^0.7

or manually update your composer.json with:

(...)
"require": {
    "thunderer/shortcode": "^0.7"
}
(...)

and run composer install or composer update afterwards. If you're not using Composer, download sources from GitHub and load them as required. But really, please use Composer.

Usage

Facade

To make the library easier to use, the ShortcodeFacade class comes configured for the most common needs. It contains shortcut methods for all features described in the sections below:

addHandler(): adds shortcode handlers,
addHandlerAlias(): adds shortcode handler alias,
process(): processes text and replaces shortcodes,
parse(): parses text into shortcodes,
setParser(): changes processor's parser,
addEventHandler(): adds event handler,
serialize(): serializes shortcode object to given format,
unserialize(): creates shortcode object from serialized input.
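
As a minimal sketch of the facade in use, the snippet below calls parse() to list the shortcodes found in a string without replacing anything. It assumes the library was installed with Composer (hence the vendor/autoload.php path), and the sample text is made up for illustration:

```php
<?php
require 'vendor/autoload.php'; // assumption: the library was installed via Composer

use Thunder\Shortcode\ShortcodeFacade;

$facade = new ShortcodeFacade();

// parse() only extracts shortcodes; no handlers are registered and the text is not modified
$shortcodes = $facade->parse('[user-profile /] [text color="red"]This is a text.[/text]');

$names = array();
foreach ($shortcodes as $shortcode) {
    $names[] = $shortcode->getName();
}

echo implode(', ', $names), PHP_EOL;
```

Collecting names this way can be handy for checking which shortcodes a text uses before registering handlers for them.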

Processing

Shortcodes are processed using Processor, which requires a parser and handlers. The example below greets the person whose name is passed as an argument:

use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('hello', function(ShortcodeInterface $s) {
    return sprintf('Hello, %s!', $s->getParameter('name'));
});
$processor = new Processor(new RegularParser(), $handlers);

$text = '
    <div class="user">[hello name="Thomas"]</div>
    <p>Your shortcodes are very good, keep it up!</p>
    <div class="user">[hello name="Peter"]</div>
';
echo $processor->process($text);

Facade example:

use Thunder\Shortcode\ShortcodeFacade;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$facade = new ShortcodeFacade();
$facade->addHandler('hello', function(ShortcodeInterface $s) {
    return sprintf('Hello, %s!', $s->getParameter('name'));
});

$text = '
    <div class="user">[hello name="Thomas"]</div>
    <p>Your shortcodes are very good, keep it up!</p>
    <div class="user">[hello name="Peter"]</div>
';
echo $facade->process($text);

Both result in:

    <div class="user">Hello, Thomas!</div>
    <p>Your shortcodes are very good, keep it up!</p>
    <div class="user">Hello, Peter!</div>

Configuration

Processor has several configuration options available as with*() methods. Each returns a new, changed instance, so the object stays immutable.

withRecursionDepth($depth) controls the nesting level, that is, how many levels of shortcodes are actually processed. Once this limit is reached, all shortcodes nested deeper than that level are ignored. If $depth is null (the default), the nesting level is not checked; if it is zero, nesting is disabled (only topmost shortcodes are processed). Any integer greater than zero sets the nesting level limit,
withMaxIterations($iterations) controls how many times the source text is processed. The text is processed internally until that limit is reached or there are no shortcodes left. If $iterations is null, there is no iteration limit; any integer greater than zero sets the limit. Defaults to one iteration,
withAutoProcessContent($flag) controls automatic processing of a shortcode's content before its handler is called. If $flag is true, the handler receives the shortcode with already processed content; if false, the handler must process nested shortcodes itself (or leave them for the remaining iterations). This is turned on by default,
withEventContainer($events) registers an event container which provides handlers for all the events fired at various stages of processing text. Read more about events in the section dedicated to them.
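
The options above can be combined by chaining, as in the following sketch. It assumes a Composer installation; the [c] shortcode and the sample text are made up for illustration, and the outputs are printed rather than asserted:

```php
<?php
require 'vendor/autoload.php'; // assumption: the library was installed via Composer

use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('c', function(ShortcodeInterface $s) { return $s->getContent(); });

$processor = new Processor(new RegularParser(), $handlers);

// each with*() call returns a new instance, so the calls can be chained
$topLevelOnly = $processor
    ->withRecursionDepth(0)          // zero disables nesting
    ->withMaxIterations(1)           // a single pass over the text (the default)
    ->withAutoProcessContent(true);  // content is processed before the handler runs (the default)

var_dump($topLevelOnly === $processor); // the original processor is a different, unchanged object

// compare how the default and the restricted processor treat the nested shortcode
echo $processor->process('[c]outer [c]inner[/c][/c]'), PHP_EOL;
echo $topLevelOnly->process('[c]outer [c]inner[/c][/c]'), PHP_EOL;
```

Because the original object is never modified, several differently configured processors can share the same parser and handler container.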

Events

If the processor is configured with an event container, there are several ways to alter how shortcodes are processed:

Events::FILTER_SHORTCODES uses the FilterShortcodesEvent class. It receives the current parent shortcode and the array of shortcodes from the parser. Its purpose is to allow modifying that array before the shortcodes are processed,
Events::REPLACE_SHORTCODES uses the ReplaceShortcodesEvent class and receives the parent shortcode, the currently processed text, and the array of replacements. It can alter the way shortcode handler results are applied to the source text. If none of the listeners sets the result, the default method is used.

There are several ready-to-use event handlers in the Thunder\Shortcode\EventHandler namespace:

FilterRawEventHandler handles FilterShortcodesEvent and lets you define any number of "raw" shortcodes whose content is not processed,
ReplaceJoinEventHandler handles ReplaceShortcodesEvent and applies shortcode replacements by discarding the text and returning just the replacements.
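
A hedged sketch of registering the ready-made FilterRawEventHandler follows. The constructor argument (an array of shortcode names to treat as raw) is an assumption not confirmed by this document, so check the class before relying on it. A Composer installation is assumed as well:

```php
<?php
require 'vendor/autoload.php'; // assumption: the library was installed via Composer

use Thunder\Shortcode\EventContainer\EventContainer;
use Thunder\Shortcode\EventHandler\FilterRawEventHandler;
use Thunder\Shortcode\Events;
use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('raw', function(ShortcodeInterface $s) { return $s->getContent(); });
$handlers->add('n', function(ShortcodeInterface $s) { return $s->getName(); });

$events = new EventContainer();
// assumption: the constructor takes the list of shortcode names whose content stays unprocessed
$events->addListener(Events::FILTER_SHORTCODES, new FilterRawEventHandler(array('raw')));

$processor = new Processor(new RegularParser(), $handlers);
$processor = $processor->withEventContainer($events);

// the [n /] inside [raw] is meant to be left alone, the one outside is replaced
echo $processor->process('[n /] [raw] [n /] [/raw]'), PHP_EOL;
```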

The example below shows how to manually implement a [raw] shortcode that returns its verbatim content without calling any handler for nested shortcodes:

use Thunder\Shortcode\Event\FilterShortcodesEvent;
use Thunder\Shortcode\EventContainer\EventContainer;
use Thunder\Shortcode\Events;
use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('raw', function(ShortcodeInterface $s) { return $s->getContent(); });
$handlers->add('n', function(ShortcodeInterface $s) { return $s->getName(); });
$handlers->add('c', function(ShortcodeInterface $s) { return $s->getContent(); });

$events = new EventContainer();
$events->addListener(Events::FILTER_SHORTCODES, function(FilterShortcodesEvent $event) {
    $parent = $event->getParent();
    if($parent && ($parent->getName() === 'raw' || $parent->hasAncestor('raw'))) {
        $event->setShortcodes(array());
    }
});

$processor = new Processor(new RegularParser(), $handlers);
$processor = $processor->withEventContainer($events);

assert(' [n /] [c]cnt[/c] ' === $processor->process('[raw] [n /] [c]cnt[/c] [/raw]'));
assert('n true  [n /] ' === $processor->process('[n /] [c]true[/c] [raw] [n /] [/raw]'));

Facade example:

use Thunder\Shortcode\Event\FilterShortcodesEvent;
use Thunder\Shortcode\Events;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;
use Thunder\Shortcode\ShortcodeFacade;

$facade = new ShortcodeFacade();
$facade->addHandler('raw', function(ShortcodeInterface $s) { return $s->getContent(); });
$facade->addHandler('n', function(ShortcodeInterface $s) { return $s->getName(); });
$facade->addHandler('c', function(ShortcodeInterface $s) { return $s->getContent(); });

$facade->addEventHandler(Events::FILTER_SHORTCODES, function(FilterShortcodesEvent $event) {
    $parent = $event->getParent();
    if($parent && ($parent->getName() === 'raw' || $parent->hasAncestor('raw'))) {
        $event->setShortcodes(array());
    }
});

assert(' [n /] [c]cnt[/c] ' === $facade->process('[raw] [n /] [c]cnt[/c] [/raw]'));
assert('n true  [n /] ' === $facade->process('[n /] [c]true[/c] [raw] [n /] [/raw]'));

Parsing

This section discusses the available shortcode parsers. Regardless of the parser you choose, remember that:

shortcode names may contain only alphanumeric characters and the dash -, that is, they must conform to the [a-zA-Z0-9-]+ regular expression,
unsupported shortcodes (no registered handler or default handler) are ignored and left as they are,
a mismatched closing tag ([code]content[/codex]) is ignored, and the opening tag is interpreted as a self-closing shortcode, e.g. [code /],
overlapping shortcodes ([code]content[inner][/code]content[/inner]) are interpreted as self-closing, e.g. [code]content[inner /][/code], and the second closing tag is ignored.
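
The name rule can be checked up front with a few lines of plain PHP. The helper below is hypothetical and not part of the library; it only mirrors the [a-zA-Z0-9-]+ expression quoted above. It does not cover the asterisk, which the v0.7.0 release notes list as a valid name:

```php
<?php
// Hypothetical helper for illustration; it is not part of the library.
function isValidShortcodeName($name)
{
    // \A and \z anchor the whole string, so partial matches are rejected
    return 1 === preg_match('/\A[a-zA-Z0-9-]+\z/', $name);
}

foreach (array('user-profile', 'la-text', 'test_tag', 'my tag') as $name) {
    printf("%-12s %s\n", $name, isValidShortcodeName($name) ? 'valid' : 'invalid');
}
```

Names containing an underscore or a space fail the check, so under this rule they would be left in the text untouched.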

There are three included parsers in this library:

RegularParser is the most powerful and correct parser available in this library. It contains an actual parser designed to handle all the issues with shortcodes, such as proper nesting or detecting invalid shortcode syntax. It is slightly slower than the regex-based parser described below,
RegexParser uses a handcrafted regular expression dedicated to handling shortcode syntax as far as the regex engine allows. It is the fastest of the parsers included in this library, but it can't handle nesting properly: nested shortcodes with the same name are also considered overlapping. Assuming that shortcode [c] returns its content, the string [c]x[c]y[/c]z[/c] will be interpreted as xyz[/c] (the first closing tag was matched to the first opening tag). This can be solved by aliasing the handler name, because, for example, [c]x[d]y[/d]z[/c] will be processed correctly,
WordpressParser contains code copied from WordPress 4.3.1 (the latest version at the time of writing). It is also a regex-based parser, but the included regular expression is quite weak; for example, it won't support BBCode syntax ([name="param" /]). By default this parser follows the shortcode name rule, but it can break it when created with one of the named constructors (createFromHandlers() or createFromNames()), which change its behavior to catch only the configured names. All of this is intentional, to stay compatible with what WordPress is capable of if you need that compatibility.
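
The nesting difference between the first two parsers can be observed with a sketch like the one below, which feeds the same text to both. It assumes a Composer installation, and the [c] handler simply returns its content, as in the description above:

```php
<?php
require 'vendor/autoload.php'; // assumption: the library was installed via Composer

use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegexParser;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$handlers = new HandlerContainer();
$handlers->add('c', function(ShortcodeInterface $s) { return $s->getContent(); });

$text = '[c]x[c]y[/c]z[/c]';

$regular = new Processor(new RegularParser(), $handlers);
$regex = new Processor(new RegexParser(), $handlers);

echo $regular->process($text), PHP_EOL; // same-name nesting is resolved by the parser
echo $regex->process($text), PHP_EOL;   // the parser description above gives xyz[/c] for this input
```

If the speed of RegexParser matters and shortcodes of the same name never nest in your content, this limitation may be acceptable.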

Syntax

All parsers (except WordpressParser) support a configurable shortcode syntax, set by passing a SyntaxInterface object as the first constructor parameter. The convenience class CommonSyntax contains the default syntax. Usage is shown in the examples below:

use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegexParser;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;
use Thunder\Shortcode\Syntax\CommonSyntax;
use Thunder\Shortcode\Syntax\Syntax;
use Thunder\Shortcode\Syntax\SyntaxBuilder;

$builder = new SyntaxBuilder();

Default syntax (called "common" in this library):

$defaultSyntax = new Syntax(); // without any arguments it defaults to common syntax
$defaultSyntax = new CommonSyntax(); // convenience class
$defaultSyntax = new Syntax('[', ']', '/', '=', '"'); // created explicitly
$defaultSyntax = $builder->getSyntax(); // builder defaults to common syntax

Syntax with doubled tokens:

$doubleSyntax = new Syntax('[[', ']]', '//', '==', '""');
$doubleSyntax = $builder // actually using builder
    ->setOpeningTag('[[')
    ->setClosingTag(']]')
    ->setClosingTagMarker('//')
    ->setParameterValueSeparator('==')
    ->setParameterValueDelimiter('""')
    ->getSyntax();

Something entirely different just to show the possibilities:

$differentSyntax = new Syntax('@', '#', '!', '&', '~');

Verify that each syntax works properly:

$handlers = new HandlerContainer();
$handlers->add('up', function(ShortcodeInterface $s) {
    return strtoupper($s->getContent());
});

$defaultRegex = new Processor(new RegexParser($defaultSyntax), $handlers);
$doubleRegex = new Processor(new RegexParser($doubleSyntax), $handlers);
$differentRegular = new Processor(new RegularParser($differentSyntax), $handlers);

assert('a STRING z' === $defaultRegex->process('a [up]string[/up] z'));
assert('a STRING z' === $doubleRegex->process('a [[up]]string[[//up]] z'));
assert('a STRING z' === $differentRegular->process('a @up#string@!up# z'));

Serialization

This library supports several (un)serialization formats: XML, YAML, JSON, and Text. The examples below show how to serialize and unserialize the same shortcode in each format:

use Thunder\Shortcode\Serializer\JsonSerializer;
use Thunder\Shortcode\Serializer\TextSerializer;
use Thunder\Shortcode\Serializer\XmlSerializer;
use Thunder\Shortcode\Serializer\YamlSerializer;
use Thunder\Shortcode\Shortcode\Shortcode;

$shortcode = new Shortcode('quote', array('name' => 'Thomas'), 'This is a quote!');

Text:

$text = '[quote name=Thomas]This is a quote![/quote]';
$textSerializer = new TextSerializer();

$serializedText = $textSerializer->serialize($shortcode);
assert($text === $serializedText);
$unserializedFromText = $textSerializer->unserialize($serializedText);
assert($unserializedFromText->getName() === $shortcode->getName());

JSON:

$json = '{"name":"quote","parameters":{"name":"Thomas"},"content":"This is a quote!","bbCode":null}';
$jsonSerializer = new JsonSerializer();
$serializedJson = $jsonSerializer->serialize($shortcode);
assert($json === $serializedJson);
$unserializedFromJson = $jsonSerializer->unserialize($serializedJson);
assert($unserializedFromJson->getName() === $shortcode->getName());

YAML:

$yaml = "name: quote
parameters:
    name: Thomas
content: 'This is a quote!'
bbCode: null
";
$yamlSerializer = new YamlSerializer();
$serializedYaml = $yamlSerializer->serialize($shortcode);
assert($yaml === $serializedYaml);
$unserializedFromYaml = $yamlSerializer->unserialize($serializedYaml);
assert($unserializedFromYaml->getName() === $shortcode->getName());

XML:

$xml = '<?xml version="1.0" encoding="UTF-8"?>
<shortcode name="quote">
  <bbCode/>
  <parameters>
    <parameter name="name"><![CDATA[Thomas]]></parameter>
  </parameters>
  <content><![CDATA[This is a quote!]]></content>
</shortcode>
';
$xmlSerializer = new XmlSerializer();
$serializedXml = $xmlSerializer->serialize($shortcode);
assert($xml === $serializedXml);
$unserializedFromXml = $xmlSerializer->unserialize($serializedXml);
assert($unserializedFromXml->getName() === $shortcode->getName());

Facade also supports serialization in all available formats:

use Thunder\Shortcode\Shortcode\Shortcode;
use Thunder\Shortcode\ShortcodeFacade;

$facade = new ShortcodeFacade();

$shortcode = new Shortcode('name', array('arg' => 'val'), 'content', 'bbCode');

$text = $facade->serialize($shortcode, 'text');
$textShortcode = $facade->unserialize($text, 'text');
assert($shortcode->getName() === $textShortcode->getName());

$json = $facade->serialize($shortcode, 'json');
$jsonShortcode = $facade->unserialize($json, 'json');
assert($shortcode->getName() === $jsonShortcode->getName());

$yaml = $facade->serialize($shortcode, 'yaml');
$yamlShortcode = $facade->unserialize($yaml, 'yaml');
assert($shortcode->getName() === $yamlShortcode->getName());

$xml = $facade->serialize($shortcode, 'xml');
$xmlShortcode = $facade->unserialize($xml, 'xml');
assert($shortcode->getName() === $xmlShortcode->getName());

Handlers

There are several built-in shortcode handlers available in the Thunder\Shortcode\Handler namespace. The descriptions below assume that the given handler was registered under the name xyz:

NameHandler always returns shortcode's name. [xyz arg=val]content[/xyz] becomes xyz,
ContentHandler always returns shortcode's content. It discards its opening and closing tag. [xyz]code[/xyz] becomes code,
RawHandler returns unprocessed shortcode content. Its behavior differs from FilterRawEventHandler: if content auto-processing is turned on, the handlers of nested shortcodes are still called, and only their results are discarded,
NullHandler completely removes the shortcode together with all nested shortcodes,
DeclareHandler dynamically creates a shortcode handler whose name is given as the first parameter and which replaces all placeholders in the declared text with the arguments passed to it. Example: [declare xyz]Your age is %age%.[/declare] creates a handler for shortcode xyz, and when it is used like [xyz age=18] the result is Your age is 18.,
EmailHandler replaces the email address or shortcode content with a clickable mailto: link:
- [xyz="email@example.com" /] becomes <a href="email@example.com">email@example.com</a>,
- [xyz]email@example.com[/xyz] becomes <a href="email@example.com">email@example.com</a>,
- [xyz="email@example.com"]Contact me![/xyz] becomes <a href="email@example.com">Contact me!</a>,
PlaceholderHandler replaces all placeholders in shortcode's content with values of passed arguments. [xyz year=1970]News from year %year%.[/xyz] becomes News from year 1970.,
SerializerHandler replaces the shortcode with its serialized value, using the serializer passed as an argument to the class constructor. If configured with JsonSerializer, [xyz /] becomes {"name":"xyz","parameters":[],"content":null,"bbCode":null}. This can be useful for debugging your shortcodes,
UrlHandler replaces its content with a clickable link:
- [xyz]http://example.com[/xyz] becomes <a href="http://example.com">http://example.com</a>,
- [xyz="http://example.com"]Visit my site![/xyz] becomes <a href="http://example.com">Visit my site!</a>,
WrapHandler lets you specify the values that should be placed before and after the shortcode content. If configured with <strong> and </strong>, the text [xyz]Bold text.[/xyz] becomes <strong>Bold text.</strong>.
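
Built-in handlers are registered like any other callable. The sketch below wires up two of them under the made-up names b and news. The constructor arguments (before and after strings for WrapHandler, none for PlaceholderHandler) are assumptions based on the descriptions above, and a Composer installation is assumed as well:

```php
<?php
require 'vendor/autoload.php'; // assumption: the library was installed via Composer

use Thunder\Shortcode\Handler\PlaceholderHandler;
use Thunder\Shortcode\Handler\WrapHandler;
use Thunder\Shortcode\HandlerContainer\HandlerContainer;
use Thunder\Shortcode\Parser\RegularParser;
use Thunder\Shortcode\Processor\Processor;

$handlers = new HandlerContainer();
// assumption: WrapHandler takes the "before" and "after" strings as constructor arguments
$handlers->add('b', new WrapHandler('<strong>', '</strong>'));
// assumption: PlaceholderHandler works without arguments and uses % as the placeholder delimiter
$handlers->add('news', new PlaceholderHandler());

$processor = new Processor(new RegularParser(), $handlers);

// following the descriptions above, these should wrap the content and fill in the placeholder
echo $processor->process('[b]Bold text.[/b]'), PHP_EOL;
echo $processor->process('[news year=1970]News from year %year%.[/news]'), PHP_EOL;
```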

Contributing

Want to contribute? Perfect! Submit an issue or Pull Request and explain what you would like to see in this library.

License

See LICENSE file in the main directory of this library.

Comments

Shortcodes ignored with multiple nested levels

Hi,

First, thank you for making this library available. It appears very comprehensive, but I think I've found an issue that I have been banging my head against for the past 3 days. In short, I believe that this code starts to omit shortcodes when there are several levels of nesting. Please see my input text with shortcodes below:

[la-row] [la-column width="100%"] [la-text format="h1"]Welcome![/la-text] [la-text]This page allows you to send commands to LightAct. These commands are simple strings that can be read with LightAct Layer Layouts and acted upon. This page uses standard web technologies such as html, Jquery, and AJAX so, if you can use these frameworks, you can write your own page. You can also use our own page builder, which you can access at[/la-text] [/la-column] [/la-row] [la-row] [la-column width="25%"] [la-text format="h4"]Sample heading 3[/la-text] [la-text]Sample text[/la-text] [/la-column] [la-column width="25%"] [la-text format="h4"]Sample heading 3[/la-text] [la-text]Sample text[/la-text] [/la-column] [la-column width="25%"] [la-text format="h4"]Sample heading 3[/la-text] [la-text]Sample text[/la-text] [/la-column] [la-column width="25%"] [la-text format="h4"]Sample heading 3[/la-text] [la-text]Sample text[/la-text] [/la-column] [/la-row]

I've written a php code using your library which transforms the above text into this webpage. Up to here it all works fine.

But if I multiply the above shortcodes 4 times, the last couple of columns start to get omitted as shown here.

Now this is only one manifestation of this issue. From my experience, the more shortcodes there are, especially if they are nested, the sooner this problem appears.

I was wondering if there is anything you can do to help?
issue

opened by visiblegroup 43
Please provide working examples

Your documentation seems hard to understand. Can you provide working examples for each of the base functions so that one can get a better understanding?

Thanks
issue

opened by NishantSolanki 27
[*] Asterisk not allowed/working handle name?
Hi, thanks for the project! I like it very much so far. Nonetheless, I'm having a hard time registering some handles. Trying to register [*] as a handle name doesn't result in an exception, but neither does it seem to result in a working handle. It is not parsed from the input. For example:

$facade = new ShortcodeFacade();
$facade->addHandler('*', function(ShortcodeInterface $s) {
    return '<li>' . $s->getContent() . '</li>';
});
echo $facade->process('[*]Hello World[/*]');

results in:

[*]Hello World[/*]

I tried escaping the asterisk as well. Is it me doing something wrong? I mean [*] is a pretty standard BBCode Element, isn't it? Would be a pain to replace it in WYSIWYG editors for this reason.

Thanks, Mark.
patch
opened by mark-win 16
Skip processing handler if this is not a valid shortcode
I was getting some very strange content corruption when I enabled my shortcode plugin that uses your fantastic little shortcode library. This was caused by my having some square bracketed content that did not match up with any of my defined shortcode handlers:

##### html([title][, alt][, classes]) !! In Markdown this method is implicitly called when using the `![]` syntax. The `html` action will output a valid HTML tag for the media based on the current display mode.

This [title], [, alt] and [, classes] text is used in my documentation to display optional method params. However, it was corrupting that title to something like:

<h5>html([t[title] alt][, classes])</h5>

I debugged this in the Shortcode library and discovered that you were first collecting all the potential shortcodes in the entire page (quite a lot for me), and then iterating over them. However, when you retrieved the handler for every shortcode, it would continue to try to process the shortcode even if the handler was null. This ultimately leads to all kinds of corruption of the end content.

Simply checking to see if this handler is null, and then continuing without doing any more processing ensures there is no corruption, and also speeds up the page parsing considerably.

FYI - I created a core shortcode plugin for Grav CMS, and a subsequent shortcode plugin that adds some UI specific shortcodes in case you are interested.

https://getgrav.org

https://github.com/getgrav/grav-plugin-shortcode-core

https://github.com/getgrav/grav-plugin-shortcode-ui
opened by rhukster 16

Nested shortcode issue

Fails

<?php
include 'vendor/autoload.php';

use Thunder\Shortcode\ShortcodeFacade;
use Thunder\Shortcode\Shortcode\ShortcodeInterface;

$facade = new ShortcodeFacade();
$facade->addHandler('hello', function(ShortcodeInterface $s) {
    return sprintf('Hello, %s!' . $s->getContent(), $s->getParameter('name'));
});

$text = '
    <p>Start</p>
    [hello name="Thomas"]
        [hello name="Peter"]
    [/hello]
    <p>End</p>
';
echo $facade->process($text);

Result

<p>Start</p>
Hello, Thomas!
  [hello name="Peter"]
[/hello]
<p>End</p>

Works

$text = '
    [hello name="Thomas"]
        [hello name="Peter"]
    [/hello]
';

Result

Hello, Thomas!
  Hello, Peter!

issue

opened by jenstornell 15

Unicode character breaks shortcode replacement
Hey,

I'm testing following assertion based on README example:

$handlers = new HandlerContainer();
$handlers->add('sample', function(ShortcodeInterface $s) {
    return (new JsonSerializer())->serialize($s);
});
$processor = new Processor(new RegexParser(), $handlers);

$text = 'x [sample arg=val]cnt[/sample] y';
$result = 'x {"name":"sample","args":{"arg":"val"},"content":"cnt"} y';
assert($result === $processor->process($text));

It works fine unless I put some Unicode characters inside the shortcode.

In the case of the Polish character ń it returns 'x {"name":"sample","parameters":{"arg":"val"},"content":"\u0144","bbCode":null}] y' (notice the closing ]). In the case of the 4 Polish characters żółć it returns 'x {"name":"sample","parameters":{"arg":"val"},"content":"\u017c\u00f3\u0142\u0107","bbCode":null}ple] y' (notice ple] after the shortcode replacement).

I'm using PHP Version 5.5.9-1ubuntu4.11 with Multibyte Support enabled.
bug
opened by michaloo 15
Strip out <p> elements

For example, TinyMCE wraps everything in <p> tags. Block elements, like embedded elements, should be stripped of this tag for HTML5 compliance. So if the block element is <p>[embedcode]</p>, the resulting output should be just the embedcode, without the <p> elements.
issue

opened by Firesphere 13
On large HTML files, RegularParser does not fire any handler.

I've been implementing this library on a project in my company, which scrapes a whole local WordPress site with httrack and then modifies the files to fit our needs.

To make some parts dynamic, we're implementing shortcodes (which we translate to custom PHP code). I had been testing your library without any problems, but when I gave it a definitive file, it didn't trigger any handler, and the parser pushed PHP memory consumption over 800MB.

After several tests, I decided to try the RegexParser. It parsed the files correctly, using a ridiculous amount of time and memory compared with the standard one.

I can understand the extra time and memory consumption (under certain limits), but I don't get why the parser didn't see any of my tags. Am I the only one who has suffered this issue?

BTW, thank you very much for your work, your library is awesome and I'm enjoying it a lot!
issue

opened by devnix 12
Question: Best way to handle a `[raw][/raw]` shortcode?
I'm trying to find the best way to write a [raw][/raw] shortcode that stops the Shortcode library from processing anything between these raw tags. My current implementation fakes it by setting the 'text' back to the original unmodified text. However, this doesn't stop Shortcode from processing that inner stuff first.

private function addRawHandler()
{
    $this->handlers->add('raw', function(ShortcodeInterface $shortcode) {
        $raw = trim(preg_replace('/\[raw\](.*?)\[\/raw\]/is', '${1}', $shortcode->getShortcodeText()));
        return $raw;
    });
}

This raised its head when I ran into the [0] bug in some JavaScript example code on my page. While a fix was quickly found, a situation could come up where a fix is impossible or not practical. Turning off shortcode processing in parts of a page is therefore an important function to have. What is the better way of doing this?

Cheers!
issue
opened by rhukster 11
Underscores in tag names not matched

Wasn't sure if this was a deliberate decision, but when parsing, tags containing underscores aren't matched.

eg: [testtag] works but [test_tag] does not.

Both the regular parser here: https://github.com/thunderer/Shortcode/blob/master/src/Parser/RegularParser.php#L81 and the Wordpress parser here: https://github.com/thunderer/Shortcode/blob/master/src/Parser/WordpressParser.php#L27

use a slightly different match syntax but neither contain the _ character.

Reason I ask is because I'm trying to use this library to make a content importer from Wordpress and Wordpress does seem to parse them correctly.
bug

opened by rossriley 10
Wrong parent shortcode returned in nested shortcodes
Hello, I'm dealing with nested shortcodes and I faced the following issue. Giving this shortcode structure:

[shortcode1] [shortcode2] [shortcode3] [/shortcode3] [/shortcode2] [/shortcode1]

calling getParent() on shortcode3 returns the shortcode1 and not the expected shortcode2.
bug
opened by giansi 10
Modifying shortcode replacements via event handler?

Is it possible to use either Events::FILTER_SHORTCODES or Events::REPLACE_SHORTCODES to modify/extend the replacement string provided by the handler?

I am looking for a way to insert additional HTML (say, a </div>...<div>) around shortcodes. When determining this additional HTML, I'd need to know at least the shortcode name. And I would like to keep this out of the handler, because it is context/situation specific.

Any pointers?
issue

opened by mpdude 4
BBCode shortcode parameter value chokes on the tag closing character

Using CommonSyntax, the following text won't be picked up by the regular parser: [url=http://example.com]link[/url]

The parser correctly recognizes the opening shortcode, continues to the parameter value, then finds a /, which is a marker token, then looks for a closing token, doesn't find it, and returns false.

Using a parameter delimiter, like this: [url="http://example.com"]link[/url], yields the expected result.
bug

opened by MrPetovan 4
Ideas
Just a list of issues to remember:

[ ] custom exceptions for specific cases rather than using generic ones,

[ ] recipes for specific tasks (like find all shortcode names in text),

[ ] upgrade RegularParser to not split text into tokens at once but iterate over characters,

[ ] shortcodes with children to represent their structure in text,

[ ] shortcode's getTextUntilNext() which returns the content up to the next shortcode opening tag,

[ ] shortcode name wildcard,

[ ] more complex positional parameters [x=http://x.com], [x x=http://x.com],

[ ] shortcode escaping by doubling opening / closing token [[x /]], [[y a=b]],

[ ] tools for text analysis (nesting depth, count, name stats, etc.)

[ ] configurable shortcode name validation rules,

[ ] solve BBCode ambiguity issues in [x=] and [x= arg=val],

[ ] add ProcessedShortcodeInterface and Processor::withShortcodeFactory() (think of a better name) to allow creating custom shortcode objects using ProcessorContext that are compatible with ProcessedShortcode. This will allow users to put their information inside while still maintaining how library works,

[ ] fix inconsistency between getText and getShortcodeText between ParsedShortcode and ProcessedShortcode,

[x] make ShortcodeFacade a mutable class with all the shortcuts to ease library usage (#36),

[ ] new regex parser that catches only opening/closing tags and handles nesting manually,

[ ] repeated shortcodes, ie. ability to repeat inner content with collections of data, [list]- [item/],[/list] which renders multiple "item" elements, children of list (item shortcodes) receive context from data passed to list,

[x] better documentation than it is now in README (#34),

[x] extract several event listener classes with __invoke() to ease events usage (#33),

[x] find a way to strip content outside shortcodes in given shortcode content without losing tree structure (apply results event),

[x] ~~configurable handler for producing Processor::process() return value using array of replacements~~ (events FTW),

[x] issue with losing shortcode parent when many same shortcodes are on the same level,

[x] ~~HUGE BC (just an idea, no worries): Change ProcessorInterface::process() to receive array of parsed shortcodes to allow greater flexibility (eg. filtering by parent)~~,

[x] find a way to retrieve original shortcode content even when auto-processing,

[x] prevent the ability to create shortcode with falsy name (require non-empty string value),

[x] resolve naming problem for shortcode parameters (current) vs. arguments,

[x] suggest symfony/yaml in composer.json for YAML serializer,

[x] allow escaping with double open/close tags like in WordPress ([[code]value[/code]]),

[x] add Wordpress parser (regex parser with code from WP).

Regular parser:

[x] fix BBCode parsing (unused variable).

BBCode:

[x] decide whether bbcode value should be:

merged with parameters as parameter with shortcode name, no API change, easier version,

a separate element, fourth parameter in Shortcode constructor, and separate getter,

a separate shortcode type (requires changes in both Parsed and Processed as well).

Built-in handlers:

[x] ~~DeclareHandler should typehint interface in constructor,~~ (needs add() method)

[x] EmailHandler could be a BBCode,

[x] PlaceholderHandler should have configurable placeholder braces,

[x] UrlHandler could be a BBCode,

[x] WrapHandler could have several most common variants (eg. bold) created as named constructors.
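
As an illustration of the named-constructor idea from the last item, here is a minimal standalone sketch. `SketchWrapHandler` is a hypothetical class, not the library's WrapHandler: it is invoked with a plain content string to stay self-contained, and the `<b>` markup chosen for the bold variant is an assumption.

```php
<?php
// Hypothetical handler, NOT the library's WrapHandler: wraps content in a prefix and suffix.
final class SketchWrapHandler
{
    private $before;
    private $after;

    public function __construct(string $before, string $after)
    {
        $this->before = $before;
        $this->after = $after;
    }

    // Named constructor for a common variant, so callers need not repeat the markup.
    public static function createBold(): self
    {
        return new self('<b>', '</b>');
    }

    public function __invoke(string $content): string
    {
        return $this->before.$content.$this->after;
    }
}

$bold = SketchWrapHandler::createBold();

echo $bold('text');
```

The general constructor stays available for custom wrappers, while the named constructor documents the intent at the call site.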

Releases(v0.7.5)

v0.7.5(Jan 13, 2022)
[x] full Psalm type coverage,

[x] moved from Travis to GitHub Actions,

[x] support for PHP 8.1,

[x] CI runs Infection.

v0.7.4(Mar 8, 2020)

v0.7.3(Dec 3, 2019)

Fixed PHP 7.4 compatibility errors reported in #81 and #82.
v0.7.2(Apr 20, 2019)

Fixed #77, merged quality of life improvements from #73.
v0.7.1(Feb 3, 2019)

Fixed #74.
v0.7.0(Dec 18, 2018)
[x] many RegularParser improvements and fixes:

backtracks now rely on their offsets only, which improves performance and memory usage by over 10x and brings its speed in line with the other parsers while keeping its feature advantage,

subsequent non-token text fragments are now reported as single T_STRING tokens,

fixed #70, where preg_match_all() with large inputs sometimes failed silently and returned only a subset of matches, which reduced the number of reported shortcodes,

fixed #58, where invalid token sequences in shortcode content could confuse the parser,

inlined the content() method, effectively halving the call nesting level,

disabled xdebug.max_nesting_level during parse() to prevent development environment parsing errors,

[x] added support for PHPUnit 6.x with fallback translation for PHP 5.x compatibility,

[x] dropped PHP 5.3 from the Travis matrix (it is still supported) and added PHP 7.2 to it,

[x] asterisk * is now a valid shortcode name,

[x] minor internal Processor improvements,

[x] minor README updates.

v0.6.5(Jan 8, 2017)

Extended the range of values allowed in simple parameter values (#44).
v0.6.4(Dec 13, 2016)

Fixed minor WordPress compatibility issue with content detection in WordpressParser.
v0.6.3(Aug 10, 2016)

Fixed a bug that occurred when computing the replacement of a shortcode without a handler that contained multibyte content.
v0.6.2(May 19, 2016)

Fixed issue with parsing shortcode tokens inside shortcode content.
v0.6.1(Feb 25, 2016)

Fixed a bug where the text length was not recalculated after applying a shortcode replacement, which caused replacements to be applied only up to the length of the source text. Thanks to @Jonatanmdez for reporting it in issue #37.
v0.6.0(Feb 15, 2016)

Events subsystem, built-in event handlers, built-in shortcode handlers, rewritten README, standardized shortcode name validation, ReplacedShortcode to represent handler result instead of internal array, getBaseOffset(), and better WordpressParser compatibility with WordPress.
v0.5.3(Jan 27, 2016)

Massive performance improvements in RegularParser, fixed problem with multibyte characters in parsed texts, fixed matching shortcodes with invalid names.
v0.5.2(Jan 20, 2016)

Fixed bug with subsequent string tokens in RegularParser, added corresponding test cases.
v0.5.1(Nov 12, 2015)

Fixed a bug with shortcode replacement when it contained multibyte characters.
v0.5.0(Oct 28, 2015)

Removed all deprecated features from v0.4.0, added XML and YAML serializers, added RegularParser and WordpressParser, added HandlerContainer abstraction. This was meant to be a 1.0.0 release, but I'm still working on several ideas and want to test them before the first stable release (should be released soon).
v0.4.0(Jul 15, 2015)

Intermediate release with many fixes and improvements. BC was kept intact, though nearly all classes were moved into new places. Introduced an UPGRADE document which describes possible breaking changes in future releases. The next release will break BC and clean everything up.
v0.3.0(May 8, 2015)

Whitespaced syntax, library facade and self-closing tags.
v0.2.2(Apr 26, 2015)

Fixed support for PHP 5.3.
v0.2.1(Apr 23, 2015)

Fixed matching simple parameter values using delimiters, added missing support for escaping characters inside parameter values.
v0.2.0(Apr 17, 2015)

Recursive and iterative processing, HandlerInterface, default handler, handler aliases and configurable shortcode syntax with builder.
v0.1.0(Apr 7, 2015)

First release.