A super fast, highly extensible markdown parser for PHP

Carsten Brandt

Last update: Dec 16, 2022

What is this?

A set of PHP classes, each representing a Markdown flavor, and a command line tool for converting markdown files to HTML files.

The implementation focuses on being fast (see benchmark) and extensible. Parsing Markdown to HTML takes a single method call (see Usage), and the implementation gives the expected result in most cases, including non-trivial edge cases.

Extending the Markdown language with new elements is as simple as adding a new method to the class that converts the markdown text to the expected HTML output. This is possible without dealing with complex and error-prone regular expressions. You can also hook into the markdown structure to add elements or read meta information, using the internal representation of the Markdown text as an abstract syntax tree (see Extending the language).

Currently the following markdown flavors are supported:

Traditional Markdown according to http://daringfireball.net/projects/markdown/syntax (try it!).
GitHub flavored Markdown according to https://help.github.com/articles/github-flavored-markdown (try it!).
Markdown Extra according to http://michelf.ca/projects/php-markdown/extra/ (not yet fully supported, work in progress, see #25; try it!).
Any mixed Markdown flavor you like, thanks to the library's highly extensible structure (see documentation below).

Future plans are to support:

Smarty Pants http://daringfireball.net/projects/smartypants/
... (Feel free to suggest further additions!)

Who is using it?

It powers the API-docs and the definitive guide for the Yii Framework 2.0.

Installation

PHP 5.4 or higher is required. It also runs on Facebook's HHVM.

The library uses PHPDoc annotations to determine the markdown elements that should be parsed, so if you are using PHP opcache, make sure it does not strip comments.

Installation is recommended to be done via composer by running:

composer require cebe/markdown "~1.2.0"

Alternatively you can add the following to the require section in your composer.json manually:

"cebe/markdown": "~1.2.0"

Run composer update cebe/markdown afterwards.

Note: If you have configured PHP with opcache, you need to enable the opcache.save_comments option, because inline element parsing relies on PHPDoc annotations to find declared elements.
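The following is a minimal sketch of a runtime check for the requirement above. It is not part of the library: the function names docCommentProbe and docCommentsAvailable are made up for this illustration. It assumes PHP 7 or newer and relies only on the standard Reflection API and ini_get().

```php
<?php

/**
 * Probe function: the check below looks for this very comment.
 */
function docCommentProbe()
{
}

/**
 * Returns true when PHPDoc comments survive at runtime,
 * i.e. they have not been stripped by an opcode cache.
 */
function docCommentsAvailable(): bool
{
    $reflection = new ReflectionFunction('docCommentProbe');
    // getDocComment() returns false when no doc comment is available
    return $reflection->getDocComment() !== false;
}

if (!docCommentsAvailable()) {
    // ini_get() returns false when the opcache extension is not loaded
    $setting = ini_get('opcache.save_comments');
    echo 'PHPDoc comments are not available, opcache.save_comments = ',
        var_export($setting, true), "\n";
}
```

Running such a check once at deployment time is usually enough, since the setting does not change between requests.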

Usage

In your PHP project

To parse your markdown you need only two lines of code. The first one chooses the markdown flavor, which is one of the following:

Traditional Markdown: $parser = new \cebe\markdown\Markdown();
Github Flavored Markdown: $parser = new \cebe\markdown\GithubMarkdown();
Markdown Extra: $parser = new \cebe\markdown\MarkdownExtra();

The next step is to call the parse() method to parse the text using the full markdown language, or the parseParagraph() method to parse only inline elements.

Here are some examples:

// traditional markdown and parse full text
$parser = new \cebe\markdown\Markdown();
echo $parser->parse($markdown);

// use github markdown
$parser = new \cebe\markdown\GithubMarkdown();
echo $parser->parse($markdown);

// use markdown extra
$parser = new \cebe\markdown\MarkdownExtra();
echo $parser->parse($markdown);

// parse only inline elements (useful for one-line descriptions)
$parser = new \cebe\markdown\GithubMarkdown();
echo $parser->parseParagraph($markdown);

You may optionally set one of the following options on the parser object:

For all Markdown Flavors:

$parser->html5 = true to enable HTML5 output instead of HTML4.
$parser->keepListStartNumber = true to enable keeping the numbers of ordered lists as specified in the markdown. The default behavior is to always start from 1 and increment by one regardless of the number in markdown.

For GithubMarkdown:

$parser->enableNewlines = true to convert all newlines to <br/>-tags. By default only newlines with two preceding spaces are converted to <br/>-tags.

It is recommended to use UTF-8 encoding for the input strings. Other encodings may work, but are currently untested.

The command line script

You can use it to render this readme:

bin/markdown README.md > README.html

Using github flavored markdown:

bin/markdown --flavor=gfm README.md > README.html

or convert the original markdown description to HTML using a Unix pipe:

curl http://daringfireball.net/projects/markdown/syntax.text | bin/markdown > md.html

Here is the full help output you will see when running bin/markdown --help:

PHP Markdown to HTML converter
------------------------------

by Carsten Brandt <[email protected]>

Usage:
    bin/markdown [--flavor=<flavor>] [--full] [file.md]

    --flavor  specifies the markdown flavor to use. If omitted the original markdown by John Gruber [1] will be used.
              Available flavors:

              gfm   - Github flavored markdown [2]
              extra - Markdown Extra [3]

    --full    output a full HTML page with head and body. If not given, only the parsed markdown will be output.

    --help    shows this usage information.

    If no file is specified input will be read from STDIN.

Examples:

    Render a file with original markdown:

        bin/markdown README.md > README.html

    Render a file using github flavored markdown:

        bin/markdown --flavor=gfm README.md > README.html

    Convert the original markdown description to html using STDIN:

        curl http://daringfireball.net/projects/markdown/syntax.text | bin/markdown > md.html


[1] http://daringfireball.net/projects/markdown/syntax
[2] https://help.github.com/articles/github-flavored-markdown
[3] http://michelf.ca/projects/php-markdown/extra/

Security Considerations

By design, markdown allows HTML to be included within the markdown text. This also means that it may contain JavaScript and CSS styles. This makes it very flexible for creating output that is not limited by the markdown syntax, but it comes with a security risk if you are parsing user input as markdown (see XSS).

In that case you should process the result of the markdown conversion with a tool like HTML Purifier that filters out all elements that users are not allowed to add.

The list of allowed elements for markdown could be configured as:

[
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'hr',
    'pre', 'code',
    'blockquote',
    'table', 'tr', 'td', 'th', 'thead', 'tbody',
    'strong', 'em', 'b', 'i', 'u', 's', 'span',
    'a', 'p', 'br', 'nobr',
    'ul', 'ol', 'li',
    'img',
],

The list of allowed attributes would be:

['th.align', 'td.align', 'ol.start', 'code.class']

The above configuration is a general recommendation and may need to be adjusted depending on your needs.

Extensions

Here are some extensions to this library:

Bogardo/markdown-codepen - shortcode to embed codepens from http://codepen.io/ in markdown.
cebe/markdown-latex - Convert Markdown to LaTeX and PDF
softark/creole - A creole markup parser
hyn/frontmatter - Frontmatter Metadata Support (JSON, TOML, YAML)
... add yours!

Extending the language

Markdown consists of two types of language elements. I'll call them block and inline elements, similar to what you have in HTML with <div> and <span>. Block elements normally spread over several lines and are separated by blank lines. The most basic block element is a paragraph (<p>). Inline elements are elements that are added inside of block elements, i.e. inside of text.

This markdown parser allows you to extend the markdown language by changing the behavior of existing elements and by adding new block and inline elements. You do this by extending the parser class and adding or overriding class methods and properties. The different element types are extended in different ways, as you will see in the following sections.

Adding block elements

The markdown is parsed line by line to identify each non-empty line as one of the block element types. To identify a line as the beginning of a block element, the parser calls all protected class methods whose names begin with identify. An identify function returns true if it has identified the block element it is responsible for, or false if not. In the following example we will implement support for fenced code blocks, which are part of GitHub flavored markdown.

<?php

class MyMarkdown extends \cebe\markdown\Markdown
{
	protected function identifyFencedCode($line, $lines, $current)
	{
		// if a line starts with at least 3 backticks it is identified as a fenced code block
		if (strncmp($line, '```', 3) === 0) {
			return true;
		}
		return false;
	}

	// ...
}

In the above, $line is a string containing the content of the current line and is equal to $lines[$current]. You may use $lines and $current to check other lines than the current line. In most cases you can ignore these parameters.
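As a rough illustration of when $lines and $current are useful, the sketch below checks the line after the current one, which is what an underlined (setext-style) headline needs. It is written as a standalone function so it can run without the library; the name looksLikeSetextHeadline is hypothetical and is not a method of the parser. It assumes PHP 7 or newer.

```php
<?php

/**
 * Hypothetical identify-style check: a non-empty line counts as a headline
 * when the NEXT line consists only of "=" or only of "-" characters.
 */
function looksLikeSetextHeadline(string $line, array $lines, int $current): bool
{
    // an empty line or the last line of the document cannot be underlined
    if (trim($line) === '' || !isset($lines[$current + 1])) {
        return false;
    }
    $next = rtrim($lines[$current + 1]);

    // trim() with a character list removes those characters from both ends,
    // so an empty result means the line contained nothing else
    return $next !== '' && (trim($next, '=') === '' || trim($next, '-') === '');
}

$lines = ['Title', '=====', 'some text'];
var_dump(looksLikeSetextHeadline($lines[0], $lines, 0)); // bool(true)
var_dump(looksLikeSetextHeadline($lines[2], $lines, 2)); // bool(false)
```

Inside a parser subclass the same logic would live in a protected method whose name starts with identify.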

Parsing of a block element is done in two steps:

Consuming all the lines belonging to it. In most cases this means iterating over the lines, starting from the identified line, until a blank line occurs. This step is implemented by a method named consume{blockName}(), where {blockName} is the same name as used for the identify function above. The consume method also takes the lines array and the number of the current line. It returns two values: an array representing the block element in the abstract syntax tree of the markdown document, and the line number to parse next. In the abstract syntax array the first element is the name of the element; all other array elements can be freely defined by you. In our example we will implement it like this:

 protected function consumeFencedCode($lines, $current)
 {
 	// create block array
 	$block = [
 		'fencedCode',
 		'content' => [],
 	];
 	$line = rtrim($lines[$current]);

 	// detect language and fence length (can be more than 3 backticks)
 	$fence = substr($line, 0, $pos = strrpos($line, '`') + 1);
 	$language = substr($line, $pos);
 	if (!empty($language)) {
 		$block['language'] = $language;
 	}

 	// consume all lines until ```
 	for($i = $current + 1, $count = count($lines); $i < $count; $i++) {
 		if (rtrim($line = $lines[$i]) !== $fence) {
 			$block['content'][] = $line;
 		} else {
 			// stop consuming when code block is over
 			break;
 		}
 	}
 	return [$block, $i];
 }

Rendering the element. After all blocks have been consumed, they are rendered using the render{elementName}() method, where {elementName} refers to the name of the element in the abstract syntax tree:
```
 protected function renderFencedCode($block)
 {
 	$class = isset($block['language']) ? ' class="language-' . $block['language'] . '"' : '';
 	return "<pre><code$class>" . htmlspecialchars(implode("\n", $block['content']) . "\n", ENT_NOQUOTES, 'UTF-8') . '</code></pre>';
 }
```
You may also add code highlighting here. In general it would also be possible to render output in a different language than HTML, for example LaTeX.
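The "detect language and fence length" step in the consume method above packs two things into one expression, so here it is pulled out into a standalone sketch. The function name splitFenceLine is made up for this illustration and is not part of the library; the logic mirrors the two substr() calls from the example. It assumes PHP 7 or newer.

```php
<?php

/**
 * Splits the opening line of a fenced code block into the fence itself
 * and the language name that follows it.
 *
 * @return array [fence, language]; language is '' when none is given
 */
function splitFenceLine(string $line): array
{
    $line = rtrim($line);

    // position just after the LAST backtick, i.e. the length of the fence
    $pos = strrpos($line, '`') + 1;

    $fence = substr($line, 0, $pos);
    // the cast keeps the result a string on PHP versions where substr()
    // returns false for an offset at the end of the string
    $language = (string) substr($line, $pos);

    return [$fence, $language];
}

print_r(splitFenceLine('```php'));
print_r(splitFenceLine('````'));
```

Because the whole fence is captured, a block opened with four backticks is only closed by a line of four backticks, which allows a three-backtick fence to appear inside it.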

Adding inline elements

Adding inline elements is different from block elements as they are parsed using markers in the text. An inline element is identified by a marker that marks the beginning of an inline element (e.g. [ will mark a possible beginning of a link or ` will mark inline code).

Parsing methods for inline elements are also protected and are identified by the prefix parse. Additionally, a @marker annotation in PHPDoc is needed to register the parse function for one or multiple markers. The method is then called when a marker is found in the text. As an argument it takes the text starting at the position of the marker. It returns an array containing the element of the abstract syntax tree and the offset of the text it has parsed from the input markdown. All text up to this offset is removed from the markdown before the next marker is searched for.

As an example, we will add support for the strikethrough feature of github flavored markdown:

<?php

class MyMarkdown extends \cebe\markdown\Markdown
{
	/**
	 * @marker ~~
	 */
	protected function parseStrike($markdown)
	{
		// check whether the marker really represents a strikethrough (i.e. there is a closing ~~)
		if (preg_match('/^~~(.+?)~~/', $markdown, $matches)) {
			return [
			    // return the parsed tag as an element of the abstract syntax tree and call `parseInline()` to allow
			    // other inline markdown elements inside this tag
				['strike', $this->parseInline($matches[1])],
				// return the offset of the parsed text
				strlen($matches[0])
			];
		}
		// in case we did not find a closing ~~ we just return the marker and skip 2 characters
		return [['text', '~~'], 2];
	}

	// rendering is the same as for block elements, we turn the abstract syntax array into a string.
	protected function renderStrike($element)
	{
		return '<del>' . $this->renderAbsy($element[1]) . '</del>';
	}
}
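To make the @marker registration more concrete, here is a sketch of how such annotations can be discovered with PHP's Reflection API. This is an illustration of the mechanism only and should not be read as the library's actual code: the class MarkerDemo and the function collectMarkers are hypothetical, and MarkerDemo does not extend the real parser. It assumes PHP 7 or newer.

```php
<?php

// Hypothetical stand-in class; it does not extend the real parser.
class MarkerDemo
{
    /**
     * @marker ~~
     */
    protected function parseStrike($markdown)
    {
        return [['text', '~~'], 2];
    }

    /**
     * @marker [
     * @marker ![
     */
    protected function parseLinkOrImage($markdown)
    {
        return [['text', $markdown[0]], 1];
    }

    // No @marker annotation, so this method is never registered.
    protected function parseIgnored($markdown)
    {
        return [['text', ''], 0];
    }
}

/**
 * Builds a map of marker => method name from the @marker annotations
 * of all protected methods whose names start with "parse".
 */
function collectMarkers(string $class): array
{
    $markers = [];
    $reflection = new ReflectionClass($class);
    foreach ($reflection->getMethods(ReflectionMethod::IS_PROTECTED) as $method) {
        if (strncmp($method->getName(), 'parse', 5) !== 0) {
            continue;
        }
        // false when there is no doc comment, e.g. stripped by an opcode cache
        $doc = $method->getDocComment();
        if ($doc === false) {
            continue;
        }
        if (preg_match_all('/@marker\s+(\S+)/', $doc, $matches)) {
            foreach ($matches[1] as $marker) {
                $markers[$marker] = $method->getName();
            }
        }
    }
    return $markers;
}

print_r(collectMarkers(MarkerDemo::class));
```

The sketch also shows why stripped comments are a problem: without doc comments the map stays empty and no inline element can be found.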

Composing your own Markdown flavor

This markdown library is composed of traits so it is very easy to create your own markdown flavor by adding and/or removing the single feature traits.

Designing your Markdown flavor consists of four steps:

Select a base class
Select language feature traits
Define escapeable characters
Optionally add custom rendering behavior

Select a base class

If you want to extend a flavor and only add features, you can use one of the existing classes (Markdown, GithubMarkdown or MarkdownExtra) as your flavor's base class.

If you want to define a subset of the markdown language, i.e. remove some of the features, you have to extend your class from Parser.

Select language feature traits

The following shows the trait selection for traditional Markdown.

class MyMarkdown extends Parser
{
	// include block element parsing using traits
	use block\CodeTrait;
	use block\HeadlineTrait;
	use block\HtmlTrait {
		parseInlineHtml as private;
	}
	use block\ListTrait {
		// Check Ul List before headline
		identifyUl as protected identifyBUl;
		consumeUl as protected consumeBUl;
	}
	use block\QuoteTrait;
	use block\RuleTrait {
		// Check Hr before checking lists
		identifyHr as protected identifyAHr;
		consumeHr as protected consumeAHr;
	}
	// include inline element parsing using traits
	use inline\CodeTrait;
	use inline\EmphStrongTrait;
	use inline\LinkTrait;

	/**
	 * @var boolean whether to format markup according to HTML5 spec.
	 * Defaults to `false` which means that markup is formatted as HTML4.
	 */
	public $html5 = false;

	protected function prepare()
	{
		// reset references
		$this->references = [];
	}

	// ...
}

In general, just adding the trait with use is enough; however, in some cases some fine-tuning is needed to get the expected parsing results. Elements are detected in alphabetical order of their identification functions. This means that if a line starting with - could be a list or a horizontal rule, the preference has to be set by renaming the identification function. This is what is done by renaming identifyHr to identifyAHr and identifyUl to identifyBUl. The consume function always has to have the same name as the identification function, so it has to be renamed too.
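The effect of name-based ordering can be tried out in isolation. The sketch below is not the library's implementation: OrderDemo is a hypothetical class with three simplified identify methods, and it assumes that "alphabetical order" means a plain string sort of the method names. It needs PHP 7.1 or newer.

```php
<?php

// Hypothetical demo class; it does not extend the real parser.
class OrderDemo
{
    /**
     * Returns the name of the first identify method that accepts $line,
     * trying the methods in alphabetical order of their names.
     */
    public function detect(string $line): ?string
    {
        // called inside the class, so protected methods are included
        $methods = array_filter(get_class_methods($this), function ($name) {
            return strncmp($name, 'identify', 8) === 0;
        });
        sort($methods, SORT_STRING);

        foreach ($methods as $method) {
            if ($this->$method($line)) {
                return $method;
            }
        }
        return null;
    }

    protected function identifyHeadline(string $line): bool
    {
        return strncmp($line, '#', 1) === 0;
    }

    // simplified rule check: three or more dashes, optionally space separated
    protected function identifyAHr(string $line): bool
    {
        return preg_match('/^(?:- ?){3,}$/', rtrim($line)) === 1;
    }

    // simplified list check: a dash followed by a space
    protected function identifyBUl(string $line): bool
    {
        return strncmp($line, '- ', 2) === 0;
    }
}

$demo = new OrderDemo();
echo $demo->detect('- - -'), "\n";   // identifyAHr
echo $demo->detect('- item'), "\n";  // identifyBUl
echo $demo->detect('# Title'), "\n"; // identifyHeadline
```

The line "- - -" would also pass the list check, so it is only treated as a rule because identifyAHr sorts before identifyBUl.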

There is also a conflict for parsing of the < character. This could either be a link/email enclosed in < and > or an inline HTML tag. In order to resolve this conflict when adding the LinkTrait, we need to hide the parseInlineHtml method of the HtmlTrait.

If you use any trait that uses the $html5 property to adjust its output you also need to define this property.

If you use the link trait it may be useful to implement prepare() as shown above to reset references before parsing to ensure you get a reusable object.

Define escapeable characters

Depending on the language features you have chosen, a different set of characters can be escaped using \. The following is the set of escapeable characters for traditional markdown; you can copy it to your class as is.

	/**
	 * @var array these are "escapeable" characters. When using one of these prefixed with a
	 * backslash, the character will be outputted without the backslash and is not interpreted
	 * as markdown.
	 */
	protected $escapeCharacters = [
		'\\', // backslash
		'`', // backtick
		'*', // asterisk
		'_', // underscore
		'{', '}', // curly braces
		'[', ']', // square brackets
		'(', ')', // parentheses
		'#', // hash mark
		'+', // plus sign
		'-', // minus sign (hyphen)
		'.', // dot
		'!', // exclamation mark
		'<', '>',
	];
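The behavior described in the comment above can be sketched with a small standalone function. This is a simplified illustration, not how the parser handles escapes internally: the function stripEscapes is hypothetical and only shows that a backslash is dropped when it precedes a character from the list and kept otherwise. It assumes PHP 7 or newer.

```php
<?php

/**
 * Removes the backslash in front of every character that is listed in
 * $escapeCharacters; any other backslash is left untouched.
 */
function stripEscapes(string $text, array $escapeCharacters): string
{
    $out = '';
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $isEscape = $text[$i] === '\\'
            && $i + 1 < $length
            && in_array($text[$i + 1], $escapeCharacters, true);
        if ($isEscape) {
            // emit the escaped character and skip over it
            $out .= $text[$i + 1];
            $i++;
            continue;
        }
        $out .= $text[$i];
    }
    return $out;
}

$escapeCharacters = ['\\', '`', '*', '_'];
echo stripEscapes('\\*not emphasis\\*', $escapeCharacters), "\n"; // *not emphasis*
echo stripEscapes('C:\\temp', $escapeCharacters), "\n";           // C:\temp
```

A flavor that adds new markers will normally also want to add those marker characters to its own list.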

Add custom rendering behavior

Optionally you may also want to adjust rendering behavior by overriding some methods. For inspiration, see the consumeParagraph() method of the Markdown and GithubMarkdown classes, which define different rules for which elements are allowed to interrupt a paragraph.

Acknowledgements

I'd like to thank @erusev for creating Parsedown which heavily influenced this work and provided the idea of the line based parsing approach.

FAQ

Why another markdown parser?

While reviewing PHP markdown parsers to choose one to bundle with the Yii framework 2.0, I found that most of the implementations use regex to replace patterns instead of doing real parsing. This makes extending them with new language elements quite hard, as you have to come up with a complex regex that matches your addition but does not mess with other elements. Such additions are very common, as you can see on GitHub, which supports referencing issues, users and commits in comments. A real parser should use context-aware methods that walk through the text and parse the tokens as they find them. The only implementation I have found that uses this approach is Parsedown, which also shows that this approach is much faster than the regex way. Parsedown, however, is an implementation that focuses on speed and implements its own flavor (mainly GitHub flavored markdown) in one class, and at the time of this writing it was not easily extensible.

Given the situation above, I decided to start my own implementation using the parsing approach from Parsedown and making it extensible by creating a class for each markdown flavor; the classes extend each other in the same way the markdown languages extend each other. This allows you to choose between markdown language flavors and also provides a way to compose your own flavor, picking the best things from all. I chose this approach as it is easier to implement and more intuitive than using callbacks to inject functionality into the parser.

Where do I report bugs or rendering issues?

Just open an issue on github, post your markdown code and describe the problem. You may also attach screenshots of the rendered HTML result to describe your problem.

How can I contribute to this library?

Check the CONTRIBUTING.md file for more info.

Am I free to use this?

This library is open source and licensed under the MIT License. This means that you can do whatever you want with it as long as you mention my name and include the license file. Check the license for details.

Contact

Feel free to contact me using email or twitter.

Comments

composer up not working for yii2
After composer up --prefer-dist I get

Problem 1 - Installation request for yiisoft/yii2 2.0.14.2 -> satisfiable by yiisoft/yii2[2.0.14.2]. - yiisoft/yii2 2.0.14.2 requires cebe/markdown ~1.0.0 | ~1.1.0 -> no matching package found.

This problem I get after your new release 1.2. https://github.com/cebe/markdown/releases/tag/1.2.0

3 days ago composer up worked fine.
opened by loveorigami 15
A small change request in Parser class
Hi,

First of all thanks for this great library. It's really awesome!

I'm trying to use this library in my project. I want to extend the Markdown class and override its parse() and parseBlocks() methods (inherited from Parser class), but I can't do that because some fields/methods in Parser class are private.

So, my change request is:

make Parser::_depth property protected instead of private;

make Parser::_blockTypes property protected instead of private;

make Parser::_inlineMarkers property protected instead of private;

make Parser::prepareMarkers() method protected instead of private

Is it possible?
enhancement under discussion
opened by olegkrivtsov 15
Metadata support?

Thoughts on adding support for metadata?

I don't think this would be difficult to implement - I happen to need it, and I'm using your library, so wondering if you would be interested in a PR adding support for it.

My use-case is pretty trivial, I need a title and possibly a short title for generating menus for a static website from markdown files - but this could have lots of nice applications, e.g. meta descriptions or marginalia in documents in content management systems that use markdown.

opened by mindplay-dk 14
Invalid output utf8 chars
Hi, Cebe. Could you add the u flag after all, as shown in "PHP Regex: How to match \r and \n without using [\r\n]"?

Namely:

$text = preg_replace('~\R~u', "\n", $text);
bug
opened by romeOz 12
Only \r\n, \n\r, \r, \n and \f will be treated as line break characters -- No longer use preg_replace

Only \r\n, \n\r, \r, \n and \f will be treated as line break characters -- No longer use preg_replace

preg_replace('~\R~') actually breaks some UTF-8 characters somehow.

And '~\R~u' breaks the test case BaseMarkdownTest::testInvalidUtf8, I guess because it replaces too much.

I also added test data that is actually saved in UTF-8 to the test case. See unicode.md and unicode.html for details.

opened by ghost 11

Github flavoured markdown nest code blocks with language in em

Markdown source

\```bash
iptables -t nat -N freedom-https
iptables -t nat -A freedom-https -j ACCEPT

iptables -t nat -A freedom -j freedom-https
iptables -t nat -A freedom -p tcp -j REDIRECT --to-ports <SS-REDIR-PORT>
\```

was parsed into

<pre><em><em><em><code class="  language-bash">iptables -t nat -N freedom-https
iptables -t nat -A freedom-https -j ACCEPT

iptables -t nat -A freedom -j freedom-https
iptables -t nat -A freedom -p tcp -j REDIRECT --to-ports &lt;SS-REDIR-PORT&gt;
</code></em></em></em></pre>

opened by snakevil 10

Performance is quite down after new refactoring and extension approach

Runtime: PHP5.4.30-2+deb.sury.org~precise+1
Host:    Linux cebe-desktop 3.2.0-63-generic-pae #95-Ubuntu SMP Thu May 15 23:26:11 UTC 2014 i686
Profile: Basic markdown content with all official syntax / 1000 times
Class:   Markbench\Profile\DefaultProfile

+----------------------+-----------+---------+---------------+---------+--------------+
| package              | version   | dialect | duration (MS) | MEM (B) | PEAK MEM (B) |
+----------------------+-----------+---------+---------------+---------+--------------+
| erusev/parsedown     | 1.1.0     | extra   | 5855          | 6815744 | 6815744      |
| erusev/parsedown     | 1.1.0     |         | 6256          | 6815744 | 6815744      |
| cebe/markdown        | 1.0.x-dev |         | 9015          | 6815744 | 6815744      |
| cebe/markdown        | 1.0.x-dev | extra   | 9261          | 6815744 | 6815744      |
| cebe/markdown        | 1.0.x-dev | gfm     | 10262         | 6815744 | 6815744      |
| michelf/php-markdown | 1.4.1     |         | 13147         | 7077888 | 7077888      |
| michelf/php-markdown | 1.4.1     | extra   | 16198         | 6815744 | 7077888      |
| kzykhys/ciconia      | v1.0.2    |         | 23319         | 7340032 | 7602176      |
| kzykhys/ciconia      | v1.0.2    | gfm     | 26398         | 7340032 | 7602176      |
+----------------------+-----------+---------+---------------+---------+--------------+

Thinking about making two variants:

super fast but not so easy to extend
highly extensible but a bit slower

performance

opened by cebe 7

Modules applying/not-applying to already marked blocks
What if I want to write an extension to your markdown which should not be applied to some blocks?

In this case I don’t want Emoji (#35) to be applied to fenced code blocks:

```php
// not a Emoji :thumbsup:
```

In other modules I may want to do such recursive processing.

As I remember, the current Markdown declaration is not strict about that. It’s not strict on many questions at all. Hence many of the issues (italics inside of links_with_underscores in some non-GFM implementations, and others).
opened by maximal 7

Segmentation fault in Parser.php

Thank you for your awesome markdown library! I will report a bug.

What happened

Sometimes a segmentation fault occurs when using your library. Perhaps a reflection method is the cause.

Environment

PHP 5.4.43 (cli) (built: Jul 8 2015 12:08:50)
cebe/markdown 1.1.0

What happened

kernel: php[11562]: segfault at 0 ip 000000312513d414 sp 00007ffcd4ebf658 error 4 in libc-2.12.so[3125000000+18a000]

coredump

(gdb) where
#0  __strncasecmp_l_ssse3 () at ../sysdeps/x86_64/strcmp.S:1237
#1  0x00000000006043c3 in zend_find_alias_name (ce=<value optimized out>, name=0x3cb4a78 "consumebul", len=10)
    at /usr/src/debug/php-5.4.43/Zend/zend_API.c:3929
#2  0x000000000060451a in zend_resolve_method_name (ce=<value optimized out>, f=0x3cb4a98)
    at /usr/src/debug/php-5.4.43/Zend/zend_API.c:3970
#3  0x00000000004f2519 in reflection_method_factory (ce=0x3c27bc0, method=0x3cb4a98, closure_object=0x0, 
    object=0x3cbd540) at /usr/src/debug/php-5.4.43/ext/reflection/php_reflection.c:1298
#4  0x00000000004f2740 in _addmethod (mptr=0x3cb4a98, ce=0x3c27bc0, retval=0x3cad908, filter=<value optimized out>, 
    obj=0x0) at /usr/src/debug/php-5.4.43/ext/reflection/php_reflection.c:3737
#5  0x00000000004f29dc in _addmethod_va (mptr=<value optimized out>, num_args=<value optimized out>, 
    args=<value optimized out>, hash_key=<value optimized out>)
    at /usr/src/debug/php-5.4.43/ext/reflection/php_reflection.c:3751
#6  0x000000000060fb8e in zend_hash_apply_with_arguments (ht=0x3c27be8, apply_func=0x4f2970 <_addmethod_va>, 
    num_args=4) at /usr/src/debug/php-5.4.43/Zend/zend_hash.c:772
#7  0x00000000004f2859 in zim_reflection_class_getMethods (ht=<value optimized out>, return_value=0x3cad908, 
    return_value_ptr=<value optimized out>, this_ptr=0x3c357f8, return_value_used=<value optimized out>)
    at /usr/src/debug/php-5.4.43/ext/reflection/php_reflection.c:3778

filename and line

(gdb) print (char *)executor_globals.active_op_array->filename
$1 = 0x3c125c8 "/home/vagrant/dev/owl/vendor/cebe/markdown/Parser.php"
(gdb) print executor_globals.current_execute_data.opline->lineno
$2 = 274

opened by fortkle 6

OPCode cache breaks the parsing of inline elements

With some OPCode caches enabled, Parser::inlineMarkers() returns an empty array since the doc comments have been removed from the source code at runtime. As a consequence, we won't be able to parse any inline elements.

Tested with PHP 5.6.3 and Zend OPCache 7.0.4-dev / XAMPP 5.6.3 for Windows.
enhancement

opened by softark 6
Deactivate some functionalities ?

Hi and thanks for your project. :+1:

I created my own MarkdownParser and I want to deactivate some functionality, such as title and raw HTML parsing.

What is the best way to do that? There is nothing about it in the documentation; it would be good to have a section for that! :)

Thanks.
docs

opened by soullivaneuh 6
[FEATURE] Header links
GitHub at least includes the ability to create header links. This for instance makes the #header-1 link to the first heading named Header, ie

heres the [link](#header-1) <- clicking this takes you to the header below
#### Header

However, this library makes those links absolute, meaning it now links to the original github repository, thereby taking them off our website.

Is there a way to make the links relative that I'm missing, or could that be made a new feature if it's not already present?
opened by androidacy-user 8

h is singled out in parsing

I want to use this library for a changelog parser.

But while testing the parser's output structure, I found that it singles out the letter h in text that contains links or references.

The following code sample is used as a baseline (dd is a function from symfony link):

class ChangelogParser extends GithubMarkdown
{
    public function parseChangelog(string $text)
    {
        parent::prepare();

        if (ltrim($text) === '') {
            return '';
        }

        $text = str_replace(["\r\n", "\n\r", "\r"], "\n", $text);

        parent::prepareMarkers($text);

        $blocks = parent::parseBlocks(explode("\n", $text));

        dd($blocks);
    }
}

the following markdown doesn't trigger the error:

qwertzuiopüasdfghjklöäyxcvbnm

output:

^ array:1 [▼
  0 => array:2 [▼
    0 => "paragraph"
    "content" => array:1 [▼
      0 => array:2 [▼
        0 => "text"
        1 => "qwertzuiopüasdfghjklöäyxcvbnm"
      ]
    ]
  ]
]

but these do:

qwertzuiopüasdfghjklöäyxcvbnm[test](https://github.com)

output:

^ array:1 [▼
  0 => array:2 [▼
    0 => "paragraph"
    "content" => array:5 [▼
      0 => array:2 [▼
        0 => "text"
        1 => "qwertzuiopüasdfg"
      ]
      1 => array:2 [▼
        0 => "text"
        1 => "h"
      ]
      2 => array:2 [▼
        0 => "text"
        1 => "jklöäyxcvbnm"
      ]
      3 => array:6 [▼
        0 => "link"
        "text" => array:1 [▼
          0 => array:2 [▼
            0 => "text"
            1 => "test"
          ]
        ]
        "url" => "https://github.com"
        "title" => null
        "refkey" => null
        "orig" => "[test](https://github.com)"
      ]
      4 => array:2 [▼
        0 => "text"
        1 => ""
      ]
    ]
  ]
]

qwertzuiopüasdfghjklöäyxcvbnm[test]

[test]: https://github.com

output:

^ array:1 [▼
  0 => array:2 [▼
    0 => "paragraph"
    "content" => array:5 [▼
      0 => array:2 [▼
        0 => "text"
        1 => "qwertzuiopüasdfg"
      ]
      1 => array:2 [▼
        0 => "text"
        1 => "h"
      ]
      2 => array:2 [▼
        0 => "text"
        1 => "jklöäyxcvbnm"
      ]
      3 => array:6 [▼
        0 => "link"
        "text" => array:1 [▼
          0 => array:2 [▼
            0 => "text"
            1 => "test"
          ]
        ]
        "url" => null
        "title" => null
        "refkey" => "test"
        "orig" => "[test]"
      ]
      4 => array:2 [▼
        0 => "text"
        1 => ""
      ]
    ]
  ]
]

qwertzuiopüasdfghjklöäyxcvbnm

[test]: https://github.com

output:

^ array:1 [▼
  0 => array:2 [▼
    0 => "paragraph"
    "content" => array:3 [▼
      0 => array:2 [▼
        0 => "text"
        1 => "qwertzuiopüasdfg"
      ]
      1 => array:2 [▼
        0 => "text"
        1 => "h"
      ]
      2 => array:2 [▼
        0 => "text"
        1 => "jklöäyxcvbnm"
      ]
    ]
  ]
]

opened by RTUnreal 0

Remove deprecated name tags in A element

This PR cleans up a minor issue in README.md that produces a warning from w3.org. See lines 130-136 in https://validator.w3.org/nu/?doc=https%3A%2F%2Fgithub.com%2Fcebe%2Fmarkdown

The A elements don't really add anything and are not needed.

opened by phpfui 1
Table without body at the end of markdown not detected
The following markdown table without body at the end of the markdown text is not detected and will be rendered as paragraph:

"| Tables | Are | Cool |\n| ------------- |:-------------:| -----:|"

while this one is detected (due to the following line):

| Tables | Are | Cool |\n| ------------- |:-------------:| -----:|\n\nasdf

Due to this check: https://github.com/cebe/markdown/blob/master/block/TableTrait.php#L23
opened by buddh4 0

Releases (1.2.1)

1.2.1 (Mar 26, 2018)
Improved handling of inline HTML with URL and email tags.

Improved handling of custom syntax with [[, references should not use [ as the first character in the reference name.

1.0.3 (Mar 26, 2018)
Improved handling of custom syntax with [[, references should not use [ as the first character in the reference name.

1.2.0 (Mar 14, 2018)
This release contains many improvements to Markdown edge cases as well as changes to the abstract syntax tree for tables.

#50 Do not render empty emphs.

#69 Improve ABSY for tables, make column and row information directly available in absy (@NathanBaulch)

#89 Lists should be separated by a HR (@bieleckim)

#95 Added TableTrait::composeTable($head, $body), for easier overriding of table layout (@maximal, @cebe)

#111 Improve rendering of successive strongs (@wogsland)

#132 Improve detection and rendering of fenced code blocks in lists.

#134 Fix Emph and Strong to allow escaping * or _ inside them.

#135 GithubMarkdown was not parsing inline code when there are square brackets around it.

#151 Fixed table rendering for lines beginning with | for GFM (@GenaBitu)

Improved table rendering, allow single column tables.

1.1.2 (Jul 16, 2017)
#126 Fixed crash on empty lines that extend a lazy list

#128 Fix table renderer, which was including default alignment (@tanakahisateru)

#129 Use given encoded URL if decoded URL text looks insecure, e.g. uses broken UTF-8 (@tanakahisateru)

Added a workaround for a PHP bug that exists in versions < 7.0, where preg_match() causes a segfault on catastrophic backtracking in emph/strong parsing.

1.1.1 (Sep 14, 2016)
#112 Fixed parsing for custom self-closing HTML tags

#113 improve extensibility by making prepareMarkers() protected and adding a parseBlock() method

#114 better handling of continued inline HTML in paragraphs

1.1.0 (Mar 6, 2015)
improve compatibility with github flavored markdown

#64 fixed some rendering issue with emph and strong

#56 trailing and leading spaces in a link are now ignored

fixed various issues with table rendering

#98 Fix PHP fatal error when maximumNestingLevel was reached (@tanakahisateru)

refactored nested and lazy list handling, improved overall list rendering consistency

Lines containing "0" were skipped or considered empty in some cases (@tanakahisateru)

#54 escape characters are now also considered inside of urls
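
The note above about lines containing "0" does not say what caused the problem. One common way such a bug can arise in PHP is a truthiness-based emptiness check, because the string "0" is falsy. The sketch below illustrates that assumption only; the helper names are hypothetical and not taken from the library.

```php
<?php
// Hypothetical helpers for illustration; they are not taken from the library.

// Loose check: relies on PHP's truthiness rules, under which "0" is falsy.
function isBlankLoose($line)
{
    return !trim($line);
}

// Strict check: only a line that is empty after trimming counts as blank.
function isBlankStrict($line)
{
    return trim($line) === '';
}

var_dump(isBlankLoose('0'));   // bool(true): the line is wrongly treated as empty
var_dump(isBlankStrict('0'));  // bool(false)
var_dump(isBlankStrict('  ')); // bool(true)
```

Comparing against the empty string explicitly keeps a line consisting of "0" as content while still treating whitespace-only lines as blank.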

1.0.2 (Mar 6, 2015)
#98 Fix PHP fatal error when maximumNestingLevel was reached (@tanakahisateru)

1.0.1 (Oct 25, 2014)
Fixed the bin/markdown script to work with composer autoloader (c497bada0e15f61873ba6b2e29f4bb8b3ef2a489)

#74 fixed a bug that produced broken characters when non-ASCII input was given. The parser now handles UTF-8 input correctly. Other encodings are currently untested; UTF-8 is recommended.

1.0.0 (Oct 12, 2014)

This is the first stable release of version 1.0. It is incompatible with the 0.9.x branch with regard to the internal API, which is used when extending the Markdown parser. The external API has no breaking changes. The rendered Markdown, however, has changed in some edge cases, and some rendering issues have been fixed.

The parser got a bit slower compared to earlier versions, but it parses Markdown more accurately and uses an abstract syntax tree as the internal representation of the parsed text. This allows extensions to work with the parsed Markdown in many ways, including rendering to formats other than HTML.

For more details about the changes see the release message of 1.0.0-rc.

You can try it out on the website: http://markdown.cebe.cc/try

The parser is now also registered on the Babelmark 2 page by John MacFarlane, which you can use to compare the Markdown output of different parsers.
1.0.0-rc (Oct 10, 2014)
#21 Sped up inline parsing using strpbrk: about a 20% speedup compared to the previous parsing.

#24 CLI script now sends all error output to stderr instead of stdout

#25 Added partial support for the Markdown Extra flavor

#10 GithubMarkdown is now fully supported including tables

#67 All Markdown classes are now composed out of php traits

#67 The way to extend markdown has changed due to the introduction of an abstract syntax tree. See https://github.com/cebe/markdown/commit/dd2d0faa71b630e982d6651476872469b927db6d for how it changes or read the new README.

Introduced an abstract syntax tree as an intermediate representation between parsing steps. This not only fixes some issues with nested block elements but also allows manipulation of the markdown before rendering.

This version also fixes several rendering issues.
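
As a rough illustration of the strpbrk approach mentioned for #21, the sketch below jumps directly to the next candidate marker character instead of inspecting every character in turn. The function name and the marker set are hypothetical; the library's actual inline parser is more involved.

```php
<?php
// Hypothetical sketch: split text into plain runs and single marker characters.
function splitAtMarkers($text, $markerChars)
{
    $chunks = [];
    while ($text !== '') {
        // strpbrk() returns the rest of the string starting at the first
        // occurrence of any marker character, or false if none is found.
        $rest = strpbrk($text, $markerChars);
        if ($rest === false) {
            $chunks[] = $text; // no more markers: the remainder is plain text
            break;
        }
        $plainLength = strlen($text) - strlen($rest);
        if ($plainLength > 0) {
            $chunks[] = substr($text, 0, $plainLength);
        }
        $chunks[] = $rest[0];              // the marker character itself
        $text = (string) substr($rest, 1); // cast guards against false on PHP < 8
    }
    return $chunks;
}

print_r(splitAtMarkers('a *b* [c]', '*[]'));
```

A real parser would hand each marker position to the matching inline rule rather than collecting characters. The point here is only that plain runs between markers are skipped in a single call.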

0.9.2 (Feb 22, 2014)

#27 - Fixed some rendering problems with block elements not separated by newlines
0.9.1 (Feb 18, 2014)

Fixed an issue with inline markers that begin with the same character e.g. [ and [[.
0.9.0 (Feb 18, 2014)
The initial release.

Complete implementation of the original Markdown spec

GFM without tables

a command line tool for markdown parsing
