Better Markdown Parser in PHP

Emanuil Rusev

Last update: Dec 28, 2022


Overview

Parsedown

Better Markdown Parser in PHP - Demo.

Features

One File
No Dependencies
Super Fast
Extensible
GitHub flavored
Tested in PHP 5.3 to 7.3
Markdown Extra extension

Installation

Install the composer package:

composer require erusev/parsedown

Or download the latest release and include Parsedown.php

Example

$Parsedown = new Parsedown();

echo $Parsedown->text('Hello _Parsedown_!'); # prints: <p>Hello <em>Parsedown</em>!</p>

You can also parse inline markdown only:

echo $Parsedown->line('Hello _Parsedown_!'); # prints: Hello <em>Parsedown</em>!

More examples in the wiki and in this video tutorial.

Security

Parsedown is capable of escaping user-input within the HTML that it generates. Additionally Parsedown will apply sanitisation to additional scripting vectors (such as scripting link destinations) that are introduced by the markdown syntax itself.
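The basic escaping step can be illustrated with PHP's built-in htmlspecialchars(). This is a minimal sketch, not Parsedown's internal code, and escapeHtml() and buildLink() are hypothetical helpers. Escaping alone does not neutralise a `javascript:` link destination, which is why the extra sanitisation described above is needed.

```php
<?php
// Minimal sketch (not Parsedown's source): escape untrusted text before it is
// placed in HTML, both as element content and as a quoted attribute value.
function escapeHtml(string $text): string
{
    // ENT_QUOTES escapes both " and ', so the result is safe inside either
    // style of quoted attribute; the charset is stated explicitly.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper that builds <a> markup from untrusted parts.
function buildLink(string $href, string $label): string
{
    return '<a href="' . escapeHtml($href) . '">' . escapeHtml($label) . '</a>';
}

echo buildLink('/search?q=a&b="x"', '<b>bold?</b>'), PHP_EOL;
```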

To tell Parsedown that it is processing untrusted user-input, use the following:

$Parsedown->setSafeMode(true);

If, instead, you wish to allow HTML within untrusted user-input, but still want output to be free from XSS, it is recommended that you make use of an HTML sanitiser that allows HTML tags to be whitelisted, like HTML Purifier.

In both cases you should strongly consider employing defence-in-depth measures, like deploying a Content-Security-Policy (a browser security feature) so that your page is likely to be safe even if an attacker finds a vulnerability in one of the first lines of defence above.

Security of Parsedown Extensions

Safe mode does not necessarily yield safe results when using extensions to Parsedown. Extensions should be evaluated on their own to determine their specific safety against XSS.

Escaping HTML

WARNING: This method isn't safe from XSS!

If you wish to escape HTML in trusted input, you can use the following:

$Parsedown->setMarkupEscaped(true);

Beware that this still allows users to insert unsafe scripting vectors, such as links like [xss](javascript:alert%281%29).
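The kind of link-destination check that safe mode adds can be sketched as a scheme allowlist. This is an illustration under stated assumptions: the allowlist contents and the isSafeLinkDestination() name are hypothetical, and Parsedown's own safe mode may use a different list and different rules.

```php
<?php
// Hypothetical allowlist of URL prefixes; Parsedown's real list may differ.
const SAFE_URL_PREFIXES = ['http://', 'https://', 'ftp://', 'ftps://', 'mailto:'];

function isSafeLinkDestination(string $url): bool
{
    // Browsers ignore surrounding spaces/control characters and drop tabs or
    // newlines inside a URL, so strip them before inspecting the scheme.
    $normalised = preg_replace('/[\x00-\x20]+/', '', $url);

    // No scheme (e.g. "/docs" or "page.html#top"): treat as relative, allow.
    if (!preg_match('/^[a-z][a-z0-9+.\-]*:/i', $normalised)) {
        return true;
    }

    // A scheme is present: accept it only if it is on the allowlist.
    foreach (SAFE_URL_PREFIXES as $prefix) {
        if (stripos($normalised, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

foreach (['https://example.com', '/relative/path', 'javascript:alert%281%29'] as $url) {
    echo $url, ' => ', isSafeLinkDestination($url) ? 'allowed' : 'rejected', PHP_EOL;
}
```

Rejecting unknown schemes by default is the conservative choice: a denylist of `javascript:` alone would miss other scripting schemes.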

Questions

How does Parsedown work?

It tries to read Markdown like a human. First, it looks at the lines. It’s interested in how the lines start. This helps it recognise blocks. It knows, for example, that if a line starts with a - then perhaps it belongs to a list. Once it recognises the blocks, it continues to the content. As it reads, it watches out for special characters. This helps it recognise inline elements (or inlines).

We call this approach "line based". We believe that Parsedown is the first Markdown parser to use it. Since the release of Parsedown, other developers have used the same approach to develop other Markdown parsers in PHP and in other languages.
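The "line based" idea can be sketched as a lookup from a line's first non-space character to the block types that line might start. This is a toy illustration: the marker table and blockCandidates() are hypothetical and much simpler than Parsedown's real block handling.

```php
<?php
// Simplified, hypothetical marker table: first non-space character of a line
// => block types that line could begin.
const BLOCK_CANDIDATES = [
    '#' => ['heading'],
    '>' => ['quote'],
    '-' => ['rule', 'list'],
    '*' => ['rule', 'list'],
    '+' => ['list'],
    '`' => ['fenced code'],
    '~' => ['fenced code'],
];

function blockCandidates(string $line): array
{
    if (trim($line) === '') {
        return ['blank'];
    }
    // Four or more leading spaces mark an indented code block.
    if (strncmp($line, '    ', 4) === 0) {
        return ['indented code'];
    }
    $marker = ltrim($line, ' ')[0];
    // Lines without a known marker are treated as paragraph text.
    return BLOCK_CANDIDATES[$marker] ?? ['paragraph'];
}

$lines = ['# Title', '', '- item', '> quoted', '    $x = 1;', 'Plain text'];
foreach ($lines as $line) {
    printf("%-14s => %s\n", json_encode($line), implode(', ', blockCandidates($line)));
}
```

Only after a block is identified would its content be scanned for inline markers such as `_` or `*`.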

Is it compliant with CommonMark?

It passes most of the CommonMark tests. Most of the tests that don't pass deal with cases that are quite uncommon. Still, as CommonMark matures, compliance should improve.

Who uses it?

Laravel Framework, Bolt CMS, Grav CMS, Herbie CMS, Kirby CMS, October CMS, Pico CMS, Statamic CMS, phpDocumentor, RaspberryPi.org, Symfony Demo and more.

How can I help?

Use it, star it, share it and if you feel generous, donate.

What else should I know?

I also make Nota — a writing app designed for Markdown files :)

Comments

Prevent various XSS attacks [rebase and update of #276]
I've picked up on the work started over at https://github.com/erusev/parsedown/pull/276 and rebased on erusev/master.

Since this is rebased on master, I can't point this PR at naNuke/master without running into the merge conflicts that I've already resolved manually.

I've implemented what I suggested earlier so that all attributes are properly encoded (and not just the specific ones we remember).

I've also added some tests, so @erusev's concern here should hopefully now be resolved, albeit a year later 😉 https://github.com/erusev/parsedown/pull/276#issuecomment-178577968

@malas one reason is the lack of tests

~~One thing to note is that all this can be circumvented if you forget to turn on $Parsedown->setMarkupEscaped(true); (which is off by default) as you could just write a script tag manually for xss (even though the attributes and link destinations will be safe). So let's all remember to enable this setting 😉~~

Attributes are now always escaped properly (this is just about outputting things correctly), but link-based XSS, or XSS from writing plain old script tags, will only be prevented if the new setSafeMode is enabled.

$Parsedown->setSafeMode(true);

Closes #161 Closes #497 Closes #276 Closes #403 Closes #530

The following CVE has been assigned to the vulnerability specific to bypassing ->setMarkupEscaped(true): CVE-2018-1000162.
security
opened by aidantwoods 74

Safe Mode

Parsedown offers no safe option for rendering, so, for those of us who allow no HTML content within markdown text, we must sanitize prior to feeding to Parsedown. As a result, Parsedown double encodes HTML entities (see #50).

Instead, Parsedown should offer a "safe" option.

function text($text, $safe_mode = false) {
    $this->safe_mode = $safe_mode;
    if ($safe_mode) {
        $text = htmlentities($text, ENT_QUOTES, 'UTF-8');
    }
    ...
}

Usage:

$parsedown = new Parsedown();
$text = "<script>alert('unsafe text');</script>";
echo $parsedown->text($text, true);

Output:

&lt;script&gt;alert(&#039;unsafe text&#039;);&lt;/script&gt;
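The double-encoding problem mentioned above (#50) can be reproduced with PHP's own escaping functions. A minimal sketch, independent of Parsedown; the fourth argument of htmlspecialchars() (`double_encode`) is standard PHP.

```php
<?php
// Input that was already escaped once before reaching the Markdown parser.
$preEscaped = htmlspecialchars('a < b', ENT_QUOTES, 'UTF-8'); // "a &lt; b"

// Escaping it a second time turns "&" into "&amp;", so a browser would show
// the literal text "a &lt; b" instead of "a < b".
$twice = htmlspecialchars($preEscaped, ENT_QUOTES, 'UTF-8');

// With double_encode = false, existing entities are left alone.
$once = htmlspecialchars($preEscaped, ENT_QUOTES, 'UTF-8', false);

echo $twice, PHP_EOL, $once, PHP_EOL;
```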

feature request priority

opened by clphillips 44

Extra empty lines not showing up in preview using parsedown and codeigniter

Previewing parsedown not displaying correct newlines in preview


I do not want to have to enter &nbsp; manually in the textarea every time I need extra lines.


Question: How can I get the preview to show the same number of lines as the textarea?

Controller (how I load Parsedown.php):

If I replace \n with &nbsp;, it affects the code indents. I have also tried nl2br().

<?php

class Example extends CI_Controller {

	public function __construct() {
		parent::__construct();
                // application > libraries > Parsedown.php
		$this->load->library('parsedown'); 
	}

	public function index()
	{
		$this->load->view('example_view');
	}

	public function preview() {
		$data['success'] = false;
		$data['output'] = '';

		if ($this->input->post('html')) {
			
			$string = str_replace('\n', '\r\n', $this->input->post('html'));
			
			$data['success'] = true;
			$data['output'] = $this->parsedown->text($string);
			
		}

		echo json_encode($data);
	}
}
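A side note on the controller above: in PHP, `'\n'` inside single quotes is the two characters backslash and `n`, so that str_replace() call never matches a real newline. The sketch below shows the difference. Even with double quotes the conversion may not help, since Parsedown 1.x appears to normalise `\r\n` to `\n` itself, and Markdown treats several blank lines between paragraphs like a single one.

```php
<?php
// Escape sequences such as \n are only interpreted in double-quoted strings.
var_dump(strlen('\n'));  // int(2): backslash + "n"
var_dump(strlen("\n"));  // int(1): a real newline

$input = "line one\n\n\nline two";

// Single quotes: searches for the two characters "\" and "n", so nothing changes.
$unchanged = str_replace('\n', '\r\n', $input);

// Double quotes: real newlines are replaced with CRLF pairs.
$crlf = str_replace("\n", "\r\n", $input);
```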

Script


<script type="text/javascript">
$( document ).ready(function() {
	$('#editor').on('paste cut mouseup mousedown mouseout keydown keyup', function(){
		var postData = {
			'html' : $('#editor').val(),
		};
		$.ajax({
			type: "POST",
			url: "<?php echo base_url('example/preview');?>",
			data: postData,
			dataType: 'json',
			success: function(json){
				$('#output').html(json['output']);
			}
		});
	});
});
</script>

It's one of the best Markdown parsers; this is the only thing I can see that needs fixing.

opened by tasmanwebsolutions 34

Prevent various XSS attacks

Hello, I am back with the safe links issue. As always, this is open to discussion :) but together with the already existing markupEscaped setting, this is basically a fix for issues #161 and #210, and maybe others I'm not aware of.
feature request priority

opened by naNuke 33
Links won't get converted
Hi,

thanks for this great and fast MD parser :) When I used it on this MD file: https://github.com/RexDude/seo42/blob/master/README.md

I get the following PHP warning 6 times (EDIT: due to a README.md update there are now 10), and none of the links get converted.

Warning: preg_match_all(): Compilation failed: internal error: previously-checked referenced subpattern not found at offset 37 in /home/dude/Projekte/Web/AddonFactory/htdocs/redaxo/include/addons/seo42/classes/class.parsedown.inc.php on line 494

Is this something with my MD-File or with the Parser?
opened by ghost 33
Add a CLI mode to allow parsing markdown at the command line
I was writing a lot of tests, and then copy/pasting the HTML from my browser to the appropriate HTML file, which was very time consuming. Then it occurred to me that we could use Parsedown.php as the CLI tool. With this code you can now do:

php Parsedown.php document.md > document.html
feature request
opened by scottchiefbaker 31
Href attribute not being rendered correctly
When parsing HTML inside my Markdown string, I've been running into some issues, most noticeably with the href attributes of those HTML bits.

<a z='/training/schema' href='/training/schema'>Schema</a>

Is rendered as

<a z="/training/schema" href="%7B%7B%20url%20%7D%7D">Schema</a>
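The rendered value is percent-encoded. Decoding it with PHP's rawurldecode() shows that the output href is not the input path at all but the text `{{ url }}`, which looks like a template placeholder. One possible reading, an assumption not confirmed in the report, is that something outside Parsedown (such as a templating layer) produced that value.

```php
<?php
// Percent-encoded href taken from the rendered output in the report.
$rendered = '%7B%7B%20url%20%7D%7D';

// rawurldecode() reverses RFC 3986 percent-encoding (%20 is a space).
$decoded = rawurldecode($rendered);

echo $decoded, PHP_EOL;
```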
bug needs-more-info
opened by jackmcdade 26
Allow to remove the englobing paragraph
I would like to remove the <p> enclosing the HTML output, as it was not specifically asked for.

Just add an optional parameter to parse to remove it:

$parser = new Parsedown();
$output = $parser->parse($input, false); // Without
$output = $parser->parse($input, true);  // With
$output = $parser->parse($input);        // With

Thanks
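For inline-only output, the README's line() method already returns markup without the wrapping paragraph. If full block parsing is still needed, post-processing is another option; the helper below is hypothetical, not part of Parsedown, and only unwraps output that is exactly one paragraph.

```php
<?php
// Hypothetical helper: remove the <p>...</p> pair only when the whole output
// is a single paragraph; anything else is returned unchanged (but trimmed).
function stripSingleParagraph(string $html): string
{
    $html = trim($html);
    // /s lets "." match newlines inside the paragraph.
    if (preg_match('~^<p>(.*)</p>$~s', $html, $m) && strpos($m[1], '<p>') === false) {
        return $m[1];
    }
    return $html;
}

echo stripSingleParagraph("<p>Hello <em>Parsedown</em>!</p>\n"), PHP_EOL;
```

The extra `<p>` check keeps multi-paragraph output intact, where the outer tags belong to different paragraphs.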
feature request
opened by Hypoaristerolactotherapist 25
Add "id" attribute to headings

This is more or less how GitHub Flavored Markdown works.

See an example: https://github.com/henriquemoody/parsedown/blob/titles/test/data/atx_heading.md
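A sketch of how such ids could be derived, assuming simplified GitHub-like rules: lowercase, strip punctuation, turn spaces into hyphens, and add a numeric suffix for duplicates. headingId() is hypothetical; GitHub's exact algorithm may differ in edge cases, and this version lowercases ASCII only.

```php
<?php
// Hypothetical helper: derive an id for a heading. $used tracks slugs already
// handed out, so repeated headings get -1, -2, ... suffixes.
function headingId(string $text, array &$used): string
{
    // strtolower() only lowercases ASCII; mb_strtolower() would be needed
    // for non-ASCII headings.
    $slug = strtolower(trim($text));
    // Keep letters, digits, spaces, hyphens and underscores; drop the rest.
    $slug = preg_replace('/[^\p{L}\p{N} _-]/u', '', $slug);
    $slug = str_replace(' ', '-', $slug);

    if (!isset($used[$slug])) {
        $used[$slug] = 1;
        return $slug;
    }
    // Post-increment: the first duplicate gets -1, the next -2, and so on.
    return $slug . '-' . $used[$slug]++;
}

$used = [];
foreach (['Installation', 'Hello, World!', 'Installation'] as $heading) {
    echo '<h2 id="', headingId($heading, $used), '">', htmlspecialchars($heading), '</h2>', PHP_EOL;
}
```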

opened by henriquemoody 22
Problems with line break

Best regards...

The problem I have is the following: when I create paragraphs with quotes (blockquotes), the next paragraph, which does not belong to the quote, is added to it. Like this:

bug

opened by rdgonzalez2017 21
Formatting already sanitized inputs

Well, I am using this library in my chat app, and the problem I am facing is formatting already-sanitized input.

For example: if the user sends > text, it doesn't change to a blockquote but shows up as &gt; text, where &gt; is the HTML entity code, both in the HTML inspector and in the DB.

Is there any way it could be done to get it to work as expected?
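One approach, assuming the stored text was escaped with htmlspecialchars() on the way in: decode it back to the raw characters before parsing, then let the parser handle escaping, for example with setSafeMode(true) as described in the README. Decoding also restores any HTML the user typed, so skipping the safe-mode step would reopen the XSS hole.

```php
<?php
// What the chat app stored: the user's "> text" after htmlspecialchars().
$stored = htmlspecialchars('> text', ENT_QUOTES, 'UTF-8'); // "&gt; text"

// A Markdown parser only sees a blockquote when the line starts with a
// literal ">", so restore the raw characters before parsing.
$raw = htmlspecialchars_decode($stored, ENT_QUOTES);

// Next step (requires Parsedown, shown as a comment):
// $Parsedown->setSafeMode(true);
// echo $Parsedown->text($raw);
echo $raw, PHP_EOL;
```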

opened by whimsicaldreamer 21

Nested Elements in the same Level

Hi,

Thanks for this great project! I am using it in https://github.com/secure-77/Perlite

I have a question about blockquotes and nested html elements on the same level.

I want to transform this:

> [!bug]
> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur et gravida diam, et varius magna. Proin `id felis quis nisl` gravida auctor a eu est. In viverra dui viverra placerat cursus. Curabitur non commodo mi. Mauris volutpat nisl vitae nulla efficitur condimentum. Nulla facilisi. Maecenas malesuada purus mi, eget fringilla quam ultrices sit amet.

to something like this

	<div data-callout="bug" class="callout">
		<div class="callout-title">
			<div class="callout-title-inner">Bug</div>
		</div>
		<div class="callout-content">
			<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur et gravida diam, et varius magna. Proin <code>id felis quis nisl</code> gravida auctor a eu est. In viverra dui viverra placerat cursus. Curabitur non commodo mi. Mauris volutpat nisl vitae nulla efficitur condimentum. Nulla facilisi. Maecenas malesuada purus mi, eget fringilla quam ultrices sit amet.</p>
		</div>
	</div>

So far, I have extended the blockQuote method to handle callouts:

protected function blockQuote($Line)
{
    if (preg_match('/^>[ ]?(.*)/', $Line['text'], $matches)) {
        $Block = array(
            'element' => array(
                'name' => 'blockquote',
                'handler' => 'lines',
                'text' => (array) $matches[1],
            ),
        );

        // Callout syntax: "> [!type] optional title"
        if (preg_match('/^>\s?\[\!(.*?)\](.*?)$/m', $Line['text'], $matches)) {
            $type = strtolower($matches[1]);
            $title = trim($matches[2]);

            // Fall back to the capitalised type when no title is given
            $calloutTitle = $title ?: ucfirst($type);

            $Block = array(
                'element' => array(
                    'name' => 'div',
                    'attributes' => array(
                        'data-callout' => $type,
                        'class' => 'callout'
                    ),
                    'handler' => 'lines',
                    'text' => (array) $calloutTitle,
                ),
            );
        }

        // Return inside the if: otherwise $Block is undefined for non-quote lines
        return $Block;
    }
}

This is my current output:

<div data-callout="info" class="callout">
<p>Info<br>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur et gravida diam, et varius magna. Proin <code>id felis quis nisl</code> gravida auctor a eu est. In viverra dui viverra placerat cursus. Curabitur non commodo mi. Mauris volutpat nisl vitae nulla efficitur condimentum. Nulla facilisi. Maecenas malesuada purus mi, eget fringilla quam ultrices sit amet.</p>
</div>

But I can't figure out how to create a structure like this:

<div data-callout="bug" class="callout">
		<div class="callout-title">Title</div>
		<div class="callout-content">Text</div>
</div>

I've tried a few things but I don't know how to make this work. Thanks for any advice!
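Parsedown 1.7's element arrays appear to support nesting through `'handler' => 'elements'` with the child elements listed under `'text'` (the 1.7.x table block seems to use this shape; verify against your version). The self-contained sketch below mimics that array shape with a tiny renderer to show how the callout tree could be expressed. renderElement() and calloutElement() are hypothetical, not Parsedown methods.

```php
<?php
// Hypothetical stand-in for Parsedown's element(): renders an element array,
// recursing into 'text' when 'handler' is 'elements', escaping plain text.
function renderElement(array $el): string
{
    $html = '<' . $el['name'];
    foreach ($el['attributes'] ?? [] as $name => $value) {
        $html .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
    }
    $html .= '>';
    if (($el['handler'] ?? null) === 'elements') {
        foreach ($el['text'] as $child) {
            $html .= "\n" . renderElement($child);
        }
        $html .= "\n";
    } else {
        $html .= htmlspecialchars($el['text'], ENT_QUOTES, 'UTF-8');
    }
    return $html . '</' . $el['name'] . '>';
}

// Hypothetical builder for the callout structure from the question.
function calloutElement(string $type, string $title, string $body): array
{
    return [
        'name' => 'div',
        'attributes' => ['data-callout' => $type, 'class' => 'callout'],
        'handler' => 'elements',
        'text' => [
            [
                'name' => 'div',
                'attributes' => ['class' => 'callout-title'],
                'handler' => 'elements',
                'text' => [
                    ['name' => 'div', 'attributes' => ['class' => 'callout-title-inner'], 'text' => $title],
                ],
            ],
            // In a real extension this child would likely use 'handler' => 'lines'
            // so the body is parsed as Markdown; here it is plain text.
            ['name' => 'div', 'attributes' => ['class' => 'callout-content'], 'text' => $body],
        ],
    ];
}

echo renderElement(calloutElement('bug', 'Bug', 'Lorem ipsum')), PHP_EOL;
```

In an actual extension, the continuation method would also need to append later `>` lines to the content child rather than to the outer div.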

opened by secure-77 0

[Bug] Code block split (1.7.4, 2.0.0 Beta 1)

Description

Parsedown splits a code block into multiple parts when given a markdown file like the following:

* Source
```bash
$ ls

# Comment in code
$ ls
```

The issue is present in the latest stable release and the latest public beta.

Expected Behavior

<ul>
<li>Source</li>
</ul>
<pre><code class="language-bash">$ ls

# Comment in code
$ ls
</code></pre>

Actual Behavior

<ul>
<li>Source
<pre><code class="language-bash">$ ls
</code></pre>
</li>
</ul>
<h1>Comment in code</h1>
<p>$ ls</p>
<pre><code></code></pre>

Steps to reproduce

Reproduce with Parsedown 1.7.4

Go to Parsedown Demo
Add markdown example from the description
Click parse

Reproduce with Parsedown 2.0.0 Beta 1

Setup

$ sudo apt install php8.1
$ php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
$ php composer-setup.php
$ php -r "unlink('composer-setup.php');"
# Dependencies
$ sudo apt-get install php8.1-mbstring
$ php ../composer.phar require erusev/parsedown:v2.0.0-beta-1
$ php demo.php

demo.php

<?php

require __DIR__ . '/vendor/autoload.php';

use Erusev\Parsedown\Configurables\Breaks;
use Erusev\Parsedown\Configurables\SafeMode;
use Erusev\Parsedown\Configurables\StrictMode;
use Erusev\Parsedown\State;
use Erusev\Parsedown\Parsedown;


$markdown = <<<EOD
* Source
```bash
$ ls

# Comment in code
$ ls
```
EOD;


$state = new State([
    new Breaks(true),
    new SafeMode(true),
    new StrictMode(false)
]);

$Parsedown = new Parsedown($state);
echo $Parsedown->toHtml($markdown);
?>

bug

opened by Tooa 2

[Bug] Element h1 inside list element when having no newline (1.7.4, 2.0.0 Beta 1)

Description

Parsedown places the h1 element inside the list element when having a markdown file like the following:

* element1
* element2
# Troubleshooting

The issue is present in the latest stable release and the latest public beta. The problem does not occur with Markdown PHP 1.3 featured in the Parsedown Demo, though.

Let me know how I can further assist @erusev @aidantwoods.

Expected Behavior

<ul>
<li>element1</li>
<li>element2</li>
</ul>

<h1>Troubleshooting</h1>

Actual Behavior

<ul>
<li>element1</li>
<li>element2
<h1>Troubleshooting</h1></li>
</ul>

Steps to reproduce

Reproduce with Parsedown 1.7.4

Go to Parsedown Demo
Add markdown example from the description
Click parse


Reproduce with Parsedown 2.0.0 Beta 1

Setup

$ sudo apt install php8.1
$ php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
$ php composer-setup.php
$ php -r "unlink('composer-setup.php');"
# Dependencies
$ sudo apt-get install php8.1-mbstring
$ php ../composer.phar require erusev/parsedown:v2.0.0-beta-1
$ php demo.php

demo.php

<?php

require __DIR__ . '/vendor/autoload.php';

use Erusev\Parsedown\Configurables\Breaks;
use Erusev\Parsedown\Configurables\SafeMode;
use Erusev\Parsedown\Configurables\StrictMode;
use Erusev\Parsedown\State;
use Erusev\Parsedown\Parsedown;


$markdown = <<<EOD
* element1
* element2
# Troubleshooting
EOD;


$state = new State([
    new Breaks(true),
    new SafeMode(true),
    new StrictMode(false)
]);

$Parsedown = new Parsedown($state);
echo $Parsedown->toHtml($markdown);
?>


bug

opened by Tooa 0

Add Option minHeaderLevel

Add a new option to define the minimum header level to use. E.g. with ->minHeaderLevel(3), a header defined with a single hash (# heading) would be parsed as <h3>heading</h3>.
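The level arithmetic behind the proposed option can be sketched as a shift plus a clamp. headingLevel() is hypothetical, and clamping at h6 (HTML's deepest heading) is one design choice; an implementation could instead reject levels that overflow.

```php
<?php
// Hypothetical mapping for a minHeaderLevel option: shift every heading down
// by (min - 1) levels and clamp the result into HTML's h1..h6 range.
function headingLevel(int $hashes, int $minHeaderLevel = 1): int
{
    return max(1, min(6, $hashes + $minHeaderLevel - 1));
}

// "# heading" with minHeaderLevel(3)
printf('<h%1$d>%2$s</h%1$d>' . PHP_EOL, headingLevel(1, 3), 'heading');
```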

opened by nyphis 0

Releases(v2.0.0-beta-1)

v2.0.0-beta-1(May 21, 2022)

This is an initial beta of the planned changes for v2.0.0.

Documentation is still being worked on for general usage. Some initial "extensions focused" documentation is available in: 2.0.x/docs/Migrating-Extensions-v2.0.md.
1.7.4(Dec 30, 2019)

Introduce rawHtml concept from beta 1.8 that extensions may optionally utilise. In 1.8 beta versions this feature is utilised internally and might have compatibility issues with extensions, this release does not use this feature internally so no such issues will be present.
1.8.0-beta-6(Mar 17, 2019)

This is a pre-release.

To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

Any testing, bug-reports, or bug-fixes are very welcome.

This beta increment is a security release and resolves an issue which would allow a user to add arbitrary classes to fenced code blocks. This might have security consequences, see #699 for more detail.
1.7.3(Apr 2, 2019)

1.7.2(Mar 17, 2019)

This is a security release and resolves an issue which would allow a user to add arbitrary classes to fenced code blocks. This might have security consequences, see #699 for more detail.
1.8.0-beta-5(Jun 11, 2018)

This is a pre-release.

To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

Any testing, bug-reports, or bug-fixes are very welcome.

This beta release restores the existence of some previously deleted protected interface endpoints.
1.8.0-beta-4(May 8, 2018)

This is a pre-release.

To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

Any testing, bug-reports, or bug-fixes are very welcome.

Some minor bug-fixes have been resolved since beta-3.
1.8.0-beta-3(May 7, 2018)

This is a pre-release.

To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

Any testing, bug-reports, or bug-fixes are very welcome.

Essentially this is the second beta but I forgot to bump the class version number before tagging, and I'm not a fan of deleting version tags – hence number 3.
1.8.0-beta-1(Apr 8, 2018)

This is a pre-release.

To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

Any testing, bug-reports, or bug-fixes are very welcome.
1.7.1(Mar 8, 2018)

This is a bugfix release. The following have been resolved:

#475: "Loose" lists will now contain paragraphs in all items, not just some.
#433: Links will no longer be double nested.
#525: The info-string when beginning a code block may now contain non-word characters (e.g. c++).
#561: The mbstring extension (which we already depend on) has been added explicitly to composer.json.
#563: The Parsedown::version constant now matches the release version.
#560: Builds will now fail if we forget to update the version constant again 😉

Thanks to @PhrozenByte, @harikt, @erusev, @luizbills, and @aidantwoods for their contributions to this release.
1.7.0(Feb 28, 2018)

1.6.0(Oct 4, 2015)
late static binding for Parsedown::instance()

1.5.0(Jan 19, 2015)

1.4.0(Jan 12, 2015)

1.3.0(Jan 12, 2015)

1.2.0(Jan 10, 2015)

1.1.0(Sep 26, 2014)

1.0.0(May 14, 2014)

1.0.0-rc.1(Apr 18, 2014)

This is a major release. It introduces a more granular class architecture. This improves extensibility and makes the code easier to read. The release also introduces an interface that allows independent parsing of inline elements.

p.s. There's an implementation detail that I'd like to mention. It is about the use of strpbrk. I wanted to mention it, because the idea that strpbrk could replace strpos came from another project - a Parsedown based project by @cebe. I should also mention that it was brought to my attention by @hkdobrev.
0.9.0(Jan 22, 2014)

0.8.0(Dec 26, 2013)
Version 0.8 features a new approach to parsing inline elements. Along with performance, it improves consistency.

To give an example, here are a markdown text and a comparison of the output that it would produce.

*em **strong em*** ***strong em** em* *em **strong em** em*

The parser used by GitHub.com:

em *strong em*** **strong em* em* em *strong em** em*

Parsedown:

em strong em strong em* em* *em strong em* em

Additionally, version 0.8 features an option to enable automatic line breaks.
0.7.0(Nov 21, 2013)

0.6.0(Nov 20, 2013)

0.5.0(Nov 17, 2013)
support for fenced code block

performance improvements

code quality improvements

0.4.0(Nov 4, 2013)
escaping for special characters

performance improvements

0.3.0(Nov 2, 2013)
HTML support

Improved code block parsing
