Better Markdown Parser in PHP

Overview

Parsedown

Better Markdown Parser in PHP - Demo.

Features

  • One File
  • No Dependencies
  • Super Fast
  • Extensible
  • GitHub flavored
  • Tested in 5.3 to 7.3

Installation

Install the composer package:

composer require erusev/parsedown

Or download the latest release and include Parsedown.php

Example

$Parsedown = new Parsedown();

echo $Parsedown->text('Hello _Parsedown_!'); # prints: <p>Hello <em>Parsedown</em>!</p>

You can also parse inline markdown only:

echo $Parsedown->line('Hello _Parsedown_!'); # prints: Hello <em>Parsedown</em>!

More examples in the wiki and in this video tutorial.

Security

Parsedown is capable of escaping user-input within the HTML that it generates. Additionally, Parsedown will apply sanitisation to additional scripting vectors (such as scripting link destinations) that are introduced by the Markdown syntax itself.

To tell Parsedown that it is processing untrusted user-input, use the following:

$Parsedown->setSafeMode(true);

If, instead, you wish to allow HTML within untrusted user-input but still want the output to be free from XSS, it is recommended that you use an HTML sanitiser that allows HTML tags to be whitelisted, like HTML Purifier.

In both cases you should strongly consider employing defence-in-depth measures, like deploying a Content-Security-Policy (a browser security feature) so that your page is likely to be safe even if an attacker finds a vulnerability in one of the first lines of defence above.
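As a rough illustration of the defence-in-depth advice above, the sketch below builds and sends a Content-Security-Policy header from PHP. The helper name buildCspHeader and the chosen directives are assumptions made for this example; they are not part of Parsedown, and a real policy has to be tuned to the page it protects.

```php
<?php
// Hypothetical helper (not part of Parsedown): builds a CSP header value
// from a map of directive name => list of allowed sources.
function buildCspHeader(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return implode('; ', $parts);
}

// Example policy (an assumption, adjust to your site): same-origin
// resources only, no plugins. Because 'unsafe-inline' is not listed,
// browsers enforcing the policy refuse to run injected inline scripts.
$policy = buildCspHeader([
    'default-src' => ["'self'"],
    'script-src'  => ["'self'"],
    'object-src'  => ["'none'"],
]);

// Headers can only be sent in a web context and before any output.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Security-Policy: ' . $policy);
}
```

The policy is a second line of defence only; it does not replace safe mode or an HTML sanitiser.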

Security of Parsedown Extensions

Safe mode does not necessarily yield safe results when using extensions to Parsedown. Extensions should be evaluated on their own to determine their specific safety against XSS.

Escaping HTML

WARNING: This method isn't safe from XSS!

If you wish to escape HTML in trusted input, you can use the following:

$Parsedown->setMarkupEscaped(true);

Beware that this still allows users to insert unsafe scripting vectors, such as links like [xss](javascript:alert%281%29).
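To make the warning above concrete, here is a minimal sketch of the kind of link-destination check that escaping markup alone does not provide. It is not Parsedown's code: the function name neutraliseUnsafeUrl and the list of allowed schemes are assumptions chosen for illustration.

```php
<?php
// Illustrative only: NOT Parsedown's actual implementation. The function
// name and the default scheme list are assumptions made for this sketch.
function neutraliseUnsafeUrl(
    string $url,
    array $allowedSchemes = ['http://', 'https://', 'mailto:']
): string {
    foreach ($allowedSchemes as $scheme) {
        // Case-insensitive "starts with" check against each allowed scheme.
        if (strncasecmp($url, $scheme, strlen($scheme)) === 0) {
            return $url;
        }
    }
    // Any other URL has its colons percent-encoded, so a browser no
    // longer sees a scheme such as "javascript:" at the start.
    return str_replace(':', '%3A', $url);
}

$safe   = neutraliseUnsafeUrl('https://example.com/');
$unsafe = neutraliseUnsafeUrl('javascript:alert%281%29');
```

Under these assumptions $safe is returned unchanged, while the javascript: destination from the example above loses its scheme. The result would still need HTML-escaping before being written into an attribute.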

Questions

How does Parsedown work?

It tries to read Markdown like a human. First, it looks at the lines. It’s interested in how the lines start. This helps it recognise blocks. It knows, for example, that if a line starts with a - then perhaps it belongs to a list. Once it recognises the blocks, it continues to the content. As it reads, it watches out for special characters. This helps it recognise inline elements (or inlines).

We call this approach "line based". We believe that Parsedown is the first Markdown parser to use it. Since the release of Parsedown, other developers have used the same approach to develop other Markdown parsers in PHP and in other languages.
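The "line based" idea described above can be pictured with a toy sketch. This is not Parsedown's implementation; classifyLines and the block names it returns are hypothetical, and a real parser would go on to resolve ambiguous cases (a leading - might also start a rule or a setext heading underline).

```php
<?php
// A toy illustration of the "line based" idea; not Parsedown's code.
// classifyLines() and the block names are hypothetical.
function classifyLines(string $text): array
{
    $types = [];
    foreach (explode("\n", $text) as $line) {
        $trimmed = ltrim($line);
        if ($trimmed === '') {
            $types[] = 'blank';
            continue;
        }
        // The first character hints at which block the line may belong to.
        switch ($trimmed[0]) {
            case '#':
                $types[] = 'heading';
                break;
            case '>':
                $types[] = 'quote';
                break;
            case '-':
                $types[] = 'list';
                break;
            default:
                $types[] = 'paragraph';
        }
    }
    return $types;
}

$types = classifyLines("# Title\n\n- one\n- two\n> quoted\nplain text");
// $types is ['heading', 'blank', 'list', 'list', 'quote', 'paragraph']
```

Only once the blocks are known would a second pass scan their content for special characters to find the inline elements.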

Is it compliant with CommonMark?

It passes most of the CommonMark tests. Most of the tests that don't pass deal with cases that are quite uncommon. Still, as CommonMark matures, compliance should improve.

Who uses it?

Laravel Framework, Bolt CMS, Grav CMS, Herbie CMS, Kirby CMS, October CMS, Pico CMS, Statamic CMS, phpDocumentor, RaspberryPi.org, Symfony Demo and more.

How can I help?

Use it, star it, share it and if you feel generous, donate.

What else should I know?

I also make Nota — a writing app designed for Markdown files :)

Issues
  • Prevent various XSS attacks [rebase and update of #276]

    I've picked up on the work started over at https://github.com/erusev/parsedown/pull/276 and rebased on erusev/master.

    Since this is rebased on master, I can't point a PR at naNuke/master without running into the merge conflicts that I've already resolved manually.

    I've implemented what I suggested earlier so that all attributes are properly encoded (and not just the specific ones we remember).

    I've also added some tests, so @erusev's concern here should hopefully now be resolved, albeit a year later 😉 https://github.com/erusev/parsedown/pull/276#issuecomment-178577968

    @malas one reason is the lack of tests

    ~~One thing to note is that all this can be circumvented if you forget to turn on $Parsedown->setMarkupEscaped(true); (which is off by default) as you could just write a script tag manually for xss (even though the attributes and link destinations will be safe). So let's all remember to enable this setting 😉~~

    Attributes are now always escaped properly (this speaks to just outputting things correctly), but link-based XSS, or XSS from writing plain old script tags, will be prevented only if the new setSafeMode is enabled.

    $Parsedown->setSafeMode(true);
    

    Closes #161 Closes #497 Closes #276 Closes #403 Closes #530


    The following CVE has been assigned to the vulnerability specific to bypassing ->setMarkupEscaped(true): CVE-2018-1000162.

    security 
    opened by aidantwoods 74
  • Safe Mode

    Parsedown offers no safe option for rendering, so, for those of us who allow no HTML content within markdown text, we must sanitize prior to feeding to Parsedown. As a result, Parsedown double encodes HTML entities (see #50).

    Instead, Parsedown should offer a "safe" option.

    function text($text, $safe_mode = false) {
        $this->safe_mode = $safe_mode;
        if ($safe_mode) {
            $text = htmlentities($text, ENT_QUOTES, 'UTF-8');
        }
        ...
    }
    

    Usage:

    $parsedown = new Parsedown();
    $text = "<script>alert('unsafe text');</script>";
    echo $parsedown->text($text, true);
    

    Output:

    &lt;script&gt;alert(&#039;unsafe text&#039;);&lt;/script&gt;
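
    The double encoding mentioned in this issue can be reproduced with PHP built-ins alone. The sketch below does not involve Parsedown; the sample string is an assumption, and it only shows why escaping text that is already escaped produces doubled entities, and how the double_encode argument of htmlspecialchars avoids that.

```php
<?php
// Text that was already escaped once before being handed to a parser.
$alreadyEscaped = '&lt;b&gt; Tom &amp; Jerry';

// Escaping again turns every "&" into "&amp;", i.e. double encoding.
$doubled = htmlspecialchars($alreadyEscaped, ENT_QUOTES, 'UTF-8');

// With the fourth argument (double_encode) set to false, entities that
// already exist in the input are left alone.
$once = htmlspecialchars($alreadyEscaped, ENT_QUOTES, 'UTF-8', false);
```

    Here $doubled contains &amp;lt; and &amp;amp;, while $once is identical to the input.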
    
    feature request priority 
    opened by clphillips 44
  • Extra empty lines not showing up in preview using parsedown and codeigniter

    The Parsedown preview does not show the same number of blank lines as the textarea.

    The extra lines only appear when <br/> is entered manually in the textarea. I do not want to have to type <br/> every time I need extra lines.

    Question: How can I get the preview to show the same number of lines as the textarea?

    Controller (how I load Parsedown.php):

    If I replace \n with <br> it affects the code indents. I have also tried nl2br().

    <?php
    
    class Example extends CI_Controller {
    
    	public function __construct() {
    		parent::__construct();
                    // application > libraries > Parsedown.php
    		$this->load->library('parsedown'); 
    	}
    
    	public function index()
    	{
    		$this->load->view('example_view');
    	}
    
    	public function preview() {
    		$data['success'] = false;
    		$data['output'] = '';
    
    		if ($this->input->post('html')) {
    			
    			$string = str_replace('\n', '\r\n', $this->input->post('html'));
    			
    			$data['success'] = true;
    			$data['output'] = $this->parsedown->text($string);
    			
    		}
    
    		echo json_encode($data);
    	}
    }
    

    Script

    
    <script type="text/javascript">
    $( document ).ready(function() {
    	$('#editor').on('paste cut mouseup mousedown mouseout keydown keyup', function(){
    		var postData = {
    			'html' : $('#editor').val(),
    		};
    		$.ajax({
    			type: "POST",
    			url: "<?php echo base_url('example/preview');?>",
    			data: postData,
    			dataType: 'json',
    			success: function(json){
    				$('#output').html(json['output']);
    			}
    		});
    	});
    });
    </script>
    

    It's one of the best Markdown parsers; this is the only thing I can see that needs fixing.

    opened by tasmanwebsolutions 34
  • Links won't get converted

    Hi,

    thanks for this great and fast MD parser :) When I used it on this MD file: https://github.com/RexDude/seo42/blob/master/README.md

    I get the following PHP warning 6 times (EDIT: due to a README.md update there are now 10), and none of the links get converted.

    Warning: preg_match_all(): Compilation failed: internal error: previously-checked referenced subpattern not found at offset 37 in /home/dude/Projekte/Web/AddonFactory/htdocs/redaxo/include/addons/seo42/classes/class.parsedown.inc.php on line 494
    

    Is this something with my MD-File or with the Parser?

    opened by ghost 33
  • Prevent various XSS attacks

    Hello, I am back with the safe links issue. As always, this is open to discussion :) but together with the already existing markupEscaped setting, this is basically a fix for issues #161 and #210, and maybe others I'm not aware of.

    feature request priority 
    opened by naNuke 33
  • Add a CLI mode to allow parsing markdown at the command line

    I was writing a lot of tests, and then copy/pasting the HTML from my browser to the appropriate HTML file, which was very time consuming. Then it occurred to me that we could use Parsedown.php as the CLI tool. With this code you can now do:

    php Parsedown.php document.md > document.html
    
    feature request 
    opened by scottchiefbaker 31
  • better approach to extending markdown

    this approach is totally based on callbacks and more consistent.

    obsoletes #70 and #73

    feature request 
    opened by cebe 28
  • Href attribute not being rendered correctly

    When parsing HTML inside my Markdown string, I've been coming across some issues. Most noticeably any href attributes of those HTML bits.

    <a z='/training/schema' href='/training/schema'>Schema</a>
    

    Is rendered as

    <a z="/training/schema" href="%7B%7B%20url%20%7D%7D">Schema</a>
    
    bug needs-more-info 
    opened by jackmcdade 26
  • Allow to remove the englobing paragraph

    I would like to remove the <p></p> enclosing the HTML output, as it has not specifically been asked for.

    Just add an optional parameter to parse that removes it:

    $parser = new Parsedown();
    $output = $parser->parse($input,false);//Without <p></p>
    $output = $parser->parse($input,true);//With <p></p>
    $output = $parser->parse($input);//With <p></p>
    

    Thanks

    feature request 
    opened by Hypoaristerolactotherapist 25
  • Add "id" attribute to headings

    This is more or less how GitHub Flavored Markdown works.

    See an example: https://github.com/henriquemoody/parsedown/blob/titles/test/data/atx_heading.md
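
    As a rough idea of what deriving an id from heading text could look like, here is a small sketch. It is neither the code from this pull request nor GitHub's actual algorithm; the function name headingId and the ASCII-only rules are assumptions.

```php
<?php
// Hypothetical helper; neither Parsedown's nor GitHub's actual code.
function headingId(string $heading): string
{
    $id = strtolower(trim($heading));
    // Drop everything except ASCII lowercase letters, digits, spaces
    // and hyphens (an assumption: non-ASCII text is simply removed).
    $id = preg_replace('/[^a-z0-9 -]/', '', $id);
    // Each remaining space becomes a hyphen.
    return str_replace(' ', '-', $id);
}

$id = headingId('How does Parsedown work?');
// $id is 'how-does-parsedown-work'
```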

    opened by henriquemoody 21
  • How to implement a <figure> block

    Hi, I wish to have an HTML5 <figure> block with Markdown, for example,

     <figure>
      <img src="pic_trulli.jpg" alt="Trulli" style="width:100%">
      <figcaption>Fig.1 - Trulli, Puglia, Italy.</figcaption>
    </figure> 
    

    Is there an implementation for this? If not, please provide some help.

    opened by nvayalil 4
  • Updated version of supported PHP to 7.4

    Travis runs tests up to 7.4.

    opened by grogy 0
  • Convert <img> into <picture>

    Hi,

    I'm looking at automatically converting <img> tags into <picture> tags. For example, converting this

    ![foo](image.jpg)
    

    into this

    <picture>
      <source type="image/webp" srcset="image.webp">
      <img src="image.jpg">
    </picture>
    

    I've reached this point:

    protected function inlineImage($excerpt)
    {
      $image = parent::inlineImage($excerpt);
    
      $image['element']['name'] = 'picture';
    
      return $image;
    }
    

    And the result is

    <picture src="image.jpg" alt="foo"></picture>
    

    But now I'm stuck. How do I add children elements into <picture>? 🤔

    opened by CyrilMazur 0
  • Documentation of returned array

    I'm struggling to understand how to create my own extension to Parsedown; more specifically, what to return from the functions registered with addInlineType and addBlockType.

    Is there any documentation which explains supported and expected keys and values of returned array?

    opened by SirPL 3
  • Support for{:target="_blank"} in links

    Hi, is it possible to implement the target attribute in links? For example:

    zavy.im{:target="_blank"}

    https://stackoverflow.com/a/4705645

    opened by Zavy86 0
  • Multiple blockquotes are getting merged

    > Blockquote 1
    
    > Blockquote 2
    

    Results in:

    <blockquote>
    <p>Blockquote 1</p>
    <p>Blockquote 2</p>
    </blockquote>
    

    Instead of:

    <blockquote>
    <p>Blockquote 1</p>
    </blockquote>
    
    <blockquote>
    <p>Blockquote 2</p>
    </blockquote>
    
    opened by royduin 1
  • Main class

    How to set main class for ALL generated tags? (I need to add css but it needs .markdown-element class)

    opened by MincoMK 1
  • Add superscript shorthand

    Hi!

    This is a tentative feature request / pull request to add the shorthand for superscript text.

    Cheers!

    opened by ZoeB 0
  • How hard would it be to support asciidoc?

    How hard would it be to support Asciidoc?

    Since the two are similar, I am wondering if it makes sense to use this as a starting point. Maybe as a new project based on this one, or as part of Parsedown itself.

    opened by tuaris 0
  • Links with IPv6 addresses cannot be handled correctly

    Consider the following text:

    Link with IPv6 address:
    https://[2408:400a:d:a300::1]/
    
    Link with IPv6 address and port:
    https://[2408:400a:d:a300::1]:443/
    

    The parsing result on https://parsedown.org/demo is:

    <p>
      Link with IPv6 address:
      <a>https://[2408:400a:d:a300::1</a>]/
    </p>
    <p>
      Link with IPv6 address and port:
      <a href="https://[2408:400a:d:a300::1]/">https://[2408:400a:d:a300::1]:443/</a>
    </p>
    

    It should be:

    <p>
      Link with IPv6 address:
      <a>https://[2408:400a:d:a300::1]/</a>
    </p>
    <p>
      Link with IPv6 address and port:
      <a href="https://[2408:400a:d:a300::1]/">https://[2408:400a:d:a300::1]:443/</a>
    </p>
    
    opened by YihaoPeng 0
Releases(1.7.4)
  • 1.7.4(Dec 30, 2019)

    Introduce the rawHtml concept from the 1.8 beta, which extensions may optionally utilise. In the 1.8 beta versions this feature is utilised internally and might have compatibility issues with extensions; this release does not use the feature internally, so no such issues will be present.

  • 1.7.2(Mar 17, 2019)

    This is a security release and resolves an issue which would allow a user to add arbitrary classes to fenced code blocks. This might have security consequences, see #699 for more detail.

  • 1.8.0-beta-6(Mar 17, 2019)

    This is a pre-release.

    To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

    Any testing, bug-reports, or bug-fixes are very welcome.


    This beta increment is a security release and resolves an issue which would allow a user to add arbitrary classes to fenced code blocks. This might have security consequences, see #699 for more detail.

  • 1.8.0-beta-5(Jun 11, 2018)

    This is a pre-release.

    To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

    Any testing, bug-reports, or bug-fixes are very welcome.


    This beta release restores the existence of some previously deleted protected interface endpoints.

  • 1.8.0-beta-4(May 8, 2018)

    This is a pre-release.

    To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

    Any testing, bug-reports, or bug-fixes are very welcome.


    Some minor bug-fixes have been resolved since beta-3.

  • 1.8.0-beta-3(May 7, 2018)

    This is a pre-release.

    To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

    Any testing, bug-reports, or bug-fixes are very welcome.


    Essentially this is the second beta but I forgot to bump the class version number before tagging, and I'm not a fan of deleting version tags – hence number 3.

  • 1.8.0-beta-1(Apr 7, 2018)

    This is a pre-release.

    To see what's changed from 1.7.1, please refer to the draft release notes in https://github.com/erusev/parsedown/issues/601.

    Any testing, bug-reports, or bug-fixes are very welcome.

  • 1.7.1(Mar 7, 2018)

    This is a bugfix release. The following have been resolved:

    • #475: "Loose" lists will now contain paragraphs in all items, not just some.
    • #433: Links will no longer be double nested.
    • #525: The info-string when beginning a code block may now contain non-word characters (e.g. c++).
    • #561: The mbstring extension (which we already depend on) has been added explicitly to composer.json.
    • #563: The Parsedown::version constant now matches the release version.
    • #560: Builds will now fail if we forget to update the version constant again 😉

    Thanks to @PhrozenByte, @harikt, @erusev, @luizbills, and @aidantwoods for their contributions to this release.

  • 1.6.0(Oct 4, 2015)

  • 1.0.0-rc.1(Apr 18, 2014)

    This is a major release. It introduces a more granular class architecture. This improves extensibility and makes the code easier to read. The release also introduces an interface that allows independent parsing of inline elements.

    p.s. There's an implementation detail that I'd like to mention. It is about the use of strpbrk. I wanted to mention it, because the idea that strpbrk could replace strpos came from another project - a Parsedown based project by @cebe. I should also mention that it was brought to my attention by @hkdobrev.

  • 0.8.0(Dec 26, 2013)

    Version 0.8 features a new approach to parsing inline elements. Along with performance, it improves consistency.

    To give an example, here is a Markdown text and a comparison of the output that it would produce.

    *em **strong em***
    ***strong em** em*
    *em **strong em** em*
    

    The parser used by GitHub.com:

    <em>em *</em>strong em***
    <em>**strong em</em>* em*
    <em>em *</em>strong em** em*
    

    Parsedown:

    <p><em>em <strong>strong em</strong></em>
    <strong><em>strong em</em>* em*
    *em </strong>strong em*<em> em</em></p>
    

    Additionally, version 0.8 features an option to enable automatic line breaks.

  • 0.5.0(Nov 17, 2013)

  • 0.4.0(Nov 4, 2013)

  • 0.3.0(Nov 2, 2013)

Owner
Emanuil Rusev
designer / developer / minimalist / maker of @notaapp
Convert HTML to Markdown with PHP

HTML To Markdown for PHP Library which converts HTML to Markdown for your sanity and convenience. Requires: PHP 7.2+ Lead Developer: @colinodell Origi

The League of Extraordinary Packages 1.3k Jun 16, 2021
Highly-extensible PHP Markdown parser which fully supports the CommonMark and GFM specs.

league/commonmark league/commonmark is a highly-extensible PHP Markdown parser created by Colin O'Dell which supports the full CommonMark spec and Git

The League of Extraordinary Packages 1.9k Jun 13, 2021
Better Markdown Parser in PHP

Parsedown Better Markdown Parser in PHP - Demo. Features One File No Dependencies Super Fast Extensible GitHub flavored Tested in 5.3 to 7.3 Markdown

Emanuil Rusev 13.7k Jun 17, 2021
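A clearly structured, easy-to-maintain, modern PHP Markdown parser

Why write a parser like this? Markdown has been around for many years, and many websites large and small, in China and abroad, use it, yet its parsers remain a mess. SegmentFault is one of the larger sites in China that uses Markdown syntax, and we have been using a number of open-source libraries, including but not limited to php-markdown CommonMark for PH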

SegmentFault 思否 1.2k Jun 12, 2021