Convert HTML to Markdown with PHP

Last update: Aug 11, 2022

HTML To Markdown for PHP


Library which converts HTML to Markdown for your sanity and convenience.

Requires: PHP 7.2+

Lead Developer: @colinodell

Original Author: @nickcernis

Why convert HTML to Markdown?

"What alchemy is this?" you mutter. "I can see why you'd convert Markdown to HTML," you continue, already labouring the question somewhat, "but why go the other way?"

Typically you would convert HTML to Markdown if:

  1. You have an existing HTML document that needs to be edited by people with good taste.
  2. You want to store new content in HTML format but edit it as Markdown.
  3. You want to convert HTML email to plain text email.
  4. You know a guy who's been converting HTML to Markdown for years, and now he can speak Elvish. You'd quite like to be able to speak Elvish.
  5. You just really like Markdown.

How to use it

Require the library by issuing this command:

composer require league/html-to-markdown

Add require 'vendor/autoload.php'; to the top of your script.

Next, create a new HtmlConverter instance, passing in your valid HTML code to its convert() function:

use League\HTMLToMarkdown\HtmlConverter;

$converter = new HtmlConverter();

$html = "<h3>Quick, to the Batpoles!</h3>";
$markdown = $converter->convert($html);

The $markdown variable now contains the Markdown version of your HTML as a string:

echo $markdown; // ==> ### Quick, to the Batpoles!

The included demo directory contains an HTML->Markdown conversion form to try out.

Conversion options

By default, HTML To Markdown preserves HTML tags without Markdown equivalents, like <span> and <div>.

To strip HTML tags that don't have a Markdown equivalent while preserving the content inside them, set strip_tags to true, like this:

$converter = new HtmlConverter(array('strip_tags' => true));

$html = '<span>Turnips!</span>';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!"

Or more explicitly, like this:

$converter = new HtmlConverter();
$converter->getConfig()->setOption('strip_tags', true);

$html = '<span>Turnips!</span>';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!"

Note that only the tags themselves are stripped, not the content they hold.

To strip tags and their content, pass a space-separated list of tags in remove_nodes, like this:

$converter = new HtmlConverter(array('remove_nodes' => 'span div'));

$html = '<span>Turnips!</span><div>Monkeys!</div>';
$markdown = $converter->convert($html); // $markdown now contains ""

By default, all comments are stripped from the content. To preserve them, use the preserve_comments option, like this:

$converter = new HtmlConverter(array('preserve_comments' => true));

$html = '<span>Turnips!</span><!-- Monkeys! -->';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!<!-- Monkeys! -->"

To preserve only specific comments, set preserve_comments with an array of strings, like this:

$converter = new HtmlConverter(array('preserve_comments' => array('Eggs!')));

$html = '<span>Turnips!</span><!-- Monkeys! --><!-- Eggs! -->';
$markdown = $converter->convert($html); // $markdown now contains "Turnips!<!-- Eggs! -->"

By default, placeholder links are preserved. To strip the placeholder links, use the strip_placeholder_links option, like this:

$converter = new HtmlConverter(array('strip_placeholder_links' => true));

$html = '<a>Github</a>';
$markdown = $converter->convert($html); // $markdown now contains "Github"

Style options

By default, bold tags are converted using the asterisk syntax (**bold**), and italic tags are converted using the underscore syntax (_italic_). Change these with the bold_style and italic_style options.

$converter = new HtmlConverter();
$converter->getConfig()->setOption('italic_style', '*');
$converter->getConfig()->setOption('bold_style', '__');

$html = '<em>Italic</em> and a <strong>bold</strong>';
$markdown = $converter->convert($html); // $markdown now contains "*Italic* and a __bold__"

Line break options

By default, br tags are converted to two spaces followed by a newline character as per traditional Markdown. Set hard_break to true to omit the two spaces, as per GitHub Flavored Markdown (GFM).

$converter = new HtmlConverter();
$html = '<p>test<br>line break</p>';

$converter->getConfig()->setOption('hard_break', true);
$markdown = $converter->convert($html); // $markdown now contains "test\nline break"

$converter->getConfig()->setOption('hard_break', false); // default
$markdown = $converter->convert($html); // $markdown now contains "test  \nline break"

Autolinking options

By default, a tags are converted to the easiest possible link syntax, i.e. if no text or title is available, then the <url> syntax will be used rather than the full [url](url) syntax. Set use_autolinks to false to change this behavior to always use the full link syntax.

$converter = new HtmlConverter();
$html = '<p><a href="https://thephpleague.com">https://thephpleague.com</a></p>';

$converter->getConfig()->setOption('use_autolinks', true); // default
$markdown = $converter->convert($html); // $markdown now contains "<https://thephpleague.com>"

$converter->getConfig()->setOption('use_autolinks', false);
$markdown = $converter->convert($html); // $markdown now contains "[https://thephpleague.com](https://thephpleague.com)"

Passing custom Environment object

You can pass your own Environment object to customize, for example, which converters are used.

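use League\HTMLToMarkdown\Converter\HeaderConverter;
use League\HTMLToMarkdown\Environment;
use League\HTMLToMarkdown\HtmlConverter;

$environment = new Environment(array(
    // your configuration here
));
$environment->addConverter(new HeaderConverter()); // optionally - add converter manually

$converter = new HtmlConverter($environment);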

$html = '<h3>Header</h3>
<img src="" />
';
$markdown = $converter->convert($html); // $markdown now contains "### Header" and "<img src="" />"

Table support

Support for Markdown tables is not enabled by default because it is not part of the original Markdown syntax. To use tables add the converter explicitly:

use League\HTMLToMarkdown\HtmlConverter;
use League\HTMLToMarkdown\Converter\TableConverter;

$converter = new HtmlConverter();
$converter->getEnvironment()->addConverter(new TableConverter());

$html = "<table><tr><th>A</th></tr><tr><td>a</td></tr></table>";
$markdown = $converter->convert($html);

Limitations

  • Markdown Extra, MultiMarkdown and other variants aren't supported – just Markdown.

Style notes

  • Setext (underlined) headers are the default for H1 and H2. If you prefer the ATX style for H1 and H2 (# Header 1 and ## Header 2), set header_style to 'atx' in the options array when you instantiate the object:

    $converter = new HtmlConverter(array('header_style'=>'atx'));

    Headers of H3 priority and lower always use atx style.

  • Links and images are referenced inline. Footnote references (where image src and anchor href attributes are listed in the footnotes) are not used.

  • Blockquotes aren't line wrapped – it makes the converted Markdown easier to edit.
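The header rule above can be illustrated with a small standalone sketch. The formatHeader() function is a hypothetical helper written for this example, not the library's code, and it assumes the Setext underline is as long as the title (measured in bytes here). It only shows what the two styles look like and why H3 and lower always end up in ATX style: Setext has no underline character for them.

```php
<?php

/**
 * Hypothetical helper (not part of html-to-markdown): formats a header
 * in the Setext or ATX style described in the style notes.
 */
function formatHeader(string $title, int $level, string $style = 'setext'): string
{
    // Setext only defines underlines for H1 (=) and H2 (-).
    if ($style === 'setext' && $level <= 2) {
        $underline = $level === 1 ? '=' : '-';

        // Assumption: the underline matches the title length.
        return $title . "\n" . str_repeat($underline, strlen($title));
    }

    // Everything else falls back to ATX: one '#' per level.
    return str_repeat('#', $level) . ' ' . $title;
}

echo formatHeader('Header 1', 1), "\n\n";        // Setext H1
echo formatHeader('Header 1', 1, 'atx'), "\n\n"; // ATX H1
echo formatHeader('Header 3', 3), "\n";          // always ATX
```
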

Dependencies

HTML To Markdown requires PHP's xml, libxml, and dom extensions, all of which are enabled by default on most distributions.

Errors such as "Fatal error: Class 'DOMDocument' not found" on distributions such as CentOS that disable PHP's xml extension can be resolved by installing php-xml.
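Before installing, it may help to check which of these extensions are actually loaded. The following is a minimal sketch that uses only PHP's built-in extension_loaded() function. The extension names are the ones PHP itself reports, and the $required and $missing variable names are just for illustration.

```php
<?php

// Extension names as PHP reports them.
$required = array('xml', 'libxml', 'dom');

// Collect every required extension that is not loaded.
$missing = array_values(array_filter($required, function ($extension) {
    return !extension_loaded($extension);
}));

if ($missing === array()) {
    echo "All required extensions are loaded.\n";
} else {
    echo 'Missing extensions: ' . implode(', ', $missing) . "\n";
}
```

Running the same script through the web server as well as the command line is worthwhile, since the two can load different PHP configurations.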

Contributors

Many thanks to all contributors so far. Further improvements and feature suggestions are very welcome.

How it works

HTML To Markdown creates a DOMDocument from the supplied HTML, walks through the tree, and converts each node to a text node containing the equivalent Markdown, starting from the most deeply nested node and working outwards towards the root node.
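The idea can be sketched with PHP's built-in DOM classes alone. This is a simplified illustration, not the library's implementation: the convertInPlace() function is hypothetical, it handles only em, strong, and h3, and it leaves every other element untouched. Children are converted before their parent, so by the time an element is processed its content is already plain Markdown text.

```php
<?php

/**
 * Hypothetical sketch: replaces supported elements with text nodes holding
 * their Markdown, deepest nodes first.
 */
function convertInPlace(DOMNode $node): void
{
    // Copy the child list first, because replacing nodes mutates the live list.
    foreach (iterator_to_array($node->childNodes) as $child) {
        convertInPlace($child);
    }

    if (!$node instanceof DOMElement) {
        return;
    }

    // All supported descendants are text nodes by now.
    $inner = $node->textContent;

    switch ($node->nodeName) {
        case 'em':
            $markdown = '_' . $inner . '_';
            break;
        case 'strong':
            $markdown = '**' . $inner . '**';
            break;
        case 'h3':
            $markdown = '### ' . $inner . "\n\n";
            break;
        default:
            return; // unsupported element (html, body, ...): keep as-is
    }

    $node->parentNode->replaceChild(
        $node->ownerDocument->createTextNode($markdown),
        $node
    );
}

$document = new DOMDocument();
$document->loadHTML('<h3>Quick, to the <em>Batpoles</em>!</h3>');

convertInPlace($document->documentElement);

$result = trim($document->documentElement->textContent);
echo $result, "\n"; // prints: ### Quick, to the _Batpoles_!
```
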

To-do

  • Support for nested lists and lists inside blockquotes.
  • Offer an option to preserve tags as HTML if they contain attributes that can't be represented with Markdown (e.g. style).

Trying to convert Markdown to HTML?

Use one of these great libraries:

No guarantees about the Elvish, though.

GitHub

https://github.com/thephpleague/html-to-markdown
Comments
  • 1. html-to-markdown Not Working

    I installed html-to-markdown following the instructions you provided, but my Laravel project throws the error below. Do you mean that the contents of the src folder in your GitHub repository should be copied into the Laravel project and used from there? The way I am currently doing it doesn't work.

    Below is a picture of my error: image

    Code: image

    Reviewed by Timmy5818 at 2022-08-10 09:05
  • 2. Escape characters incorrectly added in front of valid markdown bullets.

    Version(s) affected

    5.1

    Description

    Given a string that contains a combination of HTML line breaks and Markdown bullets, when the HTML is converted, the bullets are escaped. For example:

    String

    "List of stuff:<br />- List item one<br />- List <a href="http://foo.com" target="_blank" rel="noreferrer noopener">item</a> two<br />* List item [three] with braces"
    

    Expected Result

    List of stuff:
    - List item one
    - List [item](http://foo.com) two
    * List item [three] with braces
    

    Actual Result

    List of stuff:
    \- List item one
    \- List [item](http://foo.com) two
    \* List item \[three\] with braces
    

    How to reproduce

    See description.

    Reviewed by deetergp at 2022-03-24 21:07
  • 3. Line breaks inside tag

    Version(s) affected

    5.0.2

    Description

    Line breaks inside tags produce incorrect markdown

    How to reproduce

    HTML:

    <b>Hello<br><br>World</b>
    

    Output:

    **Hello  
      
    world**
    

    Expected output:

    **Hello**
      
    **world**
    
    Reviewed by multiwebinc at 2021-11-18 23:03
  • 4.

    Version(s) affected

    5.0.2

    Description

    How to reproduce

    html

    <pre class="language-"><code>GET /announcements
     </code></pre>
    

    after convert

    ```
    <pre class="language-">```
    GET /announcements
    
    ```
    ```
    
    Reviewed by kbitlive at 2021-11-17 15:25