Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Href attribute not being rendered correctly #266

Closed
jackmcdade opened this issue Jan 15, 2015 · 26 comments
Closed

Href attribute not being rendered correctly #266

jackmcdade opened this issue Jan 15, 2015 · 26 comments

Comments

@jackmcdade
Copy link

When parsing HTML inside my Markdown string, I've been coming across some issues. Most noticeably any href attributes of those HTML bits.

<a z='/training/schema' href='/training/schema'>Schema</a>

Is rendered as

<a z="/training/schema" href="%7B%7B%20url%20%7D%7D">Schema</a>
@vladdu
Copy link

vladdu commented Jan 15, 2015

I think you mean that
<a z='{{url}}' href='{{url}}'>Schema</a>
is rendered as
<a z='{{url}}' href='%7B%7Burl%7D%7D'>Schema</a>

@erusev
Copy link
Owner

erusev commented Jan 15, 2015

I don't seem to be able to reproduce it. Do you reproduce it in the demo?

@vladdu
Copy link

vladdu commented Jan 15, 2015

In the demo it looks ok. @jackmcdade should know more details about how he processes the input before feeding it to the parser.

@jackmcdade
Copy link
Author

Ah i see what it is. We're parsing Markdown before our template language does variable replacement, so it's selectively encoding/stripping/altering the {{ curly }} braces of certain attributes.

@erusev
Copy link
Owner

erusev commented Jan 15, 2015

@jackmcdade

Do you think that this is related to an issue with Parsedown?

@jackmcdade
Copy link
Author

Yeah definitely. We just upgraded Parsedown and the issue popped up. Rolling back fixes it. As does using other Markdown parsers.

@erusev
Copy link
Owner

erusev commented Jan 15, 2015

Could you help me reproduce it?

@jackmcdade
Copy link
Author

Sure thing, just drop this in the demo:

<a href="{{ url }}">A URL!</a>

@erusev
Copy link
Owner

erusev commented Jan 15, 2015

I did that. Parsedown seems to produce identical output as Markdown PHP. Am I missing something?

@jackmcdade
Copy link
Author

I'm not quite sure what to say except that in Statamic that parser doesn't break anything, and Parsedown does. I can set up a sandbox for you if that might help?

@erusev
Copy link
Owner

erusev commented Jan 15, 2015

That might help, but before that, could you figure out which is the exact version that introduces the issue?

@jackmcdade
Copy link
Author

I’ll see what I can do.

- Jack

On January 15, 2015 at 5:30:27 PM, Emanuil Rusev (notifications@github.com) wrote:

That might help, but before that, could you figure out which is the exact version that introduces the issue?


Reply to this email directly or view it on GitHub.

@erusev
Copy link
Owner

erusev commented Jan 17, 2015

@jackmcdade

Did you get a chance to look into it?

@rhukster
Copy link
Contributor

I'm not sure if this is the same issue, but I have recently noticed that there is a problem when you use curly brackets in a markdown link, it doesn't work.

Some examples:

[Some Link]({{url}})

Renders out of Parsedown as:

<a href="{{url}}">Some Link</a>

As expected. However, if you have any spaces between the parens, like:

[Some Link]({{ url }})

the indentifyLink() fails on the regex: https://github.com/erusev/parsedown/blob/master/Parsedown.php#L1202

and you basically get your markdown spat back at you unprocessed. The problem with that appears to be the space before the opening parens in the first negated set. If you remove that so the resulting regex is:

'/^[(]((?:[^(]|[(][^ )]+[)])+)(?:[ ]+("[^"]+"|\'[^\']+\'))?[)]/'

This now works. However, this is a pretty complex regex and i'm not sure the implications of removing this space for other tags. Perhaps @erusev has some insights as this was changed pretty heavily between 1.14 and current.

BTW, I have no issue with curly brackets in HTML links themselves, Parsedown just ignores them and it works fine. eg:

<a href="{{ site.foo }}">A URL!</a>

Doesn't get any modified at all, works fine.

@erusev
Copy link
Owner

erusev commented Jan 19, 2015

the indentifyLink() fails on the regex

This is expected. The regex expects a valid URL.

BTW, I have no issue with curly brackets in HTML links themselves, Parsedown just ignores them and it works fine.

Same here. I'm unable to reproduce the issue. It'd be great if @jackmcdade could help us with that.

@rhukster
Copy link
Contributor

FWIW, I got around this in Grav by using on overriden inlineLink() method, detecting Twig syntax in the URL, setting the link to a simple safe URL: /, processing normally, then setting the href back to the original Twig code:

    protected function inlineLink($excerpt)
    {
        // do some trickery to get around Parsedown requirement for valid URL if its Twig in there
        if (preg_match('/\!*\[(?:.*)\]\(([{{|{%|{#].*[#}|%}|}}])\)/', $excerpt['text'], $matches)) {
            $excerpt['text'] = str_replace($matches[1], '/', $excerpt['text']);
            $excerpt = parent::inlineLink($excerpt);
            $excerpt['element']['attributes']['href'] = $matches[1];
            $excerpt['extent'] = $excerpt['extent'] + strlen($matches[1]) - 1;
        } else {
            $excerpt = parent::inlineLink($excerpt);
        }
    return $excerpt;
    }

@erusev
Copy link
Owner

erusev commented Jan 23, 2015

@jackmcdade

Is this about Parsedown or Parsedown Extra?

@jasonvarga
Copy link

I'm unable to reproduce it, but I can tell you Statamic is using Parsedown Extra.

@jasonvarga
Copy link

Okay, I can reproduce it. Even in the demo. Parsedown Extra only.

<li><a href="{{ url }}" attr="{{ url }}">Test</a></li>
Outputs:
<li><a href="%7B%7B%20url%20%7D%7D" attr="{{ url }}">Test</a></li>

But, without the <li> wrapper:

<a href="{{ url }}" attr="{{ url }}">Test</a>
Outputs:
<p><a href="{{ url }}" attr="{{ url }}">Test</a></p>

However, I think it's correct that you are encoding those characters. Is there an option to disable it?

@hwmaier
Copy link

hwmaier commented Nov 3, 2015

Same here. Twig tag brackets are encoded if they appear in a href attribute and the link tag is enclosed in div tag for example (using Parsedown extra).
This works:

<a class="{{ page.class }}" href="{{ page.url }}"> {{ page.title }} </a>

This doesn't:

<div><a class="{{ page.class }}" href="{{ page.url }}"> {{ page.title }} </a></div>

and results in

<div><a class="main" href="%7B%7B%20page.url%20%7D%7D"> My Pagetitle </a></div>

@hwmaier
Copy link

hwmaier commented Nov 24, 2015

I did some further investigation on this. The issue is caused by PHP's DOMDocument class urlencoding the src and href attributes in ParsdownExtra's processTag method.

A similar issue is reported here:
http://stackoverflow.com/questions/20102346/avoid-percent-encoding-href-attributes-when-using-phps-domdocument

One solution would be to loop through all src and href attributes and urldecode them to reverse the encoding. Another could be to rename the attribute to _src and _href, do the DOM processing and rename them back to src and href.

@aidantwoods
Copy link
Collaborator

Since it didn't appear to be possible to reproduce this issue, I'm going to close this one – please reopen if you can reproduce this with the latest version though.

@Moonlight63
Copy link

I've just come across this same issue while working in grav. Came here hoping for a solution.

code:

<a data-test="{{ image.url() }}" href="{{ image.url() }}" data-srcset="{{ image.srcset() }}" data-sizes="{{ image.sizes() }}">
	<img src='{{ image.url() }}'>
</a>

result:

<a data-test="/user/pages/05.test/image.jpg" href="%7B%7B%20image.url()%20%7D%7D" data-srcset="/user/pages/05.test/image@4x.jpg 2500w, /user/pages/05.test/image@5x.jpg 3500w, /user/pages/05.test/image@2x.jpg 1000w, /user/pages/05.test/image@3x.jpg 1500w, /user/pages/05.test/image@6x.jpg 5664w, /user/pages/05.test/image.jpg 640w" data-sizes="100vw, (min-width: 576px) 510px, (min-width: 768px) 460px, (min-width: 992px), 465px, (min-width: 1200px) 555">
    <img src="%7B%7B%20image.url()%20%7D%7D">
</a>

href and src tags broke. Disabling Parsedown Extra fixes it.

@rhukster
Copy link
Contributor

Yah this is an upstream problem with Markdown Extra library, not much we can do. We're probably going to remove Markdown Extra in Grav 2.0 unless they do some significant improvements to it.

@aidantwoods
Copy link
Collaborator

So as @hwmaier has pointed out, the main issue here appears to be caused by ParsedownExtra's use of PHP's DOMDocument. (I think there is a seperate issue @rhukster has experienced related to how many spaces are allowed in a markdown link – Parsedown is doing the correct thing here per CommonMark, and so I think that the correct approach to allowing spaces within curly brackets will boil down to being permitted by user added behaviour).

We use DOMDocument in ParsedownExtra to parse HTML tags to detect the markdown=1 attributes that enable markdown processing within HTML tags. Personally, I'm not a fan of ParsedownExtra's method of enabling markdown within HTML tags: from the implementor's point of view, it is problematic because it requires parsing HTML in addition to the markdown syntax; from the writer's point of view, it's (IMO) not a very natural construction to enable a markdown feature with HTML attribute syntax. Since this approach require's parsing HTML, it leaves a lot of the output to the mercy of the HTML parser we use and so subtleties like this will end up as upstream bugs to ParsedownExtra, as this one is (short of swapping the HTML parser dependency we use).

CommonMark's method of achieving markdown content within HTML is both simpler to use (IMO) and also to implement (see also /cc: @PhrozenByte).

Given more modular "agility" in the next Parsedown release, I'd expect that the additional features that ParsedownExtra contains could be readily imported (or enabled) within Parsedown. There are also significant improvements that have already been made to Parsedown's HTML handling, which (as far as I'm aware) should bring it up to being CommonMark compliant in this regard (see changes). These aren't yet released, but will definitely make it into the next version.

With all that considered, I'd envision it being possible (at the most drastic) to move ParsedownExtra's additions into the Parsedown repo as a set of modular features (that'll be by default off). If this happened I'd expect to just dump the markdown=1 feature. For something less drastic, I'd expect that the user would be able to choose between handling approaches to suit what works best syntactically for them (with the CommonMark approach chosen by default).

As a TL;DR: this is unlikely to occur in 2.0 since we'll have the opportunity to drop a dependency within ParsedownExtra, if we don't drop the dependency it might still occur but only if the markdown=1 feature is specifically enabled.
These are of course just ideas so far though (representing my current intent, but not necessarily a guarantee of what the next version will look like).

@lolozere
Copy link

To fix the problen you can process twig fist:

https://learn.getgrav.org/15/content/headers#process-twig-first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants