Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent various XSS attacks #276

Closed
wants to merge 4 commits into from
Closed

Prevent various XSS attacks #276

wants to merge 4 commits into from

Conversation

naNuke
Copy link
Contributor

@naNuke naNuke commented Jan 21, 2015

Hello, I am back with the safe links issue.
As always this is open to discussion :) but together with already existing markupEscaped setting, this is basically a fix for issues #161 and #210, and maybe other im not aware of.

@erusev
Copy link
Owner

erusev commented Jan 24, 2015

This looks nice. I'd love to hear more opinions about it. I'm no expert on XSS. @hkdobrev ?

@xPaw
Copy link

xPaw commented Jan 24, 2015

I feel like this param should be named as something more generic like safeAttributes and escaping every attribute that user can enter (e.g title). Also I think checking for javascript: in urls seems like a silly idea, and urls should be escaped regardless.

@erusev
Copy link
Owner

erusev commented Jan 24, 2015

@xPaw Thanks for the input! Which characters do u think should be escaped?

@xPaw
Copy link

xPaw commented Jan 24, 2015

Using htmlspecialchars should do the job just fine.

@erusev
Copy link
Owner

erusev commented Jan 24, 2015

Thanks!

escaping every attribute that user can enter (e.g title)

Is this about attributes of anchor (<a>) tags or all of them?

@xPaw
Copy link

xPaw commented Jan 24, 2015

Any attribute that user can change should be escaped. I guess in Parsedown that only applies to links really.

@erusev
Copy link
Owner

erusev commented Jan 24, 2015

What if, instead of having an option, we implement this as a default behaviour? It shouldn't have too much of a performance cost.

@xPaw
Copy link

xPaw commented Jan 24, 2015

That actually would make more sense.

@naNuke
Copy link
Contributor Author

naNuke commented Jan 24, 2015

We dont need to escape title attribute because of the way how parsedown works, any additional quote simply breaks the regex and link wont render at all.

thing like [example](www.example.com "><script>alert('xss')</script>")
ends up as <a href="www.example.com" title="><script>alert('xss')</script>"> and wont do anything. Script tags broken anywhere outside the link will be escaped if markupEscaped is on.

But adding escape on title is easy if there is really need for it.

The check against javascript: in url is to prevent people from writing clickable javascript links which can be used as an XSS attack because you can write full strings in javascript without quotes so even with ENT_QUOTES on htmlspecialchars wont prevent someone from writing malicious code.

@erusev
Copy link
Owner

erusev commented Jan 24, 2015

@xPaw Would htmlspecialchars on attributes protect against things like [link](javascript:alert('xss'))?

@xPaw
Copy link

xPaw commented Jan 24, 2015

It won't, but simply checking if the string starts with javascript: won't fix the problem either.

See http://security.stackexchange.com/a/73751 for some examples as to why.

@erusev
Copy link
Owner

erusev commented Jan 24, 2015

I asked SO for advice at http://security.stackexchange.com/q/80003/15034

@naNuke
Copy link
Contributor Author

naNuke commented Jan 24, 2015

Updated with info from @xPaw to whitelist the protocols instead of javascript: check.
Didn't know the pseudo-protocol can be malformed in some browsers ^^
Also escaping title just to be 100% sure.

@xPaw
Copy link

xPaw commented Jan 25, 2015

I don't think having hardcoded schema list in the regex is the way to go. For example reddit has a bigger schema whitelist: https://github.com/reddit/reddit/blob/bb0f5b552c3a52abfac1e2147144eef8f2e7dfab/r2/r2/lib/filters.py#L177

@naNuke
Copy link
Contributor Author

naNuke commented Jan 25, 2015

Thanks for input :) added customizable whitelist, but since parsedown is parser meant for everyone I am not sure if it should include all possible schemas by default. Thoughts?

@hkdobrev
Copy link
Contributor

Here a few things I think should be done in order Parsedown to be more safe against XSS.

HTML

Raw HTML should be disabled in Markdown. You could use setMarkupEscaped() in Parsedown.

User-generated content

If a developer is getting user-generated content and is putting it inside Markdown documents this should be escaped/validated beforehand. You could also use HTML Purifier or a similar library to actually validate/escape content after the HTML is generated.

Attributes

Every attribute should be escaped for special characters, not only links. In Parsedown this would be the href, title, src and alt attributes.

Schemas:

  1. There is no definitive list of schemas, because you can create your own.
  2. According to OWASP you should whitelist http:// and https:// only for security purposes.

Don't take this the wrong way, I think this is a step in the right direction.

About what @erusev suggested that this might be the default:

  • Attributes escaping could be done by default.
  • HTML escaping is already behind an option and of course should stay that way.
  • Schema whitelisting definitely should be optional.
  • A note which says Parsedown is not enough for your content to be secure would be good. Users which use Parsedown should be educated to think about their content and which text comes from where.

@naNuke
Copy link
Contributor Author

naNuke commented Jan 26, 2015

Is there any other markdown syntax that accepts user content directly for its generated html other than links and images?

Also note that there is no need atm to escape images because it uses link code directly.

@naNuke
Copy link
Contributor Author

naNuke commented Jan 26, 2015

I've just finished reading thru the code and added escaping on hopefully all parsed attributes.

On another note, at the moment, all of this expects user input to be UTF-8, which may leave some project vulnerable by not catching html entities under different encodings. Read here for more info

I am not quite sure how to solve the encoding issue since that might be out of scope of this fix, also the fact that parsedown is supporting php 5.2 is not helping, but my idea of possible solution is:

  • text() method would accept encoding parameter, encode from that encoding to utf-8, go through all the lines, then re-encode back to user encoding and return.

Not aware of any complications this might bring, but I am not that knowledgeable about unicode.

Let me know what you guys think :)

@hkdobrev
Copy link
Contributor

hkdobrev commented Feb 3, 2015

🌟 LGTM ✨ 💫

@hkdobrev
Copy link
Contributor

@erusev What's up with this?

@erusev
Copy link
Owner

erusev commented Mar 10, 2015

@hkdobrev I have some thoughts about it. Hopefully, next week I'll have some time that I'll be able to invest in the topic.

@ivantcholakov
Copy link

Maybe borrowing code from CodeIgniter 3 would be the solution. PHP min = 5.2.4
https://github.com/bcit-ci/CodeIgniter/blob/3.0.0/system/core/Security.php#L349

@adamthehutt
Copy link

Is there any update on this? Anything I can do to help?

@hkdobrev
Copy link
Contributor

Is there any update on this? Anything I can do to help?

👍

@erusev
Copy link
Owner

erusev commented Aug 13, 2015

@adamthehutt You could help by creating simple test data - .md / .html pairs that contain XSS input and the expected safe output. We could discuss it at #161.

@ifeltsweet
Copy link

LGTM

@malas
Copy link

malas commented Feb 2, 2016

why this is still not merged into?

@erusev
Copy link
Owner

erusev commented Feb 2, 2016

@malas one reason is the lack of tests

@malas
Copy link

malas commented Feb 2, 2016

@erusev then is it safe to say that his project is not supported? such a major issue is not addressed for almost a year, does not sound like a ready for production project.

@erusev
Copy link
Owner

erusev commented Feb 2, 2016

@malas Parsedown is a Markdown parser, its responsibility is to parse Markdown, implementing a safe mode would be nice but it has nothing to do with wether the parser is supported or not so no, it would not be safe to say that the project is not supported

@tillkruss
Copy link
Contributor

Any updates on this?

@aidantwoods
Copy link
Collaborator

aidantwoods commented Apr 30, 2017

If I may, I might suggest that encoding output should be the responsibility of the element method so that all attributes and content are always escaped in a consistent way (without having to hunt down all instances of where attributes are set, now or in future).

Choosing a safe URL (i.e. protocol whitelisting) should remain the responsibility of the specific link method though.

However, it may be a better approach to delegate this filtering process elsewhere, so that href in links and src in images are filtered to be safe at a higher level (e.g. so that extensions that completely override the link method still benefit from this). This argument would also yield similar benefits on moving encoding to the element method.

@aidantwoods
Copy link
Collaborator

Hi again everyone, I've gone ahead and implemented my own suggestion on top of the work started here (would be great to get Parsedown a safe mode – given how popular it is).

I've rebased this on master and opened a new PR (can't target this one without running into the conflicts I already had to resolve manually).

Please throw everything you've got at #495, by trying to get some XSS through. Or come up with some more test cases if you like.

@aidantwoods aidantwoods mentioned this pull request Feb 9, 2018
@erusev erusev closed this in #495 Feb 28, 2018
erusev added a commit that referenced this pull request Feb 28, 2018
Prevent various XSS attacks [rebase and update of #276]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants