New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent various XSS attacks #276
Conversation
This looks nice. I'd love to hear more opinions about it. I'm no expert on XSS. @hkdobrev ? |
I feel like this param should be named as something more generic like |
@xPaw Thanks for the input! Which characters do u think should be escaped? |
Using |
Thanks!
Is this about attributes of anchor ( |
Any attribute that user can change should be escaped. I guess in Parsedown that only applies to links really. |
What if, instead of having an option, we implement this as a default behaviour? It shouldn't have too much of a performance cost. |
That actually would make more sense. |
We dont need to escape title attribute because of the way how parsedown works, any additional quote simply breaks the regex and link wont render at all. thing like But adding escape on title is easy if there is really need for it. The check against javascript: in url is to prevent people from writing clickable javascript links which can be used as an XSS attack because you can write full strings in javascript without quotes so even with ENT_QUOTES on htmlspecialchars wont prevent someone from writing malicious code. |
@xPaw Would |
It won't, but simply checking if the string starts with See http://security.stackexchange.com/a/73751 for some examples as to why. |
I asked SO for advice at http://security.stackexchange.com/q/80003/15034 |
Updated with info from @xPaw to whitelist the protocols instead of javascript: check. |
I don't think having hardcoded schema list in the regex is the way to go. For example reddit has a bigger schema whitelist: https://github.com/reddit/reddit/blob/bb0f5b552c3a52abfac1e2147144eef8f2e7dfab/r2/r2/lib/filters.py#L177 |
Thanks for input :) added customizable whitelist, but since parsedown is parser meant for everyone I am not sure if it should include all possible schemas by default. Thoughts? |
Here a few things I think should be done in order Parsedown to be more safe against XSS. HTMLRaw HTML should be disabled in Markdown. You could use User-generated contentIf a developer is getting user-generated content and is putting it inside Markdown documents this should be escaped/validated beforehand. You could also use HTML Purifier or a similar library to actually validate/escape content after the HTML is generated. AttributesEvery attribute should be escaped for special characters, not only links. In Parsedown this would be the Schemas:
Don't take this the wrong way, I think this is a step in the right direction. About what @erusev suggested that this might be the default:
|
Is there any other markdown syntax that accepts user content directly for its generated html other than links and images? Also note that there is no need atm to escape images because it uses link code directly. |
I've just finished reading thru the code and added escaping on hopefully all parsed attributes. On another note, at the moment, all of this expects user input to be UTF-8, which may leave some project vulnerable by not catching html entities under different encodings. Read here for more info I am not quite sure how to solve the encoding issue since that might be out of scope of this fix, also the fact that parsedown is supporting php 5.2 is not helping, but my idea of possible solution is:
Not aware of any complications this might bring, but I am not that knowledgeable about unicode. Let me know what you guys think :) |
🌟 LGTM ✨ 💫 |
@erusev What's up with this? |
@hkdobrev I have some thoughts about it. Hopefully, next week I'll have some time that I'll be able to invest in the topic. |
Maybe borrowing code from CodeIgniter 3 would be the solution. PHP min = 5.2.4 |
Is there any update on this? Anything I can do to help? |
👍 |
@adamthehutt You could help by creating simple test data - .md / .html pairs that contain XSS input and the expected safe output. We could discuss it at #161. |
LGTM |
why this is still not merged into? |
@malas one reason is the lack of tests |
@erusev then is it safe to say that his project is not supported? such a major issue is not addressed for almost a year, does not sound like a ready for production project. |
@malas Parsedown is a Markdown parser, its responsibility is to parse Markdown, implementing a safe mode would be nice but it has nothing to do with wether the parser is supported or not so no, it would not be safe to say that the project is not supported |
Any updates on this? |
If I may, I might suggest that encoding output should be the responsibility of the Choosing a safe URL (i.e. protocol whitelisting) should remain the responsibility of the specific link method though. However, it may be a better approach to delegate this filtering process elsewhere, so that |
Hi again everyone, I've gone ahead and implemented my own suggestion on top of the work started here (would be great to get Parsedown a safe mode – given how popular it is). I've rebased this on master and opened a new PR (can't target this one without running into the conflicts I already had to resolve manually). Please throw everything you've got at #495, by trying to get some XSS through. Or come up with some more test cases if you like. |
Prevent various XSS attacks [rebase and update of #276]
Hello, I am back with the safe links issue.
As always this is open to discussion :) but together with already existing markupEscaped setting, this is basically a fix for issues #161 and #210, and maybe other im not aware of.