SafeHTML
SafeHTML is an anti-XSS HTML parser written in PHP.
It is distributed under the BSD License[link6].
About this project
This parser strips all potentially dangerous content from HTML:
- opening tags without a matching closing tag
- closing tags without a matching opening tag
- any of these tags: "base", "basefont", "head", "html", "body", "applet", "object",
"iframe", "frame", "frameset", "script", "layer", "ilayer", "embed", "bgsound",
"link", "meta", "style", "title", "blink", "xml" etc.
- any of these attributes: on*, data*, dynsrc
- javascript:/vbscript:/about: etc. protocols
- expression/behavior etc. in styles
- any other active content
It also tries to convert code to valid HTML, but htmltidy is a far better solution for that task.
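As a rough illustration of the tag and attribute rules listed above, the checks could be sketched as follows. This is a minimal sketch, not SafeHTML's actual code: the constant and function names (`DENIED_TAGS`, `isDeniedTag`, `isDeniedAttribute`) are hypothetical, and only the tag names and attribute patterns come from the list above.

```php
<?php
declare(strict_types=1);

// Hypothetical names for illustration; only the list contents come from this page.
const DENIED_TAGS = [
    'base', 'basefont', 'head', 'html', 'body', 'applet', 'object',
    'iframe', 'frame', 'frameset', 'script', 'layer', 'ilayer', 'embed',
    'bgsound', 'link', 'meta', 'style', 'title', 'blink', 'xml',
];

// True when the tag is on the deny list (case-insensitive).
function isDeniedTag(string $tag): bool
{
    return in_array(strtolower($tag), DENIED_TAGS, true);
}

// True for on* event handlers, data* attributes, and dynsrc.
function isDeniedAttribute(string $name): bool
{
    $name = strtolower($name);

    return str_starts_with($name, 'on')
        || str_starts_with($name, 'data')
        || $name === 'dynsrc';
}
```

A real filter also has to handle protocols, styles, and malformed markup, which this sketch leaves out.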
What's new?
1.3.12
- Added HTML5 tag terminators for paragraphs.
- Removed obsolete and deprecated HTML elements.
- Added new HTML5 block-level elements.
- Replaced the preg_replace() e modifier with preg_replace_callback().
- UTF-7 XSS vulnerability fixed
- Tags can now be allowed with the setAllowTags() method.
- Allowed tags can be reset using resetAllowTags().
- Added 'dl' to the list of 'lists' tags.
- Added 'callto' to the white list of protocols.
- Added white list of "namespaced" attributes.
- More accurate UTF-7 decoding.
- Two serious security flaws fixed: UTF-7 XSS and CSS comments handling.
- Security flaw (improper quotes handling in attributes' values) fixed. Big thanks to Nick Cleaton.
- Dumb bug fixed (some closing tags were ignored).
- Two holes (with decimal HTML entities and with \x00 symbol) fixed.
- Class rewritten under PEAR coding standards.
- Class now uses unmodified HTMLSax3 from PEAR.
- Added "caption", "col", "colgroup" to the list of table tags.
- Fixed: it was possible to create XSS with hexadecimal HTML entities. Big thanks to Christian Stocker.
- "id" and "name" attributes added to the list of dangerous attributes, because an attacker can break legitimate JavaScript by spoofing the ID or NAME of an element.
- The new parse() method runs the whole parsing process in two lines of code. The examples were updated accordingly.
- New array, closeParagraph, contains a list of block-level elements. When such an element is opened, the current paragraph is closed first, which lets SafeHTML produce more XHTML-compliant code.
- Added "webcal" to the white list of protocols for those who use calendar programs (Mozilla/iCal/etc).
- SafeHTML now strips table elements that appear outside a table.
- SafeHTML now correctly closes unclosed "li" tags before opening another "li" at the same nesting level.
- New "dangerous" protocols: hcp, ms-help, help, disk, vnd.ms.radio, opera, res, resource, chrome, mocha, livescript.
- <XML> tag was moved from "tags for deletion" to "tags for deletion with content".
- New "dangerous" CSS instruction "include-source" (NN4 specific).
- New array, Attributes, contains the list of attributes to remove. If you need to remove the "id" or "name" attribute, just add it to this array.
- It is now possible to choose between white-list and black-list filtering of protocols. The default is "white-list".
The white list is: "http", "https", "ftp", "telnet", "news", "nntp", "gopher", "mailto", "file".
- For speed purposes, we now filter protocols only from these attributes: src, href, action, lowsrc, dynsrc, background, codebase.
- Opera6 XSS bug ([\xC0][\xBC]script>alert(1)[\xC0][\xBC]/script> [UTF-8]) worked around.
- New "dangerous" tag: plaintext.
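The white-list protocol filtering described in the entries above could look roughly like this. It is a minimal sketch under stated assumptions: `isUrlAllowed` and both constants are hypothetical names, not SafeHTML's API, and the two lists are copied from the changelog entries as written (later additions such as "callto" and "webcal" are not included).

```php
<?php
declare(strict_types=1);

// Hypothetical names; list contents are taken from the changelog entries above.
const PROTOCOL_ATTRIBUTES = ['src', 'href', 'action', 'lowsrc', 'dynsrc', 'background', 'codebase'];
const ALLOWED_PROTOCOLS = ['http', 'https', 'ftp', 'telnet', 'news', 'nntp', 'gopher', 'mailto', 'file'];

function isUrlAllowed(string $attribute, string $value): bool
{
    // Protocols are only checked on the attributes listed above.
    if (!in_array(strtolower($attribute), PROTOCOL_ATTRIBUTES, true)) {
        return true;
    }

    // Remove whitespace and control characters before looking for a scheme,
    // so that "java\tscript:" is treated the same as "javascript:".
    $value = preg_replace('/[\x00-\x20]+/', '', $value) ?? '';

    // No scheme found: treat the value as a relative URL.
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $value, $m)) {
        return true;
    }

    return in_array(strtolower($m[1]), ALLOWED_PROTOCOLS, true);
}
```

A white list is the safer default because an unknown scheme is rejected until someone explicitly allows it. This sketch does not decode HTML entities first, which a real filter would need to do before checking the scheme.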
1.0.3
- Added array of elements that can have no closing tag.
1.0.2
- Bug fix: <img src="javascript:alert(1);"> attack. Thanks to shmel.
1.0.1
- Bug fix: safehtml hangs on <style></style></style> code. Thanks to lj user=electrocat.
1.0.0
- First public release.
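Several entries above concern HTML entities (the decimal and hexadecimal entity holes) and the move from the preg_replace() e modifier, which PHP 7.0 removed, to preg_replace_callback(). The sketch below shows how numeric entities might be decoded with a callback so that later checks see the real characters. The function name `decodeNumericEntities` is hypothetical and this is not SafeHTML's actual implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for illustration only.
function decodeNumericEntities(string $text): string
{
    // Matches decimal (&#106;) and hexadecimal (&#x6A;) entities;
    // the trailing semicolon is optional, as browsers tolerate its absence.
    $pattern = '/&#(?:x([0-9a-f]+)|([0-9]+));?/i';

    $result = preg_replace_callback(
        $pattern,
        static function (array $m): string {
            // Rebuild the entity in canonical form, with a semicolon.
            $entity = ($m[1] !== '') ? '&#x' . $m[1] . ';' : '&#' . $m[2] . ';';
            $decoded = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');

            // If nothing was decoded, keep the original text unchanged.
            return $decoded === $entity ? $m[0] : $decoded;
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}
```

Unlike the e modifier, the callback receives the matches as data rather than as evaluated code, so input can never be executed as PHP.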
Advanced Topics
- Usage of SafeHTML[link7]
Dependencies for SafeHTML
- PHP 8.0
- HTMLSax3 3.0.0
Authors and copyrights
Author: Roman Ivanov[link8].
Copyrights:
© 2006–2023, WackoWiki Team
© 2004–2005, Roman Ivanov
© 2004–2005, Pixel-Apes
© 2004–2005, JetStyle
Contacts
If you find any bugs in this parser, please report them in the Bugtracker[link1].
- [link1] https://wackowiki.org/bugs/
- [link2] https://bitbucket.org/wackowiki/wackowiki/src/master/src/lib/safehtml/
- [link3] https://bitbucket.org/wackowiki/wackowiki/src/master/src/lib/HTMLSax3/
- [link4] http://pear.php.net/package/HTML_Safe/
- [link5] https://github.com/pear/HTML_Safe/blob/trunk/HTML/Safe.php
- [link6] http://www.opensource.org/licenses/BSD-3-Clause
- [link7] https://wackowiki.org/doc/Dev/Projects/SafeHTML/Usage
- [link8] https://wackowiki.org/doc/Users?profile=RomanIvanov