Update 2016: There is now a Google Closure package based on the Caja sanitizer.
It has a cleaner API, was rewritten to take into account APIs available on modern browsers, and interacts better with Closure Compiler.
Shameless plug: see caja/plugin/html-sanitizer.js for a client-side HTML sanitizer that has been thoroughly reviewed.
It is whitelist-based, not blacklist-based, and the whitelists are configurable as per CajaWhitelists.
If you want to remove all tags, then do the following:
var tagBody = '(?:[^"\'>]|"[^"]*"|\'[^\']*\')*';

var tagOrComment = new RegExp(
    '<(?:'
    // Comment body.
    + '!--(?:(?:-*[^->])*--+|-?)'
    // Special "raw text" elements whose content should be elided.
    + '|script\\b' + tagBody + '>[\\s\\S]*?</script\\s*'
    + '|style\\b' + tagBody + '>[\\s\\S]*?</style\\s*'
    // Regular name
    + '|/?[a-z]'
    + tagBody
    + ')>',
    'gi');

function removeTags(html) {
  var oldHtml;
  // Repeat until nothing changes: removing one tag can splice
  // the surrounding text into a new tag.
  do {
    oldHtml = html;
    html = html.replace(tagOrComment, '');
  } while (html !== oldHtml);
  // Escape any remaining '<' so leftover fragments cannot open a tag.
  return html.replace(/</g, '&lt;');
}
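To see why removeTags loops until the string stops changing, here is a minimal sketch. It uses a deliberately simplified pattern (simpleTag, a hypothetical stand-in, not the full tagOrComment expression above) and two illustrative helpers, stripOnce and stripUntilStable, that are not part of the original code. A single pass can leave behind a tag that was assembled from the pieces around a removed tag.

```javascript
// Simplified tag pattern, for illustration only. It does not handle
// comments, quoted attributes, or script/style bodies.
var simpleTag = /<\/?[a-z][^>]*>/gi;

// One pass: remove every match found in the original string.
function stripOnce(html) {
  return html.replace(simpleTag, '');
}

// Repeat the pass until the output stops changing (a fixed point).
function stripUntilStable(html) {
  var previous;
  do {
    previous = html;
    html = html.replace(simpleTag, '');
  } while (html !== previous);
  return html;
}

// Removing the inner <b> tags splices the neighbours into <script> tags.
var tricky = '<<b>script>alert(1)<<b>/script>';

console.log(stripOnce(tricky));        // <script>alert(1)</script>
console.log(stripUntilStable(tricky)); // alert(1)
```

The full removeTags goes further than this sketch: its pattern also drops the contents of script and style elements, and the final replace escapes any stray '<' that survives.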
People will tell you that you can create an element, assign innerHTML, then read innerText or textContent, and then escape entities in that. Do not do that. It is vulnerable to XSS injection, since <img src=bogus onerror=alert(1337)> will run the onerror handler even if the node is never attached to the DOM.