Will HTML Encoding prevent all kinds of XSS attacks?

No.

Putting aside the subject of allowing some tags (not really the point of the question), HtmlEncode simply does NOT cover all XSS attacks.

For instance, consider server-generated client-side JavaScript: if the server dynamically outputs HTML-encoded values directly into client-side JavaScript, HtmlEncode will not stop injected script from executing.
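A minimal sketch of that case, using Python's html.escape as a stand-in for HtmlEncode (the render_script template and the userId variable are hypothetical, invented for illustration). The payload contains none of the characters an HTML encoder touches, so it lands in the script block unchanged:

```python
import html

def render_script(user_value: str) -> str:
    # Hypothetical server template: the value is HTML-encoded and then
    # written straight into a JavaScript context.
    return "<script>var userId = " + html.escape(user_value) + ";</script>"

# No <, >, &, or quote characters, so HTML encoding has nothing to escape.
payload = "1; alert(document.cookie)"
page = render_script(payload)
print(page)
```

The browser sees two statements inside the script block, and the second one is the attacker's.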

Next, consider the following pseudocode:

<input value=<%= HtmlEncode(somevar) %> id=textbox>

Now, in case it's not immediately obvious, if somevar (sent by the user, of course) is set, for example, to

a onclick=alert(document.cookie)

the resulting output is

<input value=a onclick=alert(document.cookie) id=textbox>

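which would clearly work: the value attribute is unquoted, and the payload contains none of the characters (<, >, &, quotes) that HtmlEncode escapes, so the space ends the value and onclick becomes a new attribute. Obviously, this can be (almost) any other script… and HtmlEncode would not help much.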
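The same thing can be reproduced in a few lines of Python, again assuming html.escape as a stand-in for HtmlEncode (render_input is a hypothetical helper that mirrors the pseudocode above):

```python
import html

def render_input(value: str) -> str:
    # Mirrors the pseudocode: the encoded value goes into an UNQUOTED attribute.
    return "<input value=" + html.escape(value) + " id=textbox>"

payload = "a onclick=alert(document.cookie)"
encoded = html.escape(payload)   # nothing to escape, so this equals payload
markup = render_input(payload)
print(markup)
```

The encoder returns the payload untouched, and the rendered tag now carries an onclick handler.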

There are a few additional vectors to be considered… including the third flavor of XSS, called DOM-based XSS (wherein the malicious script is generated dynamically on the client, e.g. based on the URL fragment after the #, which the server never gets a chance to encode).

Also don’t forget about UTF-7 type attacks – where the attack looks like

+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-

Nothing much to encode there…
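To see why, here is a small Python check (html.escape is assumed as a stand-in for HtmlEncode; the utf-7 codec is part of the standard library). The encoder leaves the string alone, yet a consumer that interprets the bytes as UTF-7 gets a script tag:

```python
import html

payload = "+ADw-script+AD4-alert(document.cookie)+ADw-/script+AD4-"

# HTML encoding finds no <, >, &, or quotes, so the string passes through as-is.
encoded = html.escape(payload)

# What a UTF-7-sniffing browser would effectively see.
decoded = encoded.encode("ascii").decode("utf-7")
print(decoded)
```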

The solution, of course (in addition to proper and restrictive white-list input validation), is to perform context-sensitive encoding: HtmlEncoding is great IF your output context IS HTML, but in other contexts you may need JavaScriptEncoding, or VBScriptEncoding, or AttributeValueEncoding, or… etc.
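As a rough sketch of what "context-sensitive" means, here are three encoders in Python built only on the standard library. The function names are made up for illustration, and this is nowhere near a complete encoding library (it does not cover URLs, CSS, or inline event handlers); it only shows that each output context needs its own rules:

```python
import html
import json

def encode_html_text(value: str) -> str:
    # Context: text between tags. Escapes &, <, and >.
    return html.escape(value, quote=False)

def encode_html_attribute(value: str) -> str:
    # Context: attribute value. Escapes quotes too, and always emits the
    # surrounding double quotes so a space cannot end the value.
    return '"' + html.escape(value, quote=True) + '"'

def encode_js_string(value: str) -> str:
    # Context: string literal inside a <script> block. json.dumps yields a
    # quoted, escaped literal; <, >, and & are additionally escaped so the
    # data cannot close the script element.
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

print('<input value=' + encode_html_attribute('a" onclick="alert(1)') + ' id="textbox">')
print("<script>var name = " + encode_js_string("</script><script>alert(1)") + ";</script>")
```

Note that the attribute encoder only helps because it also supplies the quotes; an encoded value dropped into an unquoted attribute is still injectable.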

If you’re using MS ASP.NET, you can use their Anti-XSS Library, which provides all of the necessary context-encoding methods.

Note that encoding should not be restricted to user input; it should also be applied to stored values from the database, text files, etc.

Oh, and don’t forget to explicitly set the charset, both in the HTTP header AND the META tag, otherwise you’ll still have UTF-7 vulnerabilities…
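For example, a response could declare the charset in both places like this. This is only a framework-neutral sketch: build_response is a hypothetical helper, and in practice the header would be set through whatever API your web framework provides:

```python
def build_response(body_html: str, charset: str = "utf-8"):
    # META tag inside the document itself...
    page = (
        "<!DOCTYPE html><html><head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
        "</head><body>" + body_html + "</body></html>"
    )
    # ...and the same charset in the HTTP Content-Type header.
    headers = [("Content-Type", f"text/html; charset={charset}")]
    return headers, page.encode(charset)

headers, data = build_response("<p>hello</p>")
print(headers)
```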

For more information, and a pretty definitive list (constantly updated), check out RSnake’s Cheat Sheet: http://ha.ckers.org/xss.html
