When is it best to sanitize user input?

Unfortunately, almost none of the participants clearly understands what they are talking about. Literally. Only Kibbee managed to get it straight.

This topic is all about sanitization. But the truth is, the broadly termed “general purpose sanitization” everyone is so eager to talk about just doesn’t exist.

There are a zillion different mediums, and each requires its own, distinct data formatting. Moreover, even a single medium requires different formatting for its different parts. Say, HTML formatting is useless for JavaScript embedded in an HTML page. Or, string formatting is useless for numbers in an SQL query.
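To make the HTML versus JavaScript point concrete, here is a minimal sketch in Python. The helper names `for_html_text` and `for_js_string` are made up for illustration; the only library calls are the standard `html.escape` and `json.dumps`. It is not a complete templating solution, just a demonstration that the same value needs two different formats.

```python
import html
import json


def for_html_text(value: str) -> str:
    # HTML context: turn &, <, > and quotes into character references.
    return html.escape(value, quote=True)


def for_js_string(value: str) -> str:
    # JavaScript context inside a <script> block: emit a quoted string
    # literal, and hide "<" so the value cannot close the script element.
    return json.dumps(value).replace("<", "\\u003c")


value = 'He said "hi" </script>'

html_version = for_html_text(value)
js_version = for_js_string(value)

print(html_version)
print(js_version)
```

Neither result is usable in the other context: character references are not decoded inside a script block, and backslash escapes mean nothing to an HTML parser.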

As a matter of fact, the “sanitization as early as possible” suggested in the most upvoted answers is simply impossible, because at input time one cannot tell in which medium, or which part of a medium, the data will be used. Say, we are preparing to defend against “sql-injection” by escaping everything that moves. But whoops! Some required fields weren’t filled in, and we have to put the data back into the form instead of the database… with all the slashes added.
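The form round-trip can be sketched like this. `add_slashes` is a hypothetical stand-in for whatever early, SQL-minded escaper is applied on input; it is an assumption for the demo, not a real library function.

```python
import html


def add_slashes(value: str) -> str:
    # Hypothetical early "sanitizer": backslash-escape backslashes and quotes.
    for ch in ("\\", "'", '"'):
        value = value.replace(ch, "\\" + ch)
    return value


raw = "O'Reilly"

# Escaped on arrival, long before we know where the value will go.
early = add_slashes(raw)

# Validation failed, so the value goes back into the HTML form.
redisplayed_early = html.escape(early)  # carries the stray backslash
redisplayed_raw = html.escape(raw)      # formatted for HTML only, at use time

print(redisplayed_early)
print(redisplayed_raw)
```

The user who typed `O'Reilly` gets the field back with a backslash in it, and each failed submission would add more.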

On the other hand, we diligently escaped all the “user input”… but in the SQL query there are no quotes around it, because it is a number or an identifier. And no “sanitization” ever helped us.
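A small sketch of the unquoted-number case, using Python's built-in `sqlite3` module. The table, the data, and the helpers `escape_sql_string` and `as_sql_int` are invented for the demo.

```python
import sqlite3


def escape_sql_string(value: str) -> str:
    # Hypothetical string escaper: doubles single quotes,
    # which only matters inside a quoted SQL string literal.
    return value.replace("'", "''")


def as_sql_int(value: str) -> str:
    # Formatting for the numeric part of a query: accept integers only.
    return str(int(value))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)

user_input = "1 OR 1=1"

# The input contains no quotes, so string escaping leaves it untouched
# and the injected condition matches every row.
escaped = escape_sql_string(user_input)
leaked = conn.execute(
    "SELECT name FROM users WHERE id = " + escaped
).fetchall()

# Number formatting rejects the same input outright.
try:
    as_sql_int(user_input)
    rejected = False
except ValueError:
    rejected = True

# A legitimate value passes through the numeric formatter unchanged.
found = conn.execute(
    "SELECT name FROM users WHERE id = " + as_sql_int("2")
).fetchall()
```

In real code a bound parameter (`WHERE id = ?`) would usually be preferable to building the number into the string; the cast is shown here only to make the "different formatting for different parts" point visible.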

On the third hand: okay, we did our best in sanitizing the terrible, untrustworthy and disdained “user input”… but in some internal process we used this very data without any formatting (as we had done our best already!), and whoops! We have got a second-order injection in all its glory.
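Second-order injection can be reproduced in a few lines with the built-in `sqlite3` module. The schema and values below are made up for the demo: the value is stored safely, then read back and trusted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("CREATE TABLE notes (owner TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [("alice", "alice's note"), ("bob", "bob's note")],
)

# Step 1: the hostile value is stored correctly via a bound parameter.
hostile = "nobody' OR '1'='1"
conn.execute("INSERT INTO users VALUES (?)", (hostile,))

# Step 2: an internal process reads it back and, because it came from
# "our own database", concatenates it into a query with no formatting.
stored = conn.execute("SELECT name FROM users").fetchone()[0]
leaked = conn.execute(
    "SELECT body FROM notes WHERE owner = '" + stored + "'"
).fetchall()

# Formatting at the point of use (here, a bound parameter) does not care
# where the value came from.
safe = conn.execute(
    "SELECT body FROM notes WHERE owner = ?", (stored,)
).fetchall()
```

The first query returns every note; the second returns none, since no note belongs to that odd-looking owner name.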

So, from a real-life usage point of view, the only proper way would be

  • formatting, not whatever “sanitization”
  • right before use
  • according to the rules of the particular medium
  • and even following the sub-rules required for that medium’s different parts.
