Input handling is a key aspect of secure web design. But what makes a good data validation/sanitization engine? The implementation depends greatly on the language and framework that your site is built on. However, best practices across IT security topics maintain that “whitelisting” or “strict checking” is the more secure way to validate. The Open Web Application Security Project (OWASP) is an online community that produces freely available articles, methodologies, documentation, tools, and technologies in the field of web application security. Below are some excerpts from their advisories on input validation. After the quotes from OWASP, this article will use the terms “strict checking” and “accept list” to refer to whitelisting and “blocklist” to refer to blacklisting.
Input validation is performed to ensure only properly formed data is entering the workflow in an information system, preventing malformed data from persisting in the database and triggering malfunction of various downstream components. Input validation should happen as early as possible in the data flow, preferably as soon as the data is received from the external party.
Input Validation should not be used as the primary method of preventing XSS, SQL Injection and other attacks which are covered in respective cheat sheets but can significantly contribute to reducing their impact if implemented properly.
It is a common mistake to use black list validation in order to try to detect possibly dangerous characters and patterns like the apostrophe ' character, the string 1=1, or the <script> tag, but this is a massively flawed approach as it is trivial for an attacker to bypass such filters.
White list validation is appropriate for all input fields provided by the user. White list validation involves defining exactly what IS authorized, and by definition, everything else is not authorized.
If it’s well structured data, like dates, social security numbers, zip codes, email addresses, etc. then the developer should be able to define a very strong validation pattern, usually based on regular expressions, for validating such input.
If the input field comes from a fixed set of options, like a drop down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.
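The two rules above can be sketched in PHP. The field names and patterns here are illustrative assumptions, not OWASP's code: a ZIP code checked against a strict regular expression, and a drop-down value checked against the exact set of options that was rendered.

```php
<?php
// Well-structured data: a US ZIP code must be exactly 5 digits,
// optionally followed by a hyphen and 4 more digits.
function is_valid_zip(string $zip): bool
{
    return preg_match('/^\d{5}(-\d{4})?$/', $zip) === 1;
}

// Fixed set of options: the submitted value must match one of the
// values that were actually offered to the user.
function is_valid_state(string $state): bool
{
    $offered = ['AL', 'AK', 'AZ']; // the full list you rendered
    return in_array($state, $offered, true);
}
```

Anything that does not match the pattern — including injection payloads — is rejected without the code ever having to anticipate what an attack looks like.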
The primary means of input validation for free-form text input should be:
- Normalization: Ensure canonical encoding is used across all the text and no invalid characters are present.
- Character category whitelisting: Unicode allows whitelisting categories such as “decimal digits” or “letters” which not only covers the Latin alphabet but also various other scripts used globally (e.g. Arabic, Cyrillic, CJK ideographs etc).
- Individual character whitelisting: If you allow letters and ideographs in names and also want to allow the apostrophe ' for Irish names, but don’t want to allow the whole punctuation category.
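A minimal PHP sketch of those three steps, assuming the mbstring and intl extensions are available; the function name and the exact allowed character set are illustrative assumptions.

```php
<?php
function is_valid_name(string $name): bool
{
    // Normalization: reject input that is not valid UTF-8, then
    // convert what remains to canonical form (NFC).
    if (!mb_check_encoding($name, 'UTF-8')) {
        return false;
    }
    $name = Normalizer::normalize($name, Normalizer::FORM_C);

    // Character category whitelisting: \p{L} matches "letters" in any
    // script (Latin, Cyrillic, CJK, ...). The apostrophe and space are
    // individually whitelisted on top of that, rather than allowing
    // the whole punctuation category.
    return preg_match('/^[\p{L}\' ]+$/u', $name) === 1;
}
```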
But why are accept lists specifically better than blocklists?
But what harm can malformed GET and POST data actually do?
But some may say: who cares if you allow GET parameters that are not on the accept list? If the software doesn’t handle those elements of the array, then your software doesn’t deal with them, and so they shouldn’t be dangerous. This is the wrong way to think about it. The example below shows how non-accept-list variables can be included in an HTTP GET request that will attempt to invoke a function call in PHP. If an accept list is used to explicitly drop the request early in the process, the request is rendered ineffective.
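To make this concrete, here is a sketch of a hypothetical vulnerable pattern — a PHP "variable function" fed straight from the request. The page code and request are illustrative assumptions, but the pattern is representative of how an unchecked parameter becomes a function call.

```php
<?php
// Simulate the attacker's request: /page.php?action=phpversion
$_GET['action'] = 'phpversion';

$action = $_GET['action'] ?? 'default_action';

// Because $action was never checked against an accept list, PHP's
// variable-function syntax calls whatever function the URL named.
// Here that is the harmless phpversion(), but the same pattern would
// just as happily invoke something destructive.
if (function_exists($action)) {
    $result = $action(); // attacker-controlled function call
}
```

An accept-list check on the parameter names and values, applied before this code runs, stops the call from ever being attempted.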
If a GET parameter doesn’t belong in your application’s accept list, then the application should drop the request, send an HTTP 500 server error, or redirect the request to the homepage. Use a redirect header because you want attackers to see that you have redirected them. Sanitize at the start of the page request building process: your response will be snappy and it will be clear that you are not taking any guff. The attackers will go play somewhere else.
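An early accept-list gate might look like the following sketch, placed before any other page logic runs. The parameter names and the redirect target are hypothetical.

```php
<?php
// Return true only if every GET parameter name is on the accept list.
function request_is_clean(array $get, array $accept_list): bool
{
    foreach (array_keys($get) as $param) {
        if (!in_array($param, $accept_list, true)) {
            return false;
        }
    }
    return true;
}

// At the very start of the page request, before anything is built:
$accept_list = ['page', 'id', 'sort']; // parameters this page expects

if (!request_is_clean($_GET, $accept_list)) {
    http_response_code(500);   // active refusal with a server error...
    header('Location: /');     // ...or a visible redirect to the homepage
    exit;                      // either way, stop building the page
}
```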
Malformed GET and POST data is how SQL injection attacks happen.
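A short sketch shows why. The query and parameter below are hypothetical, built by string concatenation to illustrate the failure; remember that per OWASP, parameterized queries — not validation — remain the primary SQL injection defense.

```php
<?php
$id = '1 OR 1=1'; // attacker-supplied value, e.g. from $_GET['id']

// Concatenation turns the malformed value into extra SQL: the WHERE
// clause becomes "id = 1 OR 1=1", which matches every row.
$query = 'SELECT * FROM users WHERE id = ' . $id;

// A strict check rejects the value before it can reach any query:
// an id must be digits only.
$is_valid = preg_match('/^\d+$/', $id) === 1; // false for this input
```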
So, it seems obvious that strict checking of GET and POST parameter and value pairs is the most secure way to validate input coming into your web server.
So why aren’t all web-applications developed like this?
One reason applications do not always validate request input early in the process flow is that many online code examples and tutorials show how to validate input at the last possible step, just before the data is inserted into the database. While this does teach young developers that data needs to be validated, it does not follow best practices within the greater context of web application development. From that point, if a dev-ops team lead does not enforce better security practices, this style of coding will persist into enterprise application development. Also, none of the popular PHP frameworks ships with this security-minded approach to input validation built in. That should change.
Keep in mind that properly validating input with strict checking and routing bad requests to 500 error pages displays a more active security posture, and would-be attackers may give up trying to attack your site. This could result in less malicious traffic and fewer log entries of malicious page requests to review, saving your threat detection team some effort.