Vasilis Papoutsakis's picture

Securing code before output

How to properly filter data in your code to avoid XSS in Drupal
22.01.14

So… you need to write some code that collects data from somewhere (user input, the database, an external source of some kind, etc.) and displays it on your Drupal website as HTML output (for example in a custom block or page).

What is XSS and how critical is it?

Ok… first things first: what is XSS? An XSS (cross-site scripting) vulnerability enables attackers to inject client-side scripts into web pages viewed by other users (the victims), and possibly to hijack user sessions, deface websites, insert hostile content, redirect users, hijack the user’s browser using malware, etc.

XSS consistently appears in OWASP’s Top 10 vulnerabilities list (revised every 3 years).

As stated by OWASP:

“You are vulnerable if you do not ensure that all user supplied input is properly escaped, or you do not verify it to be safe via input validation, before including that input in the output page.”

https://www.owasp.org/index.php/Top_10_2013-A3-Cross-Site_Scripting_(XSS)

It is estimated that about 82% of applications contain this vulnerability at some point:

Image from Black Hat USA 2013, The Web IS Vulnerable - XSS Defense on the Battle Front

The Drupal philosophy - Escape or filter when appropriate

Probably the most common security advice you will hear is that “you should always sanitize/validate/filter user-input data”. The problem with a content management system like Drupal is that the input is not well defined. As written in Drupal’s community documentation, “Handle user input with care”, the solution is to use the appropriate filter at the point where it is needed. Just before sending plain text or HTML to the browser, or mixing plain text with HTML, escape it using the proper Drupal function; when you write an SQL query, use the escaping functions provided by the database API.

If you look at how data are stored in Drupal’s database, it should be clear that they are stored without any changes at all and are filtered only when displayed.

Body’s “value” is how the data are stored, but “safe_value” is how the data will be printed to the browser.

Drupal’s approach of course has some pros & cons (it stores possibly malicious code in the database and filters it later), but if the developer follows this approach as well, most issues can be avoided (always keep in mind, though, that nothing is 100% secure).

How to filter output data in code

In this post, we will explain how to filter the different kinds of output data in your code, to ensure your website is protected from these attacks.

Image taken from the Doing Drupal Security Right presentation at DrupalCon London 2011, presented by Gábor Hojtsy, Acquia

This image demonstrates the Drupal approach you should follow.

Plain Text

Let’s start with printing data as Plain Text, which means printing the data as is. Any HTML tags or other special characters should be encoded, so the user is shown the text exactly as stored, without the browser interpreting it. The function you should use to filter your data is check_plain(), which encodes special characters (converting them to HTML entities) in a plain-text string for display as HTML, and also validates strings as UTF-8 to prevent XSS on Internet Explorer 6.
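To make the idea of entity encoding concrete, here is a minimal sketch in Python rather than Drupal's PHP. The function name check_plain_sketch is hypothetical and only imitates the encoding step; it does not perform the UTF-8 validation mentioned above, and in real Drupal code you should call check_plain() itself.

```python
import html


def check_plain_sketch(text: str) -> str:
    """Encode HTML special characters so the browser displays them literally.

    Illustrative stand-in only, not Drupal's check_plain().
    """
    # quote=True also encodes " and ', which matters inside attribute values.
    return html.escape(text, quote=True)


user_input = '<script>alert("xss")</script>'
print(check_plain_sketch(user_input))
# prints: &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

The browser renders the encoded string as the visible text of the script tag instead of executing it.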

This image shows the data as is, before & after filtering. Special characters are encoded on filtering so the text will appear on the screen as stored.

How the data appear unfiltered (before the horizontal line) & filtered (after). When unfiltered, the script (the alert function) has already run and is not shown to the user. In an actual XSS attack the script would have run without the victim noticing it.

URIs

URIs are usually Plain Text, but some protocols should be filtered out (for example, 'javascript:' has been used for XSS attacks). There are different ways to filter a URI, depending on how you intend to use it later. If you just intend to print it (as an img src, for example), the function you should use is check_url(). It calls drupal_strip_dangerous_protocols(), which strips all protocols except ftp, http, https, irc, mailto, news, nntp, rtsp, sftp, ssh, tel, telnet and webcal, and then check_plain(), to make sure the text is safe to display.

But if you intend to pass it as an argument to t(), l(), drupal_attributes(), or another function that will call check_plain() separately, you should only use drupal_strip_dangerous_protocols(), to avoid entities being converted twice (see “Handle text in a secure fashion”).
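The two steps described above, and the double-encoding problem, can be sketched as follows. This is an illustration in Python, not Drupal's implementation: the function names are hypothetical, and the protocol list is simply the one quoted in the text.

```python
import html

# Protocols named in the text as allowed; any other scheme is stripped.
ALLOWED_PROTOCOLS = {
    "ftp", "http", "https", "irc", "mailto", "news", "nntp",
    "rtsp", "sftp", "ssh", "tel", "telnet", "webcal",
}


def strip_dangerous_protocols_sketch(uri: str) -> str:
    """Remove leading 'scheme:' prefixes that are not in the allow-list."""
    while True:
        colon = uri.find(":")
        if colon <= 0:
            return uri  # no scheme present (or the string starts with ':')
        scheme = uri[:colon]
        # A '/', '?' or '#' before the colon means the colon belongs to the
        # path, query or fragment, so this is a relative URI: leave it alone.
        if any(ch in scheme for ch in "/?#"):
            return uri
        if scheme.lower() in ALLOWED_PROTOCOLS:
            return uri
        uri = uri[colon + 1:]  # drop the disallowed scheme and check again


def check_url_sketch(uri: str) -> str:
    """Strip dangerous protocols, then entity-encode for HTML output."""
    return html.escape(strip_dangerous_protocols_sketch(uri), quote=True)


print(strip_dangerous_protocols_sketch("javascript:alert(1)"))
# prints: alert(1)

once = check_url_sketch("http://example.com/?a=1&b=2")
print(once)
# prints: http://example.com/?a=1&amp;b=2

# Encoding an already encoded string a second time corrupts the URL.
print(html.escape(once, quote=True))
# prints: http://example.com/?a=1&amp;amp;b=2
```

The last line shows why a URI that will be encoded later by another function should only have its protocol stripped, not be fully filtered first.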

For email addresses, both check_plain() and check_url() are solid choices (with the “mailto:” prefix they are URIs).

Again, the data before & after filtering. The encoding is the same as plain text.

And the results as shown to users, before & after filtering. Notice that in the protocol test, the unexpected protocol “javascript:” is filtered out.

Rich Text / HTML

In some other cases you want to display the text as formatted by the user, and you need to apply some filters to it. In this case we have 2 methods:

  • check_markup(), which allows you to use a Text Format, e.g. “Filtered HTML”, and apply its filters to your text.
  • filter_xss() or filter_xss_admin(), which remove characters & constructs that can trick browsers and be used for XSS, and allow only a defined set of tags. The difference between filter_xss() and filter_xss_admin() is that the second allows almost all tags that can be used in an HTML body, and it is intended for admin-only use.

Which of them should you use? Usually, if you want to apply the more complex filters provided by Drupal’s filter system, like “Convert line breaks into HTML (i.e. <br> and <p>)”, you should use check_markup(). But if you want to avoid loading the whole Drupal Text Format (and reduce complexity in execution), you should use filter_xss().
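To illustrate what “allows defined tags only” means, here is a rough allow-list filter in Python. It is a sketch of the general technique, not a port of filter_xss(): the class and function names, the tag list and the attribute policy (only href is kept, and only for http, https or relative links) are assumptions made for this example, and it is not hardened enough for production use.

```python
import html
from html.parser import HTMLParser

# Hypothetical allow-lists, chosen for this sketch only.
ALLOWED_TAGS = {"a", "em", "strong", "p", "ul", "ol", "li"}
SAFE_HREF_PREFIXES = ("http://", "https://", "/")


class AllowListFilter(HTMLParser):
    """Re-emit only allowed tags; everything else is dropped or encoded."""

    def __init__(self, allowed_tags):
        super().__init__(convert_charrefs=True)
        self.allowed_tags = set(allowed_tags)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed_tags:
            return  # drop the tag itself; its text content is still emitted
        rendered = ""
        for name, value in attrs:
            # Keep only href, and only when it points to a safe target.
            # Event handlers (onclick, ...) and style are discarded.
            if name == "href" and value and value.lower().startswith(SAFE_HREF_PREFIXES):
                rendered += f' href="{html.escape(value, quote=True)}"'
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in self.allowed_tags:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always entity-encoded before it is written back out.
        self.parts.append(html.escape(data, quote=False))


def filter_xss_sketch(markup, allowed_tags=ALLOWED_TAGS):
    parser = AllowListFilter(allowed_tags)
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


dirty = ('<p onclick="steal()">Hi <strong>there</strong> '
         '<a href="javascript:alert(1)">link</a><script>alert(2)</script></p>')
print(filter_xss_sketch(dirty))
# prints: <p>Hi <strong>there</strong> <a>link</a>alert(2)</p>
```

The script tag and the dangerous attributes disappear, while the formatting the user intended (the paragraph and the strong emphasis) survives.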

The results before and after filtering. In Rich text the function check_markup() has been used, while in HTML the function used for filtering is filter_xss().

The results as shown to the user’s browser.

Trusted

Well… if you actually want to allow the use of JavaScript for some reason (and do not want to filter your data), then you must make sure it is allowed only to 100% trusted users. Personally, I would allow this, if needed at all, for the administrator only, because a malicious user who has access to the administrator account already has dozens of ways to do anything they want.

Other uses

Another common case is when you want to pass data from any source to functions like t() or format_plural(). In this case you should always use placeholders for the additional data/arguments. In JavaScript, Drupal.t() and Drupal.formatPlural() can be used instead.
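The reason placeholders help is that the substitution step can escape each argument for you. The sketch below imitates that idea in Python. The name t_sketch is hypothetical, and the prefix meanings (@ escapes the value, % escapes and wraps it in emphasis, ! inserts it unchanged) are stated here as my understanding of the Drupal 7 convention, so check the t() documentation before relying on them.

```python
import html
import re


def t_sketch(template: str, args: dict) -> str:
    """Substitute placeholders, escaping each value according to its prefix."""
    replacements = {}
    for key, value in args.items():
        text = str(value)
        if key.startswith("@"):
            # Escaped as plain text.
            replacements[key] = html.escape(text, quote=True)
        elif key.startswith("%"):
            # Escaped and wrapped in emphasis markup.
            escaped = html.escape(text, quote=True)
            replacements[key] = f'<em class="placeholder">{escaped}</em>'
        elif key.startswith("!"):
            # Inserted as is: only for values that are already safe.
            replacements[key] = text
        else:
            raise ValueError(f"unknown placeholder prefix in {key!r}")
    if not replacements:
        return template
    # One pass, longest keys first, so substituted values are never rescanned
    # and '@name' cannot clobber '@name_full'.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], template)


print(t_sketch("Hello @name!", {"@name": "<script>alert(1)</script>"}))
# prints: Hello &lt;script&gt;alert(1)&lt;/script&gt;!
```

Concatenating the raw value into the string instead would have sent the script tag to the browser untouched.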

There are also some functions and APIs that sanitize their input text themselves (like menu item or breadcrumb titles, some Form API (FAPI) elements, or functions with placeholders), usually by passing it through check_plain(). When using these, filtering the data again might result in unwanted output. See “Handle text in a secure fashion” for additional information.

Security is never enough…

Remember that this post only covers a small part of Drupal security, and that you can never have a 100% secure site. But you can do a few things to avoid the most common vulnerabilities. For more security configuration advice you should read Securing your Site in Drupal’s documentation, and if you are writing code for your Drupal project (or, why not, for the Drupal community), read the Writing Secure Code section.

[Image by: John Maravelakis]