Data sanitization: Best Practices with code examples

I am trying to understand data sanitization (not data validation) to help me write secure themes for WordPress. I have searched the Internet for a comprehensive guide for theme developers detailing best practices. I came across a couple of resources, including the Codex page titled Data Validation, but none were useful to me. The Codex page lists the available sanitization functions, their usage and what they do, but fails to explain why you would use one over another, or in which situations you would use a particular sanitization function. The purpose of this post is to ask everyone to contribute examples of bad/unsanitized code and how it should be rewritten for proper sanitization. This could be general code to sanitize a post title or a post thumbnail's src, or more elaborate code that handles sanitization of $_POST data for Ajax requests.

Additionally, I’d like to know whether WordPress functions for adding/updating the database (e.g. the ones mentioned in the code block below) automatically take care of the sanitization work for you? If yes, then are there any exceptions when you would take additional measures to sanitize data sent to these WordPress functions?

add_user_meta
update_user_meta
add_post_meta
update_post_meta
//just to name a few

Also, does sanitization need to be done differently when echoing HTML in PHP as opposed to using inline PHP within HTML? To be clearer about what I am asking, here’s the code:

<?php echo '<div class="some-div ' . $another_class . '" data-id="' . $id . '" >' . $text . '</div>'; ?>

<div class="some-div <?php echo $another_class; ?>" data-id="<?php echo $id; ?>"><?php echo $text; ?></div>

Both of the above statements achieve the same thing. But do they need to be sanitized differently?

Solutions collected from the web for "Data sanitization: Best Practices with code examples"

This codex page explains it pretty well I think.

The most important and commonly used function is probably esc_attr. Take this example:

<a href="<?php print $author_url; ?>" title="<?php print $author_name; ?>"> 
  <?php print $author_name; ?>
</a>

If $author_name contains a " character, it closes the title attribute early, and if that character is followed by something like onclick="do_something();", the browser treats it as a real attribute and it could get worse 🙂

Doing print esc_attr($author_name) ensures that such characters are encoded, and the browser doesn’t do things it is not supposed to do.
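As a rough sketch of the same link with each value escaped for the place it lands in: esc_url() for the href, esc_attr() for the title attribute, and esc_html() for the link text are all WordPress functions. The helper name mytheme_author_link() is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper; the name is illustrative, not part of WordPress.
function mytheme_author_link( $author_url, $author_name ) {
    return sprintf(
        '<a href="%s" title="%s">%s</a>',
        esc_url( $author_url ),   // URL context: rejects disallowed schemes such as javascript:
        esc_attr( $author_name ), // attribute context: encodes quotes so the attribute cannot be closed early
        esc_html( $author_name )  // element content: encodes <, > and &
    );
}
```

In a template this would be used as echo mytheme_author_link( $author_url, $author_name ); so the escaping happens as late as possible, right at output.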

There’s one case where you don’t need esc_attr: when you are expecting a number. In that case you can just cast the input to an integer, for example:

print (int)$_POST['some_number'];


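The meta* functions you listed there already take care of escaping the input for database storage, so you don’t need to worry about SQL injection with them. They do not make the stored value safe to print, so it still needs escaping when you output it.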
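One situation where extra work is still worthwhile is a value that will later be printed in a template. A minimal sketch, assuming a plain-text field: the helper names and the mytheme_subtitle meta key are hypothetical, while sanitize_text_field(), update_post_meta(), get_post_meta() and esc_html() are WordPress functions.

```php
<?php
// Hypothetical helpers and meta key, for illustration only.
function mytheme_save_subtitle( $post_id, $raw_subtitle ) {
    // Sanitize on the way in: strips tags and extra whitespace from a plain-text value.
    $subtitle = sanitize_text_field( $raw_subtitle );
    update_post_meta( $post_id, 'mytheme_subtitle', $subtitle );
}

function mytheme_the_subtitle( $post_id ) {
    $subtitle = get_post_meta( $post_id, 'mytheme_subtitle', true );
    // Escape on the way out, even though the value was sanitized when saved.
    echo '<p class="subtitle">' . esc_html( $subtitle ) . '</p>';
}
```

Sanitizing on save keeps unwanted markup out of the database, and escaping on output covers values that reached the database by some other route.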

The $wpdb->prepare() method needs to be used when you run database queries yourself. Here’s an example:

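// $wpdb->posts resolves to the posts table with the site's table prefix.
$sql = $wpdb->prepare(
    "UPDATE {$wpdb->posts} SET post_title = %s WHERE ID = %d",
    $_POST['title'], // %s: escaped and quoted as a string
    $_POST['id']     // %d: cast to an integer
);

$wpdb->query($sql);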

The %s and %d placeholders get replaced with your $_POST values: %s is escaped and quoted as a string, and %d is cast to an integer.

A very common mistake I see in many plugins in the WP.org repository is to pass prepare() a query that already has the values concatenated into it (and badly at that), like:

$wpdb->prepare('UPDATE wp_posts SET post_title = \''.$_POST['title'].' WHERE ...

Don’t do this 🙂

Also, does sanitization need to be done differently when echoing HTML
in PHP as opposed to using inline PHP within HTML?

Both of the above statements
achieve the same thing. But do they need to be sanitized differently?

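No. What matters is where the value ends up in the HTML (attribute, element content, URL), not whether PHP echoes the whole string or is embedded inline.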
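As a rough illustration, here are both styles from the question with the same escaping applied. The function names are hypothetical, and treating $id as an integer and $text as plain text is an assumption about the data; esc_attr() and esc_html() are the WordPress functions.

```php
<?php
// Style 1: build the markup as a PHP string and echo it.
function mytheme_render_echo( $another_class, $id, $text ) {
    echo '<div class="some-div ' . esc_attr( $another_class ) . '" data-id="' . (int) $id . '">' . esc_html( $text ) . '</div>';
}

// Style 2: inline PHP inside the markup. The escaping calls are identical.
function mytheme_render_inline( $another_class, $id, $text ) {
    ?><div class="some-div <?php echo esc_attr( $another_class ); ?>" data-id="<?php echo (int) $id; ?>"><?php echo esc_html( $text ); ?></div><?php
}
```

Both functions print the same markup for the same input, so the choice between them is purely a matter of readability.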

This video by Mark Jaquith cleared it all up for me. http://wordpress.tv/2011/01/29/mark-jaquith-theme-plugin-security/