Relaxing unescaped HTML filtering inside <pre> tags?

By default, WordPress strips anything that looks like unescaped HTML from comments by unregistered users. That is good protection against XSS, but the filtering needlessly extends into <pre> elements too. On my blog, where almost every post attracts comments that benefit from HTML code snippets, this has caused my users (and me) a lot of frustration.

Is there a way to “fix” that overly aggressive filtering inside <pre> elements within unregistered comments, without disabling it for the rest of the comment? Preferably, in a way that survives upgrades.


A small solution; on my blog, the syntax highlighting itself was done via JavaScript.

function pre_esc_html($content) {
  return preg_replace_callback(
    '#(<pre.*?>)(.*?)(</pre>)#imsu',
    // Escape only the text between the opening and closing <pre> tags
    function ($i) {
      return $i[1] . esc_html($i[2]) . $i[3];
    },
    $content
  );
}

add_filter(
  'the_content',
  'pre_esc_html',
  9
);
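
The filter above runs on post content at display time. For comments from unregistered users the stripping most likely happens when the comment is saved, so the same escaping would have to run before that step. Below is a minimal sketch of that idea, not a tested plugin. It assumes that WordPress attaches its KSES comment filter to `pre_comment_content` at the default priority of 10, that the data is still slashed at that point, and that `<pre>` itself is an allowed tag in comments on your site. The function names `escape_pre_blocks` and `escape_pre_in_comment` are made up for illustration.

```php
<?php
// Escape everything between <pre ...> and </pre> so a later sanitizer
// sees harmless entities instead of tags it would strip.
function escape_pre_blocks(string $html): string {
    $result = preg_replace_callback(
        '#(<pre\b[^>]*>)(.*?)(</pre>)#is',
        function (array $m): string {
            // double_encode = false leaves already-escaped entities such as &lt; alone.
            return $m[1] . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8', false) . $m[3];
        },
        $html
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result === null ? $html : $result;
}

// Assumption: comment data is slashed when 'pre_comment_content' fires,
// so unslash before escaping and re-slash afterwards.
function escape_pre_in_comment(string $content): string {
    return addslashes(escape_pre_blocks(stripslashes($content)));
}

// Register only when running inside WordPress; priority 9 is meant to run
// just before the default-priority comment sanitizer.
if (function_exists('add_filter')) {
    add_filter('pre_comment_content', 'escape_pre_in_comment', 9);
}
```

Because the escaping happens once, at save time, the stored comment already contains entities and nothing special is needed when it is displayed. Keeping `escape_pre_blocks()` free of WordPress calls also means it can be tried on sample strings outside WordPress.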

Although this might be a little more than you’re looking for, WP-Syntax disables HTML filtering within <pre> tags inside posts and comments (AFAIK). It also works for WordPress 3.0, even though the website says it works only with 2.8.

If you’re looking to make it simpler, I suggest looking in wp-syntax.php within the plugin (specifically at the very bottom, where they call add_filter()) to see how they disable WordPress’s automatic HTML filtering within <pre> tags. You can then apply that to comments.

EDIT: I have looked at the file, and they use regex and PHP’s preg_replace_callback() to preserve the original HTML within <pre> tags. You might have to modify it to fit your needs.

You would have, for example (note: untested code):

<?php
// Unique string for placeholder
$custom_token = md5(uniqid(rand()));

// Store all the matches in an array
$custom_matches = array();

function custom_substitute($match) {
    global $custom_token, $custom_matches;

    $i = count($custom_matches);

    // Store the match for later use
    $custom_matches[$i] = $match;

    // Unique placeholder carrying the index, so that we know which block to put back
    return '<p>' . $custom_token . sprintf('%03d', $i) . '</p>';
}

function custom_replace($match) {
    global $custom_matches;

    // $match[1] is the three-digit index captured from the placeholder
    $i = intval($match[1]);
    $stored = $custom_matches[$i];

    // $stored[1] is the opening <pre> tag, $stored[2] the raw code inside it
    return $stored[1] . htmlentities($stored[2]) . '</pre>';
}

function custom_before_content_filter($content) {
    // Capture the opening tag (with any attributes) and the contents of each <pre> block
    return preg_replace_callback("/\s*(<pre\b[^>]*>)(.*)<\/pre>\s*/siU", 'custom_substitute', $content);
}

function custom_after_content_filter($content) {
    global $custom_token;

    // Match each placeholder and capture its index
    return preg_replace_callback("/<p>\s*" . $custom_token . "(\d{3})\s*<\/p>/si", 'custom_replace', $content);
}
// Run the "before" filter first, with priority of 0
add_filter('comment_text', 'custom_before_content_filter', 0);

// Now run the "after" filter
add_filter('comment_text', 'custom_after_content_filter', 99);
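
To check the stash-and-restore idea outside WordPress, here is a self-contained sketch of the same technique. The `PreStash` class is a hypothetical helper, not part of WP-Syntax or WordPress, and `strip_tags()` only stands in for whatever filtering runs between the two steps. It keeps the token and the stored blocks on an object instead of in globals.

```php
<?php
// Hypothetical helper: stashes <pre> blocks, lets another filter run,
// then restores the blocks with their contents escaped.
final class PreStash {
    private string $token;
    private array $blocks = [];

    public function __construct() {
        // Random token that is unlikely to appear in real content.
        $this->token = 'PRESTASH' . bin2hex(random_bytes(8));
    }

    // Replace each <pre>...</pre> block with "<token><index>;".
    public function stash(string $html): string {
        return preg_replace_callback(
            '#<pre\b[^>]*>(.*?)</pre>#is',
            function (array $m): string {
                $this->blocks[] = $m[1];
                return $this->token . (count($this->blocks) - 1) . ';';
            },
            $html
        ) ?? $html;
    }

    // Swap each placeholder back for an escaped <pre> block.
    public function restore(string $html): string {
        return preg_replace_callback(
            '#' . preg_quote($this->token, '#') . '(\d+);#',
            function (array $m): string {
                $inner = $this->blocks[(int) $m[1]] ?? '';
                return '<pre>' . htmlspecialchars($inner, ENT_QUOTES, 'UTF-8') . '</pre>';
            },
            $html
        ) ?? $html;
    }
}

// strip_tags() is only a stand-in for the real comment filtering.
$stash  = new PreStash();
$input  = 'Try <pre><a href="#">link</a></pre> and <script>alert(1)</script>';
$output = $stash->restore(strip_tags($stash->stash($input)));
echo $output, PHP_EOL;
```

The markup inside the <pre> block survives as escaped text, while tags outside it are still removed by the stand-in filter. Note that this sketch drops any attributes on the original <pre> tag, which the version above preserves.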