CSI: PHP

"Looking at your tweets I cannot even fathom what your job is. CSI:PHP?" — @grmpyprogrammer

Stop Writing Your Own Strip Tags


@devnuhl sent this one to us via a Gist earlier today.

Some background he gave us:

Honestly just stumbled onto this while updating the codebase. Oddly enough, it seems like the only usage of this function is in another function in the same file, which is being added to the gist now. Having gotten rid of some glaring errors in the regular expressions, I find it interesting that preg_replace is used in the one, but not the other. Just replace \s+ with a space.

Here is the original code:

<?php
#...
if (function_exists('strip_html_tags') === FALSE) {
    function strip_html_tags($str) {
        $replacements = array(
            '#<head[^>]*?>.*?</head>#siu',
            '#<style[^>]*?>.*?</style>#siu',
            '#<script[^>]*?>.*?</script>#siu',
            '#<noscript[^>]*?>.*?</noscript>#siu',
        );
        $str = preg_replace('/(<|>)\1{2}/is', '', $str);
        $str = preg_replace($replacements, "", $str);
        $str = replace_whitespace($str);
        $str = strip_tags($str);
        return $str;
    }
}
if (function_exists('replace_whitespace') === FALSE) {
    function replace_whitespace($str) {
        $result = $str;
        $empties = array(
            " \t",
            " \r",
            " \n",
            "\t\t",
            "\t ",
            "\t\r",
            "\t\n",
            "\r\r",
            "\r ",
            "\r\t",
            "\r\n",
            "\n\n",
            "\n ",
            "\n\t",
            "\n\r",
        );
        foreach ($empties as $replacement) {
            $result = str_replace($replacement, ' ', $result);
        }
        return ($str !== $result) ? replace_whitespace($result) : $result;
    }
}

The only hope I have for the original author is that they wrote this to fight back against some ridiculous encoding issues when it came to whitespace.

Assuming you don’t have ridiculous encoding issues, here are some better alternatives for dealing with whitespace:

Trim (PHP Docs)

<?php
if (function_exists('replace_whitespace') === FALSE) {
    function replace_whitespace($str) {

        return trim($str);
    }
}

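You do not need a wrapper function to do this; you can simply call trim($str) wherever you need it in your application.

As the docs state, without a second parameter trim() will strip these characters from the beginning and end of the string (whitespace in the middle is left alone):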

  • “ ” (ASCII 32 (0x20)), an ordinary space.
  • “\t” (ASCII 9 (0x09)), a tab.
  • “\n” (ASCII 10 (0x0A)), a new line (line feed).
  • “\r” (ASCII 13 (0x0D)), a carriage return.
  • “\0” (ASCII 0 (0x00)), the NUL-byte.
  • “\x0B” (ASCII 11 (0x0B)), a vertical tab.
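
To make the "both ends only" behaviour concrete, here is a minimal sketch. The sample string is made up for illustration and is not from the original Gist.

```php
<?php

// Hypothetical input: whitespace at both ends and in the middle.
$str = "  \tHello,\n\n  World\r\n";

// trim() strips the default characters from the two ends only.
$trimmed = trim($str);

// The interior "\n\n  " survives untouched.
var_dump($trimmed === "Hello,\n\n  World"); // bool(true)
```

If only one side needs cleaning, ltrim() and rtrim() take the same arguments.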

String Replace (PHP Docs) or Preg Replace (PHP Docs)

Again, with this approach you do not need to wrap the call in its own function.

To remove a space “ ” (ASCII 32 (0x20)):

<?php

$str = str_replace(' ', '', $str);
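
str_replace() also accepts an array of search strings, so the foreach loop in the original is not needed even if you want to stay away from regular expressions. A minimal sketch, with a made-up sample string and an illustrative choice of characters:

```php
<?php

// Hypothetical input containing a space, a tab and a Windows line ending.
$str = "a b\tc\r\nd";

// One call handles every search string in the array.
$stripped = str_replace(array(' ', "\t", "\r", "\n"), '', $str);

var_dump($stripped === 'abcd'); // bool(true)
```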

To remove all whitespace, we can use a regular expression:

<?php

$str = preg_replace('/\s+/', '', $str);
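
If the goal is to collapse runs of whitespace into a single space, which is what @devnuhl suggests and what the original replace_whitespace() appears to be aiming for, replace with ' ' instead of ''. This is only a sketch: the helper name collapse_whitespace is hypothetical, and it is not a drop-in replacement, since the original leaves some cases alone (a lone "\n", or two consecutive spaces).

```php
<?php

// Hypothetical helper name; not part of the original codebase.
function collapse_whitespace($str) {
    // Turn every run of whitespace into one space, then trim both ends.
    return trim(preg_replace('/\s+/', ' ', $str));
}

echo collapse_whitespace("Hello, \t\r\n  World\n"); // Hello, World
```

In practice you would probably skip the helper and inline the preg_replace() call where it is needed.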

More discussion can be found on Stack Overflow.

Recap

Rolling your own version of functionality that PHP already provides is silly. If you believe you need to do this, do some searching first. You have probably either found an edge case or need to do some refactoring.

Thanks for Reading. Happy Coding!
