php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Request #69886 Addressing problems with escaping (security)
Submitted: 2015-06-19 15:35 UTC Modified: -
Votes:2
Avg. Score:3.0 ± 2.0
Reproduced:1 of 2 (50.0%)
Same Version:1 (100.0%)
Same OS:1 (100.0%)
From: craig at craigfrancis dot co dot uk Assigned:
Status: Open Package: Strings related
PHP Version: Irrelevant OS:
Private report: No CVE-ID: None
 [2015-06-19 15:35 UTC] craig at craigfrancis dot co dot uk
Description:
------------
The major security vulnerabilities with websites are STILL Injection/XSS related... see OWASP top 10 for 2010 and 2013 (A1 and A3).

Many developers still have no idea about the security vulnerabilities when they do things like:

    echo $_GET['name'];

Prepared statements try (and fail) to solve this:

    $mysqli->prepare('SELECT name FROM table WHERE id = ' . $_GET['id']);

ORM's try (and fail) to address this:

    $conn->createQueryBuilder()
        ->select('u.id')
        ->addSelect('p.id')
        ->from('users', 'u')
        ->leftJoin('u', 'phonenumbers', '
            u.id = p.user_id AND
            p.type = ' . $_GET['type']);

Templating systems try (and fail) to fix this:

    {{ var }}

As an aside... "|escape" or "autoescape" is still not the default in Twig?

--------------------------------------------------

I'm proposing an error_reporting() mode (or extending E_STRICT) that when be enabled, should help identify (highlight?) the most common mistakes.

Importantly, the developer won't need to see any of this... they just enable it, and the logs get filled with PHP notices, telling them to fix these mistakes.

--------------------------------------------------

Internally (zval), every PHP string gets marked with a escaping type.

A simple string defined in the PHP code itself (T_CONSTANT_ENCAPSED_STRING) would be a ETYPE_CONSTANT... this is a special type of string, where we know the developer has typed it in the PHP code themselves (safe... ish).

Any string from the outside world (request parameters, database, files, etc) would be an ETYPE_PLAIN... this is pretty much the default for all new strings.

Any string that is encoded/escaped, will have its own ETYPE... e.g. the return strings from these functions would be:

    htmlentities() -> ETYPE_HTML
    mysqli_real_escape_string() -> ETYPE_SQL
    urlencode() -> ETYPE_URL
    escapeshellarg() -> ETYPE_SHELL
    escapeshellcmd() -> ETYPE_SHELL
    preg_quote() -> ETYPE_PREG

PHP will be configured (ini_set) with an output type... as we are normally creating websites, the default output would be ETYPE_HTML, but a CLI script might be ETYPE_PLAIN.

And there would need to be a function to override the type for special cases:

    $result = mysqli_query('SELECT name, bio_html FROM person WHERE id = 3');

    if ($row = mysqli_fetch_assoc($result)) {

        $name_html = htmlentities($row['name']);
        $bio_html = string_encoding_set($row['bio_html'], ETYPE_HTML);

            // Assume bio_html has already
            // been though htmlpurifier :-)

    }

--------------------------------------------------

When concatenating/printing the strings, PHP would do a simple type check, e.g.

    $a = 'Hi ' . $_GET['name'];

        // ETYPE_CONSTANT + ETYPE_PLAIN
        // This is fine,
        // $a is now ETYPE_PLAIN.

    echo $a;

        // Log error!
        // The output is ETYPE_HTML,
        // We can't echo a ETYPE_PLAIN!

    $b = $_GET['c'] . htmlentities($_GET['d']);

        // ETYPE_PLAIN + ETYPE_HTML
        // Log error!
        // Cannot mix types like this.

    $sql = 'SELECT
                *
            FROM
                table
            WHERE
                id = ' . $_GET['id'];

        // ETYPE_CONSTANT + ETYPE_PLAIN + ETYPE_CONSTANT
        // This is fine (for now).
        // $sql is now ETYPE_PLAIN.

    mysqli_query(sql);

        // Log error!
        // Cannot accept ETYPE_PLAIN,
        // It must be ETYPE_SQL!

    $sql = 'SELECT
                *
            FROM
                table
            WHERE
                id = "' . mysqli_real_escape_string($_GET['id']) . '"';

        // ETYPE_CONSTANT + ETYPE_SQL + ETYPE_CONSTANT
        // This is fine.
        // $sql is now ETYPE_SQL.

    mysqli_query(sql);

        // Good :-)

--------------------------------------------------

The intention is to identify when escaped strings are being mixed with un-escaped strings... and when un-escaped strings are incorrectly passed to certain functions (or the output).

This isn't a complete solution, but I believe it will catch a lot of errors the other solutions do not (cannot?) address.

I am also aware of ValueObjects... I'm fairly sure these would only get you half way there, and suffer from the same problems as the existing solutions.

Some of the problems (which you might be able to come up with solutions for), are listed below in the comments.


Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2015-06-19 15:36 UTC] craig at craigfrancis dot co dot uk
PHP won't notice the missing quotes

    $sql = 'SELECT ' . mysqli_real_escape_string($_GET['a'])

Maybe a database abstraction (ORM?) could define it's own ETYPE that is an escaped string with quote marks already added?

    ./script.php?name=Jo

    $name_sql = $db->escape($_GET['name']);

Which returns the ETYPE_SQL_QUOTED value of '"Jo"'?

--------------------------------------------------

Building paths to resources might be tricky:

    ./script.php?file=../../index.php

    function safe_file_name($name) {
        $name = preg_replace('/[^a-zA-Z0-9_\- ]/', '', $name);
        string_encoding_set($name, 'path');
        return $name;
    }

    $path = '/path/to/' . safe_file_name($_GET['file']);

        // ETYPE_CONST + ETYPE_PATH

    unlink($path);

Note that 'ETYPE_PATH' hasn't been defined... yet?

Maybe a helper object would be needed for this? something that will understand directory structures?

--------------------------------------------------

Probably more for the templating engine to deal with, but:

    $url_html = htmlentities(...);
    $name_html = htmlentities(...);

    <a href="<?= $url_html ?>">xxx</a>
    <input type="text" value=<?= $name_html ?> />

Both are valid with the above rules, but the url example could allow "javascript:", and the second can be "Jo onclick=javascript"
 [2020-12-26 10:54 UTC] craig at craigfrancis dot co dot uk
Full RFC and discussion at:

https://wiki.php.net/rfc/is_literal

https://github.com/craigfrancis/php-is-literal-rfc
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Mon Dec 30 14:01:28 2024 UTC