Request #79359 Literal checking
Submitted: 2020-03-09 13:30 UTC Modified: 2020-03-09 13:43 UTC
From: craig at craigfrancis dot co dot uk Assigned:
Status: Suspended Package: *General Issues
PHP Version: Next Major Version OS:
Private report: No CVE-ID: None

 [2020-03-09 13:30 UTC] craig at craigfrancis dot co dot uk
Description:
------------
Following up on the PHP Internals mailing list[1], and a similar idea by Matt Tait[2].

PHP should allow developers to check whether a variable was created only from literals.

By checking a variable with `is_literal()`, we could enforce the use of parameterised SQL queries at runtime.

It would also help ORMs ensure they don't introduce vulnerabilities [3].

This is not the same as taint checking [4], which lets you call untaint(), and does not protect against issues like missing quotes:

  $sql = 'DELETE FROM ... WHERE id = ' . mysqli_real_escape_string($db, $_GET['id']);

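  /delete.php?id=id

Escaping leaves `id` unchanged, and because the value is not quoted, the query becomes `... WHERE id = id`, which matches every row.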
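A runnable sketch of that request. Because mysqli_real_escape_string() needs a live connection, addslashes() is used here as an assumed stand-in. Both functions escape only characters such as quotes and backslashes, so a bare word passes through untouched. The table name `example` is illustrative.

```php
<?php
// addslashes() stands in for mysqli_real_escape_string(), which requires an
// open connection; neither changes a plain word such as "id".
$user_id = 'id'; // what /delete.php?id=id puts in $_GET['id']

$sql = 'DELETE FROM example WHERE id = ' . addslashes($user_id);

// The value was never quoted, so escaping had nothing to protect:
// the input is parsed as the column name, and "id = id" is true for every row.
echo $sql, "\n";
```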

Note that string escaping is only "theoretically safe" [5], typically because of character-encoding issues.

And while SQL injection is the easiest case to demonstrate, this can also protect against command-line injection and, to a certain extent, HTML injection. Both benefit from having a string known to be safe (made from literals), with user values supplied separately.
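For the command-line case, a rough sketch of "user values supplied separately". It assumes PHP 7.4+, where proc_open() accepts an array so that no shell is involved, and a CLI build in which PHP_BINARY points at the php executable. The child process simply echoes its argument back, which shows that the hostile-looking value arrives as one plain argument and is never interpreted by a shell.

```php
<?php
// Sketch, assuming PHP >= 7.4 (array form of proc_open) and a CLI PHP_BINARY.
// The command itself is a fixed list of literals; the user value is a separate
// argv entry, so there is no shell to inject into.
$user_value = 'hello; rm -rf /tmp/nothing'; // hostile-looking input

$command = [PHP_BINARY, '-r', 'echo $argv[1];', '--', $user_value];

$process = proc_open($command, [1 => ['pipe', 'w']], $pipes);
if (!is_resource($process)) {
    throw new RuntimeException('Could not start process.');
}
$output = stream_get_contents($pipes[1]); // child's stdout
fclose($pipes[1]);
proc_close($process);

echo $output, "\n";
```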

Internally, this would need a flag on every variable, and a single `is_literal()` function to check whether a given variable has been created only from literal(s). Unlike the taint extension, there should be no way to override this. Certain functions (e.g. mysqli_query) might later use this information to raise an error, warning, or notice.
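The flag can be modelled in userland to show how it would propagate. This is only an illustration: the proposal places the flag inside the engine, where it cannot be forged. `LiteralTracked` and `guarded_query()` are hypothetical names. The assumed rule is that a concatenation is literal only if both of its parts were literal.

```php
<?php
// Userland model only; the real proposal tracks this flag in the engine.
// LiteralTracked and guarded_query() are hypothetical names.
final class LiteralTracked
{
    private $value;
    private $literal;

    private function __construct(string $value, bool $literal)
    {
        $this->value = $value;
        $this->literal = $literal;
    }

    // Stands in for a string written directly in the source code.
    public static function literal(string $value): self
    {
        return new self($value, true);
    }

    // Stands in for any value from outside the script (e.g. $_GET).
    public static function external(string $value): self
    {
        return new self($value, false);
    }

    // Concatenation keeps the flag only if both sides were literal.
    public function concat(self $other): self
    {
        return new self($this->value . $other->value, $this->literal && $other->literal);
    }

    public function isLiteral(): bool
    {
        return $this->literal;
    }

    public function value(): string
    {
        return $this->value;
    }
}

// What a function such as mysqli_query() might do with the flag.
function guarded_query(LiteralTracked $sql): string
{
    if (!$sql->isLiteral()) {
        throw new InvalidArgumentException('SQL must be built from literals only.');
    }
    return $sql->value(); // a real implementation would send this to the database
}

$safe = LiteralTracked::literal('SELECT * FROM example WHERE id = ')
    ->concat(LiteralTracked::literal('?'));
$unsafe = LiteralTracked::literal('DELETE FROM example WHERE id = ')
    ->concat(LiteralTracked::external('id'));
```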

A similar idea is being discussed for JavaScript by TC39 [6], to support the introduction of Trusted Types.

[1] https://news-web.php.net/php.internals/108537
    https://news-web.php.net/php.internals/106625
    https://news-web.php.net/php.internals/106631

[2] https://wiki.php.net/rfc/sql_injection_protection

[3] https://framework.zend.com/security/advisory/ZF2014-04
    https://framework.zend.com/security/advisory/ZF2016-03

[4] https://github.com/laruence/taint

[5] https://www.php.net/manual/en/pdo.quote.php

[6] https://github.com/tc39/proposal-array-is-template-object
    https://github.com/mikewest/tc39-proposal-literals

Test script:
---------------
<?php

  define('TABLE', 'example');

  $ids = [1, 2, 3]; // Example IDs, later supplied to the query as parameters.

  $in_sql = substr(str_repeat('?,', count($ids)), 0, -1); // To create '?,?,?'

  $sql = 'SELECT * FROM ' . TABLE . ' WHERE id IN (' . $in_sql . ')';

  is_literal($sql); // Proposed function; would return true

  $sql .= ' AND id = ' . mysqli_real_escape_string($db, $_GET['id']); // $db: an open mysqli connection

  is_literal($sql); // Would return false

?>
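The first half of the test script is the pattern `is_literal()` is meant to encourage: SQL built only from literals, with the values passed separately. Below is a sketch of executing such a query. It uses PDO with an in-memory SQLite database (this assumes the pdo_sqlite extension) instead of mysqli so it runs without a server. The table and rows are made up.

```php
<?php
// Sketch, assuming the pdo_sqlite extension; table and data are illustrative.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE example (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO example (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')");

$ids = [1, 3, '0) OR (1=1']; // the last value is hostile, but is only ever data

// The SQL string is made only from literals: the table name and '?' placeholders.
$in_sql = substr(str_repeat('?,', count($ids)), 0, -1); // '?,?,?'
$sql = 'SELECT name FROM example WHERE id IN (' . $in_sql . ') ORDER BY id';

$stmt = $pdo->prepare($sql);
$stmt->execute($ids); // values travel separately from the SQL text
$names = $stmt->fetchAll(PDO::FETCH_COLUMN);
```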


History

 [2020-03-09 13:33 UTC] nikic@php.net
-Status: Open +Status: Suspended
 [2020-03-09 13:33 UTC] nikic@php.net
Please continue this discussion on the internals list, this bug tracker is not suitable for extended discussions that require going through the RFC process.
 [2020-03-09 13:40 UTC] craig at craigfrancis dot co dot uk
As for how this could be used for HTML:

Start with the template defined as a literal, and the values supplied separately:

<?php

  $template_html = '
    <p>Hello <span id="username">?</span></p>
    <p><a>Website</a></p>';

  $values = [
      '//span[@id="username"]' => [
          NULL      => 'Name', // A NULL key is stored as '' and sets the text content
          'class'   => 'admin',
          'data-id' => '123',
        ],
      '//a' => [
          'href' => 'https://example.com',
        ],
    ];

?>

Then the templating engine can do the necessary checks, and know for certain that the HTML string itself is safe:

<?php

  function template_parse($html, $values) {

    if (!is_literal($html)) {
      throw new Exception('Invalid Template HTML.');
    }

    $dom = new DOMDocument();
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    $xpath = new DOMXPath($dom);

    foreach ($values as $query => $attributes) {

      if (!is_literal($query)) {
        throw new Exception('Invalid Template XPath.');
      }

      foreach ($xpath->query($query) as $element) {
        foreach ($attributes as $attribute => $value) {

          if (!is_literal($attribute)) {
            throw new Exception('Invalid Template Attribute.');
          }

          if ($attribute) {
            $safe = false;
            if ($attribute == 'href') {
              if (preg_match('/^https?:\/\//', $value)) {
                $safe = true; // Not "javascript:..."
              }
            } else if ($attribute == 'class') {
              if (in_array($value, ['admin', 'important'])) {
                $safe = true; // Only allow specific classes?
              }
            } else if (preg_match('/^data-[a-z]+$/', $attribute)) {
              if (preg_match('/^[a-z0-9 ]+$/i', $value)) {
                $safe = true;
              }
            }
            if ($safe) {
              $element->setAttribute($attribute, $value);
            }
          } else {
            $element->textContent = $value;
          }

        }
      }

    }

    $html = '';
    $body = $dom->documentElement->firstChild;
    if ($body->hasChildNodes()) {
      foreach ($body->childNodes as $node) {
        $html .= $dom->saveXML($node);
      }
    }

    return $html;

  }

  echo template_parse($template_html, $values);

?>
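To see why the template string itself has to be literal, compare what DOMDocument does with the same user value in two cases: concatenated into the markup, or assigned as text content as template_parse() does. This is a sketch. The `$name` value is an invented attack string, and libxml warnings are suppressed only to keep the output quiet.

```php
<?php
libxml_use_internal_errors(true); // keep libxml parse warnings out of the output

$name = '<img src=x onerror=alert(1)>'; // invented hostile value

// Unsafe: the value is concatenated into the template, so it becomes markup.
$dom1 = new DOMDocument();
$dom1->loadHTML('<p>Hello ' . $name . '</p>');
$imgs1 = $dom1->getElementsByTagName('img')->length;

// Safe: the template is a literal, and the value is assigned as text.
$dom2 = new DOMDocument();
$dom2->loadHTML('<p>Hello <span>?</span></p>');
$dom2->getElementsByTagName('span')->item(0)->textContent = $name;
$imgs2 = $dom2->getElementsByTagName('img')->length;
$html2 = $dom2->saveHTML($dom2->getElementsByTagName('p')->item(0));

echo $html2, "\n"; // the value appears escaped, inside the span
```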
 [2020-03-09 13:43 UTC] requinix@php.net
-Package: PHP Language Specification +Package: *General Issues
 [2020-03-23 16:49 UTC] craig at craigfrancis dot co dot uk
I've written this up as an RFC:

https://wiki.php.net/rfc/is_literal

And mentioned it on the internals list.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Fri Mar 29 10:01:28 2024 UTC