Doc Bug #79481 data:// wrapper can hide some characters
Submitted: 2020-04-16 05:47 UTC
Modified: 2020-04-16 07:54 UTC
From: j7ur8 at qq dot com
Assigned:
Status: Verified
Package: Streams related
PHP Version: 7.4.4
OS: Windows Linux
Private report: No
CVE-ID: None
 [2020-04-16 05:47 UTC] j7ur8 at qq dot com
Description:
------------
From `https://www.php.net/manual/en/wrappers.data.php#refsect1-wrappers.data-description`, I understand that the data:// wrapper follows RFC 2397, and I suspect its behavior may not be defined securely.

From RFC 2397

Syntax:
       dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
       mediatype  := [ type "/" subtype ] *( ";" parameter )
       data       := *urlchar
       parameter  := attribute "=" value

If <mediatype> is omitted, it defaults to `text/plain;charset=US-ASCII`. That means we can change it freely; for example, `data:text/plain;charset=iso-8859-7,%be%fg%be` would use ISO-8859-7 to handle the data. However, I do not think PHP fully supports this, with the result that characters can be hidden in the data wrapper stream.
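
To make the grammar above concrete, here is a minimal sketch that splits a data URL into the parts RFC 2397 names. The helper `split_data_url()` is hypothetical and not part of PHP; it does no validation of the media type or parameters, leaves the data percent-encoded, and does not accept the `data://` spelling that PHP's wrapper tolerates.

```php
<?php
// Hypothetical helper (not a PHP built-in): split a data URL along the
// RFC 2397 grammar without decoding or validating anything.
function split_data_url(string $url): ?array
{
    if (strncasecmp($url, 'data:', 5) !== 0) {
        return null; // not a data URL
    }
    $rest = substr($url, 5);
    $comma = strpos($rest, ',');
    if ($comma === false) {
        return null; // the "," before the data is mandatory
    }
    $header = substr($rest, 0, $comma);
    $data = substr($rest, $comma + 1);

    // An optional ";base64" flag sits directly before the comma.
    $base64 = false;
    if (substr($header, -7) === ';base64') {
        $base64 = true;
        $header = substr($header, 0, -7);
    }

    // A completely omitted mediatype defaults to text/plain;charset=US-ASCII.
    $params = $header === '' ? ['charset' => 'US-ASCII'] : [];

    $parts = explode(';', $header);
    $type = array_shift($parts);
    foreach ($parts as $part) {
        [$name, $value] = array_pad(explode('=', $part, 2), 2, '');
        $params[$name] = $value;
    }

    return [
        'mediatype' => $type === '' ? 'text/plain' : $type,
        'params'    => $params,
        'base64'    => $base64,
        'data'      => $data, // still percent-encoded
    ];
}

$example = split_data_url('data:text/plain;charset=iso-8859-7,%be%fg%be');
$minimal = split_data_url('data:,cc');
```

Everything before the first comma belongs to the header, so an application that only reads the payload never sees it; that is the part of the URL this report is concerned with.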

Test script:
---------------
<?php
echo file_get_contents('data:,cc')."\n"; # valid
echo file_get_contents('data://asdc/asd;ccc=ccc,cc')."\n"; # extra characters hidden before the comma

Expected result:
----------------
cc 
cc

Actual result:
--------------
cc 
cc

 [2020-04-16 05:50 UTC] stas@php.net
-Type: Security
+Type: Bug
 [2020-04-16 05:50 UTC] stas@php.net
Doesn't look like there's any security issue here.
 [2020-04-16 07:54 UTC] cmb@php.net
-Status: Open
+Status: Verified
 [2020-04-16 07:54 UTC] cmb@php.net
> That means we can change it freely; for example,
> `data:text/plain;charset=iso-8859-7,%be%fg%be` would use
> ISO-8859-7 to handle the data.

No, that is not the case.  Unless the base64 flag is set, the data
are just urldecode()d, and any desired/required character encoding
conversion has to be done by the application, which can retrieve
the specified charset from the stream meta data[1].

Given that PHP strings are actually byte arrays, this is pretty
much to be expected, but should be documented nonetheless.

[1] <https://3v4l.org/uGqjf>
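
As an illustration of the point above, the following sketch reads the media type and charset from the stream metadata and then converts the payload in application code. It is not the content of the linked snippet. It assumes that ext/iconv is available and that the wrapper exposes the `charset` parameter as a key in the array returned by `stream_get_meta_data()`; the fallback to `US-ASCII` is this sketch's own choice, taken from the RFC 2397 default.

```php
<?php
// Open the data URI as a stream so that its metadata can be inspected.
$fp = fopen('data:text/plain;charset=iso-8859-7,%be', 'r');
$meta = stream_get_meta_data($fp);

// The payload is only urldecode()d; no charset conversion has happened.
$raw = stream_get_contents($fp);
fclose($fp);

// Assumption of this sketch: fall back to the RFC 2397 default charset
// when the URI did not carry a charset parameter.
$charset = $meta['charset'] ?? 'US-ASCII';

// Conversion is left to the application; iconv() is one way to do it.
$utf8 = iconv($charset, 'UTF-8', $raw);

printf("%s; charset=%s\n", $meta['mediatype'], $charset);
printf("raw bytes: %s, as UTF-8: %s\n", bin2hex($raw), bin2hex($utf8));
```

Since the charset name comes from the URI itself, an application handling untrusted input would probably want to check it against a list of expected encodings before passing it to `iconv()`, which returns false for names it does not recognize.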
 [2020-04-16 07:54 UTC] cmb@php.net
-Type: Bug
+Type: Documentation Problem
Last updated: Mon Dec 30 14:01:28 2024 UTC