Request #48219 Add entry for possible content-transfer-encoding in uploaded file information
Submitted: 2009-05-10 10:49 UTC Modified: 2021-07-20 10:44 UTC
Votes:14
Avg. Score:3.9 ± 1.4
Reproduced:11 of 12 (91.7%)
Same Version:3 (27.3%)
Same OS:3 (27.3%)
From: carsten_sttgt at gmx dot de Assigned:
Status: Open Package: HTTP related
PHP Version: 5.*, 6CVS (2009-05-09) OS: *
Private report: No CVE-ID: None
 [2009-05-10 10:49 UTC] carsten_sttgt at gmx dot de
Description:
------------
Hello,

In an HTTP POST request with Content-Type "multipart/form-data", each part can carry a Content-Transfer-Encoding header, which is defined in RFC2045. (See also HTML 4.01-sec17.13.4.2)

PHP only works with 7bit, 8bit and binary, because with these values the data is not transformed.

With base64 or quoted-printable, the data is transformed (encoded), and PHP should decode it (see also rfc2616-sec3.7.2 / rfc1867-sec3.3).

Just test the example from RFC2388-sec4.5 given in the reproduce code. This is also a problem if you upload a file with such a transfer encoding: after move_uploaded_file(), the content of the file is not what you expect.

And in a script that receives such data, I can't see (and can't know) whether a Content-Transfer-Encoding was set for something in $_POST / $_FILES. It may not be usual, but a client can use such a Content-Transfer-Encoding at any time in a POST request.

Regards,
Carsten


Reproduce code:
---------------
Create a simple "test.php" in your DocumentRoot:
==================
<?php
var_dump($_POST);
?>
==================

Telnet to localhost:80 and send this request:
======================================================
POST http://localhost/test.php HTTP/1.0
Content-Length: 181
Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="field1"
Content-Type: text/plain;charset=windows-1250
Content-Transfer-Encoding: quoted-printable

Joe owes =80100.
--AaB03x--

======================================================



Expected result:
----------------
array(1) {
  ["field1"]=>
    string(14) "Joe owes ?100."
}


Actual result:
--------------
array(1) {
  ["field1"]=>
    string(16) "Joe owes =80100."
}
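
For illustration, a minimal sketch of the decoding step the report asks for, applied by hand to the value PHP currently delivers. It assumes the script somehow already knows the part was sent as quoted-printable, which is exactly the information PHP does not expose; the variable names are only for this example.

```php
<?php
// The raw value exactly as it arrives in $_POST['field1'] today.
$raw = 'Joe owes =80100.';

// Undo the quoted-printable transfer encoding manually.
$decoded = quoted_printable_decode($raw);

var_dump(strlen($raw));          // int(16): "=80" still counts as three characters
var_dump(strlen($decoded));      // int(14): "=80" has become the single byte 0x80
var_dump(bin2hex($decoded[9]));  // string(2) "80": the euro sign in windows-1250
```

The decoded string is still in the part's declared charset (windows-1250); converting it to another charset would be a separate step.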


 [2009-05-10 17:01 UTC] jani@php.net
Unfortunately this is a feature request so reclassifying as such. 
And this is (I think :) related also to bug #37860 and maybe some others 
I couldn't find. :)


 [2009-05-10 17:02 UTC] jani@php.net
btw. Fastest way to get this implemented is by providing a patch. :)
 [2009-05-10 19:15 UTC] carsten_sttgt at gmx dot de
> And this is (I think :) related also to bug #37860

Yes, it's similar. BTW, I think bug #37860 is both a feature request and a bug.
- Feature: It would be nice if PHP decoded the data
  when the coding is known (see rfc2616-sec3.5/-sec14.1),
  e.g. if the gzip extension is loaded and
  "Content-Encoding: gzip" is set in the request.
- Bug: If PHP can't/won't do this, it should raise/return an HTTP
  status code of 415. (See rfc2616-sec14.11)
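
A rough userland sketch of the 415 behavior described in the "Bug" item above. The helper name `status_for_content_encoding` and the `$supported` list are assumptions made for this example and are not part of PHP; a script would have to call this itself, since PHP does not perform the check.

```php
<?php
// Hypothetical helper: choose the HTTP status for a request body based on
// its Content-Encoding header. $supported lists the content codings this
// script is able to decode itself.
function status_for_content_encoding(?string $contentEncoding, array $supported): int
{
    // No header (or an empty one) means the body is not encoded.
    if ($contentEncoding === null || trim($contentEncoding) === '') {
        return 200;
    }
    $coding = strtolower(trim($contentEncoding));
    if ($coding === 'identity' || in_array($coding, $supported, true)) {
        return 200;
    }
    // Unknown coding: 415 Unsupported Media Type.
    return 415;
}

// Possible use at the top of a script (left commented out in this sketch):
// $status = status_for_content_encoding($_SERVER['HTTP_CONTENT_ENCODING'] ?? null, ['gzip']);
// if ($status === 415) { http_response_code(415); exit; }
```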


> Unfortunately this is a feature request so reclassifying as such. 

That's really something I was unsure about. See rfc2616-sec3.7.2. In general:
| an HTTP user agent SHOULD follow the same or similar behavior
| as a MIME user agent would upon receipt of a multipart type.
| The MIME header fields within each body-part of a multipart
| message-body do not have any significance to HTTP beyond that
| defined by their MIME semantics.
Well, a MIME user agent must decode such data. Because of the "should" in this statement, it /can/ be a feature request (but "should" is more restrictive than "may" / "optional").

But the same section, rfc2616-sec3.7.2, also says:
| Note: The "multipart/form-data" type has been specifically
| defined for carrying form data suitable for processing
| via the POST request method, as described in RFC 1867 [15].

And in rfc1867 (or the newer rfc2388), Content-Transfer-Encoding is explicitly part of the RFC. So I think HTTP software should know and handle Content-Transfer-Encoding. Well, Perl's CGI.pm is not doing this either ;-)

BTW:
Unlike Content-Encoding, I can't see the Content-Transfer-Encoding in the script, so that can really be a problem. But using a Content-Transfer-Encoding is not usual (or is it not usual because Perl/PHP can't handle it?)


> btw. Fastest way to get this implemented is by providing a patch. :)

Yeah, if only my C were better... ;-)
 [2009-05-15 00:14 UTC] carsten_sttgt at gmx dot de
After a quick look at rfc1867.c, I found a lot of:
| #if HAVE_MBSTRING && !defined(COMPILE_DL_MBSTRING)

So I guess correct behavior according to rfc2616/rfc1867 is only possible if you have the mbstring extension and it is not built as a shared extension. (Why does this not work with a shared extension?)

(I can't test this, because this extension is always shared in my installations.)

It's like bug #37860:
An HTTP UA sends such a valid POST request and PHP answers with status 200, so both browser and script must assume all is OK. Instead, the data is garbled.

In contrast to bug #37860, it's not defined that a status 415 should be returned here (but maybe that is the best solution for now?).

In the case of bug #37860, the return status 415 is defined for such a situation, but PHP is not doing this either :-/ That's also a problem, since all parties think the POST request is OK.


Regards,
Carsten
 [2009-11-18 08:23 UTC] codeslinger at compsalot dot com
This also afflicts base64 encoding, which is a massively prevalent method for binary transfers....

I am really surprised to encounter this *bug*.

It means that everything PHP is doing with regard to saving/moving uploaded files is wasted/useless effort. Since the content transfer encoding is not even accessible, we must instead do our own parsing of the raw post data. How can that be by design???
 [2009-11-19 20:59 UTC] avalon73 at caerleon dot us
Has anyone noticed this?

http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.4.5

RFC 2616 is the one for HTTP 1.1, and the first paragraph reads as such:

---
HTTP does not use the Content-Transfer-Encoding (CTE) field of RFC 2045. Proxies and gateways from MIME-compliant protocols to HTTP MUST remove any non-identity CTE ("quoted-printable" or "base64") encoding prior to delivering the response message to an HTTP client.
---

Perhaps that's why PHP hasn't supported it, and why no browser in the real world (that I know of... I could be mistaken) ever sends base-64 or quoted-printable encoded multipart/form-data parts?
 [2009-11-19 23:00 UTC] carsten_sttgt at gmx dot de
> Has anyone noticed this?
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.4.5

Sure, but in rfc2616-sec3.html#sec3.7.2 you can read, that especially multipart/form-data is defined in RFC1867 (RFC2388). And there you can read about the content-transfer-encoding.

Regards,
Carsten
 [2009-11-19 23:55 UTC] avalon73 at caerleon dot us
RFC 2616 section 3.7.2 itself says nothing about the use of Content-Transfer-Encoding (CTE).

RFCs 1867 and 2388 both mention the possibility of the multipart/form-data MIME type being used with email as a transport as well as HTTP.  The CTE header and the "base64" and "quoted-printable" encodings were included in MIME specifically for moving 8-bit data over 7-bit transport protocols, which included basic (non-enhanced) SMTP at the time of its creation (and still does, if you adhere strictly to the RFCs).  The other standard encodings defined for the CTE header (7bit, 8bit, and binary) imply no content encoding at all.

HTTP is and has always been an 8-bit clean transport protocol.  Because of that, it has no need for any encodings designed to move 8-bit data over a 7-bit protocol.  In fact, the use of such encodings would only needlessly add bulk to the data being transferred.  If no such transformation is necessary, the addition of the CTE header is also not necessary.  Section 19.4.5 of RFC 2616 would seem to merely codify this fact, effectively forbidding the use of CTE over HTTP.
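
The added bulk mentioned above is easy to measure. A small sketch, using an arbitrary 3000-byte random payload chosen only for this example:

```php
<?php
// 3000 arbitrary bytes stand in for a binary upload.
$payload = random_bytes(3000);

// base64 maps every 3 input bytes to 4 output characters.
$encoded = base64_encode($payload);

printf("raw: %d bytes, base64: %d bytes\n", strlen($payload), strlen($encoded));
// raw: 3000 bytes, base64: 4000 bytes
```

That is roughly one third more data on the wire, before counting any line breaks a MIME encoder would insert.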
 [2009-11-20 21:46 UTC] codeslinger at compsalot dot com
Well, I mostly deal with email, especially webmail, and as far as I can see, nearly all attachments are base64 encoded. In fact, it is hard to find anything that isn't, unless it's plain text.

So I guess I was a little bit confused about the difference between HTTP uploads and email uploads, since they both use MIME and typically both contain web pages.

With regard to this feature request: I would really like PHP to make the MIME header info available. That way we can easily do our own decoding, as long as we have access to the info that tells us what needs to be decoded. Currently we don't, at least not without kludgy hacks, and that makes it hard to do something which should be simple.
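
A sketch of what "doing our own decoding" could look like once the raw body and the part headers are available. The function `parse_multipart_parts` is a hypothetical helper written for this example, not a PHP API. It assumes the raw body was obtained by some other means, and it ignores nested multiparts, folded headers and size limits.

```php
<?php
// Hypothetical helper: split a raw multipart/form-data body into parts and
// decode each part according to its Content-Transfer-Encoding header.
function parse_multipart_parts(string $body, string $boundary): array
{
    $parts = [];
    // Every part follows a "--boundary" delimiter.
    $chunks = explode('--' . $boundary, $body);
    array_shift($chunks); // drop the preamble before the first delimiter
    foreach ($chunks as $chunk) {
        // The closing delimiter "--boundary--" leaves a chunk starting with "--".
        if (strncmp($chunk, '--', 2) === 0) {
            break;
        }
        // Remove the line break after the delimiter and the one before the next.
        $chunk = preg_replace('/\A\r?\n|\r?\n\z/', '', $chunk);
        // Headers and content are separated by the first empty line.
        $split = preg_split('/\r?\n\r?\n/', $chunk, 2);
        $content = $split[1] ?? '';
        $headers = [];
        foreach (preg_split('/\r?\n/', $split[0]) as $line) {
            if (strpos($line, ':') !== false) {
                [$name, $value] = explode(':', $line, 2);
                $headers[strtolower(trim($name))] = trim($value);
            }
        }
        // Without a header, treat the part as untransformed data.
        $cte = strtolower($headers['content-transfer-encoding'] ?? 'binary');
        if ($cte === 'base64') {
            $content = base64_decode($content);
        } elseif ($cte === 'quoted-printable') {
            $content = quoted_printable_decode($content);
        }
        $parts[] = ['headers' => $headers, 'cte' => $cte, 'content' => $content];
    }
    return $parts;
}

// Usage with the request body from the original report:
$body = "--AaB03x\r\n"
      . "Content-Disposition: form-data; name=\"field1\"\r\n"
      . "Content-Type: text/plain;charset=windows-1250\r\n"
      . "Content-Transfer-Encoding: quoted-printable\r\n"
      . "\r\n"
      . "Joe owes =80100.\r\n"
      . "--AaB03x--\r\n";
$parts = parse_multipart_parts($body, 'AaB03x');
var_dump($parts[0]['cte'], strlen($parts[0]['content']));
```

Splitting on the boundary string before decoding means a part that contains the boundary in its raw bytes would still be cut apart, so this sketch does not address boundary collisions.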
 [2010-12-20 08:55 UTC] jani@php.net
-Summary: POST data is not decoded, according to Content-Transfer-Encoding
+Summary: Add entry for possible content-transfer-encoding in uploaded file information
-Package: Feature/Change Request
+Package: HTTP related
 [2010-12-20 08:55 UTC] jani@php.net
Updated. Shouldn't it be enough if we add the encoding, when it is passed by the uploader, to the uploaded file information? Then you could handle the data more easily. Any other fields that are missing? :) I don't think PHP should decode it automatically.
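
To make the proposal concrete, a sketch of how a script might use such an entry. Note the assumption: the 'transfer_encoding' key does NOT exist in $_FILES today; both the key name and the helper `decode_uploaded_bytes` are invented here purely for illustration.

```php
<?php
// ASSUMPTION: $fileEntry['transfer_encoding'] is a hypothetical field that
// PHP does not currently provide in $_FILES entries.
function decode_uploaded_bytes(array $fileEntry, string $bytes): string
{
    $cte = strtolower($fileEntry['transfer_encoding'] ?? 'binary');
    switch ($cte) {
        case 'base64':
            $decoded = base64_decode($bytes, true);
            if ($decoded === false) {
                throw new UnexpectedValueException('Invalid base64 data');
            }
            return $decoded;
        case 'quoted-printable':
            return quoted_printable_decode($bytes);
        case '7bit':
        case '8bit':
        case 'binary':
            // Identity encodings: nothing to undo.
            return $bytes;
        default:
            throw new UnexpectedValueException("Unsupported transfer encoding: $cte");
    }
}
```

With an entry like this, automatic decoding stays out of PHP itself and the script decides whether and how to decode.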
 [2012-03-27 14:53 UTC] alastair at alastairs-place dot net
The claim that HTTP, as a binary supporting protocol, does not need
Content-Transfer-Encoding for form POSTs is bogus.

The problem is very simple; if your MIME boundary is set to (say) "test", then a POST with 
a body like this:

  --test
  Content-Disposition: form-data; name="frob"
  
  $frobValue
  --test--

can go wrong if $frobValue happens to contain something like

  This is line one.
  --test
  Content-Disposition: form-data; name="hack"

  This is very naughty.

This could happen from a web browser, if the boundary was predictable.  It's unlikely to 
happen by accident (since in practice the boundary will be randomly generated and contain a 
significant number of characters), but it could nevertheless happen.

There are two ways to deal with this problem.  The first is to set Content-Length headers 
on the subparts; the MIME parser can then read that many bytes in the knowledge that no 
*real* boundary will be within the data.  The second is to use Content-Transfer-Encoding 
and either send the data as e.g. base64, or use quoted-printable in combination with a 
boundary that is not valid quoted printable data.

Unfortunately, as far as I can see from reading rfc1867.c, PHP SUPPORTS NEITHER!  Even for 
binary files, PHP *ignores* Content-Length and scans for a boundary instead.  Result: there 
is a statistical likelihood, however small, that the POST data will not be as expected.
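
The collision can be demonstrated with a few lines. This sketch builds a body by hand, once with the value sent as-is and once base64-encoded. The helper `build_body` exists only for this example.

```php
<?php
// Hypothetical helper: build a one-field multipart body, optionally
// base64-encoding the value and announcing that in the part headers.
function build_body(string $boundary, string $name, string $value, bool $base64): string
{
    $headers = "Content-Disposition: form-data; name=\"{$name}\"\r\n";
    if ($base64) {
        $headers .= "Content-Transfer-Encoding: base64\r\n";
        $value = base64_encode($value);
    }
    return "--{$boundary}\r\n{$headers}\r\n{$value}\r\n--{$boundary}--\r\n";
}

$boundary  = 'test';
$frobValue = "This is line one.\r\n--test\r\n"
           . "Content-Disposition: form-data; name=\"hack\"\r\n\r\n"
           . "This is very naughty.";

$plain   = build_body($boundary, 'frob', $frobValue, false);
$encoded = build_body($boundary, 'frob', $frobValue, true);

// A parser that scans for "--test" sees three delimiters in the plain body
// (opening, injected, closing) but only two in the base64 body, because the
// base64 alphabet contains no "-" character.
var_dump(substr_count($plain, '--test'));    // int(3)
var_dump(substr_count($encoded, '--test'));  // int(2)
```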
 [2012-03-27 15:55 UTC] alastair at alastairs-place dot net
I'll add that RFC2388 glibly states that "a boundary is selected that does not 
occur in any of the data".  This is not, of course, the implementation that some 
browser writers have chosen, nor would following that recommendation be 
reasonable in the general case, since it might necessitate pre-scanning a large 
file prior to upload; rather, they pick a random boundary string that they think 
is not likely to come up in practice.

RFC2388 also quite clearly states that

  Each part may be encoded and the "content-transfer-encoding" header
  supplied if the value of that part does not conform to the default
  encoding.
 [2021-07-20 10:44 UTC] cmb@php.net
At least from a theoretical point of view, this looks like a
serious issue to me.  When processing multipart/form-data, the
Content-Transfer-Encoding of the individual parts is completely
ignored, and the user has no way to detect it; even php://input
can't be used, since it is not available for that Content-Type.

Automatically decoding would likely be a relevant BC break, but
at least we should provide access to the Content-Transfer-Encoding
of the parts; that wouldn't be much of a problem regarding $_FILES,
but how to report that for $_POST variables?
 