Bug #45533 cURL output includes content from redirects when CURLOPT_FOLLOWLOCATION set
Submitted: 2008-07-16 21:30 UTC Modified: 2009-05-03 23:34 UTC
Votes:2
Avg. Score:4.0 ± 1.0
Reproduced:2 of 2 (100.0%)
Same Version:1 (50.0%)
Same OS:1 (50.0%)
From: signe at cothlamadh dot net Assigned: pajoye (profile)
Status: Not a bug Package: cURL related
PHP Version: 5.2.6 OS: FreeBSD 7.0
Private report: No CVE-ID: None
 [2008-07-16 21:30 UTC] signe at cothlamadh dot net
Description:
------------
When retrieving a URL that uses a 302 redirect and also sends a viewable error-document body, that error document is prepended to the REAL content retrieved after following the redirect.

The problem is compounded when CURLOPT_HEADER is enabled, because the error-document content is not counted in any of the values returned by curl_getinfo().

Reproduce code:
---------------
http://www.cothlamadh.net/~signe/.outgoing/curl_location.phps

Tested with curl 7.18.0 on FreeBSD 7 and 7.16.4-2ubuntu1 on Ubuntu Gutsy.
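The linked reproduce script is no longer available. The sketch below is a hypothetical reconstruction of the kind of request the report describes, assuming the script combined CURLOPT_RETURNTRANSFER, CURLOPT_FOLLOWLOCATION and CURLOPT_HEADER and then read the two curl_getinfo() values discussed below. The function name fetchWithRedirects is illustrative only, not taken from the original script.

```php
<?php
// Hypothetical reconstruction, not the original curl_location.phps script.
// Fetches $url, following redirects, and returns the raw output together
// with the two size values the report compares.
function fetchWithRedirects(string $url, bool $includeHeaders): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return output as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow 3xx Location headers
    curl_setopt($ch, CURLOPT_HEADER, $includeHeaders); // include headers in the output
    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return [
        'raw'            => $raw,
        'header_size'    => curl_getinfo($ch, CURLINFO_HEADER_SIZE),
        'content_length' => curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD),
    ];
}
```

Calling it once with `$includeHeaders = false` and once with `true` against an affected URL would correspond to the two "Actual result" cases below.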

Expected result:
----------------
Non-header data from redirects should not be included in the returned content.

Actual result:
--------------
Without headers enabled, the content returned looks like this:


"""
RedirectErrorDocumentContent
ActualDocument
"""

There is no whitespace between the two documents.

With headers enabled, it's much, much worse.

"""
RedirectHeader

RedirectErrorDocumentContent
ActualDocumentHeader

ActualDocument
"""

There is whitespace between each set of headers and its respective content, but not between the first content and the second batch of headers.

To make matters worse, curl_getinfo($cUrl, CURLINFO_HEADER_SIZE) returns the combined length of both header sections, as expected, and curl_getinfo($cUrl, CURLINFO_CONTENT_LENGTH_DOWNLOAD) returns the length of ActualDocument, also as expected. As a result, RedirectErrorDocumentContent sits in the middle of the output without being counted by either value. This makes it impossible to cleanly split the output into header and content sections.
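A minimal sketch of why the split fails, assuming the usual approach of cutting the raw output at CURLINFO_HEADER_SIZE bytes. The response strings are hand-built to mimic the layout described above (example.com placeholders); they were not captured from a real server.

```php
<?php
// Split curl output at the header size, as one would with
// curl_getinfo($ch, CURLINFO_HEADER_SIZE).
function splitByHeaderSize(string $raw, int $headerSize): array
{
    // First $headerSize bytes are treated as headers, the rest as body.
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

$redirectHeader = "HTTP/1.1 302 Found\r\nLocation: http://www.example.com/\r\n\r\n";
$strayBody      = "This document has moved";
$finalHeader    = "HTTP/1.1 200 OK\r\nContent-Length: 13\r\n\r\n";
$finalBody      = "Final content";

// Reported layout: the 302 body sits between the two header blocks.
$raw = $redirectHeader . $strayBody . $finalHeader . $finalBody;

// The header size counts header bytes from both responses, but not $strayBody.
$headerSize = strlen($redirectHeader) + strlen($finalHeader);

[$headers, $body] = splitByHeaderSize($raw, $headerSize);
// $headers now contains $strayBody, and $body begins with the tail of the
// 200 response's headers instead of the actual document.
```

Because the stray bytes are not counted anywhere, the cut point shifts by exactly their length, which is the misalignment the report describes.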

 [2008-07-16 22:03 UTC] signe at cothlamadh dot net
Of course, after posting the reproduction, the server that was causing the issue modified something and it's no longer reproducing against them.  This was the original output from a request to their server:

telnet www.crn.com 80
Trying 66.77.24.10...
Connected to crn.com.
Escape character is '^]'.
GET /rss/cisco/index.xml HTTP/1.1
Host: www.crn.com

HTTP/1.1 302 Found
Date: Wed, 16 Jul 2008 21:52:30 GMT
Server: Apache
Location: http://feeds.pheedo.com/rss/cisco
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
Vary: Accept-Encoding, User-Agent

119
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://feeds.pheedo.com/rss/cisco">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.29 Server at www.crn.com Port 80</ADDRESS>
</BODY></HTML>

0

Connection closed by foreign host.
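The 302 response above uses Transfer-Encoding: chunked: the line "119" is a chunk size in hexadecimal (0x119 = 281 bytes), and the final "0" marks the last chunk. The sketch below is an illustrative decoder for well-formed input only; libcurl performs this decoding itself, and nothing in this thread establishes that chunking is what triggers the bug. decodeChunked is a hypothetical helper name.

```php
<?php
// Minimal HTTP/1.1 chunked-body decoder for illustration.
// Assumes well-formed input; strips chunk extensions, does not parse trailers.
function decodeChunked(string $data): string
{
    $out = '';
    $pos = 0;
    while (true) {
        $lineEnd = strpos($data, "\r\n", $pos);
        if ($lineEnd === false) {
            throw new RuntimeException('Missing chunk-size line');
        }
        // Chunk size is hexadecimal; anything after ';' is an extension.
        $sizeField = explode(';', substr($data, $pos, $lineEnd - $pos))[0];
        $size = hexdec(trim($sizeField));
        if ($size === 0) {
            break; // zero-length last chunk reached
        }
        $out .= substr($data, $lineEnd + 2, $size);
        $pos = $lineEnd + 2 + $size + 2; // skip chunk data and its trailing CRLF
    }
    return $out;
}
```

For example, `decodeChunked("5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n")` yields `"Hello World"`.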
 [2008-07-16 22:29 UTC] jani@php.net
Can you reproduce this using curl on the command line, i.e. without PHP?
 [2008-07-17 00:17 UTC] signe at cothlamadh dot net
I didn't try before reporting, and since they've made a modification, my one known verification site is lost.

I saw this bug in a previous version of PHP (5.1.2, I believe), and at that time I couldn't replicate it with the curl command line. I upgraded that server to 5.2.3 (along with a newer libcurl) and wasn't able to reproduce it afterwards, so I didn't report it then.

I need to try to create a redirect script that can cause the behavior - until I do, no, I can't reproduce it in either PHP or curl.
 [2008-07-30 18:57 UTC] jani@php.net
When you can provide a script that reproduces this problem every time, give us feedback.
 [2008-08-07 01:00 UTC] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
 [2009-02-03 05:09 UTC] satishpalasa at hotmail dot com
I am facing the same problem: the header content is not recorded. Will this cause any problems?
 [2009-05-03 22:06 UTC] pajoye@php.net
If you ask to include the header in the output, you will get them. Suppress this line:
curl_setopt($cUrl, CURLOPT_HEADER, true);

The same applies to the cmd line:

curl -i -L http://www.crn.com/rss/cisco/index.xml
vs
curl -L http://www.crn.com/rss/cisco/index.xml

No bug > bogus.
 [2009-05-03 22:31 UTC] signe at cothlamadh dot net
You don't understand the issue at all.  The issue is not the headers, it's that CONTENT from the redirect is included in the output, and shouldn't be.

Please fully read the original description.

Output sent from the server during the 3xx response should not be included in the output given to the user.  A server response like this:

"
HTTP/1.1 302 Found
Location: http://www.example.com/

This document has moved to http://www.example.com

HTTP/1.1 200 OK

This is the output from the new document.
"

Should not yield output from PHP that says:

"
This document has moved to http://www.example.com
This is the output from the new document.
"

The 302 content is supposed to be ignored. cURL does this properly, but in some circumstances PHP does NOT. As I detailed in the original report, the issue is WORSE when headers are turned on, because the content is placed between the header blocks and cannot be separated out using the header and body length values returned by curl_getinfo().
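One possible workaround, which was not proposed in this thread: instead of relying on CURLOPT_HEADER and length arithmetic, collect headers and body through separate callbacks and discard everything received before the last status line. This relies on libcurl passing header lines for every response it receives when following redirects, so a stray redirect body would be dropped as soon as the next "HTTP/" status line arrives. ResponseCollector and fetchFinalResponse are hypothetical names; this is a sketch, not a tested fix for the reported behavior.

```php
<?php
// Collects headers and body separately; a new status line resets both.
final class ResponseCollector
{
    public $headers = [];
    public $body = '';

    // Intended for CURLOPT_HEADERFUNCTION: called once per header line.
    public function onHeader($ch, string $line): int
    {
        if (strncmp($line, 'HTTP/', 5) === 0) {
            // New response (e.g. after a redirect): drop the previous one,
            // including any body bytes it delivered.
            $this->headers = [];
            $this->body = '';
        }
        $trimmed = rtrim($line, "\r\n");
        if ($trimmed !== '') {
            $this->headers[] = $trimmed;
        }
        return strlen($line); // report the whole line as handled
    }

    // Intended for CURLOPT_WRITEFUNCTION: called with chunks of body data.
    public function onBody($ch, string $data): int
    {
        $this->body .= $data;
        return strlen($data);
    }
}

function fetchFinalResponse(string $url): ResponseCollector
{
    $collector = new ResponseCollector();
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, [$collector, 'onHeader']);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, [$collector, 'onBody']);
    if (curl_exec($ch) === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return $collector; // $ch is released when it goes out of scope
}
```

Keeping headers in their own buffer also removes the need to trust CURLINFO_HEADER_SIZE for splitting.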
 [2009-05-03 22:48 UTC] pajoye@php.net
It seems you are back and have finally provided feedback.


> The issue is not the headers,
> it's that CONTENT from the redirect is included 
> in the output, and shouldn't be.

Cannot reproduce it with curl 7.15.5 or 7.19.4.

PHP does not alter the content; it gets it from cURL. Please check with the command line (using the same libcurl version as PHP).

 [2009-05-03 22:48 UTC] pajoye@php.net
> Cannot reproduce it with curl 7.15.5 or 7.19.4.

Cannot reproduce with either PHP or curl using these versions.
 [2009-05-03 23:34 UTC] signe at cothlamadh dot net
I'm not "finally back" - I was asked for a script that can produce this problem 100% of the time.  ("When you can provide a script that reproduces this problem every time, give us feedback.") There is no such thing.  It's sporadic - it will happen with one URL for a while, and then something about the test changes and it stops reproducing.

There is no discernible pattern to reproduction.

The servers are not in my control so I have no insight into what the original settings were or what changed when the issue disappears.

I have reproduced it with every PHP version up to 5.2.9, and with libcurl versions:
7.16.4  (Ubuntu Gutsy)
7.18.2  (Ubuntu Jaunty)

as well as several revisions on FreeBSD that I no longer have available.
 [2013-01-04 11:10 UTC] kontakt at myseosolution dot de
I'm experiencing the same problem with version 7.20.0 when trying to login at Google (https://accounts.google.com/ServiceLogin).

I could not reproduce the error on my own webspace though.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Sat Sep 14 15:01:27 2024 UTC