Bug #71929 Certification information (CERTINFO) data parsing error
Submitted: 2016-03-31 06:18 UTC Modified: 2016-07-28 03:46 UTC
Votes:1
Avg. Score:3.0 ± 0.0
Reproduced:0 of 0 (0.0%)
From: asmqb7 at gmail dot com Assigned: pierrick (profile)
Status: Closed Package: cURL related
PHP Version: 7.0.4 OS: Linux (Arch Linux)
Private report: No CVE-ID: None
 [2016-03-31 06:18 UTC] asmqb7 at gmail dot com
Description:
------------
As demonstrated in "Actual Result," the cURL PHP library appears to be mis-parsing the Subject and Issuer lines of the TLS certification information.

According to https://curl.haxx.se/libcurl/c/CURLINFO_CERTINFO.html

> The info chain is provided in a series of data in the format "name:content" where the content is for the specific named data.

The above URL references certinfo.c, which can be found at https://curl.haxx.se/libcurl/c/certinfo.html (the "Download raw" link toward the top currently points to https://raw.githubusercontent.com/curl/curl/master/docs/examples/certinfo.c).

This is a turnkey certificate demo that connects to https://example.com/ and displays the certificate(s) it gets back. For easy copy-pasting:

wget https://raw.githubusercontent.com/curl/curl/master/docs/examples/certinfo.c
cat certinfo.c; read  # trust but verify
gcc -o certinfo certinfo.c -lcurl
./certinfo

The PHP test script attached to this bug similarly connects to https://example.com/ with CURLOPT_CERTINFO set to TRUE, then print_r()s the contents of curl_getinfo(), which contains the retrieved certificate info. The relevant portions of the output I get on my machine can be found under "Actual Results".

I've tested this on PHP 7.0.4 as noted, but not any other versions. This option appears to have been committed in 2009 (https://bugs.php.net/bug.php?id=49253); it may be interesting to test in PHP versions from that point.

I'm reporting this as a bug and not a security issue as I do not consider it remotely exploitable and most (all?) developers would have noticed the anomaly in the development process.

Considering the above, however, I'm not sure if fixing this would break poorly written workarounds :( especially if this is a long-term bug and not a recent regression.

Test script:
---------------
<?php

$ch = curl_init();

curl_setopt_array($ch, [
	CURLOPT_CERTINFO => true,
	CURLOPT_URL => "https://example.com/",
	CURLOPT_RETURNTRANSFER => true
]);

curl_exec($ch);

print_r(curl_getinfo($ch));

?>


Actual result:
--------------
Irrelevant data removed; a small amount of context left in for comparison.

Output of certinfo.c:

...
Subject:C = US, ST = California, L = Los Angeles, O = Internet Corporation for Assigned Names and Numbers, OU = Technology, CN = www.example.org
Issuer:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA
Version:2
Serial Number:0e64c5fbc236ade14b172aeb41c78cb0
Signature Algorithm:sha256WithRSAEncryption
Public Key Algorithm:rsaEncryption
...
Subject:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA
Issuer:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert High Assurance EV Root CA
Version:2
Serial Number:04e1e7a4dc5cf2f36dc02b42b85d159f
Signature Algorithm:sha256WithRSAEncryption
Public Key Algorithm:rsaEncryption
...


Output of PHP script:

  [certinfo] => Array
        (
            [0] => Array
                (
                    [Subject] => Array
                        (
                            [C ] =>  US, ST = California, L = Los Angeles, O = Internet Corporation for Assigned Names and Numbers, OU = Technology, CN = www.example.org
                        )

                    [Issuer] => Array
                        (
                            [C ] =>  US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA
                        )

                    [Version] => 2
                    [Serial Number] => 0e64c5fbc236ade14b172aeb41c78cb0
                    [Signature Algorithm] => sha256WithRSAEncryption
                    [Public Key Algorithm] => rsaEncryption
                    ...
       )
       [1] => Array
                (
                    [Subject] => Array
                        (
                            [C ] =>  US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 High Assurance Server CA
                        )

                    [Issuer] => Array
                        (
                            [C ] =>  US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert High Assurance EV Root CA
                        )

                    [Version] => 2
                    [Serial Number] => 04e1e7a4dc5cf2f36dc02b42b85d159f
                    [Signature Algorithm] => sha256WithRSAEncryption
                    [Public Key Algorithm] => rsaEncryption
                    ...
                )
        )
)

History

 [2016-04-24 00:49 UTC] pierrick@php.net
-Status: Open +Status: Verified -Assigned To: +Assigned To: pajoye
 [2016-04-24 00:49 UTC] pierrick@php.net
Thanks for reporting the bug. I'm assigning this to the initial committer to see what he thinks about it, because I have no clue whether we should fix it or not, since the bug was introduced in the first version in 5.3.
 [2016-04-24 04:18 UTC] pajoye@php.net
Do you refer to the
  [C ] =>  US, 

Part? C= should be kept as part of the content and not used as an index.

I cannot remember having written the cert info split code, but the split cert function must be the place to look if we are willing to fix that.

I think it should be fixed. And it should also match the openssl output.
 [2016-04-24 16:22 UTC] pajoye@php.net
Alternatively explode each part of the subject and issuer into an associative array, but that could break more code out there.

Comments&suggestions welcome :)
 [2016-04-24 16:24 UTC] pajoye@php.net
And the cert info is used quite a lot, not sure if they actually use this part of the info (less than pkey or signature but still):

https://github.com/search?l=php&q=CURLOPT_CERTINFO+&type=Code&utf8=%E2%9C%93
 [2016-04-25 01:34 UTC] asmqb7 at gmail dot com
In a side project I'm currently working on I'm using...

  $m = preg_match('/O = (.+?(?=, [A-Z]+ = ))/',
    $info['certinfo'][0]['Subject'], $tls_org);

...a mildly involved regular expression to (hopefully) extract useful info from the Organization field, which I thought I'd include for fun in my logging output. That said, my cURL code is only being used for one site right now, so I'm just messing around and having fun testing functionality - I have VERIFYPEER=1 and VERIFYHOST=2 doing the heavy lifting.
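To see what an expression like that does in isolation, here is a minimal sketch run against the Subject line from the report as a plain string (the shape the field has when it is not split into an array). The helper name `extract_org` is hypothetical, and the pattern is a slightly restated variant of the one above (word boundary added, lookahead moved outside the capture group).

```php
<?php
// Hypothetical helper: pull the O (Organization) value out of a
// one-line subject/issuer string such as the ones shown in this report.
function extract_org(string $dn)
{
    // Lazily capture everything after "O = " up to the next ", XX = " pair.
    // Returns null when O is absent or is the last field (nothing follows it).
    if (preg_match('/\bO = (.+?)(?=, [A-Z]+ = )/', $dn, $m) === 1) {
        return $m[1];
    }
    return null;
}

$subject = 'C = US, ST = California, L = Los Angeles, '
    . 'O = Internet Corporation for Assigned Names and Numbers, '
    . 'OU = Technology, CN = www.example.org';

echo extract_org($subject), "\n";
```

Note the limitation: because the lookahead needs a following field, the sketch returns null when O happens to be the last component.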

But I think the fact that I'm reluctant to use this for production functionality says something. I think there are two main aspects to this bug.

1. First of all, the practical aspect.

It's unfortunate that GitHub and Searchcode strip quotes from searches; if only we could search for "curlinfo" (including quotes), we'd find all the array index references :P. I poked around for a bit but didn't find much.

I think that it's reasonable to assume that these fields have never been used for anything noteworthy because a bug report like this has never been raised before.

With this in mind, fixing this and putting a "there was a bug" note in the changelog for curl_getinfo seems like a good way to notify anybody using this that the function works properly now.

For my own part, I'm fixing the bug this way:

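foreach ($curlinfo['certinfo'] as &$certinfo) {
    foreach (['Subject', 'Issuer'] as $n) {
        // Buggy versions return a one-element array: rejoin key and value.
        if (is_array($certinfo[$n])) {
            $certinfo[$n] = array_keys($certinfo[$n])[0].
                '='.array_values($certinfo[$n])[0];
        }
    }
}
unset($certinfo); // drop the reference left behind by foreach-by-reference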

This does use a few moderately magic/new PHP features, such as foreach by reference, short array syntax, and function array dereferencing, to keep the code size down. The latter two require PHP 5.4, so this snippet would need to be simplified further for PHP 5.3 compatibility.

FWIW, this looks solid enough to me to be able to confidently hand it off to anybody and say "don't worry about how this works, just run it immediately after you run curl_getinfo() when you want the Subject and Issuer."

The noteworthy part is the is_array() check, which makes the code a no-op once this is fixed. The loops will still always run, but I doubt curl_getinfo() is likely to be used in time-critical sections of PHP, so that should be fine.

For applications with logging capabilities, this variant might be interesting:

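foreach ($curlinfo['certinfo'] as &$certinfo) {
    foreach (['Subject', 'Issuer'] as $n) {
        if (is_array($certinfo[$n])) {
            $certinfo[$n] = array_keys($certinfo[$n])[0].
                '='.array_values($certinfo[$n])[0];
        } else {
            // Already a string: report once and leave both loops.
            print "PHP is fixed!\n";
            break 2;
        }
    }
}
unset($certinfo); // drop the reference left behind by foreach-by-reference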

So that's the practical aspect.

2. Now for the fundamental (or "philosophical," for want of a better word) aspect of this bug.

I think that nobody is using this information because it's too arbitrarily formatted. I would not be surprised if nobody uses cURL's own info data structures for anything much beyond verifying the key or hash or whatnot, because there's too much room for error in a string like:

  Subject:C = US, ST = California, L = Los Angeles, O = Internet Corporation for Assigned Names and Numbers, OU = Technology, CN = www.example.org

I don't consider the above unambiguously parseable.
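As a rough illustration of the ambiguity, here is a deliberately naive splitter applied to a made-up subject whose organization name contains a comma. The function name `naive_dn_split` and the sample string are hypothetical; how (or whether) the comma would be escaped in real output depends on the TLS backend, which this sketch does not model.

```php
<?php
// Naive, hypothetical parser: split on ", " and then on " = ".
// Shown only to illustrate why the one-line format is hard to parse.
function naive_dn_split(string $dn)
{
    $out = [];
    foreach (explode(', ', $dn) as $part) {
        $pair = explode(' = ', $part, 2);
        if (count($pair) === 2) {
            $out[$pair[0]] = $pair[1];
        } else {
            // No " = " in this chunk: it was really part of the previous value.
            $out['?'][] = $part;
        }
    }
    return $out;
}

// Made-up subject: the organization name itself contains ", ".
print_r(naive_dn_split('C = US, O = Example, Inc., CN = www.example.test'));
```

Here the O value comes back as just `Example`, and `Inc.` is left orphaned under the `?` key, so the original organization name cannot be recovered without guessing.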

The ability to retrieve a binary info structure from cURL that could be passed to openssl_x509_parse() would, in my opinion, be the real solution to this bug. I mean, it's amazing - that function gives back an array with the Subject and Issuer fields both broken down into array key/value pairs! (:O)
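A minimal sketch of that idea, under two assumptions that this report does not confirm: that each `certinfo` entry carries the PEM-encoded certificate under a `Cert` key (libcurl builds using OpenSSL typically report one), and that the openssl extension is loaded. The helper name `parse_certinfo_chain` is hypothetical.

```php
<?php
// Hypothetical helper: run each PEM certificate found in
// curl_getinfo($ch)['certinfo'] through openssl_x509_parse().
function parse_certinfo_chain(array $certinfo)
{
    $parsed = [];
    foreach ($certinfo as $entry) {
        // Assumption: the PEM text lives under the 'Cert' key.
        if (!is_array($entry) || !isset($entry['Cert'])) {
            continue;
        }
        $x509 = openssl_x509_parse($entry['Cert']);
        if ($x509 !== false) {
            // 'subject' and 'issuer' are associative arrays (C, O, CN, ...).
            $parsed[] = [
                'subject' => $x509['subject'],
                'issuer'  => $x509['issuer'],
            ];
        }
    }
    return $parsed;
}
```

It would be called right after the transfer, for example as `parse_certinfo_chain(curl_getinfo($ch)['certinfo'])`; entries without a `Cert` key are simply skipped.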

Alternatively, cURL could be altered to return this structure in such a way that it's unambiguous, and then represented in PHP as such as well.

Comments/feedback/critique/insight welcome.
 [2016-06-09 17:37 UTC] pierrick@php.net
-Assigned To: pajoye +Assigned To: pierrick
 [2016-07-28 03:44 UTC] pierrick@php.net
Automatic comment on behalf of pierrick
Revision: http://git.php.net/?p=php-src.git;a=commit;h=30a5ed3a7979f1b865f6633cb16b5f3e78371df1
Log: Fixed bug #71929 (CURLINFO_CERTINFO data parsing error).
 [2016-07-28 03:44 UTC] pierrick@php.net
-Status: Verified +Status: Closed
 [2016-07-28 03:46 UTC] pierrick@php.net
CURLINFO_CERTINFO now returns Subject and Issuer as strings, since this is how libcurl returns this information. There is no clear definition of how the certificate issuer and subject are formatted, so we should not try to parse them.
 