php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #81688 PDO_ODBC doesn't handle fixed-length character columns with character conversio
Submitted: 2021-12-02 18:30 UTC Modified: 2022-01-14 16:49 UTC
Votes:1
Avg. Score:5.0 ± 0.0
Reproduced:1 of 1 (100.0%)
Same Version:1 (100.0%)
Same OS:0 (0.0%)
From: calvin at cmpct dot info Assigned: cmb (profile)
Status: Assigned Package: PDO ODBC
PHP Version: 8.0.13 OS: IBM i 7.2
Private report: No CVE-ID: None
Have you experienced this issue?
Rate the importance of this bug to you:

 [2021-12-02 18:30 UTC] calvin at cmpct dot info
Description:
------------
(was kinda wanting to wait until the transition to GH issues, but deciding to post this now... let me know if i should)

It seems PDO_ODBC doesn't handle situations where if characters are converted as part of a data bind. For example, if the CHAR(30) column on the database stores as one encoding and it gets converted to something requiring more bytes to represent, there's seemingly a possibility of a buffer overflow (what looks like to me, anyways). The example script has one such example; an é takes one byte to represent in some encodings, and two in UTF-8.

Db2 is a pathological case because most columns are stored as some kind of EBCDIC encoding and gets converted to usually UTF-8 on the client, but it's quite possible for other databases too (i.e. ISO-8859-1 columns to UTF-8). The example I posted would probably trigger the same issue if the driver is converting.

Looking at the unixODBC trace a user provided, I can see PHP just allocates what the driver claims is the CHAR(n) size plus one.

PHP PDO_ODBC:

```
[ODBC][24294][1638369671.619132][SQLDescribeCol.c][504]
		Exit:[SQL_SUCCESS]                
			Column Name = [VARTE1]                
			Data Type = 70000000009b318 -> 1                
			Column Size = fffffffffffce78 -> 30 (64 bits)                
			Decimal Digits = NULLPTR                
			Nullable = NULLPTR
[ODBC][24294][1638369671.619175][SQLColAttribute.c][294]
		Entry:
			Statement = 180847ad0
			Column Number = 1
			Field Identifier = SQL_DESC_DISPLAY_SIZE
			Character Attr = 0
			Buffer Length = 0
			String Length = 0
			Numeric Attribute = fffffffffffce70
[ODBC][24294][1638369671.619219][SQLColAttribute.c][709]
		Exit:[SQL_SUCCESS]
[ODBC][24294][1638369671.619261][SQLBindCol.c][245]
		Entry:
			Statement = 180847ad0
			Column Number = 1
			Target Type = 1 SQL_CHAR
			Target Value = 700000000056540
			Buffer Length = 31
			StrLen Or Ind = 70000000009b310
[ODBC][24294][1638369671.619304][SQLBindCol.c][353]
		Exit:[SQL_SUCCESS]
```

For comparison, Node ODBC library:

```
[ODBC][24271][1638369403.771225][SQLDescribeCol.c][504]
		Exit:[SQL_SUCCESS]                
			Column Name = [VARTE1]                
			Data Type = 1805796a4 -> 1                
			Column Size = 1805796a8 -> 30 (64 bits)                
			Decimal Digits = 1805796b0 -> 0                
			Nullable = 1805796c0 -> 0
[ODBC][24271][1638369403.771312][SQLBindCol.c][245]
		Entry:
			Statement = 180552e70
			Column Number = 1
			Target Type = 1 SQL_CHAR
			Target Value = 180583130
			Buffer Length = 121
			StrLen Or Ind = 1805437d0
[ODBC][24271][1638369403.771392][SQLBindCol.c][353]
		Exit:[SQL_SUCCESS]
```

It looks to be overallocating, per: https://github.com/markdirish/node-odbc/blob/master/src/odbc_connection.cpp#L1659

ibm_db2 actually has a flag (i5_dbcs_alloc) in case of these situations. It is a bit crude, but seeing other ODBC consumers do it is kind of suggesting to me it may be a valid solution.

The other option might be to reallocate more aggressively if the buffer is inadequate. I'm unsure how tricky that'd be to implement. The annoying option for users but simplest to prevent crashes in PHP is to emit a warning if the driver is trying to return more that what PHP allocated.

Test script:
---------------
<?php

// SQL to repro
create table calvin.bigchars(
    x char(30)
);

// character that must be representable as a single char in DB but expanded on client (i.e. accented e in ISO-8859 or EBCDIC is multiple bytes in UTF-8)
insert into calvin.bigchars values('éééééééééééééééééééééééééééééé'); // 30
insert into calvin.bigchars values('ééééééééééééééé'); // 15
insert into calvin.bigchars values('ééééé'); // 5
// End SQL


//Connect to IBM i
$user = 'user';
$pw = 'password';
// CCSID 1208 for UTF-8, 819 for ISO-8859-1
$dsn = 'Driver={IBM i Access ODBC Driver};System=127.0.0.1;AlwaysCalculateResultLength=1;CCSID=1208';
$persistence = false;
// create pdo odbc connection
try {
    $pdo_odbc = new PDO("odbc:$dsn", $user, $pw, array(
        PDO::ATTR_PERSISTENT => $persistence,
        PDO::ATTR_ERRMODE => PDO::ERRMODE_WARNING,
    ));
} catch (PDOException $e) {
        die($e->getMessage());
}
$sql = "select * from calvin.bigchars";
$pdoStmt = $pdo_odbc->prepare($sql);
echo "before PDO\n";
var_dump($pdoStmt->execute());
echo "after PDO\n";
while($record = $pdoStmt->fetch(PDO::FETCH_ASSOC)){
        echo "Record\n";
        var_dump($record["X"]);
}


Actual result:
--------------
before PDO
bool(true)
after PDO
Record
string(60) "éééééééééééééééT`�
                              ��sql"
Record
string(45) "éééééééééééééééT`�
                              "
Record
string(35) "ééééé                    "


Patches

Add a Patch

Pull Requests

Add a Pull Request

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2021-12-03 18:03 UTC] cmb@php.net
Well, the relevant differenec to the Node ODBC library is
apparently that PDO does not rely on SQLDescribeCol() to detect
the size, but rather on SQLColAttribute(SQL_DESC_DISPLAY_SIZE)[1].
Whether that is *supposed* to return the "correct" length is not
clear to me.  Could you please check what
SQLColAttribute(SQL_DESC_OCTET_LENGTH) would yield?  Might be the
same value as SQL_DESC_DISPLAY_SIZE, but maybe not.

> there's seemingly a possibility of a buffer overflow

I don't think that is possible, since the buffer size is passed
to SQLBindCol(), so only truncation may happen.

Note that I'm not generally against overallocation, but would try
to avoid it for the general case.  Some setting (not necessarily
INI) might be an option.  But please check SQL_DESC_OCTET_LENGTH
first.

[1] <https://github.com/php/php-src/blob/php-8.0.13/ext/pdo_odbc/odbc_stmt.c#L604>
 [2021-12-03 19:05 UTC] calvin at cmpct dot info
I agree overallocation is ugly and something we want to avoid if we can help it.

SQL_DESC_OCTET_LENGTH is the same as the display length for the column (in this case, 30) when I write a program to confirm this in C. I'm guessing this is technically correct because it's what it's true for how the database may be representing it, but not after the driver does conversion for the client's encoding.

My comment about the overflow is more about how there's garbage at the end there; I suspect it's more not truncating and letting it go out reading stuff in the abyss. Or I'm also reading the situation wrong, and it's reallocating the buffer, just not doing it correctly in a way that leaves garbage. Orrrrrrr the driver isn't truncating properly; either way, I'll probably bug the IBM people responsible for the driver and get their read on the situation.

(Apologies if you see dupes, bugs.php keeps timing out...)
 [2021-12-07 16:13 UTC] kadler at us dot ibm dot com
I know my colleague has done some investigation here as to how different databases/drivers react when running in different encodings between the database and the client. IIRC most drivers don't do any conversion and just give back the data in the original encoding (which works "ok" for the most part when everyone is some ASCII variant, but not so when one is EBCDIC).

As to how the PHP code works using SQL_DESC_DISPLAY_SIZE, this is the docs:

https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlcolattribute-function?view=sql-server-ver15

"SQL_DESC_DISPLAY_SIZE - Maximum number of characters required to display data from the column. For more information about display size, see Column Size, Decimal Digits, Transfer Octet Length, and Display Size in Appendix D: Data Types."

Of course, the link to the Appendix isn't any more enlightening:

https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/column-size-decimal-digits-transfer-octet-length-and-display-size?view=sql-server-ver15

"The display size value for all data types corresponds to the value in a single descriptor field, SQL_DESC_DISPLAY_SIZE."


In the straightforward reading of the spec, I suppose one could argue that the driver should account for any encoding conversion of the data that it will do.
 [2021-12-07 16:54 UTC] calvin at cmpct dot info
Interesting observation: if you try to cast so PHP picks a bigger buffer size that might be able to accommodate the conversion, you still get garbage at the end:

i.e. `select cast(x as char(150)) as X from calvin.bigchars`

gets you

```
Record
string(180) "éééééééééééééééééééééééééééééé                                                                                          0 (*Wind!@.0*) Gecko* "
Record
string(165) "ééééééééééééééé                                                                                                                        0 (*Windo"
Record
string(155) "ééééé                                                                                                                                            0 (*"
```

(I'm guessing that's garbage in memory left over from parsing browscap, since this is CLI...)
 [2021-12-14 13:11 UTC] cmb@php.net
-Package: ODBC related +Package: PDO ODBC
 [2021-12-14 13:11 UTC] cmb@php.net
> Of course, the link to the Appendix isn't any more enlightening:

Yeah. :(

Anyhow, all this looks remotely related to bug #79059; see
particularly my last comment there[1].  It might be worthwhile to
try to run the scripts with PDO::ODBC_ATTR_ASSUME_UTF8 set to
true, even though that is not yet consistently implemented.

[1] <https://bugs.php.net/bug.php?id=79059#1601993047>
 [2021-12-14 13:11 UTC] cmb@php.net
-Assigned To: +Assigned To: cmb
 [2021-12-14 13:47 UTC] cmb@php.net
Oh, forgot: could you also please check the results of the queries
with ext/odbc (i.e. the odbc_*() functions)?  The implementation
is quite different from PDO_ODBC.
 [2021-12-14 16:26 UTC] cmb@php.net
-Status: Assigned +Status: Feedback
 [2021-12-14 18:53 UTC] calvin at cmpct dot info
I can confirm that PDO::ODBC_ATTR_ASSUME_UTF8 and using the procedural ODBC extension have the same results. (Well, the garbage on the end is different, but it's still garbage and the string is still the same length.)

Code for the procedural ODBC version (the attrib on PDO ver is trivial so not including):

```
$odbc = odbc_connect($dsn, $user, $pw);
$sql = "select cast(x as char(150)) as X from calvin.bigchars";
$stmt = odbc_prepare($odbc, $sql);
echo "before ODBC\n";
var_dump(odbc_execute($stmt));
echo "after ODBC\n";
while($record = odbc_fetch_array($stmt)){
        echo "Record\n";
        var_dump($record["X"]);
}
```
 [2021-12-26 04:22 UTC] php-bugs at lists dot php dot net
No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Re-Opened". Thank you.
 [2021-12-27 13:56 UTC] calvin at cmpct dot info
The problem still exists, it should be re-opened.

That said, I'm not entirely sure of the etiquette though around bugs on bugs.php; are these better off moved to GH Issues instead now?
 [2021-12-27 16:51 UTC] cmb@php.net
-Status: No Feedback +Status: Open
 [2021-12-27 16:51 UTC] cmb@php.net
Sorry for the late reply!

From what I can tell, there may be two different issues here: (a)
PHP does not allocate sufficient space for the data, and (b) the
driver may deliver fewer data than PHP expects, resulting in
arbitrary memory contents to be given to userland.  Let's focus
on (b) for now, since that should be easy to fix by zeroing the
buffers before fetching.  Could you please try the following patch
for ODBC (PDO_ODBC could likely get a similar fix):


 ext/odbc/php_odbc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/ext/odbc/php_odbc.c b/ext/odbc/php_odbc.c
index d4de4ec75b..d0ee19aa07 100644
--- a/ext/odbc/php_odbc.c
+++ b/ext/odbc/php_odbc.c
@@ -1381,6 +1381,12 @@ static void php_odbc_fetch_hash(INTERNAL_FUNCTION_PARAMETERS, int result_type)
 		RETURN_FALSE;
 	}
 
+	for (i = 0; i < result->numcols; i++) {
+		if (result->values[i].value != NULL) {
+			memset(result->values[i].value, 0, result->values[i].vallen);
+		}
+	}
+
 #ifdef HAVE_SQL_EXTENDED_FETCH
 	if (result->fetch_abs) {
 		if (rownum > 0) {


Something else that may be worth investigating is the usage of
SQL_LONGVARCHAR columns (or respective casts), if something like
that is supported by the DB and the driver.  Such columns are not
bound, but rather SQLGetData() is used to retrieve the data, and
the driver is supposed to report the length of the transferred
data in the StrLen_or_IndPtr argument (so in this case neither
SQLDescribeCol() nor SQLColAttribute() are used to compute the
length).
 [2021-12-27 19:42 UTC] calvin at cmpct dot info
Hi Christoph, I'll give this patch a shot (and see if Db2 has that column type...). I'm fine with the late reply considering the season (time off is more important than ODBC bugs), just chuffed at the issue tracker that it got auto-closed.
 [2021-12-27 21:16 UTC] calvin at cmpct dot info
Testing your patch and it faults, but it segfaults on the additional memset. Bizarre, so I'm gonna check if it reproduces the same on Linux.

```
(gdb) run
Starting program: /QOpenSys/pkgs/bin/php -d error_log= test4.php
[New Thread 1]
before ODBC
bool(true)
after ODBC

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1]
0x000000000000000c in ?? ()
(gdb) sharedlib odbc
Reading symbols from /QOpenSys/pkgs/lib/libodbcinst.so.2(shr_64.o)...done.
Reading symbols from /QOpenSys/pkgs/lib/libcwbodbc.so...done.
Reading symbols from /QOpenSys/pkgs/lib/php-8.0/extensions/pdo_odbc.so...done.
Reading symbols from /QOpenSys/pkgs/lib/libodbc.so.2(shr_64.o)...done.
Reading symbols from /QOpenSys/pkgs/lib/php-8.0/extensions/odbc.so...done.
(gdb) where
#0  0x000000000000000c in ?? ()
#1  0x090000004e2862d0 in php_odbc_fetch_hash.isra.11.constprop () from /QOpenSys/pkgs/lib/php-8.0/extensions/odbc.so
#2  0x00000001000b9914 in execute_ex (ex=0x70000000022ffe0) at /home/calvin/rpmbuild/BUILD/php-8.0.14/Zend/zend_execute.c:51782
#3  0x00000001000c1f30 in zend_execute (op_array=0x70000000022ffe0, return_value=0x7) at /home/calvin/rpmbuild/BUILD/php-8.0.14/Zend/zend_execute.c:58537
#4  0x00000001000e5b80 in zend_execute_scripts (type=2293728, retval=0x7, file_count=0) at /home/calvin/rpmbuild/BUILD/php-8.0.14/Zend/zend.c:1680
#5  0x00000001001d1ff4 in php_execute_script (primary_file=0xfffffffffffdec0) at /home/calvin/rpmbuild/BUILD/php-8.0.14/main/main.c:2539
#6  0x0000000100002d00 in do_cli (argc=4, argv=0x1800ee770) at /home/calvin/rpmbuild/BUILD/php-8.0.14/sapi/cli/php_cli.c:347
#7  0x00000001000008cc in main (argc=4, argv=0x7) at /home/calvin/rpmbuild/BUILD/php-8.0.14/sapi/cli/php_cli.c:1337
```
 [2021-12-27 21:32 UTC] calvin at cmpct dot info
Thankfully was a quick build, but I don't notice any different behaviour (where /tmp/php is a build of PHP-8.0.14 with your patch vs. my system 8.0.13 from Fedora, both using the Linux build of the same driver I've been using on i):

```
[calvin@salient src]$ /tmp/php/bin/php -d extension_dir=/tmp/php/lib/php/extensions/no-debug-non-zts-20200930 test4.php 
before ODBC
bool(true)
after ODBC
Record
string(180) "éééééééééééééééééééééééééééééé                                                                                          V��(���j"
Record
string(165) "ééééééééééééééé                                                                                                                        "
Record
string(155) "ééééé                                                                                                                                            "
[calvin@salient src]$ php test4.php 
before ODBC
bool(true)
after ODBC
Record
string(180) "éééééééééééééééééééééééééééééé                                                                                          �c��]"
Record
string(165) "ééééééééééééééé                                                                                                                        �c��]"
Record
string(155) "ééééé                                                                                                                                            "
```
 [2021-12-27 22:10 UTC] cmb@php.net
> […], just chuffed at the issue tracker that it got auto-closed.

Yeah, that is a bit unfortunate.  If the status is set to
"feedback", the bug is automatically closed on the next but one
Sunday, even if feedback has been provided.  You, as bug reporter,
should be able to change the status back to "open", though
("re-opened" doesn't work, IIRC).

> but I don't notice any different behaviour

Hmm, does the *driver* fill the buffers with some garbage at the
end, in this case?  I'll have a closer look tomorrow; it seems
using SQLGetData() always (for testing purposes) might be a way to
proceed.
 [2021-12-28 16:58 UTC] calvin at cmpct dot info
Was going to write a sample w/ ODBC in C, but decided to poke the existing code path to determine if it's just junk left over before, or just created after.

Gist b/c it's too long to post here: https://gist.githubusercontent.com/NattyNarwhal/3864224066dda8ed4be991c06e67a670/raw/3e3aad9efbc6472f7dde1e8e1167e2bc9ad06288/gistfile1.txt

(I rebuilt PHP w/ --enable-debug for my own sanity in GDB. Also on Linux too.)

Note that the garbage on 0x7ffff767e3a8 that appears in the string comes from before the memset, so I think this is just leftover in memory and not coming from the driver. (You can also see my connection string... thankfully not my password.)
 [2021-12-30 23:06 UTC] cmb@php.net
Oh yeah, sorry, zeroing the memory can't work, since the length is
only properly set when fetching, so we cannot do that upfront.

Sorry for the delay, once again.  I shall dig deeper ASAP.
 [2022-01-05 16:15 UTC] cmb@php.net
-Status: Assigned +Status: Feedback
 [2022-01-05 16:15 UTC] cmb@php.net
I made a quick patch which disables any column binding for ext/odbc[1].
That is, all data are now retrieved by SQLGetData().  This is obviously
less efficient, but I like to know whether that would work with the IBM
driver/DB.  Could you please check it out?

Note that all ext/odbc tests work for me like without the patch
(or undefining ODBC_DONT_BIND), except for bug44618.phpt which now
is able to retrieve the third column.

[1] <https://gist.github.com/cmb69/839d5e04395936ad64fd049d2cb7fd55>
 [2022-01-05 18:23 UTC] calvin at cmpct dot info
I tried the patch and not only does it work (as below):

```
Record
string(180) "éééééééééééééééééééééééééééééé                                                                                                                        "
Record
string(165) "ééééééééééééééé                                                                                                                                       "
Record
string(155) "ééééé                                                                                                                                                 "
```

...it's actually faster. I took the sample from earlier, wrapped the prepare-execute-fetch-display in a 10000 iteration loop, ran the script 10 times (to average out outliers - it's not super scientific, but to make it fair), and unmodified PHP returning garbage takes 10 seconds, and your fix takes 7 seconds. Now I'm wondering if SQLGetData might be more efficient for other scenarios...
 [2022-01-05 22:28 UTC] cmb@php.net
Thanks for checking and the good news!

Regarding the performance, I'm surprised, but an MSDN article
about fetching result data[1] possibly clarifies:

| If a result set contains only a few rows, using SQLGetData
| instead of SQLBindCol is faster; otherwise, SQLBindCol gives the
| best performance.

And:

| When using server cursors, the SQL Server Native Client ODBC
| driver is optimized to not transmit the data for unbound text,
| ntext, or image columns at the time the row is fetched. The text,
| ntext, or image data is not actually retrieved from the server
| until the application issues SQLGetData for the column.

So probably the performance of SQLBindCol() vs. SQLGetData()
likely depends very much on the driver.  If the latter requires
additional requests to the server, performance might drastically
degrade.  It might make sense to do more extensive testing
(although, we cannot hope to cover nearly all cases), but at least
wrt. fixing a bug (i.e. changes to PHP-8.0), we need to be
conservative anyway.  Would it work for you/your client to do the
builds with a compile time constant (maybe configurable)?  If so,
I could do a pull request based on the latest patch (plus a
similar solution for PDO_ODBC), and we can go from there.

More aggressive changes might be doable for the master branch (PHP
8.2) afterwards.  At least the code in ext/odbc looks like it
needs some overhaul anyway.

[1] <https://docs.microsoft.com/en-us/sql/relational-databases/native-client-odbc-results/fetching-result-data?view=sql-server-ver15>
 [2022-01-06 16:46 UTC] calvin at cmpct dot info
I think I'm OK with compile-time for now. We mostly support users running on i directly (with our packages, so we can easily include this patch), though some do use the same driver on Linux/Windows with official PHP/distro binaries.

FWIW, most of our users are also running PDO_ODBC, mostly because that supports in/out parameters with stored procedures (which, I haven't checked if that's impacted...). I skimmed the source to see what it'd take to do the same, since I think it's a little more dependent on binding for fetching data.

I'm a little concerned about perf since you mentioned the possibilities about other scenarios and maybe causing a regression for realistic workloads, but I'll try the microbenchmark again with a larger count of rows.
 [2022-01-06 18:00 UTC] calvin at cmpct dot info
OK, expanding it to 26 rows (not quite "big" but less trivial than a single row, albeit still a single column and printing the output) is ~43s for unmodified and ~30s for your patch.

FWIW, I also tested a version that doesn't bother with the bug and just the differences in approach; basically the same test against a more "realistic" table (on i, QIWS.QCUSTCDT is basically a multirow, multi-column table with a variety of data that most people use for tests like this) and not printing any rows (to avoid measuring the costs writing the data out). The result is the unmodified PHP is a bit faster (14.5496666666667) versus your patch (14.9203333333333) but it's very small and perhaps within the margin of error.
 [2022-01-09 00:19 UTC] cmb@php.net
-Status: Feedback +Status: Open
 [2022-01-14 16:31 UTC] calvin at cmpct dot info
The user having the issue wanted to try the fix with their existing code, but was using PDO. So, I've ported the logic over for PDO_ODBC and I have the patch here: https://gist.github.com/NattyNarwhal/077dbd184f08ac030f8b6088b3d39ed4 - I'll try it out with them soon and see how it goes.

It's quite simple and like the procedural ODBC one, works by basically skipping functionality and using existing things. There's almost certainly a better solution, as this patch does all sorts of bad things (i.e. coercing to string, have to get assembled in a 256 byte buffer piecemeal), but it does fix the primary issue here.

(Also, unrelated observation and happens with/without any of the patches: PDO_ODBC is *noticeably* faster at least on Linux. I can actually see the results stream out w/ procedural, but they're instant with PDO, and that's without fetchAll. Quite interesting...)
 [2022-01-14 16:49 UTC] cmb@php.net
> So, I've ported the logic over for PDO_ODBC and I have the patch
> here: […]

Ah, cool!  It might be possible to simplify further by just
setting S->going_long = 1 early (but that doesn't matter for now).

> coercing to string, have to get assembled in a 256 byte buffer
> piecemeal

I think that code is indeed rather suboptimal.  I still wonder why
fetching in small chunks is claimed to be faster than reading a
single chunk[1].

> PDO_ODBC is *noticeably* faster at least on Linux.

That's interesting!  Might be worth investigating.

[1] <https://github.com/php/php-src/pull/6716#issuecomment-783461335>
 [2022-01-20 19:11 UTC] calvin at cmpct dot info
Brief update w/ the PDO_ODBC patch from last week: The user does report the patch works for them without any (noticeable) performance regression. Probably a bit rough still, but good to know so far.
 
PHP Copyright © 2001-2022 The PHP Group
All rights reserved.
Last updated: Sat Jan 22 00:03:34 2022 UTC