PHP :: Bug #32140 :: NVARCHAR2 columns are truncated

Bug #32140	NVARCHAR2 columns are truncated
Submitted:	2005-02-28 23:08 UTC	Modified:	2005-09-08 11:47 UTC
From:	bmr at comtime dot com	Assigned:	tony2001 (profile)
Status:	Closed	Package:	OCI8 related
PHP Version:	5.0.4, 4.3.11	OS:	Linux
Private report:	No	CVE-ID:	None


[2005-02-28 23:08 UTC] bmr at comtime dot com

Description:
------------
NVARCHAR2 columns can store UTF-8 or UTF-16 characters. With UTF-8, a single character can take 3 (or more?) bytes. PHP truncates strings whose byte representation is longer than 2 times the character length. This is actually an Oracle bug, because the OCI call returns the wrong byte length.
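To make the truncation concrete, here is a minimal sketch in plain C. It does not use OCI at all; the helper names `utf8_char_count` and `copy_into_fetch_buffer` are hypothetical and only mimic a fetch buffer sized at 2 bytes per character plus a terminator.

```c
#include <stddef.h>
#include <string.h>

/* Count UTF-8 characters: every byte that is not a continuation
 * byte (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/* Hypothetical stand-in for a fixed-size fetch buffer: copies at most
 * buf_size - 1 bytes of src, terminates the result, and returns the
 * number of bytes actually copied. Anything beyond that is lost. */
size_t copy_into_fetch_buffer(char *buf, size_t buf_size, const char *src)
{
    size_t len = strlen(src);
    if (buf_size == 0) {
        return 0;
    }
    if (len > buf_size - 1) {
        len = buf_size - 1;
    }
    memcpy(buf, src, len);
    buf[len] = '\0';
    return len;
}
```

With a 3-character string that needs 3 bytes per character in UTF-8 (9 bytes in total), a buffer of 2 * 3 + 1 = 7 bytes keeps only the first 6 bytes, which is the kind of truncation described above.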

Here's my patch:
--- php-4.3.10/ext/oci8/oci8.c  Wed Nov  3 08:35:56 2004
+++ php-4.3.10.cti/ext/oci8/oci8.c      Mon Feb 28 15:37:54 2005
@@ -1443,6 +1443,7 @@
        ub4 iters;
        ub4 colcount;
        ub2 dynamic;
+       ub1 charset_form;
        int dtype;
        dvoid *buf;
        oci_descriptor *descr;
@@ -1573,6 +1574,21 @@
                                return 0; /* XXX we loose memory!!! */
                        }

+                       if(outcol->data_type == SQLT_CHR) {
+                                CALL_OCI_RETURN(error, OCIAttrGet(
+                                                                (dvoid *)param,
+                                                                OCI_DTYPE_PARAM,
+                                                                (dvoid *)&charset_form,
+                                                                (dvoid *)0,
+                                                                OCI_ATTR_CHARSET_FORM,
+                                                                statement->pError));
+                                statement->error = oci_error(statement->pError, "OCIAttrGet OCI_DTYPE_PARAM/OCI_ATTR_CHARSET_FORM", error);
+                                if (statement->error) {
+                                         oci_handle_error(statement->conn, statement->error);
+                                         return 0; /* XXX we loose memory!!! */
+                                }
+                       }
+
                        CALL_OCI_RETURN(error, OCIAttrGet(
                                                (dvoid *)param,
                                                OCI_DTYPE_PARAM,
@@ -1700,6 +1716,9 @@
                                                ) {
                                                outcol->storage_size4 = 512; /* XXX this should fit "most" NLS date-formats and Numbers */
                                        } else {
+                                               if(charset_form == SQLCS_NCHAR)
+                                                        outcol->storage_size4 *=2; /* double for unicode */
+
                                                outcol->storage_size4++; /* add one for string terminator */
                                        }
                                        if (outcol->data_type == SQLT_BIN) {

Reproduce code:
---------------
ocifetchinto()


[2005-02-28 23:23 UTC] tony2001@php.net

See also #31042, #27156.
Why do you multiply the storage size by 2? Why not 4 (afaik the max. UTF char length is 4 bytes)? Or 6 (other Unicode encodings may contain such chars too)?
This is a known solution and it's *ugly*.

[2005-03-01 03:34 UTC] bmr at comtime dot com

Oracle seems to report double the length of the field already, i.e. as if UTF-16 encoding was used.  AFAIK Oracle only supports UTF-8 and UTF-16.  The Oracle documentation says that the byte length is 3 times the character length for UTF-8 and 2 times for UTF-16.  So I guess the right fix is to multiply by 1.5!  Does anyone know how to get Oracle to fix their stuff?  I looked at the OCI docs for quite a while and couldn't find a way to get the right length, so it does seem like their bug.
 	
Migration to Unicode Datatypes for Multilingual Databases and Applications, An Oracle White Paper, August 2003 (published 12/17/2003)
http://www.oracle.com/technology/tech/globalization/pdf/TWP_NCHAR_Migration_10gR1.pdf
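The arithmetic behind the "multiply by 1.5" remark can be sketched as follows. This is only an illustration of the reasoning in this thread, not what OCI8 does: the function names are hypothetical, and the assumptions (Oracle reports 2 bytes per character, UTF-8 needs at most 3) are taken from the comment above rather than verified against OCI.

```c
#include <stddef.h>

/* Hypothetical helper: worst-case buffer size for `chars` characters
 * when each may need up to `max_bytes_per_char` bytes, plus one byte
 * for the string terminator. */
size_t worst_case_buffer_size(size_t chars, size_t max_bytes_per_char)
{
    return chars * max_bytes_per_char + 1;
}

/* Assumption from the thread: the reported byte length is 2 bytes per
 * character. Recover the character count, then size the buffer for
 * UTF-8 at 3 bytes per character, i.e. 1.5 times the reported length. */
size_t utf8_buffer_from_reported(size_t reported_bytes)
{
    size_t chars = reported_bytes / 2;
    return worst_case_buffer_size(chars, 3);
}
```

For a column of 10 characters reported as 20 bytes, this gives 10 * 3 + 1 = 31 bytes, where the unpatched sizing would give 21. If the maximum is 4 bytes per character, as raised earlier in the thread, the first helper can be called with 4 instead.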

[2005-09-08 11:47 UTC] tony2001@php.net

The bug has been fixed in OCI8 v.1.1, which is available in CVS HEAD and PECL (use `pear install oci8-beta` to install it).
