PHP :: Bug #32140 :: NVARCHAR2 columns are truncated

Bug #32140	NVARCHAR2 columns are truncated
Submitted:	2005-02-28 23:08 UTC	Modified:	2005-09-08 11:47 UTC
From:	bmr at comtime dot com	Assigned:	tony2001 (profile)
Status:	Closed	Package:	OCI8 related
PHP Version:	5.0.4, 4.3.11	OS:	Linux
Private report:	No	CVE-ID:	None

[2005-02-28 23:08 UTC] bmr at comtime dot com

Description:
------------
NVARCHAR2 columns can be used to store UTF-8 or UTF-16 characters. When using UTF-8, one character can take 3 (or more?) bytes. PHP truncates strings whose byte representation is longer than 2 times the character length. This is actually an Oracle bug, because the OCI call returns the wrong byte length.
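
The truncation described above can be illustrated outside of OCI with a minimal C sketch. It assumes a fetch buffer sized at 2 bytes per character plus a terminator, as the description implies. The helper names `utf8_char_count` and `copy_truncating` are hypothetical and are not part of oci8.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count UTF-8 characters by skipping
 * continuation bytes (those of the form 10xxxxxx). */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: copy src into a buffer of dst_size bytes the way
 * a fixed-size fetch buffer behaves. Anything beyond dst_size - 1 bytes
 * is dropped. Returns the number of bytes that were lost. */
static size_t copy_truncating(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    size_t keep;

    assert(dst_size > 0);
    keep = len < dst_size ? len : dst_size - 1;
    memcpy(dst, src, keep);
    dst[keep] = '\0';
    return len - keep;
}
```

For a string of four CJK characters (12 bytes in UTF-8), a buffer of `2 * 4 + 1` bytes loses the last 4 bytes, while a buffer of `3 * 4 + 1` bytes holds the whole value.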

Here's my patch:
--- php-4.3.10/ext/oci8/oci8.c  Wed Nov  3 08:35:56 2004
+++ php-4.3.10.cti/ext/oci8/oci8.c      Mon Feb 28 15:37:54 2005
@@ -1443,6 +1443,7 @@
        ub4 iters;
        ub4 colcount;
        ub2 dynamic;
+       ub1 charset_form;
        int dtype;
        dvoid *buf;
        oci_descriptor *descr;
@@ -1573,6 +1574,21 @@
                                return 0; /* XXX we loose memory!!! */
                        }

+                       if(outcol->data_type == SQLT_CHR) {
+                                CALL_OCI_RETURN(error, OCIAttrGet(
+                                                                (dvoid *)param,
+                                                                OCI_DTYPE_PARAM,
+                                                                (dvoid *)&charset_form,
+                                                                (dvoid *)0,
+                                                                OCI_ATTR_CHARSET_FORM,
+                                                                statement->pError));
+                                statement->error = oci_error(statement->pError, "OCIAttrGet OCI_DTYPE_PARAM/OCI_ATTR_CHARSET_FORM", error);
+                                if (statement->error) {
+                                         oci_handle_error(statement->conn, statement->error);
+                                         return 0; /* XXX we loose memory!!! */
+                                }
+                       }
+
                        CALL_OCI_RETURN(error, OCIAttrGet(
                                                (dvoid *)param,
                                                OCI_DTYPE_PARAM,
@@ -1700,6 +1716,9 @@
                                                ) {
                                                outcol->storage_size4 = 512; /* XXX this should fit "most" NLS date-formats and Numbers */
                                        } else {
+                                               if(charset_form == SQLCS_NCHAR)
+                                                        outcol->storage_size4 *=2; /* double for unicode */
+
                                                outcol->storage_size4++; /* add one for string terminator */
                                        }
                                        if (outcol->data_type == SQLT_BIN) {

Reproduce code:
---------------
ocifetchinto()

[2005-02-28 23:23 UTC] tony2001@php.net

See also #31042, #27156.
Why do you multiply the storage size by 2? Why not 4 (AFAIK the max. UTF-8 char length is 4 bytes)? Or 6 (other Unicodes may contain such chars too)?
This is a known solution and it's *ugly*.

[2005-03-01 03:34 UTC] bmr at comtime dot com

Oracle seems to report double the length of the field already, i.e. as if UTF-16 encoding was used.  AFAIK Oracle only supports UTF-8 and UTF-16.  The Oracle documentation says that the byte length is 3 times the character length for UTF-8 and 2 times for UTF-16.  So I guess the right fix is to multiply by 1.5!  Does anyone know how to get Oracle to fix their stuff?  I looked at the OCI docs for quite a while and couldn't find a way to get the right length, so it does seem like their bug.
 	
Microsoft Word - nchar_migration.doc: Migration to Unicode Datatypes for Multilingual Databases and Applications $Q2UDFOH:KLWH3DSHU August 2003 Migration to... (published 12/17/2003)
http://www.oracle.com/technology/tech/globalization/pdf/TWP_NCHAR_Migration_10gR1.pdf
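
The sizing argument in the comment above can be written out as arithmetic. This is only a sketch under the commenter's own assumptions: the reported size is 2 bytes per character, and UTF-8 needs at most 3 bytes per character. The function names are hypothetical and do not appear in oci8.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption taken from the comment above: the size reported for an
 * NVARCHAR2 column is 2 bytes per character. */
static size_t chars_from_reported(size_t reported_bytes)
{
    return reported_bytes / 2;
}

/* Worst-case buffer size: bytes_per_char bytes for every character,
 * plus one byte for the string terminator. */
static size_t fetch_buffer_size(size_t reported_bytes, size_t bytes_per_char)
{
    assert(bytes_per_char >= 1);
    return chars_from_reported(reported_bytes) * bytes_per_char + 1;
}
```

With a reported size of 40 bytes (20 characters), 3 bytes per character gives 61 bytes, which is the "multiply by 1.5" figure plus the terminator. Doubling the reported size, as the patch does, gives 81 bytes, which under the same assumption corresponds to 4 bytes per character.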

[2005-09-08 11:47 UTC] tony2001@php.net

The bug has been fixed in OCI8 v.1.1, which is available in CVS HEAD and PECL (use `pear install oci8-beta` to install it).
