PHP :: Request #27238 :: iptcparse() function misses some fields

Submitted: 2004-02-13 06:27 UTC
Modified: 2004-03-06 12:24 UTC
From: philip at nancarrow dot net
Assigned: pajoye
Status: Closed
Package: Feature/Change Request
PHP Version: 4.3.4
OS: Windows and Linux
Private report: No
CVE-ID: None

[2004-02-13 06:27 UTC] philip at nancarrow dot net

Description:
------------
The iptcparse() function (GD extension) only returns IPTC/NAA records 2 and upward, skipping record 1. This appears to be by design, but it means that the returned data is incomplete: for example, the "destination" dataset 1:05 is missing. Worse than this, the "coded character set" dataset (1:90) is missing, and without this value the encoding of the data is unknown (for example, if 1:90 specifies ESC,%,G the data is UTF8 encoded). I assume that the current implementation is defaulting to ASCII or Latin1 encoding.
I can provide you with JPEG files containing IIM record 1 if required; they're quite common in the news industry.
Thank you
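
For illustration, here is a minimal Python sketch of walking a raw IIM byte stream without skipping record 1. It is not PHP's implementation: the function name `parse_iim` and the sample bytes are hypothetical, and the dataset layout it assumes (0x1C tag marker, record number, dataset number, two-byte big-endian length, with the high bit signalling an extended length) is my reading of the IIM spec. The keys imitate the "record#dataset" style used in iptcparse() results.

```python
def parse_iim(data: bytes) -> dict:
    """Collect every IIM dataset, including record 1, as {"R#DDD": [values]}."""
    result = {}
    pos = 0
    while pos + 5 <= len(data):
        if data[pos] != 0x1C:
            pos += 1  # not a tag marker; resynchronise on the next byte
            continue
        record, dataset = data[pos + 1], data[pos + 2]
        length = int.from_bytes(data[pos + 3:pos + 5], "big")
        pos += 5
        if length & 0x8000:
            # Assumed extended form: low 15 bits give the size of the real length field.
            size = length & 0x7FFF
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        value = data[pos:pos + length]
        pos += length
        result.setdefault(f"{record}#{dataset:03d}", []).append(value)
    return result


# Hypothetical stream: dataset 1:90 (ESC % G) followed by dataset 2:05.
sample = bytes([0x1C, 1, 90, 0, 3]) + b"\x1b%G" + bytes([0x1C, 2, 5, 0, 5]) + b"Title"
datasets = parse_iim(sample)
```

With this sample, `datasets` holds an entry under "1#090" as well as "2#005", which is the information the request asks iptcparse() to expose.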

[2004-02-13 09:29 UTC] pajoye@php.net

> I can provide you with JPEG files containing IIM record 1
> if required; they're quite common in the news industry.

Please do :)
Could you provide a URL with some images containing the required fields, plus a text file with the expected result?

Note that I have never read about the charset part in any docs on the IPTC standard. Do you have a link that describes it?

pierre

[2004-02-13 10:40 UTC] philip at nancarrow dot net

Pierre,
OK sure, I've put two JPEGs that include IIM record 1 at:
http://www.nancarrow.net/download/testpic1_latin1.jpg
[Latin1 encoded English]
and
http://www.nancarrow.net/download/testpic2_utf8.jpg
[UTF8 encoded Chinese]

The IPTC/NAA (aka "IIM") spec is freely downloadable from http://www.iptc.org/download/download.php?fn=IIMV4.1.pdf and it details all records, including record 1.

Appendix C lists the currently defined character sets; the one in use is specified in dataset 1:90. Note the strange IPTC terminology: an "octet" is a byte, written as column/row, so "octet 2/5" means 0x25. The character set sequence starts with ESC, so where the spec says ISO-8859-1 is "intermediate character 2/12 to 2/15" followed by "octet 4/1", this would be something like:
ESC,0x2F,0x41
or "ESC/A". Similarly, UTF8 is ESC,2/5,4/7 or "ESC%G".
Where the spec says "intermediate character 2/12 to 2/15", most creators writing the file use the end character, i.e. 2/15 in this case.
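
The octet arithmetic above can be sketched in a few lines of Python. The helper names (`octet`, `escape_sequence`, `charset_from_1_90`) are hypothetical, and the lookup table covers only the two sequences mentioned in this thread, not the full list in Appendix C.

```python
ESC = 0x1B


def octet(notation: str) -> int:
    """Convert column/row notation such as "2/5" to its byte value (0x25)."""
    column, row = (int(part) for part in notation.split("/"))
    return column * 16 + row


def escape_sequence(*octets: str) -> bytes:
    """Build ESC followed by the given octets, e.g. ("2/5", "4/7") for UTF8."""
    return bytes([ESC] + [octet(o) for o in octets])


# Only the two character sets discussed here; extend from Appendix C as needed.
CHARSETS = {
    escape_sequence("2/5", "4/7"): "utf-8",        # ESC % G
    escape_sequence("2/15", "4/1"): "iso-8859-1",  # ESC / A
}


def charset_from_1_90(value: bytes):
    """Return a codec name for a 1:90 value, or None if it is not in the table."""
    return CHARSETS.get(value)
```

For example, `charset_from_1_90(b"\x1b%G")` looks up the UTF8 entry, while an unrecognised sequence yields `None` so the caller can choose its own default.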

I'm not sure that PHP really needs to know about the encoding, does it? Since strings are just byte sequences in PHP, I guess it's down to the application to do the appropriate encoding/decoding... as long as it has access to the character set, of course!
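
As a rough sketch of that division of labour, an application could decode the raw bytes itself once it can read dataset 1:90. This is Python for illustration only; `decode_dataset` is a hypothetical helper, and falling back to Latin1 when 1:90 is absent or unrecognised is an assumption, not something the spec or PHP prescribes.

```python
from typing import Optional

# The two sequences discussed in this thread: ESC % G and ESC / A.
KNOWN_CHARSETS = {b"\x1b%G": "utf-8", b"\x1b/A": "iso-8859-1"}


def decode_dataset(raw: bytes, coded_character_set: Optional[bytes],
                   fallback: str = "iso-8859-1") -> str:
    """Decode one dataset value using the 1:90 value, if the application has it."""
    encoding = KNOWN_CHARSETS.get(coded_character_set, fallback)
    # Replace undecodable bytes rather than raising, since the input is untrusted.
    return raw.decode(encoding, errors="replace")
```

The parser only has to hand over the bytes of 1:90; the choice of codec and fallback stays with the application.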

Thanks
Philip

[2004-03-06 12:24 UTC] pajoye@php.net

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.
