Request #27238 iptcparse() function misses some fields
Submitted: 2004-02-13 06:27 UTC Modified: 2004-03-06 12:24 UTC
From: philip at nancarrow dot net Assigned: pajoye (profile)
Status: Closed Package: Feature/Change Request
PHP Version: 4.3.4 OS: Windows and Linux
Private report: No CVE-ID: None
 [2004-02-13 06:27 UTC] philip at nancarrow dot net
Description:
------------
The iptcparse() function (GD extension) only returns IPTC/NAA records 2 and upward, skipping record 1. This appears to be by design, but it means that the returned data is incomplete: for example, the "destination" dataset 1:05 is missing. Worse than this, "coded character set" (1:90) is missing, and without this value the encoding of the data is unknown (for example, if 1:90 specifies ESC,%,G the data is UTF8 encoded). I assume that the current implementation is defaulting to ASCII or Latin1 encoding.
I can provide you with JPEG files containing IIM record 1 if required; they're quite common in the news industry.
Thank you



 [2004-02-13 09:29 UTC] pajoye@php.net
> I can provide you with JPEG files containing IIM record 1
> if required; they're quite common in the news industry.

Please do :)
If possible, please provide a URL with some images containing the required fields and a txt file with the expected result.

Note that I have never read about the charset part in any docs about the IPTC standard. Do you have a link that describes it?

pierre
 [2004-02-13 10:40 UTC] philip at nancarrow dot net
Pierre,
OK sure, I've put two JPEGs that include IIM record 1 at:
http://www.nancarrow.net/download/testpic1_latin1.jpg
[Latin1 encoded English]
and
http://www.nancarrow.net/download/testpic2_utf8.jpg
[UTF8 encoded Chinese]

The IPTC/NAA (aka "IIM") spec is freely downloadable from http://www.iptc.org/download/download.php?fn=IIMV4.1.pdf and it details all records, including record 1.

Appendix C lists the currently defined character sets; the one in use is specified in dataset 1:90. Note the strange IPTC terminology: an "octet" is a byte, so "octet 2/5" means 0x25. The character set sequence starts with ESC, so where it says ISO-8859-1 is "intermediate character 2/12 to 2/15" followed by "octet 4/1", this would be something like:
ESC,0x2F,0x41
or "ESC/A". Similarly UTF8 is ESC,2/5,4/7 or "ESC%G".
Where the spec says "intermediate character 2/12 to 2/15", most creators writing the file use the end character, i.e. 2/15 in this case.
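
The octet notation can be converted to bytes mechanically: the number before the slash is the high nibble and the number after it is the low nibble. A minimal sketch, assuming a hypothetical helper name iimOctet() (it is not part of PHP or of the IIM spec):

```php
<?php
// Hypothetical helper: convert IIM "column/row" octet notation to one byte.
// The column is the high nibble, the row is the low nibble.
function iimOctet(string $notation): string
{
    [$column, $row] = array_map('intval', explode('/', $notation));
    return chr(($column << 4) | $row);
}

// ESC followed by the octets quoted in this report.
$latin1Seq = "\x1B" . iimOctet('2/15') . iimOctet('4/1'); // ESC / A
$utf8Seq   = "\x1B" . iimOctet('2/5') . iimOctet('4/7');  // ESC % G
```

Only the two sequences mentioned in this report are built here; other character sets from Appendix C would follow the same pattern.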

I'm not sure that PHP really needs to know about the encoding, does it? Since strings are just byte sequences in PHP, I guess it's down to the application to do the appropriate encoding/decoding, as long as it has access to the character set, of course!
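
As a rough sketch of that application-level step: assuming iptcparse() exposes dataset 1:90 under the key "1#090" (following its "record#dataset" key convention), an application could name the encoding as shown here. The function name iptcCharset() is hypothetical, and only the two escape sequences discussed in this report are mapped.

```php
<?php
// Hypothetical helper: name the encoding announced by dataset 1:90.
// $iptc is an array shaped like the result of iptcparse().
function iptcCharset(array $iptc): ?string
{
    $known = [
        "\x1B%G" => 'UTF-8',      // ESC 2/5 4/7
        "\x1B/A" => 'ISO-8859-1', // ESC 2/15 4/1
    ];
    $seq = $iptc['1#090'][0] ?? null;
    if (!is_string($seq)) {
        return null; // record 1 absent: the encoding is unknown
    }
    return $known[$seq] ?? null; // unmapped sequences are also unknown
}

// Example input imitating a parsed UTF8 file (the values are made up).
$sample = ['1#090' => ["\x1B%G"], '2#120' => ['caption bytes']];
$charset = iptcCharset($sample);
```

Returning null rather than guessing a default leaves the fallback choice (ASCII, Latin1, or something else) to the application.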

Thanks
Philip
 [2004-03-06 12:24 UTC] pajoye@php.net
This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.


 