Bug #76731 Date time format parsing is wrong for format Ynd
Submitted: 2018-08-12 12:21 UTC Modified: 2018-08-13 10:53 UTC
From: remy dot fox at simbuka dot com Assigned:
Status: Not a bug Package: Date/time related
PHP Version: 7.2.8 OS:
Private report: No CVE-ID: None
 [2018-08-12 12:21 UTC] remy dot fox at simbuka dot com
Description:
------------
Parsing with the date formats 'Ynd' and 'Ymd' gives a wrong result when the segments of the date are not separated by any characters.

There may be similar issues with other date time format specifier combinations too, but I haven't tested them.

Test script:
---------------
<?php
$date = DateTime::createFromFormat("Ymd", "2000-10-1");
var_dump($date);
$date = DateTime::createFromFormat("Ymd", "2000101");
var_dump($date);

Expected result:
----------------
As we can see, the first $date is false, which is correct. After all, the day value was not padded with a zero. In the second example, the zero could be interpreted either as part of the month (i.e. October) or as padding for the day value. The second call does not return false, but it should. Instead it returns:

object(DateTime)#2 (3) {
  ["date"]=>
  string(26) "2000-10-01 05:18:02.000000"
  ["timezone_type"]=>
  int(3)
  ["timezone"]=>
  string(10) "US/Pacific"
}


History

 [2018-08-12 12:49 UTC] requinix@php.net
I don't think it's reasonable to require PHP to check for ambiguity. It would cost extra processing cycles on every input, when in nearly all cases (and I would even say in all cases that aren't the result of bad decisions) the input is not ambiguous.

Really, strings like "2000101" are weird: surely anyone serializing a date will use 8 digits, right? And what date is it supposed to represent anyway? You say Oct 1, but it could be Jan 01 too. As a human, how would you decide?

I'd add a warning to the docs that m/n and d/j (and others) work best when (a) the date parts are always the full length with zero padding if needed, or (b) there is a clear separation between the parts. And that otherwise the result is undefined.
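Until the docs carry such a warning, one common workaround for the "undefined" cases is a round-trip check: accept the input only if formatting the parsed date with the same format reproduces it exactly. The helper below is a sketch under that assumption. `parseStrict` is a hypothetical name, not a PHP function, and it assumes the format contains only specifiers that both createFromFormat() and format() understand (so no `!`, `|`, `+`, `*` or `?`).

```php
<?php
// Hypothetical helper (not part of PHP): accept $value only if it is
// exactly the string that $format would produce for the parsed date.
function parseStrict(string $format, string $value): ?DateTimeImmutable
{
    // '!' resets fields missing from the format to the Unix epoch,
    // so the current date/time does not leak into the result.
    $date = DateTimeImmutable::createFromFormat('!' . $format, $value);
    if ($date === false) {
        return null; // createFromFormat() itself rejected the input
    }
    // Round trip: reformatting must reproduce the input character for character.
    // This also rejects overflowed dates such as 2000-02-30.
    return $date->format($format) === $value ? $date : null;
}

var_dump(parseStrict('Ymd', '20001001') !== null); // bool(true)
var_dump(parseStrict('Ymd', '2000101') !== null);  // bool(false)
```

The check costs one extra format() call, and only for callers who ask for it, which sidesteps the concern about adding ambiguity detection to every parse.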
 [2018-08-12 18:06 UTC] remy dot fox at simbuka dot com
You are right that sometimes there is ambiguity. For example, if we take the string '2000111', then for the format 'Ynj' it could mean either 11 January or 1 November. This is because both the month and day specifiers are unpadded.
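In practice PHP's parser does not treat the 'Ynj' case as ambiguous: a one-or-two-digit specifier takes two digits whenever two are available, so "2000111" comes out as 1 November. A minimal sketch of both readings, assuming the 11 January reading is expressed with separators in both the format and the input:

```php
<?php
// Y takes "2000"; n takes "11" because two digits are available; j gets "1".
$greedy = DateTime::createFromFormat('Ynj', '2000111');
echo $greedy->format('Y-m-d'), "\n"; // 2000-11-01

// Separators make the other reading (11 January) expressible.
$separated = DateTime::createFromFormat('Y-n-j', '2000-1-11');
echo $separated->format('Y-m-d'), "\n"; // 2000-01-11
```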

In my example there is no ambiguity, though. Both the month and day specifiers are padded in 'Ymd', so the parsing should fail (i.e. return false), because the zero in position 6 would have to be used twice.
 [2018-08-12 23:19 UTC] a at b dot c dot de
In short: createFromFormat("Ymd", "2000101") should be rejected because it doesn't have enough digits, right? There should be four for the year, two for the month, and two for the day.

The test case can be simplified:

$date = DateTime::createFromFormat("d", "1");
var_dump($date);

Of the other "leading zero" format specifiers, "m", "h" and "H" also parse, but "i" and "s" both fail.
 [2018-08-12 23:24 UTC] a at b dot c dot de
Incidentally, I just noticed that this behaviour is documented on the createFromFormat() page.

"d and j 	Day of the month, 2 digits with or without leading zeros"
 [2018-08-13 00:58 UTC] requinix@php.net
> Both the month and day specifier are padded in 'Ymd'
Optionally padded. 2000101 gets parsed as Y=2000, m=10, d=1, so the zero isn't being reused.

PHP takes whatever digits are available when deciding whether to use one or two digits: if two digits are available, both will be used. That's why "200012" does not work with Ymd: Y=2000, then m=12 because two digits are available, and since there's nothing left for d to match, parsing fails, even though it could have matched with m=1, d=2.
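Both cases can be reproduced directly. This is a sketch: since 'Ymd' has no time specifiers, the time of day comes from the current time, so only the date part is printed.

```php
<?php
// Seven digits: Y takes "2000", m greedily takes "10", d takes the last "1".
$a = DateTime::createFromFormat('Ymd', '2000101');
echo $a->format('Y-m-d'), "\n"; // 2000-10-01

// Six digits: m greedily takes "12" and nothing is left for d.
// The parser does not backtrack to try m=1, d=2, so it returns false.
$b = DateTime::createFromFormat('Ymd', '200012');
var_dump($b); // bool(false)
```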
 [2018-08-13 07:58 UTC] derick@php.net
Your first sample fails because you didn't include the - separators in your format:

derick@singlemalt:~ $ php

<?php
$date = DateTime::createFromFormat("Y-m-d", "2000-10-1");
var_dump($date);

Standard input code:3:
class DateTime#1 (3) {
  public $date =>
  string(26) "2000-10-01 07:55:25.000000"
  public $timezone_type =>
  int(3)
  public $timezone =>
  string(3) "UTC"
}

Works just fine here.

The format characters in createFromFormat consume *up to* the number of characters acceptable for that specifier. So indeed, "Y" parses 4 characters ("2000"), "m" parses 2 characters ("10"), and "d" parses up to 2 characters, but as only one is left, it parses only "1".
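To make the "up to" rule concrete, here is a toy model of how 'Ymd' consumes its input. It is an illustration written for this thread, not PHP's actual parser (timelib, written in C, which handles many more specifiers and error cases); `takeDigits` and `toyParseYmd` are hypothetical names.

```php
<?php
// Hypothetical helper: consume up to $max digits starting at $pos,
// advancing $pos. Returns the number read, or null if no digit was found.
function takeDigits(string $input, int &$pos, int $max): ?int
{
    $start = $pos;
    while ($pos < strlen($input) && $pos - $start < $max
        && $input[$pos] >= '0' && $input[$pos] <= '9') {
        $pos++;
    }
    return $pos > $start ? (int) substr($input, $start, $pos - $start) : null;
}

// Toy model of 'Ymd': each specifier greedily takes as many digits as it
// can (up to its maximum) and never gives any back.
function toyParseYmd(string $input): ?array
{
    $pos = 0;
    $parts = [];
    foreach (['Y' => 4, 'm' => 2, 'd' => 2] as $spec => $max) {
        $n = takeDigits($input, $pos, $max);
        if ($n === null) {
            return null; // nothing left for this specifier
        }
        $parts[$spec] = $n;
    }
    // Leftover characters would be trailing data, so reject them too.
    return $pos === strlen($input) ? $parts : null;
}

echo json_encode(toyParseYmd('2000101')), "\n"; // {"Y":2000,"m":10,"d":1}
echo json_encode(toyParseYmd('200012')), "\n";  // null
```

The lack of backtracking is the design choice that makes "200012" fail: once m has taken "12", the model never revisits that decision.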
 [2018-08-13 10:41 UTC] remy dot fox at simbuka dot com
You are right that I made a mistake in the first example.

On the date() page (http://php.net/manual/en/function.date.php) the 'd' specifier is described as 'Day of the month, 2 digits with leading zeros', which is why I concluded that the parsing must be wrong. Of course, if the documentation is inaccurate because 'd' actually means *up to* 2 digits, then the bug report is not valid.

That said, in my opinion *up to* 2 digits is rather confusing, and I think the documentation for the date function (although it does not describe how parsing currently behaves) is more consistent. I'd suggest changing the functionality to match the date function's documentation rather than the other way around.
 [2018-08-13 10:53 UTC] requinix@php.net
-Status: Open +Status: Not a bug
 [2018-08-13 10:53 UTC] requinix@php.net
Yeah, and the documentation for strftime says the two-digit day is %d.

You have to look at the pages for the functions you're *actually using*.

http://php.net/manual/en/datetime.createfromformat.php
> d and j   Day of the month, 2 digits with or without leading zeros           01 to 31 or 1 to 31
> m and n   Numeric representation of a month, with or without leading zeros   01 through 12 or 1 through 12

Anyway, it sounds like you're satisfied with the current behavior given what 'm' and 'd' represent.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Apr 25 19:01:33 2024 UTC