Request #72392 array_diff not working as expected due to unintuitive behavior in file()
Submitted: 2016-06-12 20:28 UTC
Modified: 2016-06-13 00:07 UTC
From: sky at skyboydston dot com
Assigned:
Status: Wont fix
Package: Arrays related
PHP Version: 7.0.7
OS: Windows 8
Private report: No
CVE-ID: None
 [2016-06-12 20:28 UTC] sky at skyboydston dot com
Description:
------------
This is part of a mini-application that takes arrays of email addresses as input, then compares the lists to create a final list of only valid addresses. One text file contains all of the addresses, while the other two contain addresses that are invalid, either because their users have unsubscribed or because emails to those addresses have been undeliverable.

The problem is that array_diff() is not working as expected. You can see from the output (shown below) that the cause is the output of file(), which by default keeps the trailing newline on each line it reads from a .txt file. As a result, an address that falls on the last line of one file (with no newline after it) does not equal the same address on an earlier line of another file. To get matching values, it's necessary to actively suppress the newlines by passing the `FILE_IGNORE_NEW_LINES` flag to file().
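A minimal sketch of the mismatch described above. The temp files and their contents are stand-ins for all.txt and unsub.txt, and it is an assumption here that neither file ends with a trailing newline.

```php
<?php
// Hypothetical temp files standing in for all.txt and unsub.txt.
$allPath = tempnam(sys_get_temp_dir(), 'all');
$unsubPath = tempnam(sys_get_temp_dir(), 'uns');

// Assumption: the last line of each file has no newline after it.
file_put_contents($allPath, "one@one.com\ntwo@two.com\nthree@three.net");
file_put_contents($unsubPath, "three@three.net\ntwo@two.com");

// Default behavior: each element keeps its line ending, so
// "two@two.com\n" (from all) never equals "two@two.com" (last line of unsub),
// and "three@three.net" (last line of all) never equals "three@three.net\n".
$rawDiff = array_diff(file($allPath), file($unsubPath));

// With the flag, line endings are dropped and the values compare as intended.
$cleanDiff = array_diff(
    file($allPath, FILE_IGNORE_NEW_LINES),
    file($unsubPath, FILE_IGNORE_NEW_LINES)
);

var_dump($rawDiff);   // nothing was removed
var_dump($cleanDiff); // only one@one.com remains

unlink($allPath);
unlink($unsubPath);
```

Without the flag nothing is removed from the first list, because no pair of strings is byte-for-byte identical.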

My suggestion is that this behavior is unintuitive and that PHP would be more user friendly if suppressing these newlines were the default for file(). The current default seems to create a situation that's fairly difficult to debug.

In order for a developer to see that this is the cause of their problem in array_diff(), it's necessary to add `<pre></pre>` tags so that the newlines are apparent, and even then they are not as glaring as they should be. In my case, I only happened to insert the `<pre></pre>` tags because I was preparing the code for others' review here and on stackoverflow.com. I believe the following post was created by others having difficulties related to this issue: http://stackoverflow.com/questions/7348280/array-diff-not-working-as-expected-what-could-be-the-reason
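One possible way to make the stray newline visible without relying on `<pre></pre>` tags is to print an escaped form of each value. This is only a sketch; the two strings below are simulated file() results, not output from the actual files.

```php
<?php
// Simulated file() results (assumption: one value kept its "\n", the other did not).
$fromAll = "two@two.com\n";
$fromUnsub = "two@two.com";

// json_encode() escapes control characters, so the newline is printed as \n.
echo json_encode($fromAll), PHP_EOL;   // "two@two.com\n"
echo json_encode($fromUnsub), PHP_EOL; // "two@two.com"

// strlen() also exposes the extra byte.
echo strlen($fromAll), ' vs ', strlen($fromUnsub), PHP_EOL; // 12 vs 11
```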



Contents of the three files:
all.txt:
one@one.com
two@two.com
three@three.net
four@four.org
five@five.com

unsub.txt:
four@four.org
two@two.com

bounced.txt:
one@one.com



Incorrect output:
Array
(
    [0] => one@one.com

    [1] => two@two.com

    [2] => three@three.net

    [3] => four@four.org

    [4] => five@five.com
)


Array
(
    [0] => four@four.org

    [1] => two@two.com
)


Array
(
    [0] => one@one.com
)


one@one.com
two@two.com
three@three.net
five@five.com
7.0.6


Correct output:
Array
(
    [0] => one@one.com
    [1] => two@two.com
    [2] => three@three.net
    [3] => four@four.org
    [4] => five@five.com
)


Array
(
    [0] => four@four.org
    [1] => two@two.com
)


Array
(
    [0] => one@one.com
)


three@three.net
five@five.com
7.0.6

Test script:
---------------
$all = array();  //  Initialize the arrays we'll need
$unsub = array();
$bounced = array();
$valid = array();

//  Populate them and display their contents to make sure we're getting what we want.
$all = file('all.txt', FILE_IGNORE_NEW_LINES);  // <-- This flag can be removed to show the code which does not behave as expected. Make sure to remove it for all three files.
echo '<pre>';
print_r($all);
echo '</pre><br>';

$unsub = file('unsub.txt', FILE_IGNORE_NEW_LINES);
echo '<pre>';
print_r($unsub);
echo '</pre><br>';

$bounced = file('bounced.txt', FILE_IGNORE_NEW_LINES);
echo '<pre>';
print_r($bounced);
echo '</pre><br>';



$valid = array_diff($all, $unsub, $bounced);  // Here's where we originally seemed to be having a problem

foreach ($valid as $value) {
	print($value . '<br>');
}

echo (PHP_VERSION);  //  This behavior has also been tested and is the same in version 5.6.19


 [2016-06-13 00:07 UTC] requinix@php.net
-Status: Open
+Status: Wont fix
 [2016-06-13 00:07 UTC] requinix@php.net
The newline has been present for years, and its presence is clearly noted with a Note in the Return Values section of the function's documentation. Removing it would break backwards compatibility for everyone who uses file() and expects the newlines to be there; for example, to copy one file to another while making some content changes along the way (which I myself have done before). Those who don't want the newlines are already using FILE_IGNORE_NEW_LINES, or perhaps rtrim()/trim() if they weren't aware of the flag.

One way or another, some group of people would need to use a flag and others would not. However, changing the behavior adds the BC break and makes it noticeably more awkward to write code targeting multiple versions of PHP.

With that said, if you feel strongly about this then you should bring it up for discussion on the internals mailing list.
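
For illustration, the two usages mentioned in this reply (copying a file with edits, and trimming after the fact) might look roughly like the sketch below. The temp file paths and the replacement address are hypothetical.

```php
<?php
// Hypothetical source and destination files in the temp directory.
$src = tempnam(sys_get_temp_dir(), 'src');
$dst = tempnam(sys_get_temp_dir(), 'dst');
file_put_contents($src, "one@one.com\ntwo@two.com\n");

// Use case 1: copy with edits. Because file() keeps the line endings,
// the lines can be changed and written back without re-adding "\n".
$lines = file($src);
foreach ($lines as $i => $line) {
    $lines[$i] = str_replace('two@two.com', 'second@two.com', $line);
}
file_put_contents($dst, implode('', $lines));
$copied = file_get_contents($dst);

// Use case 2: compare values. rtrim() strips trailing whitespace,
// including the line ending (and also trailing spaces or tabs).
$trimmed = array_map('rtrim', file($src));

echo $copied;
print_r($trimmed);

unlink($src);
unlink($dst);
```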
 