Doc Bug #75923 Body is fetched without the last line
Submitted: 2018-02-06 11:37 UTC Modified: 2020-09-14 11:43 UTC
From: jochem dot blok at fasterforward dot nl Assigned: cmb (profile)
Status: Closed Package: mailparse (PECL)
PHP Version: 5.6.33 OS: Ubuntu
Private report: No CVE-ID: None
 [2018-02-06 11:37 UTC] jochem dot blok at fasterforward dot nl
Description:
------------
With a non-multipart mail, the body is fetched without its last line. Appending a newline to the end of the message "solves" the problem.

Test script:
---------------
<?php

error_reporting(E_ALL);
ini_set('display_errors', 1);
header('Content-type: text/plain');


$tekst = <<<EOD
From: someone@example.com
To: someone_else@example.com
Subject: An RFC 822 formatted message

This is the plain text body of the message. Note the blank line
between the header information and the body of the message.
EOD;

$stream = fopen('php://memory', 'r+');
fwrite($stream, $tekst);
fseek($stream, 0);
$resource = mailparse_msg_create();
mailparse_msg_parse($resource, fread($stream, 10000));

echo mailparse_msg_extract_part($resource, $stream);

Expected result:
----------------
This is the plain text body of the message. Note the blank line
between the header information and the body of the message.

Actual result:
--------------
This is the plain text body of the message. Note the blank line
1

 [2018-03-17 15:17 UTC] pudge601 at hotmail dot com
I’ve tried to look into this myself and see if I can come up with a solution.

Disclaimer: I’m not a C dev, and I have very little knowledge of PHP internals/extensions.

The crux of the issue is that mailparse will only process a line when it gets to a new line character, so for an email string/file which doesn’t end with a new line, the last line will not be processed.

I tried to fix this by making it “flush” when we get to EOF, but this solution would only work when parsing files. The mailparse_msg_parse function (for parsing the mail as a string) can be called multiple times to parse the mail incrementally, and so there is no way for the extension to know whether it has reached the end of the mail string.

The only solutions I can think of for this (short of completely re-designing the API for parsing mail messages) are:

1. Add a mailparse_msg_parse_flush function, which would process the remaining contents of the buffer as if they were the last line. This would put the onus on the user to always call this function after calling mailparse_msg_parse.
2. Automatically “flush” the buffer the first time the user calls any other function that works on a mime mail resource; i.e. assume that if the user is querying any aspect of the mime mail (get structure, get part data, get part), then they have finished passing in the raw data and anything left in the buffer is the end of the contents.

Both of these approaches would probably then need to add restrictions on calling mailparse_msg_parse after flushing the buffer.

Given that neither solution is particularly attractive, perhaps it would be better to accept the current behaviour and document it. After all, treating a line as a line only if it ends with a newline character is technically correct (as far as POSIX is concerned). The one drawback is that a non-multipart mail whose body does not end in a newline can never be parsed correctly by the mailparse extension (except when the contents are base64 encoded, in which case the raw newline is ignored).
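
To illustrate the mechanism described in this comment, here is a toy model in plain PHP. It is only a sketch: `LineBufferedParser`, `feed()` and `flush()` are hypothetical names invented for this example and are not the extension's actual C implementation. It assumes a parser that processes input only when it sees a newline, and it shows what an explicit flush (option 1) would do with the leftover bytes.

```php
<?php
// Toy model of a line-buffered parser. This is NOT mailparse's code;
// all names here are hypothetical and exist only for illustration.
final class LineBufferedParser
{
    /** @var string Bytes received so far that are not yet terminated by "\n". */
    private $pending = '';

    /** @var string[] Lines that have actually been processed. */
    public $lines = [];

    // Accept a chunk of input; only newline-terminated lines are processed.
    public function feed(string $chunk): void
    {
        $this->pending .= $chunk;
        while (($pos = strpos($this->pending, "\n")) !== false) {
            $this->lines[] = rtrim(substr($this->pending, 0, $pos), "\r");
            $this->pending = (string) substr($this->pending, $pos + 1);
        }
    }

    // Hypothetical counterpart of the proposed mailparse_msg_parse_flush():
    // treat whatever is left in the buffer as the final line.
    public function flush(): void
    {
        if ($this->pending !== '') {
            $this->lines[] = $this->pending;
            $this->pending = '';
        }
    }
}

$parser = new LineBufferedParser();
$parser->feed("Subject: test\r\n\r\nfirst line\r\nlast ");
$parser->feed("line");            // no trailing newline: stays in the buffer

$before = count($parser->lines);  // the unterminated last line is not counted
$parser->flush();
$after = count($parser->lines);   // the last line is processed after flushing
```

Because `feed()` can be called any number of times, the model cannot tell on its own whether more data will follow, which is the same problem described above for mailparse_msg_parse.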
 [2020-08-17 15:34 UTC] thomas at landauer dot at
Just adding a shorter script for easier reproduction:

```php
$raw = <<<EOD
Date: Mon, 17 Aug 2020 18:36:05 +0100
From: <foo@example.com>

foobar
EOD;

$mimemail = mailparse_msg_create();
mailparse_msg_parse($mimemail, $raw);
var_dump(mailparse_msg_extract_part($mimemail, $raw));
```
 [2020-09-14 10:10 UTC] cmb@php.net
-Status: Open
+Status: Analyzed
-Type: Bug
+Type: Documentation Problem
-Assigned To:
+Assigned To: cmb
 [2020-09-14 10:10 UTC] cmb@php.net
Thanks for the fine analysis, @pudge601!

I have to add that RFC 5322 does not mandate that a message has to
end with a newline (CRLF)[1], so the current behavior is an
unfortunate limitation, but I see no reasonable way to fix this.
The suggested solution (2) would break BC, since it is currently
supported to incrementally parse a message even after some of its
contents have been queried.  The suggested solution (1) would only
cater to incremental parsing, and can easily be catered to by the
userland developer by explicitly parsing a final CRLF if it is
missing from the message.
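
A minimal sketch of that userland workaround, based on the shorter
reproduction script above. The helper `ensure_trailing_newline()` is a
hypothetical name introduced for this example and is not part of
mailparse; the mailparse calls are guarded so the snippet also loads
when the extension is not installed. The extracted body can be expected
to include the appended line ending, so trim it if that matters.

```php
<?php
// Hypothetical helper (not part of mailparse): append a CRLF if the raw
// message does not already end with a newline.
function ensure_trailing_newline(string $raw): string
{
    return substr($raw, -1) === "\n" ? $raw : $raw . "\r\n";
}

$raw = <<<EOD
Date: Mon, 17 Aug 2020 18:36:05 +0100
From: <foo@example.com>

foobar
EOD;

// The heredoc body has no final newline, so the helper adds one.
$fixed = ensure_trailing_newline($raw);

if (function_exists('mailparse_msg_create')) {
    // One-shot parsing: parse and extract from the fixed-up message.
    $mimemail = mailparse_msg_create();
    mailparse_msg_parse($mimemail, $fixed);
    mailparse_msg_extract_part($mimemail, $fixed);
    mailparse_msg_free($mimemail);

    // Incremental parsing: after the last chunk, explicitly parse a
    // final CRLF if that chunk did not end with a newline.
    $chunks = str_split($raw, 16);
    $mimemail = mailparse_msg_create();
    foreach ($chunks as $chunk) {
        mailparse_msg_parse($mimemail, $chunk);
    }
    if (substr(end($chunks), -1) !== "\n") {
        mailparse_msg_parse($mimemail, "\r\n");
    }
    mailparse_msg_free($mimemail);
}
```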

So changing to documentation problem.

[1] <https://tools.ietf.org/html/rfc5322#section-3.5>
 [2020-09-14 11:43 UTC] cmb@php.net
-Status: Analyzed
+Status: Closed
 [2020-09-14 11:44 UTC] phpdocbot@php.net
Automatic comment on behalf of cmb
Revision: http://git.php.net/?p=doc/en.git;a=commit;h=393365367eebdb347f8c643279b143edf3ada04f
Log: Fix #75923: Body is fetched without the last line
 [2020-09-15 00:50 UTC] phpdocbot@php.net
Automatic comment on behalf of mumumu
Revision: http://git.php.net/?p=doc/ja.git;a=commit;h=46d8304bcc5739b7859c232c5316aacfa0af81ca
Log: Fix #75923: Body is fetched without the last line
 [2020-12-30 11:59 UTC] nikic@php.net
Automatic comment on behalf of mumumu
Revision: http://git.php.net/?p=doc/ja.git;a=commit;h=cb2aa58b4a23fa264cdb92a091f5d46adc0e43fd
Log: Fix #75923: Body is fetched without the last line
 