[2007-12-09 23:59 UTC] mariusads at helpedia dot com
Description:
------------
I run a website that accepts game cheat submissions from users and displays them by category. Users submit .txt files, which are saved on the drive. A page on the website reads the text file (or a fragment of it), runs htmlentities() on it and displays the result on the screen.

Recently the hosting company upgraded PHP to 5.2.5, and since then htmlentities() returns an empty string when trying to escape this text. I understand this is probably because of the fix regarding multi-byte characters in strings, which makes htmlentities() reject the input. That seems a bit dumb: shouldn't it return at least the part of the string that comes before that multibyte character?

Anyway, the submitted file is plain text and I honestly don't know which characters are wrong and make htmlentities() reject the text. The file is uploaded here: http://www.tgdb.net/a.txt

In the scripts I have the following code:

function htmlesc($text) {
    $s = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    return htmlentities($s, ENT_QUOTES, 'UTF-8');
}

The text passes html_entity_decode() with no problems, but htmlentities() returns an empty string.

If possible, could you please tell me how I could check in the future whether a string contains multibyte characters, so that I don't have this problem? Right now, the only solution the hosting company gave me was to add a rule in .htaccess which makes the server process the PHP files with PHP 4.

Thank you for your help.
Marius Hudea

PS. The captcha doesn't seem to work right; I'm sure I didn't get the captcha wrong 8 times in a row.

Reproduce code:
---------------
I've used the code below, uploaded to several web servers, to test:

<html><body>
<?php
$text = $_REQUEST['text'];
echo htmlentities($text, ENT_QUOTES, 'UTF-8');
?>
<form name="A" method="post">
<textarea name="text"></textarea>
<input name="sub" type="submit" value="submit"/>
</form>
</body></html>

Test file: http://www.tgdb.net/a.txt

Expected result:
----------------
Expected the text to be displayed on the screen, i.e. the function to return a non-empty string. Expected at least a partial string, up to the error, rather than having to check scripts for 5 minutes to see what went wrong.

Actual result:
--------------
Copying and pasting the text from a.txt results in an empty string. Any other text is processed correctly.
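
On the question of how to check a string in advance: one possible approach is to test whether the bytes are well-formed UTF-8 before escaping them. The sketch below assumes the mbstring extension is loaded; is_valid_utf8() is a hypothetical helper name used for illustration, not a PHP built-in.

```php
<?php
// Hypothetical helper: returns true if $text is well-formed UTF-8.
// Assumes the mbstring extension is available.
function is_valid_utf8($text)
{
    return mb_check_encoding($text, 'UTF-8');
}

// 0xE9 is "é" in ISO-8859-1, but a lone 0xE9 byte is not valid UTF-8.
$latin1 = "caf\xE9";
// The same word correctly encoded as UTF-8.
$utf8 = "caf\xC3\xA9";

var_dump(is_valid_utf8($latin1)); // bool(false)
var_dump(is_valid_utf8($utf8));   // bool(true)
```

A file that fails this check was most likely saved in a single-byte encoding rather than UTF-8, which would explain why plain-looking text is rejected.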
The example you gave me does work, but my issue has nothing to do with receiving data from textareas or input boxes. The page here: hxxp://www.tgdb.net/pc/faq/5845/Diablo-page1.html tries to open a text document from a location on the drive, which was previously uploaded by the user in a zip file.

If I change the example you gave me to read the text from a file:

<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>
<body>
<pre>
<?php
$text = file_get_contents('a.txt');
var_dump($text);
var_dump(htmlentities($text, ENT_QUOTES, 'UTF-8'));
?>
</pre>
<form name="A" method="post">
<textarea name="text"></textarea>
<input name="sub" type="submit" value="submit"/>
</form>
</body></html>

so that it gets the text from the text file, htmlentities() no longer works as I expect. I guess that with your example the browser corrects the text pasted into the form before sending it to the script.

Almost all the files I have are like the NFO files in scene releases: they contain drawings made with ASCII characters, but I need them escaped because the whole page is sent as UTF-8. They don't contain multibyte characters on purpose, and they're not corrupted or anything like that. It's also not feasible to tell users to paste them into a textarea in a form: almost all submitted content is sent in a ZIP file and my scripts extract the text file. Some files (walkthroughs) are also about 400-600 KB in size, so using a textarea is out of the question.

I'm surprised to see that htmlentities() returns a blank string and I honestly don't know how to fix this, so any help would be great. Shouldn't it return the string up to the point where the multibyte characters are, whatever that means? Could my problem be solved if I don't use the charset argument, or if I use htmlspecialchars()? (I'll test...) Or is there another solution to my predicament?

Thank you again for replying to my questions.
I had bad experiences in the past where other people just ignored my messages after replying with "works for me". I really appreciate it.

It comes down to what to do with invalid input. We can't let invalid UTF-8 through, because if you do, your site will be insecure. Before this fix, your site was actually open to XSS exploits, since you were spitting invalid UTF-8 chars out on a page marked as UTF-8, and that confuses IE. I suppose we could change htmlentities to just strip the invalid chars, but from a security perspective that is typically not the right approach. You can strip the invalid UTF-8 chars yourself with:

$str = iconv('utf-8','utf-8',$str);
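
As an alternative to stripping, the bytes can be reinterpreted in the encoding the files were probably saved in and converted to UTF-8 before escaping. The following is only a sketch under stated assumptions: escape_for_utf8_page() is a hypothetical helper, the ISO-8859-1 fallback is a guess about the uploaded files (the real source encoding has to be confirmed for the actual data), mbstring must be loaded, and ENT_SUBSTITUTE requires PHP 5.4 or later, so it is not available on the 5.2.5 install discussed here.

```php
<?php
// Hypothetical helper: escape text of uncertain encoding for a UTF-8 page.
// $fallback is an ASSUMPTION about how the legacy files were saved.
function escape_for_utf8_page($text, $fallback = 'ISO-8859-1')
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        // Not valid UTF-8: treat the bytes as the fallback encoding
        // and convert them, instead of dropping them.
        $text = mb_convert_encoding($text, 'UTF-8', $fallback);
    }

    // ENT_SUBSTITUTE (PHP 5.4+) replaces any remaining invalid sequence
    // with U+FFFD rather than making the function return an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// A lone 0xE9 byte (ISO-8859-1 "é") followed by markup.
echo escape_for_utf8_page("caf\xE9 <b>"), "\n"; // prints: café &lt;b&gt;
```

htmlspecialchars() is used here because, once the text is valid UTF-8 and the page is served as UTF-8, only the markup characters need escaping. Converting keeps the original characters readable, whereas stripping silently removes them.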