php.net |  support |  documentation |  report a bug |  advanced search |  search howto |  statistics |  random bug |  login
Bug #80291 Data corruption and data loss in default session handler (All PHP versions)
Submitted: 2020-10-28 09:22 UTC Modified: 2020-10-30 05:11 UTC
Votes:4
Avg. Score:3.0 ± 2.0
Reproduced:1 of 3 (33.3%)
Same Version:0 (0.0%)
Same OS:1 (100.0%)
From: jozyah-etienne at eerees dot com Assigned:
Status: Open Package: Session related
PHP Version: Next Major Version OS:
Private report: No CVE-ID: None
 [2020-10-28 09:22 UTC] jozyah-etienne at eerees dot com
Description:
------------
PHP's default session handler (on all versions so far) do lock files which is good for preventing race conditions, but it's write operation is not atomic. It directly writes data to the destination file, which in case of failure (power outage, disk failure, crash, connection loss to a networked storage, etc) session file will get corrupted.

The correct way to write session data to files is writing data to a temporary file (such as /path/to/sessions/sess_id.tmp) and only on success, rename it to needed destination (/path/to/sessions/sess_id). Then if rename was successful, the write operation was successful too, otherwise the old data remains intact.

Also temporary file should be in the same destination directory, otherwise if temp file and destination file are in separate devices or mount points, data loss may happen too.

In modern operating systems (POSIX OSes and NTFS on Windows 10 1607 and later, and maybe other operating systems), rename is an atomic operation and should be used to minimize failures and mitigate data loss.

Garbage collector can remove temporary files in case of failure, or those files can get updated in next write operation. Also there is no need to acquire a lock on temporary files, because whole session operation is already locked. 

More info in this Stack Overflow's thread:
https://stackoverflow.com/questions/64565698/are-php-session-writes-atomic

Expected result:
----------------
Session's write operation should be atomic.

Actual result:
--------------
Session's write operation is not atomic.

Patches

Pull Requests

History

AllCommentsChangesGit/SVN commitsRelated reports
 [2020-10-28 09:49 UTC] jozyah-etienne at eerees dot com
-Summary: Data loss on session_write +Summary: Data loss in default session handler while on write operation
 [2020-10-28 09:49 UTC] jozyah-etienne at eerees dot com
Fixed summary
 [2020-10-28 09:56 UTC] fgfgfgfgfg at gmail dot com
> It directly writes data to the destination file, which 
> in case of failure (power outage, disk failure, crash, 
> connection loss to a networked storage, etc) session 
> file will get corrupted

how often do you have poweroutage without UPS, dying disks without RAID and crashes at all on production servers?

the one case every few decades where *probably* a ssession file my get corrupt don't matter that much

> The correct way to write session data to files is writing data 
> to a temporary file (such as /path/to/sessions/sess_id.tmp) 
> and only on success, rename it to needed destination

for sure not!

looks like you never where targetr auf a serious DDOS!
in that case you are graceful for every overhead you can save

session files are *temporary* data and if you really insist in atomic writes please use a sql-based session handler but don't make my setups slower for cases which never happen at all
 [2020-10-28 17:00 UTC] jozyah-etienne at eerees dot com
> session files are *temporary* data and if you really insist in atomic writes please use a sql-based session handler but don't make my setups slower for cases which never happen at all

They are *temporary*, not *cached*.
Temporary data are important, cached data are not.
You need those temporary data to serve your user, otherwise bad things can happen, from a simple user logout to massive data losses.
Also, even cached data are important if a data loss during write could break your production and you are not aware of it.

> looks like you never where targetr auf a serious DDOS!
> in that case you are graceful for every overhead you can save

I just executed the following code:

#$i = 0;
#$start = microtime(true);
#do
#{
#    rename('./temp.'.$i, './temp.'.($i+1));
#} while($i++ < 100000);
#$total = microtime(true) - $start;
#var_dump($total);

It took less than 1 second to rename a file for 100,000 times in my 7 years old, 2 core, super slow laptop. That's a 0.00001 seconds overhead per request. You know why? because the needed inodes (or whatever needed in the process) are already found and cached by operating system & PHP during write operation.

If you really need to optimize your site for 0.00001 seconds, you are long in a wrong way, because definitely PHP is not the right tool for your use case. Also in that case, your hardware infrastructure is also not meeting your needs and spending a couple of $bucks to upgrading your hardware is much better than putting yourself in danger of data loss.


> how often do you have poweroutage without UPS, dying disks without RAID and crashes at all on production servers?
> 
> the one case every few decades where *probably* a ssession file my get corrupt don't matter that much

You are wrong, it's not about how good your infrastructure is, it's about how reliable your tool is and how much you can trust it.
It's all about how important your data is and how often you (as a programmer) know that your data is really important and also you know that the tool you are using is not reliable so you know that you shouldn't trust the tool and you know that you need to handle the scenario all by yourself so in case of data loss, you don't need to bang your head against the wall to find the source of problem.

It's all about trusting the tool you are using that it can do it's job, and do it well, and do it in a reliable fashion.

And to answer your specific question, yes, data gets lost all the time, even in professional-grade servers in profession-grade data centers, that's why databases are ACID and we use transactions to avoid such marginal, once in a life time disaster cases, otherwise they would be way faster than they are.

And to answer your question with other questions:
* How often do you have SQL injection or code injection? So why bother and use precious CPU clocks and RAM and Disk space to convert passwords to hashes using slow algorithms?
* How often do you hit with a timing attack on a non-real-time, threaded server and operating system? So why bother yourself and use hash_equals() instead of "==="?
* How often do you hit with a serious DDoS attack that a 0.00001 seconds long operation becomes so important to you that you prefer to ask for troubles instead of doing jobs in secure, reliable way?

I'm saying, most programmers are using standard session handler and they are not aware that data loss and data corruption can happen. This is an important bug that needs to be fixed. If a 0.00001 seconds operation is so important in DDoS attacks, at least a option like session_write_close_atomic() should be added so the programmer can chooses between reliability and 0.00001 seconds optimization.
 [2020-10-28 17:14 UTC] jozyah-etienne at eerees dot com
-Summary: Data loss in default session handler while on write operation +Summary: Data corruption and data loss in default session handler (All PHP versions)
 [2020-10-28 17:14 UTC] jozyah-etienne at eerees dot com
Better summary :D
 [2020-10-28 20:46 UTC] rtrtrtrtrt at dfdfdfdf dot dfd
> It took less than 1 second to rename a file for 100,000 times

in 1 second i spit out 20000-50000 dynamic pages 

nobody cares about session files, they are in tmpfs here for 20 years

the worst case every few decades is that one needs to login again which is the same for every ordinary reboot in that setup


> I'm saying, most programmers are using standard session 
> handler and they are not aware that data loss and data 
> corruption can happen

it don't happen except for cases where you have *much larger* problems - period

> It's all about how important your data is

session data aren't unless you have way bigger problems and then they are not important
 [2020-10-29 02:44 UTC] jozyah-etienne at eerees dot com
> nobody cares about session files, they are in tmpfs here for 20 years
> session data aren't unless you have way bigger problems and then they are not important

Who says so? Are there any conventions out there that programmers agreed on? Are there any notifications in docs saying so? NO!

> the worst case every few decades is that one needs to login again which is the same for every ordinary reboot in that setup

And he/she will lose all his/her session data because some unknown guy on internet said that a very very negligible overhead worth much more than his/her session data!

I don't know about the default handler handler, but what if he/she can not logout? what if he needs to clear the cookie from the browser by him/herself? What if there are hundreds of users that lost their data and they all have problem logging out? How many hours the programmers need to scratch their head to find the reason?

> it don't happen except for cases where you have *much larger* problems - period
And if a very very negligible overhead is so problematic to you that you want to risk your users' data and their experience on trust on you, you have *much larger* problems too - period
 [2020-10-29 02:50 UTC] jozyah-etienne at eerees dot com
If nobody here wants to have this very very negligible overhead in his/her websites, that's fine. But docs should clearly mention this problem and also introduce a solid way to those who care for their users data and experience and users' trust on site owners and programmers trust on PHP as a tool.

There are 3 ways to solve the issue:
1. Fix the problem as a bug for all PHP websites out there.
2. Introduce a new function such as session_write_close_atomic() so programmers can decide.
3. Introduce a new boolean config in php.ini such as session.atomic_write so sysadmins can decide.

And please don't say that data is not important, they are, at least to those of us who care.
 [2020-10-29 09:33 UTC] rtrtrtrtrt at dfdfdfdf dot dfd
> And if a very very negligible overhead is so problematic 
> to you that you want to risk your users' data

yes because "in case of failure (power outage, disk failure, crash, connection loss to a networked storage" don't happen in the real world

* each server has two power supplies
* each power supply is on a different UPS
* disk failures don't matter on redundant arrays
* my servers don't crash
* my storags don't lose connections - redundancy is the keyword

again: your issue don't exist in ten real world and you should fix the root cause
 [2020-10-29 18:38 UTC] xx_pulse_xx at idkwhatiamdoing dot com
> Session's write operation should be atomic.

Why? Is there a standard somewhere? 
Sessions are designed to store data related to the current session, the issues you raise in this bug reports and the comment tend to show unreliable patterns in your code. 

Not worrying about user data and not worrying about an issue that might never happen is different. And the issue we are talking about is a user having to re-connect. (if your code is correctly designed, it's the only thing that should happen).

This is not a bug, and should not be an option. You can create a custom session save handler for such purposes. However I agree that the docs could use a small paragraph about the session_writes not being atomic and thus data should not be saved only in session (but again, that should already be the case even without worrying about session writes atomicity).
 [2020-10-30 00:57 UTC] jozyah-etienne at eerees dot com
> Why? Is there a standard somewhere? 

When the docs advertise session write operation without any warning that data may gets lost, it means that it is reliable, while it's clearly not. I don't know about any standard out there.

> Not worrying about user data and not worrying about an issue that might never happen is different. And the issue we are talking about is a user having to re-connect. (if your code is correctly designed, it's the only thing that should happen).

I tried to manipulate session files and it seems that PHP silently clears $_SESSION data in case of data corruption. That's a relief because user session automatically restarts and the user just logouts and there will be no issues in re-login at all. 

But some data is gets lost without any sysadmin/programmer gets noticed about because PHP doesn't trigger any errors at all. Maybe that's why nobody is worried about this issue yet and this issue seems so unrealistic to happen. Who knows how many times people lost their session because of this bug? Who spends time to report a unwanted logout to website owners? (And believe me, I had a lot of unwanted logouts in different sites in the past and nobody including I myself and site owners knows how many of those was because of this bug. I never reported those issues and just signed in again. Maybe that situation happened to you too?)

> You can create a custom session save handler for such purposes.

We do that for years, but it needs a lot of knowledge to write a good session handler to save data in files. We currently are using DBMS for convenience, but It takes a lot of time to connect and read and save transactions compared to PHP's original files handler. (I've explained our situation in the Stack Overflow page that is linked in the original post above). Anyway, I believe the only issue with modern default session handler is atomic writes and not notifying in case of data corruption. So why not just fix it and put the problem on programmers shoulder? And how many of us knows enough to write a perfect session handler (Years ago I spent one whole week to write our own session handler and I mess a lot at that time while developing it, and at the end we decided to use DBMS with transactions. It is a tricky topic and people will make more problems with rolling out their own session handler, I believe).

> This is not a bug, and should not be an option.

But to some people like us who worry about our data and our users' experience, it clearly is. Specially that PHP silently ignores data corrupted session files and doesn't log  it at all, so nobody knows how often this issue had happened in the past.

> However I agree that the docs could use a small paragraph about the session_writes not being atomic and thus data should not be saved only in session (but again, that should already be the case even without worrying about session writes atomicity).

I believe if this issue isn't going to get fixed for good, not only docs should mention it, but also at least a NOTICE level error must be triggered so people can get notified about it when it happens. Also a reliable way to avoid such conditions should be introduced for those who care, like what I suggested in my last comment (I personally prefer session.atomic_writes with PHP_INI_ALL mode because not only both sysadmin and programmers can change it, but also works when session_write_close_atomic() is not called and so PHP automatically tries to save the session at the end of execution).
 [2020-10-30 01:06 UTC] jozyah-etienne at eerees dot com
> You can create a custom session save handler for such purposes.

Also custom session handlers doesn't work with some PHP features and makes more problems - I'm looking at you, session.upload_progress.enabled INI option :D
 [2020-10-30 04:11 UTC] rtrtrtrtrt at dfdfdfdf dot dfd
> Who knows how many times people lost their session because of this bug?

get some relieable hardware and until then you have bigger problems!
how did the wordl survive writeback-storage until you came?
 [2020-10-30 05:11 UTC] requinix@php.net
-Block user comment: No +Block user comment: Yes
 [2020-10-30 05:11 UTC] requinix@php.net
This is a completely reasonable request.
 
PHP Copyright © 2001-2024 The PHP Group
All rights reserved.
Last updated: Thu Nov 21 13:01:29 2024 UTC