[Pkg-gauche-devel] threads and fork on machine with VIPT-WB cache
NIIBE Yutaka
gniibe at fsij.org
Tue Apr 6 04:57:07 UTC 2010
John David Anglin wrote:
> It is interesting that in the case of the Debian bug that
> a thread of the parent process causes the COW break and thereby corrupts
> its own memory. As far as I can tell, the fork'd child never writes
> to the memory that causes the fault.
Thanks for writing and testing a patch.
The case of #561203 is the second scenario. I think this case is
relevant to VIVT-WB machines too (provided the kernel does the copy
through the kernel address).
James Bottomley wrote:
> So this is going to be a hard sell because of the arch churn. There are,
> however, three ways to do it with the original signature.
Currently, I think a signature change would be inevitable for
ptep_set_wrprotect.
> 1. implement copy_user_highpage ... this allows us to copy through
> the child's page cache (which is coherent with the parent's
> before the cow) and thus pick up any cache changes without a
> flush
Let me think about this approach.
Well, this would improve both my first scenario and the second
scenario.
But... I think that even if we had a copy_user_highpage that copies
through the user address, we would still need to flush at
ptep_set_wrprotect. We need to keep the invariant: no dirty cache
lines for a COW page.
Consider a third scenario of threads and fork:
(1) Process A has multiple threads, and thread A-1 invokes fork. The
result is process B, with a different space identifier color.
(2) Another thread A-2 in process A runs while A-1 copies memory in
dup_mmap. A-2 writes to the address <x> in a page. Let's call
this page <oldpage>.
(3) A-2's dirty cache line for <x> is still present when thread A-1
calls ptep_set_wrprotect. Suppose that we don't flush here.
(4) A-1 finishes the copy and sleeps.
(5) Child process B is woken up and sees the old value at <x> in
<oldpage>, through a different cache line. B sleeps.
(6) A-2 is woken up. A-2 touches the memory again and breaks COW. A-2
copies the data on <oldpage> to <newpage>. OK, <newpage> is
consistent, given a copy_user_highpage that copies through the user
address.
Note that during this copy, A-2's cache line for <x> is
flushed out to <oldpage>. This triggers another memory fault and COW
break. (I think that this memory fault is unhealthy.)
Then the new value lands at <x> on <oldpage> (when the cache is
physically tagged).
A-2 sleeps.
(7) Child process B is woken up. When it accesses <x>, it suddenly
sees the new value.
If we flush the cache to <oldpage> at ptep_set_wrprotect, this cannot
occur.
* * *
I know that we should not do "threads and fork"; it is difficult to
define clean semantics. Because another thread may touch memory while
the forking thread is copying memory, the memory the child process
sees may be inconsistent: for the child, one page might be new while
another page is old.
For VIVT-WB cache machines, I am considering the possibility that the
child process has inconsistent memory even within a single page
(when there is no flush at ptep_set_wrprotect).
I will need to raise this on linux-arch sooner or later.