[linux] 01/02: mm, gup: close FOLL MAP_PRIVATE race (CVE-2016-5195)

debian-kernel at lists.debian.org
Wed Oct 19 18:21:21 UTC 2016


This is an automated email from the git hooks/post-receive script.

benh pushed a commit to branch wheezy-security
in repository linux.

commit b92cb2120e58e73f782243a83ed1dba847f56ee6
Author: Ben Hutchings <ben at decadent.org.uk>
Date:   Mon Oct 17 20:24:29 2016 +0100

    mm, gup: close FOLL MAP_PRIVATE race (CVE-2016-5195)
---
 debian/changelog                                   |   1 +
 .../all/mm-gup-close-foll_map_private-race.patch   | 137 +++++++++++++++++++++
 debian/patches/series                              |   1 +
 3 files changed, 139 insertions(+)

diff --git a/debian/changelog b/debian/changelog
index a6a55f9..5a2a012 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -78,6 +78,7 @@ linux (3.2.82-1) UNRELEASED; urgency=medium
     - net: add a lock around icmp_sk()
     - fs/dcache: resched/chill only if we make no progress
     - fs/dcache: incremental fixup of the retry routine
+  * mm, gup: close FOLL MAP_PRIVATE race (CVE-2016-5195)
 
  -- Ben Hutchings <ben at decadent.org.uk>  Sun, 04 Sep 2016 14:08:46 +0100
 
diff --git a/debian/patches/bugfix/all/mm-gup-close-foll_map_private-race.patch b/debian/patches/bugfix/all/mm-gup-close-foll_map_private-race.patch
new file mode 100644
index 0000000..a37c914
--- /dev/null
+++ b/debian/patches/bugfix/all/mm-gup-close-foll_map_private-race.patch
@@ -0,0 +1,137 @@
+From: Michal Hocko <mhocko at suse.com>
+Date: Sun, 16 Oct 2016 11:55:00 +0200
+Subject: mm, gup: close FOLL MAP_PRIVATE race
+
+faultin_page drops FOLL_WRITE after the page fault handler has done the
+CoW, and then we retry follow_page_mask to get our CoWed page. This is
+racy, however, because the page might have been unmapped by that time,
+so we would have to do a page fault again, this time without CoW. This
+would corrupt the page cache for FOLL_FORCE on MAP_PRIVATE read-only
+mappings, with obvious consequences.
+
+This is an ancient bug that was actually already fixed once by Linus
+eleven years ago in commit 4ceb5db9757a ("Fix get_user_pages() race
+for write access") but that was then undone due to problems on s390
+by commit f33ea7f404e5 ("fix get_user_pages bug") because s390 didn't
+have proper dirty pte tracking until abf09bed3cce ("s390/mm: implement
+software dirty bits"). This wasn't a problem at the time as pointed out
+by Hugh Dickins because madvise relied on mmap_sem for write up until
+0a27a14a6292 ("mm: madvise avoid exclusive mmap_sem") but since then we
+can race with madvise which can unmap the fresh COWed page or with KSM
+and corrupt the content of the shared page.
+
+This patch is based on Linus' approach of not clearing FOLL_WRITE after
+the CoW page fault (aka VM_FAULT_WRITE); instead it introduces FOLL_COW
+to note this fact. The flag is then rechecked during follow_pfn_pte to
+enforce the page fault again if we do not see the CoWed page. Linus
+suggested checking pte_dirty again, as s390 is OK now, but that would
+make backporting to some old kernels harder. So instead let's just make
+sure that vm_normal_page sees a pure anonymous page.
+
+This would guarantee we are seeing a real CoW page. Introduce
+can_follow_write_pte, which checks pte_write and falls back to PageAnon
+on forced write faults which have passed CoW already. Thanks to Hugh for
+pointing out that special care has to be taken for KSM pages, because
+our COWed page might have been merged with a KSM one and keep its
+PageAnon flag.
+
+Fixes: 0a27a14a6292 ("mm: madvise avoid exclusive mmap_sem")
+Reported-by: Phil "not Paul" Oester <kernel at linuxace.com>
+Disclosed-by: Andy Lutomirski <luto at kernel.org>
+Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
+Signed-off-by: Michal Hocko <mhocko at suse.com>
+[bwh: Backported to 3.2:
+ - Adjust filename, context, indentation
+ - The 'no_page' exit path in follow_page() is different, so open-code the
+   cleanup
+ - Delete a now-unused label]
+Signed-off-by: Ben Hutchings <ben at decadent.org.uk>
+---
+ include/linux/mm.h |  1 +
+ mm/memory.c        | 39 ++++++++++++++++++++++++++++-----------
+ 2 files changed, 29 insertions(+), 11 deletions(-)
+
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -1527,6 +1527,7 @@ struct page *follow_page(struct vm_area_
+ #define FOLL_MLOCK	0x40	/* mark page as mlocked */
+ #define FOLL_SPLIT	0x80	/* don't return transhuge pages, split them */
+ #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
++#define FOLL_COW	0x4000	/* internal GUP flag */
+ 
+ typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
+ 			void *data);
+--- a/mm/memory.c
++++ b/mm/memory.c
+@@ -1427,6 +1427,24 @@ int zap_vma_ptes(struct vm_area_struct *
+ }
+ EXPORT_SYMBOL_GPL(zap_vma_ptes);
+ 
++static inline bool can_follow_write_pte(pte_t pte, struct page *page,
++					unsigned int flags)
++{
++	if (pte_write(pte))
++		return true;
++
++	/*
++	 * Make sure that we are really following CoWed page. We do not really
++	 * have to care about exclusiveness of the page because we only want
++	 * to ensure that once COWed page hasn't disappeared in the meantime
++	 * or it hasn't been merged to a KSM page.
++	 */
++	if ((flags & FOLL_FORCE) && (flags & FOLL_COW))
++		return page && PageAnon(page) && !PageKsm(page);
++
++	return false;
++}
++
+ /**
+  * follow_page - look up a page descriptor from a user-virtual address
+  * @vma: vm_area_struct mapping @address
+@@ -1509,10 +1527,13 @@ split_fallthrough:
+ 	pte = *ptep;
+ 	if (!pte_present(pte))
+ 		goto no_page;
+-	if ((flags & FOLL_WRITE) && !pte_write(pte))
+-		goto unlock;
+ 
+ 	page = vm_normal_page(vma, address, pte);
++	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, page, flags)) {
++		pte_unmap_unlock(ptep, ptl);
++		return NULL;
++	}
++
+ 	if (unlikely(!page)) {
+ 		if ((flags & FOLL_DUMP) ||
+ 		    !is_zero_pfn(pte_pfn(pte)))
+@@ -1555,7 +1576,7 @@ split_fallthrough:
+ 			unlock_page(page);
+ 		}
+ 	}
+-unlock:
++
+ 	pte_unmap_unlock(ptep, ptl);
+ out:
+ 	return page;
+@@ -1789,17 +1810,13 @@ int __get_user_pages(struct task_struct
+ 				 * The VM_FAULT_WRITE bit tells us that
+ 				 * do_wp_page has broken COW when necessary,
+ 				 * even if maybe_mkwrite decided not to set
+-				 * pte_write. We can thus safely do subsequent
+-				 * page lookups as if they were reads. But only
+-				 * do so when looping for pte_write is futile:
+-				 * in some cases userspace may also be wanting
+-				 * to write to the gotten user page, which a
+-				 * read fault here might prevent (a readonly
+-				 * page might get reCOWed by userspace write).
++				 * pte_write. We cannot simply drop FOLL_WRITE
++				 * here because the COWed page might be gone by
++				 * the time we do the subsequent page lookups.
+ 				 */
+ 				if ((ret & VM_FAULT_WRITE) &&
+ 				    !(vma->vm_flags & VM_WRITE))
+-					foll_flags &= ~FOLL_WRITE;
++					foll_flags |= FOLL_COW;
+ 
+ 				cond_resched();
+ 			}
diff --git a/debian/patches/series b/debian/patches/series
index f1b424f..0b2f21b 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -1112,6 +1112,7 @@ bugfix/all/tcp-fix-use-after-free-in-tcp_xmit_retransmit_queue.patch
 bugfix/all/bluetooth-fix-potential-null-dereference-in-rfcomm-b.patch
 bugfix/all/keys-fix-short-sprintf-buffer-in-proc-keys-show-func.patch
 bugfix/all/scsi-arcmsr-buffer-overflow-in-arcmsr_iop_message_xf.patch
+bugfix/all/mm-gup-close-foll_map_private-race.patch
 
 # ABI maintenance
 debian/perf-hide-abi-change-in-3.2.30.patch
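
The decision logic this patch adds can be modelled in user space. The sketch below is not kernel code: `struct model_pte`, `struct model_page` and the `model_*` functions are hypothetical stand-ins for `pte_t`, `struct page`, `can_follow_write_pte()` and the write check in `follow_page()`, and the `FOLL_WRITE` and `FOLL_FORCE` values are illustrative (only `FOLL_COW` 0x4000 appears in the patch). It is meant only to show why keeping `FOLL_WRITE` and adding `FOLL_COW` rejects a read-only page cache page, whereas dropping `FOLL_WRITE` accepted it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values; only FOLL_COW is taken from the patch itself. */
#define FOLL_WRITE 0x01
#define FOLL_FORCE 0x10
#define FOLL_COW   0x4000

/* Hypothetical stand-in for the pte bits the check looks at. */
struct model_pte {
	bool present;
	bool writable;
};

/* Hypothetical stand-in for PageAnon()/PageKsm() on a struct page. */
struct model_page {
	bool anon;
	bool ksm;
};

/* Mirrors the structure of can_follow_write_pte() in the patch. */
static bool model_can_follow_write_pte(struct model_pte pte,
				       const struct model_page *page,
				       unsigned int flags)
{
	if (pte.writable)
		return true;

	/* Forced write that already went through CoW: accept only a
	 * plain anonymous page that has not been merged by KSM. */
	if ((flags & FOLL_FORCE) && (flags & FOLL_COW))
		return page != NULL && page->anon && !page->ksm;

	return false;
}

/* Mirrors the present/write checks of the page lookup: returns true
 * if the lookup would hand back the page, false if the caller must
 * fault the page in again. */
static bool model_lookup_ok(struct model_pte pte,
			    const struct model_page *page,
			    unsigned int flags)
{
	if (!pte.present)
		return false;
	if ((flags & FOLL_WRITE) &&
	    !model_can_follow_write_pte(pte, page, flags))
		return false;
	return true;
}
```

With a present, read-only pte that points at a page cache page (`anon == false`), `model_lookup_ok()` returns true for `FOLL_FORCE` alone, which models the old behaviour after `FOLL_WRITE` was cleared, and false for `FOLL_WRITE | FOLL_FORCE | FOLL_COW`, which models the patched behaviour and forces another fault.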

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/kernel/linux.git


