summaryrefslogtreecommitdiffstats
path: root/arch/x86/mm
Commit message (Collapse)AuthorAgeFilesLines
* sched/headers: Prepare to move 'init_task' and 'init_thread_union' from ↵Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | <linux/sched.h> to <linux/sched/task.h> Update all usage sites first. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/headers: Prepare to remove the <linux/mm_types.h> dependency from ↵Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | | | <linux/sched.h> Update code that relied on sched.h including various MM types for them. This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar2017-03-021-0/+1
| | | | | | | | | | | | | | | | | | | | <linux/sched/task_stack.h> We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task_stack.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar2017-03-021-0/+2
| | | | | | | | | | | | | | | | | | | | <linux/sched/debug.h> We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/headers: Prepare for new header dependencies before moving more code ↵Ingo Molnar2017-03-022-0/+2
| | | | | | | | | | | | | | | | | | | | | | | to <linux/sched/mm.h> We are going to split more MM APIs out of <linux/sched.h>, which will have to be picked up from a couple of .c files. The APIs that we are going to move are: arch_pick_mmap_layout() arch_get_unmapped_area() arch_get_unmapped_area_topdown() mm_update_next_owner() Include the header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar2017-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | <linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* mm: add arch-independent testcases for RODATAJinbum Park2017-02-272-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes arch-independent testcases for RODATA. Both x86 and x86_64 already have testcases for RODATA, But they are arch-specific because using inline assembly directly. And cacheflush.h is not a suitable location for rodata-test related things. Since they were in cacheflush.h, If someone change the state of CONFIG_DEBUG_RODATA_TEST, It cause overhead of kernel build. To solve the above issues, write arch-independent testcases and move it to shared location. [jinb.park7@gmail.com: fix config dependency] Link: http://lkml.kernel.org/r/20170209131625.GA16954@pjb1027-Latitude-E5410 Link: http://lkml.kernel.org/r/20170129105436.GA9303@pjb1027-Latitude-E5410 Signed-off-by: Jinbum Park <jinb.park7@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* userfaultfd: non-cooperative: add event for memory unmapsMike Rapoport2017-02-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a non-cooperative userfaultfd monitor copies pages in the background, it may encounter regions that were already unmapped. Addition of UFFD_EVENT_UNMAP allows the uffd monitor to track precisely changes in the virtual memory layout. Since there might be different uffd contexts for the affected VMAs, we first should create a temporary representation for the unmap event for each uffd context and then notify them one by one to the appropriate userfault file descriptors. The event notification occurs after the mmap_sem has been released. [arnd@arndb.de: fix nommu build] Link: http://lkml.kernel.org/r/20170203165141.3665284-1-arnd@arndb.de [mhocko@suse.com: fix nommu build] Link: http://lkml.kernel.org/r/20170202091503.GA22823@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1485542673-24387-3-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: fix get_user_pages() vs device-dax pud mappingsDan Williams2017-02-241-4/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new unit test for the device-dax 1GB enabling currently fails with this warning before hanging the test thread: WARNING: CPU: 0 PID: 21 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1e3/0x1f0 percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic [..] CPU: 0 PID: 21 Comm: rcuos/1 Tainted: G O 4.10.0-rc7-next-20170207+ #944 [..] Call Trace: dump_stack+0x86/0xc3 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5f/0x80 ? rcu_nocb_kthread+0x27a/0x510 ? dax_pmem_percpu_exit+0x50/0x50 [dax_pmem] percpu_ref_switch_to_atomic_rcu+0x1e3/0x1f0 ? percpu_ref_exit+0x60/0x60 rcu_nocb_kthread+0x339/0x510 ? rcu_nocb_kthread+0x27a/0x510 kthread+0x101/0x140 The get_user_pages() path needs to arrange for references to be taken against the dev_pagemap instance backing the pud mapping. Refactor the existing __gup_device_huge_pmd() to also account for the pud case. Link: http://lkml.kernel.org/r/148653181153.38226.9605457830505509385.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, x86: add support for PUD-sized transparent hugepagesMatthew Wilcox2017-02-241-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current transparent hugepage code only supports PMDs. This patch adds support for transparent use of PUDs with DAX. It does not include support for anonymous pages. x86 support code also added. Most of this patch simply parallels the work that was done for huge PMDs. The only major difference is how the new ->pud_entry method in mm_walk works. The ->pmd_entry method replaces the ->pte_entry method, whereas the ->pud_entry method works along with either ->pmd_entry or ->pte_entry. The pagewalk code takes care of locking the PUD before calling ->pud_walk, so handlers do not need to worry whether the PUD is stable. [dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()] Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com [dave.jiang@intel.com: native_pud_clear missing on i386 build] Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Alexander Kapshuk <alexander.kapshuk@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: drop unused argument of zap_page_range()Kirill A. Shutemov2017-02-221-1/+1
| | | | | | | | | | | | | | There's no users of zap_page_range() who wants non-NULL 'details'. Let's drop it. Link: http://lkml.kernel.org/r/20170118122429.43661-3-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memory_hotplug: set magic number to page->freelist instead of page->lru.nextYasuaki Ishimatsu2017-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To identify that pages of page table are allocated from bootmem allocator, magic number sets to page->lru.next. But page->lru list is initialized in reserve_bootmem_region(). So when calling free_pagetable(), the function cannot find the magic number of pages. And free_pagetable() frees the pages by free_reserved_page() not put_page_bootmem(). But if the pages are allocated from bootmem allocator and used as page table, the pages have private flag. So before freeing the pages, we should clear the private flag by put_page_bootmem(). Before applying the commit 7bfec6f47bb0 ("mm, page_alloc: check multiple page fields with a single branch"), we could find the following visible issue: BUG: Bad page state in process kworker/u1024:1 page:ffffea103cfd8040 count:0 mapcount:0 mappi flags: 0x6fffff80000800(private) page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set bad because of flags: 0x800(private) <snip> Call Trace: [...] dump_stack+0x63/0x87 [...] bad_page+0x114/0x130 [...] free_pages_prepare+0x299/0x2d0 [...] free_hot_cold_page+0x31/0x150 [...] __free_pages+0x25/0x30 [...] free_pagetable+0x6f/0xb4 [...] remove_pagetable+0x379/0x7ff [...] vmemmap_free+0x10/0x20 [...] sparse_remove_one_section+0x149/0x180 [...] __remove_pages+0x2e9/0x4f0 [...] arch_remove_memory+0x63/0xc0 [...] remove_memory+0x8c/0xc0 [...] acpi_memory_device_remove+0x79/0xa5 [...] acpi_bus_trim+0x5a/0x8d [...] acpi_bus_trim+0x38/0x8d [...] acpi_device_hotplug+0x1b7/0x418 [...] acpi_hotplug_work_fn+0x1e/0x29 [...] process_one_work+0x152/0x400 [...] worker_thread+0x125/0x4b0 [...] kthread+0xd8/0xf0 [...] ret_from_fork+0x22/0x40 And the issue still silently occurs. Until freeing the pages of page table allocated from bootmem allocator, the page->freelist is never used. So the patch sets magic number to page->freelist instead of page->lru.next. [isimatu.yasuaki@jp.fujitsu.com: fix merge issue] Link: http://lkml.kernel.org/r/722b1cc4-93ac-dd8b-2be2-7a7e313b3b0b@gmail.com Link: http://lkml.kernel.org/r/2c29bd9f-5b67-02d0-18a3-8828e78bbb6f@gmail.com Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86/mm/ptdump: Add address marker for KASAN shadow regionAndrey Ryabinin2017-02-161-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Annotate the KASAN shadow with address markers in page table dump output: $ cat /sys/kernel/debug/kernel_page_tables ... ---[ Vmemmap ]--- 0xffffea0000000000-0xffffea0003000000 48M RW PSE GLB NX pmd 0xffffea0003000000-0xffffea0004000000 16M pmd 0xffffea0004000000-0xffffea0005000000 16M RW PSE GLB NX pmd 0xffffea0005000000-0xffffea0040000000 944M pmd 0xffffea0040000000-0xffffea8000000000 511G pud 0xffffea8000000000-0xffffec0000000000 1536G pgd ---[ KASAN shadow ]--- 0xffffec0000000000-0xffffed0000000000 1T ro GLB NX pte 0xffffed0000000000-0xffffed0018000000 384M RW PSE GLB NX pmd 0xffffed0018000000-0xffffed0020000000 128M pmd 0xffffed0020000000-0xffffed0028200000 130M RW PSE GLB NX pmd 0xffffed0028200000-0xffffed0040000000 382M pmd 0xffffed0040000000-0xffffed8000000000 511G pud 0xffffed8000000000-0xfffff50000000000 7680G pgd 0xfffff50000000000-0xfffffbfff0000000 7339776M ro GLB NX pte 0xfffffbfff0000000-0xfffffbfff0200000 2M pmd 0xfffffbfff0200000-0xfffffbfff0a00000 8M RW PSE GLB NX pmd 0xfffffbfff0a00000-0xfffffbffffe00000 244M pmd 0xfffffbffffe00000-0xfffffc0000000000 2M ro GLB NX pte ---[ KASAN shadow end ]--- 0xfffffc0000000000-0xffffff0000000000 3T pgd ---[ ESPfix Area ]--- ... Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kasan-dev@googlegroups.com Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170214100839.17186-2-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/mm/ptdump: Optimize check for W+X mappings for CONFIG_KASAN=yAndrey Ryabinin2017-02-161-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enabling both DEBUG_WX=y and KASAN=y options significantly increases boot time (dozens of seconds at least). KASAN fills kernel page tables with repeated values to map several TBs of the virtual memory to the single kasan_zero_page: kasan_zero_pud -> kasan_zero_pmd-> kasan_zero_pte-> kasan_zero_page So, the page table walker used to find W+X mapping check the same kasan_zero_p?d page table entries a lot more than once. With patch pud walker will skip the pud if it has the same value as the previous one . Skipping done iff we search for W+X mappings, so this optimization won't affect the page table dump via debugfs. This dropped time spend in W+X check from ~30 sec to reasonable 0.1 sec: Before: [ 4.579991] Freeing unused kernel memory: 1000K [ 35.257523] x86/mm: Checked W+X mappings: passed, no W+X pages found. After: [ 5.138756] Freeing unused kernel memory: 1000K [ 5.266496] x86/mm: Checked W+X mappings: passed, no W+X pages found. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: kasan-dev@googlegroups.com Cc: Tobias Regnery <tobias.regnery@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170214100839.17186-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'linus' into x86/mmThomas Gleixner2017-02-161-0/+2
|\ | | | | | | Make sure to get the latest fixes before applying the ptdump enhancements.
| * x86/mm/ptdump: Fix soft lockup in page table walkerAndrey Ryabinin2017-02-101-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_KASAN=y needs a lot of virtual memory mapped for its shadow. In that case ptdump_walk_pgd_level_core() takes a lot of time to walk across all page tables and doing this without a rescheduling causes soft lockups: NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/0:1] ... Call Trace: ptdump_walk_pgd_level_core+0x40c/0x550 ptdump_walk_pgd_level_checkwx+0x17/0x20 mark_rodata_ro+0x13b/0x150 kernel_init+0x2f/0x120 ret_from_fork+0x2c/0x40 I guess that this issue might arise even without KASAN on huge machines with several terabytes of RAM. Stick cond_resched() in pgd loop to fix this. Reported-by: Tobias Regnery <tobias.regnery@gmail.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170210095405.31802-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86/mm/pat: Use rb_entry()Geliang Tang2017-02-041-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | To make the code clearer, use rb_entry() instead of open coding it Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/974a91cd4ed2d04c92e4faa4765077e38f248d6b.1482157956.git.geliangtang@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86/mm/cpa: Avoid wbinvd() for PREEMPTJohn Ogness2017-01-301-0/+13
|/ | | | | | | | | | | | | | | | | | Although wbinvd() is faster than flushing many individual pages, it blocks the memory bus for "long" periods of time (>100us), thus directly causing unusually large latencies on all CPUs, regardless of any CPU isolation features that may be active. This is an unpriviledged operatation as it is exposed to user space via the graphics subsystem. For 1024 pages, flushing those pages individually can take up to 2200us, but the task remains fully preemptible during that time. Signed-off-by: John Ogness <john.ogness@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/mpx: Use compatible types in comparison to fix sparse errorTobias Klauser2017-01-141-1/+1
| | | | | | | | | | | | | | | | | info->si_addr is of type void __user *, so it should be compared against something from the same address space. This fixes the following sparse error: arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds2016-12-244-4/+4
| | | | | | | | | | | | | This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86/mpx: Move bd_addr to mm_context_tMark Rutland2016-12-171-5/+5
| | | | | | | | | | | | | | | | | | Currently bd_addr lives in mm_struct, which is otherwise architecture independent. Architecture-specific data is supposed to live within mm_context_t (itself contained in mm_struct). Other x86-specific context like the pkey accounting data lives in mm_context_t, and there's no readon the MPX data can't also live there. So as to keep the arch-specific data togather, and to set a good example for others, this patch moves bd_addr into x86's mm_context_t. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1481892055-24596-1-git-send-email-mark.rutland@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/mm: Drop unused argument 'removed' from sync_global_pgds()Kirill A. Shutemov2016-12-152-18/+8
| | | | | | | | | | | | | | Since commit af2cf278ef4f ("x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()") there are no callers of sync_global_pgds() which set the 'removed' argument to 1. Remove the argument and the related conditionals in the function. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20161214234403.137556-1-kirill.shutemov@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* ACPI/NUMA: Do not map pxm to node when NUMA is turned offBoris Ostrovsky2016-12-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | acpi_map_pxm_to_node() unconditially maps nodes even when NUMA is turned off. So acpi_get_node() might return a node > 0, which is fatal when NUMA is disabled as the rest of the kernel assumes that only node 0 exists. Expose numa_off to the acpi code and return NUMA_NO_NODE when it's set. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Cc: linux-ia64@vger.kernel.org Cc: catalin.marinas@arm.com Cc: rjw@rjwysocki.net Cc: will.deacon@arm.com Cc: linux-acpi@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: lenb@kernel.org Link: http://lkml.kernel.org/r/1481602709-18260-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds2016-12-121-2/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FPU updates from Ingo Molnar: "The main changes in this cycle were: - do a large round of simplifications after all CPUs do 'eager' FPU context switching in v4.9: remove CR0 twiddling, remove leftover eager/lazy bts, etc (Andy Lutomirski) - more FPU code simplifications: remove struct fpu::counter, clarify nomenclature, remove unnecessary arguments/functions and better structure the code (Rik van Riel)" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Remove clts() x86/fpu: Remove stts() x86/fpu: Handle #NM without FPU emulation as an error x86/fpu, lguest: Remove CR0.TS support x86/fpu, kvm: Remove host CR0.TS manipulation x86/fpu: Remove irq_ts_save() and irq_ts_restore() x86/fpu: Stop saving and restoring CR0.TS in fpu__init_check_bugs() x86/fpu: Get rid of two redundant clts() calls x86/fpu: Finish excising 'eagerfpu' x86/fpu: Split old_fpu & new_fpu handling into separate functions x86/fpu: Remove 'cpu' argument from __cpu_invalidate_fpregs_state() x86/fpu: Split old & new FPU code paths x86/fpu: Remove __fpregs_(de)activate() x86/fpu: Rename lazy restore functions to "register state valid" x86/fpu, kvm: Remove KVM vcpu->fpu_counter x86/fpu: Remove struct fpu::counter x86/fpu: Remove use_eager_fpu() x86/fpu: Remove the XFEATURE_MASK_EAGER/LAZY distinction x86/fpu: Hard-disable lazy FPU mode x86/crypto, x86/fpu: Remove X86_FEATURE_EAGER_FPU #ifdef from the crc32c code
| * Merge branch 'linus' into x86/fpu, to resolve conflictsIngo Molnar2016-11-231-1/+6
| |\ | | | | | | | | | | | | | | | | | | Conflicts: arch/x86/kernel/fpu/core.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * \ Merge branch 'core/urgent' into x86/fpu, to merge fixesIngo Molnar2016-11-014-7/+20
| |\ \ | | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86/fpu: Finish excising 'eagerfpu'Andy Lutomirski2016-10-181-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that eagerfpu= is gone, remove it from the docs and some comments. Also sync the changes to tools/. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cf430dd4481d41280e93ac6cf0def1007a67fc8e.1476740397.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'x86-asm-for-linus' of ↵Linus Torvalds2016-12-121-2/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "The main changes in this development cycle were: - a large number of call stack dumping/printing improvements: higher robustness, better cross-context dumping, improved output, etc. (Josh Poimboeuf) - vDSO getcpu() performance improvement for future Intel CPUs with the RDPID instruction (Andy Lutomirski) - add two new Intel AVX512 features and the CPUID support infrastructure for it: AVX512IFMA and AVX512VBMI. (Gayatri Kammela, He Chen) - more copy-user unification (Borislav Petkov) - entry code assembly macro simplifications (Alexander Kuleshov) - vDSO C/R support improvements (Dmitry Safonov) - misc fixes and cleanups (Borislav Petkov, Paul Bolle)" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) scripts/decode_stacktrace.sh: Fix address line detection on x86 x86/boot/64: Use defines for page size x86/dumpstack: Make stack name tags more comprehensible selftests/x86: Add test_vdso to test getcpu() x86/vdso: Use RDPID in preference to LSL when available x86/dumpstack: Handle NULL stack pointer in show_trace_log_lvl() x86/cpufeatures: Enable new AVX512 cpu features x86/cpuid: Provide get_scattered_cpuid_leaf() x86/cpuid: Cleanup cpuid_regs definitions x86/copy_user: Unify the code by removing the 64-bit asm _copy_*_user() variants x86/unwind: Ensure stack grows down x86/vdso: Set vDSO pointer only after success x86/prctl/uapi: Remove #ifdef for CHECKPOINT_RESTORE x86/unwind: Detect bad stack return address x86/dumpstack: Warn on stack recursion x86/unwind: Warn on bad frame pointer x86/decoder: Use stderr if insn sanity test fails x86/decoder: Use stdout if insn decoder test is successful mm/page_alloc: Remove kernel address exposure in free_reserved_area() x86/dumpstack: Remove raw stack dump ...
| * \ \ \ Merge branch 'linus' into x86/asm, to pick up fixesIngo Molnar2016-11-012-3/+17
| |\ \ \ \ | | | |/ / | | |/| | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | x86/dumpstack: Remove kernel text addresses from stack dumpJosh Poimboeuf2016-10-251-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Printing kernel text addresses in stack dumps is of questionable value, especially now that address randomization is becoming common. It can be a security issue because it leaks kernel addresses. It also affects the usefulness of the stack dump. Linus says: "I actually spend time cleaning up commit messages in logs, because useless data that isn't actually information (random hex numbers) is actively detrimental. It makes commit logs less legible. It also makes it harder to parse dumps. It's not useful. That makes it actively bad. I probably look at more oops reports than most people. I have not found the hex numbers useful for the last five years, because they are just randomized crap. The stack content thing just makes code scroll off the screen etc, for example." The only real downside to removing these addresses is that they can be used to disambiguate duplicate symbol names. However such cases are rare, and the context of the stack dump should be enough to be able to figure it out. There's now a 'faddr2line' script which can be used to convert a function address to a file name and line: $ ./scripts/faddr2line ~/k/vmlinux write_sysrq_trigger+0x51/0x60 write_sysrq_trigger+0x51/0x60: write_sysrq_trigger at drivers/tty/sysrq.c:1098 Or gdb can be used: $ echo "list *write_sysrq_trigger+0x51" |gdb ~/k/vmlinux |grep "is in" (gdb) 0xffffffff815b5d83 is in driver_probe_device (/home/jpoimboe/git/linux/drivers/base/dd.c:378). (But note that when there are duplicate symbol names, gdb will only show the first symbol it finds. faddr2line is recommended over gdb because it handles duplicates and it also does function size checking.) Here's an example of what a stack dump looks like after this change: BUG: unable to handle kernel NULL pointer dereference at (null) IP: sysrq_handle_crash+0x45/0x80 PGD 36bfa067 [ 29.650644] PUD 7aca3067 Oops: 0002 [#1] PREEMPT SMP Modules linked in: ... CPU: 1 PID: 786 Comm: bash Tainted: G E 4.9.0-rc1+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 task: ffff880078582a40 task.stack: ffffc90000ba8000 RIP: 0010:sysrq_handle_crash+0x45/0x80 RSP: 0018:ffffc90000babdc8 EFLAGS: 00010296 RAX: ffff880078582a40 RBX: 0000000000000063 RCX: 0000000000000001 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000292 RBP: ffffc90000babdc8 R08: 0000000b31866061 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000007 R14: ffffffff81ee8680 R15: 0000000000000000 FS: 00007ffb43869700(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000007a3e9000 CR4: 00000000001406e0 Stack: ffffc90000babe00 ffffffff81572d08 ffffffff81572bd5 0000000000000002 0000000000000000 ffff880079606600 00007ffb4386e000 ffffc90000babe20 ffffffff81573201 ffff880036a3fd00 fffffffffffffffb ffffc90000babe40 Call Trace: __handle_sysrq+0x138/0x220 ? __handle_sysrq+0x5/0x220 write_sysrq_trigger+0x51/0x60 proc_reg_write+0x42/0x70 __vfs_write+0x37/0x140 ? preempt_count_sub+0xa1/0x100 ? __sb_start_write+0xf5/0x210 ? vfs_write+0x183/0x1a0 vfs_write+0xb8/0x1a0 SyS_write+0x58/0xc0 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x7ffb42f55940 RSP: 002b:00007ffd33bb6b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007ffb42f55940 RDX: 0000000000000002 RSI: 00007ffb4386e000 RDI: 0000000000000001 RBP: 0000000000000011 R08: 00007ffb4321ea40 R09: 00007ffb43869700 R10: 00007ffb43869700 R11: 0000000000000246 R12: 0000000000778a10 R13: 00007ffd33bb5c00 R14: 0000000000000007 R15: 0000000000000010 Code: 34 e8 d0 34 bc ff 48 c7 c2 3b 2b 57 81 be 01 00 00 00 48 c7 c7 e0 dd e5 81 e8 a8 55 ba ff c7 05 0e 3f de 00 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 5d c3 e8 4c 49 bc ff 84 c0 75 c3 48 c7 RIP: sysrq_handle_crash+0x45/0x80 RSP: ffffc90000babdc8 CR2: 0000000000000000 Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/69329cb29b8f324bb5fcea14d61d224807fb6488.1477405374.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'mm-pat-for-linus' of ↵Linus Torvalds2016-12-121-5/+2
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull mm/PAT cleanup from Ingo Molnar: "A single cleanup for a generic interface that was originally introduced for PAT" * 'mm-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pat, mm: Make track_pfn_insert() return void
| * | | | x86/pat, mm: Make track_pfn_insert() return voidBorislav Petkov2016-11-091-5/+2
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It only returns 0 so we can save us the testing of its retval everywhere. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: mcgrof@suse.com Cc: dri-devel@lists.freedesktop.org Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Airlie <airlied@redhat.com> Cc: dan.j.williams@intel.com Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20161026174839.rusfxkm3xt4ennhe@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* / | | x86/traps: Ignore high word of regs->cs in early_fixup_exception()Andy Lutomirski2016-11-211-1/+6
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the 80486 DX, it seems that some exceptions may leave garbage in the high bits of CS. This causes sporadic failures in which early_fixup_exception() refuses to fix up an exception. As far as I can tell, this has been buggy for a long time, but the problem seems to have been exacerbated by commits: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") e1bfc11c5a6f ("x86/init: Fix cr4_init_shadow() on CR4-less machines") This appears to have broken for as long as we've had early exception handling. [ Note to stable maintainers: This patch is needed all the way back to 3.4, but it will only apply to 4.6 and up, as it depends on commit: 0e861fbb5bda ("x86/head: Move early exception panic code into early_fixup_exception()") If you want to backport to kernels before 4.6, please don't backport the prerequisites (there was a big chain of them that rewrote a lot of the early exception machinery); instead, ask me and I can send you a one-liner that will apply. ] Reported-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 4c5023a3fa2e ("x86-32: Handle exception table entries during early boot") Link: http://lkml.kernel.org/r/cb32c69920e58a1a58e7b5cad975038a69c0ce7d.1479609510.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge tag 'drm-x86-pat-regression-fix' of ↵Linus Torvalds2016-10-281-0/+14
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://people.freedesktop.org/~airlied/linux Pull drm x86/pat regression fixes from Dave Airlie: "This is a standalone pull request for the fix for a regression introduced in -rc1 by a change to vm_insert_mixed to start using the PAT range tracking to validate page protections. With this fix in place, all the VRAM mappings for GPU drivers ended up at UC instead of WC. There are probably better ways to fix this long term, but nothing I'd considered for -fixes that wouldn't need more settling in time. So I've just created a new arch API that the drivers can reserve all their VRAM aperture ranges as WC" * tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux: drm/drivers: add support for using the arch wc mapping API. x86/io: add interface to reserve io memtype for a resource range. (v1.1)
| * | | x86/io: add interface to reserve io memtype for a resource range. (v1.1)Dave Airlie2016-10-261-0/+14
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent change to the mm code in: 87744ab3832b mm: fix cache mode tracking in vm_insert_mixed() started enforcing checking the memory type against the registered list for amixed pfn insertion mappings. It happens that the drm drivers for a number of gpus relied on this being broken. Currently the driver only inserted VRAM mappings into the tracking table when they came from the kernel, and userspace mappings never landed in the table. This led to a regression where all the mapping end up as UC instead of WC now. I've considered a number of solutions but since this needs to be fixed in fixes and not next, and some of the solutions were going to introduce overhead that hadn't been there before I didn't consider them viable at this stage. These mainly concerned hooking into the TTM io reserve APIs, but these API have a bunch of fast paths I didn't want to unwind to add this to. The solution I've decided on is to add a new API like the arch_phys_wc APIs (these would have worked but wc_del didn't take a range), and use them from the drivers to add a WC compatible mapping to the table for all VRAM on those GPUs. This means we can then create userspace mapping that won't get degraded to UC. v1.1: use CONFIG_X86_PAT + add some comments in io.h Cc: Toshi Kani <toshi.kani@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: x86@kernel.org Cc: mcgrof@suse.com Cc: Dan Williams <dan.j.williams@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
* / / kconfig.h: remove config_enabled() macroMasahiro Yamada2016-10-271-3/+3
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of config_enabled() is ambiguous. For config options, IS_ENABLED(), IS_REACHABLE(), etc. will make intention clearer. Sometimes config_enabled() has been used for non-config options because it is useful to check whether the given symbol is defined or not. I have been tackling on deprecating config_enabled(), and now is the time to finish this work. Some new users have appeared for v4.9-rc1, but it is trivial to replace them: - arch/x86/mm/kaslr.c replace config_enabled() with IS_ENABLED() because CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean. - include/asm-generic/export.h replace config_enabled() with __is_defined(). Then, config_enabled() can be removed now. Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config options, and __is_defined() for non-config symbols. Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Marek <mmarek@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Garnier <thgarnie@google.com> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'gup_flag-cleanups'Linus Torvalds2016-10-192-4/+3
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge the gup_flags cleanups from Lorenzo Stoakes: "This patch series adjusts functions in the get_user_pages* family such that desired FOLL_* flags are passed as an argument rather than implied by flags. The purpose of this change is to make the use of FOLL_FORCE explicit so it is easier to grep for and clearer to callers that this flag is being used. The use of FOLL_FORCE is an issue as it overrides missing VM_READ/VM_WRITE flags for the VMA whose pages we are reading from/writing to, which can result in surprising behaviour. The patch series came out of the discussion around commit 38e088546522 ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"), which addressed a BUG_ON() being triggered when a page was faulted in with PROT_NONE set but having been overridden by FOLL_FORCE. do_numa_page() was run on the assumption the page _must_ be one marked for NUMA node migration as an actual PROT_NONE page would have been dealt with prior to this code path, however FOLL_FORCE introduced a situation where this assumption did not hold. See https://marc.info/?l=linux-mm&m=147585445805166 for the patch proposal" Additionally, there's a fix for an ancient bug related to FOLL_FORCE and FOLL_WRITE by me. [ This branch was rebased recently to add a few more acked-by's and reviewed-by's ] * gup_flag-cleanups: mm: replace access_process_vm() write parameter with gup_flags mm: replace access_remote_vm() write parameter with gup_flags mm: replace __access_remote_vm() write parameter with gup_flags mm: replace get_user_pages_remote() write/force parameters with gup_flags mm: replace get_user_pages() write/force parameters with gup_flags mm: replace get_vaddr_frames() write/force parameters with gup_flags mm: replace get_user_pages_locked() write/force parameters with gup_flags mm: replace get_user_pages_unlocked() write/force parameters with gup_flags mm: remove write/force parameters from __get_user_pages_unlocked() mm: remove write/force parameters from __get_user_pages_locked() mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
| * mm: replace get_user_pages() write/force parameters with gup_flagsLorenzo Stoakes2016-10-191-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes the 'write' and 'force' from get_user_pages() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm: replace get_user_pages_unlocked() write/force parameters with gup_flagsLorenzo Stoakes2016-10-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | This removes the 'write' and 'force' use from get_user_pages_unlocked() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'work.uaccess2' of ↵Linus Torvalds2016-10-111-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess.h prepwork from Al Viro: "Preparations to tree-wide switch to use of linux/uaccess.h (which, obviously, will allow to start unifying stuff for real). The last step there, ie PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ `git grep -l "$PATT"|grep -v ^include/linux/uaccess.h` is not taken here - I would prefer to do it once just before or just after -rc1. However, everything should be ready for it" * 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: remove a stray reference to asm/uaccess.h in docs sparc64: separate extable_64.h, switch elf_64.h to it score: separate extable.h, switch module.h to it mips: separate extable.h, switch module.h to it x86: separate extable.h, switch sections.h to it remove stray include of asm/uaccess.h from cacheflush.h mn10300: remove a bogus processor.h->uaccess.h include xtensa: split uaccess.h into C and asm sides bonding: quit messing with IOCTL kill __kernel_ds_p off mn10300: finish verify_area() off frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h exceptions: detritus removal
| * exceptions: detritus removalAl Viro2016-09-271-1/+1
| | | | | | | | | | | | externs and defines for stuff that is never used Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'mm-pkeys-for-linus' of ↵Linus Torvalds2016-10-102-8/+143
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull protection keys syscall interface from Thomas Gleixner: "This is the final step of Protection Keys support which adds the syscalls so user space can actually allocate keys and protect memory areas with them. Details and usage examples can be found in the documentation. The mm side of this has been acked by Mel" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pkeys: Update documentation x86/mm/pkeys: Do not skip PKRU register if debug registers are not used x86/pkeys: Fix pkeys build breakage for some non-x86 arches x86/pkeys: Add self-tests x86/pkeys: Allow configuration of init_pkru x86/pkeys: Default to a restrictive init PKRU pkeys: Add details of system call use to Documentation/ generic syscalls: Wire up memory protection keys syscalls x86: Wire up protection keys system calls x86/pkeys: Allocation/free syscalls x86/pkeys: Make mprotect_key() mask off additional vm_flags mm: Implement new pkey_mprotect() system call x86/pkeys: Add fault handling for PF_PK page fault bit
| * | x86/pkeys: Allow configuration of init_pkruDave Hansen2016-09-091-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As discussed in the previous patch, there is a reliability benefit to allowing an init value for the Protection Keys Rights User register (PKRU) which differs from what the XSAVE hardware provides. But, having PKRU be 0 (its init value) provides some nonzero amount of optimization potential to the hardware. It can, for instance, skip writes to the XSAVE buffer when it knows that PKRU is in its init state. The cost of losing this optimization is approximately 100 cycles per context switch for a workload which lightly using XSAVE state (something not using AVX much). The overhead comes from a combinaation of actually manipulating PKRU and the overhead of pullin in an extra cacheline. This overhead is not huge, but it's also not something that I think we should unconditionally inflict on everyone. So, make it configurable both at boot-time and from debugfs. Changes to the debugfs value affect all processes created after the write to debugfs. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: mgorman@techsingularity.net Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163023.407672D2@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/pkeys: Default to a restrictive init PKRUDave Hansen2016-09-091-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PKRU is the register that lets you disallow writes or all access to a given protection key. The XSAVE hardware defines an "init state" of 0 for PKRU: its most permissive state, allowing access/writes to everything. Since we start off all new processes with the init state, we start all processes off with the most permissive possible PKRU. This is unfortunate. If a thread is clone()'d [1] before a program has time to set PKRU to a restrictive value, that thread will be able to write to all data, no matter what pkey is set on it. This weakens any integrity guarantees that we want pkeys to provide. To fix this, we define a very restrictive PKRU to override the XSAVE-provided value when we create a new FPU context. We choose a value that only allows access to pkey 0, which is as restrictive as we can practically make it. This does not cause any practical problems with applications using protection keys because we require them to specify initial permissions for each key when it is allocated, which override the restrictive default. In the end, this ensures that threads which do not know how to manage their own pkey rights can not do damage to data which is pkey-protected. I would have thought this was a pretty contrived scenario, except that I heard a bug report from an MPX user who was creating threads in some very early code before main(). It may be crazy, but folks evidently _do_ it. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: mgorman@techsingularity.net Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163021.F3C25D4A@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/pkeys: Allocation/free syscallsDave Hansen2016-09-091-8/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds two new system calls: int pkey_alloc(unsigned long flags, unsigned long init_access_rights) int pkey_free(int pkey); These implement an "allocator" for the protection keys themselves, which can be thought of as analogous to the allocator that the kernel has for file descriptors. The kernel tracks which numbers are in use, and only allows operations on keys that are valid. A key which was not obtained by pkey_alloc() may not, for instance, be passed to pkey_mprotect(). These system calls are also very important given the kernel's use of pkeys to implement execute-only support. These help ensure that userspace can never assume that it has control of a key unless it first asks the kernel. The kernel does not promise to preserve PKRU (right register) contents except for allocated pkeys. The 'init_access_rights' argument to pkey_alloc() specifies the rights that will be established for the returned pkey. For instance: pkey = pkey_alloc(flags, PKEY_DENY_WRITE); will allocate 'pkey', but also sets the bits in PKRU[1] such that writing to 'pkey' is already denied. The kernel does not prevent pkey_free() from successfully freeing in-use pkeys (those still assigned to a memory range by pkey_mprotect()). It would be expensive to implement the checks for this, so we instead say, "Just don't do it" since sane software will never do it anyway. Any piece of userspace calling pkey_alloc() needs to be prepared for it to fail. Why? pkey_alloc() returns the same error code (ENOSPC) when there are no pkeys and when pkeys are unsupported. They can be unsupported for a whole host of reasons, so apps must be prepared for this. Also, libraries or LD_PRELOADs might steal keys before an application gets access to them. This allocation mechanism could be implemented in userspace. Even if we did it in userspace, we would still need additional user/kernel interfaces to tell userspace which keys are being used by the kernel internally (such as for execute-only mappings). Having the kernel provide this facility completely removes the need for these additional interfaces, or having an implementation of this in userspace at all. Note that we have to make changes to all of the architectures that do not use mman-common.h because we use the new PKEY_DENY_ACCESS/WRITE macros in arch-independent code. 1. PKRU is the Protection Key Rights User register. It is a usermode-accessible register that controls whether writes and/or access to each individual pkey is allowed or denied. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/pkeys: Add fault handling for PF_PK page fault bitDave Hansen2016-09-091-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PF_PK means that a memory access violated the protection key access restrictions. It is unconditionally an access_error() because the permissions set on the VMA don't matter (the PKRU value overrides it), and we never "resolve" PK faults (like how a COW can "resolve write fault). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163010.DD1FE1ED@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds2016-10-033-5/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Header file and a wrapper functions cleanup" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Migrate exception table users off module.h and onto extable.h x86: Clean up various simple wrapper functions
| * | | x86: Migrate exception table users off module.h and onto extable.hPaul Gortmaker2016-09-202-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These files were only including module.h for exception table related functions. We've now separated that content out into its own file "extable.h" so now move over to that and avoid all the extra header content in module.h that we don't really need to compile these files. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/20160919210418.30243-1-paul.gortmaker@windriver.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86: Clean up various simple wrapper functionsMasahiro Yamada2016-09-131-3/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove unneeded variables and assignments. While we are here, let's fix the following as well: - Remove unnecessary parentheses - Remove unnecessary unsigned-suffix 'U' from constant values - Reword the comment in set_apic_id() (suggested by Thomas Gleixner) Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andrew Banman <abanman@sgi.com> Cc: Borislav Petkov <bp@suse.de> Cc: Daniel J Blueman <daniel@numascale.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mike Travis <travis@sgi.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steffen Persvold <sp@numascale.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Wei Jiangang <weijg.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/1473573502-27954-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds2016-10-031-1/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The changes in this cycle were: - Save e820 table RAM footprint on larger kernel configurations. (Denys Vlasenko) - pmem related fixes (Dan Williams) - theoretical e820 boundary condition fix (Wei Yang)" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation x86/e820: Use much less memory for e820/e820_saved, save up to 120k x86/e820: Prepare e280 code for switch to dynamic storage x86/e820: Mark some static functions __init x86/e820: Fix very large 'size' handling boundary condition