summaryrefslogtreecommitdiffstats
path: root/virt
Commit message (Collapse)AuthorAgeFilesLines
* vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds2018-02-111-4/+4
| | | | | | | | | | | | | | | | | | | | | | | This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2018-02-1012-183/+464
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Radim Krčmář: "ARM: - icache invalidation optimizations, improving VM startup time - support for forwarded level-triggered interrupts, improving performance for timers and passthrough platform devices - a small fix for power-management notifiers, and some cosmetic changes PPC: - add MMIO emulation for vector loads and stores - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization of older CPU versions - improve the handling of escalation interrupts with the XIVE interrupt controller - support decrement register migration - various cleanups and bugfixes. s390: - Cornelia Huck passed maintainership to Janosch Frank - exitless interrupts for emulated devices - cleanup of cpuflag handling - kvm_stat counter improvements - VSIE improvements - mm cleanup x86: - hypervisor part of SEV - UMIP, RDPID, and MSR_SMI_COUNT emulation - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512 features - show vcpu id in its anonymous inode name - many fixes and cleanups - per-VCPU MSR bitmaps (already merged through x86/pti branch) - stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)" * tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits) KVM: PPC: Book3S: Add MMIO emulation for VMX instructions KVM: PPC: Book3S HV: Branch inside feature section KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code KVM: PPC: Book3S PR: Fix broken select due to misspelling KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs() KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled KVM: PPC: Book3S HV: Drop locks before reading guest memory kvm: x86: remove efer_reload entry in kvm_vcpu_stat KVM: x86: AMD Processor Topology Information x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested kvm: embed vcpu id to dentry of vcpu anon inode kvm: Map PFN-type memory regions as writable (if possible) x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n KVM: arm/arm64: Fixup userspace irqchip static key optimization KVM: arm/arm64: Fix userspace_irqchip_in_use counting KVM: arm/arm64: Fix incorrect timer_is_pending logic MAINTAINERS: update KVM/s390 maintainers MAINTAINERS: add Halil as additional vfio-ccw maintainer MAINTAINERS: add David as a reviewer for KVM/s390 ...
| * Merge branch 'x86/hyperv' of ↵Radim Krčmář2018-02-016-28/+33
| |\ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Topic branch for stable KVM clockource under Hyper-V. Thanks to Christoffer Dall for resolving the ARM conflict.
| * | kvm: embed vcpu id to dentry of vcpu anon inodeMasatake YAMATO2018-01-311-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All d-entries for vcpu have the same, "anon_inode:kvm-vcpu". That means it is impossible to know the mapping between fds for vcpu and vcpu from userland. # LC_ALL=C ls -l /proc/617/fd | grep vcpu lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu It is also impossible to know the mapping between vma for kvm_run structure and vcpu from userland. # LC_ALL=C grep vcpu /proc/617/maps 7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu 7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu This change adds vcpu id to d-entries for vcpu. With this change you can get the following output: # LC_ALL=C ls -l /proc/617/fd | grep vcpu lrwx------. 1 qemu qemu 64 Jan 7 16:50 18 -> anon_inode:kvm-vcpu:0 lrwx------. 1 qemu qemu 64 Jan 7 16:50 19 -> anon_inode:kvm-vcpu:1 # LC_ALL=C grep vcpu /proc/617/maps 7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:0 7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393 anon_inode:kvm-vcpu:1 With the mappings known from the output, a tool like strace can report more details of qemu-kvm process activities. Here is the strace output of my local prototype: # ./strace -KK -f -p 617 2>&1 | grep 'KVM_RUN\| K' ... [pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO) K ready_for_interrupt_injection=1, if_flag=0, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00 K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112 [pid 664] ioctl(18, KVM_RUN, 0) = 0 (KVM_EXIT_MMIO) K ready_for_interrupt_injection=1, if_flag=1, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00 K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112 ... Signed-off-by: Masatake YAMATO <yamato@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | kvm: Map PFN-type memory regions as writable (if possible)KarimAllah Ahmed2018-01-311-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For EPT-violations that are triggered by a read, the pages are also mapped with write permissions (if their memory region is also writable). That would avoid getting yet another fault on the same page when a write occurs. This optimization only happens when you have a "struct page" backing the memory region. So also enable it for memory regions that do not have a "struct page". Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
| * | Merge tag 'kvm-arm-for-v4.16' of ↵Radim Krčmář2018-01-3110-123/+368
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Changes for v4.16 The changes for this version include icache invalidation optimizations (improving VM startup time), support for forwarded level-triggered interrupts (improved performance for timers and passthrough platform devices), a small fix for power-management notifiers, and some cosmetic changes.
| | * | KVM: arm/arm64: Fixup userspace irqchip static key optimizationChristoffer Dall2018-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I introduced a static key to avoid work in the critical path for userspace irqchips which is very rarely used, I accidentally messed up my logic and used && where I should have used ||, because the point was to short-circuit the evaluation in case userspace irqchips weren't even in use. This fixes an issue when running in-kernel irqchip VMs alongside userspace irqchip VMs. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Fixes: c44c232ee2d3 ("KVM: arm/arm64: Avoid work when userspace iqchips are not used") Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Fix userspace_irqchip_in_use countingChristoffer Dall2018-01-311-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were not decrementing the static key count in the right location. kvm_arch_vcpu_destroy() is only called to clean up after a failed VCPU create attempt, whereas kvm_arch_vcpu_free() is called on teardown of the VM as well. Move the static key decrement call to kvm_arch_vcpu_free(). Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Fix incorrect timer_is_pending logicChristoffer Dall2018-01-311-19/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the recently introduced support for level-triggered mapped interrupt, I accidentally left the VCPU thread busily going back and forward between the guest and the hypervisor whenever the guest was blocking, because I would always incorrectly report that a timer interrupt was pending. This is because the timer->irq.level field is not valid for mapped interrupts, where we offload the level state to the hardware, and as a result this field is always true. Luckily the problem can be relatively easily solved by not checking the cached signal state of either timer in kvm_timer_should_fire() but instead compute the timer state on the fly, which we do already if the cached signal state wasn't high. In fact, the only reason for checking the cached signal state was a tiny optimization which would only be potentially faster when the polling loop detects a pending timer interrupt, which is quite unlikely. Instead of duplicating the logic from kvm_arch_timer_handler(), we enlighten kvm_timer_should_fire() to report something valid when the timer state is loaded onto the hardware. We can then call this from kvm_arch_timer_handler() as well and avoid the call to __timer_snapshot_state() in kvm_arch_timer_get_input_level(). Reported-by: Tomasz Nowicki <tn@semihalf.com> Tested-by: Tomasz Nowicki <tn@semihalf.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Handle CPU_PM_ENTER_FAILEDJames Morse2018-01-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu_pm_enter() calls the pm notifier chain with CPU_PM_ENTER, then if there is a failure: CPU_PM_ENTER_FAILED. When KVM receives CPU_PM_ENTER it calls cpu_hyp_reset() which will return us to the hyp-stub. If we subsequently get a CPU_PM_ENTER_FAILED, KVM does nothing, leaving the CPU running with the hyp-stub, at odds with kvm_arm_hardware_enabled. Add CPU_PM_ENTER_FAILED as a fallthrough for CPU_PM_EXIT, this reloads KVM based on kvm_arm_hardware_enabled. This is safe even if CPU_PM_ENTER never gets as far as KVM, as cpu_hyp_reinit() calls cpu_hyp_reset() to make sure the hyp-stub is loaded before reloading KVM. Fixes: 67f691976662 ("arm64: kvm: allows kvm cpu hotplug") Cc: <stable@vger.kernel.org> # v4.7+ CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance operartionsMarc Zyngier2018-01-081-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vcpu parameter isn't used for anything, and gets in the way of further cleanups. Let's get rid of it. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Preserve Exec permission across R/W permission faultsMarc Zyngier2018-01-081-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, we loose the Exec property whenever we take permission faults, as we always reconstruct the PTE/PMD from scratch. This can be counter productive as we can end-up with the following fault sequence: X -> RO -> ROX -> RW -> RWX Instead, we can lookup the existing PTE/PMD and clear the XN bit in the new entry if it was already cleared in the old one, leadig to a much nicer fault sequence: X -> ROX -> RWX Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Only clean the dcache on translation faultMarc Zyngier2018-01-081-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only case where we actually need to perform a dcache maintenance is when we map the page for the first time, and subsequent permission faults do not require cache maintenance. Let's make it conditional on not being a permission fault (and thus a translation fault). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Limit icache invalidation to prefetch abortsMarc Zyngier2018-01-081-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've so far eagerly invalidated the icache, no matter how the page was faulted in (data or prefetch abort). But we can easily track execution by setting the XN bits in the S2 page tables, get the prefetch abort at HYP and perform the icache invalidation at that time only. As for most VMs, the instruction working set is pretty small compared to the data set, this is likely to save some traffic (specially as the invalidation is broadcast). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Split dcache/icache flushingMarc Zyngier2018-01-081-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we're about to introduce opportunistic invalidation of the icache, let's split dcache and icache flushing. Acked-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.hMarc Zyngier2018-01-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_hyp.h has an odd dependency on kvm_mmu.h, which makes the opposite inclusion impossible. Let's start with breaking that useless dependency. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Avoid work when userspace iqchips are not usedChristoffer Dall2018-01-022-19/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently check if the VM has a userspace irqchip in several places along the critical path, and if so, we do some work which is only required for having an irqchip in userspace. This is unfortunate, as we could avoid doing any work entirely, if we didn't have to support irqchip in userspace. Realizing the userspace irqchip on ARM is mostly a developer or hobby feature, and is unlikely to be used in servers or other scenarios where performance is a priority, we can use a refcounted static key to only check the irqchip configuration when we have at least one VM that uses an irqchip in userspace. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Provide a get_input_level for the arch timerChristoffer Dall2018-01-021-46/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VGIC can now support the life-cycle of mapped level-triggered interrupts, and we no longer have to read back the timer state on every exit from the VM if we had an asserted timer interrupt signal, because the VGIC already knows if we hit the unlikely case where the guest disables the timer without ACKing the virtual timer interrupt. This means we rework a bit of the code to factor out the functionality to snapshot the timer state from vtimer_save_state(), and we can reuse this functionality in the sync path when we have an irqchip in userspace, and also to support our implementation of the get_input_level() function for the timer. This change also means that we can no longer rely on the timer's view of the interrupt line to set the active state, because we no longer maintain this state for mapped interrupts when exiting from the guest. Instead, we only set the active state if the virtual interrupt is active, and otherwise we simply let the timer fire again and raise the virtual interrupt from the ISR. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Support VGIC dist pend/active changes for mapped IRQsChristoffer Dall2018-01-023-6/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For mapped IRQs (with the HW bit set in the LR) we have to follow some rules of the architecture. One of these rules is that VM must not be allowed to deactivate a virtual interrupt with the HW bit set unless the physical interrupt is also active. This works fine when injecting mapped interrupts, because we leave it up to the injector to either set EOImode==1 or manually set the active state of the physical interrupt. However, the guest can set virtual interrupt to be pending or active by writing to the virtual distributor, which could lead to deactivating a virtual interrupt with the HW bit set without the physical interrupt being active. We could set the physical interrupt to active whenever we are about to enter the VM with a HW interrupt either pending or active, but that would be really slow, especially on GICv2. So we take the long way around and do the hard work when needed, which is expected to be extremely rare. When the VM sets the pending state for a HW interrupt on the virtual distributor we set the active state on the physical distributor, because the virtual interrupt can become active and then the guest can deactivate it. When the VM clears the pending state we also clear it on the physical side, because the injector might otherwise raise the interrupt. We also clear the physical active state when the virtual interrupt is not active, since otherwise a SPEND/CPEND sequence from the guest would prevent signaling of future interrupts. Changing the state of mapped interrupts from userspace is not supported, and it's expected that userspace unmaps devices from VFIO before attempting to set the interrupt state, because the interrupt state is driven by hardware. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Support a vgic interrupt line level sample functionChristoffer Dall2018-01-022-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GIC sometimes need to sample the physical line of a mapped interrupt. As we know this to be notoriously slow, provide a callback function for devices (such as the timer) which can do this much faster than talking to the distributor, for example by comparing a few in-memory values. Fall back to the good old method of poking the physical GIC if no callback is provided. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: vgic: Support level-triggered mapped interruptsChristoffer Dall2018-01-024-0/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Level-triggered mapped IRQs are special because we only observe rising edges as input to the VGIC, and we don't set the EOI flag and therefore are not told when the level goes down, so that we can re-queue a new interrupt when the level goes up. One way to solve this problem is to side-step the logic of the VGIC and special case the validation in the injection path, but it has the unfortunate drawback of having to peak into the physical GIC state whenever we want to know if the interrupt is pending on the virtual distributor. Instead, we can maintain the current semantics of a level triggered interrupt by sort of treating it as an edge-triggered interrupt, following from the fact that we only observe an asserting edge. This requires us to be a bit careful when populating the LRs and when folding the state back in though: * We lower the line level when populating the LR, so that when subsequently observing an asserting edge, the VGIC will do the right thing. * If the guest never acked the interrupt while running (for example if it had masked interrupts at the CPU level while running), we have to preserve the pending state of the LR and move it back to the line_level field of the struct irq when folding LR state. If the guest never acked the interrupt while running, but changed the device state and lowered the line (again with interrupts masked) then we need to observe this change in the line_level. Both of the above situations are solved by sampling the physical line and set the line level when folding the LR back. * Finally, if the guest never acked the interrupt while running and sampling the line reveals that the device state has changed and the line has been lowered, we must clear the physical active state, since we will otherwise never be told when the interrupt becomes asserted again. This has the added benefit of making the timer optimization patches (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a bit simpler, because the timer code doesn't have to clear the active state on the sync anymore. It also potentially improves the performance of the timer implementation because the GIC knows the state or the LR and only needs to clear the active state when the pending bit in the LR is still set, where the timer has to always clear it when returning from running the guest with an injected timer interrupt. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Don't cache the timer IRQ levelChristoffer Dall2018-01-021-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The timer logic was designed after a strict idea of modeling an interrupt line level in software, meaning that only transitions in the level need to be reported to the VGIC. This works well for the timer, because the arch timer code is in complete control of the device and can track the transitions of the line. However, as we are about to support using the HW bit in the VGIC not just for the timer, but also for VFIO which cannot track transitions of the interrupt line, we have to decide on an interface between the GIC and other subsystems for level triggered mapped interrupts, which both the timer and VFIO can use. VFIO only sees an asserting transition of the physical interrupt line, and tells the VGIC when that happens. That means that part of the interrupt flow is offloaded to the hardware. To use the same interface for VFIO devices and the timer, we therefore have to change the timer (we cannot change VFIO because it doesn't know the details of the device it is assigning to a VM). Luckily, changing the timer is simple, we just need to stop 'caching' the line level, but instead let the VGIC know the state of the timer every time there is a potential change in the line level, and when the line level should be asserted from the timer ISR. The VGIC can ignore extra notifications using its validate mechanism. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Factor out functionality to get vgic mmio requester_vcpuChristoffer Dall2018-01-021-16/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are about to distinguish between userspace accesses and mmio traps for a number of the mmio handlers. When the requester vcpu is NULL, it means we are handling a userspace access. Factor out the functionality to get the request vcpu into its own function, mostly so we have a common place to document the semantics of the return value. Also take the chance to move the functionality outside of holding a spinlock and instead explicitly disable and enable preemption. This supports PREEMPT_RT kernels as well. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm/arm64: Remove redundant preemptible checksChristoffer Dall2018-01-021-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __this_cpu_read() and __this_cpu_write() functions already implement checks for the required preemption levels when using CONFIG_DEBUG_PREEMPT which gives you nice error messages and such. Therefore there is no need to explicitly check this using a BUG_ON() in the code (which we don't do for other uses of per cpu variables either). Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | KVM: arm: Use PTR_ERR_OR_ZERO()Vasyl Gomonovych2018-01-021-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix ptr_ret.cocci warnings: virt/kvm/arm/vgic/vgic-its.c:971:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | | KVM: introduce kvm_arch_vcpu_async_ioctlPaolo Bonzini2017-12-142-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the vcpu_load/vcpu_put pushdown, the handling of asynchronous VCPU ioctl is already much clearer in that it is obvious that they bypass vcpu_load and vcpu_put. However, it is still not perfect in that the different state of the VCPU mutex is still hidden in the caller. Separate those ioctls into a new function kvm_arch_vcpu_async_ioctl that returns -ENOIOCTLCMD for more "traditional" synchronous ioctls. Cc: James Hogan <jhogan@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctlChristoffer Dall2017-12-142-20/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the calls to vcpu_load() and vcpu_put() in to the architecture specific implementations of kvm_arch_vcpu_ioctl() which dispatches further architecture-specific ioctls on to other functions. Some architectures support asynchronous vcpu ioctls which cannot call vcpu_load() or take the vcpu->mutex, because that would prevent concurrent execution with a running VCPU, which is the intended purpose of these ioctls, for example because they inject interrupts. We repeat the separate checks for these specifics in the architecture code for MIPS, S390 and PPC, and avoid taking the vcpu->mutex and calling vcpu_load for these ioctls. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_fpuChristoffer Dall2017-12-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_fpu(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_fpuChristoffer Dall2017-12-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_fpu(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_guest_debugChristoffer Dall2017-12-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_guest_debug(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_translateChristoffer Dall2017-12-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_translate(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_mpstateChristoffer Dall2017-12-142-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_mpstate(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_mpstateChristoffer Dall2017-12-142-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_mpstate(). Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_sregsChristoffer Dall2017-12-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_sregs(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_sregsChristoffer Dall2017-12-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_sregs(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_regsChristoffer Dall2017-12-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_regs(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_get_regsChristoffer Dall2017-12-141-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_regs(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_runChristoffer Dall2017-12-142-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_run(). Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390 parts Reviewed-by: Cornelia Huck <cohuck@redhat.com> [Rebased. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Prepare for moving vcpu_load/vcpu_put into arch specific codeChristoffer Dall2017-12-141-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for moving calls to vcpu_load() and vcpu_put() into the architecture specific implementations of the KVM vcpu ioctls, move the calls in the main kvm_vcpu_ioctl() dispatcher function to each case of the ioctl select statement. This allows us to move the vcpu_load() and vcpu_put() calls into architecture specific implementations of vcpu ioctls, one by one. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: Take vcpu->mutex outside vcpu_loadChristoffer Dall2017-12-141-11/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we're about to call vcpu_load() from architecture-specific implementations of the KVM vcpu ioctls, but yet we access data structures protected by the vcpu->mutex in the generic code, factor this logic out from vcpu_load(). x86 is the only architecture which calls vcpu_load() outside of the main vcpu ioctl function, and these calls will no longer take the vcpu mutex following this patch. However, with the exception of kvm_arch_vcpu_postcreate (see below), the callers are either in the creation or destruction path of the VCPU, which means there cannot be any concurrent access to the data structure, because the file descriptor is not yet accessible, or is already gone. kvm_arch_vcpu_postcreate makes the newly created vcpu potentially accessible by other in-kernel threads through the kvm->vcpus array, and we therefore take the vcpu mutex in this case directly. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | kvm_main: Use common error handling code in kvm_dev_ioctl_create_vm()Markus Elfring2017-12-141-10/+11
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a jump target so that a bit of exception handling can be better reused at the end of this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* | | Merge tag 'arm64-upstream' of ↵Linus Torvalds2018-02-082-25/+120
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull more arm64 updates from Catalin Marinas: "As I mentioned in the last pull request, there's a second batch of security updates for arm64 with mitigations for Spectre/v1 and an improved one for Spectre/v2 (via a newly defined firmware interface API). Spectre v1 mitigation: - back-end version of array_index_mask_nospec() - masking of the syscall number to restrict speculation through the syscall table - masking of __user pointers prior to deference in uaccess routines Spectre v2 mitigation update: - using the new firmware SMC calling convention specification update - removing the current PSCI GET_VERSION firmware call mitigation as vendors are deploying new SMCCC-capable firmware - additional branch predictor hardening for synchronous exceptions and interrupts while in user mode Meltdown v3 mitigation update: - Cavium Thunder X is unaffected but a hardware erratum gets in the way. The kernel now starts with the page tables mapped as global and switches to non-global if kpti needs to be enabled. Other: - Theoretical trylock bug fixed" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (38 commits) arm64: Kill PSCI_GET_VERSION as a variant-2 workaround arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: smccc: Implement SMCCC v1.1 inline primitive arm/arm64: smccc: Make function identifiers an unsigned quantity firmware/psci: Expose SMCCC version through psci_ops firmware/psci: Expose PSCI conduit arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support arm/arm64: KVM: Turn kvm_psci_version into a static inline arm/arm64: KVM: Advertise SMCCC v1.1 arm/arm64: KVM: Implement PSCI 1.0 support arm/arm64: KVM: Add smccc accessors to PSCI code arm/arm64: KVM: Add PSCI_VERSION helper arm/arm64: KVM: Consolidate the PSCI include files arm64: KVM: Increment PC after handling an SMC trap arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls arm64: entry: Apply BP hardening for suspicious interrupts from EL0 arm64: entry: Apply BP hardening for high-priority synchronous exceptions arm64: futex: Mask __user pointers prior to dereference ...
| * | | arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening supportMarc Zyngier2018-02-061-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new feature of SMCCC 1.1 is that it offers firmware-based CPU workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides BP hardening for CVE-2017-5715. If the host has some mitigation for this issue, report that we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the host workaround on every guest exit. Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | arm/arm64: KVM: Turn kvm_psci_version into a static inlineMarc Zyngier2018-02-061-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're about to need kvm_psci_version in HYP too. So let's turn it into a static inline, and pass the kvm structure as a second parameter (so that HYP can do a kern_hyp_va on it). Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | arm/arm64: KVM: Advertise SMCCC v1.1Marc Zyngier2018-02-061-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new SMC Calling Convention (v1.1) allows for a reduced overhead when calling into the firmware, and provides a new feature discovery mechanism. Make it visible to KVM guests. Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | arm/arm64: KVM: Implement PSCI 1.0 supportMarc Zyngier2018-02-061-1/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PSCI 1.0 can be trivially implemented by providing the FEATURES call on top of PSCI 0.2 and returning 1.0 as the PSCI version. We happily ignore everything else, as they are either optional or are clarifications that do not require any additional change. PSCI 1.0 is now the default until we decide to add a userspace selection API. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | arm/arm64: KVM: Add smccc accessors to PSCI codeMarc Zyngier2018-02-061-10/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of open coding the accesses to the various registers, let's add explicit SMCCC accessors. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | arm/arm64: KVM: Add PSCI_VERSION helperMarc Zyngier2018-02-061-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we're about to trigger a PSCI version explosion, it doesn't hurt to introduce a PSCI_VERSION helper that is going to be used everywhere. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * | | arm/arm64: KVM: Consolidate the PSCI include filesMarc Zyngier2018-02-062-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we're about to update the PSCI support, and because I'm lazy, let's move the PSCI include file to include/kvm so that both ARM architectures can find it. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | | | Merge tag 'usercopy-v4.16-rc1' of ↵Linus Torvalds2018-02-031-2/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardened usercopy whitelisting from Kees Cook: "Currently, hardened usercopy performs dynamic bounds checking on slab cache objects. This is good, but still leaves a lot of kernel memory available to be copied to/from userspace in the face of bugs. To further restrict what memory is available for copying, this creates a way to whitelist specific areas of a given slab cache object for copying to/from userspace, allowing much finer granularity of access control. Slab caches that are never exposed to userspace can declare no whitelist for their objects, thereby keeping them unavailable to userspace via dynamic copy operations. (Note, an implicit form of whitelisting is the use of constant sizes in usercopy operations and get_user()/put_user(); these bypass all hardened usercopy checks since these sizes cannot change at runtime.) This new check is WARN-by-default, so any mistakes can be found over the next several releases without breaking anyone's system. The series has roughly the following sections: - remove %p and improve reporting with offset - prepare infrastructure and whitelist kmalloc - update VFS subsystem with whitelists - update SCSI subsystem with whitelists - update network subsystem with whitelists - update process memory with whitelists - update per-architecture thread_struct with whitelists - update KVM with whitelists and fix ioctl bug - mark all other allocations as not whitelisted - update lkdtm for more sensible test overage" * tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits) lkdtm: Update usercopy tests for whitelisting usercopy: Restrict non-usercopy caches to size 0 kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl kvm: whitelist struct kvm_vcpu_arch arm: Implement thread_struct whitelist for hardened usercopy arm64: Implement thread_struct whitelist for hardened usercopy x86: Implement thread_struct whitelist for hardened usercopy fork: Provide usercopy whitelisting for task_struct fork: Define usercopy region in thread_stack slab caches fork: Define usercopy region in mm_struct slab caches net: Restrict unwhitelisted proto caches to size 0 sctp: Copy struct sctp_sock.autoclose to userspace using put_user() sctp: Define usercopy region in SCTP proto slab cache caif: Define usercopy region in caif proto slab cache ip: Define usercopy region in IP proto slab cache net: Define usercopy region in struct proto slab cache scsi: Define usercopy region in scsi_sense_cache slab cache cifs: Define usercopy region in cifs_request slab cache vxfs: Define usercopy region in vxfs_inode slab cache ufs: Define usercopy region in ufs_inode_cache slab cache ...