1. 26 Sep, 2018 1 commit
  2. 24 Aug, 2018 2 commits
  3. 25 Jul, 2018 1 commit
    • KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel. · 76267a8a
      Lan Tianyu authored
      commit b5020a8e6b54d2ece80b1e7dedb33c79a40ebd47 upstream.
      Syzbot reports crashes in kvm_irqfd_assign(), caused by a use-after-free
      when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel
      for one specific eventfd. When the assign path hasn't finished but the irqfd
      has been added to the kvm->irqfds.items list, another thread may deassign the
      eventfd and free the struct kvm_kernel_irqfd. The assign path then uses
      the struct kvm_kernel_irqfd that has been freed by the deassign path. To avoid
      this issue, keep the irqfd under kvm->irq_srcu protection after the irqfd
      has been added to the kvm->irqfds.items list, and call synchronize_srcu()
      in irqfd_shutdown() to make sure that the irqfd has been fully initialized in
      the assign path.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Tianyu Lan <tianyu.lan@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 22 Jul, 2018 1 commit
  5. 30 May, 2018 1 commit
  6. 22 May, 2018 1 commit
  7. 11 Mar, 2018 1 commit
    • KVM: mmu: Fix overlap between public and private memslots · 2e112f36
      Wanpeng Li authored
      commit b28676bb8ae4569cced423dc2a88f7cb319d5379 upstream.
      Reported by syzkaller:
          pte_list_remove: ffff9714eb1f8078 0->BUG
          ------------[ cut here ]------------
          kernel BUG at arch/x86/kvm/mmu.c:1157!
          invalid opcode: 0000 [#1] SMP
          RIP: 0010:pte_list_remove+0x11b/0x120 [kvm]
          Call Trace:
           drop_spte+0x83/0xb0 [kvm]
           mmu_page_zap_pte+0xcc/0xe0 [kvm]
           kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm]
           kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm]
           kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
           kvm_mmu_notifier_release+0x6c/0xa0 [kvm]
           ? kvm_mmu_notifier_release+0x5/0xa0 [kvm]
           ? __mmu_notifier_release+0x5/0x110
           ? do_exit+0x281/0xcb0
           ? __context_tracking_exit.part.5+0x4a/0x150
      The reason is that when a new memslot is created, there is no guarantee
      that it does not overlap with private memslots. This can be triggered by the
      following program:
         #include <fcntl.h>
         #include <pthread.h>
         #include <setjmp.h>
         #include <signal.h>
         #include <stddef.h>
         #include <stdint.h>
         #include <stdio.h>
         #include <stdlib.h>
         #include <string.h>
         #include <sys/ioctl.h>
         #include <sys/stat.h>
         #include <sys/syscall.h>
         #include <sys/types.h>
         #include <unistd.h>
         #include <linux/kvm.h>
         long r[16];
         int main()
         {
      	void *p = valloc(0x4000);
      	r[2] = open("/dev/kvm", 0);
      	r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
      	/* Place the private identity-map memslot at 0xf000. */
      	uint64_t addr = 0xf000;
      	ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr);
      	r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul);
      	ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul);
      	ioctl(r[6], KVM_RUN, 0);
      	ioctl(r[6], KVM_RUN, 0);
      	/* Create a user memslot at the same guest physical address. */
      	struct kvm_userspace_memory_region mr = {
      		.slot = 0,
      		.flags = KVM_MEM_LOG_DIRTY_PAGES,
      		.guest_phys_addr = 0xf000,
      		.memory_size = 0x4000,
      		.userspace_addr = (uintptr_t) p
      	};
      	ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr);
      	return 0;
         }
      This patch fixes the bug by refusing to add a new memslot if it
      overlaps with private memslots.
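The overlap check described above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `struct slot`, `ranges_overlap()` and `new_slot_overlaps()` are hypothetical names, and the slot ids used in any example are made up. The point it shows is that every slot in use is compared against the new range, private ones included, and only the slot being replaced is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a memslot (not the kernel struct). */
struct slot {
	int id;            /* slot id; private slots would use ids past the user range */
	uint64_t base_gfn; /* first guest frame number covered by the slot */
	uint64_t npages;   /* length in pages; 0 means the slot is unused */
};

/* True when [a_base, a_base + a_n) and [b_base, b_base + b_n) share a frame. */
static bool ranges_overlap(uint64_t a_base, uint64_t a_n,
			   uint64_t b_base, uint64_t b_n)
{
	return !(a_base + a_n <= b_base || a_base >= b_base + b_n);
}

/*
 * Returns true if the new range collides with any other slot in use.
 * Every slot is checked, private ones included; only the slot that is
 * being replaced (same id) and unused slots are skipped.
 */
static bool new_slot_overlaps(const struct slot *slots, size_t count,
			      int new_id, uint64_t base_gfn, uint64_t npages)
{
	assert(npages > 0);
	for (size_t i = 0; i < count; i++) {
		if (slots[i].id == new_id || slots[i].npages == 0)
			continue;
		if (ranges_overlap(base_gfn, npages,
				   slots[i].base_gfn, slots[i].npages))
			return true;
	}
	return false;
}
```

With a one-page private slot at frame 0xf (mirroring the identity map address 0xf000 in the reproducer), a request for four pages starting at frame 0xf is reported as overlapping and would be rejected.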
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
  8. 25 Dec, 2017 2 commits
  9. 16 Dec, 2017 1 commit
  10. 14 Dec, 2017 5 commits
  11. 09 Dec, 2017 1 commit
    • KVM: arm/arm64: Fix occasional warning from the timer work function · ee01c59b
      Christoffer Dall authored
      [ Upstream commit 63e41226 ]
      When a VCPU blocks (WFI) and has programmed the vtimer, we program a
      soft timer to expire in the future to wake up the vcpu thread when
      appropriate.  Because such a wake up involves a vcpu kick, and the
      timer expire function can get called from interrupt context, and the
      kick may sleep, we have to schedule the kick in the work function.
      The work function currently has a warning that gets raised if it turns
      out that the timer shouldn't fire when it's run, which was added because
      the idea was that in that case the work should never have been cancelled.
      However, it turns out that this whole thing is racy and we can get
      spurious warnings.  The problem is that we clear the armed flag in the
      work function, which may run in parallel with the
      kvm_timer_unschedule->timer_disarm() call.  This results in a possible
      situation where the timer_disarm() call does not call
      cancel_work_sync(), which effectively synchronizes the completion of the
      work function with running the VCPU.  As a result, the VCPU thread
      proceeds before the work function completes, causing changes to the
      timer state such that kvm_timer_should_fire(vcpu) returns false in the
      work function.
      All we do in the work function is to kick the VCPU, and an occasional
      rare extra kick never harmed anyone.  Since the race above is extremely
      rare, we don't bother checking if the race happens but simply remove the
      check and the clearing of the armed flag from the work function.
      Reported-by: Matthias Brugger <mbrugger@suse.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  12. 27 Jul, 2017 1 commit
  13. 14 Jun, 2017 2 commits
  14. 08 Apr, 2017 2 commits
  15. 18 Mar, 2017 1 commit
  16. 12 Mar, 2017 1 commit
  17. 26 Jan, 2017 1 commit
  18. 19 Jan, 2017 1 commit
    • KVM: eventfd: fix NULL deref irqbypass consumer · 7caf473f
      Wanpeng Li authored
      commit 4f3dbdf4 upstream.
      Reported by syzkaller:
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
          IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          PGD 0
          Oops: 0002 [#1] SMP
          CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
          Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
          task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
          RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          Call Trace:
           irqfd_shutdown+0x66/0xa0 [kvm]
           ? process_one_work+0x480/0x480
           ? kthread_create_on_node+0x60/0x60
          RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
          CR2: 0000000000000008
      The syzkaller folks reported a NULL pointer dereference caused by
      unregistering a consumer whose registration had failed earlier. Syzkaller
      occasionally creates two VMs with the same eventfd, so the second VM
      fails to register an irqbypass consumer. KVM then marks the irqfd as inactive
      and queues a work item to shut down the irqfd and unregister the irqbypass
      consumer when the eventfd is closed. However, the second consumer has been
      initialized even though its registration failed. So when its token (the same
      as the first VM's) is used to unregister the consumer through the workqueue,
      the consumer of the first VM is found and unregistered, and a NULL dereference
      then occurs on the path that deletes the consumer from the consumers list.
      This patch fixes it by making irq_bypass_register/unregister_consumer()
      look for the consumer entry based on the consumer pointer itself instead of
      token matching.
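The difference between matching by token and matching by pointer can be sketched with a small user-space list. This is only an illustration under simplifying assumptions: `struct consumer`, `register_consumer()` and `unregister_consumer()` are hypothetical names, and the real irqbypass structures carry more state and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified consumer; the real struct has more fields. */
struct consumer {
	void *token;           /* identical for two VMs that share one eventfd */
	struct consumer *next; /* singly linked registry list */
};

/* Registration fails if another consumer already holds the same token. */
static bool register_consumer(struct consumer **head, struct consumer *c)
{
	assert(head != NULL && c != NULL);
	for (struct consumer *it = *head; it; it = it->next)
		if (it->token == c->token || it == c)
			return false;
	c->next = *head;
	*head = c;
	return true;
}

/* Unregister by identity: only the exact object passed in is unlinked. */
static bool unregister_consumer(struct consumer **head, struct consumer *c)
{
	assert(head != NULL && c != NULL);
	for (struct consumer **pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == c) {
			*pp = c->next;
			c->next = NULL;
			return true;
		}
	}
	return false; /* c was never registered, so the list is left untouched */
}
```

If two consumers `a` and `b` carry the same token and only `a` registered successfully, calling `unregister_consumer()` on `b` finds nothing and leaves `a` in place, whereas a lookup keyed on the token would have removed `a` instead.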
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Suggested-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  19. 01 Dec, 2016 1 commit
  20. 24 Nov, 2016 1 commit
  21. 19 Nov, 2016 1 commit
    • KVM: async_pf: avoid recursive flushing of work items · 22583f0d
      Paolo Bonzini authored
      This was reported by syzkaller:
          [ INFO: possible recursive locking detected ]
          4.9.0-rc4+ #49 Not tainted
          kworker/2:1/5658 is trying to acquire lock:
           ([ 1644.769018] (&work->work)
          [<     inline     >] list_empty include/linux/compiler.h:243
          [<ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511
          but task is already holding lock:
           ([ 1644.769018] (&work->work)
          [<ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093
          stack backtrace:
          CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Workqueue: events async_pf_execute
           ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480
           0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27
           ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0
          Call Trace:
          [<ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846
          [<ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916
          [<ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951
          [<ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126
          [<     inline     >] kvm_free_vcpus arch/x86/kvm/x86.c:7841
          [<ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946
          [<     inline     >] kvm_destroy_vm virt/kvm/kvm_main.c:731
          [<ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752
          [<ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111
          [<ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096
          [<ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230
          [<ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209
          [<ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
      The reason is that kvm_put_kvm is causing the destruction of the VM, but
      the page fault is still on the ->queue list.  The ->queue list is owned
      by the VCPU, not by the work items, so we cannot just add list_del to
      the work item.
      Instead, use work->vcpu to note async page faults that have been resolved
      and will be processed through the done list.  There is no need to flush
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  22. 18 Nov, 2016 1 commit
    • KVM: arm64: Fix the issues when guest PMCCFILTR is configured · b112c84a
      Wei Huang authored
      KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured.
      But this function can't deal with PMCCFILTR correctly because the evtCount
      bits of PMCCFILTR, which are reserved as 0, conflict with the SW_INCR event
      type of other PMXEVTYPER<n> registers. To fix it, when eventsel == 0, this
      function shouldn't return immediately; instead it needs to check further
      if select_idx is ARMV8_PMU_CYCLE_IDX.
      Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTR
      blindly to attr.config. Instead it ought to convert the request to the
      "cpu cycle" event type (i.e. 0x11).
      To support this patch and to prevent duplicated definitions, a limited
      set of ARMv8 perf event types were relocated from perf_event.c to
      Cc: stable@vger.kernel.org # 4.6+
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Wei Huang <wei@redhat.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  23. 04 Nov, 2016 2 commits
  24. 26 Oct, 2016 1 commit
    • KVM: fix OOPS on flush_work · 36343f6e
      Paolo Bonzini authored
      The conversion done by commit 3706feac ("KVM: Remove deprecated
      create_singlethread_workqueue") is broken.  It flushes a single work
      item &irqfd->shutdown instead of all of them, and even worse if there
      is no irqfd on the list then you get a NULL pointer dereference.
      Revert the virt/kvm/eventfd.c part of that patch; to avoid the
      deprecated function, just allocate our own workqueue---it does
      not even have to be unbound---with alloc_workqueue.
      Fixes: 3706feac
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  25. 25 Oct, 2016 1 commit
    • mm: unexport __get_user_pages() · 0d731759
      Lorenzo Stoakes authored
      This patch unexports the low-level __get_user_pages() function.
      Recent refactoring of the get_user_pages* functions allows flags to be
      passed through get_user_pages() which eliminates the need for access to
      this function from its one user, kvm.
      We can see that the two calls to get_user_pages() which replace
      __get_user_pages() in kvm_main.c are equivalent by examining their call
      stacks:
          get_user_pages(start, 1, flags, page, NULL)
          __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
      			    false, flags | FOLL_TOUCH)
          __get_user_pages(current, current->mm, start, 1,
      		     flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)
          get_user_pages(addr, 1, flags, NULL, NULL)
          __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
      			    false, flags | FOLL_TOUCH)
          __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
      		     NULL, NULL)
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 18 Oct, 2016 1 commit
  27. 27 Sep, 2016 2 commits
  28. 22 Sep, 2016 3 commits
    • ARM: KVM: Support vgic-v3 · acda5430
      Vladimir Murzin authored
      This patch allows vgic-v3 to be built and used in 32-bit mode.
      Unfortunately, it cannot be split into several steps without extra
      stubs to keep patches independent and bisectable.  For instance,
      virt/kvm/arm/vgic/vgic-v3.c uses functions from vgic-v3-sr.c, and handling
      access to the GICv3 cpu interface from the guest requires vgic_v3.vgic_sre
      to be already defined.
      This is how support has been done:
      * handle SGI requests from the guest
      * report configured SRE on access to GICv3 cpu interface from the guest
      * required vgic-v3 macros are provided via uapi.h
      * static keys are used to select GIC backend
      * to make vgic-v3 build KVM_ARM_VGIC_V3 guard is removed along with
        the static inlines
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm: vgic: Support 64-bit data manipulation on 32-bit host systems · d7d0a11e
      Vladimir Murzin authored
      We have a couple of 64-bit registers defined in the GICv3 architecture, so
      unsigned long accesses to these registers will only access a single
      32-bit part of that register. On the other hand, these registers can't
      be accessed as 64-bit with a single instruction like ldrd/strd or
      ldmia/stmia if we run a 32-bit host, because KVM does not support
      access to MMIO space done by these instructions.
      It means that a 32-bit guest accesses these registers in 32-bit
      chunks, so the only thing we need to do is to ensure that
      extract_bytes() always takes 64-bit data.
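A user-space sketch of an extract_bytes() helper that always takes 64-bit data is shown below. The body is an assumption written for illustration (the commit text only names the function), and it uses `uint64_t` in place of kernel types so that the behaviour is the same regardless of the width of `unsigned long`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return `num` bytes of `data`, starting `offset` bytes from the least
 * significant end. Taking and returning 64-bit values keeps the result
 * correct on hosts where unsigned long is only 32 bits wide.
 */
static uint64_t extract_bytes(uint64_t data, unsigned int offset,
			      unsigned int num)
{
	uint64_t mask;

	assert(num >= 1 && offset + num <= 8);

	/* Shifting a 64-bit value by 64 is undefined, so handle num == 8 apart. */
	mask = (num == 8) ? UINT64_MAX : ((UINT64_C(1) << (num * 8)) - 1);
	return (data >> (offset * 8)) & mask;
}
```

A 32-bit guest reading a 64-bit register in two halves would correspond to two calls, one with offset 0 and one with offset 4, each with num set to 4.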
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm: vgic: Fix compiler warnings when built for 32-bit · e533a37f
      Vladimir Murzin authored
      Well, this patch is looking ahead of time, but we'll get the following
      compiler warnings as soon as we introduce vgic-v3 to the 32-bit world:
        CC      arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.o
      arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_mmio_read_v3r_typer':
      arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:184:35: warning: left shift count >= width of type [-Wshift-count-overflow]
        value = (mpidr & GENMASK(23, 0)) << 32;
      In file included from ./include/linux/kernel.h:10:0,
                       from ./include/asm-generic/bug.h:13,
                       from ./arch/arm/include/asm/bug.h:59,
                       from ./include/linux/bug.h:4,
                       from ./include/linux/io.h:23,
                       from ./arch/arm/include/asm/arch_gicv3.h:23,
                       from ./include/linux/irqchip/arm-gic-v3.h:411,
                       from arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:14:
      arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c: In function 'vgic_v3_dispatch_sgi':
      ./include/linux/bitops.h:6:24: warning: left shift count >= width of type [-Wshift-count-overflow]
       #define BIT(nr)   (1UL << (nr))
      arch/arm/kvm/../../../virt/kvm/arm/vgic/vgic-mmio-v3.c:614:20: note: in expansion of macro 'BIT'
        broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
      Let's fix them now.
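One way to avoid both warnings is to do the shifts in an explicitly 64-bit type, sketched below in user space. This is not necessarily how the patch fixes them: `BIT_U64`, `typer_affinity()` and `sgi_is_broadcast()` are hypothetical names, and the routing mode bit position of 40 is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_U64(nr)      (UINT64_C(1) << (nr)) /* hypothetical 64-bit-safe BIT() */
#define ROUTING_MODE_BIT 40                    /* assumed bit position */

static_assert(ROUTING_MODE_BIT < 64, "bit index must fit in 64 bits");

/* Widen to 64 bits before shifting, so the shift count stays below the width. */
static uint64_t typer_affinity(uint32_t mpidr)
{
	return (uint64_t)(mpidr & 0xffffffu) << 32;
}

/* Test a bit above bit 31 with a 64-bit mask instead of 1UL << nr. */
static int sgi_is_broadcast(uint64_t reg)
{
	return (reg & BIT_U64(ROUTING_MODE_BIT)) != 0;
}
```

Both helpers behave identically on 32-bit and 64-bit hosts because no intermediate value ever has the width of `unsigned long`.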
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>