OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Vitaly Kuznetsov	7863e346e1	KVM: async_pf: Cleanup kvm_setup_async_pf() schedule_work() returns 'false' only when the work is already on the queue and this can't happen as kvm_setup_async_pf() always allocates a new one. Also, to avoid potential race, it makes sense to to schedule_work() at the very end after we've added it to the queue. While on it, do some minor cleanup. gfn_to_pfn_async() mentioned in a comment does not currently exist and, moreover, we can check kvm_is_error_hva() at the very beginning, before we try to allocate work so 'retry_sync' label can go away completely. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200610175532.779793-1-vkuznets@redhat.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-11 12:35:19 -04:00
Colin Ian King	cd18eaeaff	kvm: i8254: remove redundant assignment to pointer s The pointer s is being assigned a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Message-Id: <20200609233121.1118683-1-colin.king@canonical.com> Fixes: `7837699fa6` ("KVM: In kernel PIT model") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-11 12:35:18 -04:00
Felipe Franciosi	384dea1c91	KVM: x86: respect singlestep when emulating instruction When userspace configures KVM_GUESTDBG_SINGLESTEP, KVM will manage the presence of X86_EFLAGS_TF via kvm_set/get_rflags on vcpus. The actual rflag bit is therefore hidden from callers. That includes init_emulate_ctxt() which uses the value returned from kvm_get_flags() to set ctxt->tf. As a result, x86_emulate_instruction() will skip a single step, leaving singlestep_rip stale and not returning to userspace. This resolves the issue by observing the vcpu guest_debug configuration alongside ctxt->tf in x86_emulate_instruction(), performing the single step if set. Cc: stable@vger.kernel.org Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Message-Id: <20200519081048.8204-1-felipe@nutanix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-11 12:35:18 -04:00
Vitaly Kuznetsov	7e464770a4	KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported KVM_CAP_HYPERV_ENLIGHTENED_VMCS will be reported as supported even when nested VMX is not, fix evmcs_test/hyperv_cpuid tests to check for both. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200610135847.754289-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-11 12:35:18 -04:00
Vitaly Kuznetsov	41a23ab336	KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check state_test/smm_test use KVM_CAP_NESTED_STATE check as an indicator for nested VMX/SVM presence and this is incorrect. Check for the required features dirrectly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200610135847.754289-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-11 12:35:17 -04:00
Paolo Bonzini	77f81f37fb	Merge branch 'kvm-basic-exit-reason' into HEAD Using a topic branch so that stable branches can simply cherry-pick the patch. Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-11 12:35:14 -04:00
Sean Christopherson	2ebac8bb3c	KVM: nVMX: Consult only the "basic" exit reason when routing nested exit Consult only the basic exit reason, i.e. bits 15:0 of vmcs.EXIT_REASON, when determining whether a nested VM-Exit should be reflected into L1 or handled by KVM in L0. For better or worse, the switch statement in nested_vmx_exit_reflected() currently defaults to "true", i.e. reflects any nested VM-Exit without dedicated logic. Because the case statements only contain the basic exit reason, any VM-Exit with modifier bits set will be reflected to L1, even if KVM intended to handle it in L0. Practically speaking, this only affects EXIT_REASON_MCE_DURING_VMENTRY, i.e. a #MC that occurs on nested VM-Enter would be incorrectly routed to L1, as "failed VM-Entry" is the only modifier that KVM can currently encounter. The SMM modifiers will never be generated as KVM doesn't support/employ a SMI Transfer Monitor. Ditto for "exit from enclave", as KVM doesn't yet support virtualizing SGX, i.e. it's impossible to enter an enclave in a KVM guest (L1 or L2). Fixes: `644d711aa0` ("KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit") Cc: Jim Mattson <jmattson@google.com> Cc: Xiaoyao Li <xiaoyao.li@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200227174430.26371-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-11 11:28:11 -04:00
Sean Christopherson	80fbd280be	KVM: x86: Unexport x86_fpu_cache and make it static Make x86_fpu_cache static now that FPU allocation and destruction is handled entirely by common x86 code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200608180218.20946-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-09 05:57:27 -04:00
Sean Christopherson	2067028512	KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K Explicitly set the VA width to 48 bits for the x86_64-only PXXV48_4K VM mode instead of asserting the guest VA width is 48 bits. The fact that KVM supports 5-level paging is irrelevant unless the selftests opt-in to 5-level paging by setting CR4.LA57 for the guest. The overzealous assert prevents running the selftests on a kernel with 5-level paging enabled. Incorporate LA57 into the assert instead of removing the assert entirely as a sanity check of KVM's CPUID output. Fixes: `567a9f1e9d` ("KVM: selftests: Introduce VM_MODE_PXXV48_4K") Reported-by: Sergio Perez Gonzalez <sergio.perez.gonzalez@intel.com> Cc: Adriana Cervantes Jimenez <adriana.cervantes.jimenez@intel.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200528021530.28091-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-09 05:54:50 -04:00
Eiichi Tsukata	e649b3f018	KVM: x86: Fix APIC page invalidation race Commit `b1394e745b` ("KVM: x86: fix APIC page invalidation") tried to fix inappropriate APIC page invalidation by re-introducing arch specific kvm_arch_mmu_notifier_invalidate_range() and calling it from kvm_mmu_notifier_invalidate_range_start. However, the patch left a possible race where the VMCS APIC address cache is updated before it is unmapped: (Invalidator) kvm_mmu_notifier_invalidate_range_start() (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD) (KVM VCPU) vcpu_enter_guest() (KVM VCPU) kvm_vcpu_reload_apic_access_page() (Invalidator) actually unmap page Because of the above race, there can be a mismatch between the host physical address stored in the APIC_ACCESS_PAGE VMCS field and the host physical address stored in the EPT entry for the APIC GPA (0xfee0000). When this happens, the processor will not trap APIC accesses, and will instead show the raw contents of the APIC-access page. Because Windows OS periodically checks for unexpected modifications to the LAPIC register, this will show up as a BSOD crash with BugCheck CRITICAL_STRUCTURE_CORRUPTION (109) we are currently seeing in https://bugzilla.redhat.com/show_bug.cgi?id=1751017. The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range() cannot guarantee that no additional references are taken to the pages in the range before kvm_mmu_notifier_invalidate_range_end(). Fortunately, this case is supported by the MMU notifier API, as documented in include/linux/mmu_notifier.h: * If the subsystem * can't guarantee that no additional references are taken to * the pages in the range, it has to implement the * invalidate_range() notifier to remove any references taken * after invalidate_range_start(). The fix therefore is to reload the APIC-access page field in the VMCS from kvm_mmu_notifier_invalidate_range() instead of ..._range_start(). Cc: stable@vger.kernel.org Fixes: `b1394e745b` ("KVM: x86: fix APIC page invalidation") Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951 Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-08 09:05:38 -04:00
Paolo Bonzini	fb7333dfd8	KVM: SVM: fix calls to is_intercept is_intercept takes an INTERCEPT_* constant, not SVM_EXIT_*; because of this, the compiler was removing the body of the conditionals, as if is_intercept returned 0. This unveils a latent bug: when clearing the VINTR intercept, int_ctl must also be changed in the L1 VMCB (svm->nested.hsave), just like the intercept itself is also changed in the L1 VMCB. Otherwise V_IRQ remains set and, due to the VINTR intercept being clear, we get a spurious injection of a vector 0 interrupt on the next L2->L1 vmexit. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-08 08:00:57 -04:00
Vitaly Kuznetsov	75ad6e8002	KVM: selftests: fix vmx_preemption_timer_test build with GCC10 GCC10 fails to build vmx_preemption_timer_test: gcc -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 -fno-stack-protector -fno-PIE -I../../../../tools/include -I../../../../tools/arch/x86/include -I../../../../usr/include/ -Iinclude -Ix86_64 -Iinclude/x86_64 -I.. -pthread -no-pie x86_64/evmcs_test.c ./linux/tools/testing/selftests/kselftest_harness.h ./linux/tools/testing/selftests/kselftest.h ./linux/tools/testing/selftests/kvm/libkvm.a -o ./linux/tools/testing/selftests/kvm/x86_64/evmcs_test /usr/bin/ld: ./linux/tools/testing/selftests/kvm/libkvm.a(vmx.o): ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:603: multiple definition of `ctrl_exit_rev'; /tmp/ccMQpvNt.o: ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:603: first defined here /usr/bin/ld: ./linux/tools/testing/selftests/kvm/libkvm.a(vmx.o): ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:602: multiple definition of `ctrl_pin_rev'; /tmp/ccMQpvNt.o: ./linux/tools/testing/selftests/kvm/include/x86_64/vmx.h:602: first defined here ... ctrl_exit_rev/ctrl_pin_rev/basic variables are only used in vmx_preemption_timer_test.c, just move them there. Fixes: `8d7fbf01f9` ("KVM: selftests: VMX preemption timer migration test") Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200608112346.593513-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-08 07:59:43 -04:00
Vitaly Kuznetsov	5ae1452f5f	KVM: selftests: Add x86_64/debug_regs to .gitignore Add x86_64/debug_regs to .gitignore. Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com> Fixes: `449aa906e6` ("KVM: selftests: Add KVM_SET_GUEST_DEBUG test") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200608112346.593513-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-08 07:59:42 -04:00
Vitaly Kuznetsov	25597f64c2	Revert "KVM: x86: work around leak of uninitialized stack contents" handle_vmptrst()/handle_vmread() stopped injecting #PF unconditionally and switched to nested_vmx_handle_memory_failure() which just kills the guest with KVM_EXIT_INTERNAL_ERROR in case of MMIO access, zeroing 'exception' in kvm_write_guest_virt_system() is not needed anymore. This reverts commit `541ab2aeb2`. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200605115906.532682-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-08 07:59:42 -04:00
Vitaly Kuznetsov	7a35e515a7	KVM: VMX: Properly handle kvm_read/write_guest_virt() result Syzbot reports the following issue: WARNING: CPU: 0 PID: 6819 at arch/x86/kvm/x86.c:618 kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618 ... Call Trace: ... RIP: 0010:kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618 ... nested_vmx_get_vmptr+0x1f9/0x2a0 arch/x86/kvm/vmx/nested.c:4638 handle_vmon arch/x86/kvm/vmx/nested.c:4767 [inline] handle_vmon+0x168/0x3a0 arch/x86/kvm/vmx/nested.c:4728 vmx_handle_exit+0x29c/0x1260 arch/x86/kvm/vmx/vmx.c:6067 'exception' we're trying to inject with kvm_inject_emulated_page_fault() comes from: nested_vmx_get_vmptr() kvm_read_guest_virt() kvm_read_guest_virt_helper() vcpu->arch.walk_mmu->gva_to_gpa() but it is only set when GVA to GPA conversion fails. In case it doesn't but we still fail kvm_vcpu_read_guest_page(), X86EMUL_IO_NEEDED is returned and nested_vmx_get_vmptr() calls kvm_inject_emulated_page_fault() with zeroed 'exception'. This happen when the argument is MMIO. Paolo also noticed that nested_vmx_get_vmptr() is not the only place in KVM code where kvm_read/write_guest_virt() return result is mishandled. VMX instructions along with INVPCID have the same issue. This was already noticed before, e.g. see commit `541ab2aeb2` ("KVM: x86: work around leak of uninitialized stack contents") but was never fully fixed. KVM could've handled the request correctly by going to userspace and performing I/O but there doesn't seem to be a good need for such requests in the first place. Introduce vmx_handle_memory_failure() as an interim solution. Note, nested_vmx_get_vmptr() now has three possible outcomes: OK, PF, KVM_EXIT_INTERNAL_ERROR and callers need to know if userspace exit is needed (for KVM_EXIT_INTERNAL_ERROR) in case of failure. We don't seem to have a good enum describing this tristate, just add "int *ret" to nested_vmx_get_vmptr() interface to pass the information. Reported-by: syzbot+2a7156e11dc199bdbd8a@syzkaller.appspotmail.com Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200605115906.532682-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-08 07:59:08 -04:00
Paolo Bonzini	34d2618d33	KVM: x86: emulate reserved nops from 0f/18 to 0f/1f Instructions starting with 0f18 up to 0f1f are reserved nops, except those that were assigned to MPX. These include the endbr markers used by CET. List them correctly in the opcode table. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-05 11:16:15 -04:00
Vitaly Kuznetsov	b80db73dc8	KVM: selftests: Fix build with "make ARCH=x86_64" Marcelo reports that kvm selftests fail to build with "make ARCH=x86_64": gcc -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 -fno-stack-protector -fno-PIE -I../../../../tools/include -I../../../../tools/arch/x86_64/include -I../../../../usr/include/ -Iinclude -Ilib -Iinclude/x86_64 -I.. -c lib/kvm_util.c -o /var/tmp/20200604202744-bin/lib/kvm_util.o In file included from lib/kvm_util.c:11: include/x86_64/processor.h:14:10: fatal error: asm/msr-index.h: No such file or directory #include <asm/msr-index.h> ^~~~~~~~~~~~~~~~~ compilation terminated. "make ARCH=x86", however, works. The problem is that arch specific headers for x86_64 live in 'tools/arch/x86/include', not in 'tools/arch/x86_64/include'. Fixes: `66d69e081b` ("selftests: fix kvm relocatable native/cross builds and installs") Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200605142028.550068-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-05 11:15:55 -04:00
Paolo Bonzini	ba4e627921	PPC KVM update for 5.8 - Updates and bug fixes for secure guest support - Other minor bug fixes and cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJe1ZJVAAoJEJ2a6ncsY3GfbAkH/Ai18+o6+ZPXIBwr/39sMAHi cdyJDDYPQgATJ1Aie25um/cCCvGtx5PLQS6gVq8uoKb/zefrOUsEgG45muqGy1aI 3EJXkAl1636f154Q9iZWPAr4ZG+dUiVTp/ACZcw1uAJLnnXrTHZtL4H+tvFplT7m 1sBF6Mepha5B3oJyBDgPDpyfafsrzVeF+SpyywHhHR71DGYcGDwWWRliXxyfSPzh yrnOuS6LVScjDHfKrdPYptaFiPUfJiPLbVCh/APxx9oXXlnSHQ+MfgrJisL4OSUa 4AQdTJKbEZUlkzf62xwXb2HmtDzyt2qD5A/NTr6cAZDsbdEVRr81mkI3iUim+rM= =1OTR -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-next-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD PPC KVM update for 5.8 - Updates and bug fixes for secure guest support - Other minor bug fixes and cleanups.	2020-06-04 14:58:03 -04:00
Anthony Yznaga	3741679ba4	KVM: x86: minor code refactor and comments fixup around dirty logging Consolidate the code and correct the comments to show that the actions taken to update existing mappings to disable or enable dirty logging are not necessary when creating, moving, or deleting a memslot. Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Message-Id: <1591128450-11977-4-git-send-email-anthony.yznaga@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 14:42:40 -04:00
Anthony Yznaga	4b44295538	KVM: x86: avoid unnecessary rmap walks when creating/moving slots On large memory guests it has been observed that creating a memslot for a very large range can take noticeable amount of time. Investigation showed that the time is spent walking the rmaps to update existing sptes to remove write access or set/clear dirty bits to support dirty logging. These rmap walks are unnecessary when creating or moving a memslot. A newly created memslot will not have any existing mappings, and the existing mappings of a moved memslot will have been invalidated and flushed. Any mappings established once the new/moved memslot becomes visible will be set using the properties of the new slot. Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Message-Id: <1591128450-11977-3-git-send-email-anthony.yznaga@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 14:42:35 -04:00
Anthony Yznaga	5688fed649	KVM: x86: remove unnecessary rmap walk of read-only memslots There's no write access to remove. An existing memslot cannot be updated to set or clear KVM_MEM_READONLY, and any mappings established in a newly created or moved read-only memslot will already be read-only. Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Message-Id: <1591128450-11977-2-git-send-email-anthony.yznaga@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 14:42:25 -04:00
Denis Efremov	7ec28e264f	KVM: Use vmemdup_user() Replace opencoded alloc and copy with vmemdup_user(). Signed-off-by: Denis Efremov <efremov@linux.com> Message-Id: <20200603101131.2107303-1-efremov@linux.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 14:41:05 -04:00
Sean Christopherson	0e96edd9a9	x86/kvm: Remove defunct KVM_DEBUG_FS Kconfig Remove KVM_DEBUG_FS, which can easily be misconstrued as controlling KVM-as-a-host. The sole user of CONFIG_KVM_DEBUG_FS was removed by commit `cfd8983f03` ("x86, locking/spinlocks: Remove ticket (spin)lock implementation"). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200528031121.28904-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 14:12:36 -04:00
Huacai Chen	0f78355c45	KVM: MIPS: Enable KVM support for Loongson-3 This patch enable KVM support for Loongson-3 by selecting HAVE_KVM, but only enable KVM/VZ on Loongson-3A R4+ (because VZ of early processors are incomplete). Besides, Loongson-3 support SMP guests, so we clear the linked load bit of LLAddr in kvm_vz_vcpu_load() if the guest has more than one VCPUs. Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-15-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:51:53 -04:00
Huacai Chen	dc6d95b153	KVM: MIPS: Add more MMIO load/store instructions emulation This patch add more MMIO load/store instructions emulation, which can be observed in QXL and some other device drivers: 1, LWL, LWR, LDW, LDR, SWL, SWR, SDL and SDR for all MIPS; 2, GSLBX, GSLHX, GSLWX, GSLDX, GSSBX, GSSHX, GSSWX and GSSDX for Loongson-3. Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-14-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:51:49 -04:00
Huacai Chen	8a5097ee90	KVM: MIPS: Add CONFIG6 and DIAG registers emulation Loongson-3 has CONFIG6 and DIAG registers which need to be emulated. CONFIG6 is mostly used to enable/disable FTLB and SFB, while DIAG is mostly used to flush BTB, ITLB, DTLB, VTLB and FTLB. Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-13-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:51:45 -04:00
Huacai Chen	7f2a83f1c2	KVM: MIPS: Add CPUCFG emulation for Loongson-3 Loongson-3 overrides lwc2 instructions to implement CPUCFG and CSR read/write functions. These instructions all cause guest exit so CSR doesn't benifit KVM guest (and there are always legacy methods to provide the same functions as CSR). So, we only emulate CPUCFG and let it return a reduced feature list (which means the virtual CPU doesn't have any other advanced features, including CSR) in KVM. Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-12-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:51:33 -04:00
Huacai Chen	f21db3090d	KVM: MIPS: Add Loongson-3 Virtual IPI interrupt support This patch add Loongson-3 Virtual IPI interrupt support in the kernel. The current implementation of IPI emulation in QEMU is based on GIC for MIPS, but Loongson-3 doesn't use GIC. Furthermore, IPI emulation in QEMU is too expensive for performance (because of too many context switches between Host and Guest). With current solution, the IPI delay may even cause RCU stall warnings in a multi-core Guest. So, we design a faster solution that emulate IPI interrupt in kernel (only used by Loongson-3 now). Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-11-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:50:39 -04:00
Huacai Chen	3f51d8fcac	KVM: MIPS: Add more types of virtual interrupts In current implementation, MIPS KVM uses IP2, IP3, IP4 and IP7 for external interrupt, two kinds of IPIs and timer interrupt respectively, but Loongson-3 based machines prefer to use IP2, IP3, IP6 and IP7 for two kinds of external interrupts, IPI and timer interrupt. So we define two priority-irq mapping tables: kvm_loongson3_priority_to_irq[] for Loongson-3, and kvm_default_priority_to_irq[] for others. The virtual interrupt infrastructure is updated to deliver all types of interrupts from IP2, IP3, IP4, IP6 and IP7. Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-10-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:49:13 -04:00
Huacai Chen	49bb96003f	KVM: MIPS: Let indexed cacheops cause guest exit on Loongson-3 Loongson-3's indexed cache operations need a node-id in the address, but in KVM guest the node-id may be incorrect. So, let indexed cache operations cause guest exit on Loongson-3. Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-9-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:49:08 -04:00
Huacai Chen	52c07e1cbb	KVM: MIPS: Use root tlb to control guest's CCA for Loongson-3 KVM guest has two levels of address translation: guest tlb translates GVA to GPA, and root tlb translates GPA to HPA. By default guest's CCA is controlled by guest tlb, but Loongson-3 maintains all cache coherency by hardware (including multi-core coherency and I/O DMA coherency) so it prefers all guest mappings be cacheable mappings. Thus, we use root tlb to control guest's CCA for Loongson-3. Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-8-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:49:04 -04:00
Huacai Chen	3210e2c279	KVM: MIPS: Introduce and use cpu_guest_has_ldpte Loongson-3 has lddir/ldpte instructions and their related CP0 registers are the same as HTW. So we introduce a cpu_guest_has_ldpte flag and use it to indicate whether we need to save/restore HTW related CP0 registers (PWBase, PWSize, PWField and PWCtl). Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-7-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:49:00 -04:00
Huacai Chen	8bf3129503	KVM: MIPS: Use lddir/ldpte instructions to lookup gpa_mm.pgd Loongson-3 can use lddir/ldpte instuctions to accelerate page table walking, so use them to lookup gpa_mm.pgd. Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-6-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:48:55 -04:00
Huacai Chen	bf10efbb81	KVM: MIPS: Add EVENTFD support which is needed by VHOST Add EVENTFD support for KVM/MIPS, which is needed by VHOST. Tested on Loongson-3 platform. Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-5-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:48:51 -04:00
Huacai Chen	210b4b9112	KVM: MIPS: Increase KVM_MAX_VCPUS and KVM_USER_MEM_SLOTS to 16 Loongson-3 based machines can have as many as 16 CPUs, and so does memory slots, so increase KVM_MAX_VCPUS and KVM_USER_MEM_SLOTS to 16. Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <1590220602-3547-4-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:48:45 -04:00
Xing Li	5816c76dea	KVM: MIPS: Fix VPN2_MASK definition for variable cpu_vmbits If a CPU support more than 32bit vmbits (which is true for 64bit CPUs), VPN2_MASK set to fixed 0xffffe000 will lead to a wrong EntryHi in some functions such as _kvm_mips_host_tlb_inv(). The cpu_vmbits definition of 32bit CPU in cpu-features.h is 31, so we still use the old definition. Cc: Stable <stable@vger.kernel.org> Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Xing Li <lixing@loongson.cn> [Huacai: Improve commit messages] Signed-off-by: Huacai Chen <chenhc@lemote.com> Message-Id: <1590220602-3547-3-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:48:41 -04:00
Xing Li	fe2b73dba4	KVM: MIPS: Define KVM_ENTRYHI_ASID to cpu_asid_mask(&boot_cpu_data) The code in decode_config4() of arch/mips/kernel/cpu-probe.c asid_mask = MIPS_ENTRYHI_ASID; if (config4 & MIPS_CONF4_AE) asid_mask \|= MIPS_ENTRYHI_ASIDX; set_cpu_asid_mask(c, asid_mask); set asid_mask to cpuinfo->asid_mask. So in order to support variable ASID_MASK, KVM_ENTRYHI_ASID should also be changed to cpu_asid_mask(&boot_cpu_data). Cc: Stable <stable@vger.kernel.org> #4.9+ Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Xing Li <lixing@loongson.cn> [Huacai: Change current_cpu_data to boot_cpu_data for optimization] Signed-off-by: Huacai Chen <chenhc@lemote.com> Message-Id: <1590220602-3547-2-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 13:48:33 -04:00
Babu Moger	fa44b82eb8	KVM: x86: Move MPK feature detection to common code Both Intel and AMD support (MPK) Memory Protection Key feature. Move the feature detection from VMX to the common code. It should work for both the platforms now. Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <158932795627.44260.15144185478040178638.stgit@naples-babu.amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 12:35:06 -04:00
Xiaoyao Li	65b1891499	KVM: x86: Assign correct value to array.maxnent Delay the assignment of array.maxnent to use correct value for the case cpuid->nent > KVM_MAX_CPUID_ENTRIES. Fixes: `e53c95e8d4` ("KVM: x86: Encapsulate CPUID entries and metadata in struct") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200604041636.1187-1-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 12:21:10 -04:00
Sean Christopherson	f4f6bd93fd	KVM: VMX: Always treat MSR_IA32_PERF_CAPABILITIES as a valid PMU MSR Unconditionally return true when querying the validity of MSR_IA32_PERF_CAPABILITIES so as to defer the validity check to intel_pmu_{get,set}_msr(), which can properly give the MSR a pass when the access is initiated from host userspace. The MSR is emulated so there is no underlying hardware dependency to worry about. Fixes: `27461da310` ("KVM: x86/pmu: Support full width counting") Cc: Like Xu <like.xu@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200603203303.28545-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 12:20:44 -04:00
Paolo Bonzini	d56f5136b0	KVM: let kvm_destroy_vm_debugfs clean up vCPU debugfs directories After commit `63d0434` ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") we are creating the pre-vCPU debugfs files after the creation of the vCPU file descriptor. This makes it possible for userspace to reach kvm_vcpu_release before kvm_create_vcpu_debugfs has finished. The vcpu->debugfs_dentry then does not have any associated inode anymore, and this causes a NULL-pointer dereference in debugfs_create_file. The solution is simply to avoid removing the files; they are cleaned up when the VM file descriptor is closed (and that must be after KVM_CREATE_VCPU returns). We can stop storing the dentry in struct kvm_vcpu too, because it is not needed anywhere after kvm_create_vcpu_debugfs returns. Reported-by: syzbot+705f4401d5a93a59b87d@syzkaller.appspotmail.com Fixes: `63d0434837` ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-06-04 11:00:54 -04:00
Linus Torvalds	6929f71e46	atomisp: avoid warning about unused function The atomisp_mrfld_power() function isn't actually ever called, because the two call-sites have commented out the use because it breaks on some platforms. That results in: drivers/staging/media/atomisp/pci/atomisp_v4l2.c:764:12: warning: ‘atomisp_mrfld_power’ defined but not used [-Wunused-function] 764 \| static int atomisp_mrfld_power(struct atomisp_device *isp, bool enable) \| ^~~~~~~~~~~~~~~~~~~ during the build. Rather than commenting out the use entirely, just disable it semantically instead (using a "0 &&" construct), leaving the call in place from a syntax standpoint, and avoiding the warning. I really don't want my builds to have any warnings that can then hide real issues. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 21:22:46 -07:00
Linus Torvalds	a98f670e41	media updates for v5.8-rc1 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl7XUmwACgkQCF8+vY7k 4RU4zg//fT32wiVAPHCCp+pDZVnWNeipXE1gnpqghd/qZXfzBPiLEC9sPS74VVkA jf1hhR33VZpKAKTPg/b074qhRZBywEOdHZnT/0CEE1oNB61shVOnyDYzLGSq95cO 6V55ovbi5IOkrg0QEJbHpG5YHzt+pq5XeWOkqGNsHwla7N7iMGMVYfHepVVDWPnZ 0wGYFF9cAJP+X/uxqkZLDVMA/K1I+QKh6vrj/qx53/eRt8VID3+i8ig3guk4PlUq 7RLw5w/CywtNaGE5zaz7T3i2eoED71JHOTXi6RxdP1z8IDvELZ9mT95GQ+enlwqt AS6Ju1sV40wviHMv5prJWQjJkrrtYH3S907lIjwBpQLNGbh2+5crCd/6CwumkGgv 1cCZ1dVmXpCe++9mU9AXmSkjsjGPStNcmHMOpc1Pwn9jUV3LQOOSDp8+RYdt1WHU Iw9cyM8NOpz5Mv/B1/ZPQ1gPb9lr1gE09XyUekxtAI/nl4nNHGWO8QDuX7Odfrv9 8nfo14lk/p6XCTA8dsWJCgI5B1fgnqD4frHKWO9Uctppc/KBW41c8JpQUjBNlG/T MhtlGwYMVgSQxpQ6wK018JUAFoWkn1Sr0zMKRayqCnMjMLHsaMwE6kq+LgmRBqbB ersKV/9ZLYqCU1d6PhEVG6xUs6GsWdLcyhALlmHsddPSdpFXdf8= =KNAo -----END PGP SIGNATURE----- Merge tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - Media documentation is now split into admin-guide, driver-api and userspace-api books (a longstanding request from Jon); - The media Kconfig was reorganized, in order to make easier to select drivers and their dependencies; - The testing drivers now has a separate directory; - added a new driver for Rockchip Video Decoder IP; - The atomisp staging driver was resurrected. It is meant to work with 4 generations of cameras on Atom-based laptops, tablets and cell phones. So, it seems worth investing time to cleanup this driver and making it in good shape. - Added some V4L2 core ancillary routines to help with h264 codecs; - Added an ov2740 image sensor driver; - The si2157 gained support for Analog TV, which, in turn, added support for some cx231xx and cx23885 boards to also support analog standards; - Added some V4L2 controls (V4L2_CID_CAMERA_ORIENTATION and V4L2_CID_CAMERA_SENSOR_ROTATION) to help identifying where the camera is located at the device; - VIDIOC_ENUM_FMT was extended to support MC-centric devices; - Lots of drivers improvements and cleanups. * tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (503 commits) media: Documentation: media: Refer to mbus format documentation from CSI-2 docs media: s5k5baf: Replace zero-length array with flexible-array media: i2c: imx219: Drop <linux/clk-provider.h> and <linux/clkdev.h> media: i2c: Add ov2740 image sensor driver media: ov8856: Implement sensor module revision identification media: ov8856: Add devicetree support media: dt-bindings: ov8856: Document YAML bindings media: dvb-usb: Add Cinergy S2 PCIe Dual Port support media: dvbdev: Fix tuner->demod media controller link media: dt-bindings: phy: phy-rockchip-dphy-rx0: move rockchip dphy rx0 bindings out of staging media: staging: dt-bindings: phy-rockchip-dphy-rx0: remove non-used reg property media: atomisp: unify the version for isp2401 a0 and b0 versions media: atomisp: update TODO with the current data media: atomisp: adjust some code at sh_css that could be broken media: atomisp: don't produce errs for ignored IRQs media: atomisp: print IRQ when debugging media: atomisp: isp_mmu: don't use kmem_cache media: atomisp: add a notice about possible leak resources media: atomisp: disable the dynamic and reserved pools media: atomisp: turn on camera before setting it ...	2020-06-03 20:59:38 -07:00
Linus Torvalds	ee01c4d72a	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "More mm/ work, plenty more to come Subsystems affected by this patch series: slub, memcg, gup, kasan, pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs, thp, mmap, kconfig" * akpm: (131 commits) arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined riscv: support DEBUG_WX mm: add DEBUG_WX support drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid() powerpc/mm: drop platform defined pmd_mknotpresent() mm: thp: don't need to drain lru cache when splitting and mlocking THP hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs sparc32: register memory occupied by kernel as memblock.memory include/linux/memblock.h: fix minor typo and unclear comment mm, mempolicy: fix up gup usage in lookup_node tools/vm/page_owner_sort.c: filter out unneeded line mm: swap: memcg: fix memcg stats for huge pages mm: swap: fix vmstats for huge pages mm: vmscan: limit the range of LRU type balancing mm: vmscan: reclaim writepage is IO cost mm: vmscan: determine anon/file pressure balance at the reclaim root mm: balance LRU lists based on relative thrashing mm: only count actual rotations as LRU reclaim cost ...	2020-06-03 20:24:15 -07:00
Zong Li	09587a09ad	arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined Extract DEBUG_WX to mm/Kconfig.debug for shared use. Change to use ARCH_HAS_DEBUG_WX instead of DEBUG_WX defined by arch port. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/e19709e7576f65e303245fe520cad5f7bae72763.1587455584.git.zong.li@sifive.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:50 -07:00
Zong Li	7e01ccb43d	x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined Extract DEBUG_WX to mm/Kconfig.debug for shared use. Change to use ARCH_HAS_DEBUG_WX instead of DEBUG_WX defined by arch port. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/430736828d149df3f5b462d291e845ec690e0141.1587455584.git.zong.li@sifive.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:50 -07:00
Zong Li	b422d28b21	riscv: support DEBUG_WX Support DEBUG_WX to check whether there are mapping with write and execute permission at the same time. [akpm@linux-foundation.org: replace macros with C] Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/282e266311bced080bc6f7c255b92f87c1eb65d6.1587455584.git.zong.li@sifive.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:50 -07:00
Zong Li	375d315cbf	mm: add DEBUG_WX support Patch series "Extract DEBUG_WX to shared use". Some architectures support DEBUG_WX function, it's verbatim from each others, so extract to mm/Kconfig.debug for shared use. PPC and ARM ports don't support generic page dumper yet, so we only refine x86 and arm64 port in this patch series. For RISC-V port, the DEBUG_WX support depends on other patches which be merged already: - RISC-V page table dumper - Support strict kernel memory permissions for security This patch (of 4): Some architectures support DEBUG_WX function, it's verbatim from each others. Extract to mm/Kconfig.debug for shared use. [akpm@linux-foundation.org: reword text, per Will Deacon & Zong Li] Link: http://lkml.kernel.org/r/20200427194245.oxRJKj3fn%25akpm@linux-foundation.org [zong.li@sifive.com: remove the specific name of arm64] Link: http://lkml.kernel.org/r/3a6a92ecedc54e1d0fc941398e63d504c2cd5611.1589178399.git.zong.li@sifive.com [zong.li@sifive.com: add MMU dependency for DEBUG_WX] Link: http://lkml.kernel.org/r/4a674ac7863ff39ca91847b10e51209771f99416.1589178399.git.zong.li@sifive.com Suggested-by: Palmer Dabbelt <palmer@dabbelt.com> Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/cover.1587455584.git.zong.li@sifive.com Link: http://lkml.kernel.org/r/23980cd0f0e5d79e24a92169116407c75bcc650d.1587455584.git.zong.li@sifive.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:49 -07:00
Scott Cheloha	4fb6eabf10	drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup Searching for a particular memory block by id is an O(n) operation because each memory block's underlying device is kept in an unsorted linked list on the subsystem bus. We can cut the lookup cost to O(log n) if we cache each memory block in an xarray. This time complexity improvement is significant on systems with many memory blocks. For example: 1. A 128GB POWER9 VM with 256MB memblocks has 512 blocks. With this change memory_dev_init() completes ~12ms faster and walk_memory_blocks() completes ~12ms faster. Before: [ 0.005042] memory_dev_init: adding memory blocks [ 0.021591] memory_dev_init: added memory blocks [ 0.022699] walk_memory_blocks: walking memory blocks [ 0.038730] walk_memory_blocks: walked memory blocks 0-511 After: [ 0.005057] memory_dev_init: adding memory blocks [ 0.009415] memory_dev_init: added memory blocks [ 0.010519] walk_memory_blocks: walking memory blocks [ 0.014135] walk_memory_blocks: walked memory blocks 0-511 2. A 256GB POWER9 LPAR with 256MB memblocks has 1024 blocks. With this change memory_dev_init() completes ~88ms faster and walk_memory_blocks() completes ~87ms faster. Before: [ 0.252246] memory_dev_init: adding memory blocks [ 0.395469] memory_dev_init: added memory blocks [ 0.409413] walk_memory_blocks: walking memory blocks [ 0.433028] walk_memory_blocks: walked memory blocks 0-511 [ 0.433094] walk_memory_blocks: walking memory blocks [ 0.500244] walk_memory_blocks: walked memory blocks 131072-131583 After: [ 0.245063] memory_dev_init: adding memory blocks [ 0.299539] memory_dev_init: added memory blocks [ 0.313609] walk_memory_blocks: walking memory blocks [ 0.315287] walk_memory_blocks: walked memory blocks 0-511 [ 0.315349] walk_memory_blocks: walking memory blocks [ 0.316988] walk_memory_blocks: walked memory blocks 131072-131583 3. A 32TB POWER9 LPAR with 256MB memblocks has 131072 blocks. With this change we complete memory_dev_init() ~37 minutes faster and walk_memory_blocks() at least ~30 minutes faster. The exact timing for walk_memory_blocks() is missing, though I observed that the soft lockups in walk_memory_blocks() disappeared with the change, suggesting that lower bound. Before: [ 13.703907] memory_dev_init: adding blocks [ 2287.406099] memory_dev_init: added all blocks [ 2347.494986] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 2527.625378] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 2707.761977] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 2887.899975] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 3068.028318] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 3248.158764] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 3428.287296] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 3608.425357] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 3788.554572] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 3968.695071] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 [ 4148.823970] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160 After: [ 13.696898] memory_dev_init: adding blocks [ 15.660035] memory_dev_init: added all blocks (the walk_memory_blocks traces disappear) There should be no significant negative impact for machines with few memory blocks. A sparse xarray has a small footprint and an O(log n) lookup is negligibly slower than an O(n) lookup for only the smallest number of memory blocks. 1. A 16GB x86 machine with 128MB memblocks has 132 blocks. With this change memory_dev_init() completes ~300us faster and walk_memory_blocks() completes no faster or slower. The improvement is pretty close to noise. Before: [ 0.224752] memory_dev_init: adding memory blocks [ 0.227116] memory_dev_init: added memory blocks [ 0.227183] walk_memory_blocks: walking memory blocks [ 0.227183] walk_memory_blocks: walked memory blocks 0-131 After: [ 0.224911] memory_dev_init: adding memory blocks [ 0.226935] memory_dev_init: added memory blocks [ 0.227089] walk_memory_blocks: walking memory blocks [ 0.227089] walk_memory_blocks: walked memory blocks 0-131 [david@redhat.com: document the locking] Link: http://lkml.kernel.org/r/bc21eec6-7251-4c91-2f57-9a0671f8d414@redhat.com Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Nathan Lynch <nathanl@linux.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rick Lindsley <ricklind@linux.vnet.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Link: http://lkml.kernel.org/r/20200121231028.13699-1-cheloha@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:49 -07:00
Anshuman Khandual	86ec2da037	mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid() pmd_present() is expected to test positive after pmdp_mknotpresent() as the PMD entry still points to a valid huge page in memory. pmdp_mknotpresent() implies that given PMD entry is just invalidated from MMU perspective while still holding on to pmd_page() referred valid huge page thus also clearing pmd_present() test. This creates the following situation which is counter intuitive. [pmd_present(pmd_mknotpresent(pmd)) = true] This renames pmd_mknotpresent() as pmd_mkinvalid() reflecting the helper's functionality more accurately while changing the above mentioned situation as follows. This does not create any functional change. [pmd_present(pmd_mkinvalid(pmd)) = true] This is not applicable for platforms that define own pmdp_invalidate() via __HAVE_ARCH_PMDP_INVALIDATE. Suggestion for renaming came during a previous discussion here. https://patchwork.kernel.org/patch/11019637/ [anshuman.khandual@arm.com: change pmd_mknotvalid() to pmd_mkinvalid() per Will] Link: http://lkml.kernel.org/r/1587520326-10099-3-git-send-email-anshuman.khandual@arm.com Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Will Deacon <will@kernel.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Link: http://lkml.kernel.org/r/1584680057-13753-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-03 20:09:49 -07:00

1 2 3 4 5 ...

925475 Commits All Branches Search

925475 Commits

All Branches