OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Peter Maydell	e8f4ba8331	scripts/kernel-doc: Add missing close-paren in c:function directives When kernel-doc generates a 'c:function' directive for a function one of whose arguments is a function pointer, it fails to print the close-paren after the argument list of the function pointer argument. For instance: long work_on_cpu(int cpu, long (fn) (void , void * arg) in driver-api/basics.html is missing a ')' separating the "void " of the 'fn' arguments from the ", void arg" which is an argument to work_on_cpu(). Add the missing close-paren, so that we render the prototype correctly: long work_on_cpu(int cpu, long (fn)(void ), void * arg) (Note that Sphinx stops rendering a space between the '(fn)' and the '(void )' once it gets something that's syntactically valid.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Link: https://lore.kernel.org/r/20200414143743.32677-1-peter.maydell@linaro.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2020-04-15 14:58:12 -06:00
Eric Biggers	52338dfb3c	docs: admin-guide: merge sections for the kernel.modprobe sysctl Documentation for the kernel.modprobe sysctl was added both by commit `0317c5371e` ("docs: merge debugging-modules.txt into sysctl/kernel.rst") and by commit `6e71582506` ("docs: admin-guide: document the kernel.modprobe sysctl"), resulting in the same sysctl being documented in two places. Merge these into one place. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Stephen Kitt <steve@sk2.org> Link: https://lore.kernel.org/r/20200414172430.230293-1-ebiggers@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2020-04-15 14:50:09 -06:00
Chris Packham	404e603f1e	docs: timekeeping: Use correct prototype for deprecated functions Use the correct prototypes for do_gettimeofday(), getnstimeofday() and getnstimeofday64(). All of these returned void and passed the return value by reference. This should make the documentation of their deprecation and replacements easier to search for. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200414221222.23996-1-chris.packham@alliedtelesis.co.nz Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2020-04-15 14:48:26 -06:00
Jason Gunthorpe	7dba92037b	net/rds: Use ERR_PTR for rds_message_alloc_sgs() Returning the error code via a 'int *ret' when the function returns a pointer is very un-kernely and causes gcc 10's static analysis to choke: net/rds/message.c: In function ‘rds_message_map_pages’: net/rds/message.c:358:10: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized] 358 \| return ERR_PTR(ret); Use a typical ERR_PTR return instead. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-15 12:33:29 -07:00
Vladimir Oltean	87b0f983f6	net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge To rehash a previous explanation given in commit `1c44ce560b` ("net: mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is up"), the switch driver operates the in a mode where a single VLAN can be transmitted as untagged on a particular egress port. That is the "native VLAN on trunk port" use case. The configuration for this native VLAN is driven in 2 ways: - Set the egress port rewriter to strip the VLAN tag for the native VID (as it is egress-untagged, after all). - Configure the ingress port to drop untagged and priority-tagged traffic, if there is no native VLAN. The intention of this setting is that a trunk port with no native VLAN should not accept untagged traffic. Since both of the above configurations for the native VLAN should only be done if VLAN awareness is requested, they are actually done from the ocelot_port_vlan_filtering function, after the basic procedure of toggling the VLAN awareness flag of the port. But there's a problem with that simplistic approach: we are trying to juggle with 2 independent variables from a single function: - Native VLAN of the port - its value is held in port->vid. - VLAN awareness state of the port - currently there are some issues here, more on that later. The actual problem can be seen when enslaving the switch ports to a VLAN filtering bridge: 0. The driver configures a pvid of zero for each port, when in standalone mode. While the bridge configures a default_pvid of 1 for each port that gets added as a slave to it. 1. The bridge calls ocelot_port_vlan_filtering with vlan_aware=true. The VLAN-filtering-dependent portion of the native VLAN configuration is done, considering that the native VLAN is 0. 2. The bridge calls ocelot_vlan_add with vid=1, pvid=true, untagged=true. The native VLAN changes to 1 (change which gets propagated to hardware). 3. ??? - nobody calls ocelot_port_vlan_filtering again, to reapply the VLAN-filtering-dependent portion of the native VLAN configuration, for the new native VLAN of 1. One can notice that after toggling "ip link set dev br0 type bridge vlan_filtering 0 && ip link set dev br0 type bridge vlan_filtering 1", the new native VLAN finally makes it through and untagged traffic finally starts flowing again. But obviously that shouldn't be needed. So it is clear that 2 independent variables need to both re-trigger the native VLAN configuration. So we introduce the second variable as ocelot_port->vlan_aware. Actually both the DSA Felix driver and the Ocelot driver already had each its own variable: - Ocelot: ocelot_port_private->vlan_aware - Felix: dsa_port->vlan_filtering but the common Ocelot library needs to work with a single, common, variable, so there is some refactoring done to move the vlan_aware property from the private structure into the common ocelot_port structure. Fixes: `97bb69e1e3` ("net: mscc: ocelot: break apart ocelot_vlan_port_apply") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-15 12:27:35 -07:00
David S. Miller	78b877113f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2020-04-15 The following pull-request contains BPF updates for your net tree. We've added 10 non-merge commits during the last 3 day(s) which contain a total of 11 files changed, 238 insertions(+), 95 deletions(-). The main changes are: 1) Fix offset overflow for BPF_MEM BPF_DW insn mapping on arm32 JIT, from Luke Nelson and Xi Wang. 2) Prevent mprotect() to make frozen & mmap()'ed BPF map writeable again, from Andrii Nakryiko and Jann Horn. 3) Fix type of old_fd in bpf_xdp_set_link_opts to int in libbpf and add selftests, from Toke Høiland-Jørgensen. 4) Fix AF_XDP to check that headroom cannot be larger than the available space in the chunk, from Magnus Karlsson. 5) Fix reset of XDP prog when expected_fd is set, from David Ahern. 6) Fix a segfault in bpftool's struct_ops command when BTF is not available, from Daniel T. Lee. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-15 11:35:42 -07:00
David S. Miller	6058ee09ec	A couple of fixes: * FTM responder policy netlink validation fix (but the only user validates again later) * kernel-doc fixes * a fix for a race in mac80211 radio registration vs. userspace * a mesh channel switch fix * a fix for a syzbot reported kasprintf() issue -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl6WyIAACgkQB8qZga/f l8TqyBAAnR+HrstmY0g8pIfsN925axUP4blRuS4U55olmEvS4xFOKhIQ5ksMceD0 ndvTxXOjJ3EA6Zc5icGG+hcPihhcO/V6UzE83KHqVEa0LP7o6Nmjyh5kiyl1HS6y naB6/g3AQcUGqD1zsT8EZ6HeePXDVMxnT0CORC2JtXJCPmBeLpfAROEhKmLuuonp WugB8monGfWQuNJTNILTg4Ya8/LR4GW7ABTNrHOnAqCpexBhrGcxdbiiqTm8ionF sLcXkfzGd7KudTxS0qTJZnR3Bja7o96FjPxy1If5eQFUiDbgVYr0C8I69X6fTzyP l4O5b1DRcLLX/1N8H5DmUsRMk4froUtVY9hqTIVE956d1HWqOSfXXMx4yxpvqCp+ dzOM6mHzfWp29aKSNqtbkrDjGJB7EviR8Fan6Sngh2EgvUVhpdwpAaKepdcZHUkW th7ukrh/KTTaied8zu8CS5RVJPn/m4afsxl05yGSDh7kJ+VN9We+0fytOb+zvyUi HO9ru0w++zqdXU82l4qaQvLJXISpEyf53IqM2PnHIS5Zvo/3HnGC2vogHGiwAccY yNbD3IYueGIgzRrSN2CaAmoFY1CzC5rK73y9Qh1JIJqWOKv66MhBBPjm3kBraiqM hj5BH3Qz9PYnaNk1uDwHZN9dLPYQr0GPHPJEMx3QsRUVau95F0Q= =1yy8 -----END PGP SIGNATURE----- Merge tag 'mac80211-for-net-2020-04-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A couple of fixes: * FTM responder policy netlink validation fix (but the only user validates again later) * kernel-doc fixes * a fix for a race in mac80211 radio registration vs. userspace * a mesh channel switch fix * a fix for a syzbot reported kasprintf() issue ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-15 11:27:23 -07:00
Dmitry Osipenko	8814044fe0	i2c: tegra: Synchronize DMA before termination DMA transfer could be completed, but CPU (which handles DMA interrupt) may get too busy and can't handle the interrupt in a timely manner, despite of DMA IRQ being raised. In this case the DMA state needs to synchronized before terminating DMA transfer in order not to miss the DMA transfer completion. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-04-15 18:27:31 +02:00
Dmitry Osipenko	a900aeac25	i2c: tegra: Better handle case where CPU0 is busy for a long time Boot CPU0 always handle I2C interrupt and under some rare circumstances (like running KASAN + NFS root) it may stuck in uninterruptible state for a significant time. In this case we will get timeout if I2C transfer is running on a sibling CPU, despite of IRQ being raised. In order to handle this rare condition, the IRQ status needs to be checked after completion timeout. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-04-15 18:27:25 +02:00
Sean Christopherson	53b3d8e9d5	KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault) Export the page fault propagation helper so that VMX can use it to correctly emulate TLB invalidation on page faults in an upcoming patch. In the (hopefully) not-too-distant future, SGX virtualization will also want access to the helper for injecting page faults to the correct level (L1 vs. L2) when emulating ENCLS instructions. Rename the function to kvm_inject_emulated_page_fault() to clarify that it is (a) injecting a fault and (b) only for page faults. WARN if it's invoked with an exception other than PF_VECTOR. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-6-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:50 -04:00
Junaid Shahid	d6e3f8385d	KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT Free all roots when emulating INVVPID for L1 and EPT is disabled, as outstanding changes to the page tables managed by L1 need to be recognized. Because L1 and L2 share an MMU when EPT is disabled, and because VPID is not tracked by the MMU role, all roots in the current MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or VM-Exit could do a fast CR3 switch (without a flush/sync) and consume stale SPTEs. Fixes: `5c614b3583` ("KVM: nVMX: nested VPID emulation") Signed-off-by: Junaid Shahid <junaids@google.com> [sean: ported to upstream KVM, reworded the comment and changelog] Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-5-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:49 -04:00
Sean Christopherson	f8aa7e3958	KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 Free all L2 (guest_mmu) roots when emulating INVEPT for L1. Outstanding changes to the EPT tables managed by L1 need to be recognized, and relying on KVM to always flush L2's EPTP context on nested VM-Enter is dangerous. Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote TLB flush if necessary, e.g. if L1 has never entered L2 then there is nothing to be done. Nuking all L2 roots is overkill for the single-context variant, but it's the safe and easy bet. A more precise zap mechanism will be added in the future. Add a TODO to call out that KVM only needs to invalidate affected contexts. Fixes: `14c07ad89f` ("x86/kvm/mmu: introduce guest_mmu") Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-4-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:49 -04:00
Sean Christopherson	eed0030e4c	KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT) Signal VM-Fail for the single-context variant of INVEPT if the specified EPTP is invalid. Per the INEVPT pseudocode in Intel's SDM, it's subject to the standard EPT checks: If VM entry with the "enable EPT" VM execution control set to 1 would fail due to the EPTP value then VMfail(Invalid operand to INVEPT/INVVPID); Fixes: `bfd0a56b90` ("nEPT: Nested INVEPT") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-3-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:48 -04:00
Sean Christopherson	e8eff28215	KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH. Remote TLB flushes require all contexts to be invalidated, not just the active contexts, e.g. all mappings in all contexts for a given HVA need to be invalidated on a mmu_notifier invalidation. Similarly, the instigator of the deferred TLB flush may be expecting all contexts to be flushed, e.g. vmx_vcpu_load_vmcs(). Without nested VMX, flushing only the current EPTP/VPID context isn't problematic because KVM uses a constant VPID for each vCPU, and mmu_alloc_direct_roots() all but guarantees KVM will use a single EPTP for L1. In the rare case where a different EPTP is created or reused, KVM (currently) unconditionally flushes the new EPTP context prior to entering the guest. With nested VMX, KVM conditionally uses a different VPID for L2, and unconditionally uses a different EPTP for L2. Because KVM doesn't _intentionally_ guarantee L2's EPTP/VPID context is flushed on nested VM-Enter, it'd be possible for a malicious L1 to attack the host and/or different VMs by exploiting the lack of flushing for L2. 1) Launch nested guest from malicious L1. 2) Nested VM-Enter to L2. 3) Access target GPA 'g'. CPU inserts TLB entry tagged with L2's ASID mapping 'g' to host PFN 'x'. 2) Nested VM-Exit to L1. 3) L1 triggers kernel same-page merging (ksm) by duplicating/zeroing the page for PFN 'x'. 4) Host kernel merges PFN 'x' with PFN 'y', i.e. unmaps PFN 'x' and remaps the page to PFN 'y'. mmu_notifier sends invalidate command, KVM flushes TLB only for L1's ASID. 4) Host kernel reallocates PFN 'x' to some other task/guest. 5) Nested VM-Enter to L2. KVM does not invalidate L2's EPTP or VPID. 6) L2 accesses GPA 'g' and gains read/write access to PFN 'x' via its stale TLB entry. However, current KVM unconditionally flushes L1's EPTP/VPID context on nested VM-Exit. But, that behavior is mostly unintentional, KVM doesn't go out of its way to flush EPTP/VPID on nested VM-Enter/VM-Exit, rather a TLB flush is guaranteed to occur prior to re-entering L1 due to __kvm_mmu_new_cr3() always being called with skip_tlb_flush=false. On nested VM-Enter, this happens via kvm_init_shadow_ept_mmu() (nested EPT enabled) or in nested_vmx_load_cr3() (nested EPT disabled). On nested VM-Exit it occurs via nested_vmx_load_cr3(). This also fixes a bug where a deferred TLB flush in the context of L2, with EPT disabled, would flush L1's VPID instead of L2's VPID, as vmx_flush_tlb() flushes L1's VPID regardless of is_guest_mode(). Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Ben Gardon <bgardon@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Junaid Shahid <junaids@google.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: John Haxby <john.haxby@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Fixes: `efebf0aaec` ("KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:48 -04:00
Wainer dos Santos Moschetta	909e0abaac	selftests: kvm: Add testcase for creating max number of memslots This patch introduces test_add_max_memory_regions(), which checks that a VM can have added memory slots up to the limit defined in KVM_CAP_NR_MEMSLOTS. Then attempt to add one more slot to verify it fails as expected. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-11-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:47 -04:00
Sean Christopherson	5b4f758f45	KVM: selftests: Make set_memory_region_test common to all architectures Make set_memory_region_test available on all architectures by wrapping the bits that are x86-specific in ifdefs. A future testcase to create the maximum number of memslots will be architecture agnostic. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-10-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:47 -04:00
Sean Christopherson	8cc2dd637b	KVM: selftests: Add "zero" testcase to set_memory_region_test Add a testcase for running a guest with no memslots to the memory region test. The expected result on x86_64 is that the guest will trigger an internal KVM error due to the initial code fetch encountering a non-existent memslot and resulting in an emulation failure. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-9-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:45 -04:00
Wainer dos Santos Moschetta	4cd94d125d	selftests: kvm: Add vm_get_fd() in kvm_util Introduces the vm_get_fd() function in kvm_util which returns the VM file descriptor. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-8-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:44 -04:00
Sean Christopherson	8fb38f05ca	KVM: selftests: Add "delete" testcase to set_memory_region_test Add a testcase for deleting memslots while the guest is running. Like the "move" testcase, this is x86_64-only as it relies on MMIO happening when a non-existent memslot is encountered. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-7-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:44 -04:00
Sean Christopherson	8a0639fe92	KVM: sefltests: Add explicit synchronization to move mem region test Use sem_post() and sem_timedwait() to synchronize test stages between the vCPU thread and the main thread instead of using usleep() to wait for the vCPU thread and hoping for the best. Opportunistically refactor the code to make it suck less in general, and to prepare for adding more testcases. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:43 -04:00
Sean Christopherson	3e6b941267	KVM: selftests: Add GUEST_ASSERT variants to pass values to host Add variants of GUEST_ASSERT to pass values back to the host, e.g. to help debug/understand a failure when the the cause of the assert isn't necessarily binary. It'd probably be possible to auto-calculate the number of arguments and just have a single GUEST_ASSERT, but there are a limited number of variants and silently eating arguments could lead to subtle code bugs. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200410231707.7128-5-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:43 -04:00
Sean Christopherson	8c996e4dae	KVM: selftests: Add util to delete memory region Add a utility to delete a memory region, it will be used by x86's set_memory_region_test. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <20200410231707.7128-4-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:42 -04:00
Sean Christopherson	4d9bba9007	KVM: selftests: Use kernel's list instead of homebrewed replacement Replace the KVM selftests' homebrewed linked lists for vCPUs and memory regions with the kernel's 'struct list_head'. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <20200410231707.7128-3-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:42 -04:00
Sean Christopherson	238022ff5d	KVM: selftests: Take vcpu pointer instead of id in vm_vcpu_rm() The sole caller of vm_vcpu_rm() already has the vcpu pointer, take it directly instead of doing an extra lookup. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-Id: <20200410231707.7128-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:41 -04:00
Eric Northup	43d05de2be	KVM: pass through CPUID(0x80000006) Return the host's L2 cache and TLB information for CPUID.0x80000006 instead of zeroing out the entry as part of KVM_GET_SUPPORTED_CPUID. This allows a userspace VMM to feed KVM_GET_SUPPORTED_CPUID's output directly into KVM_SET_CPUID2 (without breaking the guest). Signed-off-by: Eric Northup (Google) <digitaleric@gmail.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Jon Cargille <jcargill@google.com> Message-Id: <20200415012320.236065-1-jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:41 -04:00
Peter Shier	24647e0a39	KVM: x86: Return updated timer current count register from KVM_GET_LAPIC kvm_vcpu_ioctl_get_lapic (implements KVM_GET_LAPIC ioctl) does a bulk copy of the LAPIC registers but must take into account that the one-shot and periodic timer current count register is computed upon reads and is not present in register state. When restoring LAPIC state (e.g. after migration), restart timers from their their current count values at time of save. Note: When a one-shot timer expires, the code in arch/x86/kvm/lapic.c does not zero the value of the LAPIC initial count register (emulating HW behavior). If no other timer is run and pending prior to a subsequent KVM_GET_LAPIC call, the returned register set will include the expired one-shot initial count. On a subsequent KVM_SET_LAPIC call the code will see a non-zero initial count and start a new one-shot timer using the expired timer's count. This is a prior existing bug and will be addressed in a separate patch. Thanks to jmattson@google.com for this find. Signed-off-by: Peter Shier <pshier@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <20181010225653.238911-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:40 -04:00
Colin Ian King	788109c1cc	KVM: remove redundant assignment to variable r The variable r is being assigned with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Message-Id: <20200410113526.13822-1-colin.king@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:40 -04:00
Uros Bizjak	56a87e5d99	KVM: SVM: Fix __svm_vcpu_run declaration. The function returns no value. Cc: Paolo Bonzini <pbonzini@redhat.com> Fixes: `199cd1d7b5` ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200409114926.1407442-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:39 -04:00
Uros Bizjak	b61f62d408	KVM: SVM: Do not setup frame pointer in __svm_vcpu_run __svm_vcpu_run is a leaf function and does not need a frame pointer. %rbp is also destroyed a few instructions later when guest registers are loaded. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200409120440.1427215-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:38 -04:00
Borislav Petkov	b2bce0a589	KVM: SVM: Fix build error due to missing release_pages() include Fix: arch/x86/kvm/svm/sev.c: In function ‘sev_pin_memory’: arch/x86/kvm/svm/sev.c:360:3: error: implicit declaration of function ‘release_pages’;\ did you mean ‘reclaim_pages’? [-Werror=implicit-function-declaration] 360 \| release_pages(pages, npinned); \| ^~~~~~~~~~~~~ \| reclaim_pages because svm.c includes pagemap.h but the carved out sev.c needs it too. Triggered by a randconfig build. Fixes: `eaf78265a4` ("KVM: SVM: Move SEV code to separate file") Signed-off-by: Borislav Petkov <bp@suse.de> Message-Id: <20200411160927.27954-1-bp@alien8.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:37 -04:00
Uros Bizjak	b4fd630812	KVM: SVM: Do not mark svm_vcpu_run with STACK_FRAME_NON_STANDARD svm_vcpu_run does not change stack or frame pointer anymore. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Message-Id: <20200414113612.104501-1-ubizjak@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:36 -04:00
Oliver Upton	69c0975525	kvm: nVMX: match comment with return type for nested_vmx_exit_reflected nested_vmx_exit_reflected() returns a bool, not int. As such, refer to the return values as true/false in the comment instead of 1/0. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20200414221241.134103-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:35 -04:00
Oliver Upton	b045ae906b	kvm: nVMX: reflect MTF VM-exits if injected by L1 According to SDM 26.6.2, it is possible to inject an MTF VM-exit via the VM-entry interruption-information field regardless of the 'monitor trap flag' VM-execution control. KVM appropriately copies the VM-entry interruption-information field from vmcs12 to vmcs02. However, if L1 has not set the 'monitor trap flag' VM-execution control, KVM fails to reflect the subsequent MTF VM-exit into L1. Fix this by consulting the VM-entry interruption-information field of vmcs12 to determine if L1 has injected the MTF VM-exit. If so, reflect the exit, regardless of the 'monitor trap flag' VM-execution control. Fixes: `5f3d45e7f2` ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <20200414224746.240324-1-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-15 12:08:35 -04:00
Marc Zyngier	1c32ca5dc6	KVM: arm: vgic: Fix limit condition when writing to GICD_I[CS]ACTIVER When deciding whether a guest has to be stopped we check whether this is a private interrupt or not. Unfortunately, there's an off-by-one bug here, and we fail to recognize a whole range of interrupts as being global (GICv2 SPIs 32-63). Fix the condition from > to be >=. Cc: stable@vger.kernel.org Fixes: `abd7229626` ("KVM: arm/arm64: Simplify active_change_prepare and plug race") Reported-by: André Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2020-04-15 14:56:14 +01:00
Fangrui Song	c9a4ef6645	arm64: Delete the space separator in __emit_inst In assembly, many instances of __emit_inst(x) expand to a directive. In a few places __emit_inst(x) is used as an assembler macro argument. For example, in arch/arm64/kvm/hyp/entry.S ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) expands to the following by the C preprocessor: alternative_insn nop, .inst (0xd500401f \| ((0) << 16 \| (4) << 5) \| ((!!1) << 8)), 4, 1 Both comma and space are separators, with an exception that content inside a pair of parentheses/quotes is not split, so the clang integrated assembler splits the arguments to: nop, .inst, (0xd500401f \| ((0) << 16 \| (4) << 5) \| ((!!1) << 8)), 4, 1 GNU as preprocesses the input with do_scrub_chars(). Its arm64 backend (along with many other non-x86 backends) sees: alternative_insn nop,.inst(0xd500401f\|((0)<<16\|(4)<<5)\|((!!1)<<8)),4,1 # .inst(...) is parsed as one argument while its x86 backend sees: alternative_insn nop,.inst (0xd500401f\|((0)<<16\|(4)<<5)\|((!!1)<<8)),4,1 # The extra space before '(' makes the whole .inst (...) parsed as two arguments The non-x86 backend's behavior is considered unintentional (https://sourceware.org/bugzilla/show_bug.cgi?id=25750). So drop the space separator inside `.inst (...)` to make the clang integrated assembler work. Suggested-by: Ilie Halip <ilie.halip@gmail.com> Signed-off-by: Fangrui Song <maskray@google.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://github.com/ClangBuiltLinux/linux/issues/939 Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2020-04-15 13:07:12 +01:00
Toke Høiland-Jørgensen	c6c111523d	selftests/bpf: Check for correct program attach/detach in xdp_attach test David Ahern noticed that there was a bug in the EXPECTED_FD code so programs did not get detached properly when that parameter was supplied. This case was not included in the xdp_attach tests; so let's add it to be sure that such a bug does not sneak back in down. Fixes: `87854a0b57` ("selftests/bpf: Add tests for attaching XDP programs") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200414145025.182163-2-toke@redhat.com	2020-04-15 13:26:08 +02:00
Toke Høiland-Jørgensen	49b452c382	libbpf: Fix type of old_fd in bpf_xdp_set_link_opts The 'old_fd' parameter used for atomic replacement of XDP programs is supposed to be an FD, but was left as a u32 from an earlier iteration of the patch that added it. It was converted to an int when read, so things worked correctly even with negative values, but better change the definition to correctly reflect the intention. Fixes: `bd5ca3ef93` ("libbpf: Add function to set link XDP fd while specifying old program") Reported-by: David Ahern <dsahern@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200414145025.182163-1-toke@redhat.com	2020-04-15 13:26:08 +02:00
Andrii Nakryiko	25498a1969	libbpf: Always specify expected_attach_type on program load if supported For some types of BPF programs that utilize expected_attach_type, libbpf won't set load_attr.expected_attach_type, even if expected_attach_type is known from section definition. This was done to preserve backwards compatibility with old kernels that didn't recognize expected_attach_type attribute yet (which was added in `5e43f899b0` ("bpf: Check attach type at prog load time"). But this is problematic for some BPF programs that utilize newer features that require kernel to know specific expected_attach_type (e.g., extended set of return codes for cgroup_skb/egress programs). This patch makes libbpf specify expected_attach_type by default, but also detect support for this field in kernel and not set it during program load. This allows to have a good metadata for bpf_program (e.g., bpf_program__get_extected_attach_type()), but still work with old kernels (for cases where it can work at all). Additionally, due to expected_attach_type being always set for recognized program types, bpf_program__attach_cgroup doesn't have to do extra checks to determine correct attach type, so remove that additional logic. Also adjust section_names selftest to account for this change. More detailed discussion can be found in [0]. [0] https://lore.kernel.org/bpf/20200412003604.GA15986@rdna-mbp.dhcp.thefacebook.com/ Fixes: `5cf1e91456` ("bpf: cgroup inet skb programs can return 0 to 3") Fixes: `5e43f899b0` ("bpf: Check attach type at prog load time") Reported-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20200414182645.1368174-1-andriin@fb.com	2020-04-15 13:22:43 +02:00
Magnus Karlsson	99e3a236dd	xsk: Add missing check on user supplied headroom size Add a check that the headroom cannot be larger than the available space in the chunk. In the current code, a malicious user can set the headroom to a value larger than the chunk size minus the fixed XDP headroom. That way packets with a length larger than the supported size in the umem could get accepted and result in an out-of-bounds write. Fixes: `c0c77d8fb7` ("xsk: add user memory registration support sockopt") Reported-by: Bui Quang Minh <minhquangbui99@gmail.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://bugzilla.kernel.org/show_bug.cgi?id=207225 Link: https://lore.kernel.org/bpf/1586849715-23490-1-git-send-email-magnus.karlsson@intel.com	2020-04-15 13:07:18 +02:00
Mark Rutland	9cc3d0c691	arm64: vdso: don't free unallocated pages The aarch32_vdso_pages[] array never has entries allocated in the C_VVAR or C_VDSO slots, and as the array is zero initialized these contain NULL. However in __aarch32_alloc_vdso_pages() when aarch32_alloc_kuser_vdso_page() fails we attempt to free the page whose struct page is at NULL, which is obviously nonsensical. This patch removes the erroneous page freeing. Fixes: `7c1deeeb01` ("arm64: compat: VDSO setup for compat layer") Cc: <stable@vger.kernel.org> # 5.3.x- Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2020-04-15 11:13:16 +01:00
Wolfram Sang	3c1d1613be	i2c: remove i2c_new_probed_device API All in-tree users have been converted to the new i2c_new_scanned_device function, so remove this deprecated one. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-04-15 11:48:21 +02:00
Wolfram Sang	edb2c9dd39	i2c: altera: use proper variable to hold errno device_property_read_u32() returns errno or 0, so we should use the integer variable 'ret' and not the u32 'val' to hold the retval. Fixes: `0560ad5762` ("i2c: altera: Add Altera I2C Controller driver") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-04-15 11:48:21 +02:00
Hans de Goede	d79294d0de	i2c: designware: platdrv: Remove DPM_FLAG_SMART_SUSPEND flag on BYT and CHT We already set DPM_FLAG_SMART_PREPARE, so we completely skip all callbacks (other then prepare) where possible, quoting from dw_i2c_plat_prepare(): /* * If the ACPI companion device object is present for this device, it * may be accessed during suspend and resume of other devices via I2C * operation regions, so tell the PM core and middle layers to avoid * skipping system suspend/resume callbacks for it in that case. */ return !has_acpi_companion(dev); Also setting the DPM_FLAG_SMART_SUSPEND will cause acpi_subsys_suspend() to leave the controller runtime-suspended even if dw_i2c_plat_prepare() returned 0. Leaving the controller runtime-suspended normally, when the I2C controller is suspended during the suspend_late phase, is not an issue because the pm_runtime_get_sync() done by i2c_dw_xfer() will (runtime-)resume it. But for dw I2C controllers on Bay- and Cherry-Trail devices acpi_lpss.c leaves the controller alive until the suspend_noirq phase, because it may be used by the _PS3 ACPI methods of PCI devices and PCI devices are left powered on until the suspend_noirq phase. Between the suspend_late and resume_early phases runtime-pm is disabled. So for any ACPI I2C OPRegion accesses done after the suspend_late phase, the pm_runtime_get_sync() done by i2c_dw_xfer() is a no-op and the controller is left runtime-suspended. i2c_dw_xfer() has a check to catch this condition (rather then waiting for the I2C transfer to timeout because the controller is suspended). acpi_subsys_suspend() leaving the controller runtime-suspended in combination with an ACPI I2C OPRegion access done after the suspend_late phase triggers this check, leading to the following error being logged on a Bay Trail based Lenovo Thinkpad 8 tablet: [ 93.275882] i2c_designware 80860F41:00: Transfer while suspended [ 93.275993] WARNING: CPU: 0 PID: 412 at drivers/i2c/busses/i2c-designware-master.c:429 i2c_dw_xfer+0x239/0x280 ... [ 93.276252] Workqueue: kacpi_notify acpi_os_execute_deferred [ 93.276267] RIP: 0010:i2c_dw_xfer+0x239/0x280 ... [ 93.276340] Call Trace: [ 93.276366] __i2c_transfer+0x121/0x520 [ 93.276379] i2c_transfer+0x4c/0x100 [ 93.276392] i2c_acpi_space_handler+0x219/0x510 [ 93.276408] ? up+0x40/0x60 [ 93.276419] ? i2c_acpi_notify+0x130/0x130 [ 93.276433] acpi_ev_address_space_dispatch+0x1e1/0x252 ... So since on BYT and CHT platforms we want ACPI I2c OPRegion accesses to work until the suspend_noirq phase, we need the controller to be runtime-resumed during the suspend phase if it is runtime-suspended suspended at that time. This means that we must not set the DPM_FLAG_SMART_SUSPEND on these platforms. On BYT and CHT we already have a special ACCESS_NO_IRQ_SUSPEND flag to make sure the controller stays functional until the suspend_noirq phase. This commit makes the driver not set the DPM_FLAG_SMART_SUSPEND flag when that flag is set. Cc: stable@vger.kernel.org Fixes: `b30f2f6556` ("i2c: designware: Set IRQF_NO_SUSPEND flag for all BYT and CHT controllers") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2020-04-15 11:48:20 +02:00
Jason Yan	b0e387c3ec	x86/umip: Make umip_insns static Fix the following sparse warning: arch/x86/kernel/umip.c:84:12: warning: symbol 'umip_insns' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://lkml.kernel.org/r/20200413082213.22934-1-yanaijie@huawei.com	2020-04-15 11:13:12 +02:00
Borislav Petkov	e0d648f9d8	sched/vtime: Work around an unitialized variable warning Work around this warning: kernel/sched/cputime.c: In function ‘kcpustat_field’: kernel/sched/cputime.c:1007:6: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized] because GCC can't see that val is used only when err is 0. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200327214334.GF8015@zn.tnic	2020-04-15 11:06:50 +02:00
Peter Xu	3662daf023	sched/isolation: Allow "isolcpus=" to skip unknown sub-parameters The "isolcpus=" parameter allows sub-parameters before the cpulist is specified, and if the parser detects an unknown sub-parameters the whole parameter will be ignored. This design is incompatible with itself when new sub-parameters are added. An older kernel will not recognize the new sub-parameter and will invalidate the whole parameter so the CPU isolation will not take effect. It emits a warning: isolcpus: Error, unknown flag The better and compatible way is to allow "isolcpus=" to skip unknown sub-parameters, so that even if new sub-parameters are added an older kernel will still be able to behave as usual even if with the new sub-parameter specified on the command line. Ideally this should have been there when the first sub-parameter for "isolcpus=" was introduced. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200403223517.406353-1-peterx@redhat.com	2020-04-15 10:38:26 +02:00
Eugene Syromiatnikov	a966dcfe15	clone3: add build-time CLONE_ARGS_SIZE_VER* validity checks CLONE_ARGS_SIZE_VER* macros are defined explicitly and not via the offsets of the relevant struct clone_args fields, which makes it rather error-prone, so it probably makes sense to add some compile-time checks for them (including the one that breaks on struct clone_args extension as a reminder to add a relevant size macro and a similar check). Function copy_clone_args_from_user seems to be a good place for such checks. Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20200412202658.GA31499@asgard.redhat.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2020-04-15 09:56:32 +02:00
Eugene Syromiatnikov	62173872ca	clone3: add a check for the user struct size if CLONE_INTO_CGROUP is set Passing CLONE_INTO_CGROUP with an under-sized structure (that doesn't properly contain cgroup field) seems like garbage input, especially considering the fact that fd 0 is a valid descriptor. Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20200412203123.GA5869@asgard.redhat.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2020-04-15 09:56:25 +02:00
Eugene Syromiatnikov	e82a118f57	clone3: fix cgroup argument sanity check Checking that cgroup field value of struct clone_args is less than 0 is useless, as it is defined as unsigned 64-bit integer. Moreover, it doesn't catch the situations where its higher bits are lost during the assignment to the cgroup field of the cgroup field of the internal struct kernel_clone_args (where it is declared as signed 32-bit integer), so it is still possible to pass garbage there. A check against INT_MAX solves both these issues. Fixes: `ef2c41cf38` ("clone3: allow spawning processes into cgroups") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20200412202533.GA29554@asgard.redhat.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2020-04-15 09:56:12 +02:00
Tamizh chelvam	93e2d04a18	mac80211: fix channel switch trigger from unknown mesh peer Previously mesh channel switch happens if beacon contains CSA IE without checking the mesh peer info. Due to that channel switch happens even if the beacon is not from its own mesh peer. Fixing that by checking if the CSA originated from the same mesh network before proceeding for channel switch. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Link: https://lore.kernel.org/r/1585403604-29274-1-git-send-email-tamizhr@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-04-15 09:54:26 +02:00

... 4 5 6 7 8 ...

915877 Commits All Branches Search

915877 Commits

All Branches