Commit Graph

966051 Commits

Author SHA1 Message Date
Juergen Gross 073d0552ea xen/events: avoid removing an event channel while handling it
Today it can happen that an event channel is being removed from the
system while the event handling loop is active. This can lead to a
race resulting in crashes or WARN() splats when trying to access the
irq_info structure related to the event channel.

Fix this problem by using a rwlock taken as reader in the event
handling loop and as writer when deallocating the irq_info structure.

As the observed problem was a NULL dereference in evtchn_from_irq()
make this function more robust against races by testing the irq_info
pointer to be not NULL before dereferencing it.

And finally make all accesses to evtchn_to_irq[row][col] atomic ones
in order to avoid seeing partial updates of an array element in irq
handling. Note that irq handling can be entered only for event channels
which have been valid before, so any not populated row isn't a problem
in this regard, as rows are only ever added and never removed.

This is XSA-331.

Cc: stable@vger.kernel.org
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reported-by: Jinoh Kang <luke1337@theori.io>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
2020-10-20 10:21:51 +02:00
Christoph Hellwig 0eb3b4ab76 ARM/sa1111: add a missing include of dma-map-ops.h
Ensure the dmabounce functions are available for all Kconfig
permutations.

Fixes: 0a0f0d8be7 ("dma-mapping: split <linux/dma-mapping.h>")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-10-20 09:40:33 +02:00
Steve French acf96fef46 smb3.1.1: do not fail if no encryption required but server doesn't support it
There are cases where the server can return a cipher type of 0 and
it not be an error. For example server supported no encryption types
(e.g. server completely disabled encryption), or the server and
client didn't support any encryption types in common (e.g. if a
server only supported AES256_CCM). In those cases encryption would
not be supported, but that can be ok if the client did not require
encryption on mount and it should not return an error.

In the case in which mount requested encryption ("seal" on mount)
then checks later on during tree connection will return the proper
rc, but if seal was not requested by client, since server is allowed
to return 0 to indicate no supported cipher, we should not fail mount.

Reported-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-20 02:15:56 -05:00
Ido Schimmel df6afe2f7c nexthop: Fix performance regression in nexthop deletion
While insertion of 16k nexthops all using the same netdev ('dummy10')
takes less than a second, deletion takes about 130 seconds:

# time -p ip -b nexthop.batch
real 0.29
user 0.01
sys 0.15

# time -p ip link set dev dummy10 down
real 131.03
user 0.06
sys 0.52

This is because of repeated calls to synchronize_rcu() whenever a
nexthop is removed from a nexthop group:

# /usr/share/bcc/tools/offcputime -p `pgrep -nx ip` -K
...
    b'finish_task_switch'
    b'schedule'
    b'schedule_timeout'
    b'wait_for_completion'
    b'__wait_rcu_gp'
    b'synchronize_rcu.part.0'
    b'synchronize_rcu'
    b'__remove_nexthop'
    b'remove_nexthop'
    b'nexthop_flush_dev'
    b'nh_netdev_event'
    b'raw_notifier_call_chain'
    b'call_netdevice_notifiers_info'
    b'__dev_notify_flags'
    b'dev_change_flags'
    b'do_setlink'
    b'__rtnl_newlink'
    b'rtnl_newlink'
    b'rtnetlink_rcv_msg'
    b'netlink_rcv_skb'
    b'rtnetlink_rcv'
    b'netlink_unicast'
    b'netlink_sendmsg'
    b'____sys_sendmsg'
    b'___sys_sendmsg'
    b'__sys_sendmsg'
    b'__x64_sys_sendmsg'
    b'do_syscall_64'
    b'entry_SYSCALL_64_after_hwframe'
    -                ip (277)
        126554955

Since nexthops are always deleted under RTNL, synchronize_net() can be
used instead. It will call synchronize_rcu_expedited() which only blocks
for several microseconds as opposed to multiple milliseconds like
synchronize_rcu().

With this patch deletion of 16k nexthops takes less than a second:

# time -p ip link set dev dummy10 down
real 0.12
user 0.00
sys 0.04

Tested with fib_nexthops.sh which includes torture tests that prompted
the initial change:

# ./fib_nexthops.sh
...
Tests passed: 134
Tests failed:   0

Fixes: 90f33bffa3 ("nexthops: don't modify published nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20201016172914.643282-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 20:07:15 -07:00
Linus Torvalds 270315b823 RISC-V Patches for the 5.10 Merge Window, Part 1
This contains a handful of cleanups and new features, including:
 
 * A handful of cleanups for our page fault handling.
 * Improvements to how we fill out cacheinfo.
 * Support for EFI-based systems.
 
 ---
 
 This contains a merge from the EFI tree that was necessary as some of the EFI
 support landed over there.  It's my first time doing something like this,
 
 I haven't included the set_fs stuff because the base branch it depends on
 hasn't been merged yet.  I'll probably have another merge window PR, as
 there's more in flight (most notably the fix for new binutils I just sent out),
 but I figured there was no reason to delay this any longer.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl+KQ6gTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYibmwD/4qWfOW7R/kUWi08ethcaAhNEWLvqIh
 2/KjGLORw+NTZ1F4pEFyQG5LRd3yWDT/UXh/k8gXINqmdclNV01Z3T+O7WuRlISs
 07i26W1qRpNeJ7lDVhr9foKpeOU/AXvidgoF330nGlyO4HZkYKhK2yB3t8uGWywr
 Zt/EpMJeBIRKzWiLhOgLAdYJthhZ9AlnouNnr9myHnO5Ksel+AZ/BKYvn7ZbHMns
 6vFUxp6392/LERRRIfDqPsTuxPIYMHjuEsGSESLsjAIyq/shgN1knG/C+zwU5DcK
 zUDBt1DEP7Tb45w7VBASSjn1M+cUolz9/c2dBhlVcdBlk1GKF+KILSTmWUBpQ8oP
 ETVAuQK5HTcjy9bVcJMj0Oa3mFshVAAByOH+Wyrdo+qSLkb7y3spPvsL4dyjrKjL
 +pe6C7WvavaEFoQXVWO2sTUBGYt7qDLRdrDgOGBIHylTXhTxf2wYzAF4ZmDROECT
 Qfc7Ac3aIWYvWDmxE+x8OniuclfZ0DndKLKQj6FJWUTIxFZzTxsHK75d47D1ID0S
 ZwAmUd0eYjjwMTO/6AM/Aqu3o8IP4GOXjJf4ijxH9+LjpUhm/ibmHDAUY69sU1WX
 kdX51gQzoEuW7XMVz1HoTSvaGGKtyFDuRxs8RG/tSFaRtznRz0Sro6BpLCeG968n
 k/d5WL/vZZ/NDA==
 =FYs/
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.10-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:
 "A handful of cleanups and new features:

   - A handful of cleanups for our page fault handling

   - Improvements to how we fill out cacheinfo

   - Support for EFI-based systems"

* tag 'riscv-for-linus-5.10-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (22 commits)
  RISC-V: Add page table dump support for uefi
  RISC-V: Add EFI runtime services
  RISC-V: Add EFI stub support.
  RISC-V: Add PE/COFF header for EFI stub
  RISC-V: Implement late mapping page table allocation functions
  RISC-V: Add early ioremap support
  RISC-V: Move DT mapping outof fixmap
  RISC-V: Fix duplicate included thread_info.h
  riscv/mm/fault: Set FAULT_FLAG_INSTRUCTION flag in do_page_fault()
  riscv/mm/fault: Fix inline placement in vmalloc_fault() declaration
  riscv: Add cache information in AUX vector
  riscv: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
  riscv: Set more data to cacheinfo
  riscv/mm/fault: Move access error check to function
  riscv/mm/fault: Move FAULT_FLAG_WRITE handling in do_page_fault()
  riscv/mm/fault: Simplify mm_fault_error()
  riscv/mm/fault: Move fault error handling to mm_fault_error()
  riscv/mm/fault: Simplify fault error handling
  riscv/mm/fault: Move vmalloc fault handling to vmalloc_fault()
  riscv/mm/fault: Move bad area handling to bad_area()
  ...
2020-10-19 18:18:30 -07:00
Linus Torvalds d3876ff744 m68knommu: collection of fixes for 5.10
Fixes include:
 . switch to using asm-generic uaccess code
 . fix sparse warnings in signal code
 . fix compilation of ColdFire MMC support
 . support sysrq in ColdFire serial driver
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEmsfM6tQwfNjBOxr3TiQVqaG9L4AFAl+M4OUACgkQTiQVqaG9
 L4Bb0A//fsrbpUL6p0NHiuOoP+327shE+Tw+l78yCQRmnxtSnI2cK/Mq7O4/mLI/
 6w/Fisw/EnBAVeG1v6HZ4BoQopTVzFj7/iE4KgOkum0FDhy55VHKhkAWRAX6ZslW
 RozzahotW4LbmhD1IUSmlNU5gIyABm3hSlZKjKJEo90FuGTeMpwnClefjhTSzOuY
 rfKk73PtDHCCRXfF52vweabhR+17Akh2D3gSwje5MSRvIC+7BC3wyE1s4B/pGFX2
 Xw9ziGw8dz9J7wqREKT/AUuylqO++HrGfAeXkNJ6xggI8iOJsxVb943BDgu3P1gF
 MaowvV/sHwwocNBOzqDExobx9OS3/UC124+sSwcPJxBCXDJkMiqvxVGymX+6j/aN
 WXXv+8DVUDL3hKbRsVVssiBo2oQ/pBtZigyB2HHt9z9zIPtyQ03nm0Nud8o+fYm6
 6CT2xdntnBKAjAyP2eu7L3dEBCoK3E6juYMiwdiwys8C2i8SF8hCI+QgXRiSQcml
 dQeyCxZJLxwH1fG3zJByWDUP9Pm6hC3MV1dKGhzbVJnieX0qKn7sHW1CBcvW/X84
 qZp+C7O+DjJs0yfR2QzfJ7gIk1GhJH13HwiXWx4BMEGU4oAUCgajEeEip+XOKol4
 as5xSqcyU8eGNDmzjibvi2Lqnn/CPzjx/CATaduYrpAmao55kg4=
 =QMK9
 -----END PGP SIGNATURE-----

Merge tag 'm68knommu-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu updates from Greg Ungerer:
 "A collection of fixes for 5.10:

   - switch to using asm-generic uaccess code

   - fix sparse warnings in signal code

   - fix compilation of ColdFire MMC support

   - support sysrq in ColdFire serial driver"

* tag 'm68knommu-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  serial: mcf: add sysrq capability
  m68knommu: include SDHC support only when hardware has it
  m68knommu: fix sparse warnings in signal code
  m68knommu: switch to using asm-generic/uaccess.h
2020-10-19 18:12:44 -07:00
Maxim Kochetkov a15a6afb3b net: dsa: seville: the packet buffer is 2 megabits, not megabytes
The VSC9953 Seville switch has 2 megabits of buffer split into 4360
words of 60 bytes each. 2048 * 1024 is 2 megabytes instead of 2 megabits.
2 megabits is (2048 / 8) * 1024 = 256 * 1024.

Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Fixes: a63ed92d21 ("net: dsa: seville: fix buffer size of the queue system")
Link: https://lore.kernel.org/r/20201019050625.21533-1-fido_max@inbox.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 18:03:42 -07:00
Po-Hsu Lin 26ebd6fed9 selftests: rtnetlink: load fou module for kci_test_encap_fou() test
The kci_test_encap_fou() test from kci_test_encap() in rtnetlink.sh
needs the fou module to work. Otherwise it will fail with:

  $ ip netns exec "$testns" ip fou add port 7777 ipproto 47
  RTNETLINK answers: No such file or directory
  Error talking to the kernel

Add the CONFIG_NET_FOU into the config file as well. Which needs at
least to be set as a loadable module.

Fixes: 6227efc1a2 ("selftests: rtnetlink.sh: add vxlan and fou test cases")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Link: https://lore.kernel.org/r/20201019030928.9859-1-po-hsu.lin@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 17:55:29 -07:00
Christian Eggers bc7e343dbd net: dsa: tag_ksz: KSZ8795 and KSZ9477 also use tail tags
The Marvell 88E6060 uses tag_trailer.c and the KSZ8795, KSZ9477 and
KSZ9893 switches also use tail tags.

Fixes: 7a6ffe764b ("net: dsa: point out the tail taggers")
Signed-off-by: Christian Eggers <ceggers@arri.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20201016171603.10587-1-ceggers@arri.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 17:32:50 -07:00
Valentin Vidic 3bd57b9055 net: korina: cast KSEG0 address to pointer in kfree
Fixes gcc warning:

passing argument 1 of 'kfree' makes pointer from integer without a cast

Fixes: 3af5f0f5c7 ("net: korina: fix kfree of rx/tx descriptor array")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Link: https://lore.kernel.org/r/20201018184255.28989-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 17:00:00 -07:00
Heiner Kallweit 424a646e07 r8169: fix operation under forced interrupt threading
For several network drivers it was reported that using
__napi_schedule_irqoff() is unsafe with forced threading. One way to
fix this is switching back to __napi_schedule, but then we lose the
benefit of the irqoff version in general. As stated by Eric it doesn't
make sense to make the minimal hard irq handlers in drivers using NAPI
a thread. Therefore ensure that the hard irq handler is never
thread-ified.

Fixes: 9a899a35b0 ("r8169: switch to napi_schedule_irqoff")
Link: https://lkml.org/lkml/2020/10/18/19
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/4d3ef84a-c812-5072-918a-22a6f6468310@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-19 16:55:54 -07:00
Martin KaFai Lau 8568c3cefd bpf: selftest: Ensure the return value of the bpf_per_cpu_ptr() must be checked
This patch tests all pointers returned by bpf_per_cpu_ptr() must be
tested for NULL first before it can be accessed.

This patch adds a subtest "null_check", so it moves the ".data..percpu"
existence check to the very beginning and before doing any subtest.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201019194225.1051596-1-kafai@fb.com
2020-10-19 15:57:42 -07:00
Martin KaFai Lau e710bcc6d9 bpf: selftest: Ensure the return value of bpf_skc_to helpers must be checked
This patch tests:

int bpf_cls(struct __sk_buff *skb)
{
	/* REG_6: sk
	 * REG_7: tp
	 * REG_8: req_sk
	 */

	sk = skb->sk;
	if (!sk)
		return 0;

	tp = bpf_skc_to_tcp_sock(sk);
	req_sk = bpf_skc_to_tcp_request_sock(sk);
	if (!req_sk)
		return 0;

	/* !tp has not been tested, so verifier should reject. */
	return *(__u8 *)tp;
}

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201019194219.1051314-1-kafai@fb.com
2020-10-19 15:57:42 -07:00
Martin KaFai Lau 93c230e3f5 bpf: Enforce id generation for all may-be-null register type
The commit af7ec13833 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
introduces RET_PTR_TO_BTF_ID_OR_NULL and
the commit eaa6bcb71e ("bpf: Introduce bpf_per_cpu_ptr()")
introduces RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL.
Note that for RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL, the reg0->type
could become PTR_TO_MEM_OR_NULL which is not covered by
BPF_PROBE_MEM.

The BPF_REG_0 will then hold a _OR_NULL pointer type. This _OR_NULL
pointer type requires the bpf program to explicitly do a NULL check first.
After NULL check, the verifier will mark all registers having
the same reg->id as safe to use.  However, the reg->id
is not set for those new _OR_NULL return types.  One of the ways
that may be wrong is, checking NULL for one btf_id typed pointer will
end up validating all other btf_id typed pointers because
all of them have id == 0.  The later tests will exercise
this path.

To fix it and also avoid similar issue in the future, this patch
moves the id generation logic out of each individual RET type
test in check_helper_call().  Instead, it does one
reg_type_may_be_null() test and then do the id generation
if needed.

This patch also adds a WARN_ON_ONCE in mark_ptr_or_null_reg()
to catch future breakage.

The _OR_NULL pointer usage in the bpf_iter_reg.ctx_arg_info is
fine because it just happens that the existing id generation after
check_ctx_access() has covered it.  It is also using the
reg_type_may_be_null() to decide if id generation is needed or not.

Fixes: af7ec13833 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
Fixes: eaa6bcb71e ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201019194212.1050855-1-kafai@fb.com
2020-10-19 15:57:42 -07:00
Linus Torvalds bbe85027ce Recalling the first round of new code for 5.10, in which we added:
- New feature: Widen inode timestamps and quota grace expiration
   timestamps to support dates through the year 2486.
 - New feature: storing inode btree counts in the AGI to speed up certain
   mount time per-AG block reservation operatoins and add a little more
   metadata redundancy.
 
 For the second round of new code for 5.10:
 - Deprecate the V4 filesystem format, some disused mount options, and some
   legacy sysctl knobs now that we can support dates into the 25th century.
   Note that removal of V4 support will not happen until the early 2030s.
 - Fix some probles with inode realtime flag propagation.
 - Fix some buffer handling issues when growing a rt filesystem.
 - Fix a problem where a BMAP_REMAP unmap call would free rt extents even
   though the purpose of BMAP_REMAP is to avoid freeing the blocks.
 - Strengthen the dabtree online scrubber to check hash values on child
   dabtree blocks.
 - Actually log new intent items created as part of recovering log intent
   items.
 - Fix a bug where quotas weren't attached to an inode undergoing bmap
   intent item recovery.
 - Fix a buffer overrun problem with specially crafted log buffer
   headers.
 - Various cleanups to type usage and slightly inaccurate comments.
 - More cleanups to the xattr, log, and quota code.
 - Don't run the (slower) shared-rmap operations on attr fork mappings.
 - Fix a bug where we failed to check the LSN of finobt blocks during
   replay and could therefore overwrite newer data with older data.
 - Clean up the ugly nested transaction mess that log recovery uses to
   stage intent item recovery in the correct order by creating a proper
   data structure to capture recovered chains.
 - Use the capture structure to resume intent item chains with the
   same log space and block reservations as when they were captured.
 - Fix a UAF bug in bmap intent item recovery where we failed to maintain
   our reference to the incore inode if the bmap operation needed to
   relog itself to continue.
 - Rearrange the defer ops mechanism to finish newly created subtasks
   of a parent task before moving on to the next parent task.
 - Automatically relog intent items in deferred ops chains if doing so
   would help us avoid pinning the log tail.  This will help fix some
   log scaling problems now and will facilitate atomic file updates later.
 - Fix a deadlock in the GETFSMAP implementation by using an internal
   memory buffer to reduce indirect calls and copies to userspace,
   thereby improving its performance by ~20%.
 - Fix various problems when calling growfs on a realtime volume would
   not fully update the filesystem metadata.
 - Fix broken Kconfig asking about deprecated XFS when XFS is disabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl+KRuUACgkQ+H93GTRK
 tOtqsBAAm+AZ92DRjOD7/TbU3vJALRKBBUCc6weEYUJKaZUkdYpx7Fn8Si3K8Nu0
 Pxo8qLO8WtP3ECyd+CZgkQgZAhHrjRG+FnCOuNyj1yMguX9CDu4cK0dOh/M64+pM
 BvWPqLfd99mzr7HkQ0SuLIyDMeio3leU4lySAIVpADO3V7WF5ZgHCfEETpOh5Di1
 oIzhYlxHyfK+32u4sXSWsPnogQZwjyn4CyQ+6humK0d089pVB1wbjHaTym7exjSa
 cFhMqS1XDbpMuoF4BXMcx31UTOb+8/S6TKCVsRl61j3XKGzbYKSrLmrSb/r6gFWn
 wyXJGmLok0I2UDnX1ZArIWstJHcgPlTelWrssG8wAnopLSJoU10f8o88m43d0krF
 fCUCac1rKPcisg7CS5njgUkOBknSLeBCeztl59N/8acnkaETPQr0tReDpB4wGGaW
 aGEWBrCbz1QZyfDBttNPQLcreROGukZ8R8MMRl4GiAQwZz5UrTUFeoK6thplHVvp
 ANhpYGdJy4jJ79wt4MNVYUF8U8IRWdn0ddsRx08pLWchC1PH8HH944qrUXAVPYZ+
 MohSQqKtjvaKwZLP86SvCJFs20wEEUxzCSQbz4LTO5aBz3uDfo0LQSOYzPBU+OKp
 E33SNds13nsjeH8HBLtXH3lr3absywLcV2ZMaIGsQSLpE2p8AHA=
 =LuWg
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:
 "The second large pile of new stuff for 5.10, with changes even more
  monumental than last week!

  We are formally announcing the deprecation of the V4 filesystem format
  in 2030. All users must upgrade to the V5 format, which contains
  design improvements that greatly strengthen metadata validation,
  supports reflink and online fsck, and is the intended vehicle for
  handling timestamps past 2038. We're also deprecating the old Irix
  behavioral tweaks in September 2025.

  Coming along for the ride are two design changes to the deferred
  metadata ops subsystem. One of the improvements is to retain correct
  logical ordering of tasks and subtasks, which is a more logical design
  for upper layers of XFS and will become necessary when we add atomic
  file range swaps and commits. The second improvement to deferred ops
  improves the scalability of the log by helping the log tail to move
  forward during long-running operations. This reduces log contention
  when there are a large number of threads trying to run transactions.

  In addition to that, this fixes numerous small bugs in log recovery;
  refactors logical intent log item recovery to remove the last
  remaining place in XFS where we could have nested transactions; fixes
  a couple of ways that intent log item recovery could fail in ways that
  wouldn't have happened in the regular commit paths; fixes a deadlock
  vector in the GETFSMAP implementation (which improves its performance
  by 20%); and fixes serious bugs in the realtime growfs, fallocate, and
  bitmap handling code.

  Summary:

   - Deprecate the V4 filesystem format, some disused mount options, and
     some legacy sysctl knobs now that we can support dates into the
     25th century. Note that removal of V4 support will not happen until
     the early 2030s.

   - Fix some probles with inode realtime flag propagation.

   - Fix some buffer handling issues when growing a rt filesystem.

   - Fix a problem where a BMAP_REMAP unmap call would free rt extents
     even though the purpose of BMAP_REMAP is to avoid freeing the
     blocks.

   - Strengthen the dabtree online scrubber to check hash values on
     child dabtree blocks.

   - Actually log new intent items created as part of recovering log
     intent items.

   - Fix a bug where quotas weren't attached to an inode undergoing bmap
     intent item recovery.

   - Fix a buffer overrun problem with specially crafted log buffer
     headers.

   - Various cleanups to type usage and slightly inaccurate comments.

   - More cleanups to the xattr, log, and quota code.

   - Don't run the (slower) shared-rmap operations on attr fork
     mappings.

   - Fix a bug where we failed to check the LSN of finobt blocks during
     replay and could therefore overwrite newer data with older data.

   - Clean up the ugly nested transaction mess that log recovery uses to
     stage intent item recovery in the correct order by creating a
     proper data structure to capture recovered chains.

   - Use the capture structure to resume intent item chains with the
     same log space and block reservations as when they were captured.

   - Fix a UAF bug in bmap intent item recovery where we failed to
     maintain our reference to the incore inode if the bmap operation
     needed to relog itself to continue.

   - Rearrange the defer ops mechanism to finish newly created subtasks
     of a parent task before moving on to the next parent task.

   - Automatically relog intent items in deferred ops chains if doing so
     would help us avoid pinning the log tail. This will help fix some
     log scaling problems now and will facilitate atomic file updates
     later.

   - Fix a deadlock in the GETFSMAP implementation by using an internal
     memory buffer to reduce indirect calls and copies to userspace,
     thereby improving its performance by ~20%.

   - Fix various problems when calling growfs on a realtime volume would
     not fully update the filesystem metadata.

   - Fix broken Kconfig asking about deprecated XFS when XFS is
     disabled"

* tag 'xfs-5.10-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (48 commits)
  xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n
  xfs: fix high key handling in the rt allocator's query_range function
  xfs: annotate grabbing the realtime bitmap/summary locks in growfs
  xfs: make xfs_growfs_rt update secondary superblocks
  xfs: fix realtime bitmap/summary file truncation when growing rt volume
  xfs: fix the indent in xfs_trans_mod_dquot
  xfs: do the ASSERT for the arguments O_{u,g,p}dqpp
  xfs: fix deadlock and streamline xfs_getfsmap performance
  xfs: limit entries returned when counting fsmap records
  xfs: only relog deferred intent items if free space in the log gets low
  xfs: expose the log push threshold
  xfs: periodically relog deferred intent items
  xfs: change the order in which child and parent defer ops are finished
  xfs: fix an incore inode UAF in xfs_bui_recover
  xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
  xfs: clean up bmap intent item recovery checking
  xfs: xfs_defer_capture should absorb remaining transaction reservation
  xfs: xfs_defer_capture should absorb remaining block reservations
  xfs: proper replay of deferred ops queued during log recovery
  xfs: remove XFS_LI_RECOVERED
  ...
2020-10-19 14:38:46 -07:00
Linus Torvalds 694565356c fuse update for 5.10
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCX4n0/gAKCRDh3BK/laaZ
 PM3jAP4xhaix0j/y3VyaxsUqWg6ZSrjq6X0o9clGMJv27IAtjgD/fJ7ZwzTldojD
 qb7N3utjLiPVRjwFmvsZ8JZ7O7PbwQ0=
 =oUbZ
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Support directly accessing host page cache from virtiofs. This can
   improve I/O performance for various workloads, as well as reducing
   the memory requirement by eliminating double caching. Thanks to Vivek
   Goyal for doing most of the work on this.

 - Allow automatic submounting inside virtiofs. This allows unique
   st_dev/ st_ino values to be assigned inside the guest to files
   residing on different filesystems on the host. Thanks to Max Reitz
   for the patches.

 - Fix an old use after free bug found by Pradeep P V K.

* tag 'fuse-update-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (25 commits)
  virtiofs: calculate number of scatter-gather elements accurately
  fuse: connection remove fix
  fuse: implement crossmounts
  fuse: Allow fuse_fill_super_common() for submounts
  fuse: split fuse_mount off of fuse_conn
  fuse: drop fuse_conn parameter where possible
  fuse: store fuse_conn in fuse_req
  fuse: add submount support to <uapi/linux/fuse.h>
  fuse: fix page dereference after free
  virtiofs: add logic to free up a memory range
  virtiofs: maintain a list of busy elements
  virtiofs: serialize truncate/punch_hole and dax fault path
  virtiofs: define dax address space operations
  virtiofs: add DAX mmap support
  virtiofs: implement dax read/write operations
  virtiofs: introduce setupmapping/removemapping commands
  virtiofs: implement FUSE_INIT map_alignment field
  virtiofs: keep a list of free dax memory ranges
  virtiofs: add a mount option to enable dax
  virtiofs: set up virtio_fs dax_device
  ...
2020-10-19 14:28:30 -07:00
Michael Neuling d1781f2370 selftests/powerpc: Make alignment handler test P9N DD2.1 vector CI load workaround
alignment_handler currently only tests the unaligned cases but it can
also be useful for testing the workaround for the P9N DD2.1 vector CI
load issue fixed by p9_hmi_special_emu(). This workaround was
introduced in 5080332c2c ("powerpc/64s: Add workaround for P9 vector
CI load issue").

This changes the loop to start from offset 0 rather than 1 so that we
test the kernel emulation in p9_hmi_special_emu().

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201013043741.743413-2-mikey@neuling.org
2020-10-20 08:00:42 +11:00
Michael Neuling 1da4a0272c powerpc: Fix undetected data corruption with P9N DD2.1 VSX CI load emulation
__get_user_atomic_128_aligned() stores to kaddr using stvx which is a
VMX store instruction, hence kaddr must be 16 byte aligned otherwise
the store won't occur as expected.

Unfortunately when we call __get_user_atomic_128_aligned() in
p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't
guaranteed to be 16B aligned. This means that the write to vbuf in
__get_user_atomic_128_aligned() has the bottom bits of the address
truncated. This results in other local variables being
overwritten. Also vbuf will not contain the correct data which results
in the userspace emulation being wrong and hence undetected user data
corruption.

In the past we've been mostly lucky as vbuf has ended up aligned but
this is fragile and isn't always true. CONFIG_STACKPROTECTOR in
particular can change the stack arrangement enough that our luck runs
out.

This issue only occurs on POWER9 Nimbus <= DD2.1 bare metal.

The fix is to align vbuf to a 16 byte boundary.

Fixes: 5080332c2c ("powerpc/64s: Add workaround for P9 vector CI load issue")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201013043741.743413-1-mikey@neuling.org
2020-10-20 07:59:21 +11:00
Linus Torvalds 922a763ae1 zonefs changes for 5.10
This pull request introduces the following changes to zonefs:
 
 * Add the "explicit-open" mount option to automatically issue a
   REQ_OP_ZONE_OPEN command to the device whenever a sequential zone file
   is open for writing for the first time. This avoids "insufficient zone
   resources" errors for write operations on some drives with limited
   zone resources or on ZNS drives with a limited number of active zones.
   From Johannes.
 
 Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCX4zOWAAKCRDdoc3SxdoY
 dh7PAP9IdcYnR9x6ttd2Aqsm17IBfY6b/TroE70Lm2YTlY0nTgD+IJTwYQG8KQAE
 QHAe6TD6VQfSftOeAOAnjEG64Iv2hQE=
 =vwu+
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs updates from Damien Le Moal:
 "Add an 'explicit-open' mount option to automatically issue a
  REQ_OP_ZONE_OPEN command to the device whenever a sequential zone file
  is open for writing for the first time.

  This avoids 'insufficient zone resources' errors for write operations
  on some drives with limited zone resources or on ZNS drives with a
  limited number of active zones. From Johannes"

* tag 'zonefs-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: document the explicit-open mount option
  zonefs: open/close zone on file open/close
  zonefs: provide no-lock zonefs_io_error variant
  zonefs: introduce helper for zone management
2020-10-19 13:52:01 -07:00
Alexandre Belloni 35331b506f rtc: r9701: set range
Set range and remove the set_time check. This is a classic BCD RTC.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-6-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni dfe13cf2ae rtc: r9701: convert to devm_rtc_allocate_device
This allows further improvement of the driver.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-5-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni 8b34134907 rtc: r9701: stop setting RWKCNT
tm_wday is never checked for validity and it is not read back in
r9701_get_datetime. Avoid setting it to stop tripping static checkers:

        drivers/rtc/rtc-r9701.c:109 r9701_set_datetime()
        error: undefined (user controlled) shift '1 << dt->tm_wday'

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-4-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni 2a8f3380c9 rtc: r9701: remove useless memset
The RTC core already sets to zero the struct rtc_tie it passes to the
driver, avoid doing it a second time.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-3-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni 7390bec4ed rtc: r9701: stop setting a default time
It doesn't make sense to set the RTC to a default value at probe time. Let
the core handle invalid date and time.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-2-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni 92c6dcfbd1 rtc: r9701: remove leftover comment
Commit 22652ba724 ("rtc: stop validating rtc_time in .read_time") removed
the code but not the associated comment.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201015191135.471249-1-alexandre.belloni@bootlin.com
2020-10-19 22:48:55 +02:00
Alexandre Belloni 2eeaa532ac rtc: rv3032: Add a driver for Microcrystal RV-3032
New driver for the Microcrystal RV-3032, including support for:
 - Date/time
 - Alarms
 - Low voltage detection
 - Trickle charge
 - Trimming
 - Clkout
 - RAM
 - EEPROM
 - Temperature sensor

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20201013144110.1942218-3-alexandre.belloni@bootlin.com
2020-10-19 22:47:56 +02:00
Alexandre Belloni 5ebe59a505 dt-bindings: rtc: rv3032: add RV-3032 bindings
Document the Microcrystal RV-3032 device tree bindings

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201013144110.1942218-2-alexandre.belloni@bootlin.com
2020-10-19 22:47:56 +02:00
Alexandre Belloni 61ee0674bc dt-bindings: rtc: add trickle-voltage-millivolt
Some RTCs have a trickle charge that is able to output different voltages
depending on the type of the connected auxiliary power (battery, supercap,
...). Add a property allowing to specify the necessary voltage.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20201013144110.1942218-1-alexandre.belloni@bootlin.com
2020-10-19 22:47:56 +02:00
Shyam Prasad N 0bd294b55a cifs: Return the error from crypt_message when enc/dec key not found.
In crypt_message, when smb2_get_enc_key returns error, we need to
return the error back to the caller. If not, we end up processing
the message further, causing a kernel oops due to unwarranted access
of memory.

Call Trace:
smb3_receive_transform+0x120/0x870 [cifs]
cifs_demultiplex_thread+0xb53/0xc20 [cifs]
? cifs_handle_standard+0x190/0x190 [cifs]
kthread+0x116/0x130
? kthread_park+0x80/0x80
ret_from_fork+0x1f/0x30

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-19 15:11:39 -05:00
Steve French 63ca565635 smb3.1.1: set gcm256 when requested
update smb encryption code to set 32 byte key length and to
set gcm256 when requested on mount.

Signed-off-by: Steve French <stfrench@microsoft.com>
2020-10-19 15:11:11 -05:00
Steve French fd08f2dbf0 smb3.1.1: rename nonces used for GCM and CCM encryption
Now that 256 bit encryption can be negotiated, update
names of the nonces to match the updated official protocol
documentation (e.g. AES_GCM_NONCE instead of AES_128GCM_NONCE)
since they apply to both 128 bit and 256 bit encryption.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-10-19 15:11:06 -05:00
Steve French 511ac89e59 smb3.1.1: print warning if server does not support requested encryption type
If server does not support AES-256-GCM and it was required on mount, print
warning message. Also log and return a different error message (EOPNOTSUPP)
when encryption mechanism is not supported vs the case when an unknown
unrequested encryption mechanism could be returned (EINVAL).

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2020-10-19 15:08:42 -05:00
Pavel Begunkov 900fad45dc io_uring: fix racy REQ_F_LINK_TIMEOUT clearing
io_link_timeout_fn() removes REQ_F_LINK_TIMEOUT from the link head's
flags, it's not atomic and may race with what the head is doing.

If io_link_timeout_fn() doesn't clear the flag, as forced by this patch,
then it may happen that for "req -> link_timeout1 -> link_timeout2",
__io_kill_linked_timeout() would find link_timeout2 and try to cancel
it, so miscounting references. Teach it to ignore such double timeouts
by marking the active one with a new flag in io_prep_linked_timeout().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19 13:29:51 -06:00
Pavel Begunkov 4d52f33899 io_uring: do poll's hash_node init in common code
Move INIT_HLIST_NODE(&req->hash_node) into __io_arm_poll_handler(), so
that it doesn't duplicated and common poll code would be responsible for
it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19 13:29:29 -06:00
Pavel Begunkov dd221f46f6 io_uring: inline io_poll_task_handler()
io_poll_task_handler() doesn't add clarity, inline it in its only user.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19 13:29:29 -06:00
Pavel Begunkov 069b89384d io_uring: remove extra ->file check in poll prep
io_poll_add_prep() doesn't need to verify ->file because it's already
done in io_init_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19 13:29:29 -06:00
Pavel Begunkov 2c3bac6dd6 io_uring: make cached_cq_overflow non atomic_t
ctx->cached_cq_overflow is changed only under completion_lock. Convert
it from atomic_t to just int, and mark all places when it's read without
lock with READ_ONCE, which guarantees atomicity (relaxed ordering).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19 13:29:29 -06:00
Pavel Begunkov d148ca4b07 io_uring: inline io_fail_links()
Inline io_fail_links() and kill extra io_cqring_ev_posted().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19 13:29:29 -06:00
Pavel Begunkov ec99ca6c47 io_uring: kill ref get/drop in personality init
Don't take an identity on personality/creds init only to drop it a few
lines after. Extract a function which prepares req->work but leaves it
without identity.

Note: it's safe to not check REQ_F_WORK_INITIALIZED there because it's
nobody had a chance to init it before io_init_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19 13:29:29 -06:00
Pavel Begunkov 2e5aa6cb4d io_uring: flags-based creds init in queue
Use IO_WQ_WORK_CREDS to figure out if req has creds to be used.
Since recently it should rely only on flags, but not value of
work.creds.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-19 13:29:29 -06:00
Tom Rix 76702a2e72 bpf: Remove unneeded break
A break is not needed if it is preceded by a return.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201019173846.1021-1-trix@redhat.com
2020-10-19 20:40:21 +02:00
Chris Wilson 4a9bb58aba drm/i915/gt: Wait for CSB entries on Tigerlake
On Tigerlake, we are seeing a repeat of commit d8f5053117 ("drm/i915/icl:
Forcibly evict stale csb entries") where, presumably, due to a missing
Global Observation Point synchronisation, the write pointer of the CSB
ringbuffer is updated _prior_ to the contents of the ringbuffer. That is
we see the GPU report more context-switch entries for us to parse, but
those entries have not been written, leading us to process stale events,
and eventually report a hung GPU.

However, this effect appears to be much more severe than we previously
saw on Icelake (though it might be best if we try the same approach
there as well and measure), and Bruce suggested the good idea of resetting
the CSB entry after use so that we can detect when it has been updated by
the GPU. By instrumenting how long that may be, we can set a reliable
upper bound for how long we should wait for:

    513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045
References: d8f5053117 ("drm/i915/icl: Forcibly evict stale csb entries")
References: HSDES#22011327657, HSDES#1508287568
Suggested-by: Bruce Chang <yu.bruce.chang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org # v5.4
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk
(cherry picked from commit 233c1ae3c8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 14:32:31 -04:00
Chris Wilson ca05277e40 drm/i915/gt: Widen CSB pointer to u64 for the parsers
A CSB entry is 64b, and it is simpler for us to treat it as an array of
64b entries than as an array of pairs of 32b entries.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk
(cherry picked from commit f24a44e52f)
(cherry picked from commit 3d4dbe0e0f0d04ebcea917b7279586817da8cf46)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 14:31:59 -04:00
Arvind Sankar b17a45b6e5 x86/boot/64: Explicitly map boot_params and command line
Commits

  ca0e22d4f0 ("x86/boot/compressed/64: Always switch to own page table")
  8570978ea0 ("x86/boot/compressed/64: Don't pre-map memory in KASLR code")

set up a new page table in the decompressor stub, but without explicit
mappings for boot_params and the kernel command line, relying on the #PF
handler instead.

This is fragile, as boot_params and the command line mappings are
required for the main kernel. If EARLY_PRINTK and RANDOMIZE_BASE are
disabled, a QEMU/OVMF boot never accesses the command line in the
decompressor stub, and so it never gets mapped. The main kernel accesses
it from the identity mapping if AMD_MEM_ENCRYPT is enabled, and will
crash.

Fix this by adding back the explicit mapping of boot_params and the
command line.

Note: the changes also removed the explicit mapping of the main kernel,
with the result that .bss and .brk may not be in the identity mapping,
but those don't get accessed by the main kernel before it switches to
its own page tables.

 [ bp: Pass boot_params with a MOV %rsp... instead of PUSH/POP. Use
   block formatting for the comment. ]

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20201016200404.1615994-1-nivedita@alum.mit.edu
2020-10-19 19:39:50 +02:00
Al Grant f3d301c1f2 perf: correct SNOOPX field offset
perf_event.h has macros that define the field offsets in the
data_src bitmask in perf records. The SNOOPX and REMOTE offsets
were both 37. These are distinct fields, and the bitfield layout
in perf_mem_data_src confirms that SNOOPX should be at offset 38.

Fixes: 52839e653b ("perf tools: Add support for printing new mem_info encodings")
Signed-off-by: Al Grant <al.grant@foss.arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/4ac9f5cc-4388-b34a-9999-418a4099415d@foss.arm.com
2020-10-19 19:39:22 +02:00
Chris Wilson db9bc2d35f drm/i915: Use the active reference on the vma while capturing
During error capture, we need to take a reference to the vma from before
the reset in order to catpure the contents of the vma later. Currently
we are using both an active reference and a kref, but due to nature of
the i915_vma reference handling, that kref is on the vma->obj and not
the vma itself. This means the vma may be destroyed as soon as it is
idle, that is in between the i915_active_release(&vma->active) and the
i915_vma_put(vma):

<3> [197.866181] BUG: KASAN: use-after-free in intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<3> [197.866339] Read of size 8 at addr ffff8881258cb800 by task gem_exec_captur/1041
<3> [197.866467]
<4> [197.866512] CPU: 2 PID: 1041 Comm: gem_exec_captur Not tainted 5.9.0-g5e4234f97efba-kasan_200+ #1
<4> [197.866521] Hardware name: Intel Corp. Broxton P/Apollolake RVP1A, BIOS APLKRVPA.X64.0150.B11.1608081044 08/08/2016
<4> [197.866530] Call Trace:
<4> [197.866549]  dump_stack+0x99/0xd0
<4> [197.866760]  ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.866783]  print_address_description.constprop.8+0x3e/0x60
<4> [197.866797]  ? kmsg_dump_rewind_nolock+0xd4/0xd4
<4> [197.866819]  ? lockdep_hardirqs_off+0xd4/0x120
<4> [197.867037]  ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.867249]  ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.867270]  kasan_report.cold.10+0x1f/0x37
<4> [197.867492]  ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.867710]  intel_engine_coredump_add_vma+0x36c/0x4a0 [i915]
<4> [197.867949]  i915_gpu_coredump.part.29+0x150/0x7b0 [i915]
<4> [197.868186]  i915_capture_error_state+0x5e/0xc0 [i915]
<4> [197.868396]  intel_gt_handle_error+0x6eb/0xa20 [i915]
<4> [197.868624]  ? intel_gt_reset_global+0x370/0x370 [i915]
<4> [197.868644]  ? check_flags+0x50/0x50
<4> [197.868662]  ? __lock_acquire+0xd59/0x6b00
<4> [197.868678]  ? register_lock_class+0x1ad0/0x1ad0
<4> [197.868944]  i915_wedged_set+0xcf/0x1b0 [i915]
<4> [197.869147]  ? i915_wedged_get+0x90/0x90 [i915]
<4> [197.869371]  ? i915_wedged_get+0x90/0x90 [i915]
<4> [197.869398]  simple_attr_write+0x153/0x1c0
<4> [197.869428]  full_proxy_write+0xee/0x180
<4> [197.869442]  ? __sb_start_write+0x1f3/0x310
<4> [197.869465]  vfs_write+0x1a3/0x640
<4> [197.869492]  ksys_write+0xec/0x1c0
<4> [197.869507]  ? __ia32_sys_read+0xa0/0xa0
<4> [197.869525]  ? lockdep_hardirqs_on_prepare+0x32b/0x4e0
<4> [197.869541]  ? syscall_enter_from_user_mode+0x1c/0x50
<4> [197.869566]  do_syscall_64+0x33/0x80
<4> [197.869579]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4> [197.869590] RIP: 0033:0x7fd8b7aee281
<4> [197.869604] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [197.869613] RSP: 002b:00007ffea3b72008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [197.869625] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd8b7aee281
<4> [197.869633] RDX: 0000000000000002 RSI: 00007fd8b81a82e7 RDI: 000000000000000d
<4> [197.869641] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000034
<4> [197.869650] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd8b81a82e7
<4> [197.869658] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000
<3> [197.869707]
<3> [197.869757] Allocated by task 1041:
<4> [197.869833]  kasan_save_stack+0x19/0x40
<4> [197.869843]  __kasan_kmalloc.constprop.5+0xc1/0xd0
<4> [197.869853]  kmem_cache_alloc+0x106/0x8e0
<4> [197.870059]  i915_vma_instance+0x212/0x1930 [i915]
<4> [197.870270]  eb_lookup_vmas+0xe06/0x1d10 [i915]
<4> [197.870475]  i915_gem_do_execbuffer+0x131d/0x4080 [i915]
<4> [197.870682]  i915_gem_execbuffer2_ioctl+0x103/0x5d0 [i915]
<4> [197.870701]  drm_ioctl_kernel+0x1d2/0x270
<4> [197.870710]  drm_ioctl+0x40d/0x85c
<4> [197.870721]  __x64_sys_ioctl+0x10d/0x170
<4> [197.870731]  do_syscall_64+0x33/0x80
<4> [197.870740]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
<3> [197.870748]
<3> [197.870798] Freed by task 22:
<4> [197.870865]  kasan_save_stack+0x19/0x40
<4> [197.870875]  kasan_set_track+0x1c/0x30
<4> [197.870884]  kasan_set_free_info+0x1b/0x30
<4> [197.870894]  __kasan_slab_free+0x111/0x160
<4> [197.870903]  kmem_cache_free+0xcd/0x710
<4> [197.871109]  i915_vma_parked+0x618/0x800 [i915]
<4> [197.871307]  __gt_park+0xdb/0x1e0 [i915]
<4> [197.871501]  ____intel_wakeref_put_last+0xb1/0x190 [i915]
<4> [197.871516]  process_one_work+0x8dc/0x15d0
<4> [197.871525]  worker_thread+0x82/0xb30
<4> [197.871535]  kthread+0x36d/0x440
<4> [197.871545]  ret_from_fork+0x22/0x30
<3> [197.871553]
<3> [197.871602] The buggy address belongs to the object at ffff8881258cb740
 which belongs to the cache i915_vma of size 968

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2553
Fixes: 2850748ef8 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201016092527.29039-1-chris@chris-wilson.co.uk
(cherry picked from commit 178536b829)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 13:29:57 -04:00
Chris Wilson 64402570e1 drm/i915/gt: Undo forced context restores after trivial preemptions
We may try to preempt the currently executing request, only to find that
after unravelling all the dependencies that the original executing
context is still the earliest in the topological sort and re-submitted
back to HW (if we do detect some change in the ELSP that requires
re-submission). However, due to the way we check for wrap-around during
the unravelling, we mark any context that has been submitted just once
(i.e. with the rq->wa_tail set, but the ring->tail earlier) as
potentially wrapping and requiring a forced restore on resubmission.
This was expected to be not a problem, as it was anticipated that most
unwinding for preemption would result in a context switch and the few
that did not would be lost in the noise. It did not take long for
someone to find one particular workload where the cost of those extra
context restores was measurable.

However, since we know the wa_tail is of fixed size, and we know that a
request must be larger than the wa_tail itself, we can safely maintain
the check for request wrapping and check against a slightly future point
in the ring that includes an expected wa_tail. (That is if the
ring->tail is already set to rq->wa_tail, including another 8 bytes in
the check does not invalidate the incremental wrap detection.)

Fixes: 8ab3a3812a ("drm/i915/gt: Incrementally check for rewinding")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
(cherry picked from commit bb65548e3c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 13:29:55 -04:00
Chris Wilson 9b99e5ba3e drm/i915/gt: Delay execlist processing for tgl
When running gem_exec_nop, it floods the system with many requests (with
the goal of userspace submitting faster than the HW can process a single
empty batch). This causes the driver to continually resubmit new
requests onto the end of an active context, a flood of lite-restore
preemptions. If we time this just right, Tigerlake hangs.

Inserting a small delay between the processing of CS events and
submitting the next context, prevents the hang. Naturally it does not
occur with debugging enabled. The suspicion then is that this is related
to the issues with the CS event buffer, and inserting an mmio read of
the CS pointer status appears to be very successful in preventing the
hang. Other registers, or uncached reads, or plain mb, do not prevent
the hang, suggesting that register is key -- but that the hang can be
prevented by a simple udelay, suggests it is just a timing issue like
that encountered by commit 233c1ae3c8 ("drm/i915/gt: Wait for CSB
entries on Tigerlake"). Also note that the hang is not prevented by
applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
between requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
(cherry picked from commit 6ca7217dff)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 13:29:52 -04:00
Chris Wilson d5e8782129 drm/i915/gem: Support parsing of oversize batches
Matthew Auld noted that on more recent systems (such as the parser for
gen9) we may have objects that are larger than expected by the GEM uAPI
(i.e. greater than u32). These objects would have incorrect implicit
batch lengths, causing the parser to reject them for being incomplete,
or worse.

Based on a patch by Matthew Auld.

Reported-by: Matthew Auld <matthew.auld@intel.com>
Fixes: 435e8fc059 ("drm/i915: Allow parsing of unsized batches")
Testcase: igt/gem_exec_params/larger-than-life-batch
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20201015115954.871-1-chris@chris-wilson.co.uk
(cherry picked from commit 57b2d834bf)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 13:29:50 -04:00
Ville Syrjälä 1664ffee76 drm/i915: Mark ininitial fb obj as WT on eLLC machines to avoid rcu lockup during fbdev init
Currently we leave the cache_level of the initial fb obj
set to NONE. This means on eLLC machines the first pin_to_display()
will try to switch it to WT which requires a vma unbind+bind.
If that happens during the fbdev initialization rcu does not
seem operational which causes the unbind to get stuck. To
most appearances this looks like a dead machine on boot.

Avoid the unbind by already marking the object cache_level
as WT when creating it. We still do an excplicit ggtt pin
which will rewrite the PTEs anyway, so they will match whatever
cache level we set.

Cc: <stable@vger.kernel.org> # v5.7+
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-1-ville.syrjala@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-1-chris@chris-wilson.co.uk
(cherry picked from commit d46b60a2e8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2020-10-19 13:29:47 -04:00