Commit Graph

1014947 Commits

Author SHA1 Message Date
Linus Torvalds 224478289c ARM fixes:
* Another state update on exit to userspace fix
 
 * Prevent the creation of mixed 32/64 VMs
 
 * Fix regression with irqbypass not restarting the guest on failed connect
 
 * Fix regression with debug register decoding resulting in overlapping access
 
 * Commit exception state on exit to usrspace
 
 * Fix the MMU notifier return values
 
 * Add missing 'static' qualifiers in the new host stage-2 code
 
 x86 fixes:
 * fix guest missed wakeup with assigned devices
 
 * fix WARN reported by syzkaller
 
 * do not use BIT() in UAPI headers
 
 * make the kvm_amd.avic parameter bool
 
 PPC fixes:
 * make halt polling heuristics consistent with other architectures
 
 selftests:
 * various fixes
 
 * new performance selftest memslot_perf_test
 
 * test UFFD minor faults in demand_paging_test
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCyF0MUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOHSgf/Q4Hm5e12Bj2xJy6A+iShnrbbT8PW
 hcIIOA7zGWXfjVYcBV7anbj7CcpzfIz0otcRBABa5mkhj+fb3YmPEb0EzCPi4Hru
 zxpcpB2w7W7WtUOIKe2EmaT+4Pk6/iLcfr8UMHMqx460akE9OmIg10QNWai3My/3
 RIOeakSckBI9e/1TQZbxH66dsLwCT0lLco7i7AWHdFxkzUQyoA34HX5pczOCBsO5
 3nXH+/txnRVhqlcyzWLVVGVzFqmpHtBqkIInDOXfUqIoxo/gOhOgF1QdMUEKomxn
 5ZFXlL5IXNtr+7yiI67iHX7CWkGZE9oJ04TgPHn6LR6wRnVvc3JInzcB5Q==
 =ollO
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "ARM fixes:

   - Another state update on exit to userspace fix

   - Prevent the creation of mixed 32/64 VMs

   - Fix regression with irqbypass not restarting the guest on failed
     connect

   - Fix regression with debug register decoding resulting in
     overlapping access

   - Commit exception state on exit to usrspace

   - Fix the MMU notifier return values

   - Add missing 'static' qualifiers in the new host stage-2 code

  x86 fixes:

   - fix guest missed wakeup with assigned devices

   - fix WARN reported by syzkaller

   - do not use BIT() in UAPI headers

   - make the kvm_amd.avic parameter bool

  PPC fixes:

   - make halt polling heuristics consistent with other architectures

  selftests:

   - various fixes

   - new performance selftest memslot_perf_test

   - test UFFD minor faults in demand_paging_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits)
  selftests: kvm: fix overlapping addresses in memslot_perf_test
  KVM: X86: Kill off ctxt->ud
  KVM: X86: Fix warning caused by stale emulation context
  KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
  KVM: x86/mmu: Fix comment mentioning skip_4k
  KVM: VMX: update vcpu posted-interrupt descriptor when assigning device
  KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK
  KVM: x86: add start_assignment hook to kvm_x86_ops
  KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch
  selftests: kvm: do only 1 memslot_perf_test run by default
  KVM: X86: Use _BITUL() macro in UAPI headers
  KVM: selftests: add shared hugetlbfs backing source type
  KVM: selftests: allow using UFFD minor faults for demand paging
  KVM: selftests: create alias mappings when using shared memory
  KVM: selftests: add shmem backing source type
  KVM: selftests: refactor vm_mem_backing_src_type flags
  KVM: selftests: allow different backing source types
  KVM: selftests: compute correct demand paging size
  KVM: selftests: simplify setup_demand_paging error handling
  KVM: selftests: Print a message if /dev/kvm is missing
  ...
2021-05-29 06:02:25 -10:00
Linus Torvalds 866c4b8a18 - Fix races in vfio-ccw request handling.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmCyA4sACgkQjYWKoQLX
 FBjddQf+MWC5uLlKarG4BUb38iH/iCMEw3+euYiUIMrGNskjXPUbA+8u7XVVG/iY
 DccOtnUj70qKIYSCC37IItL45Xe6+oDU2cSoL1gsNLPgsgW2Y9W4opu6+WtS0i8B
 fXi/l25zefZbHLa9WM5GcXfBILSQOjVBS59zxA+jeN3Ap8gHwJsuwRXBM2xeDbP4
 CloRS/6Q/0tgBp1uMCfKFAO60UzwnyeR7ULrEpjk3f++maUKEIO2GnpFK5ly3kD/
 PcTqsljspjKeD4879n0XNBd4+zcUMQjw02Afnts0FZaypuScWJSesQ6NWuAIJ57v
 glp7IBlino2U4kcvuyAa5BK3ZRZPyA==
 =Dp3D
 -----END PGP SIGNATURE-----

Merge tag 's390-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:
 "Fix races in vfio-ccw request handling"

* tag 's390-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  vfio-ccw: Serialize FSM IDLE state with I/O completion
  vfio-ccw: Reset FSM state to IDLE inside FSM
  vfio-ccw: Check initialized flag in cp_init()
2021-05-29 05:51:53 -10:00
Paolo Bonzini 000ac42953 selftests: kvm: fix overlapping addresses in memslot_perf_test
vm_create allocates memory and maps it close to GPA.  This memory
is separate from what is allocated in subsequent calls to
vm_userspace_mem_region_add, so it is incorrect to pass the
test memory size to vm_create_default.  Just pass a small
fixed amount of memory which can be used later for page table,
otherwise GPAs are already allocated at MEM_GPA and the
test aborts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-29 06:28:06 -04:00
Thomas Gleixner 7d65f9e806 x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
PIC interrupts do not support affinity setting and they can end up on
any online CPU. Therefore, it's required to mark the associated vectors
as system-wide reserved. Otherwise, the corresponding irq descriptors
are copied to the secondary CPUs but the vectors are not marked as
assigned or reserved. This works correctly for the IO/APIC case.

When the IO/APIC is disabled via config, kernel command line or lack of
enumeration then all legacy interrupts are routed through the PIC, but
nothing marks them as system-wide reserved vectors.

As a consequence, a subsequent allocation on a secondary CPU can result in
allocating one of these vectors, which triggers the BUG() in
apic_update_vector() because the interrupt descriptor slot is not empty.

Imran tried to work around that by marking those interrupts as allocated
when a CPU comes online. But that's wrong in case that the IO/APIC is
available and one of the legacy interrupts, e.g. IRQ0, has been switched to
PIC mode because then marking them as allocated will fail as they are
already marked as system vectors.

Stay consistent and update the legacy vectors after attempting IO/APIC
initialization and mark them as system vectors in case that no IO/APIC is
available.

Fixes: 69cde0004a ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Imran Khan <imran.f.khan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com
2021-05-29 11:41:14 +02:00
Linus Torvalds 6799d4f2da SCSI fixes on 20210528
10 small fixes, all in drivers.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYLF1vSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSaWAP94iRVJ
 DUTEuUl8RCvwyBqW/K0wF2AfE96z5arYYYNfjwD/Y3Zcf5iGyVTmGXH9SgM0jQTs
 qXFcjrsqhZLzA6R50QU=
 =UNqY
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Ten small fixes, all in drivers"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: qla2xxx: Wait for stop_phase1 at WWN removal
  scsi: hisi_sas: Drop free_irq() of devm_request_irq() allocated irq
  scsi: vmw_pvscsi: Set correct residual data length
  scsi: bnx2fc: Return failure if io_req is already in ABTS processing
  scsi: aic7xxx: Remove multiple definition of globals
  scsi: aic7xxx: Restore several defines for aic7xxx firmware build
  scsi: target: iblock: Fix smp_processor_id() BUG messages
  scsi: libsas: Use _safe() loop in sas_resume_port()
  scsi: target: tcmu: Fix xarray RCU warning
  scsi: target: core: Avoid smp_processor_id() in preemptible code
2021-05-28 14:47:48 -10:00
Linus Torvalds 0217a27e4d block-5.13-2021-05-28
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCxY3UQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkwuD/9myUaQum6BWTam0eaTL/J7c0eq3Eagmb+V
 q0mlfL1B/vfiQcg7BDPYe7cgdOjIv1z2/ucif3y7SL0VbyuTe4a81PrNh74EKAMQ
 z5cfRpPpHoeOu92C/cWMq/eOJOkeiiBXCpss1TrpJOw7mYbCpAGFYjCyBj5DY70m
 K36gkspt+mV11kufhDjyUugIbuODaj7m6FJx8R0Jj28mCNiN40zTbPbUJUAe/Wqp
 LyyFxPwYmuMneNky4LIGmC34uao9UYq7IUBuKsjuFpyU+UExhb0Re2ls7voSgpP4
 baXnHSyqWaDlchULCFcq1oI0jj05uKP4JO7/HOmpYw3+8Gf6Z5z10+iU5sikusxM
 KVyh1qb84zFJAuYsVtsMX1nPjXGk2DU7u5NWdvmYRH69FYV2dV1uviRNVPd7p1LX
 D00oNLxsXgUmtPwyHfOtY400BwWdFCM1SpTq9vaQ02TT4q5mGwBcAg+o2h/dsjb1
 Ru2Gl5UU6jmeJLMEgxks0mNKclSwgexAaiWRrRINcOWqqChB9+4hGKm7DI7HBMov
 5xN9MkMRMpNovUFTqkCT+jdXqYKjG1vojWBr0S2UVZNvVivARvfXrtrDa3PzTb4L
 7ysrjeUenhL0kJY7W+6yKqKQ1QCtaCfgajW+uiQU+dyo97yviNYE47Z3aqATeh1l
 wTBTaXfOMg==
 =jZJD
 -----END PGP SIGNATURE-----

Merge tag 'block-5.13-2021-05-28' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request (Christoph):
      - fix a memory leak in nvme_cdev_add (Guoqing Jiang)
      - fix inline data size comparison in nvmet_tcp_queue_response (Hou
        Pu)
      - fix false keep-alive timeout when a controller is torn down
        (Sagi Grimberg)
      - fix a nvme-tcp Kconfig dependency (Sagi Grimberg)
      - short-circuit reconnect retries for FC (Hannes Reinecke)
      - decode host pathing error for connect (Hannes Reinecke)

 - MD pull request (Song):
      - Fix incorrect chunk boundary assert (Christoph)

 - Fix s390/dasd verification panic (Stefan)

* tag 'block-5.13-2021-05-28' of git://git.kernel.dk/linux-block:
  nvmet: fix false keep-alive timeout when a controller is torn down
  nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_response
  nvme-tcp: remove incorrect Kconfig dep in BLK_DEV_NVME
  md/raid5: remove an incorrect assert in in_chunk_boundary
  s390/dasd: add missing discipline function
  nvme-fabrics: decode host pathing error for connect
  nvme-fc: short-circuit reconnect retries
  nvme: fix potential memory leaks in nvme_cdev_add
2021-05-28 14:42:37 -10:00
Linus Torvalds b3dbbae609 io_uring-5.13-2021-05-28
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCxY4wQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpqJnD/sEHg2ZVzc3CUtvLI11C+O4nkqzUpetOD8I
 iKtvCYKYNTATOPLGQjsznNTTVcUhN4Mud9XWHjyR3nli98fwRrzLuK3EfJjuq1cL
 v6DZVuYKq4k6s0QN6K8yTMslYBQTmk85l8rvXs06jVqDadnnVc+JdfWWBDducs0e
 56Wtmlse18PhzfDjqtsjAOQBjpv4bhQaJTrYOHcEIqFiih2ZpSvyP3SLED7/nvoe
 Q8MNF0Htff/oVbUEzp/NfhHoOFIZ17wwPV3fRC7zat2Dp4R9ZxpScmozLn8PkdO9
 DW+rKpuCbYTYwY1p11cQ5EhiNWNfPMxX4YXovUP9z+M2cgGUK1IhWQRM83L9bAXt
 r/9Md5WjnNpeDr6/YW6uMe1lOrrEy2ZJfNJ2JJbiXo6CWiz+g2qfHLOxwVsEnfoy
 vZoSbDD8ItZDooaXDFGEp1PLpkka4vt/6Ebg0fUtEeG8QQ48eG5L9xpPMSjm90y9
 /UKZdS1pvSl/x6he+RDPg4aVGBWIhGJhv+Q22hNTO3g5u5QE+hXLvFh0QvoOkDQK
 FGlhIa431EiOdm3rdFCG2I4kH1QzQTO6XLHpoVabGXJULPvS2ztnHCz3pYqOU9w1
 Mh12t1RtWzvcTkyOutfsjVqszV3kTl6O6GkI8CiqqjomnbbfORj6CDsi7h9RFZI+
 HtnY2GbSJg==
 =dfLl
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.13-2021-05-28' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few minor fixes:

   - Fix an issue with hashed wait removal on exit (Zqiang, Pavel)

   - Fix a recent data race introduced in this series (Marco)"

* tag 'io_uring-5.13-2021-05-28' of git://git.kernel.dk/linux-block:
  io_uring: fix data race to avoid potential NULL-deref
  io-wq: Fix UAF when wakeup wqe in hash waitqueue
  io_uring/io-wq: close io-wq full-stop gap
2021-05-28 14:35:55 -10:00
Linus Torvalds 567d1fd853 drm fixes for 5.13-rc4
ttm:
 - prevent irrelevant swapout
 
 amdgpu:
 - MultiGPU fan fix
 - VCN powergating fixes
 
 amdkfd:
 - Fix SDMA register offset error
 
 meson:
 - fix shutdown crash
 
 i915:
 - Re-enable LTTPR non-transparent LT mode for DPCD_REV<1.4
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmCxS2kACgkQDHTzWXnE
 hr74KA//XR5fhlA7RNDfIL+VBxEoEde5qmweUKPs4cQ8564/XimrmVAfWxoZTImw
 Y2mOqmOtyayyRmpb8McTyZa5Ev1qBW0DLuD55JBBEcObNKsqPqoM1zm7KpmfCP8Q
 QoltLluIMOae9XgEuQQkfUatSfW2miG384EsFydNrxgNHnAhLJXEVevFyZnudlJG
 qnj79t3QU3KvLpz6AE8Fv/Em4uKS7TMF+BjFuREKC31cCXbIV+48vWuhB+Ja0+wj
 afdhhC93LiMgQ2s0LqglSuAOzu2Sd4yHCvAZij0W794VeEOU9DURy/bP1o8QJROd
 /rduex54VLmaW0263REb0l91sfcEVyZiPIm9rLWGr3Su6QChPkDlFUShy1L9OltQ
 WhVgdmar5RCVWnecvNwJUWRh3bGoxiFHBDVfFQ2AGE2eFCRQ4FZNfgRD4Od0T5yP
 dR2rDVhzXMykesy7HrKdwpN2Y2PvmCC3vAheoDgkRJztuVHjNSH62hzfonwSvdkW
 m0BdDA1CXSV4WvbjAH/WWfnEj6p3Jc6KuWu5b1BmMJxzvBQ6I5l7d+EOgy39g7/a
 HTgQbV+S6Jes2wYMZghfPwQ30YAc81QuU6uPd3ZSGN45Vo5QrFqEqj/zK4lxvQQD
 LTxjnGVpN+4BtJLE4WzJLulNf1ATlEG2t5d5MXjqngaQTz3E9is=
 =Uyrv
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2021-05-29' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Pretty quiet this week, couple of amdgpu, one i915, and a few misc otherwise.

  ttm:
   - prevent irrelevant swapout

  amdgpu:
   - MultiGPU fan fix
   - VCN powergating fixes

  amdkfd:
   - Fix SDMA register offset error

  meson:
   - fix shutdown crash

  i915:
   - Re-enable LTTPR non-transparent LT mode for DPCD_REV < 1.4"

* tag 'drm-fixes-2021-05-29' of git://anongit.freedesktop.org/drm/drm:
  drm/ttm: Skip swapout if ttm object is not populated
  drm/i915: Reenable LTTPR non-transparent LT mode for DPCD_REV<1.4
  drm/meson: fix shutdown crash when component not probed
  drm/amdgpu/jpeg3: add cancel_delayed_work_sync before power gate
  drm/amdgpu/jpeg2.5: add cancel_delayed_work_sync before power gate
  drm/amdgpu/jpeg2.0: add cancel_delayed_work_sync before power gate
  drm/amdgpu/vcn3: add cancel_delayed_work_sync before power gate
  drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gate
  drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gate
  drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  drm/amdkfd: correct sienna_cichlid SDMA RLC register offset error
  drm/amd/pm: correct MGpuFanBoost setting
2021-05-28 14:28:58 -10:00
Linus Torvalds f289d99045 perf tools fixes for v5.12: 3rd batch
- Fix error checking of BPF prog attachment in 'perf stat'.
 
 - Fix getting maximum number of fds in the vendor events JSON parser.
 
 - Move debug initialization earlier, fixing a segfault in some cases.
 
 - Fix eventcode of power10 JSON events.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYLFRCQAKCRCyPKLppCJ+
 JzhmAQCY5szt0eyPpkDVn0c4vP26E1w/pP0EpYWhqhjilP6NFQEAxoQuhwxwzyjv
 bdWgAyjKJ6Qt1jYKUK+A0nSyYNW4UAI=
 =kN6V
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v5.13-2021-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix error checking of BPF prog attachment in 'perf stat'.

 - Fix getting maximum number of fds in the vendor events JSON parser.

 - Move debug initialization earlier, fixing a segfault in some cases.

 - Fix eventcode of power10 JSON events.

* tag 'perf-tools-fixes-for-v5.13-2021-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf vendor events powerpc: Fix eventcode of power10 JSON events
  perf stat: Fix error check for bpf_program__attach
  perf debug: Move debug initialization earlier
  perf jevents: Fix getting maximum number of fds
2021-05-28 14:23:05 -10:00
Linus Torvalds 7c0ec89d31 3 SMB3 fixes, two for stable, and the other fixes a problem pointed out with a recently added ioctl
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmCxbv4ACgkQiiy9cAdy
 T1GkPwwAqq0tvNDZnQ6aAur1jwHMiOIAydvpgNKXlKkiXYu+qQbMRgOdUsOtjM00
 Idi8PHKkID2X63HeiwLwoQTfTXGcs6I4UM1iOslEs2ZaX+Fkgo5PG25lIFRTAsqR
 tqYGqGi6yL6TWE7PlVJxr3QuwGeMr7B8X9A0lTZ7YJwslhByK8ymasPdF+jSgPQI
 zDOuAeXiYvlph8sCftWX7gF34aBfKgiH8LhA6M2SY5S16g7LwtXUJjq1PJactoD3
 +nEPyCtRoN6ohScKNVnM8JDpOKIrM+mJ42RG28ZLo6//8so0SFcUdC8VhECBxOWL
 9WkoL2GxRV0LoRnzCZS30EpAi/eQU+QlTrPueGp+n8GjauJMDPoxJ2l6UXox6CLm
 8CqwxKATG6WbrdcGhaVbIxVbAWC7Ze271C/7L61R5K+RmDTXc6jI4vAIw1Pib4o+
 CG6XtxHya5PM0zvyLgU28M6aY+WExbwnkSQKvI2FJZkOVG0xdFCy2O1QLLRBChmn
 a6hsA05a
 =8nZ2
 -----END PGP SIGNATURE-----

Merge tag '5.13-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Three SMB3 fixes.

  Two for stable, and the other fixes a problem pointed out with a
  recently added ioctl"

* tag '5.13-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: change format of CIFS_FULL_KEY_DUMP ioctl
  cifs: fix string declarations and assignments in tracepoints
  cifs: set server->cipher_type to AES-128-CCM for SMB3.0
2021-05-28 14:15:47 -10:00
Jakub Kicinski 6850ec9737 Merge branch 'mptcp-fixes-for-5-13'
Mat Martineau says:

====================
mptcp: Fixes for 5.13

These patches address two issues in MPTCP.

Patch 1 fixes a locking issue affecting MPTCP-level retransmissions.

Patches 2-4 improve handling of out-of-order packet arrival early
in a connection, so it falls back to TCP rather than forcing a
reset. Includes a selftest.
====================

Link: https://lore.kernel.org/r/20210527233140.182728-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28 13:51:43 -07:00
Paolo Abeni 69ca3d29a7 mptcp: update selftest for fallback due to OoO
The previous commit noted that we can have fallback
scenario due to OoO (or packet drop). Update the self-tests
accordingly

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28 13:51:40 -07:00
Paolo Abeni dea2b1ea9c mptcp: do not reset MP_CAPABLE subflow on mapping errors
When some mapping related errors occurs we close the main
MPC subflow with a RST. We should instead fallback gracefully
to TCP, and do the reset only for MPJ subflows.

Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/192
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28 13:51:40 -07:00
Paolo Abeni 06f9a435b3 mptcp: always parse mptcp options for MPC reqsk
In subflow_syn_recv_sock() we currently skip options parsing
for OoO packet, given that such packets may not carry the relevant
MPC option.

If the peer generates an MPC+data TSO packet and some of the early
segments are lost or get reorder, we server will ignore the peer key,
causing transient, unexpected fallback to TCP.

The solution is always parsing the incoming MPTCP options, and
do the fallback only for in-order packets. This actually cleans
the existing code a bit.

Fixes: d22f4988ff ("mptcp: process MP_CAPABLE data option")
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28 13:51:40 -07:00
Paolo Abeni b5941f066b mptcp: fix sk_forward_memory corruption on retransmission
MPTCP sk_forward_memory handling is a bit special, as such field
is protected by the msk socket spin_lock, instead of the plain
socket lock.

Currently we have a code path updating such field without handling
the relevant lock:

__mptcp_retrans() -> __mptcp_clean_una_wakeup()

Several helpers in __mptcp_clean_una_wakeup() will update
sk_forward_alloc, possibly causing such field corruption, as reported
by Matthieu.

Address the issue providing and using a new variant of blamed function
which explicitly acquires the msk spin lock.

Fixes: 64b9cea7a0 ("mptcp: fix spurious retransmissions")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/172
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28 13:51:39 -07:00
Linus Torvalds 5ff2756afd NFS client bugfixes for Linux 5.13
Highlights include:
 
 Stable fixes
 - Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
 - Fix Oops in xs_tcp_send_request() when transport is disconnected
 - Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
 
 Bugfixes
 - Fix instances where signal_pending() should be fatal_signal_pending()
 - fix an incorrect limit in filelayout_decode_layout()
 - Fixes for the SUNRPC backlogged RPC queue
 - Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
 - Revert commit 586a0787ce ("Clean up rpcrdma_prepare_readch()")
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmCvomgACgkQZwvnipYK
 APLg6xAAqlR/HLNYOLAToQ6d9wzrL6Po3x8Lx7VURjCkFaoKB/jMq3Zbu/K8mZ+X
 CC6/XFtB9AikloK7sle6lRPuwwPL6y+vOML0Ais/dkYPNkbhe9ylf0rsYQiPljXT
 8PAcqn8FXTZ9fKpKU8Quw24X1Jfkk6zUEeMy50HYDBfTx+gYEojMEKa6cl4URGzO
 2JpuBO4Ku/vWDOPj7bWBX9wi7wkrJjGSYDnx1A5SOgUdV87H8VJkbTo9vVdEwFoE
 OtE8MQmFhdton0u9+MKImFQdVfxoYLB1Ig1G45NXGHee91dwfYU0U05THj7E/xP9
 RQWtmJcKdvY1w8sRK/PNEHo43Vkow4usffSrIWNBZ6aO5EkbQFn1tmKMSDtsrkZ2
 ONMfKBiEhhQSy+QRXMR/RC86t4dsQ8SApu62qQT4VuuXqzYhrBum2DqkW0X6Zcti
 gi17+PfjRbgWNvul2yegBvDU016H324aCeT9nfWe0D9iwF7tPK4xsuNTYrWwbFOA
 YFAecIXoyBRtbIV6NZ95/+P5HEFBLAYewEVLpdAOBGQ9fjO023ERiC2sitl5P+ku
 v6V+4HAtBgcPfm/8BZwUYUYBpXnnTZFTizqdJdGGydPXeC671gANPJe6e4xOttCK
 frXFGd9OOqPSXdsRZLUVvhczTOFOGa/UVVG0GxIr4ggy8oKjDMk=
 =66ks
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
"Stable fixes:
   - Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
   - Fix Oops in xs_tcp_send_request() when transport is disconnected
   - Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()

  Bugfixes:
   - Fix instances where signal_pending() should be fatal_signal_pending()
   - fix an incorrect limit in filelayout_decode_layout()
   - Fixes for the SUNRPC backlogged RPC queue
   - Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
   - Revert commit 586a0787ce ("Clean up rpcrdma_prepare_readch()")"

* tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: Remove trailing semicolon in macros
  xprtrdma: Revert 586a0787ce
  NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
  NFS: Clean up reset of the mirror accounting variables
  NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
  NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
  SUNRPC: More fixes for backlog congestion
  SUNRPC: Fix Oops in xs_tcp_send_request() when transport is disconnected
  NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
  SUNRPC in case of backlog, hand free slots directly to waiting task
  pNFS/NFSv4: Remove redundant initialization of 'rd_size'
  NFS: fix an incorrect limit in filelayout_decode_layout()
  fs/nfs: Use fatal_signal_pending instead of signal_pending
2021-05-28 08:53:19 -10:00
Linus Torvalds fc683f967a sound fixes for 5.12-rc4
A slightly high volume at this time due to pending ASoC fixes.
 While there are a few generic simple-card fixes for regressions,
 most of the changes are device-specific fixes: ASoC Intel SOF, codec
 clocks, other codec / platform fixes as well as usual HD-audio and
 USB-audio.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmCvcpIOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE8tGxAArLWNZYYN+iWlj/Ag1uIryLYAWi7/rXM3P6hF
 2fyvjHq405I80V47EsfQJc4N4WglHuOTatl0U7+OYqgvg1PqwUVgwGUj1zRjXdfS
 JN8waSfsd5F9+7UaLNzwx3FRl/mkm58ATtDfbvOg4qVxRWxBMVOQR/1NY0BwyGkO
 nZHe9BH7GrwucqGnjyy2Uoc1hMkoUZ27rLhqfGDbyRGPoy3k6ovvo+DnHIord9FO
 hyXJpWZyuxGdj9KBSREDRu0Xu2BZVw9t5YOl6o+0Y9AHKlA0vnjASpA/vC1u5tOO
 XG2uEUq2eKJkFj/XOdrEethNQE4rFSt4ClPGnEJFDHtmgG7tC3pqkul9o7vbpC8l
 evclr5VkhYdYCyoV5KXiXvRZRCkzm+s2LkHn8IUU3Hub+od3sZKMpf2E1ODN7RU9
 sCZ1dcHG0cKcecZOMlQ2dyQJNx6/kbhg0MGlPIZFqHX9s9VMw5VFl6BE5urXKbQX
 W8wi0NnCJ/+p3pEijTN0V+NXqmjt4i02/HRut7XZWVDt5CDAkszXCjlaMbWf+9eF
 ayttOFbFQMdKA9GsK7dQsRJVVNp7mQ548E3WrzYhRKvUeB2UDbGaa43cHSTacEmE
 YXmWUkpBFS8SCT52EMPIkk2NnG1XletzH+mc/JieOgvUG+Q/jiNuqth+BXrTu+CT
 CcxfZhA=
 =AIkz
 -----END PGP SIGNATURE-----

Merge tag 'sound-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A slightly high volume at this time due to pending ASoC fixes.

  While there are a few generic simple-card fixes for regressions, most
  of the changes are device-specific fixes: ASoC Intel SOF, codec
  clocks, other codec / platform fixes as well as usual HD-audio and
  USB-audio"

* tag 'sound-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (37 commits)
  ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 17 G8
  ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 15 G8
  ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook G8
  ALSA: hda/realtek: fix mute/micmute LEDs for HP 855 G8
  ALSA: hda/realtek: Chain in pop reduction fixup for ThinkStation P340
  ALSA: usb-audio: scarlett2: snd_scarlett_gen2_controls_create() can be static
  ALSA: hda/realtek: the bass speaker can't output sound on Yoga 9i
  ALSA: hda/realtek: Headphone volume is controlled by Front mixer
  ALSA: usb-audio: scarlett2: Improve driver startup messages
  ALSA: usb-audio: scarlett2: Fix device hang with ehci-pci
  ALSA: usb-audio: fix control-request direction
  ASoC: qcom: lpass-cpu: Use optional clk APIs
  ASoC: cs35l33: fix an error code in probe()
  ASoC: SOF: Intel: hda: don't send DAI_CONFIG IPC for older firmware
  ASoC: fsl: fix SND_SOC_IMX_RPMSG dependency
  ASoC: cs42l52: Minor tidy up of error paths
  ASoC: cs35l32: Add missing regmap use_single config
  ASoC: cs35l34: Add missing regmap use_single config
  ASoC: cs42l73: Add missing regmap use_single config
  ASoC: cs53l30: Add missing regmap use_single config
  ...
2021-05-28 08:47:50 -10:00
Linus Torvalds 8508b97ae2 Clang feature fixes for v5.13-rc4
- Correctly pass stack frame size checking under LTO (Nick Desaulniers)
 
 - Avoid CFI mismatches by checking initcall_t types (Marco Elver)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmCxJpgACgkQiXL039xt
 wCbvEg//XBvNxYIAEGhj2rw9MR88FaKNl/6P1RcvSGt6C+Yqr1bls2Dce+tLXKwb
 T2KCflc4aSwDjZAWTH+RKT55bZpqJmhOpgMpR8pitXpoeSf9hxT2N7xWfNmAVwl8
 yuZBuWNLK6G7EJv8muBqpnt6GPgdLFOKBY6EEKVIx3Ms+BqDllWGFgVnVO0vNA2V
 yHULddGXr1DD8rZI0M9/orCLHgZ61rNmglPQsndA4wAYw3SD9ZdX5So3UwngRJdN
 SP3Z3hrru7xdKKhMHyqm63rjplbPzrKbQn0/OeUIeBTN2EYIGJOc1JH+L1UXpL1V
 ZsFj/zC1D3o4E70fTYhpxbUIMmAJ4gpBBG1pjOao7dK8wT4mL41+i75nGuSEfyeq
 +1xq4Q78mvLminz6ugbpIq80prjkTe+CrvO6LUro3WxVYz5WRnZPUf1wv2YiN+P5
 PMaWh6g8cLGzg4BZ96F0E4R3d6w4UACo5jerPw6pgBP5ChNHRoMkEatGCjYqiSk+
 6T3Cex7M1BEg7aznL1GavtkRYKVnUg3bO/Xc5WLZoriV/W+ToVanJ6kIpk5xM8Um
 ylMDSiv50AzHkKYxUvec9Kx2hknoqDMzOWQShFYqSuNP7XLiilYxicnjb/lZ4s9Z
 EazlpCw+bI/d93qIdLZH6bwXD99IfmfWdL9Kg7Woqp42lk3PN+8=
 =NgsV
 -----END PGP SIGNATURE-----

Merge tag 'clang-features-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull clang feature fixes from Kees Cook:

 - Correctly pass stack frame size checking under LTO (Nick Desaulniers)

 - Avoid CFI mismatches by checking initcall_t types (Marco Elver)

* tag 'clang-features-v5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  Makefile: LTO: have linker check -Wframe-larger-than
  init: verify that function is initcall_t at compile-time
2021-05-28 08:31:48 -10:00
Linus Torvalds afdd14704d - fixed function/preempt traces hangs
- a few build fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmCw2McaHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHChmg//WYL4LeyHJd7UjA4YplgP
 J+DKFm3AlcbugD0kUm8MhiX1Dt9L1oM+NTrm+/uyHi5Vsp8ReV2h+iUudLwoE8tD
 MP4IarvoOXFj1VmRQkFOfJbHq+An7k9kNBcq16DIfUtD07GRFzCgVyQix1/osRzg
 vg7FP/UOMwI9DAkN67CieWk75mmzzcC3xDaoTC1dcTG2tuUxwXzmw2JcskzLfbGk
 66Er9O476v6JxQu24GL3Ls15L426YLhAQ5vS4vBrHRXRgtjpr2B4ZE95jZrUQdbQ
 omsTDFNeViTOQgLeF1b/eCU6cZqbOLOUd9GVWVfgsEZCk8qXu14jCWCohVPri9w1
 NYzmOhyvvxpMzL0/rKoljBV01eJXrA66v/abUwDhbBhAt5qowjIK0OmRcDVFVrHX
 XA/xM0R//BxdR4idXtp/kVIOxfRgUIYptcqFICMlvaHIeGvtt8vVL4SZVMHXNwYK
 L5hiIH5z8Sc1MxLPYuCxqveDHKdm8wPRqEEgThT7UNjEgLBPtvarRj0mQ9PQe4xr
 8YVPkSeaZNRsRuSKNBCYopo+iqbipyR34lTTc/IfJULaGE6CB9PAk6qyf5GIFycZ
 3HK8SLec5hI4dfPZ0y40xVZLNBMFKcCI4hKUJV+o/t0btNP1xgBCQkdAFJ6o34iu
 eYare6N46pJOzHHN/nlEaew=
 =c2eW
 -----END PGP SIGNATURE-----

Merge tag 'mips-fixes_5.13_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - fix function/preempt trace hangs

 - a few build fixes

* tag 'mips-fixes_5.13_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: Fix kernel hang under FUNCTION_GRAPH_TRACER and PREEMPT_TRACER
  MIPS: ralink: export rt_sysc_membase for rt2880_wdt.c
  MIPS: launch.h: add include guard to prevent build errors
  MIPS: alchemy: xxs1500: add gpio-au1000.h header file
2021-05-28 08:24:13 -10:00
Paolo Bonzini a3d2ec9d3c KVM/arm64 fixes for 5.13, take #2
- Another state update on exit to userspace fix
 - Prevent the creation of mixed 32/64 VMs
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmCvdGkPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDyxYP90X4vSdDvewU+xtyD84g4UEyzAnHbjaSWL/i
 59vXVqqLPhGkUillWCIdKGWz9q27Ixs1PyMrySIoM+U1hARLQ3MdE2nUBfKZRg9W
 oLArFSPDYstyGpdG0B5YB8bXiF8ePXesWQ0lwx/02KGKNLo+t5FEeVVg7zf7eVkv
 9dwItKtnmU8LiUhGtYrHuogDX9KRuuK7SH+ouF4kwf5FMy24k2IfZSUQP1Sj/hNC
 oA9iqWCA6XgQkaVLnxqXmPpY6krUDgbQsFZK92XpQRngJa8FoFQr6EMgHBTrBIat
 LKIIacqKXZQgRk9ala60Jyf4REYZ5600HzaIUhU4cI9EZA7R4zXy6mX+F/PU7b46
 NQVXLNYxpzy7ETlzXUApoi8l88mKuR6sJgQrJku1TBu97778MMEjiuk5n51Pjj2F
 p+bkRx2X5ctEIQx6RNTmVi4VIcF8+dbRqkgChDmaP/9CVXZGn8J6Wd8j28C7B+sJ
 s3qFeWI/sYvGmf2cBuq6tYHVT9TnVu43Ypq2nhTzwGXdjhbFH1y/I2ZeVUGAvQdl
 N1/Lg4QWjpm/UNBcFUVUr34rf5GnNgKPkT668Yt0bhsthVOE6YZrK7Szbc+ert+1
 QkFWZuglLc014NegHyjUybooTykrvBhDzyTU9PhTQ5YU9hyq81eN+i+LgH+rPJjM
 UmAVeCo=
 =lGh1
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.13, take #2

- Another state update on exit to userspace fix
- Prevent the creation of mixed 32/64 VMs
2021-05-28 13:02:03 -04:00
Wanpeng Li b35491e66c KVM: X86: Kill off ctxt->ud
ctxt->ud is consumed only by x86_decode_insn(), we can kill it off by
passing emulation_type to x86_decode_insn() and dropping ctxt->ud
altogether. Tracking that info in ctxt for literally one call is silly.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <1622160097-37633-2-git-send-email-wanpengli@tencent.com>
2021-05-28 12:59:10 -04:00
Wanpeng Li da6393cdd8 KVM: X86: Fix warning caused by stale emulation context
Reported by syzkaller:

  WARNING: CPU: 7 PID: 10526 at linux/arch/x86/kvm//x86.c:7621 x86_emulate_instruction+0x41b/0x510 [kvm]
  RIP: 0010:x86_emulate_instruction+0x41b/0x510 [kvm]
  Call Trace:
   kvm_mmu_page_fault+0x126/0x8f0 [kvm]
   vmx_handle_exit+0x11e/0x680 [kvm_intel]
   vcpu_enter_guest+0xd95/0x1b40 [kvm]
   kvm_arch_vcpu_ioctl_run+0x377/0x6a0 [kvm]
   kvm_vcpu_ioctl+0x389/0x630 [kvm]
   __x64_sys_ioctl+0x8e/0xd0
   do_syscall_64+0x3c/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Commit 4a1e10d5b5 ("KVM: x86: handle hardware breakpoints during emulation())
adds hardware breakpoints check before emulation the instruction and parts of
emulation context initialization, actually we don't have the EMULTYPE_NO_DECODE flag
here and the emulation context will not be reused. Commit c8848cee74 ("KVM: x86:
set ctxt->have_exception in x86_decode_insn()) triggers the warning because it
catches the stale emulation context has #UD, however, it is not during instruction
decoding which should result in EMULATION_FAILED. This patch fixes it by moving
the second part emulation context initialization into init_emulate_ctxt() and
before hardware breakpoints check. The ctxt->ud will be dropped by a follow-up
patch.

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=134683fdd00000

Reported-by: syzbot+71271244f206d17f6441@syzkaller.appspotmail.com
Fixes: 4a1e10d5b5 (KVM: x86: handle hardware breakpoints during emulation)
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <1622160097-37633-1-git-send-email-wanpengli@tencent.com>
2021-05-28 12:59:09 -04:00
Yuan Yao e87e46d5f3 KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception
The kvm_get_linear_rip() handles x86/long mode cases well and has
better readability, __kvm_set_rflags() also use the paired
function kvm_is_linear_rip() to check the vcpu->arch.singlestep_rip
set in kvm_arch_vcpu_ioctl_set_guest_debug(), so change the
"CS.BASE + RIP" code in kvm_arch_vcpu_ioctl_set_guest_debug() and
handle_exception_nmi() to this one.

Signed-off-by: Yuan Yao <yuan.yao@intel.com>
Message-Id: <20210526063828.1173-1-yuan.yao@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-28 12:57:53 -04:00
Sargun Dhillon aac902925e Documentation: seccomp: Fix user notification documentation
The documentation had some previously incorrect information about how
userspace notifications (and responses) were handled due to a change
from a previously proposed patchset.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210517193908.3113-2-sargun@sargun.me
2021-05-28 09:54:12 -07:00
Lukas Bulwahn 8aa0ae4399 MAINTAINERS: adjust to removing i2c designware platform data
Commit 5a517b5bf6 ("i2c: designware: Get rid of legacy platform data")
removes ./include/linux/platform_data/i2c-designware.h, but misses to
adjust the SYNOPSYS DESIGNWARE I2C DRIVER section in MAINTAINERS.

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains:

  warning: no file matches F: include/linux/platform_data/i2c-designware.h

Remove the file entry to this removed file as well.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-05-28 16:48:48 +02:00
Nicholas Piggin 1438709e63 KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d7 ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch actually applies there to the
old path well: a context switch can be made before kvmppc_vcpu_run_hv
restores the host FSCR and returns.

Now both the p9 and the p7/8 paths now save and restore their FSCR, it
no longer needs to be restored at the end of kvmppc_vcpu_run_hv

Fixes: b005255e12 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
2021-05-28 22:54:27 +10:00
Nicholas Piggin 5362a4b6ee powerpc: Fix reverse map real-mode address lookup with huge vmalloc
real_vmalloc_addr() does not currently work for huge vmalloc, which is
what the reverse map can be allocated with for radix host, hash guest.

Extract the hugepage aware equivalent from eeh code into a helper, and
convert existing sites including this one to use it.

Fixes: 8abddd968a ("powerpc/64s/radix: Enable huge vmalloc mappings")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210526120005.3432222-1-npiggin@gmail.com
2021-05-28 22:54:27 +10:00
Kajol Jain 8fc4e4aa2b perf vendor events powerpc: Fix eventcode of power10 JSON events
Fixed the eventcode values in the power10 JSON event files to prepend
"0x" since these are hexadecimal values.

The patch also changes the event description of the PM_EXEC_STALL_LOAD_FINISH
and PM_EXEC_STALL_NTC_FLUSH event and move some events to correct files.

Fixes: 32daa5d789 ("perf vendor events: Initial JSON/events list for power10 platform")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Paul A. Clarke <pc@us.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20210525063723.1191514-1-kjain@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-28 09:22:24 -03:00
Naveen N. Rao 82123a3d1d powerpc/kprobes: Fix validation of prefixed instructions across page boundary
When checking if the probed instruction is the suffix of a prefixed
instruction, we access the instruction at the previous word. If the
probed instruction is the very first word of a module, we can end up
trying to access an invalid page.

Fix this by skipping the check for all instructions at the beginning of
a page. Prefixed instructions cannot cross a 64-byte boundary and as
such, we don't expect to encounter a suffix as the very first word in a
page for kernel text. Even if there are prefixed instructions crossing
a page boundary (from a module, for instance), the instruction will be
illegal, so preventing probing on the suffix of such prefix instructions
isn't worthwhile.

Fixes: b4657f7650 ("powerpc/kprobes: Don't allow breakpoints on suffixes")
Cc: stable@vger.kernel.org # v5.8+
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0df9a032a05576a2fa8e97d1b769af2ff0eafbd6.1621416666.git.naveen.n.rao@linux.vnet.ibm.com
2021-05-28 21:52:42 +10:00
Greg Kroah-Hartman 56dde68f85 Revert "serial: 8250: 8250_omap: Fix possible interrupt storm"
This reverts commit 31fae7c8b1.

Tony writes:
	I just noticed this causes the following regression in Linux
	next when pressing a key on uart console after boot at least on
	omap3. This seems to happen on serial_port_in(port, UART_RX) in
	the quirk handling.

So let's drop this.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/YLCCJzkkB4N7LTQS@atomide.com
Fixes: 31fae7c8b1 ("serial: 8250: 8250_omap: Fix possible interrupt storm")
Reported-by: Tony Lindgren <tony@atomide.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-28 10:58:49 +02:00
Krzysztof Kozlowski 2499042326 i2c: s3c2410: fix possible NULL pointer deref on read message after write
Interrupt handler processes multiple message write requests one after
another, till the driver message queue is drained.  However if driver
encounters a read message without preceding START, it stops the I2C
transfer as it is an invalid condition for the controller.  At least the
comment describes a requirement "the controller forces us to send a new
START when we change direction".  This stop results in clearing the
message queue (i2c->msg = NULL).

The code however immediately jumped back to label "retry_write" which
dereferenced the "i2c->msg" making it a possible NULL pointer
dereference.

The Coverity analysis:
1. Condition !is_msgend(i2c), taking false branch.
   if (!is_msgend(i2c)) {

2. Condition !is_lastmsg(i2c), taking true branch.
   } else if (!is_lastmsg(i2c)) {

3. Condition i2c->msg->flags & 1, taking true branch.
   if (i2c->msg->flags & I2C_M_RD) {

4. write_zero_model: Passing i2c to s3c24xx_i2c_stop, which sets i2c->msg to NULL.
   s3c24xx_i2c_stop(i2c, -EINVAL);

5. Jumping to label retry_write.
   goto retry_write;

6. var_deref_model: Passing i2c to is_msgend, which dereferences null i2c->msg.
   if (!is_msgend(i2c)) {"

All previous calls to s3c24xx_i2c_stop() in this interrupt service
routine are followed by jumping to end of function (acknowledging
the interrupt and returning).  This seems a reasonable choice also here
since message buffer was entirely emptied.

Addresses-Coverity: Explicit null dereferenced
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-05-28 10:16:23 +02:00
Qii Wang fed1bd51a5 i2c: mediatek: Disable i2c start_en and clear intr_stat brfore reset
The i2c controller driver do dma reset after transfer timeout,
but sometimes dma reset will trigger an unexpected DMA_ERR irq.
It will cause the i2c controller to continuously send interrupts
to the system and cause soft lock-up. So we need to disable i2c
start_en and clear intr_stat to stop i2c controller before dma
reset when transfer timeout.

Fixes: aafced673c06("i2c: mediatek: move dma reset before i2c reset")
Signed-off-by: Qii Wang <qii.wang@mediatek.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
2021-05-28 10:13:07 +02:00
Dave Airlie aeeb517368 Merge tag 'drm-intel-fixes-2021-05-27' of ssh://git.freedesktop.org/git/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.13-rc4:
- Re-enable LTTPR non-transparent LT mode for DPCD_REV<1.4

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/875yz4bnmj.fsf@intel.com
2021-05-28 13:28:18 +10:00
Dave Airlie b26389e854 A fix in meson for a crash at shutdown and one for TTM to prevent
irrelevant swapout
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCYK+LpgAKCRDj7w1vZxhR
 xQpPAQDGiLT6DMi3bnnPydqCyZZfkSy4lXNflOeoRe34eAcCSgD+KfQR2gaHJoA0
 T4YbzZB21ZbxZFomjo+WNv0WYtImSgY=
 =24lG
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-fixes-2021-05-27' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes

A fix in meson for a crash at shutdown and one for TTM to prevent
irrelevant swapout

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20210527120828.3w7f53krzkslc4ii@gilmour
2021-05-28 13:24:42 +10:00
Namhyung Kim c673b7f59e perf stat: Fix error check for bpf_program__attach
It seems the bpf_program__attach() returns a negative error code instead
of a NULL pointer in case of error.

Fixes: 7fac83aaf2 ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Link: http://lore.kernel.org/lkml/20210527220052.1657578-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-27 21:51:21 -03:00
Dave Airlie ac6e9e3d19 Merge tag 'amd-drm-fixes-5.13-2021-05-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.13-2021-05-26:

amdgpu:
- MultiGPU fan fix
- VCN powergating fixes

amdkfd:
- Fix SDMA register offset error

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210527031831.4057-1-alexander.deucher@amd.com
2021-05-28 09:18:04 +10:00
Jakub Kicinski 44991d61aa Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Fix incorrect sockopts unregistration from error path,
   from Florian Westphal.

2) A few patches to provide better error reporting when missing kernel
   netfilter options are missing in .config.

3) Fix dormant table flag updates.

4) Memleak in IPVS  when adding service with IP_VS_SVC_F_HASHED flag.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
  ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
  netfilter: nf_tables: fix table flag updates
  netfilter: nf_tables: extended netlink error reporting for chain type
  netfilter: nf_tables: missing error reporting for not selected expressions
  netfilter: conntrack: unregister ipv4 sockopts on error unwind
====================

Link: https://lore.kernel.org/r/20210527190115.98503-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 15:54:12 -07:00
Linus Torvalds 97e5bf604b Merge branch 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu
Pull percpu fixes from Dennis Zhou:
 "This contains a cleanup to lib/percpu-refcount.c and an update to the
  MAINTAINERS file to more formally take over support for lib/percpu*"

* 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
  MAINTAINERS: Add lib/percpu* as part of percpu entry
  percpu_ref: Don't opencode percpu_ref_is_dying
2021-05-27 12:01:26 -10:00
Linus Torvalds 3c856a3180 arm64 fixes:
- Don't use contiguous or block mappings for the linear map when KFENCE
   is enabled.
 
 - Fix link in the arch_counter_enforce_ordering() comment.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmCv3NUACgkQa9axLQDI
 XvFcew//TTlg2fNMdfHQb2t62yDlBHb6nz1LtOE6xIUyE7m4h3G36J9z+EcRb+bc
 GvDB8kyFLGea7cpOI3wrNNlMLPw5F+SxNRddlBpO9YwD0Y/8mO9EYhe2ZE8SiNlZ
 EDyQhMimVutWIBhPhX9aTZIaQEcVY8fAu0jW3F1+t6iHUYlsusaVtMPdjWehsptY
 vywOSeW/2Z2tTNjOiDwy3E96FMwV5qFMH6bBgbG+9QUT1XpqWvHxPEvZu0sgd+PR
 x2DbMSS6U+eTQECE+XtPhGm1BQay4Dyzq6E2seH4vGxuWUx3dRQcFP7lAvRtFLah
 YKLmu0i6wbHItwcRYpN1lQOGI7hgcIQiVm5SycjfwYneKj2cCBHxtZirnG7WpEhX
 lCkeO4qrrw+oe2p+05xgiMQ00fSQLpb9e0gu/aBe9OS0DaLC0R86PJJOD1F0fd/m
 F3DW2CXcFEQHcZRYC5xxCM1UUAP1iHvIdpkvxKAsWuMnzxmD4lC1Dr8fU55cAbDi
 2Vr+6W1I5o7hvhQsHmiYy/n54IIKmWhfopzTBPtJqdiHT3fPFOmdmM6CmDy+cyHA
 xxHA24QmAFSlLeYMsYYXLuAkLsllrRl8wTb8ds+EPEPkjrp2bWwc2Ls853oWRL7H
 HfBdcfE9t5vP2zP9J8H/FpNkD6AzdL03ihvYpKiaGgXmPPEFyGI=
 =gSNa
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Don't use contiguous or block mappings for the linear map when KFENCE
   is enabled.

 - Fix link in the arch_counter_enforce_ordering() comment.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: don't use CON and BLK mapping if KFENCE is enabled
  arm64: Fix stale link in the arch_counter_enforce_ordering() comment
2021-05-27 11:58:26 -10:00
Linus Torvalds 38747c9a2d - Fix DM verity target's 'require_signatures' module_param permissions.
- Revert DM snapshot fix from v5.13-rc3 and then properly fix crash
   when an origin has no snapshots. This allows only the proper fix to
   go to stable@ (since the original fix was successfully dropped).
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmCv6dwACgkQxSPxCi2d
 A1ouvgf/UTg13oWs3w6O36ib9pxFjEmy+APsYsC0cYYEywPpsyNZol/zxuX5hgfQ
 vsThW0l/IPq6TJFSpoYhrnW6syTQkTosDnpTVq1MZcEEDW8lXcsqdElP2qjc9FCn
 jpnma6zfYJzF/ucIZBIF8vuFyQyF+p73XjOf56j2fMnsN2re5KLHK1NylyWq8G5p
 C4bKhqJmQDUKf5Za361rLz91GrAYhljqc6QoqyKlyz2X5JQX/Mw6zjhIaHdqcSZg
 Xbd+aHB/N/4jTqNM8ClPu1J+1uzoaZzHgcNxKTZDiaUjfM8uCbj/htA1L83M4jUe
 iTGXoD8pN5Sr37+fMarkBcUAC11/1A==
 =9/SR
 -----END PGP SIGNATURE-----

Merge tag 'for-5.13/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM verity target's 'require_signatures' module_param permissions.

 - Revert DM snapshot fix from v5.13-rc3 and then properly fix crash
   when an origin has no snapshots. This allows only the proper fix to
   go to stable@ (since the original fix was successfully dropped).

* tag 'for-5.13/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm snapshot: properly fix a crash when an origin has no snapshots
  dm snapshot: revert "fix a crash when an origin has no snapshots"
  dm verity: fix require_signatures module_param permissions
2021-05-27 11:54:36 -10:00
Ariel Levkovich fb91702b74 net/sched: act_ct: Fix ct template allocation for zone 0
Fix current behavior of skipping template allocation in case the
ct action is in zone 0.

Skipping the allocation may cause the datapath ct code to ignore the
entire ct action with all its attributes (commit, nat) in case the ct
action in zone 0 was preceded by a ct clear action.

The ct clear action sets the ct_state to untracked and resets the
skb->_nfct pointer. Under these conditions and without an allocated
ct template, the skb->_nfct pointer will remain NULL which will
cause the tc ct action handler to exit without handling commit and nat
actions, if such exist.

For example, the following rule in OVS dp:
recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \
in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \
recirc(0x37a)

Will result in act_ct skipping the commit and nat actions in zone 0.

The change removes the skipping of template allocation for zone 0 and
treats it the same as any other zone.

Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20210526170110.54864-1-lariel@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 14:54:23 -07:00
Paul Blakey 0cc254e5aa net/sched: act_ct: Offload connections with commit action
Currently established connections are not offloaded if the filter has a
"ct commit" action. This behavior will not offload connections of the
following scenario:

$ tc_filter add dev $DEV ingress protocol ip prio 1 flower \
  ct_state -trk \
  action ct commit action goto chain 1

$ tc_filter add dev $DEV ingress protocol ip chain 1 prio 1 flower \
  action mirred egress redirect dev $DEV2

$ tc_filter add dev $DEV2 ingress protocol ip prio 1 flower \
  action ct commit action goto chain 1

$ tc_filter add dev $DEV2 ingress protocol ip prio 1 chain 1 flower \
  ct_state +trk+est \
  action mirred egress redirect dev $DEV

Offload established connections, regardless of the commit flag.

Fixes: 46475bb20f ("net/sched: act_ct: Software offload of established flows")
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Link: https://lore.kernel.org/r/1622029449-27060-1-git-send-email-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 14:54:14 -07:00
Parav Pandit b28d8f0c25 devlink: Correct VIRTUAL port to not have phys_port attributes
Physical port name, port number attributes do not belong to virtual port
flavour. When VF or SF virtual ports are registered they incorrectly
append "np0" string in the netdevice name of the VF/SF.

Before this fix, VF netdevice name were ens2f0np0v0, ens2f0np0v1 for VF
0 and 1 respectively.

After the fix, they are ens2f0v0, ens2f0v1.

With this fix, reading /sys/class/net/ens2f0v0/phys_port_name returns
-EOPNOTSUPP.

Also devlink port show example for 2 VFs on one PF to ensure that any
physical port attributes are not exposed.

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
pci/0000:06:00.3/196608: type eth netdev ens2f0v0 flavour virtual splittable false
pci/0000:06:00.4/262144: type eth netdev ens2f0v1 flavour virtual splittable false

This change introduces a netdevice name change on systemd/udev
version 245 and higher which honors phys_port_name sysfs file for
generation of netdevice name.

This also aligns to phys_port_name usage which is limited to switchdev
ports as described in [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Documentation/networking/switchdev.rst

Fixes: acf1ee44ca ("devlink: Introduce devlink port flavour virtual")
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20210526200027.14008-1-parav@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 14:43:41 -07:00
Filipe Manana 76a6d5cd74 btrfs: fix deadlock when cloning inline extents and low on available space
There are a few cases where cloning an inline extent requires copying data
into a page of the destination inode. For these cases we are allocating
the required data and metadata space while holding a leaf locked. This can
result in a deadlock when we are low on available space because allocating
the space may flush delalloc and two deadlock scenarios can happen:

1) When starting writeback for an inode with a very small dirty range that
   fits in an inline extent, we deadlock during the writeback when trying
   to insert the inline extent, at cow_file_range_inline(), if the extent
   is going to be located in the leaf for which we are already holding a
   read lock;

2) After successfully starting writeback, for non-inline extent cases,
   the async reclaim thread will hang waiting for an ordered extent to
   complete if the ordered extent completion needs to modify the leaf
   for which the clone task is holding a read lock (for adding or
   replacing file extent items). So the cloning task will wait forever
   on the async reclaim thread to make progress, which in turn is
   waiting for the ordered extent completion which in turn is waiting
   to acquire a write lock on the same leaf.

So fix this by making sure we release the path (and therefore the leaf)
every time we need to copy the inline extent's data into a page of the
destination inode, as by that time we do not need to have the leaf locked.

Fixes: 05a5a7621c ("Btrfs: implement full reflink support for inline extents")
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 23:31:52 +02:00
Filipe Manana ea7036de0d btrfs: fix fsync failure and transaction abort after writes to prealloc extents
When doing a series of partial writes to different ranges of preallocated
extents with transaction commits and fsyncs in between, we can end up with
a checksum items in a log tree. This causes an fsync to fail with -EIO and
abort the transaction, turning the filesystem to RO mode, when syncing the
log.

For this to happen, we need to have a full fsync of a file following one
or more fast fsyncs.

The following example reproduces the problem and explains how it happens:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  # Create our test file with 2 preallocated extents. Leave a 1M hole
  # between them to ensure that we get two file extent items that will
  # never be merged into a single one. The extents are contiguous on disk,
  # which will later result in the checksums for their data to be merged
  # into a single checksum item in the csums btree.
  #
  $ xfs_io -f \
           -c "falloc 0 1M" \
           -c "falloc 3M 3M" \
           /mnt/foobar

  # Now write to the second extent and leave only 1M of it as unwritten,
  # which corresponds to the file range [4M, 5M[.
  #
  # Then fsync the file to flush delalloc and to clear full sync flag from
  # the inode, so that a future fsync will use the fast code path.
  #
  # After the writeback triggered by the fsync we have 3 file extent items
  # that point to the second extent we previously allocated:
  #
  # 1) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the
  #    file range [3M, 4M[
  #
  # 2) One file extent item of type BTRFS_FILE_EXTENT_PREALLOC that covers
  #    the file range [4M, 5M[
  #
  # 3) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the
  #    file range [5M, 6M[
  #
  # All these file extent items have a generation of 6, which is the ID of
  # the transaction where they were created. The split of the original file
  # extent item is done at btrfs_mark_extent_written() when ordered extents
  # complete for the file ranges [3M, 4M[ and [5M, 6M[.
  #
  $ xfs_io -c "pwrite -S 0xab 3M 1M" \
           -c "pwrite -S 0xef 5M 1M" \
           -c "fsync" \
           /mnt/foobar

  # Commit the current transaction. This wipes out the log tree created by
  # the previous fsync.
  sync

  # Now write to the unwritten range of the second extent we allocated,
  # corresponding to the file range [4M, 5M[, and fsync the file, which
  # triggers the fast fsync code path.
  #
  # The fast fsync code path sees that there is a new extent map covering
  # the file range [4M, 5M[ and therefore it will log a checksum item
  # covering the range [1M, 2M[ of the second extent we allocated.
  #
  # Also, after the fsync finishes we no longer have the 3 file extent
  # items that pointed to 3 sections of the second extent we allocated.
  # Instead we end up with a single file extent item pointing to the whole
  # extent, with a type of BTRFS_FILE_EXTENT_REG and a generation of 7 (the
  # current transaction ID). This is due to the file extent item merging we
  # do when completing ordered extents into ranges that point to unwritten
  # (preallocated) extents. This merging is done at
  # btrfs_mark_extent_written().
  #
  $ xfs_io -c "pwrite -S 0xcd 4M 1M" \
           -c "fsync" \
           /mnt/foobar

  # Now do some write to our file outside the range of the second extent
  # that we allocated with fallocate() and truncate the file size from 6M
  # down to 5M.
  #
  # The truncate operation sets the full sync runtime flag on the inode,
  # forcing the next fsync to use the slow code path. It also changes the
  # length of the second file extent item so that it represents the file
  # range [3M, 5M[ and not the range [3M, 6M[ anymore.
  #
  # Finally fsync the file. Since this is a fsync that triggers the slow
  # code path, it will remove all items associated to the inode from the
  # log tree and then it will scan for file extent items in the
  # fs/subvolume tree that have a generation matching the current
  # transaction ID, which is 7. This means it will log 2 file extent
  # items:
  #
  # 1) One for the first extent we allocated, covering the file range
  #    [0, 1M[
  #
  # 2) Another for the first 2M of the second extent we allocated,
  #    covering the file range [3M, 5M[
  #
  # When logging the first file extent item we log a single checksum item
  # that has all the checksums for the entire extent.
  #
  # When logging the second file extent item, we also lookup for the
  # checksums that are associated with the range [0, 2M[ of the second
  # extent we allocated (file range [3M, 5M[), and then we log them with
  # btrfs_csum_file_blocks(). However that results in ending up with a log
  # that has two checksum items with ranges that overlap:
  #
  # 1) One for the range [1M, 2M[ of the second extent we allocated,
  #    corresponding to the file range [4M, 5M[, which we logged in the
  #    previous fsync that used the fast code path;
  #
  # 2) One for the ranges [0, 1M[ and [0, 2M[ of the first and second
  #    extents, respectively, corresponding to the files ranges [0, 1M[
  #    and [3M, 5M[. This one was added during this last fsync that uses
  #    the slow code path and overlaps with the previous one logged by
  #    the previous fast fsync.
  #
  # This happens because when logging the checksums for the second
  # extent, we notice they start at an offset that matches the end of the
  # checksums item that we logged for the first extent, and because both
  # extents are contiguous on disk, btrfs_csum_file_blocks() decides to
  # extend that existing checksums item and append the checksums for the
  # second extent to this item. The end result is we end up with two
  # checksum items in the log tree that have overlapping ranges, as
  # listed before, resulting in the fsync to fail with -EIO and aborting
  # the transaction, turning the filesystem into RO mode.
  #
  $ xfs_io -c "pwrite -S 0xff 0 1M" \
           -c "truncate 5M" \
           -c "fsync" \
           /mnt/foobar
  fsync: Input/output error

After running the example, dmesg/syslog shows the tree checker complained
about the checksum items with overlapping ranges and we aborted the
transaction:

  $ dmesg
  (...)
  [756289.557487] BTRFS critical (device sdc): corrupt leaf: root=18446744073709551610 block=30720000 slot=5, csum end range (16777216) goes beyond the start range (15728640) of the next csum item
  [756289.560583] BTRFS info (device sdc): leaf 30720000 gen 7 total ptrs 7 free space 11677 owner 18446744073709551610
  [756289.562435] BTRFS info (device sdc): refs 2 lock_owner 0 current 2303929
  [756289.563654] 	item 0 key (257 1 0) itemoff 16123 itemsize 160
  [756289.564649] 		inode generation 6 size 5242880 mode 100600
  [756289.565636] 	item 1 key (257 12 256) itemoff 16107 itemsize 16
  [756289.566694] 	item 2 key (257 108 0) itemoff 16054 itemsize 53
  [756289.567725] 		extent data disk bytenr 13631488 nr 1048576
  [756289.568697] 		extent data offset 0 nr 1048576 ram 1048576
  [756289.569689] 	item 3 key (257 108 1048576) itemoff 16001 itemsize 53
  [756289.570682] 		extent data disk bytenr 0 nr 0
  [756289.571363] 		extent data offset 0 nr 2097152 ram 2097152
  [756289.572213] 	item 4 key (257 108 3145728) itemoff 15948 itemsize 53
  [756289.573246] 		extent data disk bytenr 14680064 nr 3145728
  [756289.574121] 		extent data offset 0 nr 2097152 ram 3145728
  [756289.574993] 	item 5 key (18446744073709551606 128 13631488) itemoff 12876 itemsize 3072
  [756289.576113] 	item 6 key (18446744073709551606 128 15728640) itemoff 11852 itemsize 1024
  [756289.577286] BTRFS error (device sdc): block=30720000 write time tree block corruption detected
  [756289.578644] ------------[ cut here ]------------
  [756289.579376] WARNING: CPU: 0 PID: 2303929 at fs/btrfs/disk-io.c:465 csum_one_extent_buffer+0xed/0x100 [btrfs]
  [756289.580857] Modules linked in: btrfs dm_zero dm_dust loop dm_snapshot (...)
  [756289.591534] CPU: 0 PID: 2303929 Comm: xfs_io Tainted: G        W         5.12.0-rc8-btrfs-next-87 #1
  [756289.592580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [756289.594161] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs]
  [756289.595122] Code: 5d c3 e8 76 60 (...)
  [756289.597509] RSP: 0018:ffffb51b416cb898 EFLAGS: 00010282
  [756289.598142] RAX: 0000000000000000 RBX: fffff02b8a365bc0 RCX: 0000000000000000
  [756289.598970] RDX: 0000000000000000 RSI: ffffffffa9112421 RDI: 00000000ffffffff
  [756289.599798] RBP: ffffa06500880000 R08: 0000000000000000 R09: 0000000000000000
  [756289.600619] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
  [756289.601456] R13: ffffa0652b1d8980 R14: ffffa06500880000 R15: 0000000000000000
  [756289.602278] FS:  00007f08b23c9800(0000) GS:ffffa0682be00000(0000) knlGS:0000000000000000
  [756289.603217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [756289.603892] CR2: 00005652f32d0138 CR3: 000000025d616003 CR4: 0000000000370ef0
  [756289.604725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [756289.605563] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [756289.606400] Call Trace:
  [756289.606704]  btree_csum_one_bio+0x244/0x2b0 [btrfs]
  [756289.607313]  btrfs_submit_metadata_bio+0xb7/0x100 [btrfs]
  [756289.608040]  submit_one_bio+0x61/0x70 [btrfs]
  [756289.608587]  btree_write_cache_pages+0x587/0x610 [btrfs]
  [756289.609258]  ? free_debug_processing+0x1d5/0x240
  [756289.609812]  ? __module_address+0x28/0xf0
  [756289.610298]  ? lock_acquire+0x1a0/0x3e0
  [756289.610754]  ? lock_acquired+0x19f/0x430
  [756289.611220]  ? lock_acquire+0x1a0/0x3e0
  [756289.611675]  do_writepages+0x43/0xf0
  [756289.612101]  ? __filemap_fdatawrite_range+0xa4/0x100
  [756289.612800]  __filemap_fdatawrite_range+0xc5/0x100
  [756289.613393]  btrfs_write_marked_extents+0x68/0x160 [btrfs]
  [756289.614085]  btrfs_sync_log+0x21c/0xf20 [btrfs]
  [756289.614661]  ? finish_wait+0x90/0x90
  [756289.615096]  ? __mutex_unlock_slowpath+0x45/0x2a0
  [756289.615661]  ? btrfs_log_inode_parent+0x3c9/0xdc0 [btrfs]
  [756289.616338]  ? lock_acquire+0x1a0/0x3e0
  [756289.616801]  ? lock_acquired+0x19f/0x430
  [756289.617284]  ? lock_acquire+0x1a0/0x3e0
  [756289.617750]  ? lock_release+0x214/0x470
  [756289.618221]  ? lock_acquired+0x19f/0x430
  [756289.618704]  ? dput+0x20/0x4a0
  [756289.619079]  ? dput+0x20/0x4a0
  [756289.619452]  ? lockref_put_or_lock+0x9/0x30
  [756289.619969]  ? lock_release+0x214/0x470
  [756289.620445]  ? lock_release+0x214/0x470
  [756289.620924]  ? lock_release+0x214/0x470
  [756289.621415]  btrfs_sync_file+0x46a/0x5b0 [btrfs]
  [756289.621982]  do_fsync+0x38/0x70
  [756289.622395]  __x64_sys_fsync+0x10/0x20
  [756289.622907]  do_syscall_64+0x33/0x80
  [756289.623438]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [756289.624063] RIP: 0033:0x7f08b27fbb7b
  [756289.624588] Code: 0f 05 48 3d 00 (...)
  [756289.626760] RSP: 002b:00007ffe2583f940 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
  [756289.627639] RAX: ffffffffffffffda RBX: 00005652f32cd0f0 RCX: 00007f08b27fbb7b
  [756289.628464] RDX: 00005652f32cbca0 RSI: 00005652f32cd110 RDI: 0000000000000003
  [756289.629323] RBP: 00005652f32cd110 R08: 0000000000000000 R09: 00007f08b28c4be0
  [756289.630172] R10: fffffffffffff39a R11: 0000000000000293 R12: 0000000000000001
  [756289.631007] R13: 00005652f32cd0f0 R14: 0000000000000001 R15: 00005652f32cc480
  [756289.631819] irq event stamp: 0
  [756289.632188] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [756289.632911] hardirqs last disabled at (0): [<ffffffffa7e97c29>] copy_process+0x879/0x1cc0
  [756289.633893] softirqs last  enabled at (0): [<ffffffffa7e97c29>] copy_process+0x879/0x1cc0
  [756289.634871] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [756289.635606] ---[ end trace 0a039fdc16ff3fef ]---
  [756289.636179] BTRFS: error (device sdc) in btrfs_sync_log:3136: errno=-5 IO failure
  [756289.637082] BTRFS info (device sdc): forced readonly

Having checksum items covering ranges that overlap is dangerous as in some
cases it can lead to having extent ranges for which we miss checksums
after log replay or getting the wrong checksum item. There were some fixes
in the past for bugs that resulted in this problem, and were explained and
fixed by the following commits:

  27b9a8122f ("Btrfs: fix csum tree corruption, duplicate and outdated checksums")
  b84b8390d6 ("Btrfs: fix file read corruption after extent cloning and fsync")
  40e046acbd ("Btrfs: fix missing data checksums after replaying a log tree")
  e289f03ea7 ("btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents")

Fix the issue by making btrfs_csum_file_blocks() taking into account the
start offset of the next checksum item when it decides to extend an
existing checksum item, so that it never extends the checksum to end at a
range that goes beyond the start range of the next checksum item.

When we can not access the next checksum item without releasing the path,
simply drop the optimization of extending the previous checksum item and
fallback to inserting a new checksum item - this happens rarely and the
optimization is not significant enough for a log tree in order to justify
the extra complexity, as it would only save a few bytes (the size of a
struct btrfs_item) of leaf space.

This behaviour is only needed when inserting into a log tree because
for the regular checksums tree we never have a case where we try to
insert a range of checksums that overlap with a range that was previously
inserted.

A test case for fstests will follow soon.

Reported-by: Philipp Fent <fent@in.tum.de>
Link: https://lore.kernel.org/linux-btrfs/93c4600e-5263-5cba-adf0-6f47526e7561@in.tum.de/
CC: stable@vger.kernel.org # 5.4+
Tested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 23:31:36 +02:00
Josef Bacik dc09ef3562 btrfs: abort in rename_exchange if we fail to insert the second ref
Error injection stress uncovered a problem where we'd leave a dangling
inode ref if we failed during a rename_exchange.  This happens because
we insert the inode ref for one side of the rename, and then for the
other side.  If this second inode ref insert fails we'll leave the first
one dangling and leave a corrupt file system behind.  Fix this by
aborting if we did the insert for the first inode ref.

CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 23:31:16 +02:00
Josef Bacik f96d44743a btrfs: check error value from btrfs_update_inode in tree log
Error injection testing uncovered a case where we ended up with invalid
link counts on an inode.  This happened because we failed to notice an
error when updating the inode while replaying the tree log, and
committed the transaction with an invalid file system.

Fix this by checking the return value of btrfs_update_inode.  This
resolved the link count errors I was seeing, and we already properly
handle passing up the error values in these paths.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 23:31:13 +02:00
Josef Bacik 011b28acf9 btrfs: fixup error handling in fixup_inode_link_counts
This function has the following pattern

	while (1) {
		ret = whatever();
		if (ret)
			goto out;
	}
	ret = 0
out:
	return ret;

However several places in this while loop we simply break; when there's
a problem, thus clearing the return value, and in one case we do a
return -EIO, and leak the memory for the path.

Fix this by re-arranging the loop to deal with ret == 1 coming from
btrfs_search_slot, and then simply delete the

	ret = 0;
out:

bit so everybody can break if there is an error, which will allow for
proper error handling to occur.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 23:31:08 +02:00
Josef Bacik d61bec08b9 btrfs: mark ordered extent and inode with error if we fail to finish
While doing error injection testing I saw that sometimes we'd get an
abort that wouldn't stop the current transaction commit from completing.
This abort was coming from finish ordered IO, but at this point in the
transaction commit we should have gotten an error and stopped.

It turns out the abort came from finish ordered io while trying to write
out the free space cache.  It occurred to me that any failure inside of
finish_ordered_io isn't actually raised to the person doing the writing,
so we could have any number of failures in this path and think the
ordered extent completed successfully and the inode was fine.

Fix this by marking the ordered extent with BTRFS_ORDERED_IOERR, and
marking the mapping of the inode with mapping_set_error, so any callers
that simply call fdatawait will also get the error.

With this we're seeing the IO error on the free space inode when we fail
to do the finish_ordered_io.

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 23:31:01 +02:00
Josef Bacik 856bd270dc btrfs: return errors from btrfs_del_csums in cleanup_ref_head
We are unconditionally returning 0 in cleanup_ref_head, despite the fact
that btrfs_del_csums could fail.  We need to return the error so the
transaction gets aborted properly, fix this by returning ret from
btrfs_del_csums in cleanup_ref_head.

Reviewed-by: Qu Wenruo <wqu@suse.com>
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 23:30:55 +02:00