OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Dave Chinner	d634525db6	xfs: replace kmem_alloc_large() with kvmalloc() There is no reason for this wrapper existing anymore. All the places that use KM_NOFS allocation are within transaction contexts and hence covered by memalloc_nofs_save/restore contexts. Hence we don't need any special handling of vmalloc for large IOs anymore and so special casing this code isn't necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-09 15:57:43 -07:00
Dave Chinner	98fe2c3cef	xfs: remove kmem_alloc_io() Since commit `59bb47985c` ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), the core slab code now guarantees slab alignment in all situations sufficient for IO purposes (i.e. minimum of 512 byte alignment of >= 512 byte sized heap allocations) we no longer need the workaround in the XFS code to provide this guarantee. Replace the use of kmem_alloc_io() with kmem_alloc() or kmem_alloc_large() appropriately, and remove the kmem_alloc_io() interface altogether. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-09 15:57:43 -07:00
Dave Chinner	de2860f463	mm: Add kvrealloc() During log recovery of an XFS filesystem with 64kB directory buffers, rebuilding a buffer split across two log records results in a memory allocation warning from krealloc like this: xfs filesystem being mounted at /mnt/scratch supports timestamps until 2038 (0x7fffffff) XFS (dm-0): Unmounting Filesystem XFS (dm-0): Mounting V5 Filesystem XFS (dm-0): Starting recovery (logdev: internal) ------------[ cut here ]------------ WARNING: CPU: 5 PID: 3435170 at mm/page_alloc.c:3539 get_page_from_freelist+0xdee/0xe40 ..... RIP: 0010:get_page_from_freelist+0xdee/0xe40 Call Trace: ? complete+0x3f/0x50 __alloc_pages+0x16f/0x300 alloc_pages+0x87/0x110 kmalloc_order+0x2c/0x90 kmalloc_order_trace+0x1d/0x90 __kmalloc_track_caller+0x215/0x270 ? xlog_recover_add_to_cont_trans+0x63/0x1f0 krealloc+0x54/0xb0 xlog_recover_add_to_cont_trans+0x63/0x1f0 xlog_recovery_process_trans+0xc1/0xd0 xlog_recover_process_ophdr+0x86/0x130 xlog_recover_process_data+0x9f/0x160 xlog_recover_process+0xa2/0x120 xlog_do_recovery_pass+0x40b/0x7d0 ? __irq_work_queue_local+0x4f/0x60 ? irq_work_queue+0x3a/0x50 xlog_do_log_recovery+0x70/0x150 xlog_do_recover+0x38/0x1d0 xlog_recover+0xd8/0x170 xfs_log_mount+0x181/0x300 xfs_mountfs+0x4a1/0x9b0 xfs_fs_fill_super+0x3c0/0x7b0 get_tree_bdev+0x171/0x270 ? suffix_kstrtoint.constprop.0+0xf0/0xf0 xfs_fs_get_tree+0x15/0x20 vfs_get_tree+0x24/0xc0 path_mount+0x2f5/0xaf0 __x64_sys_mount+0x108/0x140 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Essentially, we are taking a multi-order allocation from kmem_alloc() (which has an open coded no fail, no warn loop) and then reallocating it out to 64kB using krealloc(__GFP_NOFAIL) and that is then triggering the above warning. This is a regression caused by converting this code from an open coded no fail/no warn reallocation loop to using __GFP_NOFAIL. What we actually need here is kvrealloc(), so that if contiguous page allocation fails we fall back to vmalloc() and we don't get nasty warnings happening in XFS. Fixes: `771915c4f6` ("xfs: remove kmem_realloc()") Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-09 15:57:43 -07:00
Darrick J. Wong	43059d5416	xfs: dump log intent items that cannot be recovered due to corruption If we try to recover a log intent item and the operation fails due to filesystem corruption, dump the contents of the item to the log for further analysis. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-08-09 11:13:17 -07:00
Darrick J. Wong	48c6615cc5	xfs: grab active perag ref when reading AG headers This patch prepares scrub to deal with the possibility of tearing down entire AGs by changing the order of resource acquisition to match the rest of the XFS codebase. In other words, scrub now grabs AG resources in order of: perag structure, then AGI/AGF/AGFL buffers, then btree cursors; and releases them in reverse order. This requires us to distinguish xchk_ag_init callers -- some are responding to a user request to check AG metadata, in which case we can return ENOENT to userspace; but other callers have an ondisk reference to an AG that they're trying to cross-reference. In this second case, the lack of an AG means there's ondisk corruption, since ondisk metadata cannot point into nonexistent space. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-08-09 11:13:17 -07:00
Darrick J. Wong	f19ee6bb1a	xfs: drop experimental warnings for bigtime and inobtcount These two features were merged a year ago, userspace tooling have been merged, and no serious errors have been reported by the developers. Drop the experimental tag to encourage wider testing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-08-09 11:13:17 -07:00
Darrick J. Wong	b7df7630cc	xfs: fix silly whitespace problems with kernel libxfs Fix a few whitespace errors such as spaces at the end of the line, etc. This gets us back to something more closely resembling parity. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-08-09 11:13:17 -07:00
Darrick J. Wong	40b1de007a	xfs: throttle inode inactivation queuing on memory reclaim Now that we defer inode inactivation, we've decoupled the process of unlinking or closing an inode from the process of inactivating it. In theory this should lead to better throughput since we now inactivate the queued inodes in batches instead of one at a time. Unfortunately, one of the primary risks with this decoupling is the loss of rate control feedback between the frontend and background threads. In other words, a rm -rf /* thread can run the system out of memory if it can queue inodes for inactivation and jump to a new CPU faster than the background threads can actually clear the deferred work. The workers can get scheduled off the CPU if they have to do IO, etc. To solve this problem, we configure a shrinker so that it will activate the /second/ time the shrinkers are called. The custom shrinker will queue all percpu deferred inactivation workers immediately and set a flag to force frontend callers who are releasing a vfs inode to wait for the inactivation workers. On my test VM with 560M of RAM and a 2TB filesystem, this seems to solve most of the OOMing problem when deleting 10 million inodes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-09 11:13:17 -07:00
Darrick J. Wong	a6343e4d92	xfs: avoid buffer deadlocks when walking fs inodes When we're servicing an INUMBERS or BULKSTAT request or running quotacheck, grab an empty transaction so that we can use its inherent recursive buffer locking abilities to detect inode btree cycles without hitting ABBA buffer deadlocks. This patch requires the deferred inode inactivation patchset because xfs_irele cannot directly call xfs_inactive when the iwalk itself has an (empty) transaction. Found by fuzzing an inode btree pointer to introduce a cycle into the tree (xfs/365). Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-08-09 11:13:16 -07:00
Darrick J. Wong	e8d04c2abc	xfs: use background worker pool when transactions can't get free space In xfs_trans_alloc, if the block reservation call returns ENOSPC, we call xfs_blockgc_free_space with a NULL icwalk structure to try to free space. Each frontend thread that encounters this situation starts its own walk of the inode cache to see if it can find anything, which is wasteful since we don't have any additional selection criteria. For this one common case, create a function that reschedules all pending background work immediately and flushes the workqueue so that the scan can run in parallel. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-09 11:13:16 -07:00
Darrick J. Wong	6f6490914d	xfs: don't run speculative preallocation gc when fs is frozen Now that we have the infrastructure to switch background workers on and off at will, fix the block gc worker code so that we don't actually run the worker when the filesystem is frozen, same as we do for deferred inactivation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-09 10:52:19 -07:00
Darrick J. Wong	01e8f379a4	xfs: flush inode inactivation work when compiling usage statistics Users have come to expect that the space accounting information in statfs and getquota reports are fairly accurate. Now that we inactivate inodes from a background queue, these numbers can be thrown off by whatever resources are singly-owned by the inodes in the queue. Flush the pending inactivations when userspace asks for a space usage report. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-09 10:52:18 -07:00
Darrick J. Wong	2eb665027b	xfs: inactivate inodes any time we try to free speculative preallocations Other parts of XFS have learned to call xfs_blockgc_free_{space,quota} to try to free speculative preallocations when space is tight. This means that file writes, transaction reservation failures, quota limit enforcement, and the EOFBLOCKS ioctl all call this function to free space when things are tight. Since inode inactivation is now a background task, this means that the filesystem can be hanging on to unlinked but not yet freed space. Add this to the list of things that xfs_blockgc_free_* makes writer threads scan for when they cannot reserve space. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-09 10:52:18 -07:00
Darrick J. Wong	65f03d8652	xfs: queue inactivation immediately when free realtime extents are tight Now that we have made the inactivation of unlinked inodes a background task to increase the throughput of file deletions, we need to be a little more careful about how long of a delay we can tolerate. Similar to the patch doing this for free space on the data device, if the file being inactivated is a realtime file and the realtime volume is running low on free extents, we want to run the worker ASAP so that the realtime allocator can make better decisions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-09 10:52:18 -07:00
Darrick J. Wong	108523b8de	xfs: queue inactivation immediately when quota is nearing enforcement Now that we have made the inactivation of unlinked inodes a background task to increase the throughput of file deletions, we need to be a little more careful about how long of a delay we can tolerate. Specifically, if the dquots attached to the inode being inactivated are nearing any kind of enforcement boundary, we want to queue that inactivation work immediately so that users don't get EDQUOT/ENOSPC errors even after they deleted a bunch of files to stay within quota. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-09 10:52:18 -07:00
Darrick J. Wong	7d6f07d2c5	xfs: queue inactivation immediately when free space is tight Now that we have made the inactivation of unlinked inodes a background task to increase the throughput of file deletions, we need to be a little more careful about how long of a delay we can tolerate. On a mostly empty filesystem, the risk of the allocator making poor decisions due to fragmentation of the free space on account a lengthy delay in background updates is minimal because there's plenty of space. However, if free space is tight, we want to deallocate unlinked inodes as quickly as possible to avoid fallocate ENOSPC and to give the allocator the best shot at optimal allocations for new writes. Therefore, queue the percpu worker immediately if the filesystem is more than 95% full. This follows the same principle that XFS becomes less aggressive about speculative allocations and lazy cleanup (and more precise about accounting) when nearing full. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-09 10:52:17 -07:00
Dave Chinner	ab23a77687	xfs: per-cpu deferred inode inactivation queues Move inode inactivation to background work contexts so that it no longer runs in the context that releases the final reference to an inode. This will allow process work that ends up blocking on inactivation to continue doing work while the filesytem processes the inactivation in the background. A typical demonstration of this is unlinking an inode with lots of extents. The extents are removed during inactivation, so this blocks the process that unlinked the inode from the directory structure. By moving the inactivation to the background process, the userspace applicaiton can keep working (e.g. unlinking the next inode in the directory) while the inactivation work on the previous inode is done by a different CPU. The implementation of the queue is relatively simple. We use a per-cpu lockless linked list (llist) to queue inodes for inactivation without requiring serialisation mechanisms, and a work item to allow the queue to be processed by a CPU bound worker thread. We also keep a count of the queue depth so that we can trigger work after a number of deferred inactivations have been queued. The use of a bound workqueue with a single work depth allows the workqueue to run one work item per CPU. We queue the work item on the CPU we are currently running on, and so this essentially gives us affine per-cpu worker threads for the per-cpu queues. THis maintains the effective CPU affinity that occurs within XFS at the AG level due to all objects in a directory being local to an AG. Hence inactivation work tends to run on the same CPU that last accessed all the objects that inactivation accesses and this maintains hot CPU caches for unlink workloads. A depth of 32 inodes was chosen to match the number of inodes in an inode cluster buffer. This hopefully allows sequential allocation/unlink behaviours to defering inactivation of all the inodes in a single cluster buffer at a time, further helping maintain hot CPU and buffer cache accesses while running inactivations. A hard per-cpu queue throttle of 256 inode has been set to avoid runaway queuing when inodes that take a long to time inactivate are being processed. For example, when unlinking inodes with large numbers of extents that can take a lot of processing to free. Signed-off-by: Dave Chinner <dchinner@redhat.com> [djwong: tweak comments and tracepoints, convert opflags to state bits] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-06 11:05:39 -07:00
Darrick J. Wong	62af7d54a0	xfs: detach dquots from inode if we don't need to inactivate it If we don't need to inactivate an inode, we can detach the dquots and move on to reclamation. This isn't strictly required here; it's a preparation patch for deferred inactivation per reviewer request[1] to move the creation of xfs_inode_needs_inactivation into a separate change. Eventually this !need_inactive chunk will turn into the code path for inodes that skip xfs_inactive and go straight to memory reclaim. [1] https://lore.kernel.org/linux-xfs/20210609012838.GW2945738@locust/T/#mca6d958521cb88bbc1bfe1a30767203328d410b5 Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-06 11:05:39 -07:00
Darrick J. Wong	c6c2066db3	xfs: move xfs_inactive call to xfs_inode_mark_reclaimable Move the xfs_inactive call and all the other debugging checks and stats updates into xfs_inode_mark_reclaimable because most of that are implementation details about the inode cache. This is preparation for deferred inactivation that is coming up. We also move it around xfs_icache.c in preparation for deferred inactivation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-08-06 11:05:38 -07:00
Dave Chinner	0ed17f01c8	xfs: introduce all-mounts list for cpu hotplug notifications The inode inactivation and CIL tracking percpu structures are per-xfs_mount structures. That means when we get a CPU dead notification, we need to then iterate all the per-cpu structure instances to process them. Rather than keeping linked lists of per-cpu structures in each subsystem, add a list of all xfs_mounts that the generic xfs_cpu_dead() function will iterate and call into each subsystem appropriately. This allows us to handle both per-mount and global XFS percpu state from xfs_cpu_dead(), and avoids the need to link subsystem structures that can be easily found from the xfs_mount into their own global lists. Signed-off-by: Dave Chinner <dchinner@redhat.com> [djwong: expand some comments about mount list setup ordering rules] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-06 11:05:38 -07:00
Dave Chinner	f1653c2e28	xfs: introduce CPU hotplug infrastructure We need to move to per-cpu state for both deferred inode inactivation and CIL tracking, but to do that we need to handle CPUs being removed from the system by the hot-plug code. Introduce generic XFS infrastructure to handle CPU hotplug events that is set up at module init time and torn down at module exit time. Initially, we only need CPU dead notifications, so we only set up a callback for these notifications. The infrastructure can be updated in future for other CPU hotplug state machine notifications easily if ever needed. Signed-off-by: Dave Chinner <dchinner@redhat.com> [djwong: rearrange some macros, fix function prototypes] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-06 11:05:37 -07:00
Christoph Hellwig	149e53afc8	xfs: remove the active vs running quota differentiation These only made a difference when quotaoff supported disabling quota accounting on a mounted file system, so we can switch everyone to use a single set of flags and helpers now. Note that the *QUOTA_ON naming for the helpers is kept as it was the much more commonly used one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-06 11:05:37 -07:00
Christoph Hellwig	e497dfba6b	xfs: remove the flags argument to xfs_qm_dquot_walk We always purge all dquots now, so drop the argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-06 11:05:36 -07:00
Christoph Hellwig	777eb1fa85	xfs: remove xfs_dqrele_all_inodes xfs_dqrele_all_inodes is unused now, remove it and all supporting code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-06 11:05:36 -07:00
Christoph Hellwig	40b52225e5	xfs: remove support for disabling quota accounting on a mounted file system Disabling quota accounting is hairy, racy code with all kinds of pitfalls. And it has a very strange mind set, as quota accounting (unlike enforcement) really is a propery of the on-disk format. There is no good use case for supporting this. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-08-06 11:05:36 -07:00
Linus Torvalds	c500bee1c5	Linux 5.14-rc4	2021-08-01 17:04:17 -07:00
Linus Torvalds	d4affd6b6e	perf tools fixes for v5.14: 2nd batch - Revert "perf map: Fix dso->nsinfo refcounting", this makes 'perf top' to abort, uncovering a design flaw on how namespace information is kept, the fix is more than we can do right now, leave it for the next merge window. - Split --dump-raw-trace by AUX records for ARM's CoreSight, fixing up the decoding of some records. - Fix PMU alias matching. Thanks to James Clark and John Garry for these fixes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYQaw5wAKCRCyPKLppCJ+ J0n2AP42gFxTmz5YDv1YUjaC8GPMsqp0kNYo67zq0x4QMbp7pAEAt2S5Z7kbcK9h m57dq2wTz/lWCLCS0ccrhS+b7mjYlwk= =zRFX -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Revert "perf map: Fix dso->nsinfo refcounting", this makes 'perf top' abort, uncovering a design flaw on how namespace information is kept. The fix for that is more than we can do right now, leave it for the next merge window. - Split --dump-raw-trace by AUX records for ARM's CoreSight, fixing up the decoding of some records. - Fix PMU alias matching. Thanks to James Clark and John Garry for these fixes. * tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: Revert "perf map: Fix dso->nsinfo refcounting" perf pmu: Fix alias matching perf cs-etm: Split --dump-raw-trace by AUX records	2021-08-01 12:25:30 -07:00
Linus Torvalds	c82357a7b3	powerpc fixes for 5.14 #4 - Don't use r30 in VDSO code, to avoid breaking existing Go lang programs. - Change an export symbol to allow non-GPL modules to use spinlocks again. Thanks to: Paul Menzel, Srikar Dronamraju. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEGnMETHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPzxD/9HXKi1gyaxgGEVsJKAWzhVYajsOn3g 7QnihBnyZFWNPWgyaoRnb06xg3/uVCnXtnnXARgzQ0E0nGlywVLvEscpvSqB+kn1 VKlPGWN6f/MmN2WTXQlT1/qLQUvdyiniieqg9BecxE8G4CfKRi3jw5Q8bMTIh2Lq Oh1XzUXD6P+/Wv3Sfx0goJ9+SU7uxJRW8dgpzseBy3NnK9HAoRaIf1V8N2fsgyil IgMlpi249Q2uFAewrh2PcYrAeATFXwIaZ1n+VHck299M0oQzyq1dttyjspihyOUB JhrYZtU5aWp1NrBNBwIJ0YpUHw8Jdsr6kPl0SpY8yHORBeAssuQOE/v0qR9pypsT DHBNMAniudTO6TJGDRvxN58y0BXzvnk5m+mBbUuhXeBLdcKFlpN7M4nXSdAh60JZ Uw107OY5/NoFpXuhDX1B46E0OuQFtwcVSW93kmEUR6KDX+MbIlKfQbNan7VqpEbG IFJCNcawBV9I7WeX8+ijFM5SvglC8gYnp5A7hNt/7ptOZjG5Nm6J8k+EDUIlgbiO BFJJRSfkVWFi+NUadSMyX4rWxSykpNsovZzAvodz+Evy1LgSSe71vbcbETtHYpxF +kGISw+EtCqgJ4qPI9k71oxO/edoMpuMhtgINPrxXtPywbG6Kj8RDrjWHrhJRSzo kT1ybTi4P7uDBw== =yBq0 -----END PGP SIGNATURE----- Merge tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Don't use r30 in VDSO code, to avoid breaking existing Go lang programs. - Change an export symbol to allow non-GPL modules to use spinlocks again. Thanks to Paul Menzel, and Srikar Dronamraju. * tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/vdso: Don't use r30 to avoid breaking Go lang powerpc/pseries: Fix regression while building external modules	2021-08-01 12:18:44 -07:00
Linus Torvalds	aa6603266c	Fixes for 5.14-rc4: * Fix a number of coordination bugs relating to cache flushes for metadata writeback, cache flushes for multi-buffer log writes, and FUA writes for single-buffer log writes. * Fix a bug with incorrect replay of attr3 blocks. * Fix unnecessary stalls when flushing logs to disk. * Fix spoofing problems when recovering realtime bitmap blocks. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmEC2PgACgkQ+H93GTRK tOtEvg//XQTcqKgO+60lJzhfgfGD8HsYWGcAc0UW8vu0I6gPNstd/PHKBCYkhT66 rp0l8CtZhbo3qj2ZJTIDvVxFeeAUcMhAIgU4gJB6OmW6/VV8NJlArfeyaA+85/lV lVYD53qBcc0IydDWlRD5oU8T55pqv9hg0W9WkpWrtjxoTlxPX5rDj7yrKEqiQs1M IUa5X4Qnwo/C2ATD/t2G3PIM7OxdCJ7YjyrZ27VWWRsUJW8DOqXtJX6HBs+VT9cM mh/IeIy60rmKgf2Ag2ZJCvrKnmqXqJFyGjEDzk6gXoqktQyWnUBLhQoyLh5r9UlA 4ThLGvPwUh5QEFOoo3cpN72X0wUeHcebfh4DgY/G3PeEK4J1CVq1UXLB1a8Si7X4 qf5ZqfUU4dr6v8C2AIqd9S/H6wm8v84hzA2uXca9tsw67rAcLc6N0rHydlLtn+n8 DL4PQYcUmn0LGrhIi2t/4ec80SGBf7ad/iDbr3A0K5NsV5kMl8dReg2yCDl9kHM0 yHFk8zLTKh5fs7fmmJXOORP33YMzstET9L1oKBv9cd9iMlHNUn27o9tpwwa2noM+ v6E+UCKlRTauj/MTxZITdmNzgGEymgu5bpbb77N24OTF9jf48OEW+cr0ZzgrVYtk wGuj9RFGcwneJoWjVPGURu1xBuC1AX9PbqnR9NQXbqmuwd6BINk= =pLW3 -----END PGP SIGNATURE----- Merge tag 'xfs-5.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "This contains a bunch of bug fixes in XFS. Dave and I have been busy the last couple of weeks to find and fix as many log recovery bugs as we can find; here are the results so far. Go fstests -g recoveryloop! ;) - Fix a number of coordination bugs relating to cache flushes for metadata writeback, cache flushes for multi-buffer log writes, and FUA writes for single-buffer log writes - Fix a bug with incorrect replay of attr3 blocks - Fix unnecessary stalls when flushing logs to disk - Fix spoofing problems when recovering realtime bitmap blocks" * tag 'xfs-5.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: prevent spoofing of rtbitmap blocks when recovering buffers xfs: limit iclog tail updates xfs: need to see iclog flags in tracing xfs: Enforce attr3 buffer recovery order xfs: logging the on disk inode LSN can make it go backwards xfs: avoid unnecessary waits in xfs_log_force_lsn() xfs: log forces imply data device cache flushes xfs: factor out forced iclog flushes xfs: fix ordering violation between cache flushes and tail updates xfs: fold __xlog_state_release_iclog into xlog_state_release_iclog xfs: external logs need to flush data device xfs: flush data dev on external log write	2021-08-01 12:07:23 -07:00
Linus Torvalds	f3438b4c4e	3 cifs/smb3 fixes, including two for stable, and a fix for an fallocate problem noticed by Clang -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmEEaj8ACgkQiiy9cAdy T1FdSgv+KaGjM/IPPHDymSoWEtkPp3x3frPEXfJrTSD/M8NVx+9UEnwH8JEZEn+t PEQyGxOx28qo8ZtNJKmKAAkAm45/gTriteyjk8m2witJdDaK7wyhRYJ3Mmo99ot5 hOV55Jn40N+dK4r2AfWv6WauiDnLw83QlSmm5eN1/om3FXFx49IOjeKGaagpHGSQ MaaEX0pMcyMop/a1TFOC01M8ergYfWF/3S3zmA5S9Ei4+PefxbhYpdj4RTRZY7rp +2ZNyAs4Zc88l6weF4XCNck/2qrW6A4a1DCmnD19MJTGK+4FqqIBZ0f5fWV6aFGl EIi6vsHnt4b2MGkePs/SrpifCMZk/uSM7p9TQvplX0KHhIJ5h55pO0+GFdH0uYlv KkV2SjepCavsn8pwhVyqMRQ8+/MCQi+vAAP9vmdD2+V0cps8VFBexVSfuWEdwSBd JuDWeVP1kgR4OQGrCp4r8wWhSCh+FOGsdrwTHTwPWL9GCoUuiTui6Z36JM5fT6Dg 437Hg7RY =qJCo -----END PGP SIGNATURE----- Merge tag '5.14-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Three cifs/smb3 fixes, including two for stable, and a fix for an fallocate problem noticed by Clang" * tag '5.14-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: add missing parsing of backupuid smb3: rc uninitialized in one fallocate path SMB3: fix readpage for large swap cache	2021-07-31 09:25:12 -07:00
Linus Torvalds	c7d1022326	Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi (mac80211) and netfilter trees. Current release - regressions: - mac80211: fix starting aggregation sessions on mesh interfaces Current release - new code bugs: - sctp: send pmtu probe only if packet loss in Search Complete state - bnxt_en: add missing periodic PHC overflow check - devlink: fix phys_port_name of virtual port and merge error - hns3: change the method of obtaining default ptp cycle - can: mcba_usb_start(): add missing urb->transfer_dma initialization Previous releases - regressions: - set true network header for ECN decapsulation - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and LRO - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811 PHY - sctp: fix return value check in __sctp_rcv_asconf_lookup Previous releases - always broken: - bpf: - more spectre corner case fixes, introduce a BPF nospec instruction for mitigating Spectre v4 - fix OOB read when printing XDP link fdinfo - sockmap: fix cleanup related races - mac80211: fix enabling 4-address mode on a sta vif after assoc - can: - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF - j1939: j1939_session_deactivate(): clarify lifetime of session object, avoid UAF - fix number of identical memory leaks in USB drivers - tipc: - do not blindly write skb_shinfo frags when doing decryption - fix sleeping in tipc accept routine Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmEEWm8ACgkQMUZtbf5S Irv84A//V/nn9VRdpDpmodwBWVEc9SA00M/nmziRBLwRyG+fRMtnePY4Ha40TPbh LL6orth08hZKOjVmMc6Ea4EjZbV5E3iAKtAnaX6wi1HpEXVxKtFYnWxu9ydwTEd9 An1fltDtWYkNi3kiq7il+Tp1/yZAQ+NYv5zQZCWJ47kkN3jkjULdAEBqODA2A6Ul 0PQgS1rKzXukE19PlXDuaNuEekhTiEfaTwzHjdBJZkj1toGJGfHsvdQ/YJjixzB9 44SjE4PfxIaMWP0BVaD6hwzaVQhaZETXhZZufdIDdQd7sDbmd6CPODX6mXfLEq4u JaWylgobsK+5ScHE6siVI+ZlW7stq9l1Ynm10ADiwsZVzKEoP745484aEFOLO6Z+ Ln/IqDQCP/yJQmnl2i0+TfqVDh6BKYoIfUUK/+nzHw4Otycy0m3kj4P+74aYfjOv Q+cUgbXUemcrpq6wGUK+zK0NyNHVILvdPDnHPMMypwqPk18y5ZmFvaJAVUPSavD9 N7t9LoLyGwK3i/Ir4l+JJZ1KgAv1+TbmyNBWvY1Yk/r/vHU3nBPIv26s7YarNAwD 094vJEJ0+mqO4h+Xj1Nc7HEBFi46JfpN2L8uYoM7gpwziIRMdmpXVLmpEk43WmFi UMwWJWqabPEXaozC2UFcFLSk+jS7DiD+G5eG+Fd5HecmKzd7RI0= =sKPI -----END PGP SIGNATURE----- Merge tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi (mac80211) and netfilter trees. Current release - regressions: - mac80211: fix starting aggregation sessions on mesh interfaces Current release - new code bugs: - sctp: send pmtu probe only if packet loss in Search Complete state - bnxt_en: add missing periodic PHC overflow check - devlink: fix phys_port_name of virtual port and merge error - hns3: change the method of obtaining default ptp cycle - can: mcba_usb_start(): add missing urb->transfer_dma initialization Previous releases - regressions: - set true network header for ECN decapsulation - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and LRO - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811 PHY - sctp: fix return value check in __sctp_rcv_asconf_lookup Previous releases - always broken: - bpf: - more spectre corner case fixes, introduce a BPF nospec instruction for mitigating Spectre v4 - fix OOB read when printing XDP link fdinfo - sockmap: fix cleanup related races - mac80211: fix enabling 4-address mode on a sta vif after assoc - can: - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF - j1939: j1939_session_deactivate(): clarify lifetime of session object, avoid UAF - fix number of identical memory leaks in USB drivers - tipc: - do not blindly write skb_shinfo frags when doing decryption - fix sleeping in tipc accept routine" * tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits) gve: Update MAINTAINERS list can: esd_usb2: fix memory leak can: ems_usb: fix memory leak can: usb_8dev: fix memory leak can: mcba_usb_start(): add missing urb->transfer_dma initialization can: hi311x: fix a signedness bug in hi3110_cmd() MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver bpf: Fix leakage due to insufficient speculative store bypass mitigation bpf: Introduce BPF nospec instruction for mitigating Spectre v4 sis900: Fix missing pci_disable_device() in probe and remove net: let flow have same hash in two directions nfc: nfcsim: fix use after free during module unload tulip: windbond-840: Fix missing pci_disable_device() in probe and remove sctp: fix return value check in __sctp_rcv_asconf_lookup nfc: s3fwrn5: fix undefined parameter values in dev_err() net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32 net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev() net/mlx5: Unload device upon firmware fatal error net/mlx5e: Fix page allocation failure for ptp-RQ over SF net/mlx5e: Fix page allocation failure for trap-RQ over SF ...	2021-07-30 16:01:36 -07:00
Linus Torvalds	e1dab4c02d	ACPI fixes for 5.14-rc4 - Revert recent change of the ACPI IRQ resources handling that attempted to improve the ACPI IRQ override selection logic, but introduced serious regressions on some systems (Hui Wang). - Fix up quirks for AMD platforms in the suspend-to-idle support code so as to take upcoming systems using uPEP HID AMDI007 into account as appropriate (Mario Limonciello). - Fix the code retrieving DPTF attributes of the PCH FIVR so that it agrees on the return data type with the ACPI control method evaluated for this purpose (Srinivas Pandruvada). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmEESVASHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxAg4P/3XEhqudZI7+5VsGgjdVSuYIQZEVBiIt 6M1WOIVwbL+pybeadYVTIzjsBIWWyem48mvUf9liqpzNfgTdyVdbANyJNct9NsgN fETI/ethoBE0+VA8m5Z59ze3vwLWHII++GrL2J6/XQm+VV1mul2FsZ9GJj8v+8zf rD/OatQZMkdLQ5Z7E3OfeNRETmyuxd95wI3xmNcUxtMWkpWq41tRCoXHRPGuWxGF xJKRHDtN7MqXI6WPvdKLMZ2XXbbbmwr4fw5/r5schkP8dbtzMLMPhd7blZlA81jF no7jNQ8EPs5IIgpMkxNo1wVnYK0ALDWrAKifODsQe1WJbRThz9SRAssYD7WQkczE zoE6FcUt2rrKj91P0cnOUWJ+PI8WTa4RStjva1zxliwgv7pDn5SuedAdPv0P/9Zz XO74NrnrF8P+H/rWMNX3/kVzzabw58gzr0o/2a17sFyk+dVAb3vsjQRS+MWy/GLA OsiAqqRe3jC2OtyeQ053FLxacSqItBrfySPB9fGO5rpi5KOG/8ODNf3Y9Z+aWeln LNi7/SU4ZUbklmr8BbXRAdfCnZBrmr9+ddeP4Qg4cBtoJQUsjQHSQzXEuFQWPFNL L+oHkPLNw0Yqej5pJa5eoOKEH0lm0aBDivNV3zJ/0PqD2zFtNB6LQWIftOYARaz0 CDfY0XwUdq58 =IGYg -----END PGP SIGNATURE----- Merge tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These revert a recent IRQ resources handling modification that turned out to be problematic, fix suspend-to-idle handling on AMD platforms to take upcoming systems into account properly and fix the retrieval of the DPTF attributes of the PCH FIVR. Specifics: - Revert recent change of the ACPI IRQ resources handling that attempted to improve the ACPI IRQ override selection logic, but introduced serious regressions on some systems (Hui Wang). - Fix up quirks for AMD platforms in the suspend-to-idle support code so as to take upcoming systems using uPEP HID AMDI007 into account as appropriate (Mario Limonciello). - Fix the code retrieving DPTF attributes of the PCH FIVR so that it agrees on the return data type with the ACPI control method evaluated for this purpose (Srinivas Pandruvada)" * tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: DPTF: Fix reading of attributes Revert "ACPI: resources: Add checks for ACPI IRQ override" ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007	2021-07-30 15:56:24 -07:00
Linus Torvalds	3a34b13a88	pipe: make pipe writes always wake up readers Since commit `1b6b26ae70` ("pipe: fix and clarify pipe write wakeup logic") we have sanitized the pipe write logic, and would only try to wake up readers if they needed it. In particular, if the pipe already had data in it before the write, there was no point in trying to wake up a reader, since any existing readers must have been aware of the pre-existing data already. Doing extraneous wakeups will only cause potential thundering herd problems. However, it turns out that some Android libraries have misused the EPOLL interface, and expected "edge triggered" be to "any new write will trigger it". Even if there was no edge in sight. Quoting Sandeep Patil: "The commit `1b6b26ae70` ('pipe: fix and clarify pipe write wakeup logic') changed pipe write logic to wakeup readers only if the pipe was empty at the time of write. However, there are libraries that relied upon the older behavior for notification scheme similar to what's described in [1] One such library 'realm-core'[2] is used by numerous Android applications. The library uses a similar notification mechanism as GNU Make but it never drains the pipe until it is full. When Android moved to v5.10 kernel, all applications using this library stopped working. The library has since been fixed[3] but it will be a while before all applications incorporate the updated library" Our regression rule for the kernel is that if applications break from new behavior, it's a regression, even if it was because the application did something patently wrong. Also note the original report [4] by Michal Kerrisk about a test for this epoll behavior - but at that point we didn't know of any actual broken use case. So add the extraneous wakeup, to approximate the old behavior. [ I say "approximate", because the exact old behavior was to do a wakeup not for each write(), but for each pipe buffer chunk that was filled in. The behavior introduced by this change is not that - this is just "every write will cause a wakeup, whether necessary or not", which seems to be sufficient for the broken library use. ] It's worth noting that this adds the extraneous wakeup only for the write side, while the read side still considers the "edge" to be purely about reading enough from the pipe to allow further writes. See commit `f467a6a664` ("pipe: fix and clarify pipe read wakeup logic") for the pipe read case, which remains that "only wake up if the pipe was full, and we read something from it". Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1] Link: https://github.com/realm/realm-core [2] Link: https://github.com/realm/realm-core/issues/4666 [3] Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4] Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/ Reported-by: Sandeep Patil <sspatil@android.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-30 15:42:34 -07:00
Arnaldo Carvalho de Melo	9bac1bd6e6	Revert "perf map: Fix dso->nsinfo refcounting" This makes 'perf top' abort in some cases, and the right fix will involve surgery that is too much to do at this stage, so revert for now and fix it in the next merge window. This reverts commit `2d6b74baa7`. Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2021-07-30 18:26:22 -03:00
Rafael J. Wysocki	e83f54eacf	Merge branches 'acpi-resources' and 'acpi-dptf' * acpi-resources: Revert "ACPI: resources: Add checks for ACPI IRQ override" * acpi-dptf: ACPI: DPTF: Fix reading of attributes	2021-07-30 20:26:38 +02:00
Linus Torvalds	4669e13cd6	block-5.14-2021-07-30 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEEEz8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpglXD/9CGREpOf1W5oqOScpTygjehwrRnAYisQv6 Oca/qGHBa61BTN3taAJc4NMwl+IwFBER2kdTcOyz8hNmyAUPyRmFND0mG2vGTzQA P9+ekiRKCJ1aRLsnyBL0JBbmvdoPMBHz39P165vMWMrVmnlpcPKoYDS0itHtYYNP VD5Y3A9ACGMDglipDmL+3tsXQo/AoJqRO8WGMUBY2qJ0lasYuCbPpzq0kHzXi6kE 0X64bg6JOZVd3wdyWywKahW3ntsVNLswRUBzLVrnjwE29UuBGWgF+/vwyW/Ob0yS ojafKvehCYnV8Q7IatASOtbwGLvLKgpJZXf7VUEsYnSD6SnmoZctjMjRdyLhNWut lD86Y+eWjQM0pUsOVPykfrV2hd9CrhjyRFskcbI0SJRlMOl0Lstl/X17efDWcDmz 1/V8ub3gKA3HF2Gc/QKhPJDClxM7SaWnsAO3Rk+qJ6bT4EiiRg2GewI1C7YNpmGW ty1fqcQE36JtSWadH4KL/evmX258ROfn3QT1nut2jpNsd1RQ+hHBcjcfeOx6n1GX ALxT8LnmlVYbAUwQvXJcqFcft8K3JoB5ZXT74lat/CAbIKhfEUeSUiqnQcQ8kJLW MTKviuZ9eJHO6/E7vw08ARDR0PmpSFqvc6rK9DiIM/kmVDz8OdLMovTqzX/hIzUT 7IfyHzQbwg== =5FG2 -----END PGP SIGNATURE----- Merge tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - gendisk freeing fix (Christoph) - blk-iocost wake ordering fix (Tejun) - tag allocation error handling fix (John) - loop locking fix. While this isn't the prettiest fix in the world, nobody has any good alternatives for 5.14. Something to likely revisit for 5.15. (Tetsuo) * tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block: block: delay freeing the gendisk blk-iocost: fix operation ordering in iocg_wake_fn() blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling loop: reintroduce global lock for safe loop_validate_file() traversal	2021-07-30 11:08:12 -07:00
Linus Torvalds	27eb687bcd	io_uring-5.14-2021-07-30 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEEE1IQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpm3jD/9Tx+VeLEmfapYdZXziDWrVecwm0iWfSoT8 ibKJGIYHmCH1XxIVT16+a1HcJAra7NGS60zW6JAvBh5ZWL/smEriOu+R6Twa1ez2 9Gby+39+V0PP3x9sKtRp7TsmsA0paIqVG4zwfyaUCyvYfiSTURoYde4lZwSGEMhb 8b0FPju/hmN/iyRGtu1eQvTbp252vahkGE8PKYkZWxNkTdJpvRax3kmbjH3A8/X3 rr0KDMgk4ePVap+3i/h94rXweaLCq9KiSm95Zvs63me6J2CbpKz/hGtvR5TiENiI 0mEqkA9PVE6LRbF4T6gk2gGYrkfEiAzca0r8BdoY0TxuEb2SfK5P3JOfYxS4VHek lVc+lvm0YglRn6dWxdUMhXYTrlAx4cRnIM9Oqg/WzN4WCZREouE2J/MXHNy3lNfQ zQQkmbNzeLQHaiq/JWAtu39LAeHWwEC/FernVV1i0wYOT6EacpVNM6OLUmHQGzkZ mnQpc9AzCgLifNZb4DDlP0MyM1D+GXGm5tdozmUILFQoAnvA6+3EtnRxOH9cWLaa mSNenp5kag/nbdkFTo3X+ptGYgLBWEluT/dKMsoqulPu+ZCV4zh9rAIgWqQUYwa2 5z/d6OAr3V8hjChiF4a7JrRUISu0f0Eh/GRCqTSp97Hys5rJcDxGB9WQKv7u+sTc BArhbyoHcg== =pzj2 -----END PGP SIGNATURE----- Merge tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: - A fix for block backed reissue (me) - Reissue context hardening (me) - Async link locking fix (Pavel) * tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-block: io_uring: fix poll requests leaking second poll entries io_uring: don't block level reissue off completion path io_uring: always reissue from task_work context io_uring: fix race in unified task_work running io_uring: fix io_prep_async_link locking	2021-07-30 11:01:47 -07:00
Linus Torvalds	f6c5971bb7	libata-5.14-2021-07-30 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmEEEyMQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgppZeEADdqROLANHp21UFSPyqllHumXVrCK3jXk9d ZHahUqT+xQqYZ3BC0hyP7vYuq+FWpr5Rumk6nah46JRv8RnvEHLOjkBqravGl6SV Zw2qvGe2R7LueBshsbG9m79D0cR2hcrMj2DYvsNIriTxkDVIo2wReaAg3V/vaep6 +kpvcjEFB9G4K/ypG2qPJnZ2TCoBmi/iJK5wTbQOpPAxQJxBCJGffBLXg/Olfy74 k6Oovp0bQWTEziAXNlgawn/Tiwav617/eZgz4ZxgnqzeVD1jJK8bPSf+O1UbNH6z lmULEdrc7fMTDgTbv5mElmxtXv+Ba5WZnZgzBFASt1BgvW/BSRNhs191T9Mq4U4L gLWDL/oRPhnCOP/AYQVhXzaV98hlOD+UBH3zypbBsCuWLGgDOoZOqjYyTOk+9PwB 0LFEZr5i/ZAQmgvtYSOH8u9NowhfOThVDhvfWmoD6ByoF0rPeVyPUUr0P910aVwW R2JkHKdixqCvyxIZqxwWfTjzApn8fzBGlcY6skMeXbh5pDo9F5HL/QbkKedoUpbj fcbklkr/Aggz3pLWq49RqeTtUZiFnolOtUpz09sojA75BxBV0Aa11FYf8JNSKUx+ 8RWLIT80PIxKiPV7Ym4ZG9qJKfzob7Oq/XwKxtReKCnfFcGdF2imroajggvawsmS 8UtOqwsHjg== =m5TP -----END PGP SIGNATURE----- Merge tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block Pull libata fixlets from Jens Axboe: - A fix for PIO highmem (Christoph) - Kill HAVE_IDE as it's now unused (Lukas) * tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block: arch: Kconfig: clean up obsolete use of HAVE_IDE libata: fix ata_pio_sector for CONFIG_HIGHMEM	2021-07-30 10:56:47 -07:00
Linus Torvalds	051df241e4	for-5.14-rc3-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmEEDKIACgkQxWXV+ddt WDtW+BAAnUD7h3ollIQo4C6hE9WaTG49Tp12Z00Og2m8hn4XyhI2QIaDz6a2CU7n MLQv16vZUQk5Z/VMtczM+5ZF5Rf0ywlMXnS4Sq5yKWT0YHpnH7q2nMAvg4gql/tJ Ldov92hnTrFAZX6vvkLVM5lZriY7fop3Lv2vHeAKu4CymAoisAv+SLa5xYkBR6Ig 3S16+lh/rIRgssI7KuDnjp9iTXvnB1J2MbfAOLNfqjXGWUDumu1k7HWQSNYZnHJX L390/QS3F3K6Trxkf5MSUXOxQROqcGKQVKyAR5ZvyULKly84nDpiINze80yCopq/ 7//32pO43xDPb78c7saxSWtjdgX4XsBOdzIoiJZHnc5CTTbCcneLes8zz4fD6AGq vjZKDLTgiO/sRlkQHZQk1y+7CawrqbKkAG+O7MqF7KGOtQ1WLRGfAkFP732TBFXM TyoZ7ENh3TiFDdeRmkOonpQ2k3DctW+7z2BmdlsuSXgD8fFbEArfxnO1SnRHrmcr C8FNeSkks8MTL7uePNUxwlnB8uHuGWCgSuS++q4OkCnzA3AmO6cRlDoMT3RMwVB/ wQxvqF/U6JJx16YOVqwA6ZjuUWVwyBj/WBKlaxgfghz8CUmDC0D4Xb2/S1UVcZi6 bFRph0UKeE5LaduoNZYaAqMOinCXFmetjudPmWO4sWfPrLb1mOY= =J0Pw -----END PGP SIGNATURE----- Merge tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix -Warray-bounds warning, to help external patchset to make it default treewide - fix writeable device accounting (syzbot report) - fix fsync and log replay after a rename and inode eviction - fix potentially lost error code when submitting multiple bios for compressed range * tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: calculate number of eb pages properly in csum_tree_block btrfs: fix rw device counting in __btrfs_free_extra_devids btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction btrfs: mark compressed range uptodate only if all bio succeed	2021-07-30 10:50:09 -07:00
Linus Torvalds	8723bc8fb3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: - resume timing fix for intel-ish driver (Ye Xiang) - fix for using incorrect MMIO register in amd_sfh driver (Dylan MacKenzie) - Cintiq 24HDT / 27QHDT regression fix and touch processing fix for Wacom driver (Jason Gerecke) - device removal bugfix for ft260 driver (Michael Zaidman) - other small assorted fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: ft260: fix device removal due to USB disconnect HID: wacom: Skip processing of touches with negative slot values HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible" HID: apple: Add support for Keychron K1 wireless keyboard HID: fix typo in Kconfig HID: ft260: fix format type warning in ft260_word_show() HID: amd_sfh: Use correct MMIO register for DMA address HID: asus: Remove check for same LED brightness on set HID: intel-ish-hid: use async resume function	2021-07-30 10:36:36 -07:00
Linus Torvalds	ad6ec09d96	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "7 patches. Subsystems affected by this patch series: lib, ocfs2, and mm (slub, migration, and memcg)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() slub: fix unreclaimable slab stat for bulk free mm/migrate: fix NR_ISOLATED corruption on 64-bit mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code ocfs2: issue zeroout to EOF blocks ocfs2: fix zero out valid data lib/test_string.c: move string selftest in the Runtime Testing menu	2021-07-30 10:29:58 -07:00
Jakub Kicinski	8d67041228	linux-can-fixes-for-5.14-20210730 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmEDowsTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCpyVqK+u3vqb9iB/9mje9K6evhlrP/eHmhy50Tah5PTtUM SNLNBvvLxtinFSvqcmUdwt6eX9hSNqvUr4MGeqoSGUVj5WnbLPenVlqgqa+/eS4b mGVfC61RzNyloxTh/qxsqerWc2t9MO6HvC20lKHvxN0ZHleYRdWVkMxl7DUYDAvE h/WGTvc6G1//XdGbaOEoEcZSYfLGR0G5/uWDo83vpGA1lHfvmrVdcNi/tyLqLWzz qwatRqn5QQp/MiKN8VO3hFEAzwcqwEOcDdwdmqHZ5lN6qyUNMeDynPv3LuM01AlE ds+AzOcPuDm/CkRbMrrow1hR+y4xfIDrXqoD3qIY2mwRxzsMdO+IJlEn =3PEn -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-07-30 The first patch is by me and adds Yasushi SHOJI as a reviewer for the Microchip CAN BUS Analyzer Tool driver. Dan Carpenter's patch fixes a signedness bug in the hi311x driver. Pavel Skripkin provides 4 patches, the first targets the mcba_usb driver by adding the missing urb->transfer_dma initialization, which was broken in a previous commit. The last 3 patches fix a memory leak in the usb_8dev, ems_usb and esd_usb2 driver. * tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: esd_usb2: fix memory leak can: ems_usb: fix memory leak can: usb_8dev: fix memory leak can: mcba_usb_start(): add missing urb->transfer_dma initialization can: hi311x: fix a signedness bug in hi3110_cmd() MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver ==================== Link: https://lore.kernel.org/r/20210730070526.1699867-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-07-30 19:29:52 +02:00
Wang Hai	121dffe20b	mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() When I use kfree_rcu() to free a large memory allocated by kmalloc_node(), the following dump occurs. BUG: kernel NULL pointer dereference, address: 0000000000000020 [...] Oops: 0000 [#1] SMP [...] Workqueue: events kfree_rcu_work RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline] RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline] RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363 [...] Call Trace: kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293 kfree_bulk include/linux/slab.h:413 [inline] kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300 process_one_work+0x207/0x530 kernel/workqueue.c:2276 worker_thread+0x320/0x610 kernel/workqueue.c:2422 kthread+0x13d/0x160 kernel/kthread.c:313 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 When kmalloc_node() a large memory, page is allocated, not slab, so when freeing memory via kfree_rcu(), this large memory should not be used by memcg_slab_free_hook(), because memcg_slab_free_hook() is is used for slab. Using page_objcgs_check() instead of page_objcgs() in memcg_slab_free_hook() to fix this bug. Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com Fixes: `270c6a7146` ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-30 10:14:39 -07:00
Shakeel Butt	f227f0faf6	slub: fix unreclaimable slab stat for bulk free SLUB uses page allocator for higher order allocations and update unreclaimable slab stat for such allocations. At the moment, the bulk free for SLUB does not share code with normal free code path for these type of allocations and have missed the stat update. So, fix the stat update by common code. The user visible impact of the bug is the potential of inconsistent unreclaimable slab stat visible through meminfo and vmstat. Link: https://lkml.kernel.org/r/20210728155354.3440560-1-shakeelb@google.com Fixes: `6a486c0ad4` ("mm, sl[ou]b: improve memory accounting") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-30 10:14:39 -07:00
Aneesh Kumar K.V	b5916c0254	mm/migrate: fix NR_ISOLATED corruption on 64-bit Similar to commit `2da9f6305f` ("mm/vmscan: fix NR_ISOLATED_FILE corruption on 64-bit") avoid using unsigned int for nr_pages. With unsigned int type the large unsigned int converts to a large positive signed long. Symptoms include CMA allocations hanging forever due to alloc_contig_range->...->isolate_migratepages_block waiting forever in "while (unlikely(too_many_isolated(pgdat)))". Link: https://lkml.kernel.org/r/20210728042531.359409-1-aneesh.kumar@linux.ibm.com Fixes: `c5fc5c3ae0` ("mm: migrate: account THP NUMA migration counters correctly") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-30 10:14:39 -07:00
Johannes Weiner	30def93565	mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code Dan Carpenter reports: The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr 29, 2021, leads to the following static checker warning: kernel/cgroup/rstat.c:200 cgroup_rstat_flush() warn: sleeping in atomic context mm/memcontrol.c 3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) 3573 { 3574 unsigned long val; 3575 3576 if (mem_cgroup_is_root(memcg)) { 3577 cgroup_rstat_flush(memcg->css.cgroup); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is from static analysis and potentially a false positive. The problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold() which holds an rcu_read_lock(). And the cgroup_rstat_flush() function can sleep. 3578 val = memcg_page_state(memcg, NR_FILE_PAGES) + 3579 memcg_page_state(memcg, NR_ANON_MAPPED); 3580 if (swap) 3581 val += memcg_page_state(memcg, MEMCG_SWAP); 3582 } else { 3583 if (!swap) 3584 val = page_counter_read(&memcg->memory); 3585 else 3586 val = page_counter_read(&memcg->memsw); 3587 } 3588 return val; 3589 } __mem_cgroup_threshold() indeed holds the rcu lock. In addition, the thresholding code is invoked during stat changes, and those contexts have irqs disabled as well. If the lock breaking occurs inside the flush function, it will result in a sleep from an atomic context. Use the irqsafe flushing variant in mem_cgroup_usage() to fix this. Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org Fixes: `2d146aa3aa` ("mm: memcontrol: switch to rstat") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-30 10:14:39 -07:00
Junxiao Bi	9449ad33be	ocfs2: issue zeroout to EOF blocks For punch holes in EOF blocks, fallocate used buffer write to zero the EOF blocks in last cluster. But since ->writepage will ignore EOF pages, those zeros will not be flushed. This "looks" ok as commit `6bba4471f0` ("ocfs2: fix data corruption by fallocate") will zero the EOF blocks when extend the file size, but it isn't. The problem happened on those EOF pages, before writeback, those pages had DIRTY flag set and all buffer_head in them also had DIRTY flag set, when writeback run by write_cache_pages(), DIRTY flag on the page was cleared, but DIRTY flag on the buffer_head not. When next write happened to those EOF pages, since buffer_head already had DIRTY flag set, it would not mark page DIRTY again. That made writeback ignore them forever. That will cause data corruption. Even directio write can't work because it will fail when trying to drop pages caches before direct io, as it found the buffer_head for those pages still had DIRTY flag set, then it will fall back to buffer io mode. To make a summary of the issue, as writeback ingores EOF pages, once any EOF page is generated, any write to it will only go to the page cache, it will never be flushed to disk even file size extends and that page is not EOF page any more. The fix is to avoid zero EOF blocks with buffer write. The following code snippet from qemu-img could trigger the corruption. 656 open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR\|O_DIRECT\|O_CLOEXEC) = 11 ... 660 fallocate(11, FALLOC_FL_KEEP_SIZE\|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...> 660 fallocate(11, 0, 2275868672, 327680) = 0 658 pwrite64(11, " Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-30 10:14:39 -07:00
Junxiao Bi	f267aeb6de	ocfs2: fix zero out valid data If append-dio feature is enabled, direct-io write and fallocate could run in parallel to extend file size, fallocate used "orig_isize" to record i_size before taking "ip_alloc_sem", when ocfs2_zeroout_partial_cluster() zeroout EOF blocks, i_size maybe already extended by ocfs2_dio_end_io_write(), that will cause valid data zeroed out. Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com Fixes: `6bba4471f0` ("ocfs2: fix data corruption by fallocate") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-30 10:14:39 -07:00
Matteo Croce	b2ff70a01a	lib/test_string.c: move string selftest in the Runtime Testing menu STRING_SELFTEST is presented in the "Library routines" menu. Move it in Kernel hacking > Kernel Testing and Coverage > Runtime Testing together with other similar tests found in lib/ --- Runtime Testing <> Test functions located in the hexdump module at runtime <> Test string functions (NEW) <> Test functions located in the string_helpers module at runtime <> Test strscpy() family of functions at runtime <> Test kstrto() family of functions at runtime <> Test printf() family of functions at runtime <*> Test scanf() family of functions at runtime Link: https://lkml.kernel.org/r/20210719185158.190371-1-mcroce@linux.microsoft.com Signed-off-by: Matteo Croce <mcroce@microsoft.com> Cc: Peter Rosin <peda@axentia.se> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-07-30 10:14:39 -07:00
Catherine Sullivan	028a71775f	gve: Update MAINTAINERS list The team maintaining the gve driver has undergone some changes, this updates the MAINTAINERS file accordingly. Signed-off-by: Catherine Sullivan <csully@google.com> Signed-off-by: Jon Olson <jonolson@google.com> Signed-off-by: David Awogbemila <awogbemila@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://lore.kernel.org/r/20210729155258.442650-1-csully@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-07-30 19:08:24 +02:00

1 2 3 4 5 ...

1030459 Commits All Branches Search

1030459 Commits

All Branches