OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Lennert Buytenhek	b0dd4d7ada	ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers [ Upstream commit 20730e9b277873deeb6637339edcba64468f3da3 ] With one of the on-board ASM1061 AHCI controllers (1b21:0612) on an ASUSTeK Pro WS WRX80E-SAGE SE WIFI mainboard, a controller hang was observed that was immediately preceded by the following kernel messages: ahci 0000:28:00.0: Using 64-bit DMA addresses ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00000 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00300 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00380 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00400 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00680 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00700 flags=0x0000] The first message is produced by code in drivers/iommu/dma-iommu.c which is accompanied by the following comment that seems to apply: /* * Try to use all the 32-bit PCI addresses first. The original SAC vs. * DAC reasoning loses relevance with PCIe, but enough hardware and * firmware bugs are still lurking out there that it's safest not to * venture into the 64-bit space until necessary. * * If your device goes wrong after seeing the notice then likely either * its driver is not setting DMA masks accurately, the hardware has * some inherent bug in handling >32-bit addresses, or not all the * expected address bits are wired up between the device and the IOMMU. 
*/ Asking the ASM1061 on a discrete PCIe card to DMA from I/O virtual address 0xffffffff00000000 produces the following I/O page faults: vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000000 flags=0x0010] vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000500 flags=0x0010] Note that the upper 21 bits of the logged DMA address are zero. (When asking a different PCIe device in the same PCIe slot to DMA to the same I/O virtual address, we do see all the upper 32 bits of the DMA address as 1, so this is not an issue with the chipset or IOMMU configuration on the test system.) Also, hacking libahci to always set the upper 21 bits of all DMA addresses to 1 produces no discernible effect on the behavior of the ASM1061, and mkfs/mount/scrub/etc work as without this hack. This all strongly suggests that the ASM1061 has a 43 bit DMA address limit, and this commit therefore adds a quirk to deal with this limit. This issue probably applies to (some of) the other supported ASMedia parts as well, but we limit it to the PCI IDs known to refer to ASM1061 parts, as that's the only part we know for sure to be affected by this issue at this point. Link: https://lore.kernel.org/linux-ide/ZaZ2PIpEId-rl6jv@wantstofly.org/ Signed-off-by: Lennert Buytenhek <kernel@wantstofly.org> [cassel: drop date from error messages in commit log] Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
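The 43-bit limit inferred in the ASM1061 commit above can be sanity-checked with a little arithmetic. This is a minimal sketch, not the driver's code: `addr_mask()` and `asm1061_visible_addr()` are hypothetical helpers, and the mask is built the same way the kernel's `DMA_BIT_MASK()` builds it for widths below 64.

```c
#include <stdint.h>

/* Hypothetical helper (not kernel API): mask with the low `bits` bits set.
 * Only valid for 1 <= bits <= 63. */
static uint64_t addr_mask(unsigned int bits)
{
    return (UINT64_C(1) << bits) - 1;
}

/* Model a device that only drives the low 43 address bits. */
static uint64_t asm1061_visible_addr(uint64_t iova)
{
    return iova & addr_mask(43);
}
```

Under that assumption, truncating the requested I/O virtual address 0xffffffff00000000 to 43 bits yields 0x7ff00000000, which is the first faulting address in the vfio-pci log quoted in the commit message.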
Charles Keepax	ab7318c795	spi: cs42l43: Handle error from devm_pm_runtime_enable [ Upstream commit f9f4b0c6425eb9ffd9bf62b8b8143e786b6ba695 ] As devm_pm_runtime_enable can fail due to memory allocations, it is best to handle the error. Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com> Link: https://msgid.link/r/20240124174101.2270249-1-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Maksim Kiselev	673629018b	aoe: avoid potential deadlock at set_capacity [ Upstream commit e169bd4fb2b36c4b2bee63c35c740c85daeb2e86 ] Move set_capacity() outside of the section protected by (&d->lock). This avoids a possible interrupt-unsafe locking scenario: CPU0 CPU1 ---- ---- [1] lock(&bdev->bd_size_lock); local_irq_disable(); [2] lock(&d->lock); [3] lock(&bdev->bd_size_lock); <Interrupt> [4] lock(&d->lock); * DEADLOCK * Where [1](&bdev->bd_size_lock) is held by zram_add()->set_capacity(). [2]lock(&d->lock) is held by aoeblk_gdalloc(). And aoeblk_gdalloc() is trying to acquire [3](&bdev->bd_size_lock) at set_capacity() call. In this situation an attempt to acquire [4]lock(&d->lock) from aoecmd_cfg_rsp() will lead to deadlock. So the simplest solution is breaking lock dependency [2](&d->lock) -> [3](&bdev->bd_size_lock) by moving set_capacity() outside. Signed-off-by: Maksim Kiselev <bigunclemax@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240124072436.3745720-2-bigunclemax@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Conrad Kostecki	89f6705161	ahci: asm1166: correct count of reported ports [ Upstream commit 0077a504e1a4468669fd2e011108db49133db56e ] The ASM1166 SATA host controller always wrongly reports that it has 32 ports, but in reality it only has six ports. This seems to be a hardware issue, as all tested ASM1166 SATA host controllers report such a high count of ports. Example output: ahci 0000:09:00.0: AHCI 0001.0301 32 slots 32 ports 6 Gbps 0xffffff3f impl SATA mode. By adjusting the port_map, the count is limited to six ports. New output: ahci 0000:09:00.0: AHCI 0001.0301 32 slots 32 ports 6 Gbps 0x3f impl SATA mode. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=211873 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218346 Signed-off-by: Conrad Kostecki <conikost@gentoo.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
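The effect of adjusting the port_map in the ASM1166 commit above can be illustrated with plain bit arithmetic. The sketch below is not the driver's code: `count_ports()` and `asm1166_fix_port_map()` are hypothetical names, and the 0x3f mask is taken from the "New output" line in the commit message.

```c
#include <stdint.h>

/* Count the set bits in a 32-bit ports-implemented bitmap. */
static unsigned int count_ports(uint32_t port_map)
{
    unsigned int n = 0;

    while (port_map) {
        port_map &= port_map - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical stand-in for the quirk: keep only the six real ports. */
static uint32_t asm1166_fix_port_map(uint32_t reported)
{
    return reported & 0x3f;
}
```

Applied to the reported bitmap 0xffffff3f, the mask leaves 0x3f, which has exactly six bits set.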
Shyam Prasad N	f642fcf3f7	cifs: helper function to check replayable error codes [ Upstream commit 64cc377b7628b81ffdbdb1c6bacfba895dcac3f8 ] The code to check for replay is not just -EAGAIN. In some cases, the send request or receive response may result in network errors, which we're now mapping to -ECONNABORTED. This change introduces a helper function which checks if the error returned is one of the above two errors. And all checks for replays will now use this helper. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Shyam Prasad N	c09de6bb3a	cifs: translate network errors on send to -ECONNABORTED [ Upstream commit a68106a6928e0a6680f12bcc7338c0dddcfe4d11 ] When the network stack returns various errors, we today bubble up the error to the user (in case of soft mounts). This change translates all network errors except -EINTR and -EAGAIN to -ECONNABORTED. A similar approach is taken when we receive network errors when reading from the socket. The change also forces the cifsd thread to reconnect during its next activity. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Shyam Prasad N	59e04d39fc	cifs: cifs_pick_channel should try selecting active channels [ Upstream commit fc43a8ac396d302ced1e991e4913827cf72c8eb9 ] cifs_pick_channel today just selects a channel based on the policy of least loaded channel. However, it does not take into account if the channel needs reconnect. As a result, we can have failures in send that can be completely avoided. This change doesn't make a channel a candidate for this selection if it needs reconnect. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Kees Cook	8fbefa7a75	smb: Work around Clang __bdos() type confusion [ Upstream commit 8deb05c84b63b4fdb8549e08942867a68924a5b8 ] Recent versions of Clang get confused about the possible size of the "user" allocation, and CONFIG_FORTIFY_SOURCE ends up emitting a warning[1]: repro.c:126:4: warning: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 126 \| __write_overflow_field(p_size_field, size); \| ^ for this memset(): int len; __le16 *user; ... len = ses->user_name ? strlen(ses->user_name) : 0; user = kmalloc(2 + (len * 2), GFP_KERNEL); ... if (len) { ... } else { memset(user, '\0', 2); } While Clang works on this bug[2], switch to using a direct assignment, which avoids memset() entirely which both simplifies the code and silences the false positive warning. (Making "len" size_t also silences the warning, but the direct assignment seems better.) Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/1966 [1] Link: https://github.com/llvm/llvm-project/issues/77813 [2] Cc: Steve French <sfrench@samba.org> Cc: Paulo Alcantara <pc@manguebit.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Shyam Prasad N <sprasad@microsoft.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: llvm@lists.linux.dev Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Christian A. Ehrhardt	0f1bae071d	block: Fix WARNING in _copy_from_iter [ Upstream commit 13f3956eb5681a4045a8dfdef48df5dc4d9f58a6 ] Syzkaller reports a warning in _copy_from_iter because an iov_iter is supposedly used in the wrong direction. The reason is that syzkaller managed to generate a request with a transfer direction of SG_DXFER_TO_FROM_DEV. This instructs the kernel to copy user buffers into the kernel, read into the copied buffers and then copy the data back to user space. Thus the iovec is used in both directions. Detect this situation in the block layer and construct a new iterator with the correct direction for the copy-in. Reported-by: syzbot+a532b03fdfee2c137666@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/0000000000009b92c10604d7a5e9@google.com/t/ Reported-by: syzbot+63dec323ac56c28e644f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/0000000000003faaa105f6e7c658@google.com/T/ Signed-off-by: Christian A. Ehrhardt <lk@c--e.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240121202634.275068-1-lk@c--e.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Devyn Liu	d637b51182	spi: hisi-sfc-v3xx: Return IRQ_NONE if no interrupts were detected [ Upstream commit de8b6e1c231a95abf95ad097b993d34b31458ec9 ] Return IRQ_NONE from the interrupt handler when no interrupt was detected. Because an empty interrupt will cause a null pointer error: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Call trace: complete+0x54/0x100 hisi_sfc_v3xx_isr+0x2c/0x40 [spi_hisi_sfc_v3xx] __handle_irq_event_percpu+0x64/0x1e0 handle_irq_event+0x7c/0x1cc Signed-off-by: Devyn Liu <liudingyuan@huawei.com> Link: https://msgid.link/r/20240123071149.917678-1-liudingyuan@huawei.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Mika Westerberg	8298ea0f51	spi: intel-pci: Add support for Arrow Lake SPI serial flash [ Upstream commit 8afe3c7fcaf72fca1e7d3dab16a5b7f4201ece17 ] This adds the PCI ID of the Arrow Lake and Meteor Lake-S PCH SPI serial flash controller. This one supports all the necessary commands Linux SPI-NOR stack requires. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://msgid.link/r/20240122120034.2664812-3-mika.westerberg@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Liming Sun	763c59714c	platform/mellanox: mlxbf-tmfifo: Drop Tx network packet when Tx TmFIFO is full [ Upstream commit 8cbc756b802605dee3dd40019bd75960772bacf5 ] Starting from Linux 5.16 kernel, Tx timeout mechanism was added in the virtio_net driver which prints the "Tx timeout" warning message when a packet stays in Tx queue for too long. Below is an example of the reported message: "[494105.316739] virtio_net virtio1 tmfifo_net0: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, usecs since last trans: 3079892256". This issue could happen when the external host driver which drains the FIFO is restarted, stopped or upgraded. To avoid such confusing "Tx timeout" messages, this commit adds logic to drop the outstanding Tx packet if it's not able to transmit in two seconds due to Tx FIFO full, which can be considered as congestion or out-of-resource drop. This commit also handles the special case that the packet is half-transmitted into the Tx FIFO. In such case, the packet is discarded with remaining length stored in vring->rem_padding. So paddings with zeros can be sent out when Tx space is available to maintain the integrity of the packet format. The padded packet will be dropped on the receiving side. Signed-off-by: Liming Sun <limings@nvidia.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240111173106.96958-1-limings@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:49 +01:00
Fullway Wang	99f1abc34a	fbdev: sis: Error out if pixclock equals zero [ Upstream commit e421946be7d9bf545147bea8419ef8239cb7ca52 ] The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of pixclock, it may cause divide-by-zero error. In sisfb_check_var(), var->pixclock is used as a divisor to calculate drate before it is checked against zero. Fix this by checking it at the beginning. This is similar to CVE-2022-3061 in i740fb which was fixed by commit `15cf0b8`. Signed-off-by: Fullway Wang <fullwaywang@outlook.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
Fullway Wang	bc3c2e58d7	fbdev: savage: Error out if pixclock equals zero [ Upstream commit 04e5eac8f3ab2ff52fa191c187a46d4fdbc1e288 ] The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of pixclock, it may cause divide-by-zero error. Although pixclock is checked in savagefb_decode_var(), it is not checked properly in savagefb_probe(). Fix this by checking whether pixclock is zero in the function savagefb_check_var() before info->var.pixclock is used as the divisor. This is similar to CVE-2022-3061 in i740fb which was fixed by commit `15cf0b8`. Signed-off-by: Fullway Wang <fullwaywang@outlook.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
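Both fbdev fixes above follow the same pattern: reject a zero pixclock before it is used as a divisor. A minimal user-space sketch of that pattern follows. `struct var_info` and `check_var()` are hypothetical stand-ins for the framebuffer structures, and the divisor expression is an assumption for illustration rather than the drivers' exact code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of the variable screen info used here. */
struct var_info {
    uint32_t pixclock; /* pixel clock period in picoseconds */
};

/* Returns 0 and stores the dot rate, or -EINVAL if pixclock is zero. */
static int check_var(const struct var_info *var, uint32_t *drate)
{
    if (!var->pixclock)
        return -EINVAL; /* validate before dividing */

    *drate = 1000000000u / var->pixclock;
    return 0;
}
```

Placing the check first means a user-supplied zero is turned into an error return instead of reaching the division.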
Felix Fietkau	54b79d8786	wifi: mac80211: fix race condition on enabling fast-xmit [ Upstream commit bcbc84af1183c8cf3d1ca9b78540c2185cd85e7f ] fast-xmit must only be enabled after the sta has been uploaded to the driver, otherwise it could end up passing the not-yet-uploaded sta via drv_tx calls to the driver, leading to potential crashes because of uninitialized drv_priv data. Add a missing sta->uploaded check and re-check fast xmit after inserting a sta. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://msgid.link/20240104181059.84032-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
Michal Kazior	29df20cae2	wifi: cfg80211: fix missing interfaces when dumping [ Upstream commit a6e4f85d3820d00694ed10f581f4c650445dbcda ] The nl80211_dump_interface() supports resumption in case nl80211_send_iface() doesn't have the resources to complete its work. The logic would store the progress as iteration offsets for rdev and wdev loops. However the logic did not properly handle resumption for non-last rdev. Assuming a system with 2 rdevs, with 2 wdevs each, this could happen: dump(cb=[0, 0]): if_start=cb[1] (=0) send rdev0.wdev0 -> ok send rdev0.wdev1 -> yield cb[1] = 1 dump(cb=[0, 1]): if_start=cb[1] (=1) send rdev0.wdev1 -> ok // since if_start=1 the rdev0.wdev0 got skipped // through if_idx < if_start send rdev1.wdev1 -> ok The if_start needs to be reset back to 0 upon wdev loop end. The problem is actually hard to hit on a desktop, and even on most routers. The prerequisites for this manifesting were: - more than 1 wiphy - a handful of interfaces - dump without rdev or wdev filter I was seeing this with 4 wiphys, 9 interfaces each. It'd miss 6 interfaces from the last wiphy reported to userspace. Signed-off-by: Michal Kazior <michal@plume.com> Link: https://msgid.link/20240116142340.89678-1-kazikcz@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
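The resumption bug described in the cfg80211 commit above can be reproduced in a small simulation. This is a sketch under stated assumptions, not the nl80211 code: `struct dump_state`, `dump_pass()` and `dump_all()` are hypothetical names, the 2 x 2 topology is the one from the commit message, and a per-pass `budget` stands in for nl80211_send_iface() running out of room.

```c
#include <stdbool.h>

#define N_RDEV 2
#define N_WDEV 2

/* Resumable dump state, modelled on cb[0] and cb[1] in the commit message. */
struct dump_state {
    int wp_start; /* rdev index to resume from */
    int if_start; /* wdev index to resume from */
};

/*
 * One dump pass that sends at most `budget` interfaces before yielding.
 * Returns how many interfaces were sent. `reset_if_start` selects the
 * fixed behaviour.
 */
static int dump_pass(struct dump_state *st, int budget, bool reset_if_start)
{
    int sent = 0;
    int if_start = st->if_start;
    int wp_idx, if_idx = 0;

    for (wp_idx = 0; wp_idx < N_RDEV; wp_idx++) {
        if (wp_idx < st->wp_start)
            continue;
        for (if_idx = 0; if_idx < N_WDEV; if_idx++) {
            if (if_idx < if_start)
                continue; /* treated as already sent */
            if (sent == budget)
                goto out; /* out of room: yield and resume later */
            sent++;
        }
        if (reset_if_start)
            if_start = 0; /* the fix: the next rdev starts at wdev 0 */
    }
out:
    st->wp_start = wp_idx;
    st->if_start = if_idx;
    return sent;
}

/* Run a budget-1 pass followed by an unlimited pass; return the total sent. */
static int dump_all(bool reset_if_start)
{
    struct dump_state st = { 0, 0 };
    int total = dump_pass(&st, 1, reset_if_start);

    total += dump_pass(&st, N_RDEV * N_WDEV, reset_if_start);
    return total;
}
```

Without the reset, the second pass carries `if_start = 1` into rdev1 and skips rdev1.wdev0, so only three of the four interfaces are reported. With the reset, all four are.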
Vinod Koul	22dced37d9	dmaengine: dw-edma: increase size of 'name' in debugfs code [ Upstream commit cb95a4fa50bbc1262bfb7fea482388a50b12948f ] We seem to have hit warnings of 'output may be truncated' which is fixed by increasing the size of 'name' drivers/dma/dw-edma/dw-hdma-v0-debugfs.c: In function ‘dw_hdma_v0_debugfs_on’: drivers/dma/dw-edma/dw-hdma-v0-debugfs.c:125:50: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 8 [-Werror=format-truncation=] 125 \| snprintf(name, sizeof(name), "%s:%d", CHANNEL_STR, i); \| ^~ drivers/dma/dw-edma/dw-hdma-v0-debugfs.c: In function ‘dw_hdma_v0_debugfs_on’: drivers/dma/dw-edma/dw-hdma-v0-debugfs.c:142:50: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 8 [-Werror=format-truncation=] 142 \| snprintf(name, sizeof(name), "%s:%d", CHANNEL_STR, i); \| ^~ drivers/dma/dw-edma/dw-edma-v0-debugfs.c: In function ‘dw_edma_debugfs_regs_wr’: drivers/dma/dw-edma/dw-edma-v0-debugfs.c:193:50: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 8 [-Werror=format-truncation=] 193 \| snprintf(name, sizeof(name), "%s:%d", CHANNEL_STR, i); \| ^~ Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
Vinod Koul	9f11992462	dmaengine: fsl-qdma: increase size of 'irq_name' [ Upstream commit 6386f6c995b3ab91c72cfb76e4465553c555a8da ] We seem to have hit warnings of 'output may be truncated' which is fixed by increasing the size of 'irq_name' drivers/dma/fsl-qdma.c: In function ‘fsl_qdma_irq_init’: drivers/dma/fsl-qdma.c:824:46: error: ‘%d’ directive writing between 1 and 11 bytes into a region of size 10 [-Werror=format-overflow=] 824 \| sprintf(irq_name, "qdma-queue%d", i); \| ^~ drivers/dma/fsl-qdma.c:824:35: note: directive argument in the range [-2147483641, 2147483646] 824 \| sprintf(irq_name, "qdma-queue%d", i); \| ^~~~~~~~~~~~~~ drivers/dma/fsl-qdma.c:824:17: note: ‘sprintf’ output between 12 and 22 bytes into a destination of size 20 824 \| sprintf(irq_name, "qdma-queue%d", i); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
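The buffer size GCC asks for in the fsl-qdma warning above can be checked directly, because `snprintf()` with a zero size returns the length the output would have had. A small sketch, assuming a 32-bit `int`. `irq_name_bytes()` is a hypothetical helper; only the format string comes from the warning text.

```c
#include <stdio.h>

/* Bytes needed, including the terminating NUL, to format "qdma-queue%d". */
static int irq_name_bytes(int i)
{
    return snprintf(NULL, 0, "qdma-queue%d", i) + 1;
}
```

For the argument range GCC reports, this gives between 12 and 22 bytes, which is why a 20-byte destination is flagged.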
Vinod Koul	6e400d6b96	dmaengine: shdma: increase size of 'dev_id' [ Upstream commit 404290240827c3bb5c4e195174a8854eef2f89ac ] We seem to have hit warnings of 'output may be truncated' which is fixed by increasing the size of 'dev_id' drivers/dma/sh/shdmac.c: In function ‘sh_dmae_probe’: drivers/dma/sh/shdmac.c:541:34: error: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 9 [-Werror=format-truncation=] 541 \| "sh-dmae%d.%d", pdev->id, id); \| ^~ In function ‘sh_dmae_chan_probe’, inlined from ‘sh_dmae_probe’ at drivers/dma/sh/shdmac.c:845:9: drivers/dma/sh/shdmac.c:541:26: note: directive argument in the range [0, 2147483647] 541 \| "sh-dmae%d.%d", pdev->id, id); \| ^~~~~~~~~~~~~~ drivers/dma/sh/shdmac.c:541:26: note: directive argument in the range [0, 19] drivers/dma/sh/shdmac.c:540:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16 540 \| snprintf(sh_chan->dev_id, sizeof(sh_chan->dev_id), \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 541 \| "sh-dmae%d.%d", pdev->id, id); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
Shyam Prasad N	8d76726eeb	cifs: open_cached_dir should not rely on primary channel [ Upstream commit 936eba9cfb5cfbf6a2c762cd163605f2b784e03e ] open_cached_dir today selects ses->server a.k.a primary channel to send requests. When multichannel is used, the primary channel may be down. So it does not make sense to rely only on that channel. This fix makes this function pick a channel with the standard helper function cifs_pick_channel. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
Dmitry Bogdanov	36bc5040c8	scsi: target: core: Add TMF to tmr_list handling [ Upstream commit 83ab68168a3d990d5ff39ab030ad5754cbbccb25 ] An abort that is responded to by iSCSI itself is added to tmr_list but does not go to target core. A LUN_RESET that goes through tmr_list takes a refcounter on the abort and waits for completion. However, the abort will never complete because it was not started in target core. Unable to locate ITT: 0x05000000 on CID: 0 Unable to locate RefTaskTag: 0x05000000 on CID: 0. wait_for_tasks: Stopping tmf LUN_RESET with tag 0x0 ref_task_tag 0x0 i_state 34 t_state ISTATE_PROCESSING refcnt 2 transport_state active,stop,fabric_stop wait for tasks: tmf LUN_RESET with tag 0x0 ref_task_tag 0x0 i_state 34 t_state ISTATE_PROCESSING refcnt 2 transport_state active,stop,fabric_stop ... INFO: task kworker/0:2:49 blocked for more than 491 seconds. task:kworker/0:2 state:D stack: 0 pid: 49 ppid: 2 flags:0x00000800 Workqueue: events target_tmr_work [target_core_mod] Call Trace: __switch_to+0x2c4/0x470 _schedule+0x314/0x1730 schedule+0x64/0x130 schedule_timeout+0x168/0x430 wait_for_completion+0x140/0x270 target_put_cmd_and_wait+0x64/0xb0 [target_core_mod] core_tmr_lun_reset+0x30/0xa0 [target_core_mod] target_tmr_work+0xc8/0x1b0 [target_core_mod] process_one_work+0x2d4/0x5d0 worker_thread+0x78/0x6c0 To fix this, only add abort to tmr_list if it will be handled by target core. Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://lore.kernel.org/r/20240111125941.8688-1-d.bogdanov@yadro.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
Christoph Müllner	12d43aec0e	tools: selftests: riscv: Fix compile warnings in mm tests [ Upstream commit 12c16919652b5873f524c8b361336ecfa5ce5e6b ] When building the mm tests with a riscv32 compiler, we see a range of shift-count-overflow errors from shifting 1UL by more than 32 bits in do_mmaps(). Since the relevant code is only called from code that is gated by `__riscv_xlen == 64`, we can just apply the same gating to do_mmaps(). Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231123185821.2272504-6-christoph.muellner@vrull.eu Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
Christoph Müllner	a613c64666	tools: selftests: riscv: Fix compile warnings in vector tests [ Upstream commit e1baf5e68ed128c1e22ba43e5190526d85de323c ] GCC prints a couple of format string warnings when compiling the vector tests. Let's follow the recommendation in Documentation/printk-formats.txt to fix these warnings. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20231123185821.2272504-5-christoph.muellner@vrull.eu Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
Mahesh Rajashekhara	df75b8ef71	scsi: smartpqi: Fix logical volume rescan race condition [ Upstream commit fb4cece17b4583f55b34a8538e27a4adc833c9d4 ] Correct rescan flag race condition. Multiple conditions are being evaluated before notifying OS to do a rescan. Driver will skip rescanning the device if any one of the following conditions are met: - Devices that have not yet been added to the OS or devices that have been removed. - Devices which are already marked for removal or in the phase of removal. Under very rare conditions, after logical volume size expansion, the OS still sees the size of the logical volume which was before expansion. The rescan flag in the driver is used to signal the need for a logical volume rescan. A race condition can occur in the driver, and it leads to one thread overwriting the flag inadvertently. As a result, driver is not notifying the OS SML to rescan the logical volume. Move device->rescan update into new function pqi_mark_volumes_for_rescan() and protect with a spin lock. Move check for device->rescan into new function pqi_volume_rescan_needed() and protect function call with a spin_lock. Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Co-developed-by: Murthy Bhat <Murthy.Bhat@microchip.com> Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20231219193653.277553-3-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:48 +01:00
David Strahan	ce10905116	scsi: smartpqi: Add new controller PCI IDs [ Upstream commit c6d5aa44eaf6d119f9ceb3bfc7d22405ac04232a ] All PCI ID entries in Hex. Add PCI IDs for Cisco controllers: VID / DID / SVID / SDID ---- ---- ---- ---- Cisco 24G TriMode M1 RAID 4GB FBWC 32D 9005 / 028f / 1137 / 02f8 Cisco 24G TriMode M1 RAID 4GB FBWC 16D 9005 / 028f / 1137 / 02f9 Cisco 24G TriMode M1 HBA 16D 9005 / 028f / 1137 / 02fa Add PCI IDs for CloudNine controllers: VID / DID / SVID / SDID ---- ---- ---- ---- SmartRAID P7604N-16i 9005 / 028f / 1f51 / 100e SmartRAID P7604N-8i 9005 / 028f / 1f51 / 100f SmartRAID P7504N-16i 9005 / 028f / 1f51 / 1010 SmartRAID P7504N-8i 9005 / 028f / 1f51 / 1011 SmartRAID P7504N-8i 9005 / 028f / 1f51 / 1043 SmartHBA P6500-8i 9005 / 028f / 1f51 / 1044 SmartRAID P7504-8i 9005 / 028f / 1f51 / 1045 Reviewed-by: Murthy Bhat <Murthy.Bhat@microchip.com> Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com> Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Signed-off-by: David Strahan <david.strahan@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Link: https://lore.kernel.org/r/20231219193653.277553-2-don.brace@microchip.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:47 +01:00
Hector Martin	43ee59fa01	dmaengine: apple-admac: Keep upper bits of REG_BUS_WIDTH [ Upstream commit 306f5df81fcc89b462fbeb9dbe26d9a8ad7c7582 ] For RX channels, REG_BUS_WIDTH seems to default to a value of 0xf00, and macOS preserves the upper bits when setting the configuration in the lower ones. If we reset the upper bits to 0, this causes framing errors on suspend/resume (the data stream "tears" and channels get swapped around). Keeping the upper bits untouched, like the macOS driver does, fixes this issue. Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Martin Povišer <povik+lin@cutebit.org> Signed-off-by: Martin Povišer <povik+lin@cutebit.org> Link: https://lore.kernel.org/r/20231029170704.82238-1-povik+lin@cutebit.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2024-03-01 13:34:47 +01:00
Jan Kiszka	5babeec518	riscv/efistub: Ensure GP-relative addressing is not used commit afb2a4fb84555ef9e61061f6ea63ed7087b295d5 upstream. The cflags for the RISC-V efistub were missing -mno-relax, thus were under the risk that the compiler could use GP-relative addressing. That happened for _edata with binutils-2.41 and kernel 6.1, causing the relocation to fail due to an invalid kernel_size in handle_kernel_image. It was not yet observed with newer versions, but that may just be luck. Cc: <stable@vger.kernel.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-03-01 13:34:47 +01:00
Dan Carpenter	6ea2f3b9b9	PCI: dwc: Fix a 64bit bug in dw_pcie_ep_raise_msix_irq() commit b5d1b4b46f856da1473c7ba9a5cdfcb55c9b2478 upstream. The "msg_addr" variable is u64. However, the "aligned_offset" is an unsigned int. This means that when the code does: msg_addr &= ~aligned_offset; it will unintentionally zero out the high 32 bits. Use ALIGN_DOWN() to do the alignment instead. Fixes: 2217fffcd63f ("PCI: dwc: endpoint: Fix dw_pcie_ep_raise_msix_irq() alignment support") Link: https://lore.kernel.org/r/af59c7ad-ab93-40f7-ad4a-7ac0b14d37f5@moroto.mountain Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-03-01 13:34:47 +01:00
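The truncation described in the dw_pcie_ep_raise_msix_irq() commit above is easy to reproduce in user space. The sketch below assumes a 32-bit `unsigned int`. `ALIGN_DOWN_U64()` is a local stand-in for the kernel's `ALIGN_DOWN()`, and `align_buggy()` and `align_fixed()` are hypothetical names.

```c
#include <stdint.h>

/* Local stand-in for the kernel's ALIGN_DOWN(); `a` must be a power of two. */
#define ALIGN_DOWN_U64(x, a) ((uint64_t)(x) & ~((uint64_t)(a) - 1))

/* Buggy form: ~aligned_offset is a 32-bit value, zero-extended to 64 bits,
 * so the AND also clears bits 32..63 of msg_addr. */
static uint64_t align_buggy(uint64_t msg_addr, unsigned int aligned_offset)
{
    msg_addr &= ~aligned_offset;
    return msg_addr;
}

/* Fixed form: the mask is computed in 64 bits, so the high half survives. */
static uint64_t align_fixed(uint64_t msg_addr, uint64_t page_size)
{
    return ALIGN_DOWN_U64(msg_addr, page_size);
}
```

With `msg_addr = 0x123456abc`, an offset of `0xabc` and a 4 KiB page, the buggy form returns `0x23456000` (bit 32 lost) while the fixed form returns `0x123456000`.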
Cyril Hrubis	74fd1b8c44	sched/rt: Disallow writing invalid values to sched_rt_period_us commit 079be8fc630943d9fc70a97807feb73d169ee3fc upstream. The validation of the value written to sched_rt_period_us was broken because: - the sysctl_sched_rt_period is declared as unsigned int - parsed by proc_do_intvec() - the range is asserted after the value parsed by proc_do_intvec() Because of this, negative values written to the file were stored in an unsigned integer that was later interpreted as a large positive integer, which passed the check: if (sysctl_sched_rt_period <= 0) return EINVAL; This commit fixes the parsing by setting an explicit range for both period_us and runtime_us in the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead. Alternatively, if we wanted to use the full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it; however, even Documentation/scheduler/sched-rt-group.rst describes the range as 1 to INT_MAX. As far as I can tell the only problem this causes is that the sysctl file allows writing negative values which when read back may confuse userspace. There is also a LTP test being submitted for these sysctl files at: http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@suse.cz/ Signed-off-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz Cc: Mahmoud Adam <mngyadam@amazon.com> Signed-off-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-03-01 13:34:47 +01:00
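The reason the old sched_rt_period_us check let negative writes through can be shown in a few lines. This is a user-space sketch with hypothetical function names. `validate_old()` mimics a signed parse stored into an unsigned variable, and `validate_minmax()` mimics a range check applied to the signed value, in the spirit of proc_dointvec_minmax().

```c
#include <errno.h>
#include <limits.h>

/* Old-style check: the parsed int lands in an unsigned variable first. */
static int validate_old(int written)
{
    unsigned int period = (unsigned int)written;

    if (period <= 0) /* only catches 0: -1 has become UINT_MAX */
        return -EINVAL;
    return 0;
}

/* Range check on the signed value before it is stored. */
static int validate_minmax(int written, int min, int max)
{
    if (written < min || written > max)
        return -EINVAL;
    return 0;
}
```

A write of -1 is accepted by the old check but rejected once the range 1 to INT_MAX is enforced on the signed value.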
Greg Kroah-Hartman	d8a27ea2c9	Linux 6.6.18 Link: https://lore.kernel.org/r/20240220205637.572693592@linuxfoundation.org Tested-by: SeongJae Park <sj@kernel.org> Tested-by: Allen Pais <apais@linux.microsoft.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20240221125953.770767246@linuxfoundation.org Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com> Tested-by: Allen Pais <apais@linux.microsoft.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: kernelci.org bot <bot@kernelci.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Tested-by: Kelsey Steele <kelseysteele@linux.microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:28 +01:00
Dan Carpenter	9e083726d5	tracing: Fix a NULL vs IS_ERR() bug in event_subsystem_dir() commit 5264a2f4bb3baf712e19f1f053caaa8d7d3afa2e upstream. The eventfs_create_dir() function returns error pointers, it never returns NULL. Update the check to reflect that. Link: https://lore.kernel.org/linux-trace-kernel/ff641474-84e2-46a7-9d7a-62b251a1050c@moroto.mountain Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 5790b1fb3d67 ("eventfs: Remove eventfs_file and just use eventfs_inode") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:28 +01:00
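The NULL vs IS_ERR() distinction in the commit above can be shown with a userspace imitation of the kernel's error-pointer convention. This is a sketch under assumptions: `err_ptr()`, `is_err()`, and `ptr_err()` are simplified stand-ins for the kernel's ERR_PTR(), IS_ERR(), and PTR_ERR(), and `create_dir_stub()` is a hypothetical function that, like eventfs_create_dir() as described in the commit message, reports failure with an error pointer and never with NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top 4095 addresses of the pointer range. */
static void *err_ptr(intptr_t error)
{
    return (void *)error;
}

static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static intptr_t ptr_err(const void *ptr)
{
    assert(is_err(ptr));
    return (intptr_t)ptr;
}

/* Hypothetical creator: returns an error pointer on failure, never NULL. */
static void *create_dir_stub(int fail)
{
    static int dir;

    return fail ? err_ptr(-ENOMEM) : (void *)&dir;
}

/* Wrong check: an error pointer is not NULL, so the failure goes unnoticed. */
static int check_with_null(int fail)
{
    void *dir = create_dir_stub(fail);

    return dir == NULL ? -1 : 0;
}

/* Right check: detect the error pointer and propagate the encoded errno. */
static int check_with_is_err(int fail)
{
    void *dir = create_dir_stub(fail);

    return is_err(dir) ? (int)ptr_err(dir) : 0;
}
```

On failure, `check_with_null(1)` returns 0 and would go on to use the invalid pointer, while `check_with_is_err(1)` returns `-ENOMEM`.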
Steven Rostedt (Google)	9389eaaca7	tracing: Make system_callback() function static commit 5ddd8baa4857709b4e5d84b376d735152851955b upstream. The system_callback() function in trace_events.c is only used within that file. The "static" annotation was missed. Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310051743.y9EobbUr-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:28 +01:00
Vegard Nossum	cec85aa54b	Documentation/arch/ia64/features.rst: fix kernel-feat directive My mainline commit c48a7c44a1d0 ("docs: kernel_feat.py: fix potential command injection") contains a bug which can manifests like this when building the documentation: Sphinx parallel build error: UnboundLocalError: local variable 'fname' referenced before assignment make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2 However, this only appears when there exists a '.. kernel-feat::' directive that points to a non-existent file, which isn't the case in mainline. When this commit was backported to stable 6.6, it didn't change Documentation/arch/ia64/features.rst since ia64 was removed in 6.7 in commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture"). This lead to the build failure seen above -- but only in stable kernels. This patch fixes the backport and should only be applied to kernels where Documentation/arch/ia64/features.rst exists and commit c48a7c44a1d0 has also been applied. A second patch will follow to fix kernel_feat.py in mainline so that it doesn't error out when the '.. kernel-feat::' directive points to a nonexistent file. Link: https://lore.kernel.org/all/ZbkfGst991YHqJHK@fedora64.linuxtx.org/ Fixes: e961f8c696 ("docs: kernel_feat.py: fix potential command injection") # stable 6.6.15 Reported-by: Justin Forbes <jforbes@fedoraproject.org> Reported-by: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:28 +01:00
Borislav Petkov (AMD)	ccce12ecf2	x86/barrier: Do not serialize MSR accesses on AMD commit 04c3024560d3a14acd18d0a51a1d0a89d29b7eb5 upstream. AMD does not have the requirement for a synchronization barrier when acccessing a certain group of MSRs. Do not incur that unnecessary penalty there. There will be a CPUID bit which explicitly states that a MFENCE is not needed. Once that bit is added to the APM, this will be extended with it. While at it, move to processor.h to avoid include hell. Untangling that file properly is a matter for another day. Some notes on the performance aspect of why this is relevant, courtesy of Kishon VijayAbraham <Kishon.VijayAbraham@amd.com>: On a AMD Zen4 system with 96 cores, a modified ipi-bench[1] on a VM shows x2AVIC IPI rate is 3% to 4% lower than AVIC IPI rate. The ipi-bench is modified so that the IPIs are sent between two vCPUs in the same CCX. This also requires to pin the vCPU to a physical core to prevent any latencies. This simulates the use case of pinning vCPUs to the thread of a single CCX to avoid interrupt IPI latency. In order to avoid run-to-run variance (for both x2AVIC and AVIC), the below configurations are done: 1) Disable Power States in BIOS (to prevent the system from going to lower power state) 2) Run the system at fixed frequency 2500MHz (to prevent the system from increasing the frequency when the load is more) With the above configuration: ) Performance measured using ipi-bench for AVIC: Average Latency: 1124.98ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 42.6759M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously] ) Performance measured using ipi-bench for x2AVIC: Average Latency: 1172.42ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 40.9432M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously] From above, x2AVIC latency is ~4% more than AVIC. 
However, the expectation is x2AVIC performance to be better or equivalent to AVIC. Upon analyzing the perf captures, it is observed significant time is spent in weak_wrmsr_fence() invoked by x2apic_send_IPI(). With the fix to skip weak_wrmsr_fence() *) Performance measured using ipi-bench for x2AVIC: Average Latency: 1117.44ns [Time to send IPI from one vCPU to another vCPU] Cumulative throughput: 42.9608M/s [Total number of IPIs sent in a second from 48 vCPUs simultaneously] Comparing the performance of x2AVIC with and without the fix, it can be seen the performance improves by ~4%. Performance captured using an unmodified ipi-bench using the 'mesh-ipi' option with and without weak_wrmsr_fence() on a Zen4 system also showed significant performance improvement without weak_wrmsr_fence(). The 'mesh-ipi' option ignores CCX or CCD and just picks random vCPU. Average throughput (10 iterations) with weak_wrmsr_fence(), Cumulative throughput: 4933374 IPI/s Average throughput (10 iterations) without weak_wrmsr_fence(), Cumulative throughput: 6355156 IPI/s [1] https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/ipi-bench Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230622095212.20940-1-bp@alien8.de Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:27 +01:00
Mikulas Patocka	438d19492b	dm: limit the number of targets and parameter size area commit bd504bcfec41a503b32054da5472904b404341a4 upstream. The kvmalloc function fails with a warning if the size is larger than INT_MAX. The warning was triggered by a syscall testing robot. In order to avoid the warning, this commit limits the number of targets to 1048576 and the size of the parameter area to 1073741824. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:27 +01:00
Ryusuke Konishi	2c3bdba002	nilfs2: fix potential bug in end_buffer_async_write commit 5bc09b397cbf1221f8a8aacb1152650c9195b02b upstream. According to a syzbot report, end_buffer_async_write(), which handles the completion of block device writes, may detect abnormal condition of the buffer async_write flag and cause a BUG_ON failure when using nilfs2. Nilfs2 itself does not use end_buffer_async_write(). But, the async_write flag is now used as a marker by commit 7f42ec3941 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") as a means of resolving double list insertion of dirty blocks in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the resulting crash. This modification is safe as long as it is used for file data and b-tree node blocks where the page caches are independent. However, it was irrelevant and redundant to also introduce async_write for segment summary and super root blocks that share buffers with the backing device. This led to the possibility that the BUG_ON check in end_buffer_async_write would fail as described above, if independent writebacks of the backing device occurred in parallel. The use of async_write for segment summary buffers has already been removed in a previous change. Fix this issue by removing the manipulation of the async_write flag for the remaining super root block buffer. Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com Fixes: 7f42ec3941 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:27 +01:00
Saravana Kannan	c20fc13082	of: property: Add in-ports/out-ports support to of_graph_get_port_parent() commit 8f1e0d791b5281f3a38620bc7c57763dc551be15 upstream. Similar to the existing "ports" node name, coresight device tree bindings have added "in-ports" and "out-ports" as standard node names for a collection of ports. Add support for these name to of_graph_get_port_parent() so that remote-endpoint parsing can find the correct parent node for these coresight ports too. Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20240207011803.2637531-4-saravanak@google.com Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:27 +01:00
Linus Torvalds	b6a2a9cbb6	sched/membarrier: reduce the ability to hammer on sys_membarrier commit 944d5fe50f3f03daacfea16300e656a1691c4a23 upstream. On some systems, sys_membarrier can be very expensive, causing overall slowdowns for everything. So put a lock on the path in order to serialize the accesses to prevent the ability for this to be called at too high of a frequency and saturate the machine. Reviewed-and-tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Borislav Petkov <bp@alien8.de> Fixes: 22e4ebb975 ("membarrier: Provide expedited private command") Fixes: c5f58bd58f ("membarrier: Provide GLOBAL_EXPEDITED command") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:27 +01:00
Ard Biesheuvel	0a962f2fba	x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section commit 1ad55cecf22f05f1c884adf63cc09d3c3e609ebf upstream. The .compat section is a dummy PE section that contains the address of the 32-bit entrypoint of the 64-bit kernel image if it is bootable from 32-bit firmware (i.e., CONFIG_EFI_MIXED=y) This section is only 8 bytes in size and is only referenced from the loader, and so it is placed at the end of the memory view of the image, to avoid the need for padding it to 4k, which is required for sections appearing in the middle of the image. Unfortunately, this violates the PE/COFF spec, and even if most EFI loaders will work correctly (including the Tianocore reference implementation), PE loaders do exist that reject such images, on the basis that both the file and memory views of the file contents should be described by the section headers in a monotonically increasing manner without leaving any gaps. So reorganize the sections to avoid this issue. This results in a slight padding overhead (< 4k) which can be avoided if desired by disabling CONFIG_EFI_MIXED (which is only needed in rare cases these days) Fixes: 3e3eabe26dc8 ("x86/boot: Increase section and file alignment to 4k/512") Reported-by: Mike Beaton <mjsbeaton@gmail.com> Link: https://lkml.kernel.org/r/CAHzAAWQ6srV6LVNdmfbJhOwhBw5ZzxxZZ07aHt9oKkfYAdvuQQ%40mail.gmail.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:27 +01:00
Ard Biesheuvel	686b58ce50	x86/boot: Increase section and file alignment to 4k/512 commit 3e3eabe26dc88692d34cf76ca0e0dd331481cc15 upstream. Align x86 with other EFI architectures, and increase the section alignment to the EFI page size (4k), so that firmware is able to honour the section permission attributes and map code read-only and data non-executable. There are a number of requirements that have to be taken into account: - the sign tools get cranky when there are gaps between sections in the file view of the image - the virtual offset of each section must be aligned to the image's section alignment - the file offset and size of each section must be aligned to the image's file alignment - the image size must be aligned to the section alignment - each section's virtual offset must be greater than or equal to the size of the headers. In order to meet all these requirements, while avoiding the need for lots of padding to accommodate the .compat section, the latter is placed at an arbitrary offset towards the end of the image, but aligned to the minimum file alignment (512 bytes). The space before the .text section is therefore distributed between the PE header, the .setup section and the .compat section, leaving no gaps in the file coverage, making the signing tools happy. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230915171623.655440-18-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:27 +01:00
Ard Biesheuvel	f7eedad780	x86/boot: Split off PE/COFF .data section commit 34951f3c28bdf6481d949a20413b2ce7693687b2 upstream. Describe the code and data of the decompressor binary using separate .text and .data PE/COFF sections, so that we will be able to map them using restricted permissions once we increase the section and file alignment sufficiently. This avoids the need for memory mappings that are writable and executable at the same time, which is something that is best avoided for security reasons. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230915171623.655440-17-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:27 +01:00
Ard Biesheuvel	476316bb48	x86/boot: Drop PE/COFF .reloc section commit fa5750521e0a4efbc1af05223da9c4bbd6c21c83 upstream. Ancient buggy EFI loaders may have required a .reloc section to be present at some point in time, but this has not been true for a long time so the .reloc section can just be dropped. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230915171623.655440-16-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:27 +01:00
Ard Biesheuvel	0db81e8e20	x86/boot: Construct PE/COFF .text section from assembler commit efa089e63b56bdc5eca754b995cb039dd7a5457e upstream. Now that the size of the setup block is visible to the assembler, it is possible to populate the PE/COFF header fields from the asm code directly, instead of poking the values into the binary using the build tool. This will make it easier to reorganize the section layout without having to tweak the build tool in lockstep. This change has no impact on the resulting bzImage binary. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230915171623.655440-15-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:26 +01:00
Ard Biesheuvel	0cf3d613a1	x86/boot: Derive file size from _edata symbol commit aeb92067f6ae994b541d7f9752fe54ed3d108bcc upstream. Tweak the linker script so that the value of _edata represents the decompressor binary's file size rounded up to the appropriate alignment. This removes the need to calculate it in the build tool, and will make it easier to refer to the file size from the header directly in subsequent changes to the PE header layout. While adding _edata to the sed regex that parses the compressed vmlinux's symbol list, tweak the regex a bit for conciseness. This change has no impact on the resulting bzImage binary when configured with CONFIG_EFI_STUB=y. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230915171623.655440-14-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:26 +01:00
Ard Biesheuvel	c731fbcfdb	x86/boot: Define setup size in linker script commit 093ab258e3fb1d1d3afdfd4a69403d44ce90e360 upstream. The setup block contains the real mode startup code that is used when booting from a legacy BIOS, along with the boot_params/setup_data that is used by legacy x86 bootloaders to pass the command line and initial ramdisk parameters, among other things. The setup block also contains the PE/COFF header of the entire combined image, which includes the compressed kernel image, the decompressor and the EFI stub. This PE header describes the layout of the executable image in memory, and currently, the fact that the setup block precedes it makes it rather fiddly to get the right values into the right place in the final image. Let's make things a bit easier by defining the setup_size in the linker script so it can be referenced from the asm code directly, rather than having to rely on the build tool to calculate it. For the time being, add 64 bytes of fixed padding for the .reloc and .compat sections - this will be removed in a subsequent patch after the PE/COFF header has been reorganized. This change has no impact on the resulting bzImage binary when configured with CONFIG_EFI_MIXED=y. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230915171623.655440-13-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:26 +01:00
Ard Biesheuvel	431b39e625	x86/boot: Set EFI handover offset directly in header asm commit eac956345f99dda3d68f4ae6cf7b494105e54780 upstream. The offsets of the EFI handover entrypoints are available to the assembler when constructing the header, so there is no need to set them from the build tool afterwards. This change has no impact on the resulting bzImage binary. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230915171623.655440-12-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:26 +01:00
Ard Biesheuvel	8e102324e7	x86/boot: Grab kernel_info offset from zoffset header directly commit 2e765c02dcbfc2a8a4527c621a84b9502f6b9bd2 upstream. Instead of parsing zoffset.h and poking the kernel_info offset value into the header from the build tool, just grab the value directly in the asm file that describes this header. This change has no impact on the resulting bzImage binary. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230915171623.655440-11-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:26 +01:00
Ard Biesheuvel	a38801ba18	x86/boot: Drop references to startup_64 commit b618d31f112bea3d2daea19190d63e567f32a4db upstream. The x86 boot image generation tool assign a default value to startup_64 and subsequently parses the actual value from zoffset.h but it never actually uses the value anywhere. So remove this code. This change has no impact on the resulting bzImage binary. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230912090051.4014114-25-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:26 +01:00
Ard Biesheuvel	08796fc9bf	x86/boot: Drop redundant code setting the root device commit 7448e8e5d15a3c4df649bf6d6d460f78396f7e1e upstream. The root device defaults to 0,0 and is no longer configurable at build time [0], so there is no need for the build tool to ever write to this field. [0] 079f85e624 ("x86, build: Do not set the root_dev field in bzImage") This change has no impact on the resulting bzImage binary. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230912090051.4014114-23-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:26 +01:00
Ard Biesheuvel	4bac079dba	x86/boot: Omit compression buffer from PE/COFF image memory footprint commit 8eace5b3555606e684739bef5bcdfcfe68235257 upstream. Now that the EFI stub decompresses the kernel and hands over to the decompressed image directly, there is no longer a need to provide a decompression buffer as part of the .BSS allocation of the PE/COFF image. It also means the PE/COFF image can be loaded anywhere in memory, and setting the preferred image base is unnecessary. So drop the handling of this from the header and from the build tool. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230912090051.4014114-22-ardb@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-02-23 09:25:26 +01:00

1221287 Commits
