Similar to the ancient commit a5fe8e7695 ("regulatory: add NUL
to alpha2"), add another byte to alpha2 in the request struct so
that when we use nla_put_string(), we don't overrun anything.
Fixes: 73d54c9e74 ("cfg80211: add regulatory netlink multicast group")
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- Lots of fixes for the new IOCTL interface and general uverbs flow.
Found through testing and syzkaller
- Bugfixes for the new resource track netlink reporting
- Remove some unneeded WARN_ONs that were triggering for some users in IPoIB
- Various fixes for the bnxt_re driver
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJajcKaAAoJELgmozMOVy/dD7IP/AiIZm8Inop73694/2fVDkKN
UqfGLPqJ5MVgjfDFfurYlBxIMuI2zF0OTWXx1GcEO16Pj9pzGsKV+5qqpnOgTgsZ
zXsJhzsrScZGd5IHW+a+kKeLZpJK+ViYRgbYmYaPRI/9FuqfzZkwloAR3jKh1S4q
Y15pbyVW0zW/xekZQykvnySmowwGO3s8IC1453cHW+mwWwMaDoj0B5rwH4HB4eZR
oOC/FZRrgBAzh8VLKO3NFgrx1DLHnUkS3za1HcGDBbM3Va0iOFWG3SD1mxxd8UET
188gWK268TgzDdMsCAuUdz2Vp3rRr7EziyQKopikCSvRV0IQrfK6LdN4erh8yysQ
bPiHjuUD++IfSupk3PoagYH5GbNv1ip4uR4Rl/QD3lezH6UMKC2Npv7Ucqfvwioa
tL2nkpDHd/fktvrhfsGuNYzKhtCIwlsINvjEM7sydZDYOpxNmEu/R3imQdfiLH4Z
38BmlIQoX1wFpyOuUnqVjaI3EapVSFjCKlDralO0gVpcDYj4yM88K7rA0cSnlAfC
8xOmXti3zatrH02LFFUugujGspEn771QtHqW/gd6XjWZwajacbP0i/tB5rnIAxlP
nNLbNsABcFy745SPkoDG1OymCsvhHk+lkTobEdGjgzPVE2ipOoq1VHG1KlJRF4is
GO0m3JHr9YdLlAwsu+PS
=Fir8
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"Nothing in this is overly interesting, it's mostly your garden variety
fixes.
There was some work in this merge cycle around the new ioctl kABI, so
there are fixes in here related to that (probably with more to come).
We've also recently added new netlink support with a goal of moving
the primary means of configuring the entire subsystem to netlink
(eventually, this is a long term project), so there are fixes for
that.
Then a few bnxt_re driver fixes, and a few minor WARN_ON removals, and
that covers this pull request. There are already a few more fixes on
the list as of this morning, so there will certainly be more to come
in this rc cycle ;-)
Summary:
- Lots of fixes for the new IOCTL interface and general uverbs flow.
Found through testing and syzkaller
- Bugfixes for the new resource track netlink reporting
- Remove some unneeded WARN_ONs that were triggering for some users
in IPoIB
- Various fixes for the bnxt_re driver"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (27 commits)
RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type
RDMA/bnxt_re: Avoid system hang during device un-reg
RDMA/bnxt_re: Fix system crash during load/unload
RDMA/bnxt_re: Synchronize destroy_qp with poll_cq
RDMA/bnxt_re: Unpin SQ and RQ memory if QP create fails
RDMA/bnxt_re: Disable atomic capability on bnxt_re adapters
RDMA/restrack: don't use uaccess_kernel()
RDMA/verbs: Check existence of function prior to accessing it
RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file
RDMA/uverbs: Sanitize user entered port numbers prior to access it
RDMA/uverbs: Fix circular locking dependency
RDMA/uverbs: Fix bad unlock balance in ib_uverbs_close_xrcd
RDMA/restrack: Increment CQ restrack object before committing
RDMA/uverbs: Protect from command mask overflow
IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy
IB/uverbs: Improve lockdep_check
RDMA/uverbs: Protect from races between lookup and destroy of uobjects
IB/uverbs: Hold the uobj write lock after allocate
IB/uverbs: Fix possible oops with duplicate ioctl attributes
IB/uverbs: Add ioctl support for 32bit processes
...
This pull request contains a handful of small cleanups. The only
functional change is that IRQs are now enabled during exception
handling, which was found when some warnings triggered with
`CONFIG_DEBUG_ATOMIC_SLEEP=y`. The remaining fixes should have no
functional change: `sbi_save()` has been renamed to `parse_dtb()`
reflect what it actually does, and a handful of unused Kconfig entries
have been removed.
This is based on rc1 as a break to my usual flow, as it appears I missed
rc2 over the holiday. I've merged master as of Wednesday morning,
af3e79d295 ("Merge tag 'leds_for-4.16-rc3' of
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds")
without any conflicts and I've given it a simple build test. If this
isn't OK then feel free to drop the patch set and I'll send another
against rc3 for rc4.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlqNnPYTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQdslD/48VsB5shoMy2hzs7N0myJ0869zB9w4
C+7A+vgJQCutDbzZVRLCWhRLqZ3bL51ids1cnDtOMISYqHctuIW7xGOtoJELm1qc
78KXyhr6UvIFqPHyVGWOygW52dzDdrL7HKkDvy+TWk11yWIDZpgqVSVWJPZrvTm2
qK5IU2NpFVeH8pKkq7G4hZuWRJh2XlPefH7pOh1NYIDK90UShUOK1BmIahraq7Jb
wIlR6Xxet2BoxjG5wHhSG9JOYJCsgEeoEb6D0fJJz4JIBXHySLUWb3ZfwGstLY2m
AGO485E+EU0YL6E5AhPrenCzX4xyJlOTb7spAFAqTEkKyytwdM27A+Yv95wo47NZ
FtlsIRyQsiU6+XWCTvZP53ARQB8J92Pyw1KyDBf8WCl47BJOKwlCBNOvKNDLXtKK
E9h8FuXsBHGeay/+QQCWvwr7uqT2Z4Q33MYBKlmRVAQlEGj10WywVqai4Kk5Tz3f
tmvhHxtbA3Fuu4OACM8xZ4m/vDj+pGLkKw91v9XHF8hcBB5x/fkVRt+GHmg5EsKN
n9f936OpvT8RoBAvCNrtK7zXNWemwKmzkAH0GOdTpkW2fSSokTT9Gnyb66pvrIPL
TH0SqKOMQ8vQMMm/W0nxuqxaF99qvxoy2bTqa2ojT/NUZQqXz42WxSfhoeRJFTvF
UdElhd88BBwcoA==
=9zYf
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-4.16-rc3-riscv_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V cleanups from Palmer Dabbelt:
"This contains a handful of small cleanups.
The only functional change is that IRQs are now enabled during
exception handling, which was found when some warnings triggered with
`CONFIG_DEBUG_ATOMIC_SLEEP=y`.
The remaining fixes should have no functional change: `sbi_save()` has
been renamed to `parse_dtb()` reflect what it actually does, and a
handful of unused Kconfig entries have been removed"
* tag 'riscv-for-linus-4.16-rc3-riscv_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
Rename sbi_save to parse_dtb to improve code readability
RISC-V: Enable IRQ during exception handling
riscv: Remove ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select
riscv: kconfig: Remove RISCV_IRQ_INTC select
riscv: Remove ARCH_WANT_OPTIONAL_GPIOLIB select
MV88E6352 and later switches support GPIO control through the "Scratch
& Misc" global2 register. Two of the pins controlled this way on the
mv88e6390 family are the external MDIO pins. They can either by used
as part of the MII interface for port 0, GPIOs, or MDIO. Add a
function to configure them for MDIO, if possible, and call it when
registering the external MDIO bus.
Suggested-by: Russell King <rmk@armlinux.org.uk>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the recent change, transmissions that only needed
one descriptor were being missed. The result is that such
packets were tracked as outstanding transmissions but never
removed when its completion notification was received.
Fixes: ffc385b95a ("ibmvnic: Keep track of supplementary TX descriptors")
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The login buffer is released before the driver can perform
sanity checks between resources the driver requested and what
firmware will provide. Don't release the login buffer until
the sanity check is performed.
Fixes: 34f0f4e3f4 ("ibmvnic: Fix login buffer memory leaks")
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AFAIK the only version of smc9194.c with Mac support is the one in the
linux-mac68k CVS repo, which never made it to the mainline.
Despite that, from v2.3.45, arch/m68k/config.in listed CONFIG_SMC9194
under CONFIG_MAC. This mistake got carried over into Kconfig in v2.5.55.
(See pre-git era "[PATCH] add m68k dependencies to net driver config".)
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series includes shared code updates for mlx5 core driver for both
netdev and rdma subsystems.
By Saeed,
First six patches of the series are meant to address a performance issue
and should provide a performance boost for multi core IRQ interrupt hungry
workloads. The issue is fixed in the first patch, all other patches are
meant to refactor the code in light of this fix.
The problem it comes to fix, is a shared spinlock accessed across all HCA
IRQs which protects the CQ database. To solve this we simply move the CQ
database and its spinlock to be per EQ (IRQ), thus per core.
By Yonatan,
Fragmented completion queue (CQ) for RDMA,
core driver implementation to create fragmented CQ buffers rather than
one large contiguous memory buffer, the implementation scheme already
exist and used by the netdev CQs, the patch shares that code with the
rdma CQ creation flow and makes use of the new API in mlx5_ib driver.
Thanks,
Saeed.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJajcwxAAoJEEg/ir3gV/o+uJAIAICP2LBu5ANBgX7Z7/B0IZX4
Abkfk+IKR44v+3t0BBgEAZI2A9LcTQ8pwIOeLCDpcKd78665Y7rYQL1p31WuPKc5
1WwwBesjQLAmdiNLIjUnjW1fd3YyOmqLBcaMMYIeF0m4m4pCQue1kwtxesQrhdZV
I7FBxxnQt/8o6hoh/5nmU4DzGyZaY+uR/4FxUCEreklfhNeI5M2DZKe1xQCmhsJ4
Tosh/cJ8kSBJfclH6hk1lh5eWGDYDltWCpY86KBWsYktl10VAE3P7LPmekn487ta
e6cL1NPoEHbp+/UBThkNZMzUnA7wpoNeDbTTtVZrdcL1KivwColt1r3pmS1bqEQ=
=DlfW
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2018-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-02-21
This series includes shared code updates for mlx5 core driver for both
netdev and rdma subsystems.
By Saeed,
First six patches of the series are meant to address a performance issue
and should provide a performance boost for multi core IRQ interrupt hungry
workloads. The issue is fixed in the first patch, all other patches are
meant to refactor the code in light of this fix.
The problem it comes to fix, is a shared spinlock accessed across all HCA
IRQs which protects the CQ database. To solve this we simply move the CQ
database and its spinlock to be per EQ (IRQ), thus per core.
By Yonatan,
Fragmented completion queue (CQ) for RDMA,
core driver implementation to create fragmented CQ buffers rather than
one large contiguous memory buffer, the implementation scheme already
exist and used by the netdev CQs, the patch shares that code with the
rdma CQ creation flow and makes use of the new API in mlx5_ib driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The result of the skb flow dissect is copied from keys to hash_keys to
ensure only the intended data is hashed. The original L4 hash patch
overlooked setting the addr_type for this case; add it.
Fixes: bf4e0a3db9 ("net: ipv4: add support for ECMP hash policy choice")
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BBR uses tcp_tso_autosize() in an attempt to probe what would be the
burst sizes and to adjust cwnd in bbr_target_cwnd() with following
gold formula :
/* Allow enough full-sized skbs in flight to utilize end systems. */
cwnd += 3 * bbr->tso_segs_goal;
But GSO can be lacking or be constrained to very small
units (ip link set dev ... gso_max_segs 2)
What we really want is to have enough packets in flight so that both
GSO and GRO are efficient.
So in the case GSO is off or downgraded, we still want to have the same
number of packets in flight as if GSO/TSO was fully operational, so
that GRO can hopefully be working efficiently.
To fix this issue, we make tcp_tso_autosize() unaware of
sk->sk_gso_max_segs
Only tcp_tso_segs() has to enforce the gso_max_segs limit.
Tested:
ethtool -K eth0 tso off gso off
tc qd replace dev eth0 root pfifo_fast
Before patch:
for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
691 (ss -temoi shows cwnd is stuck around 6 )
667
651
631
517
After patch :
# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
1733 (ss -temoi shows cwnd is around 386 )
1778
1746
1781
1718
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an attempt is made to disable RX checksums, USB adapter is changed
but netdev->features is not, because smsc75xx_set_features() returns a
non zero value.
This throws errors from netdev_rx_csum_fault() :
<devname>: hw csum failure
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before, if cb->start() failed, the module reference would never be put,
because cb->cb_running is intentionally false at this point. Users are
generally annoyed by this because they can no longer unload modules that
leak references. Also, it may be possible to tediously wrap a reference
counter back to zero, especially since module.c still uses atomic_inc
instead of refcount_inc.
This patch expands the error path to simply call module_put if
cb->start() fails.
Fixes: 41c87425a1 ("netlink: do not set cb_running if dump's start() errs")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge misc fixes from Andrew Morton:
"16 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: don't defer struct page initialization for Xen pv guests
lib/Kconfig.debug: enable RUNTIME_TESTING_MENU
vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems
selftests/memfd: add run_fuse_test.sh to TEST_FILES
bug.h: work around GCC PR82365 in BUG()
mm/swap.c: make functions and their kernel-doc agree (again)
mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc
ida: do zeroing in ida_pre_get()
mm, swap, frontswap: fix THP swap if frontswap enabled
certs/blacklist_nohashes.c: fix const confusion in certs blacklist
kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
mm, mlock, vmscan: no more skipping pagevecs
mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
Kbuild: always define endianess in kconfig.h
include/linux/sched/mm.h: re-inline mmdrop()
tools: fix cross-compile var clobbering
Each read from a file in efivarfs results in two calls to EFI
(one to get the file size, another to get the actual data).
On X86 these EFI calls result in broadcast system management
interrupts (SMI) which affect performance of the whole system.
A malicious user can loop performing reads from efivarfs bringing
the system to its knees.
Linus suggested per-user rate limit to solve this.
So we add a ratelimit structure to "user_struct" and initialize
it for the root user for no limit. When allocating user_struct for
other users we set the limit to 100 per second. This could be used
for other places that want to limit the rate of some detrimental
user action.
In efivarfs if the limit is exceeded when reading, we take an
interruptible nap for 50ms and check the rate limit again.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The header files for some structures could get included in such a way
that struct attributes (specifically __randomize_layout from path.h) would
be parsed as variable names instead of attributes. This could lead to
some instances of a structure being unrandomized, causing nasty GPFs, etc.
This patch makes sure the compiler_types.h header is included in
kconfig.h so that we've always got types and struct attributes defined,
since kconfig.h is included from the compiler command line.
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Root-caused-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Fixes: 3859a271a0 ("randstruct: Mark various structs for randomization")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On i386, there are 2 types of PLTs, PIC and non-PIC. PIE and shared
objects must use PIC PLT. To use PIC PLT, you need to load
_GLOBAL_OFFSET_TABLE_ into EBX first. There is no need for that on
x86-64 since x86-64 uses PC-relative PLT.
On x86-64, for 32-bit PC-relative branches, we can generate PLT32
relocation, instead of PC32 relocation, which can also be used as
a marker for 32-bit PC-relative branches. Linker can always reduce
PLT32 relocation to PC32 if function is defined locally. Local
functions should use PC32 relocation. As far as Linux kernel is
concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32
since Linux kernel doesn't use PLT.
R_X86_64_PLT32 for 32-bit PC-relative branches has been enabled in
binutils master branch which will become binutils 2.31.
[ hjl is working on having better documentation on this all, but a few
more notes from him:
"PLT32 relocation is used as marker for PC-relative branches. Because
of EBX, it looks odd to generate PLT32 relocation on i386 when EBX
doesn't have GOT.
As for symbol resolution, PLT32 and PC32 relocations are almost
interchangeable. But when linker sees PLT32 relocation against a
protected symbol, it can resolved locally at link-time since it is
used on a branch instruction. Linker can't do that for PC32
relocation"
but for the kernel use, the two are basically the same, and this
commit gets things building and working with the current binutils
master - Linus ]
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kmalloc() can't always allocate large enough buffers for big_key to use for
crypto (1MB + some metadata) so we cannot use that to allocate the buffer.
Further, vmalloc'd pages can't be passed to sg_init_one() and the aead
crypto accessors cannot be called progressively and must be passed all the
data in one go (which means we can't pass the data in one block at a time).
Fix this by allocating the buffer pages individually and passing them
through a multientry scatterlist to the crypto layer. This has the bonus
advantage that we don't have to allocate a contiguous series of pages.
We then vmap() the page list and pass that through to the VFS read/write
routines.
This can trigger a warning:
WARNING: CPU: 0 PID: 60912 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0xb7c/0x15f8
([<00000000002acbb6>] __alloc_pages_nodemask+0x1ee/0x15f8)
[<00000000002dd356>] kmalloc_order+0x46/0x90
[<00000000002dd3e0>] kmalloc_order_trace+0x40/0x1f8
[<0000000000326a10>] __kmalloc+0x430/0x4c0
[<00000000004343e4>] big_key_preparse+0x7c/0x210
[<000000000042c040>] key_create_or_update+0x128/0x420
[<000000000042e52c>] SyS_add_key+0x124/0x220
[<00000000007bba2c>] system_call+0xc4/0x2b0
from the keyctl/padd/useradd test of the keyutils testsuite on s390x.
Note that it might be better to shovel data through in page-sized lumps
instead as there's no particular need to use a monolithic buffer unless the
kernel itself wants to access the data.
Fixes: 13100a72f4 ("Security: Keys: Big keys stored encrypted")
Reported-by: Paul Bunyan <pbunyan@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kirill Marinushkin <k.marinushkin@gmail.com>
The asymmetric key type allows an X.509 certificate to be added even if
its signature's hash algorithm is not available in the crypto API. In
that case 'payload.data[asym_auth]' will be NULL. But the key
restriction code failed to check for this case before trying to use the
signature, resulting in a NULL pointer dereference in
key_or_keyring_common() or in restrict_link_by_signature().
Fix this by returning -ENOPKG when the signature is unsupported.
Reproducer when all the CONFIG_CRYPTO_SHA512* options are disabled and
keyctl has support for the 'restrict_keyring' command:
keyctl new_session
keyctl restrict_keyring @s asymmetric builtin_trusted
openssl req -new -sha512 -x509 -batch -nodes -outform der \
| keyctl padd asymmetric desc @s
Fixes: a511e1af8b ("KEYS: Move the point of trust determination to __key_link()")
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
The X.509 parser mishandles the case where the certificate's signature's
hash algorithm is not available in the crypto API. In this case,
x509_get_sig_params() doesn't allocate the cert->sig->digest buffer;
this part seems to be intentional. However,
public_key_verify_signature() is still called via
x509_check_for_self_signed(), which triggers the 'BUG_ON(!sig->digest)'.
Fix this by making public_key_verify_signature() return -ENOPKG if the
hash buffer has not been allocated.
Reproducer when all the CONFIG_CRYPTO_SHA512* options are disabled:
openssl req -new -sha512 -x509 -batch -nodes -outform der \
| keyctl padd asymmetric desc @s
Fixes: 6c2dc5ae4a ("X.509: Extract signature digest and make self-signed cert checks earlier")
Reported-by: Paolo Valente <paolo.valente@linaro.org>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
If none of the certificates in a SignerInfo's certificate chain match a
trusted key, nor is the last certificate signed by a trusted key, then
pkcs7_validate_trust_one() tries to check whether the SignerInfo's
signature was made directly by a trusted key. But, it actually fails to
set the 'sig' variable correctly, so it actually verifies the last
signature seen. That will only be the SignerInfo's signature if the
certificate chain is empty; otherwise it will actually be the last
certificate's signature.
This is not by itself a security problem, since verifying any of the
certificates in the chain should be sufficient to verify the SignerInfo.
Still, it's not working as intended so it should be fixed.
Fix it by setting 'sig' correctly for the direct verification case.
Fixes: 757932e6da ("PKCS#7: Handle PKCS#7 messages that contain no X.509 certs")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
If there is a blacklisted certificate in a SignerInfo's certificate
chain, then pkcs7_verify_sig_chain() sets sinfo->blacklisted and returns
0. But, pkcs7_verify() fails to handle this case appropriately, as it
actually continues on to the line 'actual_ret = 0;', indicating that the
SignerInfo has passed verification. Consequently, PKCS#7 signature
verification ignores the certificate blacklist.
Fix this by not considering blacklisted SignerInfos to have passed
verification.
Also fix the function comment with regards to when 0 is returned.
Fixes: 03bb79315d ("PKCS#7: Handle blacklisted certificates")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
When pkcs7_verify_sig_chain() is building the certificate chain for a
SignerInfo using the certificates in the PKCS#7 message, it is passing
the wrong arguments to public_key_verify_signature(). Consequently,
when the next certificate is supposed to be used to verify the previous
certificate, the next certificate is actually used to verify itself.
An attacker can use this bug to create a bogus certificate chain that
has no cryptographic relationship between the beginning and end.
Fortunately I couldn't quite find a way to use this to bypass the
overall signature verification, though it comes very close. Here's the
reasoning: due to the bug, every certificate in the chain beyond the
first actually has to be self-signed (where "self-signed" here refers to
the actual key and signature; an attacker might still manipulate the
certificate fields such that the self_signed flag doesn't actually get
set, and thus the chain doesn't end immediately). But to pass trust
validation (pkcs7_validate_trust()), either the SignerInfo or one of the
certificates has to actually be signed by a trusted key. Since only
self-signed certificates can be added to the chain, the only way for an
attacker to introduce a trusted signature is to include a self-signed
trusted certificate.
But, when pkcs7_validate_trust_one() reaches that certificate, instead
of trying to verify the signature on that certificate, it will actually
look up the corresponding trusted key, which will succeed, and then try
to verify the *previous* certificate, which will fail. Thus, disaster
is narrowly averted (as far as I could tell).
Fixes: 6c2dc5ae4a ("X.509: Extract signature digest and make self-signed cert checks earlier")
Cc: <stable@vger.kernel.org> # v4.7+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
test_maps contains a series of stress tests, and previously it will break the
rest tests when it failed to alloc memory.
-----------------------
Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
Failed to create hashmap key=16 value=262144 'Cannot allocate memory'
Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
test_maps: test_maps.c:955: run_parallel: Assertion `status == 0' failed.
Aborted
not ok 1..3 selftests: test_maps [FAIL]
-----------------------
after this patch, the rest tests will be continue when it occurs an ENOMEM failure
CC: Alexei Starovoitov <alexei.starovoitov@gmail.com>
CC: Philip Li <philip.li@intel.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
ioremap_page_range doesn't honour break-before-make and attempts to put
down huge mappings (using p*d_set_huge) over the top of pre-existing
table entries. This leads to us leaking page table memory and also gives
rise to TLB conflicts and spurious aborts, which have been seen in
practice on Cortex-A75.
Until this has been resolved, refuse to put block mappings when the
existing entry is found to be present.
Fixes: 324420bf91 ("arm64: add support for ioremap() block mappings")
Reported-by: Hanjun Guo <hanjun.guo@linaro.org>
Reported-by: Lei Li <lious.lilei@hisilicon.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
GPIO library can return -ENOSYS for the failed request.
Instead of failing ->probe() in this case override error code to 0.
Fixes: ca382f5b38 ("i2c: designware: add i2c gpio recovery option")
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
We were leaving them in the power on state (or the state the firmware
had set up for some client, if we were taking over from them). The
boot state was 30 core clocks, when we actually want to sample some
time after (to make sure that the new input bit has actually arrived).
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
Let's test that we get the flags correctly, and that we preserve the filter
index across the ptrace(PTRACE_SECCOMP_GET_METADATA) correctly.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Previously if users passed a small size for the input structure size, they
would get get odd behavior. It doesn't make sense to pass a structure
smaller than at least filter_off size, so let's just give -EINVAL in this
case.
This changes userspace visible behavior, but was only introduced in commit
26500475ac ("ptrace, seccomp: add support for retrieving seccomp
metadata") in 4.16-rc2, so should be safe to change if merged before then.
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Commit 26500475ac ("ptrace, seccomp: add support for retrieving seccomp
metadata") introduced `struct seccomp_metadata`, which contained unsigned
longs that should be arch independent. The type of the flags member was
chosen to match the corresponding argument to seccomp(), and so we need
something at least as big as unsigned long. My understanding is that __u64
should fit the bill, so let's switch both types to that.
While this is userspace facing, it was only introduced in 4.16-rc2, and so
should be safe assuming it goes in before then.
Reported-by: "Dmitry V. Levin" <ldv@altlinux.org>
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
CC: Kees Cook <keescook@chromium.org>
CC: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Dmitry V. Levin" <ldv@altlinux.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
bpf builds a test program for loading BPF ELF files. Add the executable
to the .gitignore list.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Daniel Díaz <daniel.diaz@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Both glibc and the kernel have in6_* macros definitions. Build fails
because it picks up wrong in6_* macro from the kernel header and not the
header from glibc.
Fixes build error below:
clang -I. -I./include/uapi -I../../../include/uapi
-Wno-compare-distinct-pointer-types \
-O2 -target bpf -emit-llvm -c test_tcpbpf_kern.c -o - | \
llc -march=bpf -mcpu=generic -filetype=obj
-o .../tools/testing/selftests/bpf/test_tcpbpf_kern.o
In file included from test_tcpbpf_kern.c:12:
.../netinet/in.h:101:5: error: expected identifier
IPPROTO_HOPOPTS = 0, /* IPv6 Hop-by-Hop options. */
^
.../linux/in6.h:131:26: note: expanded from macro 'IPPROTO_HOPOPTS'
^
In file included from test_tcpbpf_kern.c:12:
/usr/include/netinet/in.h:103:5: error: expected identifier
IPPROTO_ROUTING = 43, /* IPv6 routing header. */
^
.../linux/in6.h:132:26: note: expanded from macro 'IPPROTO_ROUTING'
^
In file included from test_tcpbpf_kern.c:12:
.../netinet/in.h:105:5: error: expected identifier
IPPROTO_FRAGMENT = 44, /* IPv6 fragmentation header. */
^
Since both glibc and the kernel have in6_* macros definitions, use the
one from glibc. Kernel headers will check for previous libc definitions
by including include/linux/libc-compat.h.
Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The only user of this variable is inside of an #ifdef, causing
a warning without CONFIG_INET:
net/core/filter.c: In function '____bpf_sock_ops_cb_flags_set':
net/core/filter.c:3382:6: error: unused variable 'val' [-Werror=unused-variable]
int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS;
This replaces the #ifdef with a nicer IS_ENABLED() check that
makes the code more readable and avoids the warning.
Fixes: b13d880721 ("bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit f7f99100d8 ("mm: stop zeroing memory during allocation in
vmemmap") broke Xen pv domains in some configurations, as the "Pinned"
information in struct page of early page tables could get lost.
This will lead to the kernel trying to write directly into the page
tables instead of asking the hypervisor to do so. The result is a crash
like the following:
BUG: unable to handle kernel paging request at ffff8801ead19008
IP: xen_set_pud+0x4e/0xd0
PGD 1c0a067 P4D 1c0a067 PUD 23a0067 PMD 1e9de0067 PTE 80100001ead19065
Oops: 0003 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-default+ #271
Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 06/26/2014
task: ffffffff81c10480 task.stack: ffffffff81c00000
RIP: e030:xen_set_pud+0x4e/0xd0
Call Trace:
__pmd_alloc+0x128/0x140
ioremap_page_range+0x3f4/0x410
__ioremap_caller+0x1c3/0x2e0
acpi_os_map_iomem+0x175/0x1b0
acpi_tb_acquire_table+0x39/0x66
acpi_tb_validate_table+0x44/0x7c
acpi_tb_verify_temp_table+0x45/0x304
acpi_reallocate_root_table+0x12d/0x141
acpi_early_init+0x4d/0x10a
start_kernel+0x3eb/0x4a1
xen_start_kernel+0x528/0x532
Code: 48 01 e8 48 0f 42 15 a2 fd be 00 48 01 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 48 8b 00 f6 c4 02 75 5d <4c> 89 65 00 5b 5d 41 5c c3 65 8b 05 52 9f fe 7e 89 c0 48 0f a3
RIP: xen_set_pud+0x4e/0xd0 RSP: ffffffff81c03cd8
CR2: ffff8801ead19008
---[ end trace 38eca2e56f1b642e ]---
Avoid this problem by not deferring struct page initialization when
running as Xen pv guest.
Pavel said:
: This is unique for Xen, so this particular issue won't effect other
: configurations. I am going to investigate if there is a way to
: re-enable deferred page initialization on xen guests.
[akpm@linux-foundation.org: explicitly include xen.h]
Link: http://lkml.kernel.org/r/20180216154101.22865-1-jgross@suse.com
Fixes: f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: <stable@vger.kernel.org> [4.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit d3deafaa8b ("lib/: make RUNTIME_TESTS a menuconfig to ease
disabling it all") causes a regression when using runtime tests due to
it defaults RUNTIME_TESTING_MENU to not set.
Link: http://lkml.kernel.org/r/20180214133015.10090-1-anders.roxell@linaro.org
Fixes: d3deafaa8b ("lib/: make RUNTIME_TESTS a menuconfig to easedisabling it all")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Vincent Legoll <vincent.legoll@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kai Heng Feng has noticed that BUG_ON(PageHighMem(pg)) triggers in
drivers/media/common/saa7146/saa7146_core.c since 19809c2da2 ("mm,
vmalloc: use __GFP_HIGHMEM implicitly").
saa7146_vmalloc_build_pgtable uses vmalloc_32 and it is reasonable to
expect that the resulting page is not in highmem. The above commit
aimed to add __GFP_HIGHMEM only for those requests which do not specify
any zone modifier gfp flag. vmalloc_32 relies on GFP_VMALLOC32 which
should do the right thing. Except it has been missed that GFP_VMALLOC32
is an alias for GFP_KERNEL on 32b architectures. Thanks to Matthew to
notice this.
Fix the problem by unconditionally setting GFP_DMA32 in GFP_VMALLOC32
for !64b arches (as a bailout). This should do the right thing and use
ZONE_NORMAL which should be always below 4G on 32b systems.
Debugged by Matthew Wilcox.
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20180212095019.GX21609@dhcp22.suse.cz
Fixes: 19809c2da2 ("mm, vmalloc: use __GFP_HIGHMEM implicitly”)
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Kai Heng Feng <kai.heng.feng@canonical.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While testing memfd tests, there is a missing script, as reported by
kselftest:
./run_tests.sh: line 7: ./run_fuse_test.sh: No such file or directory
Link: http://lkml.kernel.org/r/1517955779-11386-1-git-send-email-daniel.diaz@linaro.org
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Looking at functions with large stack frames across all architectures
led me discovering that BUG() suffers from the same problem as
fortify_panic(), which I've added a workaround for already.
In short, variables that go out of scope by calling a noreturn function
or __builtin_unreachable() keep using stack space in functions
afterwards.
A workaround that was identified is to insert an empty assembler
statement just before calling the function that doesn't return. I'm
adding a macro "barrier_before_unreachable()" to document this, and
insert calls to that in all instances of BUG() that currently suffer
from this problem.
The files that saw the largest change from this had these frame sizes
before, and much less with my patch:
fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=]
fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=]
net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=]
drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=]
In case of ARC and CRIS, it turns out that the BUG() implementation
actually does return (or at least the compiler thinks it does),
resulting in lots of warnings about uninitialized variable use and
leaving noreturn functions, such as:
block/cfq-iosched.c: In function 'cfq_async_queue_prio':
block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type]
include/linux/dmaengine.h: In function 'dma_maxpq':
include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type]
This makes them call __builtin_trap() instead, which should normally
dump the stack and kill the current process, like some of the other
architectures already do.
I tried adding barrier_before_unreachable() to panic() and
fortify_panic() as well, but that had very little effect, so I'm not
submitting that patch.
Vineet said:
: For ARC, it is double win.
:
: 1. Fixes 3 -Wreturn-type warnings
:
: | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function
: [-Wreturn-type]
: | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function
: [-Wreturn-type]
: | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of
: non-void function [-Wreturn-type]
:
: 2. bloat-o-meter reports code size improvements as gcc elides the
: generated code for stack return.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Tested-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was a conflict between the commit e02a9f048e ("mm/swap.c: make
functions and their kernel-doc agree") and the commit f144c390f9 ("mm:
docs: fix parameter names mismatch") that both tried to fix mismatch
betweeen pagevec_lookup_entries() parameter names and their description.
Since nr_entries is a better name for the parameter, fix the description
again.
Link: http://lkml.kernel.org/r/1518116946-20947-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As far as I can tell, the only place the per-cpu ida_bitmap is populated
is in ida_pre_get. The pre-allocated element is stolen in two places in
ida_get_new_above, in both cases immediately followed by a memset(0).
Since ida_get_new_above is called with locks held, do the zeroing in
ida_pre_get, or rather let kmalloc() do it. Also, apparently gcc
generates ~44 bytes of code to do a memset(, 0, 128):
$ scripts/bloat-o-meter vmlinux.{0,1}
add/remove: 0/0 grow/shrink: 2/1 up/down: 5/-88 (-83)
Function old new delta
ida_pre_get 115 119 +4
vermagic 27 28 +1
ida_get_new_above 715 627 -88
Link: http://lkml.kernel.org/r/20180108225634.15340-1-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, when memory goes low
so that swap is triggered, segfault and memory corruption will occur in
random user space applications as follow,
kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
#0 0x00007fc08889ae0d _int_malloc (libc.so.6)
#1 0x00007fc08889c2f3 malloc (libc.so.6)
#2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
#3 0x0000560e6005e75c n/a (urxvt)
#4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
#5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
#6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
#7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
#8 0x0000560e6005cb55 ev_run (urxvt)
#9 0x0000560e6003b9b9 main (urxvt)
#10 0x00007fc08883af4a __libc_start_main (libc.so.6)
#11 0x0000560e6003f9da _start (urxvt)
After bisection, it was found the first bad commit is bd4c82c22c ("mm,
THP, swap: delay splitting THP after swapped out").
The root cause is as follows:
When the pages are written to swap device during swapping out in
swap_writepage(), zswap (fontswap) is tried to compress the pages to
improve performance. But zswap (frontswap) will treat THP as a normal
page, so only the head page is saved. After swapping in, tail pages
will not be restored to their original contents, causing memory
corruption in the applications.
This is fixed by refusing to save page in the frontswap store functions
if the page is a THP. So that the THP will be swapped out to swap
device.
Another choice is to split THP if frontswap is enabled. But it is found
that the frontswap enabling isn't flexible. For example, if
CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
zswap itself isn't enabled.
Frontswap has multiple backends, to make it easy for one backend to
enable THP support, the THP checking is put in backend frontswap store
functions instead of the general interfaces.
Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
Fixes: bd4c82c22c ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org> [put THP checking in backend]
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shaohua Li <shli@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: <stable@vger.kernel.org> [4.14]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
const must be marked __initconst, not __initdata.
Link: http://lkml.kernel.org/r/20171222001335.1987-1-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
chan->n_subbufs is set by the user and relay_create_buf() does a kmalloc()
of chan->n_subbufs * sizeof(size_t *).
kmalloc_slab() will generate a warning when this fails if
chan->subbufs * sizeof(size_t *) > KMALLOC_MAX_SIZE.
Limit chan->n_subbufs to the maximum allowed kmalloc() size.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1802061216100.122576@chino.kir.corp.google.com
Fixes: f6302f1bcd ("relay: prevent integer overflow in relay_open()")
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a thread mlocks an address space backed either by file pages which
are currently not present in memory or swapped out anon pages (not in
swapcache), a new page is allocated and added to the local pagevec
(lru_add_pvec), I/O is triggered and the thread then sleeps on the page.
On I/O completion, the thread can wake on a different CPU, the mlock
syscall will then sets the PageMlocked() bit of the page but will not be
able to put that page in unevictable LRU as the page is on the pagevec
of a different CPU. Even on drain, that page will go to evictable LRU
because the PageMlocked() bit is not checked on pagevec drain.
The page will eventually go to right LRU on reclaim but the LRU stats
will remain skewed for a long time.
This patch puts all the pages, even unevictable, to the pagevecs and on
the drain, the pages will be added on their LRUs correctly by checking
their evictability. This resolves the mlocked pages on pagevec of other
CPUs issue because when those pagevecs will be drained, the mlocked file
pages will go to unevictable LRU. Also this makes the race with munlock
easier to resolve because the pagevec drains happen in LRU lock.
However there is still one place which makes a page evictable and does
PageLRU check on that page without LRU lock and needs special attention.
TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().
#0: __pagevec_lru_add_fn #1: clear_page_mlock
SetPageLRU() if (!TestClearPageMlocked())
return
smp_mb() // <--required
// inside does PageLRU
if (!PageMlocked()) if (isolate_lru_page())
move to evictable LRU putback_lru_page()
else
move to unevictable LRU
In '#1', TestClearPageMlocked() provides full memory barrier semantics
and thus the PageLRU check (inside isolate_lru_page) can not be
reordered before it.
In '#0', without explicit memory barrier, the PageMlocked() check can be
reordered before SetPageLRU(). If that happens, '#0' can put a page in
unevictable LRU and '#1' might have just cleared the Mlocked bit of that
page but fails to isolate as PageLRU fails as '#0' still hasn't set
PageLRU bit of that page. That page will be stranded on the unevictable
LRU.
There is one (good) side effect though. Without this patch, the pages
allocated for System V shared memory segment are added to evictable LRUs
even after shmctl(SHM_LOCK) on that segment. This patch will correctly
put such pages to unevictable LRU.
Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shaohua Li <shli@fb.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit a983b5ebee ("mm: memcontrol: fix excessive complexity in
memory.stat reporting"), we observed slowly upward creeping NR_WRITEBACK
counts over the course of several days, both the per-memcg stats as well
as the system counter in e.g. /proc/meminfo.
The conversion from full per-cpu stat counts to per-cpu cached atomic
stat counts introduced an irq-unsafe RMW operation into the updates.
Most stat updates come from process context, but one notable exception
is the NR_WRITEBACK counter. While writebacks are issued from process
context, they are retired from (soft)irq context.
When writeback completions interrupt the RMW counter updates of new
writebacks being issued, the decs from the completions are lost.
Since the global updates are routed through the joint lruvec API, both
the memcg counters as well as the system counters are affected.
This patch makes the joint stat and event API irq safe.
Link: http://lkml.kernel.org/r/20180203082353.17284-1-hannes@cmpxchg.org
Fixes: a983b5ebee ("mm: memcontrol: fix excessive complexity in memory.stat reporting")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Debugged-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Build testing with LTO found a couple of files that get compiled
differently depending on whether asm/byteorder.h gets included early
enough or not. In particular, include/asm-generic/qrwlock_types.h is
affected by this, but there are probably others as well.
The symptom is a series of LTO link time warnings, including these:
net/netlabel/netlabel_unlabeled.h:223: error: type of 'netlbl_unlhsh_add' does not match original declaration [-Werror=lto-type-mismatch]
int netlbl_unlhsh_add(struct net *net,
net/netlabel/netlabel_unlabeled.c:377: note: 'netlbl_unlhsh_add' was previously declared here
include/net/ipv6.h:360: error: type of 'ipv6_renew_options_kern' does not match original declaration [-Werror=lto-type-mismatch]
ipv6_renew_options_kern(struct sock *sk,
net/ipv6/exthdrs.c:1162: note: 'ipv6_renew_options_kern' was previously declared here
net/core/dev.c:761: note: 'dev_get_by_name_rcu' was previously declared here
struct net_device *dev_get_by_name_rcu(struct net *net, const char *name)
net/core/dev.c:761: note: code may be misoptimized unless -fno-strict-aliasing is used
drivers/gpu/drm/i915/i915_drv.h:3377: error: type of 'i915_gem_object_set_to_wc_domain' does not match original declaration [-Werror=lto-type-mismatch]
i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write);
drivers/gpu/drm/i915/i915_gem.c:3639: note: 'i915_gem_object_set_to_wc_domain' was previously declared here
include/linux/debugfs.h:92:9: error: type of 'debugfs_attr_read' does not match original declaration [-Werror=lto-type-mismatch]
ssize_t debugfs_attr_read(struct file *file, char __user *buf,
fs/debugfs/file.c:318: note: 'debugfs_attr_read' was previously declared here
include/linux/rwlock_api_smp.h:30: error: type of '_raw_read_unlock' does not match original declaration [-Werror=lto-type-mismatch]
void __lockfunc _raw_read_unlock(rwlock_t *lock) __releases(lock);
kernel/locking/spinlock.c:246:26: note: '_raw_read_unlock' was previously declared here
include/linux/fs.h:3308:5: error: type of 'simple_attr_open' does not match original declaration [-Werror=lto-type-mismatch]
int simple_attr_open(struct inode *inode, struct file *file,
fs/libfs.c:795: note: 'simple_attr_open' was previously declared here
All of the above are caused by include/asm-generic/qrwlock_types.h
failing to include asm/byteorder.h after commit e0d02285f1
("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'")
in linux-4.15.
Similar bugs may or may not exist in older kernels as well, but there is
no easy way to test those with link-time optimizations, and kernels
before 4.14 are harder to fix because they don't have Babu's patch
series
We had similar issues with CONFIG_ symbols in the past and ended up
always including the configuration headers though linux/kconfig.h. This
works around the issue through that same file, defining either
__BIG_ENDIAN or __LITTLE_ENDIAN depending on CONFIG_CPU_BIG_ENDIAN,
which is now always set on all architectures since commit 4c97a0c8fe
("arch: define CPU_BIG_ENDIAN for all fixed big endian archs").
Link: http://lkml.kernel.org/r/20180202154104.1522809-2-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Babu Moger <babu.moger@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As Peter points out, Doing a CALL+RET for just the decrement is a bit silly.
Fixes: d70f2a14b7 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc")
Acked-by: Peter Zijlstra (Intel) <peterz@infraded.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently a number of Makefiles break when used with toolchains that
pass extra flags in CC and other cross-compile related variables (such
as --sysroot).
Thus we get this error when we use a toolchain that puts --sysroot in
the CC var:
~/src/linux/tools$ make iio
[snip]
iio_event_monitor.c:18:10: fatal error: unistd.h: No such file or directory
#include <unistd.h>
^~~~~~~~~~
This occurs because we clobber several env vars related to
cross-compiling with lines like this:
CC = $(CROSS_COMPILE)gcc
Although this will point to a valid cross-compiler, we lose any extra
flags that might exist in the CC variable, which can break toolchains
that rely on them (for example, those that use --sysroot).
This easily shows up using a Yocto SDK:
$ . [snip]/sdk/environment-setup-cortexa8hf-neon-poky-linux-gnueabi
$ echo $CC
arm-poky-linux-gnueabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=hard
-mcpu=cortex-a8
--sysroot=[snip]/sdk/sysroots/cortexa8hf-neon-poky-linux-gnueabi
$ echo $CROSS_COMPILE
arm-poky-linux-gnueabi-
$ echo ${CROSS_COMPILE}gcc
krm-poky-linux-gnueabi-gcc
Although arm-poky-linux-gnueabi-gcc is a cross-compiler, we've lost the
--sysroot and other flags that enable us to find the right libraries to
link against, so we can't find unistd.h and other libraries and headers.
Normally with the --sysroot flag we would find unistd.h in the sdk
directory in the sysroot:
$ find [snip]/sdk/sysroots -path '*/usr/include/unistd.h'
[snip]/sdk/sysroots/cortexa8hf-neon-poky-linux-gnueabi/usr/include/unistd.h
The perf Makefile adds CC = $(CROSS_COMPILE)gcc if and only if CC is not
already set, and it compiles correctly with the above toolchain.
So, generalize the logic that perf uses in the common Makefile and
remove the manual CC = $(CROSS_COMPILE)gcc lines from each Makefile.
Note that this patch does not fix cross-compile for all the tools (some
have other bugs), but it does fix it for all except usb and acpi, which
still have other unrelated issues.
I tested both with and without the patch on native and cross-build and
there appear to be no regressions.
Link: http://lkml.kernel.org/r/20180107214028.23771-1-martin@martingkelly.com
Signed-off-by: Martin Kelly <martin@martingkelly.com>
Acked-by: Mark Brown <broonie@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Pali Rohar <pali.rohar@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Valentina Manea <valentina.manea.m@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>