Send ETHTOOL_MSG_CHANNELS_NTF notification whenever channel counts of
a network device are modified using ETHTOOL_MSG_CHANNELS_SET netlink
message or ETHTOOL_SCHANNELS ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement CHANNELS_SET netlink request to set channel counts of a network
device. These are traditionally set with ETHTOOL_SCHANNELS ioctl request.
Like the ioctl implementation, the generic ethtool code checks if supplied
values do not exceed driver defined limits; if they do, first offending
attribute is reported using extack. Checks preventing removing channels
used for RX indirection table or zerocopy AF_XDP socket are also
implemented.
Move ethtool_get_max_rxfh_channel() helper into common.c so that it can be
used by both ioctl and netlink code.
v2:
- fix netdev reference leak in error path (found by Jakub Kicinsky)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement CHANNELS_GET request to get channel counts of a network device.
These are traditionally available via ETHTOOL_GCHANNELS ioctl request.
Omit attributes for channel types which are not supported by driver or
device (zero reported for maximum).
v2: (all suggested by Jakub Kicinski)
- minor cleanup in channels_prepare_data()
- more descriptive channels_reply_size()
- omit attributes with zero max count
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Send ETHTOOL_MSG_RINGS_NTF notification whenever ring sizes of a network
device are modified using ETHTOOL_MSG_RINGS_SET netlink message or
ETHTOOL_SRINGPARAM ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement RINGS_SET netlink request to set ring sizes of a network device.
These are traditionally set with ETHTOOL_SRINGPARAM ioctl request.
Like the ioctl implementation, the generic ethtool code checks if supplied
values do not exceed driver defined limits; if they do, first offending
attribute is reported using extack.
v2:
- fix netdev reference leak in error path (found by Jakub Kicinsky)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement RINGS_GET request to get ring sizes of a network device. These
are traditionally available via ETHTOOL_GRINGPARAM ioctl request.
Omit attributes for ring types which are not supported by driver or device
(zero reported for maximum).
v2: (all suggested by Jakub Kicinski)
- minor cleanup in rings_prepare_data()
- more descriptive rings_reply_size()
- omit attributes with zero max size
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement PRIVFLAGS_SET netlink request to set private flags of a network
device. These are traditionally set with ETHTOOL_SPFLAGS ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement PRIVFLAGS_GET request to get private flags for a network device.
These are traditionally available via ETHTOOL_GPFLAGS ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Send ETHTOOL_MSG_FEATURES_NTF notification whenever network device features
are modified using ETHTOOL_MSG_FEATURES_SET netlink message, ethtool ioctl
request or any other way resulting in call to netdev_update_features() or
netdev_change_features()
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement FEATURES_SET netlink request to set network device features.
These are traditionally set using ETHTOOL_SFEATURES ioctl request.
Actual change is subject to netdev_change_features() sanity checks so that
it can differ from what was requested. Unlike with most other SET requests,
in addition to error code and optional extack, kernel provides an optional
reply message (ETHTOOL_MSG_FEATURES_SET_REPLY) in the same format but with
different semantics: information about difference between user request and
actual result and difference between old and new state of dev->features.
This reply message can be suppressed by setting ETHTOOL_FLAG_OMIT_REPLY
flag in request header.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement FEATURES_GET request to get network device features. These are
traditionally available via ETHTOOL_GFEATURES ioctl request.
v2:
- style cleanup suggested by Jakub Kicinski
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of
a device are modified using ETHTOOL_MSG_WOL_SET netlink message or
ETHTOOL_SWOL ioctl request.
As notifications can be received by anyone, do not include SecureOn(tm)
password in notification messages.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement WOL_SET netlink request to set wake-on-lan settings. This is
equivalent to ETHTOOL_SWOL ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement WOL_GET request to get wake-on-lan settings for a device,
traditionally available via ETHTOOL_GWOL ioctl request.
As part of the implementation, provide symbolic names for wake-on-line
modes as ETH_SS_WOL_MODES string set.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Send ETHTOOL_MSG_DEBUG_NTF notification message whenever debugging message
mask for a device are modified using ETHTOOL_MSG_DEBUG_SET netlink message
or ETHTOOL_SMSGLVL ioctl request.
The notification message has the same format as reply to DEBUG_GET request.
As with other ethtool notifications, netlink requests only trigger the
notification if the mask is actually changed while ioctl request trigger it
whenever the request results in calling the ethtool_ops handler.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement DEBUG_SET netlink request to set debugging settings for a device.
At the moment, only message mask corresponding to message level as set by
ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement DEBUG_GET request to get debugging settings for a device. At the
moment, only message mask corresponding to message level as reported by
ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)
As part of the implementation, provide symbolic names for message mask bits
as ETH_SS_MSG_CLASSES string set.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement LINKSTATE_GET netlink request to get link state information.
At the moment, only link up flag as provided by ETHTOOL_GLINK ioctl command
is returned.
LINKSTATE_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Send ETHTOOL_MSG_LINKMODES_NTF notification message whenever device link
settings or advertised modes are modified using ETHTOOL_MSG_LINKMODES_SET
netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.
The notification message has the same format as reply to LINKMODES_GET
request. ETHTOOL_MSG_LINKMODES_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.
As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKMODES_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKMODES_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement LINKMODES_SET netlink request to set advertised linkmodes and
related attributes as ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET commands do.
The request allows setting autonegotiation flag, speed, duplex and
advertised link modes.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement LINKMODES_GET netlink request to get link modes related
information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl
commands.
This request provides supported, advertised and peer advertised link modes,
autonegotiation flag, speed and duplex.
LINKMODES_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Send ETHTOOL_MSG_LINKINFO_NTF notification message whenever device link
settings are modified using ETHTOOL_MSG_LINKINFO_SET netlink message or
ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.
The notification message has the same format as reply to LINKINFO_GET
request. ETHTOOL_MSG_LINKINFO_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.
As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKINFO_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKINFO_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement LINKINFO_SET netlink request to set link settings queried by
LINKINFO_GET message.
Only physical port, phy MDIO address and MDI(-X) control can be set,
attempt to modify MDI(-X) status and transceiver is rejected.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Implement LINKINFO_GET netlink request to get basic link settings provided
by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands.
This request provides settings not directly related to autonegotiation and
link mode selection: physical port, phy MDIO address, MDI(-X) status,
MDI(-X) control and transceiver.
LINKINFO_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Requests a contents of one or more string sets, i.e. indexed arrays of
strings; this information is provided by ETHTOOL_GSSET_INFO and
ETHTOOL_GSTRINGS commands of ioctl interface. Unlike ioctl interface, all
information can be retrieved with one request and mulitple string sets can
be requested at once.
There are three types of requests:
- no NLM_F_DUMP, no device: get "global" stringsets
- no NLM_F_DUMP, with device: get string sets related to the device
- NLM_F_DUMP, no device: get device related string sets for all devices
Client can request either all string sets of given type (global or device
related) or only specific sets. With ETHTOOL_A_STRSET_COUNTS flag set, only
set sizes (numbers of strings) are returned.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
The ethtool netlink code uses common framework for passing arbitrary
length bit sets to allow future extensions. A bitset can be a list (only
one bitmap) or can consist of value and mask pair (used e.g. when client
want to modify only some bits). A bitset can use one of two formats:
verbose (bit by bit) or compact.
Verbose format consists of bitset size (number of bits), list flag and
an array of bit nests, telling which bits are part of the list or which
bits are in the mask and which of them are to be set. In requests, bits
can be identified by index (position) or by name. In replies, kernel
provides both index and name. Verbose format is suitable for "one shot"
applications like standard ethtool command as it avoids the need to
either keep bit names (e.g. link modes) in sync with kernel or having to
add an extra roundtrip for string set request (e.g. for private flags).
Compact format uses one (list) or two (value/mask) arrays of 32-bit
words to store the bitmap(s). It is more suitable for long running
applications (ethtool in monitor mode or network management daemons)
which can retrieve the names once and then pass only compact bitmaps to
save space.
Userspace requests can use either format; ETHTOOL_FLAG_COMPACT_BITSETS
flag in request header tells kernel which format to use in reply.
Notifications always use compact format.
As some code uses arrays of unsigned long for internal representation and
some arrays of u32 (or even a single u32), two sets of parse/compose
helpers are introduced. To avoid code duplication, helpers for unsigned
long arrays are implemented as wrappers around helpers for u32 arrays.
There are two reasons for this choice: (1) u32 arrays are more frequent in
ethtool code and (2) unsigned long array can be always interpreted as an
u32 array on little endian 64-bit and all 32-bit architectures while we
would need special handling for odd number of u32 words in the opposite
direction.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Basic genetlink and init infrastructure for the netlink interface, register
genetlink family "ethtool". Add CONFIG_ETHTOOL_NETLINK Kconfig option to
make the build optional. Add initial overall interface description into
Documentation/networking/ethtool-netlink.rst, further patches will add more
detailed information.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
For memory accesses with write-combining attributes (e.g. those returned
by ioremap_wc()), the CPU may wait for prior accesses to be merged with
subsequent ones. But in some situation, such wait is bad for the
performance.
We introduce io_stop_wc() to prevent the merging of write-combining
memory accesses before this macro with those after it.
We add implementation for ARM64 using DGH instruction and provide NOP
implementation for other architectures.
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Suggested-by: Will Deacon <will@kernel.org>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211221035556.60346-1-wangxiongfeng2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Conflicts:
arch/arm64/include/asm/barrier.h
include/asm-generic/barrier.h
Since userspace can make use of the CNTVSS_EL0 instruction, expose
it via a HWCAP.
Suggested-by: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211017124225.3018098-18-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: hongrongxuan <hongrongxuan@huawei.com>
Conflicts:
Documentation/arm64/cpu-feature-registers.rst
arch/arm64/include/asm/hwcap.h
arch/arm64/include/uapi/asm/hwcap.h
arch/arm64/kernel/cpufeature.c
arch/arm64/kernel/cpuinfo.c
Sync code to the same with tk4 pub/lts/0017-kabi, except deleted rue
and wujing. Partners can submit pull requests to this branch, and we
can pick the commits to tk4 pub/lts/0017-kabi easly.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Gitee limit the repo's size to 3GB, to reduce the size of the code,
sync codes to ock 5.4.119-20.0009.21 in one commit.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Sync kernel codes to the same with 590eaf1fec ("Init Repo base on
linux 5.4.32 long term, and add base tlinux kernel interfaces."), which
is from tk4, and it is the base of tk4.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Pull x86 TSX Async Abort and iTLB Multihit mitigations from Thomas Gleixner:
"The performance deterioration departement is not proud at all of
presenting the seventh installment of speculation mitigations and
hardware misfeature workarounds:
1) TSX Async Abort (TAA) - 'The Annoying Affair'
TAA is a hardware vulnerability that allows unprivileged
speculative access to data which is available in various CPU
internal buffers by using asynchronous aborts within an Intel TSX
transactional region.
The mitigation depends on a microcode update providing a new MSR
which allows to disable TSX in the CPU. CPUs which have no
microcode update can be mitigated by disabling TSX in the BIOS if
the BIOS provides a tunable.
Newer CPUs will have a bit set which indicates that the CPU is not
vulnerable, but the MSR to disable TSX will be available
nevertheless as it is an architected MSR. That means the kernel
provides the ability to disable TSX on the kernel command line,
which is useful as TSX is a truly useful mechanism to accelerate
side channel attacks of all sorts.
2) iITLB Multihit (NX) - 'No eXcuses'
iTLB Multihit is an erratum where some Intel processors may incur
a machine check error, possibly resulting in an unrecoverable CPU
lockup, when an instruction fetch hits multiple entries in the
instruction TLB. This can occur when the page size is changed
along with either the physical address or cache type. A malicious
guest running on a virtualized system can exploit this erratum to
perform a denial of service attack.
The workaround is that KVM marks huge pages in the extended page
tables as not executable (NX). If the guest attempts to execute in
such a page, the page is broken down into 4k pages which are
marked executable. The workaround comes with a mechanism to
recover these shattered huge pages over time.
Both issues come with full documentation in the hardware
vulnerabilities section of the Linux kernel user's and administrator's
guide.
Thanks to all patch authors and reviewers who had the extraordinary
priviledge to be exposed to this nuisance.
Special thanks to Borislav Petkov for polishing the final TAA patch
set and to Paolo Bonzini for shepherding the KVM iTLB workarounds and
providing also the backports to stable kernels for those!"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation/taa: Fix printing of TAA_MSG_SMT on IBRS_ALL CPUs
Documentation: Add ITLB_MULTIHIT documentation
kvm: x86: mmu: Recovery of shattered NX large pages
kvm: Add helper function for creating VM worker threads
kvm: mmu: ITLB_MULTIHIT mitigation
cpu/speculation: Uninline and export CPU mitigations helpers
x86/cpu: Add Tremont to the cpu vulnerability whitelist
x86/bugs: Add ITLB_MULTIHIT bug infrastructure
x86/tsx: Add config options to set tsx=on|off|auto
x86/speculation/taa: Add documentation for TSX Async Abort
x86/tsx: Add "auto" option to the tsx= cmdline parameter
kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
x86/speculation/taa: Add sysfs reporting for TSX Async Abort
x86/speculation/taa: Add mitigation for TSX Async Abort
x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
x86/cpu: Add a helper function x86_read_arch_cap_msr()
x86/msr: Add the IA32_TSX_CTRL MSR
Add TLS TX counter description for the handshake retransmitted
packets that triggers the resync procedure then skip it, going
into the regular transmit flow.
Fixes: 46a3ea9807 ("net/mlx5e: kTLS, Enhance TX resync flow")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the initial ITLB_MULTIHIT documentation.
[ tglx: Add it to the index so it gets actually built. ]
Signed-off-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>
Signed-off-by: Nelson D'Souza <nelson.dsouza@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The page table pages corresponding to broken down large pages are zapped in
FIFO order, so that the large page can potentially be recovered, if it is
not longer being used for execution. This removes the performance penalty
for walking deeper EPT page tables.
By default, one large page will last about one hour once the guest
reaches a steady state.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
With some Intel processors, putting the same virtual address in the TLB
as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
and cause the processor to issue a machine check resulting in a CPU lockup.
Unfortunately when EPT page tables use huge pages, it is possible for a
malicious guest to cause this situation.
Add a knob to mark huge pages as non-executable. When the nx_huge_pages
parameter is enabled (and we are using EPT), all huge pages are marked as
NX. If the guest attempts to execute in one of those pages, the page is
broken down into 4K pages, which are then marked executable.
This is not an issue for shadow paging (except nested EPT), because then
the host is in control of TLB flushes and the problematic situation cannot
happen. With nested EPT, again the nested guest can cause problems shadow
and direct EPT is treated in the same way.
[ tglx: Fixup default to auto and massage wording a bit ]
Originally-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some processors may incur a machine check error possibly resulting in an
unrecoverable CPU lockup when an instruction fetch encounters a TLB
multi-hit in the instruction TLB. This can occur when the page size is
changed along with either the physical address or cache type. The relevant
erratum can be found here:
https://bugzilla.kernel.org/show_bug.cgi?id=205195
There are other processors affected for which the erratum does not fully
disclose the impact.
This issue affects both bare-metal x86 page tables and EPT.
It can be mitigated by either eliminating the use of large pages or by
using careful TLB invalidations when changing the page size in the page
tables.
Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in
MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which
are mitigated against this issue.
Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull networking fixes from David Miller:
1) Fix free/alloc races in batmanadv, from Sven Eckelmann.
2) Several leaks and other fixes in kTLS support of mlx5 driver, from
Tariq Toukan.
3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
Høiland-Jørgensen.
4) Add an r8152 device ID, from Kazutoshi Noguchi.
5) Missing include in ipv6's addrconf.c, from Ben Dooks.
6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
easily infer the 32-bit secret otherwise etc.
7) Several netdevice nesting depth fixes from Taehee Yoo.
8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
when doing lockless skb_queue_empty() checks, and accessing
sk_napi_id/sk_incoming_cpu lockless as well.
9) Fix jumbo packet handling in RXRPC, from David Howells.
10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.
11) Fix DMA synchronization in gve driver, from Yangchun Fu.
12) Several bpf offload fixes, from Jakub Kicinski.
13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.
14) Fix ping latency during high traffic rates in hisilicon driver, from
Jiangfent Xiao.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
net: fix installing orphaned programs
net: cls_bpf: fix NULL deref on offload filter removal
selftests: bpf: Skip write only files in debugfs
selftests: net: reuseport_dualstack: fix uninitalized parameter
r8169: fix wrong PHY ID issue with RTL8168dp
net: dsa: bcm_sf2: Fix IMP setup for port different than 8
net: phylink: Fix phylink_dbg() macro
gve: Fixes DMA synchronization.
inet: stop leaking jiffies on the wire
ixgbe: Remove duplicate clear_bit() call
Documentation: networking: device drivers: Remove stray asterisks
e1000: fix memory leaks
i40e: Fix receive buffer starvation for AF_XDP
igb: Fix constant media auto sense switching when no cable is connected
net: ethernet: arc: add the missed clk_disable_unprepare
igb: Enable media autosense for the i350.
igb/igc: Don't warn on fatal read failures when the device is removed
tcp: increase tcp_max_syn_backlog max value
net: increase SOMAXCONN to 4096
netdevsim: Fix use-after-free during device dismantle
...
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2019-11-01
This series contains updates to e1000, igb, igc, ixgbe, i40e and driver
documentation.
Lyude Paul fixes an issue where a fatal read error occurs when the
device is unplugged from the machine. So change the read error into a
warn while the device is still present.
Manfred Rudigier found that the i350 device was not apart of the "Media
Auto Sense" feature, yet the device supports it. So add the missing
i350 device to the check and fix an issue where the media auto sense
would flip/flop when no cable was connected to the port causing spurious
kernel log messages.
I fixed an issue where the fix to resolve receive buffer starvation was
applied in more than one place in the driver, one being the incorrect
location in the i40e driver.
Wenwen Wang fixes a potential memory leak in e1000 where allocated
memory is not properly cleaned up in one of the error paths.
Jonathan Neuschäfer cleans up the driver documentation to be consistent
and remove the footnote reference, since the footnote no longer exists in
the documentation.
Igor Pylypiv cleans up a duplicate clearing of a bit, no need to clear
it twice.
v2: Fixed alignment issue in patch 3 of the series based on community
feedback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These asterisks were once references to a line that said:
"* Other names and brands may be claimed as the property of others."
But now, they serve no purpose; they can only irritate the reader.
Fixes: de3edab427 ("e1000: update README for e1000")
Fixes: a3fb65680f ("e100.txt: Cleanup license info in kernel doc")
Fixes: da8c01c450 ("e1000e.txt: Add e1000e documentation")
Fixes: f12a84a9f6 ("Documentation: fm10k: Add kernel documentation")
Fixes: b55c52b193 ("igb.txt: Add igb documentation")
Fixes: c4e9b56e24 ("igbvf.txt: Add igbvf Documentation")
Fixes: d7064f4c19 ("Documentation/networking/: Update Intel wired LAN driver documentation")
Fixes: c4b8c01112 ("ixgbevf.txt: Update ixgbevf documentation")
Fixes: 1e06edcc2f ("Documentation: i40e: Prepare documentation for RST conversion")
Fixes: 105bf2fe6b ("i40evf: add driver to kernel build system")
Fixes: 1fae869bcf ("Documentation: ice: Prepare documentation for RST conversion")
Fixes: df69ba4321 ("ionic: Add basic framework for IONIC Network device driver")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
- Enable CPU errata workarounds for Broadcom Brahma-B53
- Enable CPU errata workarounds for Qualcomm Hydra/Kryo CPUs
- Fix initial dirty status of writeable, shared mappings
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl28HzAQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNH3gB/4hJoYsASohxTVEcILOp7gZQZd4zgMuF16Z
ci9XcUgmpT3LNQTqSYASDxZZylVdK7eEq4yUXFpe57D5WL6GyEBLDWr09O6qb6F1
p/IuyEkUjram8GzRZsdW3/i786m887T1VYtRg6C7GKU9dHTRzkZcPTklWqc1CsEN
u7KqLGzWHxRNNUVWFhEsn9kTSARVOMfqXfERcpc2f6E5olXz8E62K+av2NL3u5o7
JQqHFqi5iJB66qc0AvUxc7oq/+Hvtz5nQfFm0IWQvGy3dvZ/vTGxYwAW2f7t70SH
MGHT+MsqYEENDjunMKtdHZ+D3A1xkYcrsKgOBkSBTTVlgrSonCr/
=0QZC
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"These are almost exclusively related to CPU errata in CPUs from
Broadcom and Qualcomm where the workarounds were either not being
enabled when they should have been or enabled when they shouldn't have
been.
The only "interesting" fix is ensuring that writeable, shared mappings
are initially mapped as clean since we inadvertently broke the logic
back in v4.14 and then noticed the problem via code inspection the
other day.
The only critical issue we have outstanding is a sporadic NULL
dereference in the scheduler, which doesn't appear to be
arm64-specific and PeterZ is tearing his hair out over it at the
moment.
Summary:
- Enable CPU errata workarounds for Broadcom Brahma-B53
- Enable CPU errata workarounds for Qualcomm Hydra/Kryo CPUs
- Fix initial dirty status of writeable, shared mappings"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: apply ARM64_ERRATUM_843419 workaround for Brahma-B53 core
arm64: Brahma-B53 is SSB and spectre v2 safe
arm64: apply ARM64_ERRATUM_845719 workaround for Brahma-B53 core
arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo
arm64: cpufeature: Enable Qualcomm Falkor/Kryo errata 1003
arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default
The Broadcom Brahma-B53 core is susceptible to the issue described by
ARM64_ERRATUM_843419 so this commit enables the workaround to be applied
when executing on that core.
Since there are now multiple entries to match, we must convert the
existing ARM64_ERRATUM_843419 into an erratum list and use
cpucap_multi_entry_cap_matches to match our entries.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
The Broadcom Brahma-B53 core is susceptible to the issue described by
ARM64_ERRATUM_845719 so this commit enables the workaround to be applied
when executing on that core.
Since there are now multiple entries to match, we must convert the
existing ARM64_ERRATUM_845719 into an erratum list.
Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
tcp_max_syn_backlog default value depends on memory size
and TCP ehash size. Before this patch, the max value
was 2048 [1], which is considered too small nowadays.
Increase it to 4096 to match the recent SOMAXCONN change.
[1] This is with TCP ehash size being capped to 524288 buckets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
SOMAXCONN is /proc/sys/net/core/somaxconn default value.
It has been defined as 128 more than 20 years ago.
Since it caps the listen() backlog values, the very small value has
caused numerous problems over the years, and many people had
to raise it on their hosts after beeing hit by problems.
Google has been using 1024 for at least 15 years, and we increased
this to 4096 after TCP listener rework has been completed, more than
4 years ago. We got no complain of this change breaking any
legacy application.
Many applications indeed setup a TCP listener with listen(fd, -1);
meaning they let the system select the backlog.
Raising SOMAXCONN lowers chance of the port being unavailable under
even small SYNFLOOD attack, and reduces possibilities of side channel
vulnerabilities.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Kryo cores share errata 1009 with Falkor, so add their model
definitions and enable it for them as well.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[will: Update entry in silicon-errata.rst]
Signed-off-by: Will Deacon <will@kernel.org>
Add the documenation for TSX Async Abort. Include the description of
the issue, how to check the mitigation state, control the mitigation,
guidance for system administrators.
[ bp: Add proper SPDX tags, touch ups by Josh and me. ]
Co-developed-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mark Gross <mgross@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>