Daniel Borkmann says:
====================
pull-request: bpf 2018-07-01
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) A bpf_fib_lookup() helper fix to change the API before freeze to
return an encoding of the FIB lookup result and return the nexthop
device index in the params struct (instead of device index as return
code that we had before), from David.
2) Various BPF JIT fixes to address syzkaller fallout, that is, do not
reject progs when set_memory_*() fails since it could still be RO.
Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was
an issue, and a memory leak in s390 JIT found during review, from
Daniel.
3) Multiple fixes for sockmap/hash to address most of the syzkaller
triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(),
a missing sock_map_release() routine to hook up to callbacks, and a
fix for an omitted bucket lock in sock_close(), from John.
4) Two bpftool fixes to remove duplicated error message on program load,
and another one to close the libbpf object after program load. One
additional fix for nfp driver's BPF offload to avoid stopping offload
completely if replace of program failed, from Jakub.
5) Couple of BPF selftest fixes that bail out in some of the test
scripts if the user does not have the right privileges, from Jeffrin.
6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set
where we need to set the flag that some of the test cases are expected
to fail, from Kleber.
7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF
since it has no relation to it and lirc2 users often have configs
without cgroups enabled and thus would not be able to use it, from Sean.
8) Fix a selftest failure in sockmap by removing a useless setrlimit()
call that would set a too low limit where at the same time we are
already including bpf_rlimit.h that does the job, from Yonghong.
9) Fix BPF selftest config with missing missing NET_SCHED, from Anders.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A PCIe endpoint carries the process address space identifier (PASID) in
the TLP prefix as part of the memory read/write transaction. The address
information in the TLP is relevant only for a given PASID context.
An IOMMU takes PASID value and the address information from the
TLP to look up the physical address in the system.
PASID is an End-End TLP Prefix (PCIe r4.0, sec 6.20). Sec 2.2.10.2 says
It is an error to receive a TLP with an End-End TLP Prefix by a
Receiver that does not support End-End TLP Prefixes. A TLP in
violation of this rule is handled as a Malformed TLP. This is a
reported error associated with the Receiving Port (see Section 6.2).
Prevent error condition by proactively requiring End-End TLP prefix to be
supported on the entire data path between the endpoint and the root port
before enabling PASID.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* finally some of the promised HE code, but it turns
out to be small - but everything kept changing, so
one part I did in the driver was >30 patches for
what was ultimately <200 lines of code ... similar
here for this code.
* improved scan privacy support - can now specify scan
flags for randomizing the sequence number as well as
reducing the probe request element content
* rfkill cleanups
* a timekeeping cleanup from Arnd
* various other cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAls2HpsACgkQB8qZga/f
l8RPuQ//aZbTXc/GkYh0/GAmF4ORHePOHTXTZbMEzPeHQSlUE0nTSieyVtamsyy+
P+0Ik/lck15Oq/8qabUqDfDY37Fm/OD88jxmoVhjDdgTUcTbIm71n1yS9vDLytuL
n0Awq2d8xuR2bRkwGgt3Bg0RsCbvqUTa/irrighPiKGqwdVGf7kqGi76hsLrMkx9
MQsVh1tRJCEvqEfs3yojhPna4AFjl9OoKFh0JjKJmKv5MWY5x4ojYG3kvvnAq2uF
TIqko4l+R6AR+IzgBsPfzjj8YSJT67Z9IGe8YzId3OcMubpaJqKwrIq0+sYD/9AO
/FGlK7V/NNge4E7sRPwu+dFzf9tOQAtKE06Icxy7aFknhdv5yGnuT2XaIUt2fv6b
1jMWMPxY8azBL3H2siDJ17ouRoIJbkw+3o41m3ZCneLebMWjIX/s2Azqiz2lUiU2
RjZ9Zr0qXdSghK5yD6/iInUBdmNBNq5ubQ8OIAy7fL7linvBAO23iP/G4E7zBikw
9DtHvrpRx2yA4oYTZiaP0FIEmN/nhVuY7VLdjfLlLBtU9cs9kxOydOVSVB9MeJfE
c+HiIApuykDxUj5mrd2mo7AkINjUVXKrVZLOH8hqlNvbjJRmcfyR/TOUJzdfeLX+
0jmji7TMZaaooUEm+KllCnIyUxSmlS25/Ekfm2gdx/rMXXzi/Oo=
=sNaA
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2018-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Small merge conflict in net/mac80211/scan.c, I preserved
the kcalloc() conversion. -DaveM
Johannes Berg says:
====================
This round's updates:
* finally some of the promised HE code, but it turns
out to be small - but everything kept changing, so
one part I did in the driver was >30 patches for
what was ultimately <200 lines of code ... similar
here for this code.
* improved scan privacy support - can now specify scan
flags for randomizing the sequence number as well as
reducing the probe request element content
* rfkill cleanups
* a timekeeping cleanup from Arnd
* various other cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit extends the existing TIPC socket diagnostics framework
for information related to TIPC group communication.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds diag support for SMC-D.
Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When sk_rmem_alloc is larger than the receive buffer and we can't
schedule more memory for it, the skb will be dropped.
In above situation, if this skb is put into the ofo queue,
LINUX_MIB_TCPOFODROP is incremented to track it.
While if this skb is put into the receive queue, there's no record.
So a new SNMP counter is introduced to track this behavior.
LINUX_MIB_TCPRCVQDROP: Number of packets meant to be queued in rcv queue
but dropped because socket rcvbuf limit hit.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup PCI_REBAR_CTRL_BAR_SHIFT handling. That was hard coded instead of
properly defined in the header for some reason.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Allow setting tunnel options using the act_tunnel_key action.
Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_proto udp \
action tunnel_key \
set src_ip 10.0.99.192 \
dst_ip 10.0.99.193 \
dst_port 6081 \
id 11 \
geneve_opts 0102:80:00800022,0102:80:00800022 \
action mirred egress redirect dev geneve0
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
io_pgetevents() will not change the signal mask. Mark it const to make
it clear and to reduce the need for casts in user code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[hch: reapply the patch that got incorrectly reverted]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This feature is actually already supported by sk->sk_reuse which can be
set by socket level opt SO_REUSEADDR. But it's not working exactly as
RFC6458 demands in section 8.1.27, like:
- This option only supports one-to-one style SCTP sockets
- This socket option must not be used after calling bind()
or sctp_bindx().
Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
work in linux.
To separate it from the socket level version, this patch adds 'reuse' in
sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
setup limitations that are needed when it is being enabled.
"It should be noted that the behavior of the socket-level socket option
to reuse ports and/or addresses for SCTP sockets is unspecified", so it
leaves SO_REUSEADDR as is for the compatibility.
Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
functionality is nearly identical to SO_REUSEADDR, but with some
extra restrictions. Here it uses 'reuse' in sctp_sock instead of
'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
added in another patch.
Thanks to Neil to make this clear.
v1->v2:
- add sctp_sk->reuse to separate it from the socket level version.
v2->v3:
- improve changelog according to Marcelo's suggestion.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For ACLs implemented using either FIB rules or FIB entries, the BPF
program needs the FIB lookup status to be able to drop the packet.
Since the bpf_fib_lookup API has not reached a released kernel yet,
change the return code to contain an encoding of the FIB lookup
result and return the nexthop device index in the params struct.
In addition, inform the BPF program of any post FIB lookup reason as
to why the packet needs to go up the stack.
The fib result for unicast routes must have an egress device, so remove
the check that it is non-NULL.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The poll() changes were not well thought out, and completely
unexplained. They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.
Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead. That gets rid of one of the new indirections.
But that doesn't fix the new complexity that is completely unwarranted
for the regular case. The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.
[ This revert is a revert of about 30 different commits, not reverted
individually because that would just be unnecessarily messy - Linus ]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix coding style issues in tc pedit headers detected by the
checkpatch script.
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.
Commit f043efeae2f1 ("netem: support delivering packets in delayed
time slots") added the slotting feature to approximate the behaviors
of media with packet aggregation but only supported a uniform
distribution for delays between transmission attempts. Tests with TCP
BBR with emulated wifi links with non-uniform distributions produced
more useful results.
Syntax:
slot dist DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
[bytes MAX_BYTES]
The syntax and use of the distribution table is the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table and it
takes DELAY + ( x * JITTER ) as delay. Correlation between values is not
supported.
Examples:
Normal distribution delay with mean = 800us and stdev = 100us.
> tc qdisc add dev eth0 root netem slot dist normal 800us 100us
Optionally set the max slot size in bytes and/or packets.
> tc qdisc add dev eth0 root netem slot dist normal 800us 100us \
bytes 64k packets 42
Signed-off-by: Yousuk Seung <ysseung@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The __NEED_MEDIA_LEGACY_API define is 1) ugly and 2) dangerous
since it is all too easy for drivers to define it to get hold of
legacy defines. Instead just define what we need in media-device.c
which is the only place where we need the legacy define
(MEDIA_ENT_T_DEVNODE_UNKNOWN).
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The flag IN_MASK_CREATE is introduced as a flag for inotiy_add_watch()
which prevents inotify from modifying any existing watches when invoked.
If the pathname specified in the call has a watched inode associated
with it and IN_MASK_CREATE is specified, fail with an errno of EEXIST.
Use of IN_MASK_CREATE with IN_MASK_ADD is reserved for future use and
will return EINVAL.
RATIONALE
In the current implementation, there is no way to prevent
inotify_add_watch() from modifying existing watch descriptors. Even if
the caller keeps a record of all watch descriptors collected, this is
only sufficient to detect that an existing watch descriptor may have
been modified.
The assumption that a particular path will map to the same inode over
multiple calls to inotify_add_watch() cannot be made as files can be
renamed or deleted. It is also not possible to assume that two distinct
paths do no map to the same inode, due to hard-links or a dereferenced
symbolic link. Further uses of inotify_add_watch() to revert the change
may cause other watch descriptors to be modified or created, merely
compunding the problem. There is currently no system call such as
inotify_modify_watch() to explicity modify a watch descriptor, which
would be able to revert unwanted changes. Thus the caller cannot
guarantee to be able to revert any changes to existing watch decriptors.
Additionally the caller cannot assume that the events that are
associated with a watch descriptor are within the set requested, as any
future calls to inotify_add_watch() may unintentionally modify a watch
descriptor's mask. Thus it cannot currently be guaranteed that a watch
descriptor will only generate events which have been requested. The
program must filter events which come through its watch descriptor to
within its expected range.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Henry Wilson <henry.wilson@acentic.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Three small bug fixes (barrier elimination, memory leak on unload,
spinlock recursion) and a technical enhancement left over from the
merge window: the TCMU read length support is required for tape
devices read when the length of the read is greater than the tape
block size.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWzL0MSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishc/IAP9clsIx
CwZd3X/c/AffETSGIIMAjjfOAUJq7xEcR+cyUgD/ZSaIaI/PlvdpIfXPPEd7W3UC
W1maxi8MwP4OF6Xi7mw=
=TQl+
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three small bug fixes (barrier elimination, memory leak on unload,
spinlock recursion) and a technical enhancement left over from the
merge window: the TCMU read length support is required for tape
devices read when the length of the read is greater than the tape
block size"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_debug: Fix memory leak on module unload
scsi: qla2xxx: Spinlock recursion in qla_target
scsi: ipr: Eliminate duplicate barriers
scsi: target: tcmu: add read length support
It will be helpful if we could display the drops due to zero window or no
enough window space.
So a new SNMP MIB entry is added to track this behavior.
This entry is named LINUX_MIB_TCPZEROWINDOWDROP and published in
/proc/net/netstat in TcpExt line as TCPZeroWindowDrop.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The usual mixed bunch. Particular good to see is the generic
touch screen driver. Will be interesting to see if this works
for other ADCs without major changes.
Core features
* Channel types
- New position relative channel type primarily for touch screen
sensors to feed the generic touchscreen driver.
New device support
* ad5586
- Add support for the AD5311R DAC.
* Generic touch screen driver as an IIO consumer.
- Note this is in input, but due to dependencies is coming through
the IIO tree.
- Specific support for this added to the at91-sama5d2 ADC.
- Various necessary DT bindings added.
Staging Drops
* ADIS16060 gyro
- A device with a very odd interface that was never cleanly supported.
It's now very difficult to get, so unlikely it'll ever be fixed up.
Cleanups and minor features and fixes
* core
- Fix y2038 timestamp issues now the core support is in place.
* 104-quad-8
- Provide some defines for magic numbers to help readability.
- Fix an off by one error in register selection
* ad7606
- Put in a missing function parameter name in a prototype.
* adis16023
- Use generic sign_extend function rather than local version.
* adis16240
- Use generic sign_extend funciton rather than local version.
* at91-sama5d2
- Drop dependency on HAS_DMA now this is handled elsewhere. Will
improve build test coverage.
- Add oversampling ratio control. Note there is a minor ABI change
here to increase the apparent depth to 14 bits so as to allow
for transparent provision of different oversampling ratios that
drop the actual bit depth to 13 or 12 bits.
* hx711
- Add a MAINTAINERS entry for this device.
* inv_mpu6050
- Replace the timestamp fifo 'special' code with generic timestamp
handling.
- Switch to using local store of timestamp divider rather than rate
as that is more helpful for accurate time measurement.
- Fix an unaligned access that didn't seem to be causing any trouble.
- Use the fifo overflow bit to track the overflow status rather than
a software counter.
- New timestamping mechanism to deal with missed sample interrupts.
* stm32-adc
- Drop HAS_DMA build dependency.
* sun4i-gpadc
- Select REGMAP_IRQ a very rarely hit build issue fix.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAlsxT24RHGppYzIzQGtl
cm5lbC5vcmcACgkQVIU0mcT0FogOHRAAllFPJBoq+nVqGAqe6CjSmWhAFdyv/Ey0
4iDBsdBHrzhxnDuatqzbFhvu9TgqQxMBSbcRfqrGnJrnCFDNcvo1eJcxq13YFPCy
EitRhu5kUdZ6Ez+OQOjX77wGlPyh8idZnaUkWjowOjnXztkHQjy6IWTbDKFk9bPY
9fGBpmPDn1xaVPPWh7lZVAdTwCt2i+efG4sGyZyMQODjtM0q2G1MoHm9ioqgg8lN
hzO7YIrZmLlXRUhapQ2/61uwa/2WMrcGK5v8eCGphEZnPN5lUWrT//w91+BCQpBC
A9gRFpWblz5qHaRpNhzNbQjUrGvTAeIhF+bdOV2W+oI8CJhTJ0AlNqtUXs2pbaJn
FO6jGwkC+jOA3XdE4tbiqenuMSZNggXBCgyRMfIK5WuIeBF02w57KHxgefkGTnQe
Iqc9QDLLfkGDsOoh1l/+TMWjAxfXJLd7d04wYcRIDy6wumTi6GJxUesiQAnyq5eo
Rg+o8gbfZcnbgzphBRoQjjftFMPeYdr48bCGCmjFjNnIPnemmUZg988gTggl6NST
mzbFBsAnejmYpT393FPL0K9dLVUq5cRngQMuLVDjR4VnlQEVMyV0O+CsW83iSM36
3nqpaUOapqsKJT74n62k1YtzJgxr1uoyMS0LGjldAPLDiTgMf9YiPCCihCpiCY89
K9gE6lzS70A=
=SvKj
-----END PGP SIGNATURE-----
Merge tag 'iio-for-4.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next
Jonathan writes:
First set of IIO new device support, features and cleanups in the 4.19 cycle
The usual mixed bunch. Particular good to see is the generic
touch screen driver. Will be interesting to see if this works
for other ADCs without major changes.
Core features
* Channel types
- New position relative channel type primarily for touch screen
sensors to feed the generic touchscreen driver.
New device support
* ad5586
- Add support for the AD5311R DAC.
* Generic touch screen driver as an IIO consumer.
- Note this is in input, but due to dependencies is coming through
the IIO tree.
- Specific support for this added to the at91-sama5d2 ADC.
- Various necessary DT bindings added.
Staging Drops
* ADIS16060 gyro
- A device with a very odd interface that was never cleanly supported.
It's now very difficult to get, so unlikely it'll ever be fixed up.
Cleanups and minor features and fixes
* core
- Fix y2038 timestamp issues now the core support is in place.
* 104-quad-8
- Provide some defines for magic numbers to help readability.
- Fix an off by one error in register selection
* ad7606
- Put in a missing function parameter name in a prototype.
* adis16023
- Use generic sign_extend function rather than local version.
* adis16240
- Use generic sign_extend funciton rather than local version.
* at91-sama5d2
- Drop dependency on HAS_DMA now this is handled elsewhere. Will
improve build test coverage.
- Add oversampling ratio control. Note there is a minor ABI change
here to increase the apparent depth to 14 bits so as to allow
for transparent provision of different oversampling ratios that
drop the actual bit depth to 13 or 12 bits.
* hx711
- Add a MAINTAINERS entry for this device.
* inv_mpu6050
- Replace the timestamp fifo 'special' code with generic timestamp
handling.
- Switch to using local store of timestamp divider rather than rate
as that is more helpful for accurate time measurement.
- Fix an unaligned access that didn't seem to be causing any trouble.
- Use the fifo overflow bit to track the overflow status rather than
a software counter.
- New timestamping mechanism to deal with missed sample interrupts.
* stm32-adc
- Drop HAS_DMA build dependency.
* sun4i-gpadc
- Select REGMAP_IRQ a very rarely hit build issue fix.
struct itimerspec is not y2038-safe.
Introduce a new struct __kernel_itimerspec based on the kernel internal
y2038-safe struct itimerspec64.
The definition of struct __kernel_itimerspec includes two struct
__kernel_timespec.
Since struct __kernel_timespec has the same representation in native and
compat modes, so does struct __kernel_itimerspec. This helps have a common
entry point for syscalls using struct __kernel_itimerspec.
New y2038-safe syscalls will use this new type. Since most of the new
syscalls are just an update to the native syscalls with the type update,
place the new definition under CONFIG_64BIT_TIME. This helps architectures
that do not support the above config to keep using the old definition of
struct itimerspec.
Also change the get/put_itimerspec64 to use struct__kernel_itimerspec.
This will help 32 bit architectures to use the new syscalls when
architectures select CONFIG_64BIT_TIME.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: viro@zeniv.linux.org.uk
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-api@vger.kernel.org
Cc: y2038@lists.linaro.org
Link: https://lkml.kernel.org/r/20180617051144.29756-2-deepa.kernel@gmail.com
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlsudKcQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgphYXEAC0G+fJcLgue64LwLNjXksDUsoldqNWjTDZ
e3szwseN3Woe7t+Q9DwEUO48T421WhDYe3UY50EZF3Fz5yYLi0ggTeq4qsKgpaXx
80dUY9/STxFZUV/tKrYMz5xPjM3k5x+m8FTeGlFgyR77fbOZE6/8w9XPkpmQN/B2
ayyEPMgAR9aZKhtOU4TJY45O278b2Qdfd0s6XUsxDTwBoR1mTQqKhNnY38PrIA5s
wkhtKz5YRdq5tNhKt2IwYABpJ4HJ1Hz/FE6Fo1RNx+6vfxfEhsk2/DZBn0rzzGDj
kvSaVlPeQEu9SZdJGtb2sx26elVWdt5sqMORQ0zb1kVCochZiT1IO3Z81XCy2Y3u
5WVvOlgsGni9BWr/K+iiN/dc+cKoRPy0g/ib77XUiJHgIC+urUIr3XgcIiuP9GZc
cgCfF+6+GvRKLatSsDZSbns5cpgueJLNogZUyhs7oUouRIECo0DD9NnDAlHmaZYB
957bRd/ykE6QOvtxEsG0TWc3tgyHjJFQQ9ukb45VmzYVB8mITKOGLpceJaqzju9c
AnGyQDddK4fqOct1pgFsSA5C3V6OOVXDpUiFn3DTNoQZcu5HN2hMwEA/g/u5TnGD
N2sEwcu2WCoUjzsMEWFLmX8R4w8laZUDwkNSRC5tYzQEZpVDfmAsRtRcUHJdNLl/
zPj8QMAidA==
=hCgr
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180623' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Further timeout fixes. We aren't quite there yet, so expect another
round of fixes for that to completely close some of the IRQ vs
completion races. (Christoph/Bart)
- Set of NVMe fixes from the usual suspects, mostly error handling
- Two off-by-one fixes (Dan)
- Another bdi race fix (Jan)
- Fix nbd reconfigure with NBD_DISCONNECT_ON_CLOSE (Doron)
* tag 'for-linus-20180623' of git://git.kernel.dk/linux-block:
blk-mq: Fix timeout handling in case the timeout handler returns BLK_EH_DONE
bdi: Fix another oops in wb_workfn()
lightnvm: Remove depends on HAS_DMA in case of platform dependency
nvme-pci: limit max IO size and segments to avoid high order allocations
nvme-pci: move nvme_kill_queues to nvme_remove_dead_ctrl
nvme-fc: release io queues to allow fast fail
nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.
block: sed-opal: Fix a couple off by one bugs
blk-mq-debugfs: Off by one in blk_mq_rq_state_name()
nvmet: reset keep alive timer in controller enable
nvme-rdma: don't override opts->queue_size
nvme-rdma: Fix command completion race at error recovery
nvme-rdma: fix possible free of a non-allocated async event buffer
nvme-rdma: fix possible double free condition when failing to create a controller
Revert "block: Add warning for bi_next not NULL in bio_endio()"
block: fix timeout changes for legacy request drivers
This patch adds support for virtual xfrm interfaces.
Packets that are routed through such an interface
are guaranteed to be IPsec transformed or dropped.
It is a generic virtual interface that ensures IPsec
transformation, no need to know what happens behind
the interface. This means that we can tunnel IPv4 and
IPv6 through the same interface and support all xfrm
modes (tunnel, transport and beet) on it.
Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
Co-developed-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
This patch adds the xfrm interface id as a lookup key
for xfrm states and policies. With this we can assign
states and policies to virtual xfrm interfaces.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Benedict Wong <benedictwong@google.com>
Tested-by: Benedict Wong <benedictwong@google.com>
Tested-by: Antony Antony <antony@phenome.org>
Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
We already support setting an output mark at the xfrm_state,
unfortunately this does not support the input direction and
masking the marks that will be applied to the skb. This change
adds support applying a masked value in both directions.
The existing XFRMA_OUTPUT_MARK number is reused for this purpose
and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.
An additional XFRMA_SET_MARK_MASK attribute is added for setting the
mask. If the attribute mask not provided, it is set to 0xffffffff,
keeping the XFRMA_OUTPUT_MARK existing 'full mask' semantics.
Co-developed-by: Tobias Brunner <tobias@strongswan.org>
Co-developed-by: Eyal Birger <eyal.birger@gmail.com>
Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
If NBD_DISCONNECT_ON_CLOSE is set on a device, then the driver will
issue a disconnect from nbd_release if the device has no remaining
bdev->bd_openers.
Fix ret val so reconfigure with only setting the flag succeeds.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The AUDIT_FILTER_TYPE name is vague and misleading due to not describing
where or when the filter is applied and obsolete due to its available
filter fields having been expanded.
Userspace has already renamed it from AUDIT_FILTER_TYPE to
AUDIT_FILTER_EXCLUDE without checking if it already exists. The
userspace maintainer assures that as long as it is set to the same value
it will not be a problem since the userspace code does not treat
compiler warnings as errors. If this policy changes then checks if it
already exists can be added at the same time.
See: https://github.com/linux-audit/audit-kernel/issues/89
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Generally target core and TCMUser seem to work fine for tape devices and
media changers. But there is at least one situation where TCMUser is not
able to support sequential access device emulation correctly.
The situation is when an initiator sends a SCSI READ CDB with a length that
is greater than the length of the tape block to read. We can distinguish
two subcases:
A) The initiator sent the READ CDB with the SILI bit being set.
In this case the sequential access device has to transfer the data from
the tape block (only the length of the tape block) and transmit a good
status. The current interface between TCMUser and the userspace does
not support reduction of the read data size by the userspace program.
The patch below fixes this subcase by allowing the userspace program to
specify a reduced data size in read direction.
B) The initiator sent the READ CDB with the SILI bit not being set.
In this case the sequential access device has to transfer the data from
the tape block as in A), but additionally has to transmit CHECK
CONDITION with the ILI bit set and NO SENSE in the sensebytes. The
information field in the sensebytes must contain the residual count.
With the below patch a user space program can specify the real read data
length and appropriate sensebytes. TCMUser then uses the se_cmd flag
SCF_TREAT_READ_AS_NORMAL, to force target core to transmit the real data
size and the sensebytes. Note: the flag SCF_TREAT_READ_AS_NORMAL is
introduced by Lee Duncan's patch "[PATCH v4] target: transport should
handle st FM/EOM/ILI reads" from Tue, 15 May 2018 18:25:24 -0700.
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This was too hard to split ... this adds a number of features
to the SCOM user interface:
- Support for indirect SCOMs
- read()/write() interface now handle errors and retries
- New ioctl() "raw" interface for use by debuggers
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Eddie James <eajames@linux.vnet.ibm.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
- can.rst: fix a footnote reference;
- crypto_engine.rst: Fix two parsing warnings;
- Fix a lot of broken references to Documentation/*;
- Improves the scripts/documentation-file-ref-check script,
in order to help detecting/fixing broken references,
preventing false-positives.
After this patch series, only 33 broken references to doc files are
detected by scripts/documentation-file-ref-check.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbJC2aAAoJEAhfPr2O5OEVPmMP/2rN5m9LZ048oRWlg4hCwo73
4FpWqDg18hbWCMHXYHIN1UACIMUkIUfgLhF7WE3D/XqRMuxHoiE5u7DUdak7+VNt
wunpksKFJbgyfFMHRvykHcZV+jQFVbM7eFvXVPIvoSaAeGH6zx4imHTyeDn3x/nL
gdtBqM4bvEhmBjotBTRR4PB8+oPrT/HIT5npHepx3UnFFFAzDQGEZ/I67/el2G5C
pVmYdBXvr7iqrvUs6FilHLTEfe1quCI4UaKNfLHKrxXrTkiJQFOwugYuobZfNmxT
GwjWzfpNy9HMlKJFYipcByALxel1Mnpqz5mIxFQaCTygBuEsORCWzW5MoKIsIUJ0
KOoG76v0rUyMvLBRvaoao3CHYHdzxhQbtVV9DjyDuDksa2G5IoCAF1t6DyIOitRw
9plMnGckk+FJ/MXJKYWXHszFS8NhI0SF2zHe3s1DmRTD8P6oxkxvxBFz6iqqADmL
W6XHd8CcqJItaS9ctPen91TFuysN1HFpdzLLY+xwWmmKOcWC/jFjhTm8pj7xLQHM
5yuuEcefsajf+Xk4w2fSQmRfXnuq+oOlPuWpwSvEy+59cHGI0ms18P1nHy/yt3II
CJywwdx6fjwDon57RFKH7kkGd7px317zMqWdIv9gUj/qZAy9gcdLdvEQLhx9u0aV
4F+hLKFDFEpf58xqRT1R
=/ozx
-----END PGP SIGNATURE-----
Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental
Pull documentation fixes from Mauro Carvalho Chehab:
"This solves a series of broken links for files under Documentation,
and improves a script meant to detect such broken links (see
scripts/documentation-file-ref-check).
The changes on this series are:
- can.rst: fix a footnote reference;
- crypto_engine.rst: Fix two parsing warnings;
- Fix a lot of broken references to Documentation/*;
- improve the scripts/documentation-file-ref-check script, in order
to help detecting/fixing broken references, preventing
false-positives.
After this patch series, only 33 broken references to doc files are
detected by scripts/documentation-file-ref-check"
* tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
fix a series of Documentation/ broken file name references
Documentation: rstFlatTable.py: fix a broken reference
ABI: sysfs-devices-system-cpu: remove a broken reference
devicetree: fix a series of wrong file references
devicetree: fix name of pinctrl-bindings.txt
devicetree: fix some bindings file names
MAINTAINERS: fix location of DT npcm files
MAINTAINERS: fix location of some display DT bindings
kernel-parameters.txt: fix pointers to sound parameters
bindings: nvmem/zii: Fix location of nvmem.txt
docs: Fix more broken references
scripts/documentation-file-ref-check: check tools/*/Documentation
scripts/documentation-file-ref-check: get rid of false-positives
scripts/documentation-file-ref-check: hint: dash or underline
scripts/documentation-file-ref-check: add a fix logic for DT
scripts/documentation-file-ref-check: accept more wildcards at filenames
scripts/documentation-file-ref-check: fix help message
media: max2175: fix location of driver's companion documentation
media: v4l: fix broken video4linux docs locations
media: dvb: point to the location of the old README.dvb-usb file
...
Pull aio fixes from Al Viro:
"Assorted AIO followups and fixes"
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
eventpoll: switch to ->poll_mask
aio: only return events requested in poll_mask() for IOCB_CMD_POLL
eventfd: only return events requested in poll_mask()
aio: mark __aio_sigset::sigmask const
Pull networking fixes from David Miller:
1) Various netfilter fixlets from Pablo and the netfilter team.
2) Fix regression in IPVS caused by lack of PMTU exceptions on local
routes in ipv6, from Julian Anastasov.
3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.
4) Don't crash on poll in TLS, from Daniel Borkmann.
5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
including Avahi mDNS. From Bart Van Assche.
6) Missing of_node_put in qcom/emac driver, from Yue Haibing.
7) We lack checking of the TCP checking in one special case during SYN
receive, from Frank van der Linden.
8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.
9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.
10) Must grab HW caps before doing quirk checks in stmmac driver, from
Jose Abreu.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
net: stmmac: Run HWIF Quirks after getting HW caps
neighbour: skip NTF_EXT_LEARNED entries during forced gc
net: cxgb3: add error handling for sysfs_create_group
tls: fix waitall behavior in tls_sw_recvmsg
tls: fix use-after-free in tls_push_record
l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
mlxsw: spectrum_switchdev: Fix port_vlan refcounting
mlxsw: spectrum_router: Align with new route replace logic
mlxsw: spectrum_router: Allow appending to dev-only routes
ipv6: Only emit append events for appended routes
stmmac: added support for 802.1ad vlan stripping
cfg80211: fix rcu in cfg80211_unregister_wdev
mac80211: Move up init of TXQs
mac80211_hwsim: fix module init error paths
cfg80211: initialize sinfo in cfg80211_get_station
nl80211: fix some kernel doc tag mistakes
hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
rds: avoid unenecessary cong_update in loop transport
l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
...
VF support for virtio.
DMA barriers for virtio strong barriers.
Bugfixes.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbHykhAAoJECgfDbjSjVRpTAgH/iS2bIo0DOvlC5wPljVMopKV
fD3n5dPUDOc2yWv2H9wwc3xDO6f3kByMjLnHvn+PM2ZX/ms731QaPd5sTlzUm+jj
LzvI0gc9cyym8INZcU+xuTLQhiC13wZmZIHuP7X4TRsKBPTSaT+goSRk63qmuJF7
0V8BJcj2QXaygaWD1P5SczrL4nFK7nn5PWZqZTPk3ohuLcUtgcv6Qb+idj+tCnov
6osK122JkN6GO/LuVgEPxKamDgi9SB+sXeqNCYSzgKzXEUyC/cMtxyExXKxwqDEI
MCcfPcoS1IklvII0ZYCTFKJYDTkPCjZ3HQwxF9aVjy4FirJGpRI3NRp5Eqr9rG4=
=+EYn
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"virtio, vhost: features, fixes
- PCI virtual function support for virtio
- DMA barriers for virtio strong barriers
- bugfixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio: update the comments for transport features
virtio_pci: support enabling VFs
vhost: fix info leak due to uninitialized memory
virtio_ring: switch to dma_XX barriers for rpmsg
As we move stuff around, some doc references are broken. Fix some of
them via this script:
./scripts/documentation-file-ref-check --fix
Manually checked if the produced result is valid, removing a few
false-positives.
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Add support for the HE in cfg80211 and also add userspace API to
nl80211 to send rate information out, conforming with P802.11ax_D2.0.
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Add the scan flags for randomized SN and minimized probe request
content for improved scan privacy.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
There is a bunch of tags marking constants with &, which means struct
or enum name. Replace them with %, which is the correct tag for
constants.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Here is a collection of small fixes on top of the previous update.
All small and obvious fixes. Mostly for usual suspects, USB-audio and
HD-audio, but a few trivial error handling fixes for misc drivers as
well.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlsizZwOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/eOQ/9Gmpmt5HtrfesubzjUgkNxAJ9rEIVI8YaNnzv
okk8X9q3BWfW6s/QThMQoTglB+wdvxl96SvhAqsyZcfcdE9rdMm5A0N5P+uz+EWV
VmJf1bslKGn/EcPPIt4rSAF/YtR7UXHOQ59ZngYVtUDQWXcE48sTPOn84OR0acSI
z/tx5RXUCDmsC+uFn4okUNpnDBttK+G4+2beLb3NJtAqVsT/x0pJOcWzmFmh3UZW
7pzCQ6rvJpN8oK8Q5rI8282OG3MU0qTFP7o25ncimQr2kSSnUmdxUwIfXLUa4UCv
y+ZSYpBhGBhm8JV154OiXpdbolUDFEGBlUjHgFKB9qdiBGVy8bndAIrEHVv4Hql+
cTC5xR3GQ2clJYUSPDAORgXG3f/lnD9i4rQ8eRXqyz+8XkiTGfQ27u6eis2nccy3
Wf3Pa7j7knnlfBIw4Fnw7OpcyXUyC+KZ+IQFEmERAIfmLrmQh10oO8o91+hivMFB
e/1MCg+pvk1QGR/iT/WbUeOp0ifg3pv2W16vPv/KlrOlimxEbsYbegCzt19OZ4D8
XY+d8Wwf64awwpynoYXkPUMt6T1hFvJpucdlKOkj52DtnDenio4TNf0AyKDAZSHm
9B7NSikEgsxLO8NtPjSVKR4LB3NOhxJVP1OEAjOHTK02jy4pq+HjapYfVy8x9NZn
9mslHp0=
=XDCD
-----END PGP SIGNATURE-----
Merge tag 'sound-fix-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here is a collection of small fixes on top of the previous update.
All small and obvious fixes. Mostly for usual suspects, USB-audio and
HD-audio, but a few trivial error handling fixes for misc drivers as
well"
* tag 'sound-fix-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Always create the interrupt pipe for the mixer
ALSA: usb-audio: Add insertion control for UAC3 BADD
ALSA: usb-audio: Change in connectors control creation interface
ALSA: usb-audio: Add bi-directional terminal types
ALSA: lx6464es: add error handling for pci_ioremap_bar
ALSA: sonicvibes: add error handling for snd_ctl_add
ALSA: usb-audio: Remove explicitly listed Mytek devices
ALSA: usb-audio: Generic DSD detection for XMOS-based implementations
ALSA: usb-audio: Add native DSD support for Mytek DACs
ALSA: hda/realtek - Add shutup hint
ALSA: usb-audio: Disable the quirk for Nura headset
ALSA: hda: add dock and led support for HP ProBook 640 G4
ALSA: hda: add dock and led support for HP EliteBook 830 G5
ALSA: emu10k1: add error handling for snd_ctl_add
ALSA: fm801: add error handling for snd_ctl_add
io_pgetevents() will not change the signal mask. Mark it const
to make it clear and to reduce the need for casts in user code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ARM: lazy context-switching of FPSIMD registers on arm64, "split"
regions for vGIC redistributor
* s390: cleanups for nested, clock handling, crypto, storage keys and
control register bits
* x86: many bugfixes, implement more Hyper-V super powers,
implement lapic_timer_advance_ns even when the LAPIC timer
is emulated using the processor's VMX preemption timer. Two
security-related bugfixes at the top of the branch.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJbH8Z/AAoJEL/70l94x66DF+UIAJeOuTp6LGasT/9uAb2OovaN
+5kGmOPGFwkTcmg8BQHI2fXT4vhxMXWPFcQnyig9eXJVxhuwluXDOH4P9IMay0yw
VDCBsWRdMvZDQad2hn6Z5zR4Jx01XrSaG/KqvXbbDKDCy96mWG7SYAY2m3ZwmeQi
3Pa3O3BTijr7hBYnMhdXGkSn4ZyU8uPaAgIJ8795YKeOJ2JmioGYk6fj6y2WCxA3
ztJymBjTmIoZ/F8bjuVouIyP64xH4q9roAyw4rpu7vnbWGqx1fjPYJoB8yddluWF
JqCPsPzhKDO7mjZJy+lfaxIlzz2BN7tKBNCm88s5GefGXgZwk3ByAq/0GQ2M3rk=
=H5zI
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small update for KVM:
ARM:
- lazy context-switching of FPSIMD registers on arm64
- "split" regions for vGIC redistributor
s390:
- cleanups for nested
- clock handling
- crypto
- storage keys
- control register bits
x86:
- many bugfixes
- implement more Hyper-V super powers
- implement lapic_timer_advance_ns even when the LAPIC timer is
emulated using the processor's VMX preemption timer.
- two security-related bugfixes at the top of the branch"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (79 commits)
kvm: fix typo in flag name
kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access
KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system
KVM: x86: introduce linear_{read,write}_system
kvm: nVMX: Enforce cpl=0 for VMX instructions
kvm: nVMX: Add support for "VMWRITE to any supported field"
kvm: nVMX: Restrict VMX capability MSR changes
KVM: VMX: Optimize tscdeadline timer latency
KVM: docs: nVMX: Remove known limitations as they do not exist now
KVM: docs: mmu: KVM support exposing SLAT to guests
kvm: no need to check return value of debugfs_create functions
kvm: Make VM ioctl do valloc for some archs
kvm: Change return type to vm_fault_t
KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008
kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation
KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation
KVM: introduce kvm_make_vcpus_request_mask() API
KVM: x86: hyperv: do rep check for each hypercall separately
...
NFT_SET_EVAL is signalling the kernel that this sets can be updated from
the evaluation path, even if there are no expressions attached to the
element. Otherwise, set updates with no expressions fail. Update
description to describe the right semantics.
Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
KVM_X86_DISABLE_EXITS_HTL really refers to exit on halt.
Obviously a typo: should be named KVM_X86_DISABLE_EXITS_HLT.
Fixes: caa057a2ca ("KVM: X86: Provide a capability to disable HLT intercepts")
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The existing comments for transport features are outdated.
So update them to address the latest changes in the spec.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
There is a new feature bit allocated in virtio spec to
support SR-IOV (Single Root I/O Virtualization):
https://github.com/oasis-tcs/virtio-spec/issues/11
This patch enables the support for this feature bit in
virtio driver.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree:
1) Reject non-null terminated helper names from xt_CT, from Gao Feng.
2) Fix KASAN splat due to out-of-bound access from commit phase, from
Alexey Kodanev.
3) Missing conntrack hook registration on IPVS FTP helper, from Julian
Anastasov.
4) Incorrect skbuff allocation size in bridge nft_reject, from Taehee Yoo.
5) Fix inverted check on packet xmit to non-local addresses, also from
Julian.
6) Fix ebtables alignment compat problems, from Alin Nastac.
7) Hook mask checks are not correct in xt_set, from Serhey Popovych.
8) Fix timeout listing of element in ipsets, from Jozsef.
9) Cap maximum timeout value in ipset, also from Jozsef.
10) Don't allow family option for hash:mac sets, from Florent Fourcot.
11) Restrict ebtables to work with NFPROTO_BRIDGE targets only, this
Florian.
12) Another bug reported by KASAN in the rbtree set backend, from
Taehee Yoo.
13) Missing __IPS_MAX_BIT update doesn't include IPS_OFFLOAD_BIT.
From Gao Feng.
14) Missing initialization of match/target in ebtables, from Florian
Westphal.
15) Remove useless nft_dup.h file in include path, from C. Labbe.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This migrates rpmsg to use SPDX license headers and fixes a
use-after-free in SMD.
-----BEGIN PGP SIGNATURE-----
iQJPBAABCAA5FiEEBd4DzF816k8JZtUlCx85Pw2ZrcUFAlsevT4bHGJqb3JuLmFu
ZGVyc3NvbkBsaW5hcm8ub3JnAAoJEAsfOT8Nma3FBREP/1EuBVx80l+zg8gby5aV
r7xJxSiUOmwla/L8w4M7fUVsfQdvPoFBo/g4SYCpW2fxM8Xm3lzeme5R3Upp1YhE
rVCAKAwM5lGzQvqYTVMF/0e8TBGAa9KDKCW4XF3A5UKFxjZanQ51ZFgcYEpLPFBt
5V3Zx/xQkjwS6upu1O6oGnw5gf33UUZq5T97N9snMMrumTzyJwd7LiBbcVZOE9HH
yAJcWw291yOz5dtG8ZWca4vfBoGIU26nXVe+Zvxsj1BesEw6mkdm4swutnmGswKE
me1HZ637dxQ34QqXKSh/3K5fMFLu+VHXnMs5LcNpo0DpDzD51r5tGk3MMLJunWST
jo4SRJotTKGhzY1HOP+hoohjhnrzh14Q5XMMfRh10W4rzJMz59IMngtyYuc9+r2F
A+aK6Nf9C2QoAcf5dZoz0jsFUyXiRW9N6Yu6pWlSS2LekN1EyTxEBJM7311W3Z29
qTckOYOuLWYTx8nWsT9KYxzE4EqJIx3D14pDoNVkxgnGmHnIKczKGfB7TKixheDN
l9cNZQqKw+eG7OEiHwsdUe0cvL800WVhnLKH9DsO8Rz6BS/9CIx49X1fOT4bKLuU
9ru3i0KrEobilGkzx8whB0d0vGSIgcnrJr82nfNQwdoRWEalNBzOwdrYVrN+CN1a
CWiwLHi1Vfd5ubeLPVKUe6db
=8SM9
-----END PGP SIGNATURE-----
Merge tag 'rpmsg-v4.18' of git://github.com/andersson/remoteproc
Pull rpmsg updates from Bjorn Andersson:
"This migrates rpmsg to use SPDX license headers and fixes a
use-after-free in SMD"
* tag 'rpmsg-v4.18' of git://github.com/andersson/remoteproc:
rpmsg: smd: do not use mananged resources for endpoints and channels
rpmsg: char: Switch to SPDX license identifier
rpmsg: glink: Switch to SPDX license identifier
rpmsg: smd: Switch to SPDX license identifier
rpmsg: virtio_rpmsg_bus: Switch to SPDX license identifier
rpmsg: Switch to SPDX license identifier
rpmsg: qcom_smd: Access APCS through mailbox framework
rpmsg: Add driver_override device attribute for rpmsg_device
Pull networking fixes from David Miller:
1) Fix several bpfilter/UMH bugs, in particular make the UMH build not
depend upon X86 specific Kconfig symbols. From Alexei Starovoitov.
2) Fix handling of modified context pointer in bpf verifier, from
Daniel Borkmann.
3) Kill regression in ifdown/ifup sequences for hv_netvsc driver, from
Dexuan Cui.
4) When the bonding primary member name changes, we have to re-evaluate
the bond->force_primary setting, from Xiangning Yu.
5) Eliminate possible padding beyone end of SKB in cdc_ncm driver, from
Bjørn Mork.
6) RX queue length reported for UDP sockets in procfs and socket diag
are inaccurate, from Paolo Abeni.
7) Fix br_fdb_find_port() locking, from Petr Machata.
8) Limit sk_rcvlowat values properly in TCP, from Soheil Hassas
Yeganeh.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
tcp: limit sk_rcvlowat by the maximum receive buffer
net: phy: dp83822: use BMCR_ANENABLE instead of BMSR_ANEGCAPABLE for DP83620
socket: close race condition between sock_close() and sockfs_setattr()
net: bridge: Fix locking in br_fdb_find_port()
udp: fix rx queue len reported by diag and proc interface
cdc_ncm: avoid padding beyond end of skb
net/sched: act_simple: fix parsing of TCA_DEF_DATA
net: fddi: fix a possible null-ptr-deref
net: aquantia: fix unsigned numvecs comparison with less than zero
net: stmmac: fix build failure due to missing COMMON_CLK dependency
bpfilter: fix race in pipe access
bpf, xdp: fix crash in xdp_umem_unaccount_pages
xsk: Fix umem fill/completion queue mmap on 32-bit
tools/bpf: fix selftest get_cgroup_id_user
bpfilter: fix OUTPUT_FORMAT
umh: fix race condition
net: mscc: ocelot: Fix uninitialized error in ocelot_netdevice_event()
bonding: re-evaluate force_primary when the primary slave name changes
ip_tunnel: Fix name string concatenate in __ip_tunnel_create()
hv_netvsc: Fix a network regression after ifdown/ifup
...
This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
xfcp, hisi_sas, cxlflash, qla2xxx. In the absence of Nic, we're also
taking target updates which are mostly minor except for the tcmu
refactor. The only real core change to worry about is the removal of
high page bouncing (in sas, storvsc and iscsi). This has been well
tested and no problems have shown up so far.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWx1pbCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishUucAP42pccS
ziKyiOizuxv9fZ4Q+nXd1A9zhI5tqqpkHjcQegEA40qiZSi3EKGKR8W0UpX7Ntmo
tqrZJGojx9lnrAM2RbQ=
=NMXg
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
xfcp, hisi_sas, cxlflash, qla2xxx.
In the absence of Nic, we're also taking target updates which are
mostly minor except for the tcmu refactor.
The only real core change to worry about is the removal of high page
bouncing (in sas, storvsc and iscsi). This has been well tested and no
problems have shown up so far"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (268 commits)
scsi: lpfc: update driver version to 12.0.0.4
scsi: lpfc: Fix port initialization failure.
scsi: lpfc: Fix 16gb hbas failing cq create.
scsi: lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc
scsi: lpfc: correct oversubscription of nvme io requests for an adapter
scsi: lpfc: Fix MDS diagnostics failure (Rx < Tx)
scsi: hisi_sas: Mark PHY as in reset for nexus reset
scsi: hisi_sas: Fix return value when get_free_slot() failed
scsi: hisi_sas: Terminate STP reject quickly for v2 hw
scsi: hisi_sas: Add v2 hw force PHY function for internal ATA command
scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot
scsi: hisi_sas: Try wait commands before before controller reset
scsi: hisi_sas: Init disks after controller reset
scsi: hisi_sas: Create a scsi_host_template per HW module
scsi: hisi_sas: Reset disks when discovered
scsi: hisi_sas: Add LED feature for v3 hw
scsi: hisi_sas: Change common allocation mode of device id
scsi: hisi_sas: change slot index allocation mode
scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
scsi: hisi_sas: fix a typo in hisi_sas_task_prep()
...
Pull restartable sequence support from Thomas Gleixner:
"The restartable sequences syscall (finally):
After a lot of back and forth discussion and massive delays caused by
the speculative distraction of maintainers, the core set of
restartable sequences has finally reached a consensus.
It comes with the basic non disputed core implementation along with
support for arm, powerpc and x86 and a full set of selftests
It was exposed to linux-next earlier this week, so it does not fully
comply with the merge window requirements, but there is really no
point to drag it out for yet another cycle"
* 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq/selftests: Provide Makefile, scripts, gitignore
rseq/selftests: Provide parametrized tests
rseq/selftests: Provide basic percpu ops test
rseq/selftests: Provide basic test
rseq/selftests: Provide rseq library
selftests/lib.mk: Introduce OVERRIDE_TARGETS
powerpc: Wire up restartable sequences system call
powerpc: Add syscall detection for restartable sequences
powerpc: Add support for restartable sequences
x86: Wire up restartable sequence system call
x86: Add support for restartable sequences
arm: Wire up restartable sequences system call
arm: Add syscall detection for restartable sequences
arm: Add restartable sequences support
rseq: Introduce restartable sequences system call
uapi/headers: Provide types_32_64.h
Add new channel type for relative position on a pad.
These type of analog sensor offers the position of a pen
on a touchpad, and is represented as a voltage, which can be
converted to a position on X and Y axis on the pad.
The channel will hand the relative position on the pad in both directions.
The channel can then be consumed by a touchscreen driver or
read as-is for a raw indication of the touchpen on a touchpad.
Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Here is the big staging and IIO driver update for 4.18-rc1.
It was delayed as I wanted to make sure the final driver deletions did
not cause any major merge issues, and all now looks good.
There are a lot of patches here, just over 1000. The diffstat summary
shows the major changes here:
1007 files changed, 16828 insertions(+), 227770 deletions(-)
Because of this, we might be close to shrinking the overall kernel
source code size for two releases in a row.
There was loads of work in this release cycle, primarily:
- tons of ks7010 driver cleanups
- lots of mt7621 driver fixes and cleanups
- most driver cleanups
- wilc1000 fixes and cleanups
- lots and lots of IIO driver cleanups and new additions
- debugfs cleanups for all staging drivers
- lots of other staging driver cleanups and fixes, the shortlog
has the full details.
but the big user-visable things here are the removal of 3 chunks of
code:
- ncpfs and ipx were removed on schedule, no one has cared about
this code since it moved to staging last year, and if it needs
to come back, it can be reverted.
- lustre file system is removed. I've ranted at the lustre
developers about once a year for the past 5 years, with no
real forward progress at all to clean things up and get the
code into the "real" part of the kernel. Given that the
lustre developers continue to work on an external tree and try
to port those changes to the in-kernel tree every once in a
while, this whole thing really really is not working out at
all. So I'm deleting it so that the developers can spend the
time working in their out-of-tree location and get things
cleaned up properly to get merged into the tree correctly at a
later date.
Because of these file removals, you will have merge issues on some of
these files (2 in the ipx code, 1 in the ncpfs code, and 1 in the
atomisp driver). Just delete those files, it's a simple merge :)
All of this has been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWxvjGQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymoEwCbBYnyUl3cwCszIJ3L3/zvUWpmqIgAn1DDsAim
dM4lmKg6HX/JBSV4GAN0
=zdta
-----END PGP SIGNATURE-----
Merge tag 'staging-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big staging and IIO driver update for 4.18-rc1.
It was delayed as I wanted to make sure the final driver deletions did
not cause any major merge issues, and all now looks good.
There are a lot of patches here, just over 1000. The diffstat summary
shows the major changes here:
1007 files changed, 16828 insertions(+), 227770 deletions(-)
Because of this, we might be close to shrinking the overall kernel
source code size for two releases in a row.
There was loads of work in this release cycle, primarily:
- tons of ks7010 driver cleanups
- lots of mt7621 driver fixes and cleanups
- most driver cleanups
- wilc1000 fixes and cleanups
- lots and lots of IIO driver cleanups and new additions
- debugfs cleanups for all staging drivers
- lots of other staging driver cleanups and fixes, the shortlog has
the full details.
but the big user-visable things here are the removal of 3 chunks of
code:
- ncpfs and ipx were removed on schedule, no one has cared about this
code since it moved to staging last year, and if it needs to come
back, it can be reverted.
- lustre file system is removed.
I've ranted at the lustre developers about once a year for the past
5 years, with no real forward progress at all to clean things up
and get the code into the "real" part of the kernel.
Given that the lustre developers continue to work on an external
tree and try to port those changes to the in-kernel tree every once
in a while, this whole thing really really is not working out at
all. So I'm deleting it so that the developers can spend the time
working in their out-of-tree location and get things cleaned up
properly to get merged into the tree correctly at a later date.
Because of these file removals, you will have merge issues on some of
these files (2 in the ipx code, 1 in the ncpfs code, and 1 in the
atomisp driver). Just delete those files, it's a simple merge :)
All of this has been in linux-next for a while with no reported
problems"
* tag 'staging-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1011 commits)
staging: ipx: delete it from the tree
ncpfs: remove uapi .h files
ncpfs: remove Documentation
ncpfs: remove compat functionality
staging: ncpfs: delete it
staging: lustre: delete the filesystem from the tree.
staging: vc04_services: no need to save the log debufs dentries
staging: vc04_services: vchiq_debugfs_log_entry can be a void *
staging: vc04_services: remove struct vchiq_debugfs_info
staging: vc04_services: move client dbg directory into static variable
staging: vc04_services: remove odd vchiq_debugfs_top() wrapper
staging: vc04_services: no need to check debugfs return values
staging: mt7621-gpio: reorder includes alphabetically
staging: mt7621-gpio: change gc_map to don't use pointers
staging: mt7621-gpio: use GPIOF_DIR_OUT and GPIOF_DIR_IN macros instead of custom values
staging: mt7621-gpio: change 'to_mediatek_gpio' to make just a one line return
staging: mt7621-gpio: dt-bindings: update documentation for #interrupt-cells property
staging: mt7621-gpio: update #interrupt-cells for the gpio node
staging: mt7621-gpio: dt-bindings: complete documentation for the gpio
staging: mt7621-dts: add missing properties to gpio node
...
Pull aio iopriority support from Al Viro:
"The rest of aio stuff for this cycle - Adam's aio ioprio series"
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: aio ioprio use ioprio_check_cap ret val
fs: aio ioprio add explicit block layer dependence
fs: iomap dio set bio prio from kiocb prio
fs: blkdev set bio prio from kiocb prio
fs: Add aio iopriority support
fs: Convert kiocb rw_hint from enum to u16
block: add ioprio_check_cap function
Merge updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- v9fs updates
- MM
- procfs updates
- lib/ updates
- autofs updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
autofs: small cleanup in autofs_getpath()
autofs: clean up includes
autofs: comment on selinux changes needed for module autoload
autofs: update MAINTAINERS entry for autofs
autofs: use autofs instead of autofs4 in documentation
autofs: rename autofs documentation files
autofs: create autofs Kconfig and Makefile
autofs: delete fs/autofs4 source files
autofs: update fs/autofs4/Makefile
autofs: update fs/autofs4/Kconfig
autofs: copy autofs4 to autofs
autofs4: use autofs instead of autofs4 everywhere
autofs4: merge auto_fs.h and auto_fs4.h
fs/binfmt_misc.c: do not allow offset overflow
checkpatch: improve patch recognition
lib/ucs2_string.c: add MODULE_LICENSE()
lib/mpi: headers cleanup
lib/percpu_ida.c: use _irqsave() instead of local_irq_save() + spin_lock
lib/idr.c: remove simple_ida_lock
lib/bitmap.c: micro-optimization for __bitmap_complement()
...
The autofs module has long since been removed so there's no need to have
two separate include files for autofs.
Link: http://lkml.kernel.org/r/152626703024.28589.9571964661718767929.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define a new PageTable bit in the page_type and use it to mark pages in
use as page tables. This can be helpful when debugging crashdumps or
analysing memory fragmentation. Add a KPF flag to report these pages to
userspace and update page-types.c to interpret that flag.
Note that only pages currently accounted as NR_PAGETABLES are tracked as
PageTable; this does not include pgd/p4d/pud/pmd pages. Those will be the
subject of a later patch.
Link: http://lkml.kernel.org/r/20180518194519.3820-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Borkmann says:
====================
pull-request: bpf 2018-06-08
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix in the BPF verifier to reject modified ctx pointers on helper
functions, from Daniel.
2) Fix in BPF kselftests for get_cgroup_id_user() helper to only
record the cgroup id for a provided pid in order to reduce test
failures from processes interferring with the test, from Yonghong.
3) Fix a crash in AF_XDP's mem accounting when the process owning
the sock has CAP_IPC_LOCK capabilities set, from Daniel.
4) Fix an issue for AF_XDP on 32 bit machines where XDP_UMEM_PGOFF_*_RING
defines need ULL suffixes and use loff_t type as they are otherwise
truncated, from Geert.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull integrity updates from James Morris:
"From Mimi:
- add run time support for specifying additional security xattrs
included in the security.evm HMAC/signature
- some code clean up and bug fixes"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
EVM: unlock on error path in evm_read_xattrs()
EVM: prevent array underflow in evm_write_xattrs()
EVM: Fix null dereference on xattr when xattr fails to allocate
EVM: fix memory leak of temporary buffer 'temp'
IMA: use list_splice_tail_init_rcu() instead of its open coded variant
ima: use match_string() helper
ima: fix updating the ima_appraise flag
ima: based on policy verify firmware signatures (pre-allocated buffer)
ima: define a new policy condition based on the filesystem name
EVM: Allow runtime modification of the set of verified xattrs
EVM: turn evm_config_xattrnames into a list
integrity: Add an integrity directory in securityfs
ima: Remove unused variable ima_initialized
ima: Unify logging
ima: Reflect correct permissions for policy
With gcc-4.1.2 on 32-bit:
net/xdp/xsk.c:663: warning: integer constant is too large for ‘long’ type
net/xdp/xsk.c:665: warning: integer constant is too large for ‘long’ type
Add the missing "ULL" suffixes to the large XDP_UMEM_PGOFF_*_RING values
to fix this.
net/xdp/xsk.c:663: warning: comparison is always false due to limited range of data type
net/xdp/xsk.c:665: warning: comparison is always false due to limited range of data type
"unsigned long" is 32-bit on 32-bit systems, hence the offset is
truncated, and can never be equal to any of the XDP_UMEM_PGOFF_*_RING
values. Use loff_t (and the required cast) to fix this.
Fixes: 423f38329d ("xsk: add umem fill queue support and mmap")
Fixes: fe2308328c ("xsk: add umem completion queue support and mmap")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This is internal, not exposed through uapi, and although it maps with
userspace LOG_*, with the introduction of LOGLEVEL_AUDIT we are
incurring in namespace pollution.
This patch adds the NFT_LOGLEVEL_ enumeration and use it from nft_log.
Fixes: 1a893b44de ("netfilter: nf_tables: Add audit support to log statement")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlsZdg0UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vwJOBAAsuuWsOdiJRRhQLU5WfEMFgzcL02R
gsumqZkK7E8LOq0DPNMtcgv9O0KgYZyCiZyTMJ8N7sEYohg04lMz8mtYXOibjcwI
p+nVMko8jQXV9FXwSMGVqigEaLLcrbtkbf/mPriD63DDnRMa/+/Jh15SwfLTydIH
QRTJbIxkS3EiOauj5C8QY3UwzjlvV9mDilzM/x+MSK27k2HFU9Pw/3lIWHY716rr
grPZTwBTfIT+QFZjwOm6iKzHjxRM830sofXARkcH4CgSNaTeq5UbtvAs293MHvc+
v/v/1dfzUh00NxfZDWKHvTUMhjazeTeD9jEVS7T+HUcGzvwGxMSml6bBdznvwKCa
46ynePOd1VcEBlMYYS+P4njRYBLWeUwt6/TzqR4yVwb0keQ6Yj3Y9H2UpzscYiCl
O+0qz6RwyjKY0TpxfjoojgHn4U5ByI5fzVDJHbfr2MFTqqRNaabVrfl6xU4sVuhh
OluT5ym+/dOCTI/wjlolnKNb0XThVre8e2Busr3TRvuwTMKMIWqJ9sXLovntdbqE
furPD/UnuZHkjSFhQ1SQwYdWmsZI5qAq2C9haY8sEWsXEBEcBGLJ2BEleMxm8UsL
KXuy4ER+R4M+sFtCkoWf3D4NTOBUdPHi4jyk6Ooo1idOwXCsASVvUjUEG5YcQC6R
kpJ1VPTKK1XN64I=
=aFAi
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- unify AER decoding for native and ACPI CPER sources (Alexandru
Gagniuc)
- add TLP header info to AER tracepoint (Thomas Tai)
- add generic pcie_wait_for_link() interface (Oza Pawandeep)
- handle AER ERR_FATAL by removing and re-enumerating devices, as
Downstream Port Containment does (Oza Pawandeep)
- factor out common code between AER and DPC recovery (Oza Pawandeep)
- stop triggering DPC for ERR_NONFATAL errors (Oza Pawandeep)
- share ERR_FATAL recovery path between AER and DPC (Oza Pawandeep)
- disable ASPM L1.2 substate if we don't have LTR (Bjorn Helgaas)
- respect platform ownership of LTR (Bjorn Helgaas)
- clear interrupt status in top half to avoid interrupt storm (Oza
Pawandeep)
- neaten pci=earlydump output (Andy Shevchenko)
- avoid errors when extended config space inaccessible (Gilles Buloz)
- prevent sysfs disable of device while driver attached (Christoph
Hellwig)
- use core interface to report PCIe link properties in bnx2x, bnxt_en,
cxgb4, ixgbe (Bjorn Helgaas)
- remove unused pcie_get_minimum_link() (Bjorn Helgaas)
- fix use-before-set error in ibmphp (Dan Carpenter)
- fix pciehp timeouts caused by Command Completed errata (Bjorn
Helgaas)
- fix refcounting in pnv_php hotplug (Julia Lawall)
- clear pciehp Presence Detect and Data Link Layer Status Changed on
resume so we don't miss hotplug events (Mika Westerberg)
- only request pciehp control if we support it, so platform can use
ACPI hotplug otherwise (Mika Westerberg)
- convert SHPC to be builtin only (Mika Westerberg)
- request SHPC control via _OSC if we support it (Mika Westerberg)
- simplify SHPC handoff from firmware (Mika Westerberg)
- fix an SHPC quirk that mistakenly included *all* AMD bridges as well
as devices from any vendor with device ID 0x7458 (Bjorn Helgaas)
- assign a bus number even to non-native hotplug bridges to leave
space for acpiphp additions, to fix a common Thunderbolt xHCI
hot-add failure (Mika Westerberg)
- keep acpiphp from scanning native hotplug bridges, to fix common
Thunderbolt hot-add failures (Mika Westerberg)
- improve "partially hidden behind bridge" messages from core (Mika
Westerberg)
- add macros for PCIe Link Control 2 register (Frederick Lawler)
- replace IB/hfi1 custom macros with PCI core versions (Frederick
Lawler)
- remove dead microblaze and xtensa code (Bjorn Helgaas)
- use dev_printk() when possible in xtensa and mips (Bjorn Helgaas)
- remove unused pcie_port_acpi_setup() and portdrv_acpi.c (Bjorn
Helgaas)
- add managed interface to get PCI host bridge resources from OF (Jan
Kiszka)
- add support for unbinding generic PCI host controller (Jan Kiszka)
- fix memory leaks when unbinding generic PCI host controller (Jan
Kiszka)
- request legacy VGA framebuffer only for VGA devices to avoid false
device conflicts (Bjorn Helgaas)
- turn on PCI_COMMAND_IO & PCI_COMMAND_MEMORY in pci_enable_device()
like everybody else, not in pcibios_fixup_bus() (Bjorn Helgaas)
- add generic enable function for simple SR-IOV hardware (Alexander
Duyck)
- use generic SR-IOV enable for ena, nvme (Alexander Duyck)
- add ACS quirk for Intel 7th & 8th Gen mobile (Alex Williamson)
- add ACS quirk for Intel 300 series (Mika Westerberg)
- enable register clock for Armada 7K/8K (Gregory CLEMENT)
- reduce Keystone "link already up" log level (Fabio Estevam)
- move private DT functions to drivers/pci/ (Rob Herring)
- factor out dwc CONFIG_PCI Kconfig dependencies (Rob Herring)
- add DesignWare support to the endpoint test driver (Gustavo
Pimentel)
- add DesignWare support for endpoint mode (Gustavo Pimentel)
- use devm_ioremap_resource() instead of devm_ioremap() in dra7xx and
artpec6 (Gustavo Pimentel)
- fix Qualcomm bitwise NOT issue (Dan Carpenter)
- add Qualcomm runtime PM support (Srinivas Kandagatla)
- fix DesignWare enumeration below bridges (Koen Vandeputte)
- use usleep() instead of mdelay() in endpoint test (Jia-Ju Bai)
- add configfs entries for pci_epf_driver device IDs (Kishon Vijay
Abraham I)
- clean up pci_endpoint_test driver (Gustavo Pimentel)
- update Layerscape maintainer email addresses (Minghuan Lian)
- add COMPILE_TEST to improve build test coverage (Rob Herring)
- fix Hyper-V bus registration failure caused by domain/serial number
confusion (Sridhar Pitchai)
- improve Hyper-V refcounting and coding style (Stephen Hemminger)
- avoid potential Hyper-V hang waiting for a response that will never
come (Dexuan Cui)
- implement Mediatek chained IRQ handling (Honghui Zhang)
- fix vendor ID & class type for Mediatek MT7622 (Honghui Zhang)
- add Mobiveil PCIe host controller driver (Subrahmanya Lingappa)
- add Mobiveil MSI support (Subrahmanya Lingappa)
- clean up clocks, MSI, IRQ mappings in R-Car probe failure paths
(Marek Vasut)
- poll more frequently (5us vs 5ms) while waiting for R-Car data link
active (Marek Vasut)
- use generic OF parsing interface in R-Car (Vladimir Zapolskiy)
- add R-Car V3H (R8A77980) "compatible" string (Sergei Shtylyov)
- add R-Car gen3 PHY support (Sergei Shtylyov)
- improve R-Car PHYRDY polling (Sergei Shtylyov)
- clean up R-Car macros (Marek Vasut)
- use runtime PM for R-Car controller clock (Dien Pham)
- update arm64 defconfig for Rockchip (Shawn Lin)
- refactor Rockchip code to facilitate both root port and endpoint
mode (Shawn Lin)
- add Rockchip endpoint mode driver (Shawn Lin)
- support VMD "membar shadow" feature (Jon Derrick)
- support VMD bus number offsets (Jon Derrick)
- add VMD "no AER source ID" quirk for more device IDs (Jon Derrick)
- remove unnecessary host controller CONFIG_PCIEPORTBUS Kconfig
selections (Bjorn Helgaas)
- clean up quirks.c organization and whitespace (Bjorn Helgaas)
* tag 'pci-v4.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (144 commits)
PCI/AER: Replace struct pcie_device with pci_dev
PCI/AER: Remove unused parameters
PCI: qcom: Include gpio/consumer.h
PCI: Improve "partially hidden behind bridge" log message
PCI: Improve pci_scan_bridge() and pci_scan_bridge_extend() doc
PCI: Move resource distribution for single bridge outside loop
PCI: Account for all bridges on bus when distributing bus numbers
ACPI / hotplug / PCI: Drop unnecessary parentheses
ACPI / hotplug / PCI: Mark stale PCI devices disconnected
ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug
PCI: hotplug: Add hotplug_is_native()
PCI: shpchp: Add shpchp_is_native()
PCI: shpchp: Fix AMD POGO identification
PCI: mobiveil: Add MSI support
PCI: mobiveil: Add Mobiveil PCIe Host Bridge IP driver
PCI/AER: Decode Error Source Requester ID
PCI/AER: Remove aer_recover_work_func() forward declaration
PCI/DPC: Use the generic pcie_do_fatal_recovery() path
PCI/AER: Pass service type to pcie_do_fatal_recovery()
PCI/DPC: Disable ERR_NONFATAL handling by DPC
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbGDjcAAoJEAhfPr2O5OEV1KwP/Am2n5Ehc2W0/DLD3K7LlwgN
8JnMPWNQTCreLRgJD0/KX+HH1M+yBJ05KF/Glm7fcOKpOhWqrUbPgtiQT0GyHHBB
uvmQfGJrRvCrP+1S1SeWtNhItsyWvCfDaorzsTWJYEF/F9Wtj/Sj92DC1y/BKQaR
Rcs4yeCqlFEp3rjbXExanFhA3/XeMzK2sby8c51cILTZPkWI64qrHcZRWOW7+zZ6
fKEVDNOKxa7sfg8I9yaI73lBGXBpCJxLROloJ3jEtuH5gY3nR6PZdXunBC5K0+pX
UH1vUeBcS/3ExQWL0zmCqHz1aYb6kzTSbIPs+NktxyzTTb8FDjT9JhV/9AxhEQfK
iIxv81LRBdbEoPxbvx88sj5VVvlRla/NRv03WsXhhVDqx2SZJuNgSXw3XJtGDx3j
AuUur0AQH4KMNNmwxmyn6wbhm7N63AaGbmYE2sRmaL7lk6b48BSbsUpOM5AMWzZe
ZsESSsoqjRR86zFtVVzI7ZImCk16D6mNRP94Z0DQReWAJ6zS57/EYfKZ+pEQ6mww
IyoJalD+pBe160fqsSo59F4k2fqzsqP4p8m29OQWFvyl7+UboMBz7FscWfyT98+R
MbJolZ8QJNlOVaOusxPYLfdfjVmkHCt4E0cBZVFYIvliTGd5QiKqHAW+kTYQH0No
Y0nHm4bShsUY8I9YCgsk
=r/WP
-----END PGP SIGNATURE-----
Merge tag 'media/v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- remove of atomisp driver from staging, as nobody would have time to
dedicate huge efforts to fix all the problems there. Also, we have a
feeling that the driver may not even run the way it is.
- move Zoran driver to staging, in order to be either fixed to use VB2
and the proper media kAPIs or to be removed
- remove videobuf-dvb driver, with is unused for a while
- some V4L2 documentation fixes/improvements
- new sensor drivers: imx258 and ov7251
- a new driver was added to allow using I2C transparent drivers
- several improvements at the ddbridge driver
- several improvements at the ISDB pt1 driver, making it more coherent
with the DVB framework
- added a new platform driver for MIPI CSI-2 RX: cadence
- now, all media drivers can be compiled on x86 with COMPILE_TEST
- almost all media drivers now build on non-x86 architectures with
COMPILE_TEST
- lots of other random stuff: cleanups, support for new board models,
bug fixes, etc
* tag 'media/v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (464 commits)
media: omap2: fix compile-testing with FB_OMAP2=m
media: media/radio/Kconfig: add back RADIO_ISA
media: v4l2-ioctl.c: fix missing unlock in __video_do_ioctl()
media: pxa_camera: ignore -ENOIOCTLCMD from v4l2_subdev_call for s_power
media: arch: sh: migor: Fix TW9910 PDN gpio
media: staging: tegra-vde: Reset VDE regardless of memory client resetting failure
media: marvel-ccic: mmp: select VIDEOBUF2_VMALLOC/DMA_CONTIG
media: marvel-ccic: allow ccic and mmp drivers to coexist
media: uvcvideo: Prevent setting unavailable flags
media: ddbridge: conditionally enable fast TS for stv0910-equipped bridges
media: dvb-frontends/stv0910: make TS speed configurable
media: ddbridge/mci: add identifiers to function definition arguments
media: ddbridge/mci: protect against out-of-bounds array access in stop()
media: rc: ensure input/lirc device can be opened after register
media: rc: nuvoton: Keep device enabled during reg init
media: rc: nuvoton: Keep track of users on CIR enable/disable
media: rc: nuvoton: Tweak the interrupt enabling dance
media: uvcvideo: Support realtek's UVC 1.5 device
media: uvcvideo: Fix driver reference counting
media: gspca_zc3xx: Enable short exposure times for OV7648
...
The most interesting part of this update is user namespace support, mostly
done by Eric Biederman. This enables safe unprivileged fuse mounts within
a user namespace.
There are also a couple of fixes for bugs found by syzbot and miscellaneous
fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCWxanJAAKCRDh3BK/laaZ
PJjDAP4r6f4kL/5DZxK7JSnSue8BHESGD1LCMVgL57e9WmZukgD/cOtaO85ie3lh
DWuhX5xGZVMMX4frIGLfBn8ogSS+egw=
=3luD
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
"The most interesting part of this update is user namespace support,
mostly done by Eric Biederman. This enables safe unprivileged fuse
mounts within a user namespace.
There are also a couple of fixes for bugs found by syzbot and
miscellaneous fixes and cleanups"
* tag 'fuse-update-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: don't keep dead fuse_conn at fuse_fill_super().
fuse: fix control dir setup and teardown
fuse: fix congested state leak on aborted connections
fuse: Allow fully unprivileged mounts
fuse: Ensure posix acls are translated outside of init_user_ns
fuse: add writeback documentation
fuse: honor AT_STATX_FORCE_SYNC
fuse: honor AT_STATX_DONT_SYNC
fuse: Restrict allow_other to the superblock's namespace or a descendant
fuse: Support fuse filesystems outside of init_user_ns
fuse: Fail all requests with invalid uids or gids
fuse: Remove the buggy retranslation of pids in fuse_dev_do_read
fuse: return -ECONNABORTED on /dev/fuse read after abort
fuse: atomic_o_trunc should truncate pagecache
The __IPS_MAX_BIT is used in __ctnetlink_change_status as the max bit
value. When add new bit IPS_OFFLOAD_BIT whose value is 14, we should
increase the __IPS_MAX_BIT too, from 14 to 15.
There is no any bug in current codes, although it lost one loop in
__ctnetlink_change_status. Because the new bit IPS_OFFLOAD_BIT belongs
the IPS_UNCHANGEABLE_MASK.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull networking updates from David Miller:
1) Add Maglev hashing scheduler to IPVS, from Inju Song.
2) Lots of new TC subsystem tests from Roman Mashak.
3) Add TCP zero copy receive and fix delayed acks and autotuning with
SO_RCVLOWAT, from Eric Dumazet.
4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
Brouer.
5) Add ttl inherit support to vxlan, from Hangbin Liu.
6) Properly separate ipv6 routes into their logically independant
components. fib6_info for the routing table, and fib6_nh for sets of
nexthops, which thus can be shared. From David Ahern.
7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
messages from XDP programs. From Nikita V. Shirokov.
8) Lots of long overdue cleanups to the r8169 driver, from Heiner
Kallweit.
9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.
10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.
11) Plumb extack down into fib_rules, from Roopa Prabhu.
12) Add Flower classifier offload support to igb, from Vinicius Costa
Gomes.
13) Add UDP GSO support, from Willem de Bruijn.
14) Add documentation for eBPF helpers, from Quentin Monnet.
15) Add TLS tx offload to mlx5, from Ilya Lesokhin.
16) Allow applications to be given the number of bytes available to read
on a socket via a control message returned from recvmsg(), from
Soheil Hassas Yeganeh.
17) Add x86_32 eBPF JIT compiler, from Wang YanQing.
18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
From Björn Töpel.
19) Remove indirect load support from all of the BPF JITs and handle
these operations in the verifier by translating them into native BPF
instead. From Daniel Borkmann.
20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.
21) Allow XDP programs to do lookups in the main kernel routing tables
for forwarding. From David Ahern.
22) Allow drivers to store hardware state into an ELF section of kernel
dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.
23) Various RACK and loss detection improvements in TCP, from Yuchung
Cheng.
24) Add TCP SACK compression, from Eric Dumazet.
25) Add User Mode Helper support and basic bpfilter infrastructure, from
Alexei Starovoitov.
26) Support ports and protocol values in RTM_GETROUTE, from Roopa
Prabhu.
27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
Brouer.
28) Add lots of forwarding selftests, from Petr Machata.
29) Add generic network device failover driver, from Sridhar Samudrala.
* ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
strparser: Add __strp_unpause and use it in ktls.
rxrpc: Fix terminal retransmission connection ID to include the channel
net: hns3: Optimize PF CMDQ interrupt switching process
net: hns3: Fix for VF mailbox receiving unknown message
net: hns3: Fix for VF mailbox cannot receiving PF response
bnx2x: use the right constant
Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
enic: fix UDP rss bits
netdev-FAQ: clarify DaveM's position for stable backports
rtnetlink: validate attributes in do_setlink()
mlxsw: Add extack messages for port_{un, }split failures
netdevsim: Add extack error message for devlink reload
devlink: Add extack to reload and port_{un, }split operations
net: metrics: add proper netlink validation
ipmr: fix error path when ipmr_new_table fails
ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
net: hns3: remove unused hclgevf_cfg_func_mta_filter
netfilter: provide udp*_lib_lookup for nf_tproxy
qed*: Utilize FW 8.37.2.0
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAlsXFUEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQVeRaWujKfIoomg//eRNpc6x9kxTijN670AC2uD0CBTlZ
2z6mHuJaOhG8bTxjZxQfUBoo6/eZJ2YC1yq6ornGFNzw4sfKsR/j86ujJim2HAmo
opUhziq3SILGEvjsxfPkREe/wb49jy0AA/WjZqciitB1ig8Hz7xzqi0lpNaEspFh
QJFB6XXkojWGFGrRzruAVJnPS+pDWoTQR0qafs3JWKnpeinpOdZnl1hPsysAEHt5
Ag8o4qS/P9xJM0khi7T+jWECmTyT/mtWqEtFcZ0o+JLOgt/EMvNX6DO4ETDiYRD2
mVChga9x5r78bRgNy2U8IlEWWa76WpcQAEODvhzbijX4RxMAmjsmLE+e+udZSnMZ
eCITl2f7ExxrL5SwNFC/5h7pAv0RJ+SOC19vcyeV4JDlQNNVjUy/aNKv5baV0aeg
EmkeobneMWxqHx52aERz8RF1in5pT8gLOYoYnWfNpcDEmjLrwhuZLX2asIzUEqrS
SoPJ8hxIDCxceHOWIIrz5Dqef7x28Dyi46w3QINC8bSy2RnR/H3q40DRegvXOGiS
9WcbbwbhnM4Kau413qKicGCvdqTVYdeyZqo7fVelSciD139Vk7pZotyom4MuU25p
fIyGfXa8/8gkl7fZ+HNkZbba0XWNfAZt//zT095qsp3CkhVnoybwe6OwG1xRqErq
W7OOQbS7vvN/KGo=
=10u6
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Another reasonable chunk of audit changes for v4.18, thirteen patches
in total.
The thirteen patches can mostly be broken down into one of four
categories: general bug fixes, accessor functions for audit state
stored in the task_struct, negative filter matches on executable
names, and extending the (relatively) new seccomp logging knobs to the
audit subsystem.
The main driver for the accessor functions from Richard are the
changes we're working on to associate audit events with containers,
but I think they have some standalone value too so I figured it would
be good to get them in now.
The seccomp/audit patches from Tyler apply the seccomp logging
improvements from a few releases ago to audit's seccomp logging;
starting with this patchset the changes in
/proc/sys/kernel/seccomp/actions_logged should apply to both the
standard kernel logging and audit.
As usual, everything passes the audit-testsuite and it happens to
merge cleanly with your tree"
[ Heh, except it had trivial merge conflicts with the SELinux tree that
also came in from Paul - Linus ]
* tag 'audit-pr-20180605' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: Fix wrong task in comparison of session ID
audit: use existing session info function
audit: normalize loginuid read access
audit: use new audit_context access funciton for seccomp_actions_logged
audit: use inline function to set audit context
audit: use inline function to get audit context
audit: convert sessionid unset to a macro
seccomp: Don't special case audited processes when logging
seccomp: Audit attempts to modify the actions_logged sysctl
seccomp: Configurable separator for the actions_logged string
seccomp: Separate read and write code for actions_logged sysctl
audit: allow not equal op for audit by executable
audit: add syscall information to FEATURE_CHANGE records
- add macros for PCIe Link Control 2 register (Frederick Lawler)
- replace IB/hfi1 custom macros with PCI core versions (Frederick Lawler)
- remove dead microblaze and xtensa code (Bjorn Helgaas)
- use dev_printk() when possible in xtensa and mips (Bjorn Helgaas)
* pci/misc:
MIPS: PCI: Use dev_printk() when possible
xtensa/PCI: Use dev_printk() when possible
xtensa/PCI: Make variables static
xtensa/PCI: Remove dead code
microblaze/PCI: Remove pcibios_claim_one_bus() dead code
microblaze/PCI: Remove pcibios_finish_adding_to_bus() dead code
IB/hfi1: Replace custom hfi1 macros with PCIe macros
PCI: Add PCI_EXP_LNKCTL2_TLS* macros
We've got many code additions at this cycle as a result of quite a few
new drivers. Below are highlights:
Core stuff:
- Fix the long-standing issue with the device registration order;
the control device is now registered at last
- PCM locking code cleanups for RT kernels
- Fixes for possible races in ALSA timer resolution accesses
- TLV offset definitions in uapi
ASoC:
- Many fixes for the topology stuff, including fixes for v4 ABI
compatibility
- Lots of cleanups / quirks for Intel platforms based on Realtek
CODECs
- Continued componentization works, removing legacy CODEC stuff
- Conversion of OMAP DMA to the new, more standard SDMA-PCM driver
- Fixes and updates to Cirrus Logic SoC drivers
- New Qualcomm DSP support
- New drivers for Analog SSM2305, Atmel I2S controllers, Mediatek
MT6351, MT6797 and MT7622, Qualcomm DSPs, Realtek RT1305, RT1306 and
RT5668 and TI TSCS454
HD-audio:
- Finally better support for some CA0132 boards, allowing Windows
firmware
- HP Spectre x360 support along with a bulk of COEF stuff
- Blacklisting power save default some known boards reported on Fedora
USB-audio:
- Continued improvements on UAC3 support; now BADD is supported
- Fixes / improvements for Dell WD15 dock
- Allow DMA coherent pages for PCM buffers for ARCH, MIPS & co
Others:
- New Xen sound frontend driver support
- Cache implementation and other improvements for FireWire DICE
- Conversions to octal permissions in allover places
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlsXjEAOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/szw/9FdtjD7LOBMNgbVbeU+SDTEUGh1OdIElSE+bL
vU1USUNud9TgYb4SFM4grjsB9v6vM+bZ8mquzLpSzGj2J/Yl3dT7reyr6TYfoGfE
oCETfYLk0gbhQrrqWoVwRHsPAJYyj6dGXZ/Kiy9MuD9bfWUGAehuqKl1inySxcGb
VTrhlegHApRJ7z+Yzn6K3Git+aCYhpgxO5NK1DkoagHAsAhJhdavYWhm8lcQ4pnO
UlahRms0cTpDFrIkHHKH+c1ihyxn3RVpueQIIpx5xRpIHXezMnUk8mpRduqcYGk2
9cxVlC4KMucsAud39joGN6BWlkmfpmlMfGkdVlAnlBpdTyYC1pJRCKPXX3P+d9Tk
NvH3jKx/izNYFPLOysoV4vc5puDqSEfAC3geD+ugJFhhJuH9sAMyGOx9MRhC6ebf
qGI2IEhyn9tVc8/f3glEqS79WDHas+dUCkXxhbAQtj4NcjqgXkM26h5MnrzIYV23
m5iAzXIDuT44Qw1BxHQb40DzgHZMU3p/c/PAqU/Ex9+Oi1mq6E8V7MH+n9Ylo2vN
Mw3atYiLqv9xv+7/MmvGUQuTyMR3HB0KyNUCcSyuWPnuqZ/Gi+wIg9cuYXYfrn57
NtCoW4gzaex909z+QoZa5YxYRfZBJuRjYU0ugOBdK6R3+l/6IsGTasdR6LqngYY6
RIgK2S8=
=37hC
-----END PGP SIGNATURE-----
Merge tag 'sound-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"We've got many code additions at this cycle as a result of quite a few
new drivers. Below are highlights:
Core stuff:
- Fix the long-standing issue with the device registration order; the
control device is now registered at last
- PCM locking code cleanups for RT kernels
- Fixes for possible races in ALSA timer resolution accesses
- TLV offset definitions in uapi
ASoC:
- Many fixes for the topology stuff, including fixes for v4 ABI
compatibility
- Lots of cleanups / quirks for Intel platforms based on Realtek
CODECs
- Continued componentization works, removing legacy CODEC stuff
- Conversion of OMAP DMA to the new, more standard SDMA-PCM driver
- Fixes and updates to Cirrus Logic SoC drivers
- New Qualcomm DSP support
- New drivers for Analog SSM2305, Atmel I2S controllers, Mediatek
MT6351, MT6797 and MT7622, Qualcomm DSPs, Realtek RT1305, RT1306
and RT5668 and TI TSCS454
HD-audio:
- Finally better support for some CA0132 boards, allowing Windows
firmware
- HP Spectre x360 support along with a bulk of COEF stuff
- Blacklisting power save default some known boards reported on
Fedora
USB-audio:
- Continued improvements on UAC3 support; now BADD is supported
- Fixes / improvements for Dell WD15 dock
- Allow DMA coherent pages for PCM buffers for ARCH, MIPS & co
Others:
- New Xen sound frontend driver support
- Cache implementation and other improvements for FireWire DICE
- Conversions to octal permissions in allover places"
* tag 'sound-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (386 commits)
ASoC: dapm: delete dapm_kcontrol_data paths list before freeing it
ALSA: usb-audio: remove redundant check on err
ASoC: topology: Move skl-tplg-interface.h to uapi
ASoC: topology: Move v4 manifest header data structures to uapi
ASoC: topology: Improve backwards compatibility with v4 topology files
ALSA: pci/hda: Remove unused, broken, header file
ASoC: TSCS454: Add Support
ASoC: Intel: kbl: Move codec sysclk config to codec_init function
ASoC: simple-card: set cpu dai clk in hw_params
ALSA: hda - Handle kzalloc() failure in snd_hda_attach_pcm_stream()
ALSA: oxygen: use match_string() helper
ASoC: dapm: use match_string() helper
ASoC: max98095: use match_string() helper
ASoC: max98088: use match_string() helper
ASoC: Intel: bytcr_rt5651: Set card long_name based on quirks
ASoC: mt6797-mt6351: add hostless phone call path
ASoC: mt6797: add Hostless DAI
ASoC: mt6797: add PCM interface
ASoC: mediatek: export mtk-afe symbols as needed
ASoC: codecs: PCM1789: include gpio/consumer.h
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbFzauAAoJEAx081l5xIa+jzwP/AwfTreH3XBmlLeMO8kwkAMW
AiH1u3KHrlitN1U85x90dC8hVuuV2Kv9pEXQo1me/TAL/QCb9VZYCSf2dHHkkMwD
1UdwxAQW7soBbTRo9k0zuVJpRGSYIiYfqDh3L6KmTC3UF0tUJm53VOWCLG/DNrtn
XhEnGnlCUNbABZMMpavEvlwxPtvaYFlp6M+MmwRPIx32SrPJ3vSaBzxwWxOaFmjU
xiRdK+GpAbCV8yGeCCSkDgEe/TdWGUhZxoC9dLb0H9ex7ip8uZ0W4D+VTHPFrhQX
6nCpqUbp7BQTsbVSd1pAVsAv45scmSgWbKcqfC0NKSVcsHcJZBR0tQOF9OvnGZcf
o1Hv/beTqJ++IcG2rEIwyJTGxAGfZ0YSb0evTC9VcszaYo+b3+G283bdztIjzDeS
0QCTeLHYbZRHPITWVULNpMWy3TkJv32IdFhQfYSnD8/OGQIxLNhh4FFOtHnOmxSF
N8dnzOLKXhXMo/NgOL+UMNnbgLqIyOtEXCPDLuOQJNv/SOp8662m/A0yRjQNR6M2
gsPmR7dxQIwwJMyqrkLDOF411ABZohulquYgwLgG938MRPmTpPWOR72PtpGF4hAW
HLg+3HHBd1N/A1mlJUMAbUn2eMUACZBUIycE9u+U/geRgve/OQnzJH/FKGP2EJ4R
pf6CruEva+6GRR5GVzuM
=twst
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2018-06-06-1' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This starts to support NVIDIA volta hardware with nouveau, and adds
amdgpu support for the GPU in the Kabylake-G (the intel + radeon
single package chip), along with some initial Intel icelake enabling.
Summary:
New Drivers:
- v3d - driver for broadcom V3D V3.x+ hardware
- xen-front - XEN PV display frontend
core:
- handle zpos normalization in the core
- stop looking at legacy pointers in atomic paths
- improved scheduler documentation
- improved aspect ratio validation
- aspect ratio support for 64:27 and 256:135
- drop unused control node code.
i915:
- Icelake (ICL) enabling
- GuC/HuC refactoring
- PSR/PSR2 enabling and fixes
- DPLL management refactoring
- DP MST fixes
- NV12 enabling
- HDCP improvements
- GEM/Execlist/reset improvements
- GVT improvements
- stolen memory first 4k fix
amdgpu:
- Vega 20 support
- VEGAM support (Kabylake-G)
- preOS scanout buffer reservation
- power management gfxoff support for raven
- SR-IOV fixes
- Vega10 power profiles and clock voltage control
- scatter/gather display support on CZ/ST
amdkfd:
- GFX9 dGPU support
- userptr memory mapping
nouveau:
- major refactoring for Volta GV100 support
tda998x:
- HDMI i2c CEC support
etnaviv:
- removed unused logging code
- license text cleanups
- MMU handling improvements
- timeout fence fix for 50 days uptime
tegra:
- IOMMU support in gr2d/gr3d drivers
- zpos support
vc4:
- syncobj support
- CTM, plane alpha and async cursor support
analogix_dp:
- HPD and aux chan fixes
sun4i:
- MIPI DSI support
tilcdc:
- clock divider fixes for OMAP-l138 LCDK board
rcar-du:
- R8A77965 support
- dma-buf fences fixes
- hardware indexed crtc/du group handling
- generic zplane property support
atmel-hclcdc:
- generic zplane property support
mediatek:
- use generic video mode function
exynos:
- S5PV210 FIMD variant support
- IPP v2 framework
- more HW overlays support"
* tag 'drm-next-2018-06-06-1' of git://anongit.freedesktop.org/drm/drm: (1286 commits)
drm/amdgpu: fix 32-bit build warning
drm/exynos: fimc: signedness bug in fimc_setup_clocks()
drm/exynos: scaler: fix static checker warning
drm/amdgpu: Use dev_info() to report amdkfd is not supported for this ASIC
drm/amd/display: Remove use of division operator for long longs
drm/amdgpu: Update GFX info structure to match what vega20 used
drm/amdgpu/pp: remove duplicate assignment
drm/sched: add rcu_barrier after entity fini
drm/amdgpu: move VM BOs on LRU again
drm/amdgpu: consistenly use VM moved flag
drm/amdgpu: kmap PDs/PTs in amdgpu_vm_update_directories
drm/amdgpu: further optimize amdgpu_vm_handle_moved
drm/amdgpu: cleanup amdgpu_vm_validate_pt_bos v2
drm/amdgpu: rework VM state machine lock handling v2
drm/amdgpu: Add runtime VCN PG support
drm/amdgpu: Enable VCN static PG by default on RV
drm/amdgpu: Add VCN static PG support on RV
drm/amdgpu: Enable VCN CG by default on RV
drm/amdgpu: Add static CG control for VCN on RV
drm/exynos: Fix default value for zpos plane property
...
Expose a new system call allowing each thread to register one userspace
memory area to be used as an ABI between kernel and user-space for two
purposes: user-space restartable sequences and quick access to read the
current CPU number value from user-space.
* Restartable sequences (per-cpu atomics)
Restartables sequences allow user-space to perform update operations on
per-cpu data without requiring heavy-weight atomic operations.
The restartable critical sections (percpu atomics) work has been started
by Paul Turner and Andrew Hunter. It lets the kernel handle restart of
critical sections. [1] [2] The re-implementation proposed here brings a
few simplifications to the ABI which facilitates porting to other
architectures and speeds up the user-space fast path.
Here are benchmarks of various rseq use-cases.
Test hardware:
arm32: ARMv7 Processor rev 4 (v7l) "Cubietruck", 2-core
x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading
The following benchmarks were all performed on a single thread.
* Per-CPU statistic counter increment
getcpu+atomic (ns/op) rseq (ns/op) speedup
arm32: 344.0 31.4 11.0
x86-64: 15.3 2.0 7.7
* LTTng-UST: write event 32-bit header, 32-bit payload into tracer
per-cpu buffer
getcpu+atomic (ns/op) rseq (ns/op) speedup
arm32: 2502.0 2250.0 1.1
x86-64: 117.4 98.0 1.2
* liburcu percpu: lock-unlock pair, dereference, read/compare word
getcpu+atomic (ns/op) rseq (ns/op) speedup
arm32: 751.0 128.5 5.8
x86-64: 53.4 28.6 1.9
* jemalloc memory allocator adapted to use rseq
Using rseq with per-cpu memory pools in jemalloc at Facebook (based on
rseq 2016 implementation):
The production workload response-time has 1-2% gain avg. latency, and
the P99 overall latency drops by 2-3%.
* Reading the current CPU number
Speeding up reading the current CPU number on which the caller thread is
running is done by keeping the current CPU number up do date within the
cpu_id field of the memory area registered by the thread. This is done
by making scheduler preemption set the TIF_NOTIFY_RESUME flag on the
current thread. Upon return to user-space, a notify-resume handler
updates the current CPU value within the registered user-space memory
area. User-space can then read the current CPU number directly from
memory.
Keeping the current cpu id in a memory area shared between kernel and
user-space is an improvement over current mechanisms available to read
the current CPU number, which has the following benefits over
alternative approaches:
- 35x speedup on ARM vs system call through glibc
- 20x speedup on x86 compared to calling glibc, which calls vdso
executing a "lsl" instruction,
- 14x speedup on x86 compared to inlined "lsl" instruction,
- Unlike vdso approaches, this cpu_id value can be read from an inline
assembly, which makes it a useful building block for restartable
sequences.
- The approach of reading the cpu id through memory mapping shared
between kernel and user-space is portable (e.g. ARM), which is not the
case for the lsl-based x86 vdso.
On x86, yet another possible approach would be to use the gs segment
selector to point to user-space per-cpu data. This approach performs
similarly to the cpu id cache, but it has two disadvantages: it is
not portable, and it is incompatible with existing applications already
using the gs segment selector for other purposes.
Benchmarking various approaches for reading the current CPU number:
ARMv7 Processor rev 4 (v7l)
Machine model: Cubietruck
- Baseline (empty loop): 8.4 ns
- Read CPU from rseq cpu_id: 16.7 ns
- Read CPU from rseq cpu_id (lazy register): 19.8 ns
- glibc 2.19-0ubuntu6.6 getcpu: 301.8 ns
- getcpu system call: 234.9 ns
x86-64 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz:
- Baseline (empty loop): 0.8 ns
- Read CPU from rseq cpu_id: 0.8 ns
- Read CPU from rseq cpu_id (lazy register): 0.8 ns
- Read using gs segment selector: 0.8 ns
- "lsl" inline assembly: 13.0 ns
- glibc 2.19-0ubuntu6 getcpu: 16.6 ns
- getcpu system call: 53.9 ns
- Speed (benchmark taken on v8 of patchset)
Running 10 runs of hackbench -l 100000 seems to indicate, contrary to
expectations, that enabling CONFIG_RSEQ slightly accelerates the
scheduler:
Configuration: 2 sockets * 8-core Intel(R) Xeon(R) CPU E5-2630 v3 @
2.40GHz (directly on hardware, hyperthreading disabled in BIOS, energy
saving disabled in BIOS, turboboost disabled in BIOS, cpuidle.off=1
kernel parameter), with a Linux v4.6 defconfig+localyesconfig,
restartable sequences series applied.
* CONFIG_RSEQ=n
avg.: 41.37 s
std.dev.: 0.36 s
* CONFIG_RSEQ=y
avg.: 40.46 s
std.dev.: 0.33 s
- Size
On x86-64, between CONFIG_RSEQ=n/y, the text size increase of vmlinux is
567 bytes, and the data size increase of vmlinux is 5696 bytes.
[1] https://lwn.net/Articles/650333/
[2] http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20151027235635.16059.11630.stgit@pjt-glaptop.roam.corp.google.com
Link: http://lkml.kernel.org/r/20150624222609.6116.86035.stgit@kitami.mtv.corp.google.com
Link: https://lkml.kernel.org/r/20180602124408.8430-3-mathieu.desnoyers@efficios.com
Provide helper macros for fields which represent pointers in
kernel-userspace ABI. This facilitates handling of 32-bit
user-space by 64-bit kernels by defining those fields as
32-bit 0-padding and 32-bit integer on 32-bit architectures,
which allows the kernel to treat those as 64-bit integers.
The order of padding and 32-bit integer depends on the
endianness.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: linux-api@vger.kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180602124408.8430-2-mathieu.desnoyers@efficios.com
Here is the big tty/serial driver update for 4.18-rc1.
There's nothing major here, just lots of serial driver updates. Full
details are in the shortlog, nothing anything specific to call out here.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWxbZzQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ym2pQCggrWJsOeXLXgzhVDH6/qMFP9R/hEAoLTvmOWQ
BlPIlvRDm/ud33VogJ8t
=XS8u
-----END PGP SIGNATURE-----
Merge tag 'tty-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the big tty/serial driver update for 4.18-rc1.
There's nothing major here, just lots of serial driver updates. Full
details are in the shortlog, nothing anything specific to call out
here.
All have been in linux-next for a while with no reported issues"
* tag 'tty-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (55 commits)
vt: Perform safe console erase only once
serial: imx: disable UCR4_OREN on shutdown
serial: imx: drop CTS/RTS handling from shutdown
tty: fix typo in ASYNCB_FOURPORT comment
serial: samsung: check DMA engine capabilities before using DMA mode
tty: Fix data race in tty_insert_flip_string_fixed_flag
tty: serial: msm_geni_serial: Fix TX infinite loop
serial: 8250_dw: Fix runtime PM handling
serial: 8250: omap: Fix idling of clocks for unused uarts
tty: serial: drop ATH79 specific SoC symbols
serial: 8250: Add missing rxtrig_bytes on Altera 16550 UART
serial/aspeed-vuart: fix a couple mod_timer() calls
serial: sh-sci: Use spin_{try}lock_irqsave instead of open coding version
serial: 8250_of: Add IO space support
tty/serial: atmel: use port->name as name in request_irq()
serial: imx: dma_unmap_sg buffers on shutdown
serial: imx: cleanup imx_uart_disable_dma()
tty: serial: qcom_geni_serial: Add early console support
tty: serial: qcom_geni_serial: Return IRQ_NONE for spurious interrupts
tty: serial: qcom_geni_serial: Use iowrite32_rep to write to FIFO
...
Here is the big USB pull request for 4.18-rc1.
Lots of stuff here, the highlights are:
- phy driver updates and new additions
- usual set of xhci driver updates
- normal set of musb updates
- gadget driver updates and new controllers
- typec work, it's getting closer to getting fully out of the
staging portion of the tree.
- lots of minor cleanups and bugfixes.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWxba6w8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykumQCg2abWE5LijR0SNJIwX8xk64HLUAMAnAxBZDG3
aB0GyOQd54L+09q4LAdn
=ZCEx
-----END PGP SIGNATURE-----
Merge tag 'usb-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB and PHY updates from Greg KH:
"Here is the big USB pull request for 4.18-rc1.
Lots of stuff here, the highlights are:
- phy driver updates and new additions
- usual set of xhci driver updates
- normal set of musb updates
- gadget driver updates and new controllers
- typec work, it's getting closer to getting fully out of the staging
portion of the tree.
- lots of minor cleanups and bugfixes.
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (263 commits)
Revert "xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue"
xhci: Add quirk to zero 64bit registers on Renesas PCIe controllers
xhci: Allow more than 32 quirks
usb: xhci: force all memory allocations to node
selftests: add test for USB over IP driver
USB: typec: fsusb302: no need to check return value of debugfs_create_dir()
USB: gadget: udc: s3c2410_udc: no need to check return value of debugfs_create functions
USB: gadget: udc: renesas_usb3: no need to check return value of debugfs_create functions
USB: gadget: udc: pxa27x_udc: no need to check return value of debugfs_create functions
USB: gadget: udc: gr_udc: no need to check return value of debugfs_create functions
USB: gadget: udc: bcm63xx_udc: no need to check return value of debugfs_create functions
USB: udc: atmel_usba_udc: no need to check return value of debugfs_create functions
USB: dwc3: no need to check return value of debugfs_create functions
USB: dwc2: no need to check return value of debugfs_create functions
USB: core: no need to check return value of debugfs_create functions
USB: chipidea: no need to check return value of debugfs_create functions
USB: ehci-hcd: no need to check return value of debugfs_create functions
USB: fhci-hcd: no need to check return value of debugfs_create functions
USB: fotg210-hcd: no need to check return value of debugfs_create functions
USB: imx21-hcd: no need to check return value of debugfs_create functions
...
algorithms. Yes, Speck is contrversial, but the intention is to use
them only for the lowest end Android devices, where the alternative
*really* is no encryption at all for data stored at rest.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlsW/RAACgkQ8vlZVpUN
gaO1pwf/WOusoXBK5sUuiC8d9I5s+OlPhTKhrh+BcL7/xhOkyh2xDv2FEwsjhwUf
qo26AMf7DsWKWgJ6wDQ1z+PIuPSNeQy5dCKbz2hbfNjET3vdk2NuvPWnIbFrmIek
LB6Ii9jKlPJRO4T3nMrE9JzJZLsoX5OKRSgYTT3EviuW/wSXaFyi7onFnyKXBnF/
e689tE50P42PgTEDKs4RDw43PwEGbcl5Vtj+Lnoh6VGX/pYvL/9ZbEYlKrgqSOU4
DmckR8D8UU/Gy6G5bvMsVuJpLEU7vBxupOOHI/nJFwR6tuYi0Q1j7C/zH8BvWp5e
o8P5GpOWk7Gm346FaUlkAZ+25bCU+A==
=EBeE
-----END PGP SIGNATURE-----
Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
"Add bunch of cleanups, and add support for the Speck128/256
algorithms.
Yes, Speck is contrversial, but the intention is to use them only for
the lowest end Android devices, where the alternative *really* is no
encryption at all for data stored at rest"
* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: log the crypto algorithm implementations
fscrypt: add Speck128/256 support
fscrypt: only derive the needed portion of the key
fscrypt: separate key lookup from key derivation
fscrypt: use a common logging function
fscrypt: remove internal key size constants
fscrypt: remove unnecessary check for non-logon key type
fscrypt: make fscrypt_operations.max_namelen an integer
fscrypt: drop empty name check from fname_decrypt()
fscrypt: drop max_namelen check from fname_decrypt()
fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()
fscrypt: don't clear flags on crypto transform
fscrypt: remove stale comment from fscrypt_d_revalidate()
fscrypt: remove error messages for skcipher_request_alloc() failure
fscrypt: remove unnecessary NULL check when allocating skcipher
fscrypt: clean up after fscrypt_prepare_lookup() conversions
fs, fscrypt: only define ->s_cop when FS_ENCRYPTION is enabled
fscrypt: use unbound workqueue for decryption
- Strengthen inode number and structure validation when allocating inodes.
- Reduce pointless buffer allocations during cache miss
- Use FUA for pure data O_DSYNC directio writes
- Various iomap refactorings
- Strengthen quota metadata verification to avoid unfixable broken quota
- Make AGFL block freeing a deferred operation to avoid blowing out
transaction reservations when running complex operations
- Get rid of the log item descriptors to reduce log overhead
- Fix various reflink bugs where inodes were double-joined to
transactions
- Don't issue discards when trimming unwritten extents
- Refactor incore dquot initialization and retrieval interfaces
- Fix some locking problmes in the quota scrub code
- Strengthen btree structure checks in scrub code
- Rewrite swapfile activation to use iomap and support unwritten extents
- Make scrub exit to userspace sooner when corruptions or
cross-referencing problems are found
- Make scrub invoke the data fork scrubber directly on metadata inodes
- Don't do background reclamation of post-eof and cow blocks when the fs
is suspended
- Fix secondary superblock buffer lifespan hinting
- Refactor growfs to use table-dispatched functions instead of long
stringy functions
- Move growfs code to libxfs
- Implement online fs label getting and setting
- Introduce online filesystem repair (in a very limited capacity)
- Fix unit conversion problems in the realtime freemap iteration
functions
- Various refactorings and cleanups in preparation to remove buffer
heads in a future release
- Reimplement the old bmap call with iomap
- Remove direct buffer head accesses from seek hole/data
- Various bug fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlsR9dEACgkQ+H93GTRK
tOv0dw//cBwRgY4jhC6b9oMk2DNRWUiTt1F2yoqr28661GPo124iXAMLIwJe1DiV
W/qpN3HUz7P46xKOVY+MXaj0JIDFxJ8c5tHAQMH/TkDc49S+mkcGyaoPJ39hnc6u
yikG+Hq4m0YWhHaeUhKTe8pnhXBaziz5A2NtKtwh6lPOIW+Wds51T77DJnViqADq
tZzmAq8fS9/ELpxe0Th/2D7iTWCr2c3FLsW2KgbbNvQ4e34zVE1ix1eBtEzQE+Mm
GUjdQhYVS1oCzqZfCxJkzR4R/1TAFyS0FXOW7PHo8FAX/kas9aQbRlnHSAQ/08EE
8Z2p3GsFip7dgmd6O6nAmFAStW6GRvgyycJ7Y+Y0IsJj6aDp9OxhRExyF+uocJR9
b9ChOH6PMEtRB/RRlBg66pbS61abvNGutzl61ZQZGBHEvL3VqDcd68IomdD5bNSB
pXo6mOJIcKuXsghZszsHAV9uuMe4zQAMbLy7QH6V8LyWeSAG9hTXOT9EA4MWktEJ
SCQFf7RRPgU5pEAgOS8LgKrawqnBaqFcFvkvWsQhyiltTFz29cwxH7tjSXYMAOFE
W+RMp8kbkPnGOaJJeKxT+/RGRB534URk0jIEKtRb679xkEF3HE58exXEVrnojJq6
0m712+EYuZSYhFBwrvEnQjNHr0x2r/A/iBJZ6HhyV0aO1RWm4n4=
=11pr
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.18-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"New features this cycle include the ability to relabel mounted
filesystems, support for fallocated swapfiles, and using FUA for pure
data O_DSYNC directio writes. With this cycle we begin to integrate
online filesystem repair and refactor the growfs code in preparation
for eventual subvolume support, though the road ahead for both
features is quite long.
There are also numerous refactorings of the iomap code to remove
unnecessary log overhead, to disentangle some of the quota code, and
to prepare for buffer head removal in a future upstream kernel.
Metadata validation continues to improve, both in the hot path
veifiers and the online filesystem check code. I anticipate sending a
second pull request in a few days with more metadata validation
improvements.
This series has been run through a full xfstests run over the weekend
and through a quick xfstests run against this morning's master, with
no major failures reported.
Summary:
- Strengthen inode number and structure validation when allocating
inodes.
- Reduce pointless buffer allocations during cache miss
- Use FUA for pure data O_DSYNC directio writes
- Various iomap refactorings
- Strengthen quota metadata verification to avoid unfixable broken
quota
- Make AGFL block freeing a deferred operation to avoid blowing out
transaction reservations when running complex operations
- Get rid of the log item descriptors to reduce log overhead
- Fix various reflink bugs where inodes were double-joined to
transactions
- Don't issue discards when trimming unwritten extents
- Refactor incore dquot initialization and retrieval interfaces
- Fix some locking problmes in the quota scrub code
- Strengthen btree structure checks in scrub code
- Rewrite swapfile activation to use iomap and support unwritten
extents
- Make scrub exit to userspace sooner when corruptions or
cross-referencing problems are found
- Make scrub invoke the data fork scrubber directly on metadata
inodes
- Don't do background reclamation of post-eof and cow blocks when the
fs is suspended
- Fix secondary superblock buffer lifespan hinting
- Refactor growfs to use table-dispatched functions instead of long
stringy functions
- Move growfs code to libxfs
- Implement online fs label getting and setting
- Introduce online filesystem repair (in a very limited capacity)
- Fix unit conversion problems in the realtime freemap iteration
functions
- Various refactorings and cleanups in preparation to remove buffer
heads in a future release
- Reimplement the old bmap call with iomap
- Remove direct buffer head accesses from seek hole/data
- Various bug fixes"
* tag 'xfs-4.18-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (121 commits)
fs: use ->is_partially_uptodate in page_cache_seek_hole_data
fs: remove the buffer_unwritten check in page_seek_hole_data
fs: move page_cache_seek_hole_data to iomap.c
xfs: use iomap_bmap
iomap: add an iomap-based bmap implementation
iomap: add a iomap_sector helper
iomap: use __bio_add_page in iomap_dio_zero
iomap: move IOMAP_F_BOUNDARY to gfs2
iomap: fix the comment describing IOMAP_NOWAIT
iomap: inline data should be an iomap type, not a flag
mm: split ->readpages calls to avoid non-contiguous pages lists
mm: return an unsigned int from __do_page_cache_readahead
mm: give the 'ret' variable a better name __do_page_cache_readahead
block: add a lower-level bio_add_page interface
xfs: fix error handling in xfs_refcount_insert()
xfs: fix xfs_rtalloc_rec units
xfs: strengthen rtalloc query range checks
xfs: xfs_rtbuf_get should check the bmapi_read results
xfs: xfs_rtword_t should be unsigned, not signed
dax: change bdev_dax_supported() to support boolean returns
...
Now that ncpfs is removed from the tree, there is no need to keep the
uapi header files around as no one uses them, and it is not a feature
that the kernel supports anymore.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-06-05
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add a new BPF hook for sendmsg similar to existing hooks for bind and
connect: "This allows to override source IP (including the case when it's
set via cmsg(3)) and destination IP:port for unconnected UDP (slow path).
TCP and connected UDP (fast path) are not affected. This makes UDP support
complete, that is, connected UDP is handled by connect hooks, unconnected
by sendmsg ones.", from Andrey.
2) Rework of the AF_XDP API to allow extending it in future for type writer
model if necessary. In this mode a memory window is passed to hardware
and multiple frames might be filled into that window instead of just one
that is the case in the current fixed frame-size model. With the new
changes made this can be supported without having to add a new descriptor
format. Also, core bits for the zero-copy support for AF_XDP have been
merged as agreed upon, where i40e bits will be routed via Jeff later on.
Various improvements to documentation and sample programs included as
well, all from Björn and Magnus.
3) Given BPF's flexibility, a new program type has been added to implement
infrared decoders. Quote: "The kernel IR decoders support the most
widely used IR protocols, but there are many protocols which are not
supported. [...] There is a 'long tail' of unsupported IR protocols,
for which lircd is need to decode the IR. IR encoding is done in such
a way that some simple circuit can decode it; therefore, BPF is ideal.
[...] user-space can define a decoder in BPF, attach it to the rc
device through the lirc chardev.", from Sean.
4) Several improvements and fixes to BPF core, among others, dumping map
and prog IDs into fdinfo which is a straight forward way to correlate
BPF objects used by applications, removing an indirect call and therefore
retpoline in all map lookup/update/delete calls by invoking the callback
directly for 64 bit archs, adding a new bpf_skb_cgroup_id() BPF helper
for tc BPF programs to have an efficient way of looking up cgroup v2 id
for policy or other use cases. Fixes to make sure we zero tunnel/xfrm
state that hasn't been filled, to allow context access wrt pt_regs in
32 bit archs for tracing, and last but not least various test cases
for fixes that landed in bpf earlier, from Daniel.
5) Get rid of the ndo_xdp_flush API and extend the ndo_xdp_xmit with
a XDP_XMIT_FLUSH flag instead which allows to avoid one indirect
call as flushing is now merged directly into ndo_xdp_xmit(), from Jesper.
6) Add a new bpf_get_current_cgroup_id() helper that can be used in
tracing to retrieve the cgroup id from the current process in order
to allow for e.g. aggregation of container-level events, from Yonghong.
7) Two follow-up fixes for BTF to reject invalid input values and
related to that also two test cases for BPF kselftests, from Martin.
8) Various API improvements to the bpf_fib_lookup() helper, that is,
dropping MPLS bits which are not fully hashed out yet, rejecting
invalid helper flags, returning error for unsupported address
families as well as renaming flowlabel to flowinfo, from David.
9) Various fixes and improvements to sockmap BPF kselftests in particular
in proper error detection and data verification, from Prashant.
10) Two arm32 BPF JIT improvements. One is to fix imm range check with
regards to whether immediate fits into 24 bits, and a naming cleanup
to get functions related to rsh handling consistent to those handling
lsh, from Wang.
11) Two compile warning fixes in BPF, one for BTF and a false positive
to silent gcc in stack_map_get_build_id_offset(), from Arnd.
12) Add missing seg6.h header into tools include infrastructure in order
to fix compilation of BPF kselftests, from Mathieu.
13) Several formatting cleanups in the BPF UAPI helper description that
also fix an error during rst2man compilation, from Quentin.
14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is
not built into the kernel, from Yue.
15) Remove a useless double assignment in dev_map_enqueue(), from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the xsk_rcv to support the new MEM_TYPE_ZERO_COPY memory, and
wireup ndo_bpf call in bind.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull timers and timekeeping updates from Thomas Gleixner:
- Core infrastucture work for Y2038 to address the COMPAT interfaces:
+ Add a new Y2038 safe __kernel_timespec and use it in the core
code
+ Introduce config switches which allow to control the various
compat mechanisms
+ Use the new config switch in the posix timer code to control the
32bit compat syscall implementation.
- Prevent bogus selection of CPU local clocksources which causes an
endless reselection loop
- Remove the extra kthread in the clocksource code which has no value
and just adds another level of indirection
- The usual bunch of trivial updates, cleanups and fixlets all over the
place
- More SPDX conversions
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
clocksource/drivers/mxs_timer: Switch to SPDX identifier
clocksource/drivers/timer-imx-tpm: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Remove outdated file path
clocksource/drivers/arc_timer: Add comments about locking while read GFRC
clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages
clocksource/drivers/sprd: Fix Kconfig dependency
clocksource: Move inline keyword to the beginning of function declarations
timer_list: Remove unused function pointer typedef
timers: Adjust a kernel-doc comment
tick: Prefer a lower rating device only if it's CPU local device
clocksource: Remove kthread
time: Change nanosleep to safe __kernel_* types
time: Change types to new y2038 safe __kernel_* types
time: Fix get_timespec64() for y2038 safe compat interfaces
time: Add new y2038 safe __kernel_timespec
posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_64BIT_TIME in architectures
compat: Enable compat_get/put_timespec64 always
...
Pull siginfo updates from Eric Biederman:
"This set of changes close the known issues with setting si_code to an
invalid value, and with not fully initializing struct siginfo. There
remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64
and x86 to get the code that generates siginfo into a simpler and more
maintainable state. Most of that work involves refactoring the signal
handling code and thus careful code review.
Also not included is the work to shrink the in kernel version of
struct siginfo. That depends on getting the number of places that
directly manipulate struct siginfo under control, as it requires the
introduction of struct kernel_siginfo for the in kernel things.
Overall this set of changes looks like it is making good progress, and
with a little luck I will be wrapping up the siginfo work next
development cycle"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
signal/sh: Stop gcc warning about an impossible case in do_divide_error
signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions
signal/um: More carefully relay signals in relay_signal.
signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR}
signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code
signal/signalfd: Add support for SIGSYS
signal/signalfd: Remove __put_user from signalfd_copyinfo
signal/xtensa: Use force_sig_fault where appropriate
signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
signal/um: Use force_sig_fault where appropriate
signal/sparc: Use force_sig_fault where appropriate
signal/sparc: Use send_sig_fault where appropriate
signal/sh: Use force_sig_fault where appropriate
signal/s390: Use force_sig_fault where appropriate
signal/riscv: Replace do_trap_siginfo with force_sig_fault
signal/riscv: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_mceerr where appropriate
signal/openrisc: Use force_sig_fault where appropriate
signal/nios2: Use force_sig_fault where appropriate
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlsVV6wACgkQxWXV+ddt
WDsaOhAAnrG3o/Bkn4loifYjiWQoArswwVJ2m5MvG14oPcn/c6wCUlE6mHdEsT8l
WgltSRSd/3hJ1BcuWjhcR8RSLBRSaPcw4FAHikAdOWMrLXSKfA+quT/upXD00iDS
+bnHoknc5vvk+Ow0KyGODAeMYfJm8gGB43BeVDr/mgrVRrwYY9r14s+6LX897NOT
qObdxBjtedua0uIYZX5673siFaSbvJNQyGn6iKO9Ww+7bkWjgKO8qY8/3/gEaJn1
q1303Oo65VOXQQKZed+NjPNakpqRmABaN79qTwWMnUI4qYBKi651a7sr5HljDpmc
szY3uF/E7AOfJSDeTBdglFR1eQgv20ceeIx7FN0n6iUbuIVH9P85mBF6f76Hivpq
fA39uPYxWwWq8RxmC09yqoTkuNS0zAQhov4n8XO0q/81Gfek0RpR+LMK15W02Ur8
+5Pazkgz+xRYRPhpExe75pbM5N31KFJdf+4wpXeZoXKGTeiiCl/SZV3NzyUtemGg
eN7ltHD/Y9iNpx7sgHn+2veXwn6psDad8ORxIlRyKDeot2pNZ4lnekPbqMC/IwAz
hF3ZpcEo8v2Q2kqwCqw5evq4NSOvfG0cEuIqAKir1AgTn2nPH99qRSR1He9kluap
GBPTPXNgBrhhdsh5kltI6CJcGE46bHsLG4LI5WvRs+wOtyaG5Ww=
=jIvf
-----END PGP SIGNATURE-----
Merge tag 'for-4.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"User visible features:
- added support for the ioctl FS_IOC_FSGETXATTR, per-inode flags,
successor of GET/SETFLAGS; now supports only existing flags:
append, immutable, noatime, nodump, sync
- 3 new unprivileged ioctls to allow users to enumerate subvolumes
- dedupe syscall implementation does not restrict the range to 16MiB,
though it still splits the whole range to 16MiB chunks
- on user demand, rmdir() is able to delete an empty subvolume,
export the capability in sysfs
- fix inode number types in tracepoints, other cleanups
- send: improved speed when dealing with a large removed directory,
measurements show decrease from 2000 minutes to 2 minutes on a
directory with 2 million entries
- pre-commit check of superblock to detect a mysterious in-memory
corruption
- log message updates
Other changes:
- orphan inode cleanup improved, does no keep long-standing
reservations that could lead up to early ENOSPC in some cases
- slight improvement of handling snapshotted NOCOW files by avoiding
some unnecessary tree searches
- avoid OOM when dealing with many unmergeable small extents at flush
time
- speedup conversion of free space tree representations from/to
bitmap/tree
- code refactoring, deletion, cleanups:
+ delayed refs
+ delayed iput
+ redundant argument removals
+ memory barrier cleanups
+ remove a redundant mutex supposedly excluding several ioctls to
run in parallel
- new tracepoints for blockgroup manipulation
- more sanity checks of compressed headers"
* tag 'for-4.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (183 commits)
btrfs: Add unprivileged version of ino_lookup ioctl
btrfs: Add unprivileged ioctl which returns subvolume's ROOT_REF
btrfs: Add unprivileged ioctl which returns subvolume information
Btrfs: clean up error handling in btrfs_truncate()
btrfs: Factor out write portion of btrfs_get_blocks_direct
btrfs: Factor out read portion of btrfs_get_blocks_direct
btrfs: return ENOMEM if path allocation fails in btrfs_cross_ref_exist
btrfs: raid56: Remove VLA usage
btrfs: return error value if create_io_em failed in cow_file_range
btrfs: drop useless member qgroup_reserved of btrfs_pending_snapshot
btrfs: drop unused parameter qgroup_reserved
btrfs: balance dirty metadata pages in btrfs_finish_ordered_io
btrfs: lift some btrfs_cross_ref_exist checks in nocow path
btrfs: Remove fs_info argument from btrfs_uuid_tree_rem
btrfs: Remove fs_info argument from btrfs_uuid_tree_add
Btrfs: remove unused check of skip_locking
Btrfs: remove always true check in unlock_up
Btrfs: grab write lock directly if write_lock_level is the max level
Btrfs: move get root out of btrfs_search_slot to a helper
Btrfs: use more straightforward extent_buffer_uptodate check
...
Pull aio updates from Al Viro:
"Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.
The only thing I'm holding back for a day or so is Adam's aio ioprio -
his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
but let it sit in -next for decency sake..."
* 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
aio: sanitize the limit checking in io_submit(2)
aio: fold do_io_submit() into callers
aio: shift copyin of iocb into io_submit_one()
aio_read_events_ring(): make a bit more readable
aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
aio: take list removal to (some) callers of aio_complete()
aio: add missing break for the IOCB_CMD_FDSYNC case
random: convert to ->poll_mask
timerfd: convert to ->poll_mask
eventfd: switch to ->poll_mask
pipe: convert to ->poll_mask
crypto: af_alg: convert to ->poll_mask
net/rxrpc: convert to ->poll_mask
net/iucv: convert to ->poll_mask
net/phonet: convert to ->poll_mask
net/nfc: convert to ->poll_mask
net/caif: convert to ->poll_mask
net/bluetooth: convert to ->poll_mask
net/sctp: convert to ->poll_mask
net/tipc: convert to ->poll_mask
...
Currently, AF_XDP only supports a fixed frame-size memory scheme where
each frame is referenced via an index (idx). A user passes the frame
index to the kernel, and the kernel acts upon the data. Some NICs,
however, do not have a fixed frame-size model, instead they have a
model where a memory window is passed to the hardware and multiple
frames are filled into that window (referred to as the "type-writer"
model).
By changing the descriptor format from the current frame index
addressing scheme, AF_XDP can in the future be extended to support
these kinds of NICs.
In the index-based model, an idx refers to a frame of size
frame_size. Addressing a frame in the UMEM is done by offseting the
UMEM starting address by a global offset, idx * frame_size + offset.
Communicating via the fill- and completion-rings are done by means of
idx.
In this commit, the idx is removed in favor of an address (addr),
which is a relative address ranging over the UMEM. To convert an
idx-based address to the new addr is simply: addr = idx * frame_size +
offset.
We also stop referring to the UMEM "frame" as a frame. Instead it is
simply called a chunk.
To transfer ownership of a chunk to the kernel, the addr of the chunk
is passed in the fill-ring. Note, that the kernel will mask addr to
make it chunk aligned, so there is no need for userspace to do
that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or
3000 to the fill-ring will refer to the same chunk.
On the completion-ring, the addr will match that of the Tx descriptor,
passed to the kernel.
Changing the descriptor format to use chunks/addr will allow for
future changes to move to a type-writer based model, where multiple
frames can reside in one chunk. In this model passing one single chunk
into the fill-ring, would potentially result in multiple Rx
descriptors.
This commit changes the uapi of AF_XDP sockets, and updates the
documentation.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
As Michal noted the flow struct takes both the flow label and priority.
Update the bpf_fib_lookup API to note that it is flowinfo and not just
the flow label.
Cc: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf has been used extensively for tracing. For example, bcc
contains an almost full set of bpf-based tools to trace kernel
and user functions/events. Most tracing tools are currently
either filtered based on pid or system-wide.
Containers have been used quite extensively in industry and
cgroup is often used together to provide resource isolation
and protection. Several processes may run inside the same
container. It is often desirable to get container-level tracing
results as well, e.g. syscall count, function count, I/O
activity, etc.
This patch implements a new helper, bpf_get_current_cgroup_id(),
which will return cgroup id based on the cgroup within which
the current task is running.
The later patch will provide an example to show that
userspace can get the same cgroup id so it could
configure a filter or policy in the bpf program based on
task cgroup id.
The helper is currently implemented for tracing. It can
be added to other program types as well when needed.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use the appropriate SPDX license identifier in the rpmsg char driver
source file and drop the previous boilerplate license text. The uapi
header file already had the SPDX license identifier added as part of
a mass update but the license text removal was deferred for later,
and this patch drops the same.
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Since the remaining bits are not filled in struct bpf_tunnel_key
resp. struct bpf_xfrm_state and originate from uninitialized stack
space, we should make sure to clear them before handing control
back to the program.
Also add a padding element to struct bpf_xfrm_state for future use
similar as we have in struct bpf_tunnel_key and clear it as well.
struct bpf_xfrm_state {
__u32 reqid; /* 0 4 */
__u32 spi; /* 4 4 */
__u16 family; /* 8 2 */
/* XXX 2 bytes hole, try to pack */
union {
__u32 remote_ipv4; /* 4 */
__u32 remote_ipv6[4]; /* 16 */
}; /* 12 16 */
/* size: 28, cachelines: 1, members: 4 */
/* sum members: 26, holes: 1, sum holes: 2 */
/* last cacheline: 28 bytes */
};
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a new bpf_skb_cgroup_id() helper that allows to retrieve the
cgroup id from the skb's socket. This is useful in particular to
enable bpf_get_cgroup_classid()-like behavior for cgroup v1 in
cgroup v2 by allowing ID based matching on egress. This can in
particular be used in combination with applying policy e.g. from
map lookups, and also complements the older bpf_skb_under_cgroup()
interface. In user space the cgroup id for a given path can be
retrieved through the f_handle as demonstrated in [0] recently.
[0] https://lkml.org/lkml/2018/5/22/1190
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
PCIe ERR_NONFATAL errors mean a particular transaction is unreliable but
the Link is otherwise fully functional (PCIe r4.0, sec 6.2.2).
The AER driver handles these by logging the error details and calling
driver-supplied pci_error_handlers callbacks. It does not reset downstream
devices, does not remove them from the PCI subsystem, does not re-enumerate
them, and does not call their driver .remove() or .probe() methods.
But DPC driver previously enabled DPC on ERR_NONFATAL, so if the hardware
supports DPC, these errors caused a Link reset (performed automatically by
the hardware), followed by the DPC driver removing affected devices (which
calls their .remove() methods), bringing the Link back up, and
re-enumerating (which calls driver .probe() methods).
Disable ERR_NONFATAL DPC triggering so these errors will only be handled by
AER. This means drivers won't have to deal with different usage of their
pci_error_handlers callbacks and .probe() and .remove() methods based on
whether the platform has DPC support.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
This features which allows you to limit the maximum number of
connections per arbitrary key. The connlimit expression is stateful,
therefore it can be used from meters to dynamically populate a set, this
provides a mapping to the iptables' connlimit match. This patch also
comes that allows you define static connlimit policies.
This extension depends on the nf_conncount infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree, the most relevant things in this batch are:
1) Compile masquerade infrastructure into NAT module, from Florian Westphal.
Same thing with the redirection support.
2) Abort transaction if early initialization of the commit phase fails.
Also from Florian.
3) Get rid of synchronize_rcu() by using rule array in nf_tables, from
Florian.
4) Abort nf_tables batch if fatal signal is pending, from Florian.
5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless.
From Florian Westphal.
6) Support to match transparent sockets from nf_tables, from Máté Eckl.
7) Audit support for nf_tables, from Phil Sutter.
8) Validate chain dependencies from commit phase, fall back to fine grain
validation only in case of errors.
9) Attach dst to skbuff from netfilter flowtable packet path, from
Jason A. Donenfeld.
10) Use artificial maximum attribute cap to remove VLA from nfnetlink.
Patch from Kees Cook.
11) Add extension to allow to forward packets through neighbour layer.
12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov.
13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows us to forward packets from the netdev family via neighbour
layer, so you don't need an explicit link-layer destination when using
this expression from rules. The ttl/hop_limit field is decremented.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This extends log statement to support the behaviour achieved with
AUDIT target in iptables.
Audit logging is enabled via a pseudo log level 8. In this case any
other settings like log prefix are ignored since audit log format is
fixed.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now it can only match the transparent flag of an ip/ipv6 socket.
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
FRRouting installs routes into the kernel associated with
the originating protocol. Add these values to the well
known values in rtnetlink.h.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the per-I/O equivalent of the ioprio_set system call.
When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the
newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field.
This patch depends on block: add ioprio_check_cap function.
Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add unprivileged version of ino_lookup ioctl BTRFS_IOC_INO_LOOKUP_USER
to allow normal users to call "btrfs subvolume list/show" etc. in
combination with BTRFS_IOC_GET_SUBVOL_INFO/BTRFS_IOC_GET_SUBVOL_ROOTREF.
This can be used like BTRFS_IOC_INO_LOOKUP but the argument is
different. This is because it always searches the fs/file tree
correspoinding to the fd with which this ioctl is called and also
returns the name of bottom subvolume.
The main differences from original ino_lookup ioctl are:
1. Read + Exec permission will be checked using inode_permission()
during path construction. -EACCES will be returned in case
of failure.
2. Path construction will be stopped at the inode number which
corresponds to the fd with which this ioctl is called. If
constructed path does not exist under fd's inode, -EACCES
will be returned.
3. The name of bottom subvolume is also searched and filled.
Note that the maximum length of path is shorter 256 (BTRFS_VOL_NAME_MAX+1)
bytes than ino_lookup ioctl because of space of subvolume's name.
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ style fixes ]
Signed-off-by: David Sterba <dsterba@suse.com>
Add unprivileged ioctl BTRFS_IOC_GET_SUBVOL_ROOTREF which returns
ROOT_REF information of the subvolume containing this inode except the
subvolume name (this is because to prevent potential name leak). The
subvolume name will be gained by user version of ino_lookup ioctl
(BTRFS_IOC_INO_LOOKUP_USER) which also performs permission check.
The min id of root ref's subvolume to be searched is specified by
@min_id in struct btrfs_ioctl_get_subvol_rootref_args. After the search
ends, @min_id is set to the last searched root ref's subvolid + 1. Also,
if there are more root refs than BTRFS_MAX_ROOTREF_BUFFER_NUM,
-EOVERFLOW is returned. Therefore the caller can just call this ioctl
again without changing the argument to continue search.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ style fixes and struct item renames ]
Signed-off-by: David Sterba <dsterba@suse.com>
Add new unprivileged ioctl BTRFS_IOC_GET_SUBVOL_INFO which returns
the information of subvolume containing this inode.
(i.e. returns the information in ROOT_ITEM and ROOT_BACKREF.)
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
[ minor style fixes, update struct comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
The GET_ID command, added as of SEV API v0.16, allows the SEV firmware
to be queried about a unique CPU ID. This unique ID can then be used
to obtain the public certificate containing the Chip Endorsement Key
(CEK) public key signed by the AMD SEV Signing Key (ASK).
For more information please refer to "Section 5.12 GET_ID" of
https://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call
rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
that the last key should be repeated.
The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
the target_fd must be the /dev/lircN device.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
MPLS support will not be submitted this dev cycle, but in working on it
I do see a few changes are needed to the API. For now, drop mpls from the
API. Since the fields in question are unions, the mpls fields can be added
back later without affecting the uapi.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
These are minor edits for the eBPF helpers documentation in
include/uapi/linux/bpf.h.
The main fix consists in removing "BPF_FIB_LOOKUP_", because it ends
with a non-escaped underscore that gets interpreted by rst2man and
produces the following message in the resulting manual page:
DOCUTILS SYSTEM MESSAGES
System Message: ERROR/3 (/tmp/bpf-helpers.rst:, line 1514)
Unknown target name: "bpf_fib_lookup".
Other edits consist in:
- Improving formatting for flag values for "bpf_fib_lookup()" helper.
- Emphasising a parameter name in description of the return value for
"bpf_get_stack()" helper.
- Removing unnecessary blank lines between "Description" and "Return"
sections for the few helpers that would use it, for consistency.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, if two interfaces have addresses in the same connected route,
then the order of the prefix route entries is determined by the order in
which the addresses are assigned or the links brought up.
Add IFA_RT_PRIORITY to allow user to specify the metric of the prefix
route associated with an address giving interface managers better
control of the order of prefix routes.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This feature bit can be used by hypervisor to indicate virtio_net device to
act as a standby for another device with the same MAC address.
VIRTIO_NET_F_STANDBY is defined as bit 62 as it is a device feature bit.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In addition to already existing BPF hooks for sys_bind and sys_connect,
the patch provides new hooks for sys_sendmsg.
It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
that provides access to socket itlself (properties like family, type,
protocol) and user-passed `struct sockaddr *` so that BPF program can
override destination IP and port for system calls such as sendto(2) or
sendmsg(2) and/or assign source IP to the socket.
The hooks are implemented as two new attach types:
`BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
UDPv6 correspondingly.
UDPv4 and UDPv6 separate attach types for same reason as sys_bind and
sys_connect hooks, i.e. to prevent reading from / writing to e.g.
user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound.
The difference with already existing hooks is sys_sendmsg are
implemented only for unconnected UDP.
For TCP it doesn't make sense to change user-provided `struct sockaddr *`
at sendto(2)/sendmsg(2) time since socket either was already connected
and has source/destination set or wasn't connected and call to
sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.
Connected UDP is already handled by sys_connect hooks that can override
source/destination at connect time and use fast-path later, i.e. these
hooks don't affect UDP fast-path.
Rewriting source IP is implemented differently than that in sys_connect
hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to
just bind socket to desired local IP address since source IP can be set
on per-packet basis by using ancillary data (cmsg(3)). So no matter if
socket is bound or not, source IP has to be rewritten on every call to
sys_sendmsg.
To do so two new fields are added to UAPI `struct bpf_sock_addr`;
* `msg_src_ip4` to set source IPv4 for UDPv4;
* `msg_src_ip6` to set source IPv6 for UDPv6.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
We need a new capability to indicate support for the newly added
HvFlushVirtualAddress{List,Space}{,Ex} hypercalls. Upon seeing this
capability, userspace is supposed to announce PV TLB flush features
by setting the appropriate CPUID bits (if needed).
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Simple one-shot poll through the io_submit() interface. To poll for
a file descriptor the application should submit an iocb of type
IOCB_CMD_POLL. It will poll the fd for the events specified in the
the first 32 bits of the aio_buf field of the iocb.
Unlike poll or epoll without EPOLLONESHOT this interface always works
in one shot mode, that is once the iocb is completed, it will have to be
resubmitted.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull networking fixes from David Miller:
"Let's begin the holiday weekend with some networking fixes:
1) Whoops need to restrict cfg80211 wiphy names even more to 64
bytes. From Eric Biggers.
2) Fix flags being ignored when using kernel_connect() with SCTP,
from Xin Long.
3) Use after free in DCCP, from Alexey Kodanev.
4) Need to check rhltable_init() return value in ipmr code, from Eric
Dumazet.
5) XDP handling fixes in virtio_net from Jason Wang.
6) Missing RTA_TABLE in rtm_ipv4_policy[], from Roopa Prabhu.
7) Need to use IRQ disabling spinlocks in mlx4_qp_lookup(), from Jack
Morgenstein.
8) Prevent out-of-bounds speculation using indexes in BPF, from
Daniel Borkmann.
9) Fix regression added by AF_PACKET link layer cure, from Willem de
Bruijn.
10) Correct ENIC dma mask, from Govindarajulu Varadarajan.
11) Missing config options for PMTU tests, from Stefano Brivio"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
ibmvnic: Fix partial success login retries
selftests/net: Add missing config options for PMTU tests
mlx4_core: allocate ICM memory in page size chunks
enic: set DMA mask to 47 bit
ppp: remove the PPPIOCDETACH ioctl
ipv4: remove warning in ip_recv_error
net : sched: cls_api: deal with egdev path only if needed
vhost: synchronize IOTLB message with dev cleanup
packet: fix reserve calculation
net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
bpf: properly enforce index mask to prevent out-of-bounds speculation
net/mlx4: Fix irq-unsafe spinlock usage
net: phy: broadcom: Fix bcm_write_exp()
net: phy: broadcom: Fix auxiliary control register reads
net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
ibmvnic: Only do H_EOI for mobility events
tuntap: correctly set SOCKWQ_ASYNC_NOSPACE
virtio-net: fix leaking page for gso packet during mergeable XDP
...
Define netlink messages and attributes to support user kernel
communication that uses the conntrack limit feature.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series contains updates for mlx5e netdevice driver with one subject,
DSCP to priority mapping, in the first patch Huy adds the needed API in
dcbnl, the second patch adds the needed mlx5 core capability bits for the
feature, and all other patches are mlx5e (netdev) only changes to add
support for the feature.
From: Huy Nguyen
Dscp to priority mapping for Ethernet packet:
These patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packet. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. User can combine this feature with priority flow control (pfc)
feature to have priority flow control based on the dscp.
Firmware interface:
Mellanox firmware provides two control knobs for this feature:
QPTS register allow changing the trust state between dscp and
pcp mode. The default is pcp mode. Once in dscp mode, firmware will
route the packet based on its dscp value if the dscp field exists.
QPDPM register allow mapping a specific dscp (0 to 63) to a
specific priority (0 to 7). By default, all the dscps are mapped to
priority zero.
Software interface:
This feature is controlled via application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for
application priority TLV. This APP TLV selector defines DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In mlx5 drivers, we add the support for net
dcb's getapp and setapp call back. Mlx5 driver only handles the selector
id 5 application entry (dscp application priority application entry).
If user sends multiple dscp to priority APP TLV entries on the same
dscp, the last sent one will take effect. All the previous sent will be
deleted.
This attribute combined with pfc attribute allows advanced user to
fine tune the qos setting for specific priority queue. For example,
user can give dedicated buffer for one or more priorities or user
can give large buffer to certain priorities.
The dcb buffer configuration will be controlled by lldptool.
>> lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6
>> lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively
After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to
choose dcbnl over devlink interface since this feature is intended to set
port attributes which are governed by the netdev instance of that port, where
devlink API is more suitable for global ASIC configurations.
The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.
When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.
When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features
such as xdpsq, icosq are not modified.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbBy2YAAoJEEg/ir3gV/o+RkYIAId0h3GqVof41/qCHPuqx+7j
J/lTJSU+PHDZfVAuXhoQN5lFGsRku/wkqdSZNXxwvbUxVzIoUpxWvlaEv+481Cmj
+WhVC1QIExGYUs4CCGjgl853kNtGpDrZmk+olY3QaQISHdpcmgs/W13YlEKRyw5q
nFSa6GLA/3IQXb7eOwUiSW1gYYjn/eM2XFHntTTOeasa9dwb2QDqDQdoYiKdS+yS
KxqxrlADU4V+CtNlOmZ4QGxOa7suBBr+a6JvNnvviYOKKpYMNQRbhUbgu0pPdKAt
zbHHS7Twy1ka3s6CKk/QIT7/YWfCOQgWy6LGPCtUuGUA6dy5xsV7i2BtoDwMC7o=
=EJd2
-----END PGP SIGNATURE-----
Merge tag 'mlx5e-updates-2018-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5e-updates-2018-05-19
This series contains updates for mlx5e netdevice driver with one subject,
DSCP to priority mapping, in the first patch Huy adds the needed API in
dcbnl, the second patch adds the needed mlx5 core capability bits for the
feature, and all other patches are mlx5e (netdev) only changes to add
support for the feature.
From: Huy Nguyen
Dscp to priority mapping for Ethernet packet:
These patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packet. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. User can combine this feature with priority flow control (pfc)
feature to have priority flow control based on the dscp.
Firmware interface:
Mellanox firmware provides two control knobs for this feature:
QPTS register allow changing the trust state between dscp and
pcp mode. The default is pcp mode. Once in dscp mode, firmware will
route the packet based on its dscp value if the dscp field exists.
QPDPM register allow mapping a specific dscp (0 to 63) to a
specific priority (0 to 7). By default, all the dscps are mapped to
priority zero.
Software interface:
This feature is controlled via application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for
application priority TLV. This APP TLV selector defines DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In mlx5 drivers, we add the support for net
dcb's getapp and setapp call back. Mlx5 driver only handles the selector
id 5 application entry (dscp application priority application entry).
If user sends multiple dscp to priority APP TLV entries on the same
dscp, the last sent one will take effect. All the previous sent will be
deleted.
This attribute combined with pfc attribute allows advanced user to
fine tune the qos setting for specific priority queue. For example,
user can give dedicated buffer for one or more priorities or user
can give large buffer to certain priorities.
The dcb buffer configuration will be controlled by lldptool.
>> lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6
>> lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively
After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to
choose dcbnl over devlink interface since this feature is intended to set
port attributes which are governed by the netdev instance of that port, where
devlink API is more suitable for global ASIC configurations.
The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.
When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.
When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features
such as xdpsq, icosq are not modified.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for a new port flag - BR_ISOLATED. If it is set
then isolated ports cannot communicate between each other, but they can
still communicate with non-isolated ports. The same can be achieved via
ACLs but they can't scale with large number of ports and also the
complexity of the rules grows. This feature can be used to achieve
isolated vlan functionality (similar to pvlan) as well, though currently
it will be port-wide (for all vlans on the port). The new test in
should_deliver uses data that is already cache hot and the new boolean
is used to avoid an additional source port test in should_deliver.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
before f_count has reached 0, which is fundamentally a bad idea. It
does check 'f_count < 2', which excludes concurrent operations on the
file since they would only be possible with a shared fd table, in which
case each fdget() would take a file reference. However, it fails to
account for the fact that even with 'f_count == 1' the file can still be
linked into epoll instances. As reported by syzbot, this can trivially
be used to cause a use-after-free.
Yet, the only known user of PPPIOCDETACH is pppd versions older than
ppp-2.4.2, which was released almost 15 years ago (November 2003).
Also, PPPIOCDETACH apparently stopped working reliably at around the
same time, when the f_count check was added to the kernel, e.g. see
https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2'
check makes PPPIOCDETACH only work in single-threaded applications; it
always fails if called from a multithreaded application.
All pppd versions released in the last 15 years just close() the file
descriptor instead.
Therefore, instead of hacking around this bug by exporting epoll
internals to modules, and probably missing other related bugs, just
remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave
a stub in place that prints a one-time warning and returns EINVAL.
Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-05-24
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).
2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.
3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.
4) Jiong Wang adds support for indirect and arithmetic shifts to NFP
5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.
6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions.
7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.
8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.
9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, suppose a userspace application has loaded a bpf program
and attached it to a tracepoint/kprobe/uprobe, and a bpf
introspection tool, e.g., bpftool, wants to show which bpf program
is attached to which tracepoint/kprobe/uprobe. Such attachment
information will be really useful to understand the overall bpf
deployment in the system.
There is a name field (16 bytes) for each program, which could
be used to encode the attachment point. There are some drawbacks
for this approaches. First, bpftool user (e.g., an admin) may not
really understand the association between the name and the
attachment point. Second, if one program is attached to multiple
places, encoding a proper name which can imply all these
attachments becomes difficult.
This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
Given a pid and fd, if the <pid, fd> is associated with a
tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
. prog_id
. tracepoint name, or
. k[ret]probe funcname + offset or kernel addr, or
. u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
bpf program itself with prog_id.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In this patch, we add dcbnl buffer attribute to allow user
change the NIC's buffer configuration such as priority
to buffer mapping and buffer size of individual buffer.
This attribute combined with pfc attribute allows advanced user to
fine tune the qos setting for specific priority queue. For example,
user can give dedicated buffer for one or more priorities or user
can give large buffer to certain priorities.
The dcb buffer configuration will be controlled by lldptool.
lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6
lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively
After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to
choose dcbnl over devlink interface since this feature is intended to set
port attributes which are governed by the netdev instance of that port, where
devlink API is more suitable for global ASIC configurations.
We present an use case scenario where dcbnl buffer attribute configured
by advance user helps reduce the latency of messages of different sizes.
Scenarios description:
On ConnectX-5, we run latency sensitive traffic with
small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive
traffic with large messages sizes 512KB and 1MB. We group small, medium,
and large message sizes to their own pfc enables priorities as follow.
Priorities 1 & 2 (64B, 256B and 1KB)
Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
Priorities 5 & 6 (512KB and 1MB)
By default, ConnectX-5 maps all pfc enabled priorities to a single
lossless fixed buffer size of 50% of total available buffer space. The
other 50% is assigned to lossy buffer. Using dcbnl buffer attribute,
we create three equal size lossless buffers. Each buffer has 25% of total
available buffer space. Thus, the lossy buffer size reduces to 25%. Priority
to lossless buffer mappings are set as follow.
Priorities 1 & 2 on lossless buffer #1
Priorities 3 & 4 on lossless buffer #2
Priorities 5 & 6 on lossless buffer #3
We observe improvements in latency for small and medium message sizes
as follows. Please note that the large message sizes bandwidth performance is
reduced but the total bandwidth remains the same.
256B message size (42 % latency reduction)
4K message size (21% latency reduction)
64K message size (16% latency reduction)
CC: Ido Schimmel <idosch@idosch.org>
CC: Jakub Kicinski <jakub.kicinski@netronome.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Or Gerlitz <gerlitz.or@gmail.com>
CC: Parav Pandit <parav@mellanox.com>
CC: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, the BPF program
does not have to take care of this.
Since the BPF program may not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track if
the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.
Performances profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed as invalid, the packet
is dropped.
This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.
The BPF program may return 3 types of return codes:
- BPF_OK: the End.BPF action will look up the next destination through
seg6_lookup_nexthop.
- BPF_REDIRECT: if an action has been executed through the
bpf_lwt_seg6_action helper, the BPF program should return this
value, as the skb's destination is already set and the default
lookup should not be performed.
- BPF_DROP : the packet will be dropped.
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The BPF seg6local hook should be powerful enough to enable users to
implement most of the use-cases one could think of. After some thinking,
we figured out that the following actions should be possible on a SRv6
packet, requiring 3 specific helpers :
- bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
- bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
(to add/delete TLVs)
- bpf_lwt_seg6_action: Apply some SRv6 network programming actions
(specifically End.X, End.T, End.B6 and
End.B6.Encap)
The specifications of these helpers are provided in the patch (see
include/uapi/linux/bpf.h).
The non-sensitive fields of the SRH are the following : flags, tag and
TLVs. The other fields can not be modified, to maintain the SRH
integrity. Flags, tag and TLVs can easily be modified as their validity
can be checked afterwards via seg6_validate_srh. It is not allowed to
modify the segments directly. If one wants to add segments on the path,
he should stack a new SRH using the End.B6 action via
bpf_lwt_seg6_action.
Growing, shrinking or editing TLVs via the helpers will flag the SRH as
invalid, and it will have to be re-validated before re-entering the IPv6
layer. This flag is stored in a per-CPU buffer, along with the current
header length in bytes.
Storing the SRH len in bytes in the control block is mandatory when using
bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
boundary). When adding/deleting TLVs within the BPF program, the SRH may
temporary be in an invalid state where its length cannot be rounded to 8
bytes without remainder, hence the need to store the length in bytes
separately. The caller of the BPF program can then ensure that the SRH's
final length is valid using this value. Again, a final SRH modified by a
BPF program which doesn’t respect the 8-bytes boundary will be discarded
as it will be considered as invalid.
Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
available from the LWT BPF IN hook, but not from the seg6local BPF one.
This helper allows to encapsulate a Segment Routing Header (either with
a new outer IPv6 header, or by inlining it directly in the existing IPv6
header) into a non-SRv6 packet. This helper is required if we want to
offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
as the BPF seg6local hook only works on traffic already containing a SRH.
This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
the same purpose but with a static SRH per route.
These helpers require CONFIG_IPV6=y (and not =m).
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds new two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of the JITed image lengths of each function for a
given program to userspace using the bpf system call with
the BPF_OBJ_GET_INFO_BY_FD command.
This can be used by userspace applications like bpftool
to split up the contiguous JITed dump, also obtained via
the system call, into more relatable chunks corresponding
to each function.
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds new two new fields to struct bpf_prog_info. For
multi-function programs, these fields can be used to pass
a list of kernel symbol addresses for all functions in a
given program to userspace using the bpf system call with
the BPF_OBJ_GET_INFO_BY_FD command.
When bpf_jit_kallsyms is enabled, we can get the address
of the corresponding kernel symbol for a callee function
and resolve the symbol's name. The address is determined
by adding the value of the call instruction's imm field
to __bpf_call_base. This offset gets assigned to the imm
field by the verifier.
For some architectures, such as powerpc64, the imm field
is not large enough to hold this offset.
We resolve this by:
[1] Assigning the subprog id to the imm field of a call
instruction in the verifier instead of the offset of
the callee's symbol's address from __bpf_call_base.
[2] Determining the address of a callee's corresponding
symbol by using the imm field as an index for the
list of kernel symbol addresses now available from
the program info.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, they are:
1) Remove obsolete nf_log tracing from nf_tables, from Florian Westphal.
2) Add support for map lookups to numgen, random and hash expressions,
from Laura Garcia.
3) Allow to register nat hooks for iptables and nftables at the same
time. Patchset from Florian Westpha.
4) Timeout support for rbtree sets.
5) ip6_rpfilter works needs interface for link-local addresses, from
Vincent Bernat.
6) Add nf_ct_hook and nf_nat_hook structures and use them.
7) Do not drop packets on packets raceing to insert conntrack entries
into hashes, this is particularly a problem in nfqueue setups.
8) Address fallout from xt_osf separation to nf_osf, patches
from Florian Westphal and Fernando Mancera.
9) Remove reference to struct nft_af_info, which doesn't exist anymore.
From Taehee Yoo.
This batch comes with is a conflict between 25fd386e0b ("netfilter:
core: add missing __rcu annotation") in your tree and 2c205dd398
("netfilter: add struct nf_nat_hook and use it") coming in this batch.
This conflict can be solved by leaving the __rcu tag on
__netfilter_net_init() - added by 25fd386e0b - and remove all code
related to nf_nat_decode_session_hook - which is gone after
2c205dd398, as described by:
diff --cc net/netfilter/core.c
index e0ae4aae96f5,206fb2c4c319..168af54db975
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@@ -611,7 -580,13 +611,8 @@@ const struct nf_conntrack_zone nf_ct_zo
EXPORT_SYMBOL_GPL(nf_ct_zone_dflt);
#endif /* CONFIG_NF_CONNTRACK */
- static void __net_init __netfilter_net_init(struct nf_hook_entries **e, int max)
-#ifdef CONFIG_NF_NAT_NEEDED
-void (*nf_nat_decode_session_hook)(struct sk_buff *, struct flowi *);
-EXPORT_SYMBOL(nf_nat_decode_session_hook);
-#endif
-
+ static void __net_init
+ __netfilter_net_init(struct nf_hook_entries __rcu **e, int max)
{
int h;
I can also merge your net-next tree into nf-next, solve the conflict and
resend the pull request if you prefer so.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* a fix for a race in aggregation, which I want to let
bake for a bit longer before sending to stable
* some new statistics (ACK RSSI, TXQ)
* TXQ configuration
* preparations for HE, particularly radiotap
* replace confusing "country IE" by "country element" since it's
not referring to Ireland
Note that I merged net-next to get a fix from mac80211 that got
there via net, to apply one patch that would otherwise conflict.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlsFWnEACgkQB8qZga/f
l8Ry2xAAmOLiTrZ8VlZIwzXEoIrr2b7VM0jQsbCLGmBDu82EV4aRRtX9ZeZm9PuR
a6t9kvQFyT0/7tInTfv6I9JlNCZpwT9Mc3Ttw2JQgJ9zm/IYsxmWJ4TtjIz7F+AA
rqmxdplSCSJUcIVQ/mJ1oINl3p4ewoAv1doxtQx0Ucavb31ROwjVRUX24OJd1SeK
YOFSjoTLHcCDS5jaTbzAGwI31F3plHG8NKMLlwGtrYMhN2SmaQV2YU+YTPJuiQbt
EGa3MukngxF7ck+D57CJM+OcLrPF4RiuT6pmJHR8as5Yz5u40bgn3wZu361EcmSy
wpJKFNsTOJS+nFHS/zMTWiVbB12bBGNWf3rZXUv5yH1TwVf8y8B2p2jrEasmVcjB
PgwNcylNJYfqd2W439xwt1ChGAzc388U2yyzMtWmnNQeAAUFMthtjhEv2Vnowxf3
cFvO5okRpVpOP42JB57VZfNoPeeUHnPfrlDl40AwbKUkKeVOom5oJQIi5WMg4nAV
+MXooiJStZxMsY1PDyQgE06dL40r2HlmaX0DB/UbbWeVAaJ2c4aS3ptApEWrfedY
FDTL0XhfqejPbK2Au/KX64TTj8ID2bGsundM4ErcilOK3Pu63FMv9b0mziBd8jX1
6lJE2oIR8w10dFZG4O5itVE8n6PE2Fgx728480Lsjuz56GVxMB0=
=G98y
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2018-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
For this round, we have various things all over the place, notably
* a fix for a race in aggregation, which I want to let
bake for a bit longer before sending to stable
* some new statistics (ACK RSSI, TXQ)
* TXQ configuration
* preparations for HE, particularly radiotap
* replace confusing "country IE" by "country element" since it's
not referring to Ireland
Note that I merged net-next to get a fix from mac80211 that got
there via net, to apply one patch that would otherwise conflict.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
and user mode helper code that is embedded into bpfilter.ko
The steps to build bpfilter.ko are the following:
- main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
- with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
is converted into bpfilter_umh.o object file
with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
Example:
$ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
- bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko
bpfilter_kern.c is a normal kernel module code that calls
the fork_usermode_blob() helper to execute part of its own data
as a user mode process.
Notice that _binary_net_bpfilter_bpfilter_umh_start - end
is placed into .init.rodata section, so it's freed as soon as __init
function of bpfilter.ko is finished.
As part of __init the bpfilter.ko does first request/reply action
via two unix pipe provided by fork_usermode_blob() helper to
make sure that umh is healthy. If not it will kill it via pid.
Later bpfilter_process_sockopt() will be called from bpfilter hooks
in get/setsockopt() to pass iptable commands into umh via bpfilter.ko
If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
kill umh as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id"
could cause confusion because the "id" of "btf_id" means the BPF obj id
given to the BTF object while
"btf_key_id" and "btf_value_id" means the BTF type id within
that BTF object.
To make it clear, btf_key_id and btf_value_id are
renamed to btf_key_type_id and btf_value_type_id.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch does the followings:
1. Limit BTF_MAX_TYPES and BTF_MAX_NAME_OFFSET to 64k. We can
raise it later.
2. Remove the BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID. They are
currently encoded at the highest bit of a u32.
It is because the current use case does not require supporting
parent type (i.e type_id referring to a type in another BTF file).
It also does not support referring to a string in ELF.
The BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID checks are replaced
by BTF_TYPE_ID_CHECK and BTF_STR_OFFSET_CHECK which are
defined in btf.c instead of uapi/linux/btf.h.
3. Limit the BTF_INFO_KIND from 5 bits to 4 bits which is enough.
There is unused bits headroom if we ever needed it later.
4. The root bit in BTF_INFO is also removed because it is not
used in the current use case.
5. Remove BTF_INT_VARARGS since func type is not supported now.
The BTF_INT_ENCODING is limited to 4 bits instead of 8 bits.
The above can be added back later because the verifier
ensures the unused bits are zeros.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
There are currently unused section descriptions in the btf_header. Those
sections are here to support future BTF use cases. For example, the
func section (func_off) is to support function signature (e.g. the BPF
prog function signature).
Instead of spelling out all potential sections up-front in the btf_header.
This patch makes changes to btf_header such that extending it (e.g. adding
a section) is possible later. The unused ones can be removed for now and
they can be added back later.
This patch:
1. adds a hdr_len to the btf_header. It will allow adding
sections (and other info like parent_label and parent_name)
later. The check is similar to the existing bpf_attr.
If a user passes in a longer hdr_len, the kernel
ensures the extra tailing bytes are 0.
2. allows the section order in the BTF object to be
different from its sec_off order in btf_header.
3. each sec_off is followed by a sec_len. It must not have gap or
overlapping among sections.
The string section is ensured to be at the end due to the 4 bytes
alignment requirement of the type section.
The above changes will allow enough flexibility to
add new sections (and other info) to the btf_header later.
This patch also removes an unnecessary !err check
at the end of btf_parse().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use NL80211_CMD_UPDATE_CONNECT_PARAMS to update new ERP information,
Association IEs and the Authentication type to driver / firmware which
will be used in subsequent roamings.
Signed-off-by: Vidyullatha Kanchanapally <vidyullatha@codeaurora.org>
[arend: extended fils-sk kernel doc and added check in wiphy_register()]
Reviewed-by: Jithu Jance <jithu.jance@broadcom.com>
Reviewed-by: Eylon Pedinovsky <eylon.pedinovsky@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In case of FILS shared key offload the parameters can change
upon roaming of which user-space needs to be notified.
Reviewed-by: Jithu Jance <jithu.jance@broadcom.com>
Reviewed-by: Eylon Pedinovsky <eylon.pedinovsky@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Bring in net-next which had pulled in net, so I have the changes
from mac80211 and can apply a patch that would otherwise conflict.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In this commit we remove the explicit ring structure from the the
uapi. It is tricky for an uapi to depend on a certain L1 cache line
size, since it can differ for variants of the same architecture. Now,
we let the user application determine the offsets of the producer,
consumer and descriptors by asking the socket via getsockopt.
A typical flow would be (Rx ring):
struct xdp_mmap_offsets off;
struct xdp_desc *ring;
u32 *prod, *cons;
void *map;
...
getsockopt(fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
map = mmap(NULL, off.rx.desc +
NUM_DESCS * sizeof(struct xdp_desc),
PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_POPULATE, sfd,
XDP_PGOFF_RX_RING);
prod = map + off.rx.producer;
cons = map + off.rx.consumer;
ring = map + off.rx.desc;
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Move the sxdp_flags up, avoiding a hole in the uapi structure.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.
TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.
The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.
Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge speculative store buffer bypass fixes from Thomas Gleixner:
- rework of the SPEC_CTRL MSR management to accomodate the new fancy
SSBD (Speculative Store Bypass Disable) bit handling.
- the CPU bug and sysfs infrastructure for the exciting new Speculative
Store Bypass 'feature'.
- support for disabling SSB via LS_CFG MSR on AMD CPUs including
Hyperthread synchronization on ZEN.
- PRCTL support for dynamic runtime control of SSB
- SECCOMP integration to automatically disable SSB for sandboxed
processes with a filter flag for opt-out.
- KVM integration to allow guests fiddling with SSBD including the new
software MSR VIRT_SPEC_CTRL to handle the LS_CFG based oddities on
AMD.
- BPF protection against SSB
.. this is just the core and x86 side, other architecture support will
come separately.
* 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
bpf: Prevent memory disambiguation attack
x86/bugs: Rename SSBD_NO to SSB_NO
KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
x86/bugs: Rework spec_ctrl base and mask logic
x86/bugs: Remove x86_spec_ctrl_set()
x86/bugs: Expose x86_spec_ctrl_base directly
x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
x86/speculation: Rework speculative_store_bypass_update()
x86/speculation: Add virtualized speculative store bypass disable support
x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
x86/speculation: Handle HT correctly on AMD
x86/cpufeatures: Add FEATURE_ZEN
x86/cpufeatures: Disentangle SSBD enumeration
x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
KVM: SVM: Move spec control call after restore of GS
x86/cpu: Make alternative_msr_write work for 32-bit code
x86/bugs: Fix the parameters alignment and missing void
x86/bugs: Make cpu_show_common() static
...
fscrypt currently only supports AES encryption. However, many low-end
mobile devices have older CPUs that don't have AES instructions, e.g.
the ARMv8 Cryptography Extensions. Currently, user data on such devices
is not encrypted at rest because AES is too slow, even when the NEON
bit-sliced implementation of AES is used. Unfortunately, it is
infeasible to encrypt these devices at all when AES is the only option.
Therefore, this patch updates fscrypt to support the Speck block cipher,
which was recently added to the crypto API. The C implementation of
Speck is not especially fast, but Speck can be implemented very
efficiently with general-purpose vector instructions, e.g. ARM NEON.
For example, on an ARMv7 processor, we measured the NEON-accelerated
Speck128/256-XTS at 69 MB/s for both encryption and decryption, while
AES-256-XTS with the NEON bit-sliced implementation was only 22 MB/s
encryption and 19 MB/s decryption.
There are multiple variants of Speck. This patch only adds support for
Speck128/256, which is the variant with a 128-bit block size and 256-bit
key size -- the same as AES-256. This is believed to be the most secure
variant of Speck, and it's only about 6% slower than Speck128/128.
Speck64/128 would be at least 20% faster because it has 20% rounds, and
it can be even faster on CPUs that can't efficiently do the 64-bit
operations needed for Speck128. However, Speck64's 64-bit block size is
not preferred security-wise. ARM NEON also supports the needed 64-bit
operations even on 32-bit CPUs, resulting in Speck128 being fast enough
for our targeted use cases so far.
The chosen modes of operation are XTS for contents and CTS-CBC for
filenames. These are the same modes of operation that fscrypt defaults
to for AES. Note that as with the other fscrypt modes, Speck will not
be used unless userspace chooses to use it. Nor are any of the existing
modes (which are all AES-based) being removed, of course.
We intentionally don't make CONFIG_FS_ENCRYPTION select
CONFIG_CRYPTO_SPECK, so people will have to enable Speck support
themselves if they need it. This is because we shouldn't bloat the
FS_ENCRYPTION dependencies with every new cipher, especially ones that
aren't recommended for most users. Moreover, CRYPTO_SPECK is just the
generic implementation, which won't be fast enough for many users; in
practice, they'll need to enable CRYPTO_SPECK_NEON to get acceptable
performance.
More details about our choice of Speck can be found in our patches that
added Speck to the crypto API, and the follow-on discussion threads.
We're planning a publication that explains the choice in more detail.
But briefly, we can't use ChaCha20 as we previously proposed, since it
would be insecure to use a stream cipher in this context, with potential
IV reuse during writes on f2fs and/or on wear-leveling flash storage.
We also evaluated many other lightweight and/or ARX-based block ciphers
such as Chaskey-LTS, RC5, LEA, CHAM, Threefish, RC6, NOEKEON, SPARX, and
XTEA. However, all had disadvantages vs. Speck, such as insufficient
performance with NEON, much less published cryptanalysis, or an
insufficient security level. Various design choices in Speck make it
perform better with NEON than competing ciphers while still having a
security margin similar to AES, and in the case of Speck128 also the
same available security levels. Unfortunately, Speck does have some
political baggage attached -- it's an NSA designed cipher, and was
rejected from an ISO standard (though for context, as far as I know none
of the above-mentioned alternatives are ISO standards either).
Nevertheless, we believe it is a good solution to the problem from a
technical perspective.
Certain algorithms constructed from ChaCha or the ChaCha permutation,
such as MEM (Masked Even-Mansour) or HPolyC, may also meet our
performance requirements. However, these are new constructions that
need more time to receive the cryptographic review and acceptance needed
to be confident in their security. HPolyC hasn't been published yet,
and we are concerned that MEM makes stronger assumptions about the
underlying permutation than the ChaCha stream cipher does. In contrast,
the XTS mode of operation is relatively well accepted, and Speck has
over 70 cryptanalysis papers. Of course, these ChaCha-based algorithms
can still be added later if they become ready.
The best known attack on Speck128/256 is a differential cryptanalysis
attack on 25 of 34 rounds with 2^253 time complexity and 2^125 chosen
plaintexts, i.e. only marginally faster than brute force. There is no
known attack on the full 34 rounds.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Devlink ports can have specific flavour according to the purpose of use.
This patch extend attrs_set so the driver can say which flavour port
has. Initial flavours are:
physical, cpu, dsa
User can query this to see right away what is the purpose of each port.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change existing setter for split port information into more generic
attrs setter. Alongside with that, allow to set port number and subport
number for split ports.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently sk_msg programs only have access to the raw data. However,
it is often useful when building policies to have the policies specific
to the socket endpoint. This allows using the socket tuple as input
into filters, etc.
This patch adds ctx access to the sock fields.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sites may wish to provide additional metadata alongside files in order
to make more fine-grained security decisions[1]. The security of this is
enhanced if this metadata is protected, something that EVM makes
possible. However, the kernel cannot know about the set of extended
attributes that local admins may wish to protect, and hardcoding this
policy in the kernel makes it difficult to change over time and less
convenient for distributions to enable.
This patch adds a new /sys/kernel/security/integrity/evm/evm_xattrs node,
which can be read to obtain the current set of EVM-protected extended
attributes or written to in order to add new entries. Extending this list
will not change the validity of any existing signatures provided that the
file in question does not have any of the additional extended attributes -
missing xattrs are skipped when calculating the EVM hash.
[1] For instance, a package manager could install information about the
package uploader in an additional extended attribute. Local LSM policy
could then be associated with that extended attribute in order to
restrict the privileges available to packages from less trusted
uploaders.
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: James Morris <james.morris@microsoft.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
This counter tracks number of ACK packets that the host has not sent,
thanks to ACK compression.
Sample output :
$ nstat -n;sleep 1;nstat|egrep "IpInReceives|IpOutRequests|TcpInSegs|TcpOutSegs|TcpExtTCPAckCompressed"
IpInReceives 123250 0.0
IpOutRequests 3684 0.0
TcpInSegs 123251 0.0
TcpOutSegs 3684 0.0
TcpExtTCPAckCompressed 119252 0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up SPDX-License-Identifier and removing licensing leftovers.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
wiphy names were recently limited to 128 bytes by commit a7cfebcb75
("cfg80211: limit wiphy names to 128 bytes"). As it turns out though,
this isn't sufficient because dev_vprintk_emit() needs the syslog header
string "SUBSYSTEM=ieee80211\0DEVICE=+ieee80211:$devname" to fit into 128
bytes. This triggered the "device/subsystem name too long" WARN when
the device name was >= 90 bytes. As before, this was reproduced by
syzbot by sending an HWSIM_CMD_NEW_RADIO command to the MAC80211_HWSIM
generic netlink family.
Fix it by further limiting wiphy names to 64 bytes.
Reported-by: syzbot+e64565577af34b3768dc@syzkaller.appspotmail.com
Fixes: a7cfebcb75 ("cfg80211: limit wiphy names to 128 bytes")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJa+mZEAAoJEAx081l5xIa+p/4P/3kIW0Zk6wO2HOF2u4TRZdhe
2b6yYP6ig1MMpLsJuRH2f8hnWl2f+CBzhwHaKbUni9ffY4TboOWeoYL5YWap2Pcp
MxRLBXBAI9+8zqqsrm/VB4gQL/Xp0nghN3CT1khLnMs38BkFUX7nASiSIknVIxj3
ux/95o0Tb2uYN886ILZCixPjmNUSgfNAyQuNNKRmT1EM3mgDZ2mc6BJoArPcCBqr
0vkekQA9+ZK4XYEHfjq/0CrVMLXhjaO05+BADK8A8WOtyvU+0xKjJjmQx0sQAd6L
Vcr+aMabJP8+3LeMDjIWqH0wUk6YqECwnUOoBkJFp5YTx+D1ff2RzmlWwvt9skIZ
4tmyFMfAn8XKkoSwa598/jamxOgTmMTIO8/6dJfO01sDgUvmTeR5z+ZTDG9FudFW
7Y2aHLMm19kitjqLDCpWBPmFGYVmfIsqA52qSgIjF4JVIurDk3PLRbQt++4k2j84
hLvYClJIs4ulTfmNRuBH4cVYtW5H5ohIkwP9L715Y+7ag/LUdQB1V6QsrX1bHEXg
KX1jP1UHqpUwNEQ9N2/1wVv1Ss7p7CKFY3C2UAMacRyymrws4McziPuXUalkBArs
royz2gRc5ykpbZ7Itlls43XlyMYxBeaogq+P2ODHouQMfDM21Gam/mpBPz3+t2c5
fo9rLqk3NqxPbHud1NJH
=4NJV
-----END PGP SIGNATURE-----
Merge drm-fixes-for-v4.17-rc6-urgent into drm-next
Need to backmerge some nouveau fixes to reduce
the nouveau -next conflicts a lot.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch creates new attributes to accept a map as argument and
then perform the lookup with the generated hash accordingly.
Both current hash functions are supported: Jenkins and Symmetric Hash.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stephen Rothwell says:
today's linux-next build (x86_64 allmodconfig) produced this warning:
./usr/include/linux/netfilter/nf_osf.h:25: found __[us]{8,16,32,64} type without #include <linux/types.h>
Fix that up and also move kernel-private struct out of uapi (it was not
exposed in any released kernel version).
tested via allmodconfig build + make headers_check.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: bfb15f2a95 ("netfilter: extract Passive OS fingerprint infrastructure from xt_osf")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-05-17
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Provide a new BPF helper for doing a FIB and neighbor lookup
in the kernel tables from an XDP or tc BPF program. The helper
provides a fast-path for forwarding packets. The API supports
IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are
implemented in this initial work, from David (Ahern).
2) Just a tiny diff but huge feature enabled for nfp driver by
extending the BPF offload beyond a pure host processing offload.
Offloaded XDP programs are allowed to set the RX queue index and
thus opening the door for defining a fully programmable RSS/n-tuple
filter replacement. Once BPF decided on a queue already, the device
data-path will skip the conventional RSS processing completely,
from Jakub.
3) The original sockmap implementation was array based similar to
devmap. However unlike devmap where an ifindex has a 1:1 mapping
into the map there are use cases with sockets that need to be
referenced using longer keys. Hence, sockhash map is added reusing
as much of the sockmap code as possible, from John.
4) Introduce BTF ID. The ID is allocatd through an IDR similar as
with BPF maps and progs. It also makes BTF accessible to user
space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data
through BPF_OBJ_GET_INFO_BY_FD, from Martin.
5) Enable BPF stackmap with build_id also in NMI context. Due to the
up_read() of current->mm->mmap_sem build_id cannot be parsed.
This work defers the up_read() via a per-cpu irq_work so that
at least limited support can be enabled, from Song.
6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND
JIT conversion as well as implementation of an optimized 32/64 bit
immediate load in the arm64 JIT that allows to reduce the number of
emitted instructions; in case of tested real-world programs they
were shrinking by three percent, from Daniel.
7) Add ifindex parameter to the libbpf loader in order to enable
BPF offload support. Right now only iproute2 can load offloaded
BPF and this will also enable libbpf for direct integration into
other applications, from David (Beckett).
8) Convert the plain text documentation under Documentation/bpf/ into
RST format since this is the appropriate standard the kernel is
moving to for all documentation. Also add an overview README.rst,
from Jesper.
9) Add __printf verification attribute to the bpf_verifier_vlog()
helper. Though it uses va_list we can still allow gcc to check
the format string, from Mathieu.
10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
is a bash 4.0+ feature which is not guaranteed to be available
when calling out to shell, therefore use a more portable variant,
from Joe.
11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
instead of relying on the gcc built-in, from Björn.
12) Fix a sock hashmap kmalloc warning reported by syzbot when an
overly large key size is used in hashmap then causing overflows
in htab->elem_size. Reject bogus attr->key_size early in the
sock_hash_alloc(), from Yonghong.
13) Ensure in BPF selftests when urandom_read is being linked that
--build-id is always enabled so that test_stacktrace_build_id[_nmi]
won't be failing, from Alexei.
14) Add bitsperlong.h as well as errno.h uapi headers into the tools
header infrastructure which point to one of the arch specific
uapi headers. This was needed in order to fix a build error on
some systems for the BPF selftests, from Sirio.
15) Allow for short options to be used in the xdp_monitor BPF sample
code. And also a bpf.h tools uapi header sync in order to fix a
selftest build failure. Both from Prashant.
16) More formally clarify the meaning of ID in the direct packet access
section of the BPF documentation, from Wang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This retains 256 chars as the maximum size through the interface, which
is the btrfs limit and AFAIK exceeds any other filesystem's maximum
label size.
This just copies the ioctl for now and leaves it in place for btrfs
for the time being. A later patch will allow btrfs to use the new
common ioctl definition, but it may be sent after this is merged.
(Note, Reviewed-by's were originally given for the combined vfs+btrfs
patch, some license taken here.)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Sockmap is currently backed by an array and enforces keys to be
four bytes. This works well for many use cases and was originally
modeled after devmap which also uses four bytes keys. However,
this has become limiting in larger use cases where a hash would
be more appropriate. For example users may want to use the 5-tuple
of the socket as the lookup key.
To support this add hash support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds support for the MIXER UNIT in UAC3. All the information
is obtained from the (HIGH CAPABILITY) Cluster's header. We don't
read the rest of the logical cluster to obtain the channel config
as that wont make any difference in the current mixer behaviour.
The name of the mixer unit is not yet requested as there is not
support for the UAC3 Class Specific String requests.
Tested in an UAC3 device working as a HEADSET with a basic mixer
unit (same as the one in the BADD spec) with no controls.
Signed-off-by: Jorge Sanjuan <jorge.sanjuan@codethink.co.uk>
Reviewed-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Tested-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Currently, when the rule is not to be exclusively executed by the
hardware, extack is not passed along and offloading failures don't
get logged. The idea was that hardware failures are okay because the
rule will get executed in software then and this way it doesn't confuse
unware users.
But this is not helpful in case one needs to understand why a certain
rule failed to get offloaded. Considering it may have been a temporary
failure, like resources exceeded or so, reproducing it later and knowing
that it is triggering the same reason may be challenging.
The ultimate goal is to improve Open vSwitch debuggability when using
flower offloading.
This patch adds a new flag to enable verbose logging. With the flag set,
extack will be passed to the driver, which will be able to log the
error. As the operation itself probably won't fail (not because of this,
at least), current iproute will already log it as a Warning.
The flag is generic, so it can be reused later. No need to restrict it
just for HW offloading. The command line will follow the syntax that
tc-ebpf already uses, tc ... [ verbose ] ... , and extend its meaning.
For example:
# ./tc qdisc add dev p7p1 ingress
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
flower verbose \
src_mac ed:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
Warning: TC offload is disabled on net device.
# echo $?
0
# ./tc filter add dev p7p1 parent ffff: protocol ip prio 1 \
flower \
src_mac ff:13:db:00:00:00 dst_mac 01:80:c2:00:00:d0 \
src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
# echo $?
0
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a macro, "AUDIT_SID_UNSET", to replace each instance of
initialization and comparison to an audit session ID.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /proc/vmcore are as follows:
1. During probe (before hardware is initialized), device drivers
register to the vmcore module (via vmcore_add_device_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.
2. vmcore module allocates the buffer with requested size. It adds
an Elf note and invokes the device driver's registered callback
function.
3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to vmcore module.
Ensure that the device dump buffer size is always aligned to page size
so that it can be mmaped.
Also, rename alloc_elfnotes_buf() to vmcore_alloc_buf() to make it more
generic and reserve NT_VMCOREDD note type to indicate vmcore device
dump.
Suggested-by: Eric Biederman <ebiederm@xmission.com>.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although the kernel doesn't use this, qemu imports these headers
and it's best to keep them consistent.
This define is also something userspace may want to use.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20180503021021.10694-1-airlied@gmail.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:
1) Fix handling of simultaneous open TCP connection in conntrack,
from Jozsef Kadlecsik.
2) Insufficient sanitify check of xtables extension names, from
Florian Westphal.
3) Skip unnecessary synchronize_rcu() call when transaction log
is already empty, from Florian Westphal.
4) Incorrect destination mac validation in ebt_stp, from Stephen
Hemminger.
5) xtables module reference counter leak in nft_compat, from
Florian Westphal.
6) Incorrect connection reference counting logic in IPVS
one-packet scheduler, from Julian Anastasov.
7) Wrong stats for 32-bits CPU in IPVS, also from Julian.
8) Calm down sparse error in netfilter core, also from Florian.
9) Use nla_strlcpy to fix compilation warning in nfnetlink_acct
and nfnetlink_cthelper, again from Florian.
10) Missing module alias in icmp and icmp6 xtables extensions,
from Florian Westphal.
11) Base chain statistics in nf_tables may be unset/null, from Florian.
12) Fix handling of large matchinfo size in nft_compat, this includes
one preparation for before this fix. From Florian.
13) Fix bogus EBUSY error when deleting chains due to incorrect reference
counting from the preparation phase of the two-phase commit protocol.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The bpf syscall and selftests conflicts were trivial
overlapping changes.
The r8169 change involved moving the added mdelay from 'net' into a
different function.
A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts. I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Verify lengths of keys provided by the user is AF_KEY, from Kevin
Easton.
2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka.
3) Add Spectre guards to some ATM code, courtesy of Gustavo A. R.
Silva.
4) Fix infinite loop in NSH protocol code. To Eric Dumazet we are most
grateful for this fix.
5) Line up /proc/net/netlink headers properly. This fix from YU Bo, we
do appreciate.
6) Use after free in TLS code. Once again we are blessed by the
honorable Eric Dumazet with this fix.
7) Fix regression in TLS code causing stalls on partial TLS records.
This fix is bestowed upon us by Andrew Tomt.
8) Deal with too small MTUs properly in LLC code, another great gift
from Eric Dumazet.
9) Handle cached route flushing properly wrt. MTU locking in ipv4, to
Hangbin Liu we give thanks for this.
10) Fix regression in SO_BINDTODEVIC handling wrt. UDP socket demux.
Paolo Abeni, he gave us this.
11) Range check coalescing parameters in mlx4 driver, thank you Moshe
Shemesh.
12) Some ipv6 ICMP error handling fixes in rxrpc, from our good brother
David Howells.
13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel Juergens,
you're the best!
14) Don't send bonding RLB updates to invalid MAC addresses. Debabrata
Benerjee saved us!
15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The ship
is now water tight, thanks to Andrey Ignatov.
16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got holes
everywhere!
17) Fix error path in tcf_proto_create, Jiri Pirko what would we do
without you!
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
net sched actions: fix refcnt leak in skbmod
net: sched: fix error path in tcf_proto_create() when modules are not configured
net sched actions: fix invalid pointer dereferencing if skbedit flags missing
ixgbe: fix memory leak on ipsec allocation
ixgbevf: fix ixgbevf_xmit_frame()'s return type
ixgbe: return error on unsupported SFP module when resetting
ice: Set rq_last_status when cleaning rq
ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
bonding: send learning packets for vlans on slave
bonding: do not allow rlb updates to invalid mac
net/mlx5e: Err if asked to offload TC match on frag being first
net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
net/mlx5: Free IRQs in shutdown path
rxrpc: Trace UDP transmission failure
rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
rxrpc: Fix the min security level for kernel calls
rxrpc: Fix error reception on AF_INET6 sockets
rxrpc: Fix missing start of call timeout
qed: fix spelling mistake: "taskelt" -> "tasklet"
...
Provide a helper for doing a FIB and neighbor lookup in the kernel
tables from an XDP program. The helper provides a fastpath for forwarding
packets. If the packet is a local delivery or for any reason is not a
simple lookup and forward, the packet continues up the stack.
If it is to be forwarded, the forwarding can be done directly if the
neighbor is already known. If the neighbor does not exist, the first
few packets go up the stack for neighbor resolution. Once resolved, the
xdp program provides the fast path.
On successful lookup the nexthop dmac, current device smac and egress
device index are returned.
The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6
are implemented in this patch. The API includes layer 4 parameters if
the XDP program chooses to do deep packet inspection to allow compare
against ACLs implemented as FIB rules.
Header rewrite is left to the XDP program.
The lookup takes 2 flags:
- BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
straight to the table associated with the device (expert setting for
those looking to maximize throughput)
- BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective.
Default is an ingress lookup.
Initial performance numbers collected by Jesper, forwarded packets/sec:
Full stack XDP FIB lookup XDP Direct lookup
IPv4 1,947,969 7,074,156 7,415,333
IPv6 1,728,000 6,165,504 7,262,720
These number are single CPU core forwarding on a Broadwell
E5-1650 v4 @ 3.60GHz.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* WMM element validation
* SAE timeout
* add-BA timeout
* docbook parsing
* a few memory leaks in error paths
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlrzTUQACgkQB8qZga/f
l8QgqA/+KrZlJPuQjT0xZYYj0rXGc4GzeOJiyDqVz/25sGrmDP8pCnabJhC0l10X
+4edB85XjgPbQ/OZsnRSRaCQDPOuBUiQqCl33u6T/qngHdbxuVoxx8UykAcr/+z1
bmrokLBMXWijaacMine9ouCjjuMKXaiyPsZ5aobMpFAKFcVxHDK2hnlMp2bhfqVw
fuCLtMlvc5G5FGKEfLoT0YzFGhOpcz6Zmnuk58pCoVTUTNDOdbKUVI/4TSGHUnaH
YrS1I+ERIIIRf8NCdUHqvygigHyEYd4WlOtFnmrEEUrBf3YP/frUtf1oUlg77Kdc
bPIKD76cO8ONaFu45z4/nBhzQQPz6idUBMfk+gYuZ6bNxrFkJCHNoSkz2AXiOxQF
tThXUoSMnlrqyW+jJ0JdZHZQqU0vwaJvCh96F97ox3WWPoMlty+JtQ0orVJSkUF1
LHNmKCJfCJ2LfK5xr0tSnhBBeteyE18zaPZMHkEUh7gUkQeov9+wgvtPFdX/pQ0N
/sZCs4QQLmHn3R1f/XSytSIsEoIEneCKN/pwSc63M6SqqVDOE4+Mue7+Jzjq4xad
oY+pFIWusxyUQmC4R01/bQzADROp1vJFUS9/4sDmwsIk+RbRxSGP3o2kHwzl+Wc2
DxxQv2WN8V0L2i+ru7Ck+Q4zAUgxoIHqHWefKpI4MO9liCcBHFQ=
=mJ4m
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2018-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
We only have a few fixes this time:
* WMM element validation
* SAE timeout
* add-BA timeout
* docbook parsing
* a few memory leaks in error paths
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
C libraries with 64-bit time_t use an incompatible format for
struct omap3isp_stat_data. This changes the kernel code to
support either version, by moving over the normal handling
to the 64-bit variant, and adding compatiblity code to handle
the old binary format with the existing ioctl command code.
Fortunately, the command code includes the size of the structure,
so the difference gets handled automatically. In the process of
eliminating the references to 'struct timeval' from the kernel,
I also change the way the timestamp is generated internally,
basically by open-coding the v4l2_get_timestamp() call.
[Sakari Ailus: Alphabetical order of headers, clean up compat code]
Cc: Sakari Ailus <sakari.ailus@iki.fi>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the current bpf_attr's
info.info is directly filled with the BTF binary data. It is
not extensible. In this case, we want to add BTF ID.
This patch adds "struct bpf_btf_info" which has the BTF ID as
one of its member. The BTF binary data itself is exposed through
the "btf" and "btf_size" members.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch gives an ID to each loaded BTF. The ID is allocated by
the idr like the existing prog-id and map-id.
The bpf_put(map->btf) is moved to __bpf_map_put() so that the
userspace can stop seeing the BTF ID ASAP when the last BTF
refcnt is gone.
It also makes BTF accessible from userspace through the
1. new BPF_BTF_GET_FD_BY_ID command. It is limited to CAP_SYS_ADMIN
which is inline with the BPF_BTF_LOAD cmd and the existing
BPF_[MAP|PROG]_GET_FD_BY_ID cmd.
2. new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info"
Once the BTF ID handler is accessible from userspace, freeing a BTF
object has to go through a rcu period. The BPF_BTF_GET_FD_BY_ID cmd
can then be done under a rcu_read_lock() instead of taking
spin_lock.
[Note: A similar rcu usage can be done to the existing
bpf_prog_get_fd_by_id() in a follow up patch]
When processing the BPF_BTF_GET_FD_BY_ID cmd,
refcount_inc_not_zero() is needed because the BTF object
could be already in the rcu dead row . btf_get() is
removed since its usage is currently limited to btf.c
alone. refcount_inc() is used directly instead.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This adds support for exporting the mac80211 TXQ stats via nl80211 by
way of a nested TXQ stats attribute, as well as for configuring the
quantum and limits that were previously only changeable through debugfs.
This commit adds just the nl80211 API, a subsequent commit adds support to
mac80211 itself.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Minor conflict, a CHECK was placed into an if() statement
in net-next, whilst a newline was added to that CHECK
call in 'net'. Thanks to Daniel for the merge resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
Average ack rssi will be given to userspace via NL80211 interface
if firmware is capable. Userspace tool ‘iw’ can process this
information and give the output as one of the fields in
‘iw dev wlanX station dump’.
Example output :
localhost ~ #iw dev wlan-5000mhz station dump Station
34:f3:9a:aa:3b:29 (on wlan-5000mhz)
inactive time: 5370 ms
rx bytes: 85321
rx packets: 576
tx bytes: 14225
tx packets: 71
tx retries: 0
tx failed: 2
beacon loss: 0
rx drop misc: 0
signal: -54 dBm
signal avg: -53 dBm
tx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
rx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
avg ack signal: -56 dBm
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
connected time: 203 seconds
Main use case is to measure the signal strength of a connected station
to AP. Data packet transmit rates and bandwidth used by station can vary
a lot even if the station is at fixed location, especially if the rates
used are multi stream(2stream, 3stream) rates with different bandwidth(20/40/80 Mhz).
These multi stream rates are sensitive and station can use different transmit power
for each of the rate and bandwidth combinations. RSSI measured from these RX packets
on AP will be not stable and can vary a lot with in a short time.
Whereas 802.11 ack frames from station are sent relatively at a constant
rate (6/12/24 Mbps) with constant bandwidth(20 Mhz).
So average rssi of the ack packets is good and more accurate.
Signed-off-by: Balaji Pothunoori <bpothuno@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This will serve userspace entity to maintain its regulatory limitation.
More specifcally APs can use this data to calculate the WMM IE when
building: beacons, probe responses, assoc responses etc...
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree, more relevant updates in this batch are:
1) Add Maglev support to IPVS. Moreover, store lastest server weight in
IPVS since this is needed by maglev, patches from from Inju Song.
2) Preparation works to add iptables flowtable support, patches
from Felix Fietkau.
3) Hand over flows back to conntrack slow path in case of TCP RST/FIN
packet is seen via new teardown state, also from Felix.
4) Add support for extended netlink error reporting for nf_tables.
5) Support for larger timeouts that 23 days in nf_tables, patch from
Florian Westphal.
6) Always set an upper limit to dynamic sets, also from Florian.
7) Allow number generator to make map lookups, from Laura Garcia.
8) Use hash_32() instead of opencode hashing in IPVS, from Vicent Bernat.
9) Extend ip6tables SRH match to support previous, next and last SID,
from Ahmed Abdelsalam.
10) Move Passive OS fingerprint nf_osf.c, from Fernando Fernandez.
11) Expose nf_conntrack_max through ctnetlink, from Florent Fourcot.
12) Several housekeeping patches for xt_NFLOG, x_tables and ebtables,
from Taehee Yoo.
13) Unify meta bridge with core nft_meta, then make nft_meta built-in.
Make rt and exthdr built-in too, again from Florian.
14) Missing initialization of tbl->entries in IPVS, from Cong Wang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
IPCTNL_MSG_CT_GET_STATS netlink command allow to monitor current number
of conntrack entries. However, if one wants to compare it with the
maximum (and detect exhaustion), the only solution is currently to read
sysctl value.
This patch add nf_conntrack_max value in netlink message, and simplify
monitoring for application built on netlink API.
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add nf_osf_ttl() and nf_osf_match() into nf_osf.c to prepare for
nf_tables support.
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These macros allow conveniently declaring arrays which use NFT_{RT,CT}_*
values as indexes.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
IPv6 Segment Routing Header (SRH) contains a list of SIDs to be crossed
by SR encapsulated packet. Each SID is encoded as an IPv6 prefix.
When a Firewall receives an SR encapsulated packet, it should be able
to identify which node previously processed the packet (previous SID),
which node is going to process the packet next (next SID), and which
node is the last to process the packet (last SID) which represent the
final destination of the packet in case of inline SR mode.
An example use-case of using these features could be SID list that
includes two firewalls. When the second firewall receives a packet,
it can check whether the packet has been processed by the first firewall
or not. Based on that check, it decides to apply all rules, apply just
subset of the rules, or totally skip all rules and forward the packet to
the next SID.
This patch extends SRH match to support matching previous SID, next SID,
and last SID.
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch includes a new attribute in the numgen structure to allow
the lookup of an element based on the number generator as a key.
For this purpose, different ops have been included to extend the
current numgen inc functions.
Currently, only supported for numgen incremental operations, but
it will be supported for random in a follow-up patch.
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)
- SPDX license tag cleanups (new tag Linux-OpenIB)
- RoCE GID fixes related to default GIDs
- Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
mlx4, mlx5, and hfi1 (medium batch)
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJa7JXPAAoJELgmozMOVy/dc0AP/0i7EajAmgl1ihka6BYVj2pa
DV8iSrVMDPulh9AVnAtwLJSbdwmgN/HeVzLzcutHyMYk6tAf8RCs6TsyoB36XiOL
oUh5+V2GyNnyh9veWPwyGTgZKCpPJc3uQFV6502lZVDYwArMfGatumApBgQVKiJ+
YdPEXEQZPNIs6YZB1WXkNYV/ra9u0aBByQvUrxwVZ2AND+srJYO82tqZit2wBtjK
UXrhmZbWXGWMFg8K3/lpfUkQhkG3Arj+tMKkCfqsVzC7wUPhlTKBHR9NmvdLIiy9
5Vhv7Xp78udcxZKtUeTFsbhaMqqK7x7sKHnpKAs7hOZNZ/Eg47BrMwMrZVLOFuDF
nBLUL1H+nJ1mASZoMWH5xzOpVew+e9X0cot09pVDBIvsOIh97wCG7hgptQ2Z5xig
fcDiMmg6tuakMsaiD0dzC9JI5HR6Z7+6oR1tBkQFDxQ+XkkcoFabdmkJaIRRwOj7
CUhXRgcm0UgVd03Jdta6CtYXsjSODirWg4AvSSMt9lUFpjYf9WZP00/YojcBbBEH
UlVrPbsKGyncgrm3FUP6kXmScESfdTljTPDLiY9cO9+bhhPGo1OHf005EfAp178B
jGp6hbKlt+rNs9cdXrPSPhjds+QF8HyfSlwyYVWKw8VWlh/5DG8uyGYjF05hYO0q
xhjIS6/EZjcTAh5e4LzR
=PI8v
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"This is our first pull request of the rc cycle. It's not that it's
been overly quiet, we were just waiting on a few things before sending
this off.
For instance, the 6 patch series from Intel for the hfi1 driver had
actually been pulled in on Tuesday for a Wednesday pull request, only
to have Jason notice something I missed, so we held off for some
testing, and then on Thursday had to respin the series because the
very first patch needed a minor fix (unnecessary cast is all).
There is a sizable hns patch series in here, as well as a reasonably
largish hfi1 patch series, then all of the lines of uapi updates are
just the change to the new official Linux-OpenIB SPDX tag (a bunch of
our files had what amounts to a BSD-2-Clause + MIT Warranty statement
as their license as a result of the initial code submission years ago,
and the SPDX folks decided it was unique enough to warrant a unique
tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4
and core/cache/cma updates to round out the bunch.
None of it was overly large by itself, but in the 2 1/2 weeks we've
been collecting patches, it has added up :-/.
As best I can tell, it's been through 0day (I got a notice about my
last for-next push, but not for my for-rc push, but Jason seems to
think that failure messages are prioritized and success messages not
so much). It's also been through linux-next. And yes, we did notice in
the context portion of the CMA query gid fix patch that there is a
dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage
and remove it anywhere we can.
Summary:
- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)
- SPDX license tag cleanups (new tag Linux-OpenIB)
- RoCE GID fixes related to default GIDs
- Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
mlx4, mlx5, and hfi1 (medium batch)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
RDMA/cma: Do not query GID during QP state transition to RTR
IB/mlx4: Fix integer overflow when calculating optimal MTT size
IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
IB/hfi1: Fix loss of BECN with AHG
IB/hfi1 Use correct type for num_user_context
IB/hfi1: Fix handling of FECN marked multicast packet
IB/core: Make ib_mad_client_id atomic
iw_cxgb4: Atomically flush per QP HW CQEs
IB/uverbs: Fix kernel crash during MR deregistration flow
IB/uverbs: Prevent reregistration of DM_MR to regular MR
RDMA/mlx4: Add missed RSS hash inner header flag
RDMA/hns: Fix a couple misspellings
RDMA/hns: Submit bad wr
RDMA/hns: Update assignment method for owner field of send wqe
RDMA/hns: Adjust the order of cleanup hem table
RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
RDMA/hns: Remove some unnecessary attr_mask judgement
RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
...
If a seccomp user is not interested in Speculative Store Bypass mitigation
by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when
adding filters.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For certain use cases it is desired to enforce mitigations so they cannot
be undone afterwards. That's important for loader stubs which want to
prevent a child from disabling the mitigation again. Will also be used for
seccomp(). The extra state preserving of the prctl state for SSB is a
preparatory step for EBPF dymanic speculation control.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds a small BPF helper similar to bpf_skb_load_bytes() that
is able to load relative to mac/net header offset from the skb's
linear data. Compared to bpf_skb_load_bytes(), it takes a fifth
argument namely start_header, which is either BPF_HDR_START_MAC
or BPF_HDR_START_NET. This allows for a more flexible alternative
compared to LD_ABS/LD_IND with negative offset. It's enabled for
tc BPF programs as well as sock filter program types where it's
mainly useful in reuseport programs to ease access to lower header
data.
Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2017-March/000698.html
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In this commit, a new getsockopt is added: XDP_STATISTICS. This is
used to obtain stats from the sockets.
v2: getsockopt now returns size of stats structure.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Another setsockopt (XDP_TX_QUEUE) is added to let the process allocate
a queue, where the user process can pass frames to be transmitted by
the kernel.
The mmapping of the queue is done using the XDP_PGOFF_TX_QUEUE offset.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Here, we add another setsockopt for registered user memory (umem)
called XDP_UMEM_COMPLETION_QUEUE. Using this socket option, the
process can ask the kernel to allocate a queue (ring buffer) and also
mmap it (XDP_UMEM_PGOFF_COMPLETION_QUEUE) into the process.
The queue is used to explicitly pass ownership of umem frames from the
kernel to user process. This will be used by the TX path to tell user
space that a certain frame has been transmitted and user space can use
it for something else, if it wishes.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The xskmap is yet another BPF map, very much inspired by
dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application
adds AF_XDP sockets into the map, and by using the bpf_redirect_map
helper, an XDP program can redirect XDP frames to an AF_XDP socket.
Note that a socket that is bound to certain ifindex/queue index will
*only* accept XDP frames from that netdev/queue index. If an XDP
program tries to redirect from a netdev/queue index other than what
the socket is bound to, the frame will not be received on the socket.
A socket can reside in multiple maps.
v3: Fixed race and simplified code.
v2: Removed one indirection in map lookup.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Here, the bind syscall is added. Binding an AF_XDP socket, means
associating the socket to an umem, a netdev and a queue index. This
can be done in two ways.
The first way, creating a "socket from scratch". Create the umem using
the XDP_UMEM_REG setsockopt and an associated fill queue with
XDP_UMEM_FILL_QUEUE. Create the Rx queue using the XDP_RX_QUEUE
setsockopt. Call bind passing ifindex and queue index ("channel" in
ethtool speak).
The second way to bind a socket, is simply skipping the
umem/netdev/queue index, and passing another already setup AF_XDP
socket. The new socket will then have the same umem/netdev/queue index
as the parent so it will share the same umem. You must also set the
flags field in the socket address to XDP_SHARED_UMEM.
v2: Use PTR_ERR instead of passing error variable explicitly.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Another setsockopt (XDP_RX_QUEUE) is added to let the process allocate
a queue, where the kernel can pass completed Rx frames from the kernel
to user process.
The mmapping of the queue is done using the XDP_PGOFF_RX_QUEUE offset.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Here, we add another setsockopt for registered user memory (umem)
called XDP_UMEM_FILL_QUEUE. Using this socket option, the process can
ask the kernel to allocate a queue (ring buffer) and also mmap it
(XDP_UMEM_PGOFF_FILL_QUEUE) into the process.
The queue is used to explicitly pass ownership of umem frames from the
user process to the kernel. These frames will in a later patch be
filled in with Rx packet data by the kernel.
v2: Fixed potential crash in xsk_mmap.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In this commit the base structure of the AF_XDP address family is set
up. Further, we introduce the abilty register a window of user memory
to the kernel via the XDP_UMEM_REG setsockopt syscall. The memory
window is viewed by an AF_XDP socket as a set of equally large
frames. After a user memory registration all frames are "owned" by the
user application, and not the kernel.
v2: More robust checks on umem creation and unaccount on error.
Call set_page_dirty_lock on cleanup.
Simplified xdp_umem_reg.
Co-authored-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add two new prctls to control aspects of speculation related vulnerabilites
and their mitigations to provide finer grained control over performance
impacting mitigations.
PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature
which is selected with arg2 of prctl(2). The return value uses bit 0-2 with
the following meaning:
Bit Define Description
0 PR_SPEC_PRCTL Mitigation can be controlled per task by
PR_SET_SPECULATION_CTRL
1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is
disabled
2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is
enabled
If all bits are 0 the CPU is not affected by the speculation misfeature.
If PR_SPEC_PRCTL is set, then the per task control of the mitigation is
available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation
misfeature will fail.
PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which
is selected by arg2 of prctl(2) per task. arg3 is used to hand in the
control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE.
The common return values are:
EINVAL prctl is not implemented by the architecture or the unused prctl()
arguments are not 0
ENODEV arg2 is selecting a not supported speculation misfeature
PR_SET_SPECULATION_CTRL has these additional return values:
ERANGE arg3 is incorrect, i.e. it's not either PR_SPEC_ENABLE or PR_SPEC_DISABLE
ENXIO prctl control of the selected speculation misfeature is disabled
The first supported controlable speculation misfeature is
PR_SPEC_STORE_BYPASS. Add the define so this can be shared between
architectures.
Based on an initial patch from Tim Chen and mostly rewritten.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This is the io_getevents equivalent of ppoll/pselect and allows to
properly mix signals and aio completions (especially with IOCB_CMD_POLL)
and atomically executes the following sequence:
sigset_t origmask;
pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
ret = io_getevents(ctx, min_nr, nr, events, timeout);
pthread_sigmask(SIG_SETMASK, &origmask, NULL);
Note that unlike many other signal related calls we do not pass a sigmask
size, as that would get us to 7 arguments, which aren't easily supported
by the syscall infrastructure. It seems a lot less painful to just add a
new syscall variant in the unlikely case we're going to increase the
sigset size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read from the socket.
The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.
Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.
Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.
With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.
V3 change-log:
As suggested by David Miller, added loads with barrier
to check whether we have multiple threads calling recvmsg
in parallel. When that happens we lock the socket to
calculate inq.
V4 change-log:
Removed inline from a static function.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The intention is to get notified of process failures as soon
as possible, before a possible core dumping (which could be very long)
(e.g. in some process-manager). Coredump and exit process events
are perfect for such use cases (see 2b5faa4c55 "connector: Added
coredumping event to the process connector").
The problem is that for now the process-manager cannot know the parent
of a dying process using connectors. This could be useful if the
process-manager should monitor for failures only children of certain
parents, so we could filter the coredump and exit events by parent
process and/or thread ID.
Add parent pid and tgid to coredump and exit process connectors event
data.
Signed-off-by: Stefan Strogin <sstrogin@cisco.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This resolves the merge issue with drivers/usb/core/hcd.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix formatting (indent) for bpf_get_stack() helper documentation, so
that the doc is rendered correctly with the Python script.
Fixes: c195651e56 ("bpf: add bpf_get_stack helper")
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Some edits brought to the last iteration of BPF helper functions
documentation introduced an error with RST formatting. As a result, most
of one paragraph is rendered in bold text when only the name of a helper
should be. Fix it, and fix formatting of another function name in the
same paragraph.
Fixes: c6b5fb8690 ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.
Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.
1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
This operation does not involve any TCP locking.
2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
the transfert of pages from skbs to one VMA.
This operation only uses down_read(¤t->mm->mmap_sem) after
holding TCP lock, thus solving the lockdep issue.
This new implementation was suggested by Andy Lutomirski with great details.
Benefits are :
- Better scalability, in case multiple threads reuse VMAS
(without mmap()/munmap() calls) since mmap_sem wont be write locked.
- Better error recovery.
The previous mmap() model had to provide the expected size of the
mapping. If for some reason one part could not be mapped (partial MSS),
the whole operation had to be aborted.
With the tcp_zerocopy_receive struct, kernel can report how
many bytes were successfuly mapped, and how many bytes should
be read to skip the problematic sequence.
- No more memory allocation to hold an array of page pointers.
16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
- skbs are freed while mmap_sem has been released
Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)
Note that memcg might require additional changes.
Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull timer fixes from Thomas Gleixner:
"Two fixes from the timer departement:
- Fix a long standing issue in the NOHZ tick code which causes RB
tree corruption, delayed timers and other malfunctions. The cause
for this is code which modifies the expiry time of an enqueued
hrtimer.
- Revert the CLOCK_MONOTONIC/CLOCK_BOOTTIME unification due to
regression reports. Seems userspace _is_ relying on the documented
behaviour despite our hope that it wont"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME
tick/sched: Do not mess with an enqueued hrtimer
Helpers may operate on two types of ctx structures: user visible ones
(e.g. `struct bpf_sock_ops`) when used in user programs, and kernel ones
(e.g. `struct bpf_sock_ops_kern`) in kernel implementation.
UAPI documentation must refer to only user visible structures.
The patch replaces references to `_kern` structures in BPF helpers
description by corresponding user visible structures.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, stackmap and bpf_get_stackid helper are provided
for bpf program to get the stack trace. This approach has
a limitation though. If two stack traces have the same hash,
only one will get stored in the stackmap table,
so some stack traces are missing from user perspective.
This patch implements a new helper, bpf_get_stack, will
send stack traces directly to bpf program. The bpf program
is able to see all stack traces, and then can do in-kernel
processing or send stack traces to user space through
shared map or bpf_perf_event_output.
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ARM:
- PSCI selection API, a leftover from 4.16 (for stable)
- Kick vcpu on active interrupt affinity change
- Plug a VMID allocation race on oversubscribed systems
- Silence debug messages
- Update Christoffer's email address (linaro -> arm)
x86:
- Expose userspace-relevant bits of a newly added feature
- Fix TLB flushing on VMX with VPID, but without EPT
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJa44lQAAoJEED/6hsPKofo1dIH/3n9AZSWvavgL2V3j6agT8Yy
hxF4nHCFEJd5aqDNwbG9QEzivKw88r3o3mdB2XAQESB2MlCYR1jkTONm7yvVJTs/
/P9gj+DEQbCj2AgT//u3BGsAsZDKFhB9JwfmV2Mp4zDIqWFa6oCOGeq/iPVAGDcN
vUpuYeIicuH9SRoxH7de3z+BEXW0O+gCABXQtvA93FKTMz35yFTgmbDVCnvaV0zL
3B+3/4/jdbTRICW8EX6Li43+gEBUMtnVNkdqxLPTuCtDG8iuPUGfgF02gH99/9gj
hliV3Q4VUZKkSABW5AqKPe4+9rbsHCh9eL0LpHFGI9y+6LeUIOXAX4CtohR8gWE=
=W9Vz
-----END PGP SIGNATURE-----
rMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- PSCI selection API, a leftover from 4.16 (for stable)
- Kick vcpu on active interrupt affinity change
- Plug a VMID allocation race on oversubscribed systems
- Silence debug messages
- Update Christoffer's email address (linaro -> arm)
x86:
- Expose userspace-relevant bits of a newly added feature
- Fix TLB flushing on VMX with VPID, but without EPT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
x86/headers/UAPI: Move DISABLE_EXITS KVM capability bits to the UAPI
kvm: apic: Flush TLB after APIC mode/address change if VPIDs are in use
arm/arm64: KVM: Add PSCI version selection API
KVM: arm/arm64: vgic: Kick new VCPU on interrupt migration
arm64: KVM: Demote SVE and LORegion warnings to debug only
MAINTAINERS: Update e-mail address for Christoffer Dall
KVM: arm/arm64: Close VMID generation race
The Link Control 2 register is missing macros for Target Link Speeds. Add
those in.
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
[bhelgaas: use "GT" instead of "GB"]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Move DISABLE_EXITS KVM capability bits to the UAPI just like the rest of
capabilities.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Here are 2 staging driver fixups for 4.17-rc3.
The first is the remaining stragglers of the irda code removal that you
pointed out during the merge window. The second is a fix for the
wilc1000 driver due to a patch that got merged in 4.17-rc1.
Both of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWuMyew8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymXxACffYtMbj0Vg5pD0yAPqRzJ2iVMVE0AnRkp4BYQ
kXgAjDeSyrdKPUwQ7Hl2
=UNuF
-----END PGP SIGNATURE-----
Merge tag 'staging-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are two staging driver fixups for 4.17-rc3.
The first is the remaining stragglers of the irda code removal that
you pointed out during the merge window. The second is a fix for the
wilc1000 driver due to a patch that got merged in 4.17-rc1.
Both of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: wilc1000: fix NULL pointer exception in host_int_parse_assoc_resp_info()
staging: irda: remove remaining remants of irda code removal
After the introduction of a 128-bit node identity it may be difficult
for a user to correlate between this identity and the generated node
hash address.
We now try to make this easier by introducing a new ioctl() call for
fetching a node identity by using the hash value as key. This will
be particularly useful when we extend some of the commands in the
'tipc' tool, but we also expect regular user applications to need
this feature.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-04-27
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add extensive BPF helper description into include/uapi/linux/bpf.h
and a new script bpf_helpers_doc.py which allows for generating a
man page out of it. Thus, every helper in BPF now comes with proper
function signature, detailed description and return code explanation,
from Quentin.
2) Migrate the BPF collect metadata tunnel tests from BPF samples over
to the BPF selftests and further extend them with v6 vxlan, geneve
and ipip tests, simplify the ipip tests, improve documentation and
convert to bpf_ntoh*() / bpf_hton*() api, from William.
3) Currently, helpers that expect ARG_PTR_TO_MAP_{KEY,VALUE} can only
access stack and packet memory. Extend this to allow such helpers
to also use map values, which enabled use cases where value from
a first lookup can be directly used as a key for a second lookup,
from Paul.
4) Add a new helper bpf_skb_get_xfrm_state() for tc BPF programs in
order to retrieve XFRM state information containing SPI, peer
address and reqid values, from Eyal.
5) Various optimizations in nfp driver's BPF JIT in order to turn ADD
and SUB instructions with negative immediate into the opposite
operation with a positive immediate such that nfp can better fit
small immediates into instructions. Savings in instruction count
up to 4% have been observed, from Jakub.
6) Add the BPF prog's gpl_compatible flag to struct bpf_prog_info
and add support for dumping this through bpftool, from Jiri.
7) Move the BPF sockmap samples over into BPF selftests instead since
sockmap was rather a series of tests than sample anyway and this way
this can be run from automated bots, from John.
8) Follow-up fix for bpf_adjust_tail() helper in order to make it work
with generic XDP, from Nikita.
9) Some follow-up cleanups to BTF, namely, removing unused defines from
BTF uapi header and renaming 'name' struct btf_* members into name_off
to make it more clear they are offsets into string section, from Martin.
10) Remove test_sock_addr from TEST_GEN_PROGS in BPF selftests since
not run directly but invoked from test_sock_addr.sh, from Yonghong.
11) Remove redundant ret assignment in sample BPF loader, from Wang.
12) Add couple of missing files to BPF selftest's gitignore, from Anders.
There are two trivial merge conflicts while pulling:
1) Remove samples/sockmap/Makefile since all sockmap tests have been
moved to selftests.
2) Add both hunks from tools/testing/selftests/bpf/.gitignore to the
file since git should ignore all of them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
I don't know why signalfd has never grown support for SIGSYS but grow it now.
This corrects an oversight and removes a need for a default in the
switch statement. Allowing gcc to warn when future members are added
to the enum siginfo_layout, and signalfd does not handle them.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Dominique Martinet reported a TCP hang problem when simultaneous open was used.
The problem is that the tcp_conntracks state table is not smart enough
to handle the case. The state table could be fixed by introducing a new state,
but that would require more lines of code compared to this patch, due to the
required backward compatibility with ctnetlink.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Tested-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions:
Helper from Nikita:
- bpf_xdp_adjust_tail()
Helper from Eyal:
- bpf_skb_get_xfrm_state()
v4:
- New patch (helpers did not exist yet for previous versions).
Cc: Nikita V. Shirokov <tehnerd@tehnerd.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by John:
- bpf_redirect_map()
- bpf_sk_redirect_map()
- bpf_sock_map_update()
- bpf_msg_redirect_map()
- bpf_msg_apply_bytes()
- bpf_msg_cork_bytes()
- bpf_msg_pull_data()
v4:
- bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED",
"his" to "this". Also add a paragraph on performance improvement over
bpf_redirect() helper.
v3:
- bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag.
- bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag.
- bpf_redirect_map(): Fix note on CPU redirection, not fully implemented
for generic XDP but supported on native XDP.
- bpf_msg_pull_data(): Clarify comment about invalidated verifier
checks.
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions:
Helpers from Lawrence:
- bpf_setsockopt()
- bpf_getsockopt()
- bpf_sock_ops_cb_flags_set()
Helpers from Yonghong:
- bpf_perf_event_read_value()
- bpf_perf_prog_read_value()
Helper from Josef:
- bpf_override_return()
Helper from Andrey:
- bpf_bind()
v4:
- bpf_perf_event_read_value(): State that this helper should be
preferred over bpf_perf_event_read().
v3:
- bpf_perf_event_read_value(): Fix time of selection for perf event type
in description. Remove occurences of "cores" to avoid confusion with
"CPU".
- bpf_bind(): Remove last paragraph of description, which was off topic.
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
[for bpf_perf_event_read_value(), bpf_perf_prog_read_value()]
Acked-by: Andrey Ignatov <rdna@fb.com>
[for bpf_bind()]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions:
Helper from Kaixu:
- bpf_perf_event_read()
Helpers from Martin:
- bpf_skb_under_cgroup()
- bpf_xdp_adjust_head()
Helpers from Sargun:
- bpf_probe_write_user()
- bpf_current_task_under_cgroup()
Helper from Thomas:
- bpf_skb_change_head()
Helper from Gianluca:
- bpf_probe_read_str()
Helpers from Chenbo:
- bpf_get_socket_cookie()
- bpf_get_socket_uid()
v4:
- bpf_perf_event_read(): State that bpf_perf_event_read_value() should
be preferred over this helper.
- bpf_skb_change_head(): Clarify comment about invalidated verifier
checks.
- bpf_xdp_adjust_head(): Clarify comment about invalidated verifier
checks.
- bpf_probe_write_user(): Add that dst must be a valid user space
address.
- bpf_get_socket_cookie(): Improve description by making clearer that
the cockie belongs to the socket, and state that it remains stable for
the life of the socket.
v3:
- bpf_perf_event_read(): Fix time of selection for perf event type in
description. Remove occurences of "cores" to avoid confusion with
"CPU".
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Gianluca Borello <g.borello@gmail.com>
Cc: Chenbo Feng <fengc@google.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
[for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by Daniel:
- bpf_get_hash_recalc()
- bpf_skb_change_tail()
- bpf_skb_pull_data()
- bpf_csum_update()
- bpf_set_hash_invalid()
- bpf_get_numa_node_id()
- bpf_set_hash()
- bpf_skb_adjust_room()
- bpf_xdp_adjust_meta()
v4:
- bpf_skb_change_tail(): Clarify comment about invalidated verifier
checks.
- bpf_skb_pull_data(): Clarify the motivation for using this helper or
bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for
*skb*. Clarify comment about invalidated verifier checks.
- bpf_csum_update(): Fix description of checksum (entire packet, not IP
checksum). Fix a typo: "header" instead of "helper".
- bpf_set_hash_invalid(): Mention bpf_get_hash_recalc().
- bpf_get_numa_node_id(): State that the helper is not restricted to
programs attached to sockets.
- bpf_skb_adjust_room(): Clarify comment about invalidated verifier
checks.
- bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier
checks.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by Daniel:
- bpf_get_prandom_u32()
- bpf_get_smp_processor_id()
- bpf_get_cgroup_classid()
- bpf_get_route_realm()
- bpf_skb_load_bytes()
- bpf_csum_diff()
- bpf_skb_get_tunnel_opt()
- bpf_skb_set_tunnel_opt()
- bpf_skb_change_proto()
- bpf_skb_change_type()
v4:
- bpf_get_prandom_u32(): Warn that the prng is not cryptographically
secure.
- bpf_get_smp_processor_id(): Fix a typo (case).
- bpf_get_cgroup_classid(): Clarify description. Add notes on the helper
being limited to cgroup v1, and to egress path.
- bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid().
Add a note about usage with TC and advantage of clsact. Fix a typo in
return value ("sdb" instead of "skb").
- bpf_skb_load_bytes(): Make explicit loading large data loads it to the
eBPF stack.
- bpf_csum_diff(): Add a note on seed that can be cascaded. Link to
bpf_l3|l4_csum_replace().
- bpf_skb_get_tunnel_opt(): Add a note about usage with "collect
metadata" mode, and example of this with Geneve.
- bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt()
description.
- bpf_skb_change_proto(): Mention that the main use case is NAT64.
Clarify comment about invalidated verifier checks.
v3:
- bpf_get_prandom_u32(): Fix helper name :(. Add description, including
a note on the internal random state.
- bpf_get_smp_processor_id(): Add description, including a note on the
processor id remaining stable during program run.
- bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
required to use the helper. Add a reference to related documentation.
State that placing a task in net_cls controller disables cgroup-bpf.
- bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is
required to use this helper.
- bpf_skb_load_bytes(): Fix comment on current use cases for the helper.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by Alexei:
- bpf_get_current_pid_tgid()
- bpf_get_current_uid_gid()
- bpf_get_current_comm()
- bpf_skb_vlan_push()
- bpf_skb_vlan_pop()
- bpf_skb_get_tunnel_key()
- bpf_skb_set_tunnel_key()
- bpf_redirect()
- bpf_perf_event_output()
- bpf_get_stackid()
- bpf_get_current_task()
v4:
- bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add
note on bpf_redirect_map() providing better performance. Replace "Save
for" with "Except for".
- bpf_skb_vlan_push(): Clarify comment about invalidated verifier
checks.
- bpf_skb_vlan_pop(): Clarify comment about invalidated verifier
checks.
- bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata"
mode, and example tunneling protocols with which it can be used.
- bpf_skb_set_tunnel_key(): Add a reference to the description of
bpf_skb_get_tunnel_key().
- bpf_perf_event_output(): Specify that, and for what purpose, the
helper can be used with programs attached to TC and XDP.
v3:
- bpf_skb_get_tunnel_key(): Change and improve description and example.
- bpf_redirect(): Improve description of BPF_F_INGRESS flag.
- bpf_perf_event_output(): Fix first sentence of description. Delete
wrong statement on context being evaluated as a struct pt_reg. Remove
the long yet incomplete example.
- bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
configurable.
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add documentation for eBPF helper functions to bpf.h user header file.
This documentation can be parsed with the Python script provided in
another commit of the patch series, in order to provide a RST document
that can later be converted into a man page.
The objective is to make the documentation easily understandable and
accessible to all eBPF developers, including beginners.
This patch contains descriptions for the following helper functions, all
written by Alexei:
- bpf_map_lookup_elem()
- bpf_map_update_elem()
- bpf_map_delete_elem()
- bpf_probe_read()
- bpf_ktime_get_ns()
- bpf_trace_printk()
- bpf_skb_store_bytes()
- bpf_l3_csum_replace()
- bpf_l4_csum_replace()
- bpf_tail_call()
- bpf_clone_redirect()
v4:
- bpf_map_lookup_elem(): Add "const" qualifier for key.
- bpf_map_update_elem(): Add "const" qualifier for key and value.
- bpf_map_lookup_elem(): Add "const" qualifier for key.
- bpf_skb_store_bytes(): Clarify comment about invalidated verifier
checks.
- bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note
about bpf_csum_diff().
- bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a
note about bpf_csum_diff().
- bpf_tail_call(): Bring minor edits to description.
- bpf_clone_redirect(): Add a note about the relation with
bpf_redirect(). Also clarify comment about invalidated verifier
checks.
v3:
- bpf_map_lookup_elem(): Fix description of restrictions for flags
related to the existence of the entry.
- bpf_trace_printk(): State that trace_pipe can be configured. Fix
return value in case an unknown format specifier is met. Add a note on
kernel log notice when the helper is used. Edit example.
- bpf_tail_call(): Improve comment on stack inheritance.
- bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Remove previous "overview" of eBPF helpers from user bpf.h header.
Replace it by a comment explaining how to process the new documentation
(to come in following patches) with a Python script to produce RST, then
man page documentation.
Also add the aforementioned Python script under scripts/. It is used to
process include/uapi/linux/bpf.h and to extract helper descriptions, to
turn it into a RST document that can further be processed with rst2man
to produce a man page. The script takes one "--filename <path/to/file>"
option. If the script is launched from scripts/ in the kernel root
directory, it should be able to find the location of the header to
parse, and "--filename <path/to/file>" is then optional. If it cannot
find the file, then the option becomes mandatory. RST-formatted
documentation is printed to standard output.
Typical workflow for producing the final man page would be:
$ ./scripts/bpf_helpers_doc.py \
--filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
$ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
$ man /tmp/bpf-helpers.7
Note that the tool kernel-doc cannot be used to document eBPF helpers,
whose signatures are not available directly in the header files
(pre-processor directives are used to produce them at the beginning of
the compilation process).
v4:
- Also remove overviews for newly added bpf_xdp_adjust_tail() and
bpf_skb_get_xfrm_state().
- Remove vague statement about what helpers are restricted to GPL
programs in "LICENSE" section for man page footer.
- Replace license boilerplate with SPDX tag for Python script.
v3:
- Change license for man page.
- Remove "for safety reasons" from man page header text.
- Change "packets metadata" to "packets" in man page header text.
- Move and fix comment on helpers introducing no overhead.
- Remove "NOTES" section from man page footer.
- Add "LICENSE" section to man page footer.
- Edit description of file include/uapi/linux/bpf.h in man page footer.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Adding gpl_compatible flag to struct bpf_prog_info
so it can be dumped via bpf_prog_get_info_by_fd and
displayed via bpftool progs dump.
Alexei noticed 4-byte hole in struct bpf_prog_info,
so we put the u32 flags field in there, and we can
keep adding bit fields in there without breaking
user space.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Support generic segmentation offload for udp datagrams. Callers can
concatenate and send at once the payload of multiple datagrams with
the same destination.
To set segment size, the caller sets socket option UDP_SEGMENT to the
length of each discrete payload. This value must be smaller than or
equal to the relevant MTU.
A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
per send call basis.
Total byte length may then exceed MTU. If not an exact multiple of
segment size, the last segment will be shorter.
The implementation adds a gso_size field to the udp socket, ip(v6)
cmsg cookie and inet_cork structure to be able to set the value at
setsockopt or cmsg time and to work with both lockless and corked
paths.
Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.
tcp tso
3197 MB/s 54232 msg/s 54232 calls/s
6,457,754,262 cycles
tcp gso
1765 MB/s 29939 msg/s 29939 calls/s
11,203,021,806 cycles
tcp without tso/gso *
739 MB/s 12548 msg/s 12548 calls/s
11,205,483,630 cycles
udp
876 MB/s 14873 msg/s 624666 calls/s
11,205,777,429 cycles
udp gso
2139 MB/s 36282 msg/s 36282 calls/s
11,204,374,561 cycles
[*] after reverting commit 0a6b2a1dc2
("tcp: switch to GSO being always on")
Measured total system cycles ('-a') for one core while pinning both
the network receive path and benchmark process to that core:
perf stat -a -C 12 -e cycles \
./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4
Note the reduction in calls/s with GSO. Bytes per syscall drops
increases from 1470 to 61818.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert commits
92af4dcb4e ("tracing: Unify the "boot" and "mono" tracing clocks")
127bfa5f43 ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
7250a4047a ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
d6c7270e91 ("timekeeping: Remove boot time specific code")
f2d6fdbfd2 ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
d6ed449afd ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
72199320d4 ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")
As stated in the pull request for the unification of CLOCK_MONOTONIC and
CLOCK_BOOTTIME, it was clear that we might have to revert the change.
As reported by several folks systemd and other applications rely on the
documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
changes. After resume daemons time out and other timeout related issues are
observed. Rafael compiled this list:
* systemd kills daemons on resume, after >WatchdogSec seconds
of suspending (Genki Sky). [Verified that that's because systemd uses
CLOCK_MONOTONIC and expects it to not include the suspend time.]
* systemd-journald misbehaves after resume:
systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
corrupted or uncleanly shut down, renaming and replacing.
(Mike Galbraith).
* NetworkManager reports "networking disabled" and networking is broken
after resume 50% of the time (Pavel). [May be because of systemd.]
* MATE desktop dims the display and starts the screensaver right after
system resume (Pavel).
* Full system hang during resume (me). [May be due to systemd or NM or both.]
That happens on debian and open suse systems.
It's sad, that these problems were neither catched in -next nor by those
folks who expressed interest in this change.
Reported-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Reported-by: Genki Sky <sky@genki.is>,
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
This commit introduces a helper which allows fetching xfrm state
parameters by eBPF programs attached to TC.
Prototype:
bpf_skb_get_xfrm_state(skb, index, xfrm_state, size, flags)
skb: pointer to skb
index: the index in the skb xfrm_state secpath array
xfrm_state: pointer to 'struct bpf_xfrm_state'
size: size of 'struct bpf_xfrm_state'
flags: reserved for future extensions
The helper returns 0 on success. Non zero if no xfrm state at the index
is found - or non exists at all.
struct bpf_xfrm_state currently includes the SPI, peer IPv4/IPv6
address and the reqid; it can be further extended by adding elements to
its end - indicating the populated fields by the 'size' argument -
keeping backwards compatibility.
Typical usage:
struct bpf_xfrm_state x = {};
bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0);
...
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jason Wang points out that it's very hard for users to build an array of
stat names. The naive thing is to use VIRTIO_BALLOON_S_NR but that
breaks if we add more stats - as done e.g. recently by commit 6c64fe7f2
("virtio_balloon: export hugetlb page allocation counts").
Let's add an array of reasonably readable names.
Fixes: 6c64fe7f2 ("virtio_balloon: export hugetlb page allocation counts")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jonathan Helman <jonathan.helman@oracle.com>
ebt_get_target similar to {ip/ip6/arp}t_get_target.
and ebt_get_target_c similar to {ip/ip6/arp}t_get_target_c.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is a patch proposal to support shifted ranges in portmaps. (i.e. tcp/udp
incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100)
Currently DNAT only works for single port or identical port ranges. (i.e.
ports 5000-5100 on WAN interface redirected to a LAN host while original
destination port is not altered) When different port ranges are configured,
either 'random' mode should be used, or else all incoming connections are
mapped onto the first port in the redirect range. (in described example
WAN:5000-5100 will all be mapped to 192.168.1.5:2000)
This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
which uses a base port value to calculate an offset with the destination port
present in the incoming stream. That offset is then applied as index within the
redirect port range (index modulo rangewidth to handle range overflow).
In described example the base port would be 5000. An incoming stream with
destination port 5004 would result in an offset value 4 which means that the
NAT'ed stream will be using destination port 2004.
Other possibilities include deterministic mapping of larger or multiple ranges
to a smaller range : WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
51xx)
This patch does not change any current behavior. It just adds new NAT proto
range functionality which must be selected via the specific flag when intended
to use.
A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
which makes this functionality immediately available.
Signed-off-by: Thierry Du Tre <thierry@dtsystems.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Based on discussion with Kate Stewart this license is not a
BSD-2-Clause, but is now formally identified as Linux-OpenIB
by SPDX.
The key difference between the licenses is in the 'warranty'
paragraph.
if_infiniband.h refers to the 'OpenIB.org' license, but
does not include the text, instead it links to an obsolete
web site that contains a license that matches the BSD-2-Clause
SPX. There is no 'three clause' version of the OpenIB.org
license.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch cleans up btf.h in uapi:
1) Rename "name" to "name_off" to better reflect it is an offset to the
string section instead of a char array.
2) Remove unused value BTF_FLAGS_COMPR and BTF_MAGIC_SWAP
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull perf fixes from Thomas Gleixner:
"A larger set of updates for perf.
Kernel:
- Handle the SBOX uncore monitoring correctly on Broadwell CPUs which
do not have SBOX.
- Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]. The
percentage of preempting and non-preempting context switches help
understanding the nature of workloads (CPU or IO bound) that are
running on a machine. This adds the kernel facility and userspace
changes needed to show this information in 'perf script' and 'perf
report -D' (Alexey Budankov)
- Remove a WARN_ON() in the trace/kprobes code which is pointless
because the return error code is already telling the caller what's
wrong.
- Revert a fugly workaround for clang BPF targets.
- Fix sample_max_stack maximum check and do not proceed when an error
has been detect, return them to avoid misidentifying errors (Jiri
Olsa)
- Add SPDX idenitifiers and get rid of GPL boilderplate.
Tools:
- Synchronize kernel ABI headers, v4.17-rc1 (Ingo Molnar)
- Support MAP_FIXED_NOREPLACE, noticed when updating the
tools/include/ copies (Arnaldo Carvalho de Melo)
- Add '\n' at the end of parse-options error messages (Ravi Bangoria)
- Add s390 support for detailed/verbose PMU event description (Thomas
Richter)
- perf annotate fixes and improvements:
* Allow showing offsets in more than just jump targets, use the
new 'O' hotkey in the TUI, config ~/.perfconfig
annotate.offset_level for it and for --stdio2 (Arnaldo Carvalho
de Melo)
* Use the resolved variable names from objdump disassembled lines
to make them more compact, just like was already done for some
instructions, like "mov", this eventually will be done more
generally, but lets now add some more to the existing mechanism
(Arnaldo Carvalho de Melo)
- perf record fixes:
* Change warning for missing topology sysfs entry to debug, as not
all architectures have those files, s390 being one of those
(Thomas Richter)
* Remove old error messages about things that unlikely to be the
root cause in modern systems (Andi Kleen)
- perf sched fixes:
* Fix -g/--call-graph documentation (Takuya Yamamoto)
- perf stat:
* Enable 1ms interval for printing event counters values in
(Alexey Budankov)
- perf test fixes:
* Run dwarf unwind on arm32 (Kim Phillips)
* Remove unused ptrace.h include from LLVM test, sidesteping older
clang's lack of support for some asm constructs (Arnaldo
Carvalho de Melo)
* Fixup BPF test using epoll_pwait syscall function probe, to cope
with the syscall routines renames performed in this development
cycle (Arnaldo Carvalho de Melo)
- perf version fixes:
* Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version
--build-options' when HAVE_SYSCALL_TABLE_SUPPORT is true, as
libaudit won't be used in that case, print info about
syscall_table support instead (Jin Yao)
- Build system fixes:
* Use HAVE_..._SUPPORT used consistently (Jin Yao)
* Restore READ_ONCE() C++ compatibility in tools/include (Mark
Rutland)
* Give hints about package names needed to build jvmti (Arnaldo
Carvalho de Melo)"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs
perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"
coresight: Move to SPDX identifier
perf test BPF: Fixup BPF test using epoll_pwait syscall function probe
perf tests mmap: Show which tracepoint is failing
perf tools: Add '\n' at the end of parse-options error messages
perf record: Remove suggestion to enable APIC
perf record: Remove misleading error suggestion
perf hists browser: Clarify top/report browser help
perf mem: Allow all record/report options
perf trace: Support MAP_FIXED_NOREPLACE
perf: Remove superfluous allocation error check
perf: Fix sample_max_stack maximum check
perf: Return proper values for user stack errors
perf list: Add s390 support for detailed/verbose PMU event description
perf script: Extend misc field decoding with switch out event type
perf report: Extend raw dump (-D) out with switch out event type
perf/core: Store context switch out type in PERF_RECORD_SWITCH[_CPU_WIDE]
tools/headers: Synchronize kernel ABI headers, v4.17-rc1
trace_kprobe: Remove warning message "Could not insert probe at..."
...
USB 3.2 specification adds Dual-lane support, doubling the maximum
SuperSpeedPlus data rate from 10Gbps to 20Gbps.
Dual-lane takes into use a second set of rx and tx wires/pins in the
Type-C cable and connector.
Add "rx_lanes" and "tx_lanes" variables to struct usb_device to store
the numer of lanes in use. Number of lanes can be read using the extended
port status hub request that was introduced in USB 3.1.
Extended port status rx and tx lane count are zero based, maximum
lanes supported by non inter-chip (SSIC) USB 3.2 is 2 (dual lane) with
rx and tx lane count symmetric. SSIC devices support asymmetric lanes
up to 4 lanes per direction.
If extended port status is not available then default to one lane.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
unblock earlier than designed. Thanks to Jann Horn from Google's
Project Zero for pointing this out to me.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlrcCWAACgkQ8vlZVpUN
gaOedwf/e1OU7CXMiinCcGfsr5XydZrEivaqS9QmqAKsLzJSNDDu1Jw6N9cbWagp
OEIIRZdaFPImZHosEbjOW12Z3nxnlDC8jtOLyLIRGSA2u4RXd03RupHhQW4cE7ys
EOljEvK5KFDIlPa947R5/k4CzC4O3PGf1GdWhHmkENOgd23GqI/yOTKQq5Z5ZgAp
rZzcXiuCSq1QkLME7ElxoOLQhs+fYiVGoAM/maxLa+2g4M1Y/YlHBDGhG4RB4lLA
3zugbyJ15tNfgNuRvCB4x304WkCp5VDlcsCiKq18LFcrkz1SYGj5LwG/bswDqgkS
0mOtZKu68NhutX8Pcy4vY3iOmMa1/Q==
=RhHb
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull /dev/random fixes from Ted Ts'o:
"Fix some bugs in the /dev/random driver which causes getrandom(2) to
unblock earlier than designed.
Thanks to Jann Horn from Google's Project Zero for pointing this out
to me"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: add new ioctl RNDRESEEDCRNG
random: crng_reseed() should lock the crng instance that it is modifying
random: set up the NUMA crng instances after the CRNG is fully initialized
random: use a different mixing algorithm for add_device_randomness()
random: fix crng_ready() test
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-04-21
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Initial work on BPF Type Format (BTF) is added, which is a meta
data format which describes the data types of BPF programs / maps.
BTF has its roots from CTF (Compact C-Type format) with a number
of changes to it. First use case is to provide a generic pretty
print capability for BPF maps inspection, later work will also
add BTF to bpftool. pahole support to convert dwarf to BTF will
be upstreamed as well (https://github.com/iamkafai/pahole/tree/btf),
from Martin.
2) Add a new xdp_bpf_adjust_tail() BPF helper for XDP that allows
for changing the data_end pointer. Only shrinking is currently
supported which helps for crafting ICMP control messages. Minor
changes in drivers have been added where needed so they recalc
the packet's length also when data_end was adjusted, from Nikita.
3) Improve bpftool to make it easier to feed hex bytes via cmdline
for map operations, from Quentin.
4) Add support for various missing BPF prog types and attach types
that have been added to kernel recently but neither to bpftool
nor libbpf yet. Doc and bash completion updates have been added
as well for bpftool, from Andrey.
5) Proper fix for avoiding to leak info stored in frame data on page
reuse for the two bpf_xdp_adjust_{head,meta} helpers by disallowing
to move the pointers into struct xdp_frame area, from Jesper.
6) Follow-up compile fix from BTF in order to include stdbool.h in
libbpf, from Björn.
7) Few fixes in BPF sample code, that is, a typo on the netdevice
in a comment and fixup proper dump of XDP action code in the
tracepoint exception, from Wang and Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Make documentation on target-supported userspace-I/O design be
usable by kernel-doc by using "DOC:". This is used in the driver-api
Documentation chapter.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: linux-scsi@vger.kernel.org
Cc: target-devel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In previous commit, we changed the default emulated MTU for UDP bearers
to 14k.
This commit adds the functionality to set/change the default value
by configuring new MTU for UDP media. UDP bearer(s) have to be disabled
and enabled back for the new MTU to take effect.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, all bearers are configured with MTU value same as the
underlying L2 device. However, in case of bearers with media type
UDP, higher throughput is possible with a fixed and higher emulated
MTU value than adapting to the underlying L2 MTU.
In this commit, we introduce a parameter mtu in struct tipc_media
and a default value is set for UDP. A default value of 14k
was determined by experimentation and found to have a higher throughput
than 16k. MTU for UDP bearers are assigned the above set value of
media MTU.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the kernel now modifies the timeout, make it possible to retrieve
the current value.
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
This patch adds pretty print support to the basic arraymap.
Support for other bpf maps can be added later.
This patch adds new attrs to the BPF_MAP_CREATE command to allow
specifying the btf_fd, btf_key_id and btf_value_id. The
BPF_MAP_CREATE can then associate the btf to the map if
the creating map supports BTF.
A BTF supported map needs to implement two new map ops,
map_seq_show_elem() and map_check_btf(). This patch has
implemented these new map ops for the basic arraymap.
It also adds file_operations, bpffs_map_fops, to the pinned
map such that the pinned map can be opened and read.
After that, the user has an intuitive way to do
"cat bpffs/pathto/a-pinned-map" instead of getting
an error.
bpffs_map_fops should not be extended further to support
other operations. Other operations (e.g. write/key-lookup...)
should be realized by the userspace tools (e.g. bpftool) through
the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
Follow up patches will allow the userspace to obtain
the BTF from a map-fd.
Here is a sample output when reading a pinned arraymap
with the following map's value:
struct map_value {
int count_a;
int count_b;
};
cat /sys/fs/bpf/pinned_array_map:
0: {1,2}
1: {3,4}
2: {5,6}
...
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds a BPF_BTF_LOAD command which
1) loads and verifies the BTF (implemented in earlier patches)
2) returns a BTF fd to userspace. In the next patch, the
BTF fd can be specified during BPF_MAP_CREATE.
It currently limits to CAP_SYS_ADMIN.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch introduces BPF type Format (BTF).
BTF (BPF Type Format) is the meta data format which describes
the data types of BPF program/map. Hence, it basically focus
on the C programming language which the modern BPF is primary
using. The first use case is to provide a generic pretty print
capability for a BPF map.
BTF has its root from CTF (Compact C-Type format). To simplify
the handling of BTF data, BTF removes the differences between
small and big type/struct-member. Hence, BTF consistently uses u32
instead of supporting both "one u16" and "two u32 (+padding)" in
describing type and struct-member.
It also raises the number of types (and functions) limit
from 0x7fff to 0x7fffffff.
Due to the above changes, the format is not compatible to CTF.
Hence, BTF starts with a new BTF_MAGIC and version number.
This patch does the first verification pass to the BTF. The first
pass checks:
1. meta-data size (e.g. It does not go beyond the total btf's size)
2. name_offset is valid
3. Each BTF_KIND (e.g. int, enum, struct....) does its
own check of its meta-data.
Some other checks, like checking a struct's member is referring
to a valid type, can only be done in the second pass. The second
verification pass will be implemented in the next patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Export data delivered and delivered with CE marks to
1) SNMP TCPDelivered and TCPDeliveredCE
2) getsockopt(TCP_INFO)
3) Timestamping API SOF_TIMESTAMPING_OPT_STATS
Note that for SCM_TSTAMP_ACK, the delivery info in
SOF_TIMESTAMPING_OPT_STATS is reported before the info
was fully updated on the ACK.
These stats help application monitor TCP delivery and ECN status
on per host, per connection, even per message level.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's currently no limit on wiphy names, other than netlink
message size and memory limitations, but that causes issues when,
for example, the wiphy name is used in a uevent, e.g. in rfkill
where we use the same name for the rfkill instance, and then the
buffer there is "only" 2k for the environment variables.
This was reported by syzkaller, which used a 4k name.
Limit the name to something reasonable, I randomly picked 128.
Reported-by: syzbot+230d9e642a85d3fec29c@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The new struct __kernel_timespec is similar to current
internal kernel struct timespec64 on 64 bit architecture.
The compat structure however is similar to below on little
endian systems (padding and tv_nsec are switched for big
endian systems):
typedef s32 compat_long_t;
typedef s64 compat_kernel_time64_t;
struct compat_kernel_timespec {
compat_kernel_time64_t tv_sec;
compat_long_t tv_nsec;
compat_long_t padding;
};
This allows for both the native and compat representations to
be the same and syscalls using this type as part of their ABI
can have a single entry point to both.
Note that the compat define is not included anywhere in the
kernel explicitly to avoid confusion.
These types will be used by the new syscalls that will be
introduced in the consequent patches.
Most of the new syscalls are just an update to the existing
native ones with this new type. Hence, put this new type under
an ifdef so that the architectures can define CONFIG_64BIT_TIME
when they are ready to handle this switch.
Cc: linux-arch@vger.kernel.org
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Adding new bpf helper which would allow us to manipulate
xdp's data_end pointer, and allow us to reduce packet's size
indended use case: to generate ICMP messages from XDP context,
where such message would contain truncated original packet.
Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Like tos inherit, ttl inherit should also means inherit the inner protocol's
ttl values, which actually not implemented in vxlan yet.
But we could not treat ttl == 0 as "use the inner TTL", because that would be
used also when the "ttl" option is not specified and that would be a behavior
change, and breaking real use cases.
So add a different attribute IFLA_VXLAN_TTL_INHERIT when "ttl inherit" is
specified with ip cmd.
Reported-by: Jianlin Shi <jishi@redhat.com>
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Store preempting context switch out event into Perf trace as a part of
PERF_RECORD_SWITCH[_CPU_WIDE] record.
Percentage of preempting and non-preempting context switches help
understanding the nature of workloads (CPU or IO bound) that are running
on a machine;
The event is treated as preemption one when task->state value of the
thread being switched out is TASK_RUNNING. Event type encoding is
implemented using PERF_RECORD_MISC_SWITCH_OUT_PREEMPT bit;
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/9ff84e83-a0ca-dd82-a6d0-cb951689be74@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds missing values for the max read request size.
E.g. network driver r8169 uses a value of 4K.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>