Include <sys/socket.h> (guarded by ifndef __KERNEL__) to fix
the following linux/if.h userspace compilation errors:
/usr/include/linux/if.h:234:19: error: field 'ifru_addr' has incomplete type
struct sockaddr ifru_addr;
/usr/include/linux/if.h:235:19: error: field 'ifru_dstaddr' has incomplete type
struct sockaddr ifru_dstaddr;
/usr/include/linux/if.h:236:19: error: field 'ifru_broadaddr' has incomplete type
struct sockaddr ifru_broadaddr;
/usr/include/linux/if.h:237:19: error: field 'ifru_netmask' has incomplete type
struct sockaddr ifru_netmask;
/usr/include/linux/if.h:238:20: error: field 'ifru_hwaddr' has incomplete type
struct sockaddr ifru_hwaddr;
This also fixes userspace compilation of the following uapi headers:
linux/atmbr2684.h
linux/gsmmux.h
linux/if_arp.h
linux/if_bonding.h
linux/if_frad.h
linux/if_pppox.h
linux/if_tunnel.h
linux/netdevice.h
linux/route.h
linux/wireless.h
As no uapi header provides a definition of struct sockaddr, inclusion
of <sys/socket.h> seems to be the most conservative and the only safe
fix available.
All current users of <linux/if.h> are very likely to be including
<sys/socket.h> already because the latter is the sole provider
of struct sockaddr definition in libc, so adding a uapi header
with a definition of struct sockaddr would create a potential
conflict with <sys/socket.h>.
Replacing struct sockaddr in the definition of struct ifreq with
a different type would create a potential incompatibility with current
users of struct ifreq who might rely on ifru_addr et al members being
of type struct sockaddr.
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the big tty/serial driver patchset for 4.11-rc1.
Not much here, but a lot of little fixes and individual serial driver
updates all over the subsystem. Majority are for the sh-sci driver and
platform (the arch-specific changes have acks from the maintainer).
The start of the "serial bus" code is here as well, but nothing is
converted to use it yet. That work is still ongoing, hopefully will
start to show up across different subsystems for 4.12 (bluetooth is one
major place that will be used.)
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2lDg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylwfwCgyExa4x8Lur2nZyu8cgcaVRU68VAAoNQh0WJt
EZdEhEkRMt4d64j+ApYI
=iv7x
-----END PGP SIGNATURE-----
Merge tag 'tty-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big tty/serial driver patchset for 4.11-rc1.
Not much here, but a lot of little fixes and individual serial driver
updates all over the subsystem. Majority are for the sh-sci driver and
platform (the arch-specific changes have acks from the maintainer).
The start of the "serial bus" code is here as well, but nothing is
converted to use it yet. That work is still ongoing, hopefully will
start to show up across different subsystems for 4.12 (bluetooth is
one major place that will be used.)
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (109 commits)
tty: pl011: Work around QDF2400 E44 stuck BUSY bit
atmel_serial: Use the fractional divider when possible
tty: Remove extra include in HVC console tty framework
serial: exar: Enable MSI support
serial: exar: Move register defines from uapi header to consumer site
serial: pci: Remove unused pci_boards entries
serial: exar: Move Commtech adapters to 8250_exar as well
serial: exar: Fix feature control register constants
serial: exar: Fix initialization of EXAR registers for ports > 0
serial: exar: Fix mapping of port I/O resources
serial: sh-sci: fix hardware RX trigger level setting
tty/serial: atmel: ensure state is restored after suspending
serial: 8250_dw: Avoid "too much work" from bogus rx timeout interrupt
serdev: ttyport: check whether tty_init_dev() fails
serial: 8250_pci: make pciserial_detach_ports() static
ARM: dts: STiH410-b2260: Enable HW flow-control
ARM: dts: STiH407-family: Use new Pinctrl groups
ARM: dts: STiH407-pinctrl: Add Pinctrl group for HW flow-control
ARM: dts: STiH410-b2260: Identify the UART RTS line
dt-bindings: serial: Update 'uart-has-rtscts' description
...
Here is the big staging and iio driver patchsets for 4.11-rc1.
We almost broke even this time around, with only a few thousand lines
added overall, as we removed the old and obsolete i4l code, but added
some new drivers for the RPi platform, as well as adding some new IIO
drivers.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2j/w8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymZ1ACdFR4o6xYrWEizmao4a/u+lUZE1aIAnRmcGcIc
J+leO1n9bE5iadQvKYUW
=sKVA
-----END PGP SIGNATURE-----
Merge tag 'staging-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/iio driver updates from Greg KH:
"Here is the big staging and iio driver patchsets for 4.11-rc1.
We almost broke even this time around, with only a few thousand lines
added overall, as we removed the old and obsolete i4l code, but added
some new drivers for the RPi platform, as well as adding some new IIO
drivers.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (669 commits)
Staging: vc04_services: Fix the "space prohibited" code style errors
Staging: vc04_services: Fix the "wrong indent" code style errors
staging: octeon: Use net_device_stats from struct net_device
Staging: rtl8192u: ieee80211: ieee80211.h - style fix
Staging: rtl8192u: ieee80211: ieee80211_tx.c - style fix
Staging: rtl8192u: ieee80211: rtl819x_BAProc.c - style fix
Staging: rtl8192u: ieee80211: ieee80211_module.c - style fix
Staging: rtl8192u: ieee80211: rtl819x_TSProc.c - style fix
Staging: rtl8192u: r8192U.h - style fix
Staging: rtl8192u: r8192U_core.c - style fix
Staging: rtl8192u: r819xU_cmdpkt.c - style fix
staging: rtl8192u: blank lines aren't necessary before a close brace '}'
staging: rtl8192u: Adding space after enum and struct definition
staging: rtl8192u: Adding space after struct definition
Staging: ks7010: Add required and preferred spaces around operators
Staging: ks7010: ks*: Remove redundant blank lines
Staging: ks7010: ks*: Add missing blank lines after declarations
staging: visorbus, replace init_timer with setup_timer
staging: vt6656: rxtx.c Removed multiple dereferencing
staging: vt6656: Alignment match open parenthesis
...
Here is the big char/misc driver patchset for 4.11-rc1.
Lots of different driver subsystems updated here. Rework for the hyperv
subsystem to handle new platforms better, mei and w1 and extcon driver
updates, as well as a number of other "minor" driver updates. Full
details are in the shortlog below.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWK2iRQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynhFACguVE+/ixj5u5bT5DXQaZNai/6zIAAmgMWwd/t
YTD2cwsJsGbTT1fY3SUe
=CiSI
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc driver patchset for 4.11-rc1.
Lots of different driver subsystems updated here: rework for the
hyperv subsystem to handle new platforms better, mei and w1 and extcon
driver updates, as well as a number of other "minor" driver updates.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (169 commits)
goldfish: Sanitize the broken interrupt handler
x86/platform/goldfish: Prevent unconditional loading
vmbus: replace modulus operation with subtraction
vmbus: constify parameters where possible
vmbus: expose hv_begin/end_read
vmbus: remove conditional locking of vmbus_write
vmbus: add direct isr callback mode
vmbus: change to per channel tasklet
vmbus: put related per-cpu variable together
vmbus: callback is in softirq not workqueue
binder: Add support for file-descriptor arrays
binder: Add support for scatter-gather
binder: Add extra size to allocator
binder: Refactor binder_transact()
binder: Support multiple /dev instances
binder: Deal with contexts in debugfs
binder: Support multiple context managers
binder: Split flat_binder_object
auxdisplay: ht16k33: remove private workqueue
auxdisplay: ht16k33: rework input device initialization
...
Highlights include:
- Support for direct mapped LPC on POWER9, giving Linux direct access to
devices that may be on there such as a UART.
- Memory hotplug support for the Power9 Radix MMU.
- Add new AUX vectors describing the processor's cache geometry, to be used by
glibc.
- The ability for a guest to ask the hypervisor to resize the guest's hash
table, and in addition support for doing so automatically when memory is
hotplugged into/out-of the guest. This allows the hash table to be sized
based on the current memory usage of the guest, rather than the maximum
possible memory usage.
- Implementation of optprobes (kprobe optimisation) for powerpc.
In addition there's the topic branch shared with the KVM tree, which includes
support for guests to use the Radix MMU on Power9.
Thanks to:
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton Blanchard,
Benjamin Herrenschmidt, Chris Packham, Daniel Axtens, Daniel Borkmann, David
Gibson, Finn Thain, Gautham R. Shenoy, Gavin Shan, Greg Kurz, Joel Stanley,
John Allen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael
Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi
Bangoria, Reza Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYrWj0AAoJEFHr6jzI4aWAsn4P/08Kz3TtOvDuuPGVNoO7fWOn
ag5/zVt8R4FuCALqWpAZbVqMuUU4wLxG0RuWlmBNYYhrjMC6JxHpOSjQXxM2D7YT
CdGTJxG414r6HMOeToL9i/z33o0m+KT07tscer+QMKlXVKCR2z0fEJch+zoPBHYA
5eVBFpLLTtbNiX9UnrcM/vYz61d56kT4YJey9/8qbAkTAc1rMPa8ucU5UiKYJ7yX
mF8cd7WE+7aqif9V8yN59G2rcbz+h3pbMw/gzImiYsYrUj4fLjU+VTKL5PPT+UFy
WWBXD3MAMm1dksMMZi3hgoo2BZhDn3RkymeYi6Jo4kDknNMPZzkMxGyvaJ8Eq5H/
bIYXdS1AbTtvaaEEuWqDFjpnChOEvj/8IeqitU0jjlql8BVjNKg/ESaaKucZr+pO
pk2Mfvw0Tb/lxJT5qj27yq4aRsxJwdFOoPYCN7MquPp/wV2Tg5M6h4nVQ4T6Wl0S
tMFQeCqXflhDWh0Xgr2UXpF66/YTj3Du5LasOTkgGeU30Z8TcNGFEmDWShKP3cEm
e0oQE+OHhIGN4WSBXBAto/Gw8/0v3pXlMs+VEfeHqenwPss2sWtSwXWe8khmiy9e
DJ48sTzj75/Zx1fiqRldw9YEnrL+NK0eOOpvzxeyKpfvdUytc+chFfEqUmO6kl3Z
DW2UmlZxmW+b0SfexCHL
=Icle
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Support for direct mapped LPC on POWER9, giving Linux direct access
to devices that may be on there such as a UART.
- Memory hotplug support for the Power9 Radix MMU.
- Add new AUX vectors describing the processor's cache geometry, to
be used by glibc.
- The ability for a guest to ask the hypervisor to resize the guest's
hash table, and in addition support for doing so automatically when
memory is hotplugged into/out-of the guest. This allows the hash
table to be sized based on the current memory usage of the guest,
rather than the maximum possible memory usage.
- Implementation of optprobes (kprobe optimisation) for powerpc.
In addition there's the topic branch shared with the KVM tree, which
includes support for guests to use the Radix MMU on Power9.
Thanks to:
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton
Blanchard, Benjamin Herrenschmidt, Chris Packham, Daniel Axtens,
Daniel Borkmann, David Gibson, Finn Thain, Gautham R. Shenoy, Gavin
Shan, Greg Kurz, Joel Stanley, John Allen, Madhavan Srinivasan,
Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Nathan Fontenot,
Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Reza
Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun"
* tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (129 commits)
powerpc/mm/radix: Skip ptesync in pte update helpers
powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm
powerpc/mm/radix: Update pte update sequence for pte clear case
powerpc/mm: Update PROTFAULT handling in the page fault path
powerpc/xmon: Fix data-breakpoint
powerpc/mm: Fix build break with BOOK3S_64=n and MEMORY_HOTPLUG=y
powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=y
powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n
powerpc/pseries: Fix typo in parameter description
powerpc/kprobes: Remove kprobe_exceptions_notify()
kprobes: Introduce weak variant of kprobe_exceptions_notify()
powerpc/ftrace: Fix confusing help text for DISABLE_MPROFILE_KERNEL
powerpc/powernv: Fix opal_exit tracepoint opcode
powerpc: Add a prototype for mcount() so it can be versioned
powerpc: Drop GPL from of_node_to_nid() export to match other arches
powerpc/kprobes: Optimize kprobe in kretprobe_trampoline()
powerpc/kprobes: Implement Optprobes
powerpc/kprobes: Fixes for kprobe_lookup_name() on BE
powerpc: Add helper to check if offset is within relative branch range
powerpc/bpf: Introduce __PPC_SH64()
...
Pull networking updates from David Miller:
"Highlights:
1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
Varadhan.
2) Simplify classifier state on sk_buff in order to shrink it a bit.
From Willem de Bruijn.
3) Introduce SIPHASH and it's usage for secure sequence numbers and
syncookies. From Jason A. Donenfeld.
4) Reduce CPU usage for ICMP replies we are going to limit or
suppress, from Jesper Dangaard Brouer.
5) Introduce Shared Memory Communications socket layer, from Ursula
Braun.
6) Add RACK loss detection and allow it to actually trigger fast
recovery instead of just assisting after other algorithms have
triggered it. From Yuchung Cheng.
7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.
8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.
9) Export MPLS packet stats via netlink, from Robert Shearman.
10) Significantly improve inet port bind conflict handling, especially
when an application is restarted and changes it's setting of
reuseport. From Josef Bacik.
11) Implement TX batching in vhost_net, from Jason Wang.
12) Extend the dummy device so that VF (virtual function) features,
such as configuration, can be more easily tested. From Phil
Sutter.
13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
Dumazet.
14) Add new bpf MAP, implementing a longest prefix match trie. From
Daniel Mack.
15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.
16) Add new aquantia driver, from David VomLehn.
17) Add bpf tracepoints, from Daniel Borkmann.
18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
Florian Fainelli.
19) Remove custom busy polling in many drivers, it is done in the core
networking since 4.5 times. From Eric Dumazet.
20) Support XDP adjust_head in virtio_net, from John Fastabend.
21) Fix several major holes in neighbour entry confirmation, from
Julian Anastasov.
22) Add XDP support to bnxt_en driver, from Michael Chan.
23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.
24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.
25) Support GRO in IPSEC protocols, from Steffen Klassert"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
Revert "ath10k: Search SMBIOS for OEM board file extension"
net: socket: fix recvmmsg not returning error from sock_error
bnxt_en: use eth_hw_addr_random()
bpf: fix unlocking of jited image when module ronx not set
arch: add ARCH_HAS_SET_MEMORY config
net: napi_watchdog() can use napi_schedule_irqoff()
tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
net/hsr: use eth_hw_addr_random()
net: mvpp2: enable building on 64-bit platforms
net: mvpp2: switch to build_skb() in the RX path
net: mvpp2: simplify MVPP2_PRS_RI_* definitions
net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
net: mvpp2: remove unused register definitions
net: mvpp2: simplify mvpp2_bm_bufs_add()
net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
net: mvpp2: release reference to txq_cpu[] entry after unmapping
net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
...
Pull audit updates from Paul Moore:
"The audit changes for v4.11 are relatively small compared to what we
did for v4.10, both in terms of size and impact.
- two patches from Steve tweak the formatting for some of the audit
records to make them more consistent with other audit records.
- three patches from Richard record the name of a module on module
load, fix the logging of sockaddr information when using
socketcall() on 32-bit systems, and add the ability to reset
audit's lost record counter.
- my lone patch just fixes an annoying style nit that I was reminded
about by one of Richard's patches.
All these patches pass our test suite"
* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
audit: remove unnecessary curly braces from switch/case statements
audit: log module name on init_module
audit: log 32-bit socketcalls
audit: add feature audit_lost reset
audit: Make AUDIT_ANOM_ABEND event normalized
audit: Make AUDIT_KERNEL event conform to the specification
Should be - 1 as in other _MAX definitions.
Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add netconf support to MPLS. Allows userpsace to learn and be notified
of changes to 'input' enable setting per interface.
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add Stream Reset Event described in rfc6525
section 6.1.1.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the kernel side, sockaddr_storage is #define'd to
__kernel_sockaddr_storage. Replacing struct sockaddr_storage with
struct __kernel_sockaddr_storage defined by <linux/socket.h> fixes
the following linux/rds.h userspace compilation error:
/usr/include/linux/rds.h:226:26: error: field 'dest_addr' has incomplete type
struct sockaddr_storage dest_addr;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consistently use types from linux/types.h to fix the following
linux/rds.h userspace compilation errors:
/usr/include/linux/rds.h:106:2: error: unknown type name 'uint8_t'
uint8_t name[32];
/usr/include/linux/rds.h:107:2: error: unknown type name 'uint64_t'
uint64_t value;
/usr/include/linux/rds.h:117:2: error: unknown type name 'uint64_t'
uint64_t next_tx_seq;
/usr/include/linux/rds.h:118:2: error: unknown type name 'uint64_t'
uint64_t next_rx_seq;
/usr/include/linux/rds.h:121:2: error: unknown type name 'uint8_t'
uint8_t transport[TRANSNAMSIZ]; /* null term ascii */
/usr/include/linux/rds.h:122:2: error: unknown type name 'uint8_t'
uint8_t flags;
/usr/include/linux/rds.h:129:2: error: unknown type name 'uint64_t'
uint64_t seq;
/usr/include/linux/rds.h:130:2: error: unknown type name 'uint32_t'
uint32_t len;
/usr/include/linux/rds.h:135:2: error: unknown type name 'uint8_t'
uint8_t flags;
/usr/include/linux/rds.h:139:2: error: unknown type name 'uint32_t'
uint32_t sndbuf;
/usr/include/linux/rds.h:144:2: error: unknown type name 'uint32_t'
uint32_t rcvbuf;
/usr/include/linux/rds.h:145:2: error: unknown type name 'uint64_t'
uint64_t inum;
/usr/include/linux/rds.h:153:2: error: unknown type name 'uint64_t'
uint64_t hdr_rem;
/usr/include/linux/rds.h:154:2: error: unknown type name 'uint64_t'
uint64_t data_rem;
/usr/include/linux/rds.h:155:2: error: unknown type name 'uint32_t'
uint32_t last_sent_nxt;
/usr/include/linux/rds.h:156:2: error: unknown type name 'uint32_t'
uint32_t last_expected_una;
/usr/include/linux/rds.h:157:2: error: unknown type name 'uint32_t'
uint32_t last_seen_una;
/usr/include/linux/rds.h:164:2: error: unknown type name 'uint8_t'
uint8_t src_gid[RDS_IB_GID_LEN];
/usr/include/linux/rds.h:165:2: error: unknown type name 'uint8_t'
uint8_t dst_gid[RDS_IB_GID_LEN];
/usr/include/linux/rds.h:167:2: error: unknown type name 'uint32_t'
uint32_t max_send_wr;
/usr/include/linux/rds.h:168:2: error: unknown type name 'uint32_t'
uint32_t max_recv_wr;
/usr/include/linux/rds.h:169:2: error: unknown type name 'uint32_t'
uint32_t max_send_sge;
/usr/include/linux/rds.h:170:2: error: unknown type name 'uint32_t'
uint32_t rdma_mr_max;
/usr/include/linux/rds.h:171:2: error: unknown type name 'uint32_t'
uint32_t rdma_mr_size;
/usr/include/linux/rds.h:212:9: error: unknown type name 'uint64_t'
typedef uint64_t rds_rdma_cookie_t;
/usr/include/linux/rds.h:215:2: error: unknown type name 'uint64_t'
uint64_t addr;
/usr/include/linux/rds.h:216:2: error: unknown type name 'uint64_t'
uint64_t bytes;
/usr/include/linux/rds.h:221:2: error: unknown type name 'uint64_t'
uint64_t cookie_addr;
/usr/include/linux/rds.h:222:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:228:2: error: unknown type name 'uint64_t'
uint64_t cookie_addr;
/usr/include/linux/rds.h:229:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:234:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:240:2: error: unknown type name 'uint64_t'
uint64_t local_vec_addr;
/usr/include/linux/rds.h:241:2: error: unknown type name 'uint64_t'
uint64_t nr_local;
/usr/include/linux/rds.h:242:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:243:2: error: unknown type name 'uint64_t'
uint64_t user_token;
/usr/include/linux/rds.h:248:2: error: unknown type name 'uint64_t'
uint64_t local_addr;
/usr/include/linux/rds.h:249:2: error: unknown type name 'uint64_t'
uint64_t remote_addr;
/usr/include/linux/rds.h:252:4: error: unknown type name 'uint64_t'
uint64_t compare;
/usr/include/linux/rds.h:253:4: error: unknown type name 'uint64_t'
uint64_t swap;
/usr/include/linux/rds.h:256:4: error: unknown type name 'uint64_t'
uint64_t add;
/usr/include/linux/rds.h:259:4: error: unknown type name 'uint64_t'
uint64_t compare;
/usr/include/linux/rds.h:260:4: error: unknown type name 'uint64_t'
uint64_t swap;
/usr/include/linux/rds.h:261:4: error: unknown type name 'uint64_t'
uint64_t compare_mask;
/usr/include/linux/rds.h:262:4: error: unknown type name 'uint64_t'
uint64_t swap_mask;
/usr/include/linux/rds.h:265:4: error: unknown type name 'uint64_t'
uint64_t add;
/usr/include/linux/rds.h:266:4: error: unknown type name 'uint64_t'
uint64_t nocarry_mask;
/usr/include/linux/rds.h:269:2: error: unknown type name 'uint64_t'
uint64_t flags;
/usr/include/linux/rds.h:270:2: error: unknown type name 'uint64_t'
uint64_t user_token;
/usr/include/linux/rds.h:274:2: error: unknown type name 'uint64_t'
uint64_t user_token;
/usr/include/linux/rds.h:275:2: error: unknown type name 'int32_t'
int32_t status;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/in.h> to fix the following linux/mroute.h userspace
compilation errors:
/usr/include/linux/mroute.h:58:18: error: field 'vifc_lcl_addr' has incomplete type
struct in_addr vifc_lcl_addr; /* Local interface address */
/usr/include/linux/mroute.h:61:17: error: field 'vifc_rmt_addr' has incomplete type
struct in_addr vifc_rmt_addr; /* IPIP tunnel addr */
/usr/include/linux/mroute.h:72:17: error: field 'mfcc_origin' has incomplete type
struct in_addr mfcc_origin; /* Origin of mcast */
/usr/include/linux/mroute.h:73:17: error: field 'mfcc_mcastgrp' has incomplete type
struct in_addr mfcc_mcastgrp; /* Group in question */
/usr/include/linux/mroute.h:84:17: error: field 'src' has incomplete type
struct in_addr src;
/usr/include/linux/mroute.h:85:17: error: field 'grp' has incomplete type
struct in_addr grp;
/usr/include/linux/mroute.h:109:17: error: field 'im_src' has incomplete type
struct in_addr im_src,im_dst;
/usr/include/linux/mroute.h:109:24: error: field 'im_dst' has incomplete type
struct in_addr im_src,im_dst;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/in6.h> to fix the following linux/mroute6.h userspace
compilation errors:
/usr/include/linux/mroute6.h:80:22: error: field 'mf6cc_origin' has incomplete type
struct sockaddr_in6 mf6cc_origin; /* Origin of mcast */
/usr/include/linux/mroute6.h:81:22: error: field 'mf6cc_mcastgrp' has incomplete type
struct sockaddr_in6 mf6cc_mcastgrp; /* Group in question */
/usr/include/linux/mroute6.h:91:22: error: field 'src' has incomplete type
struct sockaddr_in6 src;
/usr/include/linux/mroute6.h:92:22: error: field 'grp' has incomplete type
struct sockaddr_in6 grp;
/usr/include/linux/mroute6.h:132:18: error: field 'im6_src' has incomplete type
struct in6_addr im6_src, im6_dst;
/usr/include/linux/mroute6.h:132:27: error: field 'im6_dst' has incomplete type
struct in6_addr im6_src, im6_dst;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include <linux/in6.h> to fix the following linux/ipv6_route.h userspace
compilation errors:
/usr/include/linux/ipv6_route.h:42:19: error: field 'rtmsg_dst' has incomplete type
struct in6_addr rtmsg_dst;
/usr/include/linux/ipv6_route.h:43:19: error: field 'rtmsg_src' has incomplete type
struct in6_addr rtmsg_src;
/ust/include/linux/ipv6_route.h:44:19: error: field 'rtmsg_gateway' has incomplete type
struct in6_addr rtmsg_gateway;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consistently use types from linux/types.h to fix the following
linux/target_core_user.h userspace compilation errors:
/usr/include/linux/target_core_user.h:108:4: error: unknown type name 'uint32_t'
uint32_t iov_cnt;
/usr/include/linux/target_core_user.h:109:4: error: unknown type name 'uint32_t'
uint32_t iov_bidi_cnt;
/usr/include/linux/target_core_user.h:110:4: error: unknown type name 'uint32_t'
uint32_t iov_dif_cnt;
/usr/include/linux/target_core_user.h:111:4: error: unknown type name 'uint64_t'
uint64_t cdb_off;
/usr/include/linux/target_core_user.h:112:4: error: unknown type name 'uint64_t'
uint64_t __pad1;
/usr/include/linux/target_core_user.h:113:4: error: unknown type name 'uint64_t'
uint64_t __pad2;
/usr/include/linux/target_core_user.h:117:4: error: unknown type name 'uint8_t'
uint8_t scsi_status;
/usr/include/linux/target_core_user.h:118:4: error: unknown type name 'uint8_t'
uint8_t __pad1;
/usr/include/linux/target_core_user.h:119:4: error: unknown type name 'uint16_t'
uint16_t __pad2;
/usr/include/linux/target_core_user.h:120:4: error: unknown type name 'uint32_t'
uint32_t __pad3;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).
Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.
If none of these two flags are set, this signals running
over older kernel.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
a VCPU out of KVM_RUN through a POSIX signal. A signal is attached
to a dummy signal handler; by blocking the signal outside KVM_RUN and
unblocking it inside, this possible race is closed:
VCPU thread service thread
--------------------------------------------------------------
check flag
set flag
raise signal
(signal handler does nothing)
KVM_RUN
However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
tsk->sighand->siglock on every KVM_RUN. This lock is often on a
remote NUMA node, because it is on the node of a thread's creator.
Taking this lock can be very expensive if there are many userspace
exits (as is the case for SMP Windows VMs without Hyper-V reference
time counter).
As an alternative, we can put the flag directly in kvm_run so that
KVM can see it:
VCPU thread service thread
--------------------------------------------------------------
raise signal
signal handler
set run->immediate_exit
KVM_RUN
check run->immediate_exit
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull networking fixes from David Miller:
1) In order to avoid problems in the future, make cgroup bpf overriding
explicit using BPF_F_ALLOW_OVERRIDE. From Alexei Staovoitov.
2) LLC sets skb->sk without proper skb->destructor and this explodes,
fix from Eric Dumazet.
3) Make sure when we have an ipv4 mapped source address, the
destination is either also an ipv4 mapped address or
ipv6_addr_any(). Fix from Jonathan T. Leighton.
4) Avoid packet loss in fec driver by programming the multicast filter
more intelligently. From Rui Sousa.
5) Handle multiple threads invoking fanout_add(), fix from Eric
Dumazet.
6) Since we can invoke the TCP input path in process context, without
BH being disabled, we have to accomodate that in the locking of the
TCP probe. Also from Eric Dumazet.
7) Fix erroneous emission of NETEVENT_DELAY_PROBE_TIME_UPDATE when we
aren't even updating that sysctl value. From Marcus Huewe.
8) Fix endian bugs in ibmvnic driver, from Thomas Falcon.
[ This is the second version of the pull that reverts the nested
rhashtable changes that looked a bit too scary for this late in the
release - Linus ]
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
rhashtable: Revert nested table changes.
ibmvnic: Fix endian errors in error reporting output
ibmvnic: Fix endian error when requesting device capabilities
net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification
net: xilinx_emaclite: fix freezes due to unordered I/O
net: xilinx_emaclite: fix receive buffer overflow
bpf: kernel header files need to be copied into the tools directory
tcp: tcp_probe: use spin_lock_bh()
uapi: fix linux/if_pppol2tp.h userspace compilation errors
packet: fix races in fanout_add()
ibmvnic: Fix initial MTU settings
net: ethernet: ti: cpsw: fix cpsw assignment in resume
kcm: fix a null pointer dereference in kcm_sendmsg()
net: fec: fix multicast filtering hardware setup
ipv6: Handle IPv4-mapped src to in6addr_any dst.
ipv6: Inhibit IPv4-mapped src address on the wire.
net/mlx5e: Disable preemption when doing TC statistics upcall
rhashtable: Add nested tables
tipc: Fix tipc_sk_reinit race conditions
gfs2: Use rhashtable walk interface in glock_hash_walk
...
Because of <linux/libc-compat.h> interface limitations, <netinet/in.h>
provided by libc cannot be included after <linux/in.h>, therefore any
header that includes <netinet/in.h> cannot be included after <linux/in.h>.
Change uapi/linux/l2tp.h, the last uapi header that includes
<netinet/in.h>, to include <linux/in.h> and <linux/in6.h> instead of
<netinet/in.h> and use __SOCK_SIZE__ instead of sizeof(struct sockaddr)
the same way as uapi/linux/in.h does, to fix linux/if_pppol2tp.h userspace
compilation errors like this:
In file included from /usr/include/linux/l2tp.h:12:0,
from /usr/include/linux/if_pppol2tp.h:21,
/usr/include/netinet/in.h:31:8: error: redefinition of 'struct in_addr'
Fixes: 47c3e7783b ("net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_*")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IOC_OPAL_ACTIVATE_LSP took the wrong strcure which would
give us the wrong size when using _IOC_SIZE, switch it to the
right structure.
Fixes: 058f8a2 ("Include: Uapi: Add user ABI for Sed/Opal")
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
As part of c11e391da2 "dma-buf: Add ioctls to
allow userspace to flush" a new uapi header file dma-buf.h was added, but an
entry was not added on Kbuild to install it. This patch resolves this omission
so that "make headers_install" installs this header.
Signed-off-by: Denys Dmytriyenko <denys@ti.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Tiago Vignatti <tiago.vignatti@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1487102447-59265-1-git-send-email-denis@denix.org
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYosRpAAoJEAhfPr2O5OEVAU4P/RB/A9v422J1aFixQ1fPp89P
xRp6m6xj2ln/r/ydl5j3LgSzA6nCSQT4p1jRalFRdpk/FyS4v6wE4RhaIDGW/Q1P
WDwRfcyrUdWIPZR6T0289m8eTHG0w4ewbEHPm5iG5UZQHZsObEpmNJD6mOSm00N4
0JhAmJIccWxIjObIajZizQ9BEUUE+T1PQgDf7QiTHFb3d1UrNXYu/cka8Ys9lafu
kR//BPRxLsCXWWHf45ZHgYH+V214haGcVhtlf1ehALlZQ3wNYZaC7XvH8G795ykR
ayrc77qLj2NROhTG4rvljyOH4L0I1IH0k2bC633FrPpgzOCCqNLBMLwn5YhFN3Z+
QNO2ChxDmZlWcqbepkOmK6IK2HmIulM4AOK0gQa+J4inYM3w2aZD3egaFFsx2I0+
1y7KSKAULNjEZanx80t3rxGw+bnIh80eD3n2ODgAU0spqEsm5cb3C3JIN3qeS8KQ
j6vVawB2+gTRCRIHWDBHnYUzkn2SPINp1bRajuxryH1u5TKgvs02NQB84i9mdNR+
iRpH7Yn8BIuA2w0sGGjbyBYyb6IVFrdx9yS7xsr4CbKYR8tWvEsqRxlN7V16J30M
crxj2OM+yAIXLTSB1bW892WL51JqJRGF3lFnUGS4R4PH9tMvR4YN/PT8yv0wZJC/
SfwwJ38/yZBkZuRSNEpt
=OTHY
-----END PGP SIGNATURE-----
Merge tag 'media/v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A colorspace regression fix in V4L2 core and a CEC core bug that makes
it discard valid messages"
* tag 'media/v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] cec: initiator should be the same as the destination for, poll
[media] videodev2.h: go back to limited range Y'CbCr for SRGB and, ADOBERGB
This reverts 'commit 7e0739cd9c ("[media] videodev2.h: fix
sYCC/AdobeYCC default quantization range").
The problem is that many drivers can convert R'G'B' content (often
from sensors) to Y'CbCr, but they all produce limited range Y'CbCr.
To stay backwards compatible the default quantization range for
sRGB and AdobeRGB Y'CbCr encoding should be limited range, not full
range, even though the corresponding standards specify full range.
Update the V4L2_MAP_QUANTIZATION_DEFAULT define accordingly and
also update the documentation.
Fixes: 7e0739cd9c ("[media] videodev2.h: fix sYCC/AdobeYCC default quantization range")
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Cc: <stable@vger.kernel.org> # for v4.9 and up
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, most relevantly they are:
1) Extend nft_exthdr to allow to match TCP options bitfields, from
Manuel Messner.
2) Allow to check if IPv6 extension header is present in nf_tables,
from Phil Sutter.
3) Allow to set and match conntrack zone in nf_tables, patches from
Florian Westphal.
4) Several patches for the nf_tables set infrastructure, this includes
cleanup and preparatory patches to add the new bitmap set type.
5) Add optional ruleset generation ID check to nf_tables and allow to
delete rules that got no public handle yet via NFTA_RULE_ID. These
patches add the missing kernel infrastructure to support rule
deletion by description from userspace.
6) Missing NFT_SET_OBJECT flag to select the right backend when sets
stores an object map.
7) A couple of cleanups for the expectation and SIP helper, from Gao
feng.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.
Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X
2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.
3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.
In the future this behavior may be extended with a chain of
non-overridable programs.
Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.
Add several testcases and adjust libbpf.
Fixes: 3007098494 ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new attribute allows us to uniquely identify a rule in transaction.
Robots may trigger an insertion followed by deletion in a batch, in that
scenario we still don't have a public rule handle that we can use to
delete the rule. This is similar to the NFTA_SET_ID attribute that
allows us to refer to an anonymous set from a batch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch allows userspace to specify the generation ID that has been
used to build an incremental batch update.
If userspace specifies the generation ID in the batch message as
attribute, then nfnetlink compares it to the current generation ID so
you make sure that you work against the right baseline. Otherwise, bail
out with ERESTART so userspace knows that its changeset is stale and
needs to respin. Userspace can do this transparently at the cost of
taking slightly more time to refresh caches and rework the changeset.
This check is optional, if there is no NFNL_BATCH_GENID attribute in the
batch begin message, then no check is performed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Per PCIe r3.1, sec 6.2.10 and sec 7.13.4, on Root Ports that support "RP
Extensions for DPC",
When the DPC Trigger Status bit is Set and the DPC RP Busy bit is Set,
software must leave the Root Port in DPC until the DPC RP Busy bit reads
0b.
Wait up to 1 second for the Root Port to become non-busy.
[bhelgaas: changelog, spec references]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The eswitch_[gs]et command is supposed to be similar to port_[gs]et
command - for multiple eswitch attributes. However, when it was introduced
by 08f4b5918b ("net/devlink: Add E-Switch mode control") it was wrongly
named with the word "mode" in it. So fix this now, make the oririnal
enum value existing but obsolete.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* use shash in mac80211 crypto code where applicable
* some documentation fixes
* pass RSSI levels up in change notifications
* remove unused rfkill-regulator
* various other cleanups
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJYnHuZAAoJEGt7eEactAAdorYP/iIfEZjLblLZNui+93OHpmsq
QgXeQ9Vl/Xgq8g6vEb6jpHE6pvT/xoW/jt0s5rpYpPDzP7WRgkFQI144Jflx9gE+
7KRHYc7k9XqugbFGFakjT+DyduxrGkEhZ7Vpd39D+zlPaf+p/31Jk6C4MNwk2oG3
AA7ARUJfOKGpct3+l+cpnWQPUYQQKrSnjnMIwHL3Tu5pvGPDapdDWXbfn6T75Aof
3oVQSNtbEtdzW/Ty+HgDn/boOCRXzXlOCxFlE7OiH2AXfe2mqGU/hkcm/wZ0QxOf
9m3CxVjKH10tY6nsjDiXTOzFn+Zkzuum9/gcKtsgx+6BZ4qod2XuhtzmTeXggI8F
b7nGeadSfSS6/unUu5Fibdn+X4Cw8+Yu5Qyiwo+jLL6yyTLUg6aKN6N9HWddhBfa
pIjWAZVA1iB2m2XqUqRB/asEhslndSSfDmZK8nruYJSZWtQBNkNUdHXqqbqbBSHv
KtKbHMOKoLU4zKmV2vMWGy0qypZCxtZkNF6GURhMh2m89qBcIAApFwQMzK09MwmP
d5TcMwfi8YcKRb1Gw6n7gnJsC8e+tFDGXMi9w6z0FDGZvMbOlnO3ctSd/BY3B01H
DWimkX8Ev3kt6KKTgbJ/n0lR/vEmDGdKo2ahH1uHOTPudrjpOHUg0cdDzS/VPZqi
Qw4+FbVArISNrI2skTrA
=sqkq
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2017-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Some more updates:
* use shash in mac80211 crypto code where applicable
* some documentation fixes
* pass RSSI levels up in change notifications
* remove unused rfkill-regulator
* various other cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This command could be useful to inc/dec fields.
For example, to forward any TCP packet and decrease its TTL:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower ip_proto tcp \
action pedit munge ip ttl add 0xff pipe \
action mirred egress redirect dev veth0
In the example above, adding 0xff to this u8 field is actually
decreasing it by one, since the operation is masked.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend pedit to enable the user setting offset relative to network
headers. This change would enable to work with more complex header
schemes (vs the simple IPv4 case) where setting a fixed offset relative
to the network header is not enough.
After this patch, the action has information about the exact header type
and field inside this header. This information could be used later on
for hardware offloading of pedit.
Backward compatibility was being kept:
1. Old kernel <-> new userspace
2. New kernel <-> old userspace
3. add rule using new userspace <-> dump using old userspace
4. add rule using old userspace <-> dump using new userspace
When using the extended api, new netlink attributes are being used. This
way, operation will fail in (1) and (3) - and no malformed rule be added
or dumped. Of course, new user space that doesn't need the new
functionality can use the old netlink attributes and operation will
succeed.
Since action can support both api's, (2) should work, and it is easy to
write the new user space to have (4) work.
The action is having a strict check that only header types and commands
it can handle are accepted. This way future additions will be much
easier.
Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
flower \
ip_proto tcp \
dst_port 80 \
action pedit munge tcp dport set 8080 pipe \
action mirred egress redirect dev veth0
Will forward tcp port whose original dest port is 80, while modifying
the destination port to 8080.
Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a new binder_fd_array object,
that allows us to support one or more file descriptors
embedded in a buffer that is scatter-gathered.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Martijn Coenen <maco@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Serban Constantinescu <serban.constantinescu@arm.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Martijn Coenen <maco@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously all data passed over binder needed
to be serialized, with the exception of Binder
objects and file descriptors.
This patchs adds support for scatter-gathering raw
memory buffers into a binder transaction, avoiding
the need to first serialize them into a Parcel.
To remain backwards compatibile with existing
binder clients, it introduces two new command
ioctls for this purpose - BC_TRANSACTION_SG and
BC_REPLY_SG. These commands may only be used with
the new binder_transaction_data_sg structure,
which adds a field for the total size of the
buffers we are scatter-gathering.
Because memory buffers may contain pointers to
other buffers, we allow callers to specify
a parent buffer and an offset into it, to indicate
this is a location pointing to the buffer that
we are fixing up. The kernel will then take care
of fixing up the pointer to that buffer as well.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Martijn Coenen <maco@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Serban Constantinescu <serban.constantinescu@arm.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Martijn Coenen <maco@google.com>
[jstultz: Fold in small fix from Amit Pundir <amit.pundir@linaro.org>]
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
flat_binder_object is used for both handling
binder objects and file descriptors, even though
the two are mostly independent. Since we'll
have more fixup objects in binder in the future,
instead of extending flat_binder_object again,
split out file descriptors to their own object
while retaining backwards compatibility to
existing user-space clients. All binder objects
just share a header.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Martijn Coenen <maco@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Serban Constantinescu <serban.constantinescu@arm.com>
Cc: Dmitry Shmidt <dimitrysh@google.com>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Martijn Coenen <maco@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
None of these registers is relevant for the userspace API.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the XR17V352 manual, bit 4 is IrDA control and bit 5 for
485. Fortunately, no driver used them so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.
Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched. If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.
This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry. If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.
The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.
The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.
When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.
It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.
The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array. It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to implement Sender-Side Procedures for the Add
Outgoing and Incoming Streams Request Parameter described in
rfc6525 section 5.1.5-5.1.6.
It is also to add sockopt SCTP_ADD_STREAMS in rfc6525 section
6.3.4 for users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to implement Sender-Side Procedures for the SSN/TSN
Reset Request Parameter descibed in rfc6525 section 5.1.4.
It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
for users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tracksticks on the Lenovo thinkpads have their buttons connected
through the touchpad device. We already fixed that in synaptics.c, but
when we switch the device into RMI4 mode to have proper support, the
pass-through functionality can't deal with them easily.
We add a new PS/2 flag and protocol designed for psmouse. The RMI4 F03
pass-through can then emit a special set of commands to notify psmouse the
state of the buttons.
This patch implements the protocol in psmouse, while an other will
do the same for rmi4-f03.
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The nl80211_nan_dual_band_conf enumeration doesn't make much sense.
The default value is assigned to a bit, which makes it weird if the
default bit and other bits are set at the same time.
To improve this, get rid of NL80211_NAN_BAND_DEFAULT and add a wiphy
configuration to let the drivers define which bands are supported.
This is exposed to the userspace, which then can make a decision on
which band(s) to use. Additionally, rename all "dual_band" elements
to "bands", to make things clearer.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch implements the kernel side of the TCP option patch.
Signed-off-by: Manuel Messner <mm@skelett.io>
Reviewed-by: Florian Westphal <fw@strlen.de>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Just like with counters the direction attribute is optional.
We set priv->dir to MAX unconditionally to avoid duplicating the assignment
for all keys with optional direction.
For keys where direction is mandatory, existing code already returns
an error.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If NFT_EXTHDR_F_PRESENT is set, exthdr will not copy any header field
data into *dest, but instead set it to 1 if the header is found and 0
otherwise.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Update the drivers to pass the RSSI level as a cfg80211_cqm_rssi_notify
parameter and pass this value to userspace in a new nl80211 attribute.
This helps both userspace and also helps in the implementation of the
multiple RSSI thresholds CQM mechanism.
Note for marvell/mwifiex I pass 0 for the RSSI value because the new
RSSI value is not available to the driver at the time of the
cfg80211_cqm_rssi_notify call, but the driver queries the new value
immediately after that, so it is actually available just a moment later
if we wanted to defer caling cfg80211_cqm_rssi_notify until that moment.
Without this, the new cfg80211 code (patch 3) will call .get_station
which will send a duplicate HostCmd_CMD_RSSI_INFO command to the hardware.
Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
The big feature this time is support for POWER9 using the radix-tree
MMU for host and guest. This required some changes to arch/powerpc
code, so I talked with Michael Ellerman and he created a topic branch
with this patchset, which I merged into kvm-ppc-next and which Michael
will pull into his tree. Michael also put in some patches from Nick
Piggin which fix bugs in the interrupt vector code in relocatable
kernels when coming from a KVM guest.
Other notable changes include:
* Add the ability to change the size of the hashed page table,
from David Gibson.
* XICS (interrupt controller) emulation fixes and improvements,
from Li Zhong.
* Bug fixes from myself and Thomas Huth.
These patches define some new KVM capabilities and ioctls, but there
should be no conflicts with anything else currently upstream, as far
as I am aware.
Add a hypercall to retrieve the host realtime clock and the TSC value
used to calculate that clock read.
Used to implement clock synchronization between host and guest.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch is a quick fixup of the user structures that will prevent
the structures from being different sizes on 32 and 64 bit archs.
Taking this fix will allow us to *NOT* have to do compat ioctls for
the sed code.
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Fixes: 19641f2d76 ("Include: Uapi: Add user ABI for Sed/Opal")
Signed-off-by: Jens Axboe <axboe@fb.com>
The libnetfilter_conntrack userland library always sets IPS_CONFIRMED
when building a CTA_STATUS attribute. If this toggles the bit from
0->1, the parser will return an error. On Linux 4.4+ this will cause any
NFQA_EXP attribute in the packet to be ignored. This breaks conntrackd's
userland helpers because they operate on unconfirmed connections.
Instead of returning -EBUSY if the user program asks to modify an
unchangeable bit, simply ignore the change.
Also, fix the logic so that user programs are allowed to clear
the bits that they are allowed to change.
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This resolves the merge errors that were reported in linux-next and it
picks up the staging and IIO fixes that we need/want in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, they are:
1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
sk_buff so we only access one single cacheline in the conntrack
hotpath. Patchset from Florian Westphal.
2) Don't leak pointer to internal structures when exporting x_tables
ruleset back to userspace, from Willem DeBruijn. This includes new
helper functions to copy data to userspace such as xt_data_to_user()
as well as conversions of our ip_tables, ip6_tables and arp_tables
clients to use it. Not surprinsingly, ebtables requires an ad-hoc
update. There is also a new field in x_tables extensions to indicate
the amount of bytes that we copy to userspace.
3) Add nf_log_all_netns sysctl: This new knob allows you to enable
logging via nf_log infrastructure for all existing netnamespaces.
Given the effort to provide pernet syslog has been discontinued,
let's provide a way to restore logging using netfilter kernel logging
facilities in trusted environments. Patch from Michal Kubecek.
4) Validate SCTP checksum from conntrack helper, from Davide Caratti.
5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
a copy&paste from the original helper, from Florian Westphal.
6) Reset netfilter state when duplicating packets, also from Florian.
7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
nft_meta, from Liping Zhang.
8) Add missing code to deal with loopback packets from nft_meta when
used by the netdev family, also from Liping.
9) Several cleanups on nf_tables, one to remove unnecessary check from
the netlink control plane path to add table, set and stateful objects
and code consolidation when unregister chain hooks, from Gao Feng.
10) Fix harmless reference counter underflow in IPVS that, however,
results in problems with the introduction of the new refcount_t
type, from David Windsor.
11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
from Davide Caratti.
12) Missing documentation on nf_tables uapi header, from Liping Zhang.
13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
New nested netlink attribute to associate tunnel info per vlan.
This is used by bridge driver to send tunnel metadata to
bridge ports in vlan tunnel mode. This patch also adds new per
port flag IFLA_BRPORT_VLAN_TUNNEL to enable vlan tunnel mode.
off by default.
One example use for this is a vxlan bridging gateway or vtep
which maps vlans to vn-segments (or vnis). User can configure
per-vlan tunnel information which the bridge driver can use
to bridge vlan into the corresponding vn-segment.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vxlan COLLECT_METADATA mode today solves the per-vni netdev
scalability problem in l3 networks. It expects all forwarding
information to be present in dst_metadata. This patch series
enhances collect metadata mode to include the case where only
vni is present in dst_metadata, and the vxlan driver can then use
the rest of the forwarding information datbase to make forwarding
decisions. There is no change to default COLLECT_METADATA
behaviour. These changes only apply to COLLECT_METADATA when
used with the bridging use-case with a special dst_metadata
tunnel info flag (eg: where vxlan device is part of a bridge).
For all this to work, the vxlan driver will need to now support a
single fdb table hashed by mac + vni. This series essentially makes
this happen.
use-case and workflow:
vxlan collect metadata device participates in bridging vlan
to vn-segments. Bridge driver above the vxlan device,
sends the vni corresponding to the vlan in the dst_metadata.
vxlan driver will lookup forwarding database with (mac + vni)
for the required remote destination information to forward the
packet.
Changes introduced by this patch:
- allow learning and forwarding database state in vxlan netdev in
COLLECT_METADATA mode. Current behaviour is not changed
by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
to support the new bridge friendly mode.
- A single fdb table hashed by (mac, vni) to allow fdb entries with
multiple vnis in the same fdb table
- rx path already has the vni
- tx path expects a vni in the packet with dst_metadata
- prior to this series, fdb remote_dsts carried remote vni and
the vxlan device carrying the fdb table represented the
source vni. With the vxlan device now representing multiple vnis,
this patch adds a src vni attribute to the fdb entry. The remote
vni already uses NDA_VNI attribute. This patch introduces
NDA_SRC_VNI netlink attribute to represent the src vni in a multi
vni fdb table.
iproute2 example (patched and pruned iproute2 output to just show
relevant fdb entries):
example shows same host mac learnt on two vni's.
before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
after this patch with collect metadata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the encode/decode functionality from the ife module instead of using
implementation inside the act_ife.
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module is responsible for the ife encapsulation protocol
encode/decode logics. That module can:
- ife_encode: encode skb and reserve space for the ife meta header
- ife_decode: decode skb and extract the meta header size
- ife_tlv_meta_encode - encodes one tlv entry into the reserved ife
header space.
- ife_tlv_meta_decode - decodes one tlv entry from the packet
- ife_tlv_meta_next - advance to the next tlv
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the latest version of the IPv6 Segment Routing IETF draft [1] the
cleanup flag is removed and the flags field length is shrunk from 16 bits
to 8 bits. As a consequence, the input of the HMAC computation is modified
in a non-backward compatible way by covering the whole octet of flags
instead of only the cleanup bit. As such, if an implementation compatible
with the latest draft computes the HMAC of an SRH who has other flags set
to 1, then the HMAC result would differ from the current implementation.
This patch carries those modifications to prevent conflict with other
implementations of IPv6 SR.
[1] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-05
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Debugging issues caused by pfmemalloc is often tedious.
Add a new SNMP counter to more easily diagnose these problems.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This ioctl opens a file to which a socket is bound and
returns a file descriptor. The caller has to have CAP_NET_ADMIN
in the socket network namespace.
Currently it is impossible to get a path and a mount point
for a socket file. socket_diag reports address, device ID and inode
number for unix sockets. An address can contain a relative path or
a file may be moved somewhere. And these properties say nothing about
a mount namespace and a mount point of a socket file.
With the introduced ioctl, we can get a path by reading
/proc/self/fd/X and get mnt_id from /proc/self/fdinfo/X.
In CRIU we are going to use this ioctl to dump and restore unix socket.
Here is an example how it can be used:
$ strace -e socket,bind,ioctl ./test /tmp/test_sock
socket(AF_UNIX, SOCK_STREAM, 0) = 3
bind(3, {sa_family=AF_UNIX, sun_path="test_sock"}, 11) = 0
ioctl(3, SIOCUNIXFILE, 0) = 4
^Z
$ ss -a | grep test_sock
u_str LISTEN 0 1 test_sock 17798 * 0
$ ls -l /proc/760/fd/{3,4}
lrwx------ 1 root root 64 Feb 1 09:41 3 -> 'socket:[17798]'
l--------- 1 root root 64 Feb 1 09:41 4 -> /tmp/test_sock
$ cat /proc/760/fdinfo/4
pos: 0
flags: 012000000
mnt_id: 40
$ cat /proc/self/mountinfo | grep "^40\s"
40 19 0:37 / /tmp rw shared:23 - tmpfs tmpfs rw
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Kerrisk <<mtk.manpages@gmail.com> writes:
I would like to write code that discovers the namespace setup on a live
system. The NS_GET_PARENT and NS_GET_USERNS ioctl() operations added in
Linux 4.9 provide much of what I want, but there are still a couple of
small pieces missing. Those pieces are added with this patch series.
Here's an example program that makes use of the new ioctl() operations.
8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
/* ns_capable.c
(C) 2016 Michael Kerrisk, <mtk.manpages@gmail.com>
Licensed under the GNU General Public License v2 or later.
Test whether a process (identified by PID) might (subject to LSM checks)
have capabilities in a namespace (identified by a /proc/PID/ns/xxx file).
*/
} while (0)
exit(EXIT_FAILURE); } while (0)
/* Display capabilities sets of process with specified PID */
static void
show_cap(pid_t pid)
{
cap_t caps;
char *cap_string;
caps = cap_get_pid(pid);
if (caps == NULL)
errExit("cap_get_proc");
cap_string = cap_to_text(caps, NULL);
if (cap_string == NULL)
errExit("cap_to_text");
printf("Capabilities: %s\n", cap_string);
}
/* Obtain the effective UID pf the process 'pid' by
scanning its /proc/PID/file */
static uid_t
get_euid_of_process(pid_t pid)
{
char path[PATH_MAX];
char line[1024];
int uid;
snprintf(path, sizeof(path), "/proc/%ld/status", (long) pid);
FILE *fp;
fp = fopen(path, "r");
if (fp == NULL)
errExit("fopen-/proc/PID/status");
for (;;) {
if (fgets(line, sizeof(line), fp) == NULL) {
/* Should never happen... */
fprintf(stderr, "Failure scanning %s\n", path);
exit(EXIT_FAILURE);
}
if (strstr(line, "Uid:") == line) {
sscanf(line, "Uid: %*d %d %*d %*d", &uid);
return uid;
}
}
}
int
main(int argc, char *argv[])
{
int ns_fd, userns_fd, pid_userns_fd;
int nstype;
int next_fd;
struct stat pid_stat;
struct stat target_stat;
char *pid_str;
pid_t pid;
char path[PATH_MAX];
if (argc < 2) {
fprintf(stderr, "Usage: %s PID [ns-file]\n", argv[0]);
fprintf(stderr, "\t'ns-file' is a /proc/PID/ns/xxxx file; "
"if omitted, use the namespace\n"
"\treferred to by standard input "
"(file descriptor 0)\n");
exit(EXIT_FAILURE);
}
pid_str = argv[1];
pid = atoi(pid_str);
if (argc <= 2) {
ns_fd = STDIN_FILENO;
} else {
ns_fd = open(argv[2], O_RDONLY);
if (ns_fd == -1)
errExit("open-ns-file");
}
/* Get the relevant user namespace FD, which is 'ns_fd' if 'ns_fd' refers
to a user namespace, otherwise the user namespace that owns 'ns_fd' */
nstype = ioctl(ns_fd, NS_GET_NSTYPE);
if (nstype == -1)
errExit("ioctl-NS_GET_NSTYPE");
if (nstype == CLONE_NEWUSER) {
userns_fd = ns_fd;
} else {
userns_fd = ioctl(ns_fd, NS_GET_USERNS);
if (userns_fd == -1)
errExit("ioctl-NS_GET_USERNS");
}
/* Obtain 'stat' info for the user namespace of the specified PID */
snprintf(path, sizeof(path), "/proc/%s/ns/user", pid_str);
pid_userns_fd = open(path, O_RDONLY);
if (pid_userns_fd == -1)
errExit("open-PID");
if (fstat(pid_userns_fd, &pid_stat) == -1)
errExit("fstat-PID");
/* Get 'stat' info for the target user namesapce */
if (fstat(userns_fd, &target_stat) == -1)
errExit("fstat-PID");
/* If the PID is in the target user namespace, then it has
whatever capabilities are in its sets. */
if (pid_stat.st_dev == target_stat.st_dev &&
pid_stat.st_ino == target_stat.st_ino) {
printf("PID is in target namespace\n");
printf("Subject to LSM checks, it has the following capabilities\n");
show_cap(pid);
exit(EXIT_SUCCESS);
}
/* Otherwise, we need to walk through the ancestors of the target
user namespace to see if PID is in an ancestor namespace */
for (;;) {
int f;
next_fd = ioctl(userns_fd, NS_GET_PARENT);
if (next_fd == -1) {
/* The error here should be EPERM... */
if (errno != EPERM)
errExit("ioctl-NS_GET_PARENT");
printf("PID is not in an ancestor namespace\n");
printf("It has no capabilities in the target namespace\n");
exit(EXIT_SUCCESS);
}
if (fstat(next_fd, &target_stat) == -1)
errExit("fstat-PID");
/* If the 'stat' info for this user namespace matches the 'stat'
* info for 'next_fd', then the PID is in an ancestor namespace */
if (pid_stat.st_dev == target_stat.st_dev &&
pid_stat.st_ino == target_stat.st_ino)
break;
/* Next time round, get the next parent */
f = userns_fd;
userns_fd = next_fd;
close(f);
}
/* At this point, we found that PID is in an ancestor of the target
user namespace, and 'userns_fd' refers to the immediate descendant
user namespace of PID in the chain of user namespaces from PID to
the target user namespace. If the effective UID of PID matches the
owner UID of descendant user namespace, then PID has all
capabilities in the descendant namespace(s); otherwise, it just has
the capabilities that are in its sets. */
uid_t owner_uid, uid;
if (ioctl(userns_fd, NS_GET_OWNER_UID, &owner_uid) == -1) {
perror("ioctl-NS_GET_OWNER_UID");
exit(EXIT_FAILURE);
}
uid = get_euid_of_process(pid);
printf("PID is in an ancestor namespace\n");
if (owner_uid == uid) {
printf("And its effective UID matches the owner "
"of the namespace\n");
printf("Subject to LSM checks, PID has all capabilities in "
"that namespace!\n");
} else {
printf("But its effective UID does not match the owner "
"of the namespace\n");
printf("Subject to LSM checks, it has the following capabilities\n");
show_cap(pid);
}
exit(EXIT_SUCCESS);
}
8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
Michael Kerrisk (2):
nsfs: Add an ioctl() to return the namespace type
nsfs: Add an ioctl() to return owner UID of a userns
fs/nsfs.c | 13 +++++++++++++
include/uapi/linux/nsfs.h | 9 +++++++--
2 files changed, 20 insertions(+), 2 deletions(-)
I'd like to write code that discovers the user namespace hierarchy on a
running system, and also shows who owns the various user namespaces.
Currently, there is no way of getting the owner UID of a user namespace.
Therefore, this patch adds a new NS_GET_CREATOR_UID ioctl() that fetches
the UID (as seen in the user namespace of the caller) of the creator of
the user namespace referred to by the specified file descriptor.
If the supplied file descriptor does not refer to a user namespace,
the operation fails with the error EINVAL. If the owner UID does
not have a mapping in the caller's user namespace return the
overflow UID as that appears easier to deal with in practice
in user-space applications.
-- EWB Changed the handling of unmapped UIDs from -EOVERFLOW
back to the overflow uid. Per conversation with
Michael Kerrisk after examining his test code.
Acked-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Michael Kerrisk <mtk-manpages@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
To support unprivileged users mounting filesystems two permission
checks have to be performed: a test to see if the user allowed to
create a mount in the mount namespace, and a test to see if
the user is allowed to access the specified filesystem.
The automount case is special in that mounting the original filesystem
grants permission to mount the sub-filesystems, to any user who
happens to stumble across the their mountpoint and satisfies the
ordinary filesystem permission checks.
Attempting to handle the automount case by using override_creds
almost works. It preserves the idea that permission to mount
the original filesystem is permission to mount the sub-filesystem.
Unfortunately using override_creds messes up the filesystems
ordinary permission checks.
Solve this by being explicit that a mount is a submount by introducing
vfs_submount, and using it where appropriate.
vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let
sget and friends know that a mount is a submount so they can take appropriate
action.
sget and sget_userns are modified to not perform any permission checks
on submounts.
follow_automount is modified to stop using override_creds as that
has proven problemantic.
do_mount is modified to always remove the new MS_SUBMOUNT flag so
that we know userspace will never by able to specify it.
autofs4 is modified to stop using current_real_cred that was put in
there to handle the previous version of submount permission checking.
cifs is modified to pass the mountpoint all of the way down to vfs_submount.
debugfs is modified to pass the mountpoint all of the way down to
trace_automount by adding a new parameter. To make this change easier
a new typedef debugfs_automount_t is introduced to capture the type of
the debugfs automount function.
Cc: stable@vger.kernel.org
Fixes: 069d5ac9ae ("autofs: Fix automounts by using current_real_cred()->uid")
Fixes: aeaa4a79ff ("fs: Call d_automount with the filesystems creds")
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Currently turning on NFSv4.2 results in 4.2 clients suddenly seeing the
individual file labels as they're set on the server. This is not what
they've previously seen, and not appropriate in may cases. (In
particular, if clients have heterogenous security policies then one
client's labels may not even make sense to another.) Labeled NFS should
be opted in only in those cases when the administrator knows it makes
sense.
It's helpful to be able to turn 4.2 on by default, and otherwise the
protocol upgrade seems free of regressions. So, default labeled NFS to
off and provide an export flag to reenable it.
Users wanting labeled NFS support on an export will henceforth need to:
- make sure 4.2 support is enabled on client and server (as
before), and
- upgrade the server nfs-utils to a version supporting the new
"security_label" export flag.
- set that "security_label" flag on the export.
This is commit may be seen as a regression to anyone currently depending
on security labels. We believe those cases are currently rare.
Reported-by: tibbs@math.uh.edu
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Enable user-space to issue vector I/O commands through ioctls. To issue
a vector I/O, the ppa list with addresses is also required and must be
mapped for the controller to access.
For each ioctl, the result and status bits are returned as well, such
that user-space can retrieve the open-channel SSD completion bits.
The implementation covers the traditional use-cases of bad block
management, and vectored read/write/erase.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Metadata implementation, test, and fixes.
Signed-off-by: Simon A.F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
advertise whether KVM is capable of handling the PAPR extensions for
resizing the hashed page table during guest runtime. It also adds
definitions for two new VM ioctl()s to implement this extension, and
documentation of the same.
Note that, HPT resizing is already possible with KVM PR without kernel
modification, since the HPT is managed within userspace (qemu). The
capability defined here will only be set where an in-kernel implementation
of resizing is necessary, i.e. for KVM HV. To determine if the userspace
resize implementation can be used, it's necessary to check
KVM_CAP_PPC_ALLOC_HTAB. Unfortunately older kernels incorrectly set
KVM_CAP_PPC_ALLOC_HTAB even with KVM PR. If userspace it want to support
resizing with KVM PR on such kernels, it will need a workaround.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest. The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables. (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).
The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.
The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU. It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.
Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch introduce support for 2500BaseT and 5000BaseT link modes.
These modes are included in the new IEEE 802.3bz standard.
Signed-off-by: Pavel Belous <pavel.s.belous@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two stats in SCM_TIMESTAMPING_OPT_STATS:
TCP_NLA_DATA_SEGS_OUT: total data packets sent including retransmission
TCP_NLA_TOTAL_RETRANS: total data packets retransmitted
The names are picked to be consistent with corresponding fields in
TCP_INFO. This allows applications that are using the timestamping
API to measure latency stats to also retrive retransmission rate
of application write.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP
DF on transmit).
2) Netfilter needs to reflect the fwmark when sending resets, from Pau
Espin Pedrol.
3) nftable dump OOPS fix from Liping Zhang.
4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit,
from Rolf Neugebauer.
5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd
Bergmann.
6) Fix regression in handling of NETIF_F_SG in harmonize_features(),
from Eric Dumazet.
7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.
8) tcp_fastopen_create_child() needs to setup tp->max_window, from
Alexey Kodanev.
9) Missing kmemdup() failure check in ipv6 segment routing code, from
Eric Dumazet.
10) Don't execute unix_bind() under the bindlock, otherwise we deadlock
with splice. From WANG Cong.
11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer,
therefore callers must reload cached header pointers into that skb.
Fix from Eric Dumazet.
12) Fix various bugs in legacy IRQ fallback handling in alx driver, from
Tobias Regnery.
13) Do not allow lwtunnel drivers to be unloaded while they are
referenced by active instances, from Robert Shearman.
14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.
15) Fix a few regressions from virtio_net XDP support, from John
Fastabend and Jakub Kicinski.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits)
ISDN: eicon: silence misleading array-bounds warning
net: phy: micrel: add support for KSZ8795
gtp: fix cross netns recv on gtp socket
gtp: clear DF bit on GTP packet tx
gtp: add genl family modules alias
tcp: don't annotate mark on control socket from tcp_v6_send_response()
ravb: unmap descriptors when freeing rings
virtio_net: reject XDP programs using header adjustment
virtio_net: use dev_kfree_skb for small buffer XDP receive
r8152: check rx after napi is enabled
r8152: re-schedule napi for tx
r8152: avoid start_xmit to schedule napi when napi is disabled
r8152: avoid start_xmit to call napi_schedule during autosuspend
net: dsa: Bring back device detaching in dsa_slave_suspend()
net: phy: leds: Fix truncated LED trigger names
net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
net-next: ethernet: mediatek: change the compatible string
Documentation: devicetree: change the mediatek ethernet compatible string
bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYixa5AAoJEAhfPr2O5OEVxEIP/RdOWmeLxBW0AT8sRlvlw43K
ecFIq0WTIGr6w7VJPm2uHIcw9sdDnwQgozbp5di4KkDLH13aNoJYjMknpl13oPLM
KIBUvIAz7WG+/k3Iq45RLZ7eovrWzGSOTRGHOAitidDBmuFofkLslfzlJKhnNacB
PhSgzQsjK0GrBj0z8Ud/dCSWSR1EU67Uq+pdr+eFGJpMtEicQipnj36BkDMECW02
12wGht0NW7eDMILrNXCwfBYqsFLKfgJqhypoJLci1rnnrGtuMVmlphZJSfPC6Riz
6D6mbgNaYcSAQq8VsHW8L/T5EMio/TNCkapY3LjeUOp/qxTk486/sYWg22fN0qvg
Y1lCU1s5jY0Wu9TBdb6ty/dQ52f7gG9SGjRupRNYCYIVJI0V7UQdWxIrr+uqAkH/
Ha/n7q7oTT6RxjmhHB+qAOspeWI31sGzRBhpDve9R+6MQa366aek14SpuG+4OdUN
7As7Lih9sagi90tB+J2B9PTeuZnr4XaZB3Z+t2vaBb19NG4/vpHYSuC0ymp0mzLh
9hnxtfZCsEOYLSRTWRz5PJmAK2t5j26C5Pb3qeRA40rNbKdSyTj4ntJ/VLUaiFg5
m7kJ222GvdUTeuBLhCvzkjk1sCt1YRvaqQkmLrzpynJnzojuS4SXeWWmdygQIZwq
cQC5lcwMFM/x1MmzBAWI
=r51U
-----END PGP SIGNATURE-----
Merge tag 'media/v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- fix a regression on tvp5150 causing failures at input selection and
image glitches
- CEC was moved out of staging for v4.10. Fix some bugs on it while not
too late
- fix a regression on pctv452e caused by VM stack changes
- fix suspend issued with smiapp
- fix a regression on cobalt driver
- fix some warnings and Kconfig issues with some random configs.
* tag 'media/v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] s5k4ecgx: select CRC32 helper
[media] dvb: avoid warning in dvb_net
[media] v4l: tvp5150: Don't override output pinmuxing at stream on/off time
[media] v4l: tvp5150: Fix comment regarding output pin muxing
[media] v4l: tvp5150: Reset device at probe time, not in get/set format handlers
[media] pctv452e: move buffer to heap, no mutex
[media] media/cobalt: use pci_irq_allocate_vectors
[media] cec: fix race between configuring and unconfiguring
[media] cec: move cec_report_phys_addr into cec_config_thread_func
[media] cec: replace cec_report_features by cec_fill_msg_report_features
[media] cec: update log_addr[] before finishing configuration
[media] cec: CEC_MSG_GIVE_FEATURES should abort for CEC version < 2
[media] cec: when canceling a message, don't overwrite old status info
[media] cec: fix report_current_latency
[media] smiapp: Make suspend and resume functions __maybe_unused
[media] smiapp: Implement power-on and power-off sequences without runtime PM
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEES2FAuYbJvAGobdVQPTuqJaypJWoFAliHTm8THG1rbEBwZW5n
dXRyb25peC5kZQAKCRA9O6olrKklajYkB/wOJxby2DFi4ukCPyrTpXtkQWmXa6+M
Wr/nY5PBg88JipQ+u3rl6bW9NMF3U3xtPO6I/ezfn1jfWOTg0jVIecEfqlW8JNAU
275IOaiQ7jYgGEo+ALUAv2kSe7Fl8HQyaq+9ugMkWUTdv5Vjay1GZgCzyKZzjM8q
8UOoB1YsXlB9f230HPMLeHcbGHqHCOpfNk9yWNcbd0Up6U7EweXtK4Pf/L8g+bL1
0LR1QHRoUtmivuIi96KhB7O/+w8WuigWRy7qkfGpFescZQWOHzbwCVHlZ9ZYCOnJ
eGEP8koksJuMg4OoEUCv0k2v3Txnfr8YNC5MvNdQ06YmoBWwQeamCozL
=fBeU
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-4.11-20170124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2017-01-24
this is a pull request of 4 patches for net-next/master.
The first patch by Oliver Hartkopp adds a netlink API to configure the
interface termination of a CAN card. The next two patches are by me and
add a netlink API to query and configure CAN interfaces that only
support fixed bitrates. The last patch by Colin Ian King simplifies the
return path in the softing_cs driver's softingcs_probe() function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The address generation mode for IPv6 link-local can only be configured
by netlink messages. This patch adds the ability to change the address
generation mode via sysctl.
v1 -> v2
Removed the rtnl lock and switch to use RCU lock to iterate through
the netdev list.
v2 -> v3
Removed the addrgenmode variable from the idev structure and use the
systcl storage for the flag.
Simplifed the logic for sysctl handling by removing the supported
for all operation.
Added support for more types of tunnel interfaces for link-local
address generation.
Based the patches from net-next.
v3 -> v4
Removed unnecessary whitespace changes.
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
- bump version strings, by Simon Wunderlich
- ignore self-generated loop detect MAC addresses in translation table,
by Simon Wunderlich
- install uapi batman_adv.h header, by Sven Eckelmann
- bump copyright years, by Sven Eckelmann
- Remove an unused variable in translation table code, by Sven Eckelmann
- Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to Davids
suggestion), and a follow up code clean up, by Gao Feng (2 patches)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAliKJcgWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoYzFEACGNOZcW2bFFpuE5CZWZpTW1n24
YAY/3C+THc89oUyn7ZbWpZ05HcG+JZXOWCWHaRTnd4UeA+sbki53Ioo52ZGgo79C
7LsyXMMH3F2NrXVEfCK/MUGZcqBAxMd4dcDPsYz5q5osxydjG3hEJLikqOluop3P
JKK9FTEeP2JeoWz9eEf7Io2te6EwcIjo3TRa9f53uyyhPnk5eS2NeSle5axZqS7c
l+NSZVlrrG7oUB0UdzABt5NWOvjDc+Lqp1dkoJo17PHOgXialIfjZOKUlKtjVx6E
06ow2HZaEYCtdlb0awjyPxtIpwiy0szTzJa/h4XtzeQHgbOKIqJWAEb80X7imHVE
aljP7A1uuGm0bcVQ+pq21PX8yLu4RCDPDE5Khu9atkiQP5+sVEdGiJ8Soaw4PmoD
yhDmXshrPGR8u5txN8gaHWG4MHt19645s8dHqHQ7tf5h+mf2QXQ2v/jJQsCV2UfY
vLd/JOD4ZVYWDCspcDdEwGc8KB9r6P31wXuSVjYfkqTFXocdzDr87V7C69CS0U+b
Lzel8oa/eVa2ppR+OhpELxhL2ahO7p1jZI2ix4NHftx5MAV4WJ3RfR2ev2Sf9Ukc
aGjzjuml3JVo2i+11lqiAtOaBK9wDNv1CaC+D2NmC6wpWzWQryNryw3fm2SoH3Xm
S+UoBDZpik+SIEBv3Q==
=cd1Y
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20170126' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- ignore self-generated loop detect MAC addresses in translation table,
by Simon Wunderlich
- install uapi batman_adv.h header, by Sven Eckelmann
- bump copyright years, by Sven Eckelmann
- Remove an unused variable in translation table code, by Sven Eckelmann
- Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to Davids
suggestion), and a follow up code clean up, by Gao Feng (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains a large batch with Netfilter fixes for
your net tree, they are:
1) Two patches to solve conntrack garbage collector cpu hogging, one to
remove GC_MAX_EVICTS and another to look at the ratio (scanned entries
vs. evicted entries) to make a decision on whether to reduce or not
the scanning interval. From Florian Westphal.
2) Two patches to fix incorrect set element counting if NLM_F_EXCL is
is not set. Moreover, don't decrenent set->nelems from abort patch
if -ENFILE which leaks a spare slot in the set. This includes a
patch to deconstify the set walk callback to update set->ndeact.
3) Two fixes for the fwmark_reflect sysctl feature: Propagate mark to
reply packets both from nf_reject and local stack, from Pau Espin Pedrol.
4) Fix incorrect handling of loopback traffic in rpfilter and nf_tables
fib expression, from Liping Zhang.
5) Fix oops on stateful objects netlink dump, when no filter is specified.
Also from Liping Zhang.
6) Fix a build error if proc is not available in ipt_CLUSTERIP, related
to fix that was applied in the previous batch for net. From Arnd Bergmann.
7) Fix lack of string validation in table, chain, set and stateful
object names in nf_tables, from Liping Zhang. Moreover, restrict
maximum log prefix length to 127 bytes, otherwise explicitly bail
out.
8) Two patches to fix spelling and typos in nf_tables uapi header file
and Kconfig, patches from Alexander Alemayhu and William Breathitt Gray.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
09748a22f4 ("batman-adv: add generic netlink family for batman-adv")
introduced the new batman_adv.h which describes the netlink attributes and
commands of batman-adv. But the Kbuild entry to install the header was not
added.
All currently known tools ship their own copy of batman_adv.h but it should
be installed anyway to later be able to migrate to the system batman_adv.h.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
alternative way to perform Fast Open on the active side (client). Prior
to this patch, a client needs to replace the connect() call with
sendto(MSG_FASTOPEN). This can be cumbersome for applications who want
to use Fast Open: these socket operations are often done in lower layer
libraries used by many other applications. Changing these libraries
and/or the socket call sequences are not trivial. A more convenient
approach is to perform Fast Open by simply enabling a socket option when
the socket is created w/o changing other socket calls sequence:
s = socket()
create a new socket
setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
newly introduced sockopt
If set, new functionality described below will be used.
Return ENOTSUPP if TFO is not supported or not enabled in the
kernel.
connect()
With cookie present, return 0 immediately.
With no cookie, initiate 3WHS with TFO cookie-request option and
return -1 with errno = EINPROGRESS.
write()/sendmsg()
With cookie present, send out SYN with data and return the number of
bytes buffered.
With no cookie, and 3WHS not yet completed, return -1 with errno =
EINPROGRESS.
No MSG_FASTOPEN flag is needed.
read()
Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
write() is not called yet.
Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
established but no msg is received yet.
Return number of bytes read if socket is established and there is
msg received.
The new API simplifies life for applications that always perform a write()
immediately after a successful connect(). Such applications can now take
advantage of Fast Open by merely making one new setsockopt() call at the time
of creating the socket. Nothing else about the application's socket call
sequence needs to change.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.
Sample exercise(showing variable length use of cookie)
.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4
.. dump all gact actions..
sudo $TC -s actions ls action gact
action order 0: gact action pass
random type none pass val 0
index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4
.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux 4.9 added two ioctl() operations that can be used to discover:
* the parental relationships for hierarchical namespaces (user and PID)
[NS_GET_PARENT]
* the user namespaces that owns a specified non-user-namespace
[NS_GET_USERNS]
For no good reason that I can glean, NS_GET_USERNS was made synonymous
with NS_GET_PARENT for user namespaces. It might have been better if
NS_GET_USERNS had returned an error if the supplied file descriptor
referred to a user namespace, since it suggests that the caller may be
confused. More particularly, if it had generated an error, then I wouldn't
need the new ioctl() operation proposed here. (On the other hand, what
I propose here may be more generally useful.)
I would like to write code that discovers namespace relationships for
the purpose of understanding the namespace setup on a running system.
In particular, given a file descriptor (or pathname) for a namespace,
N, I'd like to obtain the corresponding user namespace. Namespace N
might be a user namespace (in which case my code would just use N) or
a non-user namespace (in which case my code will use NS_GET_USERNS to
get the user namespace associated with N). The problem is that there
is no way to tell the difference by looking at the file descriptor
(and if I try to use NS_GET_USERNS on an N that is a user namespace, I
get the parent user namespace of N, which is not what I want).
This patch therefore adds a new ioctl(), NS_GET_NSTYPE, which, given
a file descriptor that refers to a user namespace, returns the
namespace type (one of the CLONE_NEW* constants).
Signed-off-by: Michael Kerrisk <mtk-manpages@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127,
at nf_log_packet(), so the extra part is useless.
Second, after adding a log rule with a very very long prefix, we will
fail to dump the nft rules after this _special_ one, but acctually,
they do exist. For example:
# name_65000=$(printf "%0.sQ" {1..65000})
# nft add rule filter output log prefix "$name_65000"
# nft add rule filter output counter
# nft add rule filter output counter
# nft list chain filter output
table ip filter {
chain output {
type filter hook output priority 0; policy accept;
}
}
So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1.
Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When programs need to calculate the csum from scratch for small UDP
packets and use bpf_l4_csum_replace() to feed the result from helpers
like bpf_csum_diff(), then we need a flag besides BPF_F_MARK_MANGLED_0
that would ignore the case of current csum being 0, and which would
still allow for the helper to set the csum and transform when needed
to CSUM_MANGLED_0.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This action allows the user to sample traffic matched by tc classifier.
The sampling consists of choosing packets randomly and sampling them using
the psample module. The user can configure the psample group number, the
sampling rate and the packet's truncation (to save kernel-user traffic).
Example:
To sample ingress traffic from interface eth1, one may use the commands:
tc qdisc add dev eth1 handle ffff: ingress
tc filter add dev eth1 parent ffff: \
matchall action sample rate 12 group 4
Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets on
dev eth1 to psample group 4.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a general way for kernel modules to sample packets, without being tied
to any specific subsystem. This netlink channel can be used by tc,
iptables, etc. and allow to standardize packet sampling in the kernel.
For every sampled packet, the psample module adds the following metadata
fields:
PSAMPLE_ATTR_IIFINDEX - the packets input ifindex, if applicable
PSAMPLE_ATTR_OIFINDEX - the packet output ifindex, if applicable
PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been
truncated during sampling
PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the
user who initiated the sampling. This field allows the user to
differentiate between several samplers working simultaneously and
filter packets relevant to him
PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The
sequence is kept for each group
PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets
PSAMPLE_ATTR_DATA - the actual packet bits
The sampled packets are sent to the PSAMPLE_NL_MCGRP_SAMPLE multicast
group. In addition, add the GET_GROUPS netlink command which allows the
user to see the current sample groups, their refcount and sequence number.
This command currently supports only netlink dump mode.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements an optional, per bridge port flag and feature to deliver
multicast packets to any host on the according port via unicast
individually. This is done by copying the packet per host and
changing the multicast destination MAC to a unicast one accordingly.
multicast-to-unicast works on top of the multicast snooping feature of
the bridge. Which means unicast copies are only delivered to hosts which
are interested in it and signalized this via IGMP/MLD reports
previously.
This feature is intended for interface types which have a more reliable
and/or efficient way to deliver unicast packets than broadcast ones
(e.g. wifi).
However, it should only be enabled on interfaces where no IGMPv2/MLDv1
report suppression takes place. This feature is disabled by default.
The initial patch and idea is from Felix Fietkau.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[linus.luessing@c0d3.blue: various bug + style fixes, commit message]
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some CAN interfaces only support fixed fixed bitrates. This patch adds a
netlink interface to get the list of the CAN interface's fixed bitrates and
data bitrates.
Inside the driver arrays of supported data- bitrate values are defined.
const u32 drvname_bitrate[] = { 20000, 50000, 100000 };
const u32 drvname_data_bitrate[] = { 200000, 500000, 1000000 };
struct drvname_priv *priv;
priv = netdev_priv(dev);
priv->bitrate_const = drvname_bitrate;
priv->bitrate_const_cnt = ARRAY_SIZE(drvname_bitrate);
priv->data_bitrate_const = drvname_data_bitrate;
priv->data_bitrate_const_cnt = ARRAY_SIZE(drvname_data_bitrate);
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>