Currently, we don't rename the upper/lower_ifc symlinks in
/sys/class/net/*/ , which might result stale/duplicate links/names.
Fix this by adding netdev_adjacent_rename_links(dev, oldname) which renames
all the upper/lower interface's links to dev from the upper/lower_oldname
to the new name.
We don't need a rollback because only we control these symlinks and if we
fail to rename them - sysfs will anyway complain.
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In file included from kernel/crash_dump.c:2:0:
include/linux/crash_dump.h:22:27: error: unknown type name `pgprot_t'
when CONFIG_CRASH_DUMP=y
The error was traced back to commit 9cb218131d ("vmcore: introduce
remap_oldmem_pfn_range()")
include <asm/pgtable.h> to get the missing definition
Signed-off-by: Qais Yousef <qais.yousef@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change move anycast_src_echo_reply sysctl with other ipv6 sysctls.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.
Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We still need this notifier even when we don't config
PROC_FS.
It should be rare to have a kernel without PROC_FS,
so just for completeness.
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing net/netif_rx and net/netif_receive_skb trace events
provide little information about the skb, nor do they indicate how it
entered the stack.
Add trace events at entry of each of the exported functions, including
most fields that are likely to be interesting for debugging driver
datapath behaviour. Split netif_rx() and netif_receive_skb() so that
internal calls are not traced.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing net/net_dev_xmit trace event provides little information
about the skb that has been passed to the driver, and it is not
simple to add more since the skb may already have been freed at
the point the event is emitted.
Add a separate trace event before the skb is passed to the driver,
including most fields that are likely to be interesting for debugging
driver datapath behaviour.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a function to set up the partial checksum offset for IP
packets (and optionally re-calculate the pseudo-header checksum) into the
core network code.
The implementation was previously private and duplicated between xen-netback
and xen-netfront, however it is not xen-specific and is potentially useful
to any network driver.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The body of i2c_parent_is_i2c_adapter() is currently guarded by
I2C_MUX. It should be CONFIG_I2C_MUX instead.
Among potentially other problems, this resulted in i2c_lock_adapter()
only locking I2C mux child adapters, and not the parent adapter. In
turn, this could allow inter-mingling of mux child selection and I2C
transactions, which could result in I2C transactions being directed to
the wrong I2C bus, and possibly even switching between busses in the
middle of a transaction.
One concrete issue caused by this bug was corrupted HDMI EDID reads
during boot on the NVIDIA Tegra Seaboard system, although this only
became apparent in recent linux-next, when the boot timing was changed
just enough to trigger the race condition.
Fixes: 3923172b3d ("i2c: reduce parent checking to a NOOP in non-I2C_MUX case")
Cc: Phil Carmody <phil.carmody@partner.samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Conflicts:
net/xfrm/xfrm_policy.c
Steffen Klassert says:
====================
This pull request has a merge conflict between commits be7928d20b
("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
da7c224b1b ("net: xfrm: xfrm_policy: silence compiler warning") from
the net-next tree and commit 2f3ea9a95c ("xfrm: checkpatch erros with
inline keyword position") from the ipsec-next tree.
The version from net-next can be used, like it is done in linux-next.
1) Checkpatch cleanups, from Weilong Chen.
2) Fix lockdep complaints when pktgen is used with IPsec,
from Fan Du.
3) Update pktgen to allow any combination of IPsec transport/tunnel mode
and AH/ESP/IPcomp type, from Fan Du.
4) Make pktgen_dst_metrics static, Fengguang Wu.
5) Compile fix for pktgen when CONFIG_XFRM is not set,
from Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10G PHYs don't currently support running the state machine, which
is implicitly setup via of_phy_connect(). Therefore, it is necessary
to implement an OF version of phy_attach(), which does everything
except start the state machine.
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
phy_attach_direct() may now attach to a generic 10G driver. It can
also be used exactly as phy_connect_direct(), which will be useful
when using of_mdio, as phy_connect (and therefore of_phy_connect)
start the PHY state machine, which is currently irrelevant for 10G
PHYs.
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need an extra parameter to read or write Clause 45 PHYs, so
need a different API with the extra parameter.
Signed-off-by: Andy Fleming <afleming@gmail.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not necessary at all.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_match_indev() is called in fast path, it is not wise to
search for a netdev by ifindex and then compare by its name,
just compare the ifindex.
Also, dev->name could be changed by user-space, therefore
the match would be always fail, but dev->ifindex could
be consistent.
BTW, this will also save some bytes from the core struct of u32.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It will be needed by the next patch.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to store the index separatedly
since tcf_hashinfo is allocated statically too.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The REGULATORY_CUSTOM_REG can be used during early init with
the goal of overriding the wiphy's default regulatory settings
in case the alpha2 of the device is not known. In the case that
the alpha2 becomes known lets avoid having drivers having to
clear the REGULATORY_CUSTOM_REG flag by doing it for them
when regulatory_hint() is used.
Cc: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It includes:
* A new NFC driver for Marvell's 8897, and a few NCI fixes and
improvements needed to support this chipset.
* An LLCP fix for how we were setting the default MIU on a p2p link. If
there is no explicit MIU extension announced at connection time, we
must use the default one and not the one announced at LLCP link
establishement time.
* A pn544 EEPROM config update. Some of the currently EEPROM configured
values are overwriting the firmware ones while other should not be set
by the driver itself.
* Some NFC digital stack fixes and improvements. Asynchronous functions
are better documented, RF technologies and CRC functions are set upon
PSL_REQ reception, and a few minor bugs are fixed.
* Minor and miscelaneous pn533, mei_phy and port100 fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJSzfOKAAoJEIqAPN1PVmxKODEP/i1tmx6bwSjuR0gMyvIkqcBJ
1mM7BwdXlTKTvS/HaKTqaftS5S9Kj/IYSsHPjqRAJp3ipZdc39D8rR3jiyhWzKyD
A/o6whTBTnyAgt8/enNp+h8S/Iq+E3itL/51KUOeeFIKSpGqqfcssZ1/3qhvoYZQ
75zck2OPiEs8KBl1bCrrzK1kP4s8aEH6PepmXd7WS8njKe+dcyl3erw0IVN4WPfP
FKFemvL/HP8+cUyshdiQGRiSw+TyD1VLaZinhyoeJxVRcXUjcodLwtCIATwqvu54
2fMk1ccFineAQZGFfZGbtMAjHQLUeOpHHxFfdkW1g7P9IBp4zjtEiNOhNvPnKlR2
p4g4R/vPdXxbQWjIoWzXI8qw/eFq8xIVC0ap37W/Y65532ParnXESAwk29BJ6770
kqpHTjfZTUmW2POuvqhEKUKPPVp5nt0ArgfnjvHOS1wxcT885vWeu/YOxpOm9VdU
rjFSBBaBDC43vGkCHn5szU9sEwu4O1/JFHElSToXsu+bRtS0tA3O62Kv732RZmbm
1SCbZ63o1ivZr8Q37bY1NDW1/YdwUJMNbEb/t/wDLBqYx0vQcD0aDUWCwoACi2Du
FbUElc975E1ChvM7VfV7uqFN0Pc++M1IEHLcM2BiXmSIGjziFY02FFDkqHiea/tC
xlYfYrrUoIj0v/u7q1t4
=ydBy
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"This is the first NFC pull request for 3.14
It includes:
* A new NFC driver for Marvell's 8897, and a few NCI fixes and
improvements needed to support this chipset.
* An LLCP fix for how we were setting the default MIU on a p2p link. If
there is no explicit MIU extension announced at connection time, we
must use the default one and not the one announced at LLCP link
establishement time.
* A pn544 EEPROM config update. Some of the currently EEPROM configured
values are overwriting the firmware ones while other should not be set
by the driver itself.
* Some NFC digital stack fixes and improvements. Asynchronous functions
are better documented, RF technologies and CRC functions are set upon
PSL_REQ reception, and a few minor bugs are fixed.
* Minor and miscelaneous pn533, mei_phy and port100 fixes."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors
to be honored by protocols which do more stringent validation on the
ICMP's packet payload. This knob is useful for people who e.g. want to
run an unmodified DNS server in a namespace where they need to use pmtu
for TCP connections (as they are used for zone transfers or fallback
for requests) but don't want to use possibly spoofed UDP pmtu information.
Currently the whitelisted protocols are TCP, SCTP and DCCP as they check
if the returned packet is in the window or if the association is valid.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While forwarding we should not use the protocol path mtu to calculate
the mtu for a forwarded packet but instead use the interface mtu.
We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
introduced for multicast forwarding. But as it does not conflict with
our usage in unicast code path it is perfect for reuse.
I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
dependencies because of IPSKB_FORWARDED.
Because someone might have written a software which does probe
destinations manually and expects the kernel to honour those path mtus
I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone
can disable this new behaviour. We also still use mtus which are locked on a
route for forwarding.
The reason for this change is, that path mtus information can be injected
into the kernel via e.g. icmp_err protocol handler without verification
of local sockets. As such, this could cause the IPv4 forwarding path to
wrongfully emit fragmentation needed notifications or start to fragment
packets along a path.
Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
won't be set and further fragmentation logic will use the path mtu to
determine the fragmentation size. They also recheck packet size with
help of path mtu discovery and report appropriate errors.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus disliked the _no_lockdep() naming, so instead
use the more-consistent raw_* prefix to the non-lockdep
enabled seqcount methods.
This also adds raw_ methods for the write operations
as well, which will be utilized in a following patch.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Krzysztof Hałasa <khalasa@piap.pl>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch adds a new netlink attribute for the source-IP and appends it
to the netlink reply. Now, iproute2 can have access to the source-IP.
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Please pull these updates for the 3.14 stream!
For the mac80211 bits, Johannes says:
"Felix adds some helper functions for P2P NoA software tracking, Joe
fixes alignment (but as this apparently never caused issues I didn't
send it to 3.13), Kyeyoon/Jouni add QoS-mapping support (a Hotspot 2.0
feature), Weilong fixed a bunch of checkpatch errors and I get to play
fire-fighter or so and clean up other people's locking issues. I also
added nl80211 vendor-specific events, as we'd discussed at the wireless
summit."
For the iwlwifi bits, Emmanuel says:
"I have here a rework of the interrupt handling to meet RT kernel
requirements - basically we don't take any lock in the primary interrupt
handler. This gave me a good reason to clean things up a bit on the way.
There is also a fix of the QoS mapping along with a few workarounds for
hardware / firmware issues that are hard to hit.
Three fixes suggested by static analyzers, and other various stuff.
Most importantly, I update the Copyright note to include the new year."
For the bluetooth bits, Gustavo says:
"More patches to 3.14. The bulk of changes here is the 6LoWPAN support for
Bluetooth LE Devices. The commits that touches net/ieee802154/ are already
acked by David Miller. Other than that we have some RFCOMM fixes and
improvements plus fixes and clean ups all over the tree."
Beyond that, ath9k, brcmfmac, mwifiex, and wil6210 get their usual
level of attention. The wl1251 driver gets a number of updates,
and there are a handful of other bits here and there.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
This batch contains one single patch with the l2tp match
for xtables, from James Chapman.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:
- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
instead of lower device which misses the necessary txq synchronization for
lower device such as txq stopping or frozen required by dev watchdog or
control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
when tso is disabled for lower device.
Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.
With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.
In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a uAPSD service period ends with an MMPDU, we currently just
send that MMPDU, but it obviously won't get the EOSP bit set as
it doesn't have a QoS header. This contradicts the standard, so
add a QoS-nulldata frame after the MMPDU to properly terminate
the service period with a frame that has EOSP set.
Also fix a bug wrt. the TID for the MMPDU, it shouldn't be set
to 0 unconditionally but use the actual TID that was assigned.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Seventh bit of 8th byte of extended capabilities specifies wide
bandwidth support for TDLS links. Add this definition to ieee80211.
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pablo Neira Ayuso says:
====================
nf_tables updates for net-next
The following patchset contains the following nf_tables updates,
mostly updates from Patrick McHardy, they are:
* Add the "inet" table and filter chain type for this new netfilter
family: NFPROTO_INET. This special table/chain allows IPv4 and IPv6
rules, this should help to simplify the burden in the administration
of dual stack firewalls. This also includes several patches to prepare
the infrastructure for this new table and a new meta extension to
match the layer 3 and 4 protocol numbers, from Patrick McHardy.
* Load both IPv4 and IPv6 conntrack modules in nft_ct if the rule is used
in NFPROTO_INET, as we don't certainly know which one would be used,
also from Patrick McHardy.
* Do not allow to delete a table that contains sets, otherwise these
sets become orphan, from Patrick McHardy.
* Hold a reference to the corresponding nf_tables family module when
creating a table of that family type, to avoid the module deletion
when in use, from Patrick McHardy.
* Update chain counters before setting the chain policy to ensure that
we don't leave the chain in inconsistent state in case of errors (aka.
restore chain atomicity). This also fixes a possible leak if it fails
to allocate the chain counters if no counters are passed to be restored,
from Patrick McHardy.
* Don't check for overflows in the table counter if we are just renaming
a chain, from Patrick McHardy.
* Replay the netlink request after dropping the nfnl lock to load the
module that supports provides a chain type, from Patrick.
* Fix chain type module references, from Patrick.
* Several cleanups, function renames, constification and code
refactorizations also from Patrick McHardy.
* Add support to set the connmark, this can be used to set it based on
the meta mark (similar feature to -j CONNMARK --restore), from
Kristian Evensen.
* A couple of fixes to the recently added meta/set support and nft_reject,
and fix missing chain type unregistration if we fail to register our
the family table/filter chain type, from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce an xtables add-on for matching L2TP packets. Supports L2TPv2
and L2TPv3 over IPv4 and IPv6. As well as filtering on L2TP tunnel-id
and session-id, the filtering decision can also include the L2TP
packet type (control or data), protocol version (2 or 3) and
encapsulation type (UDP or IP).
The most common use for this will likely be to filter L2TP data
packets of individual L2TP tunnels or sessions. While a u32 match can
be used, the L2TP protocol headers are such that field offsets differ
depending on bits set in the header, making rules for matching generic
L2TP connections cumbersome. This match extension takes care of all
that.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We don't encode argument types into function names and since besides
nft_do_chain() there are only AF-specific versions, there is no risk
of confusion.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Minor nf_chain_type cleanups:
- reorder struct to plug a hoe
- rename struct module member to "owner" for consistency
- rename nf_hookfn array to "hooks" for consistency
- reorder initializers for better readability
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The chain type module reference handling makes no sense at all: we take
a reference immediately when the module is registered, preventing the
module from ever being unloaded.
Fix by taking a reference when we're actually creating a chain of the
chain type and release the reference when destroying the chain.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds kernel support for setting properties of tracked
connections. Currently, only connmark is supported. One use-case
for this feature is to provide the same functionality as
-j CONNMARK --save-mark in iptables.
Some restructuring was needed to implement the set op. The new
structure follows that of nft_meta.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add a utility function to get the number of channels supported by
the device, and update the places in the code that need this data.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
[replace another occurrence in libertas, fix kernel-doc, fix bugs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We should const-ify comparisons on skb_queue_* inline helper
functions as their parameters are const as well, so lets not
drop that.
Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For L3-proto independant rules we need to get at the L4 protocol value
directly. Add it to the nft_pktinfo struct and use the meta expression
to retrieve it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Needed by multi-family tables to distinguish IPv4 and IPv6 packets.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds a new table family and a new filter chain that you can
use to attach IPv4 and IPv6 rules. This should help to simplify
rule-set maintainance in dual-stack setups.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add support to register chains to multiple hooks for different address
families for mixed IPv4/IPv6 tables.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Multi-family tables need the AF from the hook ops. Add a pointer to the
hook ops and replace usage of the hooknum member in struct nft_pktinfo.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This value is no longer used by mac80211, and practically no
driver ever set it to a correct value anyway, so remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch built on top of Commit 299603e837
("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
stack. It also serves as an example for supporting other encapsulation
protocols in the GRO stack in the future.
The patch supports version 0 and all the flags (key, csum, seq#) but
will flush any pkt with the S (seq#) flag. This is because the S flag
is not support by GSO, and a GRO pkt may end up in the forwarding path,
thus requiring GSO support to break it up correctly.
Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
ETH_P_IPV6) GRO offload support so the encapped pkts are limited to
IP pkts (i.e., w/o L2 hdr). But support for other protocol type can
be easily added, so is the support for GRE variations like NVGRE.
The patch also support csum offload. Specifically if the csum flag is on
and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE),
the code will take advantage of the csum computed by the h/w when
validating the GRE csum.
Note that commit 60769a5dcd "ipv4: gre:
add GRO capability" already introduces GRO capability to IPv4 GRE
tunnels, using the gro_cells infrastructure. But GRO is done after
GRE hdr has been removed (i.e., decapped). The following patch applies
GRO when pkts first come in (before hitting the GRE tunnel code). There
is some performance advantage for applying GRO as early as possible.
Also this approach is transparent to other subsystem like Open vSwitch
where GRE decap is handled outside of the IP stack hence making it
harder for the gro_cells stuff to apply. On the other hand, some NICs
are still not capable of hashing on the inner hdr of a GRE pkt (RSS).
In that case the GRO processing of pkts from the same remote host will
all happen on the same CPU and the performance may be suboptimal.
I'm including some rough preliminary performance numbers below. Note
that the performance will be highly dependent on traffic load, mix as
usual. Moreover it also depends on NIC offload features hence the
following is by no means a comprehesive study. Local testing and tuning
will be needed to decide the best setting.
All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
(super_netperf 50 -H 192.168.1.18 -l 30)
An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
is configured.
The GRO support for pkts AFTER decap are controlled through the device
feature of the GRE device (e.g., ethtool -K gre1 gro on/off).
1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
thruput: 9.16Gbps
CPU utilization: 19%
1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
thruput: 5.9Gbps
CPU utilization: 15%
1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 12-13%
1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 10%
The following tests were performed on a different NIC that is capable of
csum offload. I.e., the h/w is capable of computing IP payload csum
(CHECKSUM_COMPLETE).
2.1 ethtool -K gre1 gro on (hence will use gro_cells)
2.1.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 8.53Gbps
CPU utilization: 9%
2.1.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 8.97Gbps
CPU utilization: 7-8%
2.1.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 8.83Gbps
CPU utilization: 5-6%
2.1.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.98Gbps
CPU utilization: 5%
2.2 ethtool -K gre1 gro off
2.2.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 5.93Gbps
CPU utilization: 9%
2.2.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 5.62Gbps
CPU utilization: 8%
2.2.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 7.69Gbps
CPU utilization: 8%
2.2.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.96Gbps
CPU utilization: 5-6%
Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change allows to follow a recommandation of RFC4942.
- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
as source addresses for ICMPv6 echo reply. This sysctl is false by default
to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().
Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
(http://tools.ietf.org/html/rfc4942#section-2.1.6)
2.1.6. Anycast Traffic Identification and Security
[...]
To avoid exposing knowledge about the internal structure of the
network, it is recommended that anycast servers now take advantage of
the ability to return responses with the anycast address as the
source address if possible.
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.
This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel setup.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross says:
====================
[GIT net-next] Open vSwitch
Open vSwitch changes for net-next/3.14. Highlights are:
* Performance improvements in the mechanism to get packets to userspace
using memory mapped netlink and skb zero copy where appropriate.
* Per-cpu flow stats in situations where flows are likely to be shared
across CPUs. Standard flow stats are used in other situations to save
memory and allocation time.
* A handful of code cleanups and rationalization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This API can be used by drivers to send their custom
configuration using SET_CONFIG NCI command to the device.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Some drivers require special configuration while initializing.
This patch adds setup handler for this custom configuration.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Pull networking fixes from David Miller:
"I'm hoping this is the very last batch of networking fixes for 3.13,
here goes nothing:
1) Fix crashes in VLAN's header_ops passthru.
2) Bridge multicast code needs to use BH spinlocks to prevent
deadlocks with timers. From Curt Brune.
3) ipv6 tunnels lack proper synchornization when updating percpu
statistics. From Li RongQing.
4) Fixes to bnx2x driver from Yaniv Rosner, Dmitry Kravkov and Michal
Kalderon.
5) Avoid undefined operator evaluation order in llc code, from Daniel
Borkmann.
6) Error paths in various GSO offload paths do not unwind properly,
in particular they must undo any modifications they have made to
the SKB. From Wei-Chun Chao.
7) Fix RX refill races during restore in virtio-net, from Jason Wang.
8) Fix SKB use after free in LLC code, from Daniel Borkmann.
9) Missing unlock and OOPS in netpoll code when VLAN tag handling
fails.
10) Fix vxlan device attachment wrt ipv6, from Fan Du.
11) Don't allow creating infiniband links to non-infiniband devices,
from Hangbin Liu.
12) Revert FEC phy reset active low change, it breaks things. From
Fabio Estevam.
13) Fix header pointer handling in 6lowpan header building code, from
Daniel Borkmann.
14) Fix RSS handling in be2net driver, from Vasundhara Volam.
15) Fix modem port indexing in HSO driver, from Dan Williams"
* http://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
bridge: use spin_lock_bh() in br_multicast_set_hash_max
ipv6: don't install anycast address for /128 addresses on routers
hso: fix handling of modem port SERIAL_STATE notifications
isdn: Drop big endian cpp checks from telespci and hfc_pci drivers
be2net: fix max_evt_qs calculation for BE3 in SR-IOV config
be2net: increase the timeout value for loopback-test FW cmd
be2net: disable RSS when number of RXQs is reduced to 1 via set-channels
xen-netback: Include header for vmalloc
net: 6lowpan: fix lowpan_header_create non-compression memcpy call
fec: Revert "fec: Do not assume that PHY reset is active low"
bnx2x: fix VLAN configuration for VFs.
bnx2x: fix AFEX memory overflow
bnx2x: Clean before update RSS arrives
bnx2x: Correct number of MSI-X vectors for VFs
bnx2x: limit number of interrupt vectors for 57711
qlcnic: Fix bug in Tx completion path
infiniband: make sure the src net is infiniband when create new link
{vxlan, inet6} Mark vxlan_dev flags with VXLAN_F_IPV6 properly
cxgb4: allow large buffer size to have page size
netpoll: Fix missing TXQ unlock and and OOPS.
...
Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.
Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c
ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.
qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP out_of_order_queue lock is not used, as queue manipulation
happens with socket lock held and we therefore use the lockless
skb queue routines (as __skb_queue_head())
We can use __skb_queue_head_init() instead of skb_queue_head_init()
to make this more consistent.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SDIO identifier for Broadcom WLAN devices were defined in the
brcmfmac SDIO driver. Moving the definitions in MMC header file
seems common sense.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.
>From the IETF draft below:
" Bufferbloat is a phenomenon where excess buffers in the network cause high
latency and jitter. As more and more interactive applications (e.g. voice over
IP, real time video streaming and financial transactions) run in the Internet,
high latency and jitter degrade application performance. There is a pressing
need to design intelligent queue management schemes that can control latency and
jitter; and hence provide desirable quality of service to users.
We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software. "
Many thanks to Dave Taht for extensive feedback, reviews, testing and
suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
suggestions. Naeem Khademi and Dave Taht independently contributed to ECN
support.
For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.
Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00
All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.
For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says: <pablo@netfilter.org>
====================
nftables updates for net-next
The following patchset contains nftables updates for your net-next tree,
they are:
* Add set operation to the meta expression by means of the select_ops()
infrastructure, this allows us to set the packet mark among other things.
From Arturo Borrero Gonzalez.
* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
Borkmann.
* Add new queue expression to nf_tables. These comes with two previous patches
to prepare this new feature, one to add mask in nf_tables_core to
evaluate the queue verdict appropriately and another to refactor common
code with xt_NFQUEUE, from Eric Leblond.
* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
Eric Leblond.
* Add the reject expression to nf_tables, this adds the missing TCP RST
support. It comes with an initial patch to refactor common code with
xt_NFQUEUE, again from Eric Leblond.
* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
Nazarewicz.
* Remove the nft_meta_target code, now that Arturo added the set operation
to the meta expression, from me.
* Add help information for nf_tables to Kconfig, also from me.
* Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
available to other nf_tables objects, requested by Arturo, from me.
* Expose the table usage counter, so we can know how many chains are using
this table without dumping the list of chains, from Tomasz Bursztyka.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
macvlan needs vlan_pcpu_stats so make it visible even if compiling
without VLAN_8021Q support. Otherwise a very long compiler error happens.
Fixes: cdf3e274cf ("macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats")
Cc: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-By: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for your net-next tree,
they are:
* Add full port randomization support. Some crazy researchers found a way
to reconstruct the secure ephemeral ports that are allocated in random mode
by sending off-path bursts of UDP packets to overrun the socket buffer of
the DNS resolver to trigger retransmissions, then if the timing for the
DNS resolution done by a client is larger than usual, then they conclude
that the port that received the burst of UDP packets is the one that was
opened. It seems a bit aggressive method to me but it seems to work for
them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
new NAT mode to fully randomize ports using prandom.
* Add a new classifier to x_tables based on the socket net_cls set via
cgroups. These includes two patches to prepare the field as requested by
Zefan Li. Also from Daniel Borkmann.
* Use prandom instead of get_random_bytes in several locations of the
netfilter code, from Florian Westphal.
* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
mark, also from Florian Westphal.
* Fix compilation warning due to unused variable in IPVS, from Geert
Uytterhoeven.
* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.
* Add IPComp extension to x_tables, from Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This function is used to get a specific core when there is more than
one core of that specific type. This is used in bgmac to reset all GMAC
cores.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
They are same, so unify them as one; since macvlan is a kind of vlan,
vlan_pcpu_stats should be a proper name for vlan and macvlan.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
They are same, so unify them as one, pcpu_sw_netstats.
Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e and pci_regs.h.
Anjali provides a patch to prevent messages from stray HMC events, except
at interrupt message level, and refactors the HMC error handling.
Catherine adds routines in probe to populate/check PCI bus speed and width,
then verify we are in a 8GT/s x8 PCIe slot and warn when we are not.
Shannon adds Wake-on-LAN support for i40e, fixes curly brace use as well as
return type for i40e_vsi_clear_rings().
Joseph implements receive offload for VXLAN for i40e, where the hardware
supports checksum offload/verification of the inner/outer header.
Mitch provides the bulk of the changes, where he refactors the VF reset
code so that it works on real hardware. Then does code cleanup by
calling existing functions to enable and disable queues for VFs and
remove unused functions. Removes a unnecessary log messages that are
seen at every VF reset, for example complaining about disabling queues
that are already disabled. Fixes an error return when the VF asks to
add an invalid MAC address and if the VF sends a bad message, make it
more informative about what is actually going on.
Jesse refactors the LED function to flash LED lights correctly.
v2:
- removed patch 5 "i40e: add set settings and pauseparam" based on
feedback from Ben Hutchings, will re-work that patch for later
submission
- Added patch "i40e: Implementation of vxlan ndo's" from Joseph to
address Or Gerlitz's questions and concerns. This patch adds the
implementation for the VXLAN ndo's and allows the hardware to do
receive checksum offload for inner packets on the UDP ports that
VXLAN notifies us about.
- Added patch "i40e: using for_each_set_bit to simplify the code"
from Wei Yongjun. This patch uses for_each_set_bit() to simply
the code.
v3:
- fixed indentation issue in patch 11 based on feedback from
Sergei Shtylyov.
Sorry for the delayed release of v4, it was delayed to the holidays.
v4:
- Addressed Or Gerlitz's concerns about trying to get a hold of a mutex
while holding a spin lock in patch 6 by executing the AQ commands from
a subtask.
- Addressed David Miller's Kconfig concerns by creating a Kconfig VXLAN
option for i40e and wrapped appropriate code with the config option in
patch 6.
- Updated patch 7 based on the changes made in patch 6 in the above two
bullets.
v5:
- Added the patch to pci_regs.h based on David Miller's feedback to add
PCI defines for speed and width
- Updated patch 3 description to better explain the changes based on
feedback from David Miller
- Updated patch 4 to use the newly added defines to pci_regs.h instead
of local defines
- Updated patch 7 to use <net/vxlan.h> in the #include based on feedback
from David Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
phy_scan_fixups() isn't and shouldn't be called by the drivers directly, so
unexport it. And since Florian Fainelli's recent patches, the function is only
called locally, so we can make it static as well.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove adjust_state() callback from 'struct phy_device' since it seems to have
never been really used from the inception: phy_start_machine() has been always
called with 2nd argument equal to NULL.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Running 'checkpatch.pl' gives some errors and warnings:
- no spaces around =;
- * separated by space from the function name;
- { in function definition not on a separate line;
- line over 80 characters.
While fixing these, also fix the following style issues:
- file name in the heading comment;
- alignment not matching open paren.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some controller pretend they support the Delete Stored Link Key command,
but in reality they really don't support it.
< HCI Command: Delete Stored Link Key (0x03|0x0012) plen 7
bdaddr 00:00:00:00:00:00 all 1
> HCI Event: Command Complete (0x0e) plen 4
Delete Stored Link Key (0x03|0x0012) ncmd 1
status 0x11 deleted 0
Error: Unsupported Feature or Parameter Value
Not correctly supporting this command causes the controller setup to
fail and will make a device not work. However sending the command for
controller that handle stored link keys is important. This quirk
allows a driver to disable the command if it knows that this command
handling is broken.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Add missing PCI bus link speed 8.0 GT/s and bus link widths of
x1, x2, x4 and x8.
CC: <linux-kernel@vger.kernel.org>
CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Add nested IFLA_BOND_AD_INFO for bonding 802.3ad info.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
ad_select via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IFLA_BOND_AD_LACP_RATE to allow get/set of bonding parameter
lacp_rate via netlink.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The llc_sap_list_lock does not need to be global, only acquired
in core.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Namespace related cleaning
* make cred_to_ucred static
* remove unused sock_rmalloc function
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu route cache eliminates share of dst refcnt between CPUs.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid doing a route lookup on every packet being tunneled.
In ip_tunnel.c cache the route returned from ip_route_output if
the tunnel is "connected" so that all the rouitng parameters are
taken from tunnel parms for a packet. Specifically, not NBMA tunnel
and tos is from tunnel parms (not inner packet).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It would be useful e.g. in a server or desktop environment to have
a facility in the notion of fine-grained "per application" or "per
application group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, but it is not limited
to that, meaning we can have much different scenarios/policies that
netfilter allows us than just blocking, e.g. fine grained settings
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.
Implementation of PID-based matching would not be appropriate
as they frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.
As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.
In our case we're working on outgoing traffic based on which local
socket that originated from. Also, one doesn't even need to have
an a-prio knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremly*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we result in a very minimal and efficient
module, and don't add anything except netfilter code.
One possible, minimal usage example (many other iptables options
can be applied obviously):
1) Configuring cgroups if not already done, e.g.:
mkdir /sys/fs/cgroup/net_cls
mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
mkdir /sys/fs/cgroup/net_cls/0
echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
(resp. a real flow handle id for tc)
2) Configuring netfilter (iptables-nftables), e.g.:
iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP
3) Running applications, e.g.:
ping 208.67.222.222 <pid:1799>
echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
[...]
ping 208.67.220.220 <pid:1804>
ping: sendmsg: Operation not permitted
[...]
echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
[...]
Of course, real-world deployments would make use of cgroups user
space toolsuite, or own custom policy daemons dynamically moving
applications from/to various cgroups.
[1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
While we're at it and introduced CGROUP_NET_CLASSID, lets also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_<SUBSYS>.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Zefan Li requested [1] to perform the following cleanup/refactoring:
- Split cgroupfs classid handling into net core to better express a
possible more generic use.
- Disable module support for cgroupfs bits as the majority of other
cgroupfs subsystems do not have that, and seems to be not wished
from cgroup side. Zefan probably might want to follow-up for netprio
later on.
- By this, code can be further reduced which previously took care of
functionality built when compiled as module.
cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
that we are consistent with {netclassid,netprio}_cgroup naming that is
under net/core/ as suggested by Zefan.
No change in functionality, but only code refactoring that is being
done here.
[1] http://patchwork.ozlabs.org/patch/304825/
Suggested-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The following code is not used in current upstream code.
Some of this seems to be old hooks, other might be used by some
out of tree module (which I don't care about breaking), and
the need_ipv4_conntrack was used by old NAT code but no longer
called.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Function never used in current upstream code.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We currently use prandom_u32() for allocation of ports in tcp bind(0)
and udp code. In case of plain SNAT we try to keep the ports as is
or increment on collision.
SNAT --random mode does use per-destination incrementing port
allocation. As a recent paper pointed out in [1] that this mode of
port allocation makes it possible to an attacker to find the randomly
allocated ports through a timing side-channel in a socket overloading
attack conducted through an off-path attacker.
So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
in regard to the attack described in this paper. As we need to keep
compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
selection algorithm with a simple prandom_u32() in order to mitigate
this attack vector. Note that the lfsr113's internal state is
periodically reseeded by the kernel through a local secure entropy
source.
More details can be found in [1], the basic idea is to send bursts
of packets to a socket to overflow its receive queue and measure
the latency to detect a possible retransmit when the port is found.
Because of increasing ports to given destination and port, further
allocations can be predicted. This information could then be used by
an attacker for e.g. for cache-poisoning, NS pinning, and degradation
of service attacks against DNS servers [1]:
The best defense against the poisoning attacks is to properly
deploy and validate DNSSEC; DNSSEC provides security not only
against off-path attacker but even against MitM attacker. We hope
that our results will help motivate administrators to adopt DNSSEC.
However, full DNSSEC deployment make take significant time, and
until that happens, we recommend short-term, non-cryptographic
defenses. We recommend to support full port randomisation,
according to practices recommended in [2], and to avoid
per-destination sequential port allocation, which we show may be
vulnerable to derandomisation attacks.
Joint work between Hannes Frederic Sowa and Daniel Borkmann.
[1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
[2] http://arxiv.org/pdf/1205.5190v1.pdf
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- VGA switcheroo was broken for some users as a result of the ACPI-based
PCI hotplug (ACPIPHP) changes in 3.12, because some previously ignored
hotplug events started to be handled. The fix causes them to be
ignored again.
- There are two more issues related to cpufreq's suspend/resume handling
changes from the 3.12 cycle addressed by Viresh Kumar's fixes.
- intel_pstate triggers a divide error in a timer function if the P-state
information it needs is missing during initialization. This leads to
kernel panics on nested KVM clients and is fixed by failing the
initialization cleanly in those cases.
- PCI initalization code changes during the 3.9 cycle uncovered BIOS
issues related to ACPI wakeup notifications (some BIOSes send them
for devices that aren't supposed to support ACPI wakeup). Work around
them by installing an ACPI wakeup notify handler for all PCI devices
with ACPI support.
- The Calxeda cpuilde driver's probe function is tagged as __init, which
is incorrect and causes a section mismatch to occur during build. Fix
from Andre Przywara removes the __init tag from there.
- During the 3.12 cycle ACPIPHP started to print warnings about missing
_ADR for devices that legitimately don't have it. Fix from Toshi Kani
makes it only print the warnings where they make sense.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABCAAGBQJSxrJgAAoJEILEb/54YlRxRPkP/ifzrrVhdzqXIEy44b93JeDx
oSmZW6yTO51GZlDx2bjt6CGJcIUDC4ExYV6S2tB44/DL19CYdIxi7oBaXtUvzGRs
oZ6B1wfvKOIxZ0RQguaGd1uerQU304CGwUXu/jpRZ/UuZZFKq5Uts6O3bilGzCfR
Y+MUH+qECwdBXaFHUISdWFsa3lxj0U0kglszh+DsxwS4gy/pLbCu5fKLgHLuNNQC
hhEEToQ6uF4o8hbkGJvgUPo3V3aUSXObgvJh4ntP09YE1AEJScLB4wKmqL0zN8Qj
pbBf1WC5OpGXv8zGM9ErrY64YaKA36uhJvOi6RtBGLbG+pYM6E6IM9zNf4Ku+T79
JNEulpq27aEx2JghNSgMFYQZEOGTH+q24iXZdZlOIvqWpMymATlqP/gAQQIpg3VC
OIIdocMFRsbgwFXf41uyUqs458fg5xREz5k6geWZeyriM45wFShR+JnMopQWc5OB
a3sbcWUShFBL1T0pqYR4SDLDvH4NdEP2NKO2jlqMXUewLXsVRRt/42etGoe0rI3C
cMWPQq7z0GNN+NboUviqwHdxUKqONWGt+pd/3u8FI/Y1IlXEeXQYGawhSu81uCpT
5gLaKDkwOrCSwOw68Msuod0Cce6TnoTowi6hP2aAEu8mDJwQY+toqA3+CPoO8nty
DdhZjP1afEgsVVyjErX4
=LXh0
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and PM fixes and new device IDs from Rafael Wysocki:
"These commits, except for one, are regression fixes and the remaining
one fixes a divide error leading to a kernel panic. The majority of
the regressions fixed here were introduced during the 3.12 cycle, one
of them is from this cycle and one is older.
Specifics:
- VGA switcheroo was broken for some users as a result of the
ACPI-based PCI hotplug (ACPIPHP) changes in 3.12, because some
previously ignored hotplug events started to be handled. The fix
causes them to be ignored again.
- There are two more issues related to cpufreq's suspend/resume
handling changes from the 3.12 cycle addressed by Viresh Kumar's
fixes.
- intel_pstate triggers a divide error in a timer function if the
P-state information it needs is missing during initialization.
This leads to kernel panics on nested KVM clients and is fixed by
failing the initialization cleanly in those cases.
- PCI initalization code changes during the 3.9 cycle uncovered BIOS
issues related to ACPI wakeup notifications (some BIOSes send them
for devices that aren't supposed to support ACPI wakeup). Work
around them by installing an ACPI wakeup notify handler for all PCI
devices with ACPI support.
- The Calxeda cpuilde driver's probe function is tagged as __init,
which is incorrect and causes a section mismatch to occur during
build. Fix from Andre Przywara removes the __init tag from there.
- During the 3.12 cycle ACPIPHP started to print warnings about
missing _ADR for devices that legitimately don't have it. Fix from
Toshi Kani makes it only print the warnings where they make sense"
* tag 'pm+acpi-3.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPIPHP / radeon / nouveau: Fix VGA switcheroo problem related to hotplug
intel_pstate: Fail initialization if P-state information is missing
ARM/cpuidle: remove __init tag from Calxeda cpuidle probe function
PCI / ACPI: Install wakeup notify handlers for all PCI devs with ACPI
cpufreq: preserve user_policy across suspend/resume
cpufreq: Clean up after a failing light-weight initialization
ACPI / PCI / hotplug: Avoid warning when _ADR not present
Introduce xfrm_state_lookup_byspi to find user specified by custom
from "pgset spi xxx". Using this scheme, any flow regardless its
saddr/daddr could be transform by SA specified with configurable
spi.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
VM to VM GSO traffic is broken if it goes through VXLAN or GRE
tunnel and the physical NIC on the host supports hardware VXLAN/GRE
GSO offload (e.g. bnx2x and next-gen mlx4).
Two issues -
(VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with
SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header
integrity check fails in udp4_ufo_fragment if inner protocol is
TCP. Also gso_segs is calculated incorrectly using skb->len that
includes tunnel header. Fix: robust check should only be applied
to the inner packet.
(VXLAN & GRE) Once GSO header integrity check passes, NULL segs
is returned and the original skb is sent to hardware. However the
tunnel header is already pulled. Fix: tunnel header needs to be
restored so that hardware can perform GSO properly on the original
packet.
Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SCTP outqueue structure maintains a data chunks
that are pending transmission, the list of chunks that
are pending a retransmission and a length of data in
flight. It also tries to keep the emtpy state so that
it can performe shutdown sequence or notify user.
The problem is that the empy state is inconsistently
tracked. It is possible to completely drain the queue
without sending anything when using PR-SCTP. In this
case, the empty state will not be correctly state as
report by Jamal Hadi Salim <jhs@mojatatu.com>. This
can cause an association to be perminantly stuck in the
SHUTDOWN_PENDING state.
Additionally, SCTP is incredibly inefficient when setting
the empty state. Even though all the data is availaible
in the outqueue structure, we ignore it and walk a list
of trasnports.
In the end, we can completely remove the extra empty
state and figure out if the queue is empty by looking
at 3 things: length of pending data, length of in-flight
data, and exisiting of retransmit data. All of these
are already in the strucutre.
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to export functions only used in one file.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2960ed3468 ("ARM: netx: move platform_data definitions")
moved the file to the current location but forgot to remove the pointer
to its previous location. Clean it up. While at it also change the header
file protection macros appropriately.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
since the prune parameter for fib6_clean_all always is 0, remove it.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Running 'make namespacecheck' shows:
net/ipv6/route.o
ipv6_route_table_template
rt6_bind_peer
net/ipv6/icmp.o
icmpv6_route_lookup
ipv6_icmp_table_template
This addresses some of those warnings by:
* make icmpv6_route_lookup static
* move inline's out of ip6_route.h since only used into route.c
* move rt6_bind_peer into route.c
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following functions are not used outside of net/core/dev.c
and should be declared static.
call_netdevice_notifiers_info
__dev_remove_offload
netdev_has_any_upper_dev
__netdev_adjacent_dev_remove
__netdev_adjacent_dev_link_lists
__netdev_adjacent_dev_unlink_lists
__netdev_adjacent_dev_unlink
__netdev_adjacent_dev_link_neighbour
__netdev_adjacent_dev_unlink_neighbour
And the following are never used and should be deleted
netdev_lower_dev_get_private_rcu
__netdev_find_adj_rcu
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>