Commit Graph

1275 Commits

Author SHA1 Message Date
Alexei Starovoitov e2989ee974 bpf, doc: update list of architectures that do eBPF JIT
update the list and remove 'in the future' statement,
since all still alive 64-bit architectures now do eBPF JIT.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-23 15:56:48 -04:00
Mauro Carvalho Chehab 8a6a285d61 docs-rst: usb: update old usbfs-related documentation
There's no usbfs anymore. The old features are now either
exported to /dev/bus/usb or via debugfs.

Update documentation accordingly, pointing to the new
places where the character devices and usb/devices are
now placed.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-04-20 15:30:33 -06:00
SeongJae Park 3bdadc86dc Documentation: Fix dead URLs to ftp.kernel.org
As ftp.kernel.org is closed [0], this commit fixes dead URLs in
documents to use www.kernel.org instead.

[0] https://www.kernel.org/shutting-down-ftp-services.html

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-03-29 15:46:06 -06:00
David S. Miller ba82427d4a Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-03-23

This series contains updates to i40e and i40e.txt documentation.

Jake provides all the changes in the series which are centered around
ntuple filter fixes and additional support.  Fixed the current
implementation of .set_rxnfc, where we were not reading the mask field
for filter entries which was resulting in filters not behaving as
expected and not working correctly.  When cleaning up after disabling
flow director support, ensure that the default input set is correctly
reprogrammed.  Since the hardware only supports a single input set for
all flows of that type, the driver shall only allow the input set to
change if there are no other configured filters for that flow type, so
add support to detect when we can update the input set for each flow
type.  Align the driver to other drivers to partition the ring_cookie
value into 8bits of VF index, along with 32bits of queue number instead
of using the user-def field.  Added support to parse the user-def field
into a data structure format to allow future extensions of the user-def
filed by keeping all the code that read/writes the field into a single
location.  Added support for flexible payloads passed via ethtool
user-def field.  We support a single flexible word (2byte) value per
protocol type, and we handle the FLX_PIT register using a list of
flexible entries so that each flow type may be configured separately.
Enabled flow director filters for SCTPv4 packets using the ethtool
ntuple interface to enable filters.  Updated the documentation on the
i40e driver to include the newly added support to ntuple filters.
Reduced complexity of a if-continue-else-break section of code by
taking advantage of using hlist_for_each_entry_continue() instead.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:45:07 -07:00
subashab@codeaurora.org dddb64bcb3 net: Add sysctl to toggle early demux for tcp and udp
Certain system process significant unconnected UDP workload.
It would be preferrable to disable UDP early demux for those systems
and enable it for TCP only.

By disabling UDP demux, we see these slight gains on an ARM64 system-
782 -> 788Mbps unconnected single stream UDPv4
633 -> 654Mbps unconnected UDPv4 different sources

The performance impact can change based on CPU architecure and cache
sizes. There will not much difference seen if entire UDP hash table
is in cache.

Both sysctls are enabled by default to preserve existing behavior.

v1->v2: Change function pointer instead of adding conditional as
suggested by Stephen.

v2->v3: Read once in callers to avoid issues due to compiler
optimizations. Also update commit message with the tests.

v3->v4: Store and use read once result instead of querying pointer
again incorrectly.

v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n}

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Tom Herbert <tom@herbertland.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:17:07 -07:00
Jacob Keller 55877012d5 i40e: document drivers use of ntuple filters
Add documentation describing the drivers use of ethtool ntuple filters,
including the limitations that it has due to hardware, as well as how it
reads and parses the user-def data block.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-23 21:13:33 -07:00
Joel Scherpelz bbea124bc9 net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs.
This commit adds a new sysctl accept_ra_rt_info_min_plen that
defines the minimum acceptable prefix length of Route Information
Options. The new sysctl is intended to be used together with
accept_ra_rt_info_max_plen to configure a range of acceptable
prefix lengths. It is useful to prevent misconfigurations from
unintentionally blackholing too much of the IPv6 address space
(e.g., home routers announcing RIOs for fc00::/7, which is
incorrect).

Signed-off-by: Joel Scherpelz <jscherpelz@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 14:20:54 -07:00
Nikolay Aleksandrov bf4e0a3db9 net: ipv4: add support for ECMP hash policy choice
This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
 0 - layer 3 (default)
 1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case, the flow dissector's field consistentification should handle the
address order thus we can remove the address reversals.
If the skb is provided we always use it for the hash calculation,
otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 15:27:19 -07:00
David S. Miller 41e95736b3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your
net-next tree. A couple of new features for nf_tables, and unsorted
cleanups and incremental updates for the Netfilter tree. More
specifically, they are:

1) Allow to check for TCP option presence via nft_exthdr, patch
   from Phil Sutter.

2) Add symmetric hash support to nft_hash, from Laura Garcia Liebana.

3) Use pr_cont() in ebt_log, from Joe Perches.

4) Remove some dead code in arp_tables reported via static analysis
   tool, from Colin Ian King.

5) Consolidate nf_tables expression validation, from Liping Zhang.

6) Consolidate set lookup via nft_set_lookup().

7) Remove unnecessary rcu read lock side in bridge netfilter, from
   Florian Westphal.

8) Remove unused variable in nf_reject_ipv4, from Tahee Yoo.

9) Pass nft_ctx struct to object initialization indirections, from
   Florian Westphal.

10) Add code to integrate conntrack helper into nf_tables, also from
    Florian.

11) Allow to check if interface index or name exists via
    NFTA_FIB_F_PRESENT, from Phil Sutter.

12) Simplify resolve_normal_ct(), from Florian.

13) Use per-limit spinlock in nft_limit and xt_limit, from Liping Zhang.

14) Use rwlock in nft_set_rbtree set, also from Liping Zhang.

15) One patch to remove a useless printk at netns init path in ipvs,
    and several patches to document IPVS knobs.

16) Use refcount_t for reference counter in the Netfilter/IPVS code,
    from Elena Reshetova.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 14:28:08 -07:00
Soheil Hassas Yeganeh 4396e46187 tcp: remove tcp_tw_recycle
The tcp_tw_recycle was already broken for connections
behind NAT, since the per-destination timestamp is not
monotonically increasing for multiple machines behind
a single destination address.

After the randomization of TCP timestamp offsets
in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
for each connection), the tcp_tw_recycle is broken for all
types of connections for the same reason: the timestamps
received from a single machine is not monotonically increasing,
anymore.

Remove tcp_tw_recycle, since it is not functional. Also, remove
the PAWSPassive SNMP counter since it is only used for
tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
since the strict argument is only set when tcp_tw_recycle is
enabled.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Lutz Vieweg <lvml@5t9.de>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:33:56 -07:00
Hangbin Liu 3c679cba58 ipvs: Document sysctl pmtu_disc
Document sysctl pmtu_disc based on commit 3654e61137 ("ipvs: add
pmtu_disc option to disable IP DF for TUN packets").

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2017-03-16 13:33:39 +01:00
Hangbin Liu 24b444155b ipvs: Document sysctl sync_ports
Document sysctl sync_ports based on commit f73181c828 ("ipvs: add support
for sync threads").

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2017-03-16 13:33:39 +01:00
Hangbin Liu 237e5722bc ipvs: Document sysctl sync_qlen_max and sync_sock_size
Document sysctl sync_qlen_max and sync_sock_size based on
commit 1c003b1580 ("ipvs: wakeup master thread").

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2017-03-16 13:33:39 +01:00
Hangbin Liu a2f346d82b ipvs: fix sync_threshold description and add sync_refresh_period, sync_retries
Fix sync_threshold description which should have two values. Also add
sync_refresh_period and sync_retries based on commit 749c42b620
("ipvs: reduce sync rate with time thresholds").

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2017-03-16 13:33:39 +01:00
David S. Miller 101c431492 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/genet/bcmgenet.c
	net/core/sock.c

Conflicts were overlapping changes in bcmgenet and the
lockdep handling of sockets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15 11:59:10 -07:00
Robert Shearman a59166e470 mpls: allow TTL propagation from IP packets to be configured
Allow TTL propagation from IP packets to MPLS packets to be
configured. Add a new optional LWT attribute, MPLS_IPTUNNEL_TTL, which
allows the TTL to be set in the resulting MPLS packet, with the value
of 0 having the semantics of enabling propagation of the TTL from the
IP header (i.e. non-zero values disable propagation).

Also allow the configuration to be overridden globally by reusing the
same sysctl to control whether the TTL is propagated from IP packets
into the MPLS header. If the per-LWT attribute is set then it
overrides the global configuration. If the TTL isn't propagated then a
default TTL value is used which can be configured via a new sysctl,
"net.mpls.default_ttl". This is kept separate from the configuration
of whether IP TTL propagation is enabled as it can be used in the
future when non-IP payloads are supported (i.e. where there is no
payload TTL that can be propagated).

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13 15:29:22 -07:00
Robert Shearman 5b441ac878 mpls: allow TTL propagation to IP packets to be configured
Provide the ability to control on a per-route basis whether the TTL
value from an MPLS packet is propagated to an IPv4/IPv6 packet when
the last label is popped as per the theoretical model in RFC 3443
through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to
mean disable propagation and 1 to mean enable propagation.

In order to provide the ability to change the behaviour for packets
arriving with IPv4/IPv6 Explicit Null labels and to provide an easy
way for a user to change the behaviour for all existing routes without
having to reprogram them, a global knob is provided. This is done
through the addition of a new per-namespace sysctl,
"net.mpls.ip_ttl_propagate", which defaults to enabled. If the
per-route attribute is set (either enabled or disabled) then it
overrides the global configuration.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13 15:29:22 -07:00
Neil Jerram 88a7cddce2 Make IP 'forwarding' doc more precise
It wasn't clear if the 'forwarding' setting needs to be enabled on the
interface that packets are received from, or on the interface that
packets are forwarded to, or both.

In fact (according to my code reading) the setting is relevant on the
interface that packets are received from, so this change updates the doc
to say that.

Signed-off-by: Neil Jerram <neil@tigera.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-12 23:28:43 -07:00
Masahiro Yamada 9332ef9dbd scripts/spelling.txt: add "an user" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  an user||a user
  an userspace||a userspace

I also added "userspace" to the list since it is a common word in Linux.
I found some instances for "an userfaultfd", but I did not add it to the
list.  I felt it is endless to find words that start with "user" such as
"userland" etc., so must draw a line somewhere.

Link: http://lkml.kernel.org/r/1481573103-11329-4-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Linus Torvalds c1aac62f36 A slightly quieter cycle for documentation this time around.
Three more DocBook template files have been converted to RST; only 21 to
 go.  There are various build improvements and the usual array of
 documentation improvements and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYriFXAAoJEI3ONVYwIuV6iTMP/iV7ownq9IK1f8askcXKM76i
 NoRdj4/JywAPQ73vLhOSDVELGdVJNRBjdyOdBRzxPgsqAhFmm79lVYV2eLIffQ2k
 7LcVbEQR77I+4z9SwqIVbIWNCBry7Hu8aWh7moDL3I6yeuay408yr5YW2lIlsqHZ
 V/LZgkTWDe+iQPeXNA4Djzylx0lcRlAy4yMSLjN1+gb9/uBnXb9J0eGJzgfZfrL8
 fiIhymg3bv8vB99l6LMR5vT343QLWXf1yS31A7rPQvwkDo6zFehUJA0XNfIsl2dw
 VQYsvl9vp9wy3e6Y0qKXPn1XhAhCrm64P3crBxK31MMvcKZVCfeRSZ78wrvpvewy
 MVLlXdqop1bHPHowtRfA5jwxr1NqcYp+Jg0+YGX3iXpPi1Jfk36DNUy9iWvtvIzr
 lWgQcIKsdCwwYUcvPR8Kt8T/3q/AHbYlI6mimWlkmbZwncQcgCrH5xSG+c2BIPfV
 fn3W6eLHBn8RyVsxlaXlA0Y9TNtI/Cm85b3Ri10pFvhl868ppWfJxXHi7UtcbU58
 sQzahISCTXOH/NQwkkh7kFMtczbB43rAcChvF7EUYpazVBpJ4P4HxKFg3eIzIdc6
 VlBSaMu1hxUGoYxNNYuKr/nYstuczLOKzK7q4j/JOExY3RgTWP+T3bF02wgubvoa
 D/9WfScewkgCJRoA7i17
 =C5nd
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.11' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "A slightly quieter cycle for documentation this time around.

  Three more DocBook template files have been converted to RST; only 21
  to go. There are various build improvements and the usual array of
  documentation improvements and fixes"

* tag 'docs-4.11' of git://git.lwn.net/linux: (44 commits)
  docs / driver-api: Fix structure references in device_link.rst
  PM / docs: Fix structure references in device.rst
  Add a target to check broken external links in the Documentation
  Documentation: Fix linux-api list typo
  Documentation: DocBook/Makefile comment typo
  Improve sparse documentation
  Documentation: make Makefile.sphinx no-ops quieter
  Documentation: DMA-ISA-LPC.txt
  Documentation: input: fix path to input code definitions
  docs: Remove the copyright year from conf.py
  docs: Fix a warning in the Korean HOWTO.rst translation
  PM / sleep / docs: Convert PM notifiers document to reST
  PM / core / docs: Convert sleep states API document to reST
  PM / core: Update kerneldoc comments in pm.h
  doc-rst: Fix recursive make invocation from macros
  doc-rst: Delete output of failed dot-SVG conversion
  doc-rst: Break shell command sequences on failure
  Documentation/sphinx: make targets independent of Sphinx work for HAVE_SPHINX=0
  doc-rst: fixed cleandoc target when used with O=dir
  Documentation/sphinx: prevent generation of .pyc files in the source tree
  ...
2017-02-22 18:51:29 -08:00
Harald Welte 93a66e93c7 GTP: Add some basic documentation about drivers/net/gtp.c
In order to clarify what the module actually does, and how to use it,
let's add some basic documentation to the kernel tree, together with
pointers to related specs and projects.

Signed-off-by: Harald Welte <laforge@gnumonks.org>
Acked-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20 10:24:20 -05:00
David S. Miller 52e01b84a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
   sk_buff so we only access one single cacheline in the conntrack
   hotpath. Patchset from Florian Westphal.

2) Don't leak pointer to internal structures when exporting x_tables
   ruleset back to userspace, from Willem DeBruijn. This includes new
   helper functions to copy data to userspace such as xt_data_to_user()
   as well as conversions of our ip_tables, ip6_tables and arp_tables
   clients to use it. Not surprinsingly, ebtables requires an ad-hoc
   update. There is also a new field in x_tables extensions to indicate
   the amount of bytes that we copy to userspace.

3) Add nf_log_all_netns sysctl: This new knob allows you to enable
   logging via nf_log infrastructure for all existing netnamespaces.
   Given the effort to provide pernet syslog has been discontinued,
   let's provide a way to restore logging using netfilter kernel logging
   facilities in trusted environments. Patch from Michal Kubecek.

4) Validate SCTP checksum from conntrack helper, from Davide Caratti.

5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
   a copy&paste from the original helper, from Florian Westphal.

6) Reset netfilter state when duplicating packets, also from Florian.

7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
   nft_meta, from Liping Zhang.

8) Add missing code to deal with loopback packets from nft_meta when
   used by the netdev family, also from Liping.

9) Several cleanups on nf_tables, one to remove unnecessary check from
   the netlink control plane path to add table, set and stateful objects
   and code consolidation when unregister chain hooks, from Gao Feng.

10) Fix harmless reference counter underflow in IPVS that, however,
    results in problems with the introduction of the new refcount_t
    type, from David Windsor.

11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
    from Davide Caratti.

12) Missing documentation on nf_tables uapi header, from Liping Zhang.

13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:58:20 -05:00
Michal Kubeček 2851940ffe netfilter: allow logging from non-init namespaces
Commit 69b34fb996 ("netfilter: xt_LOG: add net namespace support for
xt_LOG") disabled logging packets using the LOG target from non-init
namespaces. The motivation was to prevent containers from flooding
kernel log of the host. The plan was to keep it that way until syslog
namespace implementation allows containers to log in a safe way.

However, the work on syslog namespace seems to have hit a dead end
somewhere in 2013 and there are users who want to use xt_LOG in all
network namespaces. This patch allows to do so by setting

  /proc/sys/net/netfilter/nf_log_all_netns

to a nonzero value. This sysctl is only accessible from init_net so that
one cannot switch the behaviour from inside a container.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02 14:31:58 +01:00
Robert Shearman 63a6fff353 net: Avoid receiving packets with an l3mdev on unbound UDP sockets
Packets arriving in a VRF currently are delivered to UDP sockets that
aren't bound to any interface. TCP defaults to not delivering packets
arriving in a VRF to unbound sockets. IP route lookup and socket
transmit both assume that unbound means using the default table and
UDP applications that haven't been changed to be aware of VRFs may not
function correctly in this case since they may not be able to handle
overlapping IP address ranges, or be able to send packets back to the
original sender if required.

So add a sysctl, udp_l3mdev_accept, to control this behaviour with it
being analgous to the existing tcp_l3mdev_accept, namely to allow a
process to have a VRF-global listen socket. Have this default to off
as this is the behaviour that users will expect, given that there is
no explicit mechanism to set unmodified VRF-unaware application into a
default VRF.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:00:58 -05:00
Masanari Iida 8da9704c8b Doc: Fix double words in Documentation
This patch fix some double words found in Documentation.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2017-01-26 15:25:41 -07:00
Andrew Lunn 434502930f net: dsa: Mop up remaining NET_DSA_HWMON references
Previous patches have moved the temperature sensor code into the
Marvell PHYs. A few now dead references to NET_DSA_HWMON were left
behind. Go reap them.

Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:45:05 -05:00
Krister Johansen 4548b683b7 Introduce a sysctl that modifies the value of PROT_SOCK.
Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
that denotes the first unprivileged inet port in the namespace.  To
disable all privileged ports set this to zero.  It also checks for
overlap with the local port range.  The privileged and local range may
not overlap.

The use case for this change is to allow containerized processes to bind
to priviliged ports, but prevent them from ever being allowed to modify
their container's network configuration.  The latter is accomplished by
ensuring that the network namespace is not a child of the user
namespace.  This modification was needed to allow the container manager
to disable a namespace's priviliged port restrictions without exposing
control of the network namespace to processes in the user namespace.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24 12:10:51 -05:00
David S. Miller bb60b8b35a For 4.11, we seem to have more than in the past few releases:
* socket owner support for connections, so when the wifi
    manager (e.g. wpa_supplicant) is killed, connections are
    torn down - wpa_supplicant is critical to managing certain
    operations, and can opt in to this where applicable
  * minstrel & minstrel_ht updates to be more efficient (time and space)
  * set wifi_acked/wifi_acked_valid for skb->destructor use in the
    kernel, which was already available to userspace
  * don't indicate new mesh peers that might be used if there's no
    room to add them
  * multicast-to-unicast support in mac80211, for better medium usage
    (since unicast frames can use *much* higher rates, by ~3 orders of
    magnitude)
  * add API to read channel (frequency) limitations from DT
  * add infrastructure to allow randomizing public action frames for
    MAC address privacy (still requires driver support)
  * many cleanups and small improvements/fixes across the board
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYeKu7AAoJEGt7eEactAAdwjEP/RA4bXFMfkC7qUJ++cLrMMwY
 yCvjb8+ULWL2wbCzpfY37acbGJgot3DNoQJzrO2jMQPqyM9nRlTMg5aF49cI7t62
 gU6daNKJaGBe/0yeG7lTJ4n5UtVCDtN45hGc06Yert+ewb9njiJf+XYrtCWetsIJ
 5bOLYQKPWOz/7UyMH7uJ25zrPFaiA3y7XnXKPEudagG/EwEq9ZuUpSSfLwEAEBPi
 6i/2w4fLj32vXRsQMvQT0sU6mjd+1ub8Is7w5l2F06iWwNYPzdSM0IbU+E+ie2tk
 sE6RA70c4ILrp8KisTAz2lJPa4XEpFkLhI3lzRRy8CVzjyyo/OJen92zvr2R7TVb
 /uZG9qfRQ3UitQmgeKd+wS8PsbRAyWUR/xhNxD2r7zARH2vliwyneU+zEpXLeGA1
 Y4PrN1+Fk45Ye4/4XSbPO4cf1MHX7qinN4rjrpsJKPwoYD/gQ1cZvef4AbaKPvq6
 oCKRVrwNoUuSB8NTcMLPqze3WCfhnJyVUhCZTyzHeW4uG81qrHwrvBvM25vcWGcm
 CcSWFktFIpuGML4FCU3byZfb0NkmJtpCD4n7P98WFPGjvsWIEVCMckqlC8x1F7B7
 BqqjGS2mGA17Xy0uLfmN/JempesQJnZhnAnFERdyX1S1YQuKhLwEu7OsYegnStDL
 Cn1wFw2/qcgeTkJfBICB
 =UToW
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2017-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
For 4.11, we seem to have more than in the past few releases:
 * socket owner support for connections, so when the wifi
   manager (e.g. wpa_supplicant) is killed, connections are
   torn down - wpa_supplicant is critical to managing certain
   operations, and can opt in to this where applicable
 * minstrel & minstrel_ht updates to be more efficient (time and space)
 * set wifi_acked/wifi_acked_valid for skb->destructor use in the
   kernel, which was already available to userspace
 * don't indicate new mesh peers that might be used if there's no
   room to add them
 * multicast-to-unicast support in mac80211, for better medium usage
   (since unicast frames can use *much* higher rates, by ~3 orders of
   magnitude)
 * add API to read channel (frequency) limitations from DT
 * add infrastructure to allow randomizing public action frames for
   MAC address privacy (still requires driver support)
 * many cleanups and small improvements/fixes across the board
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-14 12:02:15 -05:00
Yuchung Cheng 4a7f600944 tcp: remove thin_dupack feature
Thin stream DUPACK is to start fast recovery on only one DUPACK
provided the connection is a thin stream (i.e., low inflight).  But
this older feature is now subsumed with RACK. If a connection
receives only a single DUPACK, RACK would arm a reordering timer
and soon starts fast recovery instead of timeout if no further
ACKs are received.

The socket option (THIN_DUPACK) is kept as a nop for compatibility.
Note that this patch does not change another thin-stream feature
which enables linear RTO. Although it might be good to generalize
that in the future (i.e., linear RTO for the first say 3 retries).

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13 22:37:16 -05:00
Yuchung Cheng bec41a11dd tcp: remove early retransmit
This patch removes the support of RFC5827 early retransmit (i.e.,
fast recovery on small inflight with <3 dupacks) because it is
subsumed by the new RACK loss detection. More specifically when
RACK receives DUPACKs, it'll arm a reordering timer to start fast
recovery after a quarter of (min)RTT, hence it covers the early
retransmit except RACK does not limit itself to specific inflight
or dupack numbers.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13 22:37:16 -05:00
David S. Miller 76eb75be79 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-05 11:03:07 -05:00
Sowmini Varadhan 7f953ab2ba af_packet: TX_RING support for TPACKET_V3
Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
does not currently have TX_RING support. As a result an application
that wants the best perf for Tx and Rx (e.g. to handle request/response
transacations) ends up needing 2 sockets, one with *_v2 for Tx and
another with *_v3 for Rx.

This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
so that an application can use a single descriptor to get the benefits
of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
first filling up multiple frames in the Tx ring and then triggering a
transmit. This patch only support fixed size Tx frames for TPACKET_V3,
and requires that tp_next_offset must be zero.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:00:27 -05:00
Alexander Alemayhu 4e5da369df Documentation/networking: fix typo in mpls-sysctl
s/utliziation/utilization

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 15:49:08 -05:00
Rafał Miłecki b7f98864de cfg80211: fix example REG_RULE usage in Documentation
It's just an example, but lets make it look more real to don't confuse
people about possible REG_RULE usage. Channels are 20 MHz wide, so start
and end frequencies are 10 MHz away from the center one.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-02 12:09:57 +01:00
Rafał Miłecki b4c7a3023a cfg80211: update wireless-regdb repo url in Documentation
It's maintained by Seth Forshe for a long time now.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Cc: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-02 12:09:52 +01:00
Linus Torvalds e7aa8c2eb1 These are the documentation changes for 4.10.
It's another busy cycle for the docs tree, as the sphinx conversion
 continues.  Highlights include:
 
  - Further work on PDF output, which remains a bit of a pain but should be
    more solid now.
 
  - Five more DocBook template files converted to Sphinx.  Only 27 to go...
    Lots of plain-text files have also been converted and integrated.
 
  - Images in binary formats have been replaced with more source-friendly
    versions.
 
  - Various bits of organizational work, including the renaming of various
    files discussed at the kernel summit.
 
  - New documentation for the device_link mechanism.
 
 ...and, of course, lots of typo fixes and small updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYTbl7AAoJEI3ONVYwIuV63NIP/REwzThnGWFJMRSuq8Ieq2r9
 sFSQsaGTGlhyKiDoEooo+SO/Za3uTonjK+e7WZg8mhdiEdamta5aociU/71C1Yy/
 T9ur0FhcGblrvZ1NidSDvCLwuECZOMMei7mgLZ9a+KCpc4ANqqTVZSUm1blKcqhF
 XelhVXxBa0ar35l/pVzyCxkdNXRWXv+MJZE8hp5XAdTdr11DS7UY9zrZdH31axtf
 BZlbYJrvB8WPydU6myTjRpirA17Hu7uU64MsL3bNIEiRQ+nVghEzQC8uxeUCvfVx
 r0H5AgGGQeir+e8GEv2T20SPZ+dumXs+y/HehKNb3jS3gV0mo+pKPeUhwLIxr+Zh
 QY64gf+jYf5ISHwAJRnU0Ima72ehObzSbx9Dko10nhq2OvbR5f83gjz9t9jKYFU7
 RDowICA8lwqyRbHRoVfyoW8CpVhWFpMFu3yNeJMckeTish3m7ANqzaWslbsqIP5G
 zxgFMIrVVSbeae+sUeygtEJAnWI09aZ4tuaUXYtGWwu6ikC/3aV6DryP4bthG2LF
 A19uV4nMrLuuh8g2wiTHHjMfjYRwvSn+f9yaolwJhwyNDXQzRPy+ZJ3W/6olOkXC
 bAxTmVRCW5GA/fmSrfXmW1KbnxlWfP2C62hzZQ09UHxzTHdR97oFLDQdZhKo1uwf
 pmSJR0hVeRUmA4uw6+Su
 =A0EV
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.10' of git://git.lwn.net/linux

Pull documentation update from Jonathan Corbet:
 "These are the documentation changes for 4.10.

  It's another busy cycle for the docs tree, as the sphinx conversion
  continues. Highlights include:

   - Further work on PDF output, which remains a bit of a pain but
     should be more solid now.

   - Five more DocBook template files converted to Sphinx. Only 27 to
     go... Lots of plain-text files have also been converted and
     integrated.

   - Images in binary formats have been replaced with more
     source-friendly versions.

   - Various bits of organizational work, including the renaming of
     various files discussed at the kernel summit.

   - New documentation for the device_link mechanism.

  ... and, of course, lots of typo fixes and small updates"

* tag 'docs-4.10' of git://git.lwn.net/linux: (193 commits)
  dma-buf: Extract dma-buf.rst
  Update Documentation/00-INDEX
  docs: 00-INDEX: document directories/files with no docs
  docs: 00-INDEX: remove non-existing entries
  docs: 00-INDEX: add missing entries for documentation files/dirs
  docs: 00-INDEX: consolidate process/ and admin-guide/ description
  scripts: add a script to check if Documentation/00-INDEX is sane
  Docs: change sh -> awk in REPORTING-BUGS
  Documentation/core-api/device_link: Add initial documentation
  core-api: remove an unexpected unident
  ppc/idle: Add documentation for powersave=off
  Doc: Correct typo, "Introdution" => "Introduction"
  Documentation/atomic_ops.txt: convert to ReST markup
  Documentation/local_ops.txt: convert to ReST markup
  Documentation/assoc_array.txt: convert to ReST markup
  docs-rst: parse-headers.pl: cleanup the documentation
  docs-rst: fix media cleandocs target
  docs-rst: media/Makefile: reorganize the rules
  docs-rst: media: build SVG from graphviz files
  docs-rst: replace bayer.png by a SVG image
  ...
2016-12-12 21:58:13 -08:00
Asbjørn Sloth Tønnesen 47c3e7783b net: l2tp: deprecate PPPOL2TP_MSG_* in favour of L2TP_MSG_*
PPPOL2TP_MSG_* and L2TP_MSG_* are duplicates, and are being used
interchangeably in the kernel, so let's standardize on L2TP_MSG_*
internally, and keep PPPOL2TP_MSG_* defined in UAPI for compatibility.

Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-10 23:29:11 -05:00
Woojung.Huh@microchip.com f38e7a32ee phy: add phy fixup unregister functions
>From : Woojung Huh <woojung.huh@microchip.com>

Add functions to unregister phy fixup for modules.

int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
	Unregister phy fixup from phy_fixup_list per bus_id, phy_uid &
	phy_uid_mask

int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
	Unregister phy fixup from phy_fixup_list.
	Use it for fixup registered by phy_register_fixup_for_uid()

int phy_unregister_fixup_for_id(const char *bus_id)
	Unregister phy fixup from phy_fixup_list.
	Use it for fixup registered by phy_register_fixup_for_id()

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 14:21:47 -05:00
Niklas Cassel 4022d039a3 net: smmac: allow configuring lower pbl values
The driver currently always sets the PBLx8/PBLx4 bit, which means that
the pbl values configured via the pbl/txpbl/rxpbl DT properties are
always multiplied by 8/4 in the hardware.

In order to allow the DT to configure lower pbl values, while at the
same time not changing behavior of any existing device trees using the
pbl/txpbl/rxpbl settings, add a property to disable the multiplication
of the pbl by 8/4 in the hardware.

Suggested-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:07:10 -05:00
Niklas Cassel 89caaa2d80 net: stmmac: add support for independent DMA pbl for tx/rx
GMAC and newer supports independent programmable burst lengths for
DMA tx/rx. Add new optional devicetree properties representing this.

To be backwards compatible, snps,pbl will still be valid, but
snps,txpbl/snps,rxpbl will override the value in snps,pbl if set.

If the IP is synthesized to use the AXI interface, there is a register
and a matching DT property inside the optional stmmac-axi-config DT node
for controlling burst lengths, named snps,blen.
However, using this register, it is not possible to control tx and rx
independently. Also, this register is not available if the IP was
synthesized with, e.g., the AHB interface.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-08 13:07:10 -05:00
David S. Miller 5fccd64aa4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains a large Netfilter update for net-next,
to summarise:

1) Add support for stateful objects. This series provides a nf_tables
   native alternative to the extended accounting infrastructure for
   nf_tables. Two initial stateful objects are supported: counters and
   quotas. Objects are identified by a user-defined name, you can fetch
   and reset them anytime. You can also use a maps to allow fast lookups
   using any arbitrary key combination. More info at:

   http://marc.info/?l=netfilter-devel&m=148029128323837&w=2

2) On-demand registration of nf_conntrack and defrag hooks per netns.
   Register nf_conntrack hooks if we have a stateful ruleset, ie.
   state-based filtering or NAT. The new nf_conntrack_default_on sysctl
   enables this from newly created netnamespaces. Default behaviour is not
   modified. Patches from Florian Westphal.

3) Allocate 4k chunks and then use these for x_tables counter allocation
   requests, this improves ruleset load time and also datapath ruleset
   evaluation, patches from Florian Westphal.

4) Add support for ebpf to the existing x_tables bpf extension.
   From Willem de Bruijn.

5) Update layer 4 checksum if any of the pseudoheader fields is updated.
   This provides a limited form of 1:1 stateless NAT that make sense in
   specific scenario, eg. load balancing.

6) Add support to flush sets in nf_tables. This series comes with a new
   set->ops->deactivate_one() indirection given that we have to walk
   over the list of set elements, then deactivate them one by one.
   The existing set->ops->deactivate() performs an element lookup that
   we don't need.

7) Two patches to avoid cloning packets, thus speed up packet forwarding
   via nft_fwd from ingress. From Florian Westphal.

8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to
   prevent infinite loops, patch from Dwip Banerjee. And one minor
   refactoring from Gao feng.

9) Revisit recent log support for nf_tables netdev families: One patch
   to ensure that we correctly handle non-ethernet packets. Another
   patch to add missing logger definition for netdev. Patches from
   Liping Zhang.

10) Three patches for nft_fib, one to address insufficient register
    initialization and another to solve incorrect (although harmless)
    byteswap operation. Moreover update xt_rpfilter and nft_fib to match
    lbcast packets with zeronet as source, eg. DHCP Discover packets
    (0.0.0.0 -> 255.255.255.255). Also from Liping Zhang.

11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from
    Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has
    been broken in many-cast mode for some little time, let's give them a
    chance by placing them at the same level as other existing protocols.
    Thus, users don't explicitly have to modprobe support for this and
    NAT rules work for them. Some people point to the lack of support in
    SOHO Linux-based routers that make deployment of new protocols harder.
    I guess other middleboxes outthere on the Internet are also to blame.
    Anyway, let's see if this has any impact in the midrun.

12) Skip software SCTP software checksum calculation if the NIC comes
    with SCTP checksum offload support. From Davide Caratti.

13) Initial core factoring to prepare conversion to hook array. Three
    patches from Aaron Conole.

14) Gao Feng made a wrong conversion to switch in the xt_multiport
    extension in a patch coming in the previous batch. Fix it in this
    batch.

15) Get vmalloc call in sync with kmalloc flags to avoid a warning
    and likely OOM killer intervention from x_tables. From Marcelo
    Ricardo Leitner.

16) Update Arturo Borrero's email address in all source code headers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-07 19:16:46 -05:00
David S. Miller c3543688ab Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2016-12-03

Here's a set of Bluetooth & 802.15.4 patches for net-next (i.e. 4.10
kernel):

 - Fix for a potential NULL deref in the ieee802154 netlink code
 - Fix for the ED values of the at86rf2xx driver
 - Documentation updates to ieee802154
 - Cleanups to u8 vs __u8 usage
 - Timer API usage cleanups in HCI drivers

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05 13:37:28 -05:00
Florian Westphal 481fa37347 netfilter: conntrack: add nf_conntrack_default_on sysctl
This switch (default on) can be used to disable automatic registration
of connection tracking functionality in newly created network
namespaces.

This means that when net namespace goes down (or the tracker protocol
module is unloaded) we *might* have to unregister the hooks.

We can either add another per-netns variable that tells if
the hooks got registered by default, or, alternatively, just call
the protocol _put() function and have the callee deal with a possible
'extra' put() operation that doesn't pair with a get() one.

This uses the latter approach, i.e. a put() without a get has no effect.

Conntrack is still enabled automatically regardless of the new sysctl
setting if the new net namespace requires connection tracking, e.g. when
NAT rules are created.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04 21:17:25 +01:00
Erik Nordmark adc176c547 ipv6 addrconf: Implemented enhanced DAD (RFC7527)
Implemented RFC7527 Enhanced DAD.
IPv6 duplicate address detection can fail if there is some temporary
loopback of Ethernet frames. RFC7527 solves this by including a random
nonce in the NS messages used for DAD, and if an NS is received with the
same nonce it is assumed to be a looped back DAD probe and is ignored.
RFC7527 is enabled by default. Can be disabled by setting both of
conf/{all,interface}/enhanced_dad to zero.

Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 23:21:37 -05:00
Pavel Machek c6c60dae81 stmmac: cleanup documenation, make it match reality
Fix english in documentation, make documentation match reality, remove
options that were removed from code.

Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 15:07:32 -05:00
David S. Miller 2745529ac7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Couple conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the
   RX tail pointer properly overlapped with some changes
   to support variable sized rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
   overlapping with a reorganization of the driver to support
   ACPI, OF, as well as PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the
   stmmac driver, meanwhile a lot of this code was cleaned up
   and reorganized in 'net-next'.

4) The cls_flower classifier obtained a helper function in
   'net-next' called __fl_delete() and this overlapped with
   Daniel Borkamann's bug fix to use RCU for object destruction
   in 'net'.  It also overlapped with Jiri's change to guard
   the rhashtable_remove_fast() call with a check against
   tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some
   unrelated changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head()
   bug fix in 'net' overlapped with a large reorganization of
   the same code in 'net-next'.  Since the 'net-next' code no
   longer had the bug in question, there was nothing to do
   other than to simply take the 'net-next' hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-03 12:29:53 -05:00
Florian Westphal 25429d7b7d tcp: allow to turn tcp timestamp randomization off
Eric says: "By looking at tcpdump, and TS val of xmit packets of multiple
flows, we can deduct the relative qdisc delays (think of fq pacing).
This should work even if we have one flow per remote peer."

Having random per flow (or host) offsets doesn't allow that anymore so add
a way to turn this off.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02 12:49:59 -05:00
Francis Yan 1c885808e4 tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation.  For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.

To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-30 10:04:25 -05:00
Stefan Schmidt 6bf0d84d13 docs: ieee802154: update main documentation file
This updates some out of date documentation and fixes some wrong assumptions as
well as pure grammar fixes. This file needs to move towards the new kernel doc
system and getting an overhaul during this work.

Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
2016-11-30 12:33:07 +01:00
Florian Fainelli cc9111162b Documentation: net: phy: Add links to several standards documents
Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:07:43 -05:00
Florian Fainelli bf8f6952a2 Documentation: net: phy: Add blurb about RGMII
RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what are the expectations from PHYLIB and
what options exist.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:07:42 -05:00
Florian Fainelli 2fa3e25b45 Documentation: net: phy: Add a paragraph about pause frames/flow control
Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:07:42 -05:00
Florian Fainelli 527fd70e26 Documentation: net: phy: remove description of function pointers
Remove the function pointers documentation which duplicates information
found in include/linux/phy.h. Maintaining documentation about two
different locations just does not work, but the code is less likely to
be outdated.

Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 16:07:42 -05:00
Florian Westphal 486dcf43da netfilter: fix nf_conntrack_helper documentation
Since kernel 4.7 this defaults to off.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-24 12:50:24 +01:00
Jonathan Corbet 917fef6f7e Linux 4.9-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYHmoCAAoJEHm+PkMAQRiG7RMIAI2i7Y5hpL5yCxK5AFaL4u/G
 KxXfp1B1UanUTgjOmd7zGqtDYcFX9t7GTTUFixQ7/9Opr4PD9qbnatoDGSc3xjbT
 msDgA1B78F1/Q3kHWfeGq32MihQ4mj5NwUCo+igUcUvvWG7mHgzErj/Nh5RoobQX
 p/izdpTbrw3GX6xXB8olbG7XWHaVye/+TT3q6+gmgm8I/QEujcLeGoycE0zlhPN8
 FG/JX76At/+ZM2Py7Oxo3k+oKL9CHrtOQYDp/wN0uslV5eYvvkZz0/M1HMOGZt+c
 gZU5jzM17K7C4Nzo06WAuBU9wUBGc25m+cPicLlOmljnzfU+f50SKaDjZq3p7QI=
 =2KUF
 -----END PGP SIGNATURE-----

Merge tag 'v4.9-rc4' into sound

Bring in -rc4 patches so I can successfully merge the sound doc changes.
2016-11-18 16:13:41 -07:00
David S. Miller bb598c1b8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of bug fixes in 'net' overlapping other changes in
'net-next-.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 10:54:36 -05:00
David Lebrun 8bc66a4423 ipv6: sr: add documentation file for per-interface sysctls
This patch adds documentation for some SR-related per-interface
sysctls.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:40:06 -05:00
Hangbin Liu 1af92836e5 igmp: Document sysctl force_igmp_version
There is some difference between force_igmp_version and force_mld_version.
Add document to make users aware of this.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:22:55 -05:00
Fabian Mewes 8e0140a2d7 Documentation: networking: dsa: Update tagging protocols
Add Qualcomm QCA tagging introduced in cafdc45c9 to the
list of supported protocols.

Signed-off-by: Fabian Mewes <architekt@coding4coffee.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 20:39:15 -05:00
David S. Miller 27058af401 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple overlapping changes.

For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-30 12:42:58 -04:00
Linus Torvalds 2a26d99b25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Lots of fixes, mostly drivers as is usually the case.

   1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
      Khoroshilov.

   2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
      Pedersen.

   3) Don't put aead_req crypto struct on the stack in mac80211, from
      Ard Biesheuvel.

   4) Several uninitialized variable warning fixes from Arnd Bergmann.

   5) Fix memory leak in cxgb4, from Colin Ian King.

   6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.

   7) Several VRF semantic fixes from David Ahern.

   8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.

   9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.

  10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.

  11) Fix stale link state during failover in NCSCI driver, from Gavin
      Shan.

  12) Fix netdev lower adjacency list traversal, from Ido Schimmel.

  13) Propvide proper handle when emitting notifications of filter
      deletes, from Jamal Hadi Salim.

  14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.

  15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.

  16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.

  17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.

  18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
      Leitner.

  19) Revert a netns locking change that causes regressions, from Paul
      Moore.

  20) Add recursion limit to GRO handling, from Sabrina Dubroca.

  21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.

  22) Avoid accessing stale vxlan/geneve socket in data path, from
      Pravin Shelar"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
  geneve: avoid using stale geneve socket.
  vxlan: avoid using stale vxlan socket.
  qede: Fix out-of-bound fastpath memory access
  net: phy: dp83848: add dp83822 PHY support
  enic: fix rq disable
  tipc: fix broadcast link synchronization problem
  ibmvnic: Fix missing brackets in init_sub_crq_irqs
  ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
  Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
  arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
  net/mlx4_en: Save slave ethtool stats command
  net/mlx4_en: Fix potential deadlock in port statistics flow
  net/mlx4: Fix firmware command timeout during interrupt test
  net/mlx4_core: Do not access comm channel if it has not yet been initialized
  net/mlx4_en: Fix panic during reboot
  net/mlx4_en: Process all completions in RX rings after port goes up
  net/mlx4_en: Resolve dividing by zero in 32-bit system
  net/mlx4_core: Change the default value of enable_qos
  net/mlx4_core: Avoid setting ports to auto when only one port type is supported
  net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
  ...
2016-10-29 20:33:20 -07:00
David S. Miller 32ab0a38f0 Among various cleanups and improvements, we have the following:
* client FILS authentication support in mac80211 (Jouni)
  * AP/VLAN multicast improvements (Michael Braun)
  * config/advertising support for differing beacon intervals on
    multiple virtual interfaces (Purushottam Kushwaha, myself)
  * deprecate the old WDS mode for cfg80211-based drivers, the
    mode is hardly usable since it doesn't support any "modern"
    features like WPA encryption (2003), HT (2009) or VHT (2014),
    I'm not even sure WEP (introduced in 1997) could be done.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYEy/9AAoJEGt7eEactAAd/V0P/0FmHGS8HjlSjm+1p6sbWKbt
 5v8bb3cuKHQiYiUM6euIXql2OYuOEHVAQEpNoPXN9CsfKFYgbIH6yW6d8HtKNedV
 n9lmMy/U6yJX9nYt7yMIQ3kLkbEg+YU58B9Hf47waWXLLSNVumS8rfNBn43EoNQf
 VKWYPWpetsCRIWJ1fnLuxvMHCtOOYCxH+491BUonof32+DKPEAsAbnszZ2ElufTR
 7KNyA3K6leOtTd5Ml52dvLOGNc+h2C83VAMxiShq/6r8OnlX5tPifaubzd9n3m41
 jiJJH/92ESrtF2AaWEm8slcgtcfHS/O7y/FSoV4r0PMSvPTBdjwQ9nqCsbONd831
 vjj6c6YWNxgHPcISX0XcWz+FHnLJdUGaDUtHjAJYw4oH4gaRXwfSw0U+jvdlSMUf
 2CBUArk5f0OEguzwa/5X4Jio3OPPIj4jY/lKplcpLOUu8K2FWTLDuIlww/FHXovs
 rDzTLQeXZkx+MkszkTJN42qSEfOFly91J6OA2Wju+emBqrLIbkAGmvyLVg8U8BQd
 gG7oltgmZ6Xg6fEnUQqpIDO7UJlQ+GXAU04SpNDMv1j/ueUJskxlr3hYM7E9ueQv
 LJcZcVV0RAwNRw52cEsdcYCMLuSMYRrO1OHlkl0wd+x2hFrUCWnVzUgLEUhNBV+c
 ICmNMr96nKhrZI217yzF
 =SjfQ
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2016-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Among various cleanups and improvements, we have the following:
 * client FILS authentication support in mac80211 (Jouni)
 * AP/VLAN multicast improvements (Michael Braun)
 * config/advertising support for differing beacon intervals on
   multiple virtual interfaces (Purushottam Kushwaha, myself)
 * deprecate the old WDS mode for cfg80211-based drivers, the
   mode is hardly usable since it doesn't support any "modern"
   features like WPA encryption (2003), HT (2009) or VHT (2014),
   I'm not even sure WEP (introduced in 1997) could be done.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 17:28:45 -04:00
Mauro Carvalho Chehab 8c27ceff36 docs: fix locations of several documents that got moved
The previous patch renamed several files that are cross-referenced
along the Kernel documentation. Adjust the links to point to
the right places.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-10-24 08:12:35 -02:00
David S. Miller 8dbad1a811 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from
   Geert Uytterhoeven.

2) Fix wrong timeout in set elements added from packet path via
   nft_dynset, from Anders K. Pedersen.

3) Remove obsolete nf_conntrack_events_retry_timeout sysctl
   documentation, from Nicolas Dichtel.

4) Ensure proper initialization of log flags via xt_LOG, from
   Liping Zhang.

5) Missing alias to autoload ipcomp, also from Liping Zhang.

6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping.

7) Wrong integer type in the new nft_parse_u32_check() function,
   from Dan Carpenter.

8) Another wrong integer type declaration in nft_exthdr_init, also
   from Dan Carpenter.

9) Fix insufficient mode validation in nft_range.

10) Fix compilation warning in nft_range due to possible uninitialized
    value, from Arnd Bergmann.

11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to
    calm down kmemcheck, from Florian Westphal.

12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached,
    from Nicolas Dichtel.

13) Fix nf_queue() after conversion to single-linked hook list, related
    to incorrect bypass flag handling and incorrect hook point of
    reinjection.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21 10:25:22 -04:00
Nicolas Dichtel 4f76de5f23 netfilter: conntrack: remove obsolete sysctl (nf_conntrack_events_retry_timeout)
This entry has been removed in commit 9500507c61.

Fixes: 9500507c61 ("netfilter: conntrack: remove timer from ecache extension")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-17 17:38:18 +02:00
Sven Eckelmann 0de939bac2 batman-adv: Document new nc, mcast and tpmeter log levels
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:28:45 +02:00
Sven Eckelmann 204fa42c6f batman-adv: Add dat, mcast, nc and neighbor debugfs files to README
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:28:44 +02:00
Sven Eckelmann 1ed0359f15 batman-adv: Add network_coding and mcast sysfs files to README
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:28:24 +02:00
Sven Eckelmann c2059d8582 batman-adv: Add B.A.T.M.A.N. V sysfs files to README
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-10-17 16:12:48 +02:00
Linus Lüssing 4b82a004ec mac80211_hwsim: suggest nl80211 instead of wext driver in documentation
For mac80211_hwsim interfaces, suggest to use wpa_supplicant with the more
modern, netlink based driver instead of wext.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-17 11:38:01 +02:00
Linus Torvalds 5d89d9f502 linux-kselftest-4.9-rc1-update
This update consists of:
 
 - Fixes and improvements to existing tests
 - Moving code from Documentation to selftests, samples, and tools.
 
   Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and networking
   tests from Documentation to selftests.
 
   Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay, and
   blackfin examples from Documentation to samples.
 
   Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from
   Documentation to tools.
 
   Deletes BUILD_DOCSRC and its dependencies.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX/6zUAAoJEAsCRMQNDUMczIEP/0kH+yjJ3El4GYIokspR1/UU
 ++sy4XMzrD1UPy90v+ftcg4ss5R80r0v7EZ59k1UjDJSZ6WATHHGoZKCS2Dy3xcq
 i+0vm7Bawh7YWrXD3TunwaL97lwb2DdVTSxRXuU4Hfv+oVynUfh/+ZlCH6RCM2nm
 ZJE5PDYiq4nTVSRqFB2FyRE6yay5dPvpQ2ArwnSEw+ku4C+ZdGTGCWzS+aZBwZM/
 ykePkGLVRXz9FsWTCmipJzYu0Z/M4xEGlfXQZiiLG2HicbJNP6AqJImbQrANm+TW
 RFigYpofdhr9XG5TKTLIudaRt9qB6BE0mYEApZXH8U7NrHElfO9BBMEwzajl0V/2
 q/r5iej/CJult3zsfkhdHo7GLXpOaDLyoXiUI6UTgL0XOdWLAWTqDYx4JJz9sXxp
 B9dwKJeP5HLipk6FMkAHgJM90JKQFd/nLDKxeWexbMu/b/yQ2C9AR7NpdQ+c1X7I
 8W8UNEi/fnK75+r4t3NfeD2/5boq/jwujSKEMDQm/3R8L8EFYYb/TRoujFn89Na3
 wbZLV3hBL+KQ5lRyIx7X8RKyVJv1nlo9Wh57ItJed6zvGp5EmsI8w+DER2RfbO2c
 HR2JPDKSxmU8O2WBfDW5QoiPQH8Lssd147Ir0UFE7mwBXgWWsmxJxDpufizAXwyJ
 qnELJ9X3UFIdydtoObLr
 =60kH
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest updates from Shuah Khan:
 "This update consists of:

   - Fixes and improvements to existing tests

   - Moving code from Documentation to selftests, samples, and tools:

     * Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and
       networking tests from Documentation to selftests.

     * Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay,
       and blackfin examples from Documentation to samples.

     * Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from
       Documentation to tools.

     * Deletes BUILD_DOCSRC and its dependencies"

* tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (21 commits)
  selftests/futex: Check ANSI terminal color support
  Doc: update 00-INDEX files to reflect the runnable code move
  samples: move blackfin gptimers-example from Documentation
  tools: move pcmcia crc32hash tool from Documentation
  tools: move laptops dslm tool from Documentation
  tools: move accounting tool from Documentation
  samples: move auxdisplay example code from Documentation
  samples: move watchdog example code from Documentation
  samples: move timers example code from Documentation
  samples: move misc-devices/mei example code from Documentation
  samples: move mic/mpssd example code from Documentation
  selftests: Move networking/timestamping from Documentation
  selftests: move watchdog tests from Documentation/watchdog
  selftests: move ia64 tests from Documentation/ia64
  selftests: move vDSO tests from Documentation/vDSO
  selftests: move ptp tests from Documentation/ptp
  selftests: move prctl tests from Documentation/prctl
  selftests: move dnotify_test from Documentation/filesystems
  selftests/timers: Add missing error code assignment before test
  selftests/zram: replace ZRAM_LZ4_COMPRESS
  ...
2016-10-14 15:17:12 -07:00
Alexander Alemayhu f56f7d2e1c Documentation/networking: update git urls to use https over http
This fixes the following errors when trying to clone the urls:

Cloning into 'net'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/' not found
Cloning into 'net-next'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/' not found
Cloning into 'linux'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/' not found
Cloning into 'stable-queue'...
fatal: repository 'http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/' not found

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-14 10:27:12 -04:00
Shuah Khan f59c668c0b Doc: update 00-INDEX files to reflect the runnable code move
Update 00-INDEX files with the current file list to reflect the runnable
code move.

Acked-by: Michal Marek <mmarek@suse.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2016-10-10 07:12:09 -06:00
Jiri Pirko fd41b0eaa0 doc: update switchdev L3 section
This is to reflect the change of FIB offload infrastructure from
switchdev objects to FIB notifier.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28 04:48:00 -04:00
Shuah Khan 3d2c86e305 selftests: Move networking/timestamping from Documentation
Remove networking from Documentation Makefile to move the test to
selftests. Update networking/timestamping Makefile to work under
selftests. These tests will not be run as part of selftests suite
and will not be included in install targets. They can be built and
run separately for now.

This is part of the effort to move runnable code from Documentation.

Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2016-09-20 09:59:50 -06:00
Mahesh Bandewar 4fbae7d83c ipvlan: Introduce l3s mode
In a typical IPvlan L3 setup where master is in default-ns and
each slave is into different (slave) ns. In this setup egress
packet processing for traffic originating from slave-ns will
hit all NF_HOOKs in slave-ns as well as default-ns. However same
is not true for ingress processing. All these NF_HOOKs are
hit only in the slave-ns skipping them in the default-ns.
IPvlan in L3 mode is restrictive and if admins want to deploy
iptables rules in default-ns, this asymmetric data path makes it
impossible to do so.

This patch makes use of the l3_rcv() (added as part of l3mdev
enhancements) to perform input route lookup on RX packets without
changing the skb->dev and then uses nf_hook at NF_INET_LOCAL_IN
to change the skb->dev just before handing over skb to L4.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-19 01:25:22 -04:00
David Howells d001648ec7 rxrpc: Don't expose skbs to in-kernel users [ver #2]
Don't expose skbs to in-kernel users, such as the AFS filesystem, but
instead provide a notification hook the indicates that a call needs
attention and another that indicates that there's a new call to be
collected.

This makes the following possibilities more achievable:

 (1) Call refcounting can be made simpler if skbs don't hold refs to calls.

 (2) skbs referring to non-data events will be able to be freed much sooner
     rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
     will be able to consult the call state.

 (3) We can shortcut the receive phase when a call is remotely aborted
     because we don't have to go through all the packets to get to the one
     cancelling the operation.

 (4) It makes it easier to do encryption/decryption directly between AFS's
     buffers and sk_buffs.

 (5) Encryption/decryption can more easily be done in the AFS's thread
     contexts - usually that of the userspace process that issued a syscall
     - rather than in one of rxrpc's background threads on a workqueue.

 (6) AFS will be able to wait synchronously on a call inside AF_RXRPC.

To make this work, the following interface function has been added:

     int rxrpc_kernel_recv_data(
		struct socket *sock, struct rxrpc_call *call,
		void *buffer, size_t bufsize, size_t *_offset,
		bool want_more, u32 *_abort_code);

This is the recvmsg equivalent.  It allows the caller to find out about the
state of a specific call and to transfer received data into a buffer
piecemeal.

afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
logic between them.  They don't wait synchronously yet because the socket
lock needs to be dealt with.

Five interface functions have been removed:

	rxrpc_kernel_is_data_last()
    	rxrpc_kernel_get_abort_code()
    	rxrpc_kernel_get_error_number()
    	rxrpc_kernel_free_skb()
    	rxrpc_kernel_data_consumed()

As a temporary hack, sk_buffs going to an in-kernel call are queued on the
rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
in-kernel user.  To process the queue internally, a temporary function,
temp_deliver_data() has been added.  This will be replaced with common code
between the rxrpc_recvmsg() path and the kernel_rxrpc_recv_data() path in a
future patch.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 16:43:27 -07:00
Vivien Didelot 8df3025520 net: dsa: add MDB support
Add SWITCHDEV_OBJ_ID_PORT_MDB support to the DSA layer.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-31 14:15:42 -07:00
David Howells 4de48af663 rxrpc: Pass struct socket * to more rxrpc kernel interface functions
Pass struct socket * to more rxrpc kernel interface functions.  They should
be starting from this rather than the socket pointer in the rxrpc_call
struct if they need to access the socket.

I have left:

	rxrpc_kernel_is_data_last()
	rxrpc_kernel_get_abort_code()
	rxrpc_kernel_get_error_number()
	rxrpc_kernel_free_skb()
	rxrpc_kernel_data_consumed()

unmodified as they're all about to be removed (and, in any case, don't
touch the socket).

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-30 16:07:53 +01:00
David Howells 8324f0bcfb rxrpc: Provide a way for AFS to ask for the peer address of a call
Provide a function so that kernel users, such as AFS, can ask for the peer
address of a call:

   void rxrpc_kernel_get_peer(struct rxrpc_call *call,
			      struct sockaddr_rxrpc *_srx);

In the future the kernel service won't get sk_buffs to look inside.
Further, this allows us to hide any canonicalisation inside AF_RXRPC for
when IPv6 support is added.

Also propagate this through to afs_find_server() and issue a warning if we
can't handle the address family yet.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-08-30 16:07:53 +01:00
David S. Miller 6abdd5f593 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All three conflicts were cases of simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-30 00:54:02 -04:00
Florian Fainelli 7d13eca09e Documentation: networking: dsa: Remove platform device TODO
Since commit 83c0afaec7 ("net: dsa: Add new binding implementation"),
the shortcomings of the dsa platform device have been addressed, remove
that TODO item.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-28 23:43:06 -04:00
Ido Schimmel 6bc506b4fb bridge: switchdev: Add forward mark support for stacked devices
switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
port netdevs so that packets being flooded by the device won't be
flooded twice.

It works by assigning a unique identifier (the ifindex of the first
bridge port) to bridge ports sharing the same parent ID. This prevents
packets from being flooded twice by the same switch, but will flood
packets through bridge ports belonging to a different switch.

This method is problematic when stacked devices are taken into account,
such as VLANs. In such cases, a physical port netdev can have upper
devices being members in two different bridges, thus requiring two
different 'offload_fwd_mark's to be configured on the port netdev, which
is impossible.

The main problem is that packet and netdev marking is performed at the
physical netdev level, whereas flooding occurs between bridge ports,
which are not necessarily port netdevs.

Instead, packet and netdev marking should really be done in the bridge
driver with the switch driver only telling it which packets it already
forwarded. The bridge driver will mark such packets using the mark
assigned to the ingress bridge port and will prevent the packet from
being forwarded through any bridge port sharing the same mark (i.e.
having the same parent ID).

Remove the current switchdev 'offload_fwd_mark' implementation and
instead implement the proposed method. In addition, make rocker - the
sole user of the mark - use the proposed method.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-26 13:13:36 -07:00
Vivien Didelot 9d490b4ee4 net: dsa: rename switch operations structure
Now that the dsa_switch_driver structure contains only function pointers
as it is supposed to, rename it to the more appropriate dsa_switch_ops,
uniformly to any other operations structure in the kernel.

No functional changes here, basically just the result of something like:
s/dsa_switch_driver *drv/dsa_switch_ops *ops/g

However keep the {un,}register_switch_driver functions and their
dsa_switch_drivers list as is, since they represent the -- likely to be
deprecated soon -- legacy DSA registration framework.

In the meantime, also fix the following checks from checkpatch.pl to
make it happy with this patch:

    CHECK: Comparison to NULL could be written "!ops"
    #403: FILE: net/dsa/dsa.c:470:
    +	if (ops == NULL) {

    CHECK: Comparison to NULL could be written "ds->ops->get_strings"
    #773: FILE: net/dsa/slave.c:697:
    +		if (ds->ops->get_strings != NULL)

    CHECK: Comparison to NULL could be written "ds->ops->get_ethtool_stats"
    #824: FILE: net/dsa/slave.c:785:
    +	if (ds->ops->get_ethtool_stats != NULL)

    CHECK: Comparison to NULL could be written "ds->ops->get_sset_count"
    #835: FILE: net/dsa/slave.c:798:
    +		if (ds->ops->get_sset_count != NULL)

    total: 0 errors, 0 warnings, 4 checks, 784 lines checked

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-24 21:45:39 -07:00
Yuchung Cheng cebc5cbab4 net-tcp: retire TFO_SERVER_WO_SOCKOPT2 config
TFO_SERVER_WO_SOCKOPT2 was intended for debugging purposes during
Fast Open development. Remove this config option and also
update/clean-up the documentation of the Fast Open sysctl.

Reported-by: Piotr Jurkiewicz <piotr.jerzy.jurkiewicz@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-23 17:01:01 -07:00
Tom Herbert adcce4d5dd strparser: Documentation
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17 19:36:49 -04:00
David S. Miller f9f9ab1726 This feature patchset includes the following changes (mostly
chronological order):
 
  - bump version strings, by Simon Wunderlich
 
  - kerneldoc clean up, by Sven Eckelmann
 
  - enable RTNL automatic loading and according documentation
    changes, by Sven Eckelmann (2 patches)
 
  - fix/improve interface removal and associated locking, by
    Sven Eckelmann (3 patches)
 
  - clean up unused variables, by Linus Luessing
 
  - implement Gateway selection code for B.A.T.M.A.N. V by
    Antonio Quartulli (4 patches)
 
  - rewrite TQ comparison by Markus Pargmann
 
  - fix Cocinelle warnings on bool vs integers (by Fenguang Wu/Intels
    kbuild test robot) and bitwise arithmetic operations (by Linus
    Luessing)
 
  - rewrite packet creation for forwarding for readability and to avoid
    reference count mistakes, by Linus Luessing
 
  - use kmem_cache for translation table, which results in more efficient
    storing of translation table entries, by Sven Eckelmann
 
  - rewrite/clarify reference handling for send_skb_unicast, by Sven
    Eckelmann
 
  - fix debug messages when updating routes, by Sven Eckelmann
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCgAGBQJXrY1iAAoJEKEr45hCkp6hqjsP/1GI9Mm9iHGE5s4jY9+JORkn
 yR57i0l8IENLtQ2jrxu48VtyBKI5gQoitftRpAMZw5iUjgWXVTzA8/ik1Hy7VHnG
 NkDRAwHG/XH0peoubQGGPNbX2pZzBDnjR3wC9/8rOk/q6VqVqcLtgyHKbJFS5hd7
 dW3okXqKCZhJcTFnu95i7PZ9zTB7BrHEqcu9aDuA6VHdf4HF9ndCizP9bdnRCOVr
 wR1CkCrSjt2pMqRPLDAFcHzq/Lr+4LsNwodO0zqK5yetysJNaFJ7j5nTle2REk4L
 V2Wvbmyzxa8MznRphisYM+UJ12BxVjwmuilVoxgeu/FmfCpopA1L7lbWf+xxtAcP
 VLegLq2BG3fqkG6Pvk8emabC6oDxZaHsFV6uC4HylzLy09mCKWuap0qtgvNFjloM
 ntdYail5BFFsTq9j7KK7k4cfYikeLfmd3/j7F/ok+PJXGpAHKqOsfRABV61rxELH
 era8GrQmllh1UA/KX7j6rS4DK+AjaXmh+nk6+KDrd6IKo4+hZ1dg3UHB+ytrnx+6
 p0BoLgUnjBvNT44LsjtUZlt+3ILUspJWfb86kBgTFuZm8rJqulrJu6qDbmBZEayb
 PPrubxjYSKxR0nMlOVTBsGmNjugQIGn0ku89HKD210YZlpfnYENxwsxtYucWK/Tm
 AvwINRUXfumyJIZ385BQ
 =6XUi
 -----END PGP SIGNATURE-----

Merge tag 'batadv-next-for-davem-20160812' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature patchset includes the following changes (mostly
chronological order):

 - bump version strings, by Simon Wunderlich

 - kerneldoc clean up, by Sven Eckelmann

 - enable RTNL automatic loading and according documentation
   changes, by Sven Eckelmann (2 patches)

 - fix/improve interface removal and associated locking, by
   Sven Eckelmann (3 patches)

 - clean up unused variables, by Linus Luessing

 - implement Gateway selection code for B.A.T.M.A.N. V by
   Antonio Quartulli (4 patches)

 - rewrite TQ comparison by Markus Pargmann

 - fix Cocinelle warnings on bool vs integers (by Fenguang Wu/Intels
   kbuild test robot) and bitwise arithmetic operations (by Linus
   Luessing)

 - rewrite packet creation for forwarding for readability and to avoid
   reference count mistakes, by Linus Luessing

 - use kmem_cache for translation table, which results in more efficient
   storing of translation table entries, by Sven Eckelmann

 - rewrite/clarify reference handling for send_skb_unicast, by Sven
   Eckelmann

 - fix debug messages when updating routes, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12 20:55:41 -07:00
Netanel Belgazal 1738cd3ed3 net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
This is a driver for the ENA family of networking devices.

Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12 17:12:08 -07:00
Sven Eckelmann f2c750fedd batman-adv: Use rtnl link in device creation example
The standard kernel API to add new virtual interfaces and attach other
interfaces to it is rtnl-link. batman-adv supports it since v3.10. This
functionality should be used instead of the legacy batman-adv-only sysfs
interface.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-08-09 07:54:26 +02:00
David Howells 372ee16386 rxrpc: Fix races between skb free, ACK generation and replying
Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.

Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released.  The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb.  ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.

To this end, the following changes are made:

 (1) kernel_rxrpc_data_consumed() is added.  This should be called whenever
     an skb is emptied so as to crank the ACK and call states.  This does
     not release the skb, however.  kernel_rxrpc_free_skb() must now be
     called to achieve that.  These together replace
     rxrpc_kernel_data_delivered().

 (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().

     This makes afs_deliver_to_call() easier to work as the skb can simply
     be discarded unconditionally here without trying to work out what the
     return value of the ->deliver() function means.

     The ->deliver() functions can, via afs_data_complete(),
     afs_transfer_reply() and afs_extract_data() mark that an skb has been
     consumed (thereby cranking the state) without the need to
     conditionally free the skb to make sure the state is correct on an
     incoming call for when the call processor tries to send the reply.

 (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
     has finished with a packet and MSG_PEEK isn't set.

 (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().

     Because of this, we no longer need to clear the destructor and put the
     call before we free the skb in cases where we don't want the ACK/call
     state to be cranked.

 (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
     than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
     the delivery function already), and the caller is now responsible for
     producing an abort if that was the last packet.

 (6) There are many bits of unmarshalling code where:

 		ret = afs_extract_data(call, skb, last, ...);
		switch (ret) {
		case 0:		break;
		case -EAGAIN:	return 0;
		default:	return ret;
		}

     is to be found.  As -EAGAIN can now be passed back to the caller, we
     now just return if ret < 0:

 		ret = afs_extract_data(call, skb, last, ...);
		if (ret < 0)
			return ret;

 (7) Checks for trailing data and empty final data packets has been
     consolidated as afs_data_complete().  So:

		if (skb->len > 0)
			return -EBADMSG;
		if (!last)
			return 0;

     becomes:

		ret = afs_data_complete(call, skb, last);
		if (ret < 0)
			return ret;

 (8) afs_transfer_reply() now checks the amount of data it has against the
     amount of data desired and the amount of data in the skb and returns
     an error to induce an abort if we don't get exactly what we want.

Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:

general protection fault: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G            E   4.7.0-fsdevel+ #1303
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Workqueue: kafsd afs_async_workfn [kafs]
task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
RIP: 0010:[<ffffffff8108fd3c>]  [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
RSP: 0018:ffff88040c073bc0  EFLAGS: 00010002
RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
FS:  0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
Stack:
 0000000000000006 000000000be04930 0000000000000000 ffff880400000000
 ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
 ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
Call Trace:
 [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
 [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
 [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
 [<ffffffff810915f4>] lock_acquire+0x122/0x1b6
 [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
 [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
 [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
 [<ffffffff814c928f>] skb_dequeue+0x18/0x61
 [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
 [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
 [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
 [<ffffffff81063a3a>] process_one_work+0x29d/0x57c
 [<ffffffff81064ac2>] worker_thread+0x24a/0x385
 [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
 [<ffffffff810696f5>] kthread+0xf3/0xfb
 [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
 [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-06 00:08:40 -04:00
Sowmini Varadhan 09204a6cda Documentation: RDS: Document Multipath RDS (mprds)
Document the design of mprds, covering a brief description
of the motivation, data-structures and modifications to the
RDS control plane.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 11:36:58 -07:00
Sowmini Varadhan d67214a29b Documentation: RDS: updates for SO_RDS_TRANSPORT socket option
Update the documentation to describe the changes added by
commit 8ba38460f3 ("net/rds Add getsockopt support for SO_RDS_TRANSPORT")

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-15 11:36:58 -07:00
David Ahern 484f674bb5 net: vrf: Address comments from last documentation update
Comments from Frank Kellerman on last doc update:
- extra whitespace in front of a neigh show command
- convert the brief link example to 'vrf red'

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13 18:04:17 -07:00
David Ahern 6e07653765 net: vrf: Documentation update
Update vrf documentation for changes made to 4.4 - 4.8 kernels
and iproute2 support for vrf keyword.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-13 11:36:27 -07:00
David S. Miller ae3e4562e2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next,
they are:

1) Don't use userspace datatypes in bridge netfilter code, from
   Tobin Harding.

2) Iterate only once over the expectation table when removing the
   helper module, instead of once per-netns, from Florian Westphal.

3) Extra sanitization in xt_hook_ops_alloc() to return error in case
   we ever pass zero hooks, xt_hook_ops_alloc():

4) Handle NFPROTO_INET from the logging core infrastructure, from
   Liping Zhang.

5) Autoload loggers when TRACE target is used from rules, this doesn't
   change the behaviour in case the user already selected nfnetlink_log
   as preferred way to print tracing logs, also from Liping Zhang.

6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields
   by cache lines, increases the size of entries in 11% per entry.
   From Florian Westphal.

7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.

8) Remove useless defensive check in nf_logger_find_get() from Shivani
   Bhardwaj.

9) Remove zone extension as place it in the conntrack object, this is
   always include in the hashing and we expect more intensive use of
   zones since containers are in place. Also from Florian Westphal.

10) Owner match now works from any namespace, from Eric Bierdeman.

11) Make sure we only reply with TCP reset to TCP traffic from
    nf_reject_ipv4, patch from Liping Zhang.

12) Introduce --nflog-size to indicate amount of network packet bytes
    that are copied to userspace via log message, from Vishwanath Pai.
    This obsoletes --nflog-range that has never worked, it was designed
    to achieve this but it has never worked.

13) Introduce generic macros for nf_tables object generation masks.

14) Use generation mask in table, chain and set objects in nf_tables.
    This allows fixes interferences with ongoing preparation phase of
    the commit protocol and object listings going on at the same time.
    This update is introduced in three patches, one per object.

15) Check if the object is active in the next generation for element
    deactivation in the rbtree implementation, given that deactivation
    happens from the commit phase path we have to observe the future
    status of the object.

16) Support for deletion of just added elements in the hash set type.

17) Allow to resize hashtable from /proc entry, not only from the
    obscure /sys entry that maps to the module parameter, from Florian
    Westphal.

18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
    anymore since we tear down the ruleset whenever the netdevice
    goes away.

19) Support for matching inverted set lookups, from Arturo Borrero.

20) Simplify the iptables_mangle_hook() by removing a superfluous
    extra branch.

21) Introduce ether_addr_equal_masked() and use it from the netfilter
    codebase, from Joe Perches.

22) Remove references to "Use netfilter MARK value as routing key"
    from the Netfilter Kconfig description given that this toggle
    doesn't exists already for 10 years, from Moritz Sichert.

23) Introduce generic NF_INVF() and use it from the xtables codebase,
    from Joe Perches.

24) Setting logger to NONE via /proc was not working unless explicit
    nul-termination was included in the string. This fixes seems to
    leave the former behaviour there, so we don't break backward.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 09:15:15 -07:00
Giuseppe CAVALLARO 70523e639b drivers: net: stmmac: reworking the PCS code.
The 3.xx and 4.xx synopsys gmacs have a very similar
PCS embedded module and they share almost the same registers:
for example:
  AN_Control, AN_Status, AN_Advertisement, AN_Link_Partner_Ability,
  AN_Expansion, TBI_Extended_Status.

Just the RGMII/SMII Control/Status register differs.

So This patch aims to reorganize and enhance the PCS support.
It removes the existent support from the dwmac1000/dwmac4_core.c
moving basic PCS functions inside a new file called: stmmac_pcs.h.

The patch also reviews the available APIs to be better shared among
different hardware and easily enhanced to support new features.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28 08:54:23 -04:00
Florian Westphal 3183ab8997 netfilter: conntrack: allow increasing bucket size via sysctl too
No need to restrict this to module parameter.

We export a copy of the real hash size -- when user alters the value we
allocate the new table, copy entries etc before we update the real size
to the requested one.

This is also needed because the real size is used by concurrent readers
and cannot be changed without synchronizing the conntrack generation
seqcnt.

We only allow changing this value from the initial net namespace.

Tested using http-client-benchmark vs. httpterm with concurrent

while true;do
 echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
done

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24 11:03:28 +02:00
Oliver Hartkopp 9be05c7f37 can: bcm: add documentation for CAN FD support
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-06-17 15:39:47 +02:00
Eric Dumazet edb09eb17e net: sched: do not acquire qdisc spinlock in qdisc/class stats dump
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale :

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats()

In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.

[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kevin Athey <kda@google.com>
Cc: Xiaotian Pei <xiaotian@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 16:37:14 -07:00
Eric Garver 176b346b37 Documentation: ip-sysctl.txt: clarify secure_redirects
Clarify how secure_redirects works. Mention that RFC1122 always applies.

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29 22:40:53 -07:00
Florian Fainelli f05e2db199 Documentation: networking: dsa: Describe port_vlan_filtering
Described what the port_vlan_filtering function is supposed to
accomplish.

Fixes: fb2dabad69 ("net: dsa: support VLAN filtering switchdev attr")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:45:40 -07:00
Florian Fainelli 7013d8e1d0 Documentation: networking: dsa: Remove priv_size description
We no longer have a priv_size structure member since 5feebd0a8a ("net:
dsa: Remove allocation of driver private memory")

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:45:40 -07:00
Florian Fainelli 75c0bbd0e2 Documentation: networking: dsa: Remove poll_link description
This function has been removed in 4baee937b8 ("net: dsa: remove DSA
link polling") in favor of using the PHYLIB polling mechanism.

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 12:45:40 -07:00
Linus Torvalds e9ad9b9bd3 The most interesting thing (IMO) this time around is some beginning
infrastructural work to allow documents to be written using restructured
 text.  Maybe someday, in a galaxy far far away, we'll be able to eliminate
 the DocBook dependency and have a much better integrated set of kernel
 docs.  Someday.
 
 Beyond that, there's a new document on security hardening from Kees, the
 movement of some sample code over to samples/, a number of improvements to
 the serial docs from Geert, and the usual collection of corrections, typo
 fixes, etc.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXPf/VAAoJEI3ONVYwIuV60pkP/3brq+CavbwptWppESoyZaf7
 mpVSH7sOKicMcfHYYIXHmmg0K5gM4e22ATl39+izUCRZRwRnObXvroH++G5mARLs
 MUDxLvkc/QxDDuCZnUBq5E2gPtuyYpgj1q9fMGB+70ucc/EXYp5cxUhDmbNVrpSG
 KBMoZqKaW/Cf8/4fvRQG/glSR0iwyaQuvvoFAWLHgf8uWN/JPM2Cnv9V2zGQCtzP
 4B4Jzayu2BGKowBd65WUYdpGnccc7OAJFSJDY/Z9x7kVxKyD+VTn7VgxGnXxs88v
 uNmUEMENUpswzuoYEnDHoR0Y2o7jUi2doFKv+eacSmPaMLWL5EMDzcooZ+Vi7HWH
 mvp6GtAZ5qs96OGjsi+gFIw4kY8HGdnpzs7qk/uEdAndfAif5v24YLSQRG2rUCJM
 LxomnAWOJEIWGKJtuJnl16aZkgOcn6soecXw3PJmpxzhwd8BnQzwyZIdaZ98kwjA
 7Enq2Mmw5NBQwGIV2ODUxzoQ3Axj7aJJsDra2n6lPGTGXONGdgNFzk/hGmtQSuIp
 Aeatiy66FF0qKomzs2+EACOFP+eH/IId0yvW83Pj0o9nV25YZiPsw0Z1Tae5n3+g
 zgTFycalaowIwE3YzyH6BwvnMrluiPpUTjSLsmEaviJxE7/o+zrjOvMvallUIVUn
 YkJcia/DtSuc7u7LYkWe
 =2O+a
 -----END PGP SIGNATURE-----

Merge tag 'docs-for-linus' of git://git.lwn.net/linux

Pull Documentation updates from Jon Corbet:
 "A bit busier this time around.

  The most interesting thing (IMO) this time around is some beginning
  infrastructural work to allow documents to be written using
  restructured text.  Maybe someday, in a galaxy far far away, we'll be
  able to eliminate the DocBook dependency and have a much better
  integrated set of kernel docs.  Someday.

  Beyond that, there's a new document on security hardening from Kees,
  the movement of some sample code over to samples/, a number of
  improvements to the serial docs from Geert, and the usual collection
  of corrections, typo fixes, etc"

* tag 'docs-for-linus' of git://git.lwn.net/linux: (55 commits)
  doc: self-protection: provide initial details
  serial: doc: Use port->state instead of info
  serial: doc: Always refer to tty_port->mutex
  Documentation: vm: Spelling s/paltform/platform/g
  Documentation/memcg: update kmem limit doc as codes behavior
  docproc: print a comment about autogeneration for rst output
  docproc: add support for reStructuredText format via --rst option
  docproc: abstract terminating lines at first space
  docproc: abstract docproc directive detection
  docproc: reduce unnecessary indentation
  docproc: add variables for subcommand and filename
  kernel-doc: use rst C domain directives and references for types
  kernel-doc: produce RestructuredText output
  kernel-doc: rewrite usage description, remove duplicated comments
  Doc: correct the location of sysrq.c
  Documentation: fix common spelling mistakes
  samples: v4l: from Documentation to samples directory
  samples: connector: from Documentation to samples directory
  Documentation: xillybus: fix spelling mistake
  Documentation: x86: fix spelling mistakes
  ...
2016-05-19 18:07:25 -07:00
Daniel Borkmann 9295c03472 bpf, doc: fix typo on bpf_asm descriptions
Fix description of some of the bpf_asm tool related jump instructions
and generally move them to format A <op> k.

Reported-by: Sebastian Amend <sebastian.amend@googlemail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 22:19:15 -04:00
David S. Miller e800072c18 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
In netdevice.h we removed the structure in net-next that is being
changes in 'net'.  In macsec.c and rtnetlink.c we have overlaps
between fixes in 'net' and the u64 attribute changes in 'net-next'.

The mlx5 conflicts have to do with vxlan support dependencies.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09 15:59:24 -04:00
Shmulik Ladkani c81aa79794 Documentation/networking: more accurate LCO explanation
In few places the term "ones-complement sum" was used but the actual
meaning is "the complement of the ones-complement sum".

Also, avoid enclosing long statements with underscore, to ease
readability.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-08 23:50:31 -04:00
Alexei Starovoitov f9c8d19d6c bpf: add documentation for 'direct packet access'
explain how verifier checks safety of packet access
and update email addresses.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:54 -04:00
Florian Westphal 5c2a9644d0 bonding: update documentation section after dev->trans_start removal
Drivers that use LLTX need to update trans_start of the netdev_queue.
(Most drivers don't use LLTX; stack does this update if .ndo_start_xmit
 returned TX_OK).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 17:07:13 -04:00
David S. Miller cba6532100 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv4/ip_gre.c

Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 00:52:29 -04:00
Eric Engestrom edb9a1b894 Documentation: networking: fix spelling mistakes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 14:21:13 -04:00
Kees Cook 08559657b2 Documentation: fix common spelling mistakes
This fixes several spelling mistakes in the Documentation/ tree, which
are caught by checkpatch.pl's spell checking.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-04-28 07:51:59 -06:00
Florian Westphal f0cdf76c10 net: remove NETDEV_TX_LOCKED support
No more users in the tree, remove NETDEV_TX_LOCKED support.
Adds another hole in softnet_stats struct, but better than keeping
the unused collision counter around.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 15:53:05 -04:00
Nicolas Dichtel 9854518ea0 sched: align nlattr properly when needed
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 12:00:49 -04:00
Alexander Duyck f7a6272bf3 Documentation: Add documentation for TSO and GSO features
This document is a starting point for defining the TSO and GSO features.
The whole thing is starting to get a bit messy so I wanted to make sure we
have notes somwhere to start describing what does and doesn't work.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 16:23:41 -04:00
Masanari Iida bf91795e4a Doc: networking: Fix typo in dsa
This patch fix typos in Documentation/networking/dsa.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-13 22:38:56 -04:00
David Ahern a6db4494d2 net: ipv4: Consider failed nexthops in multipath routes
Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.

Example:

                     [h2]                   [h3]   15.0.0.5
                      |                      |
                     3|                     3|
                    [SP1]                  [SP2]--+
                     1  2                   1     2
                     |  |     /-------------+     |
                     |   \   /                    |
                     |     X                      |
                     |    / \                     |
                     |   /   \---------------\    |
                     1  2                     1   2
         12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
                     4                         4
                      \                       /
                        \                    /
                         \                  /
                          -------|   |-----/
                                 1   2
                                [TOR3]
                                  3|
                                   |
                                  [h1]  12.0.0.1

host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:

    root@h1:~# ip ro ls
    ...
    12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
    15.0.0.0/16
            nexthop via 12.0.0.2  dev swp1 weight 1
            nexthop via 12.0.0.3  dev swp1 weight 1
    ...

If the link between tor3 and tor1 is down and the link between tor1
and tor2 then tor1 is effectively cut-off from h1. Yet the route lookups
in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and
ssh 15.0.0.5 gets the other. Connections that attempt to use the
12.0.0.2 nexthop fail since that neighbor is not reachable:

    root@h1:~# ip neigh show
    ...
    12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
    12.0.0.2 dev swp1  FAILED
    ...

The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.

To maintain backward compatibility use of the neighbor information is
based on a new sysctl, fib_multipath_use_neigh.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11 15:16:13 -04:00
Vivien Didelot 43c44a9f65 net: dsa: make the STP state function return void
The DSA layer doesn't care about the return code of the port_stp_update
routine, so make it void in the layer and the DSA drivers.

Replace the useless dsa_slave_stp_update function with a
dsa_slave_stp_state function used to reply to the switchdev
SWITCHDEV_ATTR_ID_PORT_STP_STATE attribute.

In the meantime, rename port_stp_update to port_stp_state_set to
explicit the state change.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08 16:50:40 -04:00
Vivien Didelot f453939c1a net: dsa: document missing functions
Add description for the missing port_vlan_prepare, port_fdb_prepare,
port_fdb_dump functions in the DSA documentation.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08 16:50:02 -04:00
David S. Miller 1089ac6977 For the 4.6 cycle, we have a number of changes:
* Bob's mesh mode rhashtable conversion, this includes
    the rhashtable API change for allocation flags
  * BSSID scan, connect() command reassoc support (Jouni)
  * fast (optimised data only) and support for RSS in mac80211 (myself)
  * various smaller changes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJXBQ4GAAoJEGt7eEactAAdWiMP/ibaP3I79NDc0s7wCDA+KRkm
 hx0Qx4a0wwm7lDFlnGBjY6yKr+XFDliCvdGX7XGpLSsTioNg7eXPpwx5FQoj6RiV
 8+5RKE9fTguN9ofUzqAwHd9sVOaxvdlXbKfb/N93Gzjpw/meYk58wXdF7Almkroa
 ukgJeMzIlIh+6D96zFEA+Ofzp5chwh+x2Dn0wXutEe9P9fOERA859veAvx65b+Ql
 IRGTqyuY5B/wcbkr4o+DWQwgrdt7Vop9nYVPNWtMHm2JTzfuCSaQ2cD9TnVAK/bg
 /vtqC46KKNLyBRGexAPqdftY9PWcfipgE+n7k+Et4iGSmNm7Z3dEyewgXmqli7XJ
 X8Uiaq+N6Fpe06DVSU7aSRt8NLV64A44jXSfKRI9U2POUqKMn/PMdm8bhPW8qCdM
 ra6myWpQGHWK9e0TQQdShq0NQKGxCZAiSRiiIrbbvXl1CwXxkPCG39wAC3Sh1tEN
 ou4lGraeywGnTjaq+mwLEtHLoug8Y2x+Fz+Ze4Cu2enXxna9lp4lr+rFlc+2+0Er
 o9oPxkTk8krZGIj9M6PNc5W+InMwchaFX3076n67hnFHzFRlOQzkfffbPYlhKJDQ
 f8c9JiNZIoX/fD1TAKsrdO1+EKm/xo7w7pLgbMwQal8Jr88SkITDg0i3oXc56vNQ
 ZK2gUzwvrD/jh0AUyDfN
 =sj7y
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
For the 4.7 cycle, we have a number of changes:
 * Bob's mesh mode rhashtable conversion, this includes
   the rhashtable API change for allocation flags
 * BSSID scan, connect() command reassoc support (Jouni)
 * fast (optimised data only) and support for RSS in mac80211 (myself)
 * various smaller changes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08 16:42:31 -04:00
Ido Schimmel 75f3a1018f switchdev: Use switch ID in suggested udev rule
Since there can be multiple switch ASICs on the same system we should
use the switch ID in order to differentiate between them and set the
switch name (e.g. swX) accordingly.

Also, replace the order of the "Switch ID" and "Port Netdev Naming"
sections following the above change.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-05 15:07:54 -04:00
Sven Eckelmann 5c05803a3e mac80211: document only injected *_RADIOTAP_* flags
Not the internal flags but the radiotap flags are parsed when the monitor
injected frames are prepared for transmission. Thus the documentation
should only document these.

Reported-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Fixes: dfdfc2beb0 ("mac80211: Parse legacy and HT rate in injected frames")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05 10:48:57 +02:00
Lorenzo Bianconi 646e76bb5d mac80211: parse VHT info in injected frames
Add VHT radiotap parsing support to ieee80211_parse_tx_radiotap().
That capability has been tested using a d-link dir-860l rev b1 running
OpenWrt trunk and mt76 driver

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05 10:48:54 +02:00
Soheil Hassas Yeganeh fd91e12f59 sock: document timestamping via cmsg in Documentation
Update docs and add code snippet for using cmsg for timestamping.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04 15:50:30 -04:00
Alexandre TORGUE 0b7a43d376 Documentation: networking: update stmmac
Update stmmac driver documentation according to new GMAC 4.x family.

Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-02 20:23:09 -04:00
Dave Anderson 83d26b6326 bpf: doc: "neg" opcode has no operands
Fixes a copy-paste-o in the BPF opcode table: "neg" takes no arguments
and thus has no addressing modes.

Signed-off-by: Dave Anderson <danderson@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2016-03-31 00:30:41 -06:00
Nicolas Dichtel 3e34766048 switchdev: fix typo in comments/doc
Two minor typo.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-24 14:51:24 -04:00
Benjamin Poirier 537377d3b7 igmp: Document sysctl_igmp_max_msf
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21 22:56:37 -04:00
Benjamin Poirier 6b226e2f80 net: Fix indentation of the conf/ documentation block
Commit d67ef35fff ("clarify documentation for
net.ipv4.igmp_max_memberships") mistakenly indented a block of
documentation such that it now looks like it belongs to a specific sysctl.
Restore that block's original position.

Cc: Jeremy Eder <jeder@redhat.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21 22:56:37 -04:00
Vivien Didelot 71327a4e7d net: dsa: rename port_*_bridge routines
Rename DSA port_join_bridge and port_leave_bridge routines to
respectively port_bridge_join and port_bridge_leave in order to respect
an implicit Port::Bridge namespace.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 16:05:31 -04:00
Florian Fainelli 387178ec26 Documentation: networking: phy.txt: Add missing functions
Some new development in PHYLIB added new function pointers to the struct
phy_driver, document these.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-14 15:00:04 -04:00
Tom Herbert 10016594f4 kcm: Add description in Documentation
Add kcm.txt to desribe KCM and interfaces.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-09 16:36:16 -05:00
santosh.shilimkar@oracle.com dcdede0406 RDS: Drop stale iWARP RDMA transport
RDS iWarp support code has become stale and non testable. As
indicated earlier, am dropping the support for it.

If new iWarp user(s) shows up in future, we can adapat the RDS IB
transprt for the special RDMA READ sink case. iWarp needs an MR
for the RDMA READ sink.

Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:13:17 -05:00
David S. Miller d67703fced Here's another round of updates for -next:
* big A-MSDU RX performance improvement (avoid linearize of paged RX)
  * rfkill changes: cleanups, documentation, platform properties
  * basic PBSS support in cfg80211
  * MU-MIMO action frame processing support
  * BlockAck reordering & duplicate detection offload support
  * various cleanups & little fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJW0LZ0AAoJEGt7eEactAAdde0P/2meIOehuHBuAtL7REVoNhri
 bz9eSHMTg+ozCspL7F6vW1ifDI9AaJEaqJccmriueE/UQVC3VXRPGRJ4SFCwZGo9
 Zrtys2v9wOq0+XhxyN65Ucf41O9F/5FFabR5OFbf/pZhW5b2cubEjD1P4BB76Iya
 8O6wf9oDDjt3zJgYK+sygm3k9wtDVrH3qEbj8IDnCy22P7010qCsfok9swfaq8OB
 DBgb6BVfDOFTNXvJGH5fRuUKZdtovzzxorXnoG+zjmKmFdMVdgIYj9+2QfnMjW03
 B4/W85svcLLH8V3lHZc4G8oKM4J4XtjH1PskKIMF7ThJsKGMf8tL2vpt9rr8iscd
 Y9SwTEGc9JmhL7n2FaQFlY6ScLcp4ML+2rXxDOMpBmgF3Ne3yfBsJhLKZEl8vSfI
 mKhzGXpUKjJxJWIxkR0ylJy4/zHeIXkgRlUEhb8t+jgAqvOBTwiVY+vljHCDUERa
 sH40r1OqnGJtOHkSRqXSpxwXW+eKgyDd7fnnRX/tyttp2Fuew27/fN63SjpsfN6O
 3lfSM5bl3FcCKx7vqTLuqzsoqGvDDYkSq6GDfKDqeZIk0vaXA3SJNEOKgymFWQfR
 rzsaXvTbBT34GYRg3xS2NCxlmcBPemei/q0x6ZOffxhF41Qpqjs1dPB1Yq3AW4jD
 HGF+NdRbWEqEFVIjQa8w
 =JHOe
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2016-02-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Here's another round of updates for -next:
 * big A-MSDU RX performance improvement (avoid linearize of paged RX)
 * rfkill changes: cleanups, documentation, platform properties
 * basic PBSS support in cfg80211
 * MU-MIMO action frame processing support
 * BlockAck reordering & duplicate detection offload support
 * various cleanups & little fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 17:03:27 -05:00
David Ahern f1705ec197 net: ipv6: Make address flushing on ifdown optional
Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    << nothing; all addresses have been flushed>>

Add a new sysctl to make this behavior optional. The new setting defaults to
flush all addresses to maintain backwards compatibility. When the set global
addresses with no expire times are not flushed on an admin down. The sysctl
is per-interface or system-wide for all interfaces

    $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
or
    $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1

Will keep addresses on eth1 on an admin down.

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
        inet6 2100:1::2/120 scope global tentative
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
           valid_lft forever preferred_lft forever

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 21:45:15 -05:00
Vivien Didelot 477b184526 net: dsa: drop vlan_getnext
The VLAN GetNext operation is specific to some switches, and thus can be
complicated to implement for some drivers.

Remove the support for the vlan_getnext/port_pvid_get approach in favor
of the generic and simpler port_vlan_dump function.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 15:20:21 -05:00
Vivien Didelot 65aebfc002 net: dsa: add port_vlan_dump routine
Similar to port_fdb_dump, add a port_vlan_dump function to DSA drivers
which gets passed the switchdev VLAN object and callback.

This function, if implemented, takes precedence over the soon legacy
vlan_getnext/port_pvid_get approach.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 15:20:20 -05:00
Sven Eckelmann dfdfc2beb0 mac80211: Parse legacy and HT rate in injected frames
Drivers/devices without their own rate control algorithm can get the
information what rates they should use from either the radiotap header of
injected frames or from the rate control algorithm. But the parsing of the
legacy rate information from the radiotap header was removed in commit
e6a9854b05 ("mac80211/drivers: rewrite the rate control API").

The removal of this feature heavily reduced the usefulness of frame
injection when wanting to simulate specific transmission behavior. Having
rate parsing together with MCS rates and retry support allows a fine
grained selection of the tx behavior of injected frames for these kind of
tests.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-02-24 09:04:30 +01:00
Vivien Didelot a6692754d6 net: dsa: pass bridge down to drivers
Some DSA drivers may or may not support multiple software bridges on top
of an hardware switch.

It is more convenient for them to access the bridge's net_device for
finer configuration.

Removing the need to craft and access a bitmask also simplifies the
code.

This patch changes the signature of bridge related functions, update DSA
drivers, and removes dsa_slave_br_port_mask.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-23 14:52:46 -05:00
Florian Westphal d1b4c689d4 netlink: remove mmapped netlink support
mmapped netlink has a number of unresolved issues:

- TX zerocopy support had to be disabled more than a year ago via
  commit 4682a03586 ("netlink: Always copy on mmap TX.")
  because the content of the mmapped area can change after netlink
  attribute validation but before message processing.

- RX support was implemented mainly to speed up nfqueue dumping packet
  payload to userspace.  However, since commit ae08ce0021
  ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
  with the socket-based interface too (via the skb_zerocopy helper).

The other problem is that skbs attached to mmaped netlink socket
behave different from normal skbs:

- they don't have a shinfo area, so all functions that use skb_shinfo()
(e.g. skb_clone) cannot be used.

- reserving headroom prevents userspace from seeing the content as
it expects message to start at skb->head.
See for instance
commit aa3a022094 ("netlink: not trim skb for mmaped socket when dump").

- skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we
crash because it needs the sk to check if a tx ring is attached.

Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359
("netfilter: nfnetlink: use original skbuff when acking batches").

mmaped netlink also didn't play nicely with the skb_zerocopy helper
used by nfqueue and openvswitch.  Daniel Borkmann fixed this via
commit 6bb0fef489 ("netlink, mmap: fix edge-case leakages in nf queue
zero-copy")' but at the cost of also needing to provide remaining
length to the allocation function.

nfqueue also has problems when used with mmaped rx netlink:
- mmaped netlink doesn't allow use of nfqueue batch verdict messages.
  Problem is that in the mmap case, the allocation time also determines
  the ordering in which the frame will be seen by userspace (A
  allocating before B means that A is located in earlier ring slot,
  but this also means that B might get a lower sequence number then A
  since seqno is decided later.  To fix this we would need to extend the
  spinlocked region to also cover the allocation and message setup which
  isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace.
  Queing GSO packets is faster than having to force a software segmentation
  in the kernel, so this is a desirable option.  However, with a mmap based
  ring one has to use 64kb per ring slot element, else mmap has to fall back
  to the socket path (NL_MMAP_STATUS_COPY) for all large packets.

To use the mmap interface, userspace not only has to probe for mmap netlink
support, it also has to implement a recv/socket receive path in order to
handle messages that exceed the size of an rx ring element.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 11:42:18 -05:00
Edward Cree e8ae7b000e Documentation/networking: add checksum-offloads.txt to explain LCO
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-12 05:52:16 -05:00
Johannes Berg 7a02bf892d ipv6: add option to drop unsolicited neighbor advertisements
In certain 802.11 wireless deployments, there will be NA proxies
that use knowledge of the network to correctly answer requests.
To prevent unsolicitd advertisements on the shared medium from
being a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_unsolicited_na".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:36 -05:00
Johannes Berg abbc30436d ipv6: add option to drop unicast encapsulated in L2 multicast
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv6 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:36 -05:00
Johannes Berg 97daf33145 ipv4: add option to drop gratuitous ARP packets
In certain 802.11 wireless deployments, there will be ARP proxies
that use knowledge of the network to correctly answer requests.
To prevent gratuitous ARP frames on the shared medium from being
a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_gratuitous_arp".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:35 -05:00
Johannes Berg 12b74dfadb ipv4: add option to drop unicast encapsulated in L2 multicast
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv4 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Additionally, enabling this option provides compliance with a SHOULD
clause of RFC 1122.

Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:35 -05:00
Sven Eckelmann 7b5e739619 batman-adv: Switch to HTTPS version of links
open-mesh.org and its subdomains can only be accessed via HTTPS. HTTP-only
requests are currently redirected automatically to HTTPS but references in
the source code should be only https.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-03 09:54:39 +08:00
Xin Long bffae6975e net: change tcp_syn_retries documentation
Documentation should be kept consistent with the code:

 static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
 #define MAX_TCP_SYNCNT          127

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-20 18:55:08 -08:00
Linus Torvalds e535d74bc5 A relatively boring cycle in the docs tree. There's a few kernel-doc
fixes and various document tweaks.
 
 One patch reaches out of the documentation subtree to fix a comment in
 init/do_mounts_rd.c.  There didn't seem to be anybody more appropriate to
 take that one, so I accepted it.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWmmJ8AAoJEI3ONVYwIuV6uqwP/0mnqdxVWo47ohaYJP7q0Soh
 ovJAbfttxKnkmOdGbWcNIJtTiw+MpdF805CYR+2treE0zvEEDodg7BhkDnmKZJ9n
 F1r53JrIj769E1c5ETmWTHcBt3jjtKyQIbBmDr4YTgX91dlKF28o1bMmyDECWIcT
 PktTlPUidDtffKMn3klh6baPCMrTpLJ8aLshBzUrQhrQY8lxcZKAU+98vtFzYofG
 LXCSulMYXumb7XBxErTLQZhmJslD4gaDMh2xkov6ALS8XNHnfoUIFRbArAllNfTf
 LQGJ6Q5qnn58UWi9F/vgDqx7+d1KIPUjBxJR9wfa0w9ggQhA9ly2BSN/fllbiSbp
 yIi1JS4hwBe8H/h577BNC3xjmgVN7mazZsXlS+fg3G16gpv4JdWeRY4efjosFIzQ
 EIJxB8qAovUNqw4s1mzRIJ5B9L7PEK27O6z8N27Fiw4EigtMTFAOC2/GD3ELx4iJ
 p1doiSr+wjfDcFd8kdIUiDKGrTSTXwNy3hUfrhzQyaEjDTJnx3+1+ono1orSazPO
 Fr2RSsC5VzX4IYSuxTMvFSKjN1Iiu8xqwq3IdclHXrBhRvwOF2wpjjQ5Guf0lHBJ
 FLBahSjZqt01kmwFykxoHps+VeSwpoEen6rClBQolfmtYVDTvgRNN46AxK9jZ8T4
 jZmCNNs/mYzrqo/RTnmw
 =u38W
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.5' of git://git.lwn.net/linux

Pull documentation updates from Jon Corbet:
 "A relatively boring cycle in the docs tree.  There's a few kernel-doc
  fixes and various document tweaks.

  One patch reaches out of the documentation subtree to fix a comment in
  init/do_mounts_rd.c.  There didn't seem to be anybody more appropriate
  to take that one, so I accepted it"

* tag 'docs-4.5' of git://git.lwn.net/linux: (29 commits)
  thermal: add description for integral_cutoff unit
  Documentation: update libhugetlbfs site url
  Documentation: Explain pci=conf1,conf2 more verbosely
  DMA-API: fix confusing sentence in Documentation/DMA-API.txt
  Documentation: translations: update linux cross reference link
  Documentation: fix typo in CodingStyle
  init, Documentation: Remove ramdisk_blocksize mentions
  Documentation-getdelays: Apply a recommendation from "checkpatch.pl" in main()
  Documentation: HOWTO: update versions from 3.x to 4.x
  Documentation: remove outdated references from translations
  Doc: treewide: Fix grammar "a" to "an"
  Documentation: cpu-hotplug: Fix sysfs mount instructions
  can-doc: Add hint about getting timestamps
  Fix CFQ I/O scheduler parameter name in documentation
  Documentation: arm: remove dead links from Marvell Berlin docs
  Documentation: HOWTO: update code cross reference link
  Doc: Docbook/iio: Fix typo in iio.tmpl
  DocBook: make index.html generation less verbose by default
  DocBook: Cleanup: remove an unused $(call) line
  DocBook: Add a help message for DOCBOOKS env var
  ...
2016-01-17 11:55:07 -08:00
Elad Raz 4f5590f8cd switchdev: Adding IGMP snooping documentation
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 16:50:21 -05:00
Sven Eckelmann cc69d3dbbb batman-adv: Change ifconfig examples to iproute2
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-01-09 20:56:00 +08:00
David Ahern 6dd9a14e92 net: Allow accepted sockets to be bound to l3mdev domain
Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. A sysctl setting is added
to control the behavior which is similar to sk_mark and
sysctl_tcp_fwmark_accept.

This effectively allow a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18 14:43:38 -05:00
David S. Miller b3e0d3d7ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/geneve.c

Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-17 22:08:28 -05:00
Zhu Yanjun 566178f853 net: sctp: dynamically enable or disable pf state
As we all know, the value of pf_retrans >= max_retrans_path can
disable pf state. The variables of pf_retrans and max_retrans_path
can be changed by the userspace application.

Sometimes the user expects to disable pf state while the 2
variables are changed to enable pf state. So it is necessary to
introduce a new variable to disable pf state.

According to the suggestions from Vlad Yasevich, extra1 and extra2
are removed. The initialization of pf_enable is added.

Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-16 10:56:50 -05:00
Stefan Tatschner aecd89e855 can-doc: Add hint about getting timestamps
This patch adds a hint about how to get timestamps of received
CAN frames with ioctl(2). This hint has been applied to the
former SocketCAN Documentation, but it got lost during mainlining
the first bits and pieces to linux kernel.

Signed-off-by: Stefan Tatschner <rumpelsepp@sevenbyte.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-12-10 11:29:11 -07:00
Jeff Kirsher a3fb65680f e100.txt: Cleanup license info in kernel doc
Apparently the e100.txt document contained a "License" section left
over from days of old, which does not need to be in the kernel
documentation.  So clean it up..

CC: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
2015-12-03 12:58:10 -08:00
Linus Torvalds 4aeabc6b5c A few more documentation patches that wandered in and have no reason to
wait; these include some improvements to the suggestions for email clients
 and patch submission.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWRge3AAoJEI3ONVYwIuV6BboP/30p1d0kGC9KWH96aVZst1uu
 4VaUcOf+zKp8wQwhyHQIgmGlgD8u6fzfa8i2YIiVRzJD18Tm67KjHsbSiT4qgI84
 xizcmnuRogTthtWTmGITKfgQ5OL3Z0IX/5JgIMoIdmAKSjg/3kSR2b5FvN+tj6Qh
 9SQAWYIW2cvDWp9PV8gWOdA2j6EKDQ6BdwRE749LmS3MXRN9bM/KRezCyIrmCvOF
 FChTzxVjKd2TNYlO9nDyzn70WhyoV/e7jn+9AOKC4XHOCrFI0KB36mE8Wv/VmUmF
 +6YPLWNJrTb7J07p9fkYb0FeMzEVcZIVqtMgvtmWTrX4qat9VJI38tOcgDQKHOhj
 ETf+8DDM0t4pNUw+3KDdCdu+AIZcEuuRQYc7AUC/URHZGM+LsI0hERRcNS5Z/KRv
 lKyq3Y+A+/Z7b6Ia87lJavgubEdY3pagCOZXWQWvb8CwaTnOhuf4qfBK6pkqJBuN
 2DfNSQrsi2t1cqKfrnGr58kkMslOKzZLXXSjgvB8IhHmqedNha9JODykq2ldnMdx
 Lch1fIe8Exh+p0pY3nKUnmB69dMZnuNVnW92C4JbNr+zgU1jxQeLVer3bum3NzZw
 cEASmIPb1nJnl5aJkapc/rz+tSJ9GIIgaTyarNO+cVWQ712gYU5Ot4nNj5ayKW5C
 RgQOqBL+68GF1tMUB61b
 =6qJ0
 -----END PGP SIGNATURE-----

Merge tag '4.4-additional' of git://git.lwn.net/linux

Pull more documentation updates from Jon Corbet:
 "A few more documentation patches that wandered in and have no reason
  to wait; these include some improvements to the suggestions for email
  clients and patch submission"

* tag '4.4-additional' of git://git.lwn.net/linux:
  Documentation: Add minimal Mutt config for using Gmail
  Documentation: Add note on sending files directly with Mutt
  Documentation: dontdiff: remove media from dontdiff
  Documentation/SubmittingPatches: discuss In-Reply-To
  Remove email address from Documentation/filesystems/overlayfs.txt
  can-doc: Add missing semicolon to example
2015-11-13 09:19:05 -08:00
Stefan Tatschner e2807e67d5 can-doc: Add missing semicolon to example
The example code for CAN_BCM,

	connect(s, (struct sockaddr *)&addr, sizeof(addr))

lacks a semicolon at the end of the line. This patch adds that
missing semicolon to ensure that the given code snippet actually
compiles.

Signed-off-by: Stefan Tatschner <rumpelsepp@sevenbyte.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-11-11 10:04:53 -07:00
Linus Torvalds 2df4ee78d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet.

 2) Several spots need to get to the original listner for SYN-ACK
    packets, most spots got this ok but some were not.  Whilst covering
    the remaining cases, create a helper to do this.  From Eric Dumazet.

 3) Missiing check of return value from alloc_netdev() in CAIF SPI code,
    from Rasmus Villemoes.

 4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich.

 5) Use after free in mvneta driver, from Justin Maggard.

 6) Fix race on dst->flags access in dst_release(), from Eric Dumazet.

 7) Add missing ZLIB_INFLATE dependency for new qed driver.  From Arnd
    Bergmann.

 8) Fix multicast getsockopt deadlock, from WANG Cong.

 9) Fix deadlock in btusb, from Kuba Pawlak.

10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6
    counter state.  From Sabrina Dubroca.

11) Fix packet_bind() race, which can cause lost notifications, from
    Francesco Ruggeri.

12) Fix MAC restoration in qlcnic driver during bonding mode changes,
    from Jarod Wilson.

13) Revert bridging forward delay change which broke libvirt and other
    userspace things, from Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  Revert "bridge: Allow forward delay to be cfgd when STP enabled"
  bpf_trace: Make dependent on PERF_EVENTS
  qed: select ZLIB_INFLATE
  net: fix a race in dst_release()
  net: mvneta: Fix memory use after free.
  net: Documentation: Fix default value tcp_limit_output_bytes
  macvtap: Resolve possible __might_sleep warning in macvtap_do_read()
  mvneta: add FIXED_PHY dependency
  net: caif: check return value of alloc_netdev
  net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA
  drivers: net: xgene: fix RGMII 10/100Mb mode
  netfilter: nft_meta: use skb_to_full_sk() helper
  net_sched: em_meta: use skb_to_full_sk() helper
  sched: cls_flow: use skb_to_full_sk() helper
  netfilter: xt_owner: use skb_to_full_sk() helper
  smack: use skb_to_full_sk() helper
  net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid()
  bpf: doc: correct arch list for supported eBPF JIT
  dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put"
  bonding: fix panic on non-ARPHRD_ETHER enslave failure
  ...
2015-11-10 18:11:41 -08:00
Niklas Cassel 821b414405 net: Documentation: Fix default value tcp_limit_output_bytes
Commit c39c4c6abb ("tcp: double default TSQ output bytes limit")
updated default value for tcp_limit_output_bytes

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-09 12:17:34 -05:00
Yang Shi d0b891415f bpf: doc: correct arch list for supported eBPF JIT
aarch64 and s390x support eBPF JIT too, correct document to reflect this and
avoid any confusion.

Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-08 20:46:48 -05:00
Linus Torvalds 5ebe0ee802 There is a nice new document from Neil on how pathname lookups work and
some new CAN driver documentation.  Beyond that, we have kernel-doc fixes,
 a bit more work to support reproducible builds, and the usual collection of
 small fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWO6HiAAoJEI3ONVYwIuV6ihwQAK0KC72h0706bdwDJ1p1/aJU
 QLuPeiKYWgGAXq2zgOyw3Povj4bkMwkiq1IGHLyK0Id4tg3ngxOXjimk4YKrqarI
 BD5HdpOm7IyQEe66ZU9b1RFDVst+bg3yp6ZIZsH5vQxl/KnyJ6AyaaDk8TPYId8S
 1+CykJzxyi7GyT/jlLpHbKtBKrraoVke+cNPMAvOf0NjSyO7Ix5B+qH50sttG6Eu
 9qcQ8hlKXOdZRTiGW6P+jeZNA+e5+CRpnG9VHBquHy4lI85kQThhWq41UMH690PP
 eRbLipeUybb0FwW2KwuMjGKEMDkMvrGJh0TzSXX9lGHd+5/41v7zcyKh8vJcpLjh
 bNQ2WOAKUBd2d15EP1MNoKXDLGJXusJczLwOjigWiSCQvgouAWwMrpWEw+Obv8Yl
 rdoH1oQqDFfDnk6mnKrSaqLWGNuLxDtkEl/1P0jsGSK6lM3FDkOgTuNPYXTJJgxN
 rXuGmPhyUlS2srERUeQJw2rISN0WRBvcKJGkMX6IpvrXHkItbelqK+yY1DeKPmcm
 qgbIx9ZWNqtltFpG22VVByqAVwucO5Nu8cAIQ2ysJsTnKOvQCQmhu5UKTjBCkEJM
 VpeMm32BfNiJFLuLTQGWBZ8bkRl2shQyXhOaR3uyqG4T+rpPD3qJi6dtFRpsAzOB
 q1nZuJCpOaxJFzjSKvpJ
 =emZ7
 -----END PGP SIGNATURE-----

Merge tag 'docs-for-linus' of git://git.lwn.net/linux

Pull documentation update from Jon Corbet:
 "There is a nice new document from Neil on how pathname lookups work
  and some new CAN driver documentation.  Beyond that, we have
  kernel-doc fixes, a bit more work to support reproducible builds, and
  the usual collection of small fixes"

* tag 'docs-for-linus' of git://git.lwn.net/linux: (34 commits)
  Documentation: add new description of path-name lookup.
  Documentation/vm/slub.txt: document slabinfo-gnuplot.sh
  Doc: ABI/stable: Fix typo in ABI/stable
  doc: Clarify that nmi_watchdog param is for hardlockups
  Typo correction for description in gpio document.
  DocBook: Fix kernel-doc to be case-insensitive for private:
  kernel-docs.txt: update kernelnewbies reference
  Doc:kvm: Fix typo in Doc/virtual/kvm
  Documentation/Changes: Add bc in "Current Minimal Requirements" section
  Documentation/email-clients.txt: remove trailing whitespace
  DocBook: Use a fixed encoding for output
  MAINTAINERS: The docs tree has moved
  Docs/kernel-parameters: Add earlycon devicetree usage
  SubmittingPatches: make Subject examples match the de facto standard
  Documentation: gpio: mention that <function>-gpio has been deprecated
  Documentation: cgroups: just fix a few typos
  Documentation: Update kselftest.txt
  Documentation: DMA API: Be more explicit that nents is always the same
  Documentation: Update the default value of crashkernel low
  zram: update documentation
  ...
2015-11-05 15:59:24 -08:00
David S. Miller e7b63ff115 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-10-30

1) The flow cache is limited by the flow cache limit which
   depends on the number of cpus and the xfrm garbage collector
   threshold which is independent of the number of cpus. This
   leads to the fact that on systems with more than 16 cpus
   we hit the xfrm garbage collector limit and refuse new
   allocations, so new flows are dropped. On systems with 16
   or less cpus, we hit the flowcache limit. In this case, we
   shrink the flow cache instead of refusing new flows.

   We increase the xfrm garbage collector threshold to INT_MAX
   to get the same behaviour, independent of the number of cpus.

2) Fix some unaligned accesses on sparc systems.
   From Sowmini Varadhan.

3) Fix some header checks in _decode_session4. We may call
   pskb_may_pull with a negative value converted to unsigened
   int from pskb_may_pull. This can lead to incorrect policy
   lookups. We fix this by a check of the data pointer position
   before we call pskb_may_pull.

4) Reload skb header pointers after calling pskb_may_pull
   in _decode_session4 as this may change the pointers into
   the packet.

5) Add a missing statistic counter on inner mode errors.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 20:51:56 +09:00
Ido Schimmel 371e59adce switchdev: Make flood to CPU optional
In certain use cases it is not always desirable for the switch device to
flood traffic to CPU port. Instead, only certain packet types (e.g.
STP, LACP) should be trapped to it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 12:26:40 +09:00
Ido Schimmel 741af0053b switchdev: Add support for flood control
Allow devices supporting this feature to control the flooding of unknown
unicast traffic, by making switchdev infrastructure propagate this setting
to the switch driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 12:26:38 +09:00
Yuchung Cheng 4f41b1c58a tcp: use RACK to detect losses
This patch implements the second half of RACK that uses the the most
recent transmit time among all delivered packets to detect losses.

tcp_rack_mark_lost() is called upon receiving a dubious ACK.
It then checks if an not-yet-sacked packet was sent at least
"reo_wnd" prior to the sent time of the most recently delivered.
If so the packet is deemed lost.

The "reo_wnd" reordering window starts with 1msec for fast loss
detection and changes to min-RTT/4 when reordering is observed.
We found 1msec accommodates well on tiny degree of reordering
(<3 pkts) on faster links. We use min-RTT instead of SRTT because
reordering is more of a path property but SRTT can be inflated by
self-inflicated congestion. The factor of 4 is borrowed from the
delayed early retransmit and seems to work reasonably well.

Since RACK is still experimental, it is now used as a supplemental
loss detection on top of existing algorithms. It is only effective
after the fast recovery starts or after the timeout occurs. The
fast recovery is still triggered by FACK and/or dupack threshold
instead of RACK.

We introduce a new sysctl net.ipv4.tcp_recovery for future
experiments of loss recoveries. For now RACK can be disabled by
setting it to 0.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:00:53 -07:00
Yuchung Cheng f672258391 tcp: track min RTT using windowed min-filter
Kathleen Nichols' algorithm for tracking the minimum RTT of a
data stream over some measurement window. It uses constant space
and constant time per update. Yet it almost always delivers
the same minimum as an implementation that has to keep all
the data in the window. The measurement window is tunable via
sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes.

The algorithm keeps track of the best, 2nd best & 3rd best min
values, maintaining an invariant that the measurement time of
the n'th best >= n-1'th best. It also makes sure that the three
values are widely separated in the time window since that bounds
the worse case error when that data is monotonically increasing
over the window.

Upon getting a new min, we can forget everything earlier because
it has no value - the new min is less than everything else in the
window by definition and it's the most recent. So we restart fresh
on every new min and overwrites the 2nd & 3rd choices. The same
property holds for the 2nd & 3rd best.

Therefore we have to maintain two invariants to maximize the
information in the samples, one on values (1st.v <= 2nd.v <=
3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <=
now). These invariants determine the structure of the code

The RTT input to the windowed filter is the minimum RTT measured
from ACK or SACK, or as the last resort from TCP timestamps.

The accessor tcp_min_rtt() returns the minimum RTT seen in the
window. ~0U indicates it is not available. The minimum is 1usec
even if the true RTT is below that.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:00:43 -07:00
Paolo Abeni 02a6d6136f Revert "ipv4/icmp: redirect messages can use the ingress daddr as source"
Revert the commit e2ca690b65 ("ipv4/icmp: redirect messages
can use the ingress daddr as source"), which tried to introduce a more
suitable behaviour for ICMP redirect messages generated by VRRP routers.
However RFC 5798 section 8.1.1 states:

    The IPv4 source address of an ICMP redirect should be the address
    that the end-host used when making its next-hop routing decision.

while said commit used the generating packet destination
address, which do not match the above and in most cases leads to
no redirect packets to be generated.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 06:01:07 -07:00
David Ahern 4b418bff3d net: vrf: Documentation update, ip commands
Add ip commands with examples for creating VRF devics, enslaving interfaces
and dumping VRF-focused data (address, neighbors, routes).

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-13 18:55:31 -07:00
Paolo Abeni e2ca690b65 ipv4/icmp: redirect messages can use the ingress daddr as source
This patch allows configuring how the source address of ICMP
redirect messages is selected; by default the old behaviour is
retained, while setting icmp_redirects_use_orig_daddr force the
usage of the destination address of the packet that caused the
redirect.

The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the
following scenario:

Two machines are set up with VRRP to act as routers out of a subnet,
they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to
x.x.x.254/24.

If a host in said subnet needs to get an ICMP redirect from the VRRP
router, i.e. to reach a destination behind a different gateway, the
source IP in the ICMP redirect is chosen as the primary IP on the
interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2.

The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2,
and will continue to use the wrong next-op.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:38:02 -07:00
Jiri Pirko 1f86839874 switchdev: rename SWITCHDEV_ATTR_* enum values to SWITCHDEV_ATTR_ID_*
To be aligned with obj.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 04:49:37 -07:00
Jiri Pirko 57d80838da switchdev: rename SWITCHDEV_OBJ_* enum values to SWITCHDEV_OBJ_ID_*
Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 04:49:36 -07:00
Steffen Klassert c386578f1c xfrm: Let the flowcache handle its size by default.
The xfrm flowcache size is limited by the flowcache limit
(4096 * number of online cpus) and the xfrm garbage collector
threshold (2 * 32768), whatever is reached first. This means
that we can hit the garbage collector limit only on systems
with more than 16 cpus. On such systems we simply refuse
new allocations if we reach the limit, so new flows are dropped.
On syslems with 16 or less cpus, we hit the flowcache limit.
In this case, we shrink the flow cache instead of refusing new
flows.

We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.

The xfrm garbage collector threshold can still be set below
the flowcache limit to reduce the memory usage of the flowcache.

Tested-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-09-29 11:44:16 +02:00
David S. Miller 4963ed48f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv4/arp.c

The net/ipv4/arp.c conflict was one commit adding a new
local variable while another commit was deleting one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-26 16:08:27 -07:00
stephen hemminger 008aa6a4fa l2tp: remove references to modprobe in documentation
No longer need explicit modprobe's and update to use ip instead
of deprecated ifconfig command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 12:27:23 -07:00
Jiri Pirko 7ea6eb3f56 switchdev: introduce transaction item queue for attr_set and obj_add
Now, the memory allocation in prepare/commit state is done separatelly
in each driver (rocker). Introduce the similar mechanism in generic
switchdev code, in form of queue. That can be used not only for memory
allocations, but also for different items. Abort item destruction
is handled as well.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 22:59:21 -07:00
Scott Feldman 45ffda75e1 switchdev: update documentation on FDB ageing_time
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 14:35:58 -07:00
David S. Miller 99cb99aa05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next tree
in this 4.4 development cycle, they are:

1) Schedule ICMP traffic to IPVS instances, this introduces a new schedule_icmp
   proc knob to enable/disable it. By default is off to retain the old
   behaviour. Patchset from Alex Gartrell.

I'm also including what Alex originally said for the record:

"The configuration of ipvs at Facebook is relatively straightforward.  All
ipvs instances bgp advertise a set of VIPs and the network prefers the
nearest one or uses ECMP in the event of a tie.  For the uninitiated, ECMP
deterministically and statelessly load balances by hashing the packet
(usually a 5-tuple of protocol, saddr, daddr, sport, and dport) and using
that number as an index (basic hash table type logic).

The problem is that ICMP packets (which contain really important
information like whether or not an MTU has been exceeded) will get a
different hash value and may end up at a different ipvs instance.  With no
information about where to route these packets, they are dropped, creating
ICMP black holes and breaking Path MTU discovery.  Suddenly, my mom's
pictures can't load and I'm fielding midday calls that I want nothing to do
with.

To address this, this patch set introduces the ability to schedule icmp
packets which is gated by a sysctl net.ipv4.vs.schedule_icmp.  If set to 0,
the old behavior is maintained -- otherwise ICMP packets are scheduled."

2) Add another proc entry to ignore tunneled packets to avoid routing loops
   from IPVS, also from Alex.

3) Fifteen patches from Eric Biederman to:

* Stop passing nf_hook_ops as parameter to the hook and use the state hook
  object instead all around the netfilter code, so only the private data
  pointer is passed to the registered hook function.

* Now that we've got state->net, propagate the netns pointer to netfilter hook
  clients to avoid its computation over and over again. A good example of how
  this has been simplified is the former TEE target (now nf_dup infrastructure)
  since it has killed the ugly pick_net() function.

There's another round of netns updates from Eric Biederman making the line. To
avoid the patchbomb again to almost all the networking mailing list (that is 84
patches) I'd suggest we send you a pull request with no patches or let me know
if you prefer a better way.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-22 13:11:43 -07:00
Pablo Neira Ayuso 36aea585a1 Merge tag 'ipvs-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next
Simon Horman says:

====================
IPVS Updates for v4.4

please consider these IPVS Updates for v4.4.

The updates include the following from Alex Gartrell:
* Scheduling of ICMP
* Sysctl to ignore tunneled packets; and hence some packet-looping scenarios
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18 21:05:03 +02:00
Oliver Hartkopp ac78a15de4 can: Add documentation for CAN FD driver configuration
With Linux 3.15 the infrastructure for CAN FD hardware drivers had been
introduced into the kernel. Now the M_CAN driver and the peak_usb driver
support CAN FD. Update the documentation to show the latest CAN related
configuration options of 'ip' from iproute2 and describe the CAN FD specific
options to set the data bitrate and protocol version (ISO/non-ISO).

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-09-18 10:02:59 -06:00
David Ahern 562d897d15 net: Add documentation for VRF device
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 16:06:43 -07:00
Stefan Schmidt 29cd5ddc45 ieee802154: docs: fix project name to linux-wpan as well as some typos
Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-09-17 13:20:05 +02:00
Alex Gartrell 4e478098ac ipvs: add sysctl to ignore tunneled packets
This is a way to avoid nasty routing loops when multiple ipvs instances can
forward to eachother.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2015-09-17 11:50:02 +09:00
Linus Torvalds dd5cdb48ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Another merge window, another set of networking changes.  I've heard
  rumblings that the lightweight tunnels infrastructure has been voted
  networking change of the year.  But what do I know?

   1) Add conntrack support to openvswitch, from Joe Stringer.

   2) Initial support for VRF (Virtual Routing and Forwarding), which
      allows the segmentation of routing paths without using multiple
      devices.  There are some semantic kinks to work out still, but
      this is a reasonably strong foundation.  From David Ahern.

   3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov.

   4) Ignore route nexthops with a link down state in ipv6, just like
      ipv4.  From Andy Gospodarek.

   5) Remove spinlock from fast path of act_gact and act_mirred, from
      Eric Dumazet.

   6) Document the DSA layer, from Florian Fainelli.

   7) Add netconsole support to bcmgenet, systemport, and DSA.  Also
      from Florian Fainelli.

   8) Add Mellanox Switch Driver and core infrastructure, from Jiri
      Pirko.

   9) Add support for "light weight tunnels", which allow for
      encapsulation and decapsulation without bearing the overhead of a
      full blown netdevice.  From Thomas Graf, Jiri Benc, and a cast of
      others.

  10) Add Identifier Locator Addressing support for ipv6, from Tom
      Herbert.

  11) Support fragmented SKBs in iwlwifi, from Johannes Berg.

  12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.

  13) Add BQL support to 3c59x driver, from Loganaden Velvindron.

  14) Stop using a zero TX queue length to mean that a device shouldn't
      have a qdisc attached, use an explicit flag instead.  From Phil
      Sutter.

  15) Use generic geneve netdevice infrastructure in openvswitch, from
      Pravin B Shelar.

  16) Add infrastructure to avoid re-forwarding a packet in software
      that was already forwarded by a hardware switch.  From Scott
      Feldman.

  17) Allow AF_PACKET fanout function to be implemented in a bpf
      program, from Willem de Bruijn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
  netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
  netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
  net: fec: clear receive interrupts before processing a packet
  ipv6: fix exthdrs offload registration in out_rt path
  xen-netback: add support for multicast control
  bgmac: Update fixed_phy_register()
  sock, diag: fix panic in sock_diag_put_filterinfo
  flow_dissector: Use 'const' where possible.
  flow_dissector: Fix function argument ordering dependency
  ixgbe: Resolve "initialized field overwritten" warnings
  ixgbe: Remove bimodal SR-IOV disabling
  ixgbe: Add support for reporting 2.5G link speed
  ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
  ixgbe: support for ethtool set_rxfh
  ixgbe: Avoid needless PHY access on copper phys
  ixgbe: cleanup to use cached mask value
  ixgbe: Remove second instance of lan_id variable
  ixgbe: use kzalloc for allocating one thing
  flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
  ixgbe: Remove unused PCI bus types
  ...
2015-09-03 08:08:17 -07:00
Andrew Lunn a5597008db phy: fixed_phy: Add gpio to determine link up/down.
An SFP module may have a link up/down status pin which can be
connection to a GPIO line of the host. Add support for reading such an
GPIO in the fixed_phy driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 14:48:02 -07:00
Philip Downey 87583ebb9f IGMP: Document igmp_link_local_mcast_reports
Document the addition of a new sysctl variable which controls the
generation of IGMP reports for link local multicast groups in the
224.0.0.X range.

IGMP reports for local multicast groups can now be optionally
inhibited by setting the value to zero e.g.:
echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports

To retain backwards compatibility the previous behaviour is retained
by default on system boot or reverted by setting the value back to
non-zero.

Signed-off-by: Philip Downey <pdowney@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-31 12:30:37 -07:00
Florian Fainelli ef6346386b Documentation: networking: dsa: Add Broadcom SF2 document
Add a document describing the Broadcom Starfigther 2 switch hardware,
its specifics, and how the driver is implemented and its specifics.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25 17:01:32 -07:00
Florian Fainelli 77760e9492 Documentation: networking: add a DSA document
Describe how the DSA subsystem works, its design principles,
limitations, and describe in details how to implement a DSA switch
driver.

Acked-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25 17:01:32 -07:00
Eric Dumazet 43e122b014 tcp: refine pacing rate determination
When TCP pacing was added back in linux-3.12, we chose
to apply a fixed ratio of 200 % against current rate,
to allow probing for optimal throughput even during
slow start phase, where cwnd can be doubled every other gRTT.

At Google, we found it was better applying a different ratio
while in Congestion Avoidance phase.
This ratio was set to 120 %.

We've used the normal tcp_in_slow_start() helper for a while,
then tuned the condition to select the conservative ratio
as soon as cwnd >= ssthresh/2 :

- After cwnd reduction, it is safer to ramp up more slowly,
  as we approach optimal cwnd.
- Initial ramp up (ssthresh == INFINITY) still allows doubling
  cwnd every other RTT.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25 11:33:54 -07:00
David S. Miller 0aa65cc0c2 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-08-16

Here's what's likely the last bluetooth-next pull request for 4.3:

 - 6lowpan/802.15.4 refactoring, cleanups & fixes
 - Document 6lowpan netdev usage in Documentation/networking/6lowpan.txt
 - Support for UART based QCA Bluetooth controllers
 - Power management support for Broeadcom Bluetooth controllers
 - Change LE connection initiation to always use passive scanning first
 - Support for new Silicon Wave USB ID

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 15:41:21 -07:00
David S. Miller c87acb2558 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-08-17

1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
   From Thomas Egerer.

2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
   From Andrzej Hajda.

3) Pass oif to the xfrm lookups so that it gets set on the flow
   and the resolver routines can match based on oif.
   From David Ahern.

4) Add documentation for the new xfrm garbage collector threshold.
   From Alexander Duyck.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 14:05:14 -07:00
Scott Feldman dd19f83d6c rocker: hook ndo_neigh_destroy to cleanup neigh refs in driver
Rocker driver tracks arp_tbl neighs to resolve IPv4 route nexthops.  The
driver uses NETEVENT_NEIGH_UPDATE for neigh adds and updates, but there is
no event when the neigh is removed from the device (such as when the device
goes admin down).  This patches hooks ndo_neigh_destroy so the driver can
know when a neigh is removed from the device.  In response, the driver will
purge the neigh entry from its internal tbl.

I didn't find an in-tree users of ndo_neigh_destroy, so I'm not sure if
this ndo is vestigial or if there are out-of-tree users.  In any case, it
does what I need here.  An alternative design would be to generate
NETEVENT_NEIGH_UPDATE event when neigh is being destroyed, setting state to
NUD_NONE so driver knows neigh entry is dead.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 17:05:46 -07:00
Rick Jones e8fed985d7 documentation: bring vxlan documentation more up-to-date
A few things have changed since the previous version of the vxlan
documentation was written, so update it and correct some grammar and
such while we are at it.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12 16:46:30 -07:00
Alexander Duyck e69948a0a5 net: Document xfrm4_gc_thresh and xfrm6_gc_thresh
This change adds documentation for xfrm4_gc_thresh and xfrm6_gc_thresh
based on the comments in commit eeb1b73378 ("xfrm: Increase the garbage
collector threshold").

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-08-12 08:28:04 +02:00
Alexander Aring ea9eb698b2 documentation: networking: add 6lowpan documentation
This patch adds a 6lowpan.txt into the networking documentation
directory. Currently this documentation describes how the lowpan
private data of net devices will be handled.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Suggested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-08-11 22:05:36 +02:00
Tom Herbert b56774163f ipv6: Enable auto flow labels by default
Initialize auto_flowlabels to one. This enables automatic flow labels,
individual socket may disable them using the IPV6_AUTOFLOWLABEL socket
option.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 17:07:12 -07:00
Tom Herbert 42240901f7 ipv6: Implement different admin modes for automatic flow labels
Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
automatic flow labels generation. There are four modes:

0: flow labels are disabled
1: flow labels are enabled, sockets can opt-out
2: flow labels are allowed, sockets can opt-in
3: flow labels are enabled and enforced, no opt-out for sockets

np->autoflowlabel is initialized according to the sysctl value.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 17:07:11 -07:00
Hangbin Liu 8013d1d7ea net/ipv6: add sysctl option accept_ra_min_hop_limit
Commit 6fd99094de ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.

RFC 4861, 6.3.4.  Processing Received Router Advertisements
   A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
   and Retrans Timer) may contain a value denoting that it is
   unspecified.  In such cases, the parameter should be ignored and the
   host should continue using whatever value it is already using.

   If the received Cur Hop Limit value is non-zero, the host SHOULD set
   its CurHopLimit variable to the received value.

So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30 15:56:40 -07:00
Joachim Eastwood 75fee59550 stmmac: remove setup/free glue callbacks
As all dwmac-* drivers have been converted to have a proper probe
function the setup callback can now be removed. Also remove the
free callback that wasn't used by any driver.

New dwmac-* drivers should implement standard probe and remove
functions to preform any needed setup and teardown.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 00:13:25 -07:00
Joachim Eastwood 0933328a1b stmmac: remove unused stmmac_of_data struct
As dwmac-* drivers that need OF match have been converted
to use their own internal OF match data structure this can
now be removed.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29 00:13:24 -07:00
Erik Kline 3985e8a361 ipv6: sysctl to restrict candidate source addresses
Per RFC 6724, section 4, "Candidate Source Addresses":

    It is RECOMMENDED that the candidate source addresses be the set
    of unicast addresses assigned to the interface that will be used
    to send to the destination (the "outgoing" interface).

Add a sysctl to enable this behaviour.

Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-22 10:54:11 -07:00
David S. Miller f3120acc78 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-07-17

This series contains updates to igb, ixgbe, ixgbevf, i40e, bnx2x,
freescale, siena and dp83640.

Jacob provides several patches to clarify the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info().  It is okay to support
the specific filters in SIOCSHWTSTAMP by upscaling them to the generic
filters.

Alex Duyck provides a igb patch to pull the time stamp from the fragment
before it gets added to the skb, to avoid a possible issue in which the
fragment can possibly be less than IGB_RX_HDR_LEN due to the time stamp
being pulled after the copybreak check.  Also provides a ixgbevf patch to
fold the ixgbevf_pull_tail() call into ixgbevf_add_rx_frag(), which gives
the advantage that the fragment does not have to be modified after it is
added to the skb.

Fan provides patches for ixgbe/ixgbevf to set the receive hash type
based on receive descriptor RSS type.

Todd provides a fix for igb where on check for link on any media other
than copper was not being detected since it was looking on the incorrect
PHY page (due to the page being used gets switched before the function
to check link gets executed).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:50:19 -07:00
Joachim Eastwood f4c190eb8b stmmac: drop custom_* fields from plat_stmmacenet_data
Both of these fields are unused and has been unused since they
were added 3 and 5 years ago. Drop them since they are clearly
not very useful.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:45:57 -07:00
Scott Feldman a48037e7c6 switchdev: update documentation for offload_fwd_mark
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:45 -07:00
Jacob Keller eff3cddc22 clarify implementation of ethtool's get_ts_info op
This patch adds some clarification about the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info. The HWTSTAMP API has
several Rx filters which are very specific, as well as more general
filters. The specific filters really only exist to support some broken
hardware which can't fully implement the generic filters. This patch
adds clarification that it is okay to support the specific filters in
SIOCSHWTSTAMP by upscaling them to the generic filters. In addition,
update the header for ethtool_ts_info to specify that drivers ought to
only report the filters they support without upscaling in this manner.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Reviewed-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-07-17 19:59:04 -07:00
Stefan Tatschner 03bc7523cb can-doc: Fix wrong chapter reference
In f35f6c8f7 (can: update MAINTAINERS and Documentation) chapter 3.3
was removed. This patch fixes some old references to chapter 3.4 which
no longer exists.

Signed-off-by: Stefan Tatschner <stefan@sevenbyte.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-07-10 15:18:33 -06:00
Tom Herbert 35a256fee5 ipv6: Nonlocal bind
Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.

This add the ip_nonlocal_bind sysctl under ipv6.

Testing:

Set up nonlocal binding and receive routing on a host, e.g.:

ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1

Set up routing to 2001:0:0:1::/64 on peer to go to first host

ping6 -I 2001:0:0:1::1 peer-address -- to verify

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 21:09:10 -07:00
Tejun Heo e2f15f9a79 netconsole: implement extended console support
printk logbuf keeps various metadata and optional key=value dictionary for
structured messages, both of which are stripped when messages are handed
to regular console drivers.

It can be useful to have this metadata and dictionary available to
netconsole consumers.  This obviously makes logging via netconsole more
complete and the sequence number in particular is useful in environments
where messages may be lost or reordered in transit - e.g.  when netconsole
is used to collect messages in a large cluster where packets may have to
travel congested hops to reach the aggregator.  The lost and reordered
messages can easily be identified and handled accordingly using the
sequence numbers.

printk recently added extended console support which can be selected by
setting CON_EXTENDED flag.  From console driver side, not much changes.
The only difference is that the text passed to the write callback is
formatted the same way as /dev/kmsg.

This patch implements extended console support for netconsole which can be
enabled by either prepending "+" to a netconsole boot param entry or
echoing 1 to "extended" file in configfs.  When enabled, netconsole
transmits extended log messages with headers identical to /dev/kmsg
output.

There's one complication due to message fragments.  netconsole limits the
maximum message size to 1k and messages longer than that are split into
multiple fragments.  As all extended console messages should carry
matching headers and be uniquely identifiable, each extended message
fragment carries full copy of the metadata and an extra header field to
identify the specific fragment.  The optional header is of the form
"ncfrag=OFF/LEN" where OFF is the byte offset into the message body and
LEN is the total length.

To avoid unnecessarily making printk format extended messages, Extended
netconsole is registered with printk when the first extended netconsole is
configured.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:39 -07:00
Linus Torvalds 1e467e68e5 Documentation updates for 4.2
The main thing here is Ingo's big subdirectory documenting feature support
 for each architecture.  Beyond that, it's the usual pile of fixes, tweaks,
 and small additions.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVi0g2AAoJEI3ONVYwIuV6Me4QAIfa79z05ABSjlyWaKw46plH
 lULR9cyHdR59JVPHKjSOfT9/c+GOdoz6kkXQoe/TgVyj5fRB8seUW5GJXCASndkk
 aVd4c6yKFH1NISXsSdVQC0JbpgAURgcSR6x59It++fG3NINvXronFTWGMBHMLKcI
 A2hM2jNP914Dy5r4ipWZKzF1KxIlqK9kmLxlNoE6/LoQfBhh1dMdnyfuM11sguAy
 s5pr9JeCPbWC0RE7st/qEivXF4lpj6hd3XoYfM2Y+oukj5xEPQevLTLHOgtesnx9
 guUAul5Sw27n+Dx8I0Qxf1n+5SkrijoAa72g5vAxTs+ilOey67qba012NaYSy7RK
 s15XOIZ/1JTS9JjkO7GR5NbG6AiIIAH5P+Y501ivCIrsWciTOgKj7cOzakIEV8/P
 NX4120Lh5lbBrWeYkl8WbgMO0Me8cThbALC+rncF/wjvGyREKyxNlZ9qvBqmHYjG
 5Et2DT+rANaDmmblgMK3tX/zI1g3pN51e+CRF+Hzh1jZD3MZ/i+KS4qgfGFDzMIj
 uoniO5VfyD4zRbyv4Grg7XMpXiP8xFxKDypglYiXzzwlkarUgbMGOoFE7AkiPOKB
 t9gLPetbDsDyU/bSpzHlfObZp+q+pCxHPhyLS7hxEi3gBxYajIMbkpHHJugnE0+H
 TfkIhy6QQm1vAPTpRXaE
 =ODt8
 -----END PGP SIGNATURE-----

Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6

Pull documentation updates from Jonathan Corbet:
 "The main thing here is Ingo's big subdirectory documenting feature
  support for each architecture.  Beyond that, it's the usual pile of
  fixes, tweaks, and small additions"

* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (79 commits)
  doc:md: fix typo in md.txt.
  Documentation/mic/mpssd: don't build x86 userspace when cross compiling
  Documentation/prctl: don't build tsc tests when cross compiling
  Documentation/vDSO: don't build tests when cross compiling
  Doc:ABI/testing: Fix typo in sysfs-bus-fcoe
  Doc: Docbook: Change wikipedia's URL from http to https in scsi.tmpl
  Doc: Change wikipedia's URL from http to https
  Documentation/kernel-parameters: add missing pciserial to the earlyprintk
  Doc:pps: Fix typo in pps.txt
  kbuild : Fix documentation of INSTALL_HDR_PATH
  Documentation: filesystems: updated struct file_operations documentation in vfs.txt
  kbuild: edit explanation of clean-files variable
  Doc: ja_JP: Fix typo in HOWTO
  Move freefall program from Documentation/ to tools/
  Documentation: ARM: EXYNOS: Describe boot loaders interface
  Doc:nfc: Fix typo in nfc-hci.txt
  vfs: Minor documentation fix
  Doc: networking: txtimestamp: fix printf format warning
  Documentation, intel_pstate: Improve legacy mode internal governors description
  Documentation: extend use case for EXPORT_SYMBOL_GPL()
  ...
2015-06-24 20:01:36 -07:00
Masanari Iida ae13c65bc7 Doc: Change wikipedia's URL from http to https
Recently wikipedia announced to secure access to the servers.
Now all http access re-route to https.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-06-22 10:14:05 -06:00
Scott Feldman b4ad7baa01 bridge: del external_learned fdbs from device on flush or ageout
We need to delete from offload the device externally learnded fdbs when any
one of these events happen:

1) Bridge ages out fdb.  (When bridge is doing ageing vs. device doing
ageing.  If device is doing ageing, it would send SWITCHDEV_FDB_DEL
directly).

2) STP state change flushes fdbs on port.

3) User uses sysfs interface to flush fdbs from bridge or bridge port:

	echo 1 >/sys/class/net/BR_DEV/bridge/flush
	echo 1 >/sys/class/net/BR_PORT/brport/flush

4) Offload driver send event SWITCHDEV_FDB_DEL to delete fdb entry.

For rocker, we can now get called to delete fdb entry in wait and nowait
contexts, so set NOWAIT flag when deleting fdb entry.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:08:49 -07:00
David S. Miller 25c43bf13b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-13 23:56:52 -07:00
Masanari Iida b07d496177 Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt
This patch fix URL (http to https) for wiki.wireshark.org.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12 14:21:29 -07:00
Frans Klaver 03e8f01a67 Doc: networking: txtimestamp: fix printf format warning
Documentation/networking/timestamping/txtimestamp.c: In function ‘__print_timestamp’:
Documentation/networking/timestamping/txtimestamp.c:99:3: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘int64_t’ [-Wformat=]
   fprintf(stderr, "  (%+ld us)", cur_ms - prev_ms);

int64_t differs per platform, so a type specifier that differs along
with it is required.

Signed-off-by: Frans Klaver <fransklaver@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-06-05 07:59:10 +09:00
Scott Feldman 7616dcbb21 switchdev: documentation: use switchdev_port_obj_xxx for IPv4 FIB add/modify/delete ops
Clarify in documentation and code that IPV4 FIB add operation is used for
both adding a new FIB entry to the device and for modifying an existing FIB
entry on the device.

Also, remove left-over references to ipv4_fib ops and replace with details
on SWITCHDEV_PORT_IPV4_FIB object.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 23:47:23 -07:00
Scott Feldman 4b5364fbdc switchdev: documentation: for static FDB ops, use switchdev_port_fdb_xxx ops
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 23:47:23 -07:00
Scott Feldman d290f1fc70 switchdev: documentation: fix grammer error
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 23:47:23 -07:00
Scott Feldman f5ed2febda switchdev: documentation: fix longer-than-80-char lines
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 23:47:23 -07:00
David S. Miller 9d52bf0a23 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-05-28

Here's a set of patches intended for 4.2. The majority of the changes
are on the 802.15.4 side of things rather than Bluetooth related:

 - All sorts of cleanups & fixes to ieee802154 and related drivers
 - Rework of tx power support in ieee802154 and its drivers
 - Support for setting ieee802154 tx power through nl802154
 - New IDs for the btusb driver
 - Various cleanups & smaller fixes to btusb
 - New btrtl driver for Realtec devices
 - Fix suspend/resume for Realtek devices

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 23:26:45 -07:00
Eric Dumazet 07f4c90062 tcp/dccp: try to not exhaust ip_local_port_range in connect()
A long standing problem on busy servers is the tiny available TCP port
range (/proc/sys/net/ipv4/ip_local_port_range) and the default
sequential allocation of source ports in connect() system call.

If a host is having a lot of active TCP sessions, chances are
very high that all ports are in use by at least one flow,
and subsequent bind(0) attempts fail, or have to scan a big portion of
space to find a slot.

In this patch, I changed the starting point in __inet_hash_connect()
so that we try to favor even [1] ports, leaving odd ports for bind()
users.

We still perform a sequential search, so there is no guarantee, but
if connect() targets are very different, end result is we leave
more ports available to bind(), and we spread them all over the range,
lowering time for both connect() and bind() to find a slot.

This strategy only works well if /proc/sys/net/ipv4/ip_local_port_range
is even, ie if start/end values have different parity.

Therefore, default /proc/sys/net/ipv4/ip_local_port_range was changed to
32768 - 60999 (instead of 32768 - 61000)

There is no change on security aspects here, only some poor hashing
schemes could be eventually impacted by this change.

[1] : The odd/even property depends on ip_local_port_range values parity

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 13:30:44 -04:00
Lennert Buytenhek b251d4de67 Documentation/networking/ieee802154.txt: fix various inaccuracies.
* Update the linux-zigbee git:// repository URL.

* Remove the MLME section as the current kernel does not provide a
  full 802.15.4 MLME implementation.

* The hardmac example driver 'fakehard' was removed some time ago.

* The IEEE 802.15.4 device drivers live in drivers/net/ieee802154/,
  not in drivers/ieee802154/.

* The IEEE 802.15.4 MTU is 127 bytes, not 128 bytes.

* Some of the 6LoWPAN code lives in net/6lowpan/.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-05-26 20:25:58 +02:00
Jesper Dangaard Brouer 282fb58947 pktgen: add sample script pktgen_sample02_multiqueue.sh
Add the pktgen samples script pktgen_sample02_multiqueue.sh that
demonstrates generating packets on multiqueue NICs.

Specifically notice the options "-t" that specifies how many
kernel threads to activate.  Also notice the flag QUEUE_MAP_CPU,
which cause the SKB TX queue to be mapped to the CPU running the
kernel thread.  For best scalability people are also encourage to
map NIC IRQ /proc/irq/*/smp_affinity to CPU number.

Usage example with "-t" 4 threads and help:
 ./pktgen_sample02_multiqueue.sh -i eth4 -m 00:1B:21:3C:9D:F8 -t 4

Usage: ./pktgen_sample02_multiqueue.sh [-vx] -i ethX
  -i : ($DEV)       output interface/device (required)
  -s : ($PKT_SIZE)  packet size
  -d : ($DEST_IP)   destination IP
  -m : ($DST_MAC)   destination MAC-addr
  -t : ($THREADS)   threads to start
  -c : ($SKB_CLONE) SKB clones send before alloc new SKB
  -b : ($BURST)     HW level bursting of SKBs
  -v : ($VERBOSE)   verbose
  -x : ($DEBUG)     debug

Removing pktgen.conf-2-1 and pktgen.conf-2-2 as these examples
should be covered now.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 23:59:17 -04:00
Jesper Dangaard Brouer 6f09479758 pktgen: add sample script pktgen_sample01_simple.sh
Add the first basic pktgen samples script pktgen_sample01_simple.sh,
which demonstrates the a simple use of the helper functions.
Removing pktgen.conf-1-1 as that example should be covered now.

The naming scheme pktgen_sampleNN, where NN is a number, should encourage
reading the samples in a specific order.

Script cause pktgen sending with a single thread and single interface,
and introduce flow variation via random UDP source port.

Usage example and help:
 ./pktgen_sample01_simple.sh -i eth4 -m 00:1B:21:3C:9D:F8 -d 192.168.8.2

Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
  -i : ($DEV)       output interface/device (required)
  -s : ($PKT_SIZE)  packet size
  -d : ($DEST_IP)   destination IP
  -m : ($DST_MAC)   destination MAC-addr
  -c : ($SKB_CLONE) SKB clones send before alloc new SKB
  -v : ($VERBOSE)   verbose
  -x : ($DEBUG)     debug

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 23:59:17 -04:00
Jesper Dangaard Brouer 2a1ddf27e8 pktgen: document ability to add same device to several threads
The pktgen.txt documentation still claimed that adding same device to
multiple threads were not supported, but it have been since 2008 via
commit e6fce5b916 ("pktgen: multiqueue etc.").

Document this and describe the naming scheme dev@X, as the procfile name
still need to be unique.

Fixes: e6fce5b916 ("pktgen: multiqueue etc.")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 23:59:16 -04:00
Jesper Dangaard Brouer 91db4b3c89 pktgen: doc were missing several config options
The pktgen.txt documentation over available config options were not complete.
Making the list complete by adding the following.

Pgcontrol commands:
 reset

Device commands:
 burst
 queue_map_min
 queue_map_max
 skb_priority
 tos
 traffic_class
 node
 spi
 dst6_max
 dst6_min
 vlan_cfi
 vlan_id
 vlan_p
 svlan_cfi
 svlan_id
 svlan_p

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 23:59:16 -04:00
Jesper Dangaard Brouer d012827e81 pktgen: remove obsolete "max_before_softirq" from pktgen doc
And cleanup some whitespaces in pktgen.txt.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 23:59:16 -04:00
Daniel Borkmann 492135557d tcp: add rfc3168, section 6.1.1.1. fallback
This work as a follow-up of commit f7b3bec6f5 ("net: allow setting ecn
via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
ECN connections. In other words, this work adds a retry with a non-ECN
setup SYN packet, as suggested from the RFC on the first timeout:

  [...] A host that receives no reply to an ECN-setup SYN within the
  normal SYN retransmission timeout interval MAY resend the SYN and
  any subsequent SYN retransmissions with CWR and ECE cleared. [...]

Schematic client-side view when assuming the server is in tcp_ecn=2 mode,
that is, Linux default since 2009 via commit 255cac91c3 ("tcp: extend
ECN sysctl to allow server-side only ECN"):

 1) Normal ECN-capable path:

    SYN ECE CWR ----->
                <----- SYN ACK ECE
            ACK ----->

 2) Path with broken middlebox, when client has fallback:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
            SYN ----->
                <----- SYN ACK
            ACK ----->

In case we would not have the fallback implemented, the middlebox drop
point would basically end up as:

    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)
    SYN ECE CWR ----X crappy middlebox drops packet
                      (timeout, rtx)

In any case, it's rather a smaller percentage of sites where there would
occur such additional setup latency: it was found in end of 2014 that ~56%
of IPv4 and 65% of IPv6 servers of Alexa 1 million list would negotiate
ECN (aka tcp_ecn=2 default), 0.42% of these webservers will fail to connect
when trying to negotiate with ECN (tcp_ecn=1) due to timeouts, which the
fallback would mitigate with a slight latency trade-off. Recent related
paper on this topic:

  Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
  Gorry Fairhurst, and Richard Scheffenegger:
    "Enabling Internet-Wide Deployment of Explicit Congestion Notification."
    Proc. PAM 2015, New York.
  http://ecn.ethz.ch/ecn-pam15.pdf

Thus, when net.ipv4.tcp_ecn=1 is being set, the patch will perform RFC3168,
section 6.1.1.1. fallback on timeout. For users explicitly not wanting this
which can be in DC use case, we add a net.ipv4.tcp_ecn_fallback knob that
allows for disabling the fallback.

tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but
rather we let tcp_ecn_rcv_synack() take that over on input path in case a
SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent
ECN being negotiated eventually in that case.

Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Dave That <dave.taht@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-19 16:53:37 -04:00
Florian Westphal e578d9c025 net: sched: use counter to break reclassify loops
Seems all we want here is to avoid endless 'goto reclassify' loop.
tc_classify_compat even resets this counter when something other
than TC_ACT_RECLASSIFY is returned, so this skb-counter doesn't
break hypothetical loops induced by something other than perpetual
TC_ACT_RECLASSIFY return values.

skb_act_clone is now identical to skb_clone, so just use that.

Tested with following (bogus) filter:
tc filter add dev eth0 parent ffff: \
 protocol ip u32 match u32 0 0 police rate 10Kbit burst \
 64000 mtu 1500 action reclassify

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 15:08:14 -04:00
Scott Feldman 1f5dc44c88 switchdev: apply review comments on documentation
There were a few review comments on the switchdev.txt documentation that
didn't get included with the Spring Cleanup series, so include them now.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-13 12:26:28 -04:00
Scott Feldman 4ceec22d6d switchdev: bring documentation up-to-date
Much need updated of switchdev documentation to cover what's been
implmented to-date.  There are some XXX comments in the text for
unimplemented or broken items.  I'd like to keep these in there (poor-man's
TODO list) and update the document once each issue is resolved.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-12 18:43:56 -04:00
Mahesh Bandewar d22a5fc0c3 bonding: Implement user key part of port_key in an AD system.
The port key has three components - user-key, speed-part, and duplex-part.
The LSBit is for the duplex-part, next 5 bits are for the speed while the
remaining 10 bits are the user defined key bits. Get these 10 bits
from the user-space (through the SysFs interface) and use it to form the
admin port-key. Allowed range for the user-key is 0 - 1023 (10 bits). If
it is not provided then use zero for the user-key-bits (default).

It can set using following example code -

   # modprobe bonding mode=4
   # usr_port_key=$(( RANDOM & 0x3FF ))
   # echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key
   # echo +eth1 > /sys/class/net/bond0/bonding/slaves
   ...
   # ip link set bond0 up

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
[jt: * fixed up style issues reported by checkpatch
     * fixed up context from change in ad_actor_sys_prio patch]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:59:32 -04:00
Mahesh Bandewar 7451495755 bonding: Allow userspace to set actors' macaddr in an AD-system.
In an AD system, the communication between actor and partner is the
business between these two entities. In the current setup anyone on the
same L2 can "guess" the LACPDU contents and then possibly send the
spoofed LACPDUs and trick the partner causing connectivity issues for
the AD system. This patch allows to use a random mac-address obscuring
it's identity making it harder for someone in the L2 is do the same thing.

This patch allows user-space to choose the mac-address for the AD-system.
This mac-address can not be NULL or a Multicast. If the mac-address is set
from user-space; kernel will honor it and will not overwrite it. In the
absence (value from user space); the logic will default to using the
masters' mac as the mac-address for the AD-system.

It can be set using example code below -

   # modprobe bonding mode=4
   # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \
                    $(( (RANDOM & 0xFE) | 0x02 )) \
                    $(( RANDOM & 0xFF )) \
                    $(( RANDOM & 0xFF )) \
                    $(( RANDOM & 0xFF )) \
                    $(( RANDOM & 0xFF )) \
                    $(( RANDOM & 0xFF )))
   # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system
   # echo +eth1 > /sys/class/net/bond0/bonding/slaves
   ...
   # ip link set bond0 up

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
[jt: fixed up style issues reported by checkpatch]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:59:32 -04:00
Mahesh Bandewar 6791e4661c bonding: Allow userspace to set actors' system_priority in AD system
This patch allows user to randomize the system-priority in an ad-system.
The allowed range is 1 - 0xFFFF while default value is 0xFFFF. If user
does not specify this value, the system defaults to 0xFFFF, which is
what it was before this patch.

Following example code could set the value -
    # modprobe bonding mode=4
    # sys_prio=$(( 1 + RANDOM + RANDOM ))
    # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio
    # echo +eth1 > /sys/class/net/bond0/bonding/slaves
    ...
    # ip link set bond0 up

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
[jt: * fixed up style issues reported by checkpatch
     * changed how the default value is set in bond_check_params(), this
       makes the default consistent between what gets set for a new bond
       and what the default is claimed to be in the bonding options.]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:59:31 -04:00
Alexei Starovoitov 62f64aed62 pktgen: introduce xmit_mode '<start_xmit|netif_receive>'
Introduce xmit_mode 'netif_receive' for pktgen which generates the
packets using familiar pktgen commands, but feeds them into
netif_receive_skb() instead of ndo_start_xmit().

Default mode is called 'start_xmit'.

It is designed to test netif_receive_skb and ingress qdisc
performace only. Make sure to understand how it works before
using it for other rx benchmarking.

Sample script 'pktgen.sh':
\#!/bin/bash
function pgset() {
  local result

  echo $1 > $PGDEV

  result=`cat $PGDEV | fgrep "Result: OK:"`
  if [ "$result" = "" ]; then
    cat $PGDEV | fgrep Result:
  fi
}

[ -z "$1" ] && echo "Usage: $0 DEV" && exit 1
ETH=$1

PGDEV=/proc/net/pktgen/kpktgend_0
pgset "rem_device_all"
pgset "add_device $ETH"

PGDEV=/proc/net/pktgen/$ETH
pgset "xmit_mode netif_receive"
pgset "pkt_size 60"
pgset "dst 198.18.0.1"
pgset "dst_mac 90:e2:ba:ff:ff:ff"
pgset "count 10000000"
pgset "burst 32"

PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"
cat /proc/net/pktgen/$ETH

Usage:
$ sudo ./pktgen.sh eth2
...
Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags)
  43033682pps 20656Mb/sec (20656167360bps) errors: 10000000

Raw netif_receive_skb speed should be ~43 million packet
per second on 3.7Ghz x86 and 'perf report' should look like:
  37.69%  kpktgend_0   [kernel.vmlinux]  [k] __netif_receive_skb_core
  25.81%  kpktgend_0   [kernel.vmlinux]  [k] kfree_skb
   7.22%  kpktgend_0   [kernel.vmlinux]  [k] ip_rcv
   5.68%  kpktgend_0   [pktgen]          [k] pktgen_thread_worker

If fib_table_lookup is seen on top, it means skb was processed
by the stack. To benchmark netif_receive_skb only make sure
that 'dst_mac' of your pktgen script is different from
receiving device mac and it will be dropped by ip_rcv

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 22:26:06 -04:00
Jesper Dangaard Brouer f1f00d8ff6 pktgen: adjust flag NO_TIMESTAMP to be more pktgen compliant
Allow flag NO_TIMESTAMP to turn timestamping on again, like other flags,
with a negation of the flag like !NO_TIMESTAMP.

Also document the option flag NO_TIMESTAMP.

Fixes: afb84b6261 ("pktgen: add flag NO_TIMESTAMP to disable timestamping")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-09 22:26:06 -04:00
Shawn Landden a2f1183599 can.h: make padding given by gcc explicit
The current definition of struct can_frame has a 16-byte size, with 8-byte
alignment, but the 3 bytes of padding are not explicit like the similar 2 bytes
of padding of struct canfd_frame. Make it explicit so it is easier to read.

Signed-off-by: Shawn Landden <shawn@churchofgit.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-05-06 08:03:19 +02:00
Tom Herbert 82a584b7cd ipv6: Flow label state ranges
This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disbaled by systcl.

Background:

IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to to define new
encaps in UDP just for getting ECMP.

Unfortunately, the initial specfications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard how to this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one.  Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.

This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.

Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03 21:58:01 -04:00
Florian Westphal 4749c3ef85 net: sched: remove TC_MUNGED bits
Not used.

pedit sets TC_MUNGED when packet content was altered, but all the core
does is unset MUNGED again and then set OK2MUNGE.

And the latter isn't tested anywhere. So lets remove both
TC_MUNGED and TC_OK2MUNGE.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-02 22:25:17 -04:00
Eric Dumazet a31196b07f net: rfs: fix crash in get_rps_cpus()
Commit 567e4b7973 ("net: rfs: add hash collision detection") had one
mistake :

RPS_NO_CPU is no longer the marker for invalid cpu in set_rps_cpu()
and get_rps_cpu(), as @next_cpu was the result of an AND with
rps_cpu_mask

This bug showed up on a host with 72 cpus :
next_cpu was 0x7f, and the code was trying to access percpu data of an
non existent cpu.

In a follow up patch, we might get rid of compares against nr_cpu_ids,
if we init the tables with 0. This is silly to test for a very unlikely
condition that exists only shortly after table initialization, as
we got rid of rps_reset_sock_flow() and similar functions that were
writing this RPS_NO_CPU magic value at flow dismantle : When table is
old enough, it never contains this value anymore.

Fixes: 567e4b7973 ("net: rfs: add hash collision detection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-26 16:07:57 -04:00
Robert Shearman 37bde79979 mpls: Per-device enabling of packet input
An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not allow forward labeled traffic from untrusted neighbours. This
is achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.

To be secure by default, the default state is changed to MPLS being
disabled on all interfaces unless explicitly enabled and no global
option is provided to change the default. Whilst this differs from
other protocols (e.g. IPv6), network operators are used to explicitly
enabling MPLS forwarding on interfaces, and with the number of links
to the MPLS core typically fairly low this doesn't present too much of
a burden on operators.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-22 14:24:54 -04:00
David S. Miller 87ffabb1f0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The dwmac-socfpga.c conflict was a case of a bug fix overlapping
changes in net-next to handle an error pointer differently.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-14 15:44:14 -04:00
Stephen Hemminger 322d3ed167 ixgb: remove references to ifconfig
Move documentation into this century, even if this device hasn't
been available for some time.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-09 22:04:04 -07:00
Stephen Hemminger d7018be0ae ixgbe: fix documentation
The MTU values in the documentation do not match the source.
The source has frame limit of IXGBE_MAX_JUMBO_FRAME_SIZE (9728)
which is MTU of 9710 because of the accounting for Ethernet header
and CRC.

Also, don't refer to the obsolete ifconfig command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-09 22:03:54 -07:00
Stephen Hemminger c0a34ebd43 igb: doc don't refer to ifconfig
ifconfig command is obsolete, best to remove all references so that
new users learn ip.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-04-09 22:03:20 -07:00
Sowmini Varadhan ebe96e641d RDS: Documentation: Document AF_RDS, PF_RDS and SOL_RDS correctly.
AF_RDS, PF_RDS and SOL_RDS are available in header files,
and there is no need to get their values from /proc. Document
this correctly.

Fixes: 0c5f9b8830 ("RDS: Documentation")

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-08 15:17:04 -04:00
Oliver Hartkopp a5581ef4c2 can: introduce new raw socket option to join the given CAN filters
The CAN_RAW socket can set multiple CAN identifier specific filters that lead
to multiple filters in the af_can.c filter processing. These filters are
indenpendent from each other which leads to logical OR'ed filters when applied.

This socket option joines the given CAN filters in the way that only CAN frames
are passed to user space that matched *all* given CAN filters. The semantic for
the applied filters is therefore changed to a logical AND.

This is useful especially when the filterset is a combination of filters where
the CAN_INV_FILTER flag is set in order to notch single CAN IDs or CAN ID
ranges from the incoming traffic.

As the raw_rcv() function is executed from NET_RX softirq the introduced
variables are implemented as per-CPU variables to avoid extensive locking at
CAN frame reception time.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-04-01 11:28:22 +02:00
Michal Sekletar 27cd545247 filter: introduce SKF_AD_VLAN_TPID BPF extension
If vlan offloading takes place then vlan header is removed from frame
and its contents, both vlan_tci and vlan_proto, is available to user
space via TPACKET interface. However, only vlan_tci can be used in BPF
filters.

This commit introduces a new BPF extension. It makes possible to load
the value of vlan_proto (vlan TPID) to register A. Support for classic
BPF and eBPF is being added, analogous to skb->protocol.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jpirko@redhat.com>

Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:25:15 -04:00
Hannes Frederic Sowa 9f0761c154 ipv6: add documentation for stable_secret, idgen_delay and idgen_retries knobs
Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:12:09 -04:00
Alexander Drozdov 682f048bd4 af_packet: pass checksum validation status to the user
Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
af_packet user that at least the transport header checksum
has been already validated.

For now, the flag may be set for incoming packets only.

Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-23 22:01:28 -04:00
YOSHIFUJI Hideaki/吉藤英明 89c69d3ce5 net: neighbour: Document {mcast, ucast}_solicit, mcast_resolicit.
Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 21:47:40 -04:00
John Fastabend 822b3b2ebf net: Add max rate tx queue attribute
This adds a tx_maxrate attribute to the tx queue sysfs entry allowing
for max-rate limiting. Along with DCB-ETS and BQL this provides another
knob to tune queue performance. The limit units are Mbps.

By default it is disabled. To disable the rate limitation after it
has been set for a queue, it should be set to zero.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 14:55:18 -04:00