Commit Graph

63026 Commits

Author SHA1 Message Date
Eric Dumazet 996b175e39 tcp: out_of_order_queue do not use its lock
TCP out_of_order_queue lock is not used, as queue manipulation
happens with socket lock held and we therefore use the lockless
skb queue routines (as __skb_queue_head())

We can use __skb_queue_head_init() instead of skb_queue_head_init()
to make this more consistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 16:34:34 -05:00
Vijay Subramanian d4b36210c2 net: pkt_sched: PIE AQM scheme
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

>From the IETF draft below:
" Bufferbloat is a phenomenon where excess buffers in the network cause high
latency and jitter. As more and more interactive applications (e.g. voice over
IP, real time video streaming and financial transactions) run in the Internet,
high latency and jitter degrade application performance. There is a pressing
need to design intelligent queue management schemes that can control latency and
jitter; and hence provide desirable quality of service to users.

We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software.  "

Many thanks to Dave Taht for extensive feedback, reviews, testing and
suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
suggestions.  Naeem Khademi and Dave Taht independently contributed to ECN
support.

For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 15:13:01 -05:00
David S. Miller 9aa28f2b71 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says: <pablo@netfilter.org>

====================
nftables updates for net-next

The following patchset contains nftables updates for your net-next tree,
they are:

* Add set operation to the meta expression by means of the select_ops()
  infrastructure, this allows us to set the packet mark among other things.
  From Arturo Borrero Gonzalez.

* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
  Borkmann.

* Add new queue expression to nf_tables. These comes with two previous patches
  to prepare this new feature, one to add mask in nf_tables_core to
  evaluate the queue verdict appropriately and another to refactor common
  code with xt_NFQUEUE, from Eric Leblond.

* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
  Eric Leblond.

* Add the reject expression to nf_tables, this adds the missing TCP RST
  support. It comes with an initial patch to refactor common code with
  xt_NFQUEUE, again from Eric Leblond.

* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
  Nazarewicz.

* Remove the nft_meta_target code, now that Arturo added the set operation
  to the meta expression, from me.

* Add help information for nf_tables to Kconfig, also from me.

* Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
  available to other nf_tables objects, requested by Arturo, from me.

* Expose the table usage counter, so we can know how many chains are using
  this table without dumping the list of chains, from Tomasz Bursztyka.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 13:29:30 -05:00
Hannes Frederic Sowa 1e85c9b66d 8021q: make vlan_pcpu_stats visible without CONFIG_VLAN_8021Q
macvlan needs vlan_pcpu_stats so make it visible even if compiling
without VLAN_8021Q support. Otherwise a very long compiler error happens.

Fixes: cdf3e274cf ("macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats")
Cc: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-By: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:27:55 -05:00
David S. Miller 855404efae Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next tree,
they are:

* Add full port randomization support. Some crazy researchers found a way
  to reconstruct the secure ephemeral ports that are allocated in random mode
  by sending off-path bursts of UDP packets to overrun the socket buffer of
  the DNS resolver to trigger retransmissions, then if the timing for the
  DNS resolution done by a client is larger than usual, then they conclude
  that the port that received the burst of UDP packets is the one that was
  opened. It seems a bit aggressive method to me but it seems to work for
  them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
  new NAT mode to fully randomize ports using prandom.

* Add a new classifier to x_tables based on the socket net_cls set via
  cgroups. These includes two patches to prepare the field as requested by
  Zefan Li. Also from Daniel Borkmann.

* Use prandom instead of get_random_bytes in several locations of the
  netfilter code, from Florian Westphal.

* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
  mark, also from Florian Westphal.

* Fix compilation warning due to unused variable in IPVS, from Geert
  Uytterhoeven.

* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

* Add IPComp extension to x_tables, from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:18:50 -05:00
Hauke Mehrtens b2395b8aea bcma: export bcma_find_core_unit()
This function is used to get a specific core when there is more than
one core of that specific type. This is used in bgmac to reset all GMAC
cores.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:25:19 -05:00
Li RongQing cdf3e274cf macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats
They are same, so unify them as one; since macvlan is a kind of vlan,
vlan_pcpu_stats should be a proper name for vlan and macvlan.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:11:33 -05:00
Li RongQing 8f84985fec net: unify the pcpu_tstats and br_cpu_netstats as one
They are same, so unify them as one, pcpu_sw_netstats.

Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:10:24 -05:00
David S. Miller 653864d9dd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e and pci_regs.h.

Anjali provides a patch to prevent messages from stray HMC events, except
at interrupt message level, and refactors the HMC error handling.

Catherine adds routines in probe to populate/check PCI bus speed and width,
then verify we are in a 8GT/s x8 PCIe slot and warn when we are not.

Shannon adds Wake-on-LAN support for i40e, fixes curly brace use as well as
return type for i40e_vsi_clear_rings().

Joseph implements receive offload for VXLAN for i40e, where the hardware
supports checksum offload/verification of the inner/outer header.

Mitch provides the bulk of the changes, where he refactors the VF reset
code so that it works on real hardware.  Then does code cleanup by
calling existing functions to enable and disable queues for VFs and
remove unused functions.  Removes a unnecessary log messages that are
seen at every VF reset, for example complaining about disabling queues
that are already disabled.  Fixes an error return when the VF asks to
add an invalid MAC address and if the VF sends a bad message, make it
more informative about what is actually going on.

Jesse refactors the LED function to flash LED lights correctly.

v2:
 - removed patch 5 "i40e: add set settings and pauseparam" based on
   feedback from Ben Hutchings, will re-work that patch for later
   submission
 - Added patch "i40e: Implementation of vxlan ndo's" from Joseph to
   address Or Gerlitz's questions and concerns.  This patch adds the
   implementation for the VXLAN ndo's and allows the hardware to do
   receive checksum offload for inner packets on the UDP ports that
   VXLAN notifies us about.
 - Added patch "i40e: using for_each_set_bit to simplify the code"
   from Wei Yongjun.  This patch uses for_each_set_bit() to simply
   the code.

v3:
 - fixed indentation issue in patch 11 based on feedback from
   Sergei Shtylyov.

Sorry for the delayed release of v4, it was delayed to the holidays.

v4:
 - Addressed Or Gerlitz's concerns about trying to get a hold of a mutex
   while holding a spin lock in patch 6 by executing the AQ commands from
   a subtask.
 - Addressed David Miller's Kconfig concerns by creating a Kconfig VXLAN
   option for i40e and wrapped appropriate code with the config option in
   patch 6.
 - Updated patch 7 based on the changes made in patch 6 in the above two
   bullets.

v5:
 - Added the patch to pci_regs.h based on David Miller's feedback to add
   PCI defines for speed and width
 - Updated patch 3 description to better explain the changes based on
   feedback from David Miller
 - Updated patch 4 to use the newly added defines to pci_regs.h instead
   of local defines
 - Updated patch 7 to use <net/vxlan.h> in the #include based on feedback
   from David Miller
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 19:50:35 -05:00
Sergei Shtylyov fbfcec635d phylib: make phy_scan_fixups() static
phy_scan_fixups()  isn't and shouldn't be called by the drivers directly, so
unexport it. And since Florian Fainelli's recent patches, the function is only
called locally, so we can make it static as well.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 19:27:58 -05:00
Sergei Shtylyov 29935aebc7 phylib: remove unused adjust_state() callback
Remove adjust_state() callback from 'struct phy_device' since it seems to have
never been really used from the inception: phy_start_machine() has been always
called with 2nd argument equal to NULL.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 19:27:58 -05:00
Sergei Shtylyov 4017b4d321 <linux/phy.h>: coding style fixes
Running 'checkpatch.pl' gives some errors and warnings:

- no spaces around =;

- * separated by space from the function name;

- { in function definition not on a separate line;

- line over 80 characters.

While fixing these, also fix the following style issues:

- file name in the heading comment;

- alignment not matching open paren.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 19:27:58 -05:00
Jeff Kirsher 55fdbfe7be pci_regs.h: Add PCI bus link speed and width defines
Add missing PCI bus link speed 8.0 GT/s and bus link widths of
x1, x2, x4 and x8.

CC: <linux-kernel@vger.kernel.org>
CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-03 18:25:25 -08:00
sfeldma@cumulusnetworks.com 4ee7ac7526 bonding: add ad_info attribute netlink support
Add nested IFLA_BOND_AD_INFO for bonding 802.3ad info.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 21:03:21 -05:00
sfeldma@cumulusnetworks.com ec029fac3e bonding: add ad_select attribute netlink support
Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
ad_select via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 21:03:21 -05:00
sfeldma@cumulusnetworks.com 998e40bbf8 bonding: add lacp_rate attribute netlink support
Add IFLA_BOND_AD_LACP_RATE to allow get/set of bonding parameter
lacp_rate via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 21:03:21 -05:00
stephen hemminger 5e419e68a6 llc: make lock static
The llc_sap_list_lock does not need to be global, only acquired
in core.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:56:48 -05:00
stephen hemminger 8f09898bf0 socket: cleanups
Namespace related cleaning

 * make cred_to_ucred static
 * remove unused sock_rmalloc function

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:55:58 -05:00
Tom Herbert 9a4aa9af44 ipv4: Use percpu Cache route in IP tunnels
percpu route cache eliminates share of dst refcnt between CPUs.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 19:40:57 -05:00
Tom Herbert 7d442fab0a ipv4: Cache dst in tunnels
Avoid doing a route lookup on every packet being tunneled.

In ip_tunnel.c cache the route returned from ip_route_output if
the tunnel is "connected" so that all the rouitng parameters are
taken from tunnel parms for a packet. Specifically, not NBMA tunnel
and tos is from tunnel parms (not inner packet).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 19:38:45 -05:00
Daniel Borkmann 82a37132f3 netfilter: x_tables: lightweight process control group matching
It would be useful e.g. in a server or desktop environment to have
a facility in the notion of fine-grained "per application" or "per
application group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, but it is not limited
to that, meaning we can have much different scenarios/policies that
netfilter allows us than just blocking, e.g. fine grained settings
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.

Implementation of PID-based matching would not be appropriate
as they frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.

As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.

In our case we're working on outgoing traffic based on which local
socket that originated from. Also, one doesn't even need to have
an a-prio knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremly*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we result in a very minimal and efficient
module, and don't add anything except netfilter code.

One possible, minimal usage example (many other iptables options
can be applied obviously):

 1) Configuring cgroups if not already done, e.g.:

  mkdir /sys/fs/cgroup/net_cls
  mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
  mkdir /sys/fs/cgroup/net_cls/0
  echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
  (resp. a real flow handle id for tc)

 2) Configuring netfilter (iptables-nftables), e.g.:

  iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

 3) Running applications, e.g.:

  ping 208.67.222.222  <pid:1799>
  echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
  [...]
  ping 208.67.220.220  <pid:1804>
  ping: sendmsg: Operation not permitted
  [...]
  echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
  [...]

Of course, real-world deployments would make use of cgroups user
space toolsuite, or own custom policy daemons dynamically moving
applications from/to various cgroups.

  [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:44 +01:00
Daniel Borkmann 86f8515f97 net: netprio: rename config to be more consistent with cgroup configs
While we're at it and introduced CGROUP_NET_CLASSID, lets also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_<SUBSYS>.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:42 +01:00
Daniel Borkmann fe1217c4f3 net: net_cls: move cgroupfs classid handling into core
Zefan Li requested [1] to perform the following cleanup/refactoring:

- Split cgroupfs classid handling into net core to better express a
  possible more generic use.

- Disable module support for cgroupfs bits as the majority of other
  cgroupfs subsystems do not have that, and seems to be not wished
  from cgroup side. Zefan probably might want to follow-up for netprio
  later on.

- By this, code can be further reduced which previously took care of
  functionality built when compiled as module.

cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
that we are consistent with {netclassid,netprio}_cgroup naming that is
under net/core/ as suggested by Zefan.

No change in functionality, but only code refactoring that is being
done here.

 [1] http://patchwork.ozlabs.org/patch/304825/

Suggested-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:41 +01:00
stephen hemminger dcd93ed4cd netfilter: nf_conntrack: remove dead code
The following code is not used in current upstream code.
Some of this seems to be old hooks, other might be used by some
out of tree module (which I don't care about breaking), and
the need_ipv4_conntrack was used by old NAT code but no longer
called.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:37 +01:00
stephen hemminger 02eca9d2cc netfilter: ipset: remove unused code
Function never used in current upstream code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:35 +01:00
Daniel Borkmann 34ce324019 netfilter: nf_nat: add full port randomization support
We currently use prandom_u32() for allocation of ports in tcp bind(0)
and udp code. In case of plain SNAT we try to keep the ports as is
or increment on collision.

SNAT --random mode does use per-destination incrementing port
allocation. As a recent paper pointed out in [1] that this mode of
port allocation makes it possible to an attacker to find the randomly
allocated ports through a timing side-channel in a socket overloading
attack conducted through an off-path attacker.

So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
in regard to the attack described in this paper. As we need to keep
compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
selection algorithm with a simple prandom_u32() in order to mitigate
this attack vector. Note that the lfsr113's internal state is
periodically reseeded by the kernel through a local secure entropy
source.

More details can be found in [1], the basic idea is to send bursts
of packets to a socket to overflow its receive queue and measure
the latency to detect a possible retransmit when the port is found.
Because of increasing ports to given destination and port, further
allocations can be predicted. This information could then be used by
an attacker for e.g. for cache-poisoning, NS pinning, and degradation
of service attacks against DNS servers [1]:

  The best defense against the poisoning attacks is to properly
  deploy and validate DNSSEC; DNSSEC provides security not only
  against off-path attacker but even against MitM attacker. We hope
  that our results will help motivate administrators to adopt DNSSEC.
  However, full DNSSEC deployment make take significant time, and
  until that happens, we recommend short-term, non-cryptographic
  defenses. We recommend to support full port randomisation,
  according to practices recommended in [2], and to avoid
  per-destination sequential port allocation, which we show may be
  vulnerable to derandomisation attacks.

Joint work between Hannes Frederic Sowa and Daniel Borkmann.

 [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
 [2] http://arxiv.org/pdf/1205.5190v1.pdf

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:26 +01:00
stephen hemminger 9c75f4029c sched action: make local function static
No need to export functions only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:36 -05:00
Sachin Kamat 96bfc80d83 net: Cleanup in eth-netx.h
Commit 2960ed3468 ("ARM: netx: move platform_data definitions")
moved the file to the current location but forgot to remove the pointer
to its previous location. Clean it up. While at it also change the header
file protection macros appropriately.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:35 -05:00
Li RongQing 0c3584d589 ipv6: remove prune parameter for fib6_clean_all
since the prune parameter for fib6_clean_all always is 0, remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:35 -05:00
stephen hemminger e82435341f ipv6: namespace cleanups
Running 'make namespacecheck' shows:
  net/ipv6/route.o
    ipv6_route_table_template
    rt6_bind_peer
  net/ipv6/icmp.o
    icmpv6_route_lookup
    ipv6_icmp_table_template

This addresses some of those warnings by:
 * make icmpv6_route_lookup static
 * move inline's out of ip6_route.h since only used into route.c
 * move rt6_bind_peer into route.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:46:09 -05:00
stephen hemminger 1d143d9f0c net: core functions cleanup
The following functions are not used outside of net/core/dev.c
and should be declared static.

  call_netdevice_notifiers_info
  __dev_remove_offload
  netdev_has_any_upper_dev
  __netdev_adjacent_dev_remove
  __netdev_adjacent_dev_link_lists
  __netdev_adjacent_dev_unlink_lists
  __netdev_adjacent_dev_unlink
  __netdev_adjacent_dev_link_neighbour
  __netdev_adjacent_dev_unlink_neighbour

And the following are never used and should be deleted
  netdev_lower_dev_get_private_rcu
  __netdev_find_adj_rcu

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:46:09 -05:00
stephen hemminger 2173f8d953 netlink: cleanup tap related functions
Cleanups in netlink_tap code
 * remove unused function netlink_clear_multicast_users
 * make local function static

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:43:36 -05:00
stephen hemminger 3678a9d863 netlink: cleanup rntl_af_register
The function __rtnl_af_register is never called outside this
code, and the return value is always 0.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:42:19 -05:00
John W. Linville ad86c55bac Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-01 15:39:56 -05:00
Zhi Yong Wu c9d8ca0454 net, rps: fix build failure when CONFIG_RPS isn't set
In file included from net/socket.c:99:0:
include/net/sock.h: In function ‘sock_rps_record_flow’:
include/net/sock.h:849:30: error: ‘const struct sock’ has no member named ‘sk_rxhash’
include/net/sock.h: In function ‘sock_rps_reset_flow’:
include/net/sock.h:854:29: error: ‘const struct sock’ has no member named ‘sk_rxhash’

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 15:59:27 -05:00
Yang Yingliang 6a031f67c8 sch_netem: support of 64bit rates
Add a new attribute to support 64bit rates so that
tc can use them to break the 32bit limit.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 14:31:44 -05:00
Or Gerlitz 7ffdf726cf net/mlx4_core: Add basic support for TCP/IP offloads under tunneling
Add the low-level device commands and definitions used for TCP/IP HW offloads
of tunneled/vxlan traffic which are supported by the ConnectX3-pro NIC.

This is done through the following elements:

 - read tunneling device caps in QUERY_DEV_CAP
 - add helper function to do SET_PORT for tunneling
 - add DMFS VXLAN steering rule definitions
 - add CQE and WQE checksum offload field definitions

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 14:31:43 -05:00
Daniel Borkmann 604d13c97f netlink: specify netlink packet direction for nlmon
In order to facilitate development for netlink protocol dissector,
fill the unused field skb->pkt_type of the cloned skb with a hint
of the address space of the new owner (receiver) socket in the
notion of "to kernel" resp. "to user".

At the time we invoke __netlink_deliver_tap_skb(), we already have
set the new skb owner via netlink_skb_set_owner_r(), so we can use
that for netlink_is_kernel() probing.

In normal PF_PACKET network traffic, this field denotes if the
packet is destined for us (PACKET_HOST), if it's broadcast
(PACKET_BROADCAST), etc.

As we only have 3 bit reserved, we can use the value (= 6) of
PACKET_FASTROUTE as it's _not used_ anywhere in the whole kernel
and not supported anywhere, and packets of such type were never
exposed to user space, so there are no overlapping users of such
kind. Thus, as wished, that seems the only way to make both
PACKET_* values non-overlapping and therefore device agnostic.

By using those two flags for netlink skbs on nlmon devices, they
can be made available and picked up via sll_pkttype (previously
unused in netlink context) in struct sockaddr_ll. We now have
these two directions:

 - PACKET_USER (= 6)    ->  to user space
 - PACKET_KERNEL (= 7)  ->  to kernel space

Partial `ip a` example strace for sa_family=AF_NETLINK with
detected nl msg direction:

syscall:                     direction:
sendto(3,  ...) = 40         /* to kernel */
recvmsg(3, ...) = 3404       /* to user */
recvmsg(3, ...) = 1120       /* to user */
recvmsg(3, ...) = 20         /* to user */
sendto(3,  ...) = 40         /* to kernel */
recvmsg(3, ...) = 168        /* to user */
recvmsg(3, ...) = 144        /* to user */
recvmsg(3, ...) = 20         /* to user */

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 14:31:43 -05:00
Neil Horman 0a9a8bfd29 printk: Add a DEPRECATED macro
sctp has several points in its setsockopt path in which it issues deprecation
warnings.  It seems like it might be handy to macrotize such a warning so other
subsystems can use it easily

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 13:58:40 -05:00
Tom Herbert fe47755852 net: Allow setting sock flow hash without a sock
This patch adds sock_rps_record_flow_hash and sock_rps_reset_flow_hash
which take a hash value as an argument and sets the sock_flow_table
accordingly.  This allows the table to be populated in cases where flow
is being tracked outside of a sock structure.

sock_rps_record_flow and sock_rps_reset_flow call this function
where the hash is taken from sk_rxhash.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 13:31:34 -05:00
Eric Leblond cc70d069e2 netfilter: REJECT: separate reusable code
This patch prepares the addition of TCP reset support in
the nft_reject module by moving reusable code into a header
file.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-30 15:04:41 +01:00
stephen hemminger 24245a1b05 lro: remove dead code
Remove leftover code that is not used anywhere in current tree.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29 16:34:25 -05:00
stephen hemminger f7e56a76ac tcp: make local functions static
The following are only used in one file:
  tcp_connect_init
  tcp_set_rto

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29 16:34:24 -05:00
Stephen Hemminger ea074b3495 ipv4: ping make local stuff static
Don't export ping_table or ping_v4_sendmsg. Both are only used
inside ping code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 17:05:45 -05:00
Stephen Hemminger 068a6e1834 ipv4: remove unused function
inetpeer_invalidate_family defined but never used

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 17:03:20 -05:00
Stephen Hemminger 7195cf7221 arp: make arp_invalidate static
Don't export arp_invalidate, only used in arp.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 17:02:46 -05:00
Arturo Borrero Gonzalez e035b77ac7 netfilter: nf_tables: nft_meta module get/set ops
This patch adds kernel support for the meta expression in get/set
flavour. The set operation indicates that a given packet has to be
set with a property, currently one of mark, priority, nftrace.
The get op is what was currently working: evaluate the given
packet property.

In the nftrace case, the value is always 1. Such behaviour is copied
from net/netfilter/xt_TRACE.c

The NFTA_META_DREG and NFTA_META_SREG attributes are mutually
exclusives.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-12-28 14:02:12 +01:00
stephen hemminger 0db901bd33 macvlan: make start_xmit local
Only used in one file, no need to expose

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-28 01:20:46 -05:00
Yang Yingliang 2e04ad424b sch_tbf: add TBF_BURST/TBF_PBURST attribute
When we set burst to 1514 with low rate in userspace,
the kernel get a value of burst that less than 1514,
which doesn't work.

Because it may make some loss when transform burst
to buffer in userspace. This makes burst lose some
bytes, when the kernel transform the buffer back to
burst.

This patch adds two new attributes to support sending
burst/mtu to kernel directly to avoid the loss.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-26 13:54:22 -05:00
stephen hemminger 652a4a53fc mdio: unused ethtool functions
Use it or lose it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-26 13:29:35 -05:00