OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
David S. Miller	2bd736fa0d	Another pull request for the next cycle, this time with quite a bit of content: * mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse * TDLS higher bandwidth support (Arik) * OCB fixes from Bertold Van den Bergh * suspend/resume fixes from Eliad * dynamic SMPS support for minstrel-HT (Krishna Chaitanya) * VHT bitrate mask support (Lorenzo Bianconi) * better regulatory support for 5/10 MHz channels (Matthias May) * basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon) along with a number of other cleanups. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJVzg5bAAoJEDBSmw7B7bqr3PAP/1r8wyZXxtySzz6P5Z9k0+2I 52NiSUISgmtnaQUyahf4n90eMU+gGJWQwPwIZFvMKg6bD4RW2XI4MdKmviKx8skU 4sDlDxMFrVMfV/ySwiPDAONWPtwwgKllIt0IDDnKs6kPdDlUcbKOTEFYhzZ1HhTZ 7Og4rJm7M90QpdMU7hmxmE5KRkp1hW0Yce1KPTW5U0j9yl9zbi4eLVWT+ac1WnZs GpItajd0BFtBy7DRHzX8RiRJ4pi+aWxhuYNqiSxUm0BqPWCzT7PP15M1kCGwrXtm /TTSVJl7WkLbOYI0PE0Y0XcJfZUg1c9aecCR3ubmRrQrGfOBFpN01jUANIRwqvZ3 3QRq1RZNLac0+zlBPjoFdOHmoaVX6UcJQKSgOhcfuM1BcNFnXZEcHFN4/SaEUfvJ 1ltybEeOEAckCMqqfHb1g/nVfJnlBjy811GzIrsHXqKqb7rRfGkfxmBxLrRzVknS PC970pbuhxICeeryKdVgK5BClWeT3TB1srt6OZ0QR1zlcfZbLZ8jqJlHJcy3szFi P43X9w8I6ZNTzkBU+lsCt9gbveYS+rSaJ+zm/SaF21ro33+FEdZ+p1ujjzp729Tz PnKobaOrku38Be7CSwJ760WvngC7gbZqGybGknBsws4dqDXJste0UjxulZeyaOkN nVmHDL45jc5rd8qjoPQV =kV1a -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Another pull request for the next cycle, this time with quite a bit of content: * mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse * TDLS higher bandwidth support (Arik) * OCB fixes from Bertold Van den Bergh * suspend/resume fixes from Eliad * dynamic SMPS support for minstrel-HT (Krishna Chaitanya) * VHT bitrate mask support (Lorenzo Bianconi) * better regulatory support for 5/10 MHz channels (Matthias May) * basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon) along with a number of other cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:25:04 -07:00
David S. Miller	90eb7fa51c	Merge branch 'bpf_fanout' Willem de Bruijn says: ==================== packet: add cBPF and eBPF fanout modes Allow programmable fanout modes. Support both classical BPF programs passed directly and extended BPF programs passed by file descriptor. One use case is packet steering by deep packet inspection, for instance for packet steering by application layer header fields. Separate the configuration of the fanout mode and the configuration of the program, to allow dynamic updates to the latter at runtime. Changes v1 -> v2: - follow SO_LOCK_FILTER semantics on filter updates - only accept eBPF programs of type BPF_PROG_TYPE_SOCKET_FILTER - rename PACKET_FANOUT_BPF to PACKET_FANOUT_CBPF to match man 2 bpf usage: "classic" vs. "extended" BPF. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:22:48 -07:00
Willem de Bruijn	30da679e67	selftests/net: test extended BPF fanout mode Test PACKET_FANOUT_EBPF by inserting a program into the the kernel with bpf(), then attaching it to the fanout group. Observe the same payload-based distribution as in the PACKET_FANOUT_CBPF test. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:22:48 -07:00
Willem de Bruijn	95e22792fa	selftests/net: test classic bpf fanout mode Test PACKET_FANOUT_CBPF by inserting a cBPF program that selects a socket by payload. Requires modifying the test program to send packets with multiple payloads. Also fix a bug in testing the return value of mmap() Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:22:48 -07:00
Willem de Bruijn	f2e520956a	packet: add extended BPF fanout mode Add fanout mode PACKET_FANOUT_EBPF that accepts an en extended BPF program to select a socket. Update the internal eBPF program by passing to socket option SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf(). Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:22:48 -07:00
Willem de Bruijn	47dceb8ecd	packet: add classic BPF fanout mode Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program to select a socket. This avoids having to keep adding special case fanout modes. One example use case is application layer load balancing. The QUIC protocol, for instance, encodes a connection ID in UDP payload. Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the only user so far. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:22:47 -07:00
Jiri Benc	a1c234f95c	lwtunnel: rename ip lwtunnel attributes We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look very similar, yet they serve very different purpose. This is confusing for anyone trying to implement a user space tool supporting lwt. As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more logical to have them in lwtunnel.h together with the encap enum. Fixes: `3093fbe7ff` ("route: Per route IP tunnel metadata via lightweight tunnel") Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:07:15 -07:00
Guenter Roeck	62ee783bf1	smsc911x: Fix crash seen if neither ACPI nor OF is configured or used Commit `0b50dc4fc9` ("Convert smsc911x to use ACPI as well as DT") makes the call to smsc911x_probe_config() unconditional, and no longer fails if there is no device node. device_get_phy_mode() is called unconditionally, and if there is no phy node configured returns an error code. This error code is assigned to phy_interface, and interpreted elsewhere in the code as valid phy mode. This in turn causes qemu to crash when running a variant of realview_pb_defconfig. qemu: hardware error: lan9118_read: Bad reg 0x86 Fixes: `0b50dc4fc9` ("Convert smsc911x to use ACPI as well as DT") Cc: Jeremy Linton <jeremy.linton@arm.com> Cc Graeme Gregory <graeme.gregory@linaro.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:06:16 -07:00
David S. Miller	c87acb2558	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-08-17 1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels. From Thomas Egerer. 2) Use kmemdup instead of duplicating it in xfrm_dump_sa(). From Andrzej Hajda. 3) Pass oif to the xfrm lookups so that it gets set on the flow and the resolver routines can match based on oif. From David Ahern. 4) Add documentation for the new xfrm garbage collector threshold. From Alexander Duyck. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 14:05:14 -07:00
Jesse Brandeburg	fbaff3ef85	net: fix endian check warning in etherdevice.h Sparse builds have been warning for a really long time now that etherdevice.h has a conversion that is unsafe. include/linux/etherdevice.h:79:32: warning: restricted __be16 degrades to integer This code change fixes the issue and generates the exact same assembly before/after (checked on x86_64) Fixes: `2c722fe1c8` (etherdevice: Optimize a few is_<foo>_ether_addr functions) Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 12:14:53 -07:00
David S. Miller	f3ae683f09	Merge branch 'iff_no_queue' Phil Sutter says: ==================== net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len This series adds a new private net_device flag indicating that a device may (and probably should) be used without a queueing discipline attached to it. This is already common practice for many virtual device types like e.g. loopback, VLAN (802.1Q) or bridges (802.1D). The reason for this is that these devices lack an underlying layer which could impose back pressure and therefore making a TX queue necessary to not slow down senders. Up to now, drivers being aware of the above applying to them set dev->tx_queue_len to zero to indicate no qdisc should be attached to the interface they drive and the kernel reacts upon this by assigning the noop qdisc instead of the default pfifo_fast. This implicit agreement though leads to an inconvenient situation once a user tries to attach a real qdisc to these devices, as the formerly special tx_queue_len value becomes a regular one, limiting the queue to zero packets and thus prevents any TX from happening. To overcome this, practically all qdisc implementations intercept and sanitize the malicious value. With this series applied, drivers may signal the lack of need for a qdisc without having to tamper with tx_queue_len, making fallbacks in qdiscs and caveats in userspace unnecessary. Upon upstream acceptance, this series will be followed up by a set of patches converting device drivers, adding a warning so out-of-tree driver authors get aware of this change and dropping all special handling of tx_queue_len in net/sched/. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 11:50:25 -07:00
Phil Sutter	4b46995568	net: sch_generic: react upon IFF_NO_QUEUE flag Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero. Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 11:50:18 -07:00
Phil Sutter	fa8187c964	net: declare new net_device priv_flag IFF_NO_QUEUE This private net_device flag can be set by drivers to inform that a device runs fine without a qdisc attached. This was formerly done by setting tx_queue_len to zero. Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 11:50:18 -07:00
Richard Alpe	8f8ff9135b	tipc: don't sanity check non-existing TLV (NL compat) A zero length payload means that no TLV (Type Length Value) data has been passed. Prior to this patch a non-existing TLV could be sanity checked with TLV_OK() resulting in random behavior where a user sending an empty message occasionally got a incorrect "operation not supported" message back. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 10:39:54 -07:00
Yuval Mintz	da3cc2da7c	bnx2: Fix bandwidth allocation for some MF modes Management firmware tells driver in case bandwidth configuration for a specific function exists, but [regretably] the same field has different meanings depending on the multi-function mode - it can either be a percentile value or an actual speed. For newer multi-function modes current logic is incorrect - driver understands values as actual speeds instead of percentages, causing the resulting chip configuration to be incorrect. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-17 10:27:57 -07:00
Eric Dumazet	1e31367899	ipv4: fix refcount leak in fib_check_nh() fib_lookup() forces FIB_LOOKUP_NOREF flag, while fib_table_lookup() does not. This patch solves the typical message at reboot time or device dismantle : unregister_netdevice: waiting for eth0 to become free. Usage count = 4 Fixes: `3bfd847203` ("net: Use passed in table for nexthop lookups") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsa@cumulusnetworks.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-16 22:14:32 -07:00
Emmanuel Grumbach	8f9c98df94	mac80211: fix BIT position for TDLS WIDE extended cap The bit was not according to ieee80211 specification. Fix that. Reviewed-by: Arik Nemtsov <arik@wizery.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:53 +02:00
Johannes Berg	40d9a38ad3	mac80211: use DECLARE_EWMA Instead of using the out-of-line average calculation, use the new DECLARE_EWMA() macro to declare a signal EWMA, and use that. This actually reduces the code size slightly (on x86-64) while also reducing the station info size by 80 bytes. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:53 +02:00
Johannes Berg	2377799c08	average: provide macro to create static EWMA Having the EWMA parameters stored in the runtime struct imposes memory requirements for the constant values that could just be inlined in the code. This particularly makes sense if there are a lot of such structs, for example in mac80211 in the station table where each station has a number of these in an array, and there can be many stations. Provide a macro DECLARE_EWMA() that declares the necessary struct and inline functions to access it with the parameters hard-coded; using this also means the user no longer needs to 'select AVERAGE' as it's entirely self-contained. In the mac80211 case, on x86-64, this actually slightly reduces code size, while also saving 80 bytes of runtime memory per sta. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:52 +02:00
Su Kang Yin	2459cd876e	mac80211_hwsim: unregister genetlink family properly During hwsim_init_netlink(), we should call genl_unregister_family() if failed on netlink_register_notifier() since the genetlink is already registered. Signed-off-by: Su Kang Yin <cantona@cantona.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:52 +02:00
Lorenzo Bianconi	b119ad6e72	mac80211: add rate mask logic for vht rates Define rc_rateidx_vht_mcs_mask array and rate_idx_match_vht_mcs_mask() method in order to apply mcs mask for vht rates Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:51 +02:00
Lorenzo Bianconi	e910867bd2	mac80211: define rate_control_apply_mask_ratetbl() Define rate_control_apply_mask_ratetbl() in order to apply ratemask in rate_control_set_rates() for station rate table Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:51 +02:00
Lorenzo Bianconi	90c66bd223	mac80211: remove ieee80211_tx_rate dependency in rate mask code Remove ieee80211_tx_rate dependency in rate_idx_match_legacy_mask(), rate_idx_match_mcs_mask() and rate_idx_match_mask() in order to use the previous logic to define a ratemask in rate_control_set_rates() for station rate table. Moreover move rate mask definition logic in rate_control_cap_mask() Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:50 +02:00
Lorenzo Bianconi	35225eb7a5	mac80211: remove ieee80211_tx_info from rate_control_apply_mask signature Remove unnecessary ieee80211_tx_info pointer from rate_control_apply_mask signature. rate_control_apply_mask() will be used to define a ratemask in rate_control_set_rates() for station rate table Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:50 +02:00
Bertold Van den Bergh	4b819f6cc4	mac80211: Make OCB mode set BSSID Perform the BSS_CHANGED_BSSID action when joining an OCB network. This is required to set the broadcast BSSID in some network drivers. Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:49 +02:00
Bertold Van den Bergh	cc11729893	mac80211: Only accept data frames in OCB mode Currently OCB mode accepts frames with bssid==broadcast and type!=beacon. Some non-data frames are sent matching this, for example probe responses. This results in unnecessary creation of STA entries. Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:49 +02:00
Bertold Van den Bergh	5765f9f66e	mac80211: Set txrc.bss to true for OCB interfaces To make mac80211 accept the multicast rate requested by the user the rate control should be told that it is operating in BSS mode. Without this, the default rate is selected in rate_control_send_low (!pubsta and !txrc->bss) Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:48 +02:00
Bertold Van den Bergh	876dc9308e	nl80211: Allow setting multicast rate on OCB interfaces Allow setting multicast rate on OCB interfaces. Current behaviour results in EOPNOTSUPP when attempting this. Signed-off-by: Bertold Van den Bergh <bertold.vandenbergh@esat.kuleuven.be> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:48 +02:00
Michal Kazior	9189ee31df	cfg80211: propagate set_wiphy failure to userspace If driver failed to setup wiphy params (e.g. rts threshold, fragmentation treshold) userspace wasn't properly notified about this. This could lead to user confusion who would think the command succeeded even if that wasn't the case. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:49:47 +02:00
Matthias May	4edd56981c	cfg80211: regulatory: handle 5 and 10 MHz channels properly The original assumption of 20MHz wide channels hasn't been true since the addition of support for 5 and 10 MHz channels. Change the code to no longer disable all channels that don't fit into the 20MHz grid, but instead set the appropriate flags to disable operation on specific bandwidths. Signed-off-by: Matthias May <matthias.may@neratec.com> [reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-08-14 17:48:46 +02:00
David S. Miller	d52736e24f	Merge branch 'vrf-lite' David Ahern says: ==================== VRF-lite - v6 In the context of internet scale routing a requirement that always comes up is the need to partition the available routing tables into disjoint routing planes. A specific use case is the multi-tenancy problem where each tenant has their own unique routing tables and in the very least need different default gateways. This patch allows the ability to create virtual router domains (aka VRFs (VRF-lite to be specific) in the linux packet forwarding stack. The main observation is that through the use of rules and socket binding to interfaces, all the facilities that we need are already present in the infrastructure. What is missing is a handle that identifies a routing domain and can be used to gather applicable rules/tables and uniqify neighbor selection. The scheme used needs to preserves the notions of ECMP, and general routing principles. This driver is a cross between functionality that the IPVLAN driver and the Team drivers provide where a device is created and packets into/out of the routing domain are shuttled through this device. The device is then used as a handle to identify the applicable rules. The VRF device is thus the layer3 equivalent of a vlan device. The very important point to note is that this is only a Layer3 concept so L2 tools (e.g., LLDP) do not need to be run in each VRF, processes can run in unaware mode or select a VRF to be talking through. Also the behavioral model is a generalized application of the familiar VRF-Lite model with some performance paths that need optimization. (Specifically the output route selector that Roopa, Robert, Thomas and EricB are currently discussing on the MPLS thread) High Level points ================= 1. Simple overlay driver (minimal changes to current stack) * uses the existing fib tables and fib rules infrastructure 2. Modelled closely after the ipvlan driver 3. Uses current API and infrastructure. * Applications can use SO_BINDTODEVICE or cmsg device indentifiers to pick VRF (ping, traceroute just work) * Standard IP Rules work, and since they are aggregated against the device, scale is manageable 4. Completely orthogonal to Namespaces and only provides separation in the routing plane (and ARP) N2 N1 (all configs here) +---------------+ +--------------+ \| \| \|swp1 :10.0.1.1+----------------------+swp1 :10.0.1.2 \| \| \| \| \| \|swp2 :10.0.2.1+----------------------+swp2 :10.0.2.2 \| \| \| +---------------+ \| VRF 1 \| \| table 5 \| \| \| +---------------+ \| \| \| VRF 2 \| N3 \| table 6 \| +---------------+ \| \| \| \| \|swp3 :10.0.2.1+----------------------+swp1 :10.0.2.2 \| \| \| \| \| \|swp4 :10.0.3.1+----------------------+swp2 :10.0.3.2 \| +--------------+ +---------------+ Given the topology above, the setup needed to get the basic VRF functions working would be Create the VRF devices and associate with a table ip link add vrf1 type vrf table 5 ip link add vrf2 type vrf table 6 Install the lookup rules that map table to VRF domain ip rule add pref 200 oif vrf1 lookup 5 ip rule add pref 200 iif vrf1 lookup 5 ip rule add pref 200 oif vrf2 lookup 6 ip rule add pref 200 iif vrf2 lookup 6 ip link set vrf1 up ip link set vrf2 up Enslave the routing member interfaces ip link set swp1 master vrf1 ip link set swp2 master vrf1 ip link set swp3 master vrf2 ip link set swp4 master vrf2 Connected and local routes are automatically moved from main and local tables to the VRF table. ping using VRF0 is simply ping -I vrf0 10.0.1.2 Design Highlights ================= If a device is enslaved to a VRF device (ie., associated with a VRF) then: 1. Rx path The master device index is used as the iif for all lookups. 2. Tx path Similarly, for Tx the VRF device oif is used in the flow to direct lookups to the table associated with the VRF via its rule. From there the FLOWI_FLAG_VRFSRC flag is used to indicate that the oif should not be used for FIB table lookups. 3. Connected and local routes On link up for a device, connected and local routes are added to the table associated with the VRF device, rather than the local and main tables. 4. Socket lookups Sockets operating in the VRF must be bound to the VRF device. As such socket lookups compare the VRF device index to sk_bound_dev_if. 5. Neighbor entries Neighbor entries are not impacted by the VRF device. Entries are associated with a particular interface; the VRF association is indirect via the interface-to-VRF device enslavement. Version 6 - addressed comments from DaveM - added patch to properly set oif in ip_send_unicast_reply. Needs to be set to VRF device for proper FIB lookup - added patch to handle IP fragments Version 5 - dropped patch regarding socket lookups; no longer needed + removed vrf helpers no longer needed after this patch is dropped - removed dev_open and close operations + no need to reset vrf data on an ifdown and creates problems if a slave is deleted while the vrf interface is down (Thanks, Nikolay) - cleanups for sparse warnings + make C=2 is now clean for vrf driver Version 4 - builds are clean with and without VRF device enabled (no, yes and module) - tightened the driver implementation + device add/delete, slave add/remove, and module unload are all clean - fixed RCU references + with RCU and lock debugging enabled changes are clean through the suite of tests - TX path uses custom dst, so patch refactoring rtable allocation is dropped along with the patch adding rt_nexthop helper - dropped the task patch that adds default bind to interface for sockets and the associated chvrf example command + the patches are a convenience for running unmodified code. They are not needed for the core functionality. Any application with support for SO_BINDTODEVICE works properly with this patch set. Version 3 - addressed comments from first 2 RFCs with the exception of the name Nicolas: We will do the name conversion once we agree on what the correct name should be (vrf, mrf or something else) - packets flow through the VRF device in both directions allowing the following: - tcpdump -i vrf<n> - tc rules on vrf device - netfilter rules on vrf device TO-DO ===== 1. IPv6 2. ipsec, xfrms - dst patch accepted into ipsec-next; will post VRF patch once merge happens 3. listen filter to allow 1 socket to work with multiple VRF devices - i.e., bind to VRF's a, b, c only or NOT VRFs e, f, g Eric B: I have ipsec working with VRFs implemented using the VRF driver, including the worst case scenario of complete duplication in the networking config. Thanks to Nikolay for his many, many code reviews whipping the device driver into shape, and bug-Fixes and ideas from Hannes, Roopa Prabhu, Jon Toppins, Jamal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:22 -07:00
David Ahern	193125dbd8	net: Introduce VRF device driver This driver borrows heavily from IPvlan and teaming drivers. Routing domains (VRF-lite) are created by instantiating a VRF master device with an associated table and enslaving all routed interfaces that participate in the domain. As part of the enslavement, all connected routes for the enslaved devices are moved to the table associated with the VRF device. Outgoing sockets must bind to the VRF device to function. Standard FIB rules bind the VRF device to tables and regular fib rule processing is followed. Routed traffic through the box, is forwarded by using the VRF device as the IIF and following the IIF rule to a table that is mated with the VRF. Example: Create vrf 1: ip link add vrf1 type vrf table 5 ip rule add iif vrf1 table 5 ip rule add oif vrf1 table 5 ip route add table 5 prohibit default ip link set vrf1 up Add interface to vrf 1: ip link set eth1 master vrf1 Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:22 -07:00
David Ahern	9972f134a2	net: frags: Add VRF device index to cache and lookup Fragmentation cache uses information from the IP header to reassemble packets. That information can be duplicated across VRFs -- same source and destination addresses, protocol and id. Handle fragmentation with VRFs by adding the VRF device index to entries in the cache and the lookup arg. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:21 -07:00
David Ahern	f7ba868b71	net: Use VRF index for oif in ip_send_unicast_reply If output device is not specified use VRF device if input device is enslaved. This is needed to ensure tcp acks and resets go out VRF device. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:21 -07:00
David Ahern	3bfd847203	net: Use passed in table for nexthop lookups If a user passes in a table for new routes use that table for nexthop lookups. Specifically, this solves the case where a connected route does not exist in the main table, but only another table and then a subsequent route is added with a next hop using the connected route. ie., $ ip route ls default via 10.0.2.2 dev eth0 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 169.254.0.0/16 dev eth0 scope link metric 1003 192.168.56.0/24 dev eth1 proto kernel scope link src 192.168.56.51 $ ip route ls table 10 1.1.1.0/24 dev eth2 scope link Without this patch adding a nexthop route fails: $ ip route add table 10 2.2.2.0/24 via 1.1.1.10 RTNETLINK answers: Network is unreachable With this patch the route is added successfully. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:21 -07:00
David Ahern	021dd3b8a1	net: Add routes to the table associated with the device When a device associated with a VRF is brought up or down routes should be added to/removed from the table associated with the VRF. fib_magic defaults to using the main or local tables. Have it use the table with the device if there is one. A part of this is directing prefsrc validations to the correct table as well. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:21 -07:00
David Ahern	30bbaa1950	net: Fix up inet_addr_type checks Currently inet_addr_type and inet_dev_addr_type expect local addresses to be in the local table. With the VRF device local routes for devices associated with a VRF will be in the table associated with the VRF. Provide an alternate inet_addr lookup to use a specific table rather than defaulting to the local table. inet_addr_type_dev_table keeps the same semantics as inet_addr_type but if the passed in device is enslaved to a VRF then the table for that VRF is used for the lookup. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:21 -07:00
David Ahern	15be405eb2	net: Add inet_addr lookup by table Currently inet_addr_type and inet_dev_addr_type expect local addresses to be in the local table. With the VRF device local routes for devices associated with a VRF will be in the table associated with the VRF. Provide an alternate inet_addr lookup to use a specific table rather than defaulting to the local table. Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:21 -07:00
David Ahern	9a24abfa42	udp: Handle VRF device in sendmsg For unconnected UDP sockets using a VRF device lookup source address based on VRF table. This allows the UDP header to be properly setup before showing up at the VRF device via the dst. Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:20 -07:00
David Ahern	613d09b30f	net: Use VRF device index for lookups on TX As with ingress use the index of VRF master device for route lookups on egress. However, the oif should only be used to direct the lookups to a specific table. Routes in the table are not based on the VRF device but rather interfaces that are part of the VRF so do not consider the oif for lookups within the table. The FLOWI_FLAG_VRFSRC is used to control this latter part. Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:20 -07:00
David Ahern	cd2fbe1b6b	net: Use VRF device index for lookups on RX On ingress use index of VRF master device for route lookups if real device is enslaved. Rules are expected to be installed for the VRF device to direct lookups to a specific table. Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:20 -07:00
David Ahern	4e3c89920c	net: Introduce VRF related flags and helpers Add a VRF_MASTER flag for interfaces and helper functions for determining if a device is a VRF_MASTER. Add link attribute for passing VRF_TABLE id. Add vrf_ptr to netdevice. Add various macros for determining if a device is a VRF device, the index of the master VRF device and table associated with VRF device. Signed-off-by: Shrijeet Mukherjee <shm@cumulusnetworks.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:43:20 -07:00
Andy Gospodarek	0344338bd8	net: addr IFLA_OPERSTATE to netlink message for ipv6 ifinfo This is useful information to include in ipv6 netlink messages that report interface information. IFLA_OPERSTATE is already included in ipv4 messages, but missing for ipv6. This closes that gap. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 22:35:30 -07:00
Sasha Levin	da65ad1fe3	net: allow sleeping when modifying store_rps_map Commit `10e4ea751` ("net: Fix race condition in store_rps_map") has moved the manipulation of the rps_needed jump label under a spinlock. Since changing the state of a jump label may sleep this is incorrect and causes warnings during runtime. Make rps_map_lock a mutex to allow sleeping under it. Fixes: `10e4ea751` ("net: Fix race condition in store_rps_map") Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 21:40:57 -07:00
David S. Miller	968d7cb82a	Merge branch 'mv88e6xxx-hw-vlan' Vivien Didelot says: ==================== net: dsa: mv88e6xxx: add hardware VLAN support This patchset brings support to access hardware VLAN entries in DSA and mv88e6xxx, through switchdev VLAN objects. In the following example, ports swp[0-2] belong to bridge br0, and ports swp[3-4] belong to bridge br1. Here's an example of what can be achieved after this patchset: # bridge vlan add dev swp1 vid 100 master # bridge vlan add dev swp2 vid 100 master # bridge vlan add dev swp3 vid 100 master # bridge vlan add dev swp4 vid 100 master # bridge vlan del dev swp1 vid 100 master The above commands correctly programmed hardware VLAN 100 for port swp2, while ports swp3 and swp4 use software VLAN 100, as shown with: # bridge vlan port vlan ids swp0 None swp0 swp1 None swp1 swp2 100 swp2 100 swp3 100 swp3 swp4 100 swp4 br0 None br1 None Assuming that port 5 is the CPU port, the hardware VLAN table would contain the following data: VID FID SID 0 1 2 3 4 5 6 100 8 0 x x t x x t x Where 'x' means excluded, and 't' means tagged. Also, adding an FDB entry to VLAN 100 for port swp2 like this: # bridge fdb add 3c:97:0e:11:6e:30 dev swp2 vlan 100 Would result in the following example output: # bridge fdb # 01:00:5e:00:00:01 dev eth0 self permanent # 01:00:5e:00:00:01 dev eth1 self permanent # 00:50:d2:10:78:15 dev swp0 master br0 permanent # 00:50:d2:10:78:15 dev swp2 vlan 100 master br0 permanent # 3c:97:0e:11:6e:30 dev swp2 vlan 100 self static # 00:50:d2:10:78:15 dev swp3 master br1 permanent # 00:50:d2:10:78:15 dev swp3 vlan 100 master br1 permanent And the Address Translation Unit would contain: DB T/P Vec State Addr 008 Port 004 e 3c:97:0e:11:6e:30 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 21:31:14 -07:00
Vivien Didelot	8efdda4a1b	net: dsa: mv88e6xxx: use port 802.1Q mode Secure This commit changes the 802.1Q mode of each port from Disabled to Secure. This enables the VLAN support, by checking the VTU entries on ingress. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 21:31:13 -07:00
Vivien Didelot	0d3b33e602	net: dsa: mv88e6xxx: add VLAN Load support Implement port_pvid_set and port_vlan_add to add new entries in the VLAN hardware table, and join ports to them. The patch also implement the STU Get Next and Load Purge operations, since it is required to have a valid STU entry for at least all VLANs. Each VLAN has its own forwarding database, with FID num_ports+1 to 4095. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 21:31:13 -07:00
Vivien Didelot	7dad08d738	net: dsa: mv88e6xxx: add VLAN Purge support Add support for the VTU Load Purge operation and implement the port_vlan_del driver function to remove a port from a VLAN entry, and delete the VLAN if the given port was its last member. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 21:31:13 -07:00
Vivien Didelot	02512b6fcc	net: dsa: mv88e6xxx: add VLAN support to FDB dump Add an helper function to read the next valid VLAN entry for a given port. It is used in the VID to FID conversion function to retrieve the forwarding database assigned to a given VLAN port. Finally update the FDB getnext operation to iterate on the next valid port VLAN when the end of the current database is reached. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 21:31:13 -07:00
Vivien Didelot	b8fee95710	net: dsa: mv88e6xxx: add VLAN Get Next support Implement the port_pvid_get and vlan_getnext driver functions required to dump VLAN entries from the hardware, with the VTU Get Next operation. Some functions and structure will be shared with STU operations, since their table format are similar (e.g. STU data entries are accessible with the same registers as VTU entries, except with an offset of 2). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-13 21:31:13 -07:00

1 2 3 4 5 ...

535593 Commits All Branches Search

535593 Commits

All Branches