Jeff Kirsher says:
====================
This series contains updates to i40e only.
Greg provides fixes for the NPAR transmit scheduler where the driver
initialization caused the BW configurations to not take effect, so use
a BW configuration read and write back to "kick" the transmit scheduler
into action. Fixes the ethtool offline test, where we were not actually
taking the device offline before doing the testing.
Matt modifies the get and set LED functions so they ignore activity LEDs
since we are required to blink the link LEDs only.
Neerav provides a workaround for whenever a DCBX configuration is changed,
where the firmware doe not set the operational status bit of the
application TLV status as returned from the "Get CEE DCBX Oper Cfg" admin
queue command. So remove the check for the operational and sync bits of
the application TLV status until a firmware fix is provided.
Shannon changes the driver to grab the NVM devstarter version and not
the image version, since it is the more useful version and is what
should be displayed. Moves the IRQ tracking setup and tear down into
the same routines that do the IRQ setup and tear down. This keeps
like activities together and allows us to track exactly the number
of vectors reserved from the OS, which may be fewer than are available
from the hardware.
Jesse provides a fix to use a more portable sign extension by replacing
0xffff.... with ~(u64)0 or ~(u32)0. Also fixes XPS mask when resetting,
where the driver would accidentally clear the XPS mask for all queues
back to 0. This caused higher CPU utilization and had some other
performance impacts for transmit tests. Cleans up some whitespace
formatting.
Catherine provides a fix where some firmware versions are incorrectly
reporting a breakout cable as PHY type 0x3 when it should be 0x16
(I40E_PHY_TYPE_10GBASE_SFPP_CU). Adds the 10G and 40G AOC PHY types
to the case statement in get_media_type and ethtool get_settings so
that the correct information gets reported back to the user.
Anjali provides IOREMAP changes for future device support, where we
do not want to map the whole CSR space since some of it is mapped by
other drivers with different mapping methods.
Mitch changes the i40e driver to not "spam" the system log with
messages about VF VSI when VFs are created and when they are reset to
reduce user annoyance.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes this build error:
net/mpls/af_mpls.c: In function 'resize_platform_label_table':
net/mpls/af_mpls.c:767:4: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
labels = vzalloc(size);
^
Fixes: 7720c01f3f ("mpls: Add a sysctl to control the size of the mpls label table")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai says:
====================
cxgb4: RX Queue related cleanup and fixes
This patch series adds a common function to allocate RX queues and queue
allocation changes to RDMA CIQ
The patches series is created against 'net-next' tree.
And includes patches on cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To allow for better scalability on systems with large core counts, we
will try and allocate enough RDMA Concentrator IQs and MSI/X vectors as
we have cores. If we cannot get enough MSI/X vectors, fall back to the
minimum required: 1 per adapter rx channel.
Also clean up cxgb_enable_msix() to make it readable and correct a bug
where the vectors are not correctly assigned if the driver doesn't get
the full amount requested.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a common function for all Rx queue allocation.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This extends the design in commit 958501163d ("bridge: Add support for
IEEE 802.11 Proxy ARP") with optional set of rules that are needed to
meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The
previously added BR_PROXYARP behavior is left as-is and a new
BR_PROXYARP_WIFI alternative is added so that this behavior can be
configured from user space when required.
In addition, this enables proxyarp functionality for unicast ARP
requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible
to use unicast as well as broadcast for these frames.
The key differences in functionality:
BR_PROXYARP:
- uses the flag on the bridge port on which the request frame was
received to determine whether to reply
- block bridge port flooding completely on ports that enable proxy ARP
BR_PROXYARP_WIFI:
- uses the flag on the bridge port to which the target device of the
request belongs
- block bridge port flooding selectively based on whether the proxyarp
functionality replied
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bump i40e to 1.2.11 and i40evf to 1.2.5
Change-ID: Ie13375941606b0a027e5b5dbc235f5f5f03b75c8
Signed-off-by: Sravanthi Tangeda <sravanthi.tangeda@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The PF driver spams the system log with messages about VF VSI when VFs
are created, as well as each time they are reset. This is annoying, and
the information isn't even useful most of the time.
Remove this message to reduce user annoyance.
Change-ID: I8de90d05380f54b038c9c8c3265150be87c9242c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Move the IRQ tracking setup and teardown into the same routines that
do the IRQ setup and teardown. This keeps like activities together and
allows us to track exactly the number of vectors reserved from the OS,
which may be fewer than are available from the HW.
Change-ID: I6b2b1a955c5f0ac6b94c3084304ed0b2ea6777cf
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
For future device support we do not want to map the whole CSR space since some
of it is mapped by other drivers with different mapping methods.
Note: As a side effect, the flash region (if exposed through the memory map)
gets unmapped too since it follows the future use region.
Change-ID: Ic729a2eacd692984220b1a415ff4fa0f98ea419a
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix some double blank lines and un-split a function declaration that all
fits on one line. Also make i40e_get_priv_flags static.
Change-ID: I11b5d25d1153a06b286d0d2f5d916d7727c58e4a
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add the 10G and 40G AOC PHY types to the case statement in get_media_type
and ethtool get_settings so that the correct information gets reported
back to the user.
Change-ID: I1b4849d22199a9acf7c8807166d0317c1faad375
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the system administrator is requesting an offline diagnostic test using
'ethtool -t' then we should, you know, actually take the device offline
before doing the testing.
Change-ID: I6afa1cbfcc821c9ab6e6f47ed4d8dc2d8dd20e82
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Some FW versions are incorrectly reporting a breakout cable as PHY type
0x3 when it should be 0x16 (I40E_PHY_TYPE_10GBASE_SFPP_CU).
If we get this value back from FW and the version is < 4.40, reassign it
to I40E_PHY_TYPE_10GBASE_SFPP_CU.
Change-ID: Ibb41a0e3cd2c0753744e8553959240df6ed13ae8
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
During resets (possibly caused by a Tx hang) the driver would
accidentally clear the XPS mask for all queues back to 0.
This caused higher CPU utilization and had some other performance impacts
for transmit tests.
Change-ID: I95f112432c9e643a153eaa31cd28cdcbfdd01831
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use automatic sign extension by replacing 0xffff... constants
with ~(u64)0 or ~(u32)0.
Change-ID: I73cab4cd2611795bb12e00f0f24fafaaee07457c
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
0x2A is the NVM version so it has useful data but it is per image
version every image can have a different one. 0x18 is the dev starter
version which all the images for release will have the same version.
Of the two 0x18 is more useful and is what should be displayed.
Change-ID: Idf493da13a42ab211e2de0bef287f5de51033cca
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In CEE mode the firmware does not set the operational status bit of
the application TLV status as returned from the "Get CEE DCBX Oper Cfg"
AQ command. This occurs whenever a DCBX configuration is changed.
This is a workaround to remove the check for the operational and sync bits
of the application TLV status till a firmware fix is provided.
Change-ID: I1a31ff2fcadcb06feb5b55776a33593afc6ea176
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Modify our get and set LED functions so they ignore activity LEDs,
as we are required to blink the link LEDs only.
Change-ID: I647ea67a6fc95cbbab6e3cd01d81ec9ae096a9ad
Signed-off-by: Matt Jared <matthew.a.jared@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Recent changes to the driver initialization have caused the BW
configurations to not take effect. We use a BW configuration read and
write back to "kick" the Tx scheduler into action.
Change-ID: I94ab377c58d3a3986e3de62b6c199be3fd2ee5e6
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
1. Use c_index and ring->c_index to determine how many TxCBs/TxBDs are
ready for cleanup
- c_index = the current value of TDMA_CONS_INDEX
- TDMA_CONS_INDEX is HW-incremented and auto-wraparound (0x0-0xFFFF)
- ring->c_index = __bcmgenet_tx_reclaim() cleaned up to this point on
the previous invocation
2. Add bcmgenet_tx_ring->clean_ptr
- index of the next TxCB to be cleaned
- incremented as TxCBs/TxBDs are processed
- value always in range [ring->cb_ptr, ring->end_ptr]
3. Fix incrementing of dev->stats.tx_packets
- should be incremented only when tx_cb_ptr->skb != NULL
These changes simplify __bcmgenet_tx_reclaim(). Furthermore, Tx ring size
can now be any value.
With the old code, Tx ring size had to be a power-of-2:
num_tx_bds = ring->size;
c_index &= (num_tx_bds - 1);
last_c_index &= (num_tx_bds - 1);
Signed-off-by: Petri Gynther <pgynther@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck says:
====================
ipv4/fib_trie: Cleanups to prepare for introduction of key vector
This patch series is meant to mostly just clean up the fib_trie to prepare
it for the introduction of the key_vector. As such there are a number of
minor clean-ups such as reformatting the tnode to match the format once the
key vector is introduced, some optimizations to drop the need for a leaf
parent pointer, and some changes to remove duplication of effort such as
the 2 look-ups that were essentially being done per node insertion.
v2: Added code to cleanup idx >> n->bits and explain unsigned long logic
Added code to prevent allocation when tnode size is larger than size_t
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds code to prevent us from attempting to allocate a tnode with
a size larger than what can be represented by size_t.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change updates the fib_table_lookup function so that it is in sync
with the fib_find_node function in terms of the explanation for the index
check based on the bits value.
I have also updated it from doing a mask to just doing a compare as I have
found that seems to provide more options to the compiler as I have seen it
turn this into a shift of the value and test under some circumstances.
In addition I addressed one minor issue in which we kept computing the key
^ n->key when checking the fib aliases. I pulled the xor out of the loop
in order to reduce the number of memory reads in the lookup. As a result
we should save a couple cycles since the xor is only done once much earlier
in the lookup.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib_table was wrapped in several places with an
rcu_read_lock/rcu_read_unlock however after looking over the code I found
several spots where the tables were being accessed as just standard
pointers without any protections. This change fixes that so that all of
the proper protections are in place when accessing the table to take RCU
replacement or removal of the table into account.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we are going to compact the leaf and tnode we first need to make sure
the fields are all in the same place. In that regard I am moving the leaf
pointer which represents the fib_alias hash list to occupy what is
currently the first key_vector pointer.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that the insert and delete functions make use of
the tnode pointer returned in the fib_find_node call. By doing this we
will not have to rely on the parent pointer in the leaf which will be going
away soon.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that the parent pointer is returned by reference in
fib_find_node. By doing this I can use it to find the parent node when I
am performing an insertion and I don't have to look for it again in
fib_insert_node.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that leaf_walk_rcu takes a tnode and a key instead
of the trie and a leaf.
The main idea behind this is to avoid using the leaf parent pointer as that
can have additional overhead in the future as I am trying to reduce the
size of a leaf down to 16 bytes on 64b systems and 12b on 32b systems.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that we only call resize on the tnodes, instead of
from each of the leaves. By doing this we can significantly reduce the
amount of time spent resizing as we can update all of the leaves in the
tnode first before we make any determinations about resizing. As a result
we can simply free the tnode in the case that all of the leaves from a
given tnode are flushed instead of resizing with each leaf removed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJU9tIxAAoJECte4hHFiupURkwP/03p+RItcv2+pBg3A61tqpo7
qwCNcJCZFhj/ULjZzZOgpeQS2TN0/ZlRwtZjuUX3E90oUd7ADjtsqOWrZKUko1o5
Km2gLQKCHCum6eAsvenWsqSXiMWMfu2+3vfvux6GuF7dO18wQydCHfAjpVU2jxHr
difmawYlisEBAT3sBdBsdypKkKCLM4HJ28fbJ2oGPTH1Jusxx8gRCdx2NyMNK5Kb
kqCiYntz8ghsYUGkVqhwUZPblae6u9EJeqMclGxtRvYlvynotkM+06gtV4uSNDTI
Z2lZd/4/M6UK6OputpcBofiTIF1VBvMDbq9yjTqKH3fiJhMuDgGaV+SqbyemjVoB
DURfohvS527JqQFs4vjN4vYx3t7EJJ9Si/CTHEiYcsNXXKnQ3cYiJAeG5qXZJZdh
h7TqGxzbP+5VJKWq/AsT6G74m2QUHpKIbcgvsJ4DA2WBEPN2OV9/r/X4EQZcQgxR
YCR6zRhvt7apO6ZZFtwX+tHbPVCGEB8m+Yj3f0Emga6S2v3Z2+s0bUfbme/FH8wI
k7ksaT9jEraM4KODTswzSmOxnjH8TkPw3B8hRY++s67x0/r73y9p9HLGTfQx9xiL
yUCoO3SQqK2/4PmqvO8FyNf6bWEpXugQlw0a2768hGTFWhwjp16dd4v42XF18+q2
gXYRNHe2WL0ROXp7Z8ov
=pJAi
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-4.1-20150304' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2015-03-04
this is a pull request of 3 patches for net-next/master.
Aaron Wu contributes three patches for the blackfin can driver, which
cleans up the driver and makes use of more platform independent code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla says:
====================
be2net: patch set
Hi Dave, the following patch set includes three feature additions relating
to SR-IOV to be2net.
Patch 1 avoid creating a non-RSS default RXQ when FW allows it.
This prevents wasting one RXQ for each VF.
Patch 2 adds support for evenly distributing all queue & filter resources
across VFs. The FW informs the driver as to which resources are distributable.
Patch 3 implements the sriov_configure PCI method to allow runtime
enablement of VFs via sysfs.
Pls consider applying this patch-set to the net-next tree. Thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the .sriov_configure() PCI method to allow for
runtime enabling/disabling of VFs. The module param "num_vfs" is now
deprecated.
At the time of driver load the PF-pool resources are allocated to the PF.
When the user enables VFs, the resources are then re-distributed across
PFs and VFs based on the number of VFs enabled.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When SR-IOV is enabled in the adapter, the FW distributes resources
evenly across the PF and it's VFs. This is currently done only for some
resources.
This patch adds support for a new cmd that queries the FW for the list
of resources for which the distribution is allowed and distributes them
accordingly.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On BE2, BE3 and Skhawk-R chips one non-RSS (called "default") RXQ was
needed to receive non-IP traffic. Some FW versions now export a
capability called IFACE_FLAGS_DEFQ_RSS where this requirement doesn't hold.
On such FWs the driver now does not create the non-RSS default queue.
This prevents wasting one RXQ per VF.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove Kconfig dependency and enable driver for
all ARCHs.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
sh_eth changes for net-next
Some minor new features and fixes.
These depend in part on the series I sent earlier for net, specifically
"sh_eth: WARN on access to a register not implemented in a particular
chip" depends on "sh_eth: Fix RX recovery on R-Car in case of RX ring
underrun".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The statistics registers have write-clear behaviour, which means we
will lose any increment between the read and write. Mitigate this by
only clearing when we read a non-zero value, so we will never falsely
report a total of zero. This also saves time as we only handle
error statistics here and they won't often be incremented.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are many different sets of registers implemented by the
different versions of this controller, and we can only expect this to
get more complicated in future. Limit how much ethtool needs to know
by including an explicit bitmap of which registers are included in the
dump, allowing room for future growth in the number of possible
registers.
As I don't have datasheets for all of these, I've only included
registers that are:
- defined in all 5 register type arrays, or
- used by the driver, or
- documented in the datasheet I have
Add one new capability flag so we can tell whether the RTRATE
register is implemented.
Delete the TSU_ADRL0 and TSU_ADR{H,L}31 definitions, as they weren't
used and the address table is already assumed to be contiguous.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we may silently read/write a register at offset 0. Change
this to WARN and then ignore the write or read-back all-ones.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At least on the R8A7790, RFS8 reflects the RINT8 (multicast) MAC
status flag.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Header file was in arch dependent location arch/blackfin/include/asm/bfin_can.h,
Now move and merge the useful contents of header file into driver code, note
the original header file is reserved for full registers set access test by other
code so it survives.
Signed-off-by: Aaron Wu <Aaron.wu@analog.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Blackfin was built without MMU, old driver code access the IO space by
physical address, introduce the ioremap approach to be compitable with
the common style supporting MMU enabled arch.
Signed-off-by: Aaron Wu <Aaron.wu@analog.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Replace the blackfin arch dependent style of bfin_read/bfin_write with
common readw/writew
Signed-off-by: Aaron Wu <Aaron.wu@analog.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Eric W. Biederman says:
====================
Basic MPLS support take 2
On top of my two pending neighbour table prep patches here is the mpls
support refactored to use them, and edited to not drop routes when
an interface goes down. Additionally the addition of RTA_LLGATEWAY
has been replaced with the addtion of RTA_VIA. RTA_VIA being an
attribute that includes the address family as well as the address
of the next hop.
MPLS is at it's heart simple and I have endeavoured to maintain that
simplicity in my implemenation.
This is an implementation of a RFC3032 forwarding engine, and basic MPLS
egress logic. Which should make linux sufficient to be a mpls
forwarding node or to be a LSA (Label Switched Router) as it says in all
of the MPLS documents. The ingress support will follow but it deserves
it's own discussion so I am pushing it separately.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike IPv4 this code notifies on all cases where mpls routes
are added or removed and it never automatically removes routes.
Avoiding both the userspace confusion that is caused by omitting
route updates and the possibility of a flood of netlink traffic
when an interface goes doew.
For now reserved labels are handled automatically and userspace
is not notified.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds two new netlink routing attributes:
RTA_VIA and RTA_NEWDST.
RTA_VIA specifies the specifies the next machine to send a packet to
like RTA_GATEWAY. RTA_VIA differs from RTA_GATEWAY in that it
includes the address family of the address of the next machine to send
a packet to. Currently the MPLS code supports addresses in AF_INET,
AF_INET6 and AF_PACKET. For AF_INET and AF_INET6 the destination mac
address is acquired from the neighbour table. For AF_PACKET the
destination mac_address is specified in the netlink configuration.
I think raw destination mac address support with the family AF_PACKET
will prove useful. There is MPLS-TP which is defined to operate
on machines that do not support internet packets of any flavor. Further
seem to be corner cases where it can be useful. At this point
I don't care much either way.
RTA_NEWDST specifies the destination address to forward the packet
with. MPLS typically changes it's destination address at every hop.
For a swap operation RTA_NEWDST is specified with a length of one label.
For a push operation RTA_NEWDST is specified with two or more labels.
For a pop operation RTA_NEWDST is not specified or equivalently an emtpy
RTAN_NEWDST is specified.
Those new netlink attributes are used to implement handling of rt-netlink
RTM_NEWROUTE, RTM_DELROUTE, and RTM_GETROUTE messages, to maintain the
MPLS label table.
rtm_to_route_config parses a netlink RTM_NEWROUTE or RTM_DELROUTE message,
verify no unhandled attributes or unhandled values are present and sets
up the data structures for mpls_route_add and mpls_route_del.
I did my best to match up with the existing conventions with the caveats
that MPLS addresses are all destination-specific-addresses, and so
don't properly have a scope.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>