This patch allows userspace to enslave/release slave devices via netlink
interface using IFLA_MASTER. This introduces generic way to add/remove
underling devices.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm testing an API that uses IFLA_AF_SPEC attribute.
In the rtnetlink core , the set_link_af() member
of the rtnl_af_ops struct receives the nested attribute
(as I expected), but the validate_link_af() member
receives the parent attribute.
IMO, this patch fixes this.
Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_group_changelink() is invoked by rtnl_newlink() before the link
attributes have been validated. Additionally the group changes are
performed even if NLM_F_CREATE is specified and a new link is
created, while more reasonable semantics would be to set the group
value on the newly created link.
Fix both problems by moving the rtnl_group_changelink() invocation
down to the handling of non-existant links without NLM_F_CREATE()
and add a dev_set_group() call to rtnl_create_link().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Vlad Dogaru <ddvlad@rosedu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a rtnetlink request specifies a negative or zero ifindex and has no
interface name attribute, but has a group attribute, then the chenges
are made to all the interfaces belonging to the specified group.
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Net devices can now be grouped, enabling simpler manipulation from
userspace. This patch adds a group field to the net_device structure, as
well as rtnetlink support to query and modify it.
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits
being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL,
non-dump requests with NLM_F_EXCL set are mistaken as dump requests.
Substitute the condition to test for _all_ bits being set.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As David pointed out correctly, updates to af-specific attributes
are currently not atomic. If multiple changes are requested and
one of them fails, previous updates may have been applied already
leaving the link behind in a undefined state.
This patch splits the function parse_link_af() into two functions
validate_link_af() and set_link_at(). validate_link_af() is placed
to validate_linkmsg() check for errors as early as possible before
any changes to the link have been made. set_link_af() is called to
commit the changes later.
This method is not fail proof, while it is currently sufficient
to make set_link_af() inerrable and thus 100% atomic, the
validation function method will not be able to detect all error
scenarios in the future, there will likely always be errors
depending on states which are f.e. not protected by rtnl_mutex
and thus may change between validation and setting.
Also, instead of silently ignoring unknown address families and
config blocks for address families which did not register a set
function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are
returned to avoid comitting 4 out of 5 update requests without
notifying the user.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each net_device contains address family specific data such as
per device settings and statistics. We already expose this data
via procfs/sysfs and partially netlink.
The netlink method requires the requester to send one RTM_GETLINK
request for each address family it wishes to receive data of
and then merge this data itself.
This patch implements a new API which combines all address family
specific link data in a new netlink attribute IFLA_AF_SPEC.
IFLA_AF_SPEC contains a sequence of nested attributes, one for each
address family which in turn defines the structure of its own
attribute. Example:
[IFLA_AF_SPEC] = {
[AF_INET] = {
[IFLA_INET_CONF] = ...,
},
[AF_INET6] = {
[IFLA_INET6_FLAGS] = ...,
[IFLA_INET6_CONF] = ...,
}
}
The API also allows for address families to implement a function
which parses the IFLA_AF_SPEC attribute sent by userspace to
implement address family specific link options.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
nlmsg_total_size() calculates the length of a netlink message
including header and alignment. nla_total_size() calculates the
space an individual attribute consumes which was meant to be used
in this context.
Also, ensure to account for the attribute header for the
IFLA_INFO_XSTATS attribute as implementations of get_xstats_size()
seem to assume that we do so.
The addition of two message headers minus the missing attribute
header resulted in a calculated message size that was larger than
required. Therefore we never risked running out of skb tailroom.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function rtnl_kill_links is defined but never used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to use a temporary struct rtnl_link_stats64 variable,
just copy the source to skb buffer.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CodingStyle cleanups
EXPORT_SYMBOL should immediately follow the symbol declaration.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a small possibility that a reader gets incorrect values on 32
bit arches. SNMP applications could catch incorrect counters when a
32bit high part is changed by another stats consumer/provider.
One way to solve this is to add a rtnl_link_stats64 param to all
ndo_get_stats64() methods, and also add such a parameter to
dev_get_stats().
Rule is that we are not allowed to use dev->stats64 as a temporary
storage for 64bit stats, but a caller provided area (usually on stack)
Old drivers (only providing get_stats() method) need no changes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use struct rtnl_link_stats64 as the statistics structure.
On 32-bit architectures, insert 32 bits of padding after/before each
field of struct net_device_stats to make its layout compatible with
struct rtnl_link_stats64. Add an anonymous union in net_device; move
stats into the union and add struct rtnl_link_stats64 stats64.
Add net_device_ops::ndo_get_stats64, implementations of which will
return a pointer to struct rtnl_link_stats64. Drivers that implement
this operation must not update the structure asynchronously.
Change dev_get_stats() to call ndo_get_stats64 if available, and to
return a pointer to struct rtnl_link_stats64. Change callers of
dev_get_stats() accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wrong size was being calculated for vfinfo. In one case, it was over-
calculating using nlmsg_total_size on attrs, in another case, it was
under-calculating by assuming ifla_vf_* structs are packed together, but
each struct is it's own attr w/ hdr (and padding).
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Noticed by Patrick McHardy: was continuing to fill skb after a
nla_put_failure, ignoring the size calculated by upper layer. Now,
return -EMSGSIZE on any overruns, but also allow netdev to
fail ndo_get_vf_port with error other than -EMSGSIZE, thus unwinding
nest.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c02db8c6290bb992442fec1407643c94cc414375:
Author: Chris Wright <chrisw@sous-sol.org>
Date: Sun May 16 01:05:45 2010 -0700
Subject: rtnetlink: make SR-IOV VF interface symmetric
adds broken error handling to do_setlink() in net/core/rtnetlink.c. The
problem is the following chunk of code:
if (tb[IFLA_VFINFO_LIST]) {
struct nlattr *attr;
int rem;
nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) {
if (nla_type(attr) != IFLA_VF_INFO)
----> goto errout;
err = do_setvfinfo(dev, attr);
if (err < 0)
goto errout;
modified = 1;
}
}
which can get to errout without setting err, resulting in the following error:
net/core/rtnetlink.c: In function 'do_setlink':
net/core/rtnetlink.c:904: warning: 'err' may be used uninitialized in this function
Change the code to return -EINVAL in this case. Note that this might not be
the appropriate error though.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chris Wright <chrisw@sous-sol.org>
cc: David S. Miller <davem@davemloft.net>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new netdev ops ndo_{set|get}_vf_port to allow setting of
port-profile on a netdev interface. Extends netlink socket RTM_SETLINK/
RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF
(added to end of IFLA_cmd list). These are both nested atrtibutes
using this layout:
[IFLA_NUM_VF]
[IFLA_VF_PORTS]
[IFLA_VF_PORT]
[IFLA_PORT_*], ...
[IFLA_VF_PORT]
[IFLA_PORT_*], ...
...
[IFLA_PORT_SELF]
[IFLA_PORT_*], ...
These attributes are design to be set and get symmetrically. VF_PORTS
is a list of VF_PORTs, one for each VF, when dealing with an SR-IOV
device. PORT_SELF is for the PF of the SR-IOV device, in case it wants
to also have a port-profile, or for the case where the VF==PF, like in
enic patch 2/2 of this patch set.
A port-profile is used to configure/enable the external switch virtual port
backing the netdev interface, not to configure the host-facing side of the
netdev. A port-profile is an identifier known to the switch. How port-
profiles are installed on the switch or how available port-profiles are
made know to the host is outside the scope of this patch.
There are two types of port-profiles specs in the netlink msg. The first spec
is for 802.1Qbg (pre-)standard, VDP protocol. The second spec is for devices
that run a similar protocol as VDP but in firmware, thus hiding the protocol
details. In either case, the specs have much in common and makes sense to
define the netlink msg as the union of the two specs. For example, both specs
have a notition of associating/deassociating a port-profile. And both specs
require some information from the hypervisor manager, such as client port
instance ID.
The general flow is the port-profile is applied to a host netdev interface
using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
switch, and the switch virtual port backing the host netdev interface is
configured/enabled based on the settings defined by the port-profile. What
those settings comprise, and how those settings are managed is again
outside the scope of this patch, since this patch only deals with the
first step in the flow.
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we have a set of nested attributes:
IFLA_VFINFO_LIST (NESTED)
IFLA_VF_INFO (NESTED)
IFLA_VF_MAC
IFLA_VF_VLAN
IFLA_VF_TX_RATE
This allows a single set to operate on multiple attributes if desired.
Among other things, it means a dump can be replayed to set state.
The current interface has yet to be released, so this seems like
something to consider for 2.6.34.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Decouple rtnetlink address families from real address families in socket.h to
be able to add rtnetlink interfaces to code that is not a real address family
without increasing AF_MAX/NPROTO.
This will be used to add support for multicast route dumping from all tables
as the proc interface can't be extended to support anything but the main table
without breaking compatibility.
This partialy undoes the patch to introduce independant families for routing
rules and converts ipmr routing rules to a new rtnetlink family. Similar to
that patch, values up to 127 are reserved for real address families, values
above that may be used arbitrarily.
Signed-off-by: Patrick McHardy <kaber@trash.net>
In the original code, if rtnl_create_link() returned an ERR_PTR then that
would get passed to rtnl_configure_link() which dereferences it.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Decouple the address family values used for fib_rules from the real
address families in socket.h. This allows to use fib_rules for
code that is not a real address family without increasing AF_MAX/NPROTO.
Values up to 127 are reserved for real address families and map directly
to the corresponding AF value, values starting from 128 are for other
uses. rtnetlink is changed to invoke the AF_UNSPEC dumpit/doit handlers
for these families.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When more data is stuffed into an nlmsg than initially projected, an
extra allocation needs to be done. Reserve enough for IFLA_STATS64 so
that this does not to needlessy happen.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Luck observes that the original IFLA_STATS64 submission causes
unaligned accesses. This is because nla_data() returns a pointer to a
memory region that is only aligned to 32 bits. Do some memcpying to
workaround this.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ignore the new NETDEV_PRE_TYPE_CHANGE event in rtnetlink_event() since
there have been no changes userspace needs to be notified of.
Also add a comment to the netdev notifier event definitions to remind
people to update the exclusion list when adding new event types.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
`ip -s link` shows interface counters truncated to 32 bit. This is
because interface statistics are transported only in 32-bit quantity
to userspace. This commit adds a new IFLA_STATS64 attribute that
exports them in full 64 bit.
References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.html
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit e8469ed959c373c2ff9e6f488aa5a14971aebe1f
Author: Patrick McHardy <kaber@trash.net>
Date: Tue Feb 23 20:41:30 2010 +0100
Support specifying the initial device flags when creating a device though
rtnl_link. Devices allocated by rtnl_create_link() are marked as INITIALIZING
in order to surpress netlink registration notifications. To complete setup,
rtnl_configure_link() must be called, which performs the device flag changes
and invokes the deferred notifiers if everything went well.
Two examples:
# add macvlan to eth0
#
$ ip link add link eth0 up allmulticast on type macvlan
[LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 26:f8:84:02:f9:2a brd ff:ff:ff:ff:ff:ff
[ROUTE]ff00::/8 dev macvlan0 table local metric 256 mtu 1500 advmss 1440 hoplimit 0
[ROUTE]fe80::/64 dev macvlan0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
[LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500
link/ether 26:f8:84:02:f9:2a
[ADDR]11: macvlan0 inet6 fe80::24f8:84ff:fe02:f92a/64 scope link
valid_lft forever preferred_lft forever
[ROUTE]local fe80::24f8:84ff:fe02:f92a via :: dev lo table local proto none metric 0 mtu 16436 advmss 16376 hoplimit 0
[ROUTE]default via fe80::215:e9ff:fef0:10f8 dev macvlan0 proto kernel metric 1024 mtu 1500 advmss 1440 hoplimit 0
[NEIGH]fe80::215:e9ff:fef0:10f8 dev macvlan0 lladdr 00:15:e9:f0:10:f8 router STALE
[ROUTE]2001:6f8:974::/64 dev macvlan0 proto kernel metric 256 expires 0sec mtu 1500 advmss 1440 hoplimit 0
[PREFIX]prefix 2001:6f8:974::/64 dev macvlan0 onlink autoconf valid 14400 preferred 131084
[ADDR]11: macvlan0 inet6 2001:6f8:974:0:24f8:84ff:fe02:f92a/64 scope global dynamic
valid_lft 86399sec preferred_lft 14399sec
# add VLAN to eth1, eth1 is down
#
$ ip link add link eth1 up type vlan id 1000
RTNETLINK answers: Network is down
<no events>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split dev_change_flags() into two functions: __dev_change_flags() to
perform the actual changes and __dev_notify_flags() to invoke netdevice
notifiers. This will be used by rtnl_link to defer netlink notifications
until the device has been fully configured.
This changes ordering of some operations, in particular:
- netlink notifications are sent after all changes have been performed.
As a side effect this surpresses one unnecessary netlink message when
the IFF_UP and other flags are changed simultaneously.
- The NETDEV_UP/NETDEV_DOWN and NETDEV_CHANGE notifiers are invoked
after all changes have been performed. Their relative is unchanged.
- net_dmaengine_put() is invoked before the NETDEV_DOWN notifier instead
of afterwards. This should not make any difference since both RX and TX
are already shut down at this point.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support specifying device flags during device creation,
we must be able to roll back device registration in case setting the
flags fails without sending any notifications related to the device
to userspace.
This patch changes rollback_registered_many() and register_netdevice()
to manually send netlink notifications for devices not handled by
rtnl_link and allows to defer notifications for devices handled by
rtnl_link until setup is complete.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3b8bcfd (net: introduce pre-up netdev notifier) added a new
notifier which is run before a device is set UP for use by cfg80211.
The patch missed to add the new notifier to the ignore list in
rtnetlink_event(), so we currently get an unnecessary netlink
notification before a device is set UP.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch consists of a few minor cleanups to the SR-IOV
configurion code in rtnetlink.
- Remove unneccesary lock
- Remove unneccesary casts
- Return correct error code for no driver support
These changes are based on comments from Patrick McHardy
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update rcu_dereference() primitives to use new lockdep-based
checking. The rcu_dereference() in __in6_dev_get() may be
protected either by rcu_read_lock() or RTNL, per Eric Dumazet.
The rcu_dereference() in __sk_free() is protected by the fact
that it is never reached if an update could change it. Check
for this by using rcu_dereference_check() to verify that the
struct sock's ->sk_wmem_alloc counter is zero.
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1266887105-1528-5-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add code to allow rtnetlink clients to query and set VF information through
the PF driver.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__net_init/__net_exit are apparently not going away, so use them
to full extent.
In some cases __net_init was removed, because it was called from
__net_exit code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I received some bug reports about userspace programs having problems
because after RTM_NEWLINK was received they could not immediate access
files under /proc/sys/net/ because they had not been registered yet.
The original problem was trivially fixed by moving the userspace
notification from rtnetlink_event() to the end of
register_netdevice().
When testing that change I discovered I was still getting RTM_NEWLINK
events before I could access proc and I was also getting RTM_NEWLINK
events after I was seeing RTM_DELLINK. Things practically guaranteed
to confuse userspace.
After a little more investigation these extra notifications proved to
be from the new notifiers NETDEV_POST_INIT and NETDEV_UNREGISTER_BATCH
hitting the default case in rtnetlink_event, and triggering
unnecessary RTM_NEWLINK messages.
rtnetlink_event now explicitly handles NETDEV_UNREGISTER_BATCH and
NETDEV_POST_INIT to avoid sending the incorrect userspace
notifications.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no good reason to not support userspace specifying the
network namespace during device creation, and it makes it easier
to create a network device and pass it to a child network namespace
with a well known name.
We have to be careful to ensure that the target network namespace
for the new device exists through the life of the call. To keep
that logic clear I have factored out the network namespace grabbing
logic into rtnl_link_get_net.
In addtion we need to continue to pass the source network namespace
to the rtnl_link_ops.newlink method so that we can find the base
device source network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Adding a list_head parameter to rtnl_link_ops->dellink() methods
allow us to queue devices on a list, in order to dismantle
them all at once.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When handling large number of netdevice, rtnl_dump_ifinfo()
is very slow because it has O(N^2) complexity.
Instead of scanning one single list, we can use the 256 sub lists
of the dev_index hash table.
This considerably speedups "ip link" operations
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_getlink() & rtnl_setlink() run with RTNL held, we can use
__dev_get_by_index() and __dev_get_by_name() variants and avoid
dev_hold()/dev_put()
Adds to rtnl_getlink() the capability to find a device by its name,
not only by its index.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>