Update fib_nl_dumprule for strict data checking. If the flag is set,
the dump request is expected to have fib_rule_hdr struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update rtnl_net_dumpid for strict data checking. If the flag is set,
the dump request is expected to have an rtgenmsg struct as the header
which has the family as the only element. No data may be appended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update neightbl_dump_info for strict data checking. If the flag is set,
the dump request is expected to have an ndtmsg struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update neigh_dump_info for strict data checking. If the flag is set,
the dump request is expected to have an ndmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the NDA_IFINDEX and
NDA_MASTER attributes are supported.
Existing code does not fail the dump if nlmsg_parse fails. That behavior
is kept for non-strict checking.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add helper to check netlink message for route dumps. If the strict flag
is set the dump request is expected to have an rtmsg struct as the header.
All elements of the struct are expected to be 0 with the exception of
rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes
can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX
set.
Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute,
and ip6mr_rtm_dumproute to call this helper if strict data checking is
enabled.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update ipmr_rtm_dumplink for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header.
All elements of the struct are expected to be 0 and no attributes can
be appended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update inet6_dump_ifinfo for strict data checking. If the flag is
set, the dump request is expected to have an ifinfomsg struct as
the header. All elements of the struct are expected to be 0 and no
attributes can be appended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update rtnl_stats_dump for strict data checking. If the flag is set,
the dump request is expected to have an if_stats_msg struct as the header.
All elements of the struct are expected to be 0 except filter_mask which
must be non-0 (legacy behavior). No attributes are supported.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update rtnl_bridge_getlink for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFLA_EXT_MASK
attribute is supported.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update rtnl_dump_ifinfo for strict data checking. If the flag is set,
the dump request is expected to have an ifinfomsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID,
IFLA_EXT_MASK, IFLA_MASTER, and IFLA_LINKINFO attributes are supported.
Existing code does not fail the dump if nlmsg_parse fails. That behavior
is kept for non-strict checking.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update inet6_dump_addr for strict data checking. If the flag is set, the
dump request is expected to have an ifaddrmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values suppored by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID
attribute is supported. Follow on patches can add support for other fields
(e.g., honor ifa_index and only return data for the given device index).
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update inet_dump_ifaddr for strict data checking. If the flag is set,
the dump request is expected to have an ifaddrmsg struct as the header
potentially followed by one or more attributes. Any data passed in the
header or as an attribute is taken as a request to influence the data
returned. Only values supported by the dump handler are allowed to be
non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID
attribute is supported. Follow on patches can support for other fields
(e.g., honor ifa_index and only return data for the given device index).
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace
can use via setsockopt to request strict checking of headers and
attributes on dump requests.
To get dump features such as kernel side filtering based on data in
the header or attributes appended to the dump request, userspace
must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero
value. Since the netlink sock and its flags are private to the
af_netlink code, the strict checking flag is passed to dump handlers
via a flag in the netlink_callback struct.
For old userspace on new kernel there is no impact as all of the data
checks in later patches are wrapped in a check on the new strict flag.
For new userspace on old kernel, the setsockopt will fail and even if
new userspace sets data in the headers and appended attributes the
kernel will silently ignore it. Moving forward when the setsockopt
succeeds, the new userspace on old kernel means the dump request can
pass an attribute the kernel does not understand. The dump will then
fail as the older kernel does not understand it.
New userspace on new kernel setting the socket option gets the benefit
of the improved data dump.
Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic
NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter
checking on the NEW, DEL, and SET commands.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull the inet6_fill_args arg up to in6_dump_addrs and move netnsid
into it.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure extack is passed to nlmsg_parse where easy to do so.
Most of these are dump handlers and leveraging the extack in
the netlink_callback.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Declare extack in netlink_dump and pass to dump handlers via
netlink_callback. Add any extack message after the dump_done_errno
allowing error messages to be returned. This will be useful when
strict checking is done on dump requests, returning why the dump
fails EINVAL.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have the knode count, we can instantly check if
any hnodes are non-empty. And that kills the check for extra
references to root hnode - those could happen only if there was
a knode to carry such a link.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
allows to simplify u32_delete() considerably
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both hnode ->tp_c and tp_c argument of u32_set_parms()
the latter is redundant, the former - never read...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It must be tc_u_common associated with that tp (i.e. tp->data).
Proof:
* both ->ht_up and ->tp_c are assign-once
* ->tp_c of anything inserted into tp_c->hlist is tp_c
* hnodes never get reinserted into the lists or moved
between those, so anything found by u32_lookup_ht(tp->data, ...)
will have ->tp_c equal to tp->data.
* tp->root->tp_c == tp->data.
* ->ht_up of anything inserted into hnode->ht[...] is
equal to hnode.
* knodes never get reinserted into hash chains or moved
between those, so anything returned by u32_lookup_key(ht, ...)
will have ->ht_up equal to ht.
* any knode returned by u32_get(tp, ...) will have ->ht_up->tp_c
point to tp->data
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the only thing we used ht for was ht->tp_c and callers can get that
without going through ->tp_c at all; start with lifting that into
the callers, next commits will massage those, eventually removing
->tp_c altogether.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* calculate key *once*, not for each hash chain element
* let tc_u_hash() return the pointer to chain head rather than index -
callers are cleaner that way.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
unused
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
not used anymore
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested by modifying iproute2 to allow sending a divisor > 255
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Operation makes no sense. Nothing will actually break if we do so
(depth limit in u32_classify() will prevent infinite loops), but
according to maintainers it's best prohibited outright.
NOTE: doing so guarantees that u32_destroy() will trigger the call
of u32_destroy_hnode(); we might want to make that unconditional.
Test:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 100 u32 \
link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
should fail with
Error: cls_u32: Not linking to root node
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
... and produce consistent error on attempt to delete such.
Existing check in u32_delete() is inconsistent - after
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: u32 \
divisor 1
tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: u32 \
divisor 1
both
tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 801: u32
and
tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 800: u32
will fail (at least with refcounting fixes), but the former will complain
about an attempt to remove a busy table, while the latter will recognize
it as root and yield "Not allowed to delete root node" instead.
The problem with the existing check is that several tcf_proto instances
might share the same tp->data and handle-to-hnode lookup will be the same
for all of them. So comparing an hnode to be deleted with tp->root won't
catch the case when one tp is used to try deleting the root of another.
Solution is trivial - mark the root hnodes explicitly upon allocation and
check for that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix 5 warnings and 14 checks issued by checkpatch.pl:
CHECK: Logical continuations should be on the previous line
+ if ((q->vars.qdelay < q->params.target / 2)
+ && (q->vars.prob < MAX_PROB / 5))
WARNING: line over 80 characters
+ q->params.tupdate = usecs_to_jiffies(nla_get_u32(tb[TCA_PIE_TUPDATE]));
CHECK: Blank lines aren't necessary after an open brace '{'
+{
+
CHECK: braces {} should be used on all arms of this statement
+ if (qlen < QUEUE_THRESHOLD)
[...]
+ else {
[...]
CHECK: Unbalanced braces around else statement
+ else {
CHECK: No space is necessary after a cast
+ if (delta > (s32) (MAX_PROB / (100 / 2)) &&
CHECK: Unnecessary parentheses around 'qdelay == 0'
+ if ((qdelay == 0) && (qdelay_old == 0) && update_prob)
CHECK: Unnecessary parentheses around 'qdelay_old == 0'
+ if ((qdelay == 0) && (qdelay_old == 0) && update_prob)
CHECK: Unnecessary parentheses around 'q->vars.prob == 0'
+ if ((q->vars.qdelay < q->params.target / 2) &&
+ (q->vars.qdelay_old < q->params.target / 2) &&
+ (q->vars.prob == 0) &&
+ (q->vars.avg_dq_rate > 0))
CHECK: Unnecessary parentheses around 'q->vars.avg_dq_rate > 0'
+ if ((q->vars.qdelay < q->params.target / 2) &&
+ (q->vars.qdelay_old < q->params.target / 2) &&
+ (q->vars.prob == 0) &&
+ (q->vars.avg_dq_rate > 0))
CHECK: Blank lines aren't necessary before a close brace '}'
+
+}
CHECK: Comparison to NULL could be written "!opts"
+ if (opts == NULL)
CHECK: No space is necessary after a cast
+ ((u32) PSCHED_TICKS2NS(q->params.target)) /
WARNING: line over 80 characters
+ nla_put_u32(skb, TCA_PIE_TUPDATE, jiffies_to_usecs(q->params.tupdate)) ||
CHECK: Blank lines aren't necessary before a close brace '}'
+
+}
CHECK: No space is necessary after a cast
+ .delay = ((u32) PSCHED_TICKS2NS(q->vars.qdelay)) /
WARNING: Missing a blank line after declarations
+ struct sk_buff *skb;
+ skb = qdisc_dequeue_head(sch);
WARNING: Missing a blank line after declarations
+ struct pie_sched_data *q = qdisc_priv(sch);
+ qdisc_reset_queue(sch);
WARNING: Missing a blank line after declarations
+ struct pie_sched_data *q = qdisc_priv(sch);
+ q->params.tupdate = 0;
Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds OEM commands and response handling. It also defines OEM
command and response structure as per NCSI specification along with its
handlers.
ncsi_cmd_handler_oem: This is a generic command request handler for OEM
commands
ncsi_rsp_handler_oem: This is a generic response handler for OEM commands
Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
Reviewed-by: Justin Lee <justin.lee1@dell.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
move_addr_to_kernel() returns only negative values on error, or zero on
success. Rewrite the error check to an idiomatic form to avoid confusing
the reader.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of TC attributes are processed without proper validation
(e.g., length checks). Add a tca policy for all input attributes and use
when invoking nlmsg_parse.
The 2 Fixes tags below cover the latest additions. The other attributes
are a string (KIND), nested attribute (OPTIONS which does seem to have
validation in most cases), for dumps only or a flag.
Fixes: 5bc1701881 ("net: sched: introduce multichain support for filters")
Fixes: d47a6b0e7c ("net: sched: introduce ingress/egress block index attributes for qdisc")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').
The problem is, the function bails out early if nlmsg_parse() fails, which
does occur for iproute2 usage of 'struct ndmsg' because the payload length
is shorter than the family header alone (as 'struct ifinfomsg' is assumed).
This breaks backward compatibility with userspace -- nothing is sent back.
Some examples with iproute2 and netlink library for go [1]:
1) $ bridge fdb show
33:33:00:00:00:01 dev ens3 self permanent
01:00:5e:00:00:01 dev ens3 self permanent
33:33:ff:15:98:30 dev ens3 self permanent
This one works, as it uses 'struct ifinfomsg'.
fdb_show() @ iproute2/bridge/fdb.c
"""
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
...
if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...]
"""
2) $ ip --family bridge neigh
RTNETLINK answers: Invalid argument
Dump terminated
This one fails, as it uses 'struct ndmsg'.
do_show_or_flush() @ iproute2/ip/ipneigh.c
"""
.n.nlmsg_type = RTM_GETNEIGH,
.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
"""
3) $ ./neighlist
< no output >
This one fails, as it uses 'struct ndmsg'-based.
neighList() @ netlink/neigh_linux.go
"""
req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...]
msg := Ndmsg{
"""
The actual breakage was introduced by commit 0ff50e83b5 ("net: rtnetlink:
bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
if the payload length (with the _actual_ family header) is less than the
family header length alone (which is assumed, in parameter 'hdrlen').
This is true in the examples above with struct ndmsg, with size and payload
length shorter than struct ifinfomsg.
However, that commit just intends to fix something under the assumption the
family header is indeed an 'struct ifinfomsg' - by preventing access to the
payload as such (via 'ifm' pointer) if the payload length is not sufficient
to actually contain it.
The assumption was introduced by commit 5e6d243587 ("bridge: netlink dump
interface at par with brctl"), to support iproute2's 'bridge fdb' command
(not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.
So, in order to unbreak the 'struct ndmsg' family headers and still allow
'struct ifinfomsg' to continue to work, check for the known message sizes
used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
not used in this function anyway) then do not parse the data as ifinfomsg.
Same examples with this patch applied (or revert/before the original fix):
$ bridge fdb show
33:33:00:00:00:01 dev ens3 self permanent
01:00:5e:00:00:01 dev ens3 self permanent
33:33:ff:15:98:30 dev ens3 self permanent
$ ip --family bridge neigh
dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT
$ ./neighlist
netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
Tested on mainline (v4.19-rc6) and net-next (3bd09b05b0).
References:
[1] netlink library for go (test-case)
https://github.com/vishvananda/netlink
$ cat ~/go/src/neighlist/main.go
package main
import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
func main() {
neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
}
$ export GOPATH=~/go
$ go get github.com/vishvananda/netlink
$ go build neighlist
$ ~/go/src/neighlist/neighlist
Thanks to David Ahern for suggestions to improve this patch.
Fixes: 0ff50e83b5 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error")
Fixes: 5e6d243587 ("bridge: netlink dump interface at par with brctl")
Reported-by: Aidan Obley <aobley@pivotal.io>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case ip_fib_metrics_init() returns an error, we better
rewrite rt->fib6_metrics with &dst_default_metrics so that
we do not crash later in ip_fib_metrics_put()
Fixes: 767a221753 ("net: common metrics init helper for FIB entries")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid the socket lookup cost in udp_gro_receive if no socket has a
udp tunnel callback configured.
udp_sk(sk)->gro_receive requires a registration with
setup_udp_tunnel_sock, which enables the static key.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following Sparse warnings:
net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space
of expression
net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as
NULL pointer
Signed-off-by: Shanthosh RK <shanthosh.rk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit ec3ed293e7 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
we move fl_hw_destroy_tmplt() to a workqueue to avoid blocking
with the spinlock held. Unfortunately, this causes a lot of
troubles here:
1. tcf_chain_destroy() could be called right after we queue the work
but before the work runs. This is a use-after-free.
2. The chain refcnt is already 0, we can't even just hold it again.
We can check refcnt==1 but it is ugly.
3. The chain with refcnt 0 is still visible in its block, which means
it could be still found and used!
4. The block has a refcnt too, we can't hold it without introducing a
proper API either.
We can make it working but the end result is ugly. Instead of wasting
time on reviewing it, let's just convert the troubling spinlock to
a mutex, which allows us to use non-atomic allocations too.
Fixes: ec3ed293e7 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
Reported-by: Ido Schimmel <idosch@idosch.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the attribute parsing from neigh_dump_table to neigh_dump_info, and
pass the filter arguments down to neigh_dump_table in a new struct. Add
the filter option to proxy neigh dumps as well to make them consistent.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we use raw socket as the vhost backend, a packet from virito with
gso offloading information, cannot be sent out in later validaton at
xmit path, as we did not set correct skb->protocol which is further used
for looking up the gso function.
To fix this, we set this field according to virito hdr information.
Fixes: e858fae2b0 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
Signed-off-by: Jianfeng Tan <jianfeng.tan@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the refcounting and potential free of dst metrics associated
for ipv4 and ipv6 to a common helper.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4 and ipv6 both use refcounted metrics if FIB entries have metrics set.
Move the common initialization code to a helper and use for both protocols.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the refcounting and potential free of dst metrics associated
with a fib entry to a helper and use it in both ipv4 and ipv6.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate initialization of ipv4 and ipv6 metrics when fib entries
are created into a single helper, ip_fib_metrics_init, that handles
the call to ip_metrics_convert.
If no metrics are defined for the fib entry, then the metrics is set
to dst_default_metrics.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Load the respective NAT helper module if the flow uses it.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This traffic scheduler allows traffic classes states (transmission
allowed/not allowed, in the simplest case) to be scheduled, according
to a pre-generated time sequence. This is the basis of the IEEE
802.1Qbv specification.
Example configuration:
tc qdisc replace dev enp3s0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
base-time 1528743495910289987 \
sched-entry S 01 300000 \
sched-entry S 02 300000 \
sched-entry S 04 300000 \
clockid CLOCK_TAI
The configuration format is similar to mqprio. The main difference is
the presence of a schedule, built by multiple "sched-entry"
definitions, each entry has the following format:
sched-entry <CMD> <GATE MASK> <INTERVAL>
The only supported <CMD> is "S", which means "SetGateStates",
following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
is a bitmask where each bit is a associated with a traffic class, so
bit 0 (the least significant bit) being "on" means that traffic class
0 is "active" for that schedule entry. <INTERVAL> is a time duration
in nanoseconds that specifies for how long that state defined by <CMD>
and <GATE MASK> should be held before moving to the next entry.
This schedule is circular, that is, after the last entry is executed
it starts from the first one, indefinitely.
The other parameters can be defined as follows:
- base-time: specifies the instant when the schedule starts, if
'base-time' is a time in the past, the schedule will start at
base-time + (N * cycle-time)
where N is the smallest integer so the resulting time is greater
than "now", and "cycle-time" is the sum of all the intervals of the
entries in the schedule;
- clockid: specifies the reference clock to be used;
The parameters should be similar to what the IEEE 802.1Q family of
specification defines.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
msix_vec_per_pf_min - This param sets the number of minimal MSIX
vectors required for the device initialization. This value is set
in the device which limits MSIX vectors per PF.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
msix_vec_per_pf_max - This param sets the number of MSIX vectors
that the device requests from the host on driver initialization.
This value is set in the device which is applicable per PF.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ignore_ari - Device ignores ARI(Alternate Routing ID) capability,
even when platforms has the support and creates same number of
partitions when platform does not support ARI capability.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>