This patch makes a slight tweak to mqprio in order to bring the
classid values used back in line with what is used for mq. The general idea
is to reserve values :ffe0 - :ffef to identify hardware traffic classes
normally reported via dev->num_tc. By doing this we can maintain a
consistent behavior with mq for classid where :1 - :ffdf will represent a
physical qdisc mapped onto a Tx queue represented by classid - 1, and the
traffic classes will be mapped onto a known subset of classid values
reserved for our virtual qdiscs.
Note I reserved the range from :fff0 - :ffff since this way we might be
able to reuse these classid values with clsact and ingress which would mean
that for mq, mqprio, ingress, and clsact we should be able to maintain a
similar classid layout.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-10-13
This series contains updates to mqprio and i40e.
Amritha introduces a new hardware offload mode in tc/mqprio where the TCs,
the queue configurations and bandwidth rate limits are offloaded to the
hardware. The existing mqprio framework is extended to configure the queue
counts and layout and also added support for rate limiting. This is
achieved through new netlink attributes for the 'mode' option which takes
values such as 'dcb' (default) and 'channel' and a 'shaper' option for
QoS attributes such as bandwidth rate limits in hw mode 1. Legacy devices
can fall back to the existing setup supporting hw mode 1 without these
additional options where only the TCs are offloaded and then the 'mode'
and 'shaper' options defaults to DCB support. The i40e driver enables the
new mqprio hardware offload mechanism factoring the TCs, queue
configuration and bandwidth rates by creating HW channel VSIs.
In this new mode, the priority to traffic class mapping and the user
specified queue ranges are used to configure the traffic class when the
'mode' option is set to 'channel'. This is achieved by creating HW
channels(VSI). A new channel is created for each of the traffic class
configuration offloaded via mqprio framework except for the first TC (TC0)
which is for the main VSI. TC0 for the main VSI is also reconfigured as
per user provided queue parameters. Finally, bandwidth rate limits are set
on these traffic classes through the shaper attribute by sending these
rates in addition to the number of TCs and the queue configurations.
Colin Ian King makes an array of constant values "constant".
Alan fixes and issue where on some firmware versions, we were failing to
actually fill out the phy_types which caused ethtool to not report any
link types. Also hardened against a potentially malicious VF by not
letting the VF to reset itself after requesting to change the number of
queues (via ethtool), let the PF reset the VF to institute the requested
changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We need a real-time notification for tcp retransmission
for monitoring.
Of course we could use ftrace to dynamically instrument this
kernel function too, however we can't retrieve the connection
information at the same time, for example perf-tools [1] reads
/proc/net/tcp for socket details, which is slow when we have
a lots of connections.
Therefore, this patch adds a tracepoint for __tcp_retransmit_skb()
and exposes src/dst IP addresses and ports of the connection.
This also makes it easier to integrate into perf.
Note, I expose both IPv4 and IPv6 addresses at the same time:
for a IPv4 socket, v4 mapped address is used as IPv6 addresses,
for a IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel.
Also, add sk and skb pointers as they are useful for BPF.
1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Brendan Gregg <bgregg@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apparently ife_meta_id2name() is only called when
CONFIG_MODULES is defined.
This fixes:
net/sched/act_ife.c:251:20: warning: ‘ife_meta_id2name’ defined but not used [-Wunused-function]
static const char *ife_meta_id2name(u32 metaid)
^~~~~~~~~~~~~~~~
Fixes: d3f24ba895 ("net sched actions: fix module auto-loading")
Cc: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* make "make -s" silent for the certs file (Arnd)
* fix missing CONFIG_ in extra certs symbol (Arnd)
* use crypto_aead_authsize() to use the proper API
and two other changes:
* remove a set-but-unused variable
* don't track HT *capability* changes, capabilities
are supposed to be constant (HT operation changes)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlngxMEACgkQa3t4Rpy0
AB35pg//YCsOvIMhgTXqx4U32APeDfMsn3o5nyFwoGCccQKOhCyuC5hqEQQ5vpBY
Jkz3DKwsqYtu0sCN/azlu/HcQe3B4JdyCXKIxQUQx8Bn/dJH2kQ5SnhX+SpH+8Uw
EsHzjTGjJski84vMe9V7QYO5SXQyXdx7tHHHjEXgw4xlIissMjnelYghQ5lev7UN
wgqqk6o/MucuVoQjmASX6UOh+yEHNW9PfVSpMhMnQ7n3IPjQ3MvlyjrXglByrAH9
kkQIYwonoKVzOfXjH8hzPFpiLCEFyDz7457uXXfSl8Zlv8dsdNKLoDc57NzTOw/k
KcK9ZXOim1v/Vg/7bO4JkUWxT+oelOMBG7R5BYtvo7GRrIzI9HX66Ns7K3c7t8SL
yGNRRw5ezOm1sVBIMbyuMbbLycbeGg+QcRlm84IkPCJpRed0LpgtCCKSQLtTTeLw
gVhgOI5VB3rk7wWCnSopiJJRQvFYfhnB5WRIdTsX8JQl+bc/TWH70+TjY2+rTWZx
uCjs3FeOowQzySOxIQndCa3Z+FDydZJWbB+E9iqsKrTWR6HKUBotbwmzWPr2iIJm
h7Kjbyx7yS15vurfQV2Mw+QNLHwMZrc3OliWawjnj6uk7BGtGujxHAQ0m8qG6q2A
FkPeQYJlFao0RTEJPHCeOG46kVApnD08r184N9S+QX7xYI+zppo=
=KTQB
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2017-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Three fixes for the recently added new code:
* make "make -s" silent for the certs file (Arnd)
* fix missing CONFIG_ in extra certs symbol (Arnd)
* use crypto_aead_authsize() to use the proper API
and two other changes:
* remove a set-but-unused variable
* don't track HT *capability* changes, capabilities
are supposed to be constant (HT operation changes)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that there is no user for the .set_addr function, remove it from
DSA. If a switch supports this feature (like mv88e6xxx), the
implementation can be done in the driver setup.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ICMP implementation currently replies to an ICMP time exceeded message
(type 11) with an ICMP host unreachable message (type 3, code 1).
However, time exceeded messages can either represent "time to live exceeded
in transit" (code 0) or "fragment reassembly time exceeded" (code 1).
Unconditionally replying to "fragment reassembly time exceeded" with
host unreachable messages might cause unjustified connection resets
which are now easily triggered as UFO has been removed, because, in turn,
sending large buffers triggers IP fragmentation.
The issue can be easily reproduced by running a lot of UDP streams
which is likely to trigger IP fragmentation:
# start netserver in the test namespace
ip netns add test
ip netns exec test netserver
# create a VETH pair
ip link add name veth0 type veth peer name veth0 netns test
ip link set veth0 up
ip -n test link set veth0 up
for i in $(seq 20 29); do
# assign addresses to both ends
ip addr add dev veth0 192.168.$i.1/24
ip -n test addr add dev veth0 192.168.$i.2/24
# start the traffic
netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
done
# wait
send_data: data send error: No route to host (errno 113)
netperf: send_omni: send_data failed: No route to host
We need to differentiate instead: if fragment reassembly time exceeded
is reported, we need to silently drop the packet,
if time to live exceeded is reported, maintain the current behaviour.
In both cases increment the related error count "icmpInTimeExcds".
While at it, fix a typo in a comment, and convert the if statement
into a switch to mate it more readable.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The offload types currently supported in mqprio are 0 (no offload) and
1 (offload only TCs) by setting these values for the 'hw' option. If
offloads are supported by setting the 'hw' option to 1, the default
offload mode is 'dcb' where only the TC values are offloaded to the
device. This patch introduces a new hardware offload mode called
'channel' with 'hw' set to 1 in mqprio which makes full use of the
mqprio options, the TCs, the queue configurations and the QoS parameters
for the TCs. This is achieved through a new netlink attribute for the
'mode' option which takes values such as 'dcb' (default) and 'channel'.
The 'channel' mode also supports QoS attributes for traffic class such as
minimum and maximum values for bandwidth rate limits.
This patch enables configuring additional HW shaper attributes associated
with a traffic class. Currently the shaper for bandwidth rate limiting is
supported which takes options such as minimum and maximum bandwidth rates
and are offloaded to the hardware in the 'channel' mode. The min and max
limits for bandwidth rates are provided by the user along with the TCs
and the queue configurations when creating the mqprio qdisc. The interface
can be extended to support new HW shapers in future through the 'shaper'
attribute.
Introduces a new data structure 'tc_mqprio_qopt_offload' for offloading
mqprio queue options and use this to be shared between the kernel and
device driver. This contains a copy of the existing data structure
for mqprio queue options. This new data structure can be extended when
adding new attributes for traffic class such as mode, shaper, shaper
parameters (bandwidth rate limits). The existing data structure for mqprio
queue options will be shared between the kernel and userspace.
Example:
queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
min_rate 1Gbit 2Gbit max_rate 4Gbit 5Gbit
To dump the bandwidth rates:
qdisc mqprio 804a: root tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
queues:(0:3) (4:7)
mode:channel
shaper:bw_rlimit min_rate:1Gbit 2Gbit max_rate:4Gbit 5Gbit
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We already have point-to-multipoint flow control within a group. But
we even need the opposite; -a scheme which can handle that potentially
hundreds of sources may try to send messages to the same destination
simultaneously without causing buffer overflow at the recipient. This
commit adds such a mechanism.
The algorithm works as follows:
- When a member detects a new, joining member, it initially set its
state to JOINED and advertises a minimum window to the new member.
This window is chosen so that the new member can send exactly one
maximum sized message, or several smaller ones, to the recipient
before it must stop and wait for an additional advertisement. This
minimum window ADV_IDLE is set to 65 1kB blocks.
- When a member receives the first data message from a JOINED member,
it changes the state of the latter to ACTIVE, and advertises a larger
window ADV_ACTIVE = 12 x ADV_IDLE blocks to the sender, so it can
continue sending with minimal disturbances to the data flow.
- The active members are kept in a dedicated linked list. Each time a
message is received from an active member, it will be moved to the
tail of that list. This way, we keep a record of which members have
been most (tail) and least (head) recently active.
- There is a maximum number (16) of permitted simultaneous active
senders per receiver. When this limit is reached, the receiver will
not advertise anything immediately to a new sender, but instead put
it in a PENDING state, and add it to a corresponding queue. At the
same time, it will pick the least recently active member, send it an
advertisement RECLAIM message, and set this member to state
RECLAIMING.
- The reclaimee member has to respond with a REMIT message, meaning that
it goes back to a send window of ADV_IDLE, and returns its unused
advertised blocks beyond that value to the reclaiming member.
- When the reclaiming member receives the REMIT message, it unlinks
the reclaimee from its active list, resets its state to JOINED, and
notes that it is now back at ADV_IDLE advertised blocks to that
member. If there are still unread data messages sent out by
reclaimee before the REMIT, the member goes into an intermediate
state REMITTED, where it stays until the said messages have been
consumed.
- The returned advertised blocks can now be re-advertised to the
pending member, which is now set to state ACTIVE and added to
the active member list.
- To be proactive, i.e., to minimize the risk that any member will
end up in the pending queue, we start reclaiming resources already
when the number of active members exceeds 3/4 of the permitted
maximum.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following scenario is possible:
- A user sends a broadcast message, and thereafter immediately leaves
the group.
- The LEAVE message, following a different path than the broadcast,
arrives ahead of the broadcast, and the sending member is removed
from the receiver's list.
- The broadcast message arrives, but is dropped because the sender
now is unknown to the receipient.
We fix this by sequence numbering membership events, just like ordinary
unicast messages. Currently, when a JOIN is sent to a peer, it contains
a synchronization point, - the sequence number of the next sent
broadcast, in order to give the receiver a start synchronization point.
We now let even LEAVE messages contain such an "end synchronization"
point, so that the recipient can delay the removal of the sending member
until it knows that all messages have been received.
The received synchronization points are added as sequence numbers to the
generated membership events, making it possible to handle them almost
the same way as regular unicasts in the receiving filter function. In
particular, a DOWN event with a too high sequence number will be kept
in the reordering queue until the missing broadcast(s) arrive and have
been delivered.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following scenario is possible:
- A user joins a group, and immediately sends out a broadcast message
to its members.
- The broadcast message, following a different data path than the
initial JOIN message sent out during the joining procedure, arrives
to a receiver before the latter..
- The receiver drops the message, since it is not ready to accept any
messages until the JOIN has arrived.
We avoid this by treating group protocol JOIN messages like unicast
messages.
- We let them pass through the recipient's multicast input queue, just
like ordinary unicasts.
- We force the first following broadacst to be sent as replicated
unicast and being acknowledged by the recipient before accepting
any more broadcast transmissions.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need a mechanism guaranteeing that group unicasts sent out from a
socket are not bypassed by later sent broadcasts from the same socket.
We do this as follows:
- Each time a unicast is sent, we set a the broadcast method for the
socket to "replicast" and "mandatory". This forces the first
subsequent broadcast message to follow the same network and data path
as the preceding unicast to a destination, hence preventing it from
overtaking the latter.
- In order to make the 'same data path' statement above true, we let
group unicasts pass through the multicast link input queue, instead
of as previously through the unicast link input queue.
- In the first broadcast following a unicast, we set a new header flag,
requiring all recipients to immediately acknowledge its reception.
- During the period before all the expected acknowledges are received,
the socket refuses to accept any more broadcast attempts, i.e., by
blocking or returning EAGAIN. This period should typically not be
longer than a few microseconds.
- When all acknowledges have been received, the sending socket will
open up for subsequent broadcasts, this time giving the link layer
freedom to itself select the best transmission method.
- The forced and/or abrupt transmission method changes described above
may lead to broadcasts arriving out of order to the recipients. We
remedy this by introducing code that checks and if necessary
re-orders such messages at the receiving end.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Group unicast messages don't follow the same path as broadcast messages,
and there is a high risk that unicasts sent from a socket might bypass
previously sent broadcasts from the same socket.
We fix this by letting all unicast messages carry the sequence number of
the next sent broadcast from the same node, but without updating this
number at the receiver. This way, a receiver can check and if necessary
re-order such messages before they are added to the socket receive buffer.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previously introduced message transport to all group members is
based on the tipc multicast service, but is logically a broadcast
service within the group, and that is what we call it.
We now add functionality for sending messages to all group members
having a certain identity. Correspondingly, we call this feature 'group
multicast'. The service is using unicast when only one destination is
found, otherwise it will use the bearer broadcast service to transfer
the messages. In the latter case, the receiving members filter arriving
messages by looking at the intended destination instance. If there is
no match, the message will be dropped, while still being considered
received and read as seen by the flow control mechanism.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this commit, we make it possible to send connectionless unicast
messages to any member corresponding to the given member identity,
when there is more than one such member. The sender must use a
TIPC_ADDR_NAME address to achieve this effect.
We also perform load balancing between the destinations, i.e., we
primarily select one which has advertised sufficient send window
to not cause a block/EAGAIN delay, if any. This mechanism is
overlayed on the always present round-robin selection.
Anycast messages are subject to the same start synchronization
and flow control mechanism as group broadcast messages.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We now make it possible to send connectionless unicast messages
within a communication group. To send a message, the sender can use
either a direct port address, aka port identity, or an indirect port
name to be looked up.
This type of messages are subject to the same start synchronization
and flow control mechanism as group broadcast messages.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We introduce an end-to-end flow control mechanism for group broadcast
messages. This ensures that no messages are ever lost because of
destination receive buffer overflow, with minimal impact on performance.
For now, the algorithm is based on the assumption that there is only one
active transmitter at any moment in time.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like with any other service, group members' availability can be
subscribed for by connecting to be topology server. However, because
the events arrive via a different socket than the member socket, there
is a real risk that membership events my arrive out of synch with the
actual JOIN/LEAVE action. I.e., it is possible to receive the first
messages from a new member before the corresponding JOIN event arrives,
just as it is possible to receive the last messages from a leaving
member after the LEAVE event has already been received.
Since each member socket is internally also subscribing for membership
events, we now fix this problem by passing those events on to the user
via the member socket. We leverage the already present member synch-
ronization protocol to guarantee correct message/event order. An event
is delivered to the user as an empty message where the two source
addresses identify the new/lost member. Furthermore, we set the MSG_OOB
bit in the message flags to mark it as an event. If the event is an
indication about a member loss we also set the MSG_EOR bit, so it can
be distinguished from a member addition event.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With group communication, it becomes important for a message receiver to
identify not only from which socket (identfied by a node:port tuple) the
message was sent, but also the logical identity (type:instance) of the
sending member.
We fix this by adding a second instance of struct sockaddr_tipc to the
source address area when a message is read. The extra address struct
is filled in with data found in the received message header (type,) and
in the local member representation struct (instance.)
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a preparation for introducing flow control for multicast and datagram
messaging we need a more strictly defined framework than we have now. A
socket must be able keep track of exactly how many and which other
sockets it is allowed to communicate with at any moment, and keep the
necessary state for those.
We therefore introduce a new concept we have named Communication Group.
Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
The call takes four parameters: 'type' serves as group identifier,
'instance' serves as an logical member identifier, and 'scope' indicates
the visibility of the group (node/cluster/zone). Finally, 'flags' makes
it possible to set certain properties for the member. For now, there is
only one flag, indicating if the creator of the socket wants to receive
a copy of broadcast or multicast messages it is sending via the socket,
and if wants to be eligible as destination for its own anycasts.
A group is closed, i.e., sockets which have not joined a group will
not be able to send messages to or receive messages from members of
the group, and vice versa.
Any member of a group can send multicast ('group broadcast') messages
to all group members, optionally including itself, using the primitive
send(). The messages are received via the recvmsg() primitive. A socket
can only be member of one group at a time.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We often see a need for a linked list of destination identities,
sometimes containing a port number, sometimes a node identity, and
sometimes both. The currently defined struct u32_list is not generic
enough to cover all cases, so we extend it to contain two u32 integers
and rename it to struct tipc_dest_list.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We see an increasing need to send multiple single-buffer messages
of TIPC_SYSTEM_IMPORTANCE to different individual destination nodes.
Instead of looping over the send queue and sending each buffer
individually, as we do now, we add a new help function
tipc_node_distr_xmit() to do this.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the following commits we will need to handle multiple incoming and
rejected/returned buffers in the function socket.c::filter_rcv().
As a preparation for this, we generalize the function by handling
buffer queues instead of individual buffers. We also introduce a
help function tipc_skb_reject(), and rename filter_rcv() to
tipc_sk_filter_rcv() in line with other functions in socket.c.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the coming commits, functions at the socket level will need the
ability to read the availability status of a given node. We therefore
introduce a new function for this purpose, while renaming the existing
static function currently having the wanted name.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The address given to tipc_connect() is not completely sanity checked,
under the assumption that this will be done later in the function
__tipc_sendmsg() when the address is used there.
However, the latter functon will in the next commits serve as caller
to several other send functions, so we want to move the corresponding
sanity check there to the beginning of that function, before we possibly
need to grab the address stored by tipc_connect(). We must therefore
be able to trust that this address already has been thoroughly checked.
We do this in this commit.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As preparation for introducing communication groups, we add the ability
to issue topology subscriptions and receive topology events from kernel
space. This will make it possible for group member sockets to keep track
of other group members.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code here (more or less accidentally) tracks the HT capability of
the AP when connected, and we found at least one AP that erroneously
toggles its 20/40 capability bit when changing between 20/40 MHz. The
connection to the AP is then broken because we set the 40 MHz disable
flag based on this, as soon as it switches to 20 MHz, but because the
flag then changed, we disconnect.
I'd be inclined to just ignore this issue, since we then reconnect
while the AP is in 20 MHz mode and never use 40 MHz with it again,
but this code is a bit strange anyway - we don't use the capabilities
for anything else.
Change the code to simply not track the HT capabilities at all, which
assumes that the AP at least sets 20/40 capability when operating in
40 MHz (or higher). If not, rate scaling might end up using only the
narrower bandwidth.
The new behaviour also mirrors what VHT does, where we only check the
VHT operation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The missing CONFIG_ prefix means this macro is never defined,
leading to a possible Kbuild warning:
net/wireless/reg.c:666:20: error: 'load_keys_from_buffer' defined but not used [-Werror=unused-function]
static void __init load_keys_from_buffer(const u8 *p, unsigned int buflen)
When we use the correct symbol, the warning also goes away.
Fixes: 90a53e4432 ("cfg80211: implement regdb signature checking")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Building an allmodconfig kernel with 'make -s' now prints a single line:
GEN net/wireless/shipped-certs.c
Using '$(kecho)' here will skip the output with 'make -s' but
otherwise keeps printing it, which is consistent with how we
handle all the other output.
Fixes: 90a53e4432 ("cfg80211: implement regdb signature checking")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch changes the parameter updating via RCU and not protected by a
spinlock anymore. This reduce the time that the spinlock is being held.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch migrates the current counter handling which is protected by a
spinlock to a per-cpu counter handling. This reduce the time where the
spinlock is being held.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the check of the two possible ife handlings encode
and decode to the init callback. The decode value is for usability
aspect and used in userspace code only. The current code offers encode
else decode only. This patch avoids any other option than this.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Macro __stringify_1() can stringify a macro argument, however IFE_META_*
are enums, so they never expand, however request_module expects an integer
in IFE module name, so as a result it always fails to auto-load.
Fixes: ef6980b6be ("introduce IFE action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make style of module alias name consistent with other subsystems in kernel,
for example net devices.
Fixes: 084e2f6566 ("Support to encoding decoding skb mark on IFE action")
Fixes: 200e10f469 ("Support to encoding decoding skb prio on IFE action")
Fixes: 408fbc22ef ("net sched ife action: Introduce skb tcindex metadata encap decap")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When af_mpls is built-in but the tunnel support is a module,
we get a link failure:
net/mpls/af_mpls.o: In function `mpls_init':
af_mpls.c:(.init.text+0xdc): undefined reference to `ip_tunnel_encap_add_ops'
This adds a Kconfig statement to prevent the broken
configuration and force mpls to be a module as well in
this case.
Fixes: bdc476413d ("ip_tunnel: add mpls over gre support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Amine Kherbouche <amine.kherbouche@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For RoCEs ib_query_gid() takes a reference count on the net_device.
This reference count must be decreased by the caller.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reported-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Fixes: 0cfdd8f92c ("smc: connection and link group creation")
Signed-off-by: David S. Miller <davem@davemloft.net>
SMC should not open code the function pointer get_netdev of the
IB device. Replacing ib_query_gid(..., NULL) with
ib_query_gid(..., gid_attr) allows access to the netdev.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Suggested-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to tell the DSA master network device doing the actual
transmission what the desired switch port and queue number is for it to
resolve that to the internal transmit queue it is mapped to.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for communicating a given DSA network device's port
number and switch index, create a specialized DSA notifier and two
events: DSA_PORT_REGISTER and DSA_PORT_UNREGISTER that communicate: the
slave network device (slave_dev), port number and switch number in the
tree.
This will be later used for network device drivers like bcmsysport which
needs to cooperate with its DSA network devices to set-up queue mapping
and scheduling.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function ipgre_mpls_encap_hlen is local to the source and
does not need to be in global scope, so make it static.
Cleans up sparse warning:
symbol 'ipgre_mpls_encap_hlen' was not declared. Should it be static?
Fixes: bdc476413d ("ip_tunnel: add mpls over gre support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The array sctp_sched_ops is local to the source and
does not need to be in global scope, so make it static.
Cleans up sparse warning:
symbol 'sctp_sched_ops' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the previous patch, use the device lookup functions
that bump device refcount and flag this as DOIT_UNLOCKED to avoid
rtnl mutex.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of relying on rtnl mutex bump device reference count.
After this change, values reported can change in parallel, but thats not
much different from current state, as anyone can change the settings
right after rtnl_unlock (and before userspace processed reply).
While at it, switch to GFP_KERNEL allocation.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The helper and the struct field ares no longer used by any code,
so remove them.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only user of cls_flower->egress_dev is mlx5. So do the conversion
there alongside with the code originating the call in cls_flower
function fl_hw_replace_filter to the newly introduced egress device
callback infrastucture.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload inserted rule and specified device
acts as tc action egress device.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return dev directly, NULL if not possible. That is enough.
Makes no sense to pass struct net * to get_dev op, as there is only one
net possible, the one the action was created in. So just store it in
mirred priv and use directly.
Rename the mirred op callback function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the necessary logic for decoding incoming messages of version 2 as
well. Also make sure there's room for the bigger of version 1 and 2
headers in the code allocating skbs for outgoing messages.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than parsing the header of incoming messages throughout the
implementation do it once when we retrieve the message and store the
relevant information in the "cb" member of the sk_buff.
This allows us to, in a later commit, decode version 2 messages into
this same structure.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>