The single-step debugging of KVM guests on x86 is broken: if we run
gdb 'stepi' command at the breakpoint when the guest interrupts are
enabled, RIP always jumps to native_apic_mem_write(). Then other
nasty effects follow.
Long investigation showed that on Jun 7, 2017 the
commit c8401dda2f ("KVM: x86: fix singlestepping over syscall")
introduced the kvm_run.debug corruption: kvm_vcpu_do_singlestep() can
be called without X86_EFLAGS_TF set.
Let's fix it. Please consider that for -stable.
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Cc: stable@vger.kernel.org
Fixes: c8401dda2f ("KVM: x86: fix singlestepping over syscall")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
HV_X64_MSR_GUEST_IDLE_AVAILABLE appeared in kvm_vcpu_ioctl_get_hv_cpuid()
by mistake: it announces support for HV_X64_MSR_GUEST_IDLE (0x400000F0)
which we don't support in KVM (yet).
Fixes: 2bc39970e9 ("x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Recently, DMG frequency bands have been extended till 71GHz, so extend
the range check till 20GHz (45-71GHZ), else some channels will be marked
as disabled.
Signed-off-by: Chaitanya Tata <Chaitanya.Tata@bluwireless.co.uk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If there are simulatenous queries of regdb, then there might be a case
where multiple queries can trigger request_firmware_no_wait and can have
parallel callbacks being executed asynchronously. In this scenario we
might hit the WARN_ON.
So remove the warn_on, as the code already handles multiple callbacks
gracefully.
Signed-off-by: Chaitanya Tata <chaitanya.tata@bluwireless.co.uk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
During refactor in commit 9e478066ea ("mac80211: fix MU-MIMO
follow-MAC mode") a new struct 'action' was declared with packed
attribute as:
struct {
struct ieee80211_hdr_3addr hdr;
u8 category;
u8 action_code;
} __packed action;
But since struct 'ieee80211_hdr_3addr' is declared with an aligned
keyword as:
struct ieee80211_hdr {
__le16 frame_control;
__le16 duration_id;
u8 addr1[ETH_ALEN];
u8 addr2[ETH_ALEN];
u8 addr3[ETH_ALEN];
__le16 seq_ctrl;
u8 addr4[ETH_ALEN];
} __packed __aligned(2);
Solve the ambiguity of placing aligned structure in a packed one by
adding the aligned(2) attribute to struct 'action'.
This removes the following warning (W=1):
net/mac80211/rx.c:234:2: warning: alignment 1 of 'struct <anonymous>' is less than 2 [-Wpacked-not-aligned]
Cc: Johannes Berg <johannes.berg@intel.com>
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
syzbot reported an out-of-bounds read when passing certain
malformed messages into nl80211. The specific place where
this happened isn't interesting, the problem is that nested
policy parsing was referring to the wrong maximum attribute
and thus the policy wasn't long enough.
Fix this by referring to the correct attribute. Since this
is really not necessary, I'll come up with a separate patch
to just pass the policy instead of both, in the common case
we can infer the maxattr from the size of the policy array.
Reported-by: syzbot+4157b036c5f4713b1f2f@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The IBM virtual ethernet driver's polling function continues
to process frames after rescheduling NAPI, resulting in a warning
if it exhausted its budget. Do not restart polling after calling
napi_reschedule. Instead let frames be processed in the following
instance.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__bpf_redirect() and act_mirred checks this boolean
to determine whether to prefix an ethernet header.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ax88772_bind() should return error code immediately when the PHY
was not reset properly through ax88772a_hw_reset().
Otherwise, The asix_get_phyid() will block when get the PHY
Identifier from the PHYSID1 MII registers through asix_mdio_read()
due to the PHY isn't ready. Furthermore, it will produce a lot of
error message cause system crash.As follows:
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to write
reg index 0x0000: -71
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to send
software reset: ffffffb9
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to write
reg index 0x0000: -71
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to enable
software MII access
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to read
reg index 0x0000: -71
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to write
reg index 0x0000: -71
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to enable
software MII access
asix 1-1:1.0 (unnamed net_device) (uninitialized): Failed to read
reg index 0x0000: -71
...
Signed-off-by: Zhang Run <zhang.run@zte.com.cn>
Reviewed-by: Yang Wei <yang.wei9@zte.com.cn>
Tested-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
max_rcvbuf_size is no longer used since commit "414574a0af36".
Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following Marvell's acquisition of Cavium, we need to update all the
Cavium drivers maintainer's entries to point to our new e-mail addresses.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ameen Rahman <Ameen.Rahman@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Priyaranjan Jha says:
====================
tcp_bbr: Improving TCP BBR performance for WiFi and cellular networks
Ack aggregation is quite prevalent with wifi, cellular and cable modem
link tchnologies, ACK decimation in middleboxes, and common offloading
techniques such as TSO and GRO, at end hosts. Previously, BBR was often
cwnd-limited in the presence of severe ACK aggregation, which resulted in
low throughput due to insufficient data in flight.
To achieve good throughput for wifi and other paths with aggregation, this
patch series implements an ACK aggregation estimator for BBR, which
estimates the maximum recent degree of ACK aggregation and adapts cwnd
based on it. The algorithm is further described by the following
presentation:
https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00
(1) A preparatory patch, which refactors bbr_target_cwnd for generic
inflight provisioning.
(2) Implements BBR ack aggregation estimator and adapts cwnd based
on measured degree of ACK aggregation.
====================
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aggregation effects are extremely common with wifi, cellular, and cable
modem link technologies, ACK decimation in middleboxes, and LRO and GRO
in receiving hosts. The aggregation can happen in either direction,
data or ACKs, but in either case the aggregation effect is visible
to the sender in the ACK stream.
Previously BBR's sending was often limited by cwnd under severe ACK
aggregation/decimation because BBR sized the cwnd at 2*BDP. If packets
were acked in bursts after long delays (e.g. one ACK acking 5*BDP after
5*RTT), BBR's sending was halted after sending 2*BDP over 2*RTT, leaving
the bottleneck idle for potentially long periods. Note that loss-based
congestion control does not have this issue because when facing
aggregation it continues increasing cwnd after bursts of ACKs, growing
cwnd until the buffer is full.
To achieve good throughput in the presence of aggregation effects, this
algorithm allows the BBR sender to put extra data in flight to keep the
bottleneck utilized during silences in the ACK stream that it has evidence
to suggest were caused by aggregation.
A summary of the algorithm: when a burst of packets are acked by a
stretched ACK or a burst of ACKs or both, BBR first estimates the expected
amount of data that should have been acked, based on its estimated
bandwidth. Then the surplus ("extra_acked") is recorded in a windowed-max
filter to estimate the recent level of observed ACK aggregation. Then cwnd
is increased by the ACK aggregation estimate. The larger cwnd avoids BBR
being cwnd-limited in the face of ACK silences that recent history suggests
were caused by aggregation. As a sanity check, the ACK aggregation degree
is upper-bounded by the cwnd (at the time of measurement) and a global max
of BW * 100ms. The algorithm is further described by the following
presentation:
https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00
In our internal testing, we observed a significant increase in BBR
throughput (measured using netperf), in a basic wifi setup.
- Host1 (sender on ethernet) -> AP -> Host2 (receiver on wifi)
- 2.4 GHz -> BBR before: ~73 Mbps; BBR after: ~102 Mbps; CUBIC: ~100 Mbps
- 5.0 GHz -> BBR before: ~362 Mbps; BBR after: ~593 Mbps; CUBIC: ~601 Mbps
Also, this code is running globally on YouTube TCP connections and produced
significant bandwidth increases for YouTube traffic.
This is based on Ian Swett's max_ack_height_ algorithm from the
QUIC BBR implementation.
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because bbr_target_cwnd() is really a general-purpose BBR helper for
computing some volume of inflight data as a function of the estimated
BDP, refactor it into following helper functions:
- bbr_bdp()
- bbr_quantization_budget()
- bbr_inflight()
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Few chip versions use the same sequence to adjust 10M and ALDPS, so
let's factor it out. This patch also fixes a (most likely) typo in
rtl8168g_1_hw_phy_config. There bit 8 in reg 0x14 on page 0x0bcc
was set and not cleared. According to the vendor driver this bit
needs to be cleared in all cases.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chip versions from RTL8168g onward use the same sequence to disable
ALDPS (Advanced Link-Down Power Saving). So let's factor this out.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
and Adrian Vladu's first patch fixing a few spelling mistakes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAlxIsZEACgkQ3qZv95d3
LNyoBA/+Oa9dcAo5hMS4jiC7kF3s/G1E12Zb3RRZEuUEDLsFcg3OhDJquHHsiifA
j5yd1/GJnV2RiGhTeEhsLE+efCKjcfmEUx76/GYSCskxAHnG244JD6nemmuaxmUq
czegsT1TNg00w6Cx36C/9DQ3PApgBImsXSyhSKy1JrAYaYT9QxOsD41R0FrAWig9
d3w3hAsy+63yFcyP6VYivA8/ndj+kW7rscZ0cfEY+1DwW4gBR5v1wUco7UuffS55
2fUQNT6z+ptrOT4pCxGe8CRlJmefGWBcIfiso94DdMgtbEZHZNdE9qytYUn9OO3f
6SKh7qYNrIuS/gYrKqZufRUd5UIulaWGvNkqxS+DvrncWbTQUOygtuW5Av1bcNg5
Xwe3XGjOzwrSErOqDAW9XfbfA5Hx0ZOnihiXOByGOfWgC0C5+h2QOTVgdyYDRGm3
DaESVCsGLnlMX8kkB+hUa2YFoY8XWApubx+yHeHDHLjxYOM5kOpDF4tHjmKYt1si
Ym4ZZd0iZKsHRQHSePAfh7SaTxATVltne7cIBSD4q3j+JfM+n0uDikadnUsue0Vp
e6ugnNcO530NI/AbHoezfm2k/Wh1eQ1wOMsSlT5qyhG4qP8azxGMUQai1fFgQBtF
txryAbMq70/gbwfbUnuyFxgpE+D/B26/5us/y1xxxRG61vVezp4=
=GKPZ
-----END PGP SIGNATURE-----
Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Sasha Levin says:
====================
Hyper-V hv_netvsc commits for 5.0
Three patches from Haiyang Zhang to fix settings hash key using ethtool,
and Adrian Vladu's first patch fixing a few spelling mistakes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
I made a dumb mistake when I summed up the slave stats, obviously slaves
can come and go which would make the master stats unreliable.
Count and export the master stats separately.
Fixes: a258aeacd7 ("bonding: add support for xstats and export 3ad stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit says:
====================
net: phy: improve starting PHY
This patch series improves few aspects of starting the PHY.
v2:
- improve a warning in patch 4
v3:
- extend commit message for patch 2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we enable the interrupts in phy_start() we don't have to do it
before. Therefore remove enabling interrupts from phy_start_interrupts()
and rename this function to reflect the changed functionality.
v2:
- improve warning to clearly state that we fall back to polling
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Interrupts don't have to be enabled before calling phy_start().
Therefore let's enable them in phy_start(). In a subsequent step
we'll remove enabling interrupts from phy_connect_direct().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
phy_start() should be called from states PHY_READY or PHY_HALTED only.
Check for this to detect misbehaving drivers. Also the state machine
should be started only when being called from one of the valid states.
Some more background:
For all invalid states phy_start() basically was a no-op. All it did
was triggering a state machine run, but for all "running" states the
poll loop was active anyway. And if called from PHY_DOWN, the state
machine does nothing.
v3:
- extended commit message
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The state machine is a no-op before phy_start() has been called.
Therefore let's enable it in phy_start() only. In phy_start()
let's call phy_start_machine() instead of phy_trigger_machine().
phy_start_machine is an alias for phy_trigger_machine but it makes
clearer that we start the state machine here instead of just
triggering a run.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function devm_clk_get() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().
Fixes: a7c30e62d4 ("net: stmmac: Add driver for Qualcomm ethqos")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Acked-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two statements are incorrecly indented, fix these by removing a space.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil says:
====================
Introduce ENETC ethernet drivers
ENETC is a multi-port virtualized Ethernet controller supporting GbE
designs and Time-Sensitive Networking (TSN) functionality.
ENETC is operating as an SR-IOV multi-PF capable Root Complex Integrated
Endpoint (RCIE). As such, it contains multiple physical (PF) and virtual
(VF) PCIe functions, discoverable by standard PCI Express.
The patch series adds basic enablement for these otherwise standard
buffer descriptor (BD) ring based ethernet devices (PCIe PFs and VFs),
currently included in the 64-bit dual ARMv8 processors LS1028A SoC.
The driver is portable to 32-bit designs, and it's independent of CPU
endianness.
Contributors:
Alex Marginean <alexandru.marginean@nxp.com>
Catalin Horghidan <catalin.horghidan@nxp.com>
TODO list:
* IEEE 1588 PTP support;
* TSN support;
* MDIO support and VF link management;
* power management support;
* flow control support;
* TC offloading with h/w MQPRIO;
* interrupt coalescing, configurable BD ring sizes, and other usual
config options if missing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A ternary match table is used for RFS. If multiple entries in the table
match, the entry with the lowest numerical values index is chosen as the
matching entry. Entries in the table are identified using an index
which takes a value from 0 to PRFSCAPR[NUM_RFS]-1 when accessed by the
PSI (PF).
Portions of the RFS table can be assigned to each SI by the PSI (PF)
driver in PSIaRFSCFGR. Assignments are cumulative, the entries assigned
to SIn start after those assigned to SIn-1. The total assignments to
all SIs must be equal to or less than the number available to the port
as found in PRFSCAPR.
For RSS, the Toeplitz hash function used requires two inputs, a 40B
random secret key that is supplied through the PRSSKR0-9 registers as well
as the relevant pieces of the packet header (n-tuple). The 6 LSB bits of
the hash function result will then be used as a pointer to obtain the tag
referenced in the 64 entry indirection table. The result will provide a
winning group which will be used to help route the received packet.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VSIs (VFs) may send a message to the PSI (PF) for general notification
or to gain access to hardware resources which requires host inspection.
These messages may vary in size and are handled as a partition copy
between two memory regions owned by the respective participants.
The PSI will respond with fail or success and a 16-bit message code.
The patch implements the vf to pf messaging mechanism above and, as the
first application making use of this support, it enables the VF to
configure its own primary MAC address.
Signed-off-by: Catalin Horghidan <catalin.horghidan@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds most h/w statistics counters: non-privileged SI conters, as
well as privileged Port and MAC counters available only to the PF.
Per ring software stats are also included.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ENETC is a multi-port virtualized Ethernet controller supporting GbE
designs and Time-Sensitive Networking (TSN) functionality.
ENETC is operating as an SR-IOV multi-PF capable Root Complex Integrated
Endpoint (RCIE). As such, it contains multiple physical (PF) and
virtual (VF) PCIe functions, discoverable by standard PCI Express.
Introduce basic PF and VF ENETC ethernet drivers. The PF has access to
the ENETC Port registers and resources and makes the required privileged
configurations for the underlying VF devices. Common functionality is
controlled through so called System Interface (SI) register blocks, PFs
and VFs own a SI each. Though SI register blocks are almost identical,
there are a few privileged SI level controls that are accessible only to
PFs, and so the distinction is made between PF SIs (PSI) and VF SIs (VSI).
As such, the bulk of the code, including datapath processing, basic h/w
offload support and generic pci related configuration, is shared between
the 2 drivers and is factored out in common source files (i.e. enetc.c).
Major functionalities included (for both drivers):
MSI-X support for Rx and Tx processing, assignment of Rx/Tx BD ring pairs
to MSI-X entries, multi-queue support, Rx S/G (Rx frame fragmentation) and
jumbo frame (up to 9600B) support, Rx paged allocation and reuse, Tx S/G
support (NETIF_F_SG), Rx and Tx checksum offload, PF MAC filtering and
initial control ring support, VLAN extraction/ insertion, PF Rx VLAN
CTAG filtering, VF mac address config support, VF VLAN isolation support,
etc.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Soften the memory barrier call of mb() by a sufficient wmb() in the
consumer index update of the event queues.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEENrCndlB/VnAEWuH5k9IU1zQoZfEFAlxHFuoTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRCT0hTXNChl8eRAB/9Sp32xbjIVeyyEro8+vmlJ/HlSNgcC
txqmmh0ce4n1gglV3G5ll8mDYJsk5Qz+ur8wbud7CVxFmpeGqkAA7qoYeD8DawSp
/77dBG8yX/t8vxU5jGK9l5FomVcrqKSVbK20R6u0MLeZhF+VdJCu7cDElgIsKDqX
N76Df/mTnfdhReoM05CXEwzvEN8StHljm+oecyunxfw3XPtqDkO7QHEVYpk2zwVI
4U2aYu9L1gDbj5yGXNu8l/WwEL4ttyC7rqoW9we1hsDVtxRh1/o+Tond6/HPxIcN
voO/fm97ZKxpDL3cE0Jg6Grq9jUek7GT8oRRkAdWO4YaLr3f70kibQ3s
=d+ps
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-5.0-20190122' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2019-01-22
this is a pull request of 4 patches for net/master.
The first patch by is by Manfred Schlaegl and reverts a patch that caused wrong
warning messages in certain use cases. The next patch is by Oliver Hartkopp for
the bcm that adds sanity checks for the timer value before using it to detect
potential interger overflows. The last two patches are for the flexcan driver,
YueHaibing's patch fixes the the return value in the error path of the
flexcan_setup_stop_mode() function. The second patch is by Uwe Kleine-König and
fixes a NULL pointer deref on older flexcan cores in flexcan_chip_start().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan says:
====================
mlx4_core fixes for 5.0-rc
This patchset includes two fixes for the mlx4_core driver.
First patch by Aya fixes inaccurate parsing of some FW fields, mistakenly
including additional (mostly reserved) bits.
Second patch by Jack fixes a wrong (yet harmless) error handling of
calls to copy_to_user() during the CQs init stage.
Series generated against net commit:
49a57857ae Linux 5.0-rc3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Procedure mlx4_init_user_cqes() handles returns by copy_to_user
incorrectly. copy_to_user() returns the number of bytes not copied.
Thus, a non-zero return should be treated as a -EFAULT error
(as is done elsewhere in the kernel). However, mlx4_init_user_cqes()
error handling simply returns the number of bytes not copied
(instead of -EFAULT).
Note, though, that this is a harmless bug: procedure mlx4_alloc_cq()
(which is the only caller of mlx4_init_user_cqes()) treats any
non-zero return as an error, but that returned error value is processed
internally, and not passed further up the call stack.
In addition, fixes the following sparse warning:
warning: incorrect type in argument 1 (different address spaces)
expected void [noderef] <asn:1>*to
got void *buf
Fixes: e45678973d ("{net, IB}/mlx4: Initialize CQ buffers in the driver when possible")
Reported by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver reads the query HCA capabilities without the corresponding masks.
Without the correct masks, the base addresses of the queues are
unaligned. In addition some reserved bits were wrongly read. Using the
correct masks, ensures alignment of the base addresses and allows future
firmware versions safe use of the reserved bits.
Fixes: ab9c17a009 ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet")
Fixes: 0ff1fb654b ("{NET, IB}/mlx4: Add device managed flow steering firmware API")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now sctp_transport_pmtu() passes transport->saddr into .get_dst() to set
flow sport from 'saddr'. However, transport->saddr is set only when
transport->dst exists in sctp_transport_route().
If sctp_transport_pmtu() is called without transport->saddr set, like
when transport->dst doesn't exists, the flow sport will be set to 0
from transport->saddr, which will cause a wrong route to be got.
Commit 6e91b578bf ("sctp: re-use sctp_transport_pmtu in
sctp_transport_route") made the issue be triggered more easily
since sctp_transport_pmtu() would be called in sctp_transport_route()
after that.
In gerneral, fl4->fl4_sport should always be set to
htons(asoc->base.bind_addr.port), unless transport->asoc doesn't exist
in sctp_v4/6_get_dst(), which is the case:
sctp_ootb_pkt_new() ->
sctp_transport_route()
For that, we can simply handle it by setting flow sport from saddr only
when it's 0 in sctp_v4/6_get_dst().
Fixes: 6e91b578bf ("sctp: re-use sctp_transport_pmtu in sctp_transport_route")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the paths:
sctp_sf_do_unexpected_init() ->
sctp_make_init_ack()
sctp_sf_do_dupcook_a/b()() ->
sctp_sf_do_5_1D_ce()
The new chunk 'retval' transport is set from the incoming chunk 'chunk'
transport. However, 'retval' transport belong to the new asoc, which
is a different one from 'chunk' transport's asoc.
It will cause that the 'retval' chunk gets set with a wrong transport.
Later when sending it and because of Commit b9fd683982 ("sctp: add
sctp_packet_singleton"), sctp_packet_singleton() will set some fields,
like vtag to 'retval' chunk from that wrong transport's asoc.
This patch is to fix it by setting 'retval' transport correctly which
belongs to the right asoc in sctp_make_init_ack() and
sctp_sf_do_5_1D_ce().
Fixes: b9fd683982 ("sctp: add sctp_packet_singleton")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to improve sctp stream adding events in 2 places:
1. In sctp_process_strreset_addstrm_out(), move up SCTP_MAX_STREAM
and in stream allocation failure checks, as the adding has to
succeed after reconf_timer stops for the in stream adding
request retransmission.
3. In sctp_process_strreset_addstrm_in(), no event should be sent,
as no in or out stream is added here.
Fixes: 50a41591f1 ("sctp: implement receiver-side procedures for the Add Outgoing Streams Request Parameter")
Fixes: c5c4ebb3ab ("sctp: implement receiver-side procedures for the Add Incoming Streams Request Parameter")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to improve sctp stream reset events in 4 places:
1. In sctp_process_strreset_outreq(), the flag should always be set with
SCTP_STREAM_RESET_INCOMING_SSN instead of OUTGOING, as receiver's in
stream is reset here.
2. In sctp_process_strreset_outreq(), move up SCTP_STRRESET_ERR_WRONG_SSN
check, as the reset has to succeed after reconf_timer stops for the
in stream reset request retransmission.
3. In sctp_process_strreset_inreq(), no event should be sent, as no in
or out stream is reset here.
4. In sctp_process_strreset_resp(), SCTP_STREAM_RESET_INCOMING_SSN or
OUTGOING event should always be sent for stream reset requests, no
matter it fails or succeeds to process the request.
Fixes: 8105447645 ("sctp: implement receiver-side procedures for the Outgoing SSN Reset Request Parameter")
Fixes: 16e1a91965 ("sctp: implement receiver-side procedures for the Incoming SSN Reset Request Parameter")
Fixes: 11ae76e67a ("sctp: implement receiver-side procedures for the Reconf Response Parameter")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip l add dev tun type gretap key 1000
ip a a dev tun 10.0.0.1/24
Packets with tun-id 1000 can be recived by tun dev. But packet can't
be sent through dev tun for non-tunnel-dst
With this patch: tunnel-dst can be get through lwtunnel like beflow:
ip r a 10.0.0.7 encap ip dst 172.168.0.11 dev tun
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Björn Töpel says:
====================
This series adds an AF_XDP sock_diag interface for querying sockets
from user-space. Tools like iproute2 ss(8) can use this interface to
list open AF_XDP sockets.
The diagnostic provides information about the Rx/Tx/fill/completetion
rings, umem, memory usage and such. For a complete list, please refer
to the xsk_diag.c file.
The AF_XDP sock_diag interface is optional, and can be built as a
module. A separate patch series, adding ss(8) iproute2 support, will
follow.
v1->v2:
* Removed extra newline
* Zero-out all user-space facing structures prior setting the
members
* Added explicit "pad" member in _msg struct
* Removed unused variable "req" in xsk_diag_handler_dump()
Thanks to Daniel for reviewing the series!
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds the sock_diag interface for querying sockets from user
space. Tools like iproute2 ss(8) can use this interface to list open
AF_XDP sockets.
The user-space ABI is defined in linux/xdp_diag.h and includes netlink
request and response structs. The request can query sockets and the
response contains socket information about the rings, umems, inode and
more.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This commit adds an id to the umem structure. The id uniquely
identifies a umem instance, and will be exposed to user-space via the
socket monitoring interface.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Track each AF_XDP socket in a per-netns list. This will be used later
by the sock_diag interface for querying sockets from userspace.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Before:
$ make -s -C tools/testing/selftests/bpf
readelf: Error: Missing knowledge of 32-bit reloc types used in DWARF
sections of machine number 247
readelf: Warning: unable to apply unsupported reloc type 10 to section
.debug_info
readelf: Warning: unable to apply unsupported reloc type 1 to section
.debug_info
readelf: Warning: unable to apply unsupported reloc type 10 to section
.debug_info
After:
$ make -s -C tools/testing/selftests/bpf
v2:
* use llvm-readelf instead of redirecting binutils' readelf stderr to
/dev/null
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>