Several DMA related functions (such as the dma_map_xxx functions)
are not used with high latency devices and don't need to be invoked
in this case.
Signed-off-by: Erik Stromdahl <erik.stromdahl@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
This patch add support to get RSSI from acknowledgment
frames for transmitted management frames.
hardware_used: QCA4019, QCA9984.
firmware version: 10.4-3.5.3-00052.
Signed-off-by: Venkateswara Naralasetty <vnaralas@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Whenever ath10k firmware discards a packet (HTT_TX_COMPL_STATE_DISCARD
flag), the skb is freed and mac80211 does not get feedback through
ieee80211_tx_status().
Instead, make sure that the IEEE80211_TX_STAT_ACK flag is disabled and
let the packet go through, like ath9k does.
Signed-off-by: Ignacio Nunez Hernanz <nacho.nunez@aoifes.com>
[kvalo@codeaurora.org: rebase patch manually]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Fix output from checkpatch.pl like:
Block comments use a trailing */ on a separate lin
Signed-off-by: Marcin Rokicki <marcin.rokicki@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
commit 7a0adc83f3 ("ath10k: improve tx scheduling") is causing
severe throughput drop in multi client mode. This issue is originally
reported in veriwave setup with 50 clients with TCP downlink traffic.
While increasing number of clients, the average throughput drops
gradually. With 50 clients, the combined peak throughput is decreased
to 98 Mbps whereas reverting given commit restored it to 550 Mbps.
Processing txqs for every tx completion is causing overhead. Ideally for
management frame tx completion, pending txqs processing can be avoided.
The change partly reverts the commit "ath10k: improve tx scheduling".
Processing pending txqs after all skbs tx completion will yeild enough
room to burst tx frames.
Fixes: 7a0adc83f3 ("ath10k: improve tx scheduling")
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
With the %pK format specifier we hide the kernel addresses
with the help of kptr_restrict sysctl.
In this patch, %p is changed to %pK in the driver code.
The sysctl is documented in Documentation/sysctl/kernel.txt.
Signed-off-by: Maharaja Kennadyrajan <c_mkenna@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Not sure this can happen, but seems like a reasonable sanity
check.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Smatch warns about a number of cases in ath10k where a pointer is
null-checked after it has already been dereferenced, in code involving
ath10k private virtual interface pointers.
Fix these by making the dereference happen later.
Addresses the following smatch warnings:
drivers/net/wireless/ath/ath10k/mac.c:3651 ath10k_mac_txq_init() warn: variable dereferenced before check 'txq' (see line 3649)
drivers/net/wireless/ath/ath10k/mac.c:3664 ath10k_mac_txq_unref() warn: variable dereferenced before check 'txq' (see line 3659)
drivers/net/wireless/ath/ath10k/htt_tx.c:70 __ath10k_htt_tx_txq_recalc() warn: variable dereferenced before check 'txq->sta' (see line 52)
drivers/net/wireless/ath/ath10k/htt_tx.c:740 ath10k_htt_tx_get_vdev_id() warn: variable dereferenced before check 'cb->vif' (see line 736)
drivers/net/wireless/ath/ath10k/txrx.c:86 ath10k_txrx_tx_unref() warn: variable dereferenced before check 'txq' (see line 84)
drivers/net/wireless/ath/ath10k/wmi.c:1837 ath10k_wmi_op_gen_mgmt_tx() warn: variable dereferenced before check 'cb->vif' (see line 1825)
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Recent changes revolving around implementing
wake_tx_queue support introduced a significant
performance regressions on some (slower, uni-proc)
systems.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Smatch complains that since "ev->peer_id" comes from skb->data that
means we can't trust it and have to do a bounds check on it to prevent
an array overflow.
Fixes: 6942726f7f ('ath10k: add fast peer_map lookup')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Fixes checkpatch warnings:
drivers/net/wireless/ath/ath10k/mac.c:452: Prefer ether_addr_equal() or ether_addr_equal_unaligned() over memcmp()
drivers/net/wireless/ath/ath10k/mac.c:455: Prefer ether_addr_equal() or ether_addr_equal_unaligned() over memcmp()
drivers/net/wireless/ath/ath10k/txrx.c:133: Prefer ether_addr_equal() or ether_addr_equal_unaligned() over memcmp()
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Fix checkpatch warnings about use of spaces with operators:
spaces preferred around that '*' (ctx:VxV)
This has been recently added to checkpatch.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
To optimize CPU usage htt rx descriptors will be reused instead of
refilling it for htt rx copy engine (CE5). To support that all htt rx
indications should be processed at same context. FIFO queue is used
to maintain tx completion status for each msdu. This helps to retain
the order of tx completion.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Frames that are transmitted via MGMT_TX are using reserved descriptor
slots in firmware. This limitation is for the htt_mgmt_tx path itself,
not for mgmt frames per se. In 16 MBSSID scenario, these reserved slots
will be easy exhausted due to frequent probe responses. So for 10.4
based solutions, probe responses are limited by a threshold (24).
management tx path is separate for all except tlv based solutions. Since
tlv solutions (qca6174 & qca9377) do not support 16 AP interfaces, it is
safe to move management descriptor limitation check under mgmt_tx
function. Though CPU improvement is negligible, unlikely conditions or
never hit conditions in hot path can be avoided on data transmission.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The pull-push functionality of 10.4 will be based
on peer_id and tid. These will need to be mapped,
eventually, to ieee80211_txq to be used with
ieee80211_tx_dequeue().
Iterating over existing stations every time
peer_id needs to be mapped to a station would be
inefficient wrt CPU time.
The new firmware, which will be the only user of
the code flow-wise, will guarantee to use low
peer_ids first so despite peer_map's apparent huge
size d-cache thrashing should not be a problem.
Older firmware hot paths will effectively not use
peer_map.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Tx pending counter logic assumed that the sk_buff
is already known and hence was performed in HTT
functions themselves.
However, for the sake of future wake_tx_queue()
usage the driver must be able to tell whether it
can submit more frames to firmware before it
dequeues frame from ieee80211_txq (and thus long
before HTT Tx functions are called) because once a
frame is dequeued it cannot be requeud back to
mac80211.
This prepares the driver for future changes.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Frames are not guaranteed to be 802.11 frames in
ath10k_htt_tx() and the tx completion handler.
In some cases, like TDLS, they can be Ethernet.
Hence checking, e.g. frame_control could yield
bogus results and behavior.
Fortunately this wasn't a real problem so far
because there's no FW/HW combination to encounter
this problem.
However it is good to fix this in advance.
Fixes: 75d85fd999 ("ath10k: introduce basic tdls functionality")
Fixes: eebc67fef3 ("ath10k: fix pmf for wmi-tlv on qca6174")
Fixes: 7b7da0a021 ("ath10k: drop probe responses when too many are queued")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k driver is using dma_pool_alloc per packet and dma_pool_free
in coresponding at Tx completion.
Use of pre-allocated DMA buffer in Tx will improve saving CPU resource
by 5% while it consumes about 56KB memory more as trade off.
Signed-off-by: Peter Oh <poh@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
In a noisy environment, when multiple interfaces are created,
the management tx descriptors are fully occupied by the probe
responses from all the interfaces. This prevents a new station
from a successful association.
Fix this by limiting the probe responses when the specified
threshold limit is reached.
Signed-off-by: Vivek Natarajan <nataraja@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
During tx completion, tx_lock is held for longer than required, preventing
efficient refill of htt->pending_tx. Refactor the code so that only MSDU
related operations are protected by the lock.
Improves downstream performance on a dual-core ARM Freescale LS1024A
(f.k.a. Mindspeed Comcerto 2000) AP with a 3x3 client from 495 to 580 Mbps.
Other CPU bound multicore systems may also benefit.
Signed-off-by: Denton Gentry <dgentry@google.com>
Signed-off-by: Avery Pennarun <apenwarr@google.com>
[mfaltesek@google.com: removed conflicting code for tracking msdu_ids.]
Signed-off-by: Marty Faltesek <mfaltesek@google.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
wait_event_timeout(), introduced in 'commit 5e3dd157d7 ("ath10k: mac80211
driver for Qualcomm Atheros 802.11ac CQA98xx devices")' never returns < 0
so the only failure condition to be checked is == 0 (timeout). Further the
return type is long not int - an appropriately named variable is added
and the assignments fixed up.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Use the new IEEE80211_TX_STAT_NOACK_TRANSMITTED flag
to indicate successful transmission of no-ack frames.
This fixes multicast frame accounting.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
HTT Tx protocol uses arbitrary host assigned ids
too associate with MSDUs when delivering
completions.
Instead of rolling out own id generation scheme
use the tools provided in kernel.
This should have little to no effect on
performance.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
In some cases hw recovery was taking an absurdly
long time due to ath10k waiting for things that
would never really complete.
Instead of waiting for inevitable timeouts poke
all completions and wakequeues and check if it's
still worth waiting.
Reading/writing ar->state requires conf_mutex.
Since waiters might be holding it introduce a new
flag CRASH_FLUSH so it's possible to tell waiters
to abort whatever they were waiting for.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The tx info such as msdu_id, frame len, vdev id and tid are reported
to user space by tracepoint. This is useful for collecting tx
statistics.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
As suggeested by checkpatch:
WARNING: Prefer ether_addr_copy() over memcpy() if the Ethernet addresses are __aligned(2)
In wmi.c I had to change due to sparse warnings copying of struct wmi_mac_addr
from form &cmd->peer_macaddr.addr to cmd->peer_macaddr.addr. In
ath10k_wmi_set_ap_ps_param() I also added the missing ".addr" to the copy
command.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This makes it a lot easier to log and debug
messages if there's more than 1 ath10k device on a
system.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Firmware doesn't perform Rx reordering so it is
left to the host driver to do that.
Use mac80211 to perform reordering instead of
re-inventing the wheel.
This fixes TCP throughput issues in some
environments.
Reported-by: Denton Gentry <denton.gentry@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It is inefficient to grab irqsave spinlocks for
skb lists for each queue/dequeue action.
Using rx_ring.lock and tx_lock allows to use less
heavy bh spinlock functions and moving locking
upwards allows to toggle spinlocks less often.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Going through full htc tx path for htt tx is a
waste of resources. By skipping it it's possible
to easily submit scatter-gather to the pci hif for
reduced host cpu load and improved performance.
The new approach uses dma pool to store the
following metadata for each tx request:
* msdu fragment list
* htc header
* htt tx command
The htt tx command contains a msdu prefetch.
Instead of copying it original mapped msdu address
is used to submit a second scatter-gather item to
hif to make a complete htt tx command.
The htt tx command itself hands over dma mapped
pointers to msdus and completion of the command
itself doesn't mean the frame has been sent and
can be unmapped/freed. This is why htc tx
completion is skipped for htt tx as all tx related
resources are freed upon htt tx completion
indication event (which also implicitly means htt
tx command itself was completed).
Since now each htt tx request effectively consists
of 2 copy engine items CE_HTT_H2T_MSG_SRC_NENTRIES
is updated to allow maximum of
TARGET_10X_NUM_MSDU_DESC msdus being queued. This
keeps the tx path resource management simple.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
There's no real benefit from using them. DMA-API
already provides debugging. Some skbuffs are
already mapped directly with DMA-API since wrapper
arguments were insufficient and extending them
would be pointless.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Retrieve the mactime of ieee80211_rx_status based on received
data frame. The value is obtained from the htt_rx_indication_ppdu
structure and only available in 32-bit.
kvalo: white space fixes
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Makes it easier to determine why some failures
happened.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ieee80211_rx_status.flags is full. Define a new vht_flag
variable to be able to set more VHT related flags and make
room in flags.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Acked-by: Kalle Valo <kvalo@qca.qualcomm.com> [ath10k]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We should check MIC error flag base on
rx_attention, to have consistent status
of MIC failure and FCS error.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Too much of tx info was being cleared. This caused
issues in some setups with tx frame status
reporting.
This should fix some cases of stations not being
able to associate to ath10k AP.
Reported-By: Matti Laakso <malaakso@elisanet.fi>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Garbage was reported in ieee80211_tx_info. This
led to a WARN_ON in cfg80211_calculate_bitrate().
This also fixes some random tx bitrate values
reported through `iw` command.
Reported-By: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
With commit 0cfcefef1 ("mac80211: support reporting A-MSDU subframes
individually") there's no need to have the hack to clear the retry bit in
ath10k_htt_rx_amsdu(), mac80211 can handle this properly now.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
NSTS values reported in the VHT-SIG-A1 are 0
through 7 but they actually describe number of
streams 1 through 8.
1SS frames were dropped. This patch fixes this.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
HW reports each A-MSDU subframe as a separate
sk_buff. It is impossible to configure it to
behave differently.
Until now ath10k was reconstructing A-MSDUs from
subframes which involved a lot of memory
operations. This proved to be a significant
contributor to degraded RX performance.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Instead of allocating sk_buff for a mere 16-byte
tx fragment list buffer use headroom of the
original msdu sk_buff.
This decreases CPU cache pressure and improves
performance.
Measured improvement on AP135 is 560mbps ->
590mbps of UDP TX briding traffic.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Until now the all MSDU transfer related structures
were freed when all resources were unreferenced.
Now HTC transfer is freed independently and HTT
transfer is so too.
This yields a way more simpler ath10k_skb_cb and
should possibly enable parallel pipe processing
(which is now serialized in
ath10k_pci_process_ce routine).
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This reduces number of memory accesses and
hopefully contributes to better performance in the
future.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It's more efficient to simply check num_pending_tx
value instead of traversing whole bitmap of
msdu ids.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>