Commit Graph

73 Commits

Author SHA1 Message Date
Mohammed Shafi Shajakhan 02a9e08d73 ath10k: Avoid potential page alloc BUG_ON in tx free path
'ath10k_htt_tx_free_cont_txbuf' and 'ath10k_htt_tx_free_cont_frag_desc'
have NULL pointer checks to avoid crash if they are called twice
but this is as of now not sufficient as these pointers are not assigned
to NULL once the contiguous DMA memory allocation is freed, fix this.
Though this may not be hit with the explicity check of state variable
'tx_mem_allocated' check, good to have this addressed as well.

Below BUG_ON is hit when the above scenario is simulated
with kernel debugging enabled

 page:f6d09a00 count:0 mapcount:-127 mapping:  (null)
index:0x0
 flags: 0x40000000()
 page dumped because: VM_BUG_ON_PAGE(page_ref_count(page)
== 0)
 ------------[ cut here ]------------
 kernel BUG at ./include/linux/mm.h:445!
 invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
 EIP is at put_page_testzero.part.88+0xd/0xf
 Call Trace:
  [<c118a2cc>] __free_pages+0x3c/0x40
  [<c118a30e>] free_pages+0x3e/0x50
  [<c10222b4>] dma_generic_free_coherent+0x24/0x30
  [<f8c1d9a8>] ath10k_htt_tx_free_cont_txbuf+0xf8/0x140

  [<f8c1e2a9>] ath10k_htt_tx_destroy+0x29/0xa0

  [<f8c143e0>] ath10k_core_destroy+0x60/0x80 [ath10k_core]
  [<f8acd7e9>] ath10k_pci_remove+0x79/0xa0 [ath10k_pci]
  [<c13ed7a8>] pci_device_remove+0x38/0xb0
  [<c14d3492>] __device_release_driver+0x72/0x100
  [<c14d36b7>] driver_detach+0x97/0xa0
  [<c14d29c0>] bus_remove_driver+0x40/0x80
  [<c14d427a>] driver_unregister+0x2a/0x60
  [<c13ec768>] pci_unregister_driver+0x18/0x70
  [<f8aced4f>] ath10k_pci_exit+0xd/0x2be [ath10k_pci]
  [<c1101e78>] SyS_delete_module+0x158/0x210
  [<c11b34f1>] ? __might_fault+0x41/0xa0
  [<c11b353b>] ? __might_fault+0x8b/0xa0
  [<c1001a4b>] do_fast_syscall_32+0x9b/0x1c0
  [<c178da34>] sysenter_past_esp+0x45/0x74

Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-12-15 11:17:52 +02:00
Mohammed Shafi Shajakhan 9ec34a8619 ath10k: fix Tx DMA alloc failure during continuous wifi down/up
With maximum number of vap's configured in a two radio supported
systems of ~256 Mb RAM, doing a continuous wifi down/up and
intermittent traffic streaming from the connected stations results
in failure to allocate contiguous memory for tx buffers. This results
in the disappearance of all VAP's and a manual reboot is needed as
this is not a crash (or) OOM(for OOM killer to be invoked). To address
this allocate contiguous memory for tx buffers one time and re-use them
until the modules are unloaded but this results in a slight increase in
memory footprint of ath10k when the wifi is down, but the modules are
still loaded. Also as of now we use a separate bool 'tx_mem_allocated'
to keep track of the one time memory allocation, as we cannot come up
with something like 'ath10k_tx_{register,unregister}' before
'ath10k_probe_fw' is called as 'ath10k_htt_tx_alloc_cont_frag_desc'
memory allocation is dependent on the hw_param 'continuous_frag_desc'

a) memory footprint of ath10k without the change

lsmod | grep ath10k
ath10k_core           414498  1 ath10k_pci
ath10k_pci             38236  0

b) memory footprint of ath10k with the change

ath10k_core           414980  1 ath10k_pci
ath10k_pci             38236  0

Memory Failure Call trace:

hostapd: page allocation failure: order:6, mode:0xd0
 [<c021f150>] (__dma_alloc_buffer.isra.23) from
[<c021f23c>] (__alloc_remap_buffer.isra.26+0x14/0xb8)
[<c021f23c>] (__alloc_remap_buffer.isra.26) from
[<c021f664>] (__dma_alloc+0x224/0x2b8)
[<c021f664>] (__dma_alloc) from [<c021f810>]
(arm_dma_alloc+0x84/0x90)
[<c021f810>] (arm_dma_alloc) from [<bf954764>]
(ath10k_htt_tx_alloc+0xe0/0x2e4 [ath10k_core])
[<bf954764>] (ath10k_htt_tx_alloc [ath10k_core]) from
[<bf94e6ac>] (ath10k_core_start+0x538/0xcf8 [ath10k_core])
[<bf94e6ac>] (ath10k_core_start [ath10k_core]) from
[<bf947eec>] (ath10k_start+0xbc/0x56c [ath10k_core])
[<bf947eec>] (ath10k_start [ath10k_core]) from
[<bf8a7a04>] (drv_start+0x40/0x5c [mac80211])
[<bf8a7a04>] (drv_start [mac80211]) from [<bf8b7cf8>]
(ieee80211_do_open+0x170/0x82c [mac80211])
[<bf8b7cf8>] (ieee80211_do_open [mac80211]) from
[<c056afc8>] (__dev_open+0xa0/0xf4)
[21053.491752] Normal: 641*4kB (UEMR) 505*8kB (UEMR) 330*16kB (UEMR)
126*32kB (UEMR) 762*64kB (UEMR) 237*128kB (UEMR) 1*256kB (M) 0*512kB
0*1024kB 0*2048kB 0*4096kB = 95276kB

Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-12-01 13:13:55 +02:00
Mohammed Shafi Shajakhan f004e532a8 ath10k: remove extraneous error message in tx alloc
Remove extraneous error message in 'ath10k_htt_tx_alloc_cont_frag_desc'
as the caller 'ath10k_htt_tx_alloc' already dumps a proper error
message

Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-11-15 17:05:38 +02:00
Mohammed Shafi Shajakhan 19f338c6eb ath10k: clean up HTT tx buffer allocation and free
cleanup 'ath10k_htt_tx_alloc' by introducing the API's
'ath10k_htt_tx_alloc/free_{cont_txbuf, txdone_fifo} and
re-use them whereever needed

Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-11-15 17:05:33 +02:00
Rajkumar Manoharan 3c97f5de1f ath10k: implement NAPI support
Add NAPI support for rx and tx completion. NAPI poll is scheduled
from interrupt handler. The design is as below

 - on interrupt
     - schedule napi and mask interrupts
 - on poll
   - process all pipes (no actual Tx/Rx)
   - process Rx within budget
   - if quota exceeds budget reschedule napi poll by returning budget
   - process Tx completions and update budget if necessary
   - process Tx fetch indications (pull-push)
   - push any other pending Tx (if possible)
   - before resched or napi completion replenish htt rx ring buffer
   - if work done < budget, complete napi poll and unmask interrupts

This change also get rid of two tasklets (intr_tq and txrx_compl_task).

Measured peak throughput with NAPI on IPQ4019 platform in controlled
environment. No noticeable reduction in throughput is seen and also
observed improvements in CPU usage. Approx. 15% CPU usage got reduced
in UDP uplink case.

DL: AP DUT Tx
UL: AP DUT Rx

IPQ4019 (avg. cpu usage %)

========
                TOT              +NAPI
              ===========      =============
TCP DL       644 Mbps (42%)    645 Mbps (36%)
TCP UL       673 Mbps (30%)    675 Mbps (26%)
UDP DL       682 Mbps (49%)    680 Mbps (49%)
UDP UL       720 Mbps (28%)    717 Mbps (11%)

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-09-09 14:49:47 +03:00
Ben Greear de0170beaa ath10k: ensure txrx-compl-task is stopped when cleaning htt-tx
Otherwise, the txrx-compl-task may access some bad memory?

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-07-08 09:41:55 +03:00
Bob Copeland a66cd733a7 ath10k: fix potential null dereference bugs
Smatch warns about a number of cases in ath10k where a pointer is
null-checked after it has already been dereferenced, in code involving
ath10k private virtual interface pointers.

Fix these by making the dereference happen later.

Addresses the following smatch warnings:

drivers/net/wireless/ath/ath10k/mac.c:3651 ath10k_mac_txq_init() warn: variable dereferenced before check 'txq' (see line 3649)
drivers/net/wireless/ath/ath10k/mac.c:3664 ath10k_mac_txq_unref() warn: variable dereferenced before check 'txq' (see line 3659)
drivers/net/wireless/ath/ath10k/htt_tx.c:70 __ath10k_htt_tx_txq_recalc() warn: variable dereferenced before check 'txq->sta' (see line 52)
drivers/net/wireless/ath/ath10k/htt_tx.c:740 ath10k_htt_tx_get_vdev_id() warn: variable dereferenced before check 'cb->vif' (see line 736)
drivers/net/wireless/ath/ath10k/txrx.c:86 ath10k_txrx_tx_unref() warn: variable dereferenced before check 'txq' (see line 84)
drivers/net/wireless/ath/ath10k/wmi.c:1837 ath10k_wmi_op_gen_mgmt_tx() warn: variable dereferenced before check 'cb->vif' (see line 1825)

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-06-30 13:54:15 +03:00
Kalle Valo c4cdf753ed ath10k: move fw_features to struct ath10k_fw_file
Preparation for testmode.c to use ath10k_core_fetch_board_data_api_n().

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-04-20 20:00:25 +03:00
Rajkumar Manoharan 59465fe46e ath10k: speedup htt rx descriptor processing for tx completion
To optimize CPU usage htt rx descriptors will be reused instead of
refilling it for htt rx copy engine (CE5). To support that all htt rx
indications should be processed at same context. FIFO queue is used
to maintain tx completion status for each msdu. This helps to retain
the order of tx completion.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-04-04 17:03:07 +03:00
Rajkumar Manoharan cac085524c ath10k: move mgmt descriptor limit handle under mgmt_tx
Frames that are transmitted via MGMT_TX are using reserved descriptor
slots in firmware. This limitation is for the htt_mgmt_tx path itself,
not for mgmt frames per se. In 16 MBSSID scenario, these reserved slots
will be easy exhausted due to frequent probe responses. So for 10.4
based solutions, probe responses are limited by a threshold (24).

management tx path is separate for all except tlv based solutions. Since
tlv solutions (qca6174 & qca9377) do not support 16 AP interfaces, it is
safe to move management descriptor limitation check under mgmt_tx
function. Though CPU improvement is negligible, unlikely conditions or
never hit conditions in hot path can be avoided on data transmission.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-03-18 09:52:27 +02:00
Michal Kazior 426e10eaf7 ath10k: implement push-pull tx
The current/old tx path design was that host, at
its own leisure, pushed tx frames to the device.
For HTT there was ~1000-1400 msdu queue depth.

After reaching that limit the driver would request
mac80211 to stop queues. There was little control
over what packets got in there as far as
DA/RA was considered so it was rather easy to
starve per-station traffic flows.

With MU-MIMO this became a significant problem
because the queue depth was insufficient to buffer
frames from multiple clients (which could have
different signal quality and capabilities) in an
efficient fashion.

Hence the new tx path in 10.4 was introduced: a
pull-push mode.

Firmware and host can share tx queue state via
DMA. The state is logically a 2 dimensional array
addressed via peer_id+tid pair. Each entry is a
counter (either number of bytes or packets. Host
keeps it updated and firmware uses it for
scheduling Tx pull requests to host.

This allows MU-MIMO to become a lot more effective
with 10+ clients.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-03-06 16:31:12 +02:00
Michal Kazior c1a43d9720 ath10k: implement updating shared htt txq state
Firmware 10.4.3 onwards can support a pull-push Tx
model where it shares a Tx queue state with the
host.

The host updates the DMA region it pointed to
during HTT setup whenever number of software
queued from (on host) changes. Based on this
information firmware issues fetch requests to the
host telling the host how many frames from a list
of given stations/tids should be submitted to the
firmware.

The code won't be called because not all
appropriate HTT events are processed yet.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-03-06 16:31:11 +02:00
Michal Kazior 839ae6371e ath10k: add new htt message generation/parsing logic
This merely adds some parsing, generation and
sanity checks with placeholders for real
code/functionality to be added later.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-03-06 16:14:32 +02:00
Michal Kazior 6421969f24 ath10k: refactor tx pending management
Tx pending counter logic assumed that the sk_buff
is already known and hence was performed in HTT
functions themselves.

However, for the sake of future wake_tx_queue()
usage the driver must be able to tell whether it
can submit more frames to firmware before it
dequeues frame from ieee80211_txq (and thus long
before HTT Tx functions are called) because once a
frame is dequeued it cannot be requeud back to
mac80211.

This prepares the driver for future changes.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-03-06 16:14:25 +02:00
Michal Kazior 9b15873628 ath10k: implement basic support for new tx path firmware
This allows to use the new firmware which
implements the new tx data path. Without this
patch firmware supporting new tx path stops
responding shortly after booting.

This patch doesn't implement the entire pull-push
logic available in the new firmware. This will be
done in subsequent patches.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-01-28 10:45:29 +02:00
Michal Kazior 575fc89500 ath10k: clean up cont frag desc init code
This makes the code easier to extend and re-use.

While at it fix _warn to _err. Other than that
there are no functional changes.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2016-01-28 10:45:28 +02:00
Felix Fietkau d6cb23b514 ath10k: stop abusing GFP_DMA
Allocations from the DMA zone were originally added for legacy ISA
stuff, or PCI devices that have specific limitations in their DMA
addressing capabilities. It has no place in ath10k, which can do
full 32-bit DMA.

Fixes memory allocation errors on some platforms.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-11-30 16:54:45 +02:00
Michal Kazior aca146afbc ath10k: store msdu_id instead of txbuf pointers
Txbuf is no longer a DMA pool and can be easily
tracked with a mere msdu_id. This saves 10 bytes
on 64bit systems and 6 bytes on 32bit systems of
precious sk_buff control buffer.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-11-23 17:12:32 +02:00
Michal Kazior 609db229b4 ath10k: replace vdev_id and tid in skb cb
This prepares the driver for future ieee80211_txq
and wake_tx_queue() support.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-11-23 17:12:30 +02:00
Michal Kazior 66b8a0108d ath10k: pack up flags in skb_cb
It was wasteful to have all the flags as separate
bools.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-11-23 17:12:27 +02:00
Michal Kazior bd87744028 ath10k: remove freq from skb_cb
It was wasteful to keep it in the struct.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-11-23 17:12:26 +02:00
Michal Kazior 8a933964e8 ath10k: remove txmode from skb_cb
It was wasteful to keep it in the struct because
it can be passed as function argument down the tx
path.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-11-23 17:12:22 +02:00
Michal Kazior fd12cb3246 ath10k: merge is_protected with nohwcrypt
It was wasteful to have two flags describing
the same thing.

While at it fix code style of
ath10k_tx_h_use_hwcrypto().

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-11-23 17:12:19 +02:00
Vasanthakumar Thiagarajan d39de9919a ath10k: fix peerid configuration in htt tx desc for htt version < 3.4
Of a word in struct htt_data_tx_desc htt version >= 3.4 firmware uses
LSB 16-bit for frequency configuration which is used for offchannel tx
and MSB 16-bit is for peerid. But other firmwares using version 2.X
(10.1, 10.2.2, 10.2.4 and 10.4) are using 32-bit for peerid in htt tx
desc. So far no issue is found with the existing code setting peerid and
freq for HTT version 2.X, this could be mainly because of 0 as frequecy
(home channel) is being always passed with those firmwares. There may be
issues when non-zero freq is passed with firmware using < 3.4 htt version.
To be safe use target_version_major and target_version_minor along with
htt-op-version before configuring peer id and freq in htt tx desc. This
patch extends ath10k_mac_tx_frm_has_freq() to check for htt_op_version_tlv
and uses the helper while setting peerid in htt_tx_desc.

Fixes: 8d6d362436 ("ath10k: fix offchan reliability")
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-11-12 21:19:16 +02:00
Tamizh chelvam 90eceb3b5f ath10k: set peer MFP flag in peer assoc command
Set peer's management frame protection flag in peer assoc command,
this setting will enable/disable encrytion of management frames in fw.

Setting of this flag is based on whether MFP is enabled/disabled at STA
and a firmware feature flag ATH10K_FW_FEATURE_MFP_SUPPORT. This is because
only firmwares 10.1.561 and above have support for MFP.

Signed-off-by: Tamizh chelvam <c_traja@qti.qualcomm.com>
Signed-off-by: Manikanta pubbisetty <c_mpubbi@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-11-04 19:30:20 +02:00
Rajkumar Manoharan 3f0f7ed420 ath10k: export htt tx rx handlers
Some special copy engines delivers messages directly to HTT by
bypassing HTC layer. Hence exporting tx_completion and rx_handler
for delivering the data to HTT layer.

Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-10-16 15:49:35 +03:00
Peter Oh 683b95e807 ath10k: use pre-allocated DMA buffer in Tx
ath10k driver is using dma_pool_alloc per packet and dma_pool_free
in coresponding at Tx completion.
Use of pre-allocated DMA buffer in Tx will improve saving CPU resource
by 5% while it consumes about 56KB memory more as trade off.

Signed-off-by: Peter Oh <poh@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-10-06 15:04:12 +03:00
Kalle Valo b9e284e515 ath10k: brace style fixes
drivers/net/wireless/ath/ath10k/htt_tx.c:457: braces {} are not necessary for single statement blocks
drivers/net/wireless/ath/ath10k/htt_tx.c:545: braces {} are not necessary for single statement blocks
drivers/net/wireless/ath/ath10k/mac.c:200: braces {} are not necessary for single statement blocks

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-10-06 14:40:46 +03:00
Bob Copeland bc76c28719 ath10k: check for encryption before adding MIC_LEN
In the case of raw mode without nohwcrypt parameter, we
should still make sure the frame is protected before
adding MIC_LEN to avoid skb_under_panic errors.

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-09-17 13:48:41 +03:00
Vivek Natarajan 7b7da0a021 ath10k: drop probe responses when too many are queued
In a noisy environment, when multiple interfaces are created,
the management tx descriptors are fully occupied by the probe
responses from all the interfaces. This prevents a new station
from a successful association.

Fix this by limiting the probe responses when the specified
threshold limit is reached.

Signed-off-by: Vivek Natarajan <nataraja@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-09-09 12:37:41 +03:00
Michal Kazior 5e55e3cbd1 ath10k: fix dma_mapping_error() handling
The function returns 1 when DMA mapping fails. The
driver would return bogus values and could
possibly confuse itself if DMA failed.

Fixes: 767d34fc67 ("ath10k: remove DMA mapping wrappers")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-08-26 11:04:44 +03:00
Peter Oh ae7d3821a7 ath10k: initialize msdu ext. descriptor before use
Initial QCA99X0 support has a known issue with TCP Tx throughput.
All other path such as UDP Tx/Rx and TCP Rx meet their expectation
(> 900Mbps), but TCP Tx marked as low as 5Mbps when single pair is
used on iperf.

The root cause is turned out because TSO flag is not initialized
properly so that firmware configures TSO in wrong way.
TSO flags in msdu extension descriptor is required to be reset
to indicate firmware there is no TSO is enabled, otherwise it
could act as TSO is enabled which causes huge throughput drop.

In fact, it's enough by resetting TSO flags only to prevent the
unexpected behavior, but initializing whole msdu ext. descriptor
will help to clear uncertainty of firmware could bring on as it
constantly updated.

Signed-off-by: Peter Oh <poh@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-07-30 17:05:29 +03:00
David Liu ccec9038c7 ath10k: enable raw encap mode and software crypto engine
This patch enables raw Rx/Tx encap mode to support software based
crypto engine. This patch introduces a new module param 'cryptmode'.

 cryptmode:

   0: Use hardware crypto engine globally with native Wi-Fi mode TX/RX
      encapsulation to the firmware. This is the default mode.
   1: Use sofware crypto engine globally with raw mode TX/RX
      encapsulation to the firmware.

Known limitation:
   A-MSDU must be disabled for RAW Tx encap mode to perform well when
   heavy traffic is applied.

Testing: (by Michal Kazior <michal.kazior@tieto.com>)

     a) Performance Testing

      cryptmode=1
       ap=qca988x sta=killer1525
        killer1525  ->  qca988x     194.496 mbps [tcp1 ip4]
        killer1525  ->  qca988x     238.309 mbps [tcp5 ip4]
        killer1525  ->  qca988x     266.958 mbps [udp1 ip4]
        killer1525  ->  qca988x     477.468 mbps [udp5 ip4]
        qca988x     ->  killer1525  301.378 mbps [tcp1 ip4]
        qca988x     ->  killer1525  297.949 mbps [tcp5 ip4]
        qca988x     ->  killer1525  331.351 mbps [udp1 ip4]
        qca988x     ->  killer1525  371.528 mbps [udp5 ip4]
       ap=killer1525 sta=qca988x
        qca988x     ->  killer1525  331.447 mbps [tcp1 ip4]
        qca988x     ->  killer1525  328.783 mbps [tcp5 ip4]
        qca988x     ->  killer1525  375.309 mbps [udp1 ip4]
        qca988x     ->  killer1525  403.379 mbps [udp5 ip4]
        killer1525  ->  qca988x     203.689 mbps [tcp1 ip4]
        killer1525  ->  qca988x     222.339 mbps [tcp5 ip4]
        killer1525  ->  qca988x     264.199 mbps [udp1 ip4]
        killer1525  ->  qca988x     479.371 mbps [udp5 ip4]

      Note:
       - only open network tested for RAW vs nwifi performance comparison
       - killer1525 (qca6174 hw2.2) is 2x2 device (hence max 866mbps)
       - used iperf
       - OTA, devices a few cm apart from each other, no shielding
       - tcpX/udpX, X - means number of threads used

      Overview:
       - relative Tx performance drop is seen but is within reasonable and
         expected threshold (A-MSDU must be disabled with RAW Tx)

     b) Connectivity Testing

      cryptmode=1
       ap=iwl6205 sta1=qca988x crypto=open     topology-1ap1sta          OK
       ap=iwl6205 sta1=qca988x crypto=wep1     topology-1ap1sta          OK
       ap=iwl6205 sta1=qca988x crypto=wpa      topology-1ap1sta          OK
       ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta          OK
       ap=qca988x sta1=iwl6205 crypto=open     topology-1ap1sta          OK
       ap=qca988x sta1=iwl6205 crypto=wep1     topology-1ap1sta          OK
       ap=qca988x sta1=iwl6205 crypto=wpa      topology-1ap1sta          OK
       ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta          OK
       ap=iwl6205 sta1=qca988x crypto=open     topology-1ap1sta2br       OK
       ap=iwl6205 sta1=qca988x crypto=wep1     topology-1ap1sta2br       OK
       ap=iwl6205 sta1=qca988x crypto=wpa      topology-1ap1sta2br       OK
       ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta2br       OK
       ap=qca988x sta1=iwl6205 crypto=open     topology-1ap1sta2br       OK
       ap=qca988x sta1=iwl6205 crypto=wep1     topology-1ap1sta2br       OK
       ap=qca988x sta1=iwl6205 crypto=wpa      topology-1ap1sta2br       OK
       ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta2br       OK
       ap=iwl6205 sta1=qca988x crypto=open     topology-1ap1sta2br1vlan  OK
       ap=iwl6205 sta1=qca988x crypto=wep1     topology-1ap1sta2br1vlan  OK
       ap=iwl6205 sta1=qca988x crypto=wpa      topology-1ap1sta2br1vlan  OK
       ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta2br1vlan  OK
       ap=qca988x sta1=iwl6205 crypto=open     topology-1ap1sta2br1vlan  OK
       ap=qca988x sta1=iwl6205 crypto=wep1     topology-1ap1sta2br1vlan  OK
       ap=qca988x sta1=iwl6205 crypto=wpa      topology-1ap1sta2br1vlan  OK
       ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta2br1vlan  OK

      Note:
       - each test takes all possible endpoint pairs and pings
       - each pair-ping flushes arp table
       - ip6 is used

     c) Testbed Topology:

      1ap1sta:
        [ap] ---- [sta]

        endpoints: ap, sta

      1ap1sta2br:
        [veth0] [ap] ---- [sta] [veth2]
           |     |          |     |
        [veth1]  |          \   [veth3]
            \   /            \  /
            [br0]            [br1]

        endpoints: veth0, veth2, br0, br1
        note: STA works in 4addr mode, AP has wds_sta=1

      1ap1sta2br1vlan:
        [veth0] [ap] ---- [sta] [veth2]
           |     |          |     |
        [veth1]  |          \   [veth3]
            \   /            \  /
          [br0]              [br1]
            |                  |
          [vlan0_id2]        [vlan1_id2]

        endpoints: vlan0_id2, vlan1_id2
        note: STA works in 4addr mode, AP has wds_sta=1

Credits:

    Thanks to Michal Kazior <michal.kazior@tieto.com> who helped find the
    amsdu issue, contributed a workaround (already squashed into this
    patch), and contributed the throughput and connectivity tests results.

Signed-off-by: David Liu <cfliu.tw@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Tested-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-07-29 11:28:24 +03:00
Qi Zhou 005fb16131 ath10k: Improve performance by reducing tx_lock contention
During tx completion, tx_lock is held for longer than required, preventing
efficient refill of htt->pending_tx. Refactor the code so that only MSDU
related operations are protected by the lock.

Improves downstream performance on a dual-core ARM Freescale LS1024A
(f.k.a. Mindspeed Comcerto 2000) AP with a 3x3 client from 495 to 580 Mbps.
Other CPU bound multicore systems may also benefit.

Signed-off-by: Denton Gentry <dgentry@google.com>
Signed-off-by: Avery Pennarun <apenwarr@google.com>
[mfaltesek@google.com: removed conflicting code for tracking msdu_ids.]
Signed-off-by: Marty Faltesek <mfaltesek@google.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-07-29 11:27:24 +03:00
Raja Mani 1d0088f8c1 ath10k: extend struct htt_mgmt_tx_dec for qca99x0
HTT_H2T_MSG_TYPE_MGMT_TX msg in 10.4 firmware carries additional
4 byte in htt_mgmt_tx_desc where it tells to firmware that at what
rate mgmt frame has to go out in the air. It's an optional parameter,
setting this field to zero will force firmware to choose auto rate
and send the frame out.

Those 4 byte info is missed out in the current code and 10.4 firmware
ended up reading some junk in those 4 byte and sometime malfunctioning.

Fix it by adding 4 byte in struct htt_mgmt_tx_desc. Non 10.4 firmware
will not process those four byte. So, adding 4 byte at the end of
struct htt_mgmt_tx_desc will not create any impact on other chipset.

Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-07-24 11:10:43 +03:00
Manikanta Pubbisetty b963519509 ath10k: add TCP/UDP Checksum offload support for QCA99x0
The patch adds support to offload TCP/UDP checksum
calculations for QCA99x0.

Signed-off-by: Manikanta Pubbisetty <c_mpubbi@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-07-24 11:08:44 +03:00
Peter Oh fbc03a466f ath10k: update tx path to support QCA99X0
Since QCA99X0 uses fragmentation descriptor differently from
other ones on tx path, we need to handle it separately.

QCA99X0 is using 48 bits for address and 16 bits for length
out of 2 dword and each values have to be programmed by frag
desc base addr + msdu id, so that hardware can retrieve
corresponding frag data.

Signed-off-by: Peter Oh <poh@qca.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-07-24 10:44:44 +03:00
Raja Mani d9156b5f68 ath10k: configure frag desc memory to target for qca99X0
Pre qca99X0 chipsets follows the model where dynamically allocate
memory for frag desc on getting new skb for TX. But, this is not
going to be the case in qca99X0. It expects frag desc memory to be
allocated at boot time and let the driver to reuse allocated memory
after every TX completion. So there won't be any dynamic frag memory
memory allocation in qca99X0 during data transmission.

qca99X0 hardware doesn't need fragment desc address to be programmed
in msdu descriptor for every data transaction. It needs to know only
starting address of fragment descriptor at the time of the boot.
During data transmission, qca99X0 hardware can retrieve corresponding
frag addr by adding programmed frag desc base addr + msdu id.

Allocate continuous fragment descriptor memory (same size as number of
descriptor) at the time of target initialization and configure allocated
dma address to the target via HTT_H2T_MSG_TYPE_FRAG_DESC_BANK_CFG.

How this is allocated continuous memory is going to be used is not
covered in this patch. It just allocates memory and hand over to firmware.
If we don't do it at init time, qca99X0 will stall when firmware tries
to do TX.

Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-07-02 08:47:23 +03:00
Michal Kazior 96d828d45e ath10k: rework tx queue locking
Tx queue locking was very simple until now.
Multi-channel support will require a more flexible
and fine grained control.

This introduces a per-hw and per-vif (each with a
bitmask of reasons) tx queue locking.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-04-01 20:31:00 +03:00
Michal Kazior d740d8fd24 ath10k: unify tx mode and dispatch
There are a few different tx paths depending on
firmware and frame itself.

Creating a uniform decision will make it possible
to switch between different txmode easier, both
for testing and for future features as well.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Marek Puzyniak <marek.puzyniak@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-03-30 15:00:25 +03:00
Helmut Schaa 75930d1a80 ath10k: Use TX cksum offload only for CHECKSUM_PARTIAL
Otherwise ath10k will just checksum everything even if it did not
go through the TCP/IP stack (for example bridged frames). In the worst
case this could mean recreating the checksum for incorrect data.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-02-04 09:19:03 +02:00
Marek Kwaczynski eebc67fef3 ath10k: fix pmf for wmi-tlv on qca6174
New wmi-tlv firmware uses HTT 3.0 protocol which
uses TX_FRM command for management frames (instead
of a dedicated command). To support PMF it is
necessary to provide explicit tailroom.

Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-01-27 16:25:23 +02:00
Michal Kazior 89d6d83565 ath10k: use idr api for msdu_ids
HTT Tx protocol uses arbitrary host assigned ids
too associate with MSDUs when delivering
completions.

Instead of rolling out own id generation scheme
use the tools provided in kernel.

This should have little to no effect on
performance.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-01-27 16:16:59 +02:00
Julia Lawall 8be3b69259 ath10k: fix error return code
Return a negative error code on failure.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret; expression e1,e2;
@@
(
if (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2015-01-13 16:15:57 +02:00
Kalle Valo 91ad5f56f6 ath10k: set max_num_pending_tx in ath10k_core_init_firmware_features()
Better to have this in same place as other firmware interface handling.

Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2014-12-08 17:38:44 +02:00
Michal Kazior 8d6d362436 ath10k: fix offchan reliability
New firmware revisions don't need peer creation
when doing offchannel tx. Earlier revisions would
queue and never release frames without a peer.

This prevent new firmware revisions from stopping
replenishing wmi-htc tx credits and improves
reliability of offchannel tx which would sometimes
silently fail.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2014-11-26 08:29:50 +02:00
Rajkumar Manoharan 5ce8e7fdcc ath10k: handle ieee80211 header and payload tracing separately
For packet log, the transmitted frame 802.11 header alone is sufficient.
Recording entire packet is also consuming lot of disk space. To optimize
this, tx and rx data tracepoints are splitted into header and payload
tracepoints.

To record tx ieee80211 headers

     trace-cmd record -e ath10k_tx_hdr

To record complete packets

     trace-cmd record -e ath10k_tx_hdr -e ath10k_tx_payload

Cc: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2014-11-17 16:38:31 +02:00
Michal Kazior 7962b0d898 ath10k: speed up hw recovery
In some cases hw recovery was taking an absurdly
long time due to ath10k waiting for things that
would never really complete.

Instead of waiting for inevitable timeouts poke
all completions and wakequeues and check if it's
still worth waiting.

Reading/writing ar->state requires conf_mutex.
Since waiters might be holding it introduce a new
flag CRASH_FLUSH so it's possible to tell waiters
to abort whatever they were waiting for.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2014-10-31 02:32:28 +02:00
Rajkumar Manoharan 9b57f88f1e ath10k: add tracing for frame transmission
Add tracing support to forward management and data frames to
user space for packet inspection.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2014-10-08 13:26:42 +03:00
Rajkumar Manoharan d1e50f4703 ath10k: add tracing for tx info
The tx info such as msdu_id, frame len, vdev id and tid are reported
to user space by tracepoint. This is useful for collecting tx
statistics.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
2014-10-07 17:10:47 +03:00