Commit e82b0b3828 ("spi: bcm2835: Fix race on DMA termination") broke
the build with COMPILE_TEST=y on arches whose cmpxchg() requires 32-bit
operands (xtensa, older arm ISAs).
Fix by changing the dma_pending flag's type from bool to unsigned int.
Fixes: e82b0b3828 ("spi: bcm2835: Fix race on DMA termination")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
When in DMA mode, the BCM2835 SPI controller requires that the FIFO is
accessed in 4 byte chunks. This rule is not fulfilled if a transfer
consists of multiple sglist entries, one per page, and the first entry
starts in the middle of a page with an offset not a multiple of 4.
The driver currently falls back to programmed I/O for such transfers,
incurring a significant performance penalty.
Overcome this hardware limitation by transferring the first few bytes of
a transfer without DMA such that the remainder of the first sglist entry
becomes a multiple of 4. Specifics are provided in kerneldoc comments.
An alternative approach would have been to split transfers in the
->prepare_message hook, but this may necessitate two transfers per page,
defeating the goal of clustering multiple pages together in a single
transfer for efficiency. E.g. if the first TX sglist entry's length is
23 and the first RX's is 40, the first transfer would send and receive
23 bytes, the second 40 - 23 = 17 bytes, the third 4096 - 17 = 4079
bytes, the fourth 4096 - 4079 = 17 bytes and so on. In other words,
O(n) transfers are necessary (n = number of sglist entries), whereas
the algorithm implemented herein only requires O(1) additional work.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Document the driver's data structure to lower the barrier to entry for
contributors.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Commit a30a555d74 ("spi: bcm2835: transform native-cs to gpio-cs on
first spi_setup") disabled the use of hardware-controlled native Chip
Select in favour of software-controlled GPIO Chip Select but left code
to support the former untouched. Remove it to simplify the driver and
ease the addition of new features and further optimizations.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
If a DMA transfer finishes orderly right when spi_transfer_one_message()
determines that it has timed out, the callbacks bcm2835_spi_dma_done()
and bcm2835_spi_handle_err() race to call dmaengine_terminate_all(),
potentially leading to double termination.
Prevent by atomically changing the dma_pending flag before calling
dmaengine_terminate_all().
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Fixes: 3ecd37edaa ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
Cc: stable@vger.kernel.org # v4.2+
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
If submission of a DMA TX transfer succeeds but submission of the
corresponding RX transfer does not, the BCM2835 SPI driver terminates
the TX transfer but neglects to reset the dma_pending flag to false.
Thus, if the next transfer uses interrupt mode (because it is shorter
than BCM2835_SPI_DMA_MIN_LENGTH) and runs into a timeout,
dmaengine_terminate_all() will be called both for TX (once more) and
for RX (which was never started in the first place). Fix it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Fixes: 3ecd37edaa ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
Cc: stable@vger.kernel.org # v4.2+
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The IRQ handler bcm2835_spi_interrupt() first reads as much as possible
from the RX FIFO, then writes as much as possible to the TX FIFO.
Afterwards it decides whether the transfer is finished by checking if
the TX FIFO is empty.
If very few bytes were written to the TX FIFO, they may already have
been transmitted by the time the FIFO's emptiness is checked. As a
result, the transfer will be declared finished and the chip will be
reset without reading the corresponding received bytes from the RX FIFO.
The odds of this happening increase with a high clock frequency (such
that the TX FIFO drains quickly) and either passing "threadirqs" on the
command line or enabling CONFIG_PREEMPT_RT_BASE (such that the IRQ
handler may be preempted between filling the TX FIFO and checking its
emptiness).
Fix by instead checking whether rx_len has reached zero, which means
that the transfer has been received in full. This is also more
efficient as it avoids one bus read access per interrupt. Note that
bcm2835_spi_transfer_one_poll() likewise uses rx_len to determine
whether the transfer has finished.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Fixes: e34ff011c7 ("spi: bcm2835: move to the transfer_one driver model")
Cc: stable@vger.kernel.org # v4.1+
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The license text is specifying GPL v2 or later but the MODULE_LICENSE
is set to GPL v2 which means GNU Public License v2 only. So choose the
license text as the correct one.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Acked-by: Florian Kauer <florian.kauer@koalo.de>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
This should be fixed by commit 4c02cba18c
("pinctrl: bcm2835: Fix initial value for direction_output")
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Acked-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Change the initialization order of the HW so that the interrupt
is only requested after the HW is initialized
Also the use of irq_of_parse_and_map is replaced by platform_get_irq.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
There is a bug in the alignment checking of transfers,
that results in DMA not being used for un-aligned
transfers that do not cross page-boundries, which is valid.
This is due to a missconception of the meaning PAGE_MASK
when implementing that check originally - (PAGE_SIZE - 1)
should have been used instead.
Also fixes a copy/paste error.
Reported-by: <robert@axium.co.nz>
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
This resulted in the use of polling mode when other approaches
(dma or interrupts) would have been more appropriate.
Happened for transfers longer than 477 bytes.
Reported-by: Noralf Tronnes <noralf@tronnes.org>
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
When using reverse polarity for clock (spi-cpol) on a device
the clock line gets altered after chip-select has been asserted
resulting in an additional clock beat, which confuses hardware.
This did not show when using native-CS, as the same register
is used to control cs as well as polarity, so the changes came
into effect at the same time. Unfortunately this is not true
with gpio-cs.
To avoid this situation this patch moves the setup of polarity
(spi-cpol and spi-cpha) outside of the chip-select into
prepare_message, which is run prior to asserting chip-select.
Also fixes resetting 3-wire mode after use of rx-mode, so that
a 3-Wire sequence TX, RX, TX works as well (right now it runs
TX, RX, RX instead)
Reported-by: Noralf Tronnes <noralf@tronnes.org>
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
fixes several warnings/error emmitted by the kbuild system:
* warn: cast from pointer to integer of different size
using size_t instead of u32
* error: 'SZ_4K' undeclared
moved to PAGE_SIZE and PAGE_MASK instead
Review showed also a typo in the same code where tx_buff
was checked twice instead of checking both rx and tx_buff.
Reported by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Conditions per spi_transfer are:
* transfer.len >= 96 bytes (to avoid mapping overhead costs)
* transfer.len < 65536 bytes (limitaion by spi-hw block - could get extended)
* an individual scatter/gather transfer length must be a multiple of 4
for anything but the last transfer - spi-hw block limit.
(some shortcut has been taken in can_dma to avoid unnecessary mapping of
pages which, for which there is a chance that there is a split with a
transfer length not a multiple of 4)
If it becomes a necessity these restrictions can get removed by additional
code.
Note that this patch requires a patch to dma-bcm2835.c by Noralf to
enable scatter-gather mode inside the dmaengine, which has not been
merged yet.
That is why no patch to arch/arm/boot/dts/bcm2835.dtsi is included - the
code works as before without dma when tx/rx are not set, but it writes
a message warning about dma not used:
spi-bcm2835 20204000.spi: no tx-dma configuration found - not using dma mode
To enable dma-mode add the following lines to the device-tree:
dmas = <&dma 6>, <&dma 7>;
dma-names = "tx", "rx";
Tested-by: Noralf Trønnes <noralf@tronnes.org> (private communication)
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The polling mode of the driver is designed for transfers that run
less than 30us - it will only execute under those circumstances.
So it should run comfortably without getting interrupted by the
scheduler.
But there are situations where the raspberry pi is so overloaded
that it can take up to 80 jiffies until the polling thread gets
rescheduled - this has been observed especially under heavy
IO situations.
In such a situation we now fall back to the interrupt handler and
log the situation at debug level.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The way that the timeout code is written in the polling function
the timeout does also trigger when interrupted or rescheduled while
in the polling loop.
This patch changes the timeout from effectively 20ms (=2 jiffies) to
1 second and removes the time that the transfer really takes out of
the computation, as - per design - this is <30us and the jiffie resolution
is 10ms so that does not make any difference what so ever.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
In cases of short transfer times the CPU is spending lots of time
in the interrupt handler and scheduler to reschedule the worker thread.
Measurements show that we have times where it takes 29.32us to between
the last clock change and the time that the worker-thread is running again
returning from wait_for_completion_timeout().
During this time the interrupt-handler is running calling complete()
and then also the scheduler is rescheduling the worker thread.
This time can vary depending on how much of the code is still in
CPU-caches, when there is a burst of spi transfers the subsequent delays
are in the order of 25us, so the value of 30us seems reasonable.
With polling the whole transfer of 4 bytes at 10MHz finishes after 6.16us
(CS down to up) with the real transfer (clock running) taking 3.56us.
So the efficiency has much improved and is also freeing CPU cycles,
reducing interrupts and context switches.
Because of the above 30us seems to be a reasonable limit for polling.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Transforms the bcm-2835 native SPI-chip select to their gpio-cs equivalent.
This allows for some support of some optimizations that are not
possible due to HW-gliches on the CS line - especially filling
the FIFO before enabling SPI interrupts (by writing to CS register)
while the transfer is already in progress (See commit: e3a2be3030)
This patch also works arround some issues in bcm2835-pinctrl which does not
set the value when setting the GPIO as output - it just sets up output and
(typically) leaves the GPIO as low. When a fix for this is merged then this
gpio_set_value can get removed from bcm2835_spi_setup.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
To reduce the number of interrupts/message we fill the FIFO before
enabling interrupts - for short messages this reduces the interrupt count
from 2 to 1 interrupt.
There have been rare cases where short (<200ns) chip-select switches with
native CS have been observed during such operation, this is why this
optimization is only enabled for GPIO-CS.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Tested-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
This also allows for GPIO-CS to get used removing the limitation of
2/3 SPI devises on the SPI bus.
Fixes: spi-cs-high with native CS with multiple devices on the spi-bus
resetting the chip selects to "normal" polarity after a finished
transfer.
No other functionality/improvements added.
Tested with the following 4 devices on the spi-bus:
* mcp2515 with native CS
* mcp2515 with gpio CS
* fb_st7735r with native CS
(plus spi-cs-high via transistor inverting polarity)
* enc28j60 with gpio-CS
Tested-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The official documentation is wrong in this respect.
Has been tested empirically for dividers 2-1024
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Implement the recommendation from the BCM2835 data-sheet
with regards to polling drivers to fill/drain the FIFO as much data as possible
also for the interrupt-driven case (which this driver is making use of).
This means that for long transfers (>64bytes) we need one interrupt
every 64 bytes instead of every 12 bytes, as the FIFO is 16 words (not bytes) wide.
Tested with mcp251x (can bus), fb_st7735 (TFT framebuffer device)
and enc28j60 (ethernet) drivers.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The following errors/warnings issued by checkpatch.pl --strict have been fixed:
drivers/spi/spi-bcm2835.c:182: CHECK: Alignment should match open parenthesis
drivers/spi/spi-bcm2835.c:191: CHECK: braces {} should be used on all arms of this statement
drivers/spi/spi-bcm2835.c:234: CHECK: Alignment should match open parenthesis
drivers/spi/spi-bcm2835.c:256: CHECK: Alignment should match open parenthesis
drivers/spi/spi-bcm2835.c:271: CHECK: Alignment should match open parenthesis
drivers/spi/spi-bcm2835.c:346: CHECK: Alignment should match open parenthesis
total: 0 errors, 0 warnings, 6 checks, 403 lines checked
In 2 locations the arguments had to get split/moved to the next line so that the
line width stays below 80 chars.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The purpose of commit 1e8a52e18c
"spi: By default setup spi_masters with 1 chipselect and dynamics bus number"
is to avoid setting default value for bus_num and num_chipselect in spi master
drivers. So let's remove the duplicate code.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-By: David Daney <david.daney@cavium.com>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Once a spi_master_get() call succeeds, we need an additional
spi_master_put() call to free the memory, otherwise we will
leak a reference to master. Fix by removing the unnecessary
spi_master_get() call.
Fixes: 247263dba2 ('spi: bcm2835: use devm_spi_register_master()')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mark Brown <broonie@linaro.org>
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.
[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use devm_spi_register_master() to make cleanup paths simpler,
and remove a duplicate put.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
The call to spi_unregister_master results in device memory being freed, it must
no longer be accessed afterwards. Thus call spi_master_get() to get an extra
reference to the device and call spi_master_put() only after the last access to
device data.
Note, current code has an extra spi_master_put() call in bcm2835_spi_remove().
Thus this patch just adds an spi_master_get() to balance the reference count.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
We have a SPI_BPW_MASK macro defined in spi.h, use it instead of open-coded.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Replace a call to deprecated devm_request_and_ioremap by devm_ioremap_resource.
Found with coccicheck and this semantic patch:
scripts/coccinelle/api/devm_request_and_ioremap.cocci.
Signed-off-by: Laurent Navet <laurent.navet@gmail.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
This driver only supports bits_per_word==8, so inform the SPI core of
this. Remove all the open-coded validation that's no longer needed.
Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
The BCM2835 contains two forms of SPI master controller (one known
simply as SPI0, and the other known as the "Universal SPI Master", in
the auxilliary block) and one form of SPI slave controller. This patch
adds support for the SPI0 controller.
This driver is taken from Chris Boot's repository at
git://github.com/bootc/linux.git rpi-linear
as of commit 6de2905 "spi-bcm2708: fix printf with spurious %s".
In the first SPI-related commit there, Chris wrote:
Thanks to csoutreach / A Robinson for his driver which I used as an
inspiration. You can find his version here:
http://piface.openlx.org.uk/raspberry-pi-spi-kernel-driver-available-for
Changes made during upstreaming:
* Renamed bcm2708 to bcm2835 as per upstream naming for this SoC.
* Removed support for brcm,realtime property.
* Increased transfer timeout to 30 seconds.
* Return IRQ_NONE from the IRQ handler if no interrupt was handled.
* Disable TA (Transfer Active) and clear FIFOs on a transfer timeout.
* Wrote device tree binding documentation.
* Request unnamed clock rather than "sys_pclk"; the DT will provide the
correct clock.
* Assume that tfr->speed_hz and tfr->bits_per_word are always set in
bcm2835_spi_start_transfer(), bcm2835_spi_transfer_one(), so no need
to check spi->speed_hz or tft->bits_per_word.
* Re-ordered probe() to remove the need for temporary variables.
* Call clk_disable_unprepare() rather than just clk_unprepare() on probe()
failure.
* Don't use devm_request_irq(), to ensure that the IRQ doesn't fire after
we've torn down the device, but not unhooked the IRQ.
* Moved probe()'s call to clk_prepare_enable() so we can be sure the clock
is enabled if the IRQ handler fires immediately.
* Remove redundant checks from bcm2835_spi_check_transfer() and
bcm2835_spi_setup().
* Re-ordered IRQ handler to check for RXR before DONE. Added comments to
ISR.
* Removed empty prepare/unprepare implementations.
* Removed use of devinit/devexit.
* Added BCM2835_ prefix to defines.
Signed-off-by: Chris Boot <bootc@bootc.net>
Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>