In cases of short transfer times the CPU is spending lots of time
in the interrupt handler and scheduler to reschedule the worker thread.
Measurements show that we have times where it takes 29.32us to between
the last clock change and the time that the worker-thread is running again
returning from wait_for_completion_timeout().
During this time the interrupt-handler is running calling complete()
and then also the scheduler is rescheduling the worker thread.
This time can vary depending on how much of the code is still in
CPU-caches, when there is a burst of spi transfers the subsequent delays
are in the order of 25us, so the value of 30us seems reasonable.
With polling the whole transfer of 4 bytes at 10MHz finishes after 6.16us
(CS down to up) with the real transfer (clock running) taking 3.56us.
So the efficiency has much improved and is also freeing CPU cycles,
reducing interrupts and context switches.
Because of the above 30us seems to be a reasonable limit for polling.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Transforms the bcm-2835 native SPI-chip select to their gpio-cs equivalent.
This allows for some support of some optimizations that are not
possible due to HW-gliches on the CS line - especially filling
the FIFO before enabling SPI interrupts (by writing to CS register)
while the transfer is already in progress (See commit: e3a2be3030)
This patch also works arround some issues in bcm2835-pinctrl which does not
set the value when setting the GPIO as output - it just sets up output and
(typically) leaves the GPIO as low. When a fix for this is merged then this
gpio_set_value can get removed from bcm2835_spi_setup.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
When the CONTINUE bit is set, the interrupt status we are polling to
identify if a transaction has finished can be sporadic. Even though
the transfer has finished, the interrupt status may erroneously
indicate that there is still data in the FIFO. This behaviour causes
random timeouts in large PIO transfers.
Instead of using the CONTINUE bit to control the CS lines, use the SPI
core's CS GPIO handling. Also, now that the CONTINUE bit is not being
used, we can poll for the ALLDONE interrupt to indicate transfer
completion.
Signed-off-by: Sifan Naeem <sifan.naeem@imgtec.com>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@imgtec.com>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Imagination has recommended that the SPFI controller be reset after
each message, regardless of success or failure. Do this in an
unprepare_message() callback.
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The driver can be greatly simplified by moving the transfer timeout
handling to a handle_err() callback.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@imgtec.com>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Setting the transfer length in the TRANSACTION register after the
CONTROL register is programmed causes intermittent timeout issues in
SPFI transfers when using the SPI framework to control the CS GPIO
lines. To avoid this issue, set transfer length before programming
the CONTROL register.
Signed-off-by: Sifan Naeem <sifan.naeem@imgtec.com>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
If a driver doesn't implement the master->handle_err() callback and an
SPI transfer fails, the kernel will crash with a NULL pointer
dereference:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0003000
[00000000] *pgd=80000040004003, *pmd=00000000
Internal error: Oops: 80000206 [#1] SMP ARM
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc7-koelsch-05861-g1fc9fdd4add4f783 #1046
Hardware name: Generic R8A7791 (Flattened Device Tree)
task: eec359c0 ti: eec54000 task.ti: eec54000
PC is at 0x0
LR is at spi_transfer_one_message+0x1cc/0x1f0
Make the master->handle_err() callback optional to avoid the crash.
Also fix a spelling mistake in the callback documentation while we're at
it.
Fixes: b716c4ffc6 ("spi: introduce master->handle_err() callback")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Mark Brown <broonie@kernel.org>
Although the SPFI BITCLK divider supports a value of up to 255, only
values up to 128 are usable. This results in a maximum possible bit
clock rate of 1/4th the input clock rate.
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
In preparation for switching to using the SPI core's CS GPIO handling,
move setup of the PORT_STATE register, which must be configured before
CS is asserted, to a prepare_message() callback.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@imgtec.com>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Add delay between chip select and clock signals, before clock starts and
after clock stops.
Signed-off-by: Aaron Brice <aaron.brice@datasoft.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Previous algorithm had an outer loop with the values {2,3,5,7} and an
inner loop with {2,4,6,8,16,32,...,32768}, and would pick the first
value over the required scaling value (where the total scale was the two
numbers multiplied).
Since the inner loop went up to 32768 it would always pick a value of 2
for PBR and a much higher than necessary value for BR. The desired
scale factor was being divided by two I believe to compensate for the
much higher scale factors (the divide by two not specified in the
reference manual).
Updated to check all values and find the smallest scale factor possible
without going over the desired clock rate.
Signed-off-by: Aaron Brice <aaron.brice@datasoft.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
We need "ret" to be unsigned for the error handling to work. The
signedness of "i" and "n" don't matter but qspi_set_send_trigger()
returns an int so I've changed them to int as well.
Fixes: 4b6fe3edcb ('spi: Using Trigger number to transmit/receive data')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
They are used to decide if the controller can do DMA on a buffer
of a specific length and thus are needed before any transfer is attempted.
This fixes a memory leak where the SPI core uses the drivers can_dma()
callback to determine if a buffer needs to be mapped. As the watermark
levels aren't correct at that point the driver falsely claims to be able to
DMA the buffer when it fact it isn't.
After the transfer has been done the core uses the same callback to
determine if it needs to unmap the buffers. As the driver now correctly
claims to not being able to DMA the buffer the core doesn't attempt to
unmap the buffer which leaves the SGT leaking.
Fixes: f62caccd12 (spi: spi-imx: add DMA support)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
cr_width may be not initialized before using by cr, the related warning
(with defconfig under blackfin by gcc5):
CC drivers/spi/spi-bfin5xx.o
drivers/spi/spi-bfin5xx.c: In function 'bfin_spi_pump_transfers':
drivers/spi/spi-bfin5xx.c:655:5: warning: 'cr_width' may be used uninitialized in this function [-Wmaybe-uninitialized]
cr |= cr_width;
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The current implementation of bitbang_txrx_be_cpha0 and
bitbang_txrx_be_cpha1 always call setmosi. That runs into several
unnecessary calls into the gpiolib when the level of the GPIO actually
has not to be changed.
This patch changes the routines to remember the last GPIO level
and only calls setmosi if an change has to be made. This
way it improves the transfer throughput.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
We refactored this code but accidentally left out a break statement so
QUARK_X1000_SSP isn't handled correctly.
Fixes: 025ffe88ee ('spi: pxa2xx: shift clk_div in one place')
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Previous algorithm had an outer loop with the values {2,3,5,7} and an
inner loop with {2,4,6,8,16,32,...,32768}, and would pick the first
value over the required scaling value (where the total scale was the two
numbers multiplied).
Since the inner loop went up to 32768 it would always pick a value of 2
for PBR and a much higher than necessary value for BR. The desired
scale factor was being divided by two I believe to compensate for the
much higher scale factors (the divide by two not specified in the
reference manual).
Updated to check all values and find the smallest scale factor possible
without going over the desired clock rate.
Signed-off-by: Aaron Brice <aaron.brice@datasoft.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
In order to transmit and receive data when have 32 bytes of data that
ready has prepared on Transmit/Receive Buffer to transmit or receive.
Instead transmits/receives a byte data using Transmit/Receive Buffer
Data Triggering Number will improve the speed of transfer data.
Signed-off-by: Hiep Cao Minh <cm-hiep@jinso.co.jp>
Signed-off-by: Mark Brown <broonie@kernel.org>
To reduce the number of interrupts/message we fill the FIFO before
enabling interrupts - for short messages this reduces the interrupt count
from 2 to 1 interrupt.
There have been rare cases where short (<200ns) chip-select switches with
native CS have been observed during such operation, this is why this
optimization is only enabled for GPIO-CS.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Tested-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Since spidev is a detail of how Linux controls a device rather than a
description of the hardware in the system we should never have a node
described as "spidev" in DT, any SPI device could be a spidev so this
is just not a useful description.
In order to help prevent users from writing such device trees generate a
warning if spidev is instantiated as a DT node without an ID in the match
table.
Signed-off-by: Mark Brown <broonie@kernel.org>
This also allows for GPIO-CS to get used removing the limitation of
2/3 SPI devises on the SPI bus.
Fixes: spi-cs-high with native CS with multiple devices on the spi-bus
resetting the chip selects to "normal" polarity after a finished
transfer.
No other functionality/improvements added.
Tested with the following 4 devices on the spi-bus:
* mcp2515 with native CS
* mcp2515 with gpio CS
* fb_st7735r with native CS
(plus spi-cs-high via transistor inverting polarity)
* enc28j60 with gpio-CS
Tested-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
We have found that we can sometimes see read failures on boards with
high-capacitance SPI lines. It seems that the controller samples the Rx
data line too early, and its register interface has an "Rx Sample Delay"
setting to fine-tune against this issue.
This patch adds a new optional device tree entry that can configure this
delay in terms of nanoseconds. The kernel will calculate the
best-fitting amount of parent clock ticks to program the controller with
based on that.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
The Rockchip SPI driver currently calculates its clock rate divisor by
integer dividing the parent rate by the target rate, and then rounding
the result up to the next even number (since the divisor must be
even).
Clock rate divisors should always be rounded up, so that the resulting
frequency is lower or equal to the target. This is correctly done in the
second step here but not in the first, so we still have a risk of
exceeding the desired target frequency (e.g. setting spi-max-frequency
to 40000000 with a parent clock of 99000000 could lead to a divisor of
99000000 / 40000000 == 2 (which is even) that then results in an
effective frequency of 99000000 / 2 == 49500000 (potentially exceeding
the flash chip's specifications).
This patch changes the division to round up to fix this problem.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Trying to register an SPI device asynchronously (via async_schedule() call)
results in an ugly complaint from request_module() warning about potential
deadlock (because request_module tries to wait for async works to
complete, the caller is also an async work in this case).
While we could try to switch to using request_module_nowait(), other buses,
as well as SPI itself when not using device tree, do not try to load
modules explicitly, but rather rely on the standard infrastructure (such as
udev) to execute module loading. There is no reason why SPI OF-described
devices should be treated differently.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The commit 1a7b7ee72c (spi: Ensure that CS line is in non-active state after
spi_setup()) introduces an unconditional call of spi_set_cs() before ->setup().
The dw_spi_set_cs() relies on that fact that ->setup() is already called, but
it doesn't now. This patch fixes the crash by adding an additional check to
dw_spi_set_cs().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The Quark SoC data sheet describes the baud rate setting using fractional
divider. The subset of possible values represented by a table suggests that the
divisor has one block that could divide by 5. This explains the number of the
beast in some cases in the table. Thus, in this particular case the divisor can
be evaluated as
5^i * 2^j * 2 * k,
where
i = [0, 1]
j = [0, 23]
k = [1, 256]
There are few cases as mentioned in the data sheet, i.e. better form of the
clock signal will be in case if DDS_CLK_RATE either 2^n or 2/5. It's also
possible to use any value that is less or equal to 0x33333 (1/5/16 = 1/80).
All three cases are compared to each other and the one that suits better is
chosen by the approximation algorithm. Anyone can play with the script [1] that
represents the algorithm.
[1] https://gist.github.com/06b084488b3629898121
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This patch refactors ssp_get_clk_div() and pxa2xx_ssp_get_clk_div() to align
clk_div calculations, i.e. ssp_get_clk_div() and quark_x1000_set_clk_regvals()
will return plain clk_div and it will be shifted to proper position in
pxa2xx_ssp_get_clk_div().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
`spidev_message()` sums the lengths of the individual SPI transfers to
determine the overall SPI message length. It restricts the total
length, returning an error if too long, but it does not check for
arithmetic overflow. For example, if the SPI message consisted of two
transfers and the first has a length of 10 and the second has a length
of (__u32)(-1), the total length would be seen as 9, even though the
second transfer is actually very long. If the second transfer specifies
a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could
overrun the spidev's pre-allocated tx buffer before it reaches an
invalid user memory address. Fix it by checking that neither the total
nor the individual transfer lengths exceed the maximum allowed value.
Thanks to Dan Carpenter for reporting the potential integer overflow.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Mark Brown <broonie@kernel.org>
The official documentation is wrong in this respect.
Has been tested empirically for dividers 2-1024
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Implement the recommendation from the BCM2835 data-sheet
with regards to polling drivers to fill/drain the FIFO as much data as possible
also for the interrupt-driven case (which this driver is making use of).
This means that for long transfers (>64bytes) we need one interrupt
every 64 bytes instead of every 12 bytes, as the FIFO is 16 words (not bytes) wide.
Tested with mcp251x (can bus), fb_st7735 (TFT framebuffer device)
and enc28j60 (ethernet) drivers.
Signed-off-by: Martin Sperl <kernel@martin.sperl.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
asm/irq.h is already included by linux/interrupt.h.
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
These asm/io.h, asm/irq.h and asm/delay.h are needless since they are
already included by linux/io.h via drivers/spi/spi-pxa2xx.h,
linux/interrupt.h and linux/delay.h.
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Use the endian agnositc IO functions instead of the __raw ones for when
the driver is in use on big-endian systems.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>