If the dw_edma_alloc_burst() function fails then we free "chunk" but
it's still on the "desc->chunk->list" list so it will lead to a use
after free. Also the "->chunks_alloc" count is incremented when it
shouldn't be.
In current kernels small allocations are guaranteed to succeed and
dw_edma_alloc_burst() can't fail so this will not actually affect
runtime.
Fixes: e63d79d1ff ("dmaengine: Add Synopsys eDMA IP core driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/X9dTBFrUPEvvW7qc@mwanda
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Fix the source and destination physical address calculation of a
peripheral device on scatter-gather implementation.
This issue manifested during tests using a 64 bits architecture system.
The abnormal behavior wasn't visible before due to all previous tests
were done using 32 bits architecture system, that masked his effect.
Fixes: e63d79d1ff ("dmaengine: Add Synopsys eDMA IP core driver")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/8d3ab7e2ba96563fe3495b32f60077fffb85307d.1597327623.git.gustavo.pimentel@synopsys.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Modify dw_edma_device_transfer() to also support the semantics of dma
device transfer for additional use cases involving pcitest utility as a
local initiator.
For its original use case, dw-edma supported the semantics of dma device
transfer from the perspective of a remote initiator who is located across
the PCIe bus from dma channel hardware.
To a remote initiator, DMA_DEV_TO_MEM means using a remote dma WRITE
channel to transfer from remote memory to local memory. A WRITE channel
would be employed on the remote device in order to move the contents of
remote memory to the bus destined for local memory.
To a remote initiator, DMA_MEM_TO_DEV means using a remote dma READ
channel to transfer from local memory to remote memory. A READ channel
would be employed on the remote device in order to move the contents of
local memory to the bus destined for remote memory.
>From the perspective of a local dma initiator who is co-located on the
same side of the PCIe bus as the dma channel hardware, the semantics of
dma device transfer are flipped.
To a local initiator, DMA_DEV_TO_MEM means using a local dma READ channel
to transfer from remote memory to local memory. A READ channel would be
employed on the local device in order to move the contents of remote
memory to the bus destined for local memory.
To a local initiator, DMA_MEM_TO_DEV means using a local dma WRITE channel
to transfer from local memory to remote memory. A WRITE channel would be
employed on the local device in order to move the contents of local memory
to the bus destined for remote memory.
To support local dma initiators, dw_edma_device_transfer() is modified to
now examine the direction field of struct dma_slave_config for the channel
which initiators can configure by calling dmaengine_slave_config().
If direction is configured as either DMA_DEV_TO_MEM or DMA_MEM_TO_DEV,
local initiator semantics are used. If direction is a value other than
DMA_DEV_TO_MEM nor DMA_MEM_TO_DEV, then remote initiator semantics are
used. This should maintain backward compatibility with the original use
case of dw-edma.
The dw-edma-test utility is an example of a remote initiator. From reading
its patch, dw-edma-test does not specifically set the direction field of
struct dma_slave_config. Since dw_edma_device_transfer() also does not
check the direction field of struct dma_slave_config, it seems safe to use
this convention in dw-edma to support both local and remote initiator
semantics.
Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com>
Link: https://lore.kernel.org/r/1588122633-1552-1-git-send-email-alan.mikhak@sifive.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Modify dw_edma_irq_request() to check if a struct msi_desc entry exists
before copying the contents of its struct msi_msg pointer.
Without this sanity check, __get_cached_msi_msg() crashes when invoked by
dw_edma_irq_request() running on a Linux-based PCIe endpoint device. MSI
interrupt are not received by PCIe endpoint devices. If irq_get_msi_desc()
returns null, then there is no cached struct msi_msg to be copied.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/1587607101-31914-1-git-send-email-alan.mikhak@sifive.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Decouple dw-edma-core.c from struct pci_dev as a step toward integration
of dw-edma with pci-epf-test so the latter can initiate dma operations
locally from the endpoint side. A barrier to such integration is the
dependency of dw_edma_probe() and other functions in dw-edma-core.c on
struct pci_dev.
The Synopsys DesignWare dw-edma driver was designed to run on host side
of PCIe link to initiate DMA operations remotely using eDMA channels of
PCIe controller on the endpoint side. This can be inferred from seeing
that dw-edma uses struct pci_dev and accesses hardware registers of dma
channels across the bus using BAR0 and BAR2.
The ops field of struct dw_edma in dw-edma-core.h is currenty undefined:
const struct dw_edma_core_ops *ops;
However, the kernel builds without failure even when dw-edma driver is
enabled. Instead of removing the currently undefined and usued ops field,
define struct dw_edma_core_ops and use the ops field to decouple
dw-edma-core.c from struct pci_dev.
Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/1586971629-30196-1-git-send-email-alan.mikhak@sifive.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
When building with 'make C=1', sparse reports an endianess bug:
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:60:30: warning: cast removes address space of expression
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: warning: incorrect type in argument 1 (different address spaces)
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: expected void const volatile [noderef] <asn:2>*addr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: got void *[assigned] ptr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: warning: incorrect type in argument 1 (different address spaces)
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: expected void const volatile [noderef] <asn:2>*addr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: got void *[assigned] ptr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: warning: incorrect type in argument 1 (different address spaces)
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: expected void const volatile [noderef] <asn:2>*addr
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:86:24: got void *[assigned] ptr
The current code is clearly wrong, as it passes an endian-swapped word
into a register function where it gets swapped again. Just pass the variables
directly into lower_32_bits()/upper_32_bits().
Fixes: 7e4b8a4fbe ("dmaengine: Add Synopsys eDMA IP version 0 support")
Link: https://lore.kernel.org/lkml/20190617131820.2470686-1-arnd@arndb.de/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/20190722124457.1093886-3-arnd@arndb.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
The new driver mixes up dma_addr_t and __iomem pointers, which results
in warnings on some 32-bit architectures, like:
drivers/dma/dw-edma/dw-edma-v0-core.c: In function '__dw_regs':
drivers/dma/dw-edma/dw-edma-v0-core.c:28:9: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
return (struct dw_edma_v0_regs __iomem *)dw->rg_region.vaddr;
Make it use __iomem pointers consistently here, and avoid using dma_addr_t
for __iomem tokens altogether.
A small complication here is the debugfs code, which passes an __iomem
token as the private data for debugfs files, requiring the use of
extra __force.
Fixes: 7e4b8a4fbe ("dmaengine: Add Synopsys eDMA IP version 0 support")
Link: https://lore.kernel.org/lkml/20190617131918.2518727-1-arnd@arndb.de/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190722124457.1093886-2-arnd@arndb.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Putting large constant data on the stack causes unnecessary overhead
and stack usage:
drivers/dma/dw-edma/dw-edma-v0-debugfs.c:285:6: error: stack frame size of 1376 bytes in function 'dw_edma_v0_debugfs_on' [-Werror,-Wframe-larger-than=]
Mark the variable 'static const' in order for the compiler to move it
into the .rodata section where it does no such harm.
Fixes: 305aebeff8 ("dmaengine: Add Synopsys eDMA IP version 0 debugfs support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Link: https://lore.kernel.org/r/20190722124457.1093886-1-arnd@arndb.de
Signed-off-by: Vinod Koul <vkoul@kernel.org>
If CONFIG_PCI_MSI is not set, building with CONFIG_DW_EDMA
fails:
drivers/dma/dw-edma/dw-edma-core.c: In function dw_edma_irq_request:
drivers/dma/dw-edma/dw-edma-core.c:784:21: error: implicit declaration of function pci_irq_vector; did you mean rcu_irq_enter? [-Werror=implicit-function-declaration]
err = request_irq(pci_irq_vector(to_pci_dev(dev), 0),
^~~~~~~~~~~~~~
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: e63d79d1ff ("dmaengine: Add Synopsys eDMA IP core driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Synopsys eDMA IP is normally distributed along with Synopsys PCIe
EndPoint IP (depends of the use and licensing agreement).
This IP requires some basic configurations, such as:
- eDMA registers BAR
- eDMA registers offset
- eDMA registers size
- eDMA linked list memory BAR
- eDMA linked list memory offset
- eDMA linked list memory size
- eDMA data memory BAR
- eDMA data memory offset
- eDMA data memory size
- eDMA version
- eDMA mode
- IRQs available for eDMA
As a working example, PCIe glue-logic will attach to a Synopsys PCIe
EndPoint IP prototype kit (Vendor ID = 0x16c3, Device ID = 0xedda),
which has built-in an eDMA IP with this default configuration:
- eDMA registers BAR = 0
- eDMA registers offset = 0x00001000 (4 Kbytes)
- eDMA registers size = 0x00002000 (8 Kbytes)
- eDMA linked list memory BAR = 2
- eDMA linked list memory offset = 0x00000000 (0 Kbytes)
- eDMA linked list memory size = 0x00800000 (8 Mbytes)
- eDMA data memory BAR = 2
- eDMA data memory offset = 0x00800000 (8 Mbytes)
- eDMA data memory size = 0x03800000 (56 Mbytes)
- eDMA version = 0
- eDMA mode = EDMA_MODE_UNROLL
- IRQs = 1
This driver can be compile as built-in or external module in kernel.
To enable this driver just select DW_EDMA_PCIE option in kernel
configuration, however it requires and selects automatically DW_EDMA
option too.
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add support for the eDMA IP version 0 driver for both register maps (legacy
and unroll).
The legacy register mapping was the initial implementation, which consisted
in having all registers belonging to channels multiplexed, which could be
change anytime (which could led a race-condition) by view port register
(access to only one channel available each time).
This register mapping is not very effective and efficient in a multithread
environment, which has led to the development of unroll registers mapping,
which consists of having all channels registers accessible any time by
spreading all channels registers by an offset between them.
This version supports a maximum of 16 independent channels (8 write +
8 read), which can run simultaneously.
Implements a scatter-gather transfer through a linked list, where the size
of linked list depends on the allocated memory divided equally among all
channels.
Each linked list descriptor can transfer from 1 byte to 4 Gbytes and is
alignmented to DWORD.
Both SAR (Source Address Register) and DAR (Destination Address Register)
are alignmented to byte.
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Add Synopsys PCIe Endpoint eDMA IP core driver to kernel.
This IP is generally distributed with Synopsys PCIe Endpoint IP (depends
of the use and licensing agreement).
This core driver, initializes and configures the eDMA IP using vma-helpers
functions and dma-engine subsystem.
This driver can be compile as built-in or external module in kernel.
To enable this driver just select DW_EDMA option in kernel configuration,
however it requires and selects automatically DMA_ENGINE and
DMA_VIRTUAL_CHANNELS option too.
In order to transfer data from point A to B as fast as possible this IP
requires a dedicated memory space containing linked list of elements.
All elements of this linked list are continuous and each one describes a
data transfer (source and destination addresses, length and a control
variable).
For the sake of simplicity, lets assume a memory space for channel write
0 which allows about 42 elements.
+---------+
| Desc #0 |-+
+---------+ |
V
+----------+
| Chunk #0 |-+
| CB = 1 | | +----------+ +-----+ +-----------+ +-----+
+----------+ +->| Burst #0 |->| ... |->| Burst #41 |->| llp |
| +----------+ +-----+ +-----------+ +-----+
V
+----------+
| Chunk #1 |-+
| CB = 0 | | +-----------+ +-----+ +-----------+ +-----+
+----------+ +->| Burst #42 |->| ... |->| Burst #83 |->| llp |
| +-----------+ +-----+ +-----------+ +-----+
V
+----------+
| Chunk #2 |-+
| CB = 1 | | +-----------+ +-----+ +------------+ +-----+
+----------+ +->| Burst #84 |->| ... |->| Burst #125 |->| llp |
| +-----------+ +-----+ +------------+ +-----+
V
+----------+
| Chunk #3 |-+
| CB = 0 | | +------------+ +-----+ +------------+ +-----+
+----------+ +->| Burst #126 |->| ... |->| Burst #129 |->| llp |
+------------+ +-----+ +------------+ +-----+
Legend:
- Linked list, also know as Chunk
- Linked list element*, also know as Burst *CB*, also know as Change Bit,
it's a control bit (and typically is toggled) that allows to easily
identify and differentiate between the current linked list and the
previous or the next one.
- LLP, is a special element that indicates the end of the linked list
element stream also informs that the next CB should be toggle
On every last Burst of the Chunk (Burst #41, Burst #83, Burst #125 or
even Burst #129) is set some flags on their control variable (RIE and
LIE bits) that will trigger the send of "done" interruption.
On the interruptions callback, is decided whether to recycle the linked
list memory space by writing a new set of Bursts elements (if still
exists Chunks to transfer) or is considered completed (if there is no
Chunks available to transfer).
On scatter-gather transfer mode, the client will submit a scatter-gather
list of n (on this case 130) elements, that will be divide in multiple
Chunks, each Chunk will have (on this case 42) a limited number of
Bursts and after transferring all Bursts, an interrupt will be
triggered, which will allow to recycle the all linked list dedicated
memory again with the new information relative to the next Chunk and
respective Burst associated and repeat the whole cycle again.
On cyclic transfer mode, the client will submit a buffer pointer, length
of it and number of repetitions, in this case each burst will correspond
directly to each repetition.
Each Burst can describes a data transfer from point A(source) to point
B(destination) with a length that can be from 1 byte up to 4 GB. Since
dedicated the memory space where the linked list will reside is limited,
the whole n burst elements will be organized in several Chunks, that
will be used later to recycle the dedicated memory space to initiate a
new sequence of data transfers.
The whole transfer is considered has completed when it was transferred
all bursts.
Currently this IP has a set well-known register map, which includes
support for legacy and unroll modes. Legacy mode is version of this
register map that has multiplexer register that allows to switch
registers between all write and read channels and the unroll modes
repeats all write and read channels registers with an offset between
them. This register map is called v0.
The IP team is creating a new register map more suitable to the latest
PCIe features, that very likely will change the map register, which this
version will be called v1. As soon as this new version is released by
the IP team the support for this version in be included on this driver.
According to the logic, patches 1, 2 and 3 should be squashed into 1
unique patch, but for the sake of simplicity of review, it was divided
in this 3 patches files.
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>