2020-01-22 07:43:59 +08:00
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright(c) 2019 Intel Corporation. All rights rsvd. */
#ifndef _IDXD_H_
#define _IDXD_H_

#include <linux/sbitmap.h>
2020-01-22 07:44:23 +08:00
#include <linux/dmaengine.h>
2020-01-22 07:43:59 +08:00
#include <linux/percpu-rwsem.h>
#include <linux/wait.h>
2020-01-22 07:44:29 +08:00
#include <linux/cdev.h>
2021-04-16 07:37:33 +08:00
#include <linux/idr.h>
dmaengine: idxd: Add IDXD performance monitor support

Implement the IDXD performance monitor capability (named 'perfmon' in
the DSA (Data Streaming Accelerator) spec [1]), which supports the
collection of information about key events occurring during DSA and
IAX (Intel Analytics Accelerator) device execution, to assist in
performance tuning and debugging.

The idxd perfmon support is implemented as part of the IDXD driver and
interfaces with the Linux perf framework. It has several features in
common with the existing uncore pmu support:

  - it does not support sampling
  - does not support per-thread counting

However it also has some unique features not present in the core and
uncore support:

  - all general-purpose counters are identical, thus no event constraints
  - operation is always system-wide

While the core perf subsystem assumes that all counters are by default
per-cpu, the uncore pmus are socket-scoped and use a cpu mask to
restrict counting to one cpu from each socket. IDXD counters use a
similar strategy but expand the scope even further; since IDXD
counters are system-wide and can be read from any cpu, the IDXD perf
driver picks a single cpu to do the work (with cpu hotplug notifiers
to choose a different cpu if the chosen one is taken off-line).

More specifically, the perf userspace tool by default opens a counter
for each cpu for an event. However, if it finds a cpumask file
associated with the pmu under sysfs, as is the case with the uncore
pmus, it will open counters only on the cpus specified by the cpumask.
Since perfmon only needs to open a single counter per event for a
given IDXD device, the perfmon driver will create a sysfs cpumask file
for the device and insert the first cpu of the system into it. When a
user uses perf to open an event, perf will open a single counter on
the cpu specified by the cpu mask. This amounts to the default
system-wide rather than per-cpu counting mentioned previously for
perfmon pmu events. In order to keep the cpu mask up-to-date, the
driver implements cpu hotplug support for multiple devices, as IDXD
usually enumerates and registers more than one idxd device.
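
The single-reader-cpu strategy described above can be sketched in plain
C. This is only an illustration under stated assumptions:
pick_reader_cpu() and reader_cpu_offline() are hypothetical names, and
the online map is a plain array, whereas a real driver would rely on the
kernel's cpumask and cpu hotplug facilities.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: return the first online cpu other than 'leaving',
 * or -1 if there is none. Pass leaving = -1 for the initial choice.
 */
static int pick_reader_cpu(const bool *online, int ncpus, int leaving)
{
	assert(online && ncpus > 0);

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (online[cpu] && cpu != leaving)
			return cpu;
	}
	return -1;
}

/*
 * Hypothetical offline callback: the designated cpu only changes when
 * it is the one being taken off-line.
 */
static int reader_cpu_offline(int reader, const bool *online, int ncpus,
			      int leaving)
{
	if (reader != leaving)
		return reader;
	return pick_reader_cpu(online, ncpus, leaving);
}
```

With cpus 0, 1 and 3 online, the initial choice is cpu 0; taking cpu 3
off-line leaves the choice unchanged, while taking cpu 0 off-line moves
the work to cpu 1.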

The perfmon driver implements basic perfmon hardware capability
discovery and configuration, and is initialized by the IDXD driver's
probe function. During initialization, the driver retrieves the total
number of supported performance counters, the pmu ID, and the device
type from idxd device, and registers itself under the Linux perf
framework.

The perf userspace tool can be used to monitor single or multiple
events depending on the given configuration, as well as event groups,
which are also supported by the perfmon driver. The user configures
events using the perf tool command-line interface by specifying the
event and corresponding event category, along with an optional set of
filters that can be used to restrict counting to specific work queues,
traffic classes, page and transfer sizes, and engines (See [1] for
specifics).

With the configuration specified by the user, the perf tool issues a
system call passing that information to the kernel, which uses it to
initialize the specified event(s). The event(s) are opened and
started, and following termination of the perf command, they're
stopped. At that point, the perfmon driver will read the latest count
for the event(s), calculate the difference between the latest counter
values and previously tracked counter values, and display the final
incremental count as the event count for the cycle. An overflow
handler registered on the IDXD irq path is used to account for counter
overflows, which are signaled by an overflow interrupt.
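
The "difference between the latest and previously tracked counter
values" can be sketched as below. This is a minimal sketch, not the
driver's code: counter_delta() is a hypothetical name, the counter width
is passed in explicitly, and the subtraction is done modulo the counter
width so that a single wraparound between two reads still yields the
right increment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: increment between two raw reads of a counter
 * that is 'width' bits wide. Correct as long as the counter wrapped
 * at most once between the two reads.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned int width)
{
	uint64_t mask;

	assert(width >= 1 && width <= 64);
	mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);

	/* Unsigned subtraction wraps; masking reduces it to 'width' bits. */
	return (now - prev) & mask;
}
```

For a 48-bit counter, a previous value of 0xFFFFFFFFFFFE and a current
value of 3 give an increment of 5, even though the raw value went down.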

Below are a couple of examples of perf usage for monitoring DSA events.

The following monitors all events in the 'engine' category. Because
no filters are specified, this captures all engine events for the
workload, which in this case is 19 iterations of the work generated by
the kernel dmatest module.

Details describing the events can be found in Appendix D of [1],
Performance Monitoring Events, but briefly they are:

  event 0x1:  total input data processed, in 32-byte units
  event 0x2:  total data written, in 32-byte units
  event 0x4:  number of work descriptors that read the source
  event 0x8:  number of work descriptors that write the destination
  event 0x10: number of work descriptors dispatched from batch descriptors
  event 0x20: number of work descriptors dispatched from work queues

  # perf stat -e dsa0/event=0x1,event_category=0x1/,
                 dsa0/event=0x2,event_category=0x1/,
                 dsa0/event=0x4,event_category=0x1/,
                 dsa0/event=0x8,event_category=0x1/,
                 dsa0/event=0x10,event_category=0x1/,
                 dsa0/event=0x20,event_category=0x1/
                 modprobe dmatest channel=dma0chan0 timeout=2000
                 iterations=19 run=1 wait=1

  Performance counter stats for 'system wide':

    5,332 dsa0/event=0x1,event_category=0x1/
    5,327 dsa0/event=0x2,event_category=0x1/
       19 dsa0/event=0x4,event_category=0x1/
       19 dsa0/event=0x8,event_category=0x1/
        0 dsa0/event=0x10,event_category=0x1/
       19 dsa0/event=0x20,event_category=0x1/

  21.977436186 seconds time elapsed
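
Since events 0x1 and 0x2 are reported in 32-byte units, the raw counts
above have to be scaled to obtain bytes. A small sketch of that
arithmetic follows; perfmon_units_to_bytes() and PERFMON_DATA_UNIT_BYTES
are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Unit size taken from the event descriptions above. */
#define PERFMON_DATA_UNIT_BYTES 32u

/* Hypothetical helper: convert a raw event 0x1/0x2 count to bytes. */
static uint64_t perfmon_units_to_bytes(uint64_t units)
{
	/* Guard against overflow of the multiplication. */
	assert(units <= UINT64_MAX / PERFMON_DATA_UNIT_BYTES);
	return units * PERFMON_DATA_UNIT_BYTES;
}
```

Applied to the output above, 5,332 units correspond to 170,624 bytes of
input processed and 5,327 units to 170,464 bytes written.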

The command below illustrates filter usage with a simple example. It
specifies that MEM_MOVE operations should be counted for the DSA
device dsa0 (event 0x8 corresponds to the EV_MEM_MOVE event - Number
of Memory Move Descriptors, which is part of event category 0x3 -
Operations. The detailed category and event IDs are available in
Appendix D, Performance Monitoring Events, of [1]). In addition to
the event and event category, a number of filters are also specified
(the detailed filter values are available in Chapter 6.4 (Filter
Support) of [1]), which will restrict counting to only those events
that meet all of the filter criteria. In this case, the filters
specify that only MEM_MOVE operations that are serviced by work queue
wq0, engine engine0 and traffic class tc0, with transfer sizes between
0 and 4k and page sizes between 0 and 1G, result in a counter hit;
anything else will be filtered out and not appear in the final count.
Note that filters are optional - any filter not specified is assumed
to be all ones and will pass anything.
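
The "all filters must match, unspecified means all ones" rule can be
sketched as follows. This is an illustration under assumptions, not the
hardware's definition: each filter is treated as a bitmask in which bit
n selects work queue, traffic class or engine n (consistent with
filter_wq=0x1 selecting wq0 in the example), the size and page-size
filters are left out, and struct flt_sketch, flt_bit() and flt_match()
are hypothetical names. See [1] for the actual encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An unspecified filter is all ones and therefore passes anything. */
#define FLT_PASS_ALL UINT32_MAX

/* Hypothetical filter set; bit n set means "count item n". */
struct flt_sketch {
	uint32_t wq;	/* work queues */
	uint32_t tc;	/* traffic classes */
	uint32_t eng;	/* engines */
};

static bool flt_bit(uint32_t mask, unsigned int n)
{
	assert(n < 32);
	return (mask >> n) & 1u;
}

/* An event is counted only if it passes every filter. */
static bool flt_match(const struct flt_sketch *f, unsigned int wq,
		      unsigned int tc, unsigned int eng)
{
	return flt_bit(f->wq, wq) && flt_bit(f->tc, tc) &&
	       flt_bit(f->eng, eng);
}
```

With all three masks set to 0x1, only traffic on wq0, tc0 and engine0
matches; with every mask left at FLT_PASS_ALL, everything matches.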

  # perf stat -e dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
                 filter_eng=0x1,event=0x8,event_category=0x3/
                 modprobe dmatest channel=dma0chan0 timeout=2000
                 iterations=19 run=1 wait=1

  Performance counter stats for 'system wide':

    19 dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
       filter_eng=0x1,event=0x8,event_category=0x3/

  21.865914091 seconds time elapsed

The output above reflects that the unspecified workload resulted in
the counting of 19 MEM_MOVE operation events that met the filter
criteria.

[1]: https://software.intel.com/content/www/us/en/develop/download/intel-data-streaming-accelerator-preliminary-architecture-specification.html

[ Based on work originally by Jing Lin. ]

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: https://lore.kernel.org/r/0c5080a7d541904c4ad42b848c76a1ce056ddac7.1619276133.git.zanussi@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-04-24 23:04:15 +08:00
#include <linux/pci.h>
#include <linux/perf_event.h>
2021-07-21 04:42:04 +08:00
#include <uapi/linux/idxd.h>
2020-01-22 07:43:59 +08:00
#include "registers.h"

#define IDXD_DRIVER_VERSION "1.00"

extern struct kmem_cache *idxd_desc_pool;
2021-07-21 04:42:10 +08:00
extern bool tc_override;
2020-01-22 07:43:59 +08:00

2021-04-16 07:37:10 +08:00
struct idxd_wq;
2021-07-16 02:43:20 +08:00
struct idxd_dev;

enum idxd_dev_type {
	IDXD_DEV_NONE = -1,
	IDXD_DEV_DSA = 0,
	IDXD_DEV_IAX,
	IDXD_DEV_WQ,
	IDXD_DEV_GROUP,
	IDXD_DEV_ENGINE,
	IDXD_DEV_CDEV,
	IDXD_DEV_MAX_TYPE,
};

struct idxd_dev {
	struct device conf_dev;
	enum idxd_dev_type type;
};
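
Several structures later in this header (struct idxd_group, struct
idxd_cdev, struct idxd_wq, struct idxd_engine, struct idxd_device) embed
a struct idxd_dev by value. A common way to get from the embedded member
back to its container is the container_of() pattern, sketched here in
userspace C. The sketch_* types, the macro and sketch_dev_to_wq() are
simplified, hypothetical stand-ins and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real structures carry many more fields. */
enum sketch_dev_type { SKETCH_DEV_WQ, SKETCH_DEV_ENGINE };

struct sketch_dev {
	enum sketch_dev_type type;
};

struct sketch_wq {
	int id;
	struct sketch_dev idxd_dev;	/* embedded, as in struct idxd_wq */
};

/* Userspace equivalent of the kernel's container_of(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical accessor: only valid when the type tag says "wq". */
static struct sketch_wq *sketch_dev_to_wq(struct sketch_dev *dev)
{
	assert(dev && dev->type == SKETCH_DEV_WQ);
	return sketch_container_of(dev, struct sketch_wq, idxd_dev);
}
```

The type tag is what makes the downcast safe to check: generic code can
hold a pointer to the embedded member and recover the containing object
only after inspecting the tag.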
2021-04-16 07:37:10 +08:00

2020-01-22 07:43:59 +08:00
#define IDXD_REG_TIMEOUT 50
#define IDXD_DRAIN_TIMEOUT 5000

enum idxd_type {
	IDXD_TYPE_UNKNOWN = -1,
	IDXD_TYPE_DSA = 0,
2020-11-18 04:39:14 +08:00
	IDXD_TYPE_IAX,
	IDXD_TYPE_MAX,
2020-01-22 07:43:59 +08:00
};

#define IDXD_NAME_SIZE 128
2021-04-24 23:04:15 +08:00
#define IDXD_PMU_EVENT_MAX 64
2020-01-22 07:43:59 +08:00

struct idxd_device_driver {
2021-07-16 02:43:15 +08:00
	const char *name;
2021-07-16 02:44:18 +08:00
	enum idxd_dev_type *type;
2021-07-16 02:43:55 +08:00
	int (*probe)(struct idxd_dev *idxd_dev);
	void (*remove)(struct idxd_dev *idxd_dev);
2020-01-22 07:43:59 +08:00
	struct device_driver drv;
};

2021-07-16 02:44:13 +08:00
extern struct idxd_device_driver dsa_drv;
2021-07-16 02:44:24 +08:00
extern struct idxd_device_driver idxd_drv;
2021-07-16 02:44:30 +08:00
extern struct idxd_device_driver idxd_dmaengine_drv;
2021-07-16 02:44:35 +08:00
extern struct idxd_device_driver idxd_user_drv;
2021-07-16 02:44:13 +08:00

2020-01-22 07:43:59 +08:00
struct idxd_irq_entry {
	struct idxd_device *idxd;
	int id;
2021-04-16 07:37:15 +08:00
	int vector;
2020-01-22 07:43:59 +08:00
	struct llist_head pending_llist;
	struct list_head work_list;
2020-10-28 01:34:40 +08:00
	/*
	 * Lock to protect access between irq thread process descriptor
	 * and irq thread processing error descriptor.
	 */
	spinlock_t list_lock;
2020-01-22 07:43:59 +08:00
};

struct idxd_group {
2021-07-16 02:43:20 +08:00
	struct idxd_dev idxd_dev;
2020-01-22 07:43:59 +08:00
	struct idxd_device *idxd;
	struct grpcfg grpcfg;
	int id;
	int num_engines;
	int num_wqs;
	bool use_token_limit;
	u8 tokens_allowed;
	u8 tokens_reserved;
	int tc_a;
	int tc_b;
};

2021-04-24 23:04:15 +08:00
struct idxd_pmu {
	struct idxd_device *idxd;

	struct perf_event *event_list[IDXD_PMU_EVENT_MAX];
	int n_events;

	DECLARE_BITMAP(used_mask, IDXD_PMU_EVENT_MAX);

	struct pmu pmu;
	char name[IDXD_NAME_SIZE];
	int cpu;

	int n_counters;
	int counter_width;
	int n_event_categories;

	bool per_counter_caps_supported;
	unsigned long supported_event_categories;

	unsigned long supported_filters;
	int n_filters;

	struct hlist_node cpuhp_node;
};
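
The used_mask bitmap and n_counters fields suggest a "claim the first
free counter" allocation scheme. The sketch below shows that general
technique in userspace C; it is an assumption about how such a bitmap
is typically used, not the driver's implementation, and
sketch_claim_counter() and sketch_release_counter() are hypothetical
names. A single 64-bit word stands in for the bitmap, which is enough
for IDXD_PMU_EVENT_MAX == 64.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EVENT_MAX 64	/* mirrors IDXD_PMU_EVENT_MAX */

/*
 * Hypothetical: claim the lowest free counter index below n_counters.
 * Returns the index, or -1 if every counter is already in use.
 */
static int sketch_claim_counter(uint64_t *used_mask, int n_counters)
{
	assert(used_mask && n_counters > 0 && n_counters <= SKETCH_EVENT_MAX);

	for (int i = 0; i < n_counters; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(*used_mask & bit)) {
			*used_mask |= bit;
			return i;
		}
	}
	return -1;
}

/* Hypothetical: return a counter index to the free pool. */
static void sketch_release_counter(uint64_t *used_mask, int idx)
{
	assert(used_mask && idx >= 0 && idx < SKETCH_EVENT_MAX);
	*used_mask &= ~(UINT64_C(1) << idx);
}
```

Because the commit message states that all general-purpose counters are
identical, any free index is as good as another, which is why a simple
first-fit search is sufficient in this sketch.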

2020-01-22 07:43:59 +08:00
#define IDXD_MAX_PRIORITY 0xf

enum idxd_wq_state {
	IDXD_WQ_DISABLED = 0,
	IDXD_WQ_ENABLED,
};

enum idxd_wq_flag {
	WQ_FLAG_DEDICATED = 0,
2020-10-28 01:34:35 +08:00
	WQ_FLAG_BLOCK_ON_FAULT,
2020-01-22 07:43:59 +08:00
};
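
The enumerators above start at 0 and increase by one, which is the usual
shape for bit numbers in a flags word such as the unsigned long flags
member of struct idxd_wq. Assuming that reading, the sketch below shows
how bit-number enumerators are set and tested in userspace C. The
sketch_* names are hypothetical; kernel code would normally use the
set_bit(), clear_bit() and test_bit() helpers instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of enum idxd_wq_flag: values are bit numbers. */
enum sketch_wq_flag {
	SKETCH_WQ_FLAG_DEDICATED = 0,
	SKETCH_WQ_FLAG_BLOCK_ON_FAULT,
};

static void sketch_set_flag(unsigned long *flags, int nr)
{
	*flags |= 1UL << nr;
}

static void sketch_clear_flag(unsigned long *flags, int nr)
{
	*flags &= ~(1UL << nr);
}

static bool sketch_test_flag(const unsigned long *flags, int nr)
{
	return (*flags >> nr) & 1UL;
}
```

Keeping bit numbers rather than masks in the enum lets new flags be
appended without computing shifted values by hand.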

enum idxd_wq_type {
	IDXD_WQT_NONE = 0,
	IDXD_WQT_KERNEL,
2020-01-22 07:44:29 +08:00
	IDXD_WQT_USER,
};

struct idxd_cdev {
2021-04-16 07:37:57 +08:00
	struct idxd_wq *wq;
2020-01-22 07:44:29 +08:00
	struct cdev cdev;
2021-07-16 02:43:20 +08:00
	struct idxd_dev idxd_dev;
2020-01-22 07:44:29 +08:00
	int minor;
2020-01-22 07:43:59 +08:00
};

#define IDXD_ALLOCATED_BATCH_SIZE 128U
#define WQ_NAME_SIZE 1024
#define WQ_TYPE_SIZE 10

2020-01-22 07:44:17 +08:00
enum idxd_op_type {
	IDXD_OP_BLOCK = 0,
	IDXD_OP_NONBLOCK = 1,
};

2020-01-22 07:44:23 +08:00
enum idxd_complete_type {
	IDXD_COMPLETE_NORMAL = 0,
	IDXD_COMPLETE_ABORT,
2020-10-28 01:34:35 +08:00
	IDXD_COMPLETE_DEV_FAIL,
2020-01-22 07:44:23 +08:00
};

2021-04-16 07:37:10 +08:00
struct idxd_dma_chan {
	struct dma_chan chan;
	struct idxd_wq *wq;
};

2020-01-22 07:43:59 +08:00
struct idxd_wq {
2020-10-28 01:34:35 +08:00
	void __iomem *portal;
2021-07-21 04:42:04 +08:00
	u32 portal_offset;
2021-04-21 02:46:22 +08:00
	struct percpu_ref wq_active;
	struct completion wq_dead;
2021-07-16 02:43:20 +08:00
	struct idxd_dev idxd_dev;
2021-04-16 07:37:57 +08:00
	struct idxd_cdev *idxd_cdev;
	struct wait_queue_head err_queue;
2020-01-22 07:43:59 +08:00
	struct idxd_device *idxd;
	int id;
	enum idxd_wq_type type;
	struct idxd_group *group;
	int client_count;
	struct mutex wq_lock;	/* mutex for workqueue */
	u32 size;
	u32 threshold;
	u32 priority;
	enum idxd_wq_state state;
	unsigned long flags;
2020-10-28 05:34:09 +08:00
	union wqcfg *wqcfg;
2020-01-22 07:43:59 +08:00
	struct dsa_hw_desc **hw_descs;
	int num_descs;
2020-11-18 04:39:14 +08:00
	union {
		struct dsa_completion_record *compls;
		struct iax_completion_record *iax_compls;
	};
2020-01-22 07:43:59 +08:00
	dma_addr_t compls_addr;
	int compls_size;
	struct idxd_desc **descs;
2020-06-16 04:54:26 +08:00
	struct sbitmap_queue sbq;
2021-04-16 07:37:10 +08:00
	struct idxd_dma_chan *idxd_chan;
2020-01-22 07:43:59 +08:00
	char name[WQ_NAME_SIZE + 1];
2020-08-29 06:12:10 +08:00
	u64 max_xfer_bytes;
2020-08-29 06:12:50 +08:00
	u32 max_batch_size;
2020-11-14 06:55:05 +08:00
	bool ats_dis;
2020-01-22 07:43:59 +08:00
};

struct idxd_engine {
2021-07-16 02:43:20 +08:00
	struct idxd_dev idxd_dev;
2020-01-22 07:43:59 +08:00
	int id;
	struct idxd_group *group;
	struct idxd_device *idxd;
};

/* shadow registers */
struct idxd_hw {
	u32 version;
	union gen_cap_reg gen_cap;
	union wq_cap_reg wq_cap;
	union group_cap_reg group_cap;
	union engine_cap_reg engine_cap;
	struct opcap opcap;
2021-04-21 02:46:34 +08:00
	u32 cmd_cap;
2020-01-22 07:43:59 +08:00
};

enum idxd_device_state {
	IDXD_DEV_HALTED = -1,
	IDXD_DEV_DISABLED = 0,
	IDXD_DEV_ENABLED,
};

enum idxd_device_flag {
	IDXD_FLAG_CONFIGURABLE = 0,
2020-06-27 02:11:18 +08:00
	IDXD_FLAG_CMD_RUNNING,
2020-10-28 01:34:35 +08:00
	IDXD_FLAG_PASID_ENABLED,
2020-01-22 07:43:59 +08:00
};

2021-04-16 07:37:10 +08:00
struct idxd_dma_dev {
	struct idxd_device *idxd;
	struct dma_device dma;
};

2021-04-16 07:38:09 +08:00
struct idxd_driver_data {
	const char *name_prefix;
2020-01-22 07:43:59 +08:00
	enum idxd_type type;
2021-04-16 07:38:09 +08:00
	struct device_type *dev_type;
	int compl_size;
	int align;
};

struct idxd_device {
2021-07-16 02:43:20 +08:00
	struct idxd_dev idxd_dev;
2021-04-16 07:38:09 +08:00
	struct idxd_driver_data *data;
2020-01-22 07:43:59 +08:00
	struct list_head list;
	struct idxd_hw hw;
	enum idxd_device_state state;
	unsigned long flags;
	int id;
2020-01-22 07:44:29 +08:00
	int major;
2021-07-21 04:42:15 +08:00
	u32 cmd_status;
2020-01-22 07:43:59 +08:00

	struct pci_dev *pdev;
	void __iomem *reg_base;

	spinlock_t dev_lock;	/* spinlock for device */
2021-04-21 03:00:56 +08:00
	spinlock_t cmd_lock;	/* spinlock for device commands */
2020-06-27 02:11:18 +08:00
	struct completion *cmd_done;
2021-04-16 07:37:51 +08:00
	struct idxd_group **groups;
2021-04-16 07:37:39 +08:00
	struct idxd_wq **wqs;
2021-04-16 07:37:44 +08:00
	struct idxd_engine **engines;
2020-01-22 07:43:59 +08:00

2020-10-28 01:34:35 +08:00
	struct iommu_sva *sva;
	unsigned int pasid;

2020-01-22 07:43:59 +08:00
	int num_groups;

	u32 msix_perm_offset;
	u32 wqcfg_offset;
	u32 grpcfg_offset;
	u32 perfmon_offset;

	u64 max_xfer_bytes;
	u32 max_batch_size;
	int max_groups;
	int max_engines;
	int max_tokens;
	int max_wqs;
	int max_wq_size;
	int token_limit;
2020-01-22 07:44:05 +08:00
	int nr_tokens;		/* non-reserved tokens */
2020-10-28 05:34:09 +08:00
	unsigned int wqcfg_size;
2020-01-22 07:43:59 +08:00

	union sw_err_reg sw_err;
2020-06-27 02:11:18 +08:00
	wait_queue_head_t cmd_waitq;
2020-01-22 07:43:59 +08:00
	int num_wq_irqs;
	struct idxd_irq_entry *irq_entries;
2020-01-22 07:44:23 +08:00

2021-04-16 07:37:10 +08:00
	struct idxd_dma_dev *idxd_dma;
2020-06-27 02:11:18 +08:00
	struct workqueue_struct *wq;
	struct work_struct work;
2021-04-21 02:46:34 +08:00

	int *int_handles;
The perfmon driver implements basic perfmon hardware capability
discovery and configuration, and is initialized by the IDXD driver's
probe function. During initialization, the driver retrieves the total
number of supported performance counters, the pmu ID, and the device
type from the idxd device, and registers itself under the Linux perf
framework.
The perf userspace tool can be used to monitor single or multiple
events depending on the given configuration, as well as event groups,
which are also supported by the perfmon driver. The user configures
events using the perf tool command-line interface by specifying the
event and corresponding event category, along with an optional set of
filters that can be used to restrict counting to specific work queues,
traffic classes, page and transfer sizes, and engines (See [1] for
specifics).
With the configuration specified by the user, the perf tool issues a
system call passing that information to the kernel, which uses it to
initialize the specified event(s). The event(s) are opened and
started, and following termination of the perf command, they're
stopped. At that point, the perfmon driver will read the latest count
for the event(s), calculate the difference between the latest counter
values and previously tracked counter values, and display the final
incremental count as the event count for the cycle. An overflow
handler registered on the IDXD irq path is used to account for counter
overflows, which are signaled by an overflow interrupt.
Below are a couple of examples of perf usage for monitoring DSA events.
The following monitors all events in the 'engine' category. Because
no filters are specified, this captures all engine events for the
workload, which in this case is 19 iterations of the work generated by
the kernel dmatest module.
Details describing the events can be found in Appendix D of [1],
Performance Monitoring Events, but briefly they are:
event 0x1: total input data processed, in 32-byte units
event 0x2: total data written, in 32-byte units
event 0x4: number of work descriptors that read the source
event 0x8: number of work descriptors that write the destination
event 0x10: number of work descriptors dispatched from batch descriptors
event 0x20: number of work descriptors dispatched from work queues
# perf stat -e dsa0/event=0x1,event_category=0x1/,
dsa0/event=0x2,event_category=0x1/,
dsa0/event=0x4,event_category=0x1/,
dsa0/event=0x8,event_category=0x1/,
dsa0/event=0x10,event_category=0x1/,
dsa0/event=0x20,event_category=0x1/
modprobe dmatest channel=dma0chan0 timeout=2000
iterations=19 run=1 wait=1
Performance counter stats for 'system wide':
5,332 dsa0/event=0x1,event_category=0x1/
5,327 dsa0/event=0x2,event_category=0x1/
19 dsa0/event=0x4,event_category=0x1/
19 dsa0/event=0x8,event_category=0x1/
0 dsa0/event=0x10,event_category=0x1/
19 dsa0/event=0x20,event_category=0x1/
21.977436186 seconds time elapsed
The command below illustrates filter usage with a simple example. It
specifies that MEM_MOVE operations should be counted for the DSA
device dsa0 (event 0x8 corresponds to the EV_MEM_MOVE event - Number
of Memory Move Descriptors, which is part of event category 0x3 -
Operations. The detailed category and event IDs are available in
Appendix D, Performance Monitoring Events, of [1]). In addition to
the event and event category, a number of filters are also specified
(the detailed filter values are available in Chapter 6.4 (Filter
Support) of [1]), which will restrict counting to only those events
that meet all of the filter criteria. In this case, the filters
specify that only MEM_MOVE operations that are serviced by work queue
wq0 and specifically engine number engine0 and traffic class tc0
having sizes between 0 and 4k and page size of between 0 and 1G result
in a counter hit; anything else will be filtered out and not appear in
the final count. Note that filters are optional - any filter not
specified is assumed to be all ones and will pass anything.
# perf stat -e dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
filter_eng=0x1,event=0x8,event_category=0x3/
modprobe dmatest channel=dma0chan0 timeout=2000
iterations=19 run=1 wait=1
Performance counter stats for 'system wide':
19 dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
filter_eng=0x1,event=0x8,event_category=0x3/
21.865914091 seconds time elapsed
The output above reflects that the unspecified workload resulted in
the counting of 19 MEM_MOVE operation events that met the filter
criteria.
[1]: https://software.intel.com/content/www/us/en/develop/download/intel-data-streaming-accelerator-preliminary-architecture-specification.html
[ Based on work originally by Jing Lin. ]
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Link: https://lore.kernel.org/r/0c5080a7d541904c4ad42b848c76a1ce056ddac7.1619276133.git.zanussi@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2021-04-24 23:04:15 +08:00
|
|
|
|
|
|
|
struct idxd_pmu *idxd_pmu;
|
2020-01-22 07:43:59 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
/* IDXD software descriptor */
|
|
|
|
struct idxd_desc {
|
2020-11-18 04:39:14 +08:00
|
|
|
union {
|
|
|
|
struct dsa_hw_desc *hw;
|
|
|
|
struct iax_hw_desc *iax_hw;
|
|
|
|
};
|
2020-01-22 07:43:59 +08:00
|
|
|
dma_addr_t desc_dma;
|
2020-11-18 04:39:14 +08:00
|
|
|
union {
|
|
|
|
struct dsa_completion_record *completion;
|
|
|
|
struct iax_completion_record *iax_completion;
|
|
|
|
};
|
2020-01-22 07:43:59 +08:00
|
|
|
dma_addr_t compl_dma;
|
2020-01-22 07:44:23 +08:00
|
|
|
struct dma_async_tx_descriptor txd;
|
2020-01-22 07:43:59 +08:00
|
|
|
struct llist_node llnode;
|
|
|
|
struct list_head list;
|
|
|
|
int id;
|
2020-06-16 04:54:26 +08:00
|
|
|
int cpu;
|
2020-01-22 07:43:59 +08:00
|
|
|
struct idxd_wq *wq;
|
|
|
|
};
|
|
|
|
|
2021-07-15 02:50:06 +08:00
|
|
|
/*
|
|
|
|
* This is a software-defined error for the completion status. We overload an error code
|
|
|
|
* that never appears in the completion status, only in the SWERR register.
|
|
|
|
*/
|
|
|
|
enum idxd_completion_status {
|
|
|
|
IDXD_COMP_DESC_ABORT = 0xff,
|
|
|
|
};
|
|
|
|
|
2021-07-16 02:43:20 +08:00
|
|
|
#define idxd_confdev(idxd) (&(idxd)->idxd_dev.conf_dev)
|
|
|
|
#define wq_confdev(wq) (&(wq)->idxd_dev.conf_dev)
|
|
|
|
#define engine_confdev(engine) (&(engine)->idxd_dev.conf_dev)
|
|
|
|
#define group_confdev(group) (&(group)->idxd_dev.conf_dev)
|
|
|
|
#define cdev_dev(cdev) (&(cdev)->idxd_dev.conf_dev)
|
|
|
|
|
|
|
|
#define confdev_to_idxd_dev(dev) container_of(dev, struct idxd_dev, conf_dev)
|
2021-07-16 02:43:55 +08:00
|
|
|
#define idxd_dev_to_idxd(idxd_dev) container_of(idxd_dev, struct idxd_device, idxd_dev)
|
|
|
|
#define idxd_dev_to_wq(idxd_dev) container_of(idxd_dev, struct idxd_wq, idxd_dev)
|
2021-07-16 02:43:20 +08:00
|
|
|
|
|
|
|
static inline struct idxd_device *confdev_to_idxd(struct device *dev)
|
|
|
|
{
|
|
|
|
struct idxd_dev *idxd_dev = confdev_to_idxd_dev(dev);
|
|
|
|
|
2021-07-16 02:43:55 +08:00
|
|
|
return idxd_dev_to_idxd(idxd_dev);
|
2021-07-16 02:43:20 +08:00
|
|
|
}
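The confdev_to_*() helpers recover the enclosing object from an embedded struct device in two container_of() steps: first to the struct idxd_dev that holds conf_dev, then to the structure that embeds that idxd_dev. A minimal userspace sketch of the same pointer arithmetic follows. All demo_* names and the struct layouts are hypothetical stand-ins, not the driver's real definitions, and the macro is a simplified container_of without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of() (assumption). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct device. */
struct demo_device {
	int unused;
};

/* Hypothetical stand-in for struct idxd_dev: embeds the device. */
struct demo_idxd_dev {
	struct demo_device conf_dev;
	int type;
};

/* Hypothetical stand-in for struct idxd_device: embeds the idxd_dev. */
struct demo_idxd_device {
	int id;
	struct demo_idxd_dev idxd_dev;
};

/* Two hops: device -> idxd_dev -> enclosing idxd_device. */
static inline struct demo_idxd_device *demo_confdev_to_idxd(struct demo_device *dev)
{
	struct demo_idxd_dev *idxd_dev =
		demo_container_of(dev, struct demo_idxd_dev, conf_dev);

	return demo_container_of(idxd_dev, struct demo_idxd_device, idxd_dev);
}
```

Passing the address of the innermost conf_dev member returns a pointer to the outermost structure, which is why the helpers are only valid for a struct device that really is embedded in the expected type.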
|
|
|
|
|
|
|
|
static inline struct idxd_wq *confdev_to_wq(struct device *dev)
|
|
|
|
{
|
|
|
|
struct idxd_dev *idxd_dev = confdev_to_idxd_dev(dev);
|
|
|
|
|
2021-07-16 02:43:55 +08:00
|
|
|
return idxd_dev_to_wq(idxd_dev);
|
2021-07-16 02:43:20 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct idxd_engine *confdev_to_engine(struct device *dev)
|
|
|
|
{
|
|
|
|
struct idxd_dev *idxd_dev = confdev_to_idxd_dev(dev);
|
|
|
|
|
|
|
|
return container_of(idxd_dev, struct idxd_engine, idxd_dev);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct idxd_group *confdev_to_group(struct device *dev)
|
|
|
|
{
|
|
|
|
struct idxd_dev *idxd_dev = confdev_to_idxd_dev(dev);
|
|
|
|
|
|
|
|
return container_of(idxd_dev, struct idxd_group, idxd_dev);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline struct idxd_cdev *dev_to_cdev(struct device *dev)
|
|
|
|
{
|
|
|
|
struct idxd_dev *idxd_dev = confdev_to_idxd_dev(dev);
|
|
|
|
|
|
|
|
return container_of(idxd_dev, struct idxd_cdev, idxd_dev);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void idxd_dev_set_type(struct idxd_dev *idev, int type)
|
|
|
|
{
|
|
|
|
if (type >= IDXD_DEV_MAX_TYPE) {
|
|
|
|
idev->type = IDXD_DEV_NONE;
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
idev->type = type;
|
|
|
|
}
|
2020-01-22 07:43:59 +08:00
|
|
|
|
2020-01-22 07:44:29 +08:00
|
|
|
extern struct bus_type dsa_bus_type;
|
|
|
|
|
2020-10-28 01:34:35 +08:00
|
|
|
extern bool support_enqcmd;
|
2021-04-16 07:38:03 +08:00
|
|
|
extern struct ida idxd_ida;
|
2021-04-16 07:37:33 +08:00
|
|
|
extern struct device_type dsa_device_type;
|
|
|
|
extern struct device_type iax_device_type;
|
2021-04-16 07:37:39 +08:00
|
|
|
extern struct device_type idxd_wq_device_type;
|
2021-04-16 07:37:44 +08:00
|
|
|
extern struct device_type idxd_engine_device_type;
|
2021-04-16 07:37:51 +08:00
|
|
|
extern struct device_type idxd_group_device_type;
|
2021-04-16 07:37:33 +08:00
|
|
|
|
2021-07-16 02:43:55 +08:00
|
|
|
static inline bool is_dsa_dev(struct idxd_dev *idxd_dev)
|
2021-04-16 07:37:33 +08:00
|
|
|
{
|
2021-07-16 02:43:55 +08:00
|
|
|
return idxd_dev->type == IDXD_DEV_DSA;
|
2021-04-16 07:37:33 +08:00
|
|
|
}
|
|
|
|
|
2021-07-16 02:43:55 +08:00
|
|
|
static inline bool is_iax_dev(struct idxd_dev *idxd_dev)
|
2021-04-16 07:37:33 +08:00
|
|
|
{
|
2021-07-16 02:43:55 +08:00
|
|
|
return idxd_dev->type == IDXD_DEV_IAX;
|
2021-04-16 07:37:33 +08:00
|
|
|
}
|
|
|
|
|
2021-07-16 02:43:55 +08:00
|
|
|
static inline bool is_idxd_dev(struct idxd_dev *idxd_dev)
|
2021-04-16 07:37:33 +08:00
|
|
|
{
|
2021-07-16 02:43:55 +08:00
|
|
|
return is_dsa_dev(idxd_dev) || is_iax_dev(idxd_dev);
|
2021-04-16 07:37:33 +08:00
|
|
|
}
|
2020-10-28 01:34:35 +08:00
|
|
|
|
2021-07-16 02:43:55 +08:00
|
|
|
static inline bool is_idxd_wq_dev(struct idxd_dev *idxd_dev)
|
2021-04-16 07:37:39 +08:00
|
|
|
{
|
2021-07-16 02:43:55 +08:00
|
|
|
return idxd_dev->type == IDXD_DEV_WQ;
|
2021-04-16 07:37:39 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool is_idxd_wq_dmaengine(struct idxd_wq *wq)
|
|
|
|
{
|
|
|
|
if (wq->type == IDXD_WQT_KERNEL && strcmp(wq->name, "dmaengine") == 0)
|
|
|
|
return true;
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2021-07-16 02:44:47 +08:00
|
|
|
static inline bool is_idxd_wq_user(struct idxd_wq *wq)
|
2021-04-16 07:37:39 +08:00
|
|
|
{
|
|
|
|
return wq->type == IDXD_WQT_USER;
|
|
|
|
}
|
|
|
|
|
2021-07-16 02:44:47 +08:00
|
|
|
static inline bool is_idxd_wq_kernel(struct idxd_wq *wq)
|
|
|
|
{
|
|
|
|
return wq->type == IDXD_WQT_KERNEL;
|
|
|
|
}
|
|
|
|
|
2020-01-22 07:43:59 +08:00
|
|
|
static inline bool wq_dedicated(struct idxd_wq *wq)
|
|
|
|
{
|
|
|
|
return test_bit(WQ_FLAG_DEDICATED, &wq->flags);
|
|
|
|
}
|
|
|
|
|
2020-10-28 01:34:35 +08:00
|
|
|
static inline bool wq_shared(struct idxd_wq *wq)
|
|
|
|
{
|
|
|
|
return !test_bit(WQ_FLAG_DEDICATED, &wq->flags);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool device_pasid_enabled(struct idxd_device *idxd)
|
|
|
|
{
|
|
|
|
return test_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool device_swq_supported(struct idxd_device *idxd)
|
|
|
|
{
|
|
|
|
return (support_enqcmd && device_pasid_enabled(idxd));
|
|
|
|
}
|
|
|
|
|
2020-01-22 07:44:29 +08:00
|
|
|
enum idxd_portal_prot {
|
|
|
|
IDXD_PORTAL_UNLIMITED = 0,
|
|
|
|
IDXD_PORTAL_LIMITED,
|
|
|
|
};
|
|
|
|
|
2021-04-21 02:46:34 +08:00
|
|
|
enum idxd_interrupt_type {
|
|
|
|
IDXD_IRQ_MSIX = 0,
|
|
|
|
IDXD_IRQ_IMS,
|
|
|
|
};
|
|
|
|
|
2020-01-22 07:44:29 +08:00
|
|
|
static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot)
|
|
|
|
{
|
|
|
|
return prot * 0x1000;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline int idxd_get_wq_portal_full_offset(int wq_id,
|
|
|
|
enum idxd_portal_prot prot)
|
|
|
|
{
|
|
|
|
return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot);
|
|
|
|
}
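The two helpers above give each work queue a window of four pages, with the unlimited portal at the start of the window and the limited portal 0x1000 bytes in. The sketch below repeats that arithmetic in standalone form. It assumes 4 KiB pages (a PAGE_SHIFT of 12), and the demo_* and DEMO_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

enum demo_portal_prot {
	DEMO_PORTAL_UNLIMITED = 0,
	DEMO_PORTAL_LIMITED,
};

/* Offset of the portal inside one work queue's window. */
static inline int demo_wq_portal_offset(enum demo_portal_prot prot)
{
	return prot * 0x1000;
}

/* Offset from the start of the portal area: four pages per work queue. */
static inline int demo_wq_portal_full_offset(int wq_id, enum demo_portal_prot prot)
{
	return ((wq_id * 4) << DEMO_PAGE_SHIFT) + demo_wq_portal_offset(prot);
}
```

Under these assumptions, work queue 1 starts at 0x4000, and the limited portal of work queue 2 sits at 0x8000 + 0x1000 = 0x9000.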
|
|
|
|
|
2021-07-21 04:42:04 +08:00
|
|
|
#define IDXD_PORTAL_MASK (PAGE_SIZE - 1)
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Even though this function can be accessed by multiple threads, it is safe to use.
|
|
|
|
* At worst the address gets used more than once before it gets incremented. We don't
|
|
|
|
* hit a threshold until iops becomes many million times a second. So the occasional
|
|
|
|
* reuse of the same address is tolerable compared to using an atomic variable. This is
|
|
|
|
* safe on a system that has atomic load/store for 32bit integers. Given that this is an
|
|
|
|
* Intel iEP device, that should not be a problem.
|
|
|
|
*/
|
|
|
|
static inline void __iomem *idxd_wq_portal_addr(struct idxd_wq *wq)
|
|
|
|
{
|
|
|
|
int ofs = wq->portal_offset;
|
|
|
|
|
|
|
|
wq->portal_offset = (ofs + sizeof(struct dsa_raw_desc)) & IDXD_PORTAL_MASK;
|
|
|
|
return wq->portal + ofs;
|
|
|
|
}
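idxd_wq_portal_addr() hands out successive descriptor-sized slots within the portal page and wraps back to the start once the page is exhausted. The userspace sketch below shows only that rotation. It assumes a 4 KiB page and a 64-byte descriptor (the real size comes from struct dsa_raw_desc), uses plain memory instead of an __iomem mapping, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE   4096u /* assumption: 4 KiB pages */
#define DEMO_PORTAL_MASK (DEMO_PAGE_SIZE - 1)

/* Hypothetical stand-in for struct dsa_raw_desc, assumed to be 64 bytes. */
struct demo_raw_desc {
	uint64_t field[8];
};
_Static_assert(sizeof(struct demo_raw_desc) == 64, "descriptor assumed 64 bytes");

struct demo_wq {
	unsigned char *portal; /* base of the mapped portal page */
	int portal_offset;     /* next slot to hand out */
};

/* Return the current slot and advance, wrapping at the page boundary. */
static inline unsigned char *demo_wq_portal_addr(struct demo_wq *wq)
{
	int ofs = wq->portal_offset;

	wq->portal_offset = (ofs + sizeof(struct demo_raw_desc)) & DEMO_PORTAL_MASK;
	return wq->portal + ofs;
}
```

With these sizes a page holds 64 slots, so the 65th call returns the base address again. As in the original, the read-modify-write of portal_offset is deliberately not atomic, so two concurrent callers may occasionally receive the same slot.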
|
|
|
|
|
2020-01-22 07:44:05 +08:00
|
|
|
static inline void idxd_wq_get(struct idxd_wq *wq)
|
|
|
|
{
|
|
|
|
wq->client_count++;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void idxd_wq_put(struct idxd_wq *wq)
|
|
|
|
{
|
|
|
|
wq->client_count--;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline int idxd_wq_refcount(struct idxd_wq *wq)
|
|
|
|
{
|
|
|
|
return wq->client_count;
|
|
|
|
}
|
|
|
|
|
2021-07-16 02:43:09 +08:00
|
|
|
int __must_check __idxd_driver_register(struct idxd_device_driver *idxd_drv,
|
|
|
|
struct module *module, const char *mod_name);
|
|
|
|
#define idxd_driver_register(driver) \
|
|
|
|
__idxd_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
|
|
|
|
|
|
|
|
void idxd_driver_unregister(struct idxd_device_driver *idxd_drv);
|
|
|
|
|
2021-07-16 02:44:47 +08:00
|
|
|
#define module_idxd_driver(__idxd_driver) \
|
|
|
|
module_driver(__idxd_driver, idxd_driver_register, idxd_driver_unregister)
|
|
|
|
|
2020-01-22 07:44:05 +08:00
|
|
|
int idxd_register_bus_type(void);
|
|
|
|
void idxd_unregister_bus_type(void);
|
2021-04-16 07:37:33 +08:00
|
|
|
int idxd_register_devices(struct idxd_device *idxd);
|
|
|
|
void idxd_unregister_devices(struct idxd_device *idxd);
|
2020-01-22 07:44:05 +08:00
|
|
|
int idxd_register_driver(void);
|
|
|
|
void idxd_unregister_driver(void);
|
2021-04-21 02:46:51 +08:00
|
|
|
void idxd_wqs_quiesce(struct idxd_device *idxd);
|
2020-01-22 07:43:59 +08:00
|
|
|
|
|
|
|
/* device interrupt control */
|
2021-04-13 00:23:27 +08:00
|
|
|
void idxd_msix_perm_setup(struct idxd_device *idxd);
|
|
|
|
void idxd_msix_perm_clear(struct idxd_device *idxd);
|
2020-01-22 07:43:59 +08:00
|
|
|
irqreturn_t idxd_misc_thread(int vec, void *data);
|
|
|
|
irqreturn_t idxd_wq_thread(int irq, void *data);
|
|
|
|
void idxd_mask_error_interrupts(struct idxd_device *idxd);
|
|
|
|
void idxd_unmask_error_interrupts(struct idxd_device *idxd);
|
|
|
|
void idxd_mask_msix_vectors(struct idxd_device *idxd);
|
2020-06-27 02:12:56 +08:00
|
|
|
void idxd_mask_msix_vector(struct idxd_device *idxd, int vec_id);
|
|
|
|
void idxd_unmask_msix_vector(struct idxd_device *idxd, int vec_id);
|
2020-01-22 07:43:59 +08:00
|
|
|
|
|
|
|
/* device control */
|
2021-07-16 02:44:24 +08:00
|
|
|
int idxd_register_idxd_drv(void);
|
|
|
|
void idxd_unregister_idxd_drv(void);
|
2021-07-16 02:44:01 +08:00
|
|
|
int idxd_device_drv_probe(struct idxd_dev *idxd_dev);
|
2021-07-16 02:44:07 +08:00
|
|
|
void idxd_device_drv_remove(struct idxd_dev *idxd_dev);
|
2021-07-16 02:43:31 +08:00
|
|
|
int drv_enable_wq(struct idxd_wq *wq);
|
2021-07-16 02:44:30 +08:00
|
|
|
int __drv_enable_wq(struct idxd_wq *wq);
|
2021-07-16 02:43:37 +08:00
|
|
|
void drv_disable_wq(struct idxd_wq *wq);
|
2021-07-16 02:44:30 +08:00
|
|
|
void __drv_disable_wq(struct idxd_wq *wq);
|
2021-02-01 23:26:14 +08:00
|
|
|
int idxd_device_init_reset(struct idxd_device *idxd);
|
2020-01-22 07:43:59 +08:00
|
|
|
int idxd_device_enable(struct idxd_device *idxd);
|
|
|
|
int idxd_device_disable(struct idxd_device *idxd);
|
2020-06-27 02:11:18 +08:00
|
|
|
void idxd_device_reset(struct idxd_device *idxd);
|
2021-06-05 08:06:21 +08:00
|
|
|
void idxd_device_clear_state(struct idxd_device *idxd);
|
2020-01-22 07:43:59 +08:00
|
|
|
int idxd_device_config(struct idxd_device *idxd);
|
2020-10-28 01:34:35 +08:00
|
|
|
void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid);
|
2021-04-21 02:46:28 +08:00
|
|
|
int idxd_device_load_config(struct idxd_device *idxd);
|
2021-04-21 02:46:34 +08:00
|
|
|
int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle,
|
|
|
|
enum idxd_interrupt_type irq_type);
|
|
|
|
int idxd_device_release_int_handle(struct idxd_device *idxd, int handle,
|
|
|
|
enum idxd_interrupt_type irq_type);
|
2020-01-22 07:43:59 +08:00
|
|
|
|
|
|
|
/* work queue control */
|
2021-04-21 02:46:51 +08:00
|
|
|
void idxd_wqs_unmap_portal(struct idxd_device *idxd);
|
2020-01-22 07:43:59 +08:00
|
|
|
int idxd_wq_alloc_resources(struct idxd_wq *wq);
|
|
|
|
void idxd_wq_free_resources(struct idxd_wq *wq);
|
|
|
|
int idxd_wq_enable(struct idxd_wq *wq);
|
2021-06-05 08:06:21 +08:00
|
|
|
int idxd_wq_disable(struct idxd_wq *wq, bool reset_config);
|
2020-06-27 02:11:18 +08:00
|
|
|
void idxd_wq_drain(struct idxd_wq *wq);
|
2021-04-13 00:02:36 +08:00
|
|
|
void idxd_wq_reset(struct idxd_wq *wq);
|
2020-01-22 07:44:05 +08:00
|
|
|
int idxd_wq_map_portal(struct idxd_wq *wq);
|
|
|
|
void idxd_wq_unmap_portal(struct idxd_wq *wq);
|
2020-10-28 01:34:35 +08:00
|
|
|
int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid);
|
|
|
|
int idxd_wq_disable_pasid(struct idxd_wq *wq);
|
2021-04-21 02:46:22 +08:00
|
|
|
void idxd_wq_quiesce(struct idxd_wq *wq);
|
|
|
|
int idxd_wq_init_percpu_ref(struct idxd_wq *wq);
|
2020-01-22 07:43:59 +08:00
|
|
|
|
2020-01-22 07:44:17 +08:00
|
|
|
/* submission */
|
|
|
|
int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc);
|
|
|
|
struct idxd_desc *idxd_alloc_desc(struct idxd_wq *wq, enum idxd_op_type optype);
|
|
|
|
void idxd_free_desc(struct idxd_wq *wq, struct idxd_desc *desc);
|
|
|
|
|
2020-01-22 07:44:23 +08:00
|
|
|
/* dmaengine */
|
|
|
|
int idxd_register_dma_device(struct idxd_device *idxd);
|
|
|
|
void idxd_unregister_dma_device(struct idxd_device *idxd);
|
|
|
|
int idxd_register_dma_channel(struct idxd_wq *wq);
|
|
|
|
void idxd_unregister_dma_channel(struct idxd_wq *wq);
|
|
|
|
void idxd_parse_completion_status(u8 status, enum dmaengine_tx_result *res);
|
|
|
|
void idxd_dma_complete_txd(struct idxd_desc *desc,
|
|
|
|
enum idxd_complete_type comp_type);
|
|
|
|
|
2020-01-22 07:44:29 +08:00
|
|
|
/* cdev */
|
|
|
|
int idxd_cdev_register(void);
|
|
|
|
void idxd_cdev_remove(void);
|
|
|
|
int idxd_cdev_get_major(struct idxd_device *idxd);
|
|
|
|
int idxd_wq_add_cdev(struct idxd_wq *wq);
|
|
|
|
void idxd_wq_del_cdev(struct idxd_wq *wq);
|
|
|
|
|
2021-04-24 23:04:15 +08:00
|
|
|
/* perfmon */
|
|
|
|
#if IS_ENABLED(CONFIG_INTEL_IDXD_PERFMON)
|
|
|
|
int perfmon_pmu_init(struct idxd_device *idxd);
|
|
|
|
void perfmon_pmu_remove(struct idxd_device *idxd);
|
|
|
|
void perfmon_counter_overflow(struct idxd_device *idxd);
|
|
|
|
void perfmon_init(void);
|
|
|
|
void perfmon_exit(void);
|
|
|
|
#else
|
|
|
|
static inline int perfmon_pmu_init(struct idxd_device *idxd) { return 0; }
|
|
|
|
static inline void perfmon_pmu_remove(struct idxd_device *idxd) {}
|
|
|
|
static inline void perfmon_counter_overflow(struct idxd_device *idxd) {}
|
|
|
|
static inline void perfmon_init(void) {}
|
|
|
|
static inline void perfmon_exit(void) {}
|
|
|
|
#endif
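The #if/#else block above lets callers invoke the perfmon hooks unconditionally: when the option is disabled, the static inline stubs compile to nothing and perfmon_pmu_init() reports success. A reduced sketch of the pattern follows. It uses a plain #ifdef on a hypothetical DEMO_CONFIG_PERFMON macro in place of IS_ENABLED(), and demo_probe() is an illustrative caller rather than the driver's probe function.

```c
#include <assert.h>
#include <stddef.h>

struct demo_device; /* opaque, hypothetical device type */

#ifdef DEMO_CONFIG_PERFMON
/* Real implementations would live in a separate source file. */
int demo_perfmon_pmu_init(struct demo_device *dev);
void demo_perfmon_pmu_remove(struct demo_device *dev);
#else
/* Stubs: feature compiled out, so init trivially succeeds. */
static inline int demo_perfmon_pmu_init(struct demo_device *dev)
{
	(void)dev;
	return 0;
}

static inline void demo_perfmon_pmu_remove(struct demo_device *dev)
{
	(void)dev;
}
#endif

/* The caller needs no #ifdef of its own. */
static inline int demo_probe(struct demo_device *dev)
{
	int rc = demo_perfmon_pmu_init(dev);

	if (rc < 0)
		return rc;
	return 0;
}
```

Keeping the conditional compilation in the header means call sites stay identical whether or not the feature is built.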
|
|
|
|
|
2021-07-15 02:50:06 +08:00
|
|
|
static inline void complete_desc(struct idxd_desc *desc, enum idxd_complete_type reason)
|
|
|
|
{
|
|
|
|
idxd_dma_complete_txd(desc, reason);
|
|
|
|
idxd_free_desc(desc->wq, desc);
|
|
|
|
}
|
|
|
|
|
2020-01-22 07:43:59 +08:00
|
|
|
#endif
|