SYSFS FILES

For each InfiniBand device, the InfiniBand drivers create the
following files under /sys/class/infiniband/<device name>:

  node_type      - Node type (CA, switch or router)
  node_guid      - Node GUID
  sys_image_guid - System image GUID

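The per-device files can be read with ordinary shell tools. The following is a minimal sketch, not part of the drivers: the function name show_ib_devices is made up for illustration, and the optional argument exists only so that the function can be pointed at a directory other than /sys/class/infiniband.

```shell
#!/bin/sh
# Print node_type, node_guid and sys_image_guid for every device
# directory found under the given root (default: /sys/class/infiniband).
# show_ib_devices is a hypothetical helper name, not a kernel interface.
show_ib_devices() {
    root=${1:-/sys/class/infiniband}
    for dev in "$root"/*; do
        # Skip non-directories (and the literal pattern if nothing matched).
        [ -d "$dev" ] || continue
        printf '%s\n' "${dev##*/}"
        for attr in node_type node_guid sys_image_guid; do
            if [ -r "$dev/$attr" ]; then
                printf '  %s: %s\n' "$attr" "$(cat "$dev/$attr")"
            fi
        done
    done
}
```

Calling show_ib_devices with no argument walks the real sysfs tree; attributes that a device does not expose are simply skipped.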
In addition, there is a "ports" subdirectory, with one subdirectory
for each port.  For example, if mthca0 is a 2-port HCA, there will
be two directories:

  /sys/class/infiniband/mthca0/ports/1
  /sys/class/infiniband/mthca0/ports/2

(A switch will only have a single "0" subdirectory for switch port
0; no subdirectory is created for normal switch ports)

In each port subdirectory, the following files are created:

  cap_mask       - Port capability mask
  lid            - Port LID
  lid_mask_count - Port LID mask count
  rate           - Port data rate (active width * active speed)
  sm_lid         - Subnet manager LID for port's subnet
  sm_sl          - Subnet manager SL for port's subnet
  state          - Port state (DOWN, INIT, ARMED, ACTIVE or ACTIVE_DEFER)
  phys_state     - Port physical state (Sleep, Polling, LinkUp, etc)

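As an illustration of the port files listed above, the sketch below walks the "ports" subdirectory of one device and prints a few attributes per port. The function name show_port_status is hypothetical, and the exact text stored in each file is whatever the driver reports; nothing here parses it.

```shell
#!/bin/sh
# For each port of one device, print "<port> <attribute> <value>" for a
# handful of the per-port files. show_port_status is a made-up name.
show_port_status() {
    dev_dir=$1        # e.g. /sys/class/infiniband/mthca0
    for port in "$dev_dir"/ports/*; do
        [ -d "$port" ] || continue
        for attr in state phys_state rate lid; do
            if [ -r "$port/$attr" ]; then
                printf '%s %s %s\n' "${port##*/}" "$attr" "$(cat "$port/$attr")"
            fi
        done
    done
}
```

For a 2-port HCA such as the mthca0 example, show_port_status /sys/class/infiniband/mthca0 would print one group of lines for port 1 and one for port 2.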
There is also a "counters" subdirectory, with files

  VL15_dropped
  excessive_buffer_overrun_errors
  link_downed
  link_error_recovery
  local_link_integrity_errors
  port_rcv_constraint_errors
  port_rcv_data
  port_rcv_errors
  port_rcv_packets
  port_rcv_remote_physical_errors
  port_rcv_switch_relay_errors
  port_xmit_constraint_errors
  port_xmit_data
  port_xmit_discards
  port_xmit_packets
  symbol_error

Each of these files contains the corresponding value from the port's
Performance Management PortCounters attribute, as described in
section 16.1.3.5 of the InfiniBand Architecture Specification.

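Because every counter is a separate file, a quick health check can be done by reading them all and reporting only the ones that are not zero. This is a sketch under the assumption that each file holds a single decimal number; nonzero_counters is a name chosen for this example.

```shell
#!/bin/sh
# Print "<counter> <value>" for every file in a port's counters
# directory whose content is not "0". Assumes one decimal number per
# file; nonzero_counters is a hypothetical helper name.
nonzero_counters() {
    port_dir=$1       # e.g. /sys/class/infiniband/mthca0/ports/1
    for f in "$port_dir"/counters/*; do
        [ -f "$f" ] || continue
        val=$(cat "$f") || continue
        if [ "$val" != "0" ]; then
            printf '%s %s\n' "${f##*/}" "$val"
        fi
    done
}
```

Data and packet counters will normally show up in the output on an active port; the entries of interest are usually the error counters such as symbol_error or link_downed.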
The "pkeys" and "gids" subdirectories contain one file for each
entry in the port's P_Key or GID table respectively.  For example,
ports/1/pkeys/10 contains the value at index 10 in port 1's P_Key
table.

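The table layout maps directly onto a path: port directory, table name, index. The helper below is an illustrative sketch (read_table_entry is not an existing tool) that builds that path and prints the entry, returning an error when the index is not present.

```shell
#!/bin/sh
# Print one entry of a port's P_Key or GID table.
# Usage: read_table_entry <port dir> pkeys|gids <index>
# read_table_entry is a hypothetical helper name.
read_table_entry() {
    port_dir=$1
    table=$2
    index=$3
    case $table in
        pkeys|gids) ;;
        *) echo "unknown table: $table" >&2; return 2 ;;
    esac
    entry=$port_dir/$table/$index
    if [ ! -r "$entry" ]; then
        echo "no entry $index in $table" >&2
        return 1
    fi
    cat "$entry"
}
```

With this, read_table_entry /sys/class/infiniband/mthca0/ports/1 pkeys 10 prints the same value as reading ports/1/pkeys/10 directly.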
There is an optional "hw_counters" subdirectory that may be under either
the parent device or the port subdirectories or both.  If present,
it contains a list of counters provided by the hardware.  They may match
some of the counters in the counters directory, but they often include
many other counters.  In addition to the various counters, there will
be a file named "lifespan" that configures how frequently the core
should update the counters when they are being accessed (counters are
not updated if they are not being accessed).  The lifespan is in
milliseconds and defaults to 10 unless set to something else by the
driver.  Users may echo a value from 0 to 10000 to the lifespan file to
set the length of time between updates in milliseconds.

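A small wrapper can check the documented range before writing to the lifespan file. This is only a sketch: set_lifespan is a made-up name, and the range check mirrors the 0 to 10000 limit stated above rather than anything read back from the kernel.

```shell
#!/bin/sh
# Write a new lifespan (in milliseconds) to a hw_counters directory
# after checking that it is an integer from 0 to 10000.
# set_lifespan is a hypothetical helper name.
set_lifespan() {
    hw_dir=$1         # e.g. /sys/class/infiniband/<device name>/hw_counters
    ms=$2
    case $ms in
        ''|*[!0-9]*)
            echo "lifespan must be a non-negative integer" >&2
            return 2 ;;
    esac
    if [ "$ms" -gt 10000 ]; then
        echo "lifespan must be in the range 0 to 10000" >&2
        return 2
    fi
    echo "$ms" > "$hw_dir/lifespan"
}
```

For example, set_lifespan <hw_counters directory> 100 asks the core to refresh the cached counters at most every 100 milliseconds while they are being read.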
MTHCA

The Mellanox HCA driver also creates the files:

  hw_rev   - Hardware revision number
  fw_ver   - Firmware version
  hca_type - HCA type: "MT23108", "MT25208 (MT23108 compat mode)",
             or "MT25208"

HFI1

The hfi1 driver also creates these additional files:

  hw_rev - hardware revision
  board_id - manufacturing board id
  tempsense - thermal sense information
  serial - board serial number
  nfreectxts - number of free user contexts
  nctxts - number of allowed contexts (PSM2)
  chip_reset - diagnostic (root only)
  boardversion - board version
  sdma<N>/ - one directory per sdma engine (0 - 15)
  sdma<N>/cpu_list - read-write, list of cpus for user-process to sdma
                     engine assignment.
  sdma<N>/vl - read-only, vl the sdma engine maps to.

This interface gives the user control over the affinity settings
for the hfi1 device.
For example, to set the irq affinity of an sdma engine, and to make
user processes use an sdma engine that is "near" in terms of NUMA
configuration or physical cpu location, the user can do:

  echo "3" > /proc/irq/<N>/smp_affinity_list
  echo "4-7" > /sys/devices/.../sdma3/cpu_list
  cat /sys/devices/.../sdma3/vl
  0
  echo "8" > /proc/irq/<M>/smp_affinity_list
  echo "9-12" > /sys/devices/.../sdma4/cpu_list
  cat /sys/devices/.../sdma4/vl
  1

This ensures that when a process runs on cpu 4, 5, 6 or 7 and uses
vl=0, the driver selects sdma engine 3, and the interrupt of sdma
engine 3 is steered to cpu 3.
Similarly, when a process runs on cpu 9, 10, 11 or 12 and uses vl=1,
the driver selects sdma engine 4, and the interrupt of sdma engine 4
is steered to cpu 8.
Here N is the irq number of "sdma3" and M is the irq number of
"sdma4" in the /proc/interrupts file.

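The irq numbers N and M have to be looked up in /proc/interrupts. The sketch below shows one possible way to do that with awk, under the assumption that the engine name ("sdma3", "sdma4") is the last whitespace-separated field of its line; irq_for_name is a hypothetical helper, and on a system with several hfi1 devices it prints one irq number per matching line.

```shell
#!/bin/sh
# Print the irq number of every line in an interrupts table whose last
# field equals the given name. Assumes /proc/interrupts-style lines of
# the form "<irq>: <counts...> <description> <name>".
# irq_for_name is a hypothetical helper name.
irq_for_name() {
    name=$1
    file=${2:-/proc/interrupts}
    awk -v name="$name" '
        # Only numbered irq lines; skips the CPU header and NMI-style rows.
        $1 ~ /^[0-9]+:$/ && $NF == name {
            sub(/:$/, "", $1)
            print $1
        }' "$file"
}
```

The result can then be substituted for <N> in the smp_affinity_list path, after checking that exactly one number was printed.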
  ports/1/
    CCMgtA/
      cc_settings_bin - CCA tables used by PSM2
      cc_table_bin
      cc_prescan - enable prescanning for faster BECN response
    sc2vl/ - 32 files (0 - 31) used to translate sc->vl
    sl2sc/ - 32 files (0 - 31) used to translate sl->sc
    vl2mtu/ - 16 files (0 - 15) used to determine MTU for vl
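The mapping directories hold one numbered file per table index, so a whole table can be printed with a simple counting loop. This is a sketch only: dump_map is a made-up name, and the entry count (32 for sl2sc, 16 for vl2mtu) is passed in by the caller rather than discovered.

```shell
#!/bin/sh
# Print "<index> -> <value>" for the files 0 .. count-1 in a mapping
# directory, in numeric order. dump_map is a hypothetical helper name.
dump_map() {
    map_dir=$1        # e.g. .../ports/1/sl2sc
    count=$2          # e.g. 32 for sl2sc, 16 for vl2mtu
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ -r "$map_dir/$i" ]; then
            printf '%s -> %s\n' "$i" "$(cat "$map_dir/$i")"
        fi
        i=$((i + 1))
    done
}
```

Counting explicitly instead of globbing keeps the output in numeric order; a shell glob would sort 10 before 2.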