Commit Graph

17 Commits

Author SHA1 Message Date
Vadim Pasternak a1ffd3c462 hwmon: (mlxreg-fan) Return zero speed for broken fan
Currently, for a broken fan, the driver returns a value calculated from
the error code (0xFF) in the related fan speed register.
Thus, for such a fan the user gets fan{n}_fault set to 1 and
fan{n}_input with a misleading value.

Add a check for fan fault before returning the speed value, and return
zero if a fault is detected.

Fixes: 65afb4c8e7 ("hwmon: (mlxreg-fan) Add support for Mellanox FAN driver")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20230212145730.24247-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2023-02-12 07:21:40 -08:00
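A minimal C sketch of the check described in this commit, assuming a broken fan leaves 0xFF (the error code named in the message) in its speed register. The raw-to-RPM conversion is a hypothetical placeholder, not the driver's formula; fan_input_old() shows how the unpatched path turns the error code into a plausible-looking, non-zero speed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a broken fan leaves 0xFF (the error code) in its speed register. */
#define TACHO_ERR_CODE 0xFFu

/* fan{n}_fault: true when the register holds the error code. */
static bool fan_fault(uint8_t rval)
{
	return rval == TACHO_ERR_CODE;
}

/* Hypothetical raw-to-RPM conversion, for illustration only. */
static long raw_to_rpm(uint8_t rval, long divider)
{
	return 15000000L / (((long)rval + 1) * divider);
}

/* Before the fix: the error code is converted like a real reading. */
static long fan_input_old(uint8_t rval, long divider)
{
	return raw_to_rpm(rval, divider);
}

/* After the fix: check for a fault first and report zero speed. */
static long fan_input_new(uint8_t rval, long divider)
{
	if (fan_fault(rval))
		return 0;
	return raw_to_rpm(rval, divider);
}
```

Healthy readings are unaffected: only the fault path changes its reported value.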
Vadim Pasternak da74944d3a hwmon: (mlxreg-fan) Use pwm attribute for setting fan speed low limit
Recently, the 'cur_state' user space 'sysfs' interface has been
deprecated. This interface is used in Nvidia systems for setting the fan
speed limit. Currently, the fan speed limit is set from user space by
setting the 'sysfs' 'cur_state' attribute to 'max_state + n', where 'n'
is the required limit; for example, writing 15 sets a 50% speed limit
and writing 20 enforces full fan speed.
The purpose of this feature is to provide the ability to limit fan speed
according to system-wide considerations, such as the absence of some
replaceable units (PSU or line cards), high system ambient temperature,
unreliable transceiver temperature sensing, or other factors which
indirectly impact the system's airflow.

The motivation is to support the fan low limit feature through the
'hwmon' interface.

Use the 'hwmon' 'pwm' attribute for setting the low limit for fan speed
when the 'thermal' subsystem is configured in the kernel. In this case,
setting the fan speed through 'hwmon' will never let the 'thermal'
subsystem select a lower duty cycle than the duty cycle selected with
the 'pwm' attribute.
On the other hand, the fan speed is updated in hardware through 'pwm'
only if the requested fan speed is above the last speed set by the
'thermal' subsystem; otherwise, the requested fan speed is just stored
with no PWM update.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20220126141825.13545-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2022-02-27 17:03:17 -08:00
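The two rules in this commit (the 'pwm' value acts as a floor for 'thermal', and a 'pwm' write reaches hardware only when it exceeds the last 'thermal' request) can be modelled with a small state machine. This is a hedged sketch: struct pwm_chan, pwm_attr_write() and thermal_set() are hypothetical names, and "hardware" is just a field.

```c
#include <stdint.h>

/* Hypothetical model of one PWM channel; names are illustrative only. */
struct pwm_chan {
	uint8_t hw_duty;      /* value last written to the PWM register */
	uint8_t thermal_duty; /* last duty cycle requested by 'thermal' */
	uint8_t floor_duty;   /* low limit written through hwmon 'pwm' */
};

static void hw_write(struct pwm_chan *p, uint8_t duty)
{
	p->hw_duty = duty;
}

/* hwmon 'pwm' write: always store the floor, but update hardware only if
 * the request is above the speed last set by 'thermal'. */
static void pwm_attr_write(struct pwm_chan *p, uint8_t duty)
{
	p->floor_duty = duty;
	if (duty > p->thermal_duty)
		hw_write(p, duty);
}

/* 'thermal' request: never program a duty cycle below the stored floor. */
static void thermal_set(struct pwm_chan *p, uint8_t duty)
{
	p->thermal_duty = duty;
	hw_write(p, duty > p->floor_duty ? duty : p->floor_duty);
}
```

For example, after thermal_set(100) and pwm_attr_write(60), hardware stays at 100; a later thermal_set(30) then lands on the floor of 60.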
Vadim Pasternak b2be2422c0 hwmon: (mlxreg-fan) Support distinctive names per different cooling devices
Provide different names for cooling device registration to allow
binding each cooling device to the relevant thermal zone. Thus, a
specific cooling device can be associated with the related thermal
sensor by setting the thermal cooling device type, for example, to
"mlxreg_fan2" and passing this type to thermal_zone_bind_cooling_device()
through 'cdev->type'.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210926053541.1806937-3-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-12 07:22:41 -07:00
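A sketch of per-device naming and name-based matching. The scheme ("mlxreg_fan" for the first device, "mlxreg_fan<n>" for the others) is an assumption extrapolated from the "mlxreg_fan2" example; the driver may simply use a static table of names. cdev_name() and cdev_matches() are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical naming scheme: "mlxreg_fan" for cooling device 0,
 * "mlxreg_fan<n>" for cooling device n > 0. */
static void cdev_name(char *buf, size_t len, unsigned int idx)
{
	if (idx == 0)
		snprintf(buf, len, "mlxreg_fan");
	else
		snprintf(buf, len, "mlxreg_fan%u", idx);
}

/* Illustrative binding test: a thermal zone picks its cooling device by
 * comparing the registered type string with the one it expects. */
static int cdev_matches(const char *cdev_type, const char *wanted)
{
	return strcmp(cdev_type, wanted) == 0;
}
```

Unique type strings are what make the string comparison meaningful: with a single shared name, every cooling device would match every zone.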
Vadim Pasternak b1c2423734 hwmon: (mlxreg-fan) Modify PWM connectivity validation
Validate PWM connectivity only for additional PWMs: "pwm1" is connected
on all systems, while "pwm2" to "pwm4" are optional. Validate
connectivity only for the optional attributes by reading the related
"pwm{n}" registers; if "pwm{n}" is not connected, the register value is
supposed to be 0xff.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210926053541.1806937-2-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-12 07:22:41 -07:00
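The validation rule reduces to a bitmask computation. In this hedged sketch, regs[] stands in for values read from the "pwm{n}" registers (a hypothetical input, not a driver structure); pwm1 is always marked connected and is never checked against 0xff.

```c
#include <stdint.h>

#define MAX_PWM 4            /* pwm1..pwm4, per the commit log */
#define PWM_NOT_CONNECTED 0xFFu

/* Returns a mask with bit 0 = pwm1 ... bit 3 = pwm4.
 * regs[] holds hypothetical values read from the pwm{n} registers. */
static unsigned int pwm_connected_mask(const uint8_t regs[MAX_PWM])
{
	unsigned int mask = 1u; /* pwm1 is connected on all systems */

	/* Only the optional pwm2..pwm4 are validated. */
	for (int i = 1; i < MAX_PWM; i++)
		if (regs[i] != PWM_NOT_CONNECTED)
			mask |= 1u << i;
	return mask;
}
```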
Colin Ian King 000cc5bc49 hwmon: (mlxreg-fan) Fix out of bounds read on array fan->pwm
Array fan->pwm[] is MLXREG_FAN_MAX_PWM elements in size; however, the
for-loop has an off-by-one error that takes index i out of range,
causing an out-of-bounds read on the array. Fix this by replacing
the <= operator with < in the for-loop.

Addresses-Coverity: ("Out-of-bounds read")
Reported-by: Vadim Pasternak <vadimp@nvidia.com>
Fixes: 35edbaab3bbf ("hwmon: (mlxreg-fan) Extend driver to support multiply cooling devices")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210920180921.16246-1-colin.king@canonical.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-12 07:22:38 -07:00
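The shape of the fix, as a hedged sketch: struct fan_model is a hypothetical stand-in for the driver's fan structure, and MLXREG_FAN_MAX_PWM is assumed to be 4 to match the "up to four PWM controllers" mentioned elsewhere in this log.

```c
#include <stddef.h>

#define MLXREG_FAN_MAX_PWM 4 /* assumption: matches "up to four" PWMs */

/* Hypothetical stand-in for the driver's per-PWM state. */
struct fan_model {
	int pwm_connected[MLXREG_FAN_MAX_PWM];
};

/* Valid indices are 0..MLXREG_FAN_MAX_PWM - 1, so the bound must be "<".
 * With "<=", the last iteration would read pwm_connected[4], one element
 * past the end of the array. */
static int count_connected(const struct fan_model *fan)
{
	int n = 0;

	for (size_t i = 0; i < MLXREG_FAN_MAX_PWM; i++)
		if (fan->pwm_connected[i])
			n++;
	return n;
}
```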
Vadim Pasternak d7efb2ebc7 hwmon: (mlxreg-fan) Extend driver to support multiply cooling devices
Add support for additional cooling devices in order to support
systems which can be equipped with up to four PWM controllers.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-12 07:22:37 -07:00
Vadim Pasternak 150f1e0c6f hwmon: (mlxreg-fan) Extend driver to support multiply PWM
Add additional PWM attributes in order to support systems which can be
equipped with up to four PWM controllers. System support for additional
PWMs is validated by reading the relevant registers.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210916194719.871413-3-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-12 07:22:37 -07:00
Vadim Pasternak bc8de07e88 hwmon: (mlxreg-fan) Extend the maximum number of tachometers
Extend the maximum number of supported tachometers from 12 to 14 in
order to support new systems equipped with more fans.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210916194719.871413-2-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-10-12 07:22:37 -07:00
Vadim Pasternak e6fab7af6b hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced from sysfs
Fan speed minimum can be enforced from sysfs. For example, setting the
current fan speed to 20 enforces the fan speed to be at 100%, 19 to be
not below 90%, etcetera. This feature provides the ability to limit fan
speed according to system-wide considerations, such as the absence of
some replaceable units or high system ambient temperature.

A request for changing the fan minimum speed is a configuration request
and can be set only through the 'sysfs' write procedure. In this
situation, the value of argument 'state' is above the nominal fan speed
maximum.

Return a non-zero code in this case to avoid the
thermal_cooling_device_stats_update() call, because here the statistics
update violates the thermal statistics table range.
The issue is observed when the kernel is configured with option
CONFIG_THERMAL_STATISTICS.

Here is the trace from KASAN:
[  159.506659] BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x7d/0xb0
[  159.516016] Read of size 4 at addr ffff888116163840 by task hw-management.s/7444
[  159.545625] Call Trace:
[  159.548366]  dump_stack+0x92/0xc1
[  159.552084]  ? thermal_cooling_device_stats_update+0x7d/0xb0
[  159.635869]  thermal_zone_device_update+0x345/0x780
[  159.688711]  thermal_zone_device_set_mode+0x7d/0xc0
[  159.694174]  mlxsw_thermal_modules_init+0x48f/0x590 [mlxsw_core]
[  159.700972]  ? mlxsw_thermal_set_cur_state+0x5a0/0x5a0 [mlxsw_core]
[  159.731827]  mlxsw_thermal_init+0x763/0x880 [mlxsw_core]
[  160.070233] RIP: 0033:0x7fd995909970
[  160.074239] Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ..
[  160.095242] RSP: 002b:00007fff54f5d938 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  160.103722] RAX: ffffffffffffffda RBX: 0000000000000013 RCX: 00007fd995909970
[  160.111710] RDX: 0000000000000013 RSI: 0000000001906008 RDI: 0000000000000001
[  160.119699] RBP: 0000000001906008 R08: 00007fd995bc9760 R09: 00007fd996210700
[  160.127687] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000013
[  160.135673] R13: 0000000000000001 R14: 00007fd995bc8600 R15: 0000000000000013
[  160.143671]
[  160.145338] Allocated by task 2924:
[  160.149242]  kasan_save_stack+0x19/0x40
[  160.153541]  __kasan_kmalloc+0x7f/0xa0
[  160.157743]  __kmalloc+0x1a2/0x2b0
[  160.161552]  thermal_cooling_device_setup_sysfs+0xf9/0x1a0
[  160.167687]  __thermal_cooling_device_register+0x1b5/0x500
[  160.173833]  devm_thermal_of_cooling_device_register+0x60/0xa0
[  160.180356]  mlxreg_fan_probe+0x474/0x5e0 [mlxreg_fan]
[  160.248140]
[  160.249807] The buggy address belongs to the object at ffff888116163400
[  160.249807]  which belongs to the cache kmalloc-1k of size 1024
[  160.263814] The buggy address is located 64 bytes to the right of
[  160.263814]  1024-byte region [ffff888116163400, ffff888116163800)
[  160.277536] The buggy address belongs to the page:
[  160.282898] page:0000000012275840 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888116167000 pfn:0x116160
[  160.294872] head:0000000012275840 order:3 compound_mapcount:0 compound_pincount:0
[  160.303251] flags: 0x200000000010200(slab|head|node=0|zone=2)
[  160.309694] raw: 0200000000010200 ffffea00046f7208 ffffea0004928208 ffff88810004dbc0
[  160.318367] raw: ffff888116167000 00000000000a0006 00000001ffffffff 0000000000000000
[  160.327033] page dumped because: kasan: bad access detected
[  160.333270]
[  160.334937] Memory state around the buggy address:
[  160.356469] >ffff888116163800: fc ..

Fixes: 65afb4c8e7 ("hwmon: (mlxreg-fan) Add support for Mellanox FAN driver")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210916183151.869427-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-09-16 14:48:20 -07:00
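A hedged model of the fix. It assumes nominal cooling states 0..10, with 11..20 used as minimum-speed configuration requests (20 = 100%, 19 = 90%, as in the message). It also assumes the caller records statistics only when set_cur_state() returns 0, which is why a non-zero return skips the update. The statistics table is presumably sized for states 0 through max_state, so recording a state such as 19 would index past its end, consistent with the KASAN report. All names, and the -EINVAL range check, are hypothetical.

```c
#include <errno.h>

/* Assumed ranges: nominal states 0..10, configuration requests 11..20. */
#define MAX_STATE 10
#define SPEED_MAX (MAX_STATE * 2)

struct cdev_model {
	unsigned int min_percent; /* enforced minimum fan speed, in percent */
	unsigned long cur_state;  /* nominal cooling state */
};

static int stats_updates; /* counts simulated statistics updates */

/* Returns 0 for a nominal state change and a positive value for a
 * configuration request, so the caller skips the statistics update. */
static int set_cur_state(struct cdev_model *c, unsigned long state)
{
	if (state > MAX_STATE) {
		if (state > SPEED_MAX)
			return -EINVAL; /* hypothetical range check */
		c->min_percent = (unsigned int)(state - MAX_STATE) * 10;
		return 1; /* configuration request, not a real state */
	}
	c->cur_state = state;
	return 0;
}

/* Stand-in for the thermal core path that records statistics. */
static void cdev_update(struct cdev_model *c, unsigned long target)
{
	if (set_cur_state(c, target) == 0)
		stats_updates++; /* thermal_cooling_device_stats_update() */
}
```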
Vadim Pasternak f7bf7eb2d7 hwmon: (mlxreg-fan) Add support for fan drawers capability and present registers
Add support for the fan drawers' capability and present registers in
order to set the mapping between the fan drawers and tachometers. Some
systems are equipped with fan drawers with one tachometer inside, others
with fan drawers with several tachometers inside. Using the present
register along with the tachometer-to-drawer mapping allows skipping the
reading of missing tachometers and exposing their input as zero, instead
of exposing the fault code returned by hardware.

Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Link: https://lore.kernel.org/r/20210322172237.2213584-1-vadimp@nvidia.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-04-20 06:50:14 -07:00
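A sketch of the tachometer-to-drawer mapping under two stated assumptions: tachometers are spread evenly over drawers (so tachometers per drawer = connected tachometers / available drawers, both counted from capability bitmasks), and a set bit in the present register means the drawer is absent. Both the bit polarity and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable popcount: number of set bits in v. */
static unsigned int count_bits(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Assumption: tachometers are evenly spread over the available drawers. */
static unsigned int tachos_per_drawer(uint32_t tacho_cap, uint32_t drawer_cap)
{
	unsigned int drawers = count_bits(drawer_cap);

	return drawers ? count_bits(tacho_cap) / drawers : 0;
}

/* Assumption: bit n set in absent_mask means drawer n is missing, so the
 * tachometer is not read and its input is reported as zero. */
static bool skip_tacho(unsigned int channel, unsigned int per_drawer,
		       uint32_t absent_mask)
{
	if (per_drawer == 0)
		return false; /* no mapping available: read as usual */
	return (absent_mask >> (channel / per_drawer)) & 1u;
}
```

For example, 14 tachometers in 7 drawers gives two tachometers per drawer, so channels 4 and 5 both belong to drawer 2.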
Linus Torvalds a455eda33f Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal soc updates from Eduardo Valentin:

 - thermal core has a new devm_* API for registering cooling devices. I
   took the entire series; that is why you see changes in drivers/hwmon
   in this pull (Guenter Roeck)

 - rockchip thermal driver gains support for the PX30 SoC (Elaine Zhang)

 - the generic-adc thermal driver now considers the lookup table DT
   property as optional (Jean-Francois Dagenais)

 - Refactoring of tsens thermal driver (Amit Kucheria)

 - Cleanups on cpu cooling driver (Daniel Lezcano)

 - broadcom thermal driver dropped ACPI support (Srinath Mannam)

 - tegra thermal driver gains support for OC hw throttle and GPU throttle
   (Wei Ni)

 - Fixes in several thermal drivers.

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal: (59 commits)
  hwmon: (pwm-fan) Use devm_thermal_of_cooling_device_register
  hwmon: (npcm750-pwm-fan) Use devm_thermal_of_cooling_device_register
  hwmon: (mlxreg-fan) Use devm_thermal_of_cooling_device_register
  hwmon: (gpio-fan) Use devm_thermal_of_cooling_device_register
  hwmon: (aspeed-pwm-tacho) Use devm_thermal_of_cooling_device_register
  thermal: rcar_gen3_thermal: Fix to show correct trip points number
  thermal: rcar_thermal: update calculation formula for R-Car Gen3 SoCs
  thermal: cpu_cooling: Actually trace CPU load in thermal_power_cpu_get_power
  thermal: rockchip: Support the PX30 SoC in thermal driver
  dt-bindings: rockchip-thermal: Support the PX30 SoC compatible
  thermal: rockchip: fix up the tsadc pinctrl setting error
  thermal: broadcom: Remove ACPI support
  thermal: Fix build error of missing devm_ioremap_resource on UM
  thermal/drivers/cpu_cooling: Remove pointless field
  thermal/drivers/cpu_cooling: Add Software Package Data Exchange (SPDX)
  thermal/drivers/cpu_cooling: Fixup the header and copyright
  thermal/drivers/cpu_cooling: Remove pointless test in power2state()
  thermal: rcar_gen3_thermal: disable interrupt in .remove
  thermal: rcar_gen3_thermal: fix interrupt type
  thermal: Introduce devm_thermal_of_cooling_device_register
  ...
2019-05-16 07:56:57 -07:00
Guenter Roeck 9ebe010e56 hwmon: (mlxreg-fan) Use devm_thermal_of_cooling_device_register
Call devm_thermal_of_cooling_device_register() to register the cooling
device. Also introduce struct device *dev = &pdev->dev; to make the code
easier to read.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-05-14 07:00:46 -07:00
Guenter Roeck 725dcf082c hwmon: (mlxreg-fan) Use HWMON_CHANNEL_INFO macro
The HWMON_CHANNEL_INFO macro simplifies the code, reduces the likelihood
of errors, and makes the code easier to read.

The conversion was done automatically with coccinelle. The semantic patch
used to make this change is as follows.

@r@
initializer list elements;
identifier i;
@@

-u32 i[] = {
-  elements,
-  0
-};

@s@
identifier r.i,j,ty;
@@

-struct hwmon_channel_info j = {
-       .type = ty,
-       .config = i,
-};

@script:ocaml t@
ty << s.ty;
elements << r.elements;
shorter;
elems;
@@

shorter :=
   make_ident (List.hd(List.rev (Str.split (Str.regexp "_") ty)));
elems :=
   make_ident
    (String.concat ","
     (List.map (fun x -> Printf.sprintf "\n\t\t\t   %s" x)
       (Str.split (Str.regexp " , ") elements)))

@@
identifier s.j,t.shorter;
identifier t.elems;
@@

- &j
+ HWMON_CHANNEL_INFO(shorter,elems)

This patch does not introduce functional changes. Many thanks to
Julia Lawall for providing the semantic patch.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-04-15 17:19:53 -07:00
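To show what the conversion buys, here is a user-space imitation of the macro's shape: a compound literal carrying the sensor type plus a zero-terminated config array built from the variadic arguments. The enum values, flag bits, and the mlxreg_fan_info table are stand-ins chosen for illustration; the real definitions live in include/linux/hwmon.h and the driver.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Stand-ins for kernel types and flags (values are illustrative only). */
enum hwmon_sensor_types { hwmon_fan = 1, hwmon_pwm = 2 };
#define HWMON_F_INPUT   (1u << 1)
#define HWMON_F_FAULT   (1u << 2)
#define HWMON_PWM_INPUT (1u << 0)

struct hwmon_channel_info {
	enum hwmon_sensor_types type;
	const u32 *config;
};

/* Same shape as the kernel macro: one config entry per channel,
 * with the terminating 0 appended automatically. */
#define HWMON_CHANNEL_INFO(stype, ...)                      \
	(&(struct hwmon_channel_info) {                     \
		.type = hwmon_##stype,                      \
		.config = (const u32 []) { __VA_ARGS__, 0 } \
	})

/* Hypothetical table: two fan channels and one PWM channel. */
static const struct hwmon_channel_info *const mlxreg_fan_info[] = {
	HWMON_CHANNEL_INFO(fan,
			   HWMON_F_INPUT | HWMON_F_FAULT,
			   HWMON_F_INPUT | HWMON_F_FAULT),
	HWMON_CHANNEL_INFO(pwm, HWMON_PWM_INPUT),
	NULL
};
```

Each table entry replaces a separately named u32 array and struct pair, which is exactly what the semantic patch removes.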
Vadim Pasternak b429ebc86f hwmon: (mlxreg-fan) Add support for fan capability registers
Add support for fan capability registers in order to distinguish between
systems which have minor fan configuration differences. This reduces
the amount of code used to describe such systems.
The capability registers provide system-specific information about the
number of physically connected tachometers and the system-specific fan
speed scale parameter.
For example, one system can be equipped with twelve fan tachometers,
while another has eight or six. Or one system should use the default
fan speed divider value, while another has a scale parameter defined in
hardware, which should be used for the divider setting.
Reading this information from the capability registers allows using the
same fan structure for systems with such differences.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2019-04-15 17:19:53 -07:00
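A hedged sketch of how the two capability values could be consumed. It assumes bit n of the tachometer capability register marks tachometer n+1 as connected, and that a non-zero scale multiplies a minimum divider while zero selects a default. TACHO_DIV_MIN and TACHO_DIV_DEF are assumed constants, not confirmed driver values.

```c
#include <stdint.h>

/* Assumed divider constants, for illustration only. */
#define TACHO_DIV_MIN 283
#define TACHO_DIV_DEF (TACHO_DIV_MIN * 4)

/* Assumed layout: bit i set => tachometer i + 1 is connected.
 * Fills ids[] with connected tachometer numbers; max must be <= 32. */
static unsigned int connected_tachos(uint32_t cap, unsigned int max,
				     unsigned int ids[])
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < max; i++)
		if (cap & (1u << i))
			ids[n++] = i + 1;
	return n;
}

/* Scale of 0 means no hardware scale: fall back to the default divider. */
static unsigned int tacho_divider(uint32_t scale)
{
	return scale ? scale * TACHO_DIV_MIN : TACHO_DIV_DEF;
}
```

With this approach, a six-tachometer system and a twelve-tachometer system share one description and differ only in the capability value read at probe time.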
Vadim Pasternak 3f9ffa5c3a hwmon: (mlxreg-fan) Modify macros for tachometer fault status reading
Modify macros for tachometer fault status reading to make them simpler
and clearer.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-12-16 15:13:16 -08:00
Vadim Pasternak 243cfe3fb8 hwmon: (mlxreg-fan) Fix macros for tacho fault reading
Fix macros for tachometer fault reading.
This fix is relevant for three Mellanox systems, MQMB7, MSN37 and MSN34,
which are about to be released to customers.
At the moment, none of them is at customer sites.

Fixes: 65afb4c8e7 ("hwmon: (mlxreg-fan) Add support for Mellanox FAN driver")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-11-16 08:10:23 -08:00
Vadim Pasternak 65afb4c8e7 hwmon: (mlxreg-fan) Add support for Mellanox FAN driver
The driver obtains PWM and tachometer register locations according to
the system configuration and creates FAN/PWM hwmon objects and a
cooling device. PWM and tachometers are controlled through the on-board
programmable device, which exports its register map. This device could
be attached to any bus type for which register mapping is supported. A
single instance is created with one PWM control, up to 12 tachometers
and one cooling device. There could be as many instances as the
programmable device supports.

Currently, the driver will be activated from the Mellanox platform
driver: drivers/platform/x86/mlx-platform.c.
For future ARM-based systems, it could be activated from the ARM
platform module.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-07-08 20:08:13 -07:00
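Because the driver exposes both a PWM attribute and a cooling device, it needs a mapping between register duty cycles and cooling states. A minimal sketch, assuming an 8-bit duty register (0..255) and cooling states 0..10 with round-to-nearest conversion. The ranges and helper names are assumptions, not taken from this log.

```c
/* Assumed ranges: 8-bit PWM duty register, cooling states 0..10. */
#define MAX_DUTY 255u
#define MAX_STATE 10u

/* Round-to-nearest division for non-negative operands. */
#define DIV_ROUND_CLOSEST_U(x, d) (((x) + (d) / 2) / (d))

/* Cooling state -> register duty cycle. */
static unsigned int state_to_duty(unsigned int state)
{
	return DIV_ROUND_CLOSEST_U(state * MAX_DUTY, MAX_STATE);
}

/* Register duty cycle -> nearest cooling state. */
static unsigned int duty_to_state(unsigned int duty)
{
	return DIV_ROUND_CLOSEST_U(duty * MAX_STATE, MAX_DUTY);
}
```

Rounding to nearest in both directions makes every state survive a round trip through the register, so reading back the current state after setting it returns the same value.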