Commit Graph

18 Commits

Author SHA1 Message Date
Zhang Yuchen c608966f3f ipmi: fix msg stack when IPMI is disconnected
If you continue to access and send messages at a high frequency (once
every 55s) when the IPMI is disconnected, messages will accumulate in
intf->[hp_]xmit_msg. If it lasts long enough, it takes up a lot of
memory.

The reason is that if IPMI is disconnected, each message will be set to
IDLE after it returns to HOSED through IDLE->ERROR0->HOSED. The next
message goes through the same process when it comes in. This process
needs to wait for IBF_TIMEOUT * (MAX_ERROR_RETRIES + 1) = 55s.

Each message takes 55S to destroy. This results in a continuous increase
in memory.

I find that if I wait 5 seconds after the first message fails, the
status changes to ERROR0 in smi_timeout(). The next message will return
the error code IPMI_NOT_IN_MY_STATE_ERR directly without wait.

This is more in line with our needs.

So instead of setting each message state to IDLE after it reaches the
state HOSED, set state to ERROR0.

After testing, the problem has been solved, no matter how many
consecutive sends, will not cause continuous memory growth. It also
returns to normal immediately after the IPMI is restored.

In addition, the HOSED state should also count as invalid. So the HOSED
is removed from the invalid judgment in start_kcs_transaction().

The verification operations are as follows:

1. Use BPF to record the ipmi_alloc/free_smi_msg().

  $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc
      %p\n",retval);} kprobe:free_recv_msg {printf("free  %p\n",arg0)}'

2. Exec `date; time for x in $(seq 1 2); do ipmitool mc info; done`.
3. Record the output of `time` and when free all msgs.

Before:

`time` takes 120s, This is because `ipmitool mc info` send 4 msgs and
waits only 15 seconds for each message. Last msg is free after 440s.

  $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc
      %p\n",retval);} kprobe:free_recv_msg {printf("free  %p\n",arg0)}'
  Oct 05 11:40:55 Attaching 2 probes...
  Oct 05 11:41:12 alloc 0xffff9558a05f0c00
  Oct 05 11:41:27 alloc 0xffff9558a05f1a00
  Oct 05 11:41:42 alloc 0xffff9558a05f0000
  Oct 05 11:41:57 alloc 0xffff9558a05f1400
  Oct 05 11:42:07 free  0xffff9558a05f0c00
  Oct 05 11:42:07 alloc 0xffff9558a05f7000
  Oct 05 11:42:22 alloc 0xffff9558a05f2a00
  Oct 05 11:42:37 alloc 0xffff9558a05f5a00
  Oct 05 11:42:52 alloc 0xffff9558a05f3a00
  Oct 05 11:43:02 free  0xffff9558a05f1a00
  Oct 05 11:43:57 free  0xffff9558a05f0000
  Oct 05 11:44:52 free  0xffff9558a05f1400
  Oct 05 11:45:47 free  0xffff9558a05f7000
  Oct 05 11:46:42 free  0xffff9558a05f2a00
  Oct 05 11:47:37 free  0xffff9558a05f5a00
  Oct 05 11:48:32 free  0xffff9558a05f3a00

  $ root@dc00-pb003-t106-n078:~# date;time for x in $(seq 1 2); do
  ipmitool mc info; done

  Wed Oct  5 11:41:12 CST 2022
  No data available
  Get Device ID command failed
  No data available
  No data available
  No valid response received
  Get Device ID command failed: Unspecified error
  No data available
  Get Device ID command failed
  No data available
  No data available
  No valid response received
  No data available
  Get Device ID command failed

  real        1m55.052s
  user        0m0.001s
  sys        0m0.001s

After:

`time` takes 55s, all msgs is returned and free after 55s.

  $ bpftrace -e 'kretprobe:ipmi_alloc_recv_msg {printf("alloc
      %p\n",retval);} kprobe:free_recv_msg {printf("free  %p\n",arg0)}'

  Oct 07 16:30:35 Attaching 2 probes...
  Oct 07 16:30:45 alloc 0xffff955943aa9800
  Oct 07 16:31:00 alloc 0xffff955943aacc00
  Oct 07 16:31:15 alloc 0xffff955943aa8c00
  Oct 07 16:31:30 alloc 0xffff955943aaf600
  Oct 07 16:31:40 free  0xffff955943aa9800
  Oct 07 16:31:40 free  0xffff955943aacc00
  Oct 07 16:31:40 free  0xffff955943aa8c00
  Oct 07 16:31:40 free  0xffff955943aaf600
  Oct 07 16:31:40 alloc 0xffff9558ec8f7e00
  Oct 07 16:31:40 free  0xffff9558ec8f7e00
  Oct 07 16:31:40 alloc 0xffff9558ec8f7800
  Oct 07 16:31:40 free  0xffff9558ec8f7800
  Oct 07 16:31:40 alloc 0xffff9558ec8f7e00
  Oct 07 16:31:40 free  0xffff9558ec8f7e00
  Oct 07 16:31:40 alloc 0xffff9558ec8f7800
  Oct 07 16:31:40 free  0xffff9558ec8f7800

  root@dc00-pb003-t106-n078:~# date;time for x in $(seq 1 2); do
  ipmitool mc info; done
  Fri Oct  7 16:30:45 CST 2022
  No data available
  Get Device ID command failed
  No data available
  No data available
  No valid response received
  Get Device ID command failed: Unspecified error
  Get Device ID command failed: 0xd5 Command not supported in present state
  Get Device ID command failed: Command not supported in present state

  real        0m55.038s
  user        0m0.001s
  sys        0m0.001s

Signed-off-by: Zhang Yuchen <zhangyuchen.lcr@bytedance.com>
Message-Id: <20221009091811.40240-2-zhangyuchen.lcr@bytedance.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
2022-10-17 09:51:27 -05:00
Corey Minyard a190db945b ipmi: Clean up some printks
Convert to dev_xxx() and fix some verbage.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-09-15 09:57:45 -05:00
Xianting Tian c2b1e76d8c ipmi:sm: Print current state when the state is invalid
Print current state before returning IPMI_NOT_IN_MY_STATE_ERR so we can
know where this issue is coming from and possibly fix the state machine.

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Message-Id: <20200915074441.4090-1-tian.xianting@h3c.com>
[Converted printk() to pr_xxx().]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
2020-09-15 09:46:20 -05:00
Joe Perches f993cdd99a ipmi: Convert printk(KERN_<level> to pr_<level>(
Use the more common logging style.

Miscellanea:

o Convert old style continuation printks without KERN_CONT to pr_cont
o Coalesce formats
o Realign arguments
o Remove unnecessary casts

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
2018-09-18 16:15:33 -05:00
Corey Minyard 243ac21035 ipmi: Add or fix SPDX-License-Identifier in all files
And get rid of the license text that is no longer necessary.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Rocky Craig <rocky.craig@hp.com>
2018-02-27 07:42:51 -06:00
Corey Minyard 81d02b7f8c ipmi: Make some data const that was only read
Several data structures were only used for reading, so make them
const.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
2015-09-03 15:02:27 -05:00
Corey Minyard eb6d78ec21 ipmi: Reset the KCS timeout when starting error recovery
The OBF timer in KCS was not reset in one situation when error recovery
was started, resulting in an immediate timeout.

Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-17 12:23:06 -07:00
Xie XiuQi ccb3368cb4 ipmi: use USEC_PER_SEC instead of 1000000 for more meaningful
Use USEC_PER_SEC instead of 1000000, that making the later bugfix
more clearly.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-25 15:31:58 -08:00
Matthew Garrett 828dc9da50 ipmi: increase KCS timeouts
We currently time out and retry KCS transactions after 1 second of waiting
for IBF or OBF.  This appears to be too short for some hardware.  The IPMI
spec says "All system software wait loops should include error timeouts.
For simplicity, such timeouts are not shown explicitly in the flow
diagrams.  A five-second timeout or greater is recommended".  Change the
timeout to five seconds to satisfy the slow hardware.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:36 -07:00
Julia Lawall d1da96aada drivers/char/ipmi: Use KCS_IDLE_STATE
KCS_IDLE and KCS_IDLE state have the same value, but in this function the
constants ending in _STATE are compared to the state variable.

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Core Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:10 -08:00
Corey Minyard c305e3d38e IPMI: Style fixes in the system interface code
Lots of style fixes for the IPMI system interface driver.  No functional
changes.  Basically fixes everything reported by checkpatch and fixes the
comment style.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Rocky Craig <rocky.craig@hp.com>
Cc: Hannes Schulz <schulz@schwaar.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:15 -07:00
Corey Minyard 5a1099baa8 [PATCH] IPMI: increase KCS message size
Increase the maximum message size a KCS interface supports to the maximum
message size the rest of the driver supports.  Some systems really support
messages this big.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: Patrick Schoeller <Patrick.Schoeller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:48 -08:00
Corey Minyard 4d7cbac7c8 [PATCH] IPMI: Fix BT long busy
The IPMI BT subdriver has been patched to survive "long busy" timeouts seen
during firmware upgrades and resets.  The patch never returns the HOSED state,
synthesizes response messages with meaningful completion codes, and recovers
gracefully when the hardware finishes the long busy.  The subdriver now issues
a "Get BT Capabilities" command and properly uses those results.  More
informative completion codes are returned on error from transaction starts;
this logic was propogated to the KCS and SMIC subdrivers.  Finally, indent and
other style quirks were normalized.

Signed-off-by: Rocky Craig <rocky.craig@hp.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:47 -08:00
Corey Minyard 8a3628d53f [PATCH] IPMI: tidy up various things
Tidy up various coding standard things, mostly removing the space after !,
but also break some long lines and fix a few other spacing inconsistencies.
Also fixes some bad error reporting when deleting an IPMI user.

Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:54 -08:00
Corey Minyard c3e7e7916e [PATCH] ipmi: kcs error0 delay
BMCs can get into ERROR0 state while flashing new firmware, particularly while
the BMC is erasing the next flash block, which may take a just under 2 seconds
on a Dell PowerEdge 2800 (1.75 seconds typical), during which time the
single-threaded firmware may not be able to process new commands.  In
particular, clearing OBF may not take effect immediately.

We want it to delay in ERROR0 after clearing OBF a bit waiting for OBF to
actually be clear before proceeding.

This introduces a new return value from the LLDD's event loop,
SI_SM_CALL_WITH_TICK_DELAY.  This means the calling thread/timer should
schedule_timeout() at least 1 tick, rather than busy-wait.  This is a longer
delay than SI_SM_CALL_WITH_DELAY, which is typically a 250us busy-wait.

Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:44 -08:00
Corey Minyard c4edff1c19 [PATCH] ipmi: various si cleanup
A number of small changes for the various system interface drivers,
consolidated from a number of patches from Matt Domsch.

Clear B2H_ATN and drain the BMC message buffer on command timeout.  This
prevents further commands from failing after a timeout.

Add bt_debug and smic_debug module parameters, expose them in sysfs.  This
lets you enable and disable debugging messages at runtime.

Unsigned jiffies math in ipmi_si_intf.c causes a too-large value to be passed
to ->event() after jiffies wrap-around.  The BT driver had caught this, but
didn't know how to fix it.  Now all calls to ->event() use a sane value for
time.

Increase timeout for commands handed to the BT driver from 2 seconds to 5
seconds.  This is necessary particularly when the previous command was a
"Clear SEL", as that command completes, yet the BMC isn't really ready to
handle another command yet.

Silence BT debugging messages which were being printed on the console.

Increase SMIC timeout form 1/10s to 2s.  This is needed on Dell PowerEdge 2650
and PowerEdge 750 with ERA/O cards to allow commands to complete without
timing out.

Adds kcs_debug module param, to match behavior of BT and SMIC.  This also
prevents messages from being sent to the console unless explicitly requested.

Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:44 -08:00
Corey Minyard 1fdd75bd6c [PATCH] ipmi: clean up versioning of the IPMI driver
This adds MODULE_VERSION, MODULE_DESCRIPTION, and MODULE_AUTHOR tags to the
IPMI driver modules.  Also changes the MODULE_VERSION to remove the
prepended 'v' on each value, consistent with the module versioning policy.

This patch also removes all the version information from everything except
the ipmi_msghandler module.

Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07 16:57:48 -07:00
Linus Torvalds 1da177e4c3 Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
2005-04-16 15:20:36 -07:00