OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Russ Anderson	264b0f9930	[IA64] MCA recovery: Montecito support The information in MCA records is filled in slightly differently on Montecito than on Madison/McKinley. Usually, the cache check and bus check target identifiers have the same address. On Montecito the cache check and bus check target identifiers can be different if a corrected error (ie SBE or unconsumed poison data) was encountered and then an uncorrected error (ie DBE) was consumed. In that case, the cache check target identifier is the physical address of the DBE (that caused the MCA to surface) while the bus check target identifier is the physical address of the SBE. This patch correctly finds the target identifier that triggered the MCA. If there are multiple valid cache target identifiers in the same error record then use the one with the lowest cache level. Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-10-31 14:30:34 -08:00
Hidetoshi Seto	43ed3baf62	[IA64] printing support for MCA/INIT Printing message to console from MCA/INIT handler is useful, however doing oops_in_progress = 1 in them exactly makes something in kernel wrong. Especially it sounds ugly if system goes wrong after returning from recoverable MCA. This patch adds ia64_mca_printk() function that collects messages into temporary-not-so-large message buffer during in MCA/INIT environment and print them out later, after returning to normal context or when handlers determine to down the system. Also this print function is exported for use in extensional MCA handler. It would be useful to describe detail about recovery. NOTE: I don't think it is sane thing if temporary message buffer is enlarged enough to hold whole stack dumps from INIT, so buffering is disabled during stack dump from INIT-monarch (= default_monarch_init_process). please fix it in future. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-09-26 14:44:37 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Russ Anderson	189979619f	[IA64] Add mca recovery failure messages When the mca recovery code encounters a condition that makes the MCA non-recoverable, print the reason it could not recover. This will make it easier to identify why the recovery code did not recover. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-04-27 14:34:01 -07:00
Russ Anderson	d2a28ad9fa	[IA64] MCA recovery: kernel context recovery table Memory errors encountered by user applications may surface when the CPU is running in kernel context. The current code will not attempt recovery if the MCA surfaces in kernel context (privilage mode 0). This patch adds a check for cases where the user initiated the load that surfaces in kernel interrupt code. An example is a user process lauching a load from memory and the data in memory had bad ECC. Before the bad data gets to the CPU register, and interrupt comes in. The code jumps to the IVT interrupt entry point and begins execution in kernel context. The process of saving the user registers (SAVE_REST) causes the bad data to be loaded into a CPU register, triggering the MCA. The MCA surfaces in kernel context, even though the load was initiated from user context. As suggested by David and Tony, this patch uses an exception table like approach, puting the tagged recovery addresses in a searchable table. One difference from the exception table is that MCAs do not surface in precise places (such as with a TLB miss), so instead of tagging specific instructions, address ranges are registers. A single macro is used to do the tagging, with the input parameter being the label of the starting address and the macro being the ending address. This limits clutter in the code. This patch only tags one spot, the interrupt ivt entry. Testing showed that spot to be a "heavy hitter" with MCAs surfacing while saving user registers. Other spots can be added as needed by adding a single macro. Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-03-24 09:49:52 -08:00
Russ Anderson	e1c48554ae	[IA64] mca recovery return value when no bus check When there is no bus check, the return code should be failure, not success. Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-03-07 15:40:06 -08:00
Russ Anderson	ea0e92a613	[IA64] Increase severity of MCA recovery messages The MCA recovery messages are currently KERN_DEBUG, so they don't show up in /var/log/messages (by default). Increase the severity to KERN_ERR, for the initial message (and also add the physical address to this message). Leave the successful isolation message as KERN_DEBUG, but increase the severity when isolation fails to KERN_CRIT. [Russ' patch made these all KERN_CRIT] Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-03-07 15:23:25 -08:00
Hidetoshi Seto	a947464617	[IA64] mca_drv: Add minstate validation MCA driver can cause panic if kernel gets a state info with no minstate. This patch adds minstate validation before handling it. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2006-02-09 14:42:55 -08:00
Tony Luck	cf1d469ec1	Pull mca-check-psp into release branch	2005-11-10 10:38:05 -08:00
Russ Anderson	cbb9214434	[IA64] MCA recovery: Bump reference count on bad pages When a page has a memory uncorrectable ECC error, the recovery code wants to prevent the page from being reused. This change bumps the reference count to prevent the page from getting back on the free list. Signed-off-by: Russ Anderson (rja@sgi.com) Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-11-08 10:04:16 -08:00
Russ Anderson	56f87b8217	[IA64] MCA recovery: pfn_valid() needs a pfn paddr needs to be shifted by PAGE_SHIFT to be valid input for pfn_valid(). Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-11-08 10:03:05 -08:00
Russ Anderson	a14f25a076	[IA64] MCA recovery based on PSP bits The determination of whether an MCA is recoverable or not must be based on the bits set in the PSP (Processor State Parameter). The specific bits are shown in the Intel IA-64 Architecture Software Developer's Manual, Vol 2, Table 11-6 Software Recovery Bits in Processor State Parameter. Those bits should be consistent across the entire IA-64 family of processors. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-11-08 10:00:56 -08:00
Hidetoshi Seto	4881e2cd25	[IA64] MCA recovery verify pfn_valid Verify the pfn is valid before calling pfn_to_page(), and cut isolation message if nothing was done. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-09-22 13:27:59 -07:00
Hidetoshi Seto	20305e5972	[IA64] mca_drv cleanup There were some trailing white spaces, long lines, brackets in weird style etc. This patch cleans them up. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-09-16 10:39:40 -07:00
Keith Owens	7f613c7d22	[PATCH] MCA/INIT: use per cpu stacks The bulk of the change. Use per cpu MCA/INIT stacks. Change the SAL to OS state (sos) to be per process. Do all the assembler work on the MCA/INIT stacks, leaving the original stack alone. Pass per cpu state data to the C handlers for MCA and INIT, which also means changing the mca_drv interfaces slightly. Lots of verification on whether the original stack is usable before converting it to a sleeping process. Signed-off-by: Keith Owens <kaos@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-09-11 14:08:41 -07:00
Russ Anderson	b1b901c202	[IA64] MCA recovery improvements Jack Steiner uncovered some opportunities for improvement in the MCA recovery code. 1) Set bsp to save registers on the kernel stack. 2) Disable interrupts while in the MCA recovery code. 3) Change the way the user process is killed, to avoid a panic in schedule. Testing shows that these changes make the recovery code much more reliable with the 2.6.12 kernel. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2005-05-03 13:47:42 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

17 Commits