... from back in 2004; again, it's ifdefed out by CONFIG_FPU.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
we shouldn't bugger off to userland when there still are
pending signals; among other things it makes e.g. SIGSEGV
triggered by failure to build a sigframe to be delivered
_now_ and not when we hit the next syscall or interrupt.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
If we leave sigreturn via ret_from_signal, we end up with syscall
trace only on entry, leading to very unhappy strace, among other
things. Note that this means different behaviours for signals
delivered while we were in pagefault and for ones delivered while
we were in interrupt...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
a) we should hold modifying regs->format until we know we *will* be
doing stack expansion; otherwise attacker can modify sigframe to
have wrong ->sc_formatvec and install SIGSEGV handler.
b) we should *not* mix copying saved extra stuff from userland with
expanding the stack; once we'd done that manual memmove, we'd better
not return to C, so cleanup is very hard to do. The easiest way
is to copy it on stack first, making sure we won't overwrite on stack
expansion. Fortunately that's easy to do...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Same principle as with the previous patch - do not destroy the
state if sigframe setup fails. Incidentally, it's actually
_less_ work - we don't need to go through adjust_stack dance
on failure if we don't touch regs->stkadj until we know we'd
written sigframe out.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
If we'd failed in setup_frame(), we've no place to store
the original sigmask. It's not an unrecoverable situation -
we raise SIGSEGV, but that SIGSEGV might be successfully
handled (e.g. on altstack). In that case we really don't
want sa_mask of original signal permanently slapped on
the set of blocked signals.
Standard solution: have setup_frame()/setup_rt_frame()
report failure and don't mess with the signal-related
state if that has happened...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Instead of checking the return value of do_signal() we can just do
the work (raise SIGTRAP and clear SR.T1) directly in handle_signal(),
when setting the sigframe up. Simplifies the assembler glue and is
closer to the way we do it on other targets.
Note that do_delayed_trace does *not* disappear; it's still needed
to deal with single-stepping through syscall, since 68040 doesn't
raise the trace exception at all if the trap exception is pending.
We hit it after returning from sys_...() if TIF_DELAYED_TRACE is
set; all that has changed is that we don't reuse it for "single-step
into the handler" codepath.
As the result, do_signal() doesn't need to return anything anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
and saner do_signal() arguments, while we are at it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
... and had been such since the introduction of get_signal_to_deliver()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Minor formatting fixup since the information which core was associated
with the MCE is not always valid.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Building for X86_32 produces shift count warnings, so use BIT_64() to
eliminate the warnings.
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: bluesmoke-devel@lists.sourceforge.net
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Now that everything is inplace, enable MCE decoding on F15h. Make
initcall routine a bit more readable.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Shorten up MCi_STATUS flags and add BD's new deferred and poison types.
Also, simplify formatting.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
MCE bank 2 is redefined from a BU to a CU (Combined Unit) bank on F15h.
Add a decoder function for CU MCEs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add a decoder for F15h DC MCEs to support the new types of DC MCEs
introduced by the BD microarchitecture.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h enlarges the extended error code of an MCE to a 5-bit field
(MCi_STATUS[20:16]). Add a mask variable which default 0xf is overridden
on F15h.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
K8 does not allow for an atomic RMW to a cacheline as F10h does so
disable the error injection interface for it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Make the ->{get|set}_sdram_scrub_rate return the actual scrub rate
bandwidth it succeeded setting and remove superfluous arg pointer used
for that. A negative value returned still means that an error occurred
while setting the scrubrate. Document this for future reference.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Now that all prerequisites are in place, drop the two-stage driver
instances initialization in favor of the following simple init sequence:
1. Probe PCI device: we only test ECC capabilities here and if none exit
early.
2. If the hw supports ECC and it is/can be enabled, we init the per-node
instance.
Remove "amd64_" prefix from static functions touched, while at it.
There actually should be no visible functional change resulting from
this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Rework the code to check the hardware ECC capabilities at PCI probing
time. We do all further initialization only if we actually can/have ECC
enabled.
While at it:
0. Fix function naming.
1. Simplify/clarify debug output.
2. Remove amd64_ prefix from the static functions
3. Reorganize code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This is in preparation for the init path reorganization where we want
only to
1) test whether a particular node supports ECC
2) can it be enabled
and only then do the necessary allocation/initialization. For that,
we need to decouple the ECC settings of the node from the instance's
descriptor.
The should be no functional change introduced by this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
PCI ECS is being enabled by default since 2.6.26 on AMD so this code is
just superfluous now, remove it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Remove static allocation in favor of dynamically allocating space for as
many driver instances as northbridges present on the system.
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add a macro per printk level, shorten up error messages. Add relevant
information to KERN_INFO level. No functional change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Rename variables representing PCI devices to their BKDG names for faster
search and shorter, clearer code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Move the remaining per-family init code into the proper place and
simplify the rest of the initialization. Reorganize error handling in
amd64_init_one_instance().
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Run a per-family init function which does all the settings based on
the family this driver instance is running on. Move the scrubrate
calculation in it and simplify code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.
The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.
We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.
- check the global sum once every interval (this will delay zero detection
for some interval, so it's probably a showstopper for vfsmounts).
- keep a local count and only taking the global sum when local reaches 0 (this
is difficult for vfsmounts, because we can't hold preempt off for the life of
a reference, so a counter would need to be per-thread or tied strongly to a
particular CPU which requires more locking).
- keep a local difference of increments and decrements, which allows us to sum
the total difference and hence find the refcount when summing all CPUs. Then,
keep a single integer "long" refcount for slow and long lasting references,
and only take the global sum of local counters when the long refcount is 0.
This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.
This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.
This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Suggested by Andreas, mnt_ prefix is clearer namespace, follows kernel
conventions better, and is easier for tab complete. I introduced these
names so I'll admit they were not good choices.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
The standard memcmp function on a Westmere system shows up hot in
profiles in the `git diff` workload (both parallel and single threaded),
and it is likely due to the costs associated with trapping into
microcode, and little opportunity to improve memory access (dentry
name is not likely to take up more than a cacheline).
So replace it with an open-coded byte comparison. This increases code
size by 8 bytes in the critical __d_lookup_rcu function, but the
speedup is huge, averaging 10 runs of each:
git diff st user sys elapsed CPU
before 1.15 2.57 3.82 97.1
after 1.14 2.35 3.61 96.8
git diff mt user sys elapsed CPU
before 1.27 3.85 1.46 349
after 1.26 3.54 1.43 333
Elapsed time for single threaded git diff at 95.0% confidence:
-0.21 +/- 0.01
-5.45% +/- 0.24%
It's -0.66% +/- 0.06% elapsed time on my Opteron, so rep cmp costs on the
fam10h seem to be relatively smaller, but there is still a win.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
This makes single threaded git diff -1.25% +/- 0.05% elapsed time on my
2s12c24t Westmere system, and -0.86% +/- 0.05% on my 2s8c Barcelona, by
prefetching the important first cacheline of the inode in while we do the
actual name compare and other operations on the dentry.
There was no measurable slowdown in the single file stat case, or the creat
case (where negative dentries would be common).
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Regardless of how much we possibly try to scale dcache, there is likely
always going to be some fundamental contention when adding or removing children
under the same parent. Pseudo filesystems do not seem need to have connected
dentries because by definition they are disconnected.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>