For CPUs which have an unknown or invalid CPU location (physical location)
assume that their cycle counters aren't syncronized across CPUs.
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: c8c3735997 ("parisc: Enhance detection of synchronous cr16 clocksources")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Helge Deller <deller@gmx.de>
__cmpxchg_u64 is built and used outside CONFIG_64BIT and thus needs to
be exported. This fixes the following build error seen when building
parisc:allmodconfig.
ERROR: "__cmpxchg_u64" [drivers/net/ethernet/intel/i40e/i40e.ko] undefined!
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
As discussed on the debian-hppa list, double-wordcompare and exchange
operations fail on 32-bit kernels. Looking at the code, I realized that
the ",ma" completer does the wrong thing in the "ldw,ma 4(%r26), %r29"
instruction. This increments %r26 and causes the following store to
write to the wrong location.
Note by Helge Deller:
The patch applies cleanly to stable kernel series if this upstream
commit is merged in advance:
f4125cfdb3 ("parisc: Avoid trashing sr2 and sr3 in LWS code").
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Tested-by: Christoph Biedl <debian.axhn@manchmal.in-ulm.de>
Fixes: 8920649120 ("parisc: Implement new LWS CAS supporting 64 bit operations.")
Cc: stable@vger.kernel.org # 3.13+
Signed-off-by: Helge Deller <deller@gmx.de>
Pull watchddog clean-up and fixes from Thomas Gleixner:
"The watchdog (hard/softlockup detector) code is pretty much broken in
its current state. The patch series addresses this by removing all
duct tape and refactoring it into a workable state.
The reasons why I ask for inclusion that late in the cycle are:
1) The code causes lockdep splats vs. hotplug locking which get
reported over and over. Unfortunately there is no easy fix.
2) The risk of breakage is minimal because it's already broken
3) As 4.14 is a long term stable kernel, I prefer to have working
watchdog code in that and the lockdep issues resolved. I wouldn't
ask you to pull if 4.14 wouldn't be a LTS kernel or if the
solution would be easy to backport.
4) The series was around before the merge window opened, but then got
delayed due to the UP failure caused by the for_each_cpu()
surprise which we discussed recently.
Changes vs. V1:
- Addressed your review points
- Addressed the warning in the powerpc code which was discovered late
- Changed two function names which made sense up to a certain point
in the series. Now they match what they do in the end.
- Fixed a 'unused variable' warning, which got not detected by the
intel robot. I triggered it when trying all possible related config
combinations manually. Randconfig testing seems not random enough.
The changes have been tested by and reviewed by Don Zickus and tested
and acked by Micheal Ellerman for powerpc"
* 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
watchdog/core: Put softlockup_threads_initialized under ifdef guard
watchdog/core: Rename some softlockup_* functions
powerpc/watchdog: Make use of watchdog_nmi_probe()
watchdog/core, powerpc: Lock cpus across reconfiguration
watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()
watchdog/hardlockup/perf: Fix spelling mistake: "permanetely" -> "permanently"
watchdog/hardlockup/perf: Cure UP damage
watchdog/hardlockup: Clean up hotplug locking mess
watchdog/hardlockup/perf: Simplify deferred event destroy
watchdog/hardlockup/perf: Use new perf CPU enable mechanism
watchdog/hardlockup/perf: Implement CPU enable replacement
watchdog/hardlockup/perf: Implement init time detection of perf
watchdog/hardlockup/perf: Implement init time perf validation
watchdog/core: Get rid of the racy update loop
watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage
watchdog/sysctl: Clean up sysctl variable name space
watchdog/sysctl: Get rid of the #ifdeffery
watchdog/core: Clean up header mess
watchdog/core: Further simplify sysctl handling
watchdog/core: Get rid of the thread teardown/setup dance
...
While scanning the PDT for reported broken memory modules, warn if the
initrd was coincidentally loaded into bad memory.
Signed-off-by: Helge Deller <deller@gmx.de>
According to the programming note at page 1-31 of the PA 1.1 Firmware
Architecture document, one should use the PDC_INSTR firmware function to
get the instruction that invokes a PDCE_CHECK in the HPMC handler. This
patch follows this note and sets the instruction which has been a nop up
until now.
Testing on a C3000 and C8000 showed that this firmware call isn't
implemented on those machines, so maybe it's only needed on older ones.
Signed-off-by: Helge Deller <deller@gmx.de>
Check stack pointer if we are reaching the stack end and stop unwinding
if we do. This fixes early backtraces and avoids showing unrealistic
call stacks.
Signed-off-by: Helge Deller <deller@gmx.de>
The broken lockup_detector_suspend/resume() interface is going away. Use
the new lockup_detector_soft_poweroff() interface to stop the watchdog from
the busy looping power off routine.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: linux-parisc@vger.kernel.org
Link: http://lkml.kernel.org/r/20170912194146.407385557@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull namespace updates from Eric Biederman:
"Life has been busy and I have not gotten half as much done this round
as I would have liked. I delayed it so that a minor conflict
resolution with the mips tree could spend a little time in linux-next
before I sent this pull request.
This includes two long delayed user namespace changes from Kirill
Tkhai. It also includes a very useful change from Serge Hallyn that
allows the security capability attribute to be used inside of user
namespaces. The practical effect of this is people can now untar
tarballs and install rpms in user namespaces. It had been suggested to
generalize this and encode some of the namespace information
information in the xattr name. Upon close inspection that makes the
things that should be hard easy and the things that should be easy
more expensive.
Then there is my bugfix/cleanup for signal injection that removes the
magic encoding of the siginfo union member from the kernel internal
si_code. The mips folks reported the case where I had used FPE_FIXME
me is impossible so I have remove FPE_FIXME from mips, while at the
same time including a return statement in that case to keep gcc from
complaining about unitialized variables.
I almost finished the work to get make copy_siginfo_to_user a trivial
copy to user. The code is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3
But I did not have time/energy to get the code posted and reviewed
before the merge window opened.
I was able to see that the security excuse for just copying fields
that we know are initialized doesn't work in practice there are buggy
initializations that don't initialize the proper fields in siginfo. So
we still sometimes copy unitialized data to userspace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
Introduce v3 namespaced file capabilities
mips/signal: In force_fcr31_sig return in the impossible case
signal: Remove kernel interal si_code magic
fcntl: Don't use ambiguous SIG_POLL si_codes
prctl: Allow local CAP_SYS_ADMIN changing exe_file
security: Use user_namespace::level to avoid redundant iterations in cap_capable()
userns,pidns: Verify the userns for new pid namespaces
signal/testing: Don't look for __SI_FAULT in userspace
signal/mips: Document a conflict with SI_USER with SIGFPE
signal/sparc: Document a conflict with SI_USER with SIGFPE
signal/ia64: Document a conflict with SI_USER with SIGFPE
signal/alpha: Document a conflict with SI_USER for SIGTRAP
Add some machine-specific information like values of cr16 cycle counter,
machine-specific software ID and machine model to the random generator.
Signed-off-by: Helge Deller <deller@gmx.de>
While testing UBSAN I saw this BUG:
BUG: spinlock bad magic on CPU#0, swapper/0
in unwind code. Let's avoid that by static initialization.
Signed-off-by: Helge Deller <deller@gmx.de>
This patch adds full support to read PDT info on all machine types. At bootup
the PDT is read and bad memory excluded from usage via memblock_reserve().
Later in the boot process a kernel thread is started (kpdtd) which regularily
checks firmare for new reported bad memory and tries to soft offline pages in
case of correctable errors and to kill processes and exclude such memory in
case of uncorrectable errors via memory_failure().
Signed-off-by: Helge Deller <deller@gmx.de>
Older machines with a PAT firmware (e.g. the rp5470) return their Page
Deallocation Table (PDT) info per cell via the PDC_PAT_MEM_PD_INFO PDC call.
This patch adds the necessary structures and wrappers to call firmware.
Signed-off-by: Helge Deller <deller@gmx.de>
Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
the default size of 16k. The reason why stack usage suddenly grew is yet
unknown.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Helge Deller <deller@gmx.de>
In testing James' patch to drivers/parisc/pdc_stable.c, I hit the BUG
statement in flush_cache_range() during a system shutdown:
kernel BUG at arch/parisc/kernel/cache.c:595!
CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx
IAOQ[0]: flush_cache_range+0x144/0x148
IAOQ[1]: flush_cache_page+0x0/0x1a8
RP(r2): flush_cache_range+0xec/0x148
Backtrace:
[<00000000402910ac>] unmap_page_range+0x84/0x880
[<00000000402918f4>] unmap_single_vma+0x4c/0x60
[<0000000040291a18>] zap_page_range_single+0x110/0x160
[<0000000040291c34>] unmap_mapping_range+0x174/0x1a8
[<000000004026ccd8>] truncate_pagecache+0x50/0xa8
[<000000004026cd84>] truncate_setsize+0x54/0x70
[<000000004033d534>] put_aio_ring_file+0x44/0xb0
[<000000004033d5d8>] aio_free_ring+0x38/0x140
[<000000004033d714>] free_ioctx+0x34/0xa8
[<00000000401b0028>] process_one_work+0x1b8/0x4d0
[<00000000401b04f4>] worker_thread+0x1b4/0x648
[<00000000401b9128>] kthread+0x1b0/0x208
[<0000000040150020>] end_fault_vector+0x20/0x28
[<0000000040639518>] nf_ip_reroute+0x50/0xa8
[<0000000040638ed0>] nf_ip_route+0x10/0x78
[<0000000040638c90>] xfrm4_mode_tunnel_input+0x180/0x1f8
CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx
Backtrace:
[<0000000040163bf0>] show_stack+0x20/0x38
[<0000000040688480>] dump_stack+0xa8/0x120
[<0000000040163dc4>] die_if_kernel+0x19c/0x2b0
[<0000000040164d0c>] handle_interruption+0xa24/0xa48
This patch modifies flush_cache_range() to handle non current contexts.
In as much as this occurs infrequently, the simplest approach is to
flush the entire cache when this happens.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
It's always bothered me that we only disable preemption in
copy_user_page around the call to flush_dcache_page_asm.
This patch extends this to after the copy.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Helge noticed that we flush the TLB page in flush_cache_page but not in
flush_cache_range or flush_cache_mm.
For a long time, we have had random segmentation faults building
packages on machines with PA8800/8900 processors. These machines only
support equivalent aliases. We don't see these faults on machines that
don't require strict coherency. So, it appears TLB speculation
sometimes leads to cache corruption on machines that require coherency.
This patch adds TLB flushes to flush_cache_range and flush_cache_mm when
coherency is required. We only flush the TLB in flush_cache_page when
coherency is required.
The patch also optimizes flush_cache_range. It turns out we always have
the right context to use flush_user_dcache_range_asm and
flush_user_icache_range_asm.
The patch has been tested for some time on rp3440, rp3410 and A500-44.
It's been boot tested on c8000. No random segmentation faults were
observed during testing.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
Some machines can't power off the machine, so disable the lockup detectors to
avoid this watchdog BUG to show up every few seconds:
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd-shutdow:1]
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.9+
The Page Deallocation Table (PDT) holds the physical addresses of all broken
memory addresses. With the physical address we now are able to show which DIMM
slot (e.g. 1a, 3c) actually holds the broken memory module so that users are
able to replace it.
Signed-off-by: Helge Deller <deller@gmx.de>
Add a firmware wrapper function, which asks PDC firmware for the DIMM slot of a
physical address. This is needed to show users which DIMM module needs
replacement in case a broken DIMM was encountered.
Signed-off-by: Helge Deller <deller@gmx.de>
Commit c9c2877d08 ("parisc: Add Page Deallocation Table (PDT) support")
introduced the pdc_pat_mem_read_pd_pdt() firmware helper function, which
crashed the system because it trashed the stack if the
pdc_pat_mem_read_pd_retinfo struct was located on the stack (and which is
in size less than the required 32 64-bit values).
Fix it by using the pdc_result struct instead when calling firmware and copy
the return values back into the result struct when finished sucessfully.
While debugging this code I noticed that the pdc_type wasn't set correctly
either, so let's fix that too.
Fixes: c9c2877d08 ("parisc: Add Page Deallocation Table (PDT) support")
Signed-off-by: Helge Deller <deller@gmx.de>
struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS
While this looks plausible on the surface, in practice this situation has
not worked well.
- Injected positive signals are not copied to user space properly
unless they have these magic high bits set.
- Injected positive signals are not reported properly by signalfd
unless they have these magic high bits set.
- These kernel internal values leaked to userspace via ptrace_peek_siginfo
- It was possible to inject these kernel internal values and cause the
the kernel to misbehave.
- Kernel developers got confused and expected these kernel internal values
in userspace in kernel self tests.
- Kernel developers got confused and set si_code to __SI_FAULT which
is SI_USER in userspace which causes userspace to think an ordinary user
sent the signal and that it was not kernel generated.
- The values make it impossible to reorganize the code to transform
siginfo_copy_to_user into a plain copy_to_user. As si_code must
be massaged before being passed to userspace.
So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.
To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used. Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of union
members.
A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like. The good news is only problem
architectures pay the cost.
Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values. Escept where not all of the cases are handled remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.
Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first. The high bits are no
longer used to hold a magic union member.
Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.
The fixes to copy_siginfo_from_user32 implementations has the
interesting property that several of them perviously should never have
worked as the __SI_ values they depended up where kernel internal.
With that dependency gone those implementations should work much
better.
The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.
Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When compiling the 4.13-rc kernel I got those linker errors:
libgcc2.c:(.text+0x110): relocation truncated to fit: R_PARISC_PCREL22F against symbol `$$divU'
defined in .text.div section in /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_divU.o)
hppa64-linux-gnu-ld: /usr/lib/gcc/hppa64-linux-gnu/4.9.2/libgcc.a(_moddi3.o)(.text+0x174): cannot reach $$divU
Avoid such errors by bundling the millicode routines in the linker script.
Signed-off-by: Helge Deller <deller@gmx.de>
Before the irq handler detects a low stack and then panics the kernel, disable
further stack checks to avoid recursive panics.
Reported-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Here is the big driver core update for 4.13-rc1.
The large majority of this is a lot of cleanup of old fields in the
driver core structures and their remaining usages in random drivers.
All of those fixes have been reviewed by the various subsystem
maintainers. There's also some small firmware updates in here, a new
kobject uevent api interface that makes userspace interaction easier,
and a few other minor things.
All of these have been in linux-next for a long while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWVpX4A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymobgCfd0d13IfpZoq1N41wc6z2Z0xD7cwAnRMeH1/p
kEeISGpHPYP9f8PBh9FO
=Hfqt
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core update for 4.13-rc1.
The large majority of this is a lot of cleanup of old fields in the
driver core structures and their remaining usages in random drivers.
All of those fixes have been reviewed by the various subsystem
maintainers. There's also some small firmware updates in here, a new
kobject uevent api interface that makes userspace interaction easier,
and a few other minor things.
All of these have been in linux-next for a long while with no reported
issues"
* tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (56 commits)
arm: mach-rpc: ecard: fix build error
zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()
driver-core: remove struct bus_type.dev_attrs
powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type
powerpc: vio: use dev_groups and not dev_attrs for bus_type
USB: usbip: convert to use DRIVER_ATTR_RW
s390: drivers: convert to use DRIVER_ATTR_RO/WO
platform: thinkpad_acpi: convert to use DRIVER_ATTR_RO/RW
pcmcia: ds: convert to use DRIVER_ATTR_RO
wireless: ipw2x00: convert to use DRIVER_ATTR_RW
net: ehea: convert to use DRIVER_ATTR_RO
net: caif: convert to use DRIVER_ATTR_RO
TTY: hvc: convert to use DRIVER_ATTR_RW
PCI: pci-driver: convert to use DRIVER_ATTR_WO
IB: nes: convert to use DRIVER_ATTR_RW
HID: hid-core: convert to use DRIVER_ATTR_RO and drv_groups
arm: ecard: fix dev_groups patch typo
tty: serdev: use dev_groups and not dev_attrs for bus_type
sparc: vio: use dev_groups and not dev_attrs for bus_type
hid: intel-ish-hid: use dev_groups and not dev_attrs for bus_type
...
Pull parisc updates from Helge Deller:
"Main changes are:
- Added support to the parisc dma functions to return DMA_ERROR_CODE
if DMA isn't possible. This fixes a long standing kernel crash if
parport_pc is enabled (by Thomas Bogendoerfer, marked for stable
series).
- Use the compat_sys_keyctl() in compat mode (by Eric Biggers, marked
for stable series).
- Initial support for the Page Deallocation Table (PDT) which is
maintained by firmware and holds the list of memory addresses which
had physical errors. By checking that list we can prevent Linux to
use those broken memory areas.
- Ensure IRQs are off in switch_mm().
- Report SIGSEGV instead of SIGBUS when running out of stack.
- Mark the cr16 clocksource stable on single-socket and single-core
machines"
* 'parisc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
parisc: Report SIGSEGV instead of SIGBUS when running out of stack
parisc: use compat_sys_keyctl()
parisc: Don't hardcode PSW values in hpmc code
parisc: Don't hardcode PSW values in gsc_*() functions
parisc: Avoid zeroing gr[0] in fixup_exception()
parisc/mm: Ensure IRQs are off in switch_mm()
parisc: Add Page Deallocation Table (PDT) support
parisc: Enhance detection of synchronous cr16 clocksources
parisc: Drop per_cpu uaccess related exception_data struct
parisc: Inline trivial exception code in lusercopy.S
Architectures with a compat syscall table must put compat_sys_keyctl()
in it, not sys_keyctl(). The parisc architecture was not doing this;
fix it.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d55977 ("sched/core: Remove pointless printout in
sched_show_task()"). Remove the implementations as well.
Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: <linux-parisc@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The firmare in most parisc machines maintains a Page Deallocation Table (PDT)
which holds a list of physical memory addresses where hardware detected memory
errors (single bit and double bit errors).
This patch adds the missing PDC firmware calls and the logic to read the PDT
from firmware, report all current PDT entries and exclude the reported bad
memory from being used by Linux.
Signed-off-by: Helge Deller <deller@gmx.de>
The cr16 clocks of the physical PARISC CPUs are usually nonsynchronous.
Nevertheless, it seems that each CPU socket (which holds two cores) of
PA8800 and PA8900 CPUs (e.g. in a C8000 workstation) is fed by the same
clock source, which makes the cr16 clocks of each CPU socket syncronous.
Let's try to detect such situations and mark the cr16 clocksource stable
on single-socket and single-core machines.
Signed-off-by: Helge Deller <deller@gmx.de>
The last users have been migrated off by commits d19f5e41b3 ("parisc:
Clean up fixup routines for get_user()/put_user()") and 554bfeceb8
("parisc: Fix access fault handling in pa_memcpy()").
Signed-off-by: Helge Deller <deller@gmx.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z
akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G
tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St
svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM
KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5
gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG
0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f
oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+
IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE
iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3
IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq
ags00JtMLitfNPBH3eSl
=eE4D
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add framework for supporting PCIe devices in Endpoint mode (Kishon
Vijay Abraham I)
- use non-postable PCI config space mappings when possible (Lorenzo
Pieralisi)
- clean up and unify mmap of PCI BARs (David Woodhouse)
- export and unify Function Level Reset support (Christoph Hellwig)
- avoid FLR for Intel 82579 NICs (Sasha Neftin)
- add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)
- short-circuit config access failures for disconnected devices (Keith
Busch)
- remove D3 sleep delay when possible (Adrian Hunter)
- freeze PME scan before suspending devices (Lukas Wunner)
- stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)
- disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)
- add arch-specific alignment control to improve device passthrough by
avoiding multiple BARs in a page (Yongji Xie)
- add sysfs sriov_drivers_autoprobe to control VF driver binding
(Bodong Wang)
- allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)
- fix crashes when unbinding host controllers that don't support
removal (Brian Norris)
- add driver for MicroSemi Switchtec management interface (Logan
Gunthorpe)
- add driver for Faraday Technology FTPCI100 host bridge (Linus
Walleij)
- add i.MX7D support (Andrey Smirnov)
- use generic MSI support for Aardvark (Thomas Petazzoni)
- make Rockchip driver modular (Brian Norris)
- advertise 128-byte Read Completion Boundary support for Rockchip
(Shawn Lin)
- advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)
- convert atomic_t to refcount_t in HV driver (Elena Reshetova)
- add CPU IRQ affinity in HV driver (K. Y. Srinivasan)
- fix PCI bus removal in HV driver (Long Li)
- add support for ThunderX2 DMA alias topology (Jayachandran C)
- add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)
- add ITE 8893 bridge DMA alias quirk (Jarod Wilson)
- restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
(Manish Jaggi)
* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
PCI: Don't allow unbinding host controllers that aren't prepared
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
MAINTAINERS: Add PCI Endpoint maintainer
Documentation: PCI: Add userguide for PCI endpoint test function
tools: PCI: Add sample test script to invoke pcitest
tools: PCI: Add a userspace tool to test PCI endpoint
Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
misc: Add host side PCI driver for PCI test function device
PCI: Add device IDs for DRA74x and DRA72x
dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
PCI: dwc: dra7xx: Workaround for errata id i870
dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
PCI: dwc: dra7xx: Add EP mode support
PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
dt-bindings: PCI: Add DT bindings for PCI designware EP mode
PCI: dwc: designware: Add EP mode support
Documentation: PCI: Add binding documentation for pci-test endpoint function
ixgbe: Use pcie_flr() instead of duplicating it
IB/hfi1: Use pcie_flr() instead of duplicating it
PCI: imx6: Fix spelling mistake: "contol" -> "control"
...
This typo is quite common. Fix it and add it to the spelling file so
that checkpatch catches it earlier.
Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__vmalloc* allows users to provide gfp flags for the underlying
allocation. This API is quite popular
$ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
77
The only problem is that many people are not aware that they really want
to give __GFP_HIGHMEM along with other flags because there is really no
reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages
which are mapped to the kernel vmalloc space. About half of users don't
use this flag, though. This signals that we make the API unnecessarily
too complex.
This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
be mapped to the vmalloc space. Current users which add __GFP_HIGHMEM
are simplified and drop the flag.
Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Cristopher Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In all cases we know which BAR it is. Passing it in means that arch code
(or generic code; watch this space) won't have to go looking for it again.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Commit 73580dac76 ("parisc: Fix system shutdown halt") introduced an endless
loop for systems which don't provide a software power off function. But the
soft lockup detector will detect this and report stalled CPUs after some time.
Avoid those unwanted warnings by disabling the soft lockup detector.
Fixes: 73580dac76 ("parisc: Fix system shutdown halt")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.9+
Al Viro noticed that userspace accesses via get_user()/put_user() can be
simplified a lot with regard to usage of the exception handling.
This patch implements a fixup routine for get_user() and put_user() in such
that the exception handler will automatically load -EFAULT into the register
%r8 (the error value) in case on a fault on userspace. Additionally the fixup
routine will zero the target register on fault in case of a get_user() call.
The target register is extracted out of the faulting assembly instruction.
This patch brings a few benefits over the old implementation:
1. Exception handling gets much cleaner, easier and smaller in size.
2. Helper functions like fixup_get_user_skip_1 (all of fixup.S) can be dropped.
3. No need to hardcode %r9 as target register for get_user() any longer. This
helps the compiler register allocator and thus creates less assembler
statements.
4. No dependency on the exception_data contents any longer.
5. Nested faults will be handled cleanly.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
On those parisc machines which don't provide a software power off
function, the system currently kills the init process at the end of a
shutdown and unexpectedly restarts insteads of halting.
Fix it by adding a loop which will not return.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.9+