This patch updates/fixes all spin_unlock_wait() implementations.
The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire to match the
store-release from the spin_unlock() we waited on. This ensures that
when spin_unlock_wait() returns, we're guaranteed to observe the full
critical section we waited on.
This fixes a number of spin_unlock_wait() users that (not
unreasonably) rely on this.
I also fixed a number of ticket lock versions to only wait on the
current lock holder, instead of for a full unlock, as this is
sufficient.
Furthermore; again for ticket locks; I added an smp_rmb() in between
the initial ticket load and the spin loop testing the current value
because I could not convince myself the address dependency is
sufficient, esp. if the loads are of different sizes.
I'm more than happy to remove this smp_rmb() again if people are
certain the address dependency does indeed work as expected.
Note: PPC32 will be fixed independently
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chris@zankel.net
Cc: cmetcalf@mellanox.com
Cc: davem@davemloft.net
Cc: dhowells@redhat.com
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: linux@armlinux.org.uk
Cc: mpe@ellerman.id.au
Cc: ralf@linux-mips.org
Cc: realmz6@gmail.com
Cc: rkuo@codeaurora.org
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: vgupta@synopsys.com
Cc: ysato@users.sourceforge.jp
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This architecture selects RTC_CLASS unconditionally, so the GEN_RTC
has not worked here for a long time.
Now we can remove both the asm/rtc.h header and the Kconfig dependency
for CONFIG_GEN_RTC.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Nothing on these architectures ever includes the asm/mc146818rtc.h
file, the drivers that used to do this have been fixed long ago,
and the remaining users are all PC-specific.
This removes the files for good.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
The attached patch updates the parisc version of futex.h to match the
current generic implementation except for the spinlock code.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Add a native implementation for the sched_clock() function which utilizes the
processor-internal cycle counter (Control Register 16) as high-resolution time
source.
With this patch we now get much more fine-grained resolutions in various
in-kernel time measurements (e.g. when viewing the function tracing logs), and
probably a more accurate scheduling on SMP systems.
There are a few specific implementation details in this patch:
1. On a 32bit kernel we emulate the higher 32bits of the required 64-bit
resolution of sched_clock() by increasing a per-cpu counter at every
wrap-around of the 32bit cycle counter.
2. In a SMP system, the cycle counters of the various CPUs are not syncronized
(similiar to the TSC in a x86_64 system). To cope with this we define
HAVE_UNSTABLE_SCHED_CLOCK and let the upper layers do the adjustment work.
3. Since we need HAVE_UNSTABLE_SCHED_CLOCK, we need to provide a cmpxchg64()
function even on a 32-bit kernel.
4. A 64-bit SMP kernel which is started on a UP system will mark the
sched_clock() implementation as "stable", which means that we don't expect any
jumps in the returned counter. This is true because we then run only on one
CPU.
Signed-off-by: Helge Deller <deller@gmx.de>
By adding TRACEHOOK support we now get a clean user interface to access
registers via PTRACE_GETREGS, PTRACE_SETREGS, PTRACE_GETFPREGS and
PTRACE_SETFPREGS.
The user-visible regset struct user_regs_struct and user_fp_struct are
modelled similiar to x86 and can be accessed via PTRACE_GETREGSET.
Signed-off-by: Helge Deller <deller@gmx.de>
This patch simplifies the code for get_user() and put_user() a lot.
Instead of accessing kernel memory (%sr0) and userspace memory (%sr3)
hard-coded in the assembler instruction, we now preload %sr2 with either
%sr0 (for accessing KERNEL_DS) or with sr3 (to access USER_DS) and
use %sr2 in the load directly.
The generated code avoids a branch and speeds up execution by generating
less assembler instructions.
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
This patch adds support for the TIF_SYSCALL_TRACEPOINT on the parisc
architecture. Basically, it calls the appropriate tracepoints on syscall
entry and exit.
Signed-off-by: Helge Deller <deller@gmx.de>
Pull parisc ftrace fixes from Helge Deller:
"This is (most likely) the last pull request for v4.6 for the parisc
architecture.
It fixes the FTRACE feature for parisc, which is horribly broken since
quite some time and doesn't even compile. This patch just fixes the
bare minimum (it actually removes more lines than it adds), so that
the function tracer works again on 32- and 64bit kernels.
I've queued up additional patches on top of this patch which e.g. add
the syscall tracer, but those have to wait for the merge window for
v4.7."
* 'parisc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix ftrace function tracer
Fix the FTRACE function tracer for 32- and 64-bit kernel.
The former code was horribly broken.
Reimplement most coding in assembly and utilize optimizations, e.g. put
mcount() and ftrace_stub() into one L1 cacheline.
Signed-off-by: Helge Deller <deller@gmx.de>
Update the comment to reflect the changes of commit 0de7985 (parisc: Use
generic extable search and sort routines).
Signed-off-by: Helge Deller <deller@gmx.de>
Handling exceptions from modules never worked on parisc.
It was just masked by the fact that exceptions from modules
don't happen during normal use.
When a module triggers an exception in get_user() we need to load the
main kernel dp value before accessing the exception_data structure, and
afterwards restore the original dp value of the module on exit.
Noticed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Pull parisc fixes from Helge Deller:
"Fix seccomp filter support and SIGSYS signals on compat kernel.
Both patches are tagged for v4.5 stable kernel"
* 'parisc-4.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix and enable seccomp filter support
parisc: Fix SIGSYS signals in compat case
The seccomp filter support requires careful handling of task registers. This
includes reloading of the return value (%r28) and proper syscall exit if
secure_computing() returned -1.
Additionally we need to sign-extend the syscall number from signed 32bit to
signed 64bit in do_syscall_trace_enter() since the ptrace interface only allows
storing 32bit values in compat mode.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.5
Switch to the generic extable search and sort routines which were introduced
with commit a272858 from Ard Biesheuvel. This saves quite some memory in the
vmlinux binary with the 64bit kernel.
Signed-off-by: Helge Deller <deller@gmx.de>
The system calls alloc_hugepages() and free_hugepages() were introduced
in Linux 2.5.36 and removed again in 2.5.54. They were never implemented
on parisc, so let's drop them now.
Signed-off-by: Helge Deller <deller@gmx.de>
Pull networking updates from David Miller:
"Highlights:
1) Support more Realtek wireless chips, from Jes Sorenson.
2) New BPF types for per-cpu hash and arrap maps, from Alexei
Starovoitov.
3) Make several TCP sysctls per-namespace, from Nikolay Borisov.
4) Allow the use of SO_REUSEPORT in order to do per-thread processing
of incoming TCP/UDP connections. The muxing can be done using a
BPF program which hashes the incoming packet. From Craig Gallek.
5) Add a multiplexer for TCP streams, to provide a messaged based
interface. BPF programs can be used to determine the message
boundaries. From Tom Herbert.
6) Add 802.1AE MACSEC support, from Sabrina Dubroca.
7) Avoid factorial complexity when taking down an inetdev interface
with lots of configured addresses. We were doing things like
traversing the entire address less for each address removed, and
flushing the entire netfilter conntrack table for every address as
well.
8) Add and use SKB bulk free infrastructure, from Jesper Brouer.
9) Allow offloading u32 classifiers to hardware, and implement for
ixgbe, from John Fastabend.
10) Allow configuring IRQ coalescing parameters on a per-queue basis,
from Kan Liang.
11) Extend ethtool so that larger link mode masks can be supported.
From David Decotigny.
12) Introduce devlink, which can be used to configure port link types
(ethernet vs Infiniband, etc.), port splitting, and switch device
level attributes as a whole. From Jiri Pirko.
13) Hardware offload support for flower classifiers, from Amir Vadai.
14) Add "Local Checksum Offload". Basically, for a tunneled packet
the checksum of the outer header is 'constant' (because with the
checksum field filled into the inner protocol header, the payload
of the outer frame checksums to 'zero'), and we can take advantage
of that in various ways. From Edward Cree"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
bonding: fix bond_get_stats()
net: bcmgenet: fix dma api length mismatch
net/mlx4_core: Fix backward compatibility on VFs
phy: mdio-thunder: Fix some Kconfig typos
lan78xx: add ndo_get_stats64
lan78xx: handle statistics counter rollover
RDS: TCP: Remove unused constant
RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
net: smc911x: convert pxa dma to dmaengine
team: remove duplicate set of flag IFF_MULTICAST
bonding: remove duplicate set of flag IFF_MULTICAST
net: fix a comment typo
ethernet: micrel: fix some error codes
ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
bpf, dst: add and use dst_tclassid helper
bpf: make skb->tc_classid also readable
net: mvneta: bm: clarify dependencies
cls_bpf: reset class and reuse major in da
ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
ldmvsw: Add ldmvsw.c driver code
...
Pull read-only kernel memory updates from Ingo Molnar:
"This tree adds two (security related) enhancements to the kernel's
handling of read-only kernel memory:
- extend read-only kernel memory to a new class of formerly writable
kernel data: 'post-init read-only memory' via the __ro_after_init
attribute, and mark the ARM and x86 vDSO as such read-only memory.
This kind of attribute can be used for data that requires a once
per bootup initialization sequence, but is otherwise never modified
after that point.
This feature was based on the work by PaX Team and Brad Spengler.
(by Kees Cook, the ARM vDSO bits by David Brown.)
- make CONFIG_DEBUG_RODATA always enabled on x86 and remove the
Kconfig option. This simplifies the kernel and also signals that
read-only memory is the default model and a first-class citizen.
(Kees Cook)"
* 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM/vdso: Mark the vDSO code read-only after init
x86/vdso: Mark the vDSO code read-only after init
lkdtm: Verify that '__ro_after_init' works correctly
arch: Introduce post-init read-only memory
x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
asm-generic: Consolidate mark_rodata_ro()
This patch updates csum_ipv6_magic so that it correctly recognizes that
protocol is a unsigned 8 bit value.
This will allow us to better understand what limitations may or may not be
present in how we handle the data. For example there are a number of
places that call htonl on the protocol value. This is likely not necessary
and can be replaced with a multiplication by ntohl(1) which will be
converted to a shift by the compiler.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates all instances of csum_tcpudp_magic and
csum_tcpudp_nofold to reflect the types that are usually used as the source
inputs. For example the protocol field is populated based on nexthdr which
is actually an unsigned 8 bit value. The length is usually populated based
on skb->len which is an unsigned integer.
This addresses an issue in which the IPv6 function csum_ipv6_magic was
generating a checksum using the full 32b of skb->len while
csum_tcpudp_magic was only using the lower 16 bits. As a result we could
run into issues when attempting to adjust the checksum as there was no
protocol agnostic way to update it.
With this change the value is still truncated as many architectures use
"(len + proto) << 8", however this truncation only occurs for values
greater than 16776960 in length and as such is unlikely to occur as we stop
the inner headers at ~64K in size.
I did have to make a few minor changes in the arm, mn10300, nios2, and
score versions of the function in order to support these changes as they
were either using things such as an OR to combine the protocol and length,
or were using ntohs to convert the length which would have truncated the
value.
I also updated a few spots in terms of whitespace and type differences for
the addresses. Most of this was just to make sure all of the definitions
were in sync going forward.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
For a long time all architectures implement the pci_dma_* functions using
the generic DMA API, and they all use the same header to do so.
Move this header, pci-dma-compat.h, to include/linux and include it from
the generic pci.h instead of having each arch duplicate this include.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
David Binderman reported a style issue in the floppy.h header file:
arch/parisc/include/asm/floppy.h:221: (style) Boolean result is used in bitwise
operation. Clarify expression with parentheses.
Reported-by: David Binderman <dcb314@hotmail.com>
Cc: David Binderman <dcb314@hotmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
purpose is to allow an application to give feedback to the kernel about
the quality of the network path for a connected socket. The value
argument indicates the type of quality report. For this initial patch
the only supported advice is a value of 1 which indicates "bad path,
please reroute"-- the action taken by the kernel is to call
dst_negative_advice which will attempt to choose a different ECMP route,
reset the TX hash for flow label and UDP source port in encapsulation,
etc.
This facility should be useful for connected UDP sockets where only the
application can provide any feedback about path quality. It could also
be useful for TCP applications that have additional knowledge about the
path outside of the normal TCP control loop.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the easiest ways to protect the kernel from attack is to reduce
the internal attack surface exposed when a "write" flaw is available. By
making as much of the kernel read-only as possible, we reduce the
attack surface.
Many things are written to only during __init, and never changed
again. These cannot be made "const" since the compiler will do the wrong
thing (we do actually need to write to them). Instead, move these items
into a memory region that will be made read-only during mark_rodata_ro()
which happens after all kernel __init code has finished.
This introduces __ro_after_init as a way to mark such memory, and adds
some documentation about the existing __read_mostly marking.
This improves the security of the Linux kernel by marking formerly
read-write memory regions as read-only on a fully booted up system.
Based on work by PaX Team and Brad Spengler.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-5-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the generic implementation to <linux/dma-mapping.h> now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.
[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commits 21f55b018b ("arch/*/include/uapi/asm/mman.h: : let MADV_FREE
have same value for all architectures") and ef58978f1e ("mm: define
MADV_FREE for some arches") both defined MADV_FREE, but did not use the
same values. This results in build errors such as
./arch/alpha/include/uapi/asm/mman.h:53:0: error: "MADV_FREE" redefined
./arch/alpha/include/uapi/asm/mman.h:50:0: note: this is the location of the previous definition
for the affected architectures.
Fixes: 21f55b018b ("arch/*/include/uapi/asm/mman.h: : let MADV_FREE have same value for all architectures")
Fixes: ef58978f1e ("mm: define MADV_FREE for some arches")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull parsic updates from Helge Deller:
"This patchset includes two major fixes which are both scheduled for
stable:
First, __ARCH_SI_PREAMBLE_SIZE was defined with a wrong value.
Second, huge page pte and TLB changes needed protection with a
spinlock. Other than that there are just some trivial optimizations
and cleanups"
* 'parisc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Protect huge page pte changes with spinlocks
parisc: Imporove debug info about space registers and TLB configuration
parisc: Drop parisc-specific NSIGTRAP define
parisc: Fix __ARCH_SI_PREAMBLE_SIZE
parisc: Reduce overhead of parisc_requires_coherency()
parisc: Initialize PCI bridge cache line and default latency
PA-RISC doesn't have atomic instructions to modify page table entries, so it
takes spinlock in the TLB handler and modifies the page table entry
non-atomically. If you modify the page table entry without the spinlock, you
may race with TLB handler on another CPU and your modification may be lost.
Protect against that with usage of purge_tlb_start() and purge_tlb_end() which
handles the TLB spinlock.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.4
For uapi, need try to let all macros have same value, and MADV_FREE is
added into main branch recently, so need redefine MADV_FREE for it.
At present, '8' can be shared with all architectures, so redefine it to
'8'.
[sudipm.mukherjee@gmail.com: correct uniform value of MADV_FREE]
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@kernel.org>
Cc: <yalin.wang2010@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Jason Evans <je@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttil <mika.penttila@nextfour.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Shaohua Li <shli@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most architectures use asm-generic, but alpha, mips, parisc, xtensa need
their own definitions.
This patch defines MADV_FREE for them so it should fix build break for
their architectures.
Maybe, I should split and feed pieces to arch maintainers but included
here for mmotm convenience.
[gang.chen.5i5j@gmail.com: let MADV_FREE have same value for all architectures]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@kernel.org>
Cc: <yalin.wang2010@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Evans <je@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttil <mika.penttila@nextfour.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Shaohua Li <shli@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On a 64bit kernel build the compiler aligns the _sifields union in the
struct siginfo_t on a 64bit address. The __ARCH_SI_PREAMBLE_SIZE define
compensates for this alignment and thus fixes the wait testcase of the
strace package.
The symptoms of a wrong __ARCH_SI_PREAMBLE_SIZE value is that
_sigchld.si_stime variable is missed to be copied and thus after a
copy_siginfo() will have uninitialized values.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
PCI controllers and pci-pci bridges may have not been fully initialized
regarding cache line and defaul latency.
This partly reverts
commit 5f0e9b4 ("parisc: Remove unused pcibios_init_bus()")
Signed-off-by: Helge Deller <deller@gmx.de>
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group. These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.
This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull parisc update from Helge Deller:
"This patchset adds Huge Page and HUGETLBFS support for parisc"
Honestly, the hugepage support should have gone through in the merge
window, and is not really an rc-time fix. But it only touches
arch/parisc, and I cannot find it in myself to care. If one of the
three parisc users notices a breakage, I will point at Helge and make
rude farting noises.
* 'parisc-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Map kernel text and data on huge pages
parisc: Add Huge Page and HUGETLBFS support
parisc: Use long branch to do_syscall_trace_exit
parisc: Increase initial kernel mapping to 32MB on 64bit kernel
parisc: Initialize the fault vector earlier in the boot process.
parisc: Add defines for Huge page support
parisc: Drop unused MADV_xxxK_PAGES flags from asm/mman.h
parisc: Drop definition of start_thread_som for HP-UX SOM binaries
parisc: Fix wrong comment regarding first pmd entry flags
This patch adds huge page support to allow userspace to allocate huge
pages and to use hugetlbfs filesystem on 32- and 64-bit Linux kernels.
A later patch will add kernel support to map kernel text and data on
huge pages.
The only requirement is, that the kernel needs to be compiled for a
PA8X00 CPU (PA2.0 architecture). Older PA1.X CPUs do not support
variable page sizes. 64bit Kernels are compiled for PA2.0 by default.
Technically on parisc multiple physical huge pages may be needed to
emulate standard 2MB huge pages.
Signed-off-by: Helge Deller <deller@gmx.de>
For the 64bit kernel the initially 16 MB kernel memory might become too
small if you build a kernel with many modules built-in and with kernel
text and data areas mapped on huge pages.
This patch increases the initial mapping to 32MB for 64bit kernels and
keeps 16MB for 32bit kernels.
Signed-off-by: Helge Deller <deller@gmx.de>
Huge pages on parisc will have the same size as one pmd table, which
is on a 64bit kernel 2MB on a kernel with 4K kernel page sizes, and
on a 32bit kernel 4MB when used with 4K kernel pages.
Since parisc does not physically supports 2MB huge page sizes, emulate
it with two consecutive 1MB page sizes instead. Keeping the same huge
page size as one pmd will allow us to add transparent huge page support
later on.
Bit 21 in the pte flags was unused and will now be used to mark a page
as huge page (_PAGE_HPAGE_BIT).
Signed-off-by: Helge Deller <deller@gmx.de>
Drop the MADV_xxK_PAGES flags, which were never used and were from a proposed
API which was never integrated into the generic Linux kernel code.
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
The definition of start_thread_som was planned to be used to execute
HP-UX SOM binaries. Since HP-UX compatibility was dropped with kernel 4.0
there is no need to carry it further.
Signed-off-by: Helge Deller <deller@gmx.de>
Pull parisc updates from Helge Deller:
"We have two patches in here:
- The parisc uapi headers have been screwed up since quite some time.
This patch fixes some bugs (e.g. endianess not respected in
compat_semid64_ds) and cleans them up (e.g. uid_t was used instead
of __kernel_uid_t) so that they can be used by userspace again.
This patch has been reviewed by Arnd Bergmann and is scheduled for
stable kernel series.
- Drop the hpux_stat64 struct from stat.h, we do not support HP-UX
binaries since kernel 4.0"
* 'parisc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fixes and cleanups in kernel uapi header files
parisc: Drop hpux_stat64 struct from stat.h header file
Removal started in commit 5bbeed12bd ("sparc32: drop unused
kmap_atomic_to_page"). Let's do it across the whole tree.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes some bugs and partly cleans up the parisc uapi header
files to what glibc defined:
- compat_semid64_ds was wrong and did not take the endianess into
account
- ipc64_perm exported userspace types which broke building userspace
packages on debian (e.g. trinity)
- ipc64_perm needs to use a 32bit mode_t on 64bit kernel
- msqid64_ds and semid64_ds needs unsigned longs for various struct members
- shmid64_ds exported size_t instead of __kernel_size_t
And finally add some compile-time checks for the sizes of those structs
to avoid future breakage.
Runtime-tested with the Linux Test Project (LTP) testsuite.
Cc: <stable@vger.kernel.org> # 3.18+
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The struct hpux_stat64 is not needed any longer since we dropped HP-UX
support in commit 04c1614 ("parisc: hpux - Drop support for HP-UX
binaries").
Signed-off-by: Helge Deller <deller@gmx.de>
The previous patch introduced a flag that specified pages in a VMA should
be placed on the unevictable LRU, but they should not be made present when
the area is created. This patch adds the ability to set this state via
the new mlock system calls.
We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
MCL_ONFAULT should be used as a modifier to the two other mlockall flags.
When used with MCL_CURRENT, all current mappings will be marked with
VM_LOCKED | VM_LOCKONFAULT. When used with MCL_FUTURE, the mm->def_flags
will be marked with VM_LOCKED | VM_LOCKONFAULT. When used with both
MCL_CURRENT and MCL_FUTURE, all current mappings and mm->def_flags will be
marked with VM_LOCKED | VM_LOCKONFAULT.
Prior to this patch, mlockall() will unconditionally clear the
mm->def_flags any time it is called without MCL_FUTURE. This behavior is
maintained after adding MCL_ONFAULT. If a call to mlockall(MCL_FUTURE) is
followed by mlockall(MCL_CURRENT), the mm->def_flags will be cleared and
new VMAs will be unlocked. This remains true with or without MCL_ONFAULT
in either mlockall() invocation.
munlock() will unconditionally clear both vma flags. munlockall()
unconditionally clears for VMA flags on all VMAs and in the mm->def_flags
field.
Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull parisc updates from Helge Deller:
"The most important change is that we reduce L1_CACHE_BYTES to 16
bytes, for which a trivial patch for XPS in the network layer was
needed. Then we wire up the sys_membarrier and userfaultfd syscalls
and added two other small cleanups"
* 'parisc-4.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Change L1_CACHE_BYTES to 16
net/xps: Fix calculation of initial number of xps queues
parisc: reduce syslog debug output
parisc: serial/mux: Convert to uart_console_device instead of open-coded
parisc: Wire up userfaultfd syscall
parisc: allocate sys_membarrier system call number
Change L1_CACHE_BYTES to 16 bytes.
Tested for 16 days on rp3440.
Additional remarks from Helge Deller:
Saves ~17 kb of kernel code/data and gives a slight performance improvement in
various test cases.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
This patch makes sure that atomic_{read,set}() are at least
{READ,WRITE}_ONCE().
We already had the 'requirement' that atomic_read() should use
ACCESS_ONCE(), and most archs had this, but a few were lacking.
All are now converted to use READ_ONCE().
And, by a symmetry and general paranoia argument, upgrade atomic_set()
to use WRITE_ONCE().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: james.hogan@imgtec.com
Cc: linux-kernel@vger.kernel.org
Cc: oleg@redhat.com
Cc: will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 3cc2dac5be ("drivers/video/fbdev/atyfb: Replace MTRR UC hole
with strong UC") introduces calls to ioremap_wc and ioremap_uc. This
causes build failures with parisc:allmodconfig. Map the missing
functions to ioremap_nocache.
Fixes: 3cc2dac5be ("drivers/video/fbdev/atyfb:
Replace MTRR UC hole with strong UC")
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Implement atomic logic ops -- atomic_{or,xor,and}.
These will replace the atomic_{set,clear}_mask functions that are
available on some archs.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Implement atomic logic ops -- atomic_{or,xor,and}.
These will replace the atomic_{set,clear}_mask functions that are
available on some archs.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 0e0da48dee ("parisc: mm: don't count preallocated pmds")
introduced a memory leak.
After this commit, the 'return' statement in pmd_free is executed in all
cases. Even for pmd that are not attached to the pgd. So 'free_pages'
can never be called anymore, leading to a memory leak.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Commit 2ae416b142 ("mm: new mm hook framework") introduced an empty
header file (mm-arch-hooks.h) for every architecture, even those which
doesn't need to define mm hooks.
As suggested by Geert Uytterhoeven, this could be cleaned through the use
of a generic header file included via each per architecture
asm/include/Kbuild file.
The PowerPC architecture is not impacted here since this architecture has
to defined the arch_remap MM hook.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The increased use of pdtlb/pitlb instructions seemed to increase the
frequency of random segmentation faults building packages. Further, we
had a number of cases where TLB inserts would repeatedly fail and all
forward progress would stop. The Haskell ghc package caused a lot of
trouble in this area. The final indication of a race in pte handling was
this syslog entry on sibaris (C8000):
swap_free: Unused swap offset entry 00000004
BUG: Bad page map in process mysqld pte:00000100 pmd:019bbec5
addr:00000000ec464000 vm_flags:00100073 anon_vma:0000000221023828 mapping: (null) index:ec464
CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1
Backtrace:
[<0000000040173eb0>] show_stack+0x20/0x38
[<0000000040444424>] dump_stack+0x9c/0x110
[<00000000402a0d38>] print_bad_pte+0x1a8/0x278
[<00000000402a28b8>] unmap_single_vma+0x3d8/0x770
[<00000000402a4090>] zap_page_range+0xf0/0x198
[<00000000402ba2a4>] SyS_madvise+0x404/0x8c0
Note that the pte value is 0 except for the accessed bit 0x100. This bit
shouldn't be set without the present bit.
It should be noted that the madvise system call is probably a trigger for many
of the random segmentation faults.
In looking at the kernel code, I found the following problems:
1) The pte_clear define didn't take TLB lock when clearing a pte.
2) We didn't test pte present bit inside lock in exception support.
3) The pte and tlb locks needed to merged in order to ensure consistency
between page table and TLB. This also has the effect of serializing TLB
broadcasts on SMP systems.
The attached change implements the above and a few other tweaks to try
to improve performance. Based on the timing code, TLB purges are very
slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it
beneficial to test the split_tlb variable to avoid duplicate purges.
Probably, all PA 2.0 machines have combined TLBs.
I dropped using __flush_tlb_range in flush_tlb_mm as I realized all
applications and most threads have a stack size that is too large to
make this useful. I added some comments to this effect.
Since implementing 1 through 3, I haven't had any random segmentation
faults on mx3210 (rp3440) in about one week of building code and running
as a Debian buildd.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Helge Deller <deller@gmx.de>
Pull asm/scatterlist.h removal from Jens Axboe:
"We don't have any specific arch scatterlist anymore, since parisc
finally switched over. Kill the include"
* 'for-4.2/sg' of git://git.kernel.dk/linux-block:
remove scatterlist.h generation from arch Kbuild files
remove <asm/scatterlist.h>
CRIU is recreating the process memory layout by remapping the checkpointee
memory area on top of the current process (criu). This includes remapping
the vDSO to the place it has at checkpoint time.
However some architectures like powerpc are keeping a reference to the
vDSO base address to build the signal return stack frame by calling the
vDSO sigreturn service. So once the vDSO has been moved, this reference
is no more valid and the signal frame built later are not usable.
This patch serie is introducing a new mm hook framework, and a new
arch_remap hook which is called when mremap is done and the mm lock still
hold. The next patch is adding the vDSO remap and unmap tracking to the
powerpc architecture.
This patch (of 3):
This patch introduces a new set of header file to manage mm hooks:
- per architecture empty header file (arch/x/include/asm/mm-arch-hooks.h)
- a generic header (include/linux/mm-arch-hooks.h)
The architecture which need to overwrite a hook as to redefine it in its
header file, while architecture which doesn't need have nothing to do.
The default hooks are defined in the generic header and are used in the
case the architecture is not defining it.
In a next step, mm hooks defined in include/asm-generic/mm_hooks.h should
be moved here.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull scheduler updates from Ingo Molnar:
"The main changes are:
- lockless wakeup support for futexes and IPC message queues
(Davidlohr Bueso, Peter Zijlstra)
- Replace spinlocks with atomics in thread_group_cputimer(), to
improve scalability (Jason Low)
- NUMA balancing improvements (Rik van Riel)
- SCHED_DEADLINE improvements (Wanpeng Li)
- clean up and reorganize preemption helpers (Frederic Weisbecker)
- decouple page fault disabling machinery from the preemption
counter, to improve debuggability and robustness (David
Hildenbrand)
- SCHED_DEADLINE documentation updates (Luca Abeni)
- topology CPU masks cleanups (Bartosz Golaszewski)
- /proc/sched_debug improvements (Srikar Dronamraju)"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits)
sched/deadline: Remove needless parameter in dl_runtime_exceeded()
sched: Remove superfluous resetting of the p->dl_throttled flag
sched/deadline: Drop duplicate init_sched_dl_class() declaration
sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target
sched/deadline: Make init_sched_dl_class() __init
sched/deadline: Optimize pull_dl_task()
sched/preempt: Add static_key() to preempt_notifiers
sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iteration
sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
sched/debug: Add sum_sleep_runtime to /proc/<pid>/sched
sched/debug: Replace vruntime with wait_sum in /proc/sched_debug
sched/debug: Properly format runnable tasks in /proc/sched_debug
sched/numa: Only consider less busy nodes as numa balancing destinations
Revert 095bebf61a ("sched/numa: Do not move past the balance point if unbalanced")
sched/fair: Prevent throttling in early pick_next_task_fair()
preempt: Reorganize the notrace definitions a bit
preempt: Use preempt_schedule_context() as the official tracing preemption point
sched: Make preempt_schedule_context() function-tracing safe
x86: Remove cpu_sibling_mask() and cpu_core_mask()
x86: Replace cpu_**_mask() with topology_**_cpumask()
...
Pull locking updates from Ingo Molnar:
"The main changes are:
- 'qspinlock' support, enabled on x86: queued spinlocks - these are
now the spinlock variant used by x86 as they outperform ticket
spinlocks in every category. (Waiman Long)
- 'pvqspinlock' support on x86: paravirtualized variant of queued
spinlocks. (Waiman Long, Peter Zijlstra)
- 'qrwlock' support, enabled on x86: queued rwlocks. Similar to
queued spinlocks, they are now the variant used by x86:
CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_ARCH_USE_QUEUED_RWLOCKS=y
CONFIG_QUEUED_RWLOCKS=y
- various lockdep fixlets
- various locking primitives cleanups, further WRITE_ONCE()
propagation"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
locking/lockdep: Remove hard coded array size dependency
locking/qrwlock: Don't contend with readers when setting _QW_WAITING
lockdep: Do not break user-visible string
locking/arch: Rename set_mb() to smp_store_mb()
locking/arch: Add WRITE_ONCE() to set_mb()
rtmutex: Warn if trylock is called from hard/softirq context
arch: Remove __ARCH_HAVE_CMPXCHG
locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG
locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS
locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS
locking/pvqspinlock: Replace xchg() by the more descriptive set_mb()
locking/pvqspinlock, x86: Enable PV qspinlock for Xen
locking/pvqspinlock, x86: Enable PV qspinlock for KVM
locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching
locking/pvqspinlock: Implement simple paravirt support for the qspinlock
locking/qspinlock: Revert to test-and-set on hypervisors
locking/qspinlock: Use a simple write to grab the lock
locking/qspinlock: Optimize for smaller NR_CPUS
locking/qspinlock: Extract out code snippets for the next patch
locking/qspinlock: Add pending bit
...
pci_dma_burst_advice() was added by e24c2d963a ("[PATCH] PCI: DMA
bursting advice") but apparently never used. Remove it.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Michal Simek <monstr@monstr.eu> # microblaze
CC: David S. Miller <davem@davemloft.net>
We removed the only user of this define in the rtmutex code. Get rid
of it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y,
currently parisc and metag only) stack randomization sometimes leads to crashes
when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB
by default if not defined in arch-specific headers).
The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the
additional space needed for the stack randomization (as defined by the value of
STACK_RND_MASK) was not taken into account yet and as such, when the stack
randomization code added a random offset to the stack start, the stack
effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK)
which then sometimes leads to out-of-stack situations and crashes.
This patch fixes it by adding the maximum possible amount of memory (based on
STACK_RND_MASK) which theoretically could be added by the stack randomization
code to the initial stack size. That way, the user-defined stack size is always
guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK).
This bug is currently not visible on the metag architecture, because on metag
STACK_RND_MASK is defined to 0 which effectively disables stack randomization.
The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP"
section, so it does not affect other platformws beside those where the
stack grows upwards (parisc and metag).
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
Cc: James Hogan <james.hogan@imgtec.com>
Cc: linux-metag@vger.kernel.org
Cc: stable@vger.kernel.org # v3.16+
We don't have any arch specific scatterlist now that parisc switched over
to the generic one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
The following warning is seen when compiling parisc images
./arch/parisc/include/asm/pgalloc.h: In function 'pgd_alloc':
./arch/parisc/include/asm/pgalloc.h:29:5: warning: "PT_NLEVELS" is not defined
Some definitions of PT_NLEVELS were missed with the conversion to
CONFIG_PGTABLE_LEVELS.
Fixes: f24ffde432 ("parisc: expose number of page table levels
on Kconfig level")
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Helge Deller <deller@gmx.de>
The only reason to keep parisc's private asm/scatterlist.h was that it
had the macro sg_virt_addr(). Convert all callers to use something else
(sometimes just sg->offset was enough, others should use sg_virt()), and
we can just use the asm-generic scatterlist.h instead.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Dave Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Switch to using the newly created asm-generic/seccomp.h for the seccomp
strict mode syscall definitions. Definitions were identical.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull exec domain removal from Richard Weinberger:
"This series removes execution domain support from Linux.
The idea behind exec domains was to support different ABIs. The
feature was never complete nor stable. Let's rip it out and make the
kernel signal handling code less complicated"
* 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (27 commits)
arm64: Removed unused variable
sparc: Fix execution domain removal
Remove rest of exec domains.
arch: Remove exec_domain from remaining archs
arc: Remove signal translation and exec_domain
xtensa: Remove signal translation and exec_domain
xtensa: Autogenerate offsets in struct thread_info
x86: Remove signal translation and exec_domain
unicore32: Remove signal translation and exec_domain
um: Remove signal translation and exec_domain
tile: Remove signal translation and exec_domain
sparc: Remove signal translation and exec_domain
sh: Remove signal translation and exec_domain
s390: Remove signal translation and exec_domain
mn10300: Remove signal translation and exec_domain
microblaze: Remove signal translation and exec_domain
m68k: Remove signal translation and exec_domain
m32r: Remove signal translation and exec_domain
m32r: Autogenerate offsets in struct thread_info
frv: Remove signal translation and exec_domain
...
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the code which sets up the pmd depend on PT_NLEVELS == 3, not on
CONFIG_64BIT. The reason is, that a 64bit kernel with a page size
greater than 4k doesn't need the pmd and thus has PT_NLEVELS = 2.
Signed-off-by: Helge Deller <deller@gmx.de>
The patch dc6c9a35b6 that counts pmds
allocated for a process introduced a bug on 64-bit PA-RISC kernels.
The PA-RISC architecture preallocates one pmd with each pgd. This
preallocated pmd can never be freed - pmd_free does nothing when it is
called with this pmd. When the kernel attempts to free this preallocated
pmd, it decreases the count of allocated pmds. The result is that the
counter underflows and this error is reported.
This patch fixes the bug by artifically incrementing the counter in
pmd_free when the kernel tries to free the preallocated pmd.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Core mm expects __PAGETABLE_{PUD,PMD}_FOLDED to be defined if these page
table levels folded. Usually, these defines are provided by
<asm-generic/pgtable-nopmd.h> and <asm-generic/pgtable-nopud.h>.
But some architectures fold page table levels in a custom way. They
need to define these macros themself. This patch adds missing defines.
The patch fixes mm->nr_pmds underflow and eliminates dead __pmd_alloc()
and __pud_alloc() on architectures without these page table levels.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While working on arch/parisc/include/asm/uaccess.h, I noticed that some
macros within this header are made harder to read because they violate a
coding style rule: space is missing after comma.
Fix it up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
virtio wants to read bitwise types from userspace using get_user. At the
moment this triggers sparse errors, since the value is passed through an
integer.
Fix that up using __force.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.
Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.
Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.
It's also a decent simplification, since the restart code is more or less
identical on all architectures.
[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKP has triggered a compiler warning after my recent patch "mm: account
pmd page tables to the process":
mm/mmap.c: In function 'exit_mmap':
>> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]
The code:
> 2857 WARN_ON(mm_nr_pmds(mm) >
2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has
the same type -- int. PUD_SHIFT.
I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
long. On every arch for consistency.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull parisc build fix from Helge Deller:
"This unbreaks the kernel compilation on parisc with gcc-4.9"
* 'parisc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix out-of-register compiler error in ldcw inline assembler function
The __ldcw macro has a problem when its argument needs to be reloaded from
memory. The output memory operand and the input register operand both need to
be reloaded using a register in class R1_REGS when generating 64-bit code.
This fails because there's only a single register in the class. Instead, use a
memory clobber. This also makes the __ldcw macro a compiler memory barrier.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Helge Deller <deller@gmx.de>
Pull networking updates from David Miller:
1) New offloading infrastructure and example 'rocker' driver for
offloading of switching and routing to hardware.
This work was done by a large group of dedicated individuals, not
limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu
2) Start making the networking operate on IOV iterators instead of
modifying iov objects in-situ during transfers. Thanks to Al Viro
and Herbert Xu.
3) A set of new netlink interfaces for the TIPC stack, from Richard
Alpe.
4) Remove unnecessary looping during ipv6 routing lookups, from Martin
KaFai Lau.
5) Add PAUSE frame generation support to gianfar driver, from Matei
Pavaluca.
6) Allow for larger reordering levels in TCP, which are easily
achievable in the real world right now, from Eric Dumazet.
7) Add a variable of napi_schedule that doesn't need to disable cpu
interrupts, from Eric Dumazet.
8) Use a doubly linked list to optimize neigh_parms_release(), from
Nicolas Dichtel.
9) Various enhancements to the kernel BPF verifier, and allow eBPF
programs to actually be attached to sockets. From Alexei
Starovoitov.
10) Support TSO/LSO in sunvnet driver, from David L Stevens.
11) Allow controlling ECN usage via routing metrics, from Florian
Westphal.
12) Remote checksum offload, from Tom Herbert.
13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
driver, from Thomas Lendacky.
14) Add MPLS support to openvswitch, from Simon Horman.
15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
Klassert.
16) Do gro flushes on a per-device basis using a timer, from Eric
Dumazet. This tries to resolve the conflicting goals between the
desired handling of bulk vs. RPC-like traffic.
17) Allow userspace to ask for the CPU upon what a packet was
received/steered, via SO_INCOMING_CPU. From Eric Dumazet.
18) Limit GSO packets to half the current congestion window, from Eric
Dumazet.
19) Add a generic helper so that all drivers set their RSS keys in a
consistent way, from Eric Dumazet.
20) Add xmit_more support to enic driver, from Govindarajulu
Varadarajan.
21) Add VLAN packet scheduler action, from Jiri Pirko.
22) Support configurable RSS hash functions via ethtool, from Eyal
Perry.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
Fix race condition between vxlan_sock_add and vxlan_sock_release
net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
net/mlx4: Add support for A0 steering
net/mlx4: Refactor QUERY_PORT
net/mlx4_core: Add explicit error message when rule doesn't meet configuration
net/mlx4: Add A0 hybrid steering
net/mlx4: Add mlx4_bitmap zone allocator
net/mlx4: Add a check if there are too many reserved QPs
net/mlx4: Change QP allocation scheme
net/mlx4_core: Use tasklet for user-space CQ completion events
net/mlx4_core: Mask out host side virtualization features for guests
net/mlx4_en: Set csum level for encapsulated packets
be2net: Export tunnel offloads only when a VxLAN tunnel is created
gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
net: fec: only enable mdio interrupt before phy device link up
net: fec: clear all interrupt events to support i.MX6SX
net: fec: reset fep link status in suspend function
net: sock: fix access via invalid file descriptor
net: introduce helper macro for_each_cmsghdr
...
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.
This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While there normally is no reason to have a pull request for asm-generic
but have all changes get merged through whichever tree needs them, I do
have a series for 3.19. There are two sets of patches that change
significant portions of asm/io.h, and this branch contains both in order
to resolve the conflicts:
- Will Deacon has done a set of patches to ensure that all architectures
define {read,write}{b,w,l,q}_relaxed() functions or get them by
including asm-generic/io.h. These functions are commonly used on ARM
specific drivers to avoid expensive L2 cache synchronization implied by
the normal {read,write}{b,w,l,q}, but we need to define them on all
architectures in order to share the drivers across architectures and
to enable CONFIG_COMPILE_TEST configurations for them
- Thierry Reding has done an unrelated set of patches that extends
the asm-generic/io.h file to the degree necessary to make it useful
on ARM64 and potentially other architectures.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIVAwUAVIdwNmCrR//JCVInAQJWuw/9FHt2ThMnI1J1Jqy4CVwtyjWTSa6Y/uVj
xSytS7AOvmU/nw1quSoba5mN9fcUQUtK9kqjqNcq71WsQcDE6BF9SFpi9cWtjWcI
ZfWsC+5kqry/mbnuHefENipem9RqBrLbOBJ3LARf5M8rZJuTz1KbdZs9r9+1QsCX
ou8jeqVvNKUn9J1WyekJBFSrPOtZ4bCUpeyh23JHRfPtJeAHNOuPuymj6WceAz98
uMV1icRaCBMySsf9HgsHRYW5HwuCm3MrrYj6ukyPpgxYz7FRq4hJLDs6GnlFtAGb
71g87NpFdB32qbW+y1ntfYaJyUryMHMVHBWcV5H9m0btdHTRHYZjoOGOPuyLHHO8
+l4/FaOQhnDL8cNDj0HKfhdlyaFylcWgs1wzj68nv31c1dGjcJcQiyCDwry9mJhr
erh4EewcerUvWzbBMQ4JP1f8syKMsKwbo1bVU61a1RQJxEqVCzJMLweGSOFmqMX2
6E4ZJVWv81UFLoFTzYx+7+M45K4NWywKNQdzwKmqKHc4OQyvq4ALJI0A7SGFJdDR
HJ7VqDiLaSdBitgJcJUxNzKcyXij6wE9jE1fBe3YDFE4LrnZXFVLN+MX6hs7AIFJ
vJM1UpxRxQUMGIH2m7rbDNazOAsvQGxINOjNor23cNLuf6qLY1LrpHVPQDAfJVvA
6tROM77bwIQ=
=xUv6
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic asm/io.h rewrite from Arnd Bergmann:
"While there normally is no reason to have a pull request for
asm-generic but have all changes get merged through whichever tree
needs them, I do have a series for 3.19.
There are two sets of patches that change significant portions of
asm/io.h, and this branch contains both in order to resolve the
conflicts:
- Will Deacon has done a set of patches to ensure that all
architectures define {read,write}{b,w,l,q}_relaxed() functions or
get them by including asm-generic/io.h.
These functions are commonly used on ARM specific drivers to avoid
expensive L2 cache synchronization implied by the normal
{read,write}{b,w,l,q}, but we need to define them on all
architectures in order to share the drivers across architectures
and to enable CONFIG_COMPILE_TEST configurations for them
- Thierry Reding has done an unrelated set of patches that extends
the asm-generic/io.h file to the degree necessary to make it useful
on ARM64 and potentially other architectures"
* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits)
ARM64: use GENERIC_PCI_IOMAP
sparc: io: remove duplicate relaxed accessors on sparc32
ARM: sa11x0: Use void __iomem * in MMIO accessors
arm64: Use include/asm-generic/io.h
ARM: Use include/asm-generic/io.h
asm-generic/io.h: Implement generic {read,write}s*()
asm-generic/io.h: Reconcile I/O accessor overrides
/dev/mem: Use more consistent data types
Change xlate_dev_{kmem,mem}_ptr() prototypes
ARM: ixp4xx: Properly override I/O accessors
ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI
ARM: ebsa110: Properly override I/O accessors
ARC: Remove redundant PCI_IOBASE declaration
documentation: memory-barriers: clarify relaxed io accessor semantics
x86: io: implement dummy relaxed accessor macros for writes
tile: io: implement dummy relaxed accessor macros for writes
sparc: io: implement dummy relaxed accessor macros for writes
powerpc: io: implement dummy relaxed accessor macros for writes
parisc: io: implement dummy relaxed accessor macros for writes
mn10300: io: implement dummy relaxed accessor macros for writes
...
introduce new setsockopt() command:
setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER
setsockopt() calls bpf_prog_get() which increments refcnt of the program,
so it doesn't get unloaded while socket is using the program.
The same eBPF program can be attached to multiple sockets.
User task exit automatically closes socket which calls sk_filter_uncharge()
which decrements refcnt of eBPF program
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ieee802154/fakehard.c
A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.
Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.
Signed-off-by: David S. Miller <davem@davemloft.net>
Alternative to RPS/RFS is to use hardware support for multiple
queues.
Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.
Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.
We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.
After accept(), connect(), or even file descriptor passing around
processes, applications can use :
int cpu;
socklen_t len = sizeof(cpu);
getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch over the msgctl, shmat, shmctl and semtimedop syscalls to use the compat
layer. The problem was found with the debian procenv package, which called
shmctl(0, SHM_INFO, &info);
in which the shmctl syscall then overwrote parts of the surrounding areas on
the stack on which the info variable was stored and thus lead to a segfault
later on.
Additionally fix the definition of struct shminfo64 to use unsigned longs like
the other architectures. This has no impact on userspace since we only have a
32bit userspace up to now.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v3.10+
write{b,w,l,q}_relaxed are implemented by some architectures in order to
permit memory-mapped I/O accesses with weaker barrier semantics than the
non-relaxed variants.
This patch adds dummy macros for the write accessors to parisc, in the
same vein as the dummy definitions for the relaxed read accessors.
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull audit updates from Eric Paris:
"So this change across a whole bunch of arches really solves one basic
problem. We want to audit when seccomp is killing a process. seccomp
hooks in before the audit syscall entry code. audit_syscall_entry
took as an argument the arch of the given syscall. Since the arch is
part of what makes a syscall number meaningful it's an important part
of the record, but it isn't available when seccomp shoots the
syscall...
For most arch's we have a better way to get the arch (syscall_get_arch)
So the solution was two fold: Implement syscall_get_arch() everywhere
there is audit which didn't have it. Use syscall_get_arch() in the
seccomp audit code. Having syscall_get_arch() everywhere meant it was
a useless flag on the stack and we could get rid of it for the typical
syscall entry.
The other changes inside the audit system aren't grand, fixed some
records that had invalid spaces. Better locking around the task comm
field. Removing some dead functions and structs. Make some things
static. Really minor stuff"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: rename audit_log_remove_rule to disambiguate for trees
audit: cull redundancy in audit_rule_change
audit: WARN if audit_rule_change called illegally
audit: put rule existence check in canonical order
next: openrisc: Fix build
audit: get comm using lock to avoid race in string printing
audit: remove open_arg() function that is never used
audit: correct AUDIT_GET_FEATURE return message type
audit: set nlmsg_len for multicast messages.
audit: use union for audit_field values since they are mutually exclusive
audit: invalid op= values for rules
audit: use atomic_t to simplify audit_serial()
kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
audit: reduce scope of audit_log_fcaps
audit: reduce scope of audit_net_id
audit: arm64: Remove the audit arch argument to audit_syscall_entry
arm64: audit: Add audit hook in syscall_trace_enter/exit()
audit: x86: drop arch from __audit_syscall_entry() interface
sparc: implement is_32bit_task
sparc: properly conditionalize use of TIF_32BIT
...
Pull arch atomic cleanups from Ingo Molnar:
"This is a series kept separate from the main locking tree, which
cleans up and improves various details in the atomics type handling:
- Remove the unused atomic_or_long() method
- Consolidate and compress atomic ops implementations between
architectures, to reduce linecount and to make it easier to add new
ops.
- Rewrite generic atomic support to only require cmpxchg() from an
architecture - generate all other methods from that"
* 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()
locking, mips: Fix atomics
locking, sparc64: Fix atomics
locking,arch: Rewrite generic atomic support
locking,arch,xtensa: Fold atomic_ops
locking,arch,sparc: Fold atomic_ops
locking,arch,sh: Fold atomic_ops
locking,arch,powerpc: Fold atomic_ops
locking,arch,parisc: Fold atomic_ops
locking,arch,mn10300: Fold atomic_ops
locking,arch,mips: Fold atomic_ops
locking,arch,metag: Fold atomic_ops
locking,arch,m68k: Fold atomic_ops
locking,arch,m32r: Fold atomic_ops
locking,arch,ia64: Fold atomic_ops
locking,arch,hexagon: Fold atomic_ops
locking,arch,cris: Fold atomic_ops
locking,arch,avr32: Fold atomic_ops
locking,arch,arm64: Fold atomic_ops
locking,arch,arm: Fold atomic_ops
...
This patch reduces the value of SIGRTMIN on PARISC from 37 to 32, thus
increasing the number of available RT signals and bring it in sync with other
Linux architectures.
Historically we wanted to natively support HP-UX 32bit binaries with the
PA-RISC Linux port. Because of that we carried the various available signals
from HP-UX (e.g. SIGEMT and SIGLOST) and folded them in between the native
Linux signals. Although this was the right decision at that time, this
required us to increase SIGRTMIN to at least 37 which left us with 27 (64-37)
RT signals.
Those 27 RT signals haven't been a problem in the past, but with the upcoming
importance of systemd we now got the problem that systemd alloctes (hardcoded)
signals up to SIGRTMIN+29 which is beyond our NSIG of 64. Because of that we
have not been able to use systemd on the PARISC Linux port yet.
Of course we could ask the systemd developers to not use those hardcoded
values, but this change is very unlikely, esp. with PA-RISC being a niche
architecture.
The other possibility would be to increase NSIG to e.g. 128, but this would
mean to duplicate most of the existing Linux signal handling code into the
parisc specific Linux kernel tree which would most likely introduce lots of new
bugs beside the code duplication.
The third option is to drop some HP-UX signals and shuffle some other signals
around to bring SIGRTMIN to 32. This is of course an ABI change, but testing
has shown that existing Linux installations are not visibly affected by this
change - most likely because we move those signals around which are rarely used
and move them to slots which haven't been used in Linux yet. In an existing
installation I was able to exchange either the Linux kernel or glibc (or both)
without affecting the boot process and installed applications.
Dropping the HP-UX signals isn't an issue either, since support for HP-UX was
basically dropped a few months back with Kernel 3.14 in commit
f5a408d53e already, when we changed EWOULDBLOCK
to be equal to EAGAIN.
So, even if this is an ABI change, it's better to change it now and thus bring
PARISC Linux in sync with other architectures to avoid other issues in the
future.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Carlos O'Donell <carlos@systemhalted.org>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: PARISC Linux Kernel Mailinglist <linux-parisc@vger.kernel.org>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Pull timer fixes from Ingo Molnar:
"Main changes:
- Fix the deadlock reported by Dave Jones et al
- Clean up and fix nohz_full interaction with arch abilities
- nohz init code consolidation/cleanup"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: nohz full depends on irq work self IPI support
nohz: Consolidate nohz full init code
arm64: Tell irq work about self IPI support
arm: Tell irq work about self IPI support
x86: Tell irq work about self IPI support
irq_work: Force raised irq work to run on irq work interrupt
irq_work: Introduce arch_irq_work_has_interrupt()
nohz: Move nohz full init call to tick init
Use the much more reader friendly ACCESS_ONCE() instead of the cast to volatile.
This is purely a stylistic change.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1411482607-20948-1-git-send-email-bobby.prani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.
Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Commit: e676253b19 (serial/8250: Add
support for RS485 IOCTLs), adds support for RS485 ioctls for 825_core on
all the archs. Unfortunaltely the definition of TIOCSRS485 and
TIOCGRS485 was missing on the ioctls.h file
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
OK, no LoC saved in this case because sub was defined in terms of add.
Still do it because this also prepares for easy addition of new ops.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linux-parisc@vger.kernel.org
Link: http://lkml.kernel.org/r/20140508135852.659342353@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull locking updates from Ingo Molnar:
"The main changes in this cycle are:
- big rtmutex and futex cleanup and robustification from Thomas
Gleixner
- mutex optimizations and refinements from Jason Low
- arch_mutex_cpu_relax() removal and related cleanups
- smaller lockdep tweaks"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
arch, locking: Ciao arch_mutex_cpu_relax()
locking/lockdep: Only ask for /proc/lock_stat output when available
locking/mutexes: Optimize mutex trylock slowpath
locking/mutexes: Try to acquire mutex only if it is unlocked
locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro
locking/mutexes: Correct documentation on mutex optimistic spinning
rtmutex: Make the rtmutex tester depend on BROKEN
futex: Simplify futex_lock_pi_atomic() and make it more robust
futex: Split out the first waiter attachment from lookup_pi_state()
futex: Split out the waiter check from lookup_pi_state()
futex: Use futex_top_waiter() in lookup_pi_state()
futex: Make unlock_pi more robust
rtmutex: Avoid pointless requeueing in the deadlock detection chain walk
rtmutex: Cleanup deadlock detector debug logic
rtmutex: Confine deadlock logic to futex
rtmutex: Simplify remove_waiter()
rtmutex: Document pi chain walk
rtmutex: Clarify the boost/deboost part
rtmutex: No need to keep task ref for lock owner check
rtmutex: Simplify and document try_to_take_rtmutex()
...
The sa_restorer field in struct sigaction is obsolete and no longer in
the parisc implementation. However, the core code assumes the field is
present if SA_RESTORER is defined. So, the define needs to be removed.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
The arch_mutex_cpu_relax() function, introduced by 34b133f, is
hacky and ugly. It was added a few years ago to address the fact
that common cpu_relax() calls include yielding on s390, and thus
impact the optimistic spinning functionality of mutexes. Nowadays
we use this function well beyond mutexes: rwsem, qrwlock, mcs and
lockref. Since the macro that defines the call is in the mutex header,
any users must include mutex.h and the naming is misleading as well.
This patch (i) renames the call to cpu_relax_lowlatency ("relax, but
only if you can do it with very low latency") and (ii) defines it in
each arch's asm/processor.h local header, just like for regular cpu_relax
functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax,
and thus we can take it out of mutex.h. While this can seem redundant,
I believe it is a good choice as it allows us to move out arch specific
logic from generic locking primitives and enables future(?) archs to
transparently define it, similarly to System Z.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharat Bhushan <r65777@freescale.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Joseph Myers <joseph@codesourcery.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Qiaowei Ren <qiaowei.ren@intel.com>
Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Stratos Karafotis <stratosk@semaphore.gr>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: adi-buildroot-devel@lists.sourceforge.net
Cc: linux390@de.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-am33-list@redhat.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-cris-kernel@axis.com
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux@lists.openrisc.net
Cc: linux-m32r-ja@ml.linux-m32r.org
Cc: linux-m32r@ml.linux-m32r.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-metag@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1404079773.2619.4.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Strings library contributed to glibc but re-licensed under GPLv2)
- Optimised crypto algorithms making use of the ARMv8 crypto extensions
(together with kernel API for using FPSIMD instructions in interrupt
context)
- Ftrace support
- CPU topology parsing from DT
- ESR_EL1 (Exception Syndrome Register) exposed to user space signal
handlers for SIGSEGV/SIGBUS (useful to emulation tools like Qemu)
- 1GB section linear mapping if applicable
- Barriers usage clean-up
- Default pgprot clean-up
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
iQIcBAABAgAGBQJTkb+CAAoJEGvWsS0AyF7xLyEQAJgL8s2SdDyd+R8aukNDu3n9
tCK7yVHO9Kg96dfeXVuSOVEo2jszo6R3nxzUL05FMovr230WBcmoeHvHz8ETGnw1
g0yO8Ltkckjevog4UleCa3wGtYISjvwwrTalzbqoEWzsF2AV8oiqv/yuIn/EdkUr
jaOqfNsnAQa8TIz4vMhi/AVdJWTTU/F6WP80oqCbxqXu/WL2InuBlHtOJMbk1HDI
u1DJUGDQ1B9OgSVRkAOjCjSsEtz8sDY3lXsg3V1qT5+NbZTyomYM2IiBLdgQcX4P
t/rqX9nX4VmRQtzefeP5WhKFks2x80C0BKibWC4teeL++tJHbgbFkyjoZZGcP27o
zued3cYABrjrcAEU6ko/LUiL2Q4ozBOzosClpjpWulCxNPzsOps82UZWo3F3XbAt
xjE3k7WF9WeNBOJdDGrarEaSLdnjjgCLoWVs8cOUYLpOOrtdSw16D29jJ68U0Y5g
31wdwKxoueC8SFt8M9fP9J9Jyau08g+kvW1xQXrRmroppweFxjSpSy90imARyux/
wUFz79HxkQB79ZHpJ0I5TNrw/w+7pBnfVSKGPOzrk+ZUsaH76caNRBoffUCzFMzz
T3Sc8A36TZtOIcGR/Q4DMZNFXlIUXDSzCHP2Iu0QoIjTd5Ex96cqNvy3nswCYWwv
yGe3ZEqUq9+WL7snNW4v
=Jj8U
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into next
Pull arm64 updates from Catalin Marinas:
- Optimised assembly string/memory routines (based on the AArch64
Cortex Strings library contributed to glibc but re-licensed under
GPLv2)
- Optimised crypto algorithms making use of the ARMv8 crypto extensions
(together with kernel API for using FPSIMD instructions in interrupt
context)
- Ftrace support
- CPU topology parsing from DT
- ESR_EL1 (Exception Syndrome Register) exposed to user space signal
handlers for SIGSEGV/SIGBUS (useful to emulation tools like Qemu)
- 1GB section linear mapping if applicable
- Barriers usage clean-up
- Default pgprot clean-up
Conflicts as per Catalin.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (57 commits)
arm64: kernel: initialize broadcast hrtimer based clock event device
arm64: ftrace: Add system call tracepoint
arm64: ftrace: Add CALLER_ADDRx macros
arm64: ftrace: Add dynamic ftrace support
arm64: Add ftrace support
ftrace: Add arm64 support to recordmcount
arm64: Add 'notrace' attribute to unwind_frame() for ftrace
arm64: add __ASSEMBLY__ in asm/insn.h
arm64: Fix linker script entry point
arm64: lib: Implement optimized string length routines
arm64: lib: Implement optimized string compare routines
arm64: lib: Implement optimized memcmp routine
arm64: lib: Implement optimized memset routine
arm64: lib: Implement optimized memmove routine
arm64: lib: Implement optimized memcpy routine
arm64: defconfig: enable a few more common/useful options in defconfig
ftrace: Make CALLER_ADDRx macros more generic
arm64: Fix deadlock scenario with smp_send_stop()
arm64: Fix machine_shutdown() definition
arm64: Support arch_irq_work_raise() via self IPIs
...
pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA
is not used by some architectures. Make pcibios_penalize_isa_irq() a
__weak function to simplify the code. This removes the need for new
platforms to add stub implementations of pcibios_penalize_isa_irq().
[bhelgaas: changelog, comments]
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Most archs with HAVE_ARCH_CALLER_ADDR have pretty much the same
definitions of CALLER_ADDRx(n). Instead of duplicating the code for all
the archs, define a ftrace_return_address0() and
ftrace_return_address(n) that can be overwritten by the archs if they
need to do something different. Instead of 7 macros in every arch, we
now only have at most 2 (and actually only 1 as
ftrace_return_address0() should be the same for all archs).
The CALLER_ADDRx(n) will now be defined in linux/ftrace.h and use the
ftrace_return_address*(n?) macros. This removes a lot of the duplicate
code.
Link: http://lkml.kernel.org/p/1400585464-30333-1-git-send-email-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch affects only architectures where the stack grows upwards
(currently parisc and metag only). On those do not hardcode the maximum
initial stack size to 1GB for 32-bit processes, but make it configurable
via a config option.
The main problem with the hardcoded stack size is, that we have two
memory regions which grow upwards: stack and heap. To keep most of the
memory available for heap in a flexmap memory layout, it makes no sense
to hard allocate up to 1GB of the memory for stack which can't be used
as heap then.
This patch makes the stack size for 32-bit processes configurable and
uses 80MB as default value which has been in use during the last few
years on parisc and which hasn't showed any problems yet.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: John David Anglin <dave.anglin@bell.net>
Specify the maximum stack size for arches where the stack grows upward
(parisc and metag) in asm/processor.h rather than hard coding in
fs/exec.c so that metag can specify a smaller value of 256MB rather than
1GB.
This fixes a BUG on metag if the RLIMIT_STACK hard limit is increased
beyond a safe value by root. E.g. when starting a process after running
"ulimit -H -s unlimited" it will then attempt to use a stack size of the
maximum 1GB which is far too big for metag's limited user virtual
address space (stack_top is usually 0x3ffff000):
BUG: failure at fs/exec.c:589/shift_arg_pages()!
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # only needed for >= v3.9 (arch/metag)
Pull parisc fixes from Helge Deller:
"Drop the architecture-specifc value for_STK_LIM_MAX to fix stack
related problems with GNU make"
* 'parisc-3.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Use generic uapi/asm/resource.h file
parisc: remove _STK_LIM_MAX override
There are only a couple of architectures that override _STK_LIM_MAX to
a non-infinity value. This changes the stack allocation semantics in
subtle ways. For example, GNU make changes its stack allocation to the
hard maximum defined by _STK_LIM_MAX. As a results, threads executed
by processes running under make are allocated a stack size of
_STK_LIM_MAX rather than a sensible default value. This causes various
thread stress tests to fail when they can't muster more than about 50
threads.
The attached change implements the default behavior used by the
majority of architectures.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Helge Deller <deller@gmx.de>
Pull parisc updates from Helge Deller:
"There are two major changes in this patchset:
The major fix is that the epoll_pwait() syscall for 32bit userspace
was not using the compat wrapper on a 64bit kernel.
Secondly we changed the value of SHMLBA from 4MB to PAGE_SIZE to
reflect that we can actually mmap to any multiple of PAGE_SIZE. The
only thing which needs care is that shared mmaps need to be mapped at
the same offset inside the 4MB cache window"
* 'parisc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix epoll_pwait syscall on compat kernel
parisc: change value of SHMLBA from 0x00400000 to PAGE_SIZE
parisc: Replace __get_cpu_var uses for address calculation
On parisc, SHMLBA was defined to 0x00400000 (4MB) to reflect that we need to
take care of our caches for shared mappings. But actually, we can map a file at
any multiple address of PAGE_SIZE, so let us correct that now with a value of
PAGE_SIZE for SHMLBA. Instead we now take care of this cache colouring via the
constant SHM_COLOUR while we map shared pages.
Signed-off-by: Helge Deller <deller@gmx.de>
CC: Jeroen Roovers <jer@gentoo.org>
CC: John David Anglin <dave.anglin@bell.net>
CC: Carlos O'Donell <carlos@systemhalted.org>
Cc: stable@kernel.org [3.13+]
Pull core locking updates from Ingo Molnar:
"The biggest change is the MCS spinlock generalization changes from Tim
Chen, Peter Zijlstra, Jason Low et al. There's also lockdep
fixes/enhancements from Oleg Nesterov, in particular a false negative
fix related to lockdep_set_novalidate_class() usage"
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
locking/mutex: Fix debug checks
locking/mutexes: Add extra reschedule point
locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
locking/mutexes: Unlock the mutex without the wait_lock
locking/mutexes: Modify the way optimistic spinners are queued
locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
locking: Move mcs_spinlock.h into kernel/locking/
m68k: Skip futex_atomic_cmpxchg_inatomic() test
futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
lockdep: Change lockdep_set_novalidate_class() to use _and_name
lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
lockdep: Don't create the wrong dependency on hlock->check == 0
lockdep: Make held_lock->check and "int check" argument bool
locking/mcs: Allow architecture specific asm files to be used for contended case
locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
sched/wait: Suppress Sparse 'variable shadowing' warning
hung_task/Documentation: Fix hung_task_warnings description
locking/mcs: Allow architectures to hook in to contended paths
locking/mcs: Micro-optimize the MCS code, add extra comments
...
Now that the arch_{spin,read,write}_relax macros default to cpu_relax(),
remove the redundant definitions for parisc.
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Helge Deller <deller@gmx.de>
We seem to be nearly the only platform which does not provide the
sys_utimes syscall. Adding it now makes our life much easier with
userspace applications (like dietlibc and e2fsprogs) since we then
behave like all other platforms too and don't need extra patches which
are hard to get upstream anyway because we are not a mainstream
architecture.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v3.13
The attached change removes the unused and experimental
CONFIG_PARISC_TMPALIAS code. It doesn't work and I don't believe it will
ever be used.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
This patch allows each architecture to add its specific assembly optimized
arch_mcs_spin_lock_contended and arch_mcs_spinlock_uncontended for
MCS lock and unlock functions.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: AswinChandramouleeswaran <aswin@hp.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rik vanRiel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: MichelLespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Figo.zhang" <figo1802@gmail.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>