Pull networking updates from David Miller:
"Highlights:
1) Support SPI based w5100 devices, from Akinobu Mita.
2) Partial Segmentation Offload, from Alexander Duyck.
3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.
4) Allow cls_flower stats offload, from Amir Vadai.
5) Implement bpf blinding, from Daniel Borkmann.
6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
actually using FASYNC these atomics are superfluous. From Eric
Dumazet.
7) Run TCP more preemptibly, also from Eric Dumazet.
8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
driver, from Gal Pressman.
9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.
10) Improve BPF usage documentation, from Jesper Dangaard Brouer.
11) Support tunneling offloads in qed, from Manish Chopra.
12) aRFS offloading in mlx5e, from Maor Gottlieb.
13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
Leitner.
14) Add MSG_EOR support to TCP, this allows controlling packet
coalescing on application record boundaries for more accurate
socket timestamp sampling. From Martin KaFai Lau.
15) Fix alignment of 64-bit netlink attributes across the board, from
Nicolas Dichtel.
16) Per-vlan stats in bridging, from Nikolay Aleksandrov.
17) Several conversions of drivers to ethtool ksettings, from Philippe
Reynes.
18) Checksum neutral ILA in ipv6, from Tom Herbert.
19) Factorize all of the various marvell dsa drivers into one, from
Vivien Didelot
20) Add VF support to qed driver, from Yuval Mintz"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
Revert "phy dp83867: Make rgmii parameters optional"
r8169: default to 64-bit DMA on recent PCIe chips
phy dp83867: Make rgmii parameters optional
phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
bpf: arm64: remove callee-save registers use for tmp registers
asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
switchdev: pass pointer to fib_info instead of copy
net_sched: close another race condition in tcf_mirred_release()
tipc: fix nametable publication field in nl compat
drivers: net: Don't print unpopulated net_device name
qed: add support for dcbx.
ravb: Add missing free_irq() calls to ravb_close()
qed: Remove a stray tab
net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fec-mpc52xx: use phydev from struct net_device
bpf, doc: fix typo on bpf_asm descriptions
stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
net: ethernet: fs-enet: use phydev from struct net_device
...
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:
- The fs.suid_dumpable sysctl is set to 2.
- The kernel.core_pattern sysctl's value starts with "/". (Systems
where kernel.core_pattern starts with "|/" are not affected.)
- Unprivileged user namespace creation is permitted. (This is
true on Linux >=3.8, but some distributions disallow it by
default using a distro patch.)
Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.
To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 protection key support from Ingo Molnar:
"This tree adds support for a new memory protection hardware feature
that is available in upcoming Intel CPUs: 'protection keys' (pkeys).
There's a background article at LWN.net:
https://lwn.net/Articles/643797/
The gist is that protection keys allow the encoding of
user-controllable permission masks in the pte. So instead of having a
fixed protection mask in the pte (which needs a system call to change
and works on a per page basis), the user can map a (handful of)
protection mask variants and can change the masks runtime relatively
cheaply, without having to change every single page in the affected
virtual memory range.
This allows the dynamic switching of the protection bits of large
amounts of virtual memory, via user-space instructions. It also
allows more precise control of MMU permission bits: for example the
executable bit is separate from the read bit (see more about that
below).
This tree adds the MM infrastructure and low level x86 glue needed for
that, plus it adds a high level API to make use of protection keys -
if a user-space application calls:
mmap(..., PROT_EXEC);
or
mprotect(ptr, sz, PROT_EXEC);
(note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
this special case, and will set a special protection key on this
memory range. It also sets the appropriate bits in the Protection
Keys User Rights (PKRU) register so that the memory becomes unreadable
and unwritable.
So using protection keys the kernel is able to implement 'true'
PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
PROT_READ as well. Unreadable executable mappings have security
advantages: they cannot be read via information leaks to figure out
ASLR details, nor can they be scanned for ROP gadgets - and they
cannot be used by exploits for data purposes either.
We know about no user-space code that relies on pure PROT_EXEC
mappings today, but binary loaders could start making use of this new
feature to map binaries and libraries in a more secure fashion.
There is other pending pkeys work that offers more high level system
call APIs to manage protection keys - but those are not part of this
pull request.
Right now there's a Kconfig that controls this feature
(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
(like most x86 CPU feature enablement code that has no runtime
overhead), but it's not user-configurable at the moment. If there's
any serious problem with this then we can make it configurable and/or
flip the default"
* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
mm/core, x86/mm/pkeys: Add execute-only protection keys support
x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
x86/mm/pkeys: Allow kernel to modify user pkey rights register
x86/fpu: Allow setting of XSAVE state
x86/mm: Factor out LDT init from context init
mm/core, x86/mm/pkeys: Add arch_validate_pkey()
mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
x86/mm/pkeys: Add Kconfig prompt to existing config option
x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
x86/mm/pkeys: Dump PKRU with other kernel registers
mm/core, x86/mm/pkeys: Differentiate instruction fetches
x86/mm/pkeys: Optimize fault handling in access_error()
mm/core: Do not enforce PKEY permissions on remote mm access
um, pkeys: Add UML arch_*_access_permitted() methods
mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
x86/mm/gup: Simplify get_user_pages() PTE bit handling
...
There are few things about *pte_alloc*() helpers worth cleaning up:
- 'vma' argument is unused, let's drop it;
- most __pte_alloc() callers do speculative check for pmd_none(),
before taking ptl: let's introduce pte_alloc() macro which does
the check.
The only direct user of __pte_alloc left is userfaultfd, which has
different expectation about atomicity wrt pmd.
- pte_alloc_map() and pte_alloc_map_lock() are redefined using
pte_alloc().
[sudeep.holla@arm.com: fix build for arm64 hugetlbpage]
[sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit db2f24dc24
was plain wrong. I did not realize the we are
allowed to loop here.
In fact we have to loop and must not return to userspace
before all SIGSEGVs have been delivered.
Other archs do this directly in their entry code, UML
does it here.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Commit 16da306849 ("um: kill pfn_t") introduced a compile warning for
defconfig (SUBARCH=i386):
arch/um/kernel/skas/mmu.c:38:206:
warning: right shift count >= width of type [-Wshift-count-overflow]
Aforementioned patch changes the definition of the phys_to_pfn() macro
from
((pfn_t) ((p) >> PAGE_SHIFT))
to
((p) >> PAGE_SHIFT)
This effectively changes the phys_to_pfn() expansion's type from
unsigned long long to unsigned long.
Through the callchain init_stub_pte() => mk_pte(), the expansion of
phys_to_pfn() is (indirectly) fed into the 'phys' argument of the
pte_set_val(pte, phys, prot) macro, eventually leading to
(pte).pte_high = (phys) >> 32;
This results in the warning from above.
Since UML only deals with 32 bit addresses, the upper 32 bits from
'phys' used to be always zero anyway. Also, all page protection flags
defined by UML don't use any bits beyond bit 9. Since the contents of a
PTE are defined within architecture scope only, the ->pte_high member
can be safely removed.
Remove the ->pte_high member from struct pte_t.
Rename ->pte_low to ->pte.
Adapt the pte helper macros in arch/um/include/asm/page.h.
Noteworthy is the pte_copy() macro where a smp_wmb() gets dropped. This
write barrier doesn't seem to be paired with any read barrier though and
thus, was useless anyway.
Fixes: 16da306849 ("um: kill pfn_t")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The core has developed a need for a "pfn_t" type [1]. Convert the usage
of pfn_t by usermode-linux to an unsigned long, and update pfn_to_phys()
to drop its expectation of a typed pfn.
[1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull misc vfs updates from Al Viro:
"All kinds of stuff. That probably should've been 5 or 6 separate
branches, but by the time I'd realized how large and mixed that bag
had become it had been too close to -final to play with rebasing.
Some fs/namei.c cleanups there, memdup_user_nul() introduction and
switching open-coded instances, burying long-dead code, whack-a-mole
of various kinds, several new helpers for ->llseek(), assorted
cleanups and fixes from various people, etc.
One piece probably deserves special mention - Neil's
lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
called without ->i_mutex and tries to avoid ever taking it. That, of
course, means that it's not useful for any directory modifications,
but things like getting inode attributes in nfds readdirplus are fine
with that. I really should've asked for moratorium on lookup-related
changes this cycle, but since I hadn't done that early enough... I
*am* asking for that for the coming cycle, though - I'm going to try
and get conversion of i_mutex to rwsem with ->lookup() done under lock
taken shared.
There will be a patch closer to the end of the window, along the lines
of the one Linus had posted last May - mechanical conversion of
->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
inode_is_locked()/inode_lock_nested(). To quote Linus back then:
-----
| This is an automated patch using
|
| sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
| sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
| sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
| sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
| sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
|
| with a very few manual fixups
-----
I'm going to send that once the ->i_mutex-affecting stuff in -next
gets mostly merged (or when Linus says he's about to stop taking
merges)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
nfsd: don't hold i_mutex over userspace upcalls
fs:affs:Replace time_t with time64_t
fs/9p: use fscache mutex rather than spinlock
proc: add a reschedule point in proc_readfd_common()
logfs: constify logfs_block_ops structures
fcntl: allow to set O_DIRECT flag on pipe
fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
fs: xattr: Use kvfree()
[s390] page_to_phys() always returns a multiple of PAGE_SIZE
nbd: use ->compat_ioctl()
fs: use block_device name vsprintf helper
lib/vsprintf: add %*pg format specifier
fs: use gendisk->disk_name where possible
poll: plug an unused argument to do_poll
amdkfd: don't open-code memdup_user()
cdrom: don't open-code memdup_user()
rsxx: don't open-code memdup_user()
mtip32xx: don't open-code memdup_user()
[um] mconsole: don't open-code memdup_user_nul()
[um] hostaudio: don't open-code memdup_user()
...
Open the memory mapped file with the O_TMPFILE flag when available.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Acked-by: Tristan Schmelcher <tschmelcher@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Remove the insecure 0777 mode for temporary file to prohibit other users
to change the executable mapped code.
An attacker could gain access to the mapped file descriptor from the
temporary file (before it is unlinked) in a read-only mode but it should
not be accessible in write mode to avoid arbitrary code execution.
To not change the hostfs behavior, the temporary file creation
permission now depends on the current umask(2) and the implementation of
mkstemp(3).
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Acked-by: Tristan Schmelcher <tschmelcher@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This brings SECCOMP_MODE_STRICT and SECCOMP_MODE_FILTER support through
prctl(2) and seccomp(2) to User-mode Linux for i386 and x86_64
subarchitectures.
secure_computing() is called first in handle_syscall() so that the
syscall emulation will be aborted quickly if matching a seccomp rule.
This is inspired from Meredydd Luff's patch
(https://gerrit.chromium.org/gerrit/21425).
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: David Drysdale <drysdale@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kees Cook <keescook@chromium.org>
Add subarchitecture-independent implementation of asm-generic/syscall.h
allowing access to user system call parameters and results:
* syscall_get_nr()
* syscall_rollback()
* syscall_get_error()
* syscall_get_return_value()
* syscall_set_return_value()
* syscall_get_arguments()
* syscall_set_arguments()
* syscall_get_arch() provided by arch/x86/um/asm/syscall.h
This provides the necessary syscall helpers needed by
HAVE_ARCH_SECCOMP_FILTER plus syscall_get_error().
This is inspired from Meredydd Luff's patch
(https://gerrit.chromium.org/gerrit/21425).
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: David Drysdale <drysdale@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kees Cook <keescook@chromium.org>
This fix two related bugs:
* PTRACE_GETREGS doesn't get the right orig_ax (syscall) value
* PTRACE_SETREGS can't set the orig_ax value (erased by initial value)
Get rid of the now useless and error-prone get_syscall().
Fix inconsistent behavior in the ptrace implementation for i386 when
updating orig_eax automatically update the syscall number as well. This
is now updated in handle_syscall().
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Anton Ivanov <aivanov@brocade.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: David Drysdale <drysdale@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Kees Cook <keescook@chromium.org>
This decreases the number of syscalls per read/write by half.
Signed-off-by: Anton Ivanov <aivanov@brocade.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Software IRQ processing in generic architectures assumes that the
exit out of hard IRQ may have re-enabled interrupts (some
architectures may have an implicit EOI). It presumes them enabled
and toggles the flags once more just in case unless this is turned
off in the architecture specific hardirq.h by setting
__ARCH_IRQ_EXIT_IRQS_DISABLED
This patch adds this to UML where due to the way IRQs are handled
it is an optimization (it works fine without it too).
Signed-off-by: Anton Ivanov <aivanov@brocade.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
The existing IRQ handler design in UML does not prevent reentrancy
This is mitigated by fd-enable/fd-disable semantics for the IO
portion of the UML subsystem. The timer, however, can and is
re-entered resulting in very deep stack usage and occasional
stack exhaustion.
This patch prevents this by checking if there is a timer
interrupt in-flight before processing any pending timer interrupts.
Signed-off-by: Anton Ivanov <aivanov@brocade.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
I was seeing some really weird behaviour where piping UML's output
somewhere would cause output to get duplicated:
$ ./vmlinux | head -n 40
Checking that ptrace can change system call numbers...Core dump limits :
soft - 0
hard - NONE
OK
Checking syscall emulation patch for ptrace...Core dump limits :
soft - 0
hard - NONE
OK
Checking advanced syscall emulation patch for ptrace...Core dump limits :
soft - 0
hard - NONE
OK
Core dump limits :
soft - 0
hard - NONE
This is because these tests do a fork() which duplicates the non-empty
stdout buffer, then glibc flushes the duplicated buffer as each child
exits.
A simple workaround is to flush before forking.
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
When using va_list ensure that va_start will be followed by va_end.
Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
On gcc Ubuntu 4.8.4-2ubuntu1~14.04, linking vmlinux fails with:
arch/um/os-Linux/built-in.o: In function `os_timer_create':
/android/kernel/android/arch/um/os-Linux/time.c:51: undefined reference to `timer_create'
arch/um/os-Linux/built-in.o: In function `os_timer_set_interval':
/android/kernel/android/arch/um/os-Linux/time.c:84: undefined reference to `timer_settime'
arch/um/os-Linux/built-in.o: In function `os_timer_remain':
/android/kernel/android/arch/um/os-Linux/time.c:109: undefined reference to `timer_gettime'
arch/um/os-Linux/built-in.o: In function `os_timer_one_shot':
/android/kernel/android/arch/um/os-Linux/time.c:132: undefined reference to `timer_settime'
arch/um/os-Linux/built-in.o: In function `os_timer_disable':
/android/kernel/android/arch/um/os-Linux/time.c:145: undefined reference to `timer_settime'
This is because -lrt appears in the generated link commandline
after arch/um/os-Linux/built-in.o. Fix this by removing -lrt from
arch/um/Makefile and adding it to the UM-specific section of
scripts/link-vmlinux.sh.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If get_signal() returns us a signal to post
we must not call it again, otherwise the already
posted signal will be overridden.
Before commit a610d6e672 this was the case as we stopped
the while after a successful handle_signal().
Cc: <stable@vger.kernel.org> # 3.10-
Fixes: a610d6e672 ("pull clearing RESTORE_SIGMASK into block_sigmask()")
Signed-off-by: Richard Weinberger <richard@nod.at>
UML is using an obsolete itimer call for
all timers and "polls" for kernel space timer firing
in its userspace portion resulting in a long list
of bugs and incorrect behaviour(s). It also uses
ITIMER_VIRTUAL for its timer which results in the
timer being dependent on it running and the cpu
load.
This patch fixes this by moving to posix high resolution
timers firing off CLOCK_MONOTONIC and relaying the timer
correctly to the UML userspace.
Fixes:
- crashes when hosts suspends/resumes
- broken userspace timers - effecive ~40Hz instead
of what they should be. Note - this modifies skas behavior
by no longer setting an itimer per clone(). Timer events
are relayed instead.
- kernel network packet scheduling disciplines
- tcp behaviour especially under load
- various timer related corner cases
Finally, overall responsiveness of userspace is better.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Anton Ivanov <aivanov@brocade.com>
[rw: massaged commit message]
Signed-off-by: Richard Weinberger <richard@nod.at>
since GFP_KERNEL with GFP_ATOMIC while spinlock is held,
as code while holding a spinlock should be atomic.
GFP_KERNEL may sleep and can cause deadlock,
where as GFP_ATOMIC may fail but certainly avoids deadlockdex f70dd54..d898f6c 100644
Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We have to exclude memory locations <= PAGE_SIZE from
the condition and let the kernel mode fault path catch it.
Otherwise a kernel NULL pointer exception will be reported
as a kernel user space access.
Fixes: d2313084e2 (um: Catch unprotected user memory access)
Signed-off-by: Richard Weinberger <richard@nod.at>
If UML is executing a helper program it is using
waitpid() with the __WCLONE flag to wait for the program
as the helper is executed from a clone()'ed thread.
While using __WCLONE is perfectly fine for clone()'ed
childs it won't detect terminated childs if the helper
has issued an execve().
We have to use __WALL to wait for both clone()'ed and
regular childs to detect the termination before and
after an execve().
Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Commit 30b11ee9a (um: Remove copy&paste code from init.h)
uncovered an issue wrt. out-of-tree builds.
For out-of-tree builds, we must not rely on relative paths.
Before 30b11ee9a it worked by chance as no host code included
generated header files.
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Pull strscpy string copy function implementation from Chris Metcalf.
Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.
The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.
strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result. To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.
strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string. Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated. It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.
strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG. It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.
So why did I waffle about this for so long?
Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.
And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.
So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches. Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.
* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: use global strscpy() rather than private copy
string: provide strscpy()
Make asm/word-at-a-time.h available on all architectures
Pull timer updates from Thomas Gleixner:
"Rather large, but nothing exiting:
- new range check for settimeofday() to prevent that boot time
becomes negative.
- fix for file time rounding
- a few simplifications of the hrtimer code
- fix for the proc/timerlist code so the output of clock realtime
timers is accurate
- more y2038 work
- tree wide conversion of clockevent drivers to the new callbacks"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (88 commits)
hrtimer: Handle failure of tick_init_highres() gracefully
hrtimer: Unconfuse switch_hrtimer_base() a bit
hrtimer: Simplify get_target_base() by returning current base
hrtimer: Drop return code of hrtimer_switch_to_hres()
time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
time: Introduce current_kernel_time64()
time: Introduce struct itimerspec64
time: Add the common weak version of update_persistent_clock()
time: Always make sure wall_to_monotonic isn't positive
time: Fix nanosecond file time rounding in timespec_trunc()
timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers
cris/time: Migrate to new 'set-state' interface
kernel: broadcast-hrtimer: Migrate to new 'set-state' interface
xtensa/time: Migrate to new 'set-state' interface
unicore/time: Migrate to new 'set-state' interface
um/time: Migrate to new 'set-state' interface
sparc/time: Migrate to new 'set-state' interface
sh/localtimer: Migrate to new 'set-state' interface
score/time: Migrate to new 'set-state' interface
s390/time: Migrate to new 'set-state' interface
...
Migrate um driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.
This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: user-mode-linux-devel@lists.sourceforge.net
Cc: user-mode-linux-user@lists.sourceforge.net
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Commit 2ae416b142 ("mm: new mm hook framework") introduced an empty
header file (mm-arch-hooks.h) for every architecture, even those which
doesn't need to define mm hooks.
As suggested by Geert Uytterhoeven, this could be cleaned through the use
of a generic header file included via each per architecture
asm/include/Kbuild file.
The PowerPC architecture is not impacted here since this architecture has
to defined the arch_remap MM hook.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Added the x86 implementation of word-at-a-time to the
generic version, which previously only supported big-endian.
Omitted the x86-specific load_unaligned_zeropad(), which in
any case is also not present for the existing BE-only
implementation of a word-at-a-time, and is only used under
CONFIG_DCACHE_WORD_ACCESS.
Added as a "generic-y" to the Kbuilds of all architectures
that didn't previously have it.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Once x86 exports its do_signal(), the prototypes will clash.
Fix the clash and also improve the code a bit: remove the
unnecessary kern_do_signal() indirection. This allows
interrupt_end() to share the 'regs' parameter calculation.
Also remove the unused return code to match x86.
Minimally build and boot tested.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/67c57eac09a589bac3c6c5ff22f9623ec55a184a.1435952415.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Main excitement here is Peter Zijlstra's lockless rbtree optimization to
speed module address lookup. He found some abusers of the module lock
doing that too.
A little bit of parameter work here too; including Dan Streetman's breaking
up the big param mutex so writing a parameter can load another module (yeah,
really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and
!CONFIG_SYSFS, so those fixes were appended too.
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH
58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7
b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z
rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K
wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t
GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB
PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4
qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR
HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5
OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh
dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek
tLdh/a9GiCitqS0bT7GE
=tWPQ
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.
A little bit of parameter work here too; including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...
Pull UML updates from Richard Weinberger:
- remove hppfs ("HonePot ProcFS")
- initial support for musl libc
- uaccess cleanup
- random cleanups and bug fixes all over the place
* 'for-linus-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (21 commits)
um: Don't pollute kernel namespace with uapi
um: Include sys/types.h for makedev(), major(), minor()
um: Do not use stdin and stdout identifiers for struct members
um: Do not use __ptr_t type for stack_t's .ss pointer
um: Fix mconsole dependency
um: Handle tracehook_report_syscall_entry() result
um: Remove copy&paste code from init.h
um: Stop abusing __KERNEL__
um: Catch unprotected user memory access
um: Fix warning in setup_signal_stack_si()
um: Rework uaccess code
um: Add uaccess.h to ldt.c
um: Add uaccess.h to syscalls_64.c
um: Add asm/elf.h to vma.c
um: Cleanup mem_32/64.c headers
um: Remove hppfs
um: Move syscall() declaration into os.h
um: kernel: ksyms: Export symbol syscall() for fixing modpost issue
um/os-Linux: Use char[] for syscall_stub declarations
um: Use char[] for linker script address declarations
...
Here's the big char/misc driver pull request for 4.2-rc1.
Lots of mei, extcon, coresight, uio, mic, and other driver updates in
here. Full details in the shortlog. All of these have been in
linux-next for some time with no reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlWNn0gACgkQMUfUDdst+ykCCQCgvdF4F2+Hy9+RATdk22ak1uq1
JDMAoJTf4oyaIEdaiOKfEIWg9MasS42B
=H5wD
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver updates from Greg KH:
"Here's the big char/misc driver pull request for 4.2-rc1.
Lots of mei, extcon, coresight, uio, mic, and other driver updates in
here. Full details in the shortlog. All of these have been in
linux-next for some time with no reported problems"
* tag 'char-misc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (176 commits)
mei: me: wait for power gating exit confirmation
mei: reset flow control on the last client disconnection
MAINTAINERS: mei: add mei_cl_bus.h to maintained file list
misc: sram: sort and clean up included headers
misc: sram: move reserved block logic out of probe function
misc: sram: add private struct device and virt_base members
misc: sram: report correct SRAM pool size
misc: sram: bump error message level on unclean driver unbinding
misc: sram: fix device node reference leak on error
misc: sram: fix enabled clock leak on error path
misc: mic: Fix reported static checker warning
misc: mic: Fix randconfig build error by including errno.h
uio: pruss: Drop depends on ARCH_DAVINCI_DA850 from config
uio: pruss: Add CONFIG_HAS_IOMEM dependence
uio: pruss: Include <linux/sizes.h>
extcon: Redefine the unique id of supported external connectors without 'enum extcon' type
char:xilinx_hwicap:buffer_icap - change 1/0 to true/false for bool type variable in function buffer_icap_set_configuration().
Drivers: hv: vmbus: Allocate ring buffer memory in NUMA aware fashion
parport: check exclusive access before register
w1: use correct lock on error in w1_seq_show()
...
Pull asm/scatterlist.h removal from Jens Axboe:
"We don't have any specific arch scatterlist anymore, since parisc
finally switched over. Kill the include"
* 'for-4.2/sg' of git://git.kernel.dk/linux-block:
remove scatterlist.h generation from arch Kbuild files
remove <asm/scatterlist.h>
Don't include ptrace uapi stuff in arch headers, it will
pollute the kernel namespace and conflict with existing
stuff.
In this case it fixes clashes with common names like R8.
Signed-off-by: Richard Weinberger <richard@nod.at>
The functions in question are not part of the POSIX standard,
documentation however hints that the corresponding header shall
be sys/types.h. C libraries other than glibc, namely musl, did
not include that header via other ways and complained.
Signed-off-by: Hans-Werner Hilse <hwhilse@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
stdin, stdout and stderr are macros according to C89/C99.
Thus do not use them as struct member identifiers to avoid
bad results from macro expansion.
Signed-off-by: Hans-Werner Hilse <hwhilse@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
__ptr_t type is a glibc-specific type, while the generally
documented type is a void*. That's what other C libraries use,
too.
Signed-off-by: Hans-Werner Hilse <hwhilse@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
CRIU is recreating the process memory layout by remapping the checkpointee
memory area on top of the current process (criu). This includes remapping
the vDSO to the place it has at checkpoint time.
However some architectures like powerpc are keeping a reference to the
vDSO base address to build the signal return stack frame by calling the
vDSO sigreturn service. So once the vDSO has been moved, this reference
is no more valid and the signal frame built later are not usable.
This patch serie is introducing a new mm hook framework, and a new
arch_remap hook which is called when mremap is done and the mm lock still
hold. The next patch is adding the vDSO remap and unmap tracking to the
powerpc architecture.
This patch (of 3):
This patch introduces a new set of header file to manage mm hooks:
- per architecture empty header file (arch/x/include/asm/mm-arch-hooks.h)
- a generic header (include/linux/mm-arch-hooks.h)
The architecture which need to overwrite a hook as to redefine it in its
header file, while architecture which doesn't need have nothing to do.
The default hooks are defined in the generic header and are used in the
case the architecture is not defining it.
In a next step, mm hooks defined in include/asm-generic/mm_hooks.h should
be moved here.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a "param_lock" mutex to each module, and update params.c to use
the correct built-in or module mutex while locking kernel params.
Remove the kparam_block_sysfs_r/w() macros, replace them with direct
calls to kernel_param_[un]lock(module).
The kernel param code currently uses a single mutex to protect
modification of any and all kernel params. While this generally works,
there is one specific problem with it; a module callback function
cannot safely load another module, i.e. with request_module() or even
with indirect calls such as crypto_has_alg(). If the module to be
loaded has any of its params configured (e.g. with a /etc/modprobe.d/*
config file), then the attempt will result in a deadlock between the
first module param callback waiting for modprobe, and modprobe trying to
lock the single kernel param mutex to set the new module's param.
This fixes that by using per-module mutexes, so that each individual module
is protected against concurrent changes in its own kernel params, but is
not blocked by changes to other module params. All built-in modules
continue to use the built-in mutex, since they will always be loaded at
runtime and references (e.g. request_module(), crypto_has_alg()) to them
will never cause load-time param changing.
This also simplifies the interface used by modules to block sysfs access
to their params; while there are currently functions to block and unblock
sysfs param access which are split up by read and write and expect a single
kernel param to be passed, their actual operation is identical and applies
to all params, not just the one passed to them; they simply lock and unlock
the global param mutex. They are replaced with direct calls to
kernel_param_[un]lock(THIS_MODULE), which locks THIS_MODULE's param_lock, or
if the module is built-in, it locks the built-in mutex.
Suggested-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
tracehook_report_syscall_entry() is allowed to fail,
in case of failure we have to abort the current syscall.
Signed-off-by: Richard Weinberger <richard@nod.at>
As we got rid of the __KERNEL__ abuse, we can directly
include linux/compiler.h now.
This also allows gcc 5 to build UML.
Reported-by: Hans-Werner Hilse <hwhilse@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Currently UML is abusing __KERNEL__ to distinguish between
kernel and host code (os-Linux). It is better to use a custom
define such that existing users of __KERNEL__ don't get confused.
Signed-off-by: Richard Weinberger <richard@nod.at>
If the kernel tries to access user memory without copy_from_user()
a trap will happen as kernel and userspace run in different processes
on the host side. Currently this special page fault cannot be resolved
and will happen over and over again. As result UML will lockup.
This patch allows the page fault code to detect that situation and
causes a panic() such that the root cause of the unprotected memory
access can be found and fixed.
Signed-off-by: Richard Weinberger <richard@nod.at>
hppfs (honeypot procfs) was an attempt to use UML as honeypot.
It was never stable nor in heavy use.
As Al Viro and Christoph Hellwig pointed some major issues out
it is better to let it die.
Signed-off-by: Richard Weinberger <richard@nod.at>
syscall() is implemented in libc.so/a (e.g. for glibc, in "syscall.o"),
so for normal ".o" files, it is undefined, neither can be found within
kernel wide, so will break modpost.
Since ".o" files is OK, can simply export 'syscall' symbol, let modpost
know about that, then can fix this issue.
The related error (with allmodconfig under um):
MODPOST 1205 modules
ERROR: "syscall" [fs/hostfs/hostfs.ko] undefined!
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
When declaring __syscall_stub_start, use the same type in UML userspace
code as in arch/um/include/asm/sections.h.
While at it, also declare batch_syscall_stub as char[].
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
The linker script defines some variables which are declared either with
type char[] in include/asm-generic/sections.h or with a meaningless
integer type in arch/um/include/asm/sections.h.
Fix this inconsistency by declaring every variable char[].
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
arch/um/kernel/dyn.lds.S and arch/um/kernel/uml.lds.S define some
UML-specific symbols. These symbols are used in the kernel part of UML
with extern declarations.
Move these declarations to a new header, asm/sections.h, like other
architectures do.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
[um maintainers appear to be vanished]
I can't prove the case pointed out in
https://bugzilla.kernel.org/show_bug.cgi?id=82341
is correct so let us play safe.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).
In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().
Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.
faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.
This patch is based on a patch from Thomas Gleixner.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull exec domain removal from Richard Weinberger:
"This series removes execution domain support from Linux.
The idea behind exec domains was to support different ABIs. The
feature was never complete nor stable. Let's rip it out and make the
kernel signal handling code less complicated"
* 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (27 commits)
arm64: Removed unused variable
sparc: Fix execution domain removal
Remove rest of exec domains.
arch: Remove exec_domain from remaining archs
arc: Remove signal translation and exec_domain
xtensa: Remove signal translation and exec_domain
xtensa: Autogenerate offsets in struct thread_info
x86: Remove signal translation and exec_domain
unicore32: Remove signal translation and exec_domain
um: Remove signal translation and exec_domain
tile: Remove signal translation and exec_domain
sparc: Remove signal translation and exec_domain
sh: Remove signal translation and exec_domain
s390: Remove signal translation and exec_domain
mn10300: Remove signal translation and exec_domain
microblaze: Remove signal translation and exec_domain
m68k: Remove signal translation and exec_domain
m32r: Remove signal translation and exec_domain
m32r: Autogenerate offsets in struct thread_info
frv: Remove signal translation and exec_domain
...
Pull UML updates from Richard Weinberger:
- hostfs saw a face lifting
- old/broken stuff was removed (SMP, HIGHMEM, SKAS3/4)
- random cleanups and bug fixes
* tag 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (26 commits)
um: Print minimum physical memory requirement
um: Move uml_postsetup in the init_thread stack
um: add a kmsg_dumper
x86, UML: fix integer overflow in ELF_ET_DYN_BASE
um: hostfs: Reduce number of syscalls in readdir
um: Remove broken highmem support
um: Remove broken SMP support
um: Remove SKAS3/4 support
um: Remove ppc cruft
um: Remove ia64 cruft
um: Remove dead code from stacktrace
hostfs: No need to box and later unbox the file mode
hostfs: Use page_offset()
hostfs: Set page flags in hostfs_readpage() correctly
hostfs: Remove superfluous initializations in hostfs_open()
hostfs: hostfs_open: Reset open flags upon each retry
hostfs: Remove superfluous test in hostfs_open()
hostfs: Report append flag in ->show_options()
hostfs: Use __getname() in follow_link
hostfs: Remove open coded strcpy()
...
Pull kbuild updates from Michal Marek:
"Here is the first round of kbuild changes for v4.1-rc1:
- kallsyms fix for ARM and cleanup
- make dep(end) removed (developers have no sense of nostalgia these
days...)
- include Makefiles by relative path
- stop useless rebuilds of asm-offsets.h and bounds.h"
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
Kbuild: kallsyms: drop special handling of pre-3.0 GCC symbols
Kbuild: kallsyms: ignore veneers emitted by the ARM linker
kbuild: ia64: use $(src)/Makefile.gate rather than particular path
kbuild: include $(src)/Makefile rather than $(obj)/Makefile
kbuild: use relative path more to include Makefile
kbuild: use relative path to include Makefile
kbuild: do not add $(bounds-file) and $(offsets-file) to targets
kbuild: remove warning about "make depend"
kbuild: Don't reset timestamps in include/generated if not needed
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Print a more sensible message about the minimum physical memory
requirement.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
atomic_notifier_chain_register() and uml_postsetup() do call kernel code
that rely on the "current" kernel macro and a valid task_struct resp.
thread_info struct. Give those functions a valid stack by moving
uml_postsetup() in the init_thread stack. This moves enables a panic()
call in this early code to generate a valid stacktrace, instead of
crashing.
E.g. when an UML kernel is started with an initrd but too few physical
memory the panic() call get's actually processed.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Add a kmsg_dumper, that dumps the kmsg buffer to stdout, when no console
is available. This an enables the printing of early panic() calls
triggered in uml_postsetup().
When a panic() call happens so early in the UML kernel no
earlyprintk/console is available yet, but with a kmsg_dumper in place
the kernel message buffer will be outputted to the user, to give a
better hint, of what the failure was.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Highmem was always buggy and experimental on UML(i386).
In times where 64 bit computers are default we can
remove that experimental code.
Signed-off-by: Richard Weinberger <richard@nod.at>
At times where UML used the TT mode to operate it had
kind of SMP support. It never got finished nor was
stable.
Let's rip out that cruft and stop confusing developers
which do tree-wide SMP cleanups.
If someone wants SMP support UML it has do be done from scratch.
Signed-off-by: Richard Weinberger <richard@nod.at>
Before we had SKAS0 UML had two modes of operation
TT (tracing thread) and SKAS3/4 (separated kernel address space).
TT was known to be insecure and got removed a long time ago.
SKAS3/4 required a few (3 or 4) patches on the host side which never went
mainline. The last host patch is 10 years old.
With SKAS0 mode (separated kernel address space using 0 host patches),
default since 2005, SKAS3/4 is obsolete and can be removed.
Signed-off-by: Richard Weinberger <richard@nod.at>
That code is a relict from the early days of UML.
ppc support was never completed nor worked.
Let's rip it out.
Signed-off-by: Richard Weinberger <richard@nod.at>
That code is a relict from the early days of UML.
ia64 support was never completed nor worked.
Let's rip it out.
Signed-off-by: Richard Weinberger <richard@nod.at>
As execution domain support is gone we can remove
signal translation from the signal code and remove
exec_domain from thread_info.
Signed-off-by: Richard Weinberger <richard@nod.at>
Prior to this commit, it was impossible to use relative path to
include Makefiles from the top level Makefile because the option
"--include-dir=$(srctree)" becomes effective when Make enters into
sub Makefiles.
To use relative path in any places, this commit moves the option
above the "sub-make" target.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.
Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.
Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.
It's also a decent simplification, since the restart code is more or less
identical on all architectures.
[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKP has triggered a compiler warning after my recent patch "mm: account
pmd page tables to the process":
mm/mmap.c: In function 'exit_mmap':
>> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]
The code:
> 2857 WARN_ON(mm_nr_pmds(mm) >
2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
In this, on tile, we have FIRST_USER_ADDRESS defined as 0. round_up() has
the same type -- int. PUD_SHIFT.
I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
long. On every arch for consistency.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
futex_atomic_cmpxchg_inatomic() does not work on UML because
it triggers a copy_from_user() in kernel context.
On UML copy_from_user() can only be used if the kernel was called
by a real user space process such that UML can use ptrace()
to fetch the value.
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Tested-by: Daniel Walter <d.walter@0x90.at>
Here's the big tty/serial driver update for 3.19-rc1.
There are a number of TTY core changes/fixes in here from Peter Hurley
that have all been teted in linux-next for a long time now. There are
also the normal serial driver updates as well, full details in the
changelog below.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlSOD/MACgkQMUfUDdst+ymW+wCfbSzoYMRObIImMPWfoQtxkvvN
rpkAnAtyEP/zZIfkQIuKTSH6FJxocF8V
=WZt3
-----END PGP SIGNATURE-----
Merge tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here's the big tty/serial driver update for 3.19-rc1.
There are a number of TTY core changes/fixes in here from Peter Hurley
that have all been teted in linux-next for a long time now. There are
also the normal serial driver updates as well, full details in the
changelog below"
* tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (219 commits)
serial: pxa: hold port.lock when reporting modem line changes
tty-hvsi_lib: Deletion of an unnecessary check before the function call "tty_kref_put"
tty: Deletion of unnecessary checks before two function calls
n_tty: Fix read_buf race condition, increment read_head after pushing data
serial: of-serial: add PM suspend/resume support
Revert "serial: of-serial: add PM suspend/resume support"
Revert "serial: of-serial: fix up PM ops on no_console_suspend and port type"
serial: 8250: don't attempt a trylock if in sysrq
serial: core: Add big-endian iotype
serial: samsung: use port->fifosize instead of hardcoded values
serial: samsung: prefer to use fifosize from driver data
serial: samsung: fix style problems
serial: samsung: wait for transfer completion before clock disable
serial: icom: fix error return code
serial: tegra: clean up tty-flag assignments
serial: Fix io address assign flow with Fintek PCI-to-UART Product
serial: mxs-auart: fix tx_empty against shift register
serial: mxs-auart: fix gpio change detection on interrupt
serial: mxs-auart: Fix mxs_auart_set_ldisc()
serial: 8250_dw: Use 64-bit access for OCTEON.
...
Pull networking updates from David Miller:
1) New offloading infrastructure and example 'rocker' driver for
offloading of switching and routing to hardware.
This work was done by a large group of dedicated individuals, not
limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu
2) Start making the networking operate on IOV iterators instead of
modifying iov objects in-situ during transfers. Thanks to Al Viro
and Herbert Xu.
3) A set of new netlink interfaces for the TIPC stack, from Richard
Alpe.
4) Remove unnecessary looping during ipv6 routing lookups, from Martin
KaFai Lau.
5) Add PAUSE frame generation support to gianfar driver, from Matei
Pavaluca.
6) Allow for larger reordering levels in TCP, which are easily
achievable in the real world right now, from Eric Dumazet.
7) Add a variable of napi_schedule that doesn't need to disable cpu
interrupts, from Eric Dumazet.
8) Use a doubly linked list to optimize neigh_parms_release(), from
Nicolas Dichtel.
9) Various enhancements to the kernel BPF verifier, and allow eBPF
programs to actually be attached to sockets. From Alexei
Starovoitov.
10) Support TSO/LSO in sunvnet driver, from David L Stevens.
11) Allow controlling ECN usage via routing metrics, from Florian
Westphal.
12) Remote checksum offload, from Tom Herbert.
13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
driver, from Thomas Lendacky.
14) Add MPLS support to openvswitch, from Simon Horman.
15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
Klassert.
16) Do gro flushes on a per-device basis using a timer, from Eric
Dumazet. This tries to resolve the conflicting goals between the
desired handling of bulk vs. RPC-like traffic.
17) Allow userspace to ask for the CPU upon what a packet was
received/steered, via SO_INCOMING_CPU. From Eric Dumazet.
18) Limit GSO packets to half the current congestion window, from Eric
Dumazet.
19) Add a generic helper so that all drivers set their RSS keys in a
consistent way, from Eric Dumazet.
20) Add xmit_more support to enic driver, from Govindarajulu
Varadarajan.
21) Add VLAN packet scheduler action, from Jiri Pirko.
22) Support configurable RSS hash functions via ethtool, from Eyal
Perry.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
Fix race condition between vxlan_sock_add and vxlan_sock_release
net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
net/mlx4: Add support for A0 steering
net/mlx4: Refactor QUERY_PORT
net/mlx4_core: Add explicit error message when rule doesn't meet configuration
net/mlx4: Add A0 hybrid steering
net/mlx4: Add mlx4_bitmap zone allocator
net/mlx4: Add a check if there are too many reserved QPs
net/mlx4: Change QP allocation scheme
net/mlx4_core: Use tasklet for user-space CQ completion events
net/mlx4_core: Mask out host side virtualization features for guests
net/mlx4_en: Set csum level for encapsulated packets
be2net: Export tunnel offloads only when a VxLAN tunnel is created
gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
net: fec: only enable mdio interrupt before phy device link up
net: fec: clear all interrupt events to support i.MX6SX
net: fec: reset fep link status in suspend function
net: sock: fix access via invalid file descriptor
net: introduce helper macro for_each_cmsghdr
...
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.
This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The x86 MPX patch set calls arch_unmap() and arch_bprm_mm_init()
from fs/exec.c, so we need at least a stub for them in all
architectures. They are only called under an #ifdef for
CONFIG_MMU=y, so we can at least restict this to architectures
with MMU support.
blackfin/c6x have no MMU support, so do not call arch_unmap().
They also do not include mm_hooks.h or mmu_context.h at all and
do not need to be touched.
s390, um and unicore32 do not use asm-generic/mm_hooks.h, so got
their own arch_unmap() versions. (I also moved um's
arch_dup_mmap() to be closer to the other mm_hooks.h functions).
xtensa only includes mm_hooks when MMU=y, which should be fine
since arch_unmap() is called only from MMU=y code.
For the rest, we use the stub copies of these functions in
asm-generic/mm_hook.h.
I cross compiled defconfigs for cris (to check NOMMU) and s390
to make sure that this works. I also checked a 64-bit build
of UML and all my normal x86 builds including PARAVIRT on and
off.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141118182350.8B4AA2C2@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Although the tty core maintains a pid reference for the foreground
process group, if the foreground process group is changed that
pid reference is dropped. Thus, the pid reference used for signalling
could become stale.
Safely obtain a pid reference to the foreground process group and
release the reference after signalling is complete.
cc: Jeff Dike <jdike@addtoit.com>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Reviewed-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull audit updates from Eric Paris:
"So this change across a whole bunch of arches really solves one basic
problem. We want to audit when seccomp is killing a process. seccomp
hooks in before the audit syscall entry code. audit_syscall_entry
took as an argument the arch of the given syscall. Since the arch is
part of what makes a syscall number meaningful it's an important part
of the record, but it isn't available when seccomp shoots the
syscall...
For most arch's we have a better way to get the arch (syscall_get_arch)
So the solution was two fold: Implement syscall_get_arch() everywhere
there is audit which didn't have it. Use syscall_get_arch() in the
seccomp audit code. Having syscall_get_arch() everywhere meant it was
a useless flag on the stack and we could get rid of it for the typical
syscall entry.
The other changes inside the audit system aren't grand, fixed some
records that had invalid spaces. Better locking around the task comm
field. Removing some dead functions and structs. Make some things
static. Really minor stuff"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: rename audit_log_remove_rule to disambiguate for trees
audit: cull redundancy in audit_rule_change
audit: WARN if audit_rule_change called illegally
audit: put rule existence check in canonical order
next: openrisc: Fix build
audit: get comm using lock to avoid race in string printing
audit: remove open_arg() function that is never used
audit: correct AUDIT_GET_FEATURE return message type
audit: set nlmsg_len for multicast messages.
audit: use union for audit_field values since they are mutually exclusive
audit: invalid op= values for rules
audit: use atomic_t to simplify audit_serial()
kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
audit: reduce scope of audit_log_fcaps
audit: reduce scope of audit_net_id
audit: arm64: Remove the audit arch argument to audit_syscall_entry
arm64: audit: Add audit hook in syscall_trace_enter/exit()
audit: x86: drop arch from __audit_syscall_entry() interface
sparc: implement is_32bit_task
sparc: properly conditionalize use of TIF_32BIT
...
Pull UML update from Richard Weinberger:
"Besides of fixes this contains also support for CONFIG_STACKTRACE by
Daniel Walter"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: net: Eliminate NULL test after alloc_bootmem
um: Add support for CONFIG_STACKTRACE
um: ubd: Fix for processes stuck in D state forever
um: delete unnecessary bootmem struct page array
um: remove csum_partial_copy_generic_i386 to clean up exception table
alloc_bootmem and related functions never return NULL. Thus a NULL
test or memset after calls to these functions is unnecessary.
The following Coccinelle semantic patch was used for making the change:
@@
expression E;
statement S;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...)
... when != E
- if (E == NULL) S
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Richard Weinberger <richard@nod.at>
Starting with Linux 3.12 processes get stuck in D state forever in
UserModeLinux under sync heavy workloads. This bug was introduced by
commit 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport).
Fix bug by adding a check if FLUSH request was successfully submitted to
the I/O thread and keeping the FLUSH request on the request queue on
submission failures.
Fixes: 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport)
Signed-off-by: Thorsten Knabe <linux@thorsten-knabe.de>
Cc: stable@kernel.org # >= 3.12
Signed-off-by: Richard Weinberger <richard@nod.at>
1) uml kernel bootmem managed through bootmem_data->node_bootmem_map,
not the struct page array, so the array is unnecessary.
2) the bootmem struct page array has been pointed by a *local* pointer,
struct page *map, in init_maps function. The array can be accessed only
in init_maps's scope. As a result, uml kernel wastes about 1% of total
memory.
Signed-off-by: Honggang Li <enjoymindful@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
Hansen)
- Various sched/idle refinements for better idle handling (Nicolas
Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)
- sched/numa updates and optimizations (Rik van Riel)
- sysbench speedup (Vincent Guittot)
- capacity calculation cleanups/refactoring (Vincent Guittot)
- Various cleanups to thread group iteration (Oleg Nesterov)
- Double-rq-lock removal optimization and various refactorings
(Kirill Tkhai)
- various sched/deadline fixes
... and lots of other changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
sched/fair: Delete resched_cpu() from idle_balance()
sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
sched: Improve sysbench performance by fixing spurious active migration
sched/x86: Fix up typo in topology detection
x86, sched: Add new topology for multi-NUMA-node CPUs
sched/rt: Use resched_curr() in task_tick_rt()
sched: Use rq->rd in sched_setaffinity() under RCU read lock
sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
sched: Use dl_bw_of() under RCU read lock
sched/fair: Remove duplicate code from can_migrate_task()
sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
sched: print_rq(): Don't use tasklist_lock
sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
sched: Fix the task-group check in tg_has_rt_tasks()
sched/fair: Leverage the idle state info when choosing the "idlest" cpu
sched: Let the scheduler see CPU idle states
sched/deadline: Fix inter- exclusive cpusets migrations
sched/deadline: Clear dl_entity params when setscheduling to different class
sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
...
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.
Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Pull arch signal handling cleanup from Richard Weinberger:
"This patch series moves all remaining archs to the get_signal(),
signal_setup_done() and sigsp() functions.
Currently these archs use open coded variants of the said functions.
Further, unused parameters get removed from get_signal_to_deliver(),
tracehook_signal_handler() and signal_delivered().
At the end of the day we save around 500 lines of code."
* 'signal-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (43 commits)
powerpc: Use sigsp()
openrisc: Use sigsp()
mn10300: Use sigsp()
mips: Use sigsp()
microblaze: Use sigsp()
metag: Use sigsp()
m68k: Use sigsp()
m32r: Use sigsp()
hexagon: Use sigsp()
frv: Use sigsp()
cris: Use sigsp()
c6x: Use sigsp()
blackfin: Use sigsp()
avr32: Use sigsp()
arm64: Use sigsp()
arc: Use sigsp()
sas_ss_flags: Remove nested ternary if
Rip out get_signal_to_deliver()
Clean up signal_delivered()
tracehook_signal_handler: Remove sig, info, ka and regs
...
The core mm code will provide a default gate area based on
FIXADDR_USER_START and FIXADDR_USER_END if
!defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).
This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit
UML, and x86_32 have their own code just to disable it. arm, 32-bit UML,
and x86_64 have gate areas, but they have their own implementations.
This gets rid of the default and moves the code into ia64.
This should save some code on architectures without a gate area: it's now
possible to inline the gate_area functions in the default case.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
Acked-by: Richard Weinberger <richard@nod.at> [for um]
Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
architecture specific scatterlist.h, make it a proper Kconfig option and
use that instead. At same time, remove the header files are are now
mostly useless and just include asm-generic/scatterlist.h.
[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If do_ops() fails we have to release current->mm->mmap_sem
otherwise the failing task will never terminate.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Trinity discovered an execution path such that a task
can unmap his stub page.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
This reverts commit 0974a9cadc.
The real for for that issue is to release current->mm->mmap_sem in
fix_range_common().
Signed-off-by: Richard Weinberger <richard@nod.at>
Pull kbuild updates from Michal Marek:
"Kbuild changes for v3.16-rc1:
- cross-compilation fix so that cc-option is testing the right
compiler
- Fix for make defconfig all
- Using relative paths to the object and source directory where
possible, plus fixes for the fallout of the change
- several cleanups in the Makefiles and scripts
The powerpc fix is from today, because it was only discovered
recently. The rest has been in linux-next for some time"
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
powerpc: Avoid circular dependency with zImage.%
kbuild: create include/config directory in scripts/kconfig/Makefile
kbuild: do not create include/linux directory
Makefile: Fix unrecognized cross-compiler command line options
kbuild: do not add "selinux" to subdir- twice
um: Fix for relative objtree when generating x86 headers
kbuild: Use relative path when building in a subdir of the source tree
kbuild: Use relative path when building in the source tree
kbuild: Use relative path for $(objtree)
firmware: Use $(quote) in the Makefile
firmware: Simplify directory creation
kbuild: trivial - fix comment block indent
kbuild: trivial - remove trailing spaces
kbuild: support simultaneous "make %config" and "make all"
kbuild: move extra gcc checks to scripts/Makefile.extrawarn
In an O= build, rely on the generated Makefile to call the main Makefile
properly. When building in the source tree, we do not need to specify
the -C and O= either. This fixes the problem when $(objtree) is a
relative path and the -C changes the directory.
Reported-by: Richard Weinberger <richard.weinberger@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
The mmu-gather operation 'tlb_flush_mmu()' has done two things: the
actual tlb flush operation, and the batched freeing of the pages that
the TLB entries pointed at.
This splits the operation into separate phases, so that the forced
batched flushing done by zap_pte_range() can now do the actual TLB flush
while still holding the page table lock, but delay the batched freeing
of all the pages to after the lock has been dropped.
This in turn allows us to avoid a race condition between
set_page_dirty() (as called by zap_pte_range() when it finds a dirty
shared memory pte) and page_mkclean(): because we now flush all the
dirty page data from the TLB's while holding the pte lock,
page_mkclean() will be held up walking the (recently cleaned) page
tables until after the TLB entries have been flushed from all CPU's.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reverse case of this race (you must msync before read) is
well known. This is the not so common one.
It can be triggered only on systems which do a lot of task
switching and only at UML startup. If you are starting 200+ UMLs
~ 0.5% will always die without this fix.
Signed-off-by: Anton Ivanov <antivano@cisco.com>
[rw: minor whitespace fixes]
Signed-off-by: Richard Weinberger <richard@nod.at>
UML does not handle sigpipe. As a result when running it under
expect or redirecting the IO from the console to an external program
it will crash if the program stops or exits.
Signed-off-by: Anton Ivanov <antivano@cisco.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Inferring the mount hierarchy correctly from /proc/mounts is hard when MS_MOVE
may have been used, and the previous code did it wrongly. This change simplifies
the logic to only require that /dev/shm be _on_ tmpfs (which can be checked
trivially with statfs) rather than that it be a _mountpoint_ of tmpfs, since
there isn't a compelling reason to be that strict. We also now check for tmpfs
on whatever directory we ultimately use so that the user is better informed.
This change also moves the more standard TMPDIR environment variable check ahead
of the others.
Applies to 3.12.
Signed-off-by: Tristan Schmelcher <tschmelcher@google.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Pull audit updates from Eric Paris.
* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...
arch_align_stack() moved to asm/exec.h, so change the comment referring to
asm/system.h which no longer exists.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking updates from David Miller:
"Here is my initial pull request for the networking subsystem during
this merge window:
1) Support for ESN in AH (RFC 4302) from Fan Du.
2) Add full kernel doc for ethtool command structures, from Ben
Hutchings.
3) Add BCM7xxx PHY driver, from Florian Fainelli.
4) Export computed TCP rate information in netlink socket dumps, from
Eric Dumazet.
5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
Dichtel.
6) Convert many drivers to pci_enable_msix_range(), from Alexander
Gordeev.
7) Record SKB timestamps more efficiently, from Eric Dumazet.
8) Switch to microsecond resolution for TCP round trip times, also
from Eric Dumazet.
9) Clean up and fix 6lowpan fragmentation handling by making use of
the existing inet_frag api for it's implementation.
10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.
11) Auto size SKB lengths when composing netlink messages based upon
past message sizes used, from Eric Dumazet.
12) qdisc dumps can take a long time, add a cond_resched(), From Eric
Dumazet.
13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
Get rid of never-used-in-tree netpoll RX handling. From Eric W
Biederman.
14) Support inter-address-family and namespace changing in VTI tunnel
driver(s). From Steffen Klassert.
15) Add Altera TSE driver, from Vince Bridgers.
16) Optimizing csum_replace2() so that it doesn't adjust the checksum
by checksumming the entire header, from Eric Dumazet.
17) Expand BPF internal implementation for faster interpreting, more
direct translations into JIT'd code, and much cleaner uses of BPF
filtering in non-socket ocntexts. From Daniel Borkmann and Alexei
Starovoitov"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
net: Add a test to see if a skb is freeable in irq context
qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
net: ptp: move PTP classifier in its own file
net: sxgbe: make "core_ops" static
net: sxgbe: fix logical vs bitwise operation
net: sxgbe: sxgbe_mdio_register() frees the bus
Call efx_set_channels() before efx->type->dimension_resources()
xen-netback: disable rogue vif in kthread context
net/mlx4: Set proper build dependancy with vxlan
be2net: fix build dependency on VxLAN
mac802154: make csma/cca parameters per-wpan
mac802154: allow only one WPAN to be up at any given time
net: filter: minor: fix kdoc in __sk_run_filter
netlink: don't compare the nul-termination in nla_strcmp
can: c_can: Avoid led toggling for every packet.
can: c_can: Simplify TX interrupt cleanup
can: c_can: Store dlc private
can: c_can: Reduce register access
can: c_can: Make the code readable
...
Replace dev_kfree_skb with dev_consume_skb_any in uml_net_start_xmit
as it can be called in hard irq and other contexts.
dev_consume_skb_any is used as uml_net_start_xmit typically
consumes (not drops) packets.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Currently AUDITSYSCALL has a long list of architecture depencency:
depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
for simplicity.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com> (arm)
Acked-by: Richard Guy Briggs <rgb@redhat.com> (audit)
Acked-by: Matt Turner <mattst88@gmail.com> (alpha)
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch allows each architecture to add its specific assembly optimized
arch_mcs_spin_lock_contended and arch_mcs_spinlock_uncontended for
MCS lock and unlock functions.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: AswinChandramouleeswaran <aswin@hp.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rik vanRiel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: MichelLespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Figo.zhang" <figo1802@gmail.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We perform a clean up of the Kbuid files in each architecture.
We order the files in each Kbuild in alphabetical order
by running the below script.
for i in arch/*/include/asm/Kbuild
do
cat $i | gawk '/^generic-y/ {
i = 3;
do {
for (; i <= NF; i++) {
if ($i == "\\") {
getline;
i = 1;
continue;
}
if ($i != "")
hdr[$i] = $i;
}
break;
} while (1);
next;
}
// {
print $0;
}
END {
n = asort(hdr);
for (i = 1; i <= n; i++)
print "generic-y += " hdr[i];
}' > ${i}.sorted;
mv ${i}.sorted $i;
done
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
Cc: AswinChandramouleeswaran <aswin@hp.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "Figo.zhang" <figo1802@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: George Spelvin <linux@horizon.com>
Cc: MichelLespinasse <walken@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
[ Fixed build bug. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull UML changes from Richard Weinberger:
"This time only various cleanups and housekeeping patches"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: hostfs: make functions static
um: Include generic barrier.h
um: Removed unused attributes from thread_struct
Pull networking updates from David Miller:
1) BPF debugger and asm tool by Daniel Borkmann.
2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.
3) Correct reciprocal_divide and update users, from Hannes Frederic
Sowa and Daniel Borkmann.
4) Currently we only have a "set" operation for the hw timestamp socket
ioctl, add a "get" operation to match. From Ben Hutchings.
5) Add better trace events for debugging driver datapath problems, also
from Ben Hutchings.
6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
have a small send and a previous packet is already in the qdisc or
device queue, defer until TX completion or we get more data.
7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.
8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
Borkmann.
9) Share IP header compression code between Bluetooth and IEEE802154
layers, from Jukka Rissanen.
10) Fix ipv6 router reachability probing, from Jiri Benc.
11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.
12) Support tunneling in GRO layer, from Jerry Chu.
13) Allow bonding to be configured fully using netlink, from Scott
Feldman.
14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
already get the TCI. From Atzm Watanabe.
15) New "Heavy Hitter" qdisc, from Terry Lam.
16) Significantly improve the IPSEC support in pktgen, from Fan Du.
17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
Herbert.
18) Add Proportional Integral Enhanced packet scheduler, from Vijay
Subramanian.
19) Allow openvswitch to mmap'd netlink, from Thomas Graf.
20) Key TCP metrics blobs also by source address, not just destination
address. From Christoph Paasch.
21) Support 10G in generic phylib. From Andy Fleming.
22) Try to short-circuit GRO flow compares using device provided RX
hash, if provided. From Tom Herbert.
The wireless and netfilter folks have been busy little bees too.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
net/cxgb4: Fix referencing freed adapter
ipv6: reallocate addrconf router for ipv6 address when lo device up
fib_frontend: fix possible NULL pointer dereference
rtnetlink: remove IFLA_BOND_SLAVE definition
rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
qlcnic: update version to 5.3.55
qlcnic: Enhance logic to calculate msix vectors.
qlcnic: Refactor interrupt coalescing code for all adapters.
qlcnic: Update poll controller code path
qlcnic: Interrupt code cleanup
qlcnic: Enhance Tx timeout debugging.
qlcnic: Use bool for rx_mac_learn.
bonding: fix u64 division
rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
sfc: Use the correct maximum TX DMA ring size for SFC9100
Add Shradha Shah as the sfc driver maintainer.
net/vxlan: Share RX skb de-marking and checksum checks with ovs
tulip: cleanup by using ARRAY_SIZE()
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
net/cxgb4: Don't retrieve stats during recovery
...
Signed-off-by: Mark Salter <msalter@redhat.com>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On UML SUBARCH can be x86, x86_64 and i386 and if it is x86
we use uname -m to select a defconfig.
Therefore we can no longer use -mcmodel=large only if SUBARCH
is x86_64.
Reported-and-tested-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We cannot use print_stack_trace because the name conflicts
with linux/stacktrace.h.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Pull UML changes from Richard Weinberger:
"This pile contains a nice defconfig cleanup, a rewritten stack
unwinder and various cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Remove unused declarations from <as-layout.h>
um: remove used STDIO_CONSOLE Kconfig param
um/vdso: add .gitignore for a couple of targets
arch/um: make it work with defconfig and x86_64
um: Make kstack_depth_to_print conform to arch/x86
um: Get rid of thread_struct->saved_task
um: Make stack trace reliable against kernel mode faults
um: Rewrite show_stack()
Pull irq cleanups from Ingo Molnar:
"This is a multi-arch cleanup series from Thomas Gleixner, which we
kept to near the end of the merge window, to not interfere with
architecture updates.
This series (motivated by the -rt kernel) unifies more aspects of IRQ
handling and generalizes PREEMPT_ACTIVE"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
preempt: Make PREEMPT_ACTIVE generic
sparc: Use preempt_schedule_irq
ia64: Use preempt_schedule_irq
m32r: Use preempt_schedule_irq
hardirq: Make hardirq bits generic
m68k: Simplify low level interrupt handling code
genirq: Prevent spurious detection for unconditionally polled interrupts
_end is used, but it's already provided by <asm/sections.h>, so use that.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: user-mode-linux-devel@lists.sourceforge.net
Signed-off-by: Richard Weinberger <richard@nod.at>
This removes the STDIO_CONSOLE Kconfig parameter which
is defined but no longer used anywhere in the makefiles and source code.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
arch/um/defconfig only lists one default configuration, and that applies
only to the i386 architecture. Replace it with two minimal
configuration files generated using `make savedefconfig`:
i386_defconfig and x86_64_defconfig
The build scripts now require two updates:
1. um's Kconfig (arch/x86/um/Kconfig) should specify an ARCH_DEFCONFIG
section explicitly pointing to these scripts if the required
variables are set. Take care to remove the DEFCONFIG_LIST section
defined in the included file arch/um/Kconfig.common.
2. um's Makefile (arch/um/Makefile) should set KBUILD_DEFCONFIG properly
for the top-level Makefile to pick up. Copy the logic in
arch/x86/Makefile to properly pick the defconfig file depending on
the actual architecture; except we're working with $SUBARCH here,
instead of $ARCH.
Now, you can do:
$ ARCH=um make defconfig
$ ARCH=um make
and successfully build User-Mode Linux on an x86_64 box in default
configuration.
Cc: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
As UML uses an alternative signal stack we cannot use
the current stack pointer for stack dumping if UML itself
dies by SIGSEGV. To bypass this issue we save regs taken
from mcontext in our segv handler into thread_struct and
use these regs to obtain the stack pointer in show_stack().
Signed-off-by: Richard Weinberger <richard@nod.at>
Currently on UML stack traces are not very reliable and both
x86 and x86_64 have their on implementations.
This patch unifies both and adds support to outline unreliable
functions calls.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Resolve cherry-picking conflicts:
Conflicts:
mm/huge_memory.c
mm/memory.c
mm/mprotect.c
See this upstream merge commit for more details:
52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We don't cap the size of buffer from the user so we could write past the
end of the array here. Only root can write to this file.
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to prepare to per-arch implementations of preempt_count move
the required bits into an asm-generic header and use this for all
archs.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-h5j0c1r3e3fk015m30h8f1zx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.
Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kernel faults are expected to handle OOM conditions gracefully (gup,
uaccess etc.), so they should never invoke the OOM killer. Reserve this
for faults triggered in user context when it is the only option.
Most architectures already do this, fix up the remaining few.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These handlers are not optional and need in our case
dummy implementions to avoid NULL pointer bugs within
the irq core code.
Reported-and-tested-by: Toralf Foester <toralf.foerster@gmx.de>
Signed-off-by: Richard Weinberger <richard@nod.at>