This is the prevent previous stores from overlapping the block stores
done by the memcpy loop.
Based upon a glibc patch by Jose E. Marchesi
Signed-off-by: David S. Miller <davem@davemloft.net>
Choose PAGE_OFFSET dynamically based upon cpu type.
Original UltraSPARC-I (spitfire) chips only supported a 44-bit
virtual address space.
Newer chips (T4 and later) support 52-bit virtual addresses
and up to 47-bits of physical memory space.
Therefore we have to adjust PAGE_SIZE dynamically based upon
the capabilities of the chip.
Note that this change alone does not allow us to support > 43-bit
physical memory, to do that we need to re-arrange our page table
support. The current encodings of the pmd_t and pgd_t pointers
restricts us to "32 + 11" == 43 bits.
This change can waste quite a bit of memory for the various tables.
In particular, a future change should work to size and allocate
kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
This isn't easy as we really cannot take a TLB miss when accessing
kern_linear_bitmap[]. We'd have to lock it into the TLB or similar.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
The functions
__down_read
__down_read_trylock
__down_write
__down_write_trylock
__up_read
__up_write
__downgrade_write
are implemented inline, so remove corresponding EXPORT_SYMBOLs
(They lead to compile errors on RT kernel).
Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The help text for this config is duplicated across the x86, parisc, and
s390 Kconfig.debug files. Arnd Bergman noted that the help text was
slightly misleading and should be fixed to state that enabling this
option isn't a problem when using pre 4.4 gcc.
To simplify the rewording, consolidate the text into lib/Kconfig.debug
and modify it there to be more explicit about when you should say N to
this config.
Also, make the text a bit more generic by stating that this option
enables compile time checks so we can cover architectures which emit
warnings vs. ones which emit errors. The details of how an
architecture decided to implement the checks isn't as important as the
concept of compile time checking of copy_from_user() calls.
While we're doing this, remove all the copy_from_user_overflow() code
that's duplicated many times and place it into lib/ so that any
architecture supporting this option can get the function for free.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
srmmu_nocache_bitmap is cleared by bit_map_init(). But bit_map_init()
attempts to clear by memset(), so it can't clear the trailing edge of
bitmap properly on big-endian architecture if the number of bits is not
a multiple of BITS_PER_LONG.
Actually, the number of bits in srmmu_nocache_bitmap is not always
a multiple of BITS_PER_LONG. It is calculated as below:
bitmap_bits = srmmu_nocache_size >> SRMMU_NOCACHE_BITMAP_SHIFT;
srmmu_nocache_size is decided proportionally by the amount of system RAM
and it is rounded to a multiple of PAGE_SIZE. SRMMU_NOCACHE_BITMAP_SHIFT
is defined as (PAGE_SHIFT - 4). So it can only be said that bitmap_bits
is a multiple of 16.
This fixes the problem by using bitmap_clear() instead of memset()
in bit_map_init() and this also uses BITS_TO_LONGS() to calculate correct
size at bitmap allocation time.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparc32 already supported it, as a consequence of using the
generic atomic64 implementation. And the sparc64 implementation
is rather trivial.
This allows us to set ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE for all
of sparc, and avoid the annoying warning from lib/atomic64_test.c
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds optimized memset/bzero/page-clear routines for Niagara-4.
We basically can do what powerpc has been able to do for a decade (via
the "dcbz" instruction), which is use cache line clearing stores for
bzero and memsets with a 'c' argument of zero.
As long as we make the cache initializing store to each 32-byte
subblock of the L2 cache line, it works.
As with other Niagara-4 optimized routines, the key is to make sure to
avoid any usage of the %asi register, as reads and writes to it cost
at least 50 cycles.
For the user clear cases, we don't use these new routines, we use the
Niagara-1 variants instead. Those have to use %asi in an unavoidable
way.
A Niagara-4 8K page clear costs just under 600 cycles.
Add definitions of the MRU variants of the cache initializing store
ASIs. By default, cache initializing stores install the line as Least
Recently Used. If we know we're going to use the data immediately
(which is true for page copies and clears) we can use the Most
Recently Used variant, to decrease the likelyhood of the lines being
evicted before they get used.
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a Niagara 2 memcpy fix in this tree and I have
a Kconfig fix from Dave Jones which requires the sparc-next
changes which went upstream yesterday.
Signed-off-by: David S. Miller <davem@davemloft.net>
It gets clobbered by the kernel's VISEntryHalf, so we have to save it
in a different register than the set clobbered by that macro.
The instance in glibc is OK and doesn't have this problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
To use this, an architecture simply needs to:
1) Provide a user_addr_max() implementation via asm/uaccess.h
2) Add "select GENERIC_STRNCPY_FROM_USER" to their arch Kcnfig
3) Remove the existing strncpy_from_user() implementation and symbol
exports their architecture had.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: David Howells <dhowells@redhat.com>
Hide details of maximum user address calculation in a new
asm/uaccess.h interface named user_addr_max().
Provide little-endian implementation in find_zero(), which should work
but can probably be improved.
Abstrace alignment check behind IS_UNALIGNED() macro.
Kill double-semicolon, noticed by David Howells.
Signed-off-by: David S. Miller <davem@davemloft.net>
Compute a mask that will only have 0x80 in the bytes which
had a zero in them. The formula is:
~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | x | 0x7f7f7f7f)
In the inner word iteration, we have to compute the "x | 0x7f7f7f7f"
part, so we can reuse that in the above calculation.
Once we have this mask, we perform divide and conquer to find the
highest 0x80 location.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus removed the end-of-address-space hackery from
fs/namei.c:do_getname() so we really have to validate these edge
conditions and cannot cheat any more (as x86 used to as well).
Move to a common C implementation like x86 did. And if both
src and dst are sufficiently aligned we'll do word at a time
copies and checks as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise if no references exist in the static kernel image,
we won't export the symbol properly to modules.
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on copy from microblaze add ucmpdi2 implementation.
This fixes build of niu driver which failed with:
drivers/built-in.o: In function `niu_get_nfc':
niu.c:(.text+0x91494): undefined reference to `__ucmpdi2'
This driver will never be used on a sparc32 system,
but patch added to fix build breakage with all*config builds.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the explicit calls to .udiv/.umul in assembler, I made a
mechanical (read as: safe) transformation. I didn't attempt
to make any simplifications.
In particular, __ndelay and __udelay can be simplified significantly.
Some of the %y reads are unnecessary and these routines have no need
any longer for allocating a register window, they can be leaf
functions.
Signed-off-by: David S. Miller <davem@davemloft.net>
We always have this instruction available, so no need to use
btfixup for it any more.
This also eradicates the whole of atomic_32.S and thus the
__atomic_begin and __atomic_end symbols completely.
Signed-off-by: David S. Miller <davem@davemloft.net>
Both sparc 32-bit's software divide assembler and MPILIB provide
clz_tab[] with identical contents.
Break it out into a seperate object file and select it when
SPARC32 or MPILIB is set.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Morris <jmorris@namei.org>
Many architectures don't want to pull in iomap.c,
so they ended up duplicating pci_iomap from that file.
That function isn't trivial, and we are going to modify it
https://lkml.org/lkml/2011/11/14/183
so the duplication hurts.
This reduces the scope of the problem significantly,
by moving pci_iomap to a separate file and
referencing that from all architectures.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPBZXBAAoJECgfDbjSjVRpuuYIAIMD0wE96MuTOSBJX4VG8VAP
UyjL9dsfMRy8CKioQo5/fxpTY07YBCWmNauSSX7pzgcoUKBfYIGn4Z1qwGYsWK9M
CzLs6PXLTugw0FtKobHZl/klRTWEBS6YOUjp9x568rplwF+Ppk7b993uj7eS/g+e
T0mUKzqg4/UavbHd9+W5KgC4drQ5hgtu2WZHoUxBK4umnd3C2G+U82Sthg50o/XU
SC8IGm39K8I36HoIWgXj3Y7nkOP3mQELohOT4ZPiVSmLvGS4i47+ix75anO+8ZvZ
jxHr8RC85IK1Nd89NZhbKOyvx0QQiwoKUZaTwcWXJNSOADzZnM6icdIsodc+Elo=
=ccQZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
lib: use generic pci_iomap on all architectures
Many architectures don't want to pull in iomap.c,
so they ended up duplicating pci_iomap from that file.
That function isn't trivial, and we are going to modify it
https://lkml.org/lkml/2011/11/14/183
so the duplication hurts.
This reduces the scope of the problem significantly,
by moving pci_iomap to a separate file and
referencing that from all architectures.
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
alpha: drop pci_iomap/pci_iounmap from pci-noop.c
mn10300: switch to GENERIC_PCI_IOMAP
mn10300: add missing __iomap markers
frv: switch to GENERIC_PCI_IOMAP
tile: switch to GENERIC_PCI_IOMAP
tile: don't panic on iomap
sparc: switch to GENERIC_PCI_IOMAP
sh: switch to GENERIC_PCI_IOMAP
powerpc: switch to GENERIC_PCI_IOMAP
parisc: switch to GENERIC_PCI_IOMAP
mips: switch to GENERIC_PCI_IOMAP
microblaze: switch to GENERIC_PCI_IOMAP
arm: switch to GENERIC_PCI_IOMAP
alpha: switch to GENERIC_PCI_IOMAP
lib: add GENERIC_PCI_IOMAP
lib: move GENERIC_IOMAP to lib/Kconfig
Fix up trivial conflicts due to changes nearby in arch/{m68k,score}/Kconfig
atomic24 support was used to semaphores in the past - but is no longer used.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparc copied pci_iomap from generic code, probably to avoid
pulling the rest of iomap.c in. Since that's in
a separate file now, we can reuse the common implementation.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Properly return the original destination buffer pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Kjetil Oftedal <oftedal@gmail.com>
This is setting things up so that we can correct the return
value, so that it properly returns the original destination
buffer pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Kjetil Oftedal <oftedal@gmail.com>
Don't use floating point on Niagara2, use the traditional
plain Niagara code instead.
Unroll Niagara loops to 128 bytes for copy, and 256 bytes
for clear.
Signed-off-by: David S. Miller <davem@davemloft.net>
Should have been done in commit 1af08a1407f4 ("This is in preparation
for more generic atomic").
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arun Sharma <asharma@fb.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Hans-Christian Egtvedt" <hans-christian.egtvedt@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we are in the label cc_dword_align, registers %o0 and %o1 have the same last 2 bits,
but it's not guaranteed one of them is zero. So we can get unaligned memory access
in label ccte. Example of parameters which lead to this:
%o0=0x7ff183e9, %o1=0x8e709e7d, %g1=3
With the parameters I had a memory corruption, when the additional 5 bytes were rewritten.
This patch corrects the error.
One comment to the patch. We don't care about the third bit in %o1, because cc_end_cruft
stores word or less.
Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use bitmap_set() instead of calling __set_bit() each bit.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
SPIN_LOCK_UNLOCK is deprecated. Use the lockdep capable variant
instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
As noticed by Mikulas Patocka, the backoff macros don't
completely nop out for UP builds, we still get a
branch always and a delay slot nop.
Fix this by making the branch to the backoff spin loop
selective, then we can nop out the spin loop completely.
Signed-off-by: David S. Miller <davem@davemloft.net>
Simple microoptimizations for sparc64 atomic functions:
Save one instruction by using a delay slot.
Use %g1 instead of %g7, because %g1 is written earlier.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basically tip-off the powerpc code, use a 64-bit type and atomic64_t
interfaces for the implementation.
This gets us off of the by-hand asm code I wrote, which frankly I
think probably ruins I-cache hit rates.
The idea was the keep the call chains less deep, but anything taking
the rw-semaphores probably is also calling other stuff and therefore
already has allocated a stack-frame. So no real stack frame savings
ever.
Ben H. has posted patches to make powerpc use 64-bit too and with some
abstractions we can probably use a shared header file somewhere.
With suggestions from Sam Ravnborg.
Signed-off-by: David S. Miller <davem@davemloft.net>
128 bytes is sufficient for the register window save area, but the
calling conventions allow the callee to save up to 6 incoming argument
registers into the stack frame after the register window save area.
This means a minimal stack frame is 176 bytes (128 + (6 * 8)).
This fixes random crashes when using the function tracer.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>