Keep xen_max_p2m_pfn up to date with the end of the extra memory
we're adding. It is possible that it will be too high since memory
may be truncated by a "mem=" option on the kernel command line, but
that won't matter.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
If extra memory is very much larger than the base memory size
then all of the base memory can be filled with structures reserved to
describe the extra memory, leaving no space for anything else.
Even at the maximum ratio there will be little space for anything else,
but this change is intended to at least allow the system to boot rather
than crash mysteriously.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
If an entire E820 RAM region is beyond mem_end, still add its
pages to the extra area so that space can be used by the kernel.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
If Xen gives us non-RAM E820 entries (dom0 only, typically), then
make sure the extra RAM region is beyond them. It's OK for
the extra space to grow into E820 regions, however.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When using the e820 map to get the initial pseudo-physical address space,
look for either Xen-provided memory which doesn't lie within an E820
region, or an E820 RAM region which extends beyond the Xen-provided
memory range.
Count these pages, and add them to a new "extra memory" range. This range
has an E820 RAM range to describe it - so the kernel will allocate page
structures for it - but it is also marked reserved so that the kernel
will not attempt to use it.
The balloon driver can then add this range as a set of currently
ballooned-out pages, which can be used to extend the domain beyond its
original size.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Rather than simply using a flat memory map from Xen, use its provided
E820 map. This allows the domain builder to tell the domain to reserve
space for more pages than those initially provided at domain-build time.
It also allows the host to specify holes in the address space (for
PCI-passthrough, for example).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When setting up a pte for a missing pfn (no matching mfn), just create
an empty pte rather than a junk mapping.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When building mfn parts of p2m structure, we rely on being able to
use mfn_to_virt, which in turn requires kernel to be mapped into
the linear area (which is distinct from the kernel image mapping
on 64-bit). Defer calling xen_build_mfn_list_list() until after
xen_setup_kernel_pagetable();
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
set_phys_to_machine() can return false on failure, which means a memory
allocation failure for the p2m structure. It can only fail if setting
the mfn for a pfn in previously unused address space. It is guaranteed
to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating
the mfn for an existing pfn.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Make the p2m structure a 3 level tree which covers the full possible
physical space.
The p2m structure contains mappings from the domain's pfns to system-wide
mfns. The structure has 3 levels and two roots. The first root is for
the domain's own use, and is linked with virtual addresses. The second
is all mfn references, and is used by Xen on save/restore to allow it to
update the p2m mapping for the domain.
At boot, the domain builder provides a simple flat p2m array for all the
initially present pages. We construct the two levels above that using
the early_brk allocator. After early boot time, set_phys_to_machine()
will allocate any missing levels using the normal kernel allocator
(at GFP_KERNEL, so it must be called in a normal blocking context).
Because the early_brk() API requires us to pre-reserve the maximum amount
of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY
config option, but its only negative side-effect is to increase the
kernel's apparent bss size. However, since all unused brk memory is
returned to the heap, there's no real downside to making it large.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Change event delivery to:
- mask+clear event in the upcall function
- use handle_fasteoi_irq as the handler
- unmask in the eoi function (and handle migration)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Allocate p2m tables based on the actual runtime maximum pfn rather than
the static config-time limit.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Use early brk mechanism to allocate p2m tables, to save memory when
booting non-Xen.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Tony Luck reports that the addition of the access_ok() check in commit
0eead9ab41 ("Don't dump task struct in a.out core-dumps") broke the
ia64 compile due to missing the necessary header file includes.
Rather than add yet another include (<asm/unistd.h>) to make everything
happy, just uninline the silly core dump helper functions and move the
bodies to fs/exec.c where they make a lot more sense.
dump_seek() in particular was too big to be an inline function anyway,
and none of them are in any way performance-critical. And we really
don't need to mess up our include file headers more than they already
are.
Reported-and-tested-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
ehea: Fix a checksum issue on the receive path
net: allow FEC driver to use fixed PHY support
tg3: restore rx_dropped accounting
b44: fix carrier detection on bind
net: clear heap allocations for privileged ethtool actions
NET: wimax, fix use after free
ATM: iphase, remove sleep-inside-atomic
ATM: mpc, fix use after free
ATM: solos-pci, remove use after free
net/fec: carrier off initially to avoid root mount failure
r8169: use device model DMA API
r8169: allocate with GFP_KERNEL flag when able to sleep
akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code). Just remove it.
Also do the access_ok() check on dump_write(). It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...
[ I suspect that we should possibly do "vfs_write()" instead of
calling ->write directly. That also does the whole fsnotify and write
statistics thing, which may or may not be a good idea. ]
And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)
Reported-by: akiphie <akiphie@lavabit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ring-buffer: Fix typo of time extends per page
perf, MIPS: Support cross compiling of tools/perf for MIPS
perf: Fix incorrect copy_from_user() usage
* master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: relax ioremap prohibition (309caa9) for -final and -stable
ARM: 6440/1: ep93xx: DMA: fix channel_disable
cpuimx27: fix i2c bus selection
cpuimx27: fix compile when ULPI is selected
ARM: 6435/1: Fix HWCAP_TLS flag for ARM11MPCore/Cortex-A9
ARM: 6436/1: AT91: Fix power-saving in idle-mode on 926T processors
ARM: fix section mismatch warnings in Versatile Express
ARM: 6412/1: kprobes-decode: add support for MOVW instruction
ARM: 6419/1: mmu: Fix MT_MEMORY and MT_MEMORY_NONCACHED pte flags
ARM: 6416/1: errata: faulty hazard checking in the Store Buffer may lead to data corruption
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
omap: iommu-load cam register before flushing the entry
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: Silent spurious error message
drm/radeon/kms: fix bad cast/shift in evergreen.c
drm/radeon/kms: make TV/DFP table info less verbose
drm/radeon/kms: leave certain CP int bits enabled
drm/radeon/kms: avoid corner case issue with unmappable vram V2
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, numa: For each node, register the memory blocks actually used
x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order
x86, mce, therm_throt.c: Fix missing curly braces in error handling logic
Commit 0793448 "DMAENGINE: generic channel status v2" changed the interface for
how dma channel progress is retrieved. It inadvertently exported an internal
helper function ioat_tx_status() instead of ioat_dma_tx_status(). The latter
polls the hardware to get the latest completion state, while the helper just
evaluates the current state without touching hardware. The effect is that we
end up waiting for completion timeouts or descriptor allocation errors before
the completion state is updated.
iperf (before fix):
[SUM] 0.0-41.3 sec 364 MBytes 73.9 Mbits/sec
iperf (after fix):
[SUM] 0.0- 4.5 sec 499 MBytes 940 Mbits/sec
This is a regression starting with 2.6.35.
Cc: <stable@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Reported-by: Richard Scobie <richard@sauce.co.nz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently we set all skbs with CHECKSUM_UNNECESSARY, even
those whose protocol we don't know. This patch just
add the CHECKSUM_COMPLETE tag for non TCP/UDP packets.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of commit 43a9aa64a2 "NFSD:
Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR", we sometimes call
fh_unlock on a filehandle that isn't fully initialized.
We should fix up the callers, but as a quick fix it is also sufficient
just to remove this assertion.
Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
At least one board using the FEC driver does not have a conventional
PHY attached to it, it is directly connected to a somewhat simple
ethernet switch (the board is the SnapGear/LITE, and the attached
4-port ethernet switch is a RealTek RTL8305). This switch does not
present the usual register interface of a PHY, it presents nothing.
So a PHY scan will find nothing - it finds ID's of 0 for each PHY
on the attached MII bus.
After the FEC driver was changed to use phylib for supporting PHYs
it no longer works on this particular board/switch setup.
Add code support to use a fixed phy if no PHY is found on the MII bus.
This is based on the way the cpmac.c driver solved this same problem.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
... but produce a big warning about the problem as encouragement
for people to fix their drivers.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When channel_disable() is called, it disables per channel interrupts and
waits until channels state becomes STATE_STALL, and then disables the
channel. Now, if the DMA transfer is disabled while the channel is in
STATE_NEXT we will not wait anything and disable the channel immediately.
This seems to cause weird data corruption for example in audio transfers.
Fix is to wait while we are in STATE_NEXT or STATE_ON and only then
disable the channel.
Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Acked-by: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Time stamps for the ring buffer are created by the difference between
two events. Each page of the ring buffer holds a full 64 bit timestamp.
Each event has a 27 bit delta stamp from the last event. The unit of time
is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events
happen more than 134 milliseconds apart, a time extend is inserted
to add more bits for the delta. The time extend has 59 bits, which
is good for ~18 years.
Currently the time extend is committed separately from the event.
If an event is discarded before it is committed, due to filtering,
the time extend still exists. If all events are being filtered, then
after ~134 milliseconds a new time extend will be added to the buffer.
This can only happen till the end of the page. Since each page holds
a full timestamp, there is no reason to add a time extend to the
beginning of a page. Time extends can only fill a page that has actual
data at the beginning, so there is no fear that time extends will fill
more than a page without any data.
When reading an event, a loop is made to skip over time extends
since they are only used to maintain the time stamp and are never
given to the caller. As a paranoid check to prevent the loop running
forever, with the knowledge that time extends may only fill a page,
a check is made that tests the iteration of the loop, and if the
iteration is more than the number of time extends that can fit in a page
a warning is printed and the ring buffer is disabled (all of ftrace
is also disabled with it).
There is another event type that is called a TIMESTAMP which can
hold 64 bits of data in the theoretical case that two events happen
18 years apart. This code has not been implemented, but the name
of this event exists, as well as the structure for it. The
size of a TIMESTAMP is 16 bytes, where as a time extend is only
8 bytes. The macro used to calculate how many time extends can fit on
a page used the TIMESTAMP size instead of the time extend size
cutting the amount in half.
The following test case can easily trigger the warning since we only
need to have half the page filled with time extends to trigger the
warning:
# cd /sys/kernel/debug/tracing/
# echo function > current_tracer
# echo 'common_pid < 0' > events/ftrace/function/filter
# echo > trace
# echo 1 > trace_marker
# sleep 120
# cat trace
Enabling the function tracer and then setting the filter to only trace
functions where the process id is negative (no events), then clearing
the trace buffer to ensure that we have nothing in the buffer,
then write to trace_marker to add an event to the beginning of a page,
sleep for 2 minutes (only 35 seconds is probably needed, but this
guarantees the bug), and then finally reading the trace which will
trigger the bug.
This patch fixes the typo and prevents the false positive of that warning.
Reported-by: Hans J. Koch <hjk@linutronix.de>
Tested-by: Hans J. Koch <hjk@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
I see the following error message in my kernel log from time to time:
radeon 0000:07:00.0: ffff88007c334000 reserve failed for wait
radeon 0000:07:00.0: ffff88007c334000 reserve failed for wait
After investigation, it turns out that there's nothing to be afraid of
and everything works as intended. So remove the spurious log message.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Make TV standard and DFP table revisions debug only.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These bits are used for internal communication and should
be left enabled. This may fix s/r issues on some systems.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We should not allocate any object into unmappable vram if we
have no means to access them which on all GPU means having the
CP running and on newer GPU having the blit utility working.
This patch limit the vram allocation to visible vram until
we have acceleration up and running.
Note that it's more than unlikely that we run into any issue
related to that as when acceleration is not woring userspace
should allocate any object in vram beside front buffer which
should fit in visible vram.
V2 use real_vram_size as mc_vram_size could be bigger than
the actual amount of vram
[airlied: fixup r700_cp_stop case]
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
perf events: repair incorrect use of copy_from_user
This makes the perf_event_period() return 0 instead of
-EFAULT on success.
Signed-off-by: John Blackwood<john.blackwood@ccur.com>
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100928220311.GA18145@tsunami.ccur.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch disables the fanotify syscalls by just not building them and
letting the cond_syscall() statements in kernel/sys_ni.c redirect them
to sys_ni_syscall().
It was pointed out by Tvrtko Ursulin that the fanotify interface did not
include an explicit prioritization between groups. This is necessary
for fanotify to be usable for hierarchical storage management software,
as they must get first access to the file, before inotify-like notifiers
see the file.
This feature can be added in an ABI compatible way in the next release
(by using a number of bits in the flags field to carry the info) but it
was suggested by Alan that maybe we should just hold off and do it in
the next cycle, likely with an (new) explicit argument to the syscall.
I don't like this approach best as I know people are already starting to
use the current interface, but Alan is all wise and noone on list backed
me up with just using what we have. I feel this is needlessly ripping
the rug out from under people at the last minute, but if others think it
needs to be a new argument it might be the best way forward.
Three choices:
Go with what we got (and implement the new feature next cycle). Add a
new field right now (and implement the new feature next cycle). Wait
till next cycle to release the ABI (and implement the new feature next
cycle). This is number 3.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 511d22247b (tg3: 64 bit stats on all arches), overlooked the
rx_dropped accounting.
We use a full "struct rtnl_link_stats64" to hold rx_dropped value, but
forgot to report it in tg3_get_stats64().
Use an "unsigned long" instead to shrink "struct tg3" by 176 bytes, and
report this value to stats readers.
Increment rx_dropped counter for oversized frames.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Matt Carlson <mcarlson@broadcom.com>
Acked-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For carrier detection to work properly when binding the driver with a cable
unplugged, netif_carrier_off() should be called after register_netdev(),
not before.
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>