Currently UBI erases all corrupted eraseblocks, irrespectively of the nature
of corruption: corruption due to power cuts and non-power cut corruption.
The former case is OK, but the latter is not, because UBI may destroy
potentially important data.
With this patch, during scanning, when UBI hits a PEB with corrupted VID
header, it checks whether this PEB contains only 0xFF data. If yes, it is
safe to erase this PEB and it is put to the 'erase' list. If not, this may
be important data and it is better to avoid erasing this PEB. Instead,
UBI puts it to the corr list and moves out of the pool of available PEB.
IOW, UBI preserves this PEB.
Such corrupted PEB lessen the amount of available PEBs. So the more of them
we accumulate, the less PEBs are available. The maximum amount of non-power
cut corrupted PEBs is 8.
This patch is a response to UBIFS problem where reporter
(Matthew L. Creech <mlcreech@gmail.com>) observes that UBIFS index points
to an unmapped LEB. The theory is that corresponding PEB somehow got
corrupted and UBI wiped it. This patch (actually a series of patches)
tries to make sure such PEBs are preserved - this would make it is easier
to analyze the corruption.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Start using the 'corr' list and add there PEBs which look truly corrupted,
which means they have corrupted VID header and the data which follows the
corrupted header does not contain all 0xFF bytes.
At the moment, this does not change UBI functionality much because these
PEBs will be erase when scanning finishes. But the plan is to teach UBI
preserving them.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Introduce a helper function to print hexdump: 'ubi_dbg_print_hex_dump()'.
It is compiled out if debugging is enabled. Will be used in the next patch.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch turns static function 'check_pattern()' into a non-static
'ubi_check_pattern()'. This is just a preparation for the chages which
are coming in the next patches.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Currently UBI maintains 2 lists of PEBs during scanning:
1. 'erase' list - PEBs which have no corruptions but should be erased
2. 'corr' list - PEBs which have some corruptions and should be erased
But we do not really need 2 lists for PEBs which should be erased after
scanning is done - this is redundant. So this patch makes sure all PEBs
which are corrupted are moved to the head of the 'erase' list. We add
them to the head to make sure they are erased first and we get rid of
corruption ASAP.
However, we do not remove the 'corr' list and realted functions, because
the plan is to use this list for other purposes. Namely, we plan to
put eraseblocks with corruption which does not look like it was caused
by unclean power cut. Then we'll preserve thes PEBs in order to avoid
killing potentially valuable user data.
This patch also amends PEBs accounting, because it was closely tight to
the 'erase'/'corr' lists separation.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch introduces 'add_corrupted()' function and separates out 'corr' list
manipulation from the common 'add_to_list()' function. This is just a
preparation for further changes - this patch does not change functionality.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch improves readability and simplifies scanning code by changing a
long cascade of 'if' statements to a switch statement. This should presumably
be a little faster as well.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Rename local variable 'ec_corr' into 'ec_err' to make the code a little bit
more readable. 'ec_err' is more appropriate because it sounds more like 'error
when EC was read' and it looks more logical because we use it together with
'err'. Just a minor nicification which should improve the rather complex
scanning code.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Currently UBI has one small flaw - when we read EC or VID header, but find only
0xFF bytes, we return UBI_IO_FF and do not report whether we had bit-flips or
not. In case of the VID header, the scanning code adds this PEB to the free list,
even though there were bit-flips.
Imagine the following situation: we start writing VID header to a PEB and have a
power cut, so the PEB becomes unstable. When we scan and read the PEB, we get
a bit-flip. Currently, UBI would just ignore this and treat the PEB as free. This
patch changes UBI behavior and now UBI will schedule this PEB for erasure.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The 'UBI_IO_PEB_EMPTY' and 'UBI_IO_PEB_FREE' are essentially the same
and mean that there are only 0xFF bytes instead of headers. Simplify
UBI a little by turning them into a single 'UBI_IO_FF' error code.
Also, stop maintaining commentaries in 'ubi_io_read_vid_hdr()' which are
almost identical to commentaries in 'ubi_io_read_ec_hdr()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Rename UBI_IO_BAD_HDR_READ into UBI_IO_BAD_HDR_EBADMSG which is presumably more
self-documenting and readable. Indeed, the '_READ' suffix does not tell much and
even confuses, while '_EBADMSG' tells about uncorrectable ECC error, because we
use -EBADMSG all over the place to represent ECC errors.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cleanup the Kconfig for UBI by using menuconfig to enable/disable the entire
driver. Remove the dependency checks for MTD_UBI and MTD_UBI_DEBUG by
wrapping the options in if/endif blocks and remove any redundant checks.
Remove all default n since that is the Kconfig default. Change menu "Additional
UBI debugging messages" into a comment to remove one menu level.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Tony Luck reports that the addition of the access_ok() check in commit
0eead9ab41 ("Don't dump task struct in a.out core-dumps") broke the
ia64 compile due to missing the necessary header file includes.
Rather than add yet another include (<asm/unistd.h>) to make everything
happy, just uninline the silly core dump helper functions and move the
bodies to fs/exec.c where they make a lot more sense.
dump_seek() in particular was too big to be an inline function anyway,
and none of them are in any way performance-critical. And we really
don't need to mess up our include file headers more than they already
are.
Reported-and-tested-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
ehea: Fix a checksum issue on the receive path
net: allow FEC driver to use fixed PHY support
tg3: restore rx_dropped accounting
b44: fix carrier detection on bind
net: clear heap allocations for privileged ethtool actions
NET: wimax, fix use after free
ATM: iphase, remove sleep-inside-atomic
ATM: mpc, fix use after free
ATM: solos-pci, remove use after free
net/fec: carrier off initially to avoid root mount failure
r8169: use device model DMA API
r8169: allocate with GFP_KERNEL flag when able to sleep
akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code). Just remove it.
Also do the access_ok() check on dump_write(). It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...
[ I suspect that we should possibly do "vfs_write()" instead of
calling ->write directly. That also does the whole fsnotify and write
statistics thing, which may or may not be a good idea. ]
And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)
Reported-by: akiphie <akiphie@lavabit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ring-buffer: Fix typo of time extends per page
perf, MIPS: Support cross compiling of tools/perf for MIPS
perf: Fix incorrect copy_from_user() usage
* master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: relax ioremap prohibition (309caa9) for -final and -stable
ARM: 6440/1: ep93xx: DMA: fix channel_disable
cpuimx27: fix i2c bus selection
cpuimx27: fix compile when ULPI is selected
ARM: 6435/1: Fix HWCAP_TLS flag for ARM11MPCore/Cortex-A9
ARM: 6436/1: AT91: Fix power-saving in idle-mode on 926T processors
ARM: fix section mismatch warnings in Versatile Express
ARM: 6412/1: kprobes-decode: add support for MOVW instruction
ARM: 6419/1: mmu: Fix MT_MEMORY and MT_MEMORY_NONCACHED pte flags
ARM: 6416/1: errata: faulty hazard checking in the Store Buffer may lead to data corruption
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
omap: iommu-load cam register before flushing the entry
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: Silent spurious error message
drm/radeon/kms: fix bad cast/shift in evergreen.c
drm/radeon/kms: make TV/DFP table info less verbose
drm/radeon/kms: leave certain CP int bits enabled
drm/radeon/kms: avoid corner case issue with unmappable vram V2
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, numa: For each node, register the memory blocks actually used
x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order
x86, mce, therm_throt.c: Fix missing curly braces in error handling logic
Commit 0793448 "DMAENGINE: generic channel status v2" changed the interface for
how dma channel progress is retrieved. It inadvertently exported an internal
helper function ioat_tx_status() instead of ioat_dma_tx_status(). The latter
polls the hardware to get the latest completion state, while the helper just
evaluates the current state without touching hardware. The effect is that we
end up waiting for completion timeouts or descriptor allocation errors before
the completion state is updated.
iperf (before fix):
[SUM] 0.0-41.3 sec 364 MBytes 73.9 Mbits/sec
iperf (after fix):
[SUM] 0.0- 4.5 sec 499 MBytes 940 Mbits/sec
This is a regression starting with 2.6.35.
Cc: <stable@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Reported-by: Richard Scobie <richard@sauce.co.nz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently we set all skbs with CHECKSUM_UNNECESSARY, even
those whose protocol we don't know. This patch just
add the CHECKSUM_COMPLETE tag for non TCP/UDP packets.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of commit 43a9aa64a2 "NFSD:
Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR", we sometimes call
fh_unlock on a filehandle that isn't fully initialized.
We should fix up the callers, but as a quick fix it is also sufficient
just to remove this assertion.
Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
At least one board using the FEC driver does not have a conventional
PHY attached to it, it is directly connected to a somewhat simple
ethernet switch (the board is the SnapGear/LITE, and the attached
4-port ethernet switch is a RealTek RTL8305). This switch does not
present the usual register interface of a PHY, it presents nothing.
So a PHY scan will find nothing - it finds ID's of 0 for each PHY
on the attached MII bus.
After the FEC driver was changed to use phylib for supporting PHYs
it no longer works on this particular board/switch setup.
Add code support to use a fixed phy if no PHY is found on the MII bus.
This is based on the way the cpmac.c driver solved this same problem.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
... but produce a big warning about the problem as encouragement
for people to fix their drivers.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When channel_disable() is called, it disables per channel interrupts and
waits until channels state becomes STATE_STALL, and then disables the
channel. Now, if the DMA transfer is disabled while the channel is in
STATE_NEXT we will not wait anything and disable the channel immediately.
This seems to cause weird data corruption for example in audio transfers.
Fix is to wait while we are in STATE_NEXT or STATE_ON and only then
disable the channel.
Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Acked-by: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Time stamps for the ring buffer are created by the difference between
two events. Each page of the ring buffer holds a full 64 bit timestamp.
Each event has a 27 bit delta stamp from the last event. The unit of time
is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events
happen more than 134 milliseconds apart, a time extend is inserted
to add more bits for the delta. The time extend has 59 bits, which
is good for ~18 years.
Currently the time extend is committed separately from the event.
If an event is discarded before it is committed, due to filtering,
the time extend still exists. If all events are being filtered, then
after ~134 milliseconds a new time extend will be added to the buffer.
This can only happen till the end of the page. Since each page holds
a full timestamp, there is no reason to add a time extend to the
beginning of a page. Time extends can only fill a page that has actual
data at the beginning, so there is no fear that time extends will fill
more than a page without any data.
When reading an event, a loop is made to skip over time extends
since they are only used to maintain the time stamp and are never
given to the caller. As a paranoid check to prevent the loop running
forever, with the knowledge that time extends may only fill a page,
a check is made that tests the iteration of the loop, and if the
iteration is more than the number of time extends that can fit in a page
a warning is printed and the ring buffer is disabled (all of ftrace
is also disabled with it).
There is another event type that is called a TIMESTAMP which can
hold 64 bits of data in the theoretical case that two events happen
18 years apart. This code has not been implemented, but the name
of this event exists, as well as the structure for it. The
size of a TIMESTAMP is 16 bytes, where as a time extend is only
8 bytes. The macro used to calculate how many time extends can fit on
a page used the TIMESTAMP size instead of the time extend size
cutting the amount in half.
The following test case can easily trigger the warning since we only
need to have half the page filled with time extends to trigger the
warning:
# cd /sys/kernel/debug/tracing/
# echo function > current_tracer
# echo 'common_pid < 0' > events/ftrace/function/filter
# echo > trace
# echo 1 > trace_marker
# sleep 120
# cat trace
Enabling the function tracer and then setting the filter to only trace
functions where the process id is negative (no events), then clearing
the trace buffer to ensure that we have nothing in the buffer,
then write to trace_marker to add an event to the beginning of a page,
sleep for 2 minutes (only 35 seconds is probably needed, but this
guarantees the bug), and then finally reading the trace which will
trigger the bug.
This patch fixes the typo and prevents the false positive of that warning.
Reported-by: Hans J. Koch <hjk@linutronix.de>
Tested-by: Hans J. Koch <hjk@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
I see the following error message in my kernel log from time to time:
radeon 0000:07:00.0: ffff88007c334000 reserve failed for wait
radeon 0000:07:00.0: ffff88007c334000 reserve failed for wait
After investigation, it turns out that there's nothing to be afraid of
and everything works as intended. So remove the spurious log message.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Make TV standard and DFP table revisions debug only.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These bits are used for internal communication and should
be left enabled. This may fix s/r issues on some systems.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We should not allocate any object into unmappable vram if we
have no means to access them which on all GPU means having the
CP running and on newer GPU having the blit utility working.
This patch limit the vram allocation to visible vram until
we have acceleration up and running.
Note that it's more than unlikely that we run into any issue
related to that as when acceleration is not woring userspace
should allocate any object in vram beside front buffer which
should fit in visible vram.
V2 use real_vram_size as mc_vram_size could be bigger than
the actual amount of vram
[airlied: fixup r700_cp_stop case]
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
perf events: repair incorrect use of copy_from_user
This makes the perf_event_period() return 0 instead of
-EFAULT on success.
Signed-off-by: John Blackwood<john.blackwood@ccur.com>
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100928220311.GA18145@tsunami.ccur.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch disables the fanotify syscalls by just not building them and
letting the cond_syscall() statements in kernel/sys_ni.c redirect them
to sys_ni_syscall().
It was pointed out by Tvrtko Ursulin that the fanotify interface did not
include an explicit prioritization between groups. This is necessary
for fanotify to be usable for hierarchical storage management software,
as they must get first access to the file, before inotify-like notifiers
see the file.
This feature can be added in an ABI compatible way in the next release
(by using a number of bits in the flags field to carry the info) but it
was suggested by Alan that maybe we should just hold off and do it in
the next cycle, likely with an (new) explicit argument to the syscall.
I don't like this approach best as I know people are already starting to
use the current interface, but Alan is all wise and noone on list backed
me up with just using what we have. I feel this is needlessly ripping
the rug out from under people at the last minute, but if others think it
needs to be a new argument it might be the best way forward.
Three choices:
Go with what we got (and implement the new feature next cycle). Add a
new field right now (and implement the new feature next cycle). Wait
till next cycle to release the ABI (and implement the new feature next
cycle). This is number 3.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 511d22247b (tg3: 64 bit stats on all arches), overlooked the
rx_dropped accounting.
We use a full "struct rtnl_link_stats64" to hold rx_dropped value, but
forgot to report it in tg3_get_stats64().
Use an "unsigned long" instead to shrink "struct tg3" by 176 bytes, and
report this value to stats readers.
Increment rx_dropped counter for oversized frames.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Matt Carlson <mcarlson@broadcom.com>
Acked-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For carrier detection to work properly when binding the driver with a cable
unplugged, netif_carrier_off() should be called after register_netdev(),
not before.
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russ reported SGI UV is broken recently. He said:
| The SRAT table shows that memory range is spread over two nodes.
|
| SRAT: Node 0 PXM 0 100000000-800000000
| SRAT: Node 1 PXM 1 800000000-1000000000
| SRAT: Node 0 PXM 0 1000000000-1080000000
|
|Previously, the kernel early_node_map[] would show three entries
|with the proper node.
|
|[ 0.000000] 0: 0x00100000 -> 0x00800000
|[ 0.000000] 1: 0x00800000 -> 0x01000000
|[ 0.000000] 0: 0x01000000 -> 0x01080000
|
|The problem is recent community kernel early_node_map[] shows
|only two entries with the node 0 entry overlapping the node 1
|entry.
|
| 0: 0x00100000 -> 0x01080000
| 1: 0x00800000 -> 0x01000000
After looking at the changelog, Found out that it has been broken for a while by
following commit
|commit 8716273cae
|Author: David Rientjes <rientjes@google.com>
|Date: Fri Sep 25 15:20:04 2009 -0700
|
| x86: Export srat physical topology
Before that commit, register_active_regions() is called for every SRAT memory
entry right away.
Use nodememblk_range[] instead of nodes[] in order to make sure we
capture the actual memory blocks registered with each node. nodes[]
contains an extended range which spans all memory regions associated
with a node, but that does not mean that all the memory in between are
included.
Reported-by: Russ Anderson <rja@sgi.com>
Tested-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB27BDF.5000800@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org> 2.6.33 .34 .35 .36
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Several other ethtool functions leave heap uncleared (potentially) by
drivers. Some interfaces appear safe (eeprom, etc), in that the sizes
are well controlled. In some situations (e.g. unchecked error conditions),
the heap will remain unchanged in areas before copying back to userspace.
Note that these are less of an issue since these all require CAP_NET_ADMIN.
Cc: stable@kernel.org
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanse found that i2400m_rx frees skb, but still uses skb->len even
though it has skb_len defined. So use skb_len properly in the code.
And also define it unsinged int rather than size_t to solve
compilation warnings.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: linux-wimax@intel.com
Acked-by: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanse found that ia_init_one locks a spinlock and inside of that it
calls ia_start which calls:
* request_irq
* tx_init which does kmalloc(GFP_KERNEL)
Both of them can thus sleep and result in a deadlock. I don't see a
reason to have a per-device spinlock there which is used only there
and inited right before the lock location. So remove it completely.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanse found that mpc_push frees skb and then it dereferences it. It
is a typo, new_skb should be dereferenced there.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stanse found we do in console_show:
kfree_skb(skb);
return skb->len;
which is not good. Fix that by remembering the len and use it in the
function instead.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
kbuild: fix oldnoconfig to do the right thing
kconfig: Temporarily disable dependency warnings
kconfig: delay symbol direct dependency initialization