Currently when UBIFS fills up the current bud (which is the last in the journal
head) and switches to the next bud, it first writes the log reference node for
the next bud and only after this synchronizes the write-buffer of the previous
bud. This is not a big deal, but an unclean power cut may lead to a situation
when we have corruption in a next-to-last bud, although it is much more logical
that we have to have corruption only in the last bud.
This patch also removes write-buffer synchronization from
'ubifs_wbuf_seek_nolock()' because this is not needed anymore (we synchronize
the write-buffer explicitly everywhere now) and also because this is just
prone to various errors.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This is a minor change which makes 2 functions static because they
are not used outside the gc.c file: 'data_nodes_cmp()' and
'nondata_nodes_cmp()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We have duplicated code in 'ubifs_garbage_collect()' and
'ubifs_rcvry_gc_commit()', which is about handling the special case of free
LEB. In both cases we just want to garbage-collect the LEB using
'ubifs_garbage_collect_leb()'.
This patch teaches 'ubifs_garbage_collect_leb()' to handle free LEB's so that
the caller does not have to do this and the duplicated code is removed.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Commit 2fde99cb55 "UBIFS: mark VFS SB RO too"
introduced regression. This commit made UBIFS set the 'MS_RDONLY' flag in the
VFS superblock when it switches to R/O mode due to an error. This was done
to make VFS show the R/O UBIFS flag in /proc/mounts.
However, several places in UBIFS relied on the 'MS_RDONLY' flag and assume this
flag can only change when we re-mount. For example, 'ubifs_put_super()'.
This patch introduces new UBIFS flag - 'c->ro_mount' which changes only when
we re-mount, and preserves the way UBIFS was originally mounted (R/W or R/O).
This allows us to de-initialize UBIFS cleanly in 'ubifs_put_super()'.
This patch also changes all 'ubifs_assert(!c->ro_media)' assertions to
'ubifs_assert(!c->ro_media && !c->ro_mount)', because we never should write
anything if the FS was mounter R/O.
All the places where we test for 'MS_RDONLY' flag in the VFS SB were changed
and now we test the 'c->ro_mount' flag instead, because it preserves the
original UBIFS mount type, unlike the 'MS_RDONLY' flag.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The R/O state may have various reasons:
1. The UBI volume is R/O
2. The FS is mounted R/O
3. The FS switched to R/O mode because of an error
However, in UBIFS we have only one variable which represents cases
1 and 3 - 'c->ro_media'. Indeed, we set this to 1 if we switch to
R/O mode due to an error, and then we test it in many places to
make sure that we stop writing as soon as the error happens.
But this is very unclean. One consequence of this, for example, is
that in 'ubifs_remount_fs()' we use 'c->ro_media' to check whether
we are in R/O mode because on an error, and we print a message
in this case. However, if we are in R/O mode because the media
is R/O, our message is bogus.
This patch introduces new flag - 'c->ro_error' which is set when
we switch to R/O mode because of an error. It also changes all
"if (c->ro_media)" checks to "if (c->ro_error)" checks, because
this is what the checks actually mean. We do not need to check
for 'c->ro_media' because if the UBI volume is in R/O mode, we
do not allow R/W mounting, and now writes can happen. This is
guaranteed by VFS. But it is good to double-check this, so this
patch also adds many "ubifs_assert(!c->ro_media)" checks.
In the 'ubifs_remount_fs()' function this patch makes a bit more
changes - it fixes the error messages as well.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The UBIFS bug in the GC list sorting comparison functions inspired
me to write internal debugging check functions which verify that
the list of nodes is sorted properly.
So, this patch implements 2 new debugging functions:
o 'dbg_check_data_nodes_order()' - check order of data nodes list
o 'dbg_check_nondata_nodes_order()' - check order of non-data nodes list
The debugging functions are executed only if general UBIFS debugging checks are
enabled. And they are compiled out if UBIFS debugging is disabled.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When running the integrity test ('integck' from mtd-utils) on current
UBIFS on 2.6.35, I see that assertions in UBIFS 'list_sort()' comparison
functions trigger sometimes, e.g.:
UBIFS assert failed in data_nodes_cmp at 132 (pid 28311)
My investigation showed that this happens when 'list_sort()' calls the 'cmp()'
function with equivalent arguments. In this case, the 'struct list_head'
parameter, passed to 'cmp()' is bogus, and it does not belong to any element in
the original list.
And this issue seems to be introduced by commit:
commit 835cc0c847
Author: Don Mullis <don.mullis@gmail.com>
Date: Fri Mar 5 13:43:15 2010 -0800
It is easy to work around the issue by doing:
if (a == b)
return 0;
in UBIFS. It works, but 'lib_sort()' should nevertheless be fixed. Although it
is harmless to have this piece of code in UBIFS.
This patch adds that code to both UBIFS 'cmp()' functions:
'data_nodes_cmp()' and 'nondata_nodes_cmp()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Improve assertions in gc.c in the comparison functions for 'list_sort()': check
key types _and_ node types.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
In comparison function for 'list_sort()' we use key type to distinguish between
node types. However, we have a bit simper way to detect node type -
'snod->type'. This more logical to use, comparing to decoding key types. Also
allows to get rid of 2 local variables.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When moving nodes in GC, do not try to look up truncation nodes in TNC,
because they do not exist there. This would be harmless, because the TNC
look-up would fail, if we did not have bug 'ubifs_add_snod()' which reads
garbage into 'snod->key'. But in any case, it is less error prone to
explicitly ignore everything but inode, data, dentry and xentry nodes.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch fixes the following false assertion warning:
UBIFS assert failed in data_nodes_cmp at 130 (pid 15107)
The assertion was wrong because it did not take into account that the
node can be an xentry.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
'ubifs_garbage_collect_leb()' should never return '-ENOSPC', and if it
does, this is an error. Thus, do not treat this error code specially.
'-EAGAIN' is a special error code, but not '-ENOSPC'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
In 'ubifs_garbage_collect()' on error path, we first switch to R/O mode, and
then synchronize write-buffers (to make sure no data are lost). But the GC
write-buffer synchronization will fail, because we are already in R/O mode.
This patch re-orders this and makes sure we first synchronize the write-buffer,
and then switch to R/O mode.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
There are two copies of list_sort() in the tree already, one in the DRM
code, another in ubifs. Now XFS needs this as well. Create a generic
list_sort() function from the ubifs version and convert existing users
to it so we don't end up with yet another copy in the tree.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Artem Bityutskiy <dedekind@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At the moment UBIFS print large and scary error messages and
flash dumps in case of nearly any corruption, even if it is
a recoverable corruption. For example, if the master node is
corrupted, ubifs_scan() prints error dumps, then UBIFS recovers
just fine and goes on.
This patch makes UBIFS print scary error messages only in
real cases, which are not recoverable. It adds 'quiet' argument
to the 'ubifs_scan()' function, so the caller may ask 'ubi_scan()'
not to print error messages if the caller is able to do recovery.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Adrian Hunter <Adrian.Hunter@nokia.com>
The 'joinup()' function cannot deal with situations when nodes
go in reverse order - it just leaves them in this order. This
patch implement full nodes sorting using n*log(n) algorithm.
It sorts data nodes for bulk-read, and direntry nodes for
readdir().
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
- preserve the idx_gc list - it will be needed in the same
state, should UBIFS be remounted rw again
- prevent remounting ro if we have switched to read only
mode (due to a fatal error)
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
When freeing the c->idx_lebs list, we have to release the LEBs as well,
because we might be called from mount to read-only mode code. Otherwise
the LEBs stay taken forever, which may cause problems when we re-mount
back ro RW mode.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Make garbage collection try to keep data nodes from the same
inode together and in ascending order. This improves
performance when reading those nodes especially when bulk-read
is used.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
IS_ERR() macro already has unlikely(), so do not use constructions
like 'if (unlikely(IS_ERR())'.
Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
- update GC sequence number if any nodes may have been moved
even if GC did not finish the LEB
- don't ignore error return when reading
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
The TNC mutex is unlocked prematurely when reading leaf nodes
with non-hashed keys. This is unsafe because the node may be
moved by garbage collection and the eraseblock unmapped, although
that has never actually happened during stress testing.
This patch fixes the flaw by detecting the race and retrying with
the TNC mutex locked.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
This is a new flash file system. See
http://www.linux-mtd.infradead.org/doc/ubifs.html
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>