OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Alexey Dobriyan	5bcd7ff9e1	proc: proc_init_inodecache() can't fail kmem_cache creation code will panic, don't return anything. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2008-10-23 13:27:50 +04:00
Joe Korty	7c88db0cb5	proc: fix vma display mismatch between /proc/pid/{maps,smaps} Commit `4752c36978` aka "maps4: simplify interdependence of maps and smaps" broke /proc/pid/smaps, causing it to display some vmas twice and other vmas not at all. For example: grep .- /proc/1/smaps >/tmp/smaps; diff /proc/1/maps /tmp/smaps 1 25d24 2 < 7fd7e23aa000-7fd7e23ac000 rw-p 7fd7e23aa000 00:00 0 3 28a28 4 > ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] The bug has something to do with setting m->version before all the seq_printf's have been performed. show_map was doing this correctly, but show_smap was doing this in the middle of its seq_printf sequence. This patch arranges things so that the setting of m->version in show_smap is also done at the end of its seq_printf sequence. Testing: in addition to the above grep test, for each process I summed up the 'Rss' fields of /proc/pid/smaps and compared that to the 'VmRSS' field of /proc/pid/status. All matched except for Xorg (which has a /dev/mem mapping which Rss accounts for but VmRSS does not). This result gives us some confidence that neither /proc/pid/maps nor /proc/pid/smaps are any longer skipping or double-counting vmas. Signed-off-by: Joe Korty <joe.korty@ccur.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>	2008-10-23 13:21:29 +04:00
Paul Mundt	2515ddc6db	binfmt_elf_fdpic: Update for cputime changes. Commit `f06febc96b` ("timers: fix itimer/ many thread hang") introduced a new task_cputime interface and subsequently only converted binfmt_elf over to it. This results in the build for binfmt_elf_fdpic blowing up given that p->signal->{u,s}time have disappeared from underneath us. Apply the same trivial fix from binfmt_elf to binfmt_elf_fdpic. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 20:17:18 -07:00
Linus Torvalds	9301975ec2	Merge branch 'genirq-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip This merges branches irq/genirq, irq/sparseirq-v4, timers/hpet-percpu and x86/uv. The sparseirq branch is just preliminary groundwork: no sparse IRQs are actually implemented by this tree anymore - just the new APIs are added while keeping the old way intact as well (the new APIs map 1:1 to irq_desc[]). The 'real' sparse IRQ support will then be a relatively small patch ontop of this - with a v2.6.29 merge target. * 'genirq-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (178 commits) genirq: improve include files intr_remapping: fix typo io_apic: make irq_mis_count available on 64-bit too genirq: fix name space collisions of nr_irqs in arch/* genirq: fix name space collision of nr_irqs in autoprobe.c genirq: use iterators for irq_desc loops proc: fixup irq iterator genirq: add reverse iterator for irq_desc x86: move ack_bad_irq() to irq.c x86: unify show_interrupts() and proc helpers x86: cleanup show_interrupts genirq: cleanup the sparseirq modifications genirq: remove artifacts from sparseirq removal genirq: revert dynarray genirq: remove irq_to_desc_alloc genirq: remove sparse irq code genirq: use inline function for irq_to_desc genirq: consolidate nr_irqs and for_each_irq_desc() x86: remove sparse irq from Kconfig genirq: define nr_irqs for architectures with GENERIC_HARDIRQS=n ...	2008-10-20 13:23:01 -07:00
Linus Torvalds	99ebcf8285	Merge branch 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits) fix documentation of sysrq-q really Fix documentation of sysrq-q timer_list: add base address to clock base timer_list: print cpu number of clockevents device timer_list: print real timer address NOHZ: restart tick device from irq_enter() NOHZ: split tick_nohz_restart_sched_tick() NOHZ: unify the nohz function calls in irq_enter() timers: fix itimer/many thread hang, fix timers: fix itimer/many thread hang, v3 ntp: improve adjtimex frequency rounding timekeeping: fix rounding problem during clock update ntp: let update_persistent_clock() sleep hrtimer: reorder struct hrtimer to save 8 bytes on 64bit builds posix-timers: lock_timer: make it readable posix-timers: lock_timer: kill the bogus ->it_id check posix-timers: kill ->it_sigev_signo and ->it_sigev_value posix-timers: sys_timer_create: cleanup the error handling posix-timers: move the initialization of timer->sigq from send to create path posix-timers: sys_timer_create: simplify and s/tasklist/rcu/ ... Fix trivial conflicts due to sysrq-q description clahes in Documentation/sysrq.txt and drivers/char/sysrq.c	2008-10-20 13:19:56 -07:00
Linus Torvalds	1d9a8a47d6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: implement nonseekable open fuse: add include protectors fuse: config description improvement fuse: add missing fuse_request_free fuse: fix SEEK_END incorrectness	2008-10-20 12:53:27 -07:00
Alexey Dobriyan	6da0b38f44	fs/Kconfig: move ext2, ext3, ext4, JBD, JBD2 out Use fs/*/Kconfig more, which is good because everything related to one filesystem is in one place and fs/Kconfig is quite fat. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 11:43:59 -07:00
Linus Torvalds	45e4a24f7b	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: (26 commits) 9p: add more conservative locking 9p: fix oops in protocol stat parsing error path. 9p: fix device file handling 9p: Improve debug support 9p: eliminate depricated conv functions 9p: rework client code to use new protocol support functions 9p: remove unnecessary tag field from p9_req_t structure 9p: remove 9p fcall debug prints 9p: add new protocol support code 9p: encapsulate version function 9p: move dirread to fs layer 9p: adjust 9p vfs write operation 9p: move readn meta-function from client to fs layer 9p: consolidate read/write functions 9p: drop broken unused error path from p9_conn_create() 9p: make rpc code common and rework flush code 9p: use the rcall structure passed in the request in trans_fd read_work 9p: apply common request code to trans_fd 9p: apply common tagpool handling to trans_fd 9p: move request management to client code ...	2008-10-20 09:39:47 -07:00
Linus Torvalds	52c6738b7f	Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: use correct fs type for v4 submounts and referrals Make nfs_file_cred more robust. NFS: Enable NFSv4 callback server to listen on AF_INET6 sockets	2008-10-20 09:39:20 -07:00
Linus Torvalds	396b122f6a	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 * 'linux-next' of git://git.infradead.org/ubifs-2.6: (25 commits) UBIFS: fix ubifs_compress commentary UBIFS: amend printk UBIFS: do not read unnecessary bytes when unpacking bits UBIFS: check buffer length when scanning for LPT nodes UBIFS: correct condition to eliminate unecessary assignment UBIFS: add more debugging messages for LPT UBIFS: fix bulk-read handling uptodate pages UBIFS: improve garbage collection UBIFS: allow for sync_fs when read-only UBIFS: commit on sync_fs UBIFS: correct comment for commit_on_unmount UBIFS: update dbg_dump_inode UBIFS: fix commentary UBIFS: fix races in bit-fields UBIFS: ensure data read beyond i_size is zeroed out correctly UBIFS: correct key comparison UBIFS: use bit-fields when possible UBIFS: check data CRC when in error state UBIFS: improve znode splitting rules UBIFS: add no_chk_data_crc mount option ...	2008-10-20 09:19:03 -07:00
Linus Torvalds	2be508d847	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (69 commits) Revert "[MTD] m25p80.c code cleanup" [MTD] [NAND] GPIO driver depends on ARM... for now. [MTD] [NAND] sh_flctl: fix compile error [MTD] [NOR] AT49BV6416 has swapped erase regions [MTD] [NAND] GPIO NAND flash driver [MTD] cmdlineparts documentation change - explain where mtd-id comes from [MTD] cfi_cmdset_0002.c: Add Macronix CFI V1.0 TopBottom detection [MTD] [NAND] Fix compilation warnings in drivers/mtd/nand/cs553x_nand.c [JFFS2] Write buffer offset adjustment for NOR-ECC (Sibley) flash [MTD] mtdoops: Fix a bug where block may not be erased [MTD] mtdoops: Add a magic number to logged kernel oops [MTD] mtdoops: Fix an off by one error [JFFS2] Correct parameter names of jffs2_compress() in comments [MTD] [NAND] sh_flctl: add support for Renesas SuperH FLCTL [MTD] [NAND] Bug on atmel_nand HW ECC : OOB info not correctly written [MTD] [MAPS] Remove unused variable after ROM API cleanup. [MTD] m25p80.c extended jedec support (v2) [MTD] remove unused mtd parameter in of_mtd_parse_partitions() [MTD] [NAND] remove dead Kconfig associated with !CONFIG_PPC_MERGE [MTD] [NAND] driver extension to support NAND on TQM85xx modules ...	2008-10-20 09:03:12 -07:00
Alexey Dobriyan	bb26b963d8	fs/Kconfig: move CIFS out Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:42 -07:00
Simon Horman	85a0ee342e	kdump: add is_vmcore_usable() and vmcore_unusable() The usage of elfcorehdr_addr has changed recently such that being set to ELFCORE_ADDR_MAX is used by is_kdump_kernel() to indicate if the code is executing in a kernel executed as a crash kernel. However, arch/ia64/kernel/setup.c:reserve_elfcorehdr will rest elfcorehdr_addr to ELFCORE_ADDR_MAX on error, which means any subsequent calls to is_kdump_kernel() will return 0, even though they should return 1. Ok, at this point in time there are no subsequent calls, but I think its fair to say that there is ample scope for error or at the very least confusion. This patch add an extra state, ELFCORE_ADDR_ERR, which indicates that elfcorehdr_addr was passed on the command line, and thus execution is taking place in a crashdump kernel, but vmcore can't be used for some reason. This is tested for using is_vmcore_usable() and set using vmcore_unusable(). A subsequent patch makes use of this new code. To summarise, the states that elfcorehdr_addr can now be in are as follows: ELFCORE_ADDR_MAX: not a crashdump kernel ELFCORE_ADDR_ERR: crashdump kernel but vmcore is unusable any other value: crash dump kernel and vmcore is usable Signed-off-by: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:40 -07:00
Vivek Goyal	57cac4d188	kdump: make elfcorehdr_addr independent of CONFIG_PROC_VMCORE o elfcorehdr_addr is used by not only the code under CONFIG_PROC_VMCORE but also by the code which is not inside CONFIG_PROC_VMCORE. For example, is_kdump_kernel() is used by powerpc code to determine if kernel is booting after a panic then use previous kernel's TCE table. So even if CONFIG_PROC_VMCORE is not set in second kernel, one should be able to correctly determine that we are booting after a panic and setup calgary iommu accordingly. o So remove the assumption that elfcorehdr_addr is under CONFIG_PROC_VMCORE. o Move definition of elfcorehdr_addr to arch dependent crash files. (Unfortunately crash dump does not have an arch independent file otherwise that would have been the best place). o kexec.c is not the right place as one can Have CRASH_DUMP enabled in second kernel without KEXEC being enabled. o I don't see sh setup code parsing the command line for elfcorehdr_addr. I am wondering how does vmcore interface work on sh. Anyway, I am atleast defining elfcoredhr_addr so that compilation is not broken on sh. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Simon Horman <horms@verge.net.au> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Roland McGrath	656eb2cd5d	add CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS This adds a kconfig option to change the /proc/PID/coredump_filter default. Fedora has been carrying a trivial patch to change the hard-wired value for this default, since Fedora 8. The default default can't change safely because there are old GDB versions out there (all before 6.7) that are confused by the core dump files created by the MMF_DUMP_ELF_HEADERS setting. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Oleg Nesterov	6409324b38	coredump: format_corename: don't append .%pid if multi-threaded If the coredumping is multi-threaded, format_corename() appends .%pid to the corename. This was needed before the proper multi-thread core dump support, now all the threads in the mm go into a single unified core file. Remove this special case, it is not even documented and we have "%p" and core_uses_pid. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: La Monte Yarroll <piggy@laurelnetworks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Lai Jiangshan	3eda201180	seq_file: add seq_cpumask_list(), seq_nodemask_list() seq_cpumask_list(), seq_nodemask_list() are very like seq_cpumask(), seq_nodemask(), but they print human readable string. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Lai Jiangshan	85dd030edb	seq_file: don't call bitmap_scnprintf_len() "m->count + len < m->size" is true commonly, so bitmap_scnprintf() is commonly called. this fix saves a call to bitmap_scnprintf_len(). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Eric Sesterhenn	248736c2a5	hfsplus: fix possible deadlock when handling corrupted extents A corrupted extent for the extent file itself may try to get an impossible extent, causing a deadlock if I see it correctly. Check the inode number after the first_blocks checks and fail if it's the extent file, as according to the spec the extent file should have no extent for itself. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Alan Cox	6e71529444	hfsplus: missing O_LARGEFILE check hfsplus: O_LARGEFILE checking is missing Addresses http://bugzilla.kernel.org/show_bug.cgi?id=8490 From: Alan Cox <alan@redhat.com> Reported-by: didier <did447@gmail.com> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Eric Sandeen	cdbf6dba28	ext3: avoid printk floods in the face of directory corruption A very large directory with many read failures (either due to storage problems, or due to invalid size & blocks from corruption) will generate a printk storm as the filesystem continues to try to read all the blocks. This flood of messages can tie up the box until it is complete - which may be a very long time, especially for very large corrupted values. This is fixed by only reporting the corruption once each time we try to read the directory. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Aneesh Kumar K.V	5ec8b75e3a	ext3: truncate block allocated on a failed ext3_write_begin For blocksize < pagesize we need to remove blocks that got allocated in block_write_begin() if we fail with ENOSPC for later blocks. block_write_begin() internally does this if it allocated page locally. This makes sure we don't have blocks outside inode.i_size during ENOSPC. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Eugene Dashevsky	6a897cf447	ext3: fix ext3_dx_readdir hash collision handling This fixes a bug where readdir() would return a directory entry twice if there was a hash collision in an hash tree indexed directory. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eugene Dashevsky <eugene@ibrix.com> Signed-off-by: Mike Snitzer <msnitzer@ibrix.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Hidehiro Kawai	960a22ae60	jbd: ordered data integrity fix In ordered mode, if a file data buffer being dirtied exists in the committing transaction, we write the buffer to the disk, move it from the committing transaction to the running transaction, then dirty it. But we don't have to remove the buffer from the committing transaction when the buffer couldn't be written out, otherwise it would miss the error and the committing transaction would not abort. This patch adds an error check before removing the buffer from the committing transaction. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:37 -07:00
Hidehiro Kawai	0e4fb5e283	ext3: add an option to control error handling on file data If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most of applications and commands do buffered writes without fsync(), they don't notice the IO error. It's scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just call printk() when it gets an IO error in file data. If you mount a ext3 fs with data_err=abort option, it aborts on file data write error. If you mount it with data_err=ignore, it doesn't abort, just call printk(). data_err=ignore is the default. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:37 -07:00
Mingming Cao	46d01a225e	ext3: fix ext3 block reservation early ENOSPC issue We could run into ENOSPC error on ext3, even when there is free blocks on the filesystem. The problem is triggered in the case the goal block group has 0 free blocks , and the rest block groups are skipped due to the check of "free_blocks < windowsz/2". Current code could fall back to non reservation allocation to prevent early ENOSPC after examing all the block groups with reservation on , but this code was bypassed if the reservation window is turned off already, which is true in this case. This patch fixed two issues: 1) We don't need to turn off block reservation if the goal block group has 0 free blocks left and continue search for the rest of block groups. Current code the intention is to turn off the block reservation if the goal allocation group has a few (some) free blocks left (not enough for make the desired reservation window),to try to allocation in the goal block group, to get better locality. But if the goal blocks have 0 free blocks, it should leave the block reservation on, and continues search for the next block groups,rather than turn off block reservation completely. 2) we don't need to check the window size if the block reservation is off. The problem was originally found and fixed in ext4. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:37 -07:00
Josef Bacik	972fbf7798	ext3: don't try to resize if there are no reserved gdt blocks left When trying to resize a ext3 fs and you run out of reserved gdt blocks, you get an error that doesn't actually tell you what went wrong, it just says that the gdb it picked is not correct, which is the case since you don't have any reserved gdt blocks left. This patch adds a check to make sure you have reserved gdt blocks to use, and if not prints out a more relevant error. Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Andreas Dilger <adilger@sun.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:37 -07:00
Hidehiro Kawai	885e353c74	jbd: don't dirty original metadata buffer on abort Currently, original metadata buffers are dirtied when they are unfiled whether the journal has aborted or not. Eventually these buffers will be written-back to the filesystem by pdflush. This means some metadata buffers are written to the filesystem without journaling if the journal aborts. So if both journal abort and system crash happen at the same time, the filesystem would become inconsistent state. Additionally, replaying journaled metadata can overwrite the latest metadata on the filesystem partly. Because, if the journal aborts, journaled metadata are preserved and replayed during the next mount not to lose uncheckpointed metadata. This would also break the consistency of the filesystem. This patch prevents original metadata buffers from being dirtied on abort by clearing BH_JBDDirty flag from those buffers. Thus, no metadata buffers are written to the filesystem without journaling. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:37 -07:00
Hidehiro Kawai	d1645e526a	jbd: abort when failed to log metadata buffers If we failed to write metadata buffers to the journal space and succeeded to write the commit record, stale data can be written back to the filesystem as metadata in the recovery phase. To avoid this, when we failed to write out metadata buffers, abort the journal before writing the commit record. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
KOSAKI Motohiro	e575f111dc	coredump_filter: add hugepage dumping Presently hugepage's vma has a VM_RESERVED flag in order not to be swapped. But a VM_RESERVED vma isn't core dumped because this flag is often used for some kernel vmas (e.g. vmalloc, sound related). Thus hugepages are never dumped and it can't be debugged easily. Many developers want hugepages to be included into core-dump. However, We can't read generic VM_RESERVED area because this area is often IO mapping area. then these area reading may change device state. it is definitly undesiable side-effect. So adding a hugepage specific bit to the coredump filter is better. It will be able to hugepage core dumping and doesn't cause any side-effect to any i/o devices. In additional, libhugetlb use hugetlb private mapping pages as anonymous page. Then, hugepage private mapping pages should be core dumped by default. Then, /proc/[pid]/core_dump_filter has two new bits. - bit 5 mean hugetlb private mapping pages are dumped or not. (default: yes) - bit 6 mean hugetlb shared mapping pages are dumped or not. (default: no) I tested by following method. % ulimit -c unlimited % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core % % echo 0x43 > /proc/self/coredump_filter % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h> #include <string.h> #include "hugetlbfs.h" int main(int argc, char** argv){ char* p; int ch; int mmap_flags = MAP_SHARED; int fd; int nr_pages; while((ch = getopt(argc, argv, "p")) != -1) { switch (ch) { case 'p': mmap_flags &= ~MAP_SHARED; mmap_flags \|= MAP_PRIVATE; break; default: /* nothing/ break; } } argc -= optind; argv += optind; if (argc == 0){ printf("need # of pages\n"); exit(1); } nr_pages = atoi(argv[0]); if (nr_pages < 2) { printf("nr_pages must >2\n"); exit(1); } fd = hugetlbfs_unlinked_fd(); p = mmap(NULL, nr_pages gethugepagesize(), PROT_READ\|PROT_WRITE, mmap_flags, fd, 0); sleep(2); (p + gethugepagesize()) = 1; / COW / sleep(2); / crash! / (int*)0 = 1; return 0; } Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: William Irwin <wli@holomorphy.com> Cc: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Nick Piggin	51b07fc3c5	fs: buffer lock use lock bitops trylock_buffer and unlock_buffer open and close a critical section. Hence, we can use the lock bitops to get the desired memory ordering. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Nick Piggin	5344b7e648	vmstat: mlocked pages statistics Add NR_MLOCK zone page state, which provides a (conservative) count of mlocked pages (actually, the number of mlocked pages moved off the LRU). Reworked by lts to fit in with the modified mlock page support in the Reclaim Scalability series. [kosaki.motohiro@jp.fujitsu.com: fix incorrect Mlocked field of /proc/meminfo] [lee.schermerhorn@hp.com: mlocked-pages: add event counting with statistics] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:31 -07:00
Lee Schermerhorn	ba9ddf4939	Ramfs and Ram Disk pages are unevictable Christoph Lameter pointed out that ram disk pages also clutter the LRU lists. When vmscan finds them dirty and tries to clean them, the ram disk writeback function just redirties the page so that it goes back onto the active list. Round and round she goes... With the ram disk driver [rd.c] replaced by the newer 'brd.c', this is no longer the case, as ram disk pages are no longer maintained on the lru. [This makes them unmigratable for defrag or memory hot remove, but that can be addressed by a separate patch series.] However, the ramfs pages behave like ram disk pages used to, so: Define new address_space flag [shares address_space flags member with mapping's gfp mask] to indicate that the address space contains all unevictable pages. This will provide for efficient testing of ramfs pages in page_evictable(). Also provide wrapper functions to set/test the unevictable state to minimize #ifdefs in ramfs driver and any other users of this facility. Set the unevictable state on address_space structures for new ramfs inodes. Test the unevictable state in page_evictable() to cull unevictable pages. These changes depend on [CONFIG_]UNEVICTABLE_LRU. [riel@redhat.com: undo the brd.c part] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Debugged-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:26 -07:00
Lee Schermerhorn	7b854121eb	Unevictable LRU Page Statistics Report unevictable pages per zone and system wide. Kosaki Motohiro added support for memory controller unevictable statistics. [riel@redhat.com: fix printk in show_free_areas()] [akpm@linux-foundation.org: fix units in /proc/vmstats] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Debugged-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:26 -07:00
Rik van Riel	4f98a2fee8	vmscan: split LRU lists into anon & file sets Split the LRU lists in two, one set for pages that are backed by real file systems ("file") and one for pages that are backed by memory and swap ("anon"). The latter includes tmpfs. The advantage of doing this is that the VM will not have to scan over lots of anonymous pages (which we generally do not want to swap out), just to find the page cache pages that it should evict. This patch has the infrastructure and a basic policy to balance how much we scan the anon lists and how much we scan the file lists. The big policy changes are in separate patches. [lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset] [kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru] [kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page] [hugh@veritas.com: memcg swapbacked pages active] [hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED] [akpm@linux-foundation.org: fix /proc/vmstat units] [nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration] [kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo] [kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
Thomas Gleixner	c465a76af6	Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus	2008-10-20 13:14:06 +02:00
Geert Uytterhoeven	54779aabb0	UBIFS: fix ubifs_compress commentary Update the comment for ubifs_compress(), which incorrectly states that it returnsa success/failure indicator. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2008-10-19 13:01:37 +03:00
Artem Bityutskiy	fae7fb299f	UBIFS: amend printk It is better to print "Reserved for root" than "Reserved pool size", because it is more obvious for users what this means. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>	2008-10-19 13:01:30 +03:00
Adrian Hunter	727d2dc045	UBIFS: do not read unnecessary bytes when unpacking bits Fixes the following Oops: BUG: unable to handle kernel paging request at f8d24000 IP: [<f8ff0657>] :ubifs:ubifs_unpack_bits+0xcd/0x231 pde = 34333067 pte = 00000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: deflate zlib_deflate lzo lzo_decompress lzo_compress ubifs ubi nandsim nand nand_ids nand_ecc mtd nfsd lockd sunrpc exportfs [last unloaded: nand_ecc] Pid: 7450, comm: sync Not tainted (2.6.27-rc8-ubifs-2.6 #27) EIP: 0060:[<f8ff0657>] EFLAGS: 00010206 CPU: 0 EIP is at ubifs_unpack_bits+0xcd/0x231 [ubifs] EAX: 00000000 EBX: 00000000 ECX: d7e43dc0 EDX: 0000ff00 ESI: 00000004 EDI: f8d23ffe EBP: d7e43db4 ESP: d7e43d8c DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process sync (pid: 7450, ti=d7e42000 task=eb6f9530 task.ti=d7e42000) Stack: 00000400 c0103db4 dc5e8090 d7e43dc0 d7e43dc0 d7e43dc4 0000001c 00000004 f496d1e0 f8d23ffc d7e43dd4 f8ffac7e f8d23ffe 00000000 f8d23ffe f2b7af68 f496d1e0 f8d23ffc d7e43e2c f8ffadc5 00000000 0001f000 00000000 c03b10a7 Call Trace: [<c0103db4>] ? restore_nocheck_notrace+0x0/0xe [<f8ffac7e>] ? is_a_node+0x43/0x92 [ubifs] [<f8ffadc5>] ? dbg_check_ltab+0xf8/0x5c9 [ubifs] [<c03b10a7>] ? mutex_lock_nested+0x1b2/0x2a0 [<f8ffc86e>] ? ubifs_lpt_start_commit+0x49/0xecb [ubifs] [<c03b0ef3>] ? mutex_unlock+0xd/0xf [<f8fef017>] ? ubifs_tnc_start_commit+0x1cf/0xef8 [ubifs] [<f8fe65d8>] ? do_commit+0x18f/0x52d [ubifs] [<f8fe69f6>] ? ubifs_run_commit+0x80/0xca [ubifs] [<f8fd8d35>] ? ubifs_sync_fs+0xdb/0xf6 [ubifs] [<c0181a07>] ? sync_filesystems+0xc6/0x10c [<c019f279>] ? do_sync+0x3b/0x6a [<c019f2ba>] ? sys_sync+0x12/0x18 [<c0103ced>] ? sysenter_do_call+0x12/0x35 ======================= Code: 4d ec 89 01 8b 45 e8 89 10 89 d8 89 f1 d3 e8 85 c0 74 07 29 d6 83 fe 20 75 2a 89 d8 83 c4 1c 5b 5e 5f 5d c3 0f b6 57 01 c1 e2 08 <0f> b6 47 02 c1 e0 10 09 c2 0f b6 07 09 c2 0f b EIP: [<f8ff0657>] ubifs_unpack_bits+0xcd/0x231 [ubifs] SS:ESP 0068:d7e43d8c ---[ end trace 1bbb4c407a6dd816 ]--- Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>	2008-10-19 13:01:21 +03:00
Alexander Belyakov	5bf1723723	[JFFS2] Write buffer offset adjustment for NOR-ECC (Sibley) flash After choosing new c->nextblock, don't leave the wbuf offset field occasionally pointing at the start of the next physical eraseblock. This was causing a BUG() on NOR-ECC (Sibley) flash, where we start writing after the cleanmarker. Among other this fix should cover write buffer offset adjustment after flushing the last page of an eraseblock. Signed-off-by: Alexander Belyakov <abelyako@googlemail.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2008-10-18 11:54:09 +01:00
Linus Torvalds	58617d5e59	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Remove automatic enabling of the HUGE_FILE feature flag ext4: Replace hackish ext4_mb_poll_new_transaction with commit callback ext4: Update Documentation/filesystems/ext4.txt ext4: Remove unused mount options: nomballoc, mballoc, nocheck ext4: Remove compile warnings when building w/o CONFIG_PROC_FS ext4: Add missing newlines to printk messages ext4: Fix file fragmentation during large file write. vfs: Add no_nrwrite_index_update writeback control flag vfs: Remove the range_cont writeback mode. ext4: Use tag dirty lookup during mpage_da_submit_io ext4: let the block device know when unused blocks can be discarded ext4: Don't reuse released data blocks until transaction commits ext4: Use an rbtree for tracking blocks freed during transaction. ext4: Do mballoc init before doing filesystem recovery ext4: Free ext4_prealloc_space using kmem_cache_free ext4: Fix Kconfig typo for ext4dev ext4: Remove an old reference to ext4dev in Makefile comment	2008-10-17 15:08:11 -07:00
Magnus Deininger	57c7b4e68e	9p: fix device file handling In v9fs_get_inode(), for block, as well as char devices (in theory), the function init_special_inode() is called to set up callback functions for file ops. this function uses the file mode's value to determine whether to use block or char dev functions. In v9fs_inode_from_fid(), the function p9mode2unixmode() is used, but for all devices it initially returns S_IFBLK, then uses v9fs_get_inode() to initialise a new inode, then finally uses v9fs_stat2inode(), which would determine whether the inode is a block or character device. However, at that point init_special_inode() had already decided to use the block device functions, so even if the inode's mode is turned to a character device, the block functions are still used to operate on them. The attached patch simply calls init_special_inode() again for devices after parsing device node data in v9fs_stat2inode() so that the proper functions are used. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-10-17 12:44:46 -05:00
Andy Adamson	ec9a05c94c	NFS: use correct fs type for v4 submounts and referrals Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-10-17 13:06:48 -04:00
Neil Brown	504e518953	Make nfs_file_cred more robust. As not all files have an associated open_context (e.g. device special files), it is safest to test for the existence of the open context before de-referencing it. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-10-17 13:06:45 -04:00
Chuck Lever	18de973530	NFS: Enable NFSv4 callback server to listen on AF_INET6 sockets Allow the NFS callback server to listen for requests via an AF_INET6 or AF_INET socket when IPv6 support is present in the kernel. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-10-17 13:06:41 -04:00
Eric Van Hensbergen	02da398b95	9p: eliminate depricated conv functions Remove depricated conv functions which have been replaced with new protocol routines. This patch also reworks the one instance of the file-system code which directly calls conversion routines (to accomplish unpacking dirreads). Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-10-17 11:06:57 -05:00
Eric Van Hensbergen	51a87c552d	9p: rework client code to use new protocol support functions Now that the new protocol functions are in place, this patch switches the client code to using the new support code. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-10-17 11:04:45 -05:00
Eric Van Hensbergen	06b55b464e	9p: move dirread to fs layer Currently reading a directory is implemented in the client code. This function is not actually a wire operation, but a meta operation which calls read operations and processes the results. This patch moves this functionality to the fs layer and calls component wire operations instead of constructing their packets. This provides a cleaner separation and will help when we reorganize the client functions and protocol processing methods. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-10-17 11:04:43 -05:00
Eric Van Hensbergen	dfb0ec2e13	9p: adjust 9p vfs write operation Currently, the 9p net wire operation ensures that all data is sent by sending multiple packets if the data requested is larger than the msize. This is better handled in the vfs code so that we can simplify wire operations to being concerned with only putting data onto and taking data off of the wire. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-10-17 11:04:43 -05:00
Eric Van Hensbergen	fbedadc16e	9p: move readn meta-function from client to fs layer There are a couple of methods in the client code which aren't actually wire operations. To keep things organized cleaner, these operations are being moved to the fs layer. This patch moves the readn meta-function (which executes multiple wire reads until a buffer is full) to the fs layer. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2008-10-17 11:04:43 -05:00

1 2 3 4 5 ...

10441 Commits