linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Andreas Gruenbacher	bf781714b3	jffs2: Add missing capability check for listing trusted xattrs The vfs checks if a task has the appropriate access for get and set operations, but it cannot do that for the list operation; the file system must check for that itself. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: linux-mtd@lists.infradead.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-13 20:34:30 -05:00
Andreas Gruenbacher	e282fb7f3b	hfsplus: Remove unused xattr handler list operations The list operations can never be called; they are even documented to be unused. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-13 20:34:29 -05:00
Andreas Gruenbacher	13d3408f10	ubifs: Remove unused security xattr handler Ubifs installs a security xattr handler in sb->s_xattr but doesn't use the generic_{get,set,list,remove}xattr inode operations needed for processing this list of attribute handlers; the handler is never called. Instead, ubifs uses its own xattr handlers which also process security xattrs. Remove the dead code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Richard Weinberger <richard@nod.at> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: linux-mtd@lists.infradead.org Cc: Subodh Nijsure <snijsure@grid-net.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-13 20:34:29 -05:00
Andreas Gruenbacher	dae5f57a72	vfs: Fix the posix_acl_xattr_list return value When a filesystem that contains POSIX ACLs is mounted without ACL support (-o noacl), the appropriate behavior is not to list any existing POSIX ACL xattrs. The return value for list xattr handlers in this case is 0, not an error code: several filesystems that use the POSIX ACL xattr handlers do not expect the list operation to fail. Symlinks cannot have ACLs, so posix_acl_xattr_list will never be called for symlinks in the first place. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-13 20:34:28 -05:00
Andreas Gruenbacher	c361016ade	vfs: Check attribute names in posix acl xattr handers The get and set operations of the POSIX ACL xattr handlers failed to check the attribute names, so all names with "system.posix_acl_access" or "system.posix_acl_default" as a prefix were accepted. Reject invalid names from now on. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-13 20:34:28 -05:00
Tzvetelin Katchov	7c7afc440c	fs: 9p: cache.h: Add #define of include guard The include file was intended to have an include guard, but the #define part is missing. Signed-off-by: Tzvetelin Katchov <katchov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:19:50 -05:00
Ross Zwisler	c8fffa6435	vfs: remove stale comment in inode_operations The big warning comment that is currently at the end of struct inode_operations was added as part of this commit: `4aa7c6346b` ("vfs: add i_op->dentry_open()") It was added to warn people not to use the newly added 'dentry_open' function pointer. This function pointer was removed as part of this commit: `4bacc9c923` ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") The comment was left behind and now refers to nothing, so remove it. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:19:38 -05:00
Ross Zwisler	5c50002963	vfs: remove unused wrapper block_page_mkwrite() The function currently called "__block_page_mkwrite()" used to be called "block_page_mkwrite()" until a wrapper for this function was added by: commit `24da4fab5a` ("vfs: Create __block_page_mkwrite() helper passing error values back") This wrapper, the current "block_page_mkwrite()", is currently unused. __block_page_mkwrite() is used directly by ext4, nilfs2 and xfs. Remove the unused wrapper, rename __block_page_mkwrite() back to block_page_mkwrite() and update the comment above block_page_mkwrite(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.com> Cc: Jan Kara <jack@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:19:33 -05:00
Maciej W. Rozycki	54d15714f7	binfmt_elf: Correct `arch_check_elf's description Correct `arch_check_elf's description, mistakenly copied and pasted from `arch_elf_pt_proc'. Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:19:21 -05:00
Randy Dunlap	88a578d823	fs: fix writeback.c kernel-doc warnings Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro to after the function's opening brace. Also #undef this macro at the end of the function. ..//fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE' ..//fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:18:27 -05:00
Randy Dunlap	034ae4bac9	fs: fix inode.c kernel-doc warning Fix kernel-doc warning in fs/inode.c: ..//fs/inode.c:1606: warning: No description found for parameter 'inode' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:18:27 -05:00
Eric Biggers	6ae0806993	fs/pipe.c: return error code rather than 0 in pipe_write() pipe_write() would return 0 if it failed to merge the beginning of the data to write with the last, partially filled pipe buffer. It should return an error code instead. Userspace programs could be confused by write() returning 0 when called with a nonzero 'count'. The EFAULT error case was a regression from `f0d1bec9d5` ("new helper: copy_page_from_iter()"), while the ops->confirm() error case was a much older bug. Test program: #include <assert.h> #include <errno.h> #include <unistd.h> int main(void) { int fd[2]; char data[1] = {0}; assert(0 == pipe(fd)); assert(1 == write(fd[1], data, 1)); /* prior to this patch, write() returned 0 here */ assert(-1 == write(fd[1], NULL, 1)); assert(errno == EFAULT); } Cc: stable@vger.kernel.org # at least v3.15+ Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:18:26 -05:00
Eric Biggers	e9bb1f9b12	fs/pipe.c: preserve alloc_file() error code If sys_pipe() was unable to allocate a 'struct file', it always failed with ENFILE, which means "The number of simultaneously open files in the system would exceed a system-imposed limit." However, alloc_file() actually returns an ERR_PTR value and might fail with other error codes. Currently, in addition to ENFILE, it can fail with ENOMEM, potentially when there are few open files in the system. Update sys_pipe() to preserve this error code. In a prior submission of a similar patch (1) some concern was raised about introducing a new error code for sys_pipe(). However, for most system calls, programs cannot assume that new error codes will never be introduced. In addition, ENOMEM was, in fact, already a possible error code for sys_pipe(), in the case where the file descriptor table could not be expanded due to insufficient memory. (1) http://comments.gmane.org/gmane.linux.kernel/1357942 Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:18:23 -05:00
Maciej W. Rozycki	b582ef5c53	binfmt_elf: Don't clobber passed executable's file header Do not clobber the buffer space passed from `search_binary_handler' and originally preloaded by `prepare_binprm' with the executable's file header by overwriting it with its interpreter's file header. Instead keep the buffer space intact and directly use the data structure locally allocated for the interpreter's file header, fixing a bug introduced in 2.1.14 with loadable module support (linux-mips.org commit beb11695 [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history). Adjust the amount of data read from the interpreter's file accordingly. This was not an issue before loadable module support, because back then `load_elf_binary' was executed only once for a given ELF executable, whether the function succeeded or failed. With loadable module support supported and enabled, upon a failure of `load_elf_binary' -- which may for example be caused by architecture code rejecting an executable due to a missing hardware feature requested in the file header -- a module load is attempted and then the function reexecuted by `search_binary_handler'. With the executable's file header replaced with its interpreter's file header the executable can then be erroneously accepted in this subsequent attempt. Cc: stable@vger.kernel.org # all the way back Signed-off-by: Maciej W. Rozycki <macro@imgtec.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:18:07 -05:00
David Howells	102f4d900c	FS-Cache: Handle a write to the page immediately beyond the EOF marker Handle a write being requested to the page immediately beyond the EOF marker on a cache object. Currently this gets an assertion failure in CacheFiles because the EOF marker is used there to encode information about a partial page at the EOF - which could lead to an unknown blank spot in the file if we extend the file over it. The problem is actually in fscache where we check the index of the page being written against store_limit. store_limit is set to the number of pages that we're allowed to store by fscache_set_store_limit() - which means it's one more than the index of the last page we're allowed to store. The problem is that we permit writing to a page with an index _equal_ to the store limit - when we should reject that case. Whilst we're at it, change the triggered assertion in CacheFiles to just return -ENOBUFS instead. The assertion failure looks something like this: CacheFiles: Assertion failed 1000 < 7b1 is false ------------[ cut here ]------------ kernel BUG at fs/cachefiles/rdwr.c:962! ... RIP: 0010:[<ffffffffa02c9e83>] [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles] Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of `a17754f` (at least) Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:11:02 -05:00
NeilBrown	95201a4060	cachefiles: perform test on s_blocksize when opening cache file. cachefiles requires that s_blocksize in the cache is not greater than PAGE_SIZE, and performs the check every time a block is accessed. Move the test to the place where the file is "opened", where other file-validity tests are performed. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:08:17 -05:00
Kinglong Mee	b130ed5998	FS-Cache: Don't override netfs's primary_index if registering failed Only override netfs->primary_index when registering success. Cc: stable@vger.kernel.org # v2.6.30+ Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:07:51 -05:00
Kinglong Mee	86108c2e34	FS-Cache: Increase reference of parent after registering, netfs success If netfs exist, fscache should not increase the reference of parent's usage and n_children, otherwise, never be decreased. v2: thanks David's suggest, move increasing reference of parent if success use kmem_cache_free() freeing primary_index directly v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;" Cc: stable@vger.kernel.org # v2.6.30+ Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:06:53 -05:00
Daniel Borkmann	0ee9608c89	debugfs: fix refcount imbalance in start_creating In debugfs' start_creating(), we pin the file system to safely access its root. When we failed to create a file, we unpin the file system via failed_creating() to release the mount count and eventually the reference of the vfsmount. However, when we run into an error during lookup_one_len() when still in start_creating(), we only release the parent's mutex but not so the reference on the mount. Looks like it was done in the past, but after splitting portions of __create_file() into start_creating() and end_creating() via `190afd81e4` ("debugfs: split the beginning and the end of __create_file() off"), this seemed missed. Noticed during code review. Fixes: `190afd81e4` ("debugfs: split the beginning and the end of __create_file() off") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-11-11 02:04:44 -05:00
Andrew Morton	ce5c2d2c25	arm64: fixup for mm renames __GFP_WAIT was renamed for __GFP_RECLAIM and the gfpflags_allow_blocking() helper was added. Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-07 16:45:51 -08:00
Linus Torvalds	ad804a0b2a	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...	2015-11-07 14:32:45 -08:00
Linus Torvalds	ab9f2faf8f	Initial 4.4 merge window submission - "Checksum offload support in user space" enablement - Misc cxgb4 fixes, add T6 support - Misc usnic fixes - 32 bit build warning fixes - Misc ocrdma fixes - Multicast loopback prevention extension - Extend the GID cache to store and return attributes of GIDs - Misc iSER updates - iSER clustering update - Network NameSpace support for rdma CM - Work Request cleanup series - New Memory Registration API -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWPO5UAAoJELgmozMOVy/dSCQP/iX2ImMZOS3VkOYKhLR3dSv8 4vTEiYIoAT1JEXiPpiabuuACwotcZcMRk9kZ0dcWmBoFusTzKJmoDOkgAYd95XqY EsAyjqtzUGNNMjH5u5W+kdbaFdH9Ktq7IJvspRlJuvzC47Srax+qBxX01jrAkDgh 4PoA3hEa2KkvkDjY2Mhvk9EWd/uflO9Ky6o0D8jUQkWtEvKBRyDjQLk30oW6wHX9 pTWqww3dD0EXTrR+PDA88v2saKH1kZFU1Nt2eU8Bw+zlJM8hcX6U7PfRX0g3HT/J o+7ejTdLPWFDH35gJOU+KE519f1JbwfRjPJCqbOC9IttBB7iHSbhcpQLpWv4JV1x agdBeDA3TGQj3dHb2SkYMlWXCBp7q8UCbVGvvirTFzGSGU73sc6hhP+vCKvPQIlE Ah5tUqD7Y3mOBjvuDeIzKMLXILd5d3cH+m7Laytrf5e7fJPmBRZyOkcMh0QVElyl mKo+PFjghgeTFb405J7SDDw/vThVyN9HyIt7AGEzObaajzOOk9R1hkQr46XVy9TK yi58fl85yQ2n6TWV6NRnvkQoMy/N2HAEuXk/7HtO0PabV5w3Lo0zvXB9SnVrrVEm 58FWRBYCWorVSdSacuDnPm0iz45WSRIb9G9sBlhEC93eXRq2rSBoy4RvyLeliHFH hllyhNNolI6FJ64j07Xm =bBIY -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "This is my initial round of 4.4 merge window patches. There are a few other things I wish to get in for 4.4 that aren't in this pull, as this represents what has gone through merge/build/run testing and not what is the last few items for which testing is not yet complete. - "Checksum offload support in user space" enablement - Misc cxgb4 fixes, add T6 support - Misc usnic fixes - 32 bit build warning fixes - Misc ocrdma fixes - Multicast loopback prevention extension - Extend the GID cache to store and return attributes of GIDs - Misc iSER updates - iSER clustering update - Network NameSpace support for rdma CM - Work Request cleanup series - New Memory Registration API" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits) IB/core, cma: Make __attribute_const__ declarations sparse-friendly IB/core: Remove old fast registration API IB/ipath: Remove fast registration from the code IB/hfi1: Remove fast registration from the code RDMA/nes: Remove old FRWR API IB/qib: Remove old FRWR API iw_cxgb4: Remove old FRWR API RDMA/cxgb3: Remove old FRWR API RDMA/ocrdma: Remove old FRWR API IB/mlx4: Remove old FRWR API support IB/mlx5: Remove old FRWR API support IB/srp: Dont allocate a page vector when using fast_reg IB/srp: Remove srp_finish_mapping IB/srp: Convert to new registration API IB/srp: Split srp_map_sg RDS/IW: Convert to new memory registration API svcrdma: Port to new memory registration API xprtrdma: Port to new memory registration API iser-target: Port to new memory registration API IB/iser: Port to new fast registration API ...	2015-11-07 13:33:07 -08:00
Linus Torvalds	75021d2859	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial updates from Jiri Kosina: "Trivial stuff from trivial tree that can be trivially summed up as: - treewide drop of spurious unlikely() before IS_ERR() from Viresh Kumar - cosmetic fixes (that don't really affect basic functionality of the driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek - various comment / printk fixes and updates all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: bcache: Really show state of work pending bit hwmon: applesmc: fix comment typos Kconfig: remove comment about scsi_wait_scan module class_find_device: fix reference to argument "match" debugfs: document that debugfs_remove*() accepts NULL and error values net: Drop unlikely before IS_ERR(_OR_NULL) mm: Drop unlikely before IS_ERR(_OR_NULL) fs: Drop unlikely before IS_ERR(_OR_NULL) drivers: net: Drop unlikely before IS_ERR(_OR_NULL) drivers: misc: Drop unlikely before IS_ERR(_OR_NULL) UBI: Update comments to reflect UBI_METAONLY flag pktcdvd: drop null test before destroy functions	2015-11-07 13:05:44 -08:00
Linus Torvalds	6f1da317ac	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: "Highlights: - Intel Skylake Win8 precision touchpads support fixes/improvements from Mika Westerberg - Lenovo Yoga 2 quirk from Ritesh Raj Sarraf - potential uninitialized buffer access fix in HID core from Richard Purdie - Wacom Intuos and Wacom Cintiq 2 support improvements from Jason Gerecke and Ping Cheng - initiation of sysfs deprecation process for most of the roccat drivers, from the roccat support maintiner Stefan Achatz - quite a few device ID / quirk additions and small fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (30 commits) HID: logitech: Add support for G29 HID: logitech: Simplify wheel detection scheme HID: wacom: Call 'wacom_query_tablet_data' only after 'hid_hw_start' HID: wacom: Fix ABS_MISC reporting for Cintiq Companion 2 HID: wacom: Remove useless conditions from 'wacom_query_tablet_data' HID: wacom: fix Intuos wireless report id issue HID: fix some indenting issues HID: wacom: Expect 'touch_max' touches if HID_DG_CONTACTCOUNT not present HID: wacom: Tie cached HID_DG_CONTACTCOUNT indices to report ID HID: roccat: Fixed resubmit: Deprecating most Roccat sysfs attributes HID: wacom: Report full pressure range for Intuos, Cintiq 13HD Touch HID: wacom: Add support for Cintiq Companion 2 HID: multitouch: Fetch feature reports on demand for Win8 devices HID: sensor-hub: Add quirk for Lenovo Yoga 2 with ITE Chips HID: usbhid: Fix for the WiiU adapter from Mayflash HID: corsair: boolify struct k90_led.removed HID: corsair: Add Corsair Vengeance K90 driver HID: hid-input: allow input_configured callback return errors HID: multitouch: Add suffix for HID_DG_TOUCHPAD HID: i2c-hid: Fill in physical device providing HID functionality ...	2015-11-07 12:49:27 -08:00
Linus Torvalds	99aaa9c64b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fix from Jiri Kosina: "A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset (as in such case it's possible for module struct to share a page with executable text, which is currently not being handled with grace) from Josh Poimboeuf" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX	2015-11-07 12:15:17 -08:00
Davidlohr Bueso	5f2a2d5d42	ipc,msg: drop dst nil validation in copy_msg `d0edd85283` ("ipc: convert invalid scenarios to use WARN_ON") relaxed the nil dst parameter check, originally being a full BUG_ON. However, this check seems quite unnecessary when the only purpose is for ceckpoint/restore (MSG_COPY flag): o The copy variable is set initially to nil, apparently as a way of ensuring that prepare_copy is previously called. Which is in fact done, unconditionally at the beginning of do_msgrcv. o There is no concurrency with 'copy' (stack allocated in do_msgrcv). Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg) should always handled by IS_ERR() family. Therefore remove this check altogether as it can never occur with the current users. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Anish Bhatt	cb7ae262e2	include/linux/zutil.h: fix usage example of zlib_adler32() alder32 was renamed to zlib_adler32 since before 2.6.11. Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Vitaly Kuznetsov	08d78658f3	panic: release stale console lock to always get the logbuf printed out In some cases we may end up killing the CPU holding the console lock while still having valuable data in logbuf. E.g. I'm observing the following: - A crash is happening on one CPU and console_unlock() is being called on some other. - console_unlock() tries to print out the buffer before releasing the lock and on slow console it takes time. - in the meanwhile crashing CPU does lots of printk()-s with valuable data (which go to the logbuf) and sends IPIs to all other CPUs. - console_unlock() finishes printing previous chunk and enables interrupts before trying to print out the rest, the CPU catches the IPI and never releases console lock. This is not the only possible case: in VT/fb subsystems we have many other console_lock()/console_unlock() users. Non-masked interrupts (or receiving NMI in case of extreme slowness) will have the same result. Getting the whole console buffer printed out on crash should be top priority. [akpm@linux-foundation.org: tweak comment text] Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Baoquan He <bhe@redhat.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Robin Murphy	7f8306429c	dma-debug: check nents in dma_sync_sg* Like dma_unmap_sg, dma_sync_sg* should be called with the original number of entries passed to dma_map_sg, so do the same check in the sync path as we do in the unmap path. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Sakari Ailus <sakari.ailus@iki.fi> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Robin Murphy	002edb6f6f	dma-mapping: tidy up dma_parms default handling Many DMA controllers and other devices set max_segment_size to indicate their scatter-gather capability, but have no interest in segment_boundary_mask. However, the existence of a dma_parms structure precludes the use of any default value, leaving them as zeros (assuming a properly kzalloc'ed structure). If a well-behaved IOMMU (or SWIOTLB) then tries to respect this by ensuring a mapped segment does not cross a zero-byte boundary, hilarity ensues. Since zero is a nonsensical value for either parameter, treat it as an indicator for "default", as might be expected. In the process, clean up a bit by replacing the bare constants with slightly more meaningful macros and removing the superfluous "else" statements. [akpm@linux-foundation.org: dma-mapping.h needs sizes.h for SZ_64K] Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Sakari Ailus <sakari.ailus@iki.fi> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Ben Segall	8639b46139	pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of the current pid namespace. This is in contrast to both the other modes of setpriority and the example of kill(-1). Fix this. getpriority and ioprio have the same failure mode, fix them too. Eric said: : After some more thinking about it this patch sounds justifiable. : : My goal with namespaces is not to build perfect isolation mechanisms : as that can get into ill defined territory, but to build well defined : mechanisms. And to handle the corner cases so you can use only : a single namespace with well defined results. : : In this case you have found the two interfaces I am aware of that : identify processes by uid instead of by pid. Which quite frankly is : weird. Unfortunately the weird unexpected cases are hard to handle : in the usual way. : : I was hoping for a little more information. Changes like this one we : have to be careful of because someone might be depending on the current : behavior. I don't think they are and I do think this make sense as part : of the pid namespace. Signed-off-by: Ben Segall <bsegall@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ambrose Feinstein <ambrose@google.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Minfei Huang	de90a6bcae	kexec: use file name as the output message prefix kexec output message misses the prefix "kexec", when Dave Young split the kexec code. Now, we use file name as the output message prefix. Currently, the format of output message: [ 140.290795] SYSC_kexec_load: hello, world [ 140.291534] kexec: sanity_check_segment_list: hello, world Ideally, the format of output message: [ 30.791503] kexec: SYSC_kexec_load, Hello, world [ 79.182752] kexec_core: sanity_check_segment_list, Hello, world Remove the custom prefix "kexec" in output message. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Greg Thelen	0f930902eb	fs, seqfile: always allow oom killer Since `5cec38ac86` ("fs, seq_file: fallback to vmalloc instead of oom kill processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or smaller allocations; but larger allocations can use the oom killer via vmalloc(). Thus reads of small files can return ENOMEM, but larger files use the oom killer to avoid ENOMEM. The effect of this bug is that reads from /proc and other virtual filesystems can return ENOMEM instead of the preferred behavior - oom killing something (possibly the calling process). I don't know of anyone except Google who has noticed the issue. I suspect the fix is more needed in smaller systems where there isn't any reclaimable memory. But these seem like the kinds of systems which probably don't use the oom killer for production situations. Memory overcommit requires use of the oom killer to select a victim regardless of file size. Enable oom killer for small seq_buf_alloc() allocations. Fixes: `5cec38ac86` ("fs, seq_file: fallback to vmalloc instead of oom kill processes") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Andy Shevchenko	25c6bb76ea	seq_file: reuse string_escape_str() strint_escape_str() escapes input string by given criteria. In case of seq_escape() the criteria is to convert some characters to their octal representation. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Andy Shevchenko	8b91a318e4	fs/seq_file: use seq_* helpers in seq_hex_dump() This improves code readability. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Oleg Nesterov	d61ba58953	coredump: change zap_threads() and zap_process() to use for_each_thread() Change zap_threads() paths to use for_each_thread() rather than while_each_thread(). While at it, change zap_threads() to avoid the nested if's to make the code more readable and lessen the indentation. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Kyle Walker <kwalker@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stanislav Kozina <skozina@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Oleg Nesterov	5fa534c987	coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP task_will_free_mem() is wrong in many ways, and in particular the SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the coredumping without SIGNAL_GROUP_COREDUMP bit set. change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if other CLONE_VM processes can't react to SIGKILL. Fortunately, at least oom-kill case if fine; it kills all tasks sharing the same mm, so it should also kill the process which actually dumps the core. The change in prepare_signal() is not strictly necessary, it just ensures that the patch does not bring another subtle behavioural change. But it reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Kyle Walker <kwalker@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Stanislav Kozina <skozina@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Oleg Nesterov	9317bb9696	signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason, SIGCONT will wake a stopped task up even if it is ignored. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Oleg Nesterov	9a13049e83	signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() jffs2_garbage_collect_thread() can race with SIGCONT and sleep in TASK_STOPPED state after it was already sent. Add the new helper, kernel_signal_stop(), which does this correctly. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Oleg Nesterov	be0e6f290f	signal: turn dequeue_signal_lock() into kernel_dequeue_signal() 1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This matches another "for kthreads only" kernel_sigaction() helper. 2. Remove the "tsk" and "mask" arguments, they are always current and current->blocked. And it is simply wrong if tsk != current. 3. We could also remove the 3rd "siginfo_t *info" arg but it looks potentially useful. However we can simplify the callers if we change kernel_dequeue_signal() to accept info => NULL. 4. Remove _irqsave, it is never called from atomic context. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Felipe Balbi <balbi@ti.com> Cc: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Oleg Nesterov	2e01fabe67	signals: kill block_all_signals() and unblock_all_signals() It is hardly possible to enumerate all problems with block_all_signals() and unblock_all_signals(). Just for example, 1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is multithreaded. Another thread can dequeue the signal and force the group stop. 2. Even is the caller is single-threaded, it will "stop" anyway. It will not sleep, but it will spin in kernel space until SIGCONT or SIGKILL. And a lot more. In short, this interface doesn't work at all, at least the last 10+ years. Daniel said: Yeah the only times I played around with the DRM_LOCK stuff was when old drivers accidentally deadlocked - my impression is that the entire DRM_LOCK thing was never really tested properly ;-) Hence I'm all for purging where this leaks out of the drm subsystem. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Dave Airlie <airlied@redhat.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Ryusuke Konishi	4f05028f8d	nilfs2: fix gcc uninitialized-variable warnings in powerpc build Some false positive warnings are reported for powerpc build. The following warnings are reported in http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/ CC fs/nilfs2/super.o fs/nilfs2/super.c: In function 'nilfs_resize_fs': fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here CC fs/nilfs2/recovery.o fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs': fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here fs/nilfs2/recovery.c: In function 'nilfs_search_super_root': fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized] Another similar warning is reported in http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/ CC fs/nilfs2/btree.o fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert': include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized] fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here This cleans out these warnings by forcing the variables to be initialized. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Ryusuke Konishi	09ef29e0f6	nilfs2: fix gcc unused-but-set-variable warnings Fix the following build warnings: $ make W=1 [...] CC [M] fs/nilfs2/btree.o fs/nilfs2/btree.c: In function 'nilfs_btree_split': fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable] __u64 newptr; ^ fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable] __u64 newkey; ^ CC [M] fs/nilfs2/dat.o fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end': fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable] __u64 start; ^ CC [M] fs/nilfs2/segment.o fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush': fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable] int err; ^ CC [M] fs/nilfs2/sufile.o fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc': fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable] unsigned long nsegments, ncleansegs, nsus, cnt; ^ CC [M] fs/nilfs2/alloc.o fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry': fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable] unsigned long n, entries_per_group, groups_per_desc_block; ^ Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Ryusuke Konishi	c35c7ac5da	MAINTAINERS: nilfs2: add header file for tracing This adds header file "include/trace/events/nilfs2.h" to maintainer-ship of nilfs2 so that updates to the nilfs2 header file go to the mailing list of nilfs2. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Hitoshi Mitake	a9cd207c23	nilfs2: add tracepoints for analyzing reading and writing metadata files This patch adds tracepoints for analyzing requests of reading and writing metadata files. The tracepoints cover every in-place mdt files (cpfile, sufile, and datfile). Example of tracing mdt_insert_new_block(): cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155 cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5 cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: TK Kato <TK.Kato@wdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Hitoshi Mitake	83eec5e6dd	nilfs2: add tracepoints for analyzing sufile manipulation This patch adds tracepoints which would be useful for analyzing segment usage from a perspective of high level sufile manipulation (check, alloc, free). sufile is an important in-place updated metadata file, so analyzing the behavior would be useful for performance turning. example of usage (a case of allocation): $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end. segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2 segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3 Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benixon Dhas <benixon.dhas@wdc.com> Cc: TK Kato <TK.Kato@wdc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Hitoshi Mitake	44fda11460	nilfs2: add a tracepoint for transaction events This patch adds a tracepoint for transaction events of nilfs. With the tracepoint, these events can be tracked: begin, abort, commit, trylock, lock, and unlock. Basically, these events have corresponding functions e.g. begin event corresponds nilfs_transaction_begin(). The unlock event is an exception. It corresponds to the iteration in nilfs_transaction_lock(). Only one tracepoint is introcued: nilfs2_transaction_transition. The above events are distinguished with newly introduced enum. With this tracepoint, we can analyse a critical section of segment constructoin. Sample output by tpoint of perf-tools: cp-4457 [000] ...1 63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT segctord-4371 [001] ...1 68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK segctord-4371 [001] ...1 68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK segctord-4371 [001] ...1 68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN segctord-4371 [001] ...1 68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT segctord-4371 [001] ...1 68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK segctord-4371 [001] ...1 132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK This patch also does trivial cleaning of comma usage in collection stage transition event for consistent coding style. Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Hitoshi Mitake	5849770383	nilfs2: add a tracepoint for tracking stage transition of segment construction This patch adds a tracepoint for tracking stage transition of block collection in segment construction. With the tracepoint, we can analysis the behavior of segment construction in depth. It would be useful for bottleneck detection and debugging, etc. The tracepoint is created with the standard trace API of linux (like ext3, ext4, f2fs and btrfs). So we can analysis with existing tools easily. Of course, more detailed analysis will be possible if we can create nilfs specific analysis tools. Below is an example of event dump with Brendan Gregg's perf-tools (https://github.com/brendangregg/perf-tools). Time consumption between each stage can be obtained. $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end. segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE For capturing transition correctly, this patch adds wrappers for the member scnt of nilfs_cstage. With this change, every transition of the stage can produce trace event in a correct manner. Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Ryusuke Konishi	d0c14a9ee7	nilfs2: free unused dat file blocks during garbage collection As a nilfs2 volume ages, the amount of available disk space decreases little by little due to bloat of DAT (disk address translation) metadata file. Even if we delete all files in a file system and free their block addresses from the DAT file through a garbage collection, empty DAT blocks are not freed. This fixes the issue by extending the deallocator of block addresses so that empty data blocks and empty bitmap blocks of DAT are deleted. The following comparison shows the effect of this patch. Each shows disk amount information of a nilfs2 volume that we cleaned out by deleting all files and running gc after having filled 90% of its capacity. Before: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 500105212 3022844 472072192 1% /test After: Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda1 500105212 16380 475078656 1% /test Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Ryusuke Konishi	da019954dd	nilfs2: add helper functions to delete blocks from dat file This adds delete functions for data blocks of metadata files using bitmap based allocator. nilfs_palloc_delete_entry_block() deletes an entry block (e.g. block storing dat entries), and nilfs_palloc_delete_bitmap_block() deletes a bitmap block, respectively. These helpers are intended to be used in the successive change on deallocator of block addresses ("nilfs2: free unused dat file blocks during garbage collection"). Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00

1 2 3 4 5 ...

557581 Commits All Branches Search

557581 Commits

All Branches