Commit Graph

469371 Commits

Author SHA1 Message Date
Miao Xie 443f24fee7 Btrfs: cleanup unused latest_devid and latest_trans in fs_devices
The member variants - latest_devid and latest_trans - of fs_devices structure
are set, but no one use them to do anything. so remove them.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:49 -07:00
Miao Xie 6ba40b615f Btrfs: update the comment of total_bytes and disk_total_bytes of btrfs_devie
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:48 -07:00
Miao Xie addc3fa74e Btrfs: Fix the problem that the dirty flag of dev stats is cleared
The io error might happen during writing out the device stats, and the
device stats information and dirty flag would be update at that time,
but the current code didn't consider this case, just clear the dirty
flag, it would cause that we forgot to write out the new device stats
information. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:46 -07:00
Miao Xie d5ee37bcb1 Btrfs: make the device lock and its protected data in the same cacheline
The lock in btrfs_device structure was far away from its protected data, it would
make CPU load the cache line twice when we accessed them, move them together.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:45 -07:00
Miao Xie 5f546063ce Btrfs: fix wrong generation check of super block on a seed device
The super block generation of the seed devices is not the same as the
filesystem which sprouted from them because we don't update the super
block on the seed devices when we change that new filesystem. So we
should not use the generation of that new filesystem to check the super
block generation on the seed devices, Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:44 -07:00
Miao Xie 17a9be2f28 Btrfs: fix wrong fsid check of scrub
All the metadata in the seed devices has the same fsid as the fsid
of the seed filesystem which is on the seed device, so we should check
them by the current filesystem. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:43 -07:00
David Sterba 2fad4e83e1 btrfs: wake up transaction thread from SYNC_FS ioctl
The transaction thread may want to do more work, namely it pokes the
cleaner ktread that will start processing uncleaned subvols.

This can be triggered by user via the 'btrfs fi sync' command, otherwise
there was a delay up to 30 seconds before the cleaner started to clean
old snapshots.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:42 -07:00
Wang Shilong c01a5c074c Btrfs: fix wrong max inline data size limit
inline data is stored from offset of @disk_bytenr in
struct btrfs_file_extent_item. So substracting total
size of struct btrfs_file_extent_item is wrong, fix it.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:40 -07:00
Wang Shilong 354877befa Btrfs: fix off-by-one in cow_file_range_inline()
Btrfs could still inline file data if its size is same as
page size, so don't skip max value here.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:39 -07:00
Wang Shilong 7816030eb4 Btrfs: fall into nocompression codes quickly if possible
If flag NOCOMPRESS is set which means bad compression ratio,
we could avoid call cow_file_range_async() for this case earlier.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:38 -07:00
Wang Shilong f79707b092 Btrfs: fix wrong skipping compression for an inode
If a file's compression ratios is bad, we will set NOCOMPRESS
flag for it, and it will skip compression for that inode next time.

However, if we remount fs to COMPRESS_FORCE, it still should try
if we could compress pages for that inode, this patch fix wrong
check for this problem.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:36 -07:00
Fabian Frederick d447d0da44 Btrfs: fix sparse warning
Fix the following sparse warning:
fs/btrfs/send.c:518:51: warning: incorrect type in argument 2 (different address spaces)
fs/btrfs/send.c:518:51:    expected char const [noderef] <asn:1>*<noident>
fs/btrfs/send.c:518:51:    got char *

We can safely use (const char __user *) with set_fs(KERNEL_DS)

__force added to avoid sparse-all warning:
fs/btrfs/send.c:518:40: warning: cast adds address space to expression (<asn:1>)

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Zach Brown <zab@zabbo.net>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:35 -07:00
HIMANGI SARAOGI 14586651ed Btrfs: use BUG_ON
Use BUG_ON(x) rather than if(x) BUG();

The semantic patch that fixes this problem is as follows:

// <smpl>
@@ identifier x; @@
-if (x) BUG();
+BUG_ON(x);
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:34 -07:00
Sergey Senozhatsky 7880991344 btrfs compression: merge inflate and deflate z_streams
`struct workspace' used for zlib compression contains two zlib
z_stream-s: `def_strm' used in zlib_compress_pages(), and `inf_strm'
used in zlib_decompress/zlib_decompress_biovec(). None of these
functions use `inf_strm' and `def_strm' simultaniously, meaning that
for every compress/decompress operation we need only one z_stream
(out of two available).

`inf_strm' and `def_strm' are different in size of ->workspace. For
inflate stream we vmalloc() zlib_inflate_workspacesize() bytes, for
deflate stream - zlib_deflate_workspacesize() bytes. On my system zlib
returns the following workspace sizes, correspondingly: 42312 and 268104
(+ guard pages).

Keep only one `z_stream' in `struct workspace' and use it for both
compression and decompression. Hence, instead of vmalloc() of two
z_stream->worskpace-s, allocate only one of size:
	max(zlib_deflate_workspacesize(), zlib_inflate_workspacesize())

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:33 -07:00
Filipe Manana 555e128640 Btrfs: set error return value in btrfs_get_blocks_direct
We were returning with 0 (success) because we weren't extracting the
error code from em (PTR_ERR(em)). Fix it.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:32 -07:00
Filipe Manana 27a3507de9 Btrfs: reduce size of struct extent_state
The tree field of struct extent_state was only used to figure out if
an extent state was connected to an inode's io tree or not. For this
we can just use the rb_node field itself.

On a x86_64 system with this change the sizeof(struct extent_state) is
reduced from 96 bytes down to 88 bytes, meaning that with a page size
of 4096 bytes we can now store 46 extent states per page instead of 42.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:30 -07:00
Fabian Frederick 6f84e23646 btrfs: use PTR_ERR_OR_ZERO
replace IS_ERR/PTR_ERR

Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:29 -07:00
Wang Shilong 29549aec76 Btrfs: print btrfs specific info for some fatal error cases
Marc argued that if there are several btrfs filesystems mounted,
while users even don't know which filesystem hit the corrupted
errors something like generation verification failure.

Since @extent_buffer structure has a member @fs_info, let's output
btrfs device info.

Reported-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:28 -07:00
Miao Xie d20983b40e Btrfs: fix writing data into the seed filesystem
If we mounted a seed filesystem with degraded option, and then added a new
device into the seed filesystem, then we found adding device failed because
of the IO failure.

Steps to reproduce:
 # mkfs.btrfs -d raid1 -m raid1 <dev0> <dev1>
 # btrfstune -S 1 <dev0>
 # mount <dev0> -o degraded <mnt>
 # btrfs device add -f <dev2> <mnt>

It is because the original didn't set the chunk on the seed device to be
read-only if the degraded flag was set. It was introduced by patch f48b90756,
which fixed the problem the raid1 filesystem became read-only after one device
of it was missing. But this fix method was not right, we should set the read-only
flag according to the number of the missing devices, not the degraded mount
option, if the number of the missing devices is less than the max error number
that the profile of the chunk tolerates, we don't set it to be read-only.

Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:27 -07:00
Wang Shilong 47059d930f Btrfs: make defragment work with nodatacow option
Btrfs defragment will utilize COW feature, which means this
did not work for nodatacow option, this problem was detected
by xfstests generic/018 with nodatacow mount option.

Fix this problem by forcing cow for a extent with state
@EXTETN_DEFRAG setting.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:26 -07:00
Satoru Takeuchi 48fcc3ff7d btrfs: label should not contain return char
Rediffed remaining parts of original patch from Anand Jain.  This makes
sure to avoid trailing newlines in the btrfs label output

reproducer.sh:

===============================================================================

TEST_DEV=/dev/vdb
TEST_DIR=/home/sat/mnt

umount /home/sat/mnt

mkfs.btrfs -f $TEST_DEV
UUID=$(btrfs fi show $TEST_DEV | head -1 | sed -e 's/.*uuid: \([-0-9a-z]*\)$/\1/')
mount $TEST_DEV $TEST_DIR
LABELFILE=/sys/fs/btrfs/$UUID/label

echo "Test for empty label..." >&2
LINES="$(cat $LABELFILE | wc -l | awk '{print $1}')"
RET=0

if [ $LINES -eq 0 ] ; then
    echo '[PASS] Trailing \n is removed correctly.' >&2
else
    echo '[FAIL] Trailing \n still exists.' >&2
    RET=1
fi

echo "Test for non-empty label..." >&2

echo testlabel >$LABELFILE
LINES="$(cat $LABELFILE | wc -l | awk '{print $1}')"

if [ $LINES -eq 1 ] ; then
    echo '[PASS] Trailing \n is removed correctly.' >&2
else
    echo '[FAIL] Trailing \n still exists.' >&2
    RET=1
fi

exit $RET
===============================================================================

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:25 -07:00
Anand Jain ec95d4917b btrfs: device delete must be sysloged
as in the disk add patch, disk detached from the volume must be
recorded in the syslog as well for the same reason.

Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:23 -07:00
Anand Jain 43d2076168 btrfs: device add must be sysloged
when we add a new disk to the mounted btrfs we don't record it
as of now, disk add is a critical change of btrfs configuration,
it must be recorded in the syslog to help offline investigations
of customer problems when reported.

Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:20 -07:00
Wang Shilong 4027e0f4c4 Btrfs: clear compress-force when remounting with compress option
Steps to reproduce:
 # mkfs.btrfs -f /dev/sdb
 # mount /dev/sdb /mnt -o compress-force=lzo
 # mount /dev/sdb /mnt -o remount,compress=zlib
 # cat /proc/mounts

Remounting from compress-force to compress could not clear compress-force
option. The problem is there is no way for users to clear compress-force
option separately.

Fix this problem by clearing @FORCE_COMPRESS flag when remounting to
compress=xxx.

Suggested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Tested-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:19 -07:00
David Sterba ed6078f703 btrfs: use DIV_ROUND_UP instead of open-coded variants
The form

  (value + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT

is equivalent to

  (value + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE

The rest is a simple subsitution, no difference in the generated
assembly code.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:17 -07:00
David Sterba 4e54b17ad6 btrfs: clean away stripe_align helper
Only wraps the ALIGN macro.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:16 -07:00
David Sterba 707e8a0715 btrfs: use nodesize everywhere, kill leafsize
The nodesize and leafsize were never of different values. Unify the
usage and make nodesize the one. Cleanup the redundant checks and
helpers.

Shaves a few bytes from .text:

  text    data     bss     dec     hex filename
852418   24560   23112  900090   dbbfa btrfs.ko.before
851074   24584   23112  898770   db6d2 btrfs.ko.after

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:14 -07:00
David Sterba 962a298f35 btrfs: kill the key type accessor helpers
btrfs_set_key_type and btrfs_key_type are used inconsistently along with
open coded variants. Other members of btrfs_key are accessed directly
without any helpers anyway.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:12 -07:00
David Sterba 3abdbd780e btrfs: make close_ctree return void
There's no user of the return value and we can get rid of the comment in
put_super.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:11 -07:00
David Sterba 57cdc8db21 btrfs: cleanup ino cache members of btrfs_root
The naming is confusing, generic yet used for a specific cache. Add a
prefix 'ino_' or rename appropriately.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:09 -07:00
David Sterba c6f83c74fd btrfs: clenaup: don't call btrfs_release_path before free_path
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:08 -07:00
David Sterba 32471dc2ba btrfs: remove obsolete comment in btrfs_clean_one_deleted_snapshot
The comment applied when there was a BUG_ON.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:07 -07:00
Linus Torvalds 9e82bf0141 Linux 3.17-rc5 2014-09-14 17:50:12 -07:00
Linus Torvalds 83373f7028 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "double iput() on failure exit in lustre, racy removal of spliced
  dentries from ->s_anon in __d_materialise_dentry() plus a bunch of
  assorted RCU pathwalk fixes"

The RCU pathwalk fixes end up fixing a couple of cases where we
incorrectly dropped out of RCU walking, due to incorrect initialization
and testing of the sequence locks in some corner cases.  Since dropping
out of RCU walk mode forces the slow locked accesses, those corner cases
slowed down quite dramatically.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  be careful with nd->inode in path_init() and follow_dotdot_rcu()
  don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu()
  fix bogus read_seqretry() checks introduced in b37199e
  move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon)
  [fix] lustre: d_make_root() does iput() on dentry allocation failure
2014-09-14 17:37:36 -07:00
Linus Torvalds 9226b5b440 vfs: avoid non-forwarding large load after small store in path lookup
The performance regression that Josef Bacik reported in the pathname
lookup (see commit 99d263d4c5 "vfs: fix bad hashing of dentries") made
me look at performance stability of the dcache code, just to verify that
the problem was actually fixed.  That turned up a few other problems in
this area.

There are a few cases where we exit RCU lookup mode and go to the slow
serializing case when we shouldn't, Al has fixed those and they'll come
in with the next VFS pull.

But my performance verification also shows that link_path_walk() turns
out to have a very unfortunate 32-bit store of the length and hash of
the name we look up, followed by a 64-bit read of the combined hash_len
field.  That screws up the processor store to load forwarding, causing
an unnecessary hickup in this critical routine.

It's caused by the ugly calling convention for the "hash_name()"
function, and easily fixed by just making hash_name() fill in the whole
'struct qstr' rather than passing it a pointer to just the hash value.

With that, the profile for this function looks much smoother.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-14 17:28:32 -07:00
Linus Torvalds 5910cfdce3 Merge branch 'parisc-3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
 "The most important patch is a new Light Weigth Syscall (LWS) for 8,
  16, 32 and 64 bit atomic CAS operations which is required in order to
  be able to implement the atomic gcc builtins on our platform.

  Other than that, we wire up the seccomp, getrandom and memfd_create
  syscalls, fixes a minor off-by-one bug and a wrong printk string"

* 'parisc-3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Implement new LWS CAS supporting 64 bit operations.
  parisc: Wire up seccomp, getrandom and memfd_create syscalls
  parisc: dino: fix %d confusingly prefixed with 0x in format string
  parisc: sys_hpux: NUL terminator is one past the end
2014-09-14 12:28:08 -07:00
Al Viro 4023bfc9f3 be careful with nd->inode in path_init() and follow_dotdot_rcu()
in the former we simply check if dentry is still valid after picking
its ->d_inode; in the latter we fetch ->d_inode in the same places
where we fetch dentry and its ->d_seq, under the same checks.

Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-14 14:24:47 -04:00
Al Viro 7bd88377d4 don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu()
return the value instead, and have path_init() do the assignment.  Broken by
"vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
which was Cc-stable with 2.6.38+ as destination.  This one should go where
it went.

To avoid dummy value returned in case when root is already set (it would do
no harm, actually, since the only caller that doesn't ignore the return value
is guaranteed to have nd->root *not* set, but it's more obvious that way),
lift the check into callers.  And do the same to set_root(), to keep them
in sync.

Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-14 14:19:44 -04:00
Linus Torvalds 02c1be3d0c NTB driver fixes for queue spread and buffer alignment. Also, update to
MAINTAINERS to reflect new e-mail address.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUFR7/AAoJEG5mS6x6i9Ij/84P/RniNXGFS4Kh+YuS5E6KdZ+C
 6rzpxlUkeEWkL+1Oy8xpLSJdWsCYajOmteP2V3ubQNlqzd0H/zN5W52/i4tIhVc1
 W7SXexh1tr18BeeiQZ0sX5IlBGzixJ9QDzsuf0cg5keX9RphVBNbm16d8DvW/Le3
 fj2LXyN+K82vy0GPZE0awi9TWfgj1oZobsKswjBp6Kz/z4gq11/rKQBeBvJbxSF/
 WttgvDWONHiuUyHuKYJq53KQ+B7Q6ANwtKfIvd2CpFb8dWlsYYMnr1eRmz4ZC5OM
 NDj2T4Dn5VrxqivmFJ+6d4EZK1agZXYXly+mkW21N2sMqL2/fq7b6CgbKzy76PLz
 sDc1e5WUTJu0AjfrtTQBKG/ZbxXxvpMsarO3rI3C2Oos9DUPnDG8eHsHeOSoZQXi
 tyUyIu4IyH4x7BWO//srU7l82YXWrrGN8+YU+GcLsMGDfsOaR4DDURUu9iYNX3cA
 RoGfhmb+H41X+zOnzgnioyGMcK6mwVvsPxIzZ4CwsSa64aWlud6vbxoJr+q6K2ap
 KEGxT78gy32Jw9wsfYUJKQpa5By7FdpGUDFk/rTt4ZYVU8TR1e+ShCz2l7Ny5Ioh
 u13/vZ/VGVG3+MFjZ5In02egYx3fPnbilbavGCNRQM31/GJxHSc5p8xnoPxLmg70
 tVREVtRX23KREqt38CBf
 =uhqP
 -----END PGP SIGNATURE-----

Merge tag 'ntb-3.17' of git://github.com/jonmason/ntb

Pull ntb driver bugfixes from Jon Mason:
 "NTB driver fixes for queue spread and buffer alignment.  Also, update
  to MAINTAINERS to reflect new e-mail address"

* tag 'ntb-3.17' of git://github.com/jonmason/ntb:
  ntb: Add alignment check to meet hardware requirement
  MAINTAINERS: update NTB info
  NTB: correct the spread of queues over mw's
2014-09-14 10:54:12 -07:00
Linus Torvalds 8ac19f0d90 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull ARM irq chip fixes from Thomas Gleixner:
 "Another pile of ARM specific irq chip fixlets:

   - off by one bugs in the crossbar driver
   - missing annotations
   - a bunch of "make it compile" updates

  I pulled the lot today from Jason, but it has been in -next for at
  least a week"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: gic-v3: Declare rdist as __percpu pointer to __iomem pointer
  irqchip: gic: Make gic_default_routable_irq_domain_ops static
  irqchip: exynos-combiner: Fix compilation error on ARM64
  irqchip: crossbar: Off by one bugs in init
  irqchip: gic-v3: Tag all low level accessors __maybe_unused
  irqchip: gic-v3: Only define gic_peek_irq() when building SMP
2014-09-14 10:37:10 -07:00
Thomas Gleixner 938c04a870 irqchip fixes for v3.17
- gic-v3
     - SMP build fix
     - tag low level accessors __maybe_unused
     - declare rdist as __percpu
 
  - gic
     - staticize
 
  - crossbar
     - fix off-by-one bug
 
  - exynos-combiner
     - fix arm64 build error
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJUFUvDAAoJEP45WPkGe8Znvz8P/0uRjF7iabU2yJjWnZypo96W
 ir62rSy9tn1ITqHoLh9fv98CPvwGtIit1Kjkg+heLFCR5KMao8h+gov6clMMaf8b
 wQQUK+eezfN0Uxx2nM0wbcdpsbjFWQnIxQjreoADcneoRqZu8UZIWIoBU/VAC1PP
 m4jaUFWCaktQVCtvC4/gG60Zi4GQO2ll08Ixh71BWmmoS4A08RQV8PjlNfDm4E7H
 VoQI4ALCui1hl+BRIZw35Em0EDd+u8fYJTMfqP7750rgtSF4bNDCjuZQ8dCmfwoo
 MbNHgOGDfAfIC7kK4X6IUxqveqaDxIYvjoeek/Xq9Vc7p8xY3G5rJ8asd8ZxHOd7
 oOEexLbQeg7dLwAqPXjASppE4M7VyhzOwX9KVBoQz1oXIGvYIBuZ3FeOFEOnZW1K
 YdPCIdggnFI7yrResVgIz09bMfi4f4AflUWFrBO4nMseOrTesx1myg2rymp9mjzN
 WVV4vOl+xn0EhU/G4bTtn8MVjKZ5DhREtS33lTlN84kZktcQAxXbV/TY2oLPtFXu
 f8lPZpEkEvBa32IYQoA85UYbYouVo5yl7OAYKIM/DpEmZYUEnsdFR6DwgX/t4ghz
 4JLnOb0GJbYNyziLtGwp4sMGkdFIQ264STov+5Rf/Q5EkUl6G13TWRbkOcXCrkvP
 sSNXMGwgVKDKlZImU/IA
 =3wED
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-urgent-3.17' of git://git.infradead.org/users/jcooper/linux into irq/urgent

irqchip fixes for v3.17 from Jason Cooper

 - GIC/GICV3: Various fixlets
 - crossbar: Fix off-by-one bug
 - exynos-combiner: Fix arm64 build error
2014-09-14 15:20:54 +02:00
Dave Jiang 3cc5ba1938 ntb: Add alignment check to meet hardware requirement
The NTB translate register must have the value to be BAR size aligned.
This alignment check make sure that the DMA memory allocated has the
proper alignment. Another requirement for NTB to function properly with
memory window BAR size greater or equal to 4M is to use the CMA feature
in 3.16 kernel with the appropriate CONFIG_CMA_ALIGNMENT and
CONFIG_CMA_SIZE_MBYTES set.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2014-09-14 00:10:38 -04:00
Jon Mason 9ef6bf6c75 MAINTAINERS: update NTB info
Update my contact info to my personal email address and add Dave Jiang.

Signed-off-by: Jon Mason <jon.mason@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2014-09-14 00:10:38 -04:00
Jon Mason a1413cfbcb NTB: correct the spread of queues over mw's
The detection of an uneven number of queues on the given memory windows
was not correct.  The mw_num is zero based and the mod should be
division to spread them evenly over the mw's.

Signed-off-by: Jon Mason <jon.mason@intel.com>
2014-09-14 00:10:38 -04:00
Al Viro f5be3e2912 fix bogus read_seqretry() checks introduced in b37199e
read_seqretry() returns true on mismatch, not on match...

Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-13 22:14:16 -04:00
Al Viro 6f18493e54 move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon)
and lock the right list there

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-13 22:14:03 -04:00
Al Viro f77ced6637 [fix] lustre: d_make_root() does iput() on dentry allocation failure
double-free is a bad thing

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-09-13 22:13:39 -04:00
Linus Torvalds 1536340e7c Merge branches 'locking-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex and timer fixes from Thomas Gleixner:
 "A oneliner bugfix for the jinxed futex code:

   - Drop hash bucket lock in the error exit path.  I really could slap
     myself for intruducing that bug while fixing all the other horror
     in that code three month ago ...

  and the timer department is not too proud about the following fixes:

   - Deal with a long standing rounding bug in the timeval to jiffies
     conversion.  It's a real issue and this fix fell through the cracks
     for quite some time.

   - Another round of alarmtimer fixes.  Finally this code gets used
     more widely and the subtle issues hidden for quite some time are
     noticed and fixed.  Nothing really exciting, just the itty bitty
     details which bite the serious users here and there"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Unlock hb->lock in futex_wait_requeue_pi() error path

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimer: Lock k_itimer during timer callback
  alarmtimer: Do not signal SIGEV_NONE timers
  alarmtimer: Return relative times in timer_gettime
  jiffies: Fix timeval conversion to jiffies
2014-09-13 14:22:12 -07:00
Guy Martin 8920649120 parisc: Implement new LWS CAS supporting 64 bit operations.
The current LWS cas only works correctly for 32bit. The new LWS allows
for CAS operations of variable size.

Signed-off-by: Guy Martin <gmsoft@tuxicoman.be>
Cc: <stable@vger.kernel.org> # 3.13+
Signed-off-by: Helge Deller <deller@gmx.de>
2014-09-13 22:40:48 +02:00
Linus Torvalds 99d263d4c5 vfs: fix bad hashing of dentries
Josef Bacik found a performance regression between 3.2 and 3.10 and
narrowed it down to commit bfcfaa77bd ("vfs: use 'unsigned long'
accesses for dcache name comparison and hashing"). He reports:

 "The test case is essentially

      for (i = 0; i < 1000000; i++)
              mkdir("a$i");

  On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
  dir/sec with 3.10.  This is because we spend waaaaay more time in
  __d_lookup on 3.10 than in 3.2.

  The new hashing function for strings is suboptimal for <
  sizeof(unsigned long) string names (and hell even > sizeof(unsigned
  long) string names that I've tested).  I broke out the old hashing
  function and the new one into a userspace helper to get real numbers
  and this is what I'm getting:

      Old hash table had 1000000 entries, 0 dupes, 0 max dupes
      New hash table had 12628 entries, 987372 dupes, 900 max dupes
      We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash

  My test does the hash, and then does the d_hash into a integer pointer
  array the same size as the dentry hash table on my system, and then
  just increments the value at the address we got to see how many
  entries we overlap with.

  As you can see the old hash function ended up with all 1 million
  entries in their own bucket, whereas the new one they are only
  distributed among ~12.5k buckets, which is why we're using so much
  more CPU in __d_lookup".

The reason for this hash regression is two-fold:

 - On 64-bit architectures the down-mixing of the original 64-bit
   word-at-a-time hash into the final 32-bit hash value is very
   simplistic and suboptimal, and just adds the two 32-bit parts
   together.

   In particular, because there is no bit shuffling and the mixing
   boundary is also a byte boundary, similar character patterns in the
   low and high word easily end up just canceling each other out.

 - the old byte-at-a-time hash mixed each byte into the final hash as it
   hashed the path component name, resulting in the low bits of the hash
   generally being a good source of hash data.  That is not true for the
   word-at-a-time case, and the hash data is distributed among all the
   bits.

The fix is the same in both cases: do a better job of mixing the bits up
and using as much of the hash data as possible.  We already have the
"hash_32|64()" functions to do that.

Reported-by: Josef Bacik <jbacik@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-13 11:30:10 -07:00