Commit Graph

1997 Commits

Author SHA1 Message Date
Abhi Das 5e86d9d122 gfs2: time journal recovery steps accurately
This patch spits out the time taken by the various steps in the
journal recover process. Previously, the journal recovery time
didn't account for finding the journal head in the log which takes
up a significant portion of time.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-29 10:41:27 -07:00
Andreas Gruenbacher fffb64127a gfs2: Zero out fallocated blocks in fallocate_chunk
Instead of zeroing out fallocated blocks in gfs2_iomap_alloc, zero them
out in fallocate_chunk, much higher up the call stack.  This gets rid of
gfs2's abuse of the IOMAP_ZERO flag as well as the gfs2 specific zeronew
buffer flag.  I can't think of a reason why zeroing out the blocks in
gfs2_iomap_alloc would have any benefits: there is no additional locking
at that level that would add protection to the newly allocated blocks.

While at it, change fallocate over from gs2_block_map to gfs2_iomap_begin.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2018-03-29 06:50:32 -07:00
Christoph Hellwig 0e11f6443f fs: move I_DIRTY_INODE to fs.h
And use it in a few more places rather than opencoding the values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28 01:39:02 -04:00
Christoph Hellwig 937d330512 gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
I_DIRTY_DATASYNC is a strict superset of I_DIRTY_SYNC semantics, as
in mark dirty to be written out by fdatasync as well.  So dirtying
for both flags makes no sense.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-28 01:39:01 -04:00
Andreas Gruenbacher bb491ce67a gfs2: Check for the end of metadata in punch_hole
When punching a hole or truncating an inode down to a given size, also
check if the truncate point / start of the hole is within the range we
have metadata for.  Otherwise, we can end up freeing blocks that
shouldn't be freed, corrupting the inode, or crashing the machine when
trying to punch a hole into the void.

When growing an inode via truncate, we set the new size but we don't
allocate additional levels of indirect blocks and grow the inode height.
When shrinking that inode again, the new size may still point beyond the
end of the inode's metadata.

Fixes xfstest generic/476.

Debugged-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-23 11:43:02 -07:00
Andreas Gruenbacher ee6ed857c8 gfs2: gfs2_iomap_end tracepoint: log block address
In the gfs2_iomap_end tracepoint, log the physical block address, just
as in the gfs2_bmap tracepoint.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-15 07:17:17 -07:00
Andreas Gruenbacher d39d18e0ef gfs2: Improve gfs2_block_map comment
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-08 09:26:20 -07:00
Bob Peterson b9e03f1861 GFS2: Only set PageChecked for jdata pages
Before this patch, GFS2 was setting the PageChecked flag for ordered
write pages. This is unnecessary. The ext3 file system only does it
for jdata, and it's only used in jdata circumstances. It only muddies
the already murky waters of writing pages in the aops.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-08 09:26:20 -07:00
Bob Peterson 9bc980cdb9 GFS2: Make function gfs2_remove_from_ail static
Function gfs2_remove_from_ail is only ever used from log.c, so there
is no reason to declare it extern. This patch removes the extern and
declares it static.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-08 09:26:20 -07:00
Andreas Gruenbacher 83998ccd9b gfs2: Dirty source inode during rename
Mark the source inode dirty during a rename instead of just updating the
underlying buffer head.  Otherwise, fsync may find the inode clean and
will then skip flushing the journal.  A subsequent power failure will
cause the rename to be lost.  This happens in command sequences like:

  xfs_io -f -c 'pwrite 0 4096' -c 'fsync' foo
  mv foo bar
  xfs_io -c 'fsync' bar
  # power failure

Fixes xfstests generic/322, generic/376.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-08 09:26:20 -07:00
Andreas Gruenbacher 174d1232eb gfs2: Fix fallocate chunk size
The chunk size of allocations in __gfs2_fallocate is calculated
incorrectly.  The size can collapse, causing __gfs2_fallocate to
allocate one block at a time, which is very inefficient.  This needs
fixing in two places:

In gfs2_quota_lock_check, always set ap->allowed to UINT_MAX to indicate
that there is no quota limit.  This fixes callers that rely on
ap->allowed to be set even when quotas are off.

In __gfs2_fallocate, reset max_blks to UINT_MAX in each iteration of the
loop to make sure that allocation limits from one resource group won't
spill over into another resource group.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-08 09:26:20 -07:00
Andreas Gruenbacher 3b5da96e45 gfs2: Fixes to "Implement iomap for block_map" (2)
It turns out that commit 3229c18c0d6b2 'Fixes to "Implement iomap for
block_map"' introduced another bug in gfs2_iomap_begin that can cause
gfs2_block_map to set bh->b_size of an actual buffer to 0.  This can
lead to arbitrary incorrect behavior including crashes or disk
corruption.  Revert the incorrect part of that commit.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-07 11:40:38 -07:00
Andreas Gruenbacher 49edd5bf42 gfs2: Fixes to "Implement iomap for block_map"
It turns out that commit 3974320ca6 "Implement iomap for block_map"
introduced a few bugs that trigger occasional failures with xfstest
generic/476:

In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
beyond the end of the allocated metadata (height > ip->i_height).
There, we can end up calling hole_size with a metapath that doesn't
match the current metadata tree, which doesn't make sense.  After
untangling the code at do_alloc, fix this by checking if the block we
are looking for is within the range of allocated metadata.

In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
for reading stuffed files: this is handled separately.  Make sure we
don't truncate iomap->length for reads beyond the end of the file; in
that case, the entire range counts as a hole.

Finally, revert to taking a bitmap write lock when doing allocations.
It's unclear why that change didn't lead to any failures during testing.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-13 13:38:10 -07:00
Andreas Gruenbacher 7ac07fdaf8 gfs2: Glock dump performance regression fix
Restore an optimization removed in commit 7f19449553 "Fix debugfs glocks
dump": keep the glock hash table iterator active while the glock dump
file is held open.  This avoids having to rescan the hash table from the
start for each read, with quadratically rising runtime.

In addition, use rhastable_walk_peek for resuming a glock dump at the
current position: when a glock doesn't fit in the provided buffer
anymore, the next read must revisit the same glock.

Finally, also restart the dump from the first entry when we notice that
the hash table has been resized in gfs2_glock_seq_start.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-01 11:27:11 -07:00
Andreas Gruenbacher dcb2cd55cf gfs2: Fix the crc32c dependency
Depend on LIBCRC32C which uses the crypto API to select the appropriate
crc32c implementation.  With the CRYPTO and CRYPTO_CRC32C dependencies,
gfs2 would still need to use the crypto API directly like ext4 and btrfs
do, which isn't necessary.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-01 11:25:31 -07:00
Linus Torvalds b2fe5fa686 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Significantly shrink the core networking routing structures. Result
    of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf

 2) Add netdevsim driver for testing various offloads, from Jakub
    Kicinski.

 3) Support cross-chip FDB operations in DSA, from Vivien Didelot.

 4) Add a 2nd listener hash table for TCP, similar to what was done for
    UDP. From Martin KaFai Lau.

 5) Add eBPF based queue selection to tun, from Jason Wang.

 6) Lockless qdisc support, from John Fastabend.

 7) SCTP stream interleave support, from Xin Long.

 8) Smoother TCP receive autotuning, from Eric Dumazet.

 9) Lots of erspan tunneling enhancements, from William Tu.

10) Add true function call support to BPF, from Alexei Starovoitov.

11) Add explicit support for GRO HW offloading, from Michael Chan.

12) Support extack generation in more netlink subsystems. From Alexander
    Aring, Quentin Monnet, and Jakub Kicinski.

13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
    Russell King.

14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.

15) Many improvements and simplifications to the NFP driver bpf JIT,
    from Jakub Kicinski.

16) Support for ipv6 non-equal cost multipath routing, from Ido
    Schimmel.

17) Add resource abstration to devlink, from Arkadi Sharshevsky.

18) Packet scheduler classifier shared filter block support, from Jiri
    Pirko.

19) Avoid locking in act_csum, from Davide Caratti.

20) devinet_ioctl() simplifications from Al viro.

21) More TCP bpf improvements from Lawrence Brakmo.

22) Add support for onlink ipv6 route flag, similar to ipv4, from David
    Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
  tls: Add support for encryption using async offload accelerator
  ip6mr: fix stale iterator
  net/sched: kconfig: Remove blank help texts
  openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
  tcp_nv: fix potential integer overflow in tcpnv_acked
  r8169: fix RTL8168EP take too long to complete driver initialization.
  qmi_wwan: Add support for Quectel EP06
  rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
  ipmr: Fix ptrdiff_t print formatting
  ibmvnic: Wait for device response when changing MAC
  qlcnic: fix deadlock bug
  tcp: release sk_frag.page in tcp_disconnect
  ipv4: Get the address of interface correctly.
  net_sched: gen_estimator: fix lockdep splat
  net: macb: Handle HRESP error
  net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
  ipv6: addrconf: break critical section in addrconf_verify_rtnl()
  ipv6: change route cache aging logic
  i40e/i40evf: Update DESC_NEEDED value to reflect larger value
  bnxt_en: cleanup DIM work on device shutdown
  ...
2018-01-31 14:31:10 -08:00
Andreas Gruenbacher af38816e48 gfs2: Add a few missing newlines in messages
Some of the info, warning, and error messages are missing their trailing
newline.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-30 10:32:30 -07:00
Abhi Das 957a7acd46 gfs2: Remove inode from ordered write list in gfs2_write_inode()
The vfs clears the I_DIRTY inode flag before calling gfs2_write_inode()
having queued any data that needed to be written to disk.
This is a good time to remove such inodes from our ordered write list
so they don't hang around for long periods of time.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-30 10:00:27 -07:00
Bob Peterson 2eb5909dee GFS2: Don't try to end a non-existent transaction in unlink
Before this patch, if function gfs2_unlink failed to get a valid
transaction (for example, not enough journal blocks) it would go
to label out_end_trans which did gfs2_trans_end. But if the
trans_begin failed, there's no transaction to end, and trying to
do so results in: kernel BUG at fs/gfs2/trans.c:117!

This patch changes the goto so that it does not try to end a
non-existent transaction.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-29 10:00:23 -07:00
Bob Peterson 4519eaad72 GFS2: Fix minor comment typo
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-25 10:18:06 -07:00
Bob Peterson 805c090750 GFS2: Log the reason for log flushes in every log header
This patch just adds the capability for GFS2 to track which function
called gfs2_log_flush. This should make it easier to diagnose
problems based on the sequence of events found in the journals.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23 07:39:20 -07:00
Bob Peterson c1696fb85d GFS2: Introduce new gfs2_log_header_v2
This patch adds a new structure called gfs2_log_header_v2 which is used
to store expanded fields into previously unused areas of the log headers
(i.e., this change is backwards compatible).  Some of these are used for
debug purposes so we can backtrack when problems occur.  Others are
reserved for future expansion.

This patch is based on a prototype from Steve Whitehouse.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23 07:38:53 -07:00
Andreas Gruenbacher 0ff5916ad4 gfs2: Get rid of gfs2_log_header_in
Get rid of gfs2_log_header_in by integrating it into get_log_header.
Clean up the crc32 computations and use the same functions for encoding
and decoding to make things less confusing.  Eliminate lh_hash from
gfs2_log_header_host which is completely useless.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-22 07:06:15 -07:00
Andreas Gruenbacher 88b65ce5fd gfs2: Minor gfs2_page_add_databufs cleanup
The to parameter of gfs2_page_add_databufs is passed inconsistently:
once as from + len, once as from + len - 1.  Just pass len instead.

In addition, once we're past the end, we can immediately break out of
the loop.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 14:18:55 -07:00
Andreas Gruenbacher 235628c5c7 gfs2: Add gfs2_max_stuffed_size
Add a small inline function for computing the maximum size of a stuffed
inode instead of open coding that in several places throughout the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 14:18:53 -07:00
Andreas Gruenbacher 9db115a0e3 gfs2: Typo fixes
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 14:18:49 -07:00
Bob Peterson 786ebd9f68 Merge branch 'punch-hole' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 2018-01-18 14:17:13 -07:00
Andreas Gruenbacher 4e56a6411f gfs2: Implement fallocate(FALLOC_FL_PUNCH_HOLE)
Implement the top-level bits of punching a hole into a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 21:15:58 +01:00
Andreas Gruenbacher 10d2cf94c2 gfs2: Turn trunc_dealloc into punch_hole
Add an upper bound to the range of blocks to deallocate blocks to
function trunc_dealloc so that this function can be used for truncating
a file as well as for punching a hole into a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 21:15:57 +01:00
Andreas Gruenbacher 5cf26b1e88 gfs2: Generalize truncate code
Pull the code for computing the range of metapointers to iterate out of
gfs2_metapath_ra (for readahead), sweep_bh_for_rgrps (for deallocating
metapointers within a block), and trunc_dealloc (for walking the
metadata tree).

In sweep_bh_for_rgrps, move the code for looking up the resource group
descriptor of the current resource group out of the inner loop.  The
metatype check moves to trunc_dealloc.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 21:15:37 +01:00
Andreas Gruenbacher bdba0d5ec1 Turn gfs2_block_truncate_page into gfs2_block_zero_range
Turn gfs2_block_truncate_page into a function that zeroes a range within
a block rather than only the end of a block.  This will be used for
cleaning the end of the first partial block and the start of the last
partial block when punching a hole in a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:53 -07:00
Andreas Gruenbacher cb7f0903ef gfs2: Improve non-recursive delete algorithm
In rare cases, the current non-recursive delete algorithm doesn't
deallocate empty intermediary indirect blocks.  This should have very
little practical effect, but deallocating all blocks correctly should
still be preferable as it is cleaner and easier to validate.

The fix consists of using the first block to deallocate to compute the
start marker of the truncate point instead of the last block that needs
to be kept.  With that change, computing which indirect blocks are still
needed becomes relatively easy.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:52 -07:00
Andreas Gruenbacher c3ce5aa9b0 gfs2: Fix metadata read-ahead during truncate
The metadata read-ahead algorithm broke when switching from recursive to
non-recursive delete: the current algorithm reads ahead blocks at height
N - 1 while deallocating the blocks at hight N.  However, deallocating
the blocks at height N requires a complete walk of the metadata tree,
not only down to height N - 1.  Consequently, all blocks below height
N - 1 will be accessed without read-ahead.

Fix this by issuing read-aheads as early as possible, after each
metapath lookup.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:50 -07:00
Andreas Gruenbacher e8b43fe0c1 gfs2: Clean up {lookup,fillup}_metapath
Split out the entire lookup loop from lookup_metapath and
fillup_metapath.  Make both functions return the actual height in
mp->mp_aheight, and return 0 on success.  Handle lookup errors properly
in trunc_dealloc.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:48 -07:00
Andreas Gruenbacher e7fdf00406 gfs2: Remove minor gfs2_journaled_truncate inefficiencies
First, this function truncates the file in chunks.  When the original
file size isn't block aligned, each chunk that is truncated will remain
be misaligned.  This is inefficient.

Second, this function doesn't recognize where holes are, so it loops
through them.  For each chunk of a hole, it creates a new transaction.
At least avoid creating another transactions whe the current one is
still empty.  (An better fix would be to skip large holes, of course.)

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:47 -07:00
Andreas Gruenbacher 8b5860a35c gfs2: truncate: Remove unnecessary oldsize parameters
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:45 -07:00
Andreas Gruenbacher 80990f404d gfs2: Clean up trunc_start error path
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:42 -07:00
Andreas Gruenbacher da5eb9cdda gfs2: Remove pointless BUG_ON
The current transaction is being dereferenced before asserting that is
not NULL; that isn't going to help.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:35 -07:00
Steven Whitehouse 90bcab998d gfs2: Add gfs2_blk2rgrpd comment and fix incorrect use
Document when to use gfs2_blk2rgrpd for "inexact" resource group
matching.  Based on that, fix an incorrect use of gfs2_blk2rgrpd in
sweep_bh_for_rgrps.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:34:24 -07:00
Abhi Das 1f23bc7869 gfs2: Trim the ordered write list in gfs2_ordered_write()
We iterate through the entire ordered writes list in
gfs2_ordered_write() to write out inodes. It's a good
place to try and shrink the list by throwing out inodes
that don't have any pages.

Signed-off-by: Abhi Das <adas@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-22 07:55:31 -06:00
Bob Peterson 588bff95c9 GFS2: Reduce code redundancy writing log headers
Before this patch, there was a lot of code redundancy between functions
log_write_header (which uses bio) and clean_journal (which uses
buffer_head). This patch reduces the redundancy to simplify the code
and make log header writing more consistent. We want more consistency
and reduced redundancy because we plan to add a bunch of new fields
to improve performance (by eliminating the local statfs and quota files)
improve metadata integrity (by adding new crcs and such) and for better
debugging (by adding new fields to track when and where metadata was
pushed through the journals.) We don't want to duplicate setting these
new fields, nor allow for human error in the process.

This reduction in code redundancy is accomplished by introducing a new
helper function, gfs2_write_log_header which uses bio rather than bh.
That simplifies recovery function clean_journal() to use the new helper
function and iomap rather than redundancy and block_map (and eventually
we can maybe remove block_map). It also reduces our dependency on
buffer_heads.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-22 07:51:29 -06:00
Andrew Price 850d2d915f gfs2: Add a crc field to resource group headers
Add the rg_crc field to store a crc32 of the gfs2_rgrp structure. This
allows us to check resource group headers' integrity and removes the
requirement to check them against the rindex entries in fsck. If this
field is found to be zero, it should be ignored (or updated with an
accurate value).

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12 11:43:42 -06:00
Andrew Price 166725d963 gfs2: Add rindex fields to rgrp headers
Add rg_data0, rg_data and rg_bitbytes to struct gfs2_rgrp. The fields
are identical to their counterparts in struct gfs2_rindex and are
intended to reduce the use of the rindex. For now the fields are only
written back as the in-memory equivalents in struct gfs2_rgrpd are set
using values from the rindex. However, they are needed at this point so
that userspace can make use of them, allowing a migration away from the
rindex over time.

The new fields take up previously reserved space which was explicitly
zeroed on write so, in clusters with mixed kernels, these fields could
get zeroed after being set and this should not be treated as an error.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12 11:43:36 -06:00
Andrew Price 65adc27375 gfs2: Add a next-resource-group pointer to resource groups
Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The
rg_skip field has the following meaning:

- If rg_skip is zero, it is considered unset and not useful.
- If rg_skip is non-zero, its value will be the number of blocks between
  this rgrp's address and the next rgrp's address. This can be used as a
  hint by fsck.gfs2 when rebuilding a bad rindex, for example.

This will provide less dependency on the rindex in future, and allow
tools such as fsck.gfs2 to iterate the resource groups without keeping
the rindex around.

The field is updated in gfs2_rgrp_out() so that existing file systems
will have it set. This means that any resource groups that aren't ever
written will not be updated. The final rgrp is a special case as there
is no next rgrp, so it will always have a rg_skip of 0 (unless the fs is
extended).

Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so
the rg_skip field can get set back to 0 in cases where nodes with and
without this patch are mixed in a cluster. In some cases, the field may
bounce between being set by one node and then zeroed by another which
may harm performance slightly, e.g. when two nodes create many small
files. In testing this situation is rare but it becomes more likely as
the filesystem fills up and there are fewer resource groups to choose
from. The problem goes away when all nodes are running with this patch.
Dipping into the space currently occupied by the rg_reserved field would
have resulted in the same problem as it is also explicitly zeroed, so
unfortunately there is no other way around it.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-12 11:43:08 -06:00
Tom Herbert 97a6ec4ac0 rhashtable: Change rhashtable_walk_start to return void
Most callers of rhashtable_walk_start don't care about a resize event
which is indicated by a return value of -EAGAIN. So calls to
rhashtable_walk_start are wrapped wih code to ignore -EAGAIN. Something
like this is common:

       ret = rhashtable_walk_start(rhiter);
       if (ret && ret != -EAGAIN)
               goto out;

Since zero and -EAGAIN are the only possible return values from the
function this check is pointless. The condition never evaluates to true.

This patch changes rhashtable_walk_start to return void. This simplifies
code for the callers that ignore -EAGAIN. For the few cases where the
caller cares about the resize event, particularly where the table can be
walked in mulitple parts for netlink or seq file dump, the function
rhashtable_walk_start_check has been added that returns -EAGAIN on a
resize event.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-11 09:58:38 -05:00
Linus Torvalds 1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
Andreas Gruenbacher 9aa0159327 gfs2: Remove unused gfs2_write_jdata_pagevec parameter
As a follow-up to commit d2bc5b3c67, remove the end parameter which is
now unused.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-27 10:54:55 -06:00
Tetsuo Handa 8b0d7f56b9 gfs2: Fix wrong error handling in init_gfs2_fs()
init_gfs2_fs() is calling e.g. calling unregister_shrinker() without
register_shrinker() when an error occurred during initialization.
Rename goto labels and call appropriate undo function.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-27 10:47:22 -06:00
Bob Peterson a18c78c5f5 GFS2: Combine gfs2_free_di with gfs2_free_uninit_di
Before this patch, function gfs2_free_di was 4 lines of code, and
one of those lines was to call gfs2_free_uninit_di. Although
unlikely, if function gfs2_free_uninit_di encountered an error
finding the block to be freed, the error was silently ignored by the
caller, which went ahead and improperly did a quota-change operation
and meta_wipe despite the error. This patch combines the two
functions into one to make the code more readable and fixes the bug
by returning from the combined function before it takes those next
incorrect steps.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-11-27 10:47:14 -06:00
Mel Gorman 8667982014 mm, pagevec: remove cold parameter for pagevecs
Every pagevec_init user claims the pages being released are hot even in
cases where it is unlikely the pages are hot.  As no one cares about the
hotness of pages being released to the allocator, just ditch the
parameter.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Jan Kara 67fd707f46 mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages.  Just drop the argument.

Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Jan Kara d2bc5b3c67 gfs2: use pagevec_lookup_range_tag()
We want only pages from given range in gfs2_write_cache_jdata().  Use
pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove
unnecessary code.

Link: http://lkml.kernel.org/r/20171009151359.31984-9-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Linus Torvalds 29309a4eb8 We've got a total of 17 GFS2 patches for this merge window. The
patches are basically in three categories: (1) patches related to
 broken xfstest cases, (2) patches related to improving iomap and
 start using it in GFS2, and (3) general typos and clarifications.
 
 Please note that one of the iomap patches extends beyond GFS2 and
 affects other file systems, but it was publically reviewed by a
 variety of file system people in the community.
 
 1. Andreas has a patch that simply renames variable 'bsize' to 'factor'
    to clarify the logic related to gfs2_block_map.
 2. He also has a patch to correctly set ctime in the setflags ioctl,
    which fixes broken xfstests test 277.
 3. He also fixed broken xfstest 258, due to an atime initialization
    problem.
 4. He also fixed broken xfstest 307, in which GFS2 was not setting
    ctime when setting acls.
 5. He has a patch to switch general iomap code from blkno to disk
    offset for a variety of file systems.
 6. He has a patch to add a new IOMAP_F_DATA_INLINE flag for iomap
    to indicate blocks that have data mixed with metadata.
 7. I contributed a patch to make inode height info part of the
    'metapath' data structure to facilitate using iomap in GFS2.
 8. I have a patch to start using iomap inside GFS2 and switch GFS2's
    block_map functions to use iomap under the covers.
 9. I have a patch to switch GFS2's fiemap implementation from using
    block_map to using iomap under the covers.
 10. Andreas has a patch to implement SEEK_HOLE and SEEK_DATA via
     iomap in GFS2.
 11. I have a patch related to journaled data pages not being properly
     synced to media when writing inodes. This was caught with xfstests.
 12. I have a patch to fix another failing xfstest case in which
     switching a file from ordered_write to journaled data via set_flags
     caused a deadlock.
 13. Andreas has a patch to fix failing xfstest case 066, which was
     due to not properly syncing dirty inodes when changing extended
     attributes.
 14. Andreas fixed a minor typo in a comment.
 15. Andreas contributed a patch to partially fix xfstest 424, which
     involved GET_FLAGS and SET_FLAGS ioctl. This is also a cleanup
     and simplification of the translation of flags from fs flags to
     gfs2 flags.
 16. He also added support for STATX_ATTR_ in statx, which fixed broken
     xfstest 424.
 17. He also contributed a fix for failing xfstest 093 which fixes a
     recursive glock problem with gfs2_xattr_get and _set.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJaCx5uAAoJENeLYdPf93o7Nb8H/RLJ2CsEbTSJQ82RH4eptoxe
 XbQ4HVig9Hm8k5teSTH9DdVypkxjPtbJZY9k1Y4mEtddtCZ/yS407aTdr/pP0C5r
 3W8Ouu2JXmqPKWg0sp3wC/Pji2ThCYssQXNyBSDPADsF2C8XEuT7aL/YPzMitIdm
 Lxa9JHo1tKgdFnkloNyaTt4MdBGNF5M5UBr6KgRfwhgooHWbxM0rNyZIXJtySb0I
 vsaNNOA7a4VQp1Fo1DkHQomNbOG5hpVKfswUOOZvk2RdAewTPN+jXiOAmIhNjQ3Y
 /PkJLjRCf8Ob/VIYmt2BTs16+07mODGv1d6DuhgXzH/dfiVihVGvVo71DxXx5uw=
 =i8b2
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.15.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Bob Peterson:
 "We've got a total of 17 GFS2 patches for this merge window. The
  patches are basically in three categories: (1) patches related to
  broken xfstest cases, (2) patches related to improving iomap and start
  using it in GFS2, and (3) general typos and clarifications.

  Please note that one of the iomap patches extends beyond GFS2 and
  affects other file systems, but it was publically reviewed by a
  variety of file system people in the community.

  From Andreas Gruenbacher:

   - rename variable 'bsize' to 'factor' to clarify the logic related to
     gfs2_block_map.

   - correctly set ctime in the setflags ioctl, which fixes broken
     xfstests test 277.

   - fix broken xfstest 258, due to an atime initialization problem.

   - fix broken xfstest 307, in which GFS2 was not setting ctime when
     setting acls.

   - switch general iomap code from blkno to disk offset for a variety
     of file systems.

   - add a new IOMAP_F_DATA_INLINE flag for iomap to indicate blocks
     that have data mixed with metadata.

   - implement SEEK_HOLE and SEEK_DATA via iomap in GFS2.

   - fix failing xfstest case 066, which was due to not properly syncing
     dirty inodes when changing extended attributes.

   - fix a minor typo in a comment.

   - partially fix xfstest 424, which involved GET_FLAGS and SET_FLAGS
     ioctl. This is also a cleanup and simplification of the translation
     of flags from fs flags to gfs2 flags.

   - add support for STATX_ATTR_ in statx, which fixed broken xfstest
     424.

   - fix for failing xfstest 093 which fixes a recursive glock problem
     with gfs2_xattr_get and _set

  From me:

   - make inode height info part of the 'metapath' data structure to
     facilitate using iomap in GFS2.

   - start using iomap inside GFS2 and switch GFS2's block_map functions
     to use iomap under the covers.

   - switch GFS2's fiemap implementation from using block_map to using
     iomap under the covers.

   - fix journaled data pages not being properly synced to media when
     writing inodes. This was caught with xfstests.

   - fix another failing xfstest case in which switching a file from
     ordered_write to journaled data via set_flags caused a deadlock"

* tag 'gfs2-4.15.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Allow gfs2_xattr_set to be called with the glock held
  gfs2: Add support for statx inode flags
  gfs2: Fix and clean up {GET,SET}FLAGS ioctl
  gfs2: Fix a harmless typo
  gfs2: Fix xattr fsync
  GFS2: Take inode off order_write list when setting jdata flag
  GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALL
  gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap
  GFS2: Switch fiemap implementation to use iomap
  GFS2: Implement iomap for block_map
  GFS2: Make height info part of metapath
  gfs2: Always update inode ctime in set_acl
  gfs2: Support negative atimes
  gfs2: Update ctime in setflags ioctl
  gfs2: Clarify gfs2_block_map
2017-11-14 13:55:51 -08:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Andreas Gruenbacher d0920a9cd7 gfs2: Allow gfs2_xattr_set to be called with the glock held
On the following call path:

  gfs2_setattr -> setattr_prepare -> ... ->
    cap_inode_killpriv -> ... ->
      gfs2_xattr_set

the glock is locked in gfs2_setattr, so check for recursive locking in
gfs2_xattr_set as gfs2_xattr_get already does.  While at it, get rid of
need_unlock in gfs2_xattr_get.

Fixes xfstest generic/093.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:59 +01:00
Andreas Gruenbacher b2623c2fe6 gfs2: Add support for statx inode flags
Add support for the STATX_ATTR_ flags in statx.  (Compression,
encryption, and the nodump flag are not supported by gfs2.)

Partially fixes xfstest generic/424.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:58 +01:00
Andreas Gruenbacher b16f7e57b7 gfs2: Fix and clean up {GET,SET}FLAGS ioctl
Switch to a simple array for mapping between the FS_*_FL and GFS_DIF_*
flags.  Clarify how the mapping between FS_JOURNAL_DATA_FL and the
filesystem flags works.  The GFS2_DIF_SYSTEM flag cannot be set from
user space, so remove it from GFS2_FLAGS_USER_SET.  Fail with -EINVAL
when trying to set flags that are not supported instead of silently
ignoring those flags.

Partially fixes xfstest generic/424.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:57 +01:00
Andreas Gruenbacher 61d6899ad4 gfs2: Fix a harmless typo
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:56 +01:00
Andreas Gruenbacher 6862c44ec5 gfs2: Fix xattr fsync
Make sure that changing xattrs marks the corresponding inode dirty so
that a subsequent fsync will sync those changes to disk.  We set
I_DIRTY_SYNC as well as I_DIRTY_DATASYNC so that both fsync and
fdatasync will sync xattr changes: xattrs can contain information
critical to how the data can be accessed, so we don't want fdatasync
to skip them.

Fixes xfstest generic/066.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:56 +01:00
Bob Peterson cc555b09d8 GFS2: Take inode off order_write list when setting jdata flag
This patch fixes a deadlock caused when the jdata flag is set for
inodes that are already on the ordered write list. Since it is
on the ordered write list, log_flush calls gfs2_ordered_write which
calls filemap_fdatawrite. But since the inode had the jdata flag
set, that calls gfs2_jdata_writepages, which tries to start a new
transaction. A new transaction cannot be started because it tries
to acquire the log_flush rwsem which is already locked by the log
flush operation.

The bottom line is: We cannot switch an inode from ordered to jdata
until we eliminate any ordered data pages (via log flush) or any
log_flush operation afterward will create the circular dependency
above. So we need to flush the log before setting the diskflags to
switch the file mode, then we need to remove the inode from the
ordered writes list.

Before this patch, the log flush was done for jdata->ordered, but
that's wrong. If we're going from jdata to ordered, we don't need
to call gfs2_log_flush because the call to filemap_fdatawrite will
do it for us:

   filemap_fdatawrite() -> __filemap_fdatawrite_range()
      __filemap_fdatawrite_range() -> do_writepages()
         do_writepages() -> gfs2_jdata_writepages()
            gfs2_jdata_writepages() -> gfs2_log_flush()

This patch modifies function do_gfs2_set_flags so that if a file
has its jdata flag set, and it's already on the ordered write list,
the log will be flushed and it will be removed from the list
before setting the flag.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31 14:26:47 +01:00
Bob Peterson adbc3ddf28 GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALL
In function gfs2_write_inode, starting with patch a9185b41a4, we
only flush the log and call filemap_fdatawait if we're passed in a
wbc sync_mode of WB_SYNC_ALL. We also need to do these things if
we're evicting a jdata inode, because we might have jdata pages
still attached to bufdata descriptors that need to be revoked, but
by the time it gets to evict() it's too late to start a new
transaction. This patch changes it to treat jdata inodes as if
WB_SYNC_ALL had been specified.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31 14:26:35 +01:00
Andreas Gruenbacher 3a27411cb4 gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap
So far, lseek on gfs2 did not report holes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31 14:26:35 +01:00
Bob Peterson aac1a55b45 GFS2: Switch fiemap implementation to use iomap
This patch switches GFS2's implementation of fiemap from the old
block_map code to the new iomap interface.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31 14:26:34 +01:00
Bob Peterson 3974320ca6 GFS2: Implement iomap for block_map
This patch implements iomap for block mapping, and switches the
block_map function to use it under the covers.

The additional IOMAP_F_BOUNDARY iomap flag indicates when iomap has
reached a "metadata boundary" and fetching the next mapping is likely to
incur an additional I/O.  This flag is used for setting the bh buffer
boundary flag.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31 14:26:33 +01:00
Bob Peterson 5f8bd4440d GFS2: Make height info part of metapath
This patch eliminates height parameters from function gfs2_bmap_alloc.
Function find_metapath determines the metapath's "find height", also
known as the desired height. Function lookup_metapath determines the
metapath's "actual height", previously known as starting height or
sheight. Function gfs2_bmap_alloc now gets both height values from
the metapath. This simplification was done as a step toward switching
the block_map functions to using iomap. The bh_map responsibilities
are also removed from function gfs2_bmap_alloc for the same reason.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31 14:26:23 +01:00
Andreas Gruenbacher 0c9a66ec0e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 2017-10-16 15:06:23 +02:00
Linus Torvalds 17763641ff GFS2: Fix an old regression in GFS2's debugfs interface
This tag is meant for pulling a patch called "gfs2: Fix
 debugfs glocks dump" which fixes a regression introduced
 by commit 88ffbf3e03. The regression caused the glock
 dump in debugfs to not report all the glocks, which makes
 debugging extremely difficult.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZyUI8AAoJENeLYdPf93o7iq4IAKhb9wJ8kmpu7LZ5k6Fl8BCy
 GFztPe2bKsFG8cul1o1gZx8c/GWORaCHe3ZDI6pxl16/E+AvWoA1pKbBLYB1GSvD
 90a7/m6+hx02ZXR/MHxBUQLWYXBtBrVMVcZDCmFMHWYCRUIiX2etPZL8wOXeJLTl
 lNCSGdd1+3y6IJbthaIKTt1ctzsR8ZqV4QN786d2C3L9dxZ63FnAV43p3rUBzBLX
 B5uT5LTmdWSLRqe0A9rnrPga/BfEnA8GDtIYUMic9Yz0Hq2a3vEnCC3P3Myp0DJZ
 PGposwqL/emRhXkC4+ICrGsTOIy1BzwMXLF47GQaB/k+2Rd3/l9r/hU5ESjQOgA=
 =taQL
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Bob Peterson:
 "GFS2: Fix an old regression in GFS2's debugfs interface

 This fixes a regression introduced by commit 88ffbf3e03 ("GFS2: Use
 resizable hash table for glocks"). The regression caused the glock dump
 in debugfs to not report all the glocks, which makes debugging
 extremely difficult"

* tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix debugfs glocks dump
2017-09-25 15:41:56 -07:00
Andreas Gruenbacher c2c4be28c2 gfs2: Always update inode ctime in set_acl
Three-entry POSIX ACLs can be stored in the file mode permission bits,
with no need to store them in extended attributes.  When a process sets
such a minimal ACL, the kernel updates the file mode like chmod does,
and removes any existing extended attributes for that ACL.  Make sure
the ctime is always updated in that case.

Fixes xfstest generic/307.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:19 -05:00
Andreas Gruenbacher 38eedf2841 gfs2: Support negative atimes
When inodes are read from disk, GFS2 will only update in-memory atimes
older than the on-disk atimes; this prevents atimes from going
backwards.  The atimes of newly allocated inodes are initialized to 0.
This means that when an atime is explicitly set to a negative value,
this value will not persist.

Fix by setting the atime of newly allocated inodes to the lowest
possible value instead of 0.

Fixes xfstest generic/258.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:19 -05:00
Andreas Gruenbacher 9b7c2ddb45 gfs2: Update ctime in setflags ioctl
The FS_IOC_SETFLAGS ioctl is supposed to update the inode ctime.
Fixes xfstests generic/277.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:18 -05:00
Andreas Gruenbacher 20cdc1931e gfs2: Clarify gfs2_block_map
Add a comment about the logical block size for directories.  Rename
"bsize" in gfs2_block_map to "factor".  Fix a typo in the description of
metaptr1.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:18 -05:00
Andreas Gruenbacher 10201655b0 gfs2: Fix debugfs glocks dump
The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock
dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a
single buffer: the right function for restarting an rhashtable iteration
from the beginning of the hash table is rhashtable_walk_enter;
rhashtable_walk_stop + rhashtable_walk_start will just resume from the
current position.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.3+
2017-09-25 12:32:33 -05:00
Linus Torvalds 0f0d12728e Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount flag updates from Al Viro:
 "Another chunk of fmount preparations from dhowells; only trivial
  conflicts for that part. It separates MS_... bits (very grotty
  mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
  only a small subset of MS_... stuff).

  This does *not* convert the filesystems to new constants; only the
  infrastructure is done here. The next step in that series is where the
  conflicts would be; that's the conversion of filesystems. It's purely
  mechanical and it's better done after the merge, so if you could run
  something like

	list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

	sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
	        -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
	        -e 's/\<MS_NODEV\>/SB_NODEV/g' \
	        -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
	        -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
	        -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
	        -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
	        -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
	        -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
	        -e 's/\<MS_SILENT\>/SB_SILENT/g' \
	        -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
	        -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
	        -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
	        -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
	        $list

  and commit it with something along the lines of 'convert filesystems
  away from use of MS_... constants' as commit message, it would save a
  quite a bit of headache next cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Differentiate mount flags (MS_*) from internal superblock flags
  VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
  vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-14 18:54:01 -07:00
Linus Torvalds a0725ab0c7 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the first pull request for 4.14, containing most of the code
  changes. It's a quiet series this round, which I think we needed after
  the churn of the last few series. This contains:

   - Fix for a registration race in loop, from Anton Volkov.

   - Overflow complaint fix from Arnd for DAC960.

   - Series of drbd changes from the usual suspects.

   - Conversion of the stec/skd driver to blk-mq. From Bart.

   - A few BFQ improvements/fixes from Paolo.

   - CFQ improvement from Ritesh, allowing idling for group idle.

   - A few fixes found by Dan's smatch, courtesy of Dan.

   - A warning fixup for a race between changing the IO scheduler and
     device remova. From David Jeffery.

   - A few nbd fixes from Josef.

   - Support for cgroup info in blktrace, from Shaohua.

   - Also from Shaohua, new features in the null_blk driver to allow it
     to actually hold data, among other things.

   - Various corner cases and error handling fixes from Weiping Zhang.

   - Improvements to the IO stats tracking for blk-mq from me. Can
     drastically improve performance for fast devices and/or big
     machines.

   - Series from Christoph removing bi_bdev as being needed for IO
     submission, in preparation for nvme multipathing code.

   - Series from Bart, including various cleanups and fixes for switch
     fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-07 11:59:42 -07:00
Linus Torvalds ec3604c7a5 Writeback error handling fixes for v4.14
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZrTy3AAoJEAAOaEEZVoIVaucP/ApBAj2S5wzvlV1u6l8E6ae7
 ZeEEZfcWwzRYlKjZAkTWqj9XvGpDGO5gLq4wsZK2edFAq++/MJF8ZVtN4tdZ1kUZ
 DUvRodtVOrT08Kp9wZXGT7JOFrf6U/6gMcR6p0MuWnHndeKYvlpcFi9NPT4EC9/z
 Zm9V7gtlPdSOha7eaSjUS0+vLERkxqXLBW3Av9QUOBP/lbI3lqIroGKeHDYnVdya
 2P/k5EcRRJMyJP6TqyYxmmJl+UWjJFMLvnlUDBslHnD/u3mIUhw3JLHYBjn5dZRE
 Xjq56IDPoXDUvzlBhtn/Uqyx+/wtwsNsylpmKv6K5G1JfdeuSsPVsCey+A1cqV64
 LpE5896wf9TmnmI9LNyh6vDn925xPSGBiF45UEp5f9aO7jXeY0MaEZ8g+ENqFIDK
 v4gtZdS9FhYHV+/l4qEwYMKrqSbwKEs1r1FT+f4wnABby1ojfdA57ZPlp5PV2Vjp
 szTp88Zkb7cMvZwEnWwxWofcJNmgS7uNahvnQF3IJ4ITsioEkuyYR3K4ZQMaaaV9
 wCp6G0FhXZaK3OI7o9WiDwaO2elp9Hxc8bnqKpiBbHZkY0NLh7/++5VxpeNbTHFy
 AGijQiiKGNNyYqNj93wq9jpVdMNjB0pXrHRxfav8v7MtQ+WfbEoAENF4T7hN7iXn
 UuF6eSWEC5O1UCRUk1A+
 =LLY3
 -----END PGP SIGNATURE-----

Merge tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull writeback error handling updates from Jeff Layton:
 "This pile continues the work from last cycle on better tracking
  writeback errors. In v4.13 we added some basic errseq_t infrastructure
  and converted a few filesystems to use it.

  This set continues refining that infrastructure, adds documentation,
  and converts most of the other filesystems to use it. The main
  exception at this point is the NFS client"

* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  ecryptfs: convert to file_write_and_wait in ->fsync
  mm: remove optimizations based on i_size in mapping writeback waits
  fs: convert a pile of fsync routines to errseq_t based reporting
  gfs2: convert to errseq_t based writeback error reporting for fsync
  fs: convert sync_file_range to use errseq_t based error-tracking
  mm: add file_fdatawait_range and file_write_and_wait
  fuse: convert to errseq_t based error tracking for fsync
  mm: consolidate dax / non-dax checks for writeback
  Documentation: add some docs for errseq_t
  errseq: rename __errseq_set to errseq_set
2017-09-06 14:11:03 -07:00
Linus Torvalds 77d0ab600a We've got a whopping 29 GFS2 patches for this merge window, mainly
because we held some back from the previous merge window until we
 could get them perfected and well tested. We have a couple patch
 sets, including my patch set for protecting glock gl_object and
 Andreas Gruenbacher's patch set to fix the long-standing shrink-
 slab hang, plus a bunch of assorted bugs and cleanups:
 
 1. I fixed a bug whereby an IO error would lead to a double-brelse.
 2. Andreas Gruenbacher made a minor cleanup to call his relatively
    new function, gfs2_holder_initialized, rather than doing it
    manually. This was just missed by a previous patch set.
 3. Jan Kara fixed a bug whereby the SGID was being cleared when
    inheriting ACLs.
 4. Andreas found a bug and fixed it in his previous patch,
    "Get rid of flush_delayed_work in gfs2_evict_inode". A call to
    flush_delayed_work was deleted from *gfs2_inode_lookup and added
    to gfs2_create_inode.
 5. Wang Xibo found and fixed a list_add call in inode_go_lock
    that specified the parameters in the wrong order.
 6. Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's
    metadata reads that were accidentally missing them.
 7 - 10. I submitted a 4-patch set to protect the glock gl_object
    field. GFS2 was setting and checking gl_object with no locking
    mechanism, so the value was occasionally stomped on, which caused
    file system corruption.
 11. I submitted a small cleanup to function gfs2_clear_rgrpd.
    It was needlessly adding rgrp glocks to the lru list, then pulling
    them back off immediately. The rgrp glocks don't use the lru list
    anyway, so doing so was just a waste of time.
 12. I submitted a patch that checks the GLOF_LRU flag on a glock
    before trying to remove it from the lru_list. This avoids a lot
    of unnecessary spin_lock contention.
 13. I submitted a patch to delete GFS2's debugfs files only after
    we evict all the glocks. Before this patch, GFS2 would delete the
    debugfs files, and if unmount hung waiting for a glock, there was
    no way to debug the problem. Now, if a hang occurs during umount,
    we can examine the debugfs files to figure out why it's hung.
 14. Andreas Gruenbacher submitted a patch to fix some trivial typos.
 15 - 19. Andreas also submitted a five-part patch set to fix the
    longstanding hang involving the slab shrinker: dlm requires
    memory, calls the inode shrinker, which calls gfs2's evict, which
    calls back into DLM before it can evict an inode.
 20. Abhi Das submitted a patch to forcibly flush the active items
    list to relieve memory pressure. This fixes a long-standing bug
    whereby GFS2 was getting hung permanently in balance_dirty_pages.
 21. Thomas Tai submitted a patch to fix a slab corruption problem
    due to a residual pointer left in the lock_dlm lockstruct.
 22. I submitted a patch to withdraw the file system if IO errors
    are encountered while writing to the journals or statfs system
    file which were previously not being sent back up. Before, some
    IO errors were sometimes not be detected for several hours, and
    at recovery time, the journal errors made journal replay
    impossible.
 23. Andreas has a patch to fix an annoying format-truncation compiler
    warning so GFS2 compiles cleanly.
 24. I have a patch that fixes a handful of sparse compiler warnings.
 25. Andreas fixed up an useless gl_object warning caused by an
    earlier patch.
 26. Arvind Yadav added a patch to properly constify our rhashtable
    params declare.
 27. I added a patch to fix a regression caused by the non-recursive
    delete and truncate patch that caused file system blocks to not
    be properly freed.
 28. Ernesto A. Fernández added a patch to fix a place where GFS2
    would send back the wrong return code setting extended attributes.
 29. Ernesto also added a patch to fix a case in which GFS2 was
    improperly setting an inode's i_mode, potentially granting access
    to the wrong users.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZrMC2AAoJENeLYdPf93o7PIQIAKY4hdC2pMM5tiiIHx5fPAAr
 tjpVuFkDQzyEaTb9sArVLxEdva3ShKERQKoYq/VVxqbAEwPgXbzJFNNil1WTJi1t
 J2gE4wE4G5x1+A7XDzCdPI8KAcF+yX63AaFYlVKyuZSq5w7njIRc1Vk+TFiIexxC
 xb0nP0g9L6Zt114rE8kfi0/GLjTO9vOKM3XsJgG612I3/cs3RUx4gJ+nSUG0bYLA
 qoBIXEJ3SFHw2Zr/LgHZ9QDHnlPVl3bjg03sRQaWZms7XbLegDBYsDSvS1HLZ300
 gjTc0Dgz/6KwzDVJ7cZ/fPNYtIFY58tKs6aqqDTrCncsX9nPjcTAxYkBNWsFyZM=
 =tXJ8
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We've got a whopping 29 GFS2 patches for this merge window, mainly
  because we held some back from the previous merge window until we
  could get them perfected and well tested. We have a couple patch sets,
  including my patch set for protecting glock gl_object and Andreas
  Gruenbacher's patch set to fix the long-standing shrink- slab hang,
  plus a bunch of assorted bugs and cleanups.

  Summary:

   - I fixed a bug whereby an IO error would lead to a double-brelse.

   - Andreas Gruenbacher made a minor cleanup to call his relatively new
     function, gfs2_holder_initialized, rather than doing it manually.
     This was just missed by a previous patch set.

   - Jan Kara fixed a bug whereby the SGID was being cleared when
     inheriting ACLs.

   - Andreas found a bug and fixed it in his previous patch, "Get rid of
     flush_delayed_work in gfs2_evict_inode". A call to
     flush_delayed_work was deleted from *gfs2_inode_lookup and added to
     gfs2_create_inode.

   - Wang Xibo found and fixed a list_add call in inode_go_lock that
     specified the parameters in the wrong order.

   - Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's
     metadata reads that were accidentally missing them.

   - I submitted a 4-patch set to protect the glock gl_object field.
     GFS2 was setting and checking gl_object with no locking mechanism,
     so the value was occasionally stomped on, which caused file system
     corruption.

   - I submitted a small cleanup to function gfs2_clear_rgrpd. It was
     needlessly adding rgrp glocks to the lru list, then pulling them
     back off immediately. The rgrp glocks don't use the lru list
     anyway, so doing so was just a waste of time.

   - I submitted a patch that checks the GLOF_LRU flag on a glock before
     trying to remove it from the lru_list. This avoids a lot of
     unnecessary spin_lock contention.

   - I submitted a patch to delete GFS2's debugfs files only after we
     evict all the glocks. Before this patch, GFS2 would delete the
     debugfs files, and if unmount hung waiting for a glock, there was
     no way to debug the problem. Now, if a hang occurs during umount,
     we can examine the debugfs files to figure out why it's hung.

   - Andreas Gruenbacher submitted a patch to fix some trivial typos.

   - Andreas also submitted a five-part patch set to fix the
     longstanding hang involving the slab shrinker: dlm requires memory,
     calls the inode shrinker, which calls gfs2's evict, which calls
     back into DLM before it can evict an inode.

   - Abhi Das submitted a patch to forcibly flush the active items list
     to relieve memory pressure. This fixes a long-standing bug whereby
     GFS2 was getting hung permanently in balance_dirty_pages.

   - Thomas Tai submitted a patch to fix a slab corruption problem due
     to a residual pointer left in the lock_dlm lockstruct.

   - I submitted a patch to withdraw the file system if IO errors are
     encountered while writing to the journals or statfs system file
     which were previously not being sent back up. Before, some IO
     errors were sometimes not be detected for several hours, and at
     recovery time, the journal errors made journal replay impossible.

   - Andreas has a patch to fix an annoying format-truncation compiler
     warning so GFS2 compiles cleanly.

   - I have a patch that fixes a handful of sparse compiler warnings.

   - Andreas fixed up an useless gl_object warning caused by an earlier
     patch.

   - Arvind Yadav added a patch to properly constify our rhashtable
     params declare.

   - I added a patch to fix a regression caused by the non-recursive
     delete and truncate patch that caused file system blocks to not be
     properly freed.

   - Ernesto A. Fernández added a patch to fix a place where GFS2 would
     send back the wrong return code setting extended attributes.

   - Ernesto also added a patch to fix a case in which GFS2 was
     improperly setting an inode's i_mode, potentially granting access
     to the wrong users"

* tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits)
  gfs2: preserve i_mode if __gfs2_set_acl() fails
  gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing
  GFS2: Fix non-recursive truncate bug
  gfs2: constify rhashtable_params
  GFS2: Fix gl_object warnings
  GFS2: Fix up some sparse warnings
  gfs2: Silence gcc format-truncation warning
  GFS2: Withdraw for IO errors writing to the journal or statfs
  gfs2: fix slab corruption during mounting and umounting gfs file system
  gfs2: forcibly flush ail to relieve memory pressure
  gfs2: Clean up waiting on glocks
  gfs2: Defer deleting inodes under memory pressure
  gfs2: gfs2_evict_inode: Put glocks asynchronously
  gfs2: Get rid of gfs2_set_nlink
  gfs2: gfs2_glock_get: Wait on freeing glocks
  gfs2: Fix trivial typos
  GFS2: Delete debugfs files only after we evict the glocks
  GFS2: Don't waste time locking lru_lock for non-lru glocks
  GFS2: Don't bother trying to add rgrps to the lru list
  GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode
  ...
2017-09-06 11:42:31 -07:00
Ernesto A. Fernández 309e8cda59 gfs2: preserve i_mode if __gfs2_set_acl() fails
When changing a file's acl mask, __gfs2_set_acl() will first set the
group bits of i_mode to the value of the mask, and only then set the
actual extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the
file had no acl attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-31 07:53:15 -05:00
Ernesto A. Fernández 54aae14bee gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing
The function __gfs2_xattr_set() will return -ENODATA when called to
remove a xattr that does not exist. The result is that setfacl will
show an exit status of 1 when called to set only a file's mode bits
(on a file with no ACLs), despite succeeding. A "No data available"
error will be printed as well.

To fix this return 0 instead, except when the XATTR_REPLACE flag is
set, in which case -ENODATA is appropriate. This is consistent with
how most other xattr setting functions work, in other filesystems.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-31 07:43:03 -05:00
Bob Peterson c4a9d1892f GFS2: Fix non-recursive truncate bug
Before this patch if you truncated a file to a smaller size it
wasn't freeing all the blocks properly. There are two reasons.

First, the metapath comparison was not comparing previous heights.
I added a function, mp_eq_to_hgt, which checks the metapath at
all heights prior to the target height.

Second, in function find_nonnull_ptr, it needed to zero out all
pointers for heights following the target height. Translated into
decimal integer terms, this way a number like 299, when incremented,
becomes 300, not 399. The 2 gets incremented to 3, and the following
digits need to be reset.

These two things allow the truncate state machine to properly find
the blocks it needs to delete.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30 13:29:22 -05:00
Arvind Yadav d296b15ed5 gfs2: constify rhashtable_params
rhashtable_params are not supposed to change at runtime. All
Functions rhashtable_* working with const rhashtable_params
provided by <linux/rhashtable.h>. So mark the non-const structs
as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30 08:14:39 -05:00
Andreas Gruenbacher 7023a0b16f GFS2: Fix gl_object warnings
The following cleanup is needed to avoid spilling the syslog with
false warnings.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30 08:14:27 -05:00
Bob Peterson 27c3b415f6 GFS2: Fix up some sparse warnings
This patch cleans up various pieces of GFS2 to avoid sparse errors.
This doesn't fix them all, but it fixes several. The first error,
in function glock_hash_walk was a genuine bug where the rhashtable
could be started and not stopped.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25 18:47:18 -05:00
Andreas Gruenbacher 561b796987 gfs2: Silence gcc format-truncation warning
Enlarge sd_fsname to be big enough for the longest long lock table name
and an arbitrary journal number.  This silences two -Wformat-truncation
warnings with gcc 7.1.1.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25 10:59:21 -05:00
Bob Peterson 942b0cddfb GFS2: Withdraw for IO errors writing to the journal or statfs
Before this patch, if GFS2 encountered IO errors while writing to
the journal, it would not report the problem, so they would go
unnoticed, sometimes for many hours. Sometimes this would only be
noticed later, when recovery tried to do journal replay and failed
due to invalid metadata at the blocks that resulted in IO errors.

This patch makes GFS2's log daemon check for IO errors. If it
encounters one, it withdraws from the file system and reports
why in dmesg. A similar action is taken when IO errors occur when
writing to the system statfs file.

These errors are also reported back to any callers of fsync, since
that requires the journal to be flushed. Therefore, any IO errors
that would previously go unnoticed are now noticed and the file
system is withdrawn as early as possible, thus preventing further
file system damage.

Also note that this reintroduces superblock variable sd_log_error,
which Christoph removed with commit f729b66fca.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25 10:59:09 -05:00
Christoph Hellwig 74d46992e0 block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-23 12:49:55 -06:00
Thomas Tai cc1dfa8b75 gfs2: fix slab corruption during mounting and umounting gfs file system
When using cman-3.0.12.1 and gfs2-utils-3.0.12.1, mounting and
unmounting GFS2 file system would cause kernel to hang. The slab
allocator suggests that it is likely a double free memory corruption.
The issue is traced back to v3.9-rc6 where a patch is submitted to
use kzalloc() for storing a bitmap instead of using a local variable.
The intention is to allocate memory during mount and to free memory
during unmount. The original patch misses a code path which has
already freed the memory and caused memory corruption. This patch sets
the memory pointer to NULL after the memory is freed, so that double
free memory corruption will not happen.

gdlm_mount()
  '-- set_recover_size() which use kzalloc()
  '-- if dlm does not support ops callbacks then
          '--- free_recover_size() which use kfree()

gldm_unmount()
  '-- free_recover_size() which use kfree()

Previous patch which introduced the double free issue is
commit 57c7310b8e ("GFS2: use kmalloc for lvb bitmap")

Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
2017-08-15 11:54:09 -05:00
Abhi Das b066a4eebd gfs2: forcibly flush ail to relieve memory pressure
On systems with low memory, it is possible for gfs2 to infinitely
loop in balance_dirty_pages() under heavy IO (creating sparse files).

balance_dirty_pages() attempts to write out the dirty pages via
gfs2_writepages() but none are found because these dirty pages are
being used by the journaling code in the ail. Normally, the journal
has an upper threshold which when hit triggers an automatic flush
of the ail. But this threshold can be higher than the number of
allowable dirty pages and result in the ail never being flushed.

This patch forces an ail flush when gfs2_writepages() fails to write
anything. This is a good indication that the ail might be holding
some dirty pages.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:51:03 -05:00
Andreas Gruenbacher a91323e255 gfs2: Clean up waiting on glocks
The prepare_to_wait_on_glock and finish_wait_on_glock functions introduced in
commit 56a365be "gfs2: gfs2_glock_get: Wait on freeing glocks" are
better removed, resulting in cleaner code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:51:02 -05:00
Andreas Gruenbacher 6a1c8f6dcf gfs2: Defer deleting inodes under memory pressure
When under memory pressure and an inode's link count has dropped to
zero, defer deleting the inode to the delete workqueue.  This avoids
calling into DLM under memory pressure, which can deadlock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:49:13 -05:00
Andreas Gruenbacher 71c1b21368 gfs2: gfs2_evict_inode: Put glocks asynchronously
gfs2_evict_inode is called to free inodes under memory pressure.  The
function calls into DLM when an inode's last cluster-wide reference goes
away (remote unlink) and to release the glock and associated DLM lock
before finally destroying the inode.  However, if DLM is blocked on
memory to become available, calling into DLM again will deadlock.

Avoid that by decoupling releasing glocks from destroying inodes in that
case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
in work queue context, when the associated inodes have likely already
been destroyed.

With this change, inodes can end up being unlinked, remote-unlink can be
triggered, and then the inode can be reallocated before all
remote-unlink callbacks are processed.  To detect that, revalidate the
link count in gfs2_evict_inode to make sure we're not deleting an
allocated, referenced inode.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:45:21 -05:00
Andreas Gruenbacher eebd2e813f gfs2: Get rid of gfs2_set_nlink
Remove gfs2_set_nlink which prevents the link count of an inode from
becoming non-zero once it has reached zero.  The next commit reduces the
amount of waiting on glocks when an inode is evicted from memory.  With
that, an inode can become reallocated before all the remote-unlink
callbacks from a previous delete are processed, which causes the link
count to change from zero to non-zero.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:42:11 -05:00
Andreas Gruenbacher 0515480ad4 gfs2: gfs2_glock_get: Wait on freeing glocks
Keep glocks in their hash table until they are freed instead of removing
them when their last reference is dropped.  This allows to wait for any
previous instances of a glock to go away in gfs2_glock_get before
creating a new glocks.

Special thanks to Andy Price for finding and fixing a problem which also
required us to delete the rcu_read_unlock from the error case in function
gfs2_glock_get.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10 10:39:31 -05:00
Andreas Gruenbacher 61b91cfdc6 gfs2: Fix trivial typos
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Bob Peterson b2fb7dab7f GFS2: Delete debugfs files only after we evict the glocks
This patch moves the call to gfs2_delete_debugfs_file so that it
comes after the glock hash table has been cleared. This way we
can query the debugfs files if umount hangs.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Bob Peterson 645ebd49f0 GFS2: Don't waste time locking lru_lock for non-lru glocks
Before this patch, glock_dq would call gfs2_glock_remove_from_lru.
For glocks that are never put on the LRU, such as the transaction
glock, this just takes the spin_lock, determines there's nothing to
be done because the list is empty, then unlocks again. This was
causing unnecessary lock contention on the lru_lock spin_lock.
This patch adds a check for GLOF_LRU in the glops before taking
the spin_lock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:39 -05:00
Bob Peterson 2d821a8b71 GFS2: Don't bother trying to add rgrps to the lru list
This patch removes a call to gfs2_glock_add_to_lru from function
gfs2_clear_rgrpd. The call is just a waste of time because as soon
as it adds it to the lru_list, the call to gfs2_glock_put takes it
back off again.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09 09:36:38 -05:00
Bob Peterson 240c6235df GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode
This patch adds some calls to clear gl_object in function
gfs2_delete_inode. Since we are deleting the inode, and the glock
typically outlives the inode in core, we must clear gl_object
so subsequent use of the glock (e.g. for a new inode in its place)
will not have the old pointer sitting there. In error cases we
need to tidy up after ourselves. In non-error cases, we need to
clear gl_object before we set the block free in the bitmap so
residules aren't left for potential inode creators.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-08-09 09:36:38 -05:00
Bob Peterson 9c1b28081f GFS2: Clear gl_object if gfs2_create_inode fails
If function gfs2_create_inode fails after the inode has been
created (for example, if the inode_refresh fails for some reason)
the function was setting gl_object but never clearing it again.
The glocks are left pointing to a freed inode. This patch adds
the calls to clear gl_object in the appropriate error paths.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-08-09 09:36:26 -05:00
Jeff Layton d07a6ac7b6 gfs2: convert to errseq_t based writeback error reporting for fsync
Also, fix a place where a writeback error might get dropped in the
gfs2_is_jdata case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-08-01 08:39:29 -04:00
Bob Peterson 4d7c18c7df GFS2: Set gl_object in inode lookup only after block type check
Before this patch, the inode glock's gl_object was set after a
reference was acquired, but before the block type was verified.
In cases where the block was unlinked, then freed and reused on
another node, a residule delete callback (delete_work) would try
to look up the inode, eventually failing the block check, but
only after it overwrites gl_object with a pointer to the wrong
inode. This patch moves the assignment of gl_object after the
block check so it won't be improperly overwritten.

Likewise, at the end of the function, gfs2_inode_lookup was
clearing gl_object after it unlocked the glock, which meant
another process might free the glock in the meantime. This
patch guards against that case.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-07-21 08:20:42 -05:00