The NFSv4.2 CLONE operation has implicit persistence requirements on the
target file, since there is no protocol requirement that the client issue
a separate operation to persist data.
For that reason, we should call vfs_fsync_range() on the destination file
after a successful call to vfs_clone_file_range().
Fixes: ffa0160a10 ("nfsd: implement the NFSv4.2 CLONE operation")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, the knfsd server assumes that a short read indicates an
end of file. That assumption is incorrect. The short read means that
either we've hit the end of file, or we've hit a read error.
In the case of a read error, the client may want to retry (as per the
implementation recommendations in RFC1813 and RFC7530), but currently it
is being told that it hit an eof.
Move the code to detect eof from version specific code into the generic
nfsd read.
Report eof only in the two following cases:
1) read() returns a zero length short read with no error.
2) the offset+length of the read is >= the file size.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If multiple clients are writing to the same file, then due to the fact
we share a single file descriptor between all NFSv3 clients writing
to the file, we have a situation where clients can miss the fact that
their file data was not persisted. While this should be rare, it
could cause silent data loss in situations where multiple clients
are using NLM locking or O_DIRECT to write to the same file.
Unfortunately, the stateless nature of NFSv3 and the fact that we
can only identify clients by their IP address means that we cannot
trivially cache errors; we would not know when it is safe to
release them from the cache.
So the solution is to declare a reboot. We understand that this
should be a rare occurrence, since disks are usually stable. The
most frequent occurrence is likely to be ENOSPC, at which point
all writes to the given filesystem are likely to fail anyway.
So the expectation is that clients will be forced to retry their
writes until they hit the fatal error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's not uncommon for some workloads to do a bunch of I/O to a file and
delete it just afterward. If knfsd has a cached open file however, then
the file may still be open when the dentry is unlinked. If the
underlying filesystem is nfs, then that could trigger it to do a
sillyrename.
On a REMOVE or RENAME scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash and put their
references. This should prevent any delete-on-last-close activity from
occurring, solely due to knfsd's open file cache.
This must be done synchronously though so we use the variants that call
flush_delayed_fput. There are deadlock possibilities if you call
flush_delayed_fput while holding locks, however. In the case of
nfsd_rename, we don't even do the lookups of the dentries to be renamed
until we've locked for rename.
Once we've figured out what the target dentry is for a rename, check to
see whether there are cached open files associated with it. If there
are, then unwind all of the locking, close them all, and then reattempt
the rename.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The raparms cache was set up in order to ensure that we carry readahead
information forward from one RPC call to the next. In other words, it
was set up because each RPC call was forced to open a struct file, then
close it, causing the loss of readahead information that is normally
cached in that struct file, and used to keep the page cache filled when
a user calls read() multiple times on the same file descriptor.
Now that we cache the struct file, and reuse it for all the I/O calls
to a given file by a given user, we no longer have to keep a separate
readahead cache.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Use cached filps if possible instead of opening a new one every time.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, NFSv2/3 reads and writes have to open a file, do the read or
write and then close it again for each RPC. This is highly inefficient,
especially when the underlying filesystem has a relatively slow open
routine.
This patch adds a new open file cache to knfsd. Rather than doing an
open for each RPC, the read/write handlers can call into this cache to
see if there is one already there for the correct filehandle and
NFS_MAY_READ/WRITE flags.
If there isn't an entry, then we create a new one and attempt to
perform the open. If there is, then we wait until the entry is fully
instantiated and return it if it is at the end of the wait. If it's
not, then we attempt to take over construction.
Since the main goal is to speed up NFSv2/3 I/O, we don't want to
close these files on last put of these objects. We need to keep them
around for a little while since we never know when the next READ/WRITE
will come in.
Cache entries have a hardcoded 1s timeout, and we have a recurring
workqueue job that walks the cache and purges any entries that have
expired.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Richard Sharpe <richard.sharpe@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
fh_want_write() can now be called twice, but I'm also fixing up the
callers not to do that.
Other cases include setattr and create.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the parameter 'count' is non-zero, nfsd4_clone_file_range() will
currently clobber all errors returned by vfs_clone_file_range() and
replace them with EINVAL.
Fixes: 42ec3d4c02 ("vfs: make remap_file_range functions take and...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
As the man(2) page for utime/utimes states, EPERM is returned when the
second parameter of utime or utimes is not NULL, the caller's effective UID
does not match the owner of the file, and the caller is not privileged.
However, in a NFS directory mounted from knfsd, it will return EACCES
(from nfsd_setattr-> fh_verify->nfsd_permission). This patch fixes
that.
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Rework the vfs_clone_file_range and vfs_dedupe_file_range infrastructure to use
a common .remap_file_range method and supply generic bounds and sanity checking
functions that are shared with the data write path. The current VFS
infrastructure has problems with rlimit, LFS file sizes, file time stamps,
maximum filesystem file sizes, stripping setuid bits, etc and so they are
addressed in these commits.
We also introduce the ability for the ->remap_file_range methods to return short
clones so that clones for vfs_copy_file_range() don't get rejected if the entire
range can't be cloned. It also allows filesystems to sliently skip deduplication
of partial EOF blocks if they are not capable of doing so without requiring
errors to be thrown to userspace.
All existing filesystems are converted to user the new .remap_file_range method,
and both XFS and ocfs2 are modified to make use of the new generic checking
infrastructure.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJb29gEAAoJEK3oKUf0dfodpOAQAL2VbHjvKXEwNMDTKscSRMmZ
Z0xXo3gamFKQ+VGOqy2g2lmAYQs9SAnTuCGTJ7zIAp7u+q8gzUy5FzKAwLS4Id6L
8siaY6nzlicfO04d0MdXnWz0f3xykChgzfdQfVUlUi7WrDioBUECLPmx4a+USsp1
DQGjLOZfoOAmn2rijdnH9RTEaHqg+8mcTaLN9TRav4gGqrWxldFKXw2y6ouFC7uo
/hxTRNXR9VI+EdbDelwBNXl9nU9gQA0WLOvRKwgUrtv6bSJohTPsmXt7EbBtNcVR
cl3zDNc1sLD1bLaRLEUAszI/33wXaaQgom1iB51obIcHHef+JxRNG/j6rUMfzxZI
VaauGv5EIvtaKN0LTAqVVLQ8t2MQFYfOr8TykmO+1UFog204aKRANdVMHDSjxD/0
dTGKJGcq+HnKQ+JHDbTdvuXEL8sUUl1FiLjOQbZPw63XmuddLKFUA2TOjXn6htbU
1h1MG5d9KjGLpabp2BQheczD08NuSmcrOBNt7IoeI3+nxr3HpMwprfB9TyaERy9X
iEgyVXmjjc9bLLRW7A2wm77aW64NvPs51wKMnvuNgNwnCewrGS6cB8WVj2zbQjH1
h3f3nku44s9ctNPSBzb/sJLnpqmZQ5t0oSmrMSN+5+En6rNTacoJCzxHRJBA7z/h
Z+C6y1GTZw0euY6Zjiwu
=CE/A
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull vfs dedup fixes from Dave Chinner:
"This reworks the vfs data cloning infrastructure.
We discovered many issues with these interfaces late in the 4.19 cycle
- the worst of them (data corruption, setuid stripping) were fixed for
XFS in 4.19-rc8, but a larger rework of the infrastructure fixing all
the problems was needed. That rework is the contents of this pull
request.
Rework the vfs_clone_file_range and vfs_dedupe_file_range
infrastructure to use a common .remap_file_range method and supply
generic bounds and sanity checking functions that are shared with the
data write path. The current VFS infrastructure has problems with
rlimit, LFS file sizes, file time stamps, maximum filesystem file
sizes, stripping setuid bits, etc and so they are addressed in these
commits.
We also introduce the ability for the ->remap_file_range methods to
return short clones so that clones for vfs_copy_file_range() don't get
rejected if the entire range can't be cloned. It also allows
filesystems to sliently skip deduplication of partial EOF blocks if
they are not capable of doing so without requiring errors to be thrown
to userspace.
Existing filesystems are converted to user the new remap_file_range
method, and both XFS and ocfs2 are modified to make use of the new
generic checking infrastructure"
* tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits)
xfs: remove [cm]time update from reflink calls
xfs: remove xfs_reflink_remap_range
xfs: remove redundant remap partial EOF block checks
xfs: support returning partial reflink results
xfs: clean up xfs_reflink_remap_blocks call site
xfs: fix pagecache truncation prior to reflink
ocfs2: remove ocfs2_reflink_remap_range
ocfs2: support partial clone range and dedupe range
ocfs2: fix pagecache truncation prior to reflink
ocfs2: truncate page cache for clone destination file before remapping
vfs: clean up generic_remap_file_range_prep return value
vfs: hide file range comparison function
vfs: enable remap callers that can handle short operations
vfs: plumb remap flags through the vfs dedupe functions
vfs: plumb remap flags through the vfs clone functions
vfs: make remap_file_range functions take and return bytes completed
vfs: remap helper should update destination inode metadata
vfs: pass remap flags to generic_remap_checks
vfs: pass remap flags to generic_remap_file_range_prep
vfs: combine the clone and dedupe into a single remap_file_range
...
Pull AFS updates from Al Viro:
"AFS series, with some iov_iter bits included"
* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
missing bits of "iov_iter: Separate type from direction and use accessor functions"
afs: Probe multiple fileservers simultaneously
afs: Fix callback handling
afs: Eliminate the address pointer from the address list cursor
afs: Allow dumping of server cursor on operation failure
afs: Implement YFS support in the fs client
afs: Expand data structure fields to support YFS
afs: Get the target vnode in afs_rmdir() and get a callback on it
afs: Calc callback expiry in op reply delivery
afs: Fix FS.FetchStatus delivery from updating wrong vnode
afs: Implement the YFS cache manager service
afs: Remove callback details from afs_callback_break struct
afs: Commit the status on a new file/dir/symlink
afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
afs: Don't invoke the server to read data beyond EOF
afs: Add a couple of tracepoints to log I/O errors
afs: Handle EIO from delivery function
afs: Fix TTL on VL server and address lists
afs: Implement VL server rotation
afs: Improve FS server rotation error handling
...
already supported COPY, by copying a limited amount of data and then
returning a short result, letting the client resend. The asynchronous
protocol should offer better performance at the expense of some
complexity.
The other highlight is Trond's work to convert the duplicate reply cache
to a red-black tree, and to move it and some other server caches to RCU.
(Previously these have meant taking global spinlocks on every RPC.)
Otherwise, some RDMA work and miscellaneous bugfixes.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJb2KWzAAoJECebzXlCjuG+gcQP/3DldB86CFxgSFx0t+h+s+TV
CdYJDPyLyRkEMiD+4dCPPuhueve+j5BPHVsDbn98FTWrEn131NMIs6uhU/VGTtAU
6a8f/ExtZ5U7s39MJCzlk2ozVElBc3QPp7p3p9NKn0Wi0PXbVgjuIqR5o2vwa8Si
KOVdLm6ylfav/HTH8DO6zFPJRsTgTwcJOivXXshjpglMKAcw8AuqSsGgBrDeGpgU
u91Vi0EM1vt96+CA6a01mTgC/sFX7EqGvxUUHOrKWf5cIjnpT3FDvouYPxi+GH8Z
SIDlaMQyXF5m4m6MhELNTP4v97XAHyPJtvLkEe5lggTyABPiA2heo9e8onysWkzV
1v8OZHCVFa1UL34mDlnFxbFCYVr7FFKMGjTBR/ntinobPfAbWRCO1Hdd+bBGPDD4
byf7ctDVp7KQ2bSatIdlYavikuGDHWFDZHzPHlqkD3gpIZSNvhe26sV3NZqIFlXO
cMUega2Y5mXmULauHhxAcNGtDK7dF5hHoMWKJy0DNxiyDiDLylwDOIfwt1De3Q7V
ycd/wUytUS2LkAhyS2mvoDK6eXTBAeQwzmXAqveh6rewwO83HC/t9mtKBBDomvKG
xRpRPmmbj9ijbwkilEBmijjR47wrihmEVIFahznEerZ+//QOfVVOB0MNtzIyU9/k
CnP1ZNvOs3LR1pxxwFa8
=TTo0
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Olga added support for the NFSv4.2 asynchronous copy protocol. We
already supported COPY, by copying a limited amount of data and then
returning a short result, letting the client resend. The asynchronous
protocol should offer better performance at the expense of some
complexity.
The other highlight is Trond's work to convert the duplicate reply
cache to a red-black tree, and to move it and some other server caches
to RCU. (Previously these have meant taking global spinlocks on every
RPC)
Otherwise, some RDMA work and miscellaneous bugfixes"
* tag 'nfsd-4.20' of git://linux-nfs.org/~bfields/linux: (30 commits)
lockd: fix access beyond unterminated strings in prints
nfsd: Fix an Oops in free_session()
nfsd: correctly decrement odstate refcount in error path
svcrdma: Increase the default connection credit limit
svcrdma: Remove try_module_get from backchannel
svcrdma: Remove ->release_rqst call in bc reply handler
svcrdma: Reduce max_send_sges
nfsd: fix fall-through annotations
knfsd: Improve lookup performance in the duplicate reply cache using an rbtree
knfsd: Further simplify the cache lookup
knfsd: Simplify NFS duplicate replay cache
knfsd: Remove dead code from nfsd_cache_lookup
SUNRPC: Simplify TCP receive code
SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock
SUNRPC: Remove non-RCU protected lookup
NFS: Fix up a typo in nfs_dns_ent_put
NFS: Lockless DNS lookups
knfsd: Lockless lookup of NFSv4 identities.
SUNRPC: Lockless server RPCSEC_GSS context lookup
knfsd: Allow lockless lookups of the exports
...
Plumb a remap_flags argument through the {do,vfs}_clone_file_range
functions so that clone can take advantage of it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Change the remap_file_range functions to take a number of bytes to
operate upon and return the number of bytes they operated on. This is a
requirement for allowing fs implementations to return short clone/dedupe
results to the user, which will enable us to obey resource limits in a
graceful manner.
A subsequent patch will enable copy_file_range to signal to the
->clone_file_range implementation that it can handle a short length,
which will be returned in the function's return value. For now the
short return is not implemented anywhere so the behavior won't change --
either copy_file_range manages to clone the entire range or it tries an
alternative.
Neither clone ioctl can take advantage of this, alas.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Replace "fallthru" with a proper "fall through" annotation.
Also, add an annotation were it is expected to fall through.
These fixes are part of the ongoing efforts to enabling
-Wimplicit-fallthrough
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.
Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements. This makes it easier to add further
iterator types. Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.
Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself. Only the direction is required.
Signed-off-by: David Howells <dhowells@redhat.com>
Fixes gcc '-Wunused-but-set-variable' warning:
fs/nfsd/vfs.c: In function 'nfsd_create':
fs/nfsd/vfs.c:1279:16: warning:
variable 'dirp' set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Commit 031a072a0b ("vfs: call vfs_clone_file_range() under freeze
protection") created a wrapper do_clone_file_range() around
vfs_clone_file_range() moving the freeze protection to former, so
overlayfs could call the latter.
The more common vfs practice is to call do_xxx helpers from vfs_xxx
helpers, where freeze protecction is taken in the vfs_xxx helper, so
this anomality could be a source of confusion.
It seems that commit 8ede205541 ("ovl: add reflink/copyfile/dedup
support") may have fallen a victim to this confusion -
ovl_clone_file_range() calls the vfs_clone_file_range() helper in the
hope of getting freeze protection on upper fs, but in fact results in
overlayfs allowing to bypass upper fs freeze protection.
Swap the names of the two helpers to conform to common vfs practice
and call the correct helpers from overlayfs and nfsd.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
just check ->f_mode in ima_appraise_measurement()
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
That can (and does, on some filesystems) happen - ->mkdir() (and thus
vfs_mkdir()) can legitimately leave its argument negative and just
unhash it, counting upon the lookup to pick the object we'd created
next time we try to look at that name.
Some vfs_mkdir() callers forget about that possibility...
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
NFSv4 read compound processing invokes nfsd_splice_read and
nfs_readv directly, so the trace points currently in nfsd_read are
not invoked for NFSv4 reads.
Move the NFSD READ trace points to common helpers so that NFSv4
reads are captured.
Also, record any local I/O error that occurs, the total count of
bytes that were actually returned, and whether splice or vectored
read was used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSv4 write compound processing invokes nfsd_vfs_write directly. The
trace points currently in nfsd_write are not effective for NFSv4
writes.
Move the trace points into the shared nfsd_vfs_write() helper.
After the I/O, we also want to record any local I/O error that
might have occurred, and the total count of bytes that were actually
moved (rather than the requested number).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Follow naming convention used in client and in sunrpc layers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Byte count is more helpful to know than vector count.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull read/write updates from Al Viro:
"Christoph's fs/read_write.c series - consolidation and cleanups"
* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd: remove nfsd_vfs_read
nfsd: use vfs_iter_read/write
fs: implement vfs_iter_write using do_iter_write
fs: implement vfs_iter_read using do_iter_read
fs: move more code into do_iter_read/do_iter_write
fs: remove __do_readv_writev
fs: remove do_compat_readv_writev
fs: remove do_readv_writev
Instead of messing with the address limit to use vfs_read/vfs_writev.
Note that this requires that exported file implement ->read_iter and
->write_iter. All currently exportable file systems do this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
bugfixes.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZE2UeAAoJECebzXlCjuG+St8P/0vG+ps9sY012E6Wh9gy4Ev4
BtxG/c3CtcxrbNzW+cFhdEloBGtC0VvcrKNCozJTK4LdaPYErkyRBpjgXvIggT9I
GWY4ftpH3eJ6uByN9Okgc3/1la2poDflJO/nYhdRed3YHOnXTtx/746tu1xAnVCV
tFtDGrbJZTprt5c3zETtdtquCUSy2aMT5ZbrdU3yWBCwQMNSIufN3an8epfB++xx
Ct+G0HTRffcWAdYuLT0N1HKqm8pkncdNMFpm7mVw0hMCRy552G3fuj8LtkhVTvKE
1KN3zXY4jhaUYWD5Yt6AJcpLEro65b8swYk4e9FP2TNUpCmuRdXT9cb9vE8YztxC
8s4N23RHaEx9I6pC3OU64a2HfhiQM/oOIvjlhTBsjojXsQcqZFD1vsoSYA8Byl0w
m9EQWqPqge4m6yEYl7uAyL6xSthbrhcU1Ks5jvNXGcWzEQj7BATnynJANsfZ+y6r
ZoVcsRNX49m1BG+p9br+9DFffPiNFUMqxbfr73L9HRep3OsPeFKazFG0bKd3hOqA
E6L/AnBd9soSqTuTvbisWrGWbomhtd5G/fAa1uHrWTPHMXUWCmkguiau51FNfcHu
xcJlBBVCvUmmd5u3wF6QeiyjPs4KEBzQzsOUsWKHRxDBp6s+5PX/lHuXRBlDP+fN
TQq0KbvBtea1OyMaRtoV
=Rtl/
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.12' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Another RDMA update from Chuck Lever, and a bunch of miscellaneous
bugfixes"
* tag 'nfsd-4.12' of git://linux-nfs.org/~bfields/linux: (26 commits)
nfsd: Fix up the "supattr_exclcreat" attributes
nfsd: encoders mustn't use unitialized values in error cases
nfsd: fix undefined behavior in nfsd4_layout_verify
lockd: fix lockd shutdown race
NFSv4: Fix callback server shutdown
SUNRPC: Refactor svc_set_num_threads()
NFSv4.x/callback: Create the callback service through svc_create_pooled
lockd: remove redundant check on block
svcrdma: Clean out old XDR encoders
svcrdma: Remove the req_map cache
svcrdma: Remove unused RDMA Write completion handler
svcrdma: Reduce size of sge array in struct svc_rdma_op_ctxt
svcrdma: Clean up RPC-over-RDMA backchannel reply processing
svcrdma: Report Write/Reply chunk overruns
svcrdma: Clean up RDMA_ERROR path
svcrdma: Use rdma_rw API in RPC reply path
svcrdma: Introduce local rdma_rw API helpers
svcrdma: Clean up svc_rdma_get_inv_rkey()
svcrdma: Add helper to save pages under I/O
svcrdma: Eliminate RPCRDMA_SQ_DEPTH_MULT
...
consider the sequence of commands:
mkdir -p /import/nfs /import/bind /import/etc
mount --bind / /import/bind
mount --make-private /import/bind
mount --bind /import/etc /import/bind/etc
exportfs -o rw,no_root_squash,crossmnt,async,no_subtree_check localhost:/
mount -o vers=4 localhost:/ /import/nfs
ls -l /import/nfs/etc
You would not expect this to report a stale file handle.
Yet it does.
The manipulations under /import/bind cause the dentry for
/etc to get the DCACHE_MOUNTED flag set, even though nothing
is mounted on /etc. This causes nfsd to call
nfsd_cross_mnt() even though there is no mountpoint. So an
upcall to mountd for "/etc" is performed.
The 'crossmnt' flag on the export of / causes mountd to
report that /etc is exported as it is a descendant of /. It
assumes the kernel wouldn't ask about something that wasn't
a mountpoint. The filehandle returned identifies the
filesystem and the inode number of /etc.
When this filehandle is presented to rpc.mountd, via
"nfsd.fh", the inode cannot be found associated with any
name in /etc/exports, or with any mountpoint listed by
getmntent(). So rpc.mountd says the filehandle doesn't
exist. Hence ESTALE.
This is fixed by teaching nfsd not to trust DCACHE_MOUNTED
too much. It is just a hint, not a guarantee.
Change nfsd_mountpoint() to return '1' for a certain mountpoint,
'2' for a possible mountpoint, and 0 otherwise.
Then change nfsd_crossmnt() to check if follow_down()
actually found a mountpount and, if not, to avoid performing
a lookup if the location is not known to certainly require
an export-point.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It is not safe for one thread to modify the ->flags
of another thread as there is no locking that can protect
the update.
So tsk_restore_flags(), which takes a task pointer and modifies
the flags, is an invitation to do the wrong thing.
All current users pass "current" as the task, so no developers have
accepted that invitation. It would be best to ensure it remains
that way.
So rename tsk_restore_flags() to current_restore_flags() and don't
pass in a task_struct pointer. Always operate on current->flags.
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Both the NFS protocols and the Linux VFS use a setattr operation with a
bitmap of attributes to set to set various file attributes including the
file size and the uid/gid.
The Linux syscalls never mix size updates with unrelated updates like
the uid/gid, and some file systems like XFS and GFS2 rely on the fact
that truncates don't update random other attributes, and many other file
systems handle the case but do not update the other attributes in the
same transaction. NFSD on the other hand passes the attributes it gets
on the wire more or less directly through to the VFS, leading to updates
the file systems don't expect. XFS at least has an assert on the
allowed attributes, which caught an unusual NFS client setting the size
and group at the same time.
To handle this issue properly this splits the notify_change call in
nfsd_setattr into two separate ones.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch incorrectly attempted nested mnt_want_write, and incorrectly
disabled nfsd's owner override for truncate. We'll fix those problems
and make another attempt soon, for the moment I think the safest is to
revert.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is just cleanup, no change in functionality.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is just cleanup, no change in functionality.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
After fae5096ad2 "nfsd: assume writeable exportabled filesystems have
f_sync" we no longer modify this argument.
This is just cleanup, no change in functionality.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Both the NFS protocols and the Linux VFS use a setattr operation with a
bitmap of attributs to set to set various file attributes including the
file size and the uid/gid.
The Linux syscalls never mixes size updates with unrelated updates like
the uid/gid, and some file systems like XFS and GFS2 rely on the fact
that truncates might not update random other attributes, and many other
file systems handle the case but do not update the different attributes
in the same transaction. NFSD on the other hand passes the attributes
it gets on the wire more or less directly through to the VFS, leading to
updates the file systems don't expect. XFS at least has an assert on
the allowed attributes, which caught an unusual NFS client setting the
size and group at the same time.
To handle this issue properly this switches nfsd to call vfs_truncate
for size changes, and then handle all other attributes through
notify_change. As a side effect this also means less boilerplace code
around the size change as we can now reuse the VFS code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull partial readlink cleanups from Miklos Szeredi.
This is the uncontroversial part of the readlink cleanup patch-set that
simplifies the default readlink handling.
Miklos and Al are still discussing the rest of the series.
* git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
vfs: make generic_readlink() static
vfs: remove ".readlink = generic_readlink" assignments
vfs: default to generic_readlink()
vfs: replace calling i_op->readlink with vfs_readlink()
proc/self: use generic_readlink
ecryptfs: use vfs_get_link()
bad_inode: add missing i_op initializers
Move sb_start_write()/sb_end_write() out of the vfs helper and up into the
ioctl handler.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Also check d_is_symlink() in callers instead of inode->i_op->readlink
because following patches will allow NULL ->readlink for symlinks.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
I only implemented the sync version of this call, since it's the
easiest. I can simply call vfs_copy_range() and have the vfs do the
right thing for the filesystem being exported.
Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
b44061d0b9 introduced a dentry ref counting bug. Previously we were
grabbing one ref to dchild in nfsd_create(), but with the creation of
nfsd_create_locked() we have a ref for dchild from the lookup in
nfsd_create(), and then another ref in nfsd_create_locked(). The ref
from the lookup in nfsd_create() is never dropped and results in
dentries still in use at unmount.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Fixes: b44061d0b9 "nfsd: reorganize nfsd_create"
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We changed this around in f135af1041f ('nfsd: reorganize nfsd_create')
so "dchild" can't be an error pointer any more. Also, dchild can't be
NULL here (and dput would already handle this even if it was).
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We need an fh_verify to make sure we at least have a dentry, but actual
permission checks happen later.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
returns EEXIST if the object already exists.
This check is therefore unnecessary.
(In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to
RFC 1094, our code seems to believe that a CREATE of an existing file
should succeed. I'm leaving that behavior alone.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There's some odd logic in nfsd_create() that allows it to be called with
the parent directory either locked or unlocked. The only already-locked
caller is NFSv2's nfsd_proc_create(). It's less confusing to split out
the unlocked case into a separate function which the NFSv2 code can call
directly.
Also fix some comments while we're here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Create and other nfsd ops generally assume we can call lookup_one_len on
inodes with S_IFDIR set. Al says that this assumption isn't true in
general, though it should be for the filesystem objects nfsd sees.
Add a check just to make sure our assumption isn't violated.
Remove a couple checks for i_op->lookup in create code.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
lookup_one_len already has this check.
The only effect of this patch is to return access instead of perm in the
0-length-filename case. I actually prefer nfserr_perm (or _inval?), but
I doubt anyone cares.
The isdotent check seems redundant too, but I worry that some client
might actually care about that strange nfserr_exist error.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When doing a create (mkdir/mknod) on a name, it's worth
checking the name exists first before returning EACCES in case
the directory is not writeable by the user.
This makes return values on the client more consistent
regardless of whenever the entry there is cached in the local
cache or not.
Another positive side effect is certain programs only expect
EEXIST in that case even despite POSIX allowing any valid
error to be returned.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This way we can set kiocb flags also from the sync read/write path for
the read_iter/write_iter operations. For now there is no way to pass
flags to plain read/write operations as there is no real need for that,
and all flags passed are explicitly rejected for these files.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stephen Bates <stephen.bates@pmcs.com>
Tested-by: Stephen Bates <stephen.bates@pmcs.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK storms
that can affect some high-availability NFS setups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWmVZHAAoJECebzXlCjuG+Cn4P/3zwSuwIeLuv9b89vzFXU8Xv
AbBWHk7WkFXJQGTKdclYjwxqU+l15D5lYHCae1cuD5eXdviraxXf7EcnqrMhJUc0
oRiQx0rAwlEkKUAVrxGCFP7WKjlX3TsEBV6wPpTCP3BEMzTPDEeaDek7+hICFkLF
9a/miEXAopm3jxP7WNmXEkdKpFEHklDDwtv6Av7iIKCW6+7XCGp7Prqo4NQKAKp6
hjE+nvt2HiD06MZhUeyb14cn6547smzt1rbSfK4IB4yHMwLyaoqPrT7ekDh9LDrE
uGgo+Y2PBbEcTAE6tJ88EjZx7cMCFPn0te+eKPgnpPy9RqrNqSxj5N/b7JAecKgW
a/09BtvFOoYs8fO5ovqeRY5THrE3IRyMIwn4gt7fCYaaAbG3dwGKG1uklTAVXtb1
95DkhOb8He2VhOCCoJ6ybbTnRfjB6b/cv7ZuEGlQfvTE+BtU3Jj9I76ruWFhb3zd
HM1dRI20UfwL/0Y8yYhZ+/rje9SSk2jOmVgSCqY9hnCmEqOqOdUU0X/uumIWaBym
zfGx9GIM0jQuYVdLQRXtJJbUgJUUN3MilGyU5wx7YoXip5guqTalXqAdQpShzXeW
s1ATYh/mY5X9ig51KogkkVlm9bXDQAzJBAnDRpLtJZqy5Cgkrj9RSu0ExN1Rmlhw
LKQCddBQxUSWJ+XWycgK
=G7V3
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Smaller bugfixes and cleanup, including a fix for a failures of
kerberized NFSv4.1 mounts, and Scott Mayhew's work addressing ACK
storms that can affect some high-availability NFS setups"
* tag 'nfsd-4.5' of git://linux-nfs.org/~bfields/linux:
nfsd: add new io class tracepoint
nfsd: give up on CB_LAYOUTRECALLs after two lease periods
nfsd: Fix nfsd leaks sunrpc module references
lockd: constify nlmsvc_binding structure
lockd: use to_delayed_work
nfsd: use to_delayed_work
Revert "svcrdma: Do not send XDR roundup bytes for a write chunk"
lockd: Register callbacks on the inetaddr_chain and inet6addr_chain
nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain
sunrpc: Add a function to close temporary transports immediately
nfsd: don't base cl_cb_status on stale information
nfsd4: fix gss-proxy 4.1 mounts for some AD principals
nfsd: fix unlikely NULL deref in mach_creds_match
nfsd: minor consolidation of mach_cred handling code
nfsd: helper for dup of possibly NULL string
svcrpc: move some initialization to common code
nfsd: fix a warning message
nfsd: constify nfsd4_callback_ops structure
nfsd: recover: constify nfsd4_client_tracking_ops structures
svcrdma: Do not send XDR roundup bytes for a write chunk
Add some new tracepoints in the nfsd read/write codepaths. The idea
is that this will give us the ability to measure how long each phase of
a read or write operation takes.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull misc vfs updates from Al Viro:
"All kinds of stuff. That probably should've been 5 or 6 separate
branches, but by the time I'd realized how large and mixed that bag
had become it had been too close to -final to play with rebasing.
Some fs/namei.c cleanups there, memdup_user_nul() introduction and
switching open-coded instances, burying long-dead code, whack-a-mole
of various kinds, several new helpers for ->llseek(), assorted
cleanups and fixes from various people, etc.
One piece probably deserves special mention - Neil's
lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
called without ->i_mutex and tries to avoid ever taking it. That, of
course, means that it's not useful for any directory modifications,
but things like getting inode attributes in nfds readdirplus are fine
with that. I really should've asked for moratorium on lookup-related
changes this cycle, but since I hadn't done that early enough... I
*am* asking for that for the coming cycle, though - I'm going to try
and get conversion of i_mutex to rwsem with ->lookup() done under lock
taken shared.
There will be a patch closer to the end of the window, along the lines
of the one Linus had posted last May - mechanical conversion of
->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
inode_is_locked()/inode_lock_nested(). To quote Linus back then:
-----
| This is an automated patch using
|
| sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
| sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
| sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
| sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
| sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
|
| with a very few manual fixups
-----
I'm going to send that once the ->i_mutex-affecting stuff in -next
gets mostly merged (or when Linus says he's about to stop taking
merges)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
nfsd: don't hold i_mutex over userspace upcalls
fs:affs:Replace time_t with time64_t
fs/9p: use fscache mutex rather than spinlock
proc: add a reschedule point in proc_readfd_common()
logfs: constify logfs_block_ops structures
fcntl: allow to set O_DIRECT flag on pipe
fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
fs: xattr: Use kvfree()
[s390] page_to_phys() always returns a multiple of PAGE_SIZE
nbd: use ->compat_ioctl()
fs: use block_device name vsprintf helper
lib/vsprintf: add %*pg format specifier
fs: use gendisk->disk_name where possible
poll: plug an unused argument to do_poll
amdkfd: don't open-code memdup_user()
cdrom: don't open-code memdup_user()
rsxx: don't open-code memdup_user()
mtip32xx: don't open-code memdup_user()
[um] mconsole: don't open-code memdup_user_nul()
[um] hostaudio: don't open-code memdup_user()
...
We need information about exports when crossing mountpoints during
lookup or NFSv4 readdir. If we don't already have that information
cached, we may have to ask (and wait for) rpc.mountd.
In both cases we currently hold the i_mutex on the parent of the
directory we're asking rpc.mountd about. We've seen situations where
rpc.mountd performs some operation on that directory that tries to take
the i_mutex again, resulting in deadlock.
With some care, we may be able to avoid that in rpc.mountd. But it
seems better just to avoid holding a mutex while waiting on userspace.
It appears that lookup_one_len is pretty much the only operation that
needs the i_mutex. So we could just drop the i_mutex elsewhere and do
something like
mutex_lock()
lookup_one_len()
mutex_unlock()
In many cases though the lookup would have been cached and not required
the i_mutex, so it's more efficient to create a lookup_one_len() variant
that only takes the i_mutex when necessary.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is basically a remote version of the btrfs CLONE operation,
so the implementation is fairly trivial. Made even more trivial
by stealing the XDR code and general framework Anna Schumaker's
COPY prototype.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
According to rfc5661 18.16.4,
"If EXCLUSIVE4_1 was used, the client determines the attributes
used for the verifier by comparing attrset with cva_attrs.attrmask;"
So, EXCLUSIVE4_1 also needs those bitmask used to store the verifier.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch changes nfs4_preprocess_stateid_op so it always returns
a valid struct file if it has been asked for that. For that we
now allocate a temporary struct file for special stateids, and check
permissions if we got the file structure from the stateid. This
ensures that all callers will get their handling of special stateids
right, and avoids code duplication.
There is a little wart in here because the read code needs to know
if we allocated a file structure so that it can copy around the
read-ahead parameters. In the long run we should probably aim to
cache full file structures used with special stateids instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Refactor the raparam hash helpers to just deal with the raparms,
and keep opening/closing files separate from that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSv2 can set the atime and/or mtime of a file to specific timestamps but not
to the server's current time. To implement the equivalent of utimes("file",
NULL), it uses a heuristic.
NFSv3 and later do support setting the atime and/or mtime to the server's
current time directly. The NFSv2 heuristic is still enabled, and causes
timestamps to be set wrong sometimes.
Fix this by moving the heuristic into the NFSv2 specific code. We can leave it
out of the create code path: the owner can always set timestamps arbitrarily,
and the workaround would never trigger.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull nfsd updates from Bruce Fields:
"A comparatively quieter cycle for nfsd this time, but still with two
larger changes:
- RPC server scalability improvements from Jeff Layton (using RCU
instead of a spinlock to find idle threads).
- server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
Schumaker, enabling fallocate on new clients"
* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd4: fix xdr4 count of server in fs_location4
nfsd4: fix xdr4 inclusion of escaped char
sunrpc/cache: convert to use string_escape_str()
sunrpc: only call test_bit once in svc_xprt_received
fs: nfsd: Fix signedness bug in compare_blob
sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
sunrpc: convert to lockless lookup of queued server threads
sunrpc: fix potential races in pool_stats collection
sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
sunrpc: require svc_create callers to pass in meaningful shutdown routine
sunrpc: have svc_wake_up only deal with pool 0
sunrpc: convert sp_task_pending flag to use atomic bitops
sunrpc: move rq_cachetype field to better optimize space
sunrpc: move rq_splice_ok flag into rq_flags
sunrpc: move rq_dropme flag into rq_flags
sunrpc: move rq_usedeferral flag to rq_flags
sunrpc: move rq_local field to rq_flags
sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
nfsd: minor off by one checks in __write_versions()
sunrpc: release svc_pool_map reference when serv allocation fails
...
The ALLOCATE operation is used to preallocate space in a file. I can do
this by using vfs_fallocate() to do the actual preallocation.
ALLOCATE only returns a status indicator, so we don't need to write a
special encode() function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The vfs_fsync_range() call during write processing got the end of the
range off by one. The range is inclusive, not exclusive. The error has
nfsd sync more data than requested -- it's correct but unnecessary
overhead.
The call during commit processing is correct so I copied that pattern in
write processing. Maybe a helper would be nice but I kept it trivial.
This is untested. I found it while reviewing code for something else
entirely.
Signed-off-by: Zach Brown <zab@zabbo.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull security subsystem updates from James Morris.
Mostly ima, selinux, smack and key handling updates.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
integrity: do zero padding of the key id
KEYS: output last portion of fingerprint in /proc/keys
KEYS: strip 'id:' from ca_keyid
KEYS: use swapped SKID for performing partial matching
KEYS: Restore partial ID matching functionality for asymmetric keys
X.509: If available, use the raw subjKeyId to form the key description
KEYS: handle error code encoded in pointer
selinux: normalize audit log formatting
selinux: cleanup error reporting in selinux_nlmsg_perm()
KEYS: Check hex2bin()'s return when generating an asymmetric key ID
ima: detect violations for mmaped files
ima: fix race condition on ima_rdwr_violation_check and process_measurement
ima: added ima_policy_flag variable
ima: return an error code from ima_add_boot_aggregate()
ima: provide 'ima_appraise=log' kernel option
ima: move keyring initialization to ima_init()
PKCS#7: Handle PKCS#7 messages that contain no X.509 certs
PKCS#7: Better handling of unsupported crypto
KEYS: Overhaul key identification when searching for asymmetric keys
KEYS: Implement binary asymmetric key ID handling
...
This fixes a failure in xfstests generic/313 because nfs doesn't update
mtime on a truncate. The protocol requires this to be done implicity
for a size changing setattr.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Empty files and missing xattrs do not guarantee that a file was
just created. This patch passes FILE_CREATED flag to IMA to
reliably identify new files.
Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> 3.14+
Commit 2a7420c03e504 (nfsd: Ensure that nfsd_create_setattr commits
files to stable storage), added a couple of calls to commit_metadata,
but doesn't convert their return codes to __be32 in the appropriate
places.
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Since nfsd_create_setattr strips the mode from the struct iattr, it
is quite possible that it will optimise away the call to nfsd_setattr
altogether.
If this is the case, then we never call commit_metadata() on the
newly created file.
Also ensure that both nfsd_setattr() and nfsd_create_setattr() fail
when the call to commit_metadata fails.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Commit db2e747b14 (vfs: remove mode parameter from vfs_symlink())
have remove mode parameter from vfs_symlink.
So that, iattr isn't needed by nfsd_symlink now, just remove it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently nfsd_symlink has a weird hack to serve callers who don't
null-terminate symlink data: it looks ahead at the next byte to see if
it's zero, and copies it to a new buffer to null-terminate if not.
That means callers don't have to null-terminate, but they *do* have to
ensure that the byte following the end of the data is theirs to read.
That's a bit subtle, and the NFSv4 code actually got this wrong.
So let's just throw out that code and let callers pass null-terminated
strings; we've already fixed them to do that.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The callers expect a __be32 return and the functions they call return
__be32, so having these return int is just wrong. Also, nfsd_finish_read
can be made static.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Don't using cache_get besides export.h, using exp_get for export.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The splice and readv cases are actually quite different--for example the
former case ignores the array of vectors we build up for the latter.
It is probably clearer to separate the two cases entirely.
There's some code duplication between the split out encoders, but this
is only temporary and will be fixed by a later patch.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
and live-locks while writing to the page cache in a loop-back
NFS mount situation.
It therefore makes sense to *only* set PF_LESS_THROTTLE in this
situation.
We now know when a request came from the local-host so it could be a
loop-back mount. We already know when we are handling write requests,
and when we are doing anything else.
So combine those two to allow nfsd to still be throttled (like any
other process) in every situation except when it is known to be
problematic.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We're clearing the SUID/SGID bits on write by hand in nfsd_vfs_write,
even though the subsequent vfs_writev() call will end up doing this for
us (through file system write methods eventually calling
file_remove_suid(), e.g., from __generic_file_aio_write).
So, remove the redundant nfsd code.
The only change in behavior is when the write is by root, in which case
we previously cleared SUID/SGID, but will now leave it alone. The new
behavior is the behavior of every filesystem we've checked.
It seems better to be consistent with local filesystem behavior. And
the security advantage seems limited as root could always restore these
bits by hand if it wanted.
SUID/SGID is not cleared after writing data with (root, local ext4),
File: ‘test’
Size: 0 Blocks: 0 IO Block: 4096 regular
empty file
Device: 803h/2051d Inode: 1200137 Links: 1
Access: (4777/-rwsrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2014-04-18 21:36:31.016029014 +0800
Modify: 2014-04-18 21:36:31.016029014 +0800
Change: 2014-04-18 21:36:31.026030285 +0800
Birth: -
File: ‘test’
Size: 5 Blocks: 8 IO Block: 4096 regular file
Device: 803h/2051d Inode: 1200137 Links: 1
Access: (4777/-rwsrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2014-04-18 21:36:31.016029014 +0800
Modify: 2014-04-18 21:36:31.040032065 +0800
Change: 2014-04-18 21:36:31.040032065 +0800
Birth: -
With no_root_squash, (root, remote ext4), SUID/SGID are cleared,
File: ‘test’
Size: 0 Blocks: 0 IO Block: 262144 regular
empty file
Device: 24h/36d Inode: 786439 Links: 1
Access: (4777/-rwsrwxrwx) Uid: ( 1000/ test) Gid: ( 1000/ test)
Context: system_u:object_r:nfs_t:s0
Access: 2014-04-18 21:45:32.155805097 +0800
Modify: 2014-04-18 21:45:32.155805097 +0800
Change: 2014-04-18 21:45:32.168806749 +0800
Birth: -
File: ‘test’
Size: 5 Blocks: 8 IO Block: 262144 regular file
Device: 24h/36d Inode: 786439 Links: 1
Access: (0777/-rwxrwxrwx) Uid: ( 1000/ test) Gid: ( 1000/ test)
Context: system_u:object_r:nfs_t:s0
Access: 2014-04-18 21:45:32.155805097 +0800
Modify: 2014-04-18 21:45:32.184808783 +0800
Change: 2014-04-18 21:45:32.184808783 +0800
Birth: -
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull nfsd updates from Bruce Fields:
"Highlights:
- server-side nfs/rdma fixes from Jeff Layton and Tom Tucker
- xdr fixes (a larger xdr rewrite has been posted but I decided it
would be better to queue it up for 3.16).
- miscellaneous fixes and cleanup from all over (thanks especially to
Kinglong Mee)"
* 'for-3.15' of git://linux-nfs.org/~bfields/linux: (36 commits)
nfsd4: don't create unnecessary mask acl
nfsd: revert v2 half of "nfsd: don't return high mode bits"
nfsd4: fix memory leak in nfsd4_encode_fattr()
nfsd: check passed socket's net matches NFSd superblock's one
SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed
NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp
SUNRPC: New helper for creating client with rpc_xprt
NFSD: Free backchannel xprt in bc_destroy
NFSD: Clear wcc data between compound ops
nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+
nfsd4: fix nfs4err_resource in 4.1 case
nfsd4: fix setclientid encode size
nfsd4: remove redundant check from nfsd4_check_resp_size
nfsd4: use more generous NFS4_ACL_MAX
nfsd4: minor nfsd4_replay_cache_entry cleanup
nfsd4: nfsd4_replay_cache_entry should be static
nfsd4: update comments with obsolete function name
rpc: Allow xdr_buf_subsegment to operate in-place
NFSD: Using free_conn free connection
SUNRPC: fix memory leak of peer addresses in XPRT
...
Add new renameat2 syscall, which is the same as renameat with an added
flags argument.
Pass flags to vfs_rename() and to i_op->rename() as well.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>