Commit Graph

73060 Commits

Author SHA1 Message Date
Alexander Aring 6c547f2640 fs: dlm: memory cache for midcomms hotpath
This patch will introduce a kmem cache for allocating message handles
which are needed for midcomms layer to take track of lowcomms messages.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring be3b0400ed fs: dlm: remove wq_alloc mutex
This patch cleanups the code for allocating a new buffer in the dlm
writequeue mechanism. There was a possible tuneup to allow scheduling
while a new writequeue entry needs to be allocated because either no
sending page is available or are full. To avoid multiple concurrent
users checking at the same time if an entry is available or full
alloc_wq was introduce that those are waiting if there is currently a
new writequeue entry in process to be queued so possible further users
will check on the new allocated writequeue entry if it's full.

To simplify the code we just remove this mutex and switch that the
already introduced spin lock will be held during writequeue check,
allocation and queueing. So other users can never check on available
writequeues while there is a new one in process but not queued yet.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring 21d9ac1a53 fs: dlm: use event based wait for pending remove
This patch will use an event based waitqueue to wait for a possible clash
with the ls_remove_name field of dlm_ls instead of doing busy waiting.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring bcbfea41e1 fs: dlm: check for pending users filling buffers
Currently we don't care if the DLM application stack is filling buffers
(not committed yet) while we transmit some already committed buffers.
By checking on active writequeue users before dequeue a writequeue entry
we know there is coming more data and do nothing. We wait until the send
worker will be triggered again if the writequeue entry users hit zero.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring f70813d6a5 fs: dlm: use list_empty() to check last iteration
This patch will use list_empty(&ls->ls_cb_delay) to check for last list
iteration. In case of a multiply count of MAX_CB_QUEUE and the list is
empty we do a extra goto more which we can avoid by checking on
list_empty().

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring 1b9beda83e fs: dlm: fix build with CONFIG_IPV6 disabled
This patch will surround the AF_INET6 case in sk_error_report() of dlm
with a #if IS_ENABLED(CONFIG_IPV6). The field sk->sk_v6_daddr is not
defined when CONFIG_IPV6 is disabled. If CONFIG_IPV6 is disabled, the
socket creation with AF_INET6 should already fail because a runtime
check if AF_INET6 is registered. However if there is the possibility
that AF_INET6 is set as sk_family the sk_error_report() callback will
print then an invalid family type error.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 4c3d90570b ("fs: dlm: don't call kernel_getpeername() in error_report()")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-17 08:46:23 -06:00
Alexander Aring 92c4460538 fs: dlm: replace use of socket sk_callback_lock with sock_lock
This patch will replace the use of socket sk_callback_lock lock and uses
socket lock instead. Some users like sunrpc, see commit ea9afca88b
("SUNRPC: Replace use of socket sk_callback_lock with sock_lock") moving
from sk_callback_lock to sock_lock which seems to be held when the socket
callbacks are called.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-15 11:49:18 -06:00
Alexander Aring 4c3d90570b fs: dlm: don't call kernel_getpeername() in error_report()
In some cases kernel_getpeername() will held the socket lock which is
already held when the socket layer calls error_report() callback. Since
commit 9dfc685e02 ("inet: remove races in inet{6}_getname()") this
problem becomes more likely because the socket lock will be held always.
You will see something like:

bob9-u5 login: [  562.316860] BUG: spinlock recursion on CPU#7, swapper/7/0
[  562.318562]  lock: 0xffff8f2284720088, .magic: dead4ead, .owner: swapper/7/0, .owner_cpu: 7
[  562.319522] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.15.0+ #135
[  562.320346] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
[  562.321277] Call Trace:
[  562.321529]  <IRQ>
[  562.321734]  dump_stack_lvl+0x33/0x42
[  562.322282]  do_raw_spin_lock+0x8b/0xc0
[  562.322674]  lock_sock_nested+0x1e/0x50
[  562.323057]  inet_getname+0x39/0x110
[  562.323425]  ? sock_def_readable+0x80/0x80
[  562.323838]  lowcomms_error_report+0x63/0x260 [dlm]
[  562.324338]  ? wait_for_completion_interruptible_timeout+0xd2/0x120
[  562.324949]  ? lock_timer_base+0x67/0x80
[  562.325330]  ? do_raw_spin_unlock+0x49/0xc0
[  562.325735]  ? _raw_spin_unlock_irqrestore+0x1e/0x40
[  562.326218]  ? del_timer+0x54/0x80
[  562.326549]  sk_error_report+0x12/0x70
[  562.326919]  tcp_validate_incoming+0x3c8/0x530
[  562.327347]  ? kvm_clock_read+0x14/0x30
[  562.327718]  ? ktime_get+0x3b/0xa0
[  562.328055]  tcp_rcv_established+0x121/0x660
[  562.328466]  tcp_v4_do_rcv+0x132/0x260
[  562.328835]  tcp_v4_rcv+0xcea/0xe20
[  562.329173]  ip_protocol_deliver_rcu+0x35/0x1f0
[  562.329615]  ip_local_deliver_finish+0x54/0x60
[  562.330050]  ip_local_deliver+0xf7/0x110
[  562.330431]  ? inet_rtm_getroute+0x211/0x840
[  562.330848]  ? ip_protocol_deliver_rcu+0x1f0/0x1f0
[  562.331310]  ip_rcv+0xe1/0xf0
[  562.331603]  ? ip_local_deliver+0x110/0x110
[  562.332011]  __netif_receive_skb_core+0x46a/0x1040
[  562.332476]  ? inet_gro_receive+0x263/0x2e0
[  562.332885]  __netif_receive_skb_list_core+0x13b/0x2c0
[  562.333383]  netif_receive_skb_list_internal+0x1c8/0x2f0
[  562.333896]  ? update_load_avg+0x7e/0x5e0
[  562.334285]  gro_normal_list.part.149+0x19/0x40
[  562.334722]  napi_complete_done+0x67/0x160
[  562.335134]  virtnet_poll+0x2ad/0x408 [virtio_net]
[  562.335644]  __napi_poll+0x28/0x140
[  562.336012]  net_rx_action+0x23d/0x300
[  562.336414]  __do_softirq+0xf2/0x2ea
[  562.336803]  irq_exit_rcu+0xc1/0xf0
[  562.337173]  common_interrupt+0xb9/0xd0

It is and was always forbidden to call kernel_getpeername() in context
of error_report(). To get rid of the problem we access the destination
address for the peer over the socket structure. While on it we fix to
print out the destination port of the inet socket.

Fixes: 1a31833d08 ("DLM: Replace nodeid_to_addr with kernel_getpeername")
Reported-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-15 11:49:18 -06:00
Alexander Aring 6a628fa438 fs: dlm: fix potential buffer overflow
This patch fixes an potential overflow in sscanf and the maximum
declared string parsing length which seems to be excluding the null
termination symbol. This patch will just add one byte to be prepared on
a string with length of DLM_RESNAME_MAXLEN including the null
termination symbol.

Fixes: 5054e79de9 ("fs: dlm: add lkb debugfs functionality")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-12 09:38:19 -06:00
Zhang Mingyu c8b9f34e22 fs: dlm:Remove unneeded semicolon
Eliminate the following coccinelle check warning:
fs/dlm/midcomms.c:972:2-3

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Zhang Mingyu <zhang.mingyu@zte.com.cn>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-05 13:03:34 -05:00
Alexander Aring b87b1883ef fs: dlm: remove double list_first_entry call
This patch removes a list_first_entry() call which is already done by
the previous con_next_wq() call.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-03 16:07:20 -05:00
Alexander Aring 6c2e3bf68f fs: dlm: filter user dlm messages for kernel locks
This patch fixes the following crash by receiving a invalid message:

[  160.672220] ==================================================================
[  160.676206] BUG: KASAN: user-memory-access in dlm_user_add_ast+0xc3/0x370
[  160.679659] Read of size 8 at addr 00000000deadbeef by task kworker/u32:13/319
[  160.681447]
[  160.681824] CPU: 10 PID: 319 Comm: kworker/u32:13 Not tainted 5.14.0-rc2+ #399
[  160.683472] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.6.0+12648+6ede71a5 04/01/2014
[  160.685574] Workqueue: dlm_recv process_recv_sockets
[  160.686721] Call Trace:
[  160.687310]  dump_stack_lvl+0x56/0x6f
[  160.688169]  ? dlm_user_add_ast+0xc3/0x370
[  160.689116]  kasan_report.cold.14+0x116/0x11b
[  160.690138]  ? dlm_user_add_ast+0xc3/0x370
[  160.690832]  dlm_user_add_ast+0xc3/0x370
[  160.691502]  _receive_unlock_reply+0x103/0x170
[  160.692241]  _receive_message+0x11df/0x1ec0
[  160.692926]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  160.693700]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  160.694427]  ? lock_acquire+0x175/0x400
[  160.695058]  ? do_purge.isra.51+0x200/0x200
[  160.695744]  ? lock_acquired+0x360/0x5d0
[  160.696400]  ? lock_contended+0x6a0/0x6a0
[  160.697055]  ? lock_release+0x21d/0x5e0
[  160.697686]  ? lock_is_held_type+0xe0/0x110
[  160.698352]  ? lock_is_held_type+0xe0/0x110
[  160.699026]  ? ___might_sleep+0x1cc/0x1e0
[  160.699698]  ? dlm_wait_requestqueue+0x94/0x140
[  160.700451]  ? dlm_process_requestqueue+0x240/0x240
[  160.701249]  ? down_write_killable+0x2b0/0x2b0
[  160.701988]  ? do_raw_spin_unlock+0xa2/0x130
[  160.702690]  dlm_receive_buffer+0x1a5/0x210
[  160.703385]  dlm_process_incoming_buffer+0x726/0x9f0
[  160.704210]  receive_from_sock+0x1c0/0x3b0
[  160.704886]  ? dlm_tcp_shutdown+0x30/0x30
[  160.705561]  ? lock_acquire+0x175/0x400
[  160.706197]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  160.706941]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  160.707681]  process_recv_sockets+0x32/0x40
[  160.708366]  process_one_work+0x55e/0xad0
[  160.709045]  ? pwq_dec_nr_in_flight+0x110/0x110
[  160.709820]  worker_thread+0x65/0x5e0
[  160.710423]  ? process_one_work+0xad0/0xad0
[  160.711087]  kthread+0x1ed/0x220
[  160.711628]  ? set_kthread_struct+0x80/0x80
[  160.712314]  ret_from_fork+0x22/0x30

The issue is that we received a DLM message for a user lock but the
destination lock is a kernel lock. Note that the address which is trying
to derefence is 00000000deadbeef, which is in a kernel lock
lkb->lkb_astparam, this field should never be derefenced by the DLM
kernel stack. In case of a user lock lkb->lkb_astparam is lkb->lkb_ua
(memory is shared by a union field). The struct lkb_ua will be handled
by the DLM kernel stack but on a kernel lock it will contain invalid
data and ends in most likely crashing the kernel.

It can be reproduced with two cluster nodes.

node 2:
dlm_tool join test
echo "862 fooobaar 1 2 1" > /sys/kernel/debug/dlm/test_locks
echo "862 3 1" > /sys/kernel/debug/dlm/test_waiters

node 1:
dlm_tool join test

python:
foo = DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0x77222027, \
          m_type=7, m_flags=0x1, m_remid=0x862, m_result=0xFFFEFFFE)
newFile = open("/sys/kernel/debug/dlm/comms/2/rawmsg", "wb")
newFile.write(bytes(foo))

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 63eab2b00b fs: dlm: add lkb waiters debugfs functionality
This patch adds functionality to put a lkb to the waiters state. It can
be useful to combine this feature with the "rawmsg" debugfs
functionality. It will bring the DLM lkb into a state that a message
will be parsed by the kernel.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 5054e79de9 fs: dlm: add lkb debugfs functionality
This patch adds functionality to add an lkb during runtime. This is a
highly debugging feature only, wrong input can crash the kernel. It is a
early state feature as well. The goal is to provide a user interface for
manipulate dlm state and combine it with the rawmsg feature. It is
debugfs functionality, we don't care about UAPI breakage. Even it's
possible to add lkb's/rsb's which could never be exists in such wat by
using normal DLM operation. The user of this interface always need to
think before using this feature, not every crash which happens can really
occur during normal dlm operation.

Future there should be more functionality to add a more realistic lkb
which reflects normal DLM state inside the kernel. For now this is
enough.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 75d25ffe38 fs: dlm: allow create lkb with specific id range
This patch adds functionality to add a lkb with a specific id range.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 9af5b8f0ea fs: dlm: add debugfs rawmsg send functionality
This patch adds a dlm functionality to send a raw dlm message to a
specific cluster node. This raw message can be build by user space and
send out by writing the message to "rawmsg" dlm debugfs file.

There is a in progress scapy dlm module which provides a easy build of
DLM messages in user space. For example:

DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0xe4f48a18, ...)

The goal is to provide an easy reproducable state to crash DLM or to
fuzz the DLM kernel stack if there are possible ways to crash it.

Note: that if the sequence number is zero and dlm version is not set to
3.1 the kernel will automatic will set a right sequence number, otherwise
DLM stack testing is not possible.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 5c16febbc1 fs: dlm: let handle callback data as void
This patch changes the dlm_lowcomms_new_msg() function pointer private data
from "struct mhandle *" to "void *" to provide different structures than
just "struct mhandle".

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 3cb5977c52 fs: dlm: ls_count busy wait to event based wait
This patch changes the ls_count busy wait to use atomic counter values
and wait_event() to wait until ls_count reach zero. It will slightly
reduce the number of holding lslist_lock. At remove lockspace we need to
retry the wait because it a lockspace get could interefere between
wait_event() and holding the lock which deletes the lockspace list entry.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 164d88abd7 fs: dlm: requestqueue busy wait to event based wait
This patch changes the requestqueue busy waiting algorithm to use
atomic counter values and wait_event() to wait until the requestqueue is
empty. It will slightly reduce the number of holding ls_requestqueue_mutex
mutex.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 92732376fd fs: dlm: trace socket handling
This patch adds tracepoints for dlm socket receive and send
functionality. We can use it to track how much data was send or received
to or from a specific nodeid.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring f1d3b8f91d fs: dlm: initial support for tracepoints
This patch adds initial support for dlm tracepoints. It will introduce
tracepoints to dlm main functionality dlm_lock()/dlm_unlock() and their
complete ast() callback or blocking bast() callback.

The lock/unlock functionality has a start and end tracepoint, this is
because there exists a race in case if would have a tracepoint at the
end position only the complete/blocking callbacks could occur before. To
work with eBPF tracing and using their lookup hash functionality there
could be problems that an entry was not inserted yet. However use the
start functionality for hash insert and check again in end functionality
if there was an dlm internal error so there is no ast callback. In further
it might also that locks with local masters will occur those callbacks
immediately so we must have such functionality.

I did not make everything accessible yet, although it seems eBPF can be
used to access a lot of internal datastructures if it's aware of the
struct definitions of the running kernel instance. We still can change
it, if you do eBPF experiments e.g. time measurements between lock and
callback functionality you can simple use the local lkb_id field as hash
value in combination with the lockspace id if you have multiple
lockspaces. Otherwise you can simple use trace-cmd for some functionality,
e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 2f05ec4327 fs: dlm: make dlm_callback_resume quite
This patch makes dlm_callback_resume info printout less noisy by
accumulate all callback queues into one printout not in 25 times steps.
It seems this printout became lately quite noisy in relationship with
gfs2.

Before:

[241767.849302] dlm: bin: dlm_callback_resume 25
[241767.854846] dlm: bin: dlm_callback_resume 25
[241767.860373] dlm: bin: dlm_callback_resume 25
...
[241767.865920] dlm: bin: dlm_callback_resume 25
[241767.871352] dlm: bin: dlm_callback_resume 25
[241767.876733] dlm: bin: dlm_callback_resume 25

After the patch:

[  385.485728] dlm: gfs2: dlm_callback_resume 175

if zero it will not be printed out.

Reported-by: Barry Marson <bmarson@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring e10249b190 fs: dlm: use dlm_recovery_stopped in condition
This patch will change to evaluate the dlm_recovery_stopped() in the
condition of the if branch instead fetch it before evaluating the
condition. As this is an atomic test-set operation it should be
evaluated in the condition itself.

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 3e9736713d fs: dlm: use dlm_recovery_stopped instead of test_bit
This patch will change to use dlm_recovery_stopped() which is the dlm way
to check if the LSFL_RECOVER_STOP flag in ls_flags by using the helper.
It is an atomic operation but the check is still as before to fetch the
value if ls_recover_lock is held. There might be more further
investigations if the value can be changed afterwards and if it has any
side effects.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 658bd576f9 fs: dlm: move version conversion to compile time
This patch moves version conversion to little endian from a runtime
variable to compile time constant.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring fe93367541 fs: dlm: remove check SCTP is loaded message
Since commit 764ff4011424 ("fs: dlm: auto load sctp module") we try
load the sctp module before we try to create a sctp kernel socket. That
a socket creation fails now has more likely other reasons. This patch
removes the part of error to load the sctp module and instead printout
the error code.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring 1aafd9c231 fs: dlm: debug improvements print nodeid
This patch improves the debug output for midcomms layer by also printing
out the nodeid where users counter belongs to.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring bb6866a5bd fs: dlm: fix small lockspace typo
This patch fixes a typo from lockspace to lockspace.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring dea450c90f fs: dlm: remove obsolete INBUF define
This patch removes an obsolete define for some length for an temporary
buffer which is not being used anymore. The use of this define is not
necessary anymore since commit 4798cbbfbd ("fs: dlm: rework receive
handling").

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Linus Torvalds b20078fd69 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull autofs fix from Al Viro:
 "Fix for a braino of mine (in getting rid of open-coded
  dentry_path_raw() in autofs a couple of cycles ago).

  Mea culpa...  Obvious -stable fodder"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  autofs: fix wait name hash calculation in autofs_wait()
2021-10-24 09:36:06 -10:00
Linus Torvalds c460e7896e Ten fixes for the ksmbd kernel server, for improved security and additional buffer overflow checks
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmFzb/4ACgkQiiy9cAdy
 T1HhOQv/bTsGX85V5QFBQyW7F7ymYaSuDpvb/nYFzzs4fMz8/lbCJ6GQHMK7G+uR
 ZYFdZw5c5A+KsfjRoRAKOc6HJbGERnKWvIlRkNWnmcpsXcR3PFhhOxXKXxCTBjga
 fHxAJBENbsyCtGz25pUtuOKVG9zVaz31lyXgnz53b0ATrb8CxIll7zdKTp7aGdK0
 UIWUEYeUwctIhvyBXyms8ni+15keop6/7Y1KhUvodL/VhU4YCkKerFFQojsqLEBk
 4N6ZrgEEvzZPSjMt3KkbAapMNvf8Jgy6hKrB10MWB2sJG3bG1A4MuWYzVz7hlLbx
 KXztSjbEiExCw7Y8ZXeA0pT/P51TVB9uxSLaDJhGVrXIxkZ4exUz2pS0Tp2E8eMH
 4BGylAV9vrZqjmF3HQHJu8c/+f833dwbmMzgDFlFgR01U8NQYQvLf6ZoxmnQxdJC
 5CVzO2rGV1bAVEbCJAN/wnKSCpEzjcN0ruz9bcje8MFF9mXXiJxATIgNahuaj17t
 AvOnAbwd
 =FhM8
 -----END PGP SIGNATURE-----

Merge tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull ksmbd fixes from Steve French:
 "Ten fixes for the ksmbd kernel server, for improved security and
  additional buffer overflow checks:

   - a security improvement to session establishment to reduce the
     possibility of dictionary attacks

   - fix to ensure that maximum i/o size negotiated in the protocol is
     not less than 64K and not more than 8MB to better match expected
     behavior

   - fix for crediting (flow control) important to properly verify that
     sufficient credits are available for the requested operation

   - seven additional buffer overflow, buffer validation checks"

* tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: add buffer validation in session setup
  ksmbd: throttle session setup failures to avoid dictionary attacks
  ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests
  ksmbd: validate credit charge after validating SMB2 PDU body size
  ksmbd: add buffer validation for smb direct
  ksmbd: limit read/write/trans buffer size not to exceed 8MB
  ksmbd: validate compound response buffer
  ksmbd: fix potencial 32bit overflow from data area check in smb2_write
  ksmbd: improve credits management
  ksmbd: add validation in smb2_ioctl
2021-10-24 06:43:59 -10:00
Linus Torvalds da4d34b669 io_uring-5.15-2021-10-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmFzfyQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpiO/D/9cqYHpjGDwyftzQFJFfEy9ny6nlLm6lJef
 hsrZjC0S649FnXc0YHVLDH3/nos0XsQUYvVJnAMW9EHB6x/95JRUyxzouVz1Fewp
 w8Z+lOKymIf3X1LQoB6KQXH5ayohNtUo6HA0Ye/v+iEG+bq/lo9tCMSshpJs3afq
 UWW8RxGhrMHfqfgn/8Kkz8fEqZjXz7tssZ+1AFftTxKbk97ZWPahwjvO+xLFWl/m
 NbMkHf3xeAvDL747ccrVBOerRZUPySXZElgkPzdjQ4y5HHZrpxt/ZR9Xu7XRzgkJ
 7SEmsJ80vla19u3eW/oAn3T4EEGS3qWlei8T47kKIoT1W52S3rqjwsV/30re16GW
 sGMWdFiH/GW3VnOxs0/a4/q70je3E9DicSTs4SALTwnvjQ+vrunWgG6ojtxLcieT
 Br+km8nmDPug1wxoH2gQLN/EhGcH5hQvi4ZMiMH8MWalYpEkIADOOvAwp0GDwVoE
 6DxWeYs57rdSQnSLxDah+mAqBokqswJ/ZmuBOO/iSqXCImehLs0VL1Y+TsThVbRy
 epnBdqLk5PbDpODcYTl7on3MD3hpoHjbpnAPah0py57sroiY73sNE/ms1AUsqYPs
 fAe5tjFwhGhVWRiZMGOAG6kgTtSdxG134c0Lyvy6xACTR8rJfgcnWMwFJDWK2GDn
 ReGYJcgEOA==
 =ywLV
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two fixes for the max workers limit API that was introduced this
  series: one fix for an issue with that code, and one fixing a linked
  timeout regression in this series"

* tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-block:
  io_uring: apply worker limits to previous users
  io_uring: fix ltimeout unprep
  io_uring: apply max_workers limit to all future users
  io-wq: max_worker fixes
2021-10-22 17:34:31 -10:00
Linus Torvalds 5ab2ed0a8d fuse fixes for 5.15-rc7
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYXLSYQAKCRDh3BK/laaZ
 PEfYAQCZcGVboa5uIrCYmVnEgXXf5NX0UrrM0ytvnVssGcgUOQEA8nAx3hwyvwvS
 onA14DgXIz3koEE48PWv3gbJdpL/kAM=
 =R0ip
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "Syzbot discovered a race in case of reusing the fuse sb (introduced in
  this cycle).

  Fix it by doing the s_fs_info initialization at the proper place"

* tag 'fuse-fixes-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: clean up error exits in fuse_fill_super()
  fuse: always initialize sb->s_fs_info
  fuse: clean up fuse_mount destruction
  fuse: get rid of fuse_put_super()
  fuse: check s_root when destroying sb
2021-10-22 10:39:47 -10:00
Pavel Begunkov b22fa62a35 io_uring: apply worker limits to previous users
Another change to the API io-wq worker limitation API added in 5.15,
apply the limit to all prior users that already registered a tctx. It
may be confusing as it's now, in particular the change covers the
following 2 cases:

TASK1                   | TASK2
_________________________________________________
ring = create()         |
                        | limit_iowq_workers()
*not limited*           |

TASK1                   | TASK2
_________________________________________________
ring = create()         |
                        | issue_requests()
limit_iowq_workers()    |
                        | *not limited*

A note on locking, it's safe to traverse ->tctx_list as we hold
->uring_lock, but do that after dropping sqd->lock to avoid possible
problems. It's also safe to access tctx->io_wq there because tasks
kill it only after removing themselves from tctx_list, see
io_uring_cancel_generic() -> io_uring_clean_tctx()

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6e09ecc3545e4dc56e43c906ee3d71b7ae21bed.1634818641.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21 11:19:38 -06:00
Miklos Szeredi 964d32e512 fuse: clean up error exits in fuse_fill_super()
Instead of "goto err", return error directly, since there's no error
cleanup to do now.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21 10:01:39 +02:00
Miklos Szeredi 80019f1138 fuse: always initialize sb->s_fs_info
Syzkaller reports a null pointer dereference in fuse_test_super() that is
caused by sb->s_fs_info being NULL.

This is due to the fact that fuse_fill_super() is initializing s_fs_info,
which is too late, it's already on the fs_supers list.  The initialization
needs to be done in sget_fc() with the sb_lock held.

Move allocation of fuse_mount and fuse_conn from fuse_fill_super() into
fuse_get_tree().

After this ->kill_sb() will always be called with non-NULL ->s_fs_info,
hence fuse_mount_destroy() can drop the test for non-NULL "fm".

Reported-by: syzbot+74a15f02ccb51f398601@syzkaller.appspotmail.com
Fixes: 5d5b74aa9c ("fuse: allow sharing existing sb")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21 10:01:39 +02:00
Miklos Szeredi c191cd07ee fuse: clean up fuse_mount destruction
1. call fuse_mount_destroy() for open coded variants

2. before deactivate_locked_super() don't need fuse_mount destruction since
that will now be done (if ->s_fs_info is not cleared)

3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the
regular pattern can be used

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21 10:01:39 +02:00
Miklos Szeredi a27c061a49 fuse: get rid of fuse_put_super()
The ->put_super callback is called from generic_shutdown_super() in case of
a fully initialized sb.  This is called from kill_***_super(), which is
called from ->kill_sb instances.

Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the
reference to the fuse_conn, while it does the same on each error case
during sb setup.

This patch moves the destruction from fuse_put_super() to
fuse_mount_destroy(), called at the end of all ->kill_sb instances.  A
follup patch will clean up the error paths.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21 10:01:38 +02:00
Miklos Szeredi d534d31d6a fuse: check s_root when destroying sb
Checking "fm" works because currently sb->s_fs_info is cleared on error
paths; however, sb->s_root is what generic_shutdown_super() checks to
determine whether the sb was fully initialized or not.

This change will allow cleanup of sb setup error paths.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-21 10:01:38 +02:00
Ian Kent 25f54d08f1 autofs: fix wait name hash calculation in autofs_wait()
There's a mistake in commit 2be7828c9f ("get rid of autofs_getpath()")
that affects kernels from v5.13.0, basically missed because of me not
fully testing the change for Al.

The problem is that the hash calculation for the wait name qstr hasn't
been updated to account for the change to use dentry_path_raw(). This
prevents the correct matching an existing wait resulting in multiple
notifications being sent to the daemon for the same mount which must
not occur.

The problem wasn't discovered earlier because it only occurs when
multiple processes trigger a request for the same mount concurrently
so it only shows up in more aggressive testing.

Fixes: 2be7828c9f ("get rid of autofs_getpath()")
Cc: stable@vger.kernel.org
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-10-20 21:09:02 -04:00
Linus Torvalds 2f111a6fd5 Two important filesystem fixes, marked for stable. The blocklisted
superblocks issue was particularly annoying because for unexperienced
 users it essentially exacted a reboot to establish a new functional
 mount in that scenario.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmFwWuYTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHziwdHB/wJEvFkMQrlzbgVmijhmneU+TseAMxR
 UBGnsyHdiimIDWqzb81cBuDfrocQzyhntghP2lBzcbzI+gZN1KlrYzKAbYk++cfi
 E5Zbw3U8+moa5B2CnO19QEgmJY5DoXYXb6AbO3udIIj1Ls9lx0ByUyDoSn6fZyVH
 iUQ9OH7zVTsTscoaBiEVcutmhQjIFjoYJqPpfCg6/15xcXX/L1DvxQFBWOxXqHQw
 LYfCQIu8orrA2QdZpuTRpklrMg1Ih+RmqYTdQST6tTtTKJUrHPI0r3A8c2vUoBk1
 ph4fBNsAMUqFn1fIGT88PJg81RC5RC3E6D5PqErzRFsPbAv9FHfGYvGQ
 =FadF
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two important filesystem fixes, marked for stable.

  The blocklisted superblocks issue was particularly annoying because
  for unexperienced users it essentially exacted a reboot to establish a
  new functional mount in that scenario"

* tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client:
  ceph: fix handling of "meta" errors
  ceph: skip existing superblocks that are blocklisted or shut down when mounting
2021-10-20 10:23:05 -10:00
Pavel Begunkov 4ea672ab69 io_uring: fix ltimeout unprep
io_unprep_linked_timeout() is broken, first it needs to return back
REQ_F_ARM_LTIMEOUT, so the linked timeout is enqueued and disarmed. But
now we refcounted it, and linked timeouts may get not executed at all,
leaking a request.

Just kill the unprep optimisation.

Fixes: 906c6caaf5 ("io_uring: optimise io_prep_linked_timeout()")
Reported-by: Beld Zhang <beldzhang@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/51b8e2bfc4bea8ee625cf2ba62b2a350cc9be031.1634719585.git.asml.silence@gmail.com
Link: https://github.com/axboe/liburing/issues/460
Reported-by: Beld Zhang <beldzhang@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20 09:54:16 -06:00
Pavel Begunkov e139a1ec92 io_uring: apply max_workers limit to all future users
Currently, IORING_REGISTER_IOWQ_MAX_WORKERS applies only to the task
that issued it, it's unexpected for users. If one task creates a ring,
limits workers and then passes it to another task the limit won't be
applied to the other task.

Another pitfall is that a task should either create a ring or submit at
least one request for IORING_REGISTER_IOWQ_MAX_WORKERS to work at all,
furher complicating the picture.

Change the API, save the limits and apply to all future users. Note, it
should be done first before giving away the ring or submitting new
requests otherwise the result is not guaranteed.

Fixes: 2e480058dd ("io-wq: provide a way to limit max number of workers")
Link: https://github.com/axboe/liburing/issues/460
Reported-by: Beld Zhang <beldzhang@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/51d0bae97180e08ab722c0d5c93e7439cfb6f697.1634683237.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20 09:54:06 -06:00
Marios Makassikis 0d994cd482 ksmbd: add buffer validation in session setup
Make sure the security buffer's length/offset are valid with regards to
the packet length.

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-20 00:07:10 -05:00
Namjae Jeon 621be84a9d ksmbd: throttle session setup failures to avoid dictionary attacks
To avoid dictionary attacks (repeated session setups rapidly sent) to
connect to server, ksmbd make a delay of a 5 seconds on session setup
failure to make it harder to send enough random connection requests
to break into a server if a user insert the wrong password 10 times
in a row.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-20 00:07:10 -05:00
Hyunchul Lee 34061d6b76 ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests
Validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests and
check the free size of response buffer for these requests.

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-10-20 00:07:10 -05:00
Pavel Begunkov bc369921d6 io-wq: max_worker fixes
First, fix nr_workers checks against max_workers, with max_worker
registration, it may pretty easily happen that nr_workers > max_workers.

Also, synchronise writing to acct->max_worker with wqe->lock. It's not
an actual problem, but as we don't care about io_wqe_create_worker(),
it's better than WRITE_ONCE()/READ_ONCE().

Fixes: 2e480058dd ("io-wq: provide a way to limit max number of workers")
Reported-by: Beld Zhang <beldzhang@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/11f90e6b49410b7d1a88f5d04fb8d95bb86b8cf3.1634671835.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19 17:09:34 -06:00
Jeff Layton 1bd85aa65d ceph: fix handling of "meta" errors
Currently, we check the wb_err too early for directories, before all of
the unsafe child requests have been waited on. In order to fix that we
need to check the mapping->wb_err later nearer to the end of ceph_fsync.

We also have an overly-complex method for tracking errors after
blocklisting. The errors recorded in cleanup_session_requests go to a
completely separate field in the inode, but we end up reporting them the
same way we would for any other error (in fsync).

There's no real benefit to tracking these errors in two different
places, since the only reporting mechanism for them is in fsync, and
we'd need to advance them both every time.

Given that, we can just remove i_meta_err, and convert the places that
used it to instead just use mapping->wb_err instead. That also fixes
the original problem by ensuring that we do a check_and_advance of the
wb_err at the end of the fsync op.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/52864
Reported-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-10-19 09:36:06 +02:00
Jeff Layton 98d0a6fb73 ceph: skip existing superblocks that are blocklisted or shut down when mounting
Currently when mounting, we may end up finding an existing superblock
that corresponds to a blocklisted MDS client. This means that the new
mount ends up being unusable.

If we've found an existing superblock with a client that is already
blocklisted, and the client is not configured to recover on its own,
fail the match. Ditto if the superblock has been forcibly unmounted.

While we're in here, also rename "other" to the more conventional "fsc".

Cc: stable@vger.kernel.org
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1901499
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-10-19 09:36:06 +02:00
Matthew Wilcox (Oracle) 032146cda8 vfs: check fd has read access in kernel_read_file_from_fd()
If we open a file without read access and then pass the fd to a syscall
whose implementation calls kernel_read_file_from_fd(), we get a warning
from __kernel_read():

        if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))

This currently affects both finit_module() and kexec_file_load(), but it
could affect other syscalls in the future.

Link: https://lkml.kernel.org/r/20211007220110.600005-1-willy@infradead.org
Fixes: b844f0ecbc ("vfs: define kernel_copy_file_from_fd()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-18 20:22:03 -10:00