This patch introduces a kmem cache for allocating message handles,
which the midcomms layer needs to keep track of lowcomms messages.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch cleans up the code for allocating a new buffer in the dlm
writequeue mechanism. There was a possible tuneup to allow scheduling
while a new writequeue entry needs to be allocated because either no
sending page is available or the current one is full. To avoid multiple
concurrent users checking at the same time whether an entry is available
or full, alloc_wq was introduced: users wait on it while a new writequeue
entry is in the process of being queued, so that later users check
whether the newly allocated writequeue entry is full.
To simplify the code we remove this mutex and instead hold the already
introduced spin lock during the writequeue check, allocation and
queueing. Other users can therefore never check for available
writequeue entries while a new one is in process but not queued yet.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch will use an event based waitqueue to wait for a possible clash
with the ls_remove_name field of dlm_ls instead of doing busy waiting.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Currently we don't care if the DLM application stack is filling buffers
(not committed yet) while we transmit some already committed buffers.
By checking for active writequeue users before dequeuing a writequeue
entry we know that more data is coming and do nothing. We wait until the
send worker is triggered again, which happens when the writequeue entry
users hit zero.
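The users check can be modelled in user space roughly as follows. This is a sketch in Python rather than the kernel code: `WritequeueEntry`, `Connection`, `new_buffer`, `commit`, `next_to_send` and the `send_triggers` counter are hypothetical names chosen for illustration, and the real locking is left out.

```python
class WritequeueEntry:
    def __init__(self):
        self.users = 0            # writers that reserved space but have not committed
        self.data = bytearray()   # committed payload


class Connection:
    def __init__(self):
        self.writequeue = []
        self.send_triggers = 0    # how often the send worker was scheduled (model only)

    def new_buffer(self):
        """Reserve space in the last entry, queueing a fresh entry if needed."""
        if not self.writequeue:
            self.writequeue.append(WritequeueEntry())
        entry = self.writequeue[-1]
        entry.users += 1
        return entry

    def commit(self, entry, payload):
        """Commit a payload; the last user to commit triggers the send worker."""
        entry.data += payload
        entry.users -= 1
        if entry.users == 0:
            self.send_triggers += 1

    def next_to_send(self):
        """Entry the send worker may dequeue, or None if it should do nothing."""
        if not self.writequeue:
            return None
        entry = self.writequeue[0]
        if entry.users or not entry.data:
            return None           # more data is coming, or nothing was committed yet
        return entry
```

With two writers on the same entry, `next_to_send()` keeps returning `None` until the second `commit()`, at which point the worker is triggered once and sees both payloads.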
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch uses list_empty(&ls->ls_cb_delay) to check for the last list
iteration. If the number of entries is a multiple of MAX_CB_QUEUE, the
list is empty after the last batch and we do an extra goto more, which we
can avoid by checking list_empty().
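The extra pass can be illustrated with a small user-space model. This is only a sketch in Python, not the kernel loop: `resume_passes` and its `check_empty` switch are hypothetical, and the batch size of 25 is an assumed value for MAX_CB_QUEUE.

```python
from collections import deque

MAX_CB_QUEUE = 25  # assumed batch size, for illustration only


def resume_passes(n_entries, check_empty):
    """Drain n_entries delayed callbacks in batches; return the number of passes."""
    delayed = deque(range(n_entries))
    passes = 0
    while True:
        passes += 1
        count = 0
        while delayed and count < MAX_CB_QUEUE:
            delayed.popleft()     # stands in for queueing the callback work
            count += 1
        if check_empty:
            more = bool(delayed)              # new check: is anything left?
        else:
            more = count == MAX_CB_QUEUE      # old check: was the batch full?
        if not more:
            return passes
```

For 50 entries the old check needs a third, empty pass, while the emptiness check stops after the second.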
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch surrounds the AF_INET6 case in sk_error_report() of dlm
with an #if IS_ENABLED(CONFIG_IPV6). The field sk->sk_v6_daddr is not
defined when CONFIG_IPV6 is disabled. If CONFIG_IPV6 is disabled, socket
creation with AF_INET6 should already fail because of a runtime check
whether AF_INET6 is registered. However, if AF_INET6 is nevertheless set
as sk_family, the sk_error_report() callback will then print an invalid
family type error.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 4c3d90570b ("fs: dlm: don't call kernel_getpeername() in error_report()")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch replaces the use of the socket sk_callback_lock with the
socket lock. Some users like sunrpc, see commit ea9afca88b
("SUNRPC: Replace use of socket sk_callback_lock with sock_lock"), moved
from sk_callback_lock to sock_lock, which seems to be held when the socket
callbacks are called.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
In some cases kernel_getpeername() will take the socket lock, which is
already held when the socket layer calls the error_report() callback.
Since commit 9dfc685e02 ("inet: remove races in inet{6}_getname()") this
problem has become more likely because the socket lock is now always
taken. You will see something like:
bob9-u5 login: [ 562.316860] BUG: spinlock recursion on CPU#7, swapper/7/0
[ 562.318562] lock: 0xffff8f2284720088, .magic: dead4ead, .owner: swapper/7/0, .owner_cpu: 7
[ 562.319522] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.15.0+ #135
[ 562.320346] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
[ 562.321277] Call Trace:
[ 562.321529] <IRQ>
[ 562.321734] dump_stack_lvl+0x33/0x42
[ 562.322282] do_raw_spin_lock+0x8b/0xc0
[ 562.322674] lock_sock_nested+0x1e/0x50
[ 562.323057] inet_getname+0x39/0x110
[ 562.323425] ? sock_def_readable+0x80/0x80
[ 562.323838] lowcomms_error_report+0x63/0x260 [dlm]
[ 562.324338] ? wait_for_completion_interruptible_timeout+0xd2/0x120
[ 562.324949] ? lock_timer_base+0x67/0x80
[ 562.325330] ? do_raw_spin_unlock+0x49/0xc0
[ 562.325735] ? _raw_spin_unlock_irqrestore+0x1e/0x40
[ 562.326218] ? del_timer+0x54/0x80
[ 562.326549] sk_error_report+0x12/0x70
[ 562.326919] tcp_validate_incoming+0x3c8/0x530
[ 562.327347] ? kvm_clock_read+0x14/0x30
[ 562.327718] ? ktime_get+0x3b/0xa0
[ 562.328055] tcp_rcv_established+0x121/0x660
[ 562.328466] tcp_v4_do_rcv+0x132/0x260
[ 562.328835] tcp_v4_rcv+0xcea/0xe20
[ 562.329173] ip_protocol_deliver_rcu+0x35/0x1f0
[ 562.329615] ip_local_deliver_finish+0x54/0x60
[ 562.330050] ip_local_deliver+0xf7/0x110
[ 562.330431] ? inet_rtm_getroute+0x211/0x840
[ 562.330848] ? ip_protocol_deliver_rcu+0x1f0/0x1f0
[ 562.331310] ip_rcv+0xe1/0xf0
[ 562.331603] ? ip_local_deliver+0x110/0x110
[ 562.332011] __netif_receive_skb_core+0x46a/0x1040
[ 562.332476] ? inet_gro_receive+0x263/0x2e0
[ 562.332885] __netif_receive_skb_list_core+0x13b/0x2c0
[ 562.333383] netif_receive_skb_list_internal+0x1c8/0x2f0
[ 562.333896] ? update_load_avg+0x7e/0x5e0
[ 562.334285] gro_normal_list.part.149+0x19/0x40
[ 562.334722] napi_complete_done+0x67/0x160
[ 562.335134] virtnet_poll+0x2ad/0x408 [virtio_net]
[ 562.335644] __napi_poll+0x28/0x140
[ 562.336012] net_rx_action+0x23d/0x300
[ 562.336414] __do_softirq+0xf2/0x2ea
[ 562.336803] irq_exit_rcu+0xc1/0xf0
[ 562.337173] common_interrupt+0xb9/0xd0
It is and always was forbidden to call kernel_getpeername() in the context
of error_report(). To get rid of the problem we access the destination
address of the peer through the socket structure. While at it, we also fix
printing of the destination port of the inet socket.
Fixes: 1a31833d08 ("DLM: Replace nodeid_to_addr with kernel_getpeername")
Reported-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch fixes a potential overflow in sscanf: the maximum declared
string parsing length seems to exclude the null termination symbol. This
patch adds one byte so that the buffer can hold a string of length
DLM_RESNAME_MAXLEN plus the null termination symbol.
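The off-by-one can be checked with simple arithmetic. The sketch below is a Python model of how many bytes a C `%<width>s` conversion stores, not the debugfs parser itself; the helper names are hypothetical and the value 64 for DLM_RESNAME_MAXLEN is an assumption.

```python
DLM_RESNAME_MAXLEN = 64  # assumed value of the kernel constant


def bytes_stored_by_scan(token, width):
    """Bytes a C "%<width>s" conversion writes: at most width characters plus a NUL."""
    return min(len(token), width) + 1


def buffer_overflows(buffer_size, token, width=DLM_RESNAME_MAXLEN):
    """True if scanning token with the given field width writes past the buffer."""
    return bytes_stored_by_scan(token, width) > buffer_size
```

A resource name of full length overflows a buffer of DLM_RESNAME_MAXLEN bytes by exactly the terminator, and fits once the buffer has one extra byte.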
Fixes: 5054e79de9 ("fs: dlm: add lkb debugfs functionality")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch removes a list_first_entry() call which is already done by
the previous con_next_wq() call.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch fixes the following crash caused by receiving an invalid message:
[ 160.672220] ==================================================================
[ 160.676206] BUG: KASAN: user-memory-access in dlm_user_add_ast+0xc3/0x370
[ 160.679659] Read of size 8 at addr 00000000deadbeef by task kworker/u32:13/319
[ 160.681447]
[ 160.681824] CPU: 10 PID: 319 Comm: kworker/u32:13 Not tainted 5.14.0-rc2+ #399
[ 160.683472] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.6.0+12648+6ede71a5 04/01/2014
[ 160.685574] Workqueue: dlm_recv process_recv_sockets
[ 160.686721] Call Trace:
[ 160.687310] dump_stack_lvl+0x56/0x6f
[ 160.688169] ? dlm_user_add_ast+0xc3/0x370
[ 160.689116] kasan_report.cold.14+0x116/0x11b
[ 160.690138] ? dlm_user_add_ast+0xc3/0x370
[ 160.690832] dlm_user_add_ast+0xc3/0x370
[ 160.691502] _receive_unlock_reply+0x103/0x170
[ 160.692241] _receive_message+0x11df/0x1ec0
[ 160.692926] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 160.693700] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 160.694427] ? lock_acquire+0x175/0x400
[ 160.695058] ? do_purge.isra.51+0x200/0x200
[ 160.695744] ? lock_acquired+0x360/0x5d0
[ 160.696400] ? lock_contended+0x6a0/0x6a0
[ 160.697055] ? lock_release+0x21d/0x5e0
[ 160.697686] ? lock_is_held_type+0xe0/0x110
[ 160.698352] ? lock_is_held_type+0xe0/0x110
[ 160.699026] ? ___might_sleep+0x1cc/0x1e0
[ 160.699698] ? dlm_wait_requestqueue+0x94/0x140
[ 160.700451] ? dlm_process_requestqueue+0x240/0x240
[ 160.701249] ? down_write_killable+0x2b0/0x2b0
[ 160.701988] ? do_raw_spin_unlock+0xa2/0x130
[ 160.702690] dlm_receive_buffer+0x1a5/0x210
[ 160.703385] dlm_process_incoming_buffer+0x726/0x9f0
[ 160.704210] receive_from_sock+0x1c0/0x3b0
[ 160.704886] ? dlm_tcp_shutdown+0x30/0x30
[ 160.705561] ? lock_acquire+0x175/0x400
[ 160.706197] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 160.706941] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 160.707681] process_recv_sockets+0x32/0x40
[ 160.708366] process_one_work+0x55e/0xad0
[ 160.709045] ? pwq_dec_nr_in_flight+0x110/0x110
[ 160.709820] worker_thread+0x65/0x5e0
[ 160.710423] ? process_one_work+0xad0/0xad0
[ 160.711087] kthread+0x1ed/0x220
[ 160.711628] ? set_kthread_struct+0x80/0x80
[ 160.712314] ret_from_fork+0x22/0x30
The issue is that we received a DLM message for a user lock but the
destination lock is a kernel lock. Note that the address being
dereferenced is 00000000deadbeef, which for a kernel lock is
lkb->lkb_astparam; this field should never be dereferenced by the DLM
kernel stack. In case of a user lock lkb->lkb_astparam is lkb->lkb_ua
(memory is shared by a union field). The struct lkb_ua will be handled
by the DLM kernel stack, but on a kernel lock it will contain invalid
data and will most likely crash the kernel.
It can be reproduced with two cluster nodes.
node 2:
  dlm_tool join test
  echo "862 fooobaar 1 2 1" > /sys/kernel/debug/dlm/test_locks
  echo "862 3 1" > /sys/kernel/debug/dlm/test_waiters
node 1:
  dlm_tool join test
  python:
    foo = DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0x77222027, \
              m_type=7, m_flags=0x1, m_remid=0x862, m_result=0xFFFEFFFE)
    newFile = open("/sys/kernel/debug/dlm/comms/2/rawmsg", "wb")
    newFile.write(bytes(foo))
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds functionality to put an lkb into the waiters state. It
can be useful to combine this feature with the "rawmsg" debugfs
functionality. It brings the DLM lkb into a state in which a message
will be parsed by the kernel.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds functionality to add an lkb during runtime. This is
purely a debugging feature; wrong input can crash the kernel. It is an
early-stage feature as well. The goal is to provide a user interface to
manipulate dlm state and combine it with the rawmsg feature. It is
debugfs functionality, so we don't care about UAPI breakage. It is even
possible to add lkbs/rsbs which could never exist in such a way using
normal DLM operation. The user of this interface always needs to think
before using this feature; not every crash which happens can really
occur during normal dlm operation.
In the future there should be more functionality to add a more realistic
lkb which reflects normal DLM state inside the kernel. For now this is
enough.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds functionality to add an lkb with a specific id range.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds dlm functionality to send a raw dlm message to a
specific cluster node. This raw message can be built by user space and
sent out by writing the message to the "rawmsg" dlm debugfs file.
There is an in-progress scapy dlm module which provides an easy way to
build DLM messages in user space. For example:
DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0xe4f48a18, ...)
The goal is to provide an easily reproducible state to crash DLM or to
fuzz the DLM kernel stack if there are possible ways to crash it.
Note that if the sequence number is zero and the dlm version is not set
to 3.1, the kernel will automatically set a correct sequence number;
otherwise DLM stack testing is not possible.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch changes the dlm_lowcomms_new_msg() function pointer private data
from "struct mhandle *" to "void *" to provide different structures than
just "struct mhandle".
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch changes the ls_count busy wait to use atomic counter values
and wait_event() to wait until ls_count reaches zero. It slightly
reduces the number of times lslist_lock is held. When removing a
lockspace we need to retry the wait, because a lockspace get could
interfere between wait_event() and taking the lock under which the
lockspace list entry is deleted.
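The wait-then-recheck pattern can be sketched in user space with a condition variable standing in for the wait queue. This is a Python model under stated assumptions, not the kernel implementation: the `Lockspace` class, `find_lockspace()` and `remove_lockspace()` are hypothetical, and the sketch assumes that references are only taken while the list lock is held.

```python
import threading


class Lockspace:
    """Toy lockspace: a reference count plus a condition variable as wait queue."""

    def __init__(self):
        self.count = 0
        self.wait = threading.Condition()

    def put(self):
        with self.wait:
            self.count -= 1
            if self.count == 0:
                self.wait.notify_all()   # wake a remover sleeping in wait_for()


lslist = []
lslist_lock = threading.Lock()


def find_lockspace(ls):
    """Take a reference, but only while the lockspace is still on the list."""
    with lslist_lock:
        if ls not in lslist:
            return None
        with ls.wait:
            ls.count += 1
        return ls


def remove_lockspace(ls):
    while True:
        with ls.wait:
            ls.wait.wait_for(lambda: ls.count == 0)   # sleep instead of polling
        with lslist_lock:
            with ls.wait:
                busy = ls.count != 0
            if busy:
                continue   # a get slipped in between the wait and the lock: retry
            lslist.remove(ls)
            return
```

The recheck under the list lock is what makes the retry necessary: the count can become non-zero again after the waiter was woken but before it owns the lock.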
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch changes the requestqueue busy waiting algorithm to use
atomic counter values and wait_event() to wait until the requestqueue is
empty. It slightly reduces the number of times ls_requestqueue_mutex is
held.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds tracepoints for dlm socket receive and send
functionality. We can use them to track how much data was sent to or
received from a specific nodeid.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds initial support for dlm tracepoints. It introduces
tracepoints for the main dlm functionality, dlm_lock()/dlm_unlock(), and
their completion ast() callback or blocking bast() callback.
The lock/unlock functionality has a start and an end tracepoint. This is
because of a race: if there were a tracepoint at the end position only,
the complete/blocking callbacks could occur before it. When working with
eBPF tracing and its lookup hash functionality, this could mean that an
entry was not inserted yet. Instead, use the start tracepoint for the
hash insert and check again in the end tracepoint whether there was a dlm
internal error, in which case there is no ast callback. Furthermore,
locks with local masters may trigger those callbacks immediately, so we
must have such functionality.
I did not make everything accessible yet, although it seems eBPF can be
used to access a lot of internal data structures if it's aware of the
struct definitions of the running kernel instance. We can still change
it. If you do eBPF experiments, e.g. time measurements between lock and
callback functionality, you can simply use the local lkb_id field as hash
value, in combination with the lockspace id if you have multiple
lockspaces. Otherwise you can simply use trace-cmd for some functionality,
e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch makes the dlm_callback_resume info printout less noisy by
accumulating all callback queues into one printout instead of printing
in steps of 25. It seems this printout has lately become quite noisy in
connection with gfs2.
Before:
[241767.849302] dlm: bin: dlm_callback_resume 25
[241767.854846] dlm: bin: dlm_callback_resume 25
[241767.860373] dlm: bin: dlm_callback_resume 25
...
[241767.865920] dlm: bin: dlm_callback_resume 25
[241767.871352] dlm: bin: dlm_callback_resume 25
[241767.876733] dlm: bin: dlm_callback_resume 25
After the patch:
[ 385.485728] dlm: gfs2: dlm_callback_resume 175
If the count is zero, nothing is printed.
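The accumulated printout amounts to summing the per-batch counts and logging once. A minimal sketch in Python, assuming a hypothetical helper `resume_log_lines()` that receives the counts of the individual batches; the message format is copied from the log lines above.

```python
def resume_log_lines(batch_counts, lockspace="gfs2"):
    """Return the log lines for one resume: a single total, or nothing for zero."""
    total = sum(batch_counts)
    if total == 0:
        return []
    return [f"dlm: {lockspace}: dlm_callback_resume {total}"]
```

Seven batches of 25 therefore produce one line reporting 175 instead of seven lines reporting 25 each.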
Reported-by: Barry Marson <bmarson@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch evaluates dlm_recovery_stopped() in the condition of the if
branch instead of fetching it before evaluating the condition. As this is
an atomic test-set operation it should be evaluated in the condition
itself.
Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch switches to dlm_recovery_stopped(), which is the dlm helper
for checking whether the LSFL_RECOVER_STOP flag is set in ls_flags. It
is an atomic operation, but as before the value is still fetched while
ls_recover_lock is held. Further investigation may be needed into
whether the value can change afterwards and whether that has any side
effects.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch moves the version conversion to little endian from a runtime
variable to a compile time constant.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Since commit 764ff4011424 ("fs: dlm: auto load sctp module") we try to
load the sctp module before we try to create a sctp kernel socket. If
socket creation fails now, it more likely has other reasons. This patch
removes the error message about loading the sctp module and prints the
error code instead.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch improves the debug output for the midcomms layer by also
printing the nodeid to which the users counter belongs.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch fixes a typo from lockspace to lockspace.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch removes an obsolete define for the length of a temporary
buffer which is not being used anymore. The define has not been
necessary since commit 4798cbbfbd ("fs: dlm: rework receive
handling").
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Pull autofs fix from Al Viro:
"Fix for a braino of mine (in getting rid of open-coded
dentry_path_raw() in autofs a couple of cycles ago).
Mea culpa... Obvious -stable fodder"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
autofs: fix wait name hash calculation in autofs_wait()
Merge tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
"Ten fixes for the ksmbd kernel server, for improved security and
additional buffer overflow checks:
- a security improvement to session establishment to reduce the
possibility of dictionary attacks
- fix to ensure that maximum i/o size negotiated in the protocol is
not less than 64K and not more than 8MB to better match expected
behavior
- fix for crediting (flow control) important to properly verify that
sufficient credits are available for the requested operation
- seven additional buffer overflow, buffer validation checks"
* tag '5.15-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: add buffer validation in session setup
ksmbd: throttle session setup failures to avoid dictionary attacks
ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests
ksmbd: validate credit charge after validating SMB2 PDU body size
ksmbd: add buffer validation for smb direct
ksmbd: limit read/write/trans buffer size not to exceed 8MB
ksmbd: validate compound response buffer
ksmbd: fix potencial 32bit overflow from data area check in smb2_write
ksmbd: improve credits management
ksmbd: add validation in smb2_ioctl
Merge tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two fixes for the max workers limit API that was introduced this
series: one fix for an issue with that code, and one fixing a linked
timeout regression in this series"
* tag 'io_uring-5.15-2021-10-22' of git://git.kernel.dk/linux-block:
io_uring: apply worker limits to previous users
io_uring: fix ltimeout unprep
io_uring: apply max_workers limit to all future users
io-wq: max_worker fixes
Merge tag 'fuse-fixes-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"Syzbot discovered a race in case of reusing the fuse sb (introduced in
this cycle).
Fix it by doing the s_fs_info initialization at the proper place"
* tag 'fuse-fixes-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: clean up error exits in fuse_fill_super()
fuse: always initialize sb->s_fs_info
fuse: clean up fuse_mount destruction
fuse: get rid of fuse_put_super()
fuse: check s_root when destroying sb
Another change to the io-wq worker limitation API added in 5.15:
apply the limit to all prior users that already registered a tctx. The
current behaviour may be confusing; in particular the change covers the
following 2 cases:
TASK1                          | TASK2
_________________________________________________
ring = create()                |
                               | limit_iowq_workers()
*not limited*                  |
TASK1                          | TASK2
_________________________________________________
ring = create()                |
                               | issue_requests()
limit_iowq_workers()           |
                               | *not limited*
A note on locking, it's safe to traverse ->tctx_list as we hold
->uring_lock, but do that after dropping sqd->lock to avoid possible
problems. It's also safe to access tctx->io_wq there because tasks
kill it only after removing themselves from tctx_list, see
io_uring_cancel_generic() -> io_uring_clean_tctx()
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6e09ecc3545e4dc56e43c906ee3d71b7ae21bed.1634818641.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Syzkaller reports a null pointer dereference in fuse_test_super() that is
caused by sb->s_fs_info being NULL.
This is due to the fact that fuse_fill_super() is initializing s_fs_info,
which is too late, it's already on the fs_supers list. The initialization
needs to be done in sget_fc() with the sb_lock held.
Move allocation of fuse_mount and fuse_conn from fuse_fill_super() into
fuse_get_tree().
After this ->kill_sb() will always be called with non-NULL ->s_fs_info,
hence fuse_mount_destroy() can drop the test for non-NULL "fm".
Reported-by: syzbot+74a15f02ccb51f398601@syzkaller.appspotmail.com
Fixes: 5d5b74aa9c ("fuse: allow sharing existing sb")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
1. call fuse_mount_destroy() for open coded variants
2. before deactivate_locked_super() don't need fuse_mount destruction since
that will now be done (if ->s_fs_info is not cleared)
3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the
regular pattern can be used
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The ->put_super callback is called from generic_shutdown_super() in case of
a fully initialized sb. This is called from kill_***_super(), which is
called from ->kill_sb instances.
Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the
reference to the fuse_conn, while it does the same on each error case
during sb setup.
This patch moves the destruction from fuse_put_super() to
fuse_mount_destroy(), called at the end of all ->kill_sb instances. A
follow-up patch will clean up the error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Checking "fm" works because currently sb->s_fs_info is cleared on error
paths; however, sb->s_root is what generic_shutdown_super() checks to
determine whether the sb was fully initialized or not.
This change will allow cleanup of sb setup error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
There's a mistake in commit 2be7828c9f ("get rid of autofs_getpath()")
that affects kernels from v5.13.0, basically missed because of me not
fully testing the change for Al.
The problem is that the hash calculation for the wait name qstr hasn't
been updated to account for the change to use dentry_path_raw(). This
prevents correct matching of an existing wait, resulting in multiple
notifications being sent to the daemon for the same mount, which must
not occur.
The problem wasn't discovered earlier because it only occurs when
multiple processes trigger a request for the same mount concurrently
so it only shows up in more aggressive testing.
Fixes: 2be7828c9f ("get rid of autofs_getpath()")
Cc: stable@vger.kernel.org
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Merge tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two important filesystem fixes, marked for stable.
The blocklisted superblocks issue was particularly annoying because
for unexperienced users it essentially exacted a reboot to establish a
new functional mount in that scenario"
* tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client:
ceph: fix handling of "meta" errors
ceph: skip existing superblocks that are blocklisted or shut down when mounting
io_unprep_linked_timeout() is broken. First, it needs to restore
REQ_F_ARM_LTIMEOUT so that the linked timeout is enqueued and disarmed.
But now that we refcount it, linked timeouts may not get executed at all,
leaking a request.
Just kill the unprep optimisation.
Fixes: 906c6caaf5 ("io_uring: optimise io_prep_linked_timeout()")
Reported-by: Beld Zhang <beldzhang@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/51b8e2bfc4bea8ee625cf2ba62b2a350cc9be031.1634719585.git.asml.silence@gmail.com
Link: https://github.com/axboe/liburing/issues/460
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, IORING_REGISTER_IOWQ_MAX_WORKERS applies only to the task
that issued it, which is unexpected for users. If one task creates a
ring, limits workers and then passes it to another task, the limit won't
be applied to the other task.
Another pitfall is that a task should either create a ring or submit at
least one request for IORING_REGISTER_IOWQ_MAX_WORKERS to work at all,
further complicating the picture.
Change the API: save the limits and apply them to all future users. Note
that this should be done before giving away the ring or submitting new
requests, otherwise the result is not guaranteed.
Fixes: 2e480058dd ("io-wq: provide a way to limit max number of workers")
Link: https://github.com/axboe/liburing/issues/460
Reported-by: Beld Zhang <beldzhang@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/51d0bae97180e08ab722c0d5c93e7439cfb6f697.1634683237.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make sure the security buffer's length/offset are valid with regards to
the packet length.
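A bounds check of this kind can be sketched as follows. This is an illustrative Python model, not the ksmbd code; `security_buffer_in_bounds` and its parameter names are hypothetical. Comparing the length against the remaining space, rather than adding offset and length first, is the usual way to keep such a check safe against wrap-around in fixed-width arithmetic.

```python
def security_buffer_in_bounds(offset, length, packet_len):
    """True if [offset, offset + length) lies entirely inside the packet."""
    if offset < 0 or length < 0:
        return False
    if offset > packet_len:
        return False
    return length <= packet_len - offset
```

A request whose buffer ends exactly at the end of the packet passes; one byte more, or an offset beyond the packet, is rejected.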
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Signed-off-by: Steve French <stfrench@microsoft.com>
To avoid dictionary attacks (repeated session setups sent rapidly)
against the server, ksmbd adds a delay of 5 seconds on session setup
failure, making it harder to send enough random connection requests
to break into a server, if a user enters the wrong password 10 times
in a row.
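One way to read this policy is sketched below in Python. It is a model under assumptions, not the ksmbd implementation: the `SessionSetupThrottle` class is hypothetical, and the sketch assumes that the 5 second delay applies to each failed setup once a user has reached 10 consecutive failures.

```python
FAILURE_THRESHOLD = 10   # wrong passwords in a row, as stated above
DELAY_SECONDS = 5        # delay per failed setup once the threshold is reached


class SessionSetupThrottle:
    def __init__(self, sleep):
        self.sleep = sleep    # injectable so the sketch can run without waiting
        self.failures = {}    # user name -> consecutive failures

    def record(self, user, success):
        """Track one session setup result; return the delay applied in seconds."""
        if success:
            self.failures.pop(user, None)   # a good login resets the counter
            return 0
        count = self.failures.get(user, 0) + 1
        self.failures[user] = count
        if count < FAILURE_THRESHOLD:
            return 0
        self.sleep(DELAY_SECONDS)
        return DELAY_SECONDS
```

In real use the `sleep` argument would be something like `time.sleep`; passing a recording function instead makes the behaviour easy to inspect.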
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests and
check the free size of response buffer for these requests.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
First, fix the nr_workers checks against max_workers: with max_worker
registration it may quite easily happen that nr_workers > max_workers.
Also, synchronise writing to acct->max_worker with wqe->lock. It's not
an actual problem, but as we don't care about io_wqe_create_worker(),
it's better than WRITE_ONCE()/READ_ONCE().
Fixes: 2e480058dd ("io-wq: provide a way to limit max number of workers")
Reported-by: Beld Zhang <beldzhang@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/11f90e6b49410b7d1a88f5d04fb8d95bb86b8cf3.1634671835.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, we check the wb_err too early for directories, before all of
the unsafe child requests have been waited on. In order to fix that we
need to check the mapping->wb_err later nearer to the end of ceph_fsync.
We also have an overly-complex method for tracking errors after
blocklisting. The errors recorded in cleanup_session_requests go to a
completely separate field in the inode, but we end up reporting them the
same way we would for any other error (in fsync).
There's no real benefit to tracking these errors in two different
places, since the only reporting mechanism for them is in fsync, and
we'd need to advance them both every time.
Given that, we can just remove i_meta_err and convert the places that
used it to just use mapping->wb_err instead. That also fixes
the original problem by ensuring that we do a check_and_advance of the
wb_err at the end of the fsync op.
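Why the position of the check matters can be shown with a heavily reduced model of an error sequence. This is a Python sketch, not the kernel's errseq_t and not ceph_fsync: `WritebackErr`, `fsync_dir` and the `check_late` switch are hypothetical, and the sketch assumes that waiting on unsafe child requests is what records the error.

```python
EIO = 5


class WritebackErr:
    """Very reduced stand-in for an error sequence such as mapping->wb_err."""

    def __init__(self):
        self.seq = 0
        self.err = 0

    def set(self, err):
        self.err = err
        self.seq += 1

    def sample(self):
        return self.seq

    def check_and_advance(self, since):
        """Return (error, new cursor); the error is 0 if nothing new was recorded."""
        if since == self.seq:
            return 0, since
        return self.err, self.seq


def fsync_dir(wb_err, since, wait_for_unsafe_requests, check_late):
    if not check_late:
        err, since = wb_err.check_and_advance(since)   # checked too early
        wait_for_unsafe_requests()                     # error recorded here is missed
        return err
    wait_for_unsafe_requests()
    err, since = wb_err.check_and_advance(since)       # checked at the end
    return err
```

With an error recorded during the wait, the early variant returns 0 and the late variant returns the error.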
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/52864
Reported-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Currently when mounting, we may end up finding an existing superblock
that corresponds to a blocklisted MDS client. This means that the new
mount ends up being unusable.
If we've found an existing superblock with a client that is already
blocklisted, and the client is not configured to recover on its own,
fail the match. Ditto if the superblock has been forcibly unmounted.
While we're in here, also rename "other" to the more conventional "fsc".
Cc: stable@vger.kernel.org
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1901499
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If we open a file without read access and then pass the fd to a syscall
whose implementation calls kernel_read_file_from_fd(), we get a warning
from __kernel_read():
if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
This currently affects both finit_module() and kexec_file_load(), but it
could affect other syscalls in the future.
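The message above does not spell out the fix. One way to avoid reaching the warning is to check up front that the descriptor was opened for reading and to return an error otherwise. The sketch below is a user-space analogue in Python, not the kernel code: `fd_opened_for_read` and `read_file_from_fd` are hypothetical names, and returning -EBADF is an assumption.

```python
import errno
import fcntl
import os


def fd_opened_for_read(fd):
    """True if the descriptor's access mode allows reading."""
    access = fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_ACCMODE
    return access in (os.O_RDONLY, os.O_RDWR)


def read_file_from_fd(fd, size=4096):
    """Read from fd, refusing descriptors that were not opened for reading."""
    if not fd_opened_for_read(fd):
        return -errno.EBADF   # assumed error code for illustration
    return os.read(fd, size)
```

Called with the write end of a pipe, `read_file_from_fd()` returns the negative error code without attempting the read; the read end returns the data.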
Link: https://lkml.kernel.org/r/20211007220110.600005-1-willy@infradead.org
Fixes: b844f0ecbc ("vfs: define kernel_copy_file_from_fd()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>