The introduction of the multi-device feature partially broke the support
for zoned block devices. In the function f2fs_scan_devices, sbi->devs
allocation and initialization is skipped in the case of a single device
mount. This result in no device information structure being allocated
for the device. This is fine if the device is a regular device, but in
the case of a zoned block device, the device zone type array is not
initialized, which causes the function __f2fs_issue_discard_zone to fail
as get_blkz_type is unable to determine the zone type of a section.
Fix this by always allocating and initializing the sbi->devs device
information array even in the case of a single device if that device is
zoned. For this particular case, make sure to obtain a reference on the
single device so that the call to blkdev_put() in destroy_device_list
operates as expected.
Fixes: 3c62be17d4 ("f2fs: support multiple devices")
Cc: <stable@vger.kernel.org> # v4.10
Signed-off-by: Masato Suzuki <masato.suzuki@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch remove redundant set_page_dirty in truncate_blocks
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It needs to double cache size of write_io_dummy mempool, otherwise we may
run out of cache in scenraio of Data/Node IOs were issued concurrently.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We should only account F2FS_{CP_}WB_DATA IOs for write path, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
F2FS has define MAX_FREE_NIDS for maximum of cached free nids target.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In scenario of intensively node allocation, free nids will be ran out
soon, then it needs to stop to load free nids by traversing NAT blocks,
in worse case, if NAT blocks does not be cached in memory, it generates
IOs which slows down our foreground operations.
In order to speed up node allocation, in this patch we introduce a new
free_nid_bitmap array, so there is an bitmap table for each NAT block,
Once the NAT block is loaded, related bitmap cache will be switched on,
and bitmap will be set during traversing nat entries in NAT block, later
we can query and update nid usage status in memory completely.
With such implementation, I expect performance of node allocation can be
improved in the long-term after filesystem image is mounted.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There are four places that getting the crc value in f2fs_checkpoint,
just add a new helper cur_cp_crc for them.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs removes the old xattr data and appends the new data although
the new data is same as the exist.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since commit ee6d182f2a ("f2fs: remove syncing inode page in all the
cases") delayed inode element updating from inode cache to node page
cache, so once largest cached extent is updated, we can make inode dirty
immediately instead of checking and updating it in the end of extent
cache update.
The above commit didn't clean up unneeded codes in extent_cache.c, let's
finish the job in this patch.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We use has_not_enough_free_secs to check if there are enough free segments,
(free_sections(sbi) + freed) <=
(node_secs + 2 * dent_secs + imeta_secs +
reserved_sections(sbi) + needed);
Under scenario with large number of dirty nodes, these nodes would be flushed
during cp, as a result, right side of the inequality would be decreased, while
left side stays unchanged if these nodes are flushed in SSR way, which means
there are enough free segments after this cp.
For this case, we just do a bggc instead of fggc.
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In SSR mode, we can allocate target segment which has different
temperature type from the type of current block, in order to avoid
mixing coldest and hottest data/node as much as possible, change
SSR allocation policy to select closer temperature for current
block prior.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously kernel message can show that in which function we do the
injection, but unfortunately, most of the caller are the same, for
tracking more information of injection path, it needs to show upper
caller's name. This patch supports that ability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Similar as f2fs_write_inode, f2fs_write_inline_data just
mark inode page dirty, so it's no need to write inline data
under read lock of cp_rwsem.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patches adds bitmaps to represent empty or full NAT blocks containing
free nid entries.
If we can find valid crc|cp_ver in the last block of checkpoint pack, we'll
use these bitmaps when building free nids. In order to avoid checkpointing
burden, up-to-date bitmaps will be flushed only during umount time. So,
normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs
which recovers this bitmap again.
After this patch, we build free nids from nid #0 at mount time to make more
full NAT blocks, but in runtime, we check empty NAT blocks to load free nids
without loading any NAT pages from disk.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch replace rw semaphore extent_tree_lock with mutex lock
for no read cases with this lock.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When more than one data blocks are allocated, the F2FS_MAP_UNWRITTEN/MAPPED
flags will be overlapped by F2FS_MAP_NEW at the later times.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
proc A: proc B:
- writeback_sb_inodes
- __writeback_single_inode
- do_writepages
- f2fs_write_node_pages
- f2fs_balance_fs_bg - write_checkpoint
- build_free_nids - flush_nat_entries
- __build_free_nids - __flush_nat_entry_set
- ra_meta_pages - get_next_nat_page
- current_nat_addr - set_to_next_nat
[do nat_bitmap checking] - f2fs_change_bit
For proc A, nat_bitmap and nat_bitmap_mir would be compared without lock_op and
nm_i->nat_tree_lock, while proc B is changing nat_bitmap/nat_bitmap_ver in cp.
So it is normal for nat_bitmap/nat_bitmap diffrence under such scenario.
This patch fix this by removing the monitoring point.
[Fix: 599a09b f2fs: check in-memory nat version bitmap]
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since has_not_enough_free_secs(sbi, 0, 0) must be true if has_not_enough_
free_secs(sbi, sec_freed, 0) is true, write_checkpoint is sure to execute in
both conditions.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For converntional zones, we don't need to align discard commands to exact zone
size.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We don't need to wait for each discard commands when unmounting the image.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We have a kernel thread to issue discard commands, so we can increase the
number of batched discard sections. By default, now it becomes 4GB range.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds MAX_DISCARD_BLOCKS() to avoid issuing too much large single
discard command.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Geminilake HDMI codec 0x280d is similar to previous platforms, so add it with
similar ops as previous.
Signed-off-by: Senthilnathan Veppur <senthilnathanx.veppur@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Yuval Mintz says:
====================
qed: Bug fixes
Patch #1 addresses a day-one race which is dependent on the number of Vfs
[I.e., more child VFs from a single PF make it more probable].
Patch #2 corrects a race that got introduced in the last set of fixes for
qed, one that would happen each time PF transitions to UP state.
I've built & tested those against current net-next.
Please consider applying the series there.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 653d2ffd6405 ("qed*: Fix link indication race") introduced another
race - one of the inner functions called from the link-change flow is
explicitly using the slowpath context dedicated PTT instead of gaining
that PTT from the caller. Since this flow can now be called from
a different context as well, we're in risk of the PTT breaking.
Fixes: 653d2ffd6405 ("qed*: Fix link indication race")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A PF syncronizes all IOV activity relating to its VFs
by using a single workqueue handling the work.
The workqueue would reach a bitmask of pending VF events
and act upon each in turn.
Problem is that the indication of a VF message [which sets
the 'vf event' bit for that VF] arrives and is set in
the slowpath attention context, which isn't syncronized with
the processing of the events.
When multiple VFs are present, it's possible that PF would
lose the indication of one of the VF's pending evens, leading
that VF to later timeout.
Instead of adding locks/barriers, simply move from a bitmask
into a per-VF indication inside that VF entry in the PF database.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains netfilter fixes for you net tree,
they are:
1) Missing ct zone size in the nft_ct initialization path, patch
from Florian Westphal.
2) Two patches for netfilter uapi headers, one to remove unnecessary
sysctl.h inclusion and another to fix compilation of xt_hashlimit.h
in userspace, from Dmitry V. Levin.
3) Patch to fix a sloppy change in nf_ct_expect that incorrectly
simplified nf_ct_expect_related_report() in the previous nf-next
batch. This also includes another patch for __nf_ct_expect_check()
to report success by returning 0 to keep it consistent with other
existing functions. From Jarno Rajahalme.
4) The ->walk() iterator of the new bitmap set type goes over the real
bitmap size, this results in incorrect dumps when NFTA_SET_USERDATA
is used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add compat ioctl support to dma-buf. This lets one to use DMA_BUF_IOCTL_SYNC
ioctl from 32bit application on 64bit kernel. Data structures for both 32
and 64bit modes are same, so there is no need for additional translation
layer.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1487683261-2655-1-git-send-email-m.szyprowski@samsung.com
On this machine, the micmute button is connected to Line2 of the
codec and the micmute led is connected to GPIO2 of the codec.
After applying this quirk, both hotkey and led work well.
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
l2tp_ip_backlog_recv may not return -1 if the packet gets dropped.
The return value is passed up to ip_local_deliver_finish, which treats
negative values as an IP protocol number for resubmission.
Signed-off-by: Paul Hüber <phueber@kernsp.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix xfrm_neigh_lookup to provide dst->path to the
neigh_lookup dst_ops method.
When skb is provided, the IP address in packet should already
match the dst->path address family. But for the non-skb case,
we should consider the last tunnel address as nexthop address.
Fixes: f894cbf847 ("net: Add optional SKB arg to dst_ops->neigh_lookup().")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current annotation is wrong as it says that we're only called
under spinlock. In fact it should be marked as under either
spinlock or RCU read lock.
Fixes: da20420f83 ("rhashtable: Add nested tables")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter reported a use before NULL check bug in the function
bucket_table_free. In fact we don't need the NULL check at all as
no caller can provide a NULL argument. So this patch fixes this by
simply removing it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls made through the in-kernel interface can end up getting stuck because
of a missed variable update in a loop in rxrpc_recvmsg_data(). The problem
is like this:
(1) A new packet comes in and doesn't cause a notification to be given to
the client as there's still another packet in the ring - the
assumption being that if the client will keep drawing off data until
the ring is empty.
(2) The client is in rxrpc_recvmsg_data(), inside the big while loop that
iterates through the packets. This copies the window pointers into
variables rather than using the information in the call struct
because:
(a) MSG_PEEK might be in effect;
(b) we need a barrier after reading call->rx_top to pair with the
barrier in the softirq routine that loads the buffer.
(3) The reading of call->rx_top is done outside of the loop, and top is
never updated whilst we're in the loop. This means that even through
there's a new packet available, we don't see it and may return -EFAULT
to the caller - who will happily return to the scheduler and await the
next notification.
(4) No further notifications are forthcoming until there's an abort as the
ring isn't empty.
The fix is to move the read of call->rx_top inside the loop - but it needs
to be done before the condition is checked.
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tc actions are loaded as a module and no actions have been installed,
flushing them would result in actions removed from the memory, but modules
reference count not being decremented, so that the modules would not be
unloaded.
Following is example with GACT action:
% sudo modprobe act_gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions ls action gact
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 1
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 2
% sudo rmmod act_gact
rmmod: ERROR: Module act_gact is in use
....
After the fix:
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions add action pass index 1
% sudo tc actions add action pass index 2
% sudo tc actions add action pass index 3
% lsmod
Module Size Used by
act_gact 16384 3
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
%
% sudo tc actions flush action gact
% lsmod
Module Size Used by
act_gact 16384 0
% sudo rmmod act_gact
% lsmod
Module Size Used by
%
Fixes: f97017cdef ("net-sched: Fix actions flushing")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows to enable and run the accompanying test (test_parman)
without dependencies on other users of parman.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5e1859fbcc ("ipv4: ipmr: various fixes and cleanups") fixed
the issue for ipv4 ipmr:
ip_mroute_setsockopt() & ip_mroute_getsockopt() should not
access/set raw_sk(sk)->ipmr_table before making sure the socket
is a raw socket, and protocol is IGMP
The same fix should be done for ipv6 ipmr as well.
This patch can fix the panic caused by overwriting the same offset
as ipmr_table as in raw_sk(sk) when accessing other type's socket
by ip_mroute_setsockopt().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b8607805dd ("sctp: not copying duplicate addrs to the assoc's
bind address list") tried to check for duplicate address before copying
to asoc's bind_addr list from global addr list.
But all the addrs' sin_ports in global addr list are 0 while the addrs'
sin_ports are bp->port in asoc's bind_addr list. It means even if it's
a duplicate address, af->cmp_addr will still return 0 as the their
sin_ports are different.
This patch is to fix it by setting the sin_port for addr param with
bp->port before comparing the addrs.
Fixes: b8607805dd ("sctp: not copying duplicate addrs to the assoc's bind address list")
Reported-by: Wei Chen <weichen@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmware tools has a daemon that gets layout information from the GUI and
forwards it to DRM so that the modesetting code can set preferred connector
locations and modes. This daemon was using control nodes but since control
nodes were just removed, make it possible for the daemon to use render- or
primary nodes instead. This is a bit ugly but will allow drm to proceed with
removal of the mostly unused control-node code and allow vmware to proceed
with fixing up automatic layout settings for gnome-shell/wayland.
We bump minor to inform user-space about the api change.
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170221104227.2854-1-thellstrom@vmware.com
The cited commit makes a great job of finding optimal shift/multiplier
values assuming a 10 seconds wrap around, but forgot to change the
overflow_period computation.
It overflows in cyclecounter_cyc2ns(), and the final result is 804 ms,
which is silly.
Lets simply use 5 seconds, no need to recompute this, given how it is
supposed to work.
Later, we will use a timer instead of a work queue, since the new RX
allocation schem will no longer need mlx4_en_recover_from_oom() and the
service_task firing every 250 ms.
Fixes: 31c128b66e ("net/mlx4_en: Choose time-stamping shift value according to HW frequency")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
priv->bitmap_size stores the real bitmap size, instead of the full
struct nft_bitmap object.
Fixes: 665153ff57 ("netfilter: nf_tables: add bitmap set type")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This commit fix the typo argumnet/argument
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit fix the typo argumnet/argument
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4dee62b1b9 ("netfilter: nf_ct_expect: nf_ct_expect_insert()
returns void") inadvertently changed the successful return value of
nf_ct_expect_related_report() from 0 to 1 due to
__nf_ct_expect_check() returning 1 on success. Prevent this
regression in the future by changing the return value of
__nf_ct_expect_check() to 0 on success.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>