* acpi-ec:
ACPI / EC: Clean up EC GPE mask flag
ACPI: EC: Fix possible issues related to EC initialization order
* acpi-dma:
ACPI/IORT: Add IORT named component memory address limits
ACPI: Make acpi_dma_configure() DMA regions aware
ACPI: Introduce DMA ranges parsing
ACPI: Make acpi_dev_get_resources() method agnostic
* acpi-processor:
ACPI / processor: make function acpi_processor_check_duplicates() static
ACPI: processor: use dev_dbg() instead of dev_warn() when CPPC probe failed
* acpi-cppc:
mailbox: pcc: Drop uninformative output during boot
* acpica: (32 commits)
ACPICA: Update version to 20170728
ACPICA: Revert "Update resource descriptor handling"
ACPICA: Resources: Allow _DMA method in walk resources
ACPICA: Ensure all instances of AE_AML_INTERNAL have error messages
ACPICA: Implement deferred resolution of reference package elements
ACPICA: Debugger: Improve support for Alias objects
ACPICA: Interpreter: Update handling for Alias operator
ACPICA: EFI/EDK2: Cleanup to enable /WX for MSVC builds
ACPICA: acpidump: Add DSDT/FACS instance support for Linux and EFI
ACPICA: CLib: Add short multiply/shift support
ACPICA: EFI/EDK2: Sort acpi.h inclusion order
ACPICA: Add a comment, no functional change
ACPICA: Namespace: Update/fix an error message
ACPICA: iASL: Add support for the SDEI table
ACPICA: Divergences: reduce access size definitions
ACPICA: Update version to 20170629
ACPICA: Update resource descriptor handling
ACPICA: iasl: Update to IORT SMMUv3 disassembling
ACPICA: Disassembler: skip parsing of incorrect external declarations
ACPICA: iASL: Ensure that the target node is valid in acpi_ex_create_alias
...
Guillaume Nault says:
====================
l2tp: session creation fixes
The session creation process has a few issues wrt. concurrent tunnel
deletion.
Patch #1 avoids creating sessions in tunnels that are getting removed.
This prevents races where sessions could try to take tunnel resources
that were already released.
Patch #2 removes some racy l2tp_tunnel_find() calls in session creation
callbacks. Together with path #1 it ensures that sessions can only
access tunnel resources that are guaranteed to remain valid during the
session creation process.
There are other problems with how sessions are created: pseudo-wire
specific data are set after the session is added to the tunnel. So
the session can be used, or deleted, before it has been completely
initialised. Separating session allocation from session registration
would be necessary, but we'd still have circular dependencies
preventing race-free registration. I'll consider this issue in future
series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Using l2tp_tunnel_find() in pppol2tp_session_create() and
l2tp_eth_create() is racy, because no reference is held on the
returned session. These functions are only used to implement the
->session_create callback which is run by l2tp_nl_cmd_session_create().
Therefore searching for the parent tunnel isn't necessary because
l2tp_nl_cmd_session_create() already has a pointer to it and holds a
reference.
This patch modifies ->session_create()'s prototype to directly pass the
the parent tunnel as parameter, thus avoiding searching for it in
pppol2tp_session_create() and l2tp_eth_create().
Since we have to touch the ->session_create() call in
l2tp_nl_cmd_session_create(), let's also remove the useless conditional:
we know that ->session_create isn't NULL at this point because it's
already been checked earlier in this same function.
Finally, one might be tempted to think that the removed
l2tp_tunnel_find() calls were harmless because they would return the
same tunnel as the one held by l2tp_nl_cmd_session_create() anyway.
But that tunnel might be removed and a new one created with same tunnel
Id before the l2tp_tunnel_find() call. In this case l2tp_tunnel_find()
would return the new tunnel which wouldn't be protected by the
reference held by l2tp_nl_cmd_session_create().
Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Fixes: d9e31d17ce ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_tunnel_destruct() sets tunnel->sock to NULL, then removes the
tunnel from the pernet list and finally closes all its sessions.
Therefore, it's possible to add a session to a tunnel that is still
reachable, but for which tunnel->sock has already been reset. This can
make l2tp_session_create() dereference a NULL pointer when calling
sock_hold(tunnel->sock).
This patch adds the .acpt_newsess field to struct l2tp_tunnel, which is
used by l2tp_tunnel_closeall() to prevent addition of new sessions to
tunnels. Resetting tunnel->sock is done after l2tp_tunnel_closeall()
returned, so that l2tp_session_add_to_tunnel() can safely take a
reference on it when .acpt_newsess is true.
The .acpt_newsess field is modified in l2tp_tunnel_closeall(), rather
than in l2tp_tunnel_destruct(), so that it benefits all tunnel removal
mechanisms. E.g. on UDP tunnels, a session could be added to a tunnel
after l2tp_udp_encap_destroy() proceeded. This would prevent the tunnel
from being removed because of the references held by this new session
on the tunnel and its socket. Even though the session could be removed
manually later on, this defeats the purpose of
commit 9980d001ce ("l2tp: add udp encap socket destroy handler").
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Dangaard Brouer says:
====================
net: revert lib/percpu_counter API for fragmentation mem accounting
There is a bug in fragmentation codes use of the percpu_counter API,
that can cause issues on systems with many CPUs, above 24 CPUs.
After much consideration and different attempts at solving the API
usage. The conclusion is to revert to the simple atomic_t API instead.
The ratio between batch size and threshold size make it a bad use-case
for the lib/percpu_counter API. As using the correct API calls will
unfortunately cause systems with many CPUs to always execute an
expensive sum across all CPUs. Plus the added complexity is not worth it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1d6119baf0.
After reverting commit 6d7b857d54 ("net: use lib/percpu_counter API
for fragmentation mem accounting") then here is no need for this
fix-up patch. As percpu_counter is no longer used, it cannot
memory leak it any-longer.
Fixes: 6d7b857d54 ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf0 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 6d7b857d54.
There is a bug in fragmentation codes use of the percpu_counter API,
that can cause issues on systems with many CPUs.
The frag_mem_limit() just reads the global counter (fbc->count),
without considering other CPUs can have upto batch size (130K) that
haven't been subtracted yet. Due to the 3MBytes lower thresh limit,
this become dangerous at >=24 CPUs (3*1024*1024/130000=24).
The correct API usage would be to use __percpu_counter_compare() which
does the right thing, and takes into account the number of (online)
CPUs and batch size, to account for this and call __percpu_counter_sum()
when needed.
We choose to revert the use of the lib/percpu_counter API for frag
memory accounting for several reasons:
1) On systems with CPUs > 24, the heavier fully locked
__percpu_counter_sum() is always invoked, which will be more
expensive than the atomic_t that is reverted to.
Given systems with more than 24 CPUs are becoming common this doesn't
seem like a good option. To mitigate this, the batch size could be
decreased and thresh be increased.
2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
CPU, before SKBs are pushed into sockets on remote CPUs. Given
NICs can only hash on L2 part of the IP-header, the NIC-RXq's will
likely be limited. Thus, a fair chance that atomic add+dec happen
on the same CPU.
Revert note that commit 1d6119baf0 ("net: fix percpu memory leaks")
removed init_frag_mem_limit() and instead use inet_frags_init_net().
After this revert, inet_frags_uninit_net() becomes empty.
Fixes: 6d7b857d54 ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf0 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current allocation for dev->caps.spec_qps is for the size of the
pointer and not the size of the actual mlx4_spec_qps structure. Fix
this by using the correct size. Also splint allocation over a few
lines to make it cppcheck clean on overly wide lines.
Detected by CoverityScan, CID#1455222 ("Wrong sizeof argument")
Fixes: c73c8b1e47 ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The structures hca_param and func_cap are not being kfree'd on an error
exit path causing two memory leaks. Fix this by jumping to the existing
free memory error exit path.
Detected by CoverityScan, CID#1455219, CID#1455224 ("Resource Leak")
Fixes: c73c8b1e47 ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After ip_route_input() calls ip_route_input_noref(), another
check on skb_dst() is done, but if this fails, we shouldn't
override the return code from ip_route_input_noref(), as it
could have been more specific (i.e. -EHOSTUNREACH).
This also saves one call to skb_dst_force_safe() and one to
skb_dst() in case the ip_route_input_noref() check fails.
Reported-by: Sabrina Dubroca <sdubroca@redhat.com>
Fixes: 9df16efadd ("ipv4: call dst_hold_safe() properly")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In function xfs_test_remount_options(), kfree() is used to free memory
allocated by kmem_zalloc(). But it is better to use kmem_free().
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Our loop in xfs_finish_page_writeback, which iterates over all buffer
heads in a page and then calls end_buffer_async_write, which also
iterates over all buffers in the page to check if any I/O is in flight
is not only inefficient, but also potentially dangerous as
end_buffer_async_write can cause the page and all buffers to be freed.
Replace it with a single loop that does the work of end_buffer_async_write
on a per-page basis.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull MIPS fixes from Ralf Baechle:
"The two indirect syscall fixes have sat in linux-next for a few days.
I did check back with a hardware designer to ensure a SYNC is really
what's required for the GIC fix and so the GIC fix didn't make it into
to linux-next in time for this final pull request.
It builds in local build tests and passes Imagination's test system"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
irqchip: mips-gic: SYNC after enabling GIC region
MIPS: Remove pt_regs adjustments in indirect syscall handler
MIPS: seccomp: Fix indirect syscall args
Pull x86 fixes from Thomas Gleixner:
- Expand the space for uncompressing as the LZ4 worst case does not fit
into the currently reserved space
- Validate boot parameters more strictly to prevent out of bound access
in the decompressor/boot code
- Fix off by one errors in get_segment_base()
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Prevent faulty bootparams.screeninfo from causing harm
x86/boot: Provide more slack space during decompression
x86/ldt: Fix off by one in get_segment_base()
Pull timer fix from Thomas Gleixner:
"A single fix for a thinko in the raw timekeeper update which causes
clock MONOTONIC_RAW to run with erratically increased frequency"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Fix ktime_get_raw() incorrect base accumulation
Pull perf fixes from Thomas Gleixner:
- Prevent a potential inconistency in the perf user space access which
might lead to evading sanity checks.
- Prevent perf recording function trace entries twice
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/ftrace: Fix double traces of perf on ftrace:function
perf/core: Fix potential double-fetch bug
In default, uniformly distribute the RSS indirection table entries
among all RX rings, rather than restricting this only to the rings
on the close NUMA node. irqbalancer would anyway dynamically override
the default affinities set to the RX rings.
This gives better multi-stream performance and CPU util.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
NAPI context keeps rescheduling on same CPU as long as it's busy.
This doesn't give the oppurtunity for changes in irq affinities
to take effect.
Fix that by calling napi_complete_done() upon a change in affinity.
This would stop the NAPI and reschedule it on the new CPU.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We used a channel state bit MLX5E_CHANNEL_NAPI_SCHED to make
sure no NAPI is missed when a channel's napi_schedule() is called
for completion events of the different channel's resources/rings
while NAPI is currently running.
Now, as similar mechanism is implemented in kernel,
("39e6c8208d7b net: solve a NAPI race"),
we obsolete our own implementation and rely on the return value
of napi_complete_done().
This patch removes a redundant overhead of atomic bit operations.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In XDP_TX flow, we now get back quicker to each page in page-cache,
and on some occasions refcount does not get back to 1 on time, causing
some costly page allocations.
Slightly increase the size of RX page-cache to significantly decrease
the chances for this to happen.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Avoid recycling an RX page if it moved to another NUMA node.
Add an ethtool counter to count such events.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
As of current design, in each NAPI, only a single UMR WQE
completion could be available in the completion queue of the
the internal control operations (ICO) send queue, in addition
to nop operations that require no actions upon completion.
This renders the consume index obsolete, as the wqe_counter
field in CQE is sufficient.
This helps removing a memory barrier, and obsoletes the need
for tracking the num_wqebbs to update the consumer counter.
In addition, remove other unused fields in icosq struct:
pdev, dma_fifo_pc, and prev_cc.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Separate the RX post WQEs function of the different RQ types.
This enables RQ type-specific optimizations in data-path.
Poll the ICOSQ completion queue only for Striding RQ,
and only when a UMR post completion could be possibly available.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The indication for a UMR WQE in progress is needed only within
the NAPI context, and hence no races possible and no need for
the use of atomic operations.
The only place the flag is read outside of NAPI context is
in closure flow, after RQ is disabled flag is no more accessed
in NAPI.
Use a boolean instead of a bit in ring state, so that its
non-atomic set operations do not race with the atomic sets of
the other bits.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rings enabled state change occurs in control path only, and is always
followed by a napi_sychronize(), so that following NAPIs read the
new value. This read does not need to be atomic.
The RQ auto-moderation bit is not set/cleared in data-path.
No need for atomic read, a regular read operation is sufficient.
In RQ creation time as well, there's no multiple threads trying
to access it yet, hence a regular read can be used.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Refactor function mlx5e_lro_update_hdr() to reduce number of
branches.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
NAPI context handles different kinds of completion queues
(RX, TX, and others). Hence, upon a poll trial, some of them
might be empty.
Here we early-return upon empty completion queues, as well as
full rx buffer, and save unnecessary logic and memory barriers.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If a UMR post is in progress, it means that there's a missing
WQE in RQ, and that a completion will be shortly available in
ICO SQ completion queue. Prefer busy-poll to handle it as soon
as possible.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The dma offset of a MPWQE (Multi-Packet WQE) in memory region
is fixed for all rounds. Calculate it once on creation time,
instead of in runtime. This also obsoletes the wqe argument in
the function.
In addition, optimize dma_info iterator calculation.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In RX data-path, use memset() instead of loop assignment
to init the whole skbs_frags array.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Field is used only locally within the RQ create function.
The use of a local variable is sufficient.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In RX data-path, use shift operations instead of a regular multiplication
by stride size, as it is a power of two.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Bring fast-path fields together, and combine RX WQE mutual
exclusive fields into a union.
Page-reuse and XDP are mutually exclusive and cannot be used at
the same time.
Use a union to combine their footprints.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reject attempts to set XFLAGS that correspond to di_flags2 inode flags
if the inode isn't a v3 inode, because di_flags2 only exists on v3.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Fix up all the compiler warnings that have crept in.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pull cifs version warning fix from Steve French:
"As requested, additional kernel warning messages to clarify the
default dialect changes"
[ There is still some discussion about exactly which version should be
the new default. Longer-term we have auto-negotiation coming, but
that's not there yet.. - Linus ]
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
Fix warning messages when mounting to older servers
Haiyang Zhang says:
====================
hv_netvsc: cleanups and fixes of channel settings
This patch set cleans up some unused variables, unnecessary checks.
Also fixed some limit checking of channel number.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The limit of setting receive indirection table value should be
the current number of channels, not the VRSS_CHANNEL_MAX.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of the following code, net->num_tx_queues equals to
VRSS_CHANNEL_MAX, and max_chn is less than or equals to VRSS_CHANNEL_MAX.
netvsc_drv.c:
alloc_etherdev_mq(sizeof(struct net_device_context),
VRSS_CHANNEL_MAX);
rndis_filter.c:
net_device->max_chn = min_t(u32, VRSS_CHANNEL_MAX, num_possible_rss_qs);
So this patch removes the unnecessary limit check before comparing
with "max_chn".
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The minus one and assignment to a local variable is not necessary.
This patch simplifies it.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the parameter, num_queue in
rndis_filter_set_rss_param(), which is no longer in use.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a listener registers to the FIB notification chain it receives a
dump of the FIB entries and rules from existing address families by
invoking their dump operations.
While we call into these modules we need to make sure they aren't
removed. Do that by increasing their reference count before invoking
their dump operations and decrease it afterwards.
Fixes: 04b1d4e50e ("net: core: Make the FIB notification chain generic")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
netvsc: transparent VF related cleanups
The first gets rid of unnecessary ref counting, and second
allows removing hv_netvsc driver even if VF present.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If VF is attached then can still allow netvsc driver module to
be removed. Just have to make sure and do the cleanup.
Also, avoid extra rtnl round trip when calling unregister.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use one routine for datapath up/down. Don't need to reopen
the rndis layer.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of tracking wmem_queued and sk_mem_charge by incrementing
in the verdict SK_REDIRECT paths and decrementing in the tx work
path use skb_set_owner_w and sock_writeable helpers. This solves
a few issues with the current code. First, in SK_REDIRECT inc on
sk_wmem_queued and sk_mem_charge were being done without the peers
sock lock being held. Under stress this can result in accounting
errors when tx work and/or multiple verdict decisions are working
on the peer psock.
Additionally, this cleans up the code because we can rely on the
default destructor to decrement memory accounting on kfree_skb. Also
this will trigger sk_write_space when space becomes available on
kfree_skb() which wasn't happening before and prevent __sk_free
from being called until all in-flight packets are completed.
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net: ubuf_info.refcnt conversion
Yet another atomic_t -> refcount_t conversion, split in two patches.
First patch prepares the automatic conversion done in the second patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>