Using typedefs inside the Kernel is against CodingStyle, and
there's no good usage here.
Just like we did at frontend.h, at commit 0df289a209 ("[media] dvb:
Get rid of typedev usage for enums"), let's keep those typedefs only
to provide userspace backward compatibility.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Pull x86 asm updates from Ingo Molnar:
- Introduce the ORC unwinder, which can be enabled via
CONFIG_ORC_UNWINDER=y.
The ORC unwinder is a lightweight, Linux kernel specific debuginfo
implementation, which aims to be DWARF done right for unwinding.
Objtool is used to generate the ORC unwinder tables during build, so
the data format is flexible and kernel internal: there's no
dependency on debuginfo created by an external toolchain.
The ORC unwinder is almost two orders of magnitude faster than the
(out of tree) DWARF unwinder - which is important for perf call graph
profiling. It is also significantly simpler and is coded defensively:
there has not been a single ORC related kernel crash so far, even
with early versions. (knock on wood!)
But the main advantage is that enabling the ORC unwinder allows
CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
measurably:
With frame pointers disabled, GCC does not have to add frame pointer
instrumentation code to every function in the kernel. The kernel's
.text size decreases by about 3.2%, resulting in better cache
utilization and fewer instructions executed, resulting in a broad
kernel-wide speedup. Average speedup of system calls should be
roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
a speedup of 5-10% for some function execution intense workloads.
The main cost of the unwinder is that the unwinder data has to be
stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
config - which is a modest cost on modern x86 systems.
Given how young the ORC unwinder code is it's not enabled by default
- but given the performance advantages the plan is to eventually make
it the default unwinder on x86.
See Documentation/x86/orc-unwinder.txt for more details.
- Remove lguest support: its intended role was that of a temporary
proof of concept for virtualization, plus its removal will enable the
reduction (removal) of the paravirt API as well, so Rusty agreed to
its removal. (Juergen Gross)
- Clean up and fix FSGS related functionality (Andy Lutomirski)
- Clean up IO access APIs (Andy Shevchenko)
- Enhance the symbol namespace (Jiri Slaby)
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
objtool: Handle GCC stack pointer adjustment bug
x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
x86/fpu/math-emu: Add ENDPROC to functions
x86/boot/64: Extract efi_pe_entry() from startup_64()
x86/boot/32: Extract efi_pe_entry() from startup_32()
x86/lguest: Remove lguest support
x86/paravirt/xen: Remove xen_patch()
objtool: Fix objtool fallthrough detection with function padding
x86/xen/64: Fix the reported SS and CS in SYSCALL
objtool: Track DRAP separately from callee-saved registers
objtool: Fix validate_branch() return codes
x86: Clarify/fix no-op barriers for text_poke_bp()
x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
selftests/x86/fsgsbase: Test selectors 1, 2, and 3
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
x86/asm/32: Fix regs_get_register() on segment registers
x86/xen/64: Rearrange the SYSCALL entries
x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
...
Pull perf updates from Ingo Molnar:
"Kernel side changes:
- Add branch type profiling/tracing support. (Jin Yao)
- Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of
physical memory addresses, where the PMU supports it. (Kan Liang)
- Export some PMU capability details in the new
/sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi
Kleen)
- Aux data fixes and updates (Will Deacon)
- kprobes fixes and updates (Masami Hiramatsu)
- AMD uncore PMU driver fixes and updates (Janakarajan Natarajan)
On the tooling side, here's a (limited!) list of highlights - there
were many other changes that I could not list, see the shortlog and
git history for details:
UI improvements:
- Implement a visual marker for fused x86 instructions in the
annotate TUI browser, available now in 'perf report', more work
needed to have it available as well in 'perf top' (Jin Yao)
Further explanation from one of Jin's patches:
│ ┌──cmpl $0x0,argp_program_version_hook
81.93 │ ├──je 20
│ │ lock cmpxchg %esi,0x38a9a4(%rip)
│ │↓ jne 29
│ │↓ jmp 43
11.47 │20:└─→cmpxch %esi,0x38a999(%rip)
That means the cmpl+je is a fused instruction pair and they should
be considered together.
- Record the branch type and then show statistics and info about in
callchain entries (Jin Yao)
Example from one of Jin's patches:
# perf record -g -j any,save_type
# perf report --branch-history --stdio --no-children
38.50% div.c:45 [.] main div
|
---main div.c:42 (RET CROSS_2M cycles:2)
compute_flag div.c:28 (cycles:2)
compute_flag div.c:27 (RET CROSS_2M cycles:1)
rand rand.c:28 (cycles:1)
rand rand.c:28 (RET CROSS_2M cycles:1)
__random random.c:298 (cycles:1)
__random random.c:297 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (RET CROSS_2M cycles:9)
namespaces support:
- Add initial support for namespaces, using setns to access files in
namespaces, grabbing their build-ids, etc. (Krister Johansen)
perf trace enhancements:
- Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace'
(Arnaldo Carvalho de Melo)
- Add initial 'clone' syscall args beautifier in 'perf trace'
(Arnaldo Carvalho de Melo)
- Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace'
(Arnaldo Carvalho de Melo)
- Beautifiers for the 'cmd' arg of several ioctl types, including:
sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de
Melo)
- Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data'
CTF conversion, allowing CTF trace visualization tools to show
callchains and to resolve symbols (Geneviève Bastien)
- Beautify the fcntl syscall, which is an interesting one in the
sense that infrastructure had to be put in place to change the
formatters of some arguments according to the value in a previous
one, i.e. cmd dictates how arg and the syscall return will be
formatted. (Arnaldo Carvalho de Melo
perf stat enhancements:
- Use group read for event groups in 'perf stat', reducing overhead
when groups are defined in the event specification, i.e. when using
{} to enclose a list of events, asking them to be read at the same
time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa)
pipe mode improvements:
- Process tracing data in 'perf annotate' pipe mode (David
Carrillo-Cisneros)
- Add header record types to pipe-mode, now this command:
$ perf record -o - -e cycles sleep 1 | perf report --stdio --header
Will show the same as in non-pipe mode, i.e. involving a perf.data
file (David Carrillo-Cisneros)
Vendor specific hardware event support updates/enhancements:
- Update POWER9 vendor events tables (Sukadev Bhattiprolu)
- Add POWER9 PMU events Sukadev (Bhattiprolu)
- Support additional POWER8+ PVR in PMU mapfile (Shriya)
- Add Skylake server uncore JSON vendor events (Andi Kleen)
- Support exporting Intel PT data to sqlite3 with python perf
scripts, this is in addition to the postgresql support that was
already there (Adrian Hunter)"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits)
perf symbols: Fix plt entry calculation for ARM and AARCH64
perf probe: Fix kprobe blacklist checking condition
perf/x86: Fix caps/ for !Intel
perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR
perf/core, pt, bts: Get rid of itrace_started
perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments
tools headers: Sync cpu features kernel ABI headers with tooling headers
perf tools: Pass full path of FEATURES_DUMP
perf tools: Robustify detection of clang binary
tools lib: Allow external definition of CC, AR and LD
perf tools: Allow external definition of flex and bison binary names
tools build tests: Don't hardcode gcc name
perf report: Group stat values on global event id
perf values: Zero value buffers
perf values: Fix allocation check
perf values: Fix thread index bug
perf report: Add dump_read function
perf record: Set read_format for inherit_stat
perf c2c: Fix remote HITM detection for Skylake
perf tools: Fix static build with newer toolchains
...
In the last NFWS in Faro, Portugal, we discussed that netlink is lacking
the semantics to request non recursive deletions, ie. do not delete an
object iff it has child objects that hang from this parent object that
the user requests to be deleted.
We need this new flag to solve a problem for the iptables-compat
backward compatibility utility, that runs iptables commands using the
existing nf_tables netlink interface. Specifically, custom chains in
iptables cannot be deleted if there are rules in it, however, nf_tables
allows to remove any chain that is populated with content. To sort out
this asymmetry, iptables-compat userspace sets this new NLM_F_NONREC
flag to obtain the same semantics that iptables provides.
This new flag should only be used for deletion requests. Note this new
flag value overlaps with the existing:
* NLM_F_ROOT for get requests.
* NLM_F_REPLACE for new requests.
However, those flags should not ever be used in deletion requests.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull RCU updates from Ingo Molnad:
"The main RCU related changes in this cycle were:
- Removal of spin_unlock_wait()
- SRCU updates
- RCU torture-test updates
- RCU Documentation updates
- Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant
- Miscellaneous RCU fixes
- CPU-hotplug fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
arch: Remove spin_unlock_wait() arch-specific definitions
locking: Remove spin_unlock_wait() generic definitions
drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
ipc: Replace spin_unlock_wait() with lock/unlock pair
exit: Replace spin_unlock_wait() with lock/unlock pair
completion: Replace spin_unlock_wait() with lock/unlock pair
doc: Set down RCU's scheduling-clock-interrupt needs
doc: No longer allowed to use rcu_dereference on non-pointers
doc: Add RCU files to docbook-generation files
doc: Update memory-barriers.txt for read-to-write dependencies
doc: Update RCU documentation
membarrier: Provide expedited private command
rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
rcu: Add warning to rcu_idle_enter() for irqs enabled
rcu: Make rcu_idle_enter() rely on callers disabling irqs
rcu: Add assertions verifying blocked-tasks list
rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
rcu: Add TPS() protection for _rcu_barrier_trace strings
rcu: Use idle versions of swait to make idle-hack clear
swait: Add idle variants which don't contribute to load average
...
Register a new limit stateful object type into the stateful object
infrastructure.
Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds a new feature to hashlimit that allows matching on the
current packet/byte rate without rate limiting. This can be enabled
with a new flag --hashlimit-rate-match. The match returns true if the
current rate of packets is above/below the user specified value.
The main difference between the existing algorithm and the new one is
that the existing algorithm rate-limits the flow whereas the new
algorithm does not. Instead it *classifies* the flow based on whether
it is above or below a certain rate. I will demonstrate this with an
example below. Let us assume this rule:
iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain
If the packet rate is 15/s, the existing algorithm would ACCEPT 10
packets every second and send 5 packets to "new_chain".
But with the new algorithm, as long as the rate of 15/s is sustained,
all packets will continue to match and every packet is sent to new_chain.
This new functionality will let us classify different flows based on
their current rate, so that further decisions can be made on them based on
what the current rate is.
This is how the new algorithm works:
We divide time into intervals of 1 (sec/min/hour) as specified by
the user. We keep track of the number of packets/bytes processed in the
current interval. After each interval we reset the counter to 0.
When we receive a packet for match, we look at the packet rate
during the current interval and the previous interval to make a
decision:
if [ prev_rate < user and cur_rate < user ]
return Below
else
return Above
Where cur_rate is the number of packets/bytes seen in the current
interval, prev is the number of packets/bytes seen in the previous
interval and 'user' is the rate specified by the user.
We also provide flexibility to the user for choosing the time
interval using the option --hashilmit-interval. For example the user can
keep a low rate like x/hour but still keep the interval as small as 1
second.
To preserve backwards compatibility we have to add this feature in a new
revision, so I've created revision 3 for hashlimit. The two new options
we add are:
--hashlimit-rate-match
--hashlimit-rate-interval
I have updated the help text to add these new options. Also added a few
tests for the new options.
Suggested-by: Igor Lubashev <ilubashe@akamai.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Lots of hfi1 driver updates (mixed with a few qib and core updates as
well)
- rxe updates
- various mlx updates
- Set default roce type to RoCEv2
- Several larger fixes for bnxt_re that were too big for -rc
- Several larger fixes for qedr that, likewise, were too big for -rc
- Misc core changes
- Make the hns_roce driver compilable on arches other than aarch64 so we
can more easily debug build issues related to it
- Add rdma-netlink infrastructure updates
- Add automatic IRQ affinity infrastructure
- Add 32bit lid support
- Lots of misc fixes across the subsystem from random people
- Autoloading of RDMA netlink modules
- PCI pool cleanups from Romain Perier
- mlx5 driver feature additions and fixes
- Hardware tag matchine feature
- Fix sleeping in atomic when resolving roce ah
- Add experimental ioctl interface as posted to linux-api@
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZqBDtAAoJELgmozMOVy/dNlcQAJhYNRGaNUBx0L6+8t2xwUrt
7ndP6qlMar30DJY9FjTQCzRBw0CRMWkXdJD8rYlyaHy07pjWDKG8LZtxEXu1FLdZ
oNRvQX6ZJh8Bz7db2SQFBCTF2uWGZZFqWQCrSbQwjj9xxjMDs59u/knmwHVY9dKk
egjPG4IQBDmcTeNY7h1otG2hXpx7QPIOilQW2EFN5SWAuBAazdF2JKxjjxqhnUfp
gD2pSdgsm3VSMoo0zpMa6qOP+9GcOu8J97fYFhasRYWCavPdWHyq+XNu9S/eicRd
xbv+seCYM+9jPb2dsNdjEKll7w3yyWdu7h6tSCMPYv54eN9sDDiO1w2L2ZnESMZa
JRnSfB+HXru1r4RyHOTPO8peaNhYlR1V4u8bTS5G2dffbHis9BajkWoAR/oSiUcB
AIjIIDcdJFVGfpF9KIt/pEl+adHNgESibSijzOUYkyw6RNbPqDmdd7YakPHcQhKN
clE3zQfIsPRLWsToP/nkBE0tUd3tQocRuLy7ote7hXQK+0p7TBz0a6Kkj87MvX33
8dVbUI+q6WRlEY90l71y0ZdXy/AvkxkFxAc4Y7FQZyJxhEArTaKgfa5fmpRwVxBm
yi9baoYCspHNRNv6AO4IL86ZCJqmWBuch8CBY1n2X3h8IGfKYEZUAZ+T/mnTTeUq
A4joXduz94ZD4w23leD1
=2ntC
-----END PGP SIGNATURE-----
Merge tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is a big pull request.
Of note is that I'm sending you the new ioctl API for the rdma
subsystem. We put it up on linux-api@, but didn't get much response.
The API is complex, but it solves two different problems in one go:
1) The bi-directional nature of the RDMA file write calls, which
created the security hole we had to handle (and for which the fix
is now causing problems for systems in production, we were a bit
over zealous in the fix and the ability to open a device, then
fork, then create new queue pairs on the device and use them is
broken).
2) The bloat caused by different vendors implementing extensions to
the base verbs API. Each vendor's hardware is slightly different,
and the hardware might be suitable for one extension but not
another.
By the time we add generic extensions for all the different ways
that the different hardware can offload things, the API becomes
bloated. Things like our completion structs have started to exceed
a cache line in size because of all the elements needed to support
this. That in turn shows up heavily in the performance graphs with
a noticable drop in performance on 100Gigabit links as our
completion structs go from occupying one cache line to 1+.
This API makes things like the completion structs modular in a
very similar way to netlink so that your structs can only include
the items needed for the offloads/features you are actually using
on a given queue pair. In that way we support everything, but only
use what we need, and our structs stay smaller.
The ioctl API is better explained by the posting on linux-api@ than I
can explain it here, so I'll just leave it at that.
The rest of the pull request is typical stuff.
Updates for 4.14 kernel merge window
- Lots of hfi1 driver updates (mixed with a few qib and core updates
as well)
- rxe updates
- various mlx updates
- Set default roce type to RoCEv2
- Several larger fixes for bnxt_re that were too big for -rc
- Several larger fixes for qedr that, likewise, were too big for -rc
- Misc core changes
- Make the hns_roce driver compilable on arches other than aarch64 so
we can more easily debug build issues related to it
- Add rdma-netlink infrastructure updates
- Add automatic IRQ affinity infrastructure
- Add 32bit lid support
- Lots of misc fixes across the subsystem from random people
- Autoloading of RDMA netlink modules
- PCI pool cleanups from Romain Perier
- mlx5 driver feature additions and fixes
- Hardware tag matchine feature
- Fix sleeping in atomic when resolving roce ah
- Add experimental ioctl interface as posted to linux-api@"
* tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits)
IB/core: Expose ioctl interface through experimental Kconfig
IB/core: Assign root to all drivers
IB/core: Add completion queue (cq) object actions
IB/core: Add legacy driver's user-data
IB/core: Export ioctl enum types to user-space
IB/core: Explicitly destroy an object while keeping uobject
IB/core: Add macros for declaring methods and attributes
IB/core: Add uverbs merge trees functionality
IB/core: Add DEVICE object and root tree structure
IB/core: Declare an object instead of declaring only type attributes
IB/core: Add new ioctl interface
RDMA/vmw_pvrdma: Fix a signedness
RDMA/vmw_pvrdma: Report network header type in WC
IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()
IB/cm: Fix sleeping in atomic when RoCE is used
IB/core: Add support to finalize objects in one transaction
IB/core: Add a generic way to execute an operation on a uobject
Documentation: Hardware tag matching
IB/mlx5: Support IB_SRQT_TM
net/mlx5: Add XRQ support
...
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree. Basically, updates to the conntrack core, enhancements for
nf_tables, conversion of netfilter hooks from linked list to array to
improve memory locality and asorted improvements for the Netfilter
codebase. More specifically, they are:
1) Add expection to hashes after timer initialization to prevent
access from another CPU that walks on the hashes and calls
del_timer(), from Florian Westphal.
2) Don't update nf_tables chain counters from hot path, this is only
used by the x_tables compatibility layer.
3) Get rid of nested rcu_read_lock() calls from netfilter hook path.
Hooks are always guaranteed to run from rcu read side, so remove
nested rcu_read_lock() where possible. Patch from Taehee Yoo.
4) nf_tables new ruleset generation notifications include PID and name
of the process that has updated the ruleset, from Phil Sutter.
5) Use skb_header_pointer() from nft_fib, so we can reuse this code from
the nf_family netdev family. Patch from Pablo M. Bermudo.
6) Add support for nft_fib in nf_tables netdev family, also from Pablo.
7) Use deferrable workqueue for conntrack garbage collection, to reduce
power consumption, from Patch from Subash Abhinov Kasiviswanathan.
8) Add nf_ct_expect_iterate_net() helper and use it. From Florian
Westphal.
9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian.
10) Drop references on conntrack removal path when skbuffs has escaped via
nfqueue, from Florian.
11) Don't queue packets to nfqueue with dying conntrack, from Florian.
12) Constify nf_hook_ops structure, from Florian.
13) Remove neededlessly branch in nf_tables trace code, from Phil Sutter.
14) Add nla_strdup(), from Phil Sutter.
15) Rise nf_tables objects name size up to 255 chars, people want to use
DNS names, so increase this according to what RFC 1035 specifies.
Patch series from Phil Sutter.
16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook
registration on demand, suggested by Eric Dumazet, patch from Florian.
17) Remove unused variables in compat_copy_entry_from_user both in
ip_tables and arp_tables code. Patch from Taehee Yoo.
18) Constify struct nf_conntrack_l4proto, from Julia Lawall.
19) Constify nf_loginfo structure, also from Julia.
20) Use a single rb root in connlimit, from Taehee Yoo.
21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo.
22) Use audit_log() instead of open-coding it, from Geliang Tang.
23) Allow to mangle tcp options via nft_exthdr, from Florian.
24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes
a fix for a miscalculation of the minimal length.
25) Simplify branch logic in h323 helper, from Nick Desaulniers.
26) Calculate netlink attribute size for conntrack tuple at compile
time, from Florian.
27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure.
From Florian.
28) Remove holes in nf_conntrack_l4proto structure, so it becomes
smaller. From Florian.
29) Get rid of print_tuple() indirection for /proc conntrack listing.
Place all the code in net/netfilter/nf_conntrack_standalone.c.
Patch from Florian.
30) Do not built in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is
off. From Florian.
31) Constify most nf_conntrack_{l3,l4}proto helper functions, from
Florian.
32) Fix broken indentation in ebtables extensions, from Colin Ian King.
33) Fix several harmless sparse warning, from Florian.
34) Convert netfilter hook infrastructure to use array for better memory
locality, joint work done by Florian and Aaron Conole. Moreover, add
some instrumentation to debug this.
35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once
per batch, from Florian.
36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian.
37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao.
38) Remove unused code in the generic protocol tracker, from Davide
Caratti.
I think I will have material for a second Netfilter batch in my queue if
time allow to make it fit in this merge window.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZpRPIAAoJEAx081l5xIa+kCIP/2m2q0jBmCATvXXwrMBH0zNk
4lm9yIfl9pmluJP97aklvkeKF77chhost76+hv+0sQ9ZsJD8koHWv5WyTHEs7Cfn
NpmtGPqYlIZsWNSwW0OFF/XzllgLCVEWa+W/7ryYzPZrSEZr6Ge4HE0qS3LfuLJv
K89amZWHkP5ysPZ1uxRBzHtZfNAhdyjYVTUntCR7gj3DYv3yNdeZu+/epfcWK2w/
Q+ggoy644vX/yzy5L5zCGL/J1BjStDuec7sgAKTlNx4TwBUmp2wsfhEdovQBGFiu
t5PHMajvrBRqSJWDIAZSUfjQzIMSz517J9LWeChU7KtAClNJQJEabbu4CoX4aEmG
UbSzEe0IxnxQ4842jcqQXZ+mevlNIEIBVSNR7dXi17jL3Ts+APQgrYjRJYVk2ipg
uQ9TwkeVVu2WRGyU8iRQrXAZI7+O3p4UnbNPjeG2qACD2Ur7Z3n7b0mhNFPOLzO4
gbIv4D6CcUB/vltl+vhZTW3P50oMCVSq8ScCpY8CGo29mZ5vypj5PTS+W8FsyY3Z
ypyMqWg/DyxKlOoO+aK8EmXuZmgtDR4kb8asltH/S1A0NZkzjrFkKgs10Cp6EjJy
Zz1BWa1KKEpdN6yp+jrbJKjf9MJ7K2RPGv3bxWnCCdNv4j49rk4t3IHqvcihddsd
XXFQB5zE7Pz0ROi/VkXR
=5fxW
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main drm pull request for 4.14 merge window.
I'm sending this early, as my continuing journey into fatherhood is
occurring really soon now, I'm going to be mostly useless for the next
couple of weeks, though I may be able to read email, I doubt I'll be
doing much patch applications or git sending. If anything urgent pops
up I've asked Daniel/Jani/Alex/Sean to try and direct stuff towards
you.
Outside drm changes:
Some rcar-du updates that touch the V4L tree, all acks should be in
place. It adds one export to the radix tree code for new i915 use
case. There are some minor AGP cleanups (don't see that too often).
Changes to the vbox driver in staging to avoid breaking compilation.
Summary:
core:
- Atomic helper fixes
- Atomic UAPI fixes
- Add YCBCR 4:2:0 support
- Drop set_busid hook
- Refactor fb_helper locking
- Remove a bunch of internal APIs
- Add a bunch of better default handlers
- Format modifier/blob plane property added
- More internal header refactoring
- Make more internal API names consistent
- Enhanced syncobj APIs (wait/signal/reset/create signalled)
bridge:
- Add Synopsys Designware MIPI DSI host bridge driver
tiny:
- Add Pervasive Displays RePaper displays
- Add support for LEGO MINDSTORMS EV3 LCD
i915:
- Lots of GEN10/CNL support patches
- drm syncobj support
- Skylake+ watermark refactoring
- GVT vGPU 48-bit ppgtt support
- GVT performance improvements
- NOA change ioctl
- CCS (color compression) scanout support
- GPU reset improvements
amdgpu:
- Initial hugepage support
- BO migration logic rework
- Vega10 improvements
- Powerplay fixes
- Stop reprogramming the MC
- Fixes for ACP audio on stoney
- SR-IOV fixes/improvements
- Command submission overhead improvements
amdkfd:
- Non-dGPU upstreaming patches
- Scratch VA ioctl
- Image tiling modes
- Update PM4 headers for new firmware
- Drop all BUG_ONs.
nouveau:
- GP108 modesetting support.
- Disable MSI on big endian.
vmwgfx:
- Add fence fd support.
msm:
- Runtime PM improvements
exynos:
- NV12MT support
- Refactor KMS drivers
imx-drm:
- Lock scanout channel to improve memory bw
- Cleanups
etnaviv:
- GEM object population fixes
tegra:
- Prep work for Tegra186 support
- PRIME mmap support
sunxi:
- HDMI support improvements
- HDMI CEC support
omapdrm:
- HDMI hotplug IRQ support
- Big driver cleanup
- OMAP5 DSI support
rcar-du:
- vblank fixes
- VSP1 updates
arcgpu:
- Minor fixes
stm:
- Add STM32 DSI controller driver
dw_hdmi:
- Add support for Rockchip RK3399
- HDMI CEC support
atmel-hlcdc:
- Add 8-bit color support
vc4:
- Atomic fixes
- New ioctl to attach a label to a buffer object
- HDMI CEC support
- Allow userspace to dictate rendering order on submit ioctl"
* tag 'drm-for-v4.14' of git://people.freedesktop.org/~airlied/linux: (1074 commits)
drm/syncobj: Add a signal ioctl (v3)
drm/syncobj: Add a reset ioctl (v3)
drm/syncobj: Add a syncobj_array_find helper
drm/syncobj: Allow wait for submit and signal behavior (v5)
drm/syncobj: Add a CREATE_SIGNALED flag
drm/syncobj: Add a callback mechanism for replace_fence (v3)
drm/syncobj: add sync obj wait interface. (v8)
i915: Use drm_syncobj_fence_get
drm/syncobj: Add a race-free drm_syncobj_fence_get helper (v2)
drm/syncobj: Rename fence_get to find_fence
drm: kirin: Add mode_valid logic to avoid mode clocks we can't generate
drm/vmwgfx: Bump the version for fence FD support
drm/vmwgfx: Add export fence to file descriptor support
drm/vmwgfx: Add support for imported Fence File Descriptor
drm/vmwgfx: Prepare to support fence fd
drm/vmwgfx: Fix incorrect command header offset at restart
drm/vmwgfx: Support the NOP_ERROR command
drm/vmwgfx: Restart command buffers after errors
drm/vmwgfx: Move irq bottom half processing to threads
drm/vmwgfx: Don't use drm_irq_[un]install
...
Pull misc fixes from Al Viro:
"Loose ends and regressions from the last merge window.
Strictly speaking, only binfmt_flat thing is a build regression per
se - the rest is 'only sparse cares about that' stuff"
[ This came in before the 4.13 release and could have gone there, but it
was late in the release and nothing seemed critical enough to care, so
I'm pulling it in the 4.14 merge window instead - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
binfmt_flat: fix arch/m32r and arch/microblaze flat_put_addr_at_rp()
compat_hdio_ioctl: Fix a declaration
<linux/uaccess.h>: Fix copy_in_user() declaration
annotate RWF_... flags
teach SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE to handle __bitwise arguments
Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to
processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is
not possible to retrieve these from the kernel once they have been
configured on sockets.
Signed-off-by: Ivan Delalande <colona@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FMR_OF_LAST flag is set on the last fsmap record being returned for
the dataset requested, contrary to what the header file says. Fix the
docs to reflect the behavior of all fsmap implementations.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Root in a non-initial user ns cannot be trusted to write a traditional
security.capability xattr. If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a private
namespace, write the xattr, and execute the file with privilege on the
host.
However supporting file capabilities in a user namespace is very
desirable. Not doing so means that any programs designed to run with
limited privilege must continue to support other methods of gaining and
dropping privilege. For instance a program installer must detect
whether file capabilities can be assigned, and assign them if so but set
setuid-root otherwise. The program in turn must know how to drop
partial capabilities, and do so only if setuid-root.
This patch introduces v3 of the security.capability xattr. It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data. This is the absolute uid_t (that is, the uid_t in user
namespace which mounted the filesystem, usually init_user_ns) of the
root id in whose namespaces the file capabilities may take effect.
When a task asks to write a v2 security.capability xattr, if it is
privileged with respect to the userns which mounted the filesystem, then
nothing should change. Otherwise, the kernel will transparently rewrite
the xattr as a v3 with the appropriate rootid. This is done during the
execution of setxattr() to catch user-space-initiated capability writes.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.
Similarly when asking to read file capabilities, a v3 capability will
be presented as v2 if it applies to the caller's namespace.
If a task writes a v3 security.capability, then it can provide a uid for
the xattr so long as the uid is valid in its own user namespace, and it
is privileged with CAP_SETFCAP over its namespace. The kernel will
translate that rootid to an absolute uid, and write that to disk. After
this, a task in the writer's namespace will not be able to use those
capabilities (unless rootid was 0), but a task in a namespace where the
given uid is root will.
Only a single security.capability xattr may exist at a time for a given
file. A task may overwrite an existing xattr so long as it is
privileged over the inode. Note this is a departure from previous
semantics, which required privilege to remove a security.capability
xattr. This check can be re-added if deemed useful.
This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.
Example using tar:
$ cp /bin/sleep sleepx
$ mkdir b1 b2
$ lxc-usernsexec -m b:0:100000:1 -m b:1:$(id -u):1 -- chown 0:0 b1
$ lxc-usernsexec -m b:0:100001:1 -m b:1:$(id -u):1 -- chown 0:0 b2
$ lxc-usernsexec -m b:0:100000:1000 -- tar --xattrs-include=security.capability --xattrs -cf b1/sleepx.tar sleepx
$ lxc-usernsexec -m b:0:100001:1000 -- tar --xattrs-include=security.capability --xattrs -C b2 -xf b1/sleepx.tar
$ lxc-usernsexec -m b:0:100001:1000 -- getcap b2/sleepx
b2/sleepx = cap_sys_admin+ep
# /opt/ltp/testcases/bin/getv3xattr b2/sleepx
v3 xattr, rootid is 100001
A patch to linux-test-project adding a new set of tests for this
functionality is in the nsfscaps branch at github.com/hallyn/ltp
Changelog:
Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
Nov 07 2016: convert rootid from and to fs user_ns
(From ebiederm: mar 28 2017)
commoncap.c: fix typos - s/v4/v3
get_vfs_caps_from_disk: clarify the fs_ns root access check
nsfscaps: change the code split for cap_inode_setxattr()
Apr 09 2017:
don't return v3 cap for caps owned by current root.
return a v2 cap for a true v2 cap in non-init ns
Apr 18 2017:
. Change the flow of fscap writing to support s_user_ns writing.
. Remove refuse_fcap_overwrite(). The value of the previous
xattr doesn't matter.
Apr 24 2017:
. incorporate Eric's incremental diff
. move cap_convert_nscap to setxattr and simplify its usage
May 8, 2017:
. fix leaking dentry refcount in cap_inode_getsecurity
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The BINDER_GET_NODE_DEBUG_INFO ioctl will return debug info on
a node. Each successive call reusing the previous return value
will return the next node. The data will be used by
libmemunreachable to mark the pointers with kernel references
as reachable.
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Martijn Coenen <maco@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add socket mark and priority to fields that can be set by
ebpf program when a socket is created.
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used by the IPv6 host table which will be introduced in the
following patches. The fields in the header are added per-use. This header
is global and can be reused by many drivers.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a different approach from the first attempt in f2c6df7dbf
("loop: support 4k physical blocksize"). Rather than extending
LOOP_{GET,SET}_STATUS, add a separate ioctl just for setting the block
size.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Adding CQ ioctl actions:
1. create_cq
2. destroy_cq
This requires adding the following:
1. A specification describing the method
a. Handler
b. Attributes specification
Each attribute is one of the following:
a. PTR_IN - input data
Note: This could be encoded inlined for
data < 64bit
b. PTR_OUT - response data
c. IDR - idr based object
d. FD - fd based object
Blobs attributes (clauses a and b) contain their type,
while objects specifications (clauses c and d)
contains the expected object type (for example, the
given id should be UVERBS_TYPE_PD) and the required
access (READ, WRITE, NEW or DESTROY). If a NEW is
required, the new object's id will be assigned to this
attribute. All attributes could get UA_FLAGS
attribute. Currently we support stating that an
attribute is mandatory or that the specification size
corresponds to a lower bound (and that this attribute
could be extended).
We currently add both default attributes and the two
generic UHW_IN and UHW_OUT driver specific attributes.
2. Handler
A handler gets a uverbs_attr_bundle. The handler developer uses
uverbs_attr_get to fetch an attribute of a given id.
Each of these attribute groups correspond to the specification
group defined in the action (clauses 1.b and 1.c respectively).
The indices of these arrays corresponds to the attribute ids
declared in the specifications (clause 2).
The handler is quite simple. It assumes the infrastructure fetched
all objects and locked, created or destroyed them as required by
the specification. Pointer (or blob) attributes were validated to
match their required sizes. After the handler finished, the
infrastructure commits or rollbacks the objects.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In this phase, we don't want to change all the drivers to use
flexible driver's specific attributes. Therefore, we add two default
attributes: UHW_IN and UHW_OUT. These attributes are optional in some
methods and they encode the driver specific command data. We add
a function that extract this data and creates the legacy udata over
it.
Driver's data should start from UVERBS_UDATA_DRIVER_DATA_FLAG. This
turns on the first bit of the namespace, indicating this attribute
belongs to the driver's namespace.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a new ib_user_ioctl_verbs.h which exports all required ABI
enums and structs to the user-space.
Export the default types to user-space through this file.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In this ioctl interface, processing the command starts from
properties of the command and fetching the appropriate user objects
before calling the handler.
Parsing and validation is done according to a specifier declared by
the driver's code. In the driver, all supported objects are declared.
These objects are separated to different object namepsaces. Dividing
objects to namespaces is done at initialization by using the higher
bits of the object ids. This initialization can mix objects declared
in different places to one parsing tree using in this ioctl interface.
For each object we list all supported methods. Similarly to objects,
methods are separated to method namespaces too. Namespacing is done
similarly to the objects case. This could be used in order to add
methods to an existing object.
Each method has a specific handler, which could be either a default
handler or a driver specific handler.
Along with the handler, a bunch of attributes are specified as well.
Similarly to objects and method, attributes are namespaced and hashed
by their ids at initialization too. All supported attributes are
subject to automatic fetching and validation. These attributes include
the command, response and the method's related objects' ids.
When these entities (objects, methods and attributes) are used, the
high bits of the entities ids are used in order to calculate the hash
bucket index. Then, these high bits are masked out in order to have a
zero based index. Since we use these high bits for both bucketing and
namespacing, we get a compact representation and O(1) array access.
This is mandatory for efficient dispatching.
Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
Attributes could be validated through some attributes, like:
(*) Minimum size / Exact size
(*) Fops for FD
(*) Object type for IDR
If an IDR/fd attribute is specified, the kernel also states the object
type and the required access (NEW, WRITE, READ or DESTROY).
All uobject/fd management is done automatically by the infrastructure,
meaning - the infrastructure will fail concurrent commands that at
least one of them requires concurrent access (WRITE/DESTROY),
synchronize actions with device removals (dissociate context events)
and take care of reference counting (increase/decrease) for concurrent
actions invocation. The reference counts on the actual kernel objects
shall be handled by the handlers.
objects
+--------+
| |
| | methods +--------+
| | ns method method_spec +-----+ |len |
+--------+ +------+[d]+-------+ +----------------+[d]+------------+ |attr1+-> |type |
| object +> |method+-> | spec +-> + attr_buckets +-> |default_chain+--> +-----+ |idr_type|
+--------+ +------+ |handler| | | +------------+ |attr2| |access |
| | | | +-------+ +----------------+ |driver chain| +-----+ +--------+
| | | | +------------+
| | +------+
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
+--------+
[d] = Hash ids to groups using the high order bits
The right types table is also chosen by using the high bits from
the ids. Currently we have either default or driver specific groups.
Once validation and object fetching (or creation) completed, we call
the handler:
int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
struct uverbs_attr_bundle *ctx);
ctx bundles attributes of different namespaces. Each element there
is an array of attributes which corresponds to one namespaces of
attributes. For example, in the usually used case:
ctx core
+----------------------------+ +------------+
| core: +---> | valid |
+----------------------------+ | cmd_attr |
| driver: | +------------+
|----------------------------+--+ | valid |
| | cmd_attr |
| +------------+
| | valid |
| | obj_attr |
| +------------+
|
| drivers
| +------------+
+> | valid |
| cmd_attr |
+------------+
| valid |
| cmd_attr |
+------------+
| valid |
| obj_attr |
+------------+
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We should report the network header type in the work completion so that
the kernel can infer the right RoCE type headers.
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This adds information about storage keys to the struct returned by
the KVM_PPC_GET_SMMU_INFO ioctl. The new fields replace a pad field,
which was zeroed by previous kernel versions. Thus userspace that
knows about the new fields will see zeroes when running on an older
kernel, indicating that storage keys are not supported. The size of
the structure has not changed.
The number of keys is hard-coded for the CPUs supported by HV KVM,
which is just POWER7, POWER8 and POWER9.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Define the raw IP type. This is needed for raw IP net devices
like rmnet.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define the Qualcomm multiplexing and aggregation (MAP) ether type 0x00F9.
This is needed for receiving data in the MAP protocol like RMNET. This is
not an officially registered ID.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 45f119bf93.
Eric Dumazet says:
We found at Google a significant regression caused by
45f119bf93 tcp: remove header prediction
In typical RPC (TCP_RR), when a TCP socket receives data, we now call
tcp_ack() while we used to not call it.
This touches enough cache lines to cause a slowdown.
so problem does not seem to be HP removal itself but the tcp_ack()
call. Therefore, it might be possible to remove HP after all, provided
one finds a way to elide tcp_ack for most cases.
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NSH draft says:
An IEEE EtherType, 0x894F, has been allocated for NSH.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the forces IFE lfb type according to IEEE registered
ethertypes. See http://standards-oui.ieee.org/ethertype/eth.txt for more
information. Since there exists the IFE subsystem it can be used there.
This patch also use the correct word "ForCES" instead of "FoRCES" which
is a spelling error inside the IEEE ethertype specification.
Signed-off-by: Alexander Aring <aring@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For understanding how the workload maps to memory channels and hardware
behavior, it's very important to collect address maps with physical
addresses. For example, 3D XPoint access can only be found by filtering
the physical address.
Add a new sample type for physical address.
perf already has a facility to collect data virtual address. This patch
introduces a function to convert the virtual address to physical address.
The function is quite generic and can be extended to any architecture as
long as a virtual address is provided.
- For kernel direct mapping addresses, virt_to_phys is used to convert
the virtual addresses to physical address.
- For user virtual addresses, __get_user_pages_fast is used to walk the
pages tables for user physical address.
- This does not work for vmalloc addresses right now. These are not
resolved, but code to do that could be added.
The new sample type requires collecting the virtual address. The
virtual address will not be output unless SAMPLE_ADDR is applied.
For security, the physical address can only be exposed to root or
privileged user.
Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add tm_list_size parameter to struct ib_uverbs_create_xsrq.
If SRQ type is tag-matching this field defines maximum size
of tag matching list. Otherwise, it is expected to be zero.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
vmwgfx add fence fd support.
* 'drm-vmwgfx-next' of git://people.freedesktop.org/~syeh/repos_linux:
drm/vmwgfx: Bump the version for fence FD support
drm/vmwgfx: Add export fence to file descriptor support
drm/vmwgfx: Add support for imported Fence File Descriptor
drm/vmwgfx: Prepare to support fence fd
drm/vmwgfx: Fix incorrect command header offset at restart
drm/vmwgfx: Support the NOP_ERROR command
drm/vmwgfx: Restart command buffers after errors
drm/vmwgfx: Move irq bottom half processing to threads
drm/vmwgfx: Don't use drm_irq_[un]install
UAPI Changes:
- Rename u32 to __u32 in struct drm_format_modifier_blob (Lionel)
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* tag 'drm-misc-next-fixes-2017-08-28' of git://anongit.freedesktop.org/git/drm-misc:
drm: rename u32 in __u32 in uapi
This IOCTL provides a mechanism for userspace to trigger a sync object
directly. There are other ways that userspace can trigger a syncobj
such as submitting a dummy batch somewhere or hanging on to a triggered
sync_file and doing an import. This just provides an easy way to
manually trigger the sync object without weird hacks.
The motivation for this IOCTL is Vulkan fences. Vulkan lets you create
a fence already in the signaled state so that you can wait on it
immediatly without stalling. We could also handle this with a new
create flag to ask the driver to create a syncobj that is already
signaled but the IOCTL seemed a bit cleaner and more generic.
v2:
- Take an array of sync objects (Dave Airlie)
v3:
- Throw -EINVAL if pad != 0
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just resets the dma_fence to NULL so it looks like it's never been
signaled. This will be useful once we add the new wait API for allowing
wait on "submit and signal" behavior.
v2:
- Take an array of sync objects (Dave Airlie)
v3:
- Throw -EINVAL if pad != 0
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Vulkan VkFence semantics require that the application be able to perform
a CPU wait on work which may not yet have been submitted. This is
perfectly safe because the CPU wait has a timeout which will get
triggered eventually if no work is ever submitted. This behavior is
advantageous for multi-threaded workloads because, so long as all of the
threads agree on what fences to use up-front, you don't have the extra
cross-thread synchronization cost of thread A telling thread B that it
has submitted its dependent work and thread B is now free to wait.
Within a single process, this can be implemented in the userspace driver
by doing exactly the same kind of tracking the app would have to do
using posix condition variables or similar. However, in order for this
to work cross-process (as is required by VK_KHR_external_fence), we need
to handle this in the kernel.
This commit adds a WAIT_FOR_SUBMIT flag to DRM_IOCTL_SYNCOBJ_WAIT which
instructs the IOCTL to wait for the syncobj to have a non-null fence and
then wait on the fence. Combined with DRM_IOCTL_SYNCOBJ_RESET, you can
easily get the Vulkan behavior.
v2:
- Fix a bug in the invalid syncobj error path
- Unify the wait-all and wait-any cases
v3:
- Unify the timeout == 0 case a bit with the timeout > 0 case
- Use wait_event_interruptible_timeout
v4:
- Use proxy fence
v5:
- Revert to a combination of v2 and v3
- Don't use proxy fences
- Don't use wait_event_interruptible_timeout because it just adds an
extra layer of callbacks
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This requests that the driver create the sync object such that it
already has a signaled dma_fence attached. Because we don't need
anything in particular (just something signaled), we use a dummy null
fence. This is useful for Vulkan which has a similar flag that can be
passed to vkCreateFence.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This interface will allow sync object to be used to back
Vulkan fences. This API is pretty much the vulkan fence waiting
API, and I've ported the code from amdgpu.
v2: accept relative timeout, pass remaining time back
to userspace.
v3: return to absolute timeouts.
v4: absolute zero = poll,
rewrite any/all code to have same operation for arrays
return -EINVAL for 0 fences.
v4.1: fixup fences allocation check, use u64_to_user_ptr
v5: move to sec/nsec, and use timespec64 for calcs.
v6: use -ETIME and drop the out status flag. (-ETIME
is suggested by ickle, I can feel a shed painting)
v7: talked to Daniel/Arnd, use ktime and ns everywhere.
v8: be more careful in the timeout calculations
use uint32_t for counter variables so we don't overflow
graciously handle -ENOINT being returned from dma_fence_wait_timeout
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
MediaTek BTIF controller is the serial interface similar to UART but it
works only as the digital device which is mainly used to communicate with
the connectivity module called CONNSYS inside the SoC which could be mostly
found on those MediaTek SoCs with Bluetooth feature such as MT7622 and
MT7623 SoCs.
And the controller is made as being compatible with the 8250 register
layout with extra registers such as DMA enablement so it tends to be
integrated with reusing 8250 OF driver. However, DMA mode is not being
supported yet in the current driver.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The UAPI has a global list of unique numbers for different port types.
The commit
a2d6a987bf ("serial: 8250: Add new port type for TI DA8xx/66AK2x")
introduced a new port type and brought the collision with two other port
types.
Reuse 95 for it instead.
Fixes: a2d6a987bf ("serial: 8250: Add new port type for TI DA8xx/66AK2x")
Cc: David Lechner <david@lechnology.com>
Cc: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PORT_MFD is not in use since commit
1bd187de53 ("x86, intel-mid: remove Intel MID specific serial support")
Remove leftover.
Fixes: 1bd187de53 ("x86, intel-mid: remove Intel MID specific serial support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It used to be a gap in port definitions after PORT_MAX_8250. Since the
new drivers are coming the gap become shorter and shorter until the
commit a2d6a987bf ("serial: 8250: Add new port type for TI DA8xx/66AK2x")
completely removed it.
So, while type here is just a formality, make things a little bit more
explicit for this driver and move port types to UAPI header. Note,
it uses two types for now.
Fixes: fddceb8b53 ("tty: 8250: Add 64byte UART support for FSL platforms")
Cc: Priyanka Jain <Priyanka.Jain@freescale.com>
Cc: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The addition of map_flags BPF_SOCKMAP_STRPARSER flags was to handle a
specific use case where we want to have BPF parse program disabled on
an entry in a sockmap.
However, Alexei found the API a bit cumbersome and I agreed. Lets
remove the STRPARSER flag and support the use case by allowing socks
to be in multiple maps. This allows users to create two maps one with
programs attached and one without. When socks are added to maps they
now inherit any programs attached to the map. This is a nice
generalization and IMO improves the API.
The API rules are less ambiguous and do not need a flag:
- When a sock is added to a sockmap we have two cases,
i. The sock map does not have any attached programs so
we can add sock to map without inheriting bpf programs.
The sock may exist in 0 or more other maps.
ii. The sock map has an attached BPF program. To avoid duplicate
bpf programs we only add the sock entry if it does not have
an existing strparser/verdict attached, returning -EBUSY if
a program is already attached. Otherwise attach the program
and inherit strparser/verdict programs from the sock map.
This allows for socks to be in a multiple maps for redirects and
inherit a BPF program from a single map.
Also this patch simplifies the logic around BPF_{EXIST|NOEXIST|ANY}
flags. In the original patch I tried to be extra clever and only
update map entries when necessary. Now I've decided the complexity
is not worth it. If users constantly update an entry with the same
sock for no reason (i.e. update an entry without actually changing
any parameters on map or sock) we still do an alloc/release. Using
this and allowing multiple entries of a sock to exist in a map the
logic becomes much simpler.
Note: Now that multiple maps are supported the "maps" pointer called
when a socket is closed becomes a list of maps to remove the sock from.
To keep the map up to date when a sock is added to the sockmap we must
add the map/elem in the list. Likewise when it is removed we must
remove it from the list. This results in searching the per psock list
on delete operation. On TCP_CLOSE events we walk the list and remove
the psock from all map/entry locations. I don't see any perf
implications in this because at most I have a psock in two maps. If
a psock were to be in many maps its possibly this might be noticeable
on delete but I can't think of a reason to dup a psock in many maps.
The sk_callback_lock is used to protect read/writes to the list. This
was convenient because in all locations we were taking the lock
anyways just after working on the list. Also the lock is per sock so
in normal cases we shouldn't see any contention.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the initial sockmap API we provided strparser and verdict programs
using a single attach command by extending the attach API with a the
attach_bpf_fd2 field.
However, if we add other programs in the future we will be adding a
field for every new possible type, attach_bpf_fd(3,4,..). This
seems a bit clumsy for an API. So lets push the programs using two
new type fields.
BPF_SK_SKB_STREAM_PARSER
BPF_SK_SKB_STREAM_VERDICT
This has the advantage of having a readable name and can easily be
extended in the future.
Updates to samples and sockmap included here also generalize tests
slightly to support upcoming patch for multiple map support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the fields and flags available.
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Singh Rawat <drawat@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Remove the command payloads that do not have an associated libnvdimm
ioctl. I.e. remove the payloads that would only ever be carried in the
ND_CMD_CALL envelope. This prevents userspace from growing unnecessary
dependencies on this kernel header when userspace already has everything
it needs to craft and send these commands.
Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Reported-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Increase PPL area to 1MB and use it as circular buffer to store PPL. The
entry with highest generation number is the latest one. If PPL to be
written is larger then space left in a buffer, rewind the buffer to the
start (don't wrap it).
Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
The fe_status variable s is not initialized meaning it can have any
random garbage status. This could be problematic if fe->ops.tune is
false as s is not updated by the call to fe->ops.tune() and a
subsequent check on the change status will using a garbage value.
Fix this by adding FE_NONE to the enum fe_status and initializing
s to this.
Detected by CoverityScan, CID#112887 ("Uninitialized scalar variable")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
These formats are compressed 12-bit raw bayer formats with four different
pixel orders. They are similar to 10-bit variants. The formats added by
this patch are
V4L2_PIX_FMT_SBGGR12P
V4L2_PIX_FMT_SGBRG12P
V4L2_PIX_FMT_SGRBG12P
V4L2_PIX_FMT_SRGGB12P
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
This patch implements the L2 frame encapsulation mechanism, referred to
as T.Encaps.L2 in the SRv6 specifications [1].
A new type of SRv6 tunnel mode is added (SEG6_IPTUN_MODE_L2ENCAP). It only
accepts packets with an existing MAC header (i.e., it will not work for
locally generated packets). The resulting packet looks like IPv6 -> SRH ->
Ethernet -> original L3 payload. The next header field of the SRH is set to
NEXTHDR_NONE.
[1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
The module private data can be modelled independent of its instances so
that it can be reused by the module instances. So move module data to
common manifest which can be referenced by the module instances.
This requires new tokens to be defined to accommodate these changes. The
new tokens will specify buffer sizes, DSP cycles and respective indexes
corresponding to the pcm params in the topology manifest so that driver
need not compute them.
Signed-off-by: Shreyas NC <shreyas.nc@intel.com>
Signed-off-by: Guneshwor Singh <guneshwor.o.singh@intel.com>
Acked-By: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
All other fields use __
Cc: Ben Widawsky <ben@bwidawsk.net>
Fixes: db1689aa61 ("drm: Create a format/modifier blob")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170824150814.5878-1-lionel.g.landwerlin@intel.com
Skylake changed the encoding of the PEBS data source field.
Some combinations are not available anymore, but some new cases
e.g. for L4 cache hit are added.
Fix up the conversion table for Skylake, similar as had been done
for Nehalem.
On Skylake server the encoding for L4 actually means persistent
memory. Handle this case too.
To properly describe it in the abstracted perf format I had to add
some new fields. Since a hit can have only one level add a new
field that is an enumeration, not a bit field to describe
the level. It can describe any level. Some numbers are also
used to describe PMEM and LFB.
Also add a new generic remote flag that can be combined with
the generic level to signify a remote cache.
And there is an extension field for the snoop indication to handle
the Forward state.
I didn't add a generic flag for hops because it's not needed
for Skylake.
I changed the existing encodings for older CPUs to also fill in the
new level and remote fields.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/20170816222156.19953-3-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Expose enhanced multi packet WQE capability to user space through
query_device by uhw.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Set the field to allow posting multi packet send WQEs if hardware
supports this feature. This doesn't mean the send WQEs will be for
multi packet unless the send WQE was prepared according to multi
packet send WQE format.
User space shall use flag MLX5_IB_ALLOW_MPW to check if hardware
supports MPW and allows MPW in SQ context.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Software parsing (SWP) is a feature that can be used to instruct the
device to stop using its internal parser and to parse packets on the
transmit path according to offsets set for each packets.
Through this feature, the device allows the handling of checksum and
LSO by the hardware according to the location of IP and TCP/UDP
headers.
Enable SW parsing on Raw Ethernet send queue by default if firmware
supports it and report these capabilities to user space.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
rx_key_len is not in use and needs to be removed.
Fixes: 3078f5f1bd ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mlx4 ABI defines to have structures with alignment of 64B.
Fixes: 400b1ebcfe ("IB/mlx4: Add support for WQ related verbs")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Avoid extra padding by replacing the order of inl_recv_sz and reserved,
otherwise 'mlx4_ib_create_qp' structure might be larger than legacy user
input leading to copy of some garbage data from the user space buffer.
Fixes: ea30b966f7 ('IB/mlx4: Add inline-receive support')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This will be used by the IPv4 host table which will be introduced in the
following patches. This header is global and can be reused by many
drivers.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used by the IPv4 host table which will be introduced in the
following patches. This header is global and can be reused by many
drivers.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add eDPC support. Get and print the RP PIO error information when the
trigger condition is RP PIO error.
For more information on eDPC, please see PCI Express Base Specification
Revision 3.1, section 6.2.10.3, or view the PCI-SIG eDPC ECN here:
https://pcisig.com/sites/default/files/specification_documents/ECN_Enhanced_DPC_2012-11-19_final.pdf
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
PCI_CAP_EXP_ENDPOINT_SIZEOF_V1 defines the size of the PCIe capability
structure for v1 devices with link, but we also have a need in the vfio
code for sizing the capability for devices without link, such as Root
Complex Integrated Endpoints. Create a separate define for this ending the
structure before the link fields.
Additionally, this reveals that PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 is currently
incorrect, ending the capability length before the v2 link fields. Rename
this to specify an RC Integrated Endpoint (no link) capability length and
move PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 to include the link fields as we have
for the v1 version.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
[bhelgaas: add "_" in "PCI_CAP_EXP_RC ENDPOINT_SIZEOF_V2 44"]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Lguest seems to be rather unused these days. It has seen only patches
ensuring it still builds the last two years and its official state is
"Odd Fixes".
Remove it in order to be able to clean up the paravirt code.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-3-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There's some stuff still up in the air, let's not get stuck with a
subpar ABI. I'll follow up with something better for 4.14.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
binder_fd_array_object starts with a 4-byte header,
followed by a few fields that are 8 bytes when
ANDROID_BINDER_IPC_32BIT=N.
This can cause alignment issues in a 64-bit kernel
with a 32-bit userspace, as on x86_32 an 8-byte primitive
may be aligned to a 4-byte address. Pad with a __u32
to fix this.
Signed-off-by: Martijn Coenen <maco@android.com>
Cc: stable <stable@vger.kernel.org> # 4.11+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The patch adds ERSPAN type II tunnel support. The implementation
is based on the draft at [1]. One of the purposes is for Linux
box to be able to receive ERSPAN monitoring traffic sent from
the Cisco switch, by creating a ERSPAN tunnel device.
In addition, the patch also adds ERSPAN TX, so Linux virtual
switch can redirect monitored traffic to the ERSPAN tunnel device.
The traffic will be encapsulated into ERSPAN and sent out.
The implementation reuses tunnel key as ERSPAN session ID, and
field 'erspan' as ERSPAN Index fields:
./ip link add dev ers11 type erspan seq key 100 erspan 123 \
local 172.16.1.200 remote 172.16.1.100
To use the above device as ERSPAN receiver, configure
Nexus 5000 switch as below:
monitor session 100 type erspan-source
erspan-id 123
vrf default
destination ip 172.16.1.200
source interface Ethernet1/11 both
source interface Ethernet1/12 both
no shut
monitor erspan origin ip-address 172.16.1.100 global
[1] https://tools.ietf.org/html/draft-foschiano-erspan-01
[2] iproute2 patch: http://marc.info/?l=linux-netdev&m=150306086924951&w=2
[3] test script: http://marc.info/?l=linux-netdev&m=150231021807304&w=2
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Meenakshi Vohra <mvohra@vmware.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not a big pull request this time around. Only 49 non-merge
commits. This pull request is, however, all over the place. Most of
the changes are in the bdc driver adding support for USB Phy layer and
PM.
Renesas adds support for R-Car H3 ES2.0 and R-Car M3-W SoCs.
Also here is PM_RUNTIME support for dwc3-keystone.
UDC Core got a DMA unmap fix to make sure we only unmap requests that
were, indeed, mapped.
Other than these, we have a lot of cleanups, many of them adding
'const' to several places.
-----BEGIN PGP SIGNATURE-----
iQJRBAABCAA7FiEElLzh7wn96CXwjh2IzL64meEamQYFAlmaxjsdHGZlbGlwZS5i
YWxiaUBsaW51eC5pbnRlbC5jb20ACgkQzL64meEamQZOXRAAlxA/jU0DVtZ8YAAC
6GRVni3xiEkoEoGA8tQwPZVZh0QRlupKtq5E31xWxjW2Hmtd/159WlIWZcFo3Cgi
ChDZ7yBQ0NGJhOS8X1zYSM6Rl9DDzKtk1vh7EXhz1CjG3aYTqwELLy9ZvCJUTG4w
O57XNH8GtRptmloOrFT+LvI6lb2EpcxGSTMhf0kUdnAC3Iw6gBL8PGr+jnoTLQ4M
gcddC5oQ24CpPFB/mszGd3Ro2USiNipttJmZJ/yrYLGcBIoIL4oyNbbAkhvcXAai
4cBsC7SELifgTJZ2npoUh0z+tRsmQ668zqXS81QgS8/Y6uFgmp4PWoTiGCLbpEm0
l/T5jMDeHFySmBM8+0opbXjXzK2FVG+NnGNkRPxpwoue6cUlay2dw2PGVhP0/6Y9
DQfv3bOHsEEe2ywp3J7UAZPFw4UaLKM8yWJ/Tv0DUhDolG8bA2PerU5vX5W6FufY
tWkvpTjRWKBVgHfYIUJoYtxLMsEubHQeuuc/hKEU+uqJ3161IQyUbOx547Y7toVi
E2mdDfN+Zv/PW5lNJ+GguK3eTxeJ8w4aBKC6VY+mLoATEDa5xzq9OzGJ118LcV3V
f4Nw7UsqinJYxCJMKgZHHCnrd6ct0wmhcsiWq9J0fSRPSzwzY3IrdbkWj24Gcc8V
Z5/lfdjee2Jx1TrUI3Lasew0qP4=
=FSil
-----END PGP SIGNATURE-----
Merge tag 'usb-for-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-next
Felipe writes:
usb: changes for v4.14 merge window
Not a big pull request this time around. Only 49 non-merge
commits. This pull request is, however, all over the place. Most of
the changes are in the bdc driver adding support for USB Phy layer and
PM.
Renesas adds support for R-Car H3 ES2.0 and R-Car M3-W SoCs.
Also here is PM_RUNTIME support for dwc3-keystone.
UDC Core got a DMA unmap fix to make sure we only unmap requests that
were, indeed, mapped.
Other than these, we have a lot of cleanups, many of them adding
'const' to several places.
Final pile of features for 4.14
- New ioctl to change NOA configurations, plus prep (Lionel)
- CCS (color compression) scanout support, based on the fancy new
modifier additions (Ville&Ben)
- Document i915 register macro style (Jani)
- Many more gen10/cnl patches (Rodrigo, Pualo, ...)
- More gpu reset vs. modeset duct-tape to restore the old way.
- prep work for cnl: hpd_pin reorg (Rodrigo), support for more power
wells (Imre), i2c pin reorg (Anusha)
- drm_syncobj support (Jason Ekstrand)
- forcewake vs gpu reset fix (Chris)
- execbuf speedup for the no-relocs fastpath, anv/vk low-overhead ftw (Chris)
- switch to idr/radixtree instead of the resizing ht for execbuf id->vma
lookups (Chris)
gvt:
- MMIO save/restore optimization (Changbin)
- Split workload scan vs. dispatch for more parallel exec (Ping)
- vGPU full 48bit ppgtt support (Joonas, Tina)
- vGPU hw id expose for perf (Zhenyu)
Bunch of work all over to make the igt CI runs more complete/stable.
Watch https://intel-gfx-ci.01.org/tree/drm-tip/shards-all.html for
progress in getting this ready. Next week we're going into production
mode (i.e. will send results to intel-gfx) on hsw, more platforms to
come.
Also, a new maintainer tram, I'm stepping out. Huge thanks to Jani for
being an awesome co-maintainer the past few years, and all the best
for Jani, Joonas&Rodrigo as the new maintainers!
* tag 'drm-intel-next-2017-08-18' of git://anongit.freedesktop.org/git/drm-intel: (179 commits)
drm/i915: Update DRIVER_DATE to 20170818
drm/i915/bxt: use NULL for GPIO connection ID
drm/i915: Mark the GT as busy before idling the previous request
drm/i915: Trivial grammar fix s/opt of/opt out of/ in comment
drm/i915: Replace execbuf vma ht with an idr
drm/i915: Simplify eb_lookup_vmas()
drm/i915: Convert execbuf to use struct-of-array packing for critical fields
drm/i915: Check context status before looking up our obj/vma
drm/i915: Don't use MI_STORE_DWORD_IMM on Sandybridge/vcs
drm/i915: Stop touching forcewake following a gen6+ engine reset
MAINTAINERS: drm/i915 has a new maintainer team
drm/i915: Split pin mapping into per platform functions
drm/i915/opregion: let user specify override VBT via firmware load
drm/i915/cnl: Reuse skl_wm_get_hw_state on Cannonlake.
drm/i915/gen10: implement gen 10 watermarks calculations
drm/i915/cnl: Fix LSPCON support.
drm/i915/vbt: ignore extraneous child devices for a port
drm/i915/cnl: Setup PAT Index.
drm/i915/edp: Allow alternate fixed mode for eDP if available.
drm/i915: Add support for drm syncobjs
...
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-08-21
1) Support RX checksum with IPsec crypto offload for esp4/esp6.
From Ilan Tayari.
2) Fixup IPv6 checksums when doing IPsec crypto offload.
From Yossi Kuperman.
3) Auto load the xfrom offload modules if a user installs
a SA that requests IPsec offload. From Ilan Tayari.
4) Clear RX offload informations in xfrm_input to not
confuse the TX path with stale offload informations.
From Ilan Tayari.
5) Allow IPsec GSO for local sockets if the crypto operation
will be offloaded.
6) Support setting of an output mark to the xfrm_state.
This mark can be used to to do the tunnel route lookup.
From Lorenzo Colitti.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The CEC_EVENT_PIN_LOW/HIGH defines and the cec_queue_pin_event() function
did not specify that these were about CEC pin events.
Since in the future there will also be HPD pin events it is wise to rename
the event defines and function to CEC_EVENT_PIN_CEC_LOW/HIGH and
cec_queue_pin_cec_event() now before these become part of the ABI.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
The current map creation API does not allow to provide the numa-node
preference. The memory usually comes from where the map-creation-process
is running. The performance is not ideal if the bpf_prog is known to
always run in a numa node different from the map-creation-process.
One of the use case is sharding on CPU to different LRU maps (i.e.
an array of LRU maps). Here is the test result of map_perf_test on
the INNER_LRU_HASH_PREALLOC test if we force the lru map used by
CPU0 to be allocated from a remote numa node:
[ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ]
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec
4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec
3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec
6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec
2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec
1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec
7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec
0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec #<<<
After specifying numa node:
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec
3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec
1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec
6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec
2:inner_lru_hash_map_perf pre-alloc 1614630 events per sec
4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec
7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec
0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<<
This patch adds one field, numa_node, to the bpf_attr. Since numa node 0
is a valid node, a new flag BPF_F_NUMA_NODE is also added. The numa_node
field is honored if and only if the BPF_F_NUMA_NODE flag is set.
Numa node selection is not supported for percpu map.
This patch does not change all the kmalloc. F.e.
'htab = kzalloc()' is not changed since the object
is small enough to stay in the cache.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
to be used in combination with tcp option set support to mimic
iptables TCPMSS --clamp-mss-to-pmtu.
v2: Eric Dumazet points out dst must be initialized.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This allows setting 2 and 4 byte quantities in the tcp option space.
Main purpose is to allow native replacement for xt_TCPMSS to
work around pmtu blackholes.
Writes to kind and len are now allowed at the moment, it does not seem
useful to do this as it causes corruption of the tcp option space.
We can always lift this restriction later if a use-case appears.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is useful for directly looking up a task based on class id rather than
having to scan through all open file descriptors.
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs using cpumask built
from all runqueues for which current thread's mm is the same as the
thread calling sys_membarrier. It executes faster than the non-expedited
variant (no blocking). It also works on NOHZ_FULL configurations.
Scheduler-wise, it requires a memory barrier before and after context
switching between processes (which have different mm). The memory
barrier before context switch is already present. For the barrier after
context switch:
* Our TSO archs can do RELEASE without being a full barrier. Look at
x86 spin_unlock() being a regular STORE for example. But for those
archs, all atomics imply smp_mb and all of them have atomic ops in
switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full
barrier.
* From all weakly ordered machines, only ARM64 and PPC can do RELEASE,
the rest does indeed do smp_mb(), so there the spin_unlock() is a full
barrier and we're good.
* ARM64 has a very heavy barrier in switch_to(), which suffices.
* PPC just removed its barrier from switch_to(), but appears to be
talking about adding something to switch_mm(). So add a
smp_mb__after_unlock_lock() for now, until this is settled on the PPC
side.
Changes since v3:
- Properly document the memory barriers provided by each architecture.
Changes since v2:
- Address comments from Peter Zijlstra,
- Add smp_mb__after_unlock_lock() after finish_lock_switch() in
finish_task_switch() to add the memory barrier we need after storing
to rq->curr. This is much simpler than the previous approach relying
on atomic_dec_and_test() in mmdrop(), which actually added a memory
barrier in the common case of switching between userspace processes.
- Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full
kernel, rather than having the whole membarrier system call returning
-ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full.
Adapt the CMD_QUERY mask accordingly.
Changes since v1:
- move membarrier code under kernel/sched/ because it uses the
scheduler runqueue,
- only add the barrier when we switch from a kernel thread. The case
where we switch from a user-space thread is already handled by
the atomic_dec_and_test() in mmdrop().
- add a comment to mmdrop() documenting the requirement on the implicit
memory barrier.
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: gromer@google.com
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Dave Watson <davejwatson@fb.com>
UAPI Changes:
- vc4: Allow userspace to dictate rendering order in submit_cl ioctl (Eric)
Cross-subsystem Changes:
- vboxvideo: One of Cihangir's patches applies to vboxvideo which is maintained
in staging
Core Changes:
- atomic_legacy_backoff is officially killed (Daniel)
- Extract drm_device.h (Daniel)
- Unregister drm device on unplug (Daniel)
- Rename deprecated drm_*_(un)?reference functions to drm_*_{get|put} (Cihangir)
Driver Changes:
- vc4: Error/destroy path cleanups, log level demotion, edid leak (Eric)
- various: Make various drm_*_funcs structs const (Bhumika)
- tinydrm: add support for LEGO MINDSTORMS EV3 LCD (David)
- various: Second half of .dumb_{map_offset|destroy} defaults set (Noralf)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Eric Anholt <eric@anholt.net>
Cc: Bhumika Goyal <bhumirks@gmail.com>
Cc: Cihangir Akturk <cakturk@gmail.com>
Cc: David Lechner <david@lechnology.com>
Cc: Noralf Trønnes <noralf@tronnes.org>
* tag 'drm-misc-next-2017-08-16' of git://anongit.freedesktop.org/git/drm-misc: (50 commits)
drm/gem-cma-helper: Remove drm_gem_cma_dumb_map_offset()
drm/virtio: Use the drm_driver.dumb_destroy default
drm/bochs: Use the drm_driver.dumb_destroy default
drm/mgag200: Use the drm_driver.dumb_destroy default
drm/exynos: Use .dumb_map_offset and .dumb_destroy defaults
drm/msm: Use the drm_driver.dumb_destroy default
drm/ast: Use the drm_driver.dumb_destroy default
drm/qxl: Use the drm_driver.dumb_destroy default
drm/udl: Use the drm_driver.dumb_destroy default
drm/cirrus: Use the drm_driver.dumb_destroy default
drm/tegra: Use .dumb_map_offset and .dumb_destroy defaults
drm/gma500: Use .dumb_map_offset and .dumb_destroy defaults
drm/mxsfb: Use .dumb_map_offset and .dumb_destroy defaults
drm/meson: Use .dumb_map_offset and .dumb_destroy defaults
drm/kirin: Use .dumb_map_offset and .dumb_destroy defaults
drm/vc4: Continue the switch to drm_*_put() helpers
drm/vc4: Fix leak of HDMI EDID
dma-buf: fix reservation_object_wait_timeout_rcu to wait correctly v2
dma-buf: add reservation_object_copy_fences (v2)
drm/tinydrm: add support for LEGO MINDSTORMS EV3 LCD
...
Recently we added a new map type called dev map used to forward XDP
packets between ports (6093ec2dc3). This patches introduces a
similar notion for sockets.
A sockmap allows users to add participating sockets to a map. When
sockets are added to the map enough context is stored with the
map entry to use the entry with a new helper
bpf_sk_redirect_map(map, key, flags)
This helper (analogous to bpf_redirect_map in XDP) is given the map
and an entry in the map. When called from a sockmap program, discussed
below, the skb will be sent on the socket using skb_send_sock().
With the above we need a bpf program to call the helper from that will
then implement the send logic. The initial site implemented in this
series is the recv_sock hook. For this to work we implemented a map
attach command to add attributes to a map. In sockmap we add two
programs a parse program and a verdict program. The parse program
uses strparser to build messages and pass them to the verdict program.
The parse programs use the normal strparser semantics. The verdict
program is of type SK_SKB.
The verdict program returns a verdict SK_DROP, or SK_REDIRECT for
now. Additional actions may be added later. When SK_REDIRECT is
returned, expected when bpf program uses bpf_sk_redirect_map(), the
sockmap logic will consult per cpu variables set by the helper routine
and pull the sock entry out of the sock map. This pattern follows the
existing redirect logic in cls and xdp programs.
This gives the flow,
recv_sock -> str_parser (parse_prog) -> verdict_prog -> skb_send_sock
\
-> kfree_skb
As an example use case a message based load balancer may use specific
logic in the verdict program to select the sock to send on.
Sample programs are provided in future patches that hopefully illustrate
the user interfaces. Also selftests are in follow-on patches.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A class of programs, run from strparser and soon from a new map type
called sock map, are used with skb as the context but on established
sockets. By creating a specific program type for these we can use
bpf helpers that expect full sockets and get the verifier to ensure
these helpers are not used out of context.
The new type is BPF_PROG_TYPE_SK_SKB. This patch introduces the
infrastructure and type.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
v2:
* Renamed ALLOC_MEMORY_OF_SCRATCH to SET_SCRATCH_BACKING_VA
* Removed size parameter from the ioctl, it was unused
* Removed hole in ioctl number space
* No more call to write_config_static_mem
* Return correct error code from ioctl
Signed-off-by: Moses Reuben <moses.reuben@amd.com>
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
IPv6 routes currently lack nexthop flags as in IPv4. This has several
implications.
In the forwarding path, it requires us to check the carrier state of the
nexthop device and potentially ignore a linkdown route, instead of
checking for RTNH_F_LINKDOWN.
It also requires capable drivers to use the user facing IPv6-specific
route flags to provide offload indication, instead of using the nexthop
flags as in IPv4.
Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
provide offload indication instead of the RTF_OFFLOAD flag, which is
removed while it's still not part of any official kernel release.
In the near future we would like to use the field for the
RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
not be ready in time for the current cycle.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add zstd compression and decompression support to BtrFS. zstd at its
fastest level compresses almost as well as zlib, while offering much
faster compression and decompression, approaching lzo speeds.
I benchmarked btrfs with zstd compression against no compression, lzo
compression, and zlib compression. I benchmarked two scenarios. Copying
a set of files to btrfs, and then reading the files. Copying a tarball
to btrfs, extracting it to btrfs, and then reading the extracted files.
After every operation, I call `sync` and include the sync time.
Between every pair of operations I unmount and remount the filesystem
to avoid caching. The benchmark files can be found in the upstream
zstd source repository under
`contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}`
[1] [2].
I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and a SSD.
The first compression benchmark is copying 10 copies of the unzipped
Silesia corpus [3] into a BtrFS filesystem mounted with
`-o compress-force=Method`. The decompression benchmark times how long
it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is
measured by comparing the output of `df` and `du`. See the benchmark file
[1] for details. I benchmarked multiple zstd compression levels, although
the patch uses zstd level 1.
| Method | Ratio | Compression MB/s | Decompression speed |
|---------|-------|------------------|---------------------|
| None | 0.99 | 504 | 686 |
| lzo | 1.66 | 398 | 442 |
| zlib | 2.58 | 65 | 241 |
| zstd 1 | 2.57 | 260 | 383 |
| zstd 3 | 2.71 | 174 | 408 |
| zstd 6 | 2.87 | 70 | 398 |
| zstd 9 | 2.92 | 43 | 406 |
| zstd 12 | 2.93 | 21 | 408 |
| zstd 15 | 3.01 | 11 | 354 |
The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it
measures the compression ratio, extracts the tar, and deletes the tar.
Then it measures the compression ratio again, and `tar`s the extracted
files into `/dev/null`. See the benchmark file [2] for details.
| Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) |
|--------|-----------|---------------|----------|------------|----------|
| None | 0.97 | 0.78 | 0.981 | 5.501 | 8.807 |
| lzo | 2.06 | 1.38 | 1.631 | 8.458 | 8.585 |
| zlib | 3.40 | 1.86 | 7.750 | 21.544 | 11.744 |
| zstd 1 | 3.57 | 1.85 | 2.579 | 11.479 | 9.389 |
[1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh
[3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
[4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz
zstd source repository: https://github.com/facebook/zstd
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
This commit adds support for waiting on or signaling DRM syncobjs as
part of execbuf. It does so by hijacking the currently unused cliprects
pointer to instead point to an array of i915_gem_exec_fence structs
which containe a DRM syncobj and a flags parameter which specifies
whether to wait on it or to signal it. This implementation
theoretically allows for both flags to be set in which case it waits on
the dma_fence that was in the syncobj and then immediately replaces it
with the dma_fence from the current execbuf.
v2:
- Rebase on new syncobj API
v3:
- Pull everything out into helpers
- Do all allocation in gem_execbuffer2
- Pack the flags in the bottom 2 bits of the drm_syncobj*
v4:
- Prevent a potential race on syncobj->fence
Testcase: igt/gem_exec_fence/syncobj*
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Link: https://patchwork.freedesktop.org/patch/msgid/1499289202-25441-1-git-send-email-jason.ekstrand@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170815145733.4562-1-chris@chris-wilson.co.uk
Introducing USB charger type and state definition can help
to support USB charging which will be added in USB phy core.
Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZkNpUAAoJEHm+PkMAQRiGr68H/2nr8kxpoUhZ7eA5C71waCjh
gnJSevkzJAp+fCb0KfQFAp1qvpmLLle4e6tAxYgTQZg4Z3W5cJJNfxu9TzY5sGuL
o9QUr43XzABepW4e4jhRtZv6dj3K6XruNeDQKXDZTDcc/S8zoiS/Pltq7VgPcAuM
kX+3qsNdUyknngD6b0z9NtJkb0mHKY6J8MpraWRO34egDwsaN/tuhRj0DRQpCoyQ
x/k+hMbc9MB9Dn8cfACo6Omb+r5Rfd7dTBUAju/TnIIgs//9voHba307N7XvLJZg
kWc8MqMQQZXfRZHB0atpDMHyZS/XQRlNPXj76j0+Ud/byODKTFkkazmgTpALvj8=
=CxeU
-----END PGP SIGNATURE-----
Backmerge tag 'v4.13-rc5' into drm-next
Linux 4.13-rc5
There's a really nasty nouveau collision, hopefully someone can take a look
once I pushed this out.
Right now, SECCOMP_RET_KILL_THREAD (neé SECCOMP_RET_KILL) kills the
current thread. There have been a few requests for this to kill the entire
process (the thread group). This cannot be just changed (discovered when
adding coredump support since coredumping kills the entire process)
because there are userspace programs depending on the thread-kill
behavior.
Instead, implement SECCOMP_RET_KILL_PROCESS, which is 0x80000000, and can
be processed as "-1" by the kernel, below the existing RET_KILL that is
ABI-set to "0". For userspace, SECCOMP_RET_ACTION_FULL is added to expand
the mask to the signed bit. Old userspace using the SECCOMP_RET_ACTION
mask will see SECCOMP_RET_KILL_PROCESS as 0 still, but this would only
be visible when examining the siginfo in a core dump from a RET_KILL_*,
where it will think it was thread-killed instead of process-killed.
Attempts to introduce this behavior via other ways (filter flags,
seccomp struct flags, masked RET_DATA bits) all come with weird
side-effects and baggage. This change preserves the central behavioral
expectations of the seccomp filter engine without putting too great
a burden on changes needed in userspace to use the new action.
The new action is discoverable by userspace through either the new
actions_avail sysctl or through the SECCOMP_GET_ACTION_AVAIL seccomp
operation. If used without checking for availability, old kernels
will treat RET_KILL_PROCESS as RET_KILL_THREAD (since the old mask
will produce RET_KILL_THREAD).
Cc: Paul Moore <paul@paul-moore.com>
Cc: Fabricio Voznika <fvoznika@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
This introduces the BPF return value for SECCOMP_RET_KILL_PROCESS to kill
an entire process. This cannot yet be reached by seccomp, but it changes
the default-kill behavior (for unknown return values) from kill-thread to
kill-process.
Signed-off-by: Kees Cook <keescook@chromium.org>
In preparation for adding SECCOMP_RET_KILL_PROCESS, rename SECCOMP_RET_KILL
to the more accurate SECCOMP_RET_KILL_THREAD.
The existing selftest values are intentionally left as SECCOMP_RET_KILL
just to be sure we're exercising the alias.
Signed-off-by: Kees Cook <keescook@chromium.org>
Add a new action, SECCOMP_RET_LOG, that logs a syscall before allowing
the syscall. At the implementation level, this action is identical to
the existing SECCOMP_RET_ALLOW action. However, it can be very useful when
initially developing a seccomp filter for an application. The developer
can set the default action to be SECCOMP_RET_LOG, maybe mark any
obviously needed syscalls with SECCOMP_RET_ALLOW, and then put the
application through its paces. A list of syscalls that triggered the
default action (SECCOMP_RET_LOG) can be easily gleaned from the logs and
that list can be used to build the syscall whitelist. Finally, the
developer can change the default action to the desired value.
This provides a more friendly experience than seeing the application get
killed, then updating the filter and rebuilding the app, seeing the
application get killed due to a different syscall, then updating the
filter and rebuilding the app, etc.
The functionality is similar to what's supported by the various LSMs.
SELinux has permissive mode, AppArmor has complain mode, SMACK has
bring-up mode, etc.
SECCOMP_RET_LOG is given a lower value than SECCOMP_RET_ALLOW as allow
while logging is slightly more restrictive than quietly allowing.
Unfortunately, the tests added for SECCOMP_RET_LOG are not capable of
inspecting the audit log to verify that the syscall was logged.
With this patch, the logic for deciding if an action will be logged is:
if action == RET_ALLOW:
do not log
else if action == RET_KILL && RET_KILL in actions_logged:
log
else if action == RET_LOG && RET_LOG in actions_logged:
log
else if filter-requests-logging && action in actions_logged:
log
else if audit_enabled && process-is-being-audited:
log
else:
do not log
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Add a new filter flag, SECCOMP_FILTER_FLAG_LOG, that enables logging for
all actions except for SECCOMP_RET_ALLOW for the given filter.
SECCOMP_RET_KILL actions are always logged, when "kill" is in the
actions_logged sysctl, and SECCOMP_RET_ALLOW actions are never logged,
regardless of this flag.
This flag can be used to create noisy filters that result in all
non-allowed actions to be logged. A process may have one noisy filter,
which is loaded with this flag, as well as a quiet filter that's not
loaded with this flag. This allows for the actions in a set of filters
to be selectively conveyed to the admin.
Since a system could have a large number of allocated seccomp_filter
structs, struct packing was taken in consideration. On 64 bit x86, the
new log member takes up one byte of an existing four byte hole in the
struct. On 32 bit x86, the new log member creates a new four byte hole
(unavoidable) and consumes one of those bytes.
Unfortunately, the tests added for SECCOMP_FILTER_FLAG_LOG are not
capable of inspecting the audit log to verify that the actions taken in
the filter were logged.
With this patch, the logic for deciding if an action will be logged is:
if action == RET_ALLOW:
do not log
else if action == RET_KILL && RET_KILL in actions_logged:
log
else if filter-requests-logging && action in actions_logged:
log
else if audit_enabled && process-is-being-audited:
log
else:
do not log
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Userspace code that needs to check if the kernel supports a given action
may not be able to use the /proc/sys/kernel/seccomp/actions_avail
sysctl. The process may be running in a sandbox and, therefore,
sufficient filesystem access may not be available. This patch adds an
operation to the seccomp(2) syscall that allows userspace code to ask
the kernel if a given action is available.
If the action is supported by the kernel, 0 is returned. If the action
is not supported by the kernel, -1 is returned with errno set to
-EOPNOTSUPP. If this check is attempted on a kernel that doesn't support
this new operation, -1 is returned with errno set to -EINVAL meaning
that userspace code will have the ability to differentiate between the
two error cases.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
linux/errno.h is very sensitive to coordination with libc headers.
Nothing in linux/quota.h needs it, so this change allows using
this header in more contexts.
Signed-off-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
On systems that use mark-based routing it may be necessary for
routing lookups to use marks in order for packets to be routed
correctly. An example of such a system is Android, which uses
socket marks to route packets via different networks.
Currently, routing lookups in tunnel mode always use a mark of
zero, making routing incorrect on such systems.
This patch adds a new output_mark element to the xfrm state and
a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
mark differs from the existing xfrm mark in two ways:
1. The xfrm mark is used to match xfrm policies and states, while
the xfrm output mark is used to set the mark (and influence
the routing) of the packets emitted by those states.
2. The existing mark is constrained to be a subset of the bits of
the originating socket or transformed packet, but the output
mark is arbitrary and depends only on the state.
The use of a separate mark provides additional flexibility. For
example:
- A packet subject to two transforms (e.g., transport mode inside
tunnel mode) can have two different output marks applied to it,
one for the transport mode SA and one for the tunnel mode SA.
- On a system where socket marks determine routing, the packets
emitted by an IPsec tunnel can be routed based on a mark that
is determined by the tunnel, not by the marks of the
unencrypted packets.
- Support for setting the output marks can be introduced without
breaking any existing setups that employ both mark-based
routing and xfrm tunnel mode. Simply changing the code to use
the xfrm mark for routing output packets could xfrm mark could
change behaviour in a way that breaks these setups.
If the output mark is unspecified or set to zero, the mark is not
set or changed.
Tested: make allyesconfig; make -j64
Tested: https://android-review.googlesource.com/452776
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
SKL+ display engine can scan out certain kinds of compressed surfaces
produced by the render engine. This involved telling the display engine
the location of the color control surfae (CCS) which describes which
parts of the main surface are compressed and which are not. The location
of CCS is provided by userspace as just another plane with its own offset.
By providing our own format information for the CCS formats, we should
be able to make framebuffer_check() do the right thing for the CCS
surface as well.
Note that we'll return the same format info for both Y and Yf tiled
format as that's what happens with the non-CCS Y vs. Yf as well. If
desired, we could potentially return a unique pointer for each
pixel_format+tiling+ccs combination, in which case we immediately be
able to tell if any of that stuff changed by just comparing the
pointers. But that does sound a bit wasteful space wise.
v2: Drop the 'dev' argument from the hook
v3: Include the description of the CCS surface layout
v4: Pretend CCS tiles are regular 128 byte wide Y tiles (Jason)
v5: Re-drop 'dev', fix commit message, add missing drm_fourcc.h
description of CCS layout. (daniels)
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3)
Reviewed-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Ville Syrjä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Stone <daniels@collabora.com>
Ben Widawsky/Daniel Stone need the extended modifier support from
drm-misc to be able to merge CCS support for i915.ko
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
According to the IB specification, the LID and SM_LID
are 16-bit wide, but to support OmniPath users, export
it as 32-bit value from the beginning.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Add Node GUID and system image GUID to the device properties
exported by RDMA netlink, to be used by RDMAtool.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The port capability mask is exposed to user space via sysfs interface,
while device capabilities are available for verbs only.
This patch provides those capabilities through netlink interface.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Introduce new defines to rdma_netlink.h, so the RDMA configuration tool
will be able to communicate with RDMA subsystem by using the shared defines.
The addition of new client (NLDEV) revealed the fact that we exposed by
mistake the RDMA_NL_I40IW define which is not backed by any RDMA netlink
by now and it won't be exposed in the future too. So this patch reuses
the value and deletes the old defines.
The NLDEV operates with objects. The struct ib_device has two straightforward
objects: device itself and ports of that device.
This brings us to propose the following commands to work on those objects:
* RDMA_NLDEV_CMD_{GET,SET,NEW,DEL} - works on ib_device itself
* RDMA_NLDEV_CMD_PORT_{GET,SET,NEW,DEL} - works on ports of specific ib_device
Those commands receive/return the device index (RDMA_NLDEV_ATTR_DEV_INDEX)
and port index (RDMA_NLDEV_ATTR_PORT_INDEX). For device object accesses,
the RDMA_NLDEV_ATTR_PORT_INDEX will return the maximum number of ports
for specific ib_device and for port access the actual port index.
The port index starts from 1 to follow RDMA/core internal semantics and
the sysfs exposed knobs.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=),
BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that
particularly *JLT/*JLE counterparts involving immediates need
to be rewritten from e.g. X < [IMM] by swapping arguments into
[IMM] > X, meaning the immediate first is required to be loaded
into a register Y := [IMM], such that then we can compare with
Y > X. Note that the destination operand is always required to
be a register.
This has the downside of having unnecessarily increased register
pressure, meaning complex program would need to spill other
registers temporarily to stack in order to obtain an unused
register for the [IMM]. Loading to registers will thus also
affect state pruning since we need to account for that register
use and potentially those registers that had to be spilled/filled
again. As a consequence slightly more stack space might have
been used due to spilling, and BPF programs are a bit longer
due to extra code involving the register load and potentially
required spill/fills.
Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
counterparts to the eBPF instruction set. Modifying LLVM to
remove the NegateCC() workaround in a PoC patch at [1] and
allowing it to also emit the new instructions resulted in
cilium's BPF programs that are injected into the fast-path to
have a reduced program length in the range of 2-3% (e.g.
accumulated main and tail call sections from one of the object
file reduced from 4864 to 4729 insns), reduced complexity in
the range of 10-30% (e.g. accumulated sections reduced in one
of the cases from 116432 to 88428 insns), and reduced stack
usage in the range of 1-5% (e.g. accumulated sections from one
of the object files reduced from 824 to 784b).
The modification for LLVM will be incorporated in a backwards
compatible way. Plan is for LLVM to have i) a target specific
option to offer a possibility to explicitly enable the extension
by the user (as we have with -m target specific extensions today
for various CPU insns), and ii) have the kernel checked for
presence of the extensions and enable them transparently when
the user is selecting more aggressive options such as -march=native
in a bpf target context. (Other frontends generating BPF byte
code, e.g. ply can probe the kernel directly for its code
generation.)
[1] https://github.com/borkmann/llvm/tree/bpf-insns
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc can get confused by this code and it thinks dev_features can be
returned uninitialized. So initialize to NULL at the beginning to shut up
the warning.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
This is useful to allow GL to provide defined results for overlapping
glBlitFramebuffer, which X11 in turn uses to accelerate uncomposited
window movement without first blitting to a temporary. x11perf
-copywinwin100 goes from 1850/sec to 4850/sec.
v2: Default to the same behavior as before when the flags aren't
passed. (suggested by Boris)
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20170725162733.28007-2-eric@anholt.net
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Set media_version to LINUX_VERSION_CODE, just as we did for
driver_version.
Nobody ever rememebers to update the version number, but
LINUX_VERSION_CODE will always be updated.
Move the MEDIA_API_VERSION define to the ifndef __KERNEL__ section of the
media.h header. That way kernelspace can't accidentally start to use
it again.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
This patch implements a new type of lightweight tunnel named seg6local.
A seg6local lwt is defined by a type of action and a set of parameters.
The action represents the operation to perform on the packets matching the
lwt's route, and is not necessarily an encapsulation. The set of parameters
are arguments for the processing function.
Each action is defined in a struct seg6_action_desc within
seg6_action_table[]. This structure contains the action, mandatory
attributes, the processing function, and a static headroom size required by
the action. The mandatory attributes are encoded as a bitmask field. The
static headroom is set to a non-zero value when the processing function
always add a constant number of bytes to the skb (e.g. the header size for
encapsulations).
To facilitate rtnetlink-related operations such as parsing, fill_encap,
and cmp_encap, each type of action parameter is associated to three
function pointers, in seg6_action_params[].
All actions defined in seg6_local.h are detailed in [1].
[1] https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-01
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes userspace compilation error:
error: ‘DLM_RESNAME_MAXLEN’ undeclared here (not in a function)
char resource_name[DLM_RESNAME_MAXLEN];
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: David Teigland <teigland@redhat.com>
These are defined in linux/types.h or drm/drm.h. Fixes
user space compilation errors like:
drm/armada_drm.h:26:2: error: unknown type name ‘uint32_t’
uint32_t handle;
^~~~~~~~
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Gabriel Laskar <gabriel@lse.epita.fr>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170806164428.2273-33-mikko.rapeli@iki.fi
Include <drm/drm.h> instead of <linux/types.h> which on Linux includes
<linux/types.h> and on non-Linux platforms defines __u32 etc types.
Fixes user space compilation errors like:
linux/kfd_ioctl.h:33:2: error: unknown type name ‘uint32_t’
uint32_t major_version; /* from KFD */
^~~~~~~~
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Mirrors the TEE_DESC_PRIVILEGED bit of struct tee_desc:flags into struct
tee_ioctl_version_data:gen_caps as TEE_GEN_CAP_PRIVILEGED in
tee_ioctl_version()
Reviewed-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
The send call ignores unknown flags. Legacy applications may already
unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a
socket opts in to zerocopy.
Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing.
Processes can also query this socket option to detect kernel support
for the feature. Older kernels will return ENOPROTOOPT.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel supports zerocopy sendmsg in virtio and tap. Expand the
infrastructure to support other socket types. Introduce a completion
notification channel over the socket error queue. Notifications are
returned with ee_origin SO_EE_ORIGIN_ZEROCOPY. ee_errno is 0 to avoid
blocking the send/recv path on receiving notifications.
Add reference counting, to support the skb split, merge, resize and
clone operations possible with SOCK_STREAM and other socket types.
The patch does not yet modify any datapaths.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow user space applications to see which routes are offloaded and
which aren't by setting the RTNH_F_OFFLOAD flag when dumping them.
To be consistent with IPv4, offload indication is provided on a
per-nexthop basis.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The motivation behind this new interface is expose at runtime the
creation of new OA configs which can be used as part of the i915 perf
open interface. This will enable the kernel to learn new configs which
may be experimental, or otherwise not part of the core set currently
available through the i915 perf interface.
v2: Drop DRM_ERROR for userspace errors (Matthew)
Add padding to userspace structure (Matthew)
s/guid/uuid/ (Matthew)
v3: Use u32 instead of int to iterate through registers (Matthew)
v4: Lock access to dynamic config list (Lionel)
v5: by Matthew:
Fix uninitialized error values
Fix incorrect unwiding when opening perf stream
Use kmalloc_array() to store register
Use uuid_is_valid() to valid config uuids
Declare ioctls as write only
Check padding members are set to 0
by Lionel:
Return ENOENT rather than EINVAL when trying to remove non
existing config
v6: by Chris:
Use ref counts for OA configs
Store UUID in drm_i915_perf_oa_config rather then using pointer
Shuffle fields of drm_i915_perf_oa_config to avoid padding
v7: by Chris
Rename uapi pointers fields to end with '_ptr'
v8: by Andrzej, Marek, Sebastian
Update register whitelisting
by Lionel
Add more register names for documentation
Allow configuration programming in non-paranoid mode
Add support for value filter for a couple of registers already
programmed in other part of the kernel
v9: Documentation fix (Lionel)
Allow writing WAIT_FOR_RC6_EXIT only on Gen8+ (Andrzej)
v10: Perform read access_ok() on register pointers (Lionel)
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Andrzej Datczuk <andrzej.datczuk@intel.com>
Reviewed-by: Andrzej Datczuk <andrzej.datczuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-2-lionel.g.landwerlin@intel.com
__user should be used to identify user pointers and not __u64
variables containing pointers.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Two minor conflicts in virtio_net driver (bug fix overlapping addition
of a helper) and MAINTAINERS (new driver edit overlapping revamp of
PHY entry).
Signed-off-by: David S. Miller <davem@davemloft.net>
Updated blob layout (Rob, Daniel, Kristian, xerpi)
v2:
* Removed __packed, and alignment (.+)
* Fix indent in drm_format_modifier fields (Liviu)
* Remove duplicated modifier > 64 check (Liviu)
* Change comment about modifier (Liviu)
* Remove arguments to blob creation, use plane instead (Liviu)
* Fix data types (Ben)
* Make the blob part of uapi (Daniel)
v3:
Remove unused ret field.
Change i, and j to unsigned int (Emil)
v4:
Use plane->modifier_count instead of recounting (Daniel)
v5:
Rename modifiers to modifiers_property (Ville)
Use sizeof(__u32) instead to reflect UAPI nature (Ville)
Make BUILD_BUG_ON for blob header size
Cc: Rob Clark <robdclark@gmail.com>
Cc: Kristian H. Kristensen <hoegsberg@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Daniel Stone <daniels@collabora.com> (v2)
Reviewed-by: Liviu Dudau <liviu@dudau.co.uk> (v2)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v3)
Signed-off-by: Daniel Stone <daniels@collabora.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170724034641.13369-2-ben@bwidawsk.net
This is the plumbing for supporting fb modifiers on planes. Modifiers
have already been introduced to some extent, but this series will extend
this to allow querying modifiers per plane. Based on this, the client to
enable optimal modifications for framebuffers.
This patch simply allows the DRM drivers to initialize their list of
supported modifiers upon initializing the plane.
v2: A minor addition from Daniel
v3:
* Updated commit message
* s/INVALID/DRM_FORMAT_MOD_INVALID (Liviu)
* Remove some excess newlines (Liviu)
* Update comment for > 64 modifiers (Liviu)
v4: Minor comment adjustments (Liviu)
v5: Some new platforms added due to rebase
v6: Add some missed plane inits (or maybe they're new - who knows at
this point) (Daniel)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Daniel Stone <daniels@collabora.com> (v2)
Reviewed-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Daniel Stone <daniels@collabora.com>
Add the following stats into SCM_TIMESTAMPING_OPT_STATS control msg:
TCP_NLA_PACING_RATE
TCP_NLA_DELIVERY_RATE
TCP_NLA_SND_CWND
TCP_NLA_REORDERING
TCP_NLA_MIN_RTT
TCP_NLA_RECUR_RETRANS
TCP_NLA_DELIVERY_RATE_APP_LMT
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
was used by tcp prequeue and header prediction.
TCPFORWARDRETRANS use was removed in january.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same conversion as for table names, use NFT_NAME_MAXLEN as upper
boundary as well.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Same conversion as for table names, use NFT_NAME_MAXLEN as upper
boundary as well.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Same conversion as for table names, use NFT_NAME_MAXLEN as upper
boundary as well.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allocate all table names dynamically to allow for arbitrary lengths but
introduce NFT_NAME_MAXLEN as an upper sanity boundary. It's value was
chosen to allow using a domain name as per RFC 1035.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds support for filtering based on time since last used.
When we are dumping a large number of actions it is useful to
have the option of filtering based on when the action was last
used to reduce the amount of data crossing to user space.
With this patch the user space app sets the TCA_ROOT_TIME_DELTA
attribute with the value in milliseconds with "time of interest
since now". The kernel converts this to jiffies and does the
filtering comparison matching entries that have seen activity
since then and returns them to user space.
Old kernels and old tc continue to work in legacy mode since
they dont specify this attribute.
Some example (we have 400 actions bound to 400 filters); at
installation time. Using updated when tc setting the time of
interest to 120 seconds earlier (we see 400 actions):
prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
400
go get some coffee and wait for > 120 seconds and try again:
prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
0
Lets see a filter bound to one of these actions:
....
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 2 success 1)
match 7f000002/ffffffff at 12 (success 1 )
action order 1: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1145 sec used 802 sec
Action statistics:
Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
....
that coffee took long, no? It was good.
Now lets ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1
More details please:
prompt$ hackedtc -s actions ls action gact since 120000
action order 0: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1270 sec used 30 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
And the filter?
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 4 success 2)
match 7f000002/ffffffff at 12 (success 2 )
action order 1: gact action pass
random type none pass val 0
index 23 ref 2 bind 1 installed 1324 sec used 84 sec
Action statistics:
Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When you dump hundreds of thousands of actions, getting only 32 per
dump batch even when the socket buffer and memory allocations allow
is inefficient.
With this change, the user will get as many as possibly fitting
within the given constraints available to the kernel.
The top level action TLV space is extended. An attribute
TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
is set by the user indicating the user is capable of processing
these large dumps. Older user space which doesnt set this flag
doesnt get the large (than 32) batches.
The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
actions are put in a single batch. As such user space app knows how long
to iterate (independent of the type of action being dumped)
instead of hardcoded maximum of 32 thus maintaining backward compat.
Some results dumping 1.5M actions below:
first an unpatched tc which doesnt understand these features...
prompt$ time -p tc actions ls action gact | grep index | wc -l
1500000
real 1388.43
user 2.07
sys 1386.79
Now lets see a patched tc which sets the correct flags when requesting
a dump:
prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
1500000
real 178.13
user 2.02
sys 176.96
That is about 8x performance improvement for tc app which sets its
receive buffer to about 32K.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generic bitflags attribute content sent to the kernel by user.
With this netlink attr type the user can either set or unset a
flag in the kernel.
The value is a bitmap that defines the bit values being set
The selector is a bitmask that defines which value bit is to be
considered.
A check is made to ensure the rules that a kernel subsystem always
conforms to bitflags the kernel already knows about. i.e
if the user tries to set a bit flag that is not understood then
the _it will be rejected_.
In the most basic form, the user specifies the attribute policy as:
[ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
where myvalidflags is the bit mask of the flags the kernel understands.
If the user _does not_ provide myvalidflags then the attribute will
also be rejected.
Examples:
value = 0x0, and selector = 0x1
implies we are selecting bit 1 and we want to set its value to 0.
value = 0x2, and selector = 0x2
implies we are selecting bit 2 and we want to set its value to 1.
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Forward Error Correction (FEC) modes i.e Base-R
and Reed-Solomon modes are introduced in 25G/40G/100G standards
for providing good BER at high speeds. Various networking devices
which support 25G/40G/100G provides ability to manage supported FEC
modes and the lack of FEC encoding control and reporting today is a
source for interoperability issues for many vendors.
FEC capability as well as specific FEC mode i.e. Base-R
or RS modes can be requested or advertised through bits D44:47 of
base link codeword.
This patch set intends to provide option under ethtool to manage
and report FEC encoding settings for networking devices as per
IEEE 802.3 bj, bm and by specs.
set-fec/show-fec option(s) are designed to provide control and
report the FEC encoding on the link.
SET FEC option:
root@tor: ethtool --set-fec swp1 encoding [off | RS | BaseR | auto]
Encoding: Types of encoding
Off : Turning off any encoding
RS : enforcing RS-FEC encoding on supported speeds
BaseR : enforcing Base R encoding on supported speeds
Auto : IEEE defaults for the speed/medium combination
Here are a few examples of what we would expect if encoding=auto:
- if autoneg is on, we are expecting FEC to be negotiated as on or off
as long as protocol supports it
- if the hardware is capable of detecting the FEC encoding on it's
receiver it will reconfigure its encoder to match
- in absence of the above, the configuration would be set to IEEE
defaults.
>From our understanding , this is essentially what most hardware/driver
combinations are doing today in the absence of a way for users to
control the behavior.
SHOW FEC option:
root@tor: ethtool --show-fec swp1
FEC parameters for swp1:
Active FEC encodings: RS
Configured FEC encodings: RS | BaseR
ETHTOOL DEVNAME output modification:
ethtool devname output:
root@tor:~# ethtool swp1
Settings for swp1:
root@hpe-7712-03:~# ethtool swp18
Settings for swp18:
Supported ports: [ FIBRE ]
Supported link modes: 40000baseCR4/Full
40000baseSR4/Full
40000baseLR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: [RS | BaseR | None | Not reported]
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: [RS | BaseR | None | Not reported]
<<<< One or more FEC modes
Speed: 100000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 106
Transceiver: internal
Auto-negotiation: off
Link detected: yes
This patch includes following changes
a) New ETHTOOL_SFECPARAM/SFECPARAM API, handled by
the new get_fecparam/set_fecparam callbacks, provides support
for configuration of forward error correction modes.
b) Link mode bits for FEC modes i.e. None (No FEC mode), RS, BaseR/FC
are defined so that users can configure these fec modes for supported
and advertising fields as part of link autonegotiation.
Signed-off-by: Vidya Sagar Ravipati <vidya.chowdary@gmail.com>
Signed-off-by: Dustin Byford <dustin@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently blktrace isn't cgroup aware. blktrace prints out task name of
current context, but the task of current context isn't always in the
cgroup where the BIO comes from. We can't use task name to find out IO
cgroup. For example, Writeback BIOs always comes from flusher thread but
the BIOs are for different blk cgroups. Request could be requeued and
dispatched from completely different tasks. MD/DM are another examples.
This patch tries to fix the gap. We print out cgroup fhandle info in
blktrace. Userspace can use open_by_handle_at() syscall to find the
cgroup by fhandle. Or userspace can use name_to_handle_at() syscall to
find fhandle for a cgroup and use a BPF program to filter out blktrace
for a specific cgroup.
We add a new 'blk_cgroup' trace option for blk tracer. It's default off.
Application which doesn't know the new option isn't affected. When it's
on, we output fhandle info right after blk_io_trace with an extra bit
set in event action. So from application point of view, blktrace with
the option will output new actions.
I didn't change blk trace event yet, since I'm not sure if changing the
trace event output is an ABI issue. If not, I'll do it later.
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This has proven immensely useful for debugging memory leaks and
overallocation (which is a rather serious concern on the platform,
given that we typically run at about 256MB of CMA out of up to 1GB
total memory, with framebuffers that are about 8MB ecah).
The state of the art without this is to dump debug logs from every GL
application, guess as to kernel allocations based on bo_stats, and try
to merge that all together into a global picture of memory allocation
state. With this, you can add a couple of calls to the debug build of
the 3D driver and get a pretty detailed view of GPU memory usage from
/debug/dri/0/bo_stats (or when we debug print to dmesg on allocation
failure).
The Mesa side currently labels at the gallium resource level (so you
see that a 1920x20 pixmap has been created, presumably for the window
system panel), but we could extend that to be even more useful with
glObjectLabel() names being sent all the way down to the kernel.
(partial) example of sorted debugfs output with Mesa labeling all
resources:
kernel BO cache: 16392kb BOs (3)
tiling shadow 1920x1080: 8160kb BOs (1)
resource 1920x1080@32/0: 8160kb BOs (1)
scanout resource 1920x1080@32/0: 8100kb BOs (1)
kernel: 8100kb BOs (1)
v2: Use strndup_user(), use lockdep assertion instead of just a
comment, fix an array[-1] reference, extend comment about name
freeing.
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20170725182718.31468-2-eric@anholt.net
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The number of supported WIDs, if they are supported at all, can be
limited due to resources. Notifying the user space application the
number of available WIDs allows it to utilize them correctly.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Direct Packet Mode support may be disabled, e.g, due to limited
resources. Notifying the user application prevents wasting cycles
on attempting to send these kind of packets.
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZdS4PAAoJEHm+PkMAQRiGbEYH/2mukTPOUAfNoWaVjO2YHxuL
5yI3n1838tKIJm967IUmGdckN/RYGPjJxvZ+muXN2/rv23+9j3LVq9vQcsYqRQop
vrWP+hvGGJvOGJ2NYBDB+4AUrPPdeX9stolwyAcYvyCZ8AilPIovm4s2poA+fuQX
D78c8JSfpse32oc93dy4bUz3mRFKTeufstrWEuzqXI691mthF2G9EpA0R3hlbqv+
GiUnNcZVOnOuCt/47GnpWVKsyv91l3CkGq3bV1GSUi8a/1PnyFxHQxQI/qgbkLXs
NuswRupSeLDQKRgiDLgWF/BpdHEp4dpFFWXm00KWlgxeGSQnKat9bpW/d5OgnhA=
=mv3H
-----END PGP SIGNATURE-----
Backmerge tag 'v4.13-rc2' into drm-next
Linux 4.13-rc2
This is required for drm-misc fixing.
I need this to be able to apply the deferred fbdev setup patches, I
need the relevant prep work that landed through the drm-intel tree.
Also squash in conflict fixup from Laurent Pinchart.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS
While this looks plausible on the surface, in practice this situation has
not worked well.
- Injected positive signals are not copied to user space properly
unless they have these magic high bits set.
- Injected positive signals are not reported properly by signalfd
unless they have these magic high bits set.
- These kernel internal values leaked to userspace via ptrace_peek_siginfo
- It was possible to inject these kernel internal values and cause the
the kernel to misbehave.
- Kernel developers got confused and expected these kernel internal values
in userspace in kernel self tests.
- Kernel developers got confused and set si_code to __SI_FAULT which
is SI_USER in userspace which causes userspace to think an ordinary user
sent the signal and that it was not kernel generated.
- The values make it impossible to reorganize the code to transform
siginfo_copy_to_user into a plain copy_to_user. As si_code must
be massaged before being passed to userspace.
So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.
To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used. Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of union
members.
A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like. The good news is only problem
architectures pay the cost.
Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values. Escept where not all of the cases are handled remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.
Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first. The high bits are no
longer used to hold a magic union member.
Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.
The fixes to copy_siginfo_from_user32 implementations has the
interesting property that several of them perviously should never have
worked as the __SI_ values they depended up where kernel internal.
With that dependency gone those implementations should work much
better.
The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.
Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
We have a weird and problematic intersection of features that when
they all come together result in ambiguous siginfo values, that
we can not support properly.
- Supporting fcntl(F_SETSIG,...) with arbitrary valid signals.
- Using positive values for POLL_IN, POLL_OUT, POLL_MSG, ..., etc
that imply they are signal specific si_codes and using the
aforementioned arbitrary signal to deliver them.
- Supporting injection of arbitrary siginfo values for debugging and
checkpoint/restore.
The result is that just looking at siginfo si_codes of 1 to 6 are
ambigious. It could either be a signal specific si_code or it could
be a generic si_code.
For most of the kernel this is a non-issue but for sending signals
with siginfo it is impossible to play back the kernel signals and
get the same result.
Strictly speaking when the si_code was changed from SI_SIGIO to
POLL_IN and friends between 2.2 and 2.4 this functionality was not
ambiguous, as only real time signals were supported. Before 2.4 was
released the kernel began supporting siginfo with non realtime signals
so they could give details of why the signal was sent.
The result is that if F_SETSIG is set to one of the signals with signal
specific si_codes then user space can not know why the signal was sent.
I grepped through a bunch of userspace programs using debian code
search to get a feel for how often people choose a signal that results
in an ambiguous si_code. I only found one program doing so and it was
using SIGCHLD to test the F_SETSIG functionality, and did not appear
to be a real world usage.
Therefore the ambiguity does not appears to be a real world problem in
practice. Remove the ambiguity while introducing the smallest chance
of breakage by changing the si_code to SI_SIGIO when signals with
signal specific si_codes are targeted.
Fixes: v2.3.40 -- Added support for queueing non-rt signals
Fixes: v2.3.21 -- Changed the si_code from SI_SIGIO
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add support to work with a RSS QP by using an indirection table object
upon QP creation. Other related QP verbs (e.g. modify/destroy/query) were
updated as well for that QP mode.
Notes:
- The RX hash properties are supplied as driver private data.
- The RSS QP port is used on the associated WQs in its indirection
table. Applying different ports during WQ life time is not allowed.
- The expected RSS QP flow is: create, modify(RST->INIT),
modify(RST->RTR), destroy.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To enable RSS functionality the IB indirection table object (i.e.
ib_rwq_ind_table) should be used.
This patch implements the related verbs as of create and destroy an
indirection table.
In downstream patches the indirection table will be used as part of RSS
QP creation.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Support create/modify/destroy WQ related verbs.
The base IB object to enable RSS functionality is a WQ (i.e. ib_wq).
This patch implements the related WQ verbs as of create, modify and
destroy.
In downstream patches the WQ will be used as part of an indirection
table (i.e. ib_rwq_ind_table) to enable RSS QP creation.
Notes:
ConnectX-3 hardware requires consecutive WQNs list as receive descriptor
queues for the RSS QP. Hence, the driver manages consecutive ranges lists
per context which the user must respect.
Destroying the WQ does not return its WQN back to its range for
reusing. However, destroying all WQs from the same range releases the
range and in turn releases its WQNs for reusing.
Since the WQ object is not a natural object in the hardware, the driver
implements the WQ by the hardware QP.
As such, the WQ inherits its port from its RSS QP parent upon its
RST->INIT transition and by that time its state is applied to the
hardware.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When inline-receive is enabled, the HCA may write received
data into the receive WQE.
Inline-receive is enabled by setting its matching bit in
the QP context and each single-packet message with payload
not exceeding the receive WQE size will be delivered to
the WQE.
The completion report will indicate that the payload was placed to the WQE.
It includes:
1) Return maximum supported size of inline-receive by the hardware
in query_device vendor's data part.
2) Enable the feature when requested by the vendor data input.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enable QP creation with a given source QP number, the created QP will
use the source QPN as its wire QP number.
To create such a QP, root privileges (i.e. CAP_NET_RAW) are required
from the user application.
This comes as a pre-patch for downstream patches in this series to
allow user space applications to accelerate traffic which is typically
handled by IPoIB ULP.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This is helpful for 'nft monitor' to track which process caused a given
change to the ruleset.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Here are some small tty and serial driver fixes for 4.13-rc2. Nothing
huge at all, a revert of a patch that turned out to break things, a fix
up for a new tty ioctl we added in 4.13-rc1 to get the uapi definition
correct, and a few minor serial driver fixes for reported issues.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWXMebA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn9MgCdGsIEQlTvTeG4QikxUPB7dkEJQacAoLWOKJzQ
A65mBAeOfiJztczi8uo4
=KEXp
-----END PGP SIGNATURE-----
Merge tag 'tty-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 4.13-rc2. Nothing
huge at all, a revert of a patch that turned out to break things, a
fix up for a new tty ioctl we added in 4.13-rc1 to get the uapi
definition correct, and a few minor serial driver fixes for reported
issues.
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix TIOCGPTPEER ioctl definition
tty: hide unused pty_get_peer function
tty: serial: lpuart: Fix the logic for detecting the 32-bit type UART
serial: imx: Prevent TX buffer PIO write when a DMA has been started
Revert "serial: imx-serial - move DMA buffer configuration to DT"
serial: sh-sci: Uninitialized variables in sysfs files
serial: st-asc: Potential error pointer dereference
Move the protocol description header file into net/rxrpc/ and rename it to
protocol.h. It's no longer necessary to expose it as packets are no longer
exposed to kernel services (such as AFS) that use the facility.
The abort codes are transferred to the UAPI header instead as we pass these
back to userspace and also to kernel services.
Signed-off-by: David Howells <dhowells@redhat.com>
Move UAPI definitions from the internal header and place them in a UAPI
header file so that userspace can make use of them.
Signed-off-by: David Howells <dhowells@redhat.com>
It is often useful to know the branch types while analyzing branch data.
For example, a call is very different from a conditional branch.
Currently we have to look it up in binary while the binary may later not
be available and even the binary is available but user has to take some
time. It is very useful for user to check it directly in perf report.
Perf already has support for disassembling the branch instruction to get
the x86 branch type.
To keep consistent on kernel and userspace and make the classification
more common, the patch adds the common branch type classification
in perf_event.h.
The patch only defines a minimum but most common set of branch types.
PERF_BR_UNKNOWN : unknown
PERF_BR_COND :conditional
PERF_BR_UNCOND : unconditional
PERF_BR_IND : indirect
PERF_BR_CALL : function call
PERF_BR_IND_CALL : indirect function call
PERF_BR_RET : function return
PERF_BR_SYSCALL : syscall
PERF_BR_SYSRET : syscall return
PERF_BR_COND_CALL : conditional function call
PERF_BR_COND_RET : conditional function return
The patch also adds a new field type (4 bits) in perf_branch_entry
to record the branch type.
Since the disassembling of branch instruction needs some overhead,
a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
needs to disassemble the branch instruction and record the branch
type.
Change log:
v10: Not changed.
v9: Not changed.
v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
No other change.
v7: Just keep the most common branch types.
Others are removed.
v6: Not changed.
v5: Not changed. The v5 patch series just change the userspace.
v4: Comparing to previous version, the major changes are:
1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
computed later in userspace.
2. Remove the "cross" field in perf_branch_entry. The cross page
computing will be done later in userspace.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1500379995-6449-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Event handling was always fairly simplistic since there were only
two events. With the addition of pin events this needed to be redesigned.
The state_change and lost_msgs events are now core events with the
guarantee that the last state is always available. The new pin events
are a queue of events (up to 64 for each event) and the oldest event
will be dropped if the application cannot keep up. Lost events are
marked with a new event flag.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Add support for low-level CEC pin monitoring. This adds a new monitor
mode, a new capability and two new events.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Implement serial console driver to complement earlycon.
Based on LeMaker linux-actions tree.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
USB spec says that multiple byte fields are stored in
little-endian order (see chapter 8.1 of USB2.0 spec and
chapter 7.1 of USB3.0 spec), thus mark such fields as LE
for UAC1 and UAC2 headers
Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
BPF programs can use the devmap with a bpf_redirect_map() helper
routine to forward packets to netdevice in map.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device map (devmap) is a BPF map, primarily useful for networking
applications, that uses a key to lookup a reference to a netdevice.
The map provides a clean way for BPF programs to build virtual port
to physical port maps. Additionally, it provides a scoping function
for the redirect action itself allowing multiple optimizations. Future
patches will leverage the map to provide batching at the XDP layer.
Another optimization/feature, that is not yet implemented, would be
to support multiple netdevices per key to support efficient multicast
and broadcast support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for a bpf_redirect helper function to the XDP
infrastructure. For now this only supports redirecting to the egress
path of a port.
In order to support drivers handling a xdp_buff natively this patches
uses a new ndo operation ndo_xdp_xmit() that takes pushes a xdp_buff
to the specified device.
If the program specifies either (a) an unknown device or (b) a device
that does not support the operation a BPF warning is thrown and the
XDP_ABORTED error code is returned.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag
because it doesn't copy anything from/to userspace to access the
argument.
Fixes: 54ebbfb160 ("tty: add TIOCGPTPEER ioctl")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Aleksa Sarai <asarai@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Common:
- add uevents for VM creation/destruction
- annotate and properly access RCU-protected objects
s390:
- rename IOCTL added in the first v4.13 merge
x86:
- emulate VMLOAD VMSAVE feature in SVM
- support paravirtual asynchronous page fault while nested
- add Hyper-V userspace interfaces for better migration
- improve master clock corner cases
- extend internal error reporting after EPT misconfig
- correct single-stepping of emulated instructions in SVM
- handle MCE during VM entry
- fix nVMX VM entry checks and nVMX VMCS shadowing
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJZaOm6AAoJEED/6hsPKofoqO8H/3breVIyVv9mwg7A5+o+6LTq
GzV/YXHSC8NtfxZn8ViS/TCziYiBSFv7XiPSodkXbOgYSz8Yya5x9D+dbEH+xgG7
l+LsZEqdSFbHCkvKrMiwSsoXtsT5WygA56+KZiBmu8cvlwqSyXWHFn3ZJ1wKzGq/
zivlkfCoh2m6bGdNmrG9pHUSgxvDh94pXesaVBKy4hgeovY1qjzby3Lo+HuIUzai
exuEU1EKRlUIfLK1B2Anp5IIv5Q1lFnMSvD6YSiWYywZb95dN/adsX1bv+MKeOdt
TIAgotsWjaAuT9JolAJjfVPHG0+uMBMsWg4Zh9Ra/gPPaSh3KEC2h1++zEYKjvw=
=1zII
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Radim Krčmář:
"Second batch of KVM updates for v4.13
Common:
- add uevents for VM creation/destruction
- annotate and properly access RCU-protected objects
s390:
- rename IOCTL added in the first v4.13 merge
x86:
- emulate VMLOAD VMSAVE feature in SVM
- support paravirtual asynchronous page fault while nested
- add Hyper-V userspace interfaces for better migration
- improve master clock corner cases
- extend internal error reporting after EPT misconfig
- correct single-stepping of emulated instructions in SVM
- handle MCE during VM entry
- fix nVMX VM entry checks and nVMX VMCS shadowing"
* tag 'kvm-4.13-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
kvm: x86: hyperv: make VP_INDEX managed by userspace
KVM: async_pf: Let guest support delivery of async_pf from guest mode
KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2
KVM: x86: make backwards_tsc_observed a per-VM variable
KVM: trigger uevents when creating or destroying a VM
KVM: SVM: Enable Virtual VMLOAD VMSAVE feature
KVM: SVM: Add Virtual VMLOAD VMSAVE feature definition
KVM: SVM: Rename lbr_ctl field in the vmcb control area
KVM: SVM: Prepare for new bit definition in lbr_ctl
KVM: SVM: handle singlestep exception when skipping emulated instructions
KVM: x86: take slots_lock in kvm_free_pit
KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition
kvm: vmx: Properly handle machine check during VM-entry
KVM: x86: update master clock before computing kvmclock_offset
kvm: nVMX: Shadow "high" parts of shadowed 64-bit VMCS fields
kvm: nVMX: Fix nested_vmx_check_msr_bitmap_controls
kvm: nVMX: Validate the I/O bitmaps on nested VM-entry
...
Hyper-V identifies vCPUs by Virtual Processor Index, which can be
queried via HV_X64_MSR_VP_INDEX msr. It is defined by the spec as a
sequential number which can't exceed the maximum number of vCPUs per VM.
APIC ids can be sparse and thus aren't a valid replacement for VP
indices.
Current KVM uses its internal vcpu index as VP_INDEX. However, to make
it predictable and persistent across VM migrations, the userspace has to
control the value of VP_INDEX.
This patch achieves that, by storing vp_index explicitly on vcpu, and
allowing HV_X64_MSR_VP_INDEX to be set from the host side. For
compatibility it's initialized to KVM vcpu index. Also a few variables
are renamed to make clear distinction betweed this Hyper-V vp_index and
KVM vcpu_id (== APIC id). Besides, a new capability,
KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
attempting msr writes where unsupported, to avoid spamming error logs.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Pull SCSI target updates from Nicholas Bellinger:
"It's been usually busy for summer, with most of the efforts centered
around TCMU developments and various target-core + fabric driver bug
fixing activities. Not particularly large in terms of LoC, but lots of
smaller patches from many different folks.
The highlights include:
- ibmvscsis logical partition manager support (Michael Cyr + Bryant
Ly)
- Convert target/iblock WRITE_SAME to blkdev_issue_zeroout (hch +
nab)
- Add support for TMR percpu LUN reference counting (nab)
- Fix a potential deadlock between EXTENDED_COPY and iscsi shutdown
(Bart)
- Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce (Jiang Yi)
- Fix TMCU module removal (Xiubo Li)
- Fix iser-target OOPs during login failure (Andrea Righi + Sagi)
- Breakup target-core free_device backend driver callback (mnc)
- Perform TCMU add/delete/reconfig synchronously (mnc)
- Fix TCMU multiple UIO open/close sequences (mnc)
- Fix TCMU CHECK_CONDITION sense handling (mnc)
- Fix target-core SAM_STAT_BUSY + TASK_SET_FULL handling (mnc + nab)
- Introduce TYPE_ZBC support in PSCSI (Damien Le Moal)
- Fix possible TCMU memory leak + OOPs when recalculating cmd base
size (Xiubo Li + Bryant Ly + Damien Le Moal + mnc)
- Add login_keys_workaround attribute for non RFC initiators (Robert
LeBlanc + Arun Easi + nab)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (68 commits)
iscsi-target: Add login_keys_workaround attribute for non RFC initiators
Revert "qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT"
tcmu: clean up the code and with one small fix
tcmu: Fix possbile memory leak / OOPs when recalculating cmd base size
target: export lio pgr/alua support as device attr
target: Fix return sense reason in target_scsi3_emulate_pr_out
target: Fix cmd size for PR-OUT in passthrough_parse_cdb
tcmu: Fix dev_config_store
target: pscsi: Introduce TYPE_ZBC support
target: Use macro for WRITE_VERIFY_32 operation codes
target: fix SAM_STAT_BUSY/TASK_SET_FULL handling
target: remove transport_complete
pscsi: finish cmd processing from pscsi_req_done
tcmu: fix sense handling during completion
target: add helper to copy sense to se_cmd buffer
target: do not require a transport_complete for SCF_TRANSPORT_TASK_SENSE
target: make device_mutex and device_list static
tcmu: Fix flushing cmd entry dcache page
tcmu: fix multiple uio open/close sequences
tcmu: drop configured check in destroy
...
There is a flaw in the Hyper-V SynIC implementation in KVM: when message
page or event flags page is enabled by setting the corresponding msr,
KVM zeroes it out. This is problematic because on migration the
corresponding MSRs are loaded on the destination, so the content of
those pages is lost.
This went unnoticed so far because the only user of those pages was
in-KVM hyperv synic timers, which could continue working despite that
zeroing.
Newer QEMU uses those pages for Hyper-V VMBus implementation, and
zeroing them breaks the migration.
Besides, in newer QEMU the content of those pages is fully managed by
QEMU, so zeroing them is undesirable even when writing the MSRs from the
guest side.
To support this new scheme, introduce a new capability,
KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
pages aren't zeroed out in KVM.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
sem_ctime is initialized to the semget() time and then updated at every
semctl() that changes the array.
Thus it does not represent the time of the last change.
Especially, semop() calls are only stored in sem_otime, not in
sem_ctime.
This is already described in ipc/sem.c, I just overlooked that there is
a comment in include/linux/sem.h and man semctl(2) as well.
So: Correct wrong comments.
Link: http://lkml.kernel.org/r/20170515171912.6298-4-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: <1vier1@web.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With current epoll architecture target files are addressed with
file_struct and file descriptor number, where the last is not unique.
Moreover files can be transferred from another process via unix socket,
added into queue and closed then so we won't find this descriptor in the
task fdinfo list.
Thus to checkpoint and restore such processes CRIU needs to find out
where exactly the target file is present to add it into epoll queue.
For this sake one can use kcmp call where some particular target file
from the queue is compared with arbitrary file passed as an argument.
Because epoll target files can have same file descriptor number but
different file_struct a caller should explicitly specify the offset
within.
To test if some particular file is matching entry inside epoll one have
to
- fill kcmp_epoll_slot structure with epoll file descriptor,
target file number and target file offset (in case if only
one target is present then it should be 0)
- call kcmp as kcmp(pid1, pid2, KCMP_EPOLL_TFD, fd, &kcmp_epoll_slot)
- the kernel fetch file pointer matching file descriptor @fd of pid1
- lookups for file struct in epoll queue of pid2 and returns traditional
0,1,2 result for sorting purpose
Link: http://lkml.kernel.org/r/20170424154423.511592110@gmail.com
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrey Vagin <avagin@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case of KVM_S390_GET_CMMA_BITS, the kernel does not only read struct
kvm_s390_cmma_log passed from userspace (which constitutes _IOC_WRITE),
it also writes back a return value (which constitutes _IOC_READ) making
this an _IOWR ioctl instead of _IOW.
Fixes: 4036e387 ("KVM: s390: ioctls to get and set guest storage attributes")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Commit ac6424b981 ("sched/wait: Rename wait_queue_t =>
wait_queue_entry_t") had scripted the renaming incorrectly, and didn't
actually check that the 'wait_queue_t' was a full token.
As a result, it also triggered on 'wait_queue_token', and renamed that
to 'wait_queue_entry_token' entry in the autofs4 packet structure
definition too. That was entirely incorrect, and not intended.
The end result built fine when building just the kernel - because
everything had been renamed consistently there - but caused problems in
user space because the "struct autofs_packet_missing" type is exported
as part of the uapi.
This scripts it all back again:
git grep -lw wait_queue_entry_token |
xargs sed -i 's/wait_queue_entry_token/wait_queue_token/g'
and checks the end result.
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Fixes: ac6424b981 ("sched/wait: Rename wait_queue_t => wait_queue_entry_t")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZYseIAAoJEAx081l5xIa+85kP/0zKzKKVzZXSXG2TAGb5jNfk
Ex+TELG8tWk9KBxA7lEE5c0WEsnP79cNoXZLQu8wlUzO8+kwQK5Bz0zgNUkpSuo1
RthwdsxBQX1++UxB+HoSG+dOa7hkKVqlgQR3z9qyhsBXzetkJV0DoYcpMV0A1EWd
6Jzt+AvCShVkcW+21LqHPlc5EIVewrDMoA3oU6aYCLhyAOUTVvvQB2ML8YApH7TM
JrSrzCFHTrQEBbGUrZQhzR0sZzZzk9byntb/I/mdVbHeCyIHiL8sC4PfWSOyyazm
GkPnA8G3aFAY9haBRz9jG/VBr1yVb0mCBjkWQ1lGfIAOCDDSc+d7PDXdG+i4AewK
jZheXlrDIdGgmJLy4W3rdEqJvdf7UQHZOs8594OL19l4+FxCTrol1JSHSMeavCvr
8bUNil9Jb/ONU/wmp+q55U0k4TCTyerUA7gKnuaJAwBvd4n78/PKmQnbrWinDyJc
GQXp6zESk9bKt5DXSnVZuVf4POTzpuAsQkkfX1V2y145EHTQYfS3jLENWqEjyZUy
QtKCHZvRkJfGaFU4Pr+vBo9Iu1GlA5OiOv08QadldTT4OxUI0T6yaLDobHCQfKPE
sc3wCuCM+/dAnqoKDcGC4hAmF8zDdO0kw65P2m7uC6T9Jm1G35CioKbzo+fzUhuL
fg5TBpbp2Wwe2oPA5iBm
=2S5N
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main pull request for the drm, I think I've got one later
driver pull for mediatek SoC driver, I'm undecided on if it needs to
go to you yet.
Otherwise summary below:
Core drm:
- Atomic add driver private objects
- Deprecate preclose hook in modern drivers
- MST bandwidth tracking
- Use kvmalloc in more places
- Add mode_valid hook for crtc/encoder/bridge
- Reduce sync_file construction time
- Documentation updates
- New DRM synchronisation object support
New drivers:
- pl111 - pl111 CLCD display controller
Panel:
- Innolux P079ZCA panel driver
- Add NL12880B20-05, NL192108AC18-02D, P320HVN03 panels
- panel-samsung-s6e3ha2: Add s6e3hf2 panel support
i915:
- SKL+ watermark fixes
- G4x/G33 reset improvements
- DP AUX backlight improvements
- Buffer based GuC/host communication
- New getparam for (sub)slice infomation
- Cannonlake and Coffeelake initial patches
- Execbuf optimisations
radeon/amdgpu:
- Lots of Vega10 bug fixes
- Preliminary raven support
- KIQ support for compute rings
- MEC queue management rework
- DCE6 Audio support
- SR-IOV improvements
- Better radeon/amdgpu selection support
nouveau:
- HDMI stereoscopic support
- Display code rework for >= GM20x GPUs
msm:
- GEM rework for fine-grained locking
- Per-process pagetable work
- HDMI fixes for Snapdragon 820.
vc4:
- Remove 256MB CMA limit from vc4
- Add out-fence support
- Add support for cygnus
- Get/set tiling ioctls support
- Add T-format tiling support for scanout
zte:
- add VGA support.
etnaviv:
- Thermal throttle support for newer GPUs
- Restore userspace buffer cache performance
- dma-buf sync fix
stm:
- add stm32f429 display support
exynos:
- Rework vblank handling
- Fixup sw-trigger code
sun4i:
- V3s display engine support
- HDMI support for older SoCs
- Preliminary work on dual-pipeline SoCs.
rcar-du:
- VSP work
imx-drm:
- Remove counter load enable from PRE
- Double read/write reduction flag support
tegra:
- Documentation for the host1x and drm driver.
- Lots of staging ioctl fixes due to grate project work.
omapdrm:
- dma-buf fence support
- TILER rotation fixes"
* tag 'drm-for-v4.13' of git://people.freedesktop.org/~airlied/linux: (1270 commits)
drm: Remove unused drm_file parameter to drm_syncobj_replace_fence()
drm/amd/powerplay: fix bug fail to remove sysfs when rmmod amdgpu.
amdgpu: Set cik/si_support to 1 by default if radeon isn't built
drm/amdgpu/gfx9: fix driver reload with KIQ
drm/amdgpu/gfx8: fix driver reload with KIQ
drm/amdgpu: Don't call amd_powerplay_destroy() if we don't have powerplay
drm/ttm: Fix use-after-free in ttm_bo_clean_mm
drm/amd/amdgpu: move get memory type function from early init to sw init
drm/amdgpu/cgs: always set reference clock in mode_info
drm/amdgpu: fix vblank_time when displays are off
drm/amd/powerplay: power value format change for Vega10
drm/amdgpu/gfx9: support the amdgpu.disable_cu option
drm/amd/powerplay: change PPSMC_MSG_GetCurrPkgPwr for Vega10
drm/amdgpu: Make amdgpu_cs_parser_init static (v2)
drm/amdgpu/cs: fix a typo in a comment
drm/amdgpu: Fix the exported always on CU bitmap
drm/amdgpu/gfx9: gfx_v9_0_enable_gfx_static_mg_power_gating() can be static
drm/amdgpu/psp: upper_32_bits/lower_32_bits for address setup
drm/amd/powerplay/cz: print message if smc message fails
drm/amdgpu: fix typo in amdgpu_debugfs_test_ib_init
...
Pull scheduler fixes from Thomas Gleixner:
"This scheduler update provides:
- The (hopefully) final fix for the vtime accounting issues which
were around for quite some time
- Use types known to user space in UAPI headers to unbreak user space
builds
- Make load balancing respect the current scheduling domain again
instead of evaluating unrelated CPUs"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/headers/uapi: Fix linux/sched/types.h userspace compilation errors
sched/fair: Fix load_balance() affinity redo path
sched/cputime: Accumulate vtime on top of nsec clocksource
sched/cputime: Move the vtime task fields to their own struct
sched/cputime: Rename vtime fields
sched/cputime: Always set tsk->vtime_snap_whence after accounting vtime
vtime, sched/cputime: Remove vtime_account_user()
Revert "sched/cputime: Refactor the cputime_adjust() code"
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAllhktgACgkQ8vlZVpUN
gaOQIQf+KM2s46sxxEl0/hjdBXR4OxTmSS2/0900NPyg7JHKlL8PdYslOyvMiKjo
wEi+YPwwQgbHtxhI1VINfV/q12MZHwvmFOfD9NzjrISwfmfsKj0dBgZDAfBH82sK
12wKgUxA8xJ4P+Xdvnz2PokRcFCsh1YUr5IUQkP3JR2RZOxNFUj42QwPJ2yWzqxO
MsnepMjIHsxvXZi0E7sPjRaoFsh3DDeLmNl8sX6INodC7hxJ1LotYKqJhA4stQpB
ezXY2tabwg3gaOWvWH7THyHhGntbZVDga3iRrKdNLahXN8OBdHktmG75ubiN6tEg
x80pqQLgr41yIQuJVOuyeh5jLYZrww==
=i4r9
-----END PGP SIGNATURE-----
Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
"Add support for 128-bit AES and some cleanups to fscrypt"
* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: make ->dummy_context() return bool
fscrypt: add support for AES-128-CBC
fscrypt: inline fscrypt_free_filename()
Pull input updates from Dmitry Torokhov:
- a new driver for STM FingerTip touchscreen
- a new driver for D-Link DIR-685 touch keys
- updated list of supported devices in xpad driver
- other assorted updates and fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (23 commits)
MAINTAINERS: update input subsystem patterns
Input: introduce KEY_ASSISTANT
Input: xpad - sync supported devices with XBCD
Input: xpad - sync supported devices with 360Controller
Input: xen-kbdfront - use string constants from PV protocol
Input: stmfts - mark all PM functions as __maybe_unused
Input: add support for the STMicroelectronics FingerTip touchscreen
Input: add D-Link DIR-685 touchkeys driver
Input: s3c2410_ts - handle return value of clk_prepare_enable
Input: axp20x-pek - add wakeup support
Input: synaptics-rmi4 - use %phN to form F34 configuration ID
Input: synaptics-rmi4 - change a char type to u8
Input: sparse-keymap - remove sparse_keymap_free()
Input: tsc2007 - move header file out of I2C realm
Input: mms114 - move header file out of I2C realm
Input: mcs - move header file out of I2C realm
Input: lm8323 - move header file out of I2C realm
Input: elantech - force relative mode on a certain module
Input: elan_i2c - add support for fetching chip type on newer hardware
Input: elan_i2c - check if device is there before really probing
...
Consistently use types provided by <linux/types.h> to fix the following
linux/sched/types.h userspace compilation errors:
/usr/include/linux/sched/types.h:57:2: error: unknown type name 'u32'
u32 size;
...
u64 sched_period;
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # v4.12
Fixes: e2d1e2aec5 ("sched/headers: Move various ABI definitions to <uapi/linux/sched/types.h>")
Link: http://lkml.kernel.org/r/20170705162328.GA11026@altlinux.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Introduce the _flushcache() family of memory copy helpers and use them
for persistent memory write operations on x86. The _flushcache()
semantic indicates that the cache is either bypassed for the copy
operation (movnt) or any lines dirtied by the copy operation are
written back (clwb, clflushopt, or clflush).
* Extend dax_operations with ->copy_from_iter() and ->flush()
operations. These operations and other infrastructure updates allow
all persistent memory specific dax functionality to be pushed into
libnvdimm and the pmem driver directly. It also allows dax-specific
sysfs attributes to be linked to a host device, for example:
/sys/block/pmem0/dax/write_cache
* Add support for the new NVDIMM platform/firmware mechanisms introduced
in ACPI 6.2 and UEFI 2.7. This support includes the v1.2 namespace
label format, extensions to the address-range-scrub command set, new
error injection commands, and a new BTT (block-translation-table)
layout. These updates support inter-OS and pre-OS compatibility.
* Fix a longstanding memory corruption bug in nfit_test.
* Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
capable.
* Miscellaneous fixes and small updates across libnvdimm and the nfit
driver.
Acknowledgements that came after the branch was pushed:
commit 6aa734a2f3 "libnvdimm, region, pmem: fix 'badblocks'
sysfs_get_dirent() reference lifetime"
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZXsUtAAoJEB7SkWpmfYgCOXcP/06bncqTEvtgrOF2b7O8w+8e
mTySD51RUn6UpkFd37SMRch+rmbojuqj465TAE7XIXgyLgIOJixKaTlHYUoEnP3X
rC4Q/g5mN0nittMDwL+vQaa1lQWd2kbjOlrqCgnLHVEEJpHmiQussunjvir4G1U7
5ROooP8W+qMK5y5XPLJAg/gyGhYkjpRSlDg3Eo5meZZ0IdURbI7+WCLKrPcQUERT
WmDc9gLhJdSQVxBV/0m2gdAER4ADmFjcrlm8kjXRBhdlUmEFjM0zpvlHJutHTkks
rNZWCmCJs0Sas+DmRKszFmvVFHRHqUVA3dWK4P6PJEX+tl7BwlPcxpbfacHTG2EZ
btArFc584DZ+EIrim1cXXRvLFlxnKOFBtBeteFs7l2kZjEcN6S4I5OZgTyeDpe/i
2WDpHWLQWibkcIzH9y1EuMBkYnQjTJl1pecHzJoTaC+jAQ+opLiY7EecjLmCmQS6
MBYUeQZNufLGfT5b8KXfpKeiXhpFkYrAGp+ErfoH/6RKy2zqTdagN1yVhos2y+a7
JJu/Weetpn8qv+KTGUShO8TGyWv3wU46YkG2rKWl0FL1+C+6LMMw1/L0A97lwVlg
BpypVVyaNu1D22ifZ8O5wbqPIYghoZ5akA0CiduhX19cpl5rTeTd8EvLjvcYhZEZ
pMHuMAqIcIyLhIe2/sRF
=xKQB
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"libnvdimm updates for the latest ACPI and UEFI specifications. This
pull request also includes new 'struct dax_operations' enabling to
undo the abuse of copy_user_nocache() for copy operations to pmem.
The dax work originally missed 4.12 to address concerns raised by Al.
Summary:
- Introduce the _flushcache() family of memory copy helpers and use
them for persistent memory write operations on x86. The
_flushcache() semantic indicates that the cache is either bypassed
for the copy operation (movnt) or any lines dirtied by the copy
operation are written back (clwb, clflushopt, or clflush).
- Extend dax_operations with ->copy_from_iter() and ->flush()
operations. These operations and other infrastructure updates allow
all persistent memory specific dax functionality to be pushed into
libnvdimm and the pmem driver directly. It also allows dax-specific
sysfs attributes to be linked to a host device, for example:
/sys/block/pmem0/dax/write_cache
- Add support for the new NVDIMM platform/firmware mechanisms
introduced in ACPI 6.2 and UEFI 2.7. This support includes the v1.2
namespace label format, extensions to the address-range-scrub
command set, new error injection commands, and a new BTT
(block-translation-table) layout. These updates support inter-OS
and pre-OS compatibility.
- Fix a longstanding memory corruption bug in nfit_test.
- Make the pmem and nvdimm-region 'badblocks' sysfs files poll(2)
capable.
- Miscellaneous fixes and small updates across libnvdimm and the nfit
driver.
Acknowledgements that came after the branch was pushed: commit
6aa734a2f3 ("libnvdimm, region, pmem: fix 'badblocks'
sysfs_get_dirent() reference lifetime") was reviewed by Toshi Kani
<toshi.kani@hpe.com>"
* tag 'libnvdimm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (42 commits)
libnvdimm, namespace: record 'lbasize' for pmem namespaces
acpi/nfit: Issue Start ARS to retrieve existing records
libnvdimm: New ACPI 6.2 DSM functions
acpi, nfit: Show bus_dsm_mask in sysfs
libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru.
acpi, nfit: Enable DSM pass thru for root functions.
libnvdimm: passthru functions clear to send
libnvdimm, btt: convert some info messages to warn/err
libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime
libnvdimm: fix the clear-error check in nsio_rw_bytes
libnvdimm, btt: fix btt_rw_page not returning errors
acpi, nfit: quiet invalid block-aperture-region warnings
libnvdimm, btt: BTT updates for UEFI 2.7 format
acpi, nfit: constify *_attribute_group
libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region
libnvdimm, pmem, dax: export a cache control attribute
dax: convert to bitmask for flags
dax: remove default copy_from_iter fallback
libnvdimm, nfit: enable support for volatile ranges
libnvdimm, pmem: fix persistence warning
...
This makes the device add, del reconfig operations sync. It fixes
the issue where for add and reconfig, we do not know if userspace
successfully completely the operation, so we leave invalid kernel
structs or report incorrect status for the config/reconfig operations.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
1. TCMU_ATTR_TYPE is too generic when it describes only the
reconfiguration type, so rename to TCMU_ATTR_RECONFIG_TYPE.
2. Only return the reconfig type when it is a
TCMU_CMD_RECONFIG_DEVICE command.
3. CONFIG_* type is not needed. We can pass the value along with an
ATTR to userspace, so it does not need to read sysfs/configfs.
4. Fix leak in tcmu_dev_path_store and rename to dev_config to
reflect it is more than just a path that can be changed.
6. Don't update kernel struct value if netlink sending fails.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: "Bryant G. Ly" <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch adds more info about the attribute being changed,
so that usersapce can easily figure out what is happening.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-By: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This gives tcmu the ability to handle events that can cause
reconfiguration, such as resize, path changes, write_cache, etc...
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-By: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Merge misc updates from Andrew Morton:
- a few hotfixes
- various misc updates
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits)
mm, memory_hotplug: move movable_node to the hotplug proper
mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
mm, memory_hotplug: drop artificial restriction on online/offline
mm: memcontrol: account slab stats per lruvec
mm: memcontrol: per-lruvec stats infrastructure
mm: memcontrol: use generic mod_memcg_page_state for kmem pages
mm: memcontrol: use the node-native slab memory counters
mm: vmstat: move slab statistics from zone to node counters
mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
mm/zswap.c: improve a size determination in zswap_frontswap_init()
mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
mm/swapfile.c: sort swap entries before free
mm/oom_kill: count global and memory cgroup oom kills
mm: per-cgroup memory reclaim stats
mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
mm: kmemleak: factor object reference updating out of scan_block()
mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
mm, mempolicy: don't check cpuset seqlock where it doesn't matter
mm, cpuset: always use seqlock when changing task's nodemask
mm, mempolicy: simplify rebinding mempolicies when updating cpusets
...
- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts pending.
ARM:
- VCPU request overhaul
- allow timer and PMU to have their interrupt number selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups
s390:
- initial machine check forwarding
- migration support for the CMMA page hinting information
- cleanups and fixes
x86:
- nested VMX bugfixes and improvements
- more reliable NMI window detection on AMD
- APIC timer optimizations
Generic:
- VCPU request overhaul + documentation of common code patterns
- kvm_stat improvements
There is a small conflict in arch/s390 due to an arch-wide field rename.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJZW4XTAAoJEL/70l94x66DkhMH/izpk54KI17PtyQ9VYI2sYeZ
BWK6Kl886g3ij4pFi3pECqjDJzWaa3ai+vFfzzpJJ8OkCJT5Rv4LxC5ERltVVmR8
A3T1I/MRktSC0VJLv34daPC2z4Lco/6SPipUpPnL4bE2HATKed4vzoOjQ3tOeGTy
dwi7TFjKwoVDiM7kPPDRnTHqCe5G5n13sZ49dBe9WeJ7ttJauWqoxhlYosCGNPEj
g8ZX8+cvcAhVnz5uFL8roqZ8ygNEQq2mgkU18W8ZZKuiuwR0gdsG0gSBFNTdwIMK
NoreRKMrw0+oLXTIB8SZsoieU6Qi7w3xMAMabe8AJsvYtoersugbOmdxGCr1lsA=
=OD7H
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"PPC:
- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts pending.
ARM:
- VCPU request overhaul
- allow timer and PMU to have their interrupt number selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups
s390:
- initial machine check forwarding
- migration support for the CMMA page hinting information
- cleanups and fixes
x86:
- nested VMX bugfixes and improvements
- more reliable NMI window detection on AMD
- APIC timer optimizations
Generic:
- VCPU request overhaul + documentation of common code patterns
- kvm_stat improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
Update my email address
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
kvm: x86: mmu: allow A/D bits to be disabled in an mmu
x86: kvm: mmu: make spte mmio mask more explicit
x86: kvm: mmu: dead code thanks to access tracking
KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
KVM: x86: remove ignored type attribute
KVM: LAPIC: Fix lapic timer injection delay
KVM: lapic: reorganize restart_apic_timer
KVM: lapic: reorganize start_hv_timer
kvm: nVMX: Check memory operand to INVVPID
KVM: s390: Inject machine check into the nested guest
KVM: s390: Inject machine check into the guest
tools/kvm_stat: add new interactive command 'b'
tools/kvm_stat: add new command line switch '-i'
tools/kvm_stat: fix error on interactive command 'g'
KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
...
Commit c0ff7453bb ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") has introduced a two-step protocol when
rebinding task's mempolicy due to cpuset update, in order to avoid a
parallel allocation seeing an empty effective nodemask and failing.
Later, commit cc9a6c8776 ("cpuset: mm: reduce large amounts of memory
barrier related damage v3") introduced a seqlock protection and removed
the synchronization point between the two update steps. At that point
(or perhaps later), the two-step rebinding became unnecessary.
Currently it only makes sure that the update first adds new nodes in
step 1 and then removes nodes in step 2. Without memory barriers the
effects are questionable, and even then this cannot prevent a parallel
zonelist iteration checking the nodemask at each step to observe all
nodes as unusable for allocation. We now fully rely on the seqlock to
prevent premature OOMs and allocation failures.
We can thus remove the two-step update parts and simplify the code.
Link: http://lkml.kernel.org/r/20170517081140.30654-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Filesystems generally use SUPER_MAGIC values from magic.h instead of a
local definition.
Link: http://lkml.kernel.org/r/20170521154217.27917-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>