Note the original code I deleted incorrectly listed this as (X | C1) & C2 --> (X & C2^(C1&C2)) | C1 Which is only valid if C1 is a subset of C2. This relied on SimplifyDemandedBits to remove any extra bits from C1 before we got to that code.
My new implementation avoids relying on that behavior so that it can be naively verified with alive.
Differential Revision: https://reviews.llvm.org/D36384
llvm-svn: 310272
Using task_for_pid to get the "self" task is not necessary, and it can fail (e.g. for sandboxed processes). Let's just use mach_task_self().
Differential Revision: https://reviews.llvm.org/D36284
llvm-svn: 310271
X86SubVBroadcast is for memory subvector broadcasts, but we must test that it handles all cases without the load as well just in case.
This was noticed while I was triaging the test cases from PR34041.
llvm-svn: 310268
Summary:
OpenMP has the ability to offload target regions to devices which may have different architectures.
A new -fopenmp-target-arch flag is introduced to specify the device architecture.
In this patch I use the new flag to specify the compute capability of the underlying NVIDIA architecture for the OpenMP offloading CUDA tool chain.
Only a host-offloading test is provided since full device offloading capability will only be available when [[ https://reviews.llvm.org/D29654 | D29654 ]] lands.
Reviewers: hfinkel, Hahnfeld, carlo.bertolli, caomhin, ABataev
Reviewed By: hfinkel
Subscribers: guansong, cfe-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D34784
llvm-svn: 310263
These lead to tests failing spuriously as the values after being rendered to a
string were incorrect.
Reviewers: clayborg
Differential Revision: https://reviews.llvm.org/D36319
llvm-svn: 310262
Summary:
1. Provide single library for all Intel specific hardware features instead
of individual libraries for each feature
2. Added Intel(R) Processor Trace hardware feature in this single library.
Details about the tool implementing this feature is as follows:
Tool developed on top of LLDB to provide its users the execution
trace of the debugged inferiors. Tool's API are exposed as C++ object
oriented interface in a shared library. API are designed especially to be
easily integrable with IDEs providing LLDB as an application debugger.
Entire API is also available as Python functions through a script bridging
interface allowing development of python modules.
This patch also provides a CLI wrapper to use the Tool through LLDB's command
line. Highlights of the Tool and the wrapper are given below:
******************************
Intel(R) Processor Trace Tool:
******************************
- Provides execution trace of the debugged application
- Uses Intel(R) Processor Trace hardware feature (already implemented inside LLDB)
for this purpose
-- Collects trace packets generated by this feature from LLDB, decodes and
post-processes them
-- Constructs the execution trace of the application
-- Presents execution trace as a list of assembly instructions
- Provides 4 APIs (exposed as C++ object oriented interface)
-- start trace with configuration options for a thread/process,
-- stop trace for a thread/process,
-- get the execution flow (assembly instructions) for a thread,
-- get trace specific information for a thread
- Easily integrable into IDEs providing LLDB as application debugger
- Entire API available as Python functions through script bridging interface
-- Allows developing python apps on top of Tool
- README_TOOL.txt provides more details about the Tool, its dependencies, building
steps and API usage
- Tool ready to use through LLDB's command line
-- CLI wrapper has been developed on top of the Tool for this purpose
*********************************
CLI wrapper: cli-wrapper-pt.cpp
*********************************
- Provides 4 commands (syntax similar to LLDB's CLI commands):
-- processor-trace start
-- processor-trace stop
-- processor-trace show-trace-options
-- processor-trace show-instr-log
- README_CLI.txt provides more details about commands and their options
Signed-off-by: Abhishek Aggarwal <abhishek.a.aggarwal@intel.com>
Reviewers: clayborg, jingham, lldb-commits, labath
Reviewed By: clayborg
Subscribers: ravitheja, emaste, krytarowski, mgorny
Differential Revision: https://reviews.llvm.org/D33035
llvm-svn: 310261
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the
array. This array is processed only after the vectorization of the
first-after-these instructions key node is finished. Vectorization goes
in reverse order to try to vectorize as much code as possible.
Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon
Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D29826
llvm-svn: 310260
Try to avoid mutually exclusive features. Don't use
a real default GPU, and use a fake "generic". The goal
is to make it easier to see which set of features are
incompatible between feature strings.
Most of the test changes are due to random scheduling changes
from not having a default fullspeed model.
llvm-svn: 310258
Relanding after case to insert explicit truncation as necessary.
Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to
EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally
improves vector shuffle computations.
Reviewers: efriedma, RKSimon, spatel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35566
llvm-svn: 310256
Summary:
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible.
Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon
Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D29826
llvm-svn: 310255
Relanding after fixing UB issue with DefaultOffsets.
Consider the following instruction: "inst.eq $dst, $src" where ".eq"
is an optional flag operand. The $src and $dst operands are
registers. If we parse the instruction "inst r0, r1", the flag is not
present and it will be marked in the "OptionalOperandsMask" variable.
After the matching is complete we call the "convertToMCInst" method.
The current implementation works only if the optional operands are at
the end of the array. The "Operands" array looks like [token:"inst",
reg:r0, reg:r1]. The first operand that must be added to the MCInst
is the destination, the r0 register. The "OpIdx" (in the Operands
array) for this register is 2. However, since the flag is not present
in the Operands, the actual index for r0 should be 1. The flag is not
present since we rely on the default value.
This patch removes the "NumDefaults" variable and replaces it with an
array (DefaultsOffset). This array contains an index for each operand
(excluding the mnemonic). At each index, the array contains the
number of optional operands that should be subtracted. For the
previous example, this array looks like this: [0, 1, 1]. When we need
to access the r0 register, we compute its index as 2 -
DefaultsOffset[1] = 1.
Patch by Alexandru Guduleasa!
Reviewers: SamWot, nhaustov, niravd
Reviewed By: niravd
Subscribers: vitalybuka, llvm-commits
Differential Revision: https://reviews.llvm.org/D35998
llvm-svn: 310254
This patch expands the support of lowerInterleavedStore to 16x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future.
The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 16 chars:
c0, c1, , c16
m0, m1, , m16
y0, y1, , y16
k0, k1, ., k16
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Differential Revision: https://reviews.llvm.org/D35829
llvm-svn: 310252
Summary:
NetBSD ships with printf_l(3) like FreeBSD.
NetBSD does not ship with memalign, pvalloc, malloc with "usable size"
and is the same here as Darwin, Android, FreeBSD and Windows.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, vitalybuka, kcc, fjricci, filcab
Reviewed By: vitalybuka
Subscribers: srhines, llvm-commits, emaste, kubamracek, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D36373
llvm-svn: 310248
Summary:
Part of the code inspired by the original work on libsanitizer in GCC 5.4 by Christos Zoulas.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, fjricci, vitalybuka, filcab, kcc
Reviewed By: vitalybuka
Subscribers: llvm-commits, kubamracek, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D36374
llvm-svn: 310247
Summary:
Part of the code inspired by the original work on libsanitizer in GCC 5.4 by Christos Zoulas.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, filcab, kcc, fjricci, vitalybuka
Reviewed By: vitalybuka
Subscribers: kubamracek, llvm-commits, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D36375
llvm-svn: 310246
Summary:
Verified to work and useful to run check-asan, as this target tests 32-bit and 64-bit execution.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, filcab, dim, vitalybuka
Reviewed By: vitalybuka
Subscribers: #sanitizers, cfe-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D36378
llvm-svn: 310245
This patch addresses two issues with assembly and disassembly for VMRS/VMSR:
1.currently VMRS/VMSR instructions accessing fpsid, mvfr{0-2} and fpexc, are
accepted for non ARMv8-A targets.
2. all VMRS/VMSR instructions accept writing/reading to PC and SP, when only
ARMv7-A and ARMv8-A should be allowed to write/read to SP and none to PC.
This patch addresses those issues and adds tests for these cases.
Differential Revision: https://reviews.llvm.org/D36306
llvm-svn: 310243
The NewNodesMustHaveLegalTypes flag is set to false at the beginning of CodeGenAndEmitDAG, and set to true after legalizing types.
But before calling CodeGenAndEmitDAG we build the DAG for the basic block.
So for the first basic block NewNodesMustHaveLegalTypes would be 'false' during the SDAG building, and for all other basic blocks it would be 'true'.
This patch sets the flag to false before SDAG building each basic block.
Differential Revision:
https://reviews.llvm.org/D33435
llvm-svn: 310239
While here, rename `i` to `Rank` as the latter is more
self-explanatory (and this code also uses `I` two lines below to
identify an Instruction).
llvm-svn: 310238