Delete temporarily constructed node uses for analysis after it's use,
holding onto original input nodes. Ideally this would be rewritten
without making nodes, but this appears relatively complex.
Reviewers: spatel, RKSimon, craig.topper
Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57921
llvm-svn: 356382
Add an experimental buffer fat pointer address space that is currently
unhandled in the backend. This commit reserves address space 7 as a
non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer
descriptor + 32-bit offset) that is heavily used in graphics workloads
using the AMDGPU backend.
Differential Revision: https://reviews.llvm.org/D58957
llvm-svn: 356373
Follow-up to:
rL356338
rL356369
We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.
llvm-svn: 356372
This fixes the https://bugs.llvm.org/show_bug.cgi?id=40980.
Previously if string optimization occurred as a result of
StringTableBuilder's finalize() method, the size wasn't updated.
This hopefully also makes the interaction between sections during finalization
processes a bit more clear.
Differential revision: https://reviews.llvm.org/D59488
llvm-svn: 356371
Follow-up to:
rL356338
Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.
llvm-svn: 356369
Summary:
Look past bitcasts when looking for parameter debug values that are
described by frame-index loads in `EmitFuncArgumentDbgValue()`.
In the attached test case we would be left with an undef `DBG_VALUE`
for the parameter without this patch.
A similar fix was done for parameters passed in registers in D13005.
This fixes PR40777.
Reviewers: aprantl, vsk, jmorse
Reviewed By: aprantl
Subscribers: bjope, javed.absar, jdoerfert, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D58831
llvm-svn: 356363
Fixes https://bugs.llvm.org/show_bug.cgi?id=35094
The Dead register definition pass should leave alone the atomicrmw
instructions on AArch64 (LTE extension). The reason is the following
statement in the Arm ARM:
"The ST<OP> instructions, and LD<OP> instructions where the destination
register is WZR or XZR, are not regarded as doing a read for the purpose
of a DMB LD barrier."
A good example was given in the gcc thread by Will Deacon (linked in the
bugzilla ticket 35094):
P0 (atomic_int* y,atomic_int* x) {
atomic_store_explicit(x,1,memory_order_relaxed);
atomic_thread_fence(memory_order_release);
atomic_store_explicit(y,1,memory_order_relaxed);
}
P1 (atomic_int* y,atomic_int* x) {
atomic_fetch_add_explicit(y,1,memory_order_relaxed); // STADD
atomic_thread_fence(memory_order_acquire);
int r0 = atomic_load_explicit(x,memory_order_relaxed);
}
P2 (atomic_int* y) {
int r1 = atomic_load_explicit(y,memory_order_relaxed);
}
My understanding is that it is forbidden for r0 == 0 and r1 == 2 after
this test has executed. However, if the relaxed add in P1 compiles to
STADD and the subsequent acquire fence is compiled as DMB LD, then we
don't have any ordering guarantees in P1 and the forbidden result could
be observed.
Change-Id: I419f9f9df947716932038e1100c18d10a96408d0
llvm-svn: 356360
AMDGPU would like to use these MVTs.
Differential Revision: https://reviews.llvm.org/D58901
Change-Id: I6125fea810d7cc62a4b4de3d9904255a1233ae4e
llvm-svn: 356351
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.
Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).
Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.
llvm-svn: 356347
Previously we had a regular form of the instruction used when the immediate was 0-7. And _alt form that allowed the full 8 bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used. If the immediate was specified directly the _alt form was used. The disassembler would prefer to use the 0-7 form instruction when the immediate was in range and the _alt form otherwise. This way disassembly would print the most readable form when possible.
The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces. A "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b".
The _alt form on the other hand parsed and printed like any other instruction with no specialness.
With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb" and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax.
I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printer. Despite a lot of common checks between all of the aliases, when compiled with clang at least this commonality was not well optimized. Nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd which have 32 immediate values and 3 encoding flavors, 3 register sizes, etc. This didn't seem to scale well for clang binary size. So custom printing seemed a better trade off.
I also considered just using the InstAlias for the matching and not the printing. But that seemed like it would add a lot of extra rows to the matcher table. Especially given that the 32 immediates for vpcmpps have 46 strings associated with them.
Differential Revision: https://reviews.llvm.org/D59398
llvm-svn: 356343
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:
* Fixed assumptions of power-of-2 vector type in kernel arg handling,
and added v5 kernel arg tests and v3/v5 shader arg tests.
* Added v5 tests for cost analysis.
* Added vec3/vec5 arg test cases.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58928
Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
llvm-svn: 356342
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and improve IR analysis such as CSE.
llvm-svn: 356338
The constant island pass currently only looks at the instruction immediately
before a branch for a CMP to fold into a CBZ/CBNZ. This extends it to search
backwards for the instruction that defines CPSR. We need to ensure that the
register is not overridden between the CMP and the branch.
Differential Revision: https://reviews.llvm.org/D59317
llvm-svn: 356336
Fold (x & ~y) | y and it's four commuted variants to x | y. This pattern
can in particular appear when a vselect c, x, -1 is expanded to
(x & ~c) | (-1 & c) and combined to (x & ~c) | c.
This change has some overlap with D59066, which avoids creating a
vselect of this form in the first place during uaddsat expansion.
Differential Revision: https://reviews.llvm.org/D59174
llvm-svn: 356333
This is a subset of what was proposed in:
D59006
...and may overlap with test changes from:
D59174
...but it seems like a good general optimization to turn selects
into bitwise-logic when possible because we never know exactly
what can happen at this stage of DAG combining depending on how
the target has defined things.
Differential Revision: https://reviews.llvm.org/D59066
llvm-svn: 356332
RISCVAsmParser::ParseRegister is called from AsmParser::parseRegisterOrNumber,
which in turn is called when processing CFI directives. The RISC-V
implementation wasn't setting RegNo, and so was incorrect. This patch address
that and adds cfi directive tests that demonstrate the fix. A follow-up patch
will factor out the register parsing logic shared between ParseRegister and
parseRegister.
llvm-svn: 356329
rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef.
llvm-svn: 356327
Two new kinds, BTF_KIND_VAR and BTF_KIND_DATASEC, are added.
BTF_KIND_VAR has the following specification:
btf_type.name: var name
btf_type.info: type kind
btf_type.type: var type
// btf_type is followed by one u32
u32: varinfo (currently, only 0 - static, 1 - global allocated in elf sections)
Not all globals are supported in this patch. The following globals are supported:
. static variables with or without section attributes
. global variables with section attributes
The inclusion of globals with section attributes
is for future potential extraction of key/value
type id's from map definition.
BTF_KIND_DATASEC has the following specification:
btf_type.name: section name associated with variable or
one of .data/.bss/.readonly
btf_type.info: type kind and vlen for # of variables
btf_type.size: 0
#vlen number of the following:
u32: id of corresponding BTF_KIND_VAR
u32: in-session offset of the var
u32: the size of memory var occupied
At the time of debug info emission, the data section
size is unknown, so the btf_type.size = 0 for
BTF_KIND_DATASEC. The loader can patch it during
loading time.
The in-session offseet of the var is only available
for static variables. For global variables, the
loader neeeds to assign the global variable symbol value in
symbol table to in-section offset.
The size of memory is used to specify the amount of the
memory a variable occupies. Typically, it equals to
the type size, but for certain structures, e.g.,
struct tt {
int a;
int b;
char c[];
};
static volatile struct tt s2 = {3, 4, "abcdefghi"};
The static variable s2 has size of 20.
Note that for BTF_KIND_DATASEC name, the section name
does not contain object name. The compiler does have
input module name. For example, two cases below:
. clang -target bpf -O2 -g -c test.c
The compiler knows the input file (module) is test.c
and can generate sec name like test.data/test.bss etc.
. clang -target bpf -O2 -g -emit-llvm -c test.c -o - |
llc -march=bpf -filetype=obj -o test.o
The llc compiler has the input file as stdin, and
would generate something like stdin.data/stdin.bss etc.
which does not really make sense.
For any user specificed section name, e.g.,
static volatile int a __attribute__((section("id1")));
static volatile const int b __attribute__((section("id2")));
The DataSec with name "id1" and "id2" does not contain
information whether the section is readonly or not.
The loader needs to check the corresponding elf section
flags for such information.
A simple example:
-bash-4.4$ cat t.c
int g1;
int g2 = 3;
const int g3 = 4;
static volatile int s1;
struct tt {
int a;
int b;
char c[];
};
static volatile struct tt s2 = {3, 4, "abcdefghi"};
static volatile const int s3 = 4;
int m __attribute__((section("maps"), used)) = 4;
int test() { return g1 + g2 + g3 + s1 + s2.a + s3 + m; }
-bash-4.4$ clang -target bpf -O2 -g -S t.c
Checking t.s, 4 BTF_KIND_VAR's are generated (s1, s2, s3 and m).
4 BTF_KIND_DATASEC's are generated with names
".data", ".bss", ".rodata" and "maps".
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D59441
llvm-svn: 356326
Summary:
In the new wasm EH proposal, `rethrow` takes an `except_ref` argument.
This change was missing in r352598.
This patch adds `llvm.wasm.rethrow.in.catch` intrinsic. This is an
intrinsic that's gonna eventually be lowered to wasm `rethrow`
instruction, but this intrinsic can appear only within a catchpad or a
cleanuppad scope. Also this intrinsic needs to be invokable - otherwise
EH pad successor for it will not be correctly generated in clang.
This also adds lowering logic for this intrinsic in
`SelectionDAGBuilder::visitInvoke`. This routine is basically a
specialized and simplified version of
`SelectionDAGBuilder::visitTargetIntrinsic`, but we can't use it
because if is only for `CallInst`s.
This deletes the previous `llvm.wasm.rethrow` intrinsic and related
tests, which was meant to be used within a `__cxa_rethrow` library
function. Turned out this needs some more logic, so the intrinsic for
this purpose will be added later.
LateEHPrepare takes a result value of `catch` and inserts it into
matching `rethrow` as an argument.
`RETHROW_IN_CATCH` is a pseudo instruction that serves as a link between
`llvm.wasm.rethrow.in.catch` and the real wasm `rethrow` instruction. To
generate a `rethrow` instruction, we need an `except_ref` argument,
which is generated from `catch` instruction. But `catch` instrutions are
added in LateEHPrepare pass, so we use `RETHROW_IN_CATCH`, which takes
no argument, until we are able to correctly lower it to `rethrow` in
LateEHPrepare.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59352
llvm-svn: 356316
Summary:
Rewrite WebAssemblyFixIrreducibleControlFlow to a simpler and cleaner
design, which directly computes reachability and other properties
itself. This avoids previous complexity and bugs. (The new graph
analyses are very similar to how the Relooper algorithm would find loop
entries and so forth.)
This fixes a few bugs, including where we had a false positive and
thought fannkuch was irreducible when it was not, which made us much
larger and slower there, and a reverse bug where we missed
irreducibility. On fannkuch, we used to be 44% slower than asm2wasm and
are now 4% faster.
Reviewers: aheejin
Subscribers: jdoerfert, mgrang, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D58919
Patch by Alon Zakai (kripken)
llvm-svn: 356313
TimePassesHandler object (implementation of time-passes for new pass manager)
gains ability to report into a stream customizable per-instance (per pipeline).
Intended use is to specify separate time-passes output stream per each compilation,
setting up TimePasses member of StandardInstrumentation during PassBuilder setup.
That allows to get independent non-overlapping pass-times reports for parallel
independent compilations (in JIT-like setups).
By default it still puts timing reports into the info-output-file stream
(created by CreateInfoOutputFile every time report is requested).
Unit-test added for non-default case, and it also allowed to discover that print() does not work
as declared - it did not reset the timers, leading to yet another report being printed into the default stream.
Fixed print() to actually reset timers according to what was declared in print's comments before.
Reviewed By: philip.pfaffe
Differential Revision: https://reviews.llvm.org/D59366
llvm-svn: 356305
tMOVr and tPUSH/tPOP/tPOP_RET have register constraints which can't be
expressed in TableGen, so check them explicitly. I've unfortunately run
into issues with both of these recently; hopefully this saves some time
for someone else in the future.
Differential Revision: https://reviews.llvm.org/D59383
llvm-svn: 356303
Summary:
As noted by @andreadb in https://reviews.llvm.org/D59035#inline-525780
If we have `sext (trunc (cmov C0, C1) to i8)`,
we can instead do `cmov (sext (trunc C0 to i8)), (sext (trunc C1 to i8))`
Reviewers: craig.topper, andreadb, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits, andreadb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59412
llvm-svn: 356301
Switch BIC immediate creation for vector ANDs from custom lowering
to a DAG combine, which gives generic DAG combines a change to
apply first. In particular this avoids (and x, -1) being turned into
a (bic x, 0) instead of being eliminated entirely.
Differential Revision: https://reviews.llvm.org/D59187
llvm-svn: 356299
Summary:
At the exit of the loop, the compiler uses a register to remember and accumulate
the number of threads that have already exited. When all active threads exit the
loop, this register is used to restore the exec mask, and the execution continues
for the post loop code.
When there is a "continue" in the loop, the compiler made a mistake to reset the
register to 0 when the "continue" backedge is taken. This will result in some
threads not executing the post loop code as they are supposed to.
This patch fixed the issue.
Reviewers:
nhaehnle, arsenm
Differential Revision:
https://reviews.llvm.org/D59312
llvm-svn: 356298
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.
The result is that we replace a vector gep with at least one undef in each lane with a undef. We can also do the same for vector intrinsics. Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.
Differential Revision: https://reviews.llvm.org/D57468
llvm-svn: 356293
Reduce the size of an any-extended i64 scalar_to_vector source to i32 - the any_extend nodes are often introduced by SimplifyDemandedBits.
llvm-svn: 356292
Since we can't insert s16 gprs as we don't have 16 bit GPR registers, we need to
teach RBS to assign them to the FPR bank so our selector works.
llvm-svn: 356282
The existing lowering code is accidentally correct for unordered atomics as far as I can tell. An unordered atomic has no memory ordering, and simply requires the actual load or store to be done as a single well aligned instruction. As such, relax the restriction while adding tests to ensure the lowering remains correct in the future.
Differential Revision: https://reviews.llvm.org/D57803
llvm-svn: 356280
Previous commit 6bc58e6d3dbd ("[BPF] do not generate unused local/global types")
tried to exclude global variable from type generation. The condition is:
if (Global.hasExternalLinkage())
continue;
This is not right. It also excluded initialized globals.
The correct condition (from AssemblyWriter::printGlobal()) is:
if (!GV->hasInitializer() && GV->hasExternalLinkage())
Out << "external ";
Let us do the same in BTF type generation. Also added a test for it.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 356279
Summary:
The AliasSummary previously contained the AliaseeGUID, which was only
populated when reading the summary from bitcode. This patch changes it
to instead hold the ValueInfo of the aliasee, and always populates it.
This enables more efficient access to the ValueInfo (specifically in the
recent patch r352438 which needed to perform an index hash table lookup
using the aliasee GUID).
As noted in the comments in AliasSummary, we no longer technically need
to keep a pointer to the corresponding aliasee summary, since it could
be obtained by walking the list of summaries on the ValueInfo looking
for the summary in the same module. However, I am concerned that this
would be inefficient when walking through the index during the thin
link for various analyses. That can be reevaluated in the future.
By always populating this new field, we can remove the guard and special
handling for a 0 aliasee GUID when dumping the dot graph of the summary.
An additional improvement in this patch is when reading the summaries
from LLVM assembly we now set the AliaseeSummary field to the aliasee
summary in that same module, which makes it consistent with the behavior
when reading the summary from bitcode.
Reviewers: pcc, mehdi_amini
Subscribers: inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D57470
llvm-svn: 356268
Summary:
This is similar to how addr2line handles consecutive entries with the
same address - pick the last one.
Reviewers: dblaikie, friss, JDevlieghere
Reviewed By: dblaikie
Subscribers: eugenis, vitalybuka, echristo, JDevlieghere, probinson, aprantl, hiraditya, rupprecht, jdoerfert, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D58952
llvm-svn: 356265
Summary:
This is a fix to bug 41052:
https://bugs.llvm.org/show_bug.cgi?id=41052
While trying to optimize a memory instruction in a dead basic block, we end up registering the same phi for replacement twice. This patch avoids registering more than the first replacement candidate for a phi.
Patch by: JesperAntonsson
Reviewers: skatkov, aprantl
Reviewed By: aprantl
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59358
llvm-svn: 356260
Summary:
- During the fixing of SGPR copying from VGPR, ensure users of SCC is
properly propagated, i.e.
* only propagate through live def of SCC,
* skip the SCC-def inst itself, and
* stop the propagation on the other SCC-def inst after checking its
SCC-use first.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59362
llvm-svn: 356258
We are adding a sign extended IR value to an int64_t, which can cause
signed overflows, as in the attached test case, where we have a formula
with BaseOffset = -1 and a constant with numeric_limits<int64_t>::min().
If the addition would overflow, skip the simplification for this
formula. Note that the target triple is required to trigger the failure.
Reviewers: qcolombet, gilr, kparzysz, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59211
llvm-svn: 356256
yaml2obj currently derives the p_filesz, p_memsz, and p_offset values of
program headers from their sections. This makes writing tests for
certain formats more complex, and sometimes impossible. This patch
allows setting these fields explicitly, overriding the default value,
when relevant.
Reviewed by: jakehehrlich, Higuoxing
Differential Revision: https://reviews.llvm.org/D59372
llvm-svn: 356247
Bail early when we don't have a preheader and also if the target is
big endian because it's written with only little endian in mind!
Differential Revision: https://reviews.llvm.org/D59368
llvm-svn: 356243
Certain 32 bit constants can be generated with a single instruction
instead of two. Implement materialize32BitImm function for MIPS32.
Differential Revision: https://reviews.llvm.org/D59369
llvm-svn: 356238
The kernel currently has a limit for # of types to be 64KB and
the size of string subsection to be 64KB. A simple bcc tool
runqlat.py generates:
. the size of ~33KB type section, roughly ~10K types
. the size of ~17KB string section
The majority type is from the types referenced by local
variables in the bpf program. For example, the kernel "task_struct"
itself recursively brings in ~900 other types.
This patch did the following optimization to avoid generating
unused types:
. do not generate types for local variables unless they are
function arguments.
. do not generate types for external globals.
If an external global is not used in the program, llvm
already removes it from IR, so global variable saving is
typical small. For runqlat.py, only one variable "llvm.used"
is the external global.
The types for locals and external globals can be added back
once there is a usage for them.
After the above optimization, the runqlat.py generates:
. the size of ~1.5KB type section, roughtly 500 types
. the size of ~0.7KB string section
UPDATE:
resubmitted the patch after previous revert with
the following fix:
use Global.hasExternalLinkage() to test "external"
linkage instead of using Global.getInitializer(),
which will assert on external variables.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 356234
The kernel currently has a limit for # of types to be 64KB and
the size of string subsection to be 64KB. A simple bcc tool
runqlat.py generates:
. the size of ~33KB type section, roughly ~10K types
. the size of ~17KB string section
The majority type is from the types referenced by local
variables in the bpf program. For example, the kernel "task_struct"
itself recursively brings in ~900 other types.
This patch did the following optimization to avoid generating
unused types:
. do not generate types for local variables unless they are
function arguments.
. do not generate types for external globals.
If an external global is not used in the program, llvm
already removes it from IR, so global variable saving is
typical small. For runqlat.py, only one variable "llvm.used"
is the external global.
The types for locals and external globals can be added back
once there is a usage for them.
After the above optimization, the runqlat.py generates:
. the size of ~1.5KB type section, roughtly 500 types
. the size of ~0.7KB string section
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 356232
Before r355981, this was under LLVM_DEBUG. I don't think the assert is
quite right, but this really should be a verifier check. Instcombine
should not be asserting on this sort of thing.
llvm-svn: 356219
This is almost the same as:
rL355345
...and should prevent any potential crashing from examples like:
https://bugs.llvm.org/show_bug.cgi?id=41064
...although the bug was masked by:
rL355823
...and I'm not sure how to repro the problem after that change.
llvm-svn: 356218
These now verify that a given instruction has a specific source
location, rather than any old location. We want to make sure we
propagate the correct locations from one instruction to another.
llvm-svn: 356217
This isn't necessary according to the DWARF standard, but it matches the
.eh_frame sections emitted by other tools in practice, and the Android
libunwindstack rejects .eh_frame sections where an FDE refers to a CIE
other than the closest previous CIE. So match the other tools and also
sort accordingly.
I consider this a bug in libunwindstack, but it's easy enough to emit
a compatible .eh_frame section for compatibility with installed
operating systems.
Differential Revision: https://reviews.llvm.org/D58266
llvm-svn: 356216
This has been a very painful missing feature that has made producing
reduced testcases difficult. In particular the various registers
determined for stack access during function lowering were necessary to
avoid undefined register errors in a large percentage of
cases. Implement a subset of the important fields that need to be
preserved for AMDGPU.
Most of the changes are to support targets parsing register fields and
properly reporting errors. The biggest sort-of bug remaining is for
fields that can be initialized from the IR section will be overwritten
by a default initialized machineFunctionInfo section. Another
remaining bug is the machineFunctionInfo section is still printed even
if empty.
llvm-svn: 356215
This adds instruction selection support for G_UADDO on s32s and s64s.
Also
- Add an instruction selection test
- Update the arm64-xaluo.ll test to show that we generate the correct assembly
Differential Revision: https://reviews.llvm.org/D58734
llvm-svn: 356214
This re-uses the previous support for extract vector elt to extract the
subvectors.
Differential Revision: https://reviews.llvm.org/D59390
llvm-svn: 356213
For ELF, we accept but ignore --only-keep-debug. Do the same for llvm-strip.
COFF does implement this, so update the test that it is supported.
llvm-svn: 356207
On ARC ISA, general format of load instruction is this:
LD<zz><.x><.aa><.di> a, [b,c]
And general format of store is this:
ST<zz><.aa><.di> c, [b,s9]
Where:
<zz> is data size field and can be one of
<empty> (bits 00) - Word (32-bit), default behavior
B (bits 01) - Byte
H (bits 10) - Half-word (16-bit)
<.x> is data extend mode:
<empty> (bit 0) - If size is not Word(32-bit), then data is zero extended
X (bit 1) - If size is not Word(32-bit), then data is sign extended
<.aa> is address write-back mode:
<empty> (bits 00) - no write-back
.AW (bits 01) - Preincrement, base register updated pre memory transaction
.AB (bits 10) - Postincrement, base register updated post memory transaction
<.di> is cache bypass mode:
<empty> (bit 0) - Cached memory access, default mode
.DI (bit 1) - Non-cached data memory access
This patch adds these load/store instruction variants to the ARC backend.
Patch By Denis Antrushin! <denis@synopsys.com>
Differential Revision: https://reviews.llvm.org/D58980
llvm-svn: 356200
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.
We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.
Differential Revision: https://reviews.llvm.org/D59374
llvm-svn: 356192
This adds support for inserting elements into packed vectors. It also adds
two tests: one for selection, and one for regbank select.
Unpacked vectors will come in a follow-up.
Differential Revision: https://reviews.llvm.org/D59325
llvm-svn: 356182
Summary:
CoverageExporterJson::renderFiles accounts for most of the execution time given a large profdata file with multiple binaries.
Proposed solution is to generate JSON for each file in parallel and sort at the end to preserve deterministic output. Also added flags to skip generating parts of the output to trim the output size.
Patch by Sajjad Mirza (@sajjadm).
Reviewers: Dor1s, vsk
Reviewed By: Dor1s, vsk
Subscribers: liaoyuke, mgrang, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59277
llvm-svn: 356178
Building on the work done in D57601, now that we can distinguish between atomic and volatile memory accesses, go ahead and allow code motion of unordered atomics. As seen in the diffs, this allows much better folding of memory operations into using instructions. (Mostly done by the PeepholeOpt pass.)
Note: I have not reviewed all callers of hasOrderedMemoryRef since one of them - isSafeToMove - is very widely used. I'm relying on the documented semantics of each method to judge correctness.
Differential Revision: https://reviews.llvm.org/D59345
llvm-svn: 356170
These instructions used to use rotl with a bitwidth-1 immediate. I changed the immediate to 1,
but failed to change the opcode.
Thankfully this seems to have not caused a functional issue because we now had two rotl by 1 patterns,
but the correct ones were earlier and took priority. So we just missed some optimization.
llvm-svn: 356164
This is an immediate fix for:
https://bugs.llvm.org/show_bug.cgi?id=41066
...but as noted there and the code comments, we should do better
by stubbing this out sooner.
llvm-svn: 356158
This is consistent with what SelectionDAG does and is much easier to
work with than the extract sequence with an artificial wide register.
For the AMDGPU control flow intrinsics, this was producing an s128 for
the i64, i1 tuple return. Any legalization that should apply to a real
s128 value would badly obscure the direct values that need to be seen.
llvm-svn: 356147
I found these by asserting in clang for any GCCBuiltin that doesn't
require mangling and requires a constant for the builtin. This means
that intrinsics are missing which don't use GCCBuiltin, don't have
builtins defined in clang, or were missing the constant annotation in
the builtin definition.
llvm-svn: 356144
This patch changes llvm-objcopy's behaviour to not strip sections that
are in segments, if they otherwise would be due to a stripping operation
(--strip-all, --strip-sections, --strip-non-alloc). This preserves the
segment contents. It does not change the behaviour of --strip-all-gnu
(although we could choose to do so), because GNU objcopy's behaviour in
this case seems to be to strip the section, nor does it prevent removing
of sections in segments with --remove-section (if a user REALLY wants to
remove a section, we should probably let them, although I could be
persuaded that warning might be appropriate). Tests have been added to
show this latter behaviour.
This fixes https://bugs.llvm.org/show_bug.cgi?id=41006.
Reviewed by: grimar, rupprecht, jakehehrlich
Differential Revision: https://reviews.llvm.org/D59293
This is a reland of r356129, attempting to fix greendragon failures
due to a suspected compatibility issue with od on the greendragon bots
versus other versions.
llvm-svn: 356136
When choosing whether a pair of loads can be combined into a single
wide load, we check that the load only has a sext user and that sext
also only has one user. But this can prevent the transformation in
the cases when parallel macs use the same loaded data multiple times.
To enable this, we need to fix up any other uses after creating the
wide load: generating a trunc and a shift + trunc pair to recreate
the narrow values. We also need to keep a record of which loads have
already been widened.
Differential Revision: https://reviews.llvm.org/D59215
llvm-svn: 356132
This patch changes llvm-objcopy's behaviour to not strip sections that
are in segments, if they otherwise would be due to a stripping operation
(--strip-all, --strip-sections, --strip-non-alloc). This preserves the
segment contents. It does not change the behaviour of --strip-all-gnu
(although we could choose to do so), because GNU objcopy's behaviour in
this case seems to be to strip the section, nor does it prevent removing
of sections in segments with --remove-section (if a user REALLY wants to
remove a section, we should probably let them, although I could be
persuaded that warning might be appropriate). Tests have been added to
show this latter behaviour.
This fixes https://bugs.llvm.org/show_bug.cgi?id=41006.
Reviewed by: grimar, rupprecht, jakehehrlich
Differential Revision: https://reviews.llvm.org/D59293
llvm-svn: 356129
Prior to the introduction of funnel shift intrinsics we could count on rotate
by immediates prefering to use rotl since that's what MatchRotate would check
first. The or+shift pattern doesn't have a direction so one must be chosen
arbitrarily.
With funnel shift, there is a direction and fshr will try to use rotr first.
While fshl will try to use rotl first.
This patch adds the isel patterns for rotr to complement the rotl patterns. I've
put the rotr by 1 patterns in the instruction patterns. And moved the rotl by
bitwidth-1 patterns to separate Pat patterns.
Fixes PR41057.
llvm-svn: 356121
getConstantVRegVal used to only look for G_CONSTANT when looking at
unboxing the value of a vreg. However, constants are sometimes not
directly used and are hidden behind trunc, s|zext or copy chain of
computation.
In particular this may be introduced by the legalization process that
doesn't want to simplify these patterns because it can lead to infine
loop when legalizing a constant.
To circumvent that problem, add a new variant of getConstantVRegVal,
named getConstantVRegValWithLookThrough, that allow to look through
extensions.
Differential Revision: https://reviews.llvm.org/D59227
llvm-svn: 356116
error() was previously cleaned up from CopyConfig, but new uses were introduced.
This also tweaks the error message for --add-symbol to report all invalid flags.
llvm-svn: 356105
I found these by asserting in clang for any GCCBuiltin that doesn't
require mangling and requires a constant for the builtin. This means
that intrinsics are missing which don't use GCCBuiltin, don't have
builtins defined in clang, or were missing the constant annotation in
the builtin definition.
llvm-svn: 356091
I found these by asserting in clang for any GCCBuiltin that doesn't
require mangling and requires a constant for the builtin. This means
that intrinsics are missing which don't use GCCBuiltin, don't have
builtins defined in clang, or were missing the constant annotation in
the builtin definition.
I'm not sure what's going on with the immediates.ll test. It seems to
be intended to test invalid cases like this, but then tries to handle
some of them anyway. I've moved the cases that were inconsistent with
the GCCBuiltin definition so they don't test the codegen anymore.
llvm-svn: 356085
Summary:
MsgPackDocument is the lighter-weight replacement for MsgPackTypes. This
commit switches AMDGPU HSA metadata processing to use MsgPackDocument
instead of MsgPackTypes.
Differential Revision: https://reviews.llvm.org/D57024
Change-Id: I0751668013abe8c87db01db1170831a76079b3a6
llvm-svn: 356081
The feature flag alone can't be trusted since it can be passed via -mattr. Need to ensure 64-bit mode as well.
We had a 64 bit mode check on the instruction to make the assembler work correctly. But we weren't guarding any of our lowering code or the hooks for the AtomicExpandPass.
I've added 32-bit command lines to atomic128.ll with and without cx16. The tests there would all previously fail if -mattr=cx16 was passed to them. I had to move one test case for f128 to a new file as it seems to have a different 32-bit mode or possibly sse issue.
Differential Revision: https://reviews.llvm.org/D59308
llvm-svn: 356078
Because we don't currently simplify icmp with undef in DAG, bugpoint loves to introduce them during reduction.
This is a small step towards re-adding non-undef values into some of the simpler tests so that they should still test correctly and emit similar/same codegen.
Prep work for PR40800 ([SelectionDAG] Add UNDEF handling to SelectionDAG::FoldSetCC).
llvm-svn: 356076
rL356068 caused some minor re-orderings. Regenerate legalize-fneg.ll to
reflect this, and remove the NOLIB check lines (they're redundant given that
the RV32I and RV64I check lines generated by update_llc_test_checks.py already
demonstrate there is no libcall).
llvm-svn: 356074
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes we consider if
we can do the merge immediately.
Most tests changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.
CodeGen tests with non-reordering changes:
X86/aligned-variadic.ll -- memory-based add folded into stored leaq
value.
X86/constant-combiners.ll -- Optimizes out overlap between stores.
X86/pr40631_deadstore_elision -- folds constant byte store into
preceding quad word constant store.
Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet
Reviewed By: courbet
Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59260
llvm-svn: 356068
Attempt to combine CONCAT_VECTORS nodes, which we only really have pre-legalization.
This encourages a lot of X86ISD::SUBV_BROADCAST generation, so I've added SimplifyDemandedVectorEltsForTargetNode handling for this at the same time.
The X86ISD::VTRUNC regression in shuffle-vs-trunc-256-widen.ll will be handled in a future commit.
llvm-svn: 356064
This follows similar logic in the ARM and Mips backends, and allows the free
use of s0 in functions without a dedicated frame pointer. The changes in
callee-saved-gprs.ll most clearly show the effect of this patch.
llvm-svn: 356063
Note that s0 need not be marked reserved if the frame pointer isn't used. For
the ILP32 and LP64 soft float ABIS that are currently support, all FPRs are
always considered temporaries.
llvm-svn: 356061
A fuzzer found the crasher:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13700
The bug was introduced recently here:
rL355741
This is the quick fix. If we need to do this transform
later, then we'd have to extend/truncate the vector setcc
element type to the scalar setcc type (i8).
llvm-svn: 356053
Before this change LLVM emits non-microMIPS variant of the `mov.d`
command for microMIPS code.
Differential Revision: http://reviews.llvm.org/D59045
llvm-svn: 356052
To provide mapping between standard and microMIPS R6 variants of the
`sw` command we have to rename SWSP_xxx commands from "sw" to "swsp".
Otherwise `tablegen` starts to show the error `Multiple matches found
for `SW'`. After that to restore printing SWSP command as `sw`, I add
an appropriate `MipsInstAlias` instance.
We also need to implement "size reduction" for microMIPS R6. But this
task is for separate patch. After that the `micromips-lwsp-swsp.ll` test
case will be extended.
Differential Revision: http://reviews.llvm.org/D59046
llvm-svn: 356045
AVX1 broadcasts were failing as we were adding bitcasts that caused MayFoldLoad's hasOneUse to return false.
This patch stops introducing bitcasts so early and also replaces the broadcast index scaling through bitcasts (which can't succeed in some cases) to instead just keep track of the bitoffset which can be converted back to the broadcast index later on.
Differential Revision: https://reviews.llvm.org/D58888
llvm-svn: 356043
First step towards PR40800 - I intend to move the float case in a separate future patch.
I had to tweak the (overly reduced) thumb2 test and the x86 widening test change is annoying (no longer rematerializable) but we should address this separately.
Differential Revision: https://reviews.llvm.org/D59244
llvm-svn: 356040
On micromips MipsMTLOHI is always matched to PseudoMTLOHI_DSP regardless
of +dsp argument. This patch checks is HasDSP predicate is present for
PseudoMTLOHI_DSP so PseudoMTLOHI_MM can be matched when appropriate.
Add expansion of PseudoMTLOHI_MM instruction into a mtlo/mthi pair.
Patch by Mirko Brkusanin.
Differential Revision: http://reviews.llvm.org/D59203
llvm-svn: 356039
Add break statements in Object/ELF.cpp since the code should consider the
generic tags for Hexagon, MIPS, and PPC. Add a test (copied from llvm-readobj)
to show that this works correctly (earlier versions of this patch would have
asserted).
The warnings in X86ELFObjectWriter.cpp are actually false-positives since
the nested switch() handles all possible values and returns in all cases.
Make this explicit by adding llvm_unreachable's.
Differential Revision: https://reviews.llvm.org/D58837
llvm-svn: 356037
Summary:
After instruction selection phase, possibly-throwing calls, which were
previously invoke, are wrapped in `EH_LABEL` instructions. For example:
```
EH_LABEL <mcsymbol .Ltmp0>
CALL_VOID @foo ...
EH_LABEL <mcsymbol .Ltmp1>
```
`EH_LABEL` is placed also in the beginning of EH pads:
```
bb.1 (landing-pad):
EH_LABEL <mcsymbol .Ltmp2>
...
```
And we'd like to maintian this relationship, so when we place a `try`,
```
TRY ...
EH_LABEL <mcsymbol .Ltmp0>
CALL_VOID @foo ...
EH_LABEL <mcsymbol .Ltmp1>
```
When we place a `catch`,
```
bb.1 (landing-pad):
EH_LABEL <mcsymbol .Ltmp2>
%0:except_ref = CATCH ...
...
```
Previously we didn't treat EH_LABELs specially, so `try` was placed
right before a call, and `catch` was placed in the beginning of an EH
pad.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58914
llvm-svn: 355996
Remove test cases that checked for not crashing when immediate operands were passed not an immediate. These are now considered ill-formed in IR.
This was done by manually scanning the intrinsic file for llvm_i32_ty and llvm_i8_ty which are the predominant types we use for immediates. Most of them are on vector intrinsics. I might have missed some other intrinsics.
Differential Revision: https://reviews.llvm.org/D58302
llvm-svn: 355993
Currently we have -Rpass for filtering the remarks that are displayed as
diagnostics, but when using -fsave-optimization-record, there is no way
to filter the remarks while generating them.
This adds support for filtering remarks by passes using a regex.
Ex: `clang -fsave-optimization-record -foptimization-record-passes=inline`
will only emit the remarks coming from the pass `inline`.
This adds:
* `-fsave-optimization-record` to the driver
* `-opt-record-passes` to cc1
* `-lto-pass-remarks-filter` to the LTOCodeGenerator
* `--opt-remarks-passes` to lld
* `-pass-remarks-filter` to llc, opt, llvm-lto, llvm-lto2
* `-opt-remarks-passes` to gold-plugin
Differential Revision: https://reviews.llvm.org/D59268
Original llvm-svn: 355964
llvm-svn: 355984
A faulting_op is one that has specified behavior when a fault occurs, generally redirecting control flow to another location. This change just adds a comment to the assembly output which makes it both human readable, and machine checkable w/o having to parse the FaultMap section. This is used to split a test file into two parts, so that I can (in a near future commit) easily extend the test file to demonstrate another case.
llvm-svn: 355982
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.
Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.
This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.
llvm-svn: 355981
Summary:
This is similar to how addr2line handles consecutive entries with the
same address - pick the last one.
Reviewers: dblaikie, friss, JDevlieghere
Reviewed By: dblaikie
Subscribers: ormris, echristo, JDevlieghere, probinson, aprantl, hiraditya, rupprecht, jdoerfert, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D58952
llvm-svn: 355972
Currently we have -Rpass for filtering the remarks that are displayed as
diagnostics, but when using -fsave-optimization-record, there is no way
to filter the remarks while generating them.
This adds support for filtering remarks by passes using a regex.
Ex: `clang -fsave-optimization-record -foptimization-record-passes=inline`
will only emit the remarks coming from the pass `inline`.
This adds:
* `-fsave-optimization-record` to the driver
* `-opt-record-passes` to cc1
* `-lto-pass-remarks-filter` to the LTOCodeGenerator
* `--opt-remarks-passes` to lld
* `-pass-remarks-filter` to llc, opt, llvm-lto, llvm-lto2
* `-opt-remarks-passes` to gold-plugin
Differential Revision: https://reviews.llvm.org/D59268
llvm-svn: 355964
The included test case currently crashes on tip of tree. Rather than adding a bailout, I chose to restructure the code so that the existing helper function could be used. Given that, the majority of the diff is NFC-ish, but the key difference is that canConvertValue returns false when only one side is a non-integral pointer.
Thanks to Cherry Zhang for the test case.
Differential Revision: https://reviews.llvm.org/D59000
llvm-svn: 355962
Summary:
This fixes an extremely long compile time caused by recursive analysis
of truncs, which were not previously subject to any depth limits unlike
some of the other ops. I decided to use the same control used for
sext/zext, since the routines analyzing these are sometimes mutually
recursive with the trunc analysis.
Reviewers: mkazantsev, sanjoy
Subscribers: sanjoy, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58994
llvm-svn: 355949
Vector imm setting instructions like XXLXORz/XXLXORspz/XXLXORdpz
Should behave like LI8.
We should set corresponding flags to allow rematerialization and other
opts in LICM, RA, Scheduling etc.
Differential Revision: https://reviews.llvm.org/D58645
llvm-svn: 355948
This patch adds a new option to SplitAllCriticalEdges and uses it to avoid splitting critical edges when the destination basic block ends with unreachable. Otherwise if we split the critical edge, sanitizer coverage will instrument the new block that gets inserted for the split. But since this block itself shouldn't be reachable this is pointless. These basic blocks will stick around and generate assembly, but they don't end in sane control flow and might get placed at the end of the function. This makes it look like one function has code that flows into the next function.
This showed up while compiling the linux kernel with clang. The kernel has a tool called objtool that detected the code that appeared to flow from one function to the next. https://github.com/ClangBuiltLinux/linux/issues/351#issuecomment-461698884
Differential Revision: https://reviews.llvm.org/D57982
llvm-svn: 355947
If a symbol points to the end of a fragment, instead of searching for
fixups in that fragment, search in the next fragment.
Fixes spurious assembler error with subtarget change next to "la"
pseudo-instruction, or expanded equivalent.
Alternate proposal to fix the problem discussed in
https://reviews.llvm.org/D58759.
Testcase by Ana Pazos.
Differential Revision: https://reviews.llvm.org/D58943
llvm-svn: 355946
Prior to this change, the "Symbol" field of a relocation would always be
assumed to be a symbol name, and if no such symbol existed, the
relocation would reference index 0. This confused me when I tried to use
a literal symbol index in the field: since "0x1" was not a known symbol
name, the symbol index was set as 0. This change falls back to treating
unknown symbol names as integers, and emits an error if the name is not
found and the string is not an integer.
Note that the Symbol field is optional, so if a relocation doesn't
reference a symbol, it shouldn't be specified. The new error required a
number of test updates.
Reviewed by: grimar, ruiu
Differential Revision: https://reviews.llvm.org/D58510
llvm-svn: 355938
Expand MULO with constant power of two operand into a shift. The
overflow is checked with (x << shift) >> shift == x, where the right
shift will be logical for umulo and arithmetic for smulo (with
exception for multiplications by signed_min).
Differential Revision: https://reviews.llvm.org/D59041
llvm-svn: 355937
I recently discovered a bug in llvm-cxxfilt introduced in r353743 but
was fixed later incidentally due to r355031. Specifically, llvm-cxxfilt
was attempting to call .back() on an empty string any time there was a
new line in the input. This was causing a crash in my debug builds only.
This patch simply adds a test that explicitly tests that llvm-cxxfilt
handles empty lines correctly. It may pass under release builds under
the broken behaviour, but it fails at least in debug builds.
Reviewed by: mattd
Differential Revision: https://reviews.llvm.org/D58785
llvm-svn: 355929
This patch removes two assertions that were preventing writing of a test
that checked an empty line followed by some text. For example:
CHECK: {{^$}}
CHECK-NEXT: foo()
The assertion was because the current location the CHECK-NEXT was
scanning from was the start of the buffer. A similar issue occurred with
CHECK-SAME. These assertions don't protect against anything, as there is
already an error check that checks that CHECK-NEXT/EMPTY/SAME don't
appear first in the checks, and the following code works fine if the
pointer is at the start of the input.
Reviewed by: probinson, thopre, jdenny
Differential Revision: https://reviews.llvm.org/D58784
llvm-svn: 355928
Targets can potentially emit more efficient code if they know address
computations never overflow. For example ILP32 code on AArch64 (which only has
64-bit address computation) can ignore the possibility of overflow with this
extra information.
llvm-svn: 355926
These two values correspond to the 'Empty' and 'Tombstone' special
keys defined by DenseMapInfo<int64_t>, which means that neither one
can be used as a key in DenseMap<int64_t, anything>. Hence, if you try
to use either of those values as an int literal, IntInit::get() fails
an assertion when it tries to insert them into its static cache of
int-literal objects.
Fixed by replacing the DenseMap with a std::map, which doesn't intrude
on the space of legal values of the key type.
Reviewers: nhaehnle, hfinkel, javedabsar, efriedma
Reviewed By: efriedma
Subscribers: fhahn, efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59016
llvm-svn: 355900
These are closely modeled on similar tests for the ilp32 ABI. Like those
tests, we group together tests that should be common cross lp64, lp64+lp64f,
and lp64+lp64f+lp64d ABIs.
llvm-svn: 355899
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.
Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.
Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal
Reviewed By: sdesmalen
Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57728
llvm-svn: 355889
Summary:
Swift now generates PDBs for debugging on Windows. llvm and lldb
need a language enumerator value too properly handle the output
emitted by swiftc.
Subscribers: jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59231
llvm-svn: 355882
After r355865, we should be able to safely select G_EXTRACT_VECTOR_ELT without
running into any problematic intrinsics.
Also add a fix for lane copies, which don't support index 0.
llvm-svn: 355871
AtomicCmpSwapWithSuccess is legalised into an AtomicCmpSwap plus a comparison.
This requires an extension of the value which, by default, is a
zero-extension. When we later lower AtomicCmpSwap into a PseudoCmpXchg32 and then expanded in
RISCVExpandPseudoInsts.cpp, the lr.w instruction does a sign-extension.
This mismatch of extensions causes the comparison to fail when the compared
value is negative. This change overrides TargetLowering::getExtendForAtomicOps
for RISC-V so it does a sign-extension instead.
Differential Revision: https://reviews.llvm.org/D58829
Patch by Ferran Pallarès Roca.
llvm-svn: 355869
The RISC-V Assembly Programmer's Manual defines fp as another alias of x8.
However, our tablegen rules only recognise s0. This patch adds fp as another
alias of x8. GCC also accepts fp.
Differential Revision: https://reviews.llvm.org/D59209
Patch by Ferran Pallarès Roca.
llvm-svn: 355867
Overloaded intrinsics aren't necessarily safe for instruction selection. One
such intrinsic is aarch64.neon.addp.*.
This is a temporary workaround to ensure that we always fall back on that
intrinsic. Eventually this will be replaced with a proper solution.
https://bugs.llvm.org/show_bug.cgi?id=40968
Differential Revision: https://reviews.llvm.org/D59062
llvm-svn: 355865
It hasn't seen active development in years, and it hasn't reached a
state where it was useful.
Remove the code until someone is interested in working on it again.
Differential Revision: https://reviews.llvm.org/D59133
llvm-svn: 355862
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.
Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.
This also includes a few more changes to make this work somewhat
reasonably:
* Add support for expanding VECREDUCE in SDAG. Usually
experimental.vector.reduce is expanded prior to codegen, but if the
target does have native vector reduce, it may of course still be
necessary to expand due to legalization issues. This uses a shuffle
reduction if possible, followed by a naive scalar reduction.
* Allow the result type of integer VECREDUCE to be larger than the
vector element type. For example we need to be able to reduce a v8i8
into an (nominally) i32 result type on AArch64.
* Use the vector operand type rather than the scalar result type to
determine the action, so we can control exactly which vector types are
supported. Also change the legalize vector op code to handle
operations that only have vector operands, but no vector results, as
is the case for VECREDUCE.
* Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
explicitly specify for which vector types the reductions are supported.
This does not handle anything related to VECREDUCE_STRICT_*.
Differential Revision: https://reviews.llvm.org/D58015
llvm-svn: 355860
As a fix for https://bugs.llvm.org/show_bug.cgi?id=40986 ("excessive compile
time building opencollada"), this patch makes sure that no phys reg is hinted
more than once from getRegAllocationHints().
This handles the case were many virtual registers are assigned to the same
physreg. The previous compile time fix (r343686) in weightCalcHelper() only
made sure that physical/virtual registers are passed no more than once to
addRegAllocationHint().
Review: Dimitry Andric, Quentin Colombet
https://reviews.llvm.org/D59201
llvm-svn: 355854
Summary:
Depends on https://reviews.llvm.org/D59069.
https://bugs.llvm.org/show_bug.cgi?id=40979 describes a bug in which the
-coro-split pass would assert that a use was across a suspend point from
a definition. Normally this would mean that a value would "spill" across
a suspend point and thus need to be stored in the coroutine frame. However,
in this case the use was unreachable, and so it would not be necessary
to store the definition on the frame.
To prevent the assert, simply remove unreachable basic blocks from a
coroutine function before computing spills. This avoids the assert
reported in PR40979.
Reviewers: GorNishanov, tks2103
Reviewed By: GorNishanov
Subscribers: EricWF, jdoerfert, llvm-commits, lewissbaker
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59068
llvm-svn: 355852
Summary:
llvm-objdump can be tricked into reading beyond valid memory and
segfaulting if LC_LINKER_COMMAND strings are not null terminated. libObject
does have code to validate the integrity of the LC_LINKER_COMMAND struct,
but this validator improperly assumes linker command strings are null
terminated.
The solution is to report an error if a string extends beyond the end of
the LC_LINKER_COMMAND struct.
Reviewers: lhames, pete
Reviewed By: pete
Subscribers: rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59179
llvm-svn: 355851
Fixes bug 38023: https://bugs.llvm.org/show_bug.cgi?id=38023
The SimplifyCFG pass will perform jump threading in some cases where
doing so is trivial and would simplify the CFG. When folding a series
of blocks with redundant conditional branches into an unconditional "critical
edge" block, it does not keep the debug location associated with the previous
conditional branch.
This patch fixes the bug described by copying the debug info from the
old conditional branch to the new unconditional branch instruction, and
adds a regression test for the SimplifyCFG pass that covers this case.
Patch by Stephen Tozer!
Differential Revision: https://reviews.llvm.org/D59206
llvm-svn: 355833
A pattern needed to match TruncIntFP was missing. This was causing multiple
tests from llvm test suite to fail during compilation for micromips.
Patch by Mirko Brkusanin.
Differential Revision: https://reviews.llvm.org/D58722
llvm-svn: 355825
Inserting an overflowing arithmetic intrinsic can increase register
pressure by producing two values at a point where only one is needed,
while the second use maybe several blocks away. This increase in
pressure is likely to be more detrimental on performance than
rematerialising one of the original instructions.
So, check that the arithmetic and compare instructions are no further
apart than their immediate successor/predecessor.
Differential Revision: https://reviews.llvm.org/D59024
llvm-svn: 355823
Fixes bug 37966: https://bugs.llvm.org/show_bug.cgi?id=37966
The Jump Threading pass will replace certain conditional branch
instructions with unconditional branches when it can prove that only one
branch can occur. Prior to this patch, it would not carry the debug
info from the old instruction to the new one.
This patch fixes the bug described by copying the debug info from the
conditional branch instruction to the new unconditional branch
instruction, and adds a regression test for the Jump Threading pass that
covers this case.
Patch by Stephen Tozer!
Differential Revision: https://reviews.llvm.org/D58963
llvm-svn: 355822
When --compress-debug-sections is given,
llvm-objcopy removes the uncompressed sections and adds compressed to the section list.
This makes all the pointers to old sections to be outdated.
Currently, code already has logic for replacing the target sections of the relocation
sections. But we also have to update the relocations by themselves.
This fixes https://bugs.llvm.org/show_bug.cgi?id=40885.
Differential revision: https://reviews.llvm.org/D58960
llvm-svn: 355821
Narrow Scalar G_MUL for MIPS32.
Revisit NarrowScalar implementation in LegalizerHelper.
Introduce new helper function multiplyRegisters.
It performs generic multiplication of values held in multiple registers.
Generated instructions use only types NarrowTy and i1.
Destination can be same or two times size of the source.
Differential Revision: https://reviews.llvm.org/D58824
llvm-svn: 355814
Many of our tests were not using valid rounding mode immediates. Clang verifies this in the frontend when it creates the intrinsics from builtins, but the backend would still lower invalid immediates.
With this change we will now leave them as intrinsics if the immediate is invalid. This will cause an isel selection failure.
llvm-svn: 355789
Currently the store+load is folded and both operands of the umulo
end up being constants. To avoid this getting folded away entirely,
make sure at least one operand is non-constant.
Also remove some allocas which don't seem relevant to the test.
llvm-svn: 355776
This patch adds proper handling of -target-abi, as accepted by llvm-mc and
llc. Lowering (codegen) for the hard-float ABIs will follow in a subsequent
patch. However, this patch does add MC layer support for the hard float and
RVE ABIs (emission of the appropriate ELF flags
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#-file-header).
ABI parsing must be shared between codegen and the MC layer, so we add
computeTargetABI to RISCVUtils. A warning will be printed if an invalid or
unrecognized ABI is given.
Differential Revision: https://reviews.llvm.org/D59023
llvm-svn: 355771
Summary:
Uses the named operands tablegen feature to look up the indices of
offset, address, and p2align operands for all load and store
instructions. This replaces brittle, incorrect logic for identifying
loads and store when eliminating frame indices, which previously
crashed on bulk-memory ops. It also cleans up the SetP2Alignment pass.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59007
llvm-svn: 355770
This saves needing to call getInt32 ourselves. Making the code a little shorter.
The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue.
This cleanup because I plan to write similar code for expandload/compressstore.
llvm-svn: 355767
Summary:
Floating-point CSRs should be accessible even when F extension is not enabled.
But pseudo instructions that access floating point CSRs still require the F extension.
GNU tools already implement this behavior. RISC-V spec is pending update to reflect
this behavior and to extend it to pseudo instructions that access floating point CSRs.
Reviewers: asb
Reviewed By: asb
Subscribers: asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, llvm-commits
Differential Revision: https://reviews.llvm.org/D58932
llvm-svn: 355753
r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be
maintained correctly in optimizations in optimizeBlock() and optimizeInst().
Function optimizeSelectInst() does not update the flag.
This patch propagates the flag in optimizeSelectInst() back to
optimizeBlock().
This patch also removes ModifiedDT in CodeGenPrepare class (which is not used).
The property of ModifiedDT is now recorded in a ref parameter.
Differential Revision: https://reviews.llvm.org/D59139
llvm-svn: 355751
Specifically, compute and Print Type and Section columns.
This is a re-commit of rL354833, after fixing the Asan problem found a a buildbot.
Differential Revision: https://reviews.llvm.org/D59060
llvm-svn: 355742
An extension of D58282 noted in PR39665:
https://bugs.llvm.org/show_bug.cgi?id=39665
This doesn't answer the request to use movmsk, but that's an
independent problem. We need this and probably still need
scalarization of FP selects because we can't do that as a
target-independent transform (although it seems likely that
targets besides x86 should have this transform).
llvm-svn: 355741
Summary:
This patch works around the bug in the ptxas tool with the processing of bytes
separated by the comma symbol. The emission of the packed string is
temporarily disabled.
Reviewers: tra
Subscribers: jholewinski, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59148
llvm-svn: 355740
Summary:
This change change the instrumentation to allow users to view the registers at the point at which tag mismatch occured. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance.
In this implementation, the fast path of __hwasan_check is unmodified. There are an additional 4 instructions (16B) emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable.
The failure trace now contains a list of registers at the point of which the failure occured, in a format similar to that of Android's tombstones. It currently has the following format:
Registers where the failure occurred (pc 0x0055555561b4):
x0 0000000000000014 x1 0000007ffffff6c0 x2 1100007ffffff6d0 x3 12000056ffffe025
x4 0000007fff800000 x5 0000000000000014 x6 0000007fff800000 x7 0000000000000001
x8 12000056ffffe020 x9 0200007700000000 x10 0200007700000000 x11 0000000000000000
x12 0000007fffffdde0 x13 0000000000000000 x14 02b65b01f7a97490 x15 0000000000000000
x16 0000007fb77376b8 x17 0000000000000012 x18 0000007fb7ed6000 x19 0000005555556078
x20 0000007ffffff768 x21 0000007ffffff778 x22 0000000000000001 x23 0000000000000000
x24 0000000000000000 x25 0000000000000000 x26 0000000000000000 x27 0000000000000000
x28 0000000000000000 x29 0000007ffffff6f0 x30 00000055555561b4
... and prints after the dump of memory tags around the buggy address.
Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging.
Reviewers: pcc, eugenis
Reviewed By: pcc, eugenis
Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58857
llvm-svn: 355738
When matching half of the build_vector to a load, there could still be
a hidden dependency on the other half of the build_vector the pattern
wouldn't detect. If there was an additional chain dependency on the
other value, a cycle could be introduced.
I don't think a tablegen pattern is capable of matching the necessary
conditions, so move this into PreprocessISelDAG. Check isPredecessorOf
for the other value to avoid a cycle. This has a warning that it's
expensive, so this should probably be moved into an MI pass eventually
that will have more freedom to reorder instructions to help match
this. That is currently complicated by the lack of a computeKnownBits
type mechanism for the selected function.
llvm-svn: 355731
This avoids breaking possible value dependencies when sorting loads by
offset.
AMDGPU has some load instructions that write into the high or low bits
of the destination register, and have a tied input for the other input
bits. These can easily have the same base pointer, but be a swizzle so
the high address load needs to come first. This was inserting glue
forcing the opposite ordering, producing a cycle the InstrEmitter
would assert on. It may be potentially expensive to look for the
dependency between the other loads, so just skip any where this could
happen.
Fixes bug 40936 by reverting r351379, which added a hacky attempt to
fix this by adding chains in this case, which I think was just working
around broken glue before the InstrEmitter. The core of the patch is
re-implementing the fix for that problem.
llvm-svn: 355728
This was checking the wrong operands for the base register and the
offsets. The indexes are shifted by the number of output registers
from the machine instruction definition, and the chain is moved to the
end.
llvm-svn: 355722
Summary:
If the LLVM module shows that it has debug info, but the file is
actually empty and the real debug info is not emitted, the ptxas tool
emits error 'Debug information not found in presence of .target debug'.
We need at leas one empty debug section to silence this message. Section
`.debug_loc` is not emitted for PTX and we can emit empty `.debug_loc`
section if `debug` option was emitted.
Reviewers: tra
Subscribers: jholewinski, aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D57250
llvm-svn: 355719
The indexed variant of vfmal.f16 and vfmsl.f16
instructions use the uppser bits of the indexed
operand to store the index (1 bit for the double
variant, 2 bits for the quad).
This limits the usable registers to d0 - d7 or
s0 - s15. This patch enforces this limitation.
Differential Revision: https://reviews.llvm.org/D59021
llvm-svn: 355707
llvm-readelf prints relocation addends as:
<symbol value>[+-]<absolute addend>
where [+-] is determined from whether addend is less than zero or not.
However, it does not print the +/- if there is no symbol, which meant
that negative addends became their positive value with no indication
that this had happened. This patch stops the absolute conversion when
addends are negative and there is no associated symbol.
Reviewed by: Higuoxing, mattd, MaskRay
Differential Revision: https://reviews.llvm.org/D59095
llvm-svn: 355696
Summary:
Right now, when we encounter a string equality check,
e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a
small compile-time constant, and fall back on calling `memcmp()` else.
This is sub-optimal because memcmp has to compute much more than
equality.
This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms
that support `bcmp`.
`bcmp` can be made much more efficient than `memcmp` because equality
compare is trivially parallel while lexicographic ordering has a chain
dependency.
Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits
Differential Revision: https://reviews.llvm.org/D56593
llvm-svn: 355672
We were just checking pointer size and type primitive size. But this caused unintended things like vectors of half being accepted by masked load/store.
For FP we now explicitly check for only double and float.
For pointers we now let any pointer through. Trusting that only 32 and 64 would be used to generate assembly.
We only check bitwidth after checking that the type is an integer.
llvm-svn: 355667
Summary:
In r349534, objc arc implementation is switched to use intrinsics and at
the same time, clang.arc.use is renamed to llvm.objc.clang.arc.use to
make the naming more consistent. The side-effect of that is llvm no
longer recognize it as intrinsics and codegen external references to
it instead.
Rather than upgrade the old intrinsics name to the new one and wait for
the arc-contract pass to remove it, simply remove it in the bitcode
upgrader.
rdar://problem/48607063
Reviewers: pete, ahatanak, erik.pilkington, dexonsmith
Reviewed By: pete, dexonsmith
Subscribers: jkorous, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59112
llvm-svn: 355663
Rotate with explicit immediate is a single uop from Haswell on. An immediate of 1 has a dependency on the previous writer of flags, but the other immediate values do not.
The implicit rotate by 1 instruction is 2 uops. But the flags are merged after the rotate uop so the data result does not see the flag dependency. But I don't think we have any way of modeling that.
RORX is 1 uop without the load. 2 uops with the load. We currently model these with WriteShift/WriteShiftLd.
Differential Revision: https://reviews.llvm.org/D59077
llvm-svn: 355636
Haswell and possibly Sandybridge have an optimization for ADC/SBB with immediate 0 to use a single uop flow. This only applies GR16/GR32/GR64 with an 8-bit immediate. It does not apply to GR8. It also does not apply to the implicit AX/EAX/RAX forms.
Differential Revision: https://reviews.llvm.org/D59058
llvm-svn: 355635
- Copy kernel symbol attributes into kernel descriptor attributes
- Make sure kernel symbol's visibility is not "higher" than protected
Differential Revision: https://reviews.llvm.org/D59057
llvm-svn: 355630
Summary:
Since bottleneck hints are enabled via user request, it can be
confusing if no bottleneck information is presented. Such is the
case when no bottlenecks are identified. This patch emits a message
in that case.
Reviewers: andreadb
Reviewed By: andreadb
Subscribers: tschuett, gbedwell, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59098
llvm-svn: 355628
Summary:
ShadowCallStack on x86_64 suffered from the same racy security issues as
Return Flow Guard and had performance overhead as high as 13% depending
on the benchmark. x86_64 ShadowCallStack was always an experimental
feature and never shipped a runtime required to support it, as such
there are no expected downstream users.
Reviewers: pcc
Reviewed By: pcc
Subscribers: mgorny, javed.absar, hiraditya, jdoerfert, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D59034
llvm-svn: 355624
Change the format type of *Personality and *LSDAAddress to PRIx64 since
they are of type uint64_t.
The problem was detected on mips builds, where it was printing junk values
and causing test failure.
Patch by Milos Stojanovic.
Differential Revision: https://reviews.llvm.org/D58451
llvm-svn: 355607
In some loops, we end up generating loop induction variables that look like:
{(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1}
As opposed to the simpler:
{(zext i16 (%i0 * %i1) to i32),+,-1}
i.e we count up from -limit to 0, not the simpler counting down from limit to
0. This is because the scores, as LSR calculates them, are the same and the
second is filtered in place of the first. We end up with a redundant SUB from 0
in the code.
This patch tries to make the calculation of the setup cost a little more
thoroughly, recursing into the scev members to better approximate the setup
required. The cost function for comparing LSR costs is:
return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds,
C1.ScaleCost, C1.ImmCost, C1.SetupCost) <
std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds,
C2.ScaleCost, C2.ImmCost, C2.SetupCost);
So this will only alter results if none of the other variables turn out to be
different.
Differential Revision: https://reviews.llvm.org/D58770
llvm-svn: 355597
Unsigned mul high for MIPS32 is selected into two PseudoInstructions:
PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for
some of its operands. Registers in this class have appropriate hi and lo
register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc.
mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td.
In functions where mul and PseudoMULTu are present fastRegisterAllocator
will "run out of registers during register allocation" because
'calcSpillCost' for $ac0 will return spillImpossible because subregisters
$lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is
to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction.
Differential Revision: https://reviews.llvm.org/D58715
llvm-svn: 355594
I need this to remove a binary from LLD test suite.
The patch also simplifies the code a bit.
Differential revision: https://reviews.llvm.org/D59082
llvm-svn: 355591
A previous patch for "uniform-work-group-size" attribute was found to break
some RADV and possibly radeon SI tests and had to be retracted.
This patch fixes that.
Differential Revision: http://reviews.llvm.org/D58993
llvm-svn: 355574
Summary:
While implementing inlining support for callbr
(https://bugs.llvm.org/show_bug.cgi?id=40722), I hit a crash in Loop
Rotation when trying to build the entire x86 Linux kernel
(drivers/char/random.c). This is a small fix up to r353563.
Test case is drivers/char/random.c (with callbr's inlined), then ran
through creduce, then `opt -opt-bisect-limit=<limit>`, then bugpoint.
Thanks to Craig Topper for immediately spotting the fix, and teaching me
how to fish.
Reviewers: craig.topper, jyknight
Reviewed By: craig.topper
Subscribers: hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58929
llvm-svn: 355564
MIPS target supports lowering `RETURNADDR` and `FRAMEADDR` for a current
frame only. It's better to show an error message then crash on assertion
if `__builtin_return_address` is invoked with non-zero argument.
llvm-svn: 355558
Part 4 of CSPGO changes:
(1) add support in cmake for cspgo build.
(2) fix an issue in big endian.
(3) test cases.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 355541