I had to tweak the i32 tests so we check both reg-reg and reg-mem cases.
I also added i64 load tests.
Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible.
llvm-svn: 333827
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47120
llvm-svn: 333825
Object FIle Representation
At codegen time this is emitted into the ELF file a pair of symbol indices and a weight. In assembly it looks like:
.cg_profile a, b, 32
.cg_profile freq, a, 11
.cg_profile freq, b, 20
When writing an ELF file these are put into a SHT_LLVM_CALL_GRAPH_PROFILE (0x6fff4c02) section as (uint32_t, uint32_t, uint64_t) tuples as (from symbol index, to symbol index, weight).
Differential Revision: https://reviews.llvm.org/D44965
llvm-svn: 333823
As noted in the review thread for rL333782, we could have
made a bug harder to hit if we were simplifying instructions
before trying other folds.
The shuffle transform in question isn't ever a simplification;
it's just a canonicalization. So I've renamed that to make that
clearer.
This is NFCI at this point, but I've regenerated the test file
to show the cosmetic value naming difference of using
instcombine's RAUW vs. the builder.
Possible follow-ups:
1. Move reassociation folds after simplifies too.
2. Refactor common code; we shouldn't have so much repetition.
llvm-svn: 333820
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47121
llvm-svn: 333819
As noted in the review thread for rL333782, we're lacking coverage
for this transform, so add tests for each binop opcode with constant
operand.
llvm-svn: 333818
These instructions are unusual in that they operate on 4 consecutive registers so supporting them in codegen will be more difficult than normal.
Includes an assembler check to warn if the source register is not the first register of a 4 register group.
llvm-svn: 333812
Summary:
I noticed this issue because we didn't put the primary cloned loop into
the `NonChildClonedLoops` vector and so never iterated on it. Once
I fixed that, it made it clear why I had to do a really complicated and
unnecesasry dance when updating the loops to remain in canonical form --
I was unwittingly working around the fact that the primary cloned loop
wasn't in the expected list of cloned loops. Doh!
Now that we include it in this vector, we don't need to return it and we
can consolidate the update logic as we correctly have a single place
where it can be handled.
I've just added a test for the iteration order aspect as every time
I changed the update logic partially or incorrectly here, an existing
test failed and caught it so that seems well covered (which is also
evidenced by the extensive working around of this missing update).
Reviewers: asbirlea, sanjoy
Subscribers: mcrosier, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47647
llvm-svn: 333811
In r333801 I added a test for a dump method that, for reasons I don't
understand, fails on an msvc bot:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/12306/
I'll remove the test for now to unblock the bot and try to look into why
there's a discrepancy on this platform later.
llvm-svn: 333807
and using the latter in DIBuilder::createArtificialType and
DIBuilder::createObjectPointerType methods as well as introducing
mirroring DISubprogram::cloneWithFlags and
DIBuilder::createArtificialSubprogram methods.
The primary goal here is to add createArtificialSubprogram to support
a pass downstream while keeping the method consistent with the
existing ones and making sure we don't encourage changing already
created DI-nodes.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D47615
llvm-svn: 333806
Previously we just returned undef, but really we should be returning the pass thru input. We also need to make sure we preserve the chain output that the original intrinsic node had to maintain connectivity in the DAG. So we should just return the incoming chain as the output chain.
llvm-svn: 333804
This re-lands r333797 with a fix for big endian systems.
Original commit message:
This isn't encountered anywhere inside LLVM, so I wrote a test case to expose the issue and verify that it is fixed.
The basic problem is that the macho_load_command union contains all load comamnd structs. Load command structs in 32-bit macho files can be 32-bit aligned instead of 64-bit aligned.
There are some strange circumstances in which this can be exposed in a 64-bit macho if the load commands are invalid or if a 32-bit aligned load command is used. In the past we've worked around this type of problem with changes like r264232.
llvm-svn: 333803
The idea behind WindowsSupport.h is that it's in the source directory so
that windows.h'isms don't leak out into the larger LLVM project. To that
end, any symbol that references a symbol from windows.h must be in this
private header, and not in a public header.
However, we had some useful utility functions in WindowsSupport.h which
have no dependency on the Windows API, but still only make sense on
Windows. Those functions should be usable outside of Support since there
is no risk of causing a windows.h leak. Although this introduces some
preprocessor logic in some header files, It's not too egregious and it's
better than the alternative of duplicating a ton of code.
Differential Revision: https://reviews.llvm.org/D47662
llvm-svn: 333798
This isn't encountered anywhere inside LLVM, so I wrote a test case to expose the issue and verify that it is fixed.
The basic problem is that the macho_load_command union contains all load comamnd structs. Load command structs in 32-bit macho files can be 32-bit aligned instead of 64-bit aligned.
There are some strange circumstances in which this can be exposed in a 64-bit macho if the load commands are invalid or if a 32-bit aligned load command is used. In the past we've worked around this type of problem with changes like r264232.
llvm-svn: 333797
The avx512f intrinsic tests were in the avx512vl file. We were also missing some combinations of masking.
This does show that we fail to use the zero masking form of expand loads when the passthru is zero. I'll try to get that fixed shortly.
llvm-svn: 333795
Summary:
Getelementptr returns a vector of pointers, instead of a single address,
when one or more of its arguments is a vector. In such case it is not
possible to simplify the expression by inserting a bitcast of operand(0)
into the destination type, as it will create a bitcast between different
sizes.
Reviewers: majnemer, mkuper, mssimpso, spatel
Reviewed By: spatel
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D46379
llvm-svn: 333783
This bug:
https://bugs.llvm.org/show_bug.cgi?id=37648
...was created with the enhancement to this transform with rL332479.
The urem test shows the disaster potential: any undef divisor lane makes
the whole op undef.
The test diffs show that vector demanded elements turns some of the potential,
but not all, unused binop operands back into undef already.
llvm-svn: 333782
The `MipsAsmParser::loadImmediate` can load immediates of various sizes
into a register. Idea of this change is to use `loadImmediate` in the
`MipsAsmParser::expandMemInst` method to load offset into a register and
then call required load/store instruction.
The patch removes separate `expandLoadInst` and `expandStoreInst`
methods and does everything in the `expandMemInst` method to escape code
duplication.
Differential Revision: https://reviews.llvm.org/D47316
llvm-svn: 333774
Supporting GOT and TLS related relocations by the `.reloc` directive is
useful for purpose of testing various tools like a linker, for example.
llvm-svn: 333773
This fixes the bug where strip-all option was
leading to a malformed outputted ELF file.
Differential Revision: https://reviews.llvm.org/D47414
llvm-svn: 333772
Summary:
Emit summaries for bitcode modules that are only destined for the
regular LTO portion of the build so they can participate in
summary-based dead stripping.
This change reduces the size of a nacl_helper build with cfi-icall
enabled by 7%, removing the majority of the overhead due to enabling
cfi-icall. The cfi-icall size increase was caused by compiling in lots
of unused code and cfi-icall generating jumptable references to unused
symbols that could no longer be removed by -Wl,-gc-sections. Increasing
the visibility of summary-based dead stripping prevented jumptable
entries being created for unused symbols from the regular LTO portion
of the build.
Reviewers: pcc
Reviewed By: pcc
Subscribers: dschuff, mehdi_amini, inglorion, eraman, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47594
llvm-svn: 333768