Summary:
These 4 patterns have the same one use check repeated twice for each. Once without a cast and one with. But the cast has no effect on what method is called.
For the OR case I believe it is always profitable regardless of the number of uses since we'll never increase the instruction count.
For the AND case I believe it is profitable if the pair of xors has one use such that we'll get rid of it completely. Or if the C value is something freely invertible, in which case the not doesn't cost anything.
Reviewers: spatel, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34308
llvm-svn: 305705
Summary:
Currently we don't try to do anything with vector xors.
This patch adds support for removing duplicate pairs from a chain of vector xors as its pretty easy to support. We still dont' try to combine the xors with and/ors, but I might try that in a future patch.
Reviewers: mcrosier, davide, resistor
Reviewed By: mcrosier
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34338
llvm-svn: 305704
A new Gfx9 dasm test added with approx 29000 cases.
Existing tests extended by (approx.):
* Gfx7 asm: 5000 test cases
* Gfx8 asm: 5000 test cases
* Gfx9 asm: 14400 test cases
* Gfx8 dasm: 5200 test cases
llvm-svn: 305702
As all store merges checks are based on the memory operation
performed, allow use of truncated stores and extended loads as valid
input candidates for merging.
Relanding after fixing selection between truncated and normal store.
llvm-svn: 305701
Summary:
After a single predecessor is merged into a basic block, we need to invalidate
the LVI information for the new merged block, when LVI is not provably true for
all of instructions in the new block.
The test cases added show the correct LVI information using the LVI printer
pass.
Reviewers: reames, dberlin, davide, sanjoy
Reviewed by: dberlin, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34108
llvm-svn: 305699
Summary: use AA to tell whether a load can be moved before a call that writes to memory.
Reviewers: dberlin, davide, sanjoy, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D34115
llvm-svn: 305698
Summary: Implement some of the simplest addressing modes.It should help to test ABI.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33888
llvm-svn: 305691
Use llvm::make_unique to avoid ambiguity with MSVC.
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Differential Revision: https://reviews.llvm.org/D34144
llvm-svn: 305690
Summary:
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Reviewers: craig.topper, evandro, t.p.northover, atrick, MatzeB
Reviewed By: MatzeB
Subscribers: atrick, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34144
llvm-svn: 305677
Add support throughout the pipeline:
- mark as legal for s32 and pointers
- map to GPRs
- lower to a sequence of instructions, which moves 0 or 1 into the
result register based on the flags set by a CMPrr
We have copied from FastISel a helper function which maps CmpInst
predicates into ARMCC codes. Ideally, we should be able to move it
somewhere that both FastISel and GlobalISel can use.
llvm-svn: 305672
Current implementation of SCEVExpander demonstrates a very naive behavior when
it deals with power calculation. For example, a SCEV for x^8 looks like
(x * x * x * x * x * x * x * x)
If we try to expand it, it generates a very straightforward sequence of muls, like:
x2 = mul x, x
x3 = mul x2, x
x4 = mul x3, x
...
x8 = mul x7, x
This is a non-efficient way of doing that. A better way is to generate a sequence of
binary power calculation. In this case the expanded calculation will look like:
x2 = mul x, x
x4 = mul x2, x2
x8 = mul x4, x4
In some cases the code size reduction for such SCEVs is dramatic. If we had a loop:
x = a;
for (int i = 0; i < 3; i++)
x = x * x;
And this loop have been fully unrolled, we have something like:
x = a;
x2 = x * x;
x4 = x2 * x2;
x8 = x4 * x4;
The SCEV for x8 is the same as in example above, and if we for some reason
want to expand it, we will generate naively 7 multiplications instead of 3.
The BinPow expansion algorithm here allows to keep code size reasonable.
This patch teaches SCEV Expander to generate a sequence of BinPow multiplications
if we have repeating arguments in SCEVMulExpressions.
Differential Revision: https://reviews.llvm.org/D34025
llvm-svn: 305663
Merge the functionality into the random access type collection.
This class was only being used in 2 places, so getting rid of it
simplifies the code.
llvm-svn: 305653
Summary:
This allows strlen to be moved out of the loop in case its argument is
not modified in the loop in LICM.
Reviewers: hfinkel, davide, sanjoy, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34323
llvm-svn: 305641
No behavior is changed if LLVM_TARGET_TRIPLE_ENV is blank or undefined.
If LLVM_TARGET_TRIPLE_ENV is "TEST_TARGET_TRIPLE" and $TEST_TARGET_TRIPLE is not blank,
llvm::sys::getDefaultTargetTriple() returns $TEST_TARGET_TRIPLE.
Lit resets config.target_triple and config.environment[LLVM_TARGET_TRIPLE_ENV] to change the default target.
Without changing LLVM_DEFAULT_TARGET_TRIPLE nor rebuilding, lit can be run;
TEST_TARGET_TRIPLE=i686-pc-win32 bin/llvm-lit -sv path/to/test/
TEST_TARGET_TRIPLE=i686-pc-win32 ninja check-clang-tools
Differential Revision: https://reviews.llvm.org/D33662
llvm-svn: 305632
Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse
to place spills as the very first instruciton of a basic block and thus
artifically increase pressure (test in
test/CodeGen/PowerPC/scavenging.mir:spill_at_begin)
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 305625
Summary:
This is my misunderstanding on isBarrier. It's not for memory barriers,
but for other control flow purposes. lwsync doesn't have it either.
This fixes a simple crash with -verify-machineinstrs like below:
define void @Foo() {
entry:
%tmp = load atomic i64, i64* undef acquire, align 8
unreachable
}
I deliberately don't want to check in the test, since there is little
chance to regress on such a mistake. Such a test adds noise to the code
base.
I plan to check in first, since it fixes a crash, and the fix is obvious.
Reviewers: kbarton, echristo
Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D34314
llvm-svn: 305624