This does not require -ffast-math, and it gives CSE/GVN more options to
eliminate duplicate expressions in, e.g.:
return ((x + 0.1234 * y) * (x - 0.1234 * y));
Differential Revision: http://reviews.llvm.org/D4904
llvm-svn: 216169
Currently only "add nsw" instructions are widened. This patch eliminates tons of "sext" instructions for 64-bit code (and the corresponding target code) in cases like:
int N = 100;
float **A;
void foo(int x0, int x1)
{
float * A_cur = &A[0][0];
float * A_next = &A[1][0];
for(int x = x0; x < x1; ++x)
{
// Currently only the [x+N] case is widened. The other 2 cases lead to sext.
// This patch fixes it, so all 3 cases do not need sext.
const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
A_next[x] = div;
}
}
...
> clang++ test.cpp -march=core-avx2 -Ofast -fno-unroll-loops -fno-tree-vectorize -S -o -
Differential Revision: http://reviews.llvm.org/D4695
llvm-svn: 216160
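The reason widening is safe for the `x - N` and `x * N` cases can be sketched in a few lines of C++. This is only an illustration, not code from the patch, and the function names are made up. It assumes the 32-bit operation does not overflow (the `nsw` guarantee); under that assumption, extending the narrow result gives the same value as operating on the extended operands, so the per-iteration sign extension is redundant.

```cpp
#include <cassert>
#include <cstdint>

// Index computed in 32 bits and then sign-extended (the narrow form).
// The assert models the nsw precondition: the 32-bit result must not overflow.
int64_t narrowSubThenExtend(int32_t x, int32_t n) {
  int64_t wide = static_cast<int64_t>(x) - static_cast<int64_t>(n);
  assert(wide >= INT32_MIN && wide <= INT32_MAX && "nsw precondition violated");
  return static_cast<int64_t>(static_cast<int32_t>(x - n));
}

// Index computed directly in 64 bits (the widened form, no sext in the loop).
int64_t extendThenSub(int32_t x, int32_t n) {
  return static_cast<int64_t>(x) - static_cast<int64_t>(n);
}

// Same pair for multiplication.
int64_t narrowMulThenExtend(int32_t x, int32_t n) {
  int64_t wide = static_cast<int64_t>(x) * static_cast<int64_t>(n);
  assert(wide >= INT32_MIN && wide <= INT32_MAX && "nsw precondition violated");
  return static_cast<int64_t>(static_cast<int32_t>(x * n));
}

int64_t extendThenMul(int32_t x, int32_t n) {
  return static_cast<int64_t>(x) * static_cast<int64_t>(n);
}
```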
advanced copy optimization.
This is the final step patch toward transforming:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
bx lr
Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1
and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
into simple generic GPR copies that the coalescer managed to remove.
<rdar://problem/12702965>
llvm-svn: 216144
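A toy model may help show why the three `vmov` instructions collapse into plain copies. The sketch below is an assumption-laden illustration, not the peephole optimizer's code: `DReg`, `insertLane`, and `extractBoth` are hypothetical names that model a 64-bit D register as two 32-bit lanes.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Toy model: a 64-bit D register seen as two 32-bit lanes.
struct DReg {
  uint32_t lane[2];
};

// Models "vmov.32 dN[idx], rX": write one GPR into one lane
// (behaves like an INSERT_SUBREG).
DReg insertLane(DReg d, unsigned idx, uint32_t r) {
  assert(idx < 2 && "a D register has two 32-bit lanes");
  d.lane[idx] = r;
  return d;
}

// Models "vmov rX, rY, dN": read both lanes back into GPRs
// (behaves like two EXTRACT_SUBREGs).
std::pair<uint32_t, uint32_t> extractBoth(const DReg &d) {
  return {d.lane[0], d.lane[1]};
}

// The full sequence from the commit message: two inserts, then one extract.
std::pair<uint32_t, uint32_t> roundTrip(uint32_t r0, uint32_t r1) {
  DReg d16{};
  d16 = insertLane(d16, 0, r0);
  d16 = insertLane(d16, 1, r1);
  return extractBoth(d16);
}
```

In this model `roundTrip(r0, r1)` always returns `(r0, r1)`, which is why the sequence can be replaced by GPR-to-GPR copies that the coalescer can then remove.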
If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit the unroll factor, by default to 2, so the critical path only gets increased by one reduction operation.
llvm-svn: 216140
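One way to read the "one reduction operation" remark is sketched below in C++. This is an interpretation, not code from the commit, and `sumInterleavedBy2` is a hypothetical name. With a factor of 2 the reduction is split across two accumulators, and exactly one extra add is needed after the loop to combine them; a larger factor would need more combining operations after every execution of the inner loop.

```cpp
#include <cstddef>
#include <cstdint>

// Sum reduction unrolled by 2 with two independent accumulators.
uint64_t sumInterleavedBy2(const uint32_t *a, std::size_t n) {
  uint64_t acc0 = 0, acc1 = 0;
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    acc0 += a[i];     // chain 0
    acc1 += a[i + 1]; // chain 1, independent of chain 0
  }
  if (i < n)
    acc0 += a[i];     // leftover element when n is odd
  return acc0 + acc1; // the single extra reduction operation after the loop
}
```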
This patch adds a new property: isInsertSubreg and the related target hooks:
TargetInstrInfo::getInsertSubregInputs and
TargetInstrInfo::getInsertSubregLikeInputs to specify that a target specific
instruction is a (kind of) INSERT_SUBREG.
The approach is similar to r215394.
<rdar://problem/12702965>
llvm-svn: 216139
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to
be avoided. This patch favors a simple, quick-to-implement fix at the expense
of performance. As they say: correctness first, then performance.
See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few
ideas on how to make this better.
llvm-svn: 216138
advanced copy optimization.
This patch is a step toward transforming:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
bx lr
Indeed, thanks to this patch, this optimization is able to look through
vmov r0, r1, d16
but it does not understand yet
vmov.32 d16[0], r0
vmov.32 d16[1], r1
Coming patches will fix that and update the related test case.
<rdar://problem/12702965>
llvm-svn: 216136
It makes no sense and can hide bugs. In particular, it led
to a left shift by 64 bits, which is undefined behavior,
properly reported by UBSan.
llvm-svn: 216134
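The title of this commit is cut off, so what exactly was disallowed is not shown here. The general hazard it mentions can still be sketched: shifting a 64-bit value by 64 is undefined in C++, so a mask helper has to treat the full-width case separately. `lowBitsMask` below is a hypothetical helper, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// Returns a mask with the low `bits` bits set, for 0 <= bits <= 64.
uint64_t lowBitsMask(unsigned bits) {
  assert(bits <= 64 && "bit count out of range");
  if (bits == 64)
    return ~uint64_t{0};            // avoid (1 << 64), which is undefined
  return (uint64_t{1} << bits) - 1; // shift amount is in [0, 63] here
}
```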
target hook.
This patch teaches the compiler that:
rX, rY = VMOVRRD dZ
is the same as:
rX = EXTRACT_SUBREG dZ, ssub_0
rY = EXTRACT_SUBREG dZ, ssub_1
<rdar://problem/12702965>
llvm-svn: 216132
This patch adds a new property: isExtractSubreg and the related target hooks:
TargetInstrInfo::getExtractSubregInputs and
TargetInstrInfo::getExtractSubregLikeInputs to specify that a target specific
instruction is a (kind of) EXTRACT_SUBREG.
The approach is similar to r215394.
<rdar://problem/12702965>
llvm-svn: 216130
Store TargetSelectionDAGInfo as a pointer instead of a reference:
getSelectionDAGInfo() may not be implemented for certain backends
(e.g. it's not currently implemented for R600).
This bug is reported by UBSan.
llvm-svn: 216129
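The pointer-versus-reference point can be illustrated with a small sketch. All types here (`SelectionInfo`, `Backend`, `Dag`) are hypothetical stand-ins, not the LLVM classes. Binding a reference to the result of dereferencing a null pointer is undefined behavior even if the reference is never used, whereas a pointer member can simply hold null and be checked before use.

```cpp
#include <cassert>

// Hypothetical stand-in for a per-target info object.
struct SelectionInfo {
  int memcpyThreshold = 16;
};

// Hypothetical backend: the getter may return null when not implemented.
struct Backend {
  const SelectionInfo *info = nullptr;
  const SelectionInfo *getSelectionInfo() const { return info; }
};

class Dag {
  const SelectionInfo *TSI; // pointer member: "not available" is representable

public:
  explicit Dag(const Backend &b) : TSI(b.getSelectionInfo()) {}

  // Uses the info only if the backend provided it.
  int threshold(int fallback) const {
    return TSI ? TSI->memcpyThreshold : fallback;
  }
};
```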
Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648
This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.
This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.
Differential Revision: http://reviews.llvm.org/D4934
llvm-svn: 216121
The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly. The controlling sentence is, "The size of each argument gets
rounded up to eightbytes. Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area." For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.
Patch by Thomas Jablin!
llvm-svn: 216119
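The rounding rule quoted from the ABI amounts to aligning each argument size up to the next multiple of eight bytes. A minimal sketch, with hypothetical helper names that are not taken from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Rounds `size` up to the next multiple of `align`, which must be a power of two.
uint64_t alignTo(uint64_t size, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "align must be a power of two");
  return (size + align - 1) & ~(align - 1);
}

// Stack bytes taken by one argument when sizes are rounded up to eightbytes.
uint64_t eightbyteSlotSize(uint64_t argSize) {
  return alignTo(argSize, 8);
}
```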
Both MachineLoopInfo and MachineDominatorTree may be null in ScheduleDAGMI
constructor call. It is undefined behavior to take references to these values.
This bug is reported by UBSan.
llvm-svn: 216118
Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530.
The problem is that X86ISelLowering erroneously thought the third call
was eligible for tail call elimination.
It would have been if its return value was actually the one returned
by the calling function, but here that is not the case and
additional values are being returned.
Test Plan: Test case from the original bug report is included.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D4968
llvm-svn: 216117
In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker
caused a miscompile because it broke a WAR hazard using a register that it thinks is available
based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from
a kill because they are really just nops.
This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices
array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again.
The test case is a reduced version of the code from the bug report.
Differential Revision: http://reviews.llvm.org/D4977
llvm-svn: 216114
the isRegSequence property.
This is a follow-up to r215394 and r215404, which respectively introduce the
isRegSequence property and use it for ARM.
Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov d0, r2, r3
vmov d1, r0, r1
vmov r0, s0
vmov r1, s2
udiv r0, r1, r0
vmov r1, s1
vmov r2, s3
udiv r1, r2, r1
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast by a generic, more suitable (in terms of register
file), copy.
With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.
The goals of these two optimizations are slightly different: one rewrites the
operand of the instruction (#1), the other kills off the non-generic instruction
and replaces it with a (sequence of) generic instruction(s).
Both optimizations rely on the ValueTracker introduced in r212100.
The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instructions. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This change provides better consistency with
register-related APIs and eases the use of the TargetInstrInfo.
Moreover, this patch introduces a new helper class CopyRewriter used to ease the
rewriting of generic copies (i.e., #1).
Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.
This is related to <rdar://problem/12702965>.
Review: http://reviews.llvm.org/D4874
llvm-svn: 216088
I added wrapping to the CFGPrinter a while back so the -view-cfg
output is actually viewable. I've since encountered very long mangled
names with the same problem, so I'm slightly tweaking this code to
work in that case.
llvm-svn: 216087
This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero-
extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both
operands have to be sign-/zero-extended with separate instructions.
Related to <rdar://problem/17913111>.
llvm-svn: 216073
legalization stage. With those two optimizations, fewer signed/zero extension
instructions can be inserted, and then we can expose more opportunities to
Machine CSE pass in back-end.
llvm-svn: 216066
Summary:
Fixes http://llvm.org/bugs/show_bug.cgi?id=20016 reproducible on new
lea-5.ll case.
Also use RSP/RBP for x32 lea to save 1 byte used for 0x67 prefix in
ESP/EBP case.
Test Plan: lea tests modified to include x32/nacl and new test added
Reviewers: nadav, dschuff, t.p.northover
Subscribers: llvm-commits, zinovy.nis
Differential Revision: http://reviews.llvm.org/D4929
llvm-svn: 216065
LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic.
According to ARM ARM, rbit only takes register as argument, not immediate.
The correct instruction should be rbit <Rd>, <Rm>.
The bug was originally introduced in r211057.
Differential Revision: http://reviews.llvm.org/D4980
llvm-svn: 216064
Because declarations of these functions can appear in places like autoconf
checks, they have to be handled somehow, even though we do not support
vararg custom functions. We do so by printing a warning and calling the
uninstrumented function, as we do for unimplemented functions.
llvm-svn: 216042
Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
class to be used with the zero register. This makes the MachineInstruction
verifier happy again.
This is related to <rdar://problem/18027157>.
llvm-svn: 216040
We can prove that a 'sub' can be a 'sub nsw' under certain conditions:
- The sign bits of the operands are the same.
- Both operands have more than 1 sign bit.
The subtraction cannot be a signed overflow in either case.
llvm-svn: 216037
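Both conditions can be checked exhaustively on a small type. The sketch below uses 8-bit integers and hypothetical helper names; it is not the analysis code from the patch. It assumes "sign bits" means the number of leading bits that equal the sign bit, so "more than 1 sign bit" restricts an 8-bit value to the range [-64, 63].

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit (always at least 1).
unsigned numSignBits(int8_t v) {
  uint8_t u = static_cast<uint8_t>(v);
  unsigned sign = u >> 7;
  unsigned n = 0;
  for (int bit = 7; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
    ++n;
  return n;
}

// True if a - b does not fit in 8 signed bits.
bool subOverflows(int8_t a, int8_t b) {
  int wide = static_cast<int>(a) - static_cast<int>(b);
  return wide < INT8_MIN || wide > INT8_MAX;
}

// Checks every pair of 8-bit values: whenever either condition holds,
// the subtraction must not overflow. Returns false on a counterexample.
bool conditionsImplyNoOverflow() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      int8_t x = static_cast<int8_t>(a);
      int8_t y = static_cast<int8_t>(b);
      bool sameSign = (x < 0) == (y < 0);
      bool bothHaveTwoSignBits = numSignBits(x) > 1 && numSignBits(y) > 1;
      if ((sameSign || bothHaveTwoSignBits) && subOverflows(x, y))
        return false;
    }
  }
  return true;
}
```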
Factor out the ADDS/SUBS instruction emission code into helper functions and
make the helper functions more clever to support most of the different ADDS/SUBS
instructions the architecture supports. This includes better immediate support,
shift folding, and sign-/zero-extend folding.
This fixes <rdar://problem/17913111>.
llvm-svn: 216033
Implement `uselistorder` and `uselistorder_bb` assembly directives,
which allow the use-list order to be recovered when round-tripping to
assembly.
This is the bulk of PR20515.
llvm-svn: 216025
In r216015 I missed propagating `OnlyIfReduced` through the inline
versions of `getGetElementPtr()` (I was relying on compile failures on
mismatches between the header and source signatures to get them all).
llvm-svn: 216023
This adds the missing test that I promised for r215753 to test the
materialization of the floating-point value +0.0.
Related to <rdar://problem/18027157>.
llvm-svn: 216019
Change `ConstantExpr` to follow the model the other constants are using:
only malloc a replacement if it's going to be used. This fixes a subtle
bug where if an API user had used `ConstantExpr::get()` already to
create the replacement but hadn't given it any users, we'd delete the
replacement.
This relies on r216015 to thread `OnlyIfReduced` through
`ConstantExpr::getWithOperands()`.
llvm-svn: 216016
In order to change `ConstantExpr::replaceUsesOfWithOnConstant()` to work
like other constants (e.g., using `ConstantArray::getImpl()`), thread
`OnlyIfReduced` through as necessary. When `OnlyIfReduced` is false,
there's no functionality change. When it's true, if there's no constant
folding or type changes `nullptr` is returned instead of the new
constant.
`ConstantExpr::replaceUsesOfWithOnConstant()` will be updated to use the
"true" version in a follow-up commit.
llvm-svn: 216015
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allow shifts and sign-/zero-extensions to be folded into
the memory operation itself.
For Example:
lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3]
ldr x0, [x0, x1]
sxtw x1, w1
lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3]
ldr x0, [x0, x1]
llvm-svn: 216013
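The two folded forms compute the same effective address as the separate instructions they replace. A minimal sketch of that arithmetic, with hypothetical function names (this models the address computation only, not FastISel):

```cpp
#include <cassert>
#include <cstdint>

// Effective address of "ldr x0, [base, idx, lsl #shift]".
uint64_t addrLsl(uint64_t base, uint64_t idx, unsigned shift) {
  assert(shift < 64);
  return base + (idx << shift);
}

// Effective address of "ldr x0, [base, w_idx, sxtw #shift]":
// the 32-bit index is sign-extended to 64 bits, then shifted.
uint64_t addrSxtw(uint64_t base, int32_t idx, unsigned shift) {
  assert(shift < 64);
  uint64_t extended = static_cast<uint64_t>(static_cast<int64_t>(idx));
  return base + (extended << shift); // unsigned arithmetic wraps modulo 2^64
}
```

For example, a negative 32-bit index of -1 with a shift of 3 addresses the 8-byte element just before `base`.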
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.
Fixes <rdar://problem/17674628>.
llvm-svn: 216012
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
This mostly affects the i64 value type, which always resulted in a 15-byte
movabsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.
This fixes <rdar://problem/17420988>.
llvm-svn: 216010
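The kind of check described here can be sketched as a classification of the immediate. The enum and function below are hypothetical and are not the FastISel code; the sketch relies only on the x86-64 facts that a 32-bit `mov` zero-extends into the full 64-bit register and that a 64-bit `mov` accepts a sign-extended 32-bit immediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the three ways to materialize a 64-bit constant.
enum class MovKind {
  Mov32ZeroExtended,      // 32-bit mov; upper 32 bits become zero
  Mov64SignExtendedImm32, // 64-bit mov with a sign-extended 32-bit immediate
  MovAbs64                // full 64-bit immediate (movabsq)
};

MovKind pickMov(int64_t imm) {
  if (imm >= 0 && imm <= 0xFFFFFFFFLL)
    return MovKind::Mov32ZeroExtended;
  if (imm >= INT32_MIN && imm <= INT32_MAX)
    return MovKind::Mov64SignExtendedImm32;
  return MovKind::MovAbs64;
}
```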
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
This change now materializes the value "0" from the zero register.
The zero register can be folded by several instructions, so no
materialization is needed at all.
Fixes <rdar://problem/17924413>.
llvm-svn: 216009
Note: This was originally reverted to track down a buildbot error. This commit
exposed a latent bug that was fixed in r215753. Therefore it is reapplied
without any modifications.
I ran it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new
regressions.
Original commit message:
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.
On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Some
very funny floating-point materialization could be observed too.
On AArch64 it would materialize the constant 0 in a register even though the
architecture has an actual "zero" register.
On ARM it would generate unnecessary mov instructions or not use mvn.
This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.
Related to <rdar://problem/17420988>.
llvm-svn: 216006
Owning the buffer is somewhat inflexible. Some Binaries have sub Binaries
(like Archive) and we had to create dummy buffers just to handle that. It is
also a bad fit for IRObjectFile where the Module wants to own the buffer too.
Keeping this ownership would make supporting IR inside native objects
particularly painful.
This patch focuses on lib/Object. If something elsewhere used to own a Binary,
now it also owns a MemoryBuffer.
This patch introduces a few new types.
* MemoryBufferRef. This is just a pair of StringRefs for the data and name.
This is to MemoryBuffer as StringRef is to std::string.
* OwningBinary. A combination of Binary and a MemoryBuffer. This is needed
for convenience functions that take a filename and return both the
buffer and the Binary using that buffer.
The C api now uses OwningBinary to avoid any change in semantics. I will start
a new thread to see if we want to change it and how.
llvm-svn: 216002
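The ownership split described above can be sketched with standard-library types. Everything below is a simplified, hypothetical model (C++17, using `std::string_view` in place of StringRef); it is not the lib/Object API. `BufferRef` is a non-owning pair of views, `Binary` only refers to the data, and `OwningBinaryPair` bundles a binary with the buffer that keeps its data alive.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Non-owning view of a buffer: one view for the data, one for the name.
struct BufferRef {
  std::string_view data;
  std::string_view name;
};

// Owning storage for the bytes and the name.
struct Buffer {
  std::string data;
  std::string name;
  BufferRef ref() const { return {data, name}; }
};

// A parsed object that refers to, but does not own, its bytes.
struct Binary {
  BufferRef source;
  explicit Binary(BufferRef r) : source(r) {}
  std::size_t size() const { return source.data.size(); }
};

// Convenience bundle: the buffer is declared first so it outlives the binary.
struct OwningBinaryPair {
  std::unique_ptr<Buffer> buffer;
  std::unique_ptr<Binary> binary;
};

OwningBinaryPair createBinary(std::string contents, std::string name) {
  auto buf = std::make_unique<Buffer>(
      Buffer{std::move(contents), std::move(name)});
  auto bin = std::make_unique<Binary>(buf->ref());
  return {std::move(buf), std::move(bin)};
}
```

Because the `Buffer` lives on the heap, moving the pair does not invalidate the views held by the `Binary`.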
This fixes a few BuildMI callsites where the result register was added by
using addReg, which is per default a use and therefore an operand register.
Also use the zero register as result register when emitting a compare
instruction (SUBS with unused result register).
llvm-svn: 215997
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
It also showed that multiple nodes of the same type, but with different
values, could still coexist, even if only temporarily, and cause confusion if the
next pass got the wrong value.
This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.
llvm-svn: 215994
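The replace-instead-of-append policy can be sketched on a plain list of name/value pairs. The `Hint` type, the `setHint` function, and the hint names used with it are hypothetical; the real mechanism works on loop metadata nodes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one loop hint: a name and an integer value.
using Hint = std::pair<std::string, int>;

// Sets a hint so that at most one entry per name ever exists:
// an existing entry is overwritten, otherwise a new one is appended.
void setHint(std::vector<Hint> &hints, const std::string &name, int value) {
  for (Hint &h : hints) {
    if (h.first == name) {
      h.second = value; // replace the existing value in place
      return;
    }
  }
  hints.emplace_back(name, value);
}
```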
* Use StringRef instead of std::string&
* Return a std::unique_ptr<Module> instead of taking an optional module to write
to (was not really used).
* Use current comment style.
* Use current naming convention.
llvm-svn: 215989
I should have included this as part of r215986, which worked around this
corner by changing ArrayRef::equals() not to use std::equal. Alas.
llvm-svn: 215988
This reverts commit r215981, which reverted the above commits because
MSVC std::equal asserts on nullptr iterators, and these commits
introduced an `ArrayRef::equals()` on empty ArrayRefs.
ArrayRef was changed not to use std::equal in r215986.
llvm-svn: 215987
MSVC's STL has a bug in `std::equal()`: it asserts on nullptr iterators,
causing a block revert in r215981. This works around that by re-writing
`ArrayRef::equals()` to do the work itself.
llvm-svn: 215986
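A hand-written comparison of the kind described might look like the sketch below. `ArrayView` is a hypothetical minimal type, not the real `ArrayRef`; the point is only that a length check followed by an index loop never touches the data pointers of two empty views, so null pointers are harmless.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal non-owning array view.
template <typename T> struct ArrayView {
  const T *data;
  std::size_t length;

  // Element-wise comparison that does the work itself instead of calling
  // std::equal; with length == 0 the loop body never runs.
  bool equals(ArrayView rhs) const {
    if (length != rhs.length)
      return false;
    for (std::size_t i = 0; i != length; ++i)
      if (!(data[i] == rhs.data[i]))
        return false;
    return true;
  }
};
```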
Summary:
This directive is similar to ".set mipsX".
It is used to change the CPU target of the assembler, enabling it to accept instructions for a specific CPU.
This patch only implements the r4000 CPU (which is treated internally as generic mips3) and the generic ISAs.
Contains work done by Matheus Almeida.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D4884
llvm-svn: 215978
Avoid creating a new `ConstantVector` on an RAUW of one of its members.
This reduces RAUW traffic on any containing constant.
This is part of PR20515.
llvm-svn: 215966
Previously, `ConstantArray::replaceUsesOfWithOnConstant()` neglected to
check whether it becomes a `ConstantDataArray`. Call
`ConstantArray::getImpl()` to check for that.
llvm-svn: 215965
Introduce `getImpl()` that tries the simplification logic from `get()`
and then gives up. This allows the logic to be reused elsewhere in a
follow-up commit.
llvm-svn: 215963
Avoid RAUW-ing `ConstantExpr` when an operand changes unless the new
`ConstantExpr` already has users. This prevents the RAUW from rippling
up the expression tree unnecessarily.
This commit indirectly adds test coverage for r215953 (this is how I
came across the bug).
This is part of PR20515.
llvm-svn: 215960
Rewrite `ConstantUniqueMap` to be more similar to
`ConstantAggrUniqueMap`.
- Use a `DenseMap` with custom MapInfo instead of a `std::map` with
linear lookups and deletion.
- Don't waste memory explicitly storing (heavyweight) keys.
Only `ConstantExpr` and `InlineAsm` actually use this data structure, so
I also updated them to use it.
This code cleanup is a precursor to reducing RAUW traffic on
`ConstantExpr` -- I felt badly adding a new (linear) call to
`ConstantUniqueMap::FindExistingKey`, so this designs away the concern.
A follow-up commit will transition the users of `ConstantAggrUniqueMap`
over.
llvm-svn: 215957
This code had a homemade RAUW that was incorrect when a user was a
constant: instead of calling `replaceUsesOfWithOnConstant()` it would
incorrectly update the operand in-place, invalidating
`LLVMContextImpl::ExprConstants`. RAUW does the job better.
The ValueHandle that `GVMap` is holding onto needs to be removed first,
so this commit also removes each variable from the map on-the-fly.
Since deletions from `ExprConstants` use a linear search that compares
directly on the pointer value (instead of using the key), there isn't an
obvious way to expose this with a testcase.
llvm-svn: 215953
Previously all `blockaddress()` constants were treated as forward
references. They were resolved twice: once at the end of the function
in question, and again at the end of the module. Furthermore, if the
same blockaddress was referenced N times, the parser created N distinct
`GlobalVariable`s (one for each reference).
Instead, resolve all block addresses at the beginning of the function,
creating the standard `BasicBlock` forward references used for all other
basic block references. After the function, all references can be
resolved immediately. To check for the condition of parsing block
addresses from within the same function, I created a reference to the
current per-function-state in `BlockAddressPFS`.
Also, create only one forward-reference per basic block. Because
forward references to block addresses are rare, the data structure here
shouldn't matter. If somehow it does someday, this can be pretty easily
changed to a `DenseMap<std::pair<ValID, ValID>, GV>`.
This is part of PR20515.
llvm-svn: 215952
Call `verifyModule()` after parsing and after every transformation.
Also convert some `DEBUG(dbgs())` to `errs()` to increase visibility
into what's going on.
llvm-svn: 215951
- add check for volatile (probably unneeded, but I agree that we should be conservative about it).
- strengthen condition from isUnordered() to isSimple(), as I don't understand Unordered semantics well enough to be confident in the previous behaviour (it also matches the comment better this way). Thanks for catching that one, I had missed the Monotonic/Unordered case.
- separate a condition in two.
- lengthen comment about aliasing and loads
- add tests in GVN/atomic.ll
llvm-svn: 215943
file with -macho, the Mach-O specific object file parser option.
After some discussion I chose to do this implementation contained in the logic
of llvm-objdump’s MachODump.cpp using a second disassembler for thumb when
needed and with updates mostly contained in the MachOObjectFile class.
llvm-svn: 215931
Summary:
Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends
These helper functions are introduced in D4844.
Depends D4844
Test Plan: make check-all passes
Reviewers: jfb
Subscribers: aemerson, llvm-commits, mcrosier, reames
Differential Revision: http://reviews.llvm.org/D4937
llvm-svn: 215902
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.
llvm-svn: 215890
ARM in particular is getting dangerously close to exceeding 32 bits worth of
possible subtarget features. When this happens, various parts of MC start to
fail inexplicably as masks get truncated to "unsigned".
Mostly just refactoring at present, and there's probably no way to test.
llvm-svn: 215887