Without explicit dependencies, both per-file action and in-CommonTableGen action could run in parallel.
It races to emit *.inc files simultaneously.
llvm-svn: 187780
This patch fixes the multiple breakages on ARM test-suite after the SLP
vectorizer was introduced by default on O3. The problem was an illegal
vector type on ARMTTI::getCmpSelInstrCost() <3 x i1> which is not simple.
The guard protects this code from breaking (cause of the problems) but
doesn't fix the issue that is generating the odd vector in the first
place, which also needs to be investigated.
llvm-svn: 187658
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.
llvm-svn: 187618
While the .td entry is nice and all, it takes a pretty gross hack in
ARMAsmParser::ParseInstruction() because of handling of other "subs"
instructions to get it to match. Ran it by Jim Grosbach and he said it was
about what he expected to make this work given the existing code.
rdar://14214063
llvm-svn: 187530
When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the
bitwidth of the second operands to both ands match before comparing the negation
of the values.
Split the check of the value of the second operands to the ands. Move the cast
and variable declaration slightly higher to make it slightly easier to follow.
Bug-Id: 16700
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
llvm-svn: 187404
do in the SDag when lowering references to the GOT: use
ARMConstantPoolSymbol rather than creating a dummy global variable. The
computation of the alignment still feels weird (it uses IR types and
datalayout) but it preserves the exact previous behavior. This change
fixes the memory leak of the global variable detected on the valgrind
leak checking bot.
Thanks to Benjamin Kramer for pointing me at ARMConstantPoolSymbol to
handle this use case.
llvm-svn: 187303
me) should start watching this bot more as its catching lots of bugs.
The fix here is to not construct the global if we aren't going to need
it. That's cheaper anyways, and globals have highly predictable types in
practice. I've added an assert to catch skew between our manual testing
of the type and the actual type just for paranoia's sake.
Note that this pattern is actually fine in most globals because when you
build a global with a module it automatically is moved to be owned by
that module. But here, we're in isel and don't really want to do that.
The solution of not creating a global is simpler anyways.
llvm-svn: 187302
When vectors are built from a single value, the ARM lowering issues a
scalar_to_vector node.
This node is then always morphed into a move from the general purpose unit to
the vector unit.
When the value comes from a load, this can be simplified into a vector load to
the right lane.
This patch changes the lowering of insert_vector_elt to expose a vector
friendly pattern in this situation.
This is a step toward fixing <rdar://problem/14170854>.
llvm-svn: 186999
instructions. With this patch:
1. ldr.n is recognized as mnemonic for the short encoding
2. ldr.w is recognized as menmonic for the long encoding
3. ldr will map to either short or long encodings depending on the size of the offset
llvm-svn: 186831
After Ulrich's r180677 (thanks!) TableGen is intelligent enough to
handle tied constraints involving complex operands properly, so
virtually all of the ARM custom converters are now unnecessary.
llvm-svn: 186810
indirect branches correctly. Under some circumstances, this led to the deletion
of basic blocks that were the destination of indirect branches. In that case it
left indirect branches to nowhere in the code.
This patch replaces, and is more general than either of the previous fixes for
indirect-branch-analysis issues, r181161 and r186461.
For other branches (not indirect) this refactor should have *almost* identical
behavior to the previous version. There are some corner cases where this
refactor is able to analyze blocks that the previous version could not (e.g.
this necessitated the update to thumb2-ifcvt2.ll).
<rdar://problem/14464830>
llvm-svn: 186735
My patch 'r183551 - ARM FastISel integer sext/zext improvements' was incorrect when emitting ARM register-immediate ASR, LSL, LSR instructions: they are pseudo-instructions in ARMInstrInfo.td and I should have used MOVsi instead.
This is not an issue when code is generated through a .s file, but is an issue when generated straight to a .o (-filetype=obj).
llvm-svn: 186489
block. Blocks that have an indirect branch terminator, even if it's not the
last terminator, should still be treated as unanalyzable.
<rdar://problem/14437274>
Reducing a useful regression test case is proving difficult - I hope to have
one soon.
llvm-svn: 186461
This adds an instruction alias to make the assembler recognize the alternate literal form: pli [PC, #+/-<imm>]
See A8.8.129 in the ARM ARM (DDI 0406C.b).
Fixes <rdar://problem/14403733>.
llvm-svn: 186459
We'd forgotten to provide string representations for the special ARMISD atomic
nodes; this adds them in. No effect on CodeGen, just makes the output of
"-view-whatever-dags" slightly more readable.
llvm-svn: 186406
This patch enables calls to __aeabi_idivmod when in EABI mode,
by using the remainder value returned on registers (R1),
enabled by the ARM triple "none-eabi". Note that Darwin and
GNUEABI triples will continue lowering on GNU style, that is,
using the stack for the remainder.
Still need to add SREM/UREM support fix for 64-bit lowering.
llvm-svn: 186390
ARM paired GPR COPY was being lowered to two MOVr without CC. This
patch puts the CC back.
My test is a reduction of the case where I encountered the issue,
64-bit atomics use paired GPRs.
The issue only occurs with selectionDAG, FastISel doesn't encounter it
so I didn't bother calling it.
llvm-svn: 186226
Fixes a 35% degradation compared to unvectorized code in
MiBench/automotive-susan and an equally serious regression on a private
image processing benchmark.
radar://14351991
llvm-svn: 186188
Address calculation for gather/scather in vectorized code can incur a
significant cost making vectorization unbeneficial. Add infrastructure to add
cost.
Tests and cost model for targets will be in follow-up commits.
radar://14351991
llvm-svn: 186187
Currently ARM is the only backend that supports FMA instructions (for at least some subtargets) but does not implement this virtual, so FMAs are never generated except from explicit fma intrinsic calls. Apparently this is due to the fact that it supports both fused (one rounding step) and unfused (two rounding step) multiply + add instructions. This patch clarifies that this the case without changing behavior by implementing the virtual function to simply return false, as the default TargetLoweringBase version does.
It is possible that some cpus perform the fused version faster than the unfused version and vice-versa, so the function implementation should be revisited if hard data is found.
llvm-svn: 185994
Propagate the fix from r185712 to Thumb2 codegen as well. Original
commit message applies here as well:
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and
packs them in the bottom half of "x". An arithmetic and logic shift are
only equivalent in this context if the shift amount is 16. We would be
shifting in ones into the bottom 16bits instead of zeros if "y" is
negative.
rdar://14338767
llvm-svn: 185982