r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes.
llvm-svn: 317748
Adds the blacklist behaviour to llvm-cfi-verify. Now will calculate which lines caused expected failures in the blacklist and reports the number of affected indirect CF instructions for each blacklist entry.
Also moved DWARF checking after instruction analysis to improve performance significantly - unrolling the inlining stack is expensive.
Reviewers: vlad.tsyrklevich
Subscribers: aprantl, pcc, kcc, llvm-commits
Differential Revision: https://reviews.llvm.org/D39750
llvm-svn: 317743
InMemoryBuffer and OnDiskBuffer classes have both factory methods and
public constructors, and that looks a bit odd. This patch makes factory
methods non-member function to fix it.
Differential Revision: https://reviews.llvm.org/D39693
llvm-svn: 317739
In Rust, a trait can be implemented for any type, and if a trait
object pointer is used for the type, then a virtual table will be
emitted for that trait/type combination.
We would like debuggers to be able to inspect trait objects, which
requires finding the concrete type associated with a given vtable.
This patch changes LLVM so that any type can be passed to
replaceVTableHolder. This allows the Rust compiler to emit the needed
debug info -- associating a vtable with the concrete type for which it
was emitted.
This is a DWARF extension: DWARF only specifies the meaning of
DW_AT_containing_type in one specific situation. This style of DWARF
extension is routine, though, and LLVM already has one such case for
DW_AT_containing_type.
Patch by Tom Tromey!
Differential Revision: https://reviews.llvm.org/D39503
llvm-svn: 317730
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].
Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.
As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.
[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html
Differential Revision: https://reviews.llvm.org/D38336
llvm-svn: 317729
This reverts r317579, originally committed as r317100.
There is a design issue with marking CFI instructions duplicatable. Not
all targets support the CFIInstrInserter pass, and targets like Darwin
can't cope with duplicated prologue setup CFI instructions. The compact
unwind info emission fails.
When the following code is compiled for arm64 on Mac at -O3, the CFI
instructions end up getting tail duplicated, which causes compact unwind
info emission to fail:
int a, c, d, e, f, g, h, i, j, k, l, m;
void n(int o, int *b) {
if (g)
f = 0;
for (; f < o; f++) {
m = a;
if (l > j * k > i)
j = i = k = d;
h = b[c] - e;
}
}
We get assembly that looks like this:
; BB#1: ; %if.then
Lloh3:
adrp x9, _f@GOTPAGE
Lloh4:
ldr x9, [x9, _f@GOTPAGEOFF]
mov w8, wzr
Lloh5:
str wzr, [x9]
stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill
.cfi_def_cfa_offset 16
.cfi_offset w19, -8
.cfi_offset w20, -16
cmp w8, w0
b.lt LBB0_3
b LBB0_7
LBB0_2: ; %entry.if.end_crit_edge
Lloh6:
adrp x8, _f@GOTPAGE
Lloh7:
ldr x8, [x8, _f@GOTPAGEOFF]
Lloh8:
ldr w8, [x8]
stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill
.cfi_def_cfa_offset 16
.cfi_offset w19, -8
.cfi_offset w20, -16
cmp w8, w0
b.ge LBB0_7
LBB0_3: ; %for.body.lr.ph
Note the multiple .cfi_def* directives. Compact unwind info emission
can't handle that.
llvm-svn: 317726
- This deprecates LLVM_ENABLE_IR_PGO but keeps it around for now.
- Errors out when LLVM_BUILD_INSTRUMENTED and LLVM_BUILD_INSTRUMENTED_COVERAGE
are both set.
Motivated by bogner's post-commit review of r313770.
llvm-svn: 317725
Previously, hasSideEffects was ? for TargetOpcode::PHI and would be inferred
as 1. D37065 sets the previously inferred properties explicitly. This patch sets
hasSideEffects=0 for PHI, as it is for G_PHI. MachineInstr::isSafeToMove has
been updated so it still returns false for PHI.
Additionally, HexagonBitSimplify relied on a PHI node having the
hasUnmodeledSideEffects property. This patch fixes that assumption.
Differential Revision: https://reviews.llvm.org/D37097
llvm-svn: 317721
We were calling tryFoldLoad with the 'and' node was the root and parent node of the load. But the parent of the load should be the shift that proceeds the and. While the and node is correctly the root node.
To fix this I had to make tryFoldLoad take a separate use and root input. I've added a convenience version with the old signature to avoid updating the other call sites.
llvm-svn: 317720
Summary:
In ThinLTO compilation, we exit populateModulePassManager early and
were not adding PM extension passes meant to run at the end of the
pipeline. This includes sanitizer passes. Add these passes before
the early exit.
A test will be added to projects/compiler-rt.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D39565
llvm-svn: 317714
Without this we can't parse gather instructions in ms inline asm blocks. The validateInstruction function was introduced in r316700 to check gather constraints.
llvm-svn: 317713
Previously, an "r" constraint would mean the compiler provides a value
on WebAssembly's operand stack. This was tricky to use properly,
particularly since it isn't possible to declare a new local from within
an inline asm string.
With this patch, "r" provides the value in a WebAssembly local, and the
local index is provided to the inline asm string. This requires inline
asm to use get_local and set_local to read the register. This does
potentially result in larger code size, however inline asm should
hopefully be quite rare in WebAssembly.
This also means that the "m" constraint can no longer be supported, as
WebAssembly has nothing like a "memory operand" that includes an
implicit get_local.
This fixes PR34599 for the wasm32-unknown-unknown-wasm target (though
not for the ELF target).
llvm-svn: 317707
Summary:
This fragment emits a symbol ID and will be useful for more than just Safe SEH
tables (e.g., I plan to re-use it for Control Flow Guard tables). This is
simply a rename refactor.
Reviewers: rnk
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39770
llvm-svn: 317703
In 2010 a commit with no testcase and no further explanation
explicitly disabled the handling of inlined variables in
EmitFuncArgumentDbgValue(). I don't think there is a good reason for
this any more and re-enabling this adds debug locations for variables
associated with an LLVM function argument in functions that are
inlined into the first basic block. The only downside of doing this is
that we may insert a DBG_VALUE before the inlined scope, but (1) this
could be filtered out later, and (2) LiveDebugValues will not
propagate it into subsequent basic blocks if they don't dominate the
variable's lexical scope, so this seems like a small price to pay.
rdar://problem/26228128
llvm-svn: 317702
These will be using inline asm to ensure we have coverage that we're unlikely to get from lowering of basic ir.
Currently waiting for D39728 to land to add support for scheduler comments for inline asm.
llvm-svn: 317698
This was once needed so that multiple tablegen binaries don't compile
the library concurrently. However, this isn't needed anymore since
adding USES_TERMINAL to the custom_command.
This is supported by the fact that the target was only building
LLVMSupport since some cleanups a year ago. If this dependency had
really been needed, we would have seen complaints.
Differential Revision: https://reviews.llvm.org/D39299
llvm-svn: 317695
CMake does a poor job in tracking dependencies on files and directories
directly. Create custom target similar to the configuration step.
On my system, this avoids the reconfiguration on each build.
Differential Revision: https://reviews.llvm.org/D39298
llvm-svn: 317694
This should be a trivial change, and I've started using it for generating all
tests at https://github.com/lowrisc/riscv-llvm (i.e. it's been tested in
action quite a lot). Note that the regex does not attempt to match
.cfi_startproc, as I want to ensure compatibility with functions that have the
nounwind attribute.
Differential Revision: https://reviews.llvm.org/D39789
llvm-svn: 317693
Note that this is just enough for simple function call examples to generate
working code. Support for varargs etc follows in future patches.
Differential Revision: https://reviews.llvm.org/D29936
llvm-svn: 317691
A good portion of this patch is the extra functions that needed to be
implemented to support the test case. e.g. storeRegToStackSlot,
loadRegFromStackSlot, eliminateFrameIndex.
Setting ISD::BR_CC to Expand may appear non-obvious on an architecture with
branch+cmp instructions. However, I found it much easier to deal with matching
the expanded form.
I had to change simm13_lsb0 and simm21_lsb0 to inherit from the
Operand<OtherVT> class rather than Operand<i32> in order to keep tablegen
happy. This isn't a big deal, but it does seem a shame to lose the uniformity
across immediate types when there's not an obvious benefit (I'm hoping a
tablegen expert will educate me on what I'm missing here!).
Differential Revision: https://reviews.llvm.org/D29935
llvm-svn: 317690
This required the implementation of RISCVTargetInstrInfo::copyPhysReg. Support
for lowering global addresses follow in the next patch.
Differential Revision: https://reviews.llvm.org/D29934
llvm-svn: 317685
There are cases when we have to merge TBAA access tags with the
same base access type, but different final access types. For
example, accesses to different members of the same structure may
be vectorized into a single load or store instruction. Since we
currently assume that the tags to merge always share the same
final access type, we incorrectly return a tag that describes an
access to one of the original final access types as the generic
tag. This patch fixes that by producing generic tags for the
common type and not the final access types of the original tags.
Resolves:
PR35225: Wrong tbaa metadata after load store vectorizer due to
recent change
https://bugs.llvm.org/show_bug.cgi?id=35225
Differential Revision: https://reviews.llvm.org/D39732
llvm-svn: 317682
Previously these pseudo instructions were not guarded by ISA, so their
select was dependant on the ordering of the entries in the DAG matcher.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39723
llvm-svn: 317681
My fix is conservative and will make us return may-alias instead.
The test case is:
check(gep(x, 0), n, gep(x, n), -1) with n == sizeof(x)
Here, the first value accesses the whole object, but the second access
doesn't access anything. The semantics of -1 is read until the end of the
object, which in this case means read nothing.
No test case, since isn't trivial to exploit this one, but I've proved it correct.
llvm-svn: 317680
rL162640 introduced CodeGenTarget::guessInstructionProperties. If a target
sets guessInstructionProperties=0 in its FooInstrInfo, tablegen will error if
it has to guess properties from patterns. Unfortunately,
guessInstructionProperties=0 can't be used with current upstream LLVM as
instructions in the TargetOpcode namespace are always included and sometimes
have inferred properties for mayLoad, mayStore, and hasSideEffects. This patch
provides the simplest possible fix to this problem, setting default values for
these fields in the TargetOpcode scope. There is no intended functional
change, as the explicitly set properties should match what was previously
inferred. A number of the instructions had hasSideEffects=1 inferred
unintentionally. This patch makes it explicit, while future patches (such as
D37097) correct the property.
Differential Revision: https://reviews.llvm.org/D37065
llvm-svn: 317674
Some of the AMDGPU stack addressing modes require knowing the sign
bit is zero. We used to accomplish this by custom lowering
frame indexes, and then putting an AssertZext around a
TargetFrameIndex. This required specifically looking for
the AssextZext + frame index pattern which was moderately
disgusting. The same could probably be accomplished
with a target specific node, but would still
require special handling of frame indexes.
llvm-svn: 317671
This patch enables the folding of address computation in
memory instruction in case adress is represented by Phi node.
The inputs of Phi node might be different in base register.
Differential Revision: https://reviews.llvm.org/D36073
llvm-svn: 317665
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.
llvm-svn: 317647
Summary:
This change allows yaml input to control the order of implicitly added sections
(`.symtab`, `.strtab`, `.shstrtab`). The order is controlled by adding a
placeholder section of the given name to the Sections field.
This change is to support changes in D39582, where it is desirable to control
the location of the `.dynsym` section.
This reapplied version fixes:
1. use of a function call within an assert
2. failing lld test which has an unnamed section
Additionally, one more test to cover the unnamed section failure.
Reviewers: compnerd, jakehehrlich
Reviewed By: jakehehrlich
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39749
llvm-svn: 317646
Summary:
This just seems to have been an oversight. We already supported the f64
atomic add with an explicit scope (e.g. "cta"), but not the scopeless
version.
Reviewers: tra
Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39638
llvm-svn: 317623
Summary:
This change allows yaml input to control the order of implicitly added sections
(`.symtab`, `.strtab`, `.shstrtab`). The order is controlled by adding a
placeholder section of the given name to the Sections field.
This change is to support changes in D39582, where it is desirable to control
the location of the `.dynsym` section.
Reviewers: compnerd, jakehehrlich
Reviewed By: jakehehrlich
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39749
llvm-svn: 317622
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.
Fixed PR34619 and other issues related to previous commit.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
Reviewed By: ABataev, RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 317618
Summary:
Extends SCL functionality to allow users to find the line number in the file the SCL is built from through SpecialCaseList::inSectionBlame(...).
Also removes the need to compile the SCL before use. As the matcher now contains a list of regexes to test against instead of a single regex, the regexes can be individually built on each insertion rather than one large compilation at the end of construction.
This change also fixes a bug where blank lines would cause the parser to become out-of-sync with the line number. An error on line `k` was being reported as being on line `k - num_blank_lines_before_k`.
Note: This change has a cyclical dependency on D39486. Both these changes must be submitted at the same time to avoid a build breakage.
Reviewers: vlad.tsyrklevich
Reviewed By: vlad.tsyrklevich
Subscribers: kcc, pcc, llvm-commits
Differential Revision: https://reviews.llvm.org/D39485
llvm-svn: 317617
The hexagon test should be fixed now.
Original commit message:
This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.
This can allow us to get the select closer to other selects to enable removing one.
Differential Revision: https://reviews.llvm.org/D39222
llvm-svn: 317600
An "or" that sets the sign-bit can be replaced with a "xor", if
the sign-bit was known to be clear before. With some changes to
instruction combining, the simple sign-bit check was failing.
Replace it with a more flexible one to catch more cases.
llvm-svn: 317592
Patch [5/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.
Patch by Sander De Smalen.
Reviewed by: rengolin
Differential Revision: https://reviews.llvm.org/D39091
llvm-svn: 317591
Patch [3/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.
To summarise, this patch adds:
* SVE register definitions
* Methods to parse SVE register operands
* Methods to print SVE register operands
* RegKind SVEDataVector to distinguish it from other data types like scalar register or Neon vector.
* k_SVEDataRegister and SVEDataRegOp to describe SVE registers (which will be extended by further patches with e.g. ElementWidth and the shift-extend type).
Patch by Sander De Smalen.
Reviewed by: rengolin
Differential Revision: https://reviews.llvm.org/D39089
llvm-svn: 317590
Patch [4/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.
We add SVE as unsupported feature for CPUs that don't have SVE to prevent errors from scheduler models saying it lacks information for these instructions.
Patch by Sander De Smalen.
Reviewed by: rengolin
Differential Revision: https://reviews.llvm.org/D39090
llvm-svn: 317582
Reland r317100 with minor fix regarding ComputeCommonTailLength function in
BranchFolding.cpp. Skipping top CFI instructions block needs to executed on
several more return points in ComputeCommonTailLength().
Original r317100 message:
"Correct dwarf unwind information in function epilogue for X86"
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Changed CFI instructions so that they:
- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal
Added CFIInstrInserter pass:
- analyzes each basic block to determine cfa offset and register valid at
its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
llvm-svn: 317579
Summary:
The cost calculation for default case on X86 target does not always
follow correct wayt because of missing 4-th argument in
`BaseT::getCastInstrCost()` call. Added this missing parameter.
Reviewers: hfinkel, mkuper, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39687
llvm-svn: 317576
Summary:
This makes it very easy to test files that only differ in a constant
value somewhere in the test case.
Reviewers: jlebar, hfinkel, chandlerc, probinson
Reviewed By: probinson
Subscribers: probinson, llvm-commits
Differential Revision: https://reviews.llvm.org/D39629
llvm-svn: 317572
Patch [2/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions.
This change is a non functional change that adds RegKind as an alternative to 'isVector' to prepare it for newer types (SVE data vectors and predicate vectors) that will be added in next patches (where the SVE data vector is added as part of this patch set)
Patch by Sander De Smalen.
Reviewed by: rengolin
Differential Revision: https://reviews.llvm.org/D39088
llvm-svn: 317569
Patch [1/5] in a series to add assembler/disassembler support for AArch64 SVE
unpredicated ADD/SUB instructions.
Patch by Sander De Smalen.
Reviewed by: rengolin
Differential Revision: https://reviews.llvm.org/D39087
llvm-svn: 317564
This changes the interface of how targets describe how to legalize, see
the below description.
1. Interface for targets to describe how to legalize.
In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.
For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel worked before. For example, for a
target that supports 32 and 64 bit adds natively:
for (auto Ty : {s32, s64})
setAction({G_ADD, 0, s32}, Legal);
or for a target that needs a library call for a 32 bit division:
setAction({G_SDIV, s32}, Libcall);
The main conceptual change to the LegalizerInfo API, is in specifying
how to legalize the type sizes for which a change of size is needed. For
example, in the above example, how to specify how all types from i1 to
i8388607 (apart from s32 and s64 which are legal) need to be legalized
and expressed in terms of operations on the available legal sizes
(again, i32 and i64 in this case). Before, the implementation only
allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar). A worse limitation was that if you'd wanted to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.
Instead, the legalization actions that need to change the size of the
type are specified now using a "SizeChangeStrategy". For example:
setLegalizeScalarToDifferentSizeStrategy(
G_ADD, 0, widenToLargerAndNarrowToLargest);
This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.
Another example, taken from the ARM backend is:
for (unsigned Op : {G_SDIV, G_UDIV}) {
setLegalizeScalarToDifferentSizeStrategy(Op, 0,
widenToLargerTypesUnsupportedOtherwise);
if (ST.hasDivideInARMMode())
setAction({Op, s32}, Legal);
else
setAction({Op, s32}, Libcall);
}
For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).
The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:
setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal);
setLegalizeVectorElementToDifferentSizeStrategy(
G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);
As currently implemented here, vector types are legalized by first
making the vector element size legal, followed by then making the number
of lanes legal. The strategy to follow in the first step is set by a
call to setLegalizeVectorElementToDifferentSizeStrategy, see example
above. The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see code for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.
Therefore, for the above specification, some example legalizations are:
* getAction({G_ADD, LLT::vector(3, 3)})
returns {WidenScalar, LLT::vector(3, 8)}
* getAction({G_ADD, LLT::vector(3, 8)})
then returns {MoreElements, LLT::vector(8, 8)}
* getAction({G_ADD, LLT::vector(20, 8)})
returns {FewerElements, LLT::vector(16, 8)}
2. Key implementation aspects.
How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:
setScalarAction({G_ADD, LLT:scalar(1)},
{{1, WidenScalar}, // bit sizes [ 1, 31[
{32, Legal}, // bit sizes [32, 33[
{33, WidenScalar}, // bit sizes [33, 64[
{64, Legal}, // bit sizes [64, 65[
{65, NarrowScalar} // bit sizes [65, +inf[
});
Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is currently missing, this is just trying to
make it possible for targets to specify what is legal, and how non-legal
types should be legalized. Probably quite a bit of further work is
needed in the actual legalizing and the other passes in GlobalISel to
support non-power-of-2 sized types.
I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.
This drops the need for LLT::{half,double}...Size().
Differential Revision: https://reviews.llvm.org/D30529
llvm-svn: 317560
This patch disables the handling of selects in optimization
extensing scope of optimizeMemoryInst.
The optimization itself is disable by default.
The idea here is just to switch optimiztion level step by step.
Specifically, first optimization will be enabled only for Phi nodes,
then select instructions will be added.
In case someone will complain about perfromance it will be easier to
detect what part of optimizations is responsible for that.
Differential Revision: https://reviews.llvm.org/D36073
llvm-svn: 317555
This document contains information on how to cross-compile the compiler-rt
builtins library for several flavours of Arm target and how to test the
libraries using qemu.
Differential Revision: https://reviews.llvm.org/D39600
llvm-svn: 317554
Summary:
Calls using invoke in funclet based functions are assumed to clobber
all registers, which causes the stack adjustment using pops to consider
all registers not defined by the call to be undefined, which can
unfortunately include the base pointer, if one is needed.
To prevent this (and possibly other hazards), skip reserved registers
when looking for candidate registers.
This fixes issue #45034 in the Rust compiler.
Reviewers: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39636
llvm-svn: 317551
According to the docs on opegroup.org, the function can return
EINVAL if:
The len argument is less than zero, or the offset argument is less
than zero, or the underlying file system does not support this
operation.
I'd say it's a peculiar choice (when EONOTSUPP is right there), but
let's keep POSIX happy for now. This was independently discovered
by Mark Millard (on FreeBSD/ZFS).
Quickly ack'ed by Rui on IRC.
llvm-svn: 317535
Minimal tool to convert xray traces to Chrome's Trace Event Format.
Summary:
Make use of Chrome Trace Event format's Duration events and stack frame dict to
produce Json files that chrome://tracing can visualize from xray function call
traces. Trace Event format is more robust and has several features like
argument logging, function categorization, multi process traces, etc. that we
can add as needed. Duration events cover an important base case.
Part of this change is rearranging the code so that the TrieNode data structure
can be used from multiple tools and can carry parameterized baggage on the
nodes. I put the actual behavior changes in llvm-xray convert exclusively.
Exploring the trace of instrumented llc was pretty nifty if overwhelming.
I can envision this being very useful for analyzing contention scenarios or
tuning parameters like batch sizes in a producer consumer queue. For more
targeted traces likemthis, let's talk about how we want to approach trace
pruning.
Reviewers: dberris, pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39362
llvm-svn: 317531
Blockaddresses refer to the function itself, therefore replacing them
would cause an assertion in doRAUW.
Fixes https://bugs.llvm.org/show_bug.cgi?id=35201
This was found when trying CFI on a proprietary kernel by Dmitry Mikulin.
Differential Revision: https://reviews.llvm.org/D39695
llvm-svn: 317527
This combine was already done in two places. The
generic combiner already has done this since
r217610, for adds (with a single use).
This one was added in r303641, and added support for handling
or as well. r313251 later added support to the generic
combine for or. It also turns out the isOrEquivalentToAdd
check is not necessary for this combine.
Additionally, we already reproduce this combine in yet
another place in the backend, although in that version
multiple uses of the add are still folded if it will
allow a fold into the addressing mode. That version needs
to be improved to understand ors though, as well as the
correct legal offsets for private.
llvm-svn: 317526
This makes DILocation::getMergedLocation() do what its comment says it
does when merging locations for an Instruction: set the common inlineAt
scope. This simplifies Instruction::applyMergedLocation() a bit.
Testing: check-llvm, check-clang
Differential Revision: https://reviews.llvm.org/D39628
llvm-svn: 317524
rL316419 exposed a platform specific issue where the type of the values
passed to llvm::format could be different to the format string.
Debian unstable for mips uses long long int for std::chrono:duration,
while x86_64 uses long int.
For mips, this resulted in the value being corrupted when rendered to a
string. Address this by explicitly casting the result of the duration_cast
to the type specified in the format string.
Reviewers: sammccall
Differential Revision: https://reviews.llvm.org/D39597
llvm-svn: 317523
The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available.
Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns.
All known CPUs with AVX512 have F16C so this should safe for now.
llvm-svn: 317521
Previously our VEX patterns were checking Subtarget.hasFMA() which checked FMA || AVX512. So we were behaving as if AVX512 implied it anyway. Which means we'd allow VEX encoded 128/256 FMA when AVX512F was enabled but AVX512VL is off. Regardless of the FMA flag.
EVEX to VEX also transforms scalar EVEX FMA instructions to their VEX versions even without the FMA flag. Similarly for 128/256 under AVX512VL.
So this makes AVX512 imply FeatureFMA to make our current behavior explicit.
All known CPUs that support AVX512 have VEX FMA instructions.
llvm-svn: 317520
As discussed in D39204, this is effectively a revert of rL265521 which required nnan
to vectorize sqrt libcalls based on the old LangRef definition of llvm.sqrt. Now that
the definition has been updated so the libcall and intrinsic have the same semantics
apart from potentially setting errno, we can remove the nnan requirement.
We have the right check to know that errno is not set:
if (!ICS.onlyReadsMemory())
...ahead of the switch.
This will solve https://bugs.llvm.org/show_bug.cgi?id=27435 assuming that's being
built for a target with -fno-math-errno.
Differential Revision: https://reviews.llvm.org/D39642
llvm-svn: 317519
This broke the CodeGen/Hexagon/loop-idiom/pmpy-mod.ll test on a bunch of buildbots.
> This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.
>
> This can allow us to get the select closer to other selects to enable removing one.
>
> Differential Revision: https://reviews.llvm.org/D39222
>
> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317510 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-svn: 317518
This broke the use of libxml2 on machines where iconv() is provided by libc.
I'll follow up on the mailing list to discuss how to fix this properly.
> This is introduced in rL308711.
> Check for c library is incorrect here just because libc will be found always
> and it does not mean that iconv is presented.
>
> Thank to Andrew Krasny for narrowing down the root cause.
>
> Reviewers: ecbeckmann
> Reviewed By: ecbeckmann
> Subscribers: mgorny, llvm-commits
> Differential Revision: https://reviews.llvm.org/D38875
llvm-svn: 317517
Summary:
Print %subreg.<subregidxname> instead of just the subregister
index when printing immediate operands corresponding to subreg
indices in INSERT_SUBREG, EXTRACT_SUBREG, SUBREG_TO_REG and
REG_SEQUENCE.
Reviewers: qcolombet, MatzeB
Reviewed By: MatzeB
Subscribers: nhaehnle, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D39696
llvm-svn: 317513
This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select.
This can allow us to get the select closer to other selects to enable removing one.
Differential Revision: https://reviews.llvm.org/D39222
llvm-svn: 317510
Previously, this could end up replacing a vreg like %14 with
[[VREG1]]4, where VREG1 was the match for %1. That's obviously not
correct, though it hasn't actually come up in any tests I've converted
so far.
llvm-svn: 317509
Summary: When computing the SUM for indirect call promotion, if the callsite is already promoted in the profile, it will be promoted before ICP. In the current implementation, ICP only sees remaining counts in SUM. This may cause extra indirect call targets being promoted. This patch updates the SUM to include the counts already promoted earlier. This way we do not end up promoting too many indirect call targets.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38763
llvm-svn: 317502
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html
...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.
As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic
reassociation - 'AllowReassoc'.
We're also adding a bit to allow approximations for library functions called 'ApproxFunc'
(this was initially proposed as 'libm' or similar).
...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits),
but that's apparently already used for other purposes. Also, I don't think we can just
add a field to FPMathOperator because Operator is not intended to be instantiated.
We'll defer movement of FMF to another day.
We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.
Finally, this change is binary incompatible with existing IR as seen in the
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile
them. For example, if nsw is ever replaced with something else, dropping it would be
a valid way to upgrade the IR."
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will
fail to optimize some previously 'fast' code because it's no longer recognized as
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.
Note: an inter-dependent clang commit to use the new API name should closely follow
commit.
Differential Revision: https://reviews.llvm.org/D39304
llvm-svn: 317488
We still early-out for X86ISD::PEXTRW/X86ISD::PEXTRB so no actual change in behaviour, but it'll make it easier to add support in a future patch.
llvm-svn: 317485
combineExtractWithShuffle can handle more complex shuffles/bitcasts than we can with the equivalent code in XFormVExtractWithShuffleIntoLoad.
Mainly a compile time improvement now (combineExtractWithShuffle combines will have always failed late on inside XFormVExtractWithShuffleIntoLoad), and will let us merge combineExtractVectorElt_SSE in a future commit.
llvm-svn: 317481
SystemZ can do division and remainder in a single instruction for scalar
integer types, which are now reflected by returning true in this hook for
those cases.
Review: Ulrich Weigand
llvm-svn: 317477
The backend assumes pointer in default addr space is 32 bit, which is not
true for the new addr space mapping and causes assertion for unresolved
functions.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39643
llvm-svn: 317476
Previously, the 'movep' instruction was defined for microMIPS32r3 and
shared that definition with microMIPS32R6. 'movep' was re-encoded for
microMIPS32r6, so this patch provides the correct encoding.
Secondly, correct the encoding of the 'rs' and 'rt' operands which have
an instruction specific encoding for the registers those operands accept.
Finally, correct the decoding of the 'dst_regs' operand which was extracting
the relevant field from the instruction, but was actually extracting the
field from the alreadly extracted field.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39495
llvm-svn: 317475
It is currently not possible to build the documentation with cmake and
the same version of Sphinx (1.5.1) used to generate the public facing
documentation on llvm.org. When code blocks cannot be parsed by
Pygments, it generates a warning which is treated as an error.
In addition to being annoying and confusing for developers, this
needlessly increases the bar for newcomers that want to get involved.
This patch removes the language specifier from the affected block. The
result is the same as when parsing fails: the block are not highlighted.
llvm-svn: 317472
Recommit:
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
fixed the location of the lit test it works with make check-all.
Differential Revision: https://reviews.llvm.org/D39403
llvm-svn: 317471
Mark all symbols involved with TLS relocations as being TLS symbols.
This resolves PR35140.
Thanks to Alex Crichton for reporting the issue!
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39591
llvm-svn: 317470
Added TESTM and TESTNM to the list of instructions that already zeroing unused upper bits
and does not need the redundant shift left and shift right instructions afterwards.
Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) goes folds into TESTM
and icmp(eq,and(X,Y), 0) goes folds into TESTNM
This commit is a preparation for lowering the test and testn X86 intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38732
llvm-svn: 317465
Summary:
Try to lower a BUILD_VECTOR composed of extract-extract chains that can be
reasoned to be a permutation of a vector by indices in a non-constant vector.
We saw this pattern created by ISPC, which resolts to creating it due to the
requirement that shufflevector's mask operand be a *constant* vector.
I didn't check this but we could possibly use this pattern for lowering the X86 permute
C-instrinsics instead of llvm.x86 instrinsics.
This change can be followed by more improvements:
1. Handle vectors with undef elements.
2. Utilize pshufb and zero-mask-blending to support more effiecient
construction of vectors with constant-0 elements.
3. Use smaller-element vectors of same width, and "interpolate" the indices,
when no native operation available.
Reviewers: RKSimon, craig.topper
Reviewed By: RKSimon
Subscribers: chandlerc, DavidKreitzer
Differential Revision: https://reviews.llvm.org/D39126
llvm-svn: 317463
This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38684
Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540
llvm-svn: 317458
Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.
llvm-svn: 317453
Use feature names instead of CPU names.
A future commit will add avx512vl command lines to demonstrate missed use of EVEX instructions.
llvm-svn: 317451
single-iteration loop
This fixes PR34681. Avoid adding the "Stride == 1" predicate when we know that
Stride >= Trip-Count. Such a predicate will effectively optimize a single
or zero iteration loop, as Trip-Count <= Stride == 1.
Differential Revision: https://reviews.llvm.org/D38785
llvm-svn: 317438
reverted my changes will be committed later after fixing the failure
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
Differential Revision: https://reviews.llvm.org/D39403
llvm-svn: 317433
This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8.
Differential Revision: https://reviews.llvm.org/D39403
llvm-svn: 317432
This is an implementation of PR26223.
Currently optimizeMemoryInst optimization tries to fold address computation
if all possible way to get compute the address are of the form
baseGV + base + scale * Index + offset
where scale and offset are constants and baseGV, base and Index are exactly
the same instructions if defined.
The patch extends this optimization to allow different bases. In this case
it tries to find/build a Phi node merging all possible bases and use this Phi node
as a base for sunk address computation. Also it supports Select instruction on
the way.
The main motivation for this scope extension is GCRelocateInst.
If there is a relocation of derived pointer it will be represented as relocation of base + offset.
Also there will be a Phi node merging address computation for relocated derived pointer
and derived pointer itself. If we have a Phi node merging original base and relocated base
and can fold the address computation of derived pointer then we can potentially reduce
the code size and Phi node for derived pointer. The later can have a positive impact to
register allocator.
Reviewers: efriedma, dberlin, mkazantsev, reames, john.brawn
Reviewed By: john.brawn
Subscribers: javed.absar, john.brawn, dneilson, llvm-commits
Differential Revision: https://reviews.llvm.org/D36073
llvm-svn: 317429
Summary:
AVX512 added RCP14 and RSQRT instructions which improve accuracy over the legacy RCP and RSQRT instruction, but not enough accuracy to remove the need for a Newton Raphson refinement.
Currently we use these new instructions for the legacy packed SSE instrinics, but not the scalar instrinsics. And we use it for fast math optimization of division and reciprocal sqrt.
I think switching the legacy instrinsics maybe surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.
As far at the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggest they don't change which instructions they use here.
This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions.
Going forward I think our focus should be on
-Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14.
-Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER
-Supporting double precision.
Reviewers: zvi, DavidKreitzer, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39583
llvm-svn: 317413
AMDGPULibFunc hardcodes address space values of the old address space mapping,
which causes invalid addrspacecast instructions and undefined functions in
APPSDK sample MonteCarloAsianDP.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39616
llvm-svn: 317409
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Originally commited as r317374, but reverted in r317395 to update some missed
tests.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317408
This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary.
As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out.
llvm-svn: 317403
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.
This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.
llvm-svn: 317379
This preserves the debug info for the cast operation in the original location.
rdar://problem/33460652
Reapplied r317340 with the test moved into an ARM-specific directory.
llvm-svn: 317375
Now that we have a way to mark GlobalValues as local we can use the symbol
resolutions that the linker plugin provides as part of lto/thinlto link
step to refine the compilers view on what symbols will end up being local.
Differential Revision: https://reviews.llvm.org/D35702
llvm-svn: 317374
Merging conditional stores tries to check to see if the code is if convertible after the store is moved. But the store hasn't been moved yet so its being counted against the threshold.
The patch adds 1 to the threshold comparison to make sure we don't count the store. I've adjusted a test to use a lower threshold to ensure we still do that conversion with the lower threshold.
Differential Revision: https://reviews.llvm.org/D39570
llvm-svn: 317368
This class was split between libIR and libSupport, which breaks under
modular code generation. Move it into the one library that uses it,
ProfileData, to resolve this issue.
llvm-svn: 317366
Adds blacklist parsing behaviour for filtering results into four categories:
- Expected Protected: Things that are not in the blacklist and are protected.
- Unexpected Protected: Things that are in the blacklist and are protected.
- Expected Unprotected: Things that are in the blacklist and are unprotected.
- Unexpected Unprotected: Things that are not in the blacklist and are unprotected.
now can optionally be invoked with a second command line argument, which specifies the blacklist file that the binary was built with.
Current statistics for chromium:
Reviewers: vlad.tsyrklevich
Subscribers: mgorny, llvm-commits, pcc, kcc
Differential Revision: https://reviews.llvm.org/D39525
llvm-svn: 317364
This recommit r317351 after fixing a buildbot failure.
Original commit message:
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
llvm-svn: 317362
DenseMaps require the definition of a type to be available when using a
pointer to that type as a key to know how many bits are available for
tombstone/etc.
llvm-svn: 317360
Add an interesting unit test, found by changing --search-length-undef from the default. Program handles it correctly but good for ensuring correctness on further changes :)
Reviewers: pcc
Subscribers: mgorny, llvm-commits, kcc, vlad.tsyrklevich
Differential Revision: https://reviews.llvm.org/D38658
llvm-svn: 317355
This removes the athlon type and simplifies the string decoding. We only really need these type/subtype breaks where we need to match libgcc/compiler-rt and these CPUs aren't part of that.
I'm looking into moving some of this information to a .def file to share with clang's __builtin_cpu_is handling. And while these CPUs aren't part of that the less lines I have to deal with in the .def file the better.
llvm-svn: 317354
Tests were failing because some bots were running out of address
space and memory. Additionally the test was very slow. These issues
were solved by changing the test to take advantage of sparse filse and
restricting the test to run only on 64-bit systems.
This should fix https://bugs.llvm.org//show_bug.cgi?id=34189
This change makes it so that if writing a K_GNU style archive, you need
to output a > 32-bit offset it should output in K_GNU64 style instead.
Differential Revision: https://reviews.llvm.org/D36812
llvm-svn: 317352
Summary:
This change add a pass which tries to split a call-site to pass
more constrained arguments if its argument is predicated in the control flow
so that we can expose better context to the later passes (e.g, inliner, jump
threading, or IPA-CP based function cloning, etc.).
As of now we support two cases :
1) If a call site is dominated by an OR condition and if any of its arguments
are predicated on this OR condition, try to split the condition with more
constrained arguments. For example, in the code below, we try to split the
call site since we can predicate the argument (ptr) based on the OR condition.
Split from :
if (!ptr || c)
callee(ptr);
to :
if (!ptr)
callee(null ptr) // set the known constant value
else if (c)
callee(nonnull ptr) // set non-null attribute in the argument
2) We can also split a call-site based on constant incoming values of a PHI
For example,
from :
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2, label %BB1
BB1:
br label %BB2
BB2:
%p = phi i32 [ 0, %BB0 ], [ 1, %BB1 ]
call void @bar(i32 %p)
to
BB0:
%c = icmp eq i32 %i1, %i2
br i1 %c, label %BB2-split0, label %BB1
BB1:
br label %BB2-split1
BB2-split0:
call void @bar(i32 0)
br label %BB2
BB2-split1:
call void @bar(i32 1)
br label %BB2
BB2:
%p = phi i32 [ 0, %BB2-split0 ], [ 1, %BB2-split1 ]
Reviewers: davidxl, huntergr, chandlerc, mcrosier, eraman, davide
Reviewed By: davidxl
Subscribers: sdesmalen, ashutosh.nema, fhahn, mssimpso, aemerson, mgorny, mehdi_amini, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D39137
llvm-svn: 317351
The number of iterations was incorrectly determined for DP FP vector types
and the tests were insufficient to flag this issue.
Differential revision: https://reviews.llvm.org/D39507
llvm-svn: 317349