The i8 type is not registered with any register class.
This causes a segmentation fault in MachineLICM::getRegisterClassIDAndCost.
The code selects the first type associated with register class FPR8,
which happens to be i8.
It uses this type (i8) to get the representative class pointer, which is 0.
It then uses this pointer to access a field, resulting in segmentation fault.
Since i8 type is not being used for printing any neon instruction
we can safely remove it.
llvm-svn: 200046
This change does not affect anything because everybody seems to be using
Object/COFF.h instead. But the definition is not for PE32 but for PE32+,
so fix it anyway.
llvm-svn: 200038
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.
llvm-svn: 200034
DAGCombiner::GatherAllAliases, which is only used when AA used is enabled
during DAGCombine, had a fundamentally incorrect assumption for which this
change compensates. GatherAllAliases, which is used to find aliasing
predecessor chain nodes (so that a better chain can be selected for a load or
store to enable subsequent optimizations) assumed that walking up the chain
would always catch all possibly-aliasing loads and stores. This is not true: To
really find all aliases, we also need to search for aliases through the value
operand of a store, etc. Consider the following situation:
Token1 = ...
L1 = load Token1, %52
S1 = store Token1, L1, %51
L2 = load Token1, %52+8
S2 = store Token1, L2, %51+8
Token2 = Token(S1, S2)
L3 = load Token2, %53
S3 = store Token2, L3, %52
L4 = load Token2, %53+8
S4 = store Token2, L4, %52+8
If we search for aliases of S3 (which loads address %52), and we look only
through the chain, then we'll miss the trivial dependence on L1 (which loads
from %52). We then might change all loads and stores to use Token1 as their
chain operand, which could result in copying %53 into %52 before copying
%52 into %51 (which should happen first).
The problem is, however, that searching for such data dependencies can become
expensive, and the cost is not directly related to the chain depth. Instead,
we'll rule out such configurations by insisting that we've visited all chain
users (except for users of the original chain, which is not necessary). When
doing this, we need to look through nodes we don't care about (otherwise,
things like register copies will interfere with trivial use cases).
Unfortunately, I don't have a small test case for this problem. Creating the
underlying situation is not hard (a pair of memcpys will do it), but arranging
for the default instruction schedule to be incorrect is very fragile.
This unbreaks self hosting on PPC64 when using
-mllvm -combiner-global-alias-analysis -mllvm -combiner-alias-analysis.
llvm-svn: 200033
These transformations obviously won't work for indexed (pre/post-inc) loads and
stores. In practice, I'm not sure there is any benefit to enabling them for
indexed nodes because other transformations that these might enable likely also
won't handle indexed nodes.
I don't have an in-tree test case that hits this problem, but an upcoming bug
fix will make it much more likely.
llvm-svn: 200023
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.
First it scans all instructions for integer constants and calculates its
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.
If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.
When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.
This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast experessions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)
Reviewed by Eric
llvm-svn: 200022
This enables IO error reports in both the child and server processes.
The scheme still isn't entirely satisfactory and output is jumbled but it beats
having no output at all. This will hopefully unblock ARM support (PR18057).
llvm-svn: 200017
There is no inline asm in a .s file. Therefore, there should be no logic to
handle it in the streamer. Inline asm only exists in bitcode files, so the
logic can live in the (long misnamed) AsmPrinter class.
llvm-svn: 200011
compile unit. Make these relocations on the platforms that need
relocations and add a routine to ensure that we don't put the
addresses in an offset table for split dwarf.
llvm-svn: 199990
This commit teaches the X86 backend to create the same X86 instructions when it
lowers an sadd/ssub with overflow intrinsic and a conditional branch that uses
that overflow result. This allows SelectionDAG to recognize and remove one of
the redundant operations.
This fixes <rdar://problem/15874016> and <rdar://problem/15661073>.
Reviewed by Nadav
llvm-svn: 199976
We completely skipped promotion in LICM if the loop has a preheader or
dedicated exits, but not *both*. We hoist if there is a preheader, and
sink if there are dedicated exits, but either hoisting or sinking can
move loop invariant code out of the loop!
I have no idea if this has a practical consequence. If anyone has ideas
for a test case, let me know.
llvm-svn: 199966
scale factors in memory addresses. As it does for .att_syntax.
It was producing:
Assertion failed: (((Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8)) && "Invalid scale!"), function CreateMem, file /Volumes/SandBox/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp, line 1133.
rdar://14967214
llvm-svn: 199942
Eliminates the LLI_BUILDING_CHILD build hack from r199885.
Also add a FIXME to remove code that tricks the tests into passing when the
feature fails to work. Please don't do stuff like this, the tests exist for a
reason!
llvm-svn: 199929
Originally, BLX was passed as operand #0 in MachineInstr and as operand
#2 in MCInst. But now, it's operand #2 in both cases.
This patch also removes unnecessary FileCheck in the test case added by r199127.
llvm-svn: 199928
This patch adds the target analysis passes (usually TargetTransformInfo) to the
codgen pipeline. We also expose now the AddAnalysisPasses method through the C
API, because the optimizer passes would also benefit from better target-specific
cost models.
Reviewed by Andrew Kaylor
llvm-svn: 199926
Clang says that "flow" is unused when building LLD. This patch suppresses it.
Differential Revision: http://llvm-reviews.chandlerc.com/D2573
llvm-svn: 199922
This fixes a crash in the OpenCV OpenCL test suite.
There is no lit test for this, because the test would be very large
and could easily be invalidated by changes to the scheduler
or other parts of the compiler.
Patch by: Vincent Lejeune
llvm-svn: 199919
This pattern uses an SDNodeXForm, which isn't being emitted for some
reason. I can get it to work by attaching the PatLeaf that has the
XForm to the argument in the output pattern, but this results in an
immediate being used in a register operand, which the backend can't
handle yet.
llvm-svn: 199918
The control flow finalizer would sometimes use an ALU_POP_AFTER
instruction before the vetex fetch clause instead of using a POP
instruction after it.
llvm-svn: 199917
Implement the getUnrollingPreferences() function for
AMDGPUTargetTransformInfo so that loops that do address calculations
on pointers derived from alloca are unconditionally unrolled.
Unrolling these loops makes it more likely that SROA will be able to
eliminate the allocas, which is a big win for R600 since memory
allocated by alloca (private memory) is really slow.
llvm-svn: 199916
Argument promotion can replace an argument of a call with an alloca. This
requires clearing the tail marker as it is very likely that the callee is now
using an alloca in the caller.
This fixes pr14710.
llvm-svn: 199909
The unit test is now disabled on non-asserts builds.
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)
We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199905
With constant-sharing, litpool loads consume 4 + N*2 bytes of code, but
movw/movt pairs consume 8*N. This means litpools are better than movw/movt even
with just one use. Other materialisation strategies can still be better though,
so the logic is a little odd.
llvm-svn: 199891
function and a FunctionPass.
This has many benefits. The motivating use case was to be able to
compute function analysis passes *after* running LoopSimplify (to avoid
invalidating them) and then to run other passes which require
LoopSimplify. Specifically passes like unrolling and vectorization are
critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so
that they can be profile aware. For the LoopVectorize pass the only
things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify
and LCSSA is next on my list.
There are also a bunch of other benefits of doing this:
- It is now very feasible to make more passes *preserve* LoopSimplify
because they can simply run it after changing a loop. Because
subsequence passes can assume LoopSimplify is preserved we can reduce
the runs of this pass to the times when we actually mutate a loop
structure.
- The new pass manager should be able to more easily support loop passes
factored in this way.
- We can at long, long last observe that LoopSimplify is preserved
across SCEV. This *halves* the number of times we run LoopSimplify!!!
Now, getting here wasn't trivial. First off, the interfaces used by
LoopSimplify are all over the map regarding how analysis are updated. We
end up with weird "pass" parameters as a consequence. I'll try to clean
at least some of this up later -- I'll have to have it all clean for the
new pass manager.
Next up I discovered a really frustrating bug. LoopUnroll *claims* to
preserve LoopSimplify. That's actually a lie. But the way the
LoopPassManager ends up running the passes, it always ran LoopSimplify
on the unrolled-into loop, rectifying this oversight before any
verification could kick in and point out that in fact nothing was
preserved. So I've added code to the unroller to *actually* simplify the
surrounding loop when it succeeds at unrolling.
The only functional change in the test suite is that we now catch a case
that was previously missed because SCEV and other loop transforms see
their containing loops as simplified and thus don't miss some
opportunities. One test case has been converted to check that we catch
this case rather than checking that we miss it but at least don't get
the wrong answer.
Note that I have #if-ed out all of the verification logic in
LoopSimplify! This is a temporary workaround while extracting these bits
from the LoopPassManager. Currently, there is no way to have a pass in
the LoopPassManager which preserves LoopSimplify along with one which
does not. The LPM will try to verify on each loop in the nest that
LoopSimplify holds but the now-Function-pass cannot distinguish what
loop is being verified and so must try to verify all of them. The inner
most loop is clearly no longer simplified as there is a pass which
didn't even *attempt* to preserve it. =/ Once I get LCSSA out (and maybe
LoopVectorize and some other fixes) I'll be able to re-enable this check
and catch any places where we are still failing to preserve
LoopSimplify. If this causes problems I can back this out and try to
commit *all* of this at once, but so far this seems to work and allow
much more incremental progress.
llvm-svn: 199884
Eliminate the copies LLVM's System mmap and cache invalidation code. These were
slowly drifting away from the original version, and moreover the copied code
was a dead end in terms of portability.
We now statically link to Support but in practice with stripping this adds next
to no weight to the resultant binary.
Also avoid installing lli-child-target to the user's $PATH. It's not meant to
be run directly.
llvm-svn: 199881
e.g. linkonce, to TargetMachine and set it when we've done so
for ELF targets currently. This involved making TargetMachine
non-const in a TLOF use and propagating that change around - I'm
open to other ideas.
This will be used in a future commit to handle emitting debug
information with ranges.
llvm-svn: 199871
This patch updates .set mips16 support which
affects the ELF ABI and its flags. In addition the patch uses
a common interface for both the MipsTargetSteamer and
MipsObjectStreamer that the assembler uses for
both ELF and ASCII output for these directives.
llvm-svn: 199851
This is a horrible bit of code. We're calling a simplification routine *in the middle* of type legalization. We tell the
simplification routine that it's running after legalization, but some of the types it will encounter will be illegal! The
fix is only to invoke the simplification if the types in question were legal, so that none of its invariants will be violated.
llvm-svn: 199847
This reverts commit 35b8331cad6eb512a2506adbc394201181da94ba.
The -debug-only flag for llc doesn't appear to be available in
all build configurations.
llvm-svn: 199845
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)
We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199842
My understanding (from reading just the llvm code) is that
* most ppc cpus have a "sync n" instruction and an msync alias that is "sync 0".
* "book e" cpus instead have a msync instruction and not the more
general "sync n"
This patch reflects that in the .td files, allowing a single codepath for
asm ond obj streamer and incidentelly fixes a crash when EmitRawText was
called on a obj streamer.
llvm-svn: 199832
different number of elements.
Bitcasts were passing with vectors of pointers with different number of
elements since the number of elements was checking
SrcTy->getVectorNumElements() == SrcTy->getVectorNumElements() which
isn't helpful. The addrspacecast was also wrong, but that case at least
is caught by the verifier. Refactor bitcast and addrspacecast handling
in castIsValid to be more readable and fix this problem.
llvm-svn: 199821
This patch restores the ARM mode if the user's inline assembly
does not. In the object streamer, it ensures that instructions
following the inline assembly are encoded correctly and that
correct mapping symbols are emitted. For the asm streamer, it
emits a .arm or .thumb directive.
This patch does not ensure that the inline assembly contains
the ADR instruction to switch modes at runtime.
The problem we need to solve is code like this:
int foo(int a, int b) {
int r = a + b;
asm volatile(
".align 2 \n"
".arm \n"
"add r0,r0,r0 \n"
: : "r"(r));
return r+1;
}
If we compile this function in thumb mode then the inline assembly
will switch to arm mode. We need to make sure that we switch back to
thumb mode after emitting the inline assembly or we will incorrectly
encode the instructions that follow (i.e. the assembly instructions
for return r+1).
Based on patch by David Peixotto
Change-Id: Ib57f6d2d78a22afad5de8693fba6230ff56ba48b
llvm-svn: 199818
identify_magic is not free, so we should avoid calling it twice. The argument
also makes it cheap for createBinary to just forward to createObjectFile.
llvm-svn: 199813
This actually totally breaks and causes the machine verifier to cry in several cases, one of which being:
%RAX<def> = COPY %RCX<kill>
%ECX<def> = COPY %EAX<kill>, %RAX<imp-use,kill>
These subregister copies are together identified as noops, so are both removed. However, the second one as it has an imp-use gets converted into a kill:
%ECX<def> = KILL %EAX<kill>, %RAX<imp-use,kill>
As the original COPY has been removed, the verifier goes into tears at the use of undefined EAX and RAX.
There are several hacky solutions to this hacky problem (which is all to do with imp-use/def weirdnesses), but the least hacky I've come up with is to *always* remove COPYs by converting to KILLs. KILLs are no-ops to the code generator so the generated code doesn't change (which is why they were partially used in the first place), but using them also keeps the def/use and imp-def/imp-use chains alive:
%RAX<def> = KILL %RCX<kill>
%ECX<def> = KILL %EAX<kill>, %RAX<imp-use,kill>
The patch passes all test cases including the ones that check the removal of MOVs in this circumstance, along with an extra test I added to check subregister behaviour (which made the machine verifier fall over before my patch).
The patch also adds some DEBUG() statements because the file hadn't got any.
llvm-svn: 199797
inconsistent results for different orderings of alloca slices. The
fundamental issue is that it is just always a mistake to return early
from this function. There is no effective early exit to leverage. This
patch stops trynig to do so and simplifies the code a bit as
a consequence.
Original diagnosis and patch by James Molloy with some name tweaks by me
in part reflecting feedback from Duncan Smith on the mailing list.
llvm-svn: 199771
The constructors of classes deriving from Binary normally take an error_code
as an argument to the constructor. My original intent was to change them
to have a trivial constructor and move the initial parsing logic to a static
method returning an ErrorOr. I changed my mind because:
* A constructor with an error_code out parameter is extremely convenient from
the implementation side. We can incrementally construct the object and give
up when we find an error.
* It is very efficient when constructing on the stack or when there is no
error. The only inefficient case is where heap allocating and an error is
found (we have to free the memory).
The result is that this is a much smaller patch. It just standardizes the
create* helpers to return an ErrorOr.
Almost no functionality change: The only difference is that this found that
we were trying to read past the end of COFF import library but ignoring the
error.
llvm-svn: 199770
Fix a crash in SjLjEHPrepare::lowerIncomingArguments caused by treating
VectorType like an aggregate. It's first-class!
<rdar://problem/15854596>
llvm-svn: 199768
For PPC64 SVR (and Darwin), the stores that take byval aggregate parameters
from registers into the stack frame had MachinePointerInfo objects with
incorrect offsets. These offsets are relative to the object itself, not to the
stack frame base.
This fixes self hosting on PPC64 when compiling with -enable-aa-sched-mi.
llvm-svn: 199763
This is apparently a bit of a white lie (they can affect DSPControl for
overflow etc) but similar to how we currently handle floating-point operations.
When it becomes relevant the whole lot can be reviewed properly.
llvm-svn: 199718
Add support to llvm-readobj to decode the actual opcodes. The ARM EHABI opcodes
are a variable length instruction set that describe the operations required for
properly unwinding stack frames.
The primary motivation for this change is to ease the creation of tests for the
ARM EHABI object emission as well as the unwinding directive handling in the ARM
IAS.
Thanks to Logan Chien for an extra test case!
llvm-svn: 199708
This implements the unwind_raw directive for the ARM IAS. The unwind_raw
directive takes the form of a stack offset value followed by one or more bytes
representing the opcodes to be emitted. The opcode emitted will interpreted as
if it were assembled by the opcode assembler via the standard unwinding
directives.
Thanks to Logan Chien for an extra test!
llvm-svn: 199707
The .personalityindex directive is equivalent to the .personality directive with
the ARM EABI personality with the specific index (0, 1, 2). Both of these
directives indicate personality routines, so enhance the personality directive
handling to take into account personalityindex.
Bonus fix: flush the UnwindContext at the beginning of a new function.
Thanks to Logan Chien for additional tests!
llvm-svn: 199706
It was commited as r199628 but reverted in r199628 as causing
regression test failed. It's because of old vervsion of patch
I used to commit. Sorry for mistake.
llvm-svn: 199704
to not guess at a symbol name in some cases.
The problem is that in object files assembled starting at address 0, when
trying to symbolicate something that starts like this:
% cat x.s
_t1:
vpshufd $0x0, %xmm1, %xmm0
the symbolic disassembly can end up like this:
% otool -tV x.o
x.o:
(__TEXT,__text) section
_t1:
0000000000000000 vpshufd $_t1, %xmm1, %xmm0
Which is in this case produced incorrect symbolication.
But it is useful in some cases to use the SymbolLookUp() call back
to guess at some immediate values. For example one like this
that does not have an external relocation entry:
% cat y.s
_t1:
movl $_d1, %eax
.data
_d1: .long 0
% clang -c -arch i386 y.s
% otool -tV y.o
y.o:
(__TEXT,__text) section
_t1:
0000000000000000 movl $_d1, %eax
% otool -rv y.o
y.o:
Relocation information (__TEXT,__text) 1 entries
address pcrel length extern type scattered symbolnum/value
00000001 False long False VANILLA False 2 (__DATA,__data)
So the change is based on it is not likely that an immediate Value
coming from an instruction field of a width of 1 byte, other than branches
and items with relocation, are not likely symbol addresses.
With the change the first case above simply becomes:
% otool -tV x.o
x.o:
(__TEXT,__text) section
_t1:
0000000000000000 vpshufd $0x0, %xmm1, %xmm0
and the second case continues to work as expected.
rdar://14863405
llvm-svn: 199698
when used with symbolic disassembly, add a check that the operand
is an immediate and has not been symbolicated to MCExpr operand.
I’m trying to enable the ‘C’ disassembly API option
LLVMDisassembler_Option_SetInstrComments for darwin’s
otool(1) that uses the llvm disassembler API. The problem is
that the disassembler API can change an immediate operand to
an MCExpr operand if it symbolicates it with the call backs.
And if it does the code in llvm::EmitAnyX86InstComments()
will crash when it assumes these operands are immediates.
The fix for this is very straight forward to just protect the call
to getImm() with a check of isImm(). So if the immediate for
an instruction is symbolicated it simply doesn’t get the X86
verbose assembly comments:
% otool -tV test_asm.o
test_asm.o:
(__TEXT,__text) section
_t1:
0000000000000000 vpshufd $_t1, %xmm1, %xmm0
0000000000000005 retq
0000000000000006 nopw %cs:_t1(%rax,%rax)
_t2:
0000000000000010 vpshufd $-0x1, %xmm0, %xmm0 ## xmm0 = xmm0[3,3,3,3]
0000000000000015 retq
0000000000000016 nopw %cs:_t1(%rax,%rax)
_t3:
0000000000000020 vpshufd $_t1, %xmm1, %xmm0
0000000000000025 retq
0000000000000026 nopw %cs:_t1(%rax,%rax)
_t4:
0000000000000030 vpshufd $0x2d, %xmm0, %xmm0 ## xmm0 = xmm0[1,3,2,0]
0000000000000035 retq
The fact that the immediate $0x0 is being symbolicated at
all in this case is a different problem which my next patch
will address.
rdar://10989286
llvm-svn: 199697
StackProtector keeps a ValueMap of alloca instructions to layout kind tags for
use by PEI and other later passes. When stack coloring replaces one alloca with
a bitcast to another one, the key replacement in this map does not work.
Instead, provide an interface to manage this updating directly. This seems like
an improvement over the old behavior, where the layout map would not get
updated at all when the stack slots were merged. In practice, however, there is
likely no observable difference because PEI only did anything special with
'large array' kinds, and if one large array is merged with another, than the
replacement should already have been a large array.
This is an attempt to unbreak the clang-x86_64-darwin11-RA builder.
llvm-svn: 199684
Add target specific rules for combining vselect dag nodes into movss/movsd
when possible.
If the vector type of the vselect dag node in input is either MVT::v4i13 or
MVT::v4f32, then try to fold according to rules:
1) fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B)
2) fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A)
If the vector type of the vselect dag node in input is either MVT::v2i64 or
MVT::v2f64 (and we have SSE2), then try to fold according to rules:
3) fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B)
4) fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A)
llvm-svn: 199683
optional DWARF sections, so compiling with -g does not result in
different code being generated for PC-relative loads.
This is reapplying a diet r197922 (__TEXT-only).
llvm-svn: 199681
The way that stack coloring updated MMOs when merging stack slots, while
correct, is suboptimal, and is incompatible with the use of AA during
instruction scheduling. The solution, which involves the use of const_cast (and
more importantly, updating the IR from within an MI-level pass), obviously
requires some explanation:
When the stack coloring pass was originally committed, the code in
ScheduleDAGInstrs::buildSchedGraph tracked possible alias sets by using
GetUnderlyingObject, and all load/store and store/store memory control
dependencies where added between SUs at the object level (where only one
object, that returned by GetUnderlyingObject, was used to identify the object
associated with each MMO). When stack coloring merged stack slots, it would
replace MMOs derived from the remapped alloca with the alloca with which the
remapped alloca was being replaced. Because ScheduleDAGInstrs only used single
objects, and tracked alias sets at the object level, this was a fine solution.
In r169744, (Andy and) I updated the code in ScheduleDAGInstrs to use
GetUnderlyingObjects, and track alias sets using, potentially, multiple
underlying objects for each MMO. This was done, primarily, to provide the
ability to look through PHIs, and provide better scheduling for
induction-variable-dependent loads and stores inside loops. At this point, the
MMO-updating code in stack coloring became suboptimal, because it would clear
the MMOs for (i.e. completely pessimize) all instructions for which r169744
might help in scheduling. Updating the IR directly is the simplest fix for this
(and the one with, by far, the least compile-time impact), but others are
possible (we could give each MMO a small vector of potential values, or make
use of a remapping table, constructed from MFI, inside ScheduleDAGInstrs).
Unfortunately, replacing all MMO values derived from the remapped alloca with
the base replacement alloca fundamentally breaks our ability to use AA during
instruction scheduling (which is critical to performance on some targets). The
reason is that the original MMO might have had an offset (either constant or
dynamic) from the base remapped alloca, and that offset is not present in the
updated MMO. One possible way around this would be to use
GetPointerBaseWithConstantOffset, and update not only the MMO's value, but also
its offset based on the original offset. Unfortunately, this solution would
only handle constant offsets, and for safety (because AA is not completely
restricted to deducing relationships with constant offsets), we would need to
clear all MMOs without constant offsets over the entire function. This would be
an even worse pessimization than the current single-object restriction. Any
other solution would involve passing around a vector of remapped allocas, and
teaching AA to use it, introducing additional complexity and overhead into AA.
Instead, when remapping an alloca, we replace all IR uses of that alloca as
well (optionally inserting a bitcast as necessary). This is even more efficient
that the old MMO-updating code in the stack coloring pass (because it removes
the need to call GetUnderlyingObject on all MMO values), removes the
single-object pessimization in the default configuration, and enables the
correct use of AA during instruction scheduling (all without any additional
overhead).
LLVM now no longer miscompiles itself on x86_64 when using -enable-misched
-enable-aa-sched-mi -misched-bottomup=0 -misched-topdown=0 -misched=shuffle!
Fixed PR18497.
Because the alloca replacement is now done at the IR level, unless the MMO
directly refers to the remapped alloca, the change cannot be seen at the MI
level. As a result, there is no good way to fix test/CodeGen/X86/pr14090.ll.
llvm-svn: 199658
When using AA to break false chain dependencies, we need to track multiple
stores per object in ScheduleDAGInstrs. Historically, we tracked potential alias
chains at the object level, and so all loads of an object would retain
dependencies on any store to that object. With AA, however, this is not
sufficient: non-overlapping stores and loads to the same object all need to be
tested for dependencies separately, we cannot only test all loads to an object
against only the last store (see PR18497 for an explicit example).
To mitigate any unwelcome compile-time impact when not using AA, only one store
is kept in the list per object when not using AA.
This, along with a stack coloring change to come shortly, will provide a test
case, fix PR18497 (and allow LLVM to compile itself using -enable-aa-sched-mi
on x86-64).
llvm-svn: 199657
The addition of IC_OPSIZE_ADSIZE in r198759 wasn't quite complete. It
also turns out to have been unnecessary. The disassembler handles the
AdSize prefix for itself, and doesn't care about the difference between
(e.g.) MOV8ao8 and MOB8ao8_16 definitions. So just let them coexist and
don't worry about it.
llvm-svn: 199654
The disassembler has a special case for 'L' vs. 'W' in its heuristic for
checking for 32-bit and 16-bit equivalents. We could expand the heuristic,
but better just to be consistent in using the 'L' suffix.
llvm-svn: 199652
When disassembling in 16-bit mode the meaning of the OpSize bit is
inverted. Instructions found in the IC_OPSIZE context will actually
*not* have the 0x66 prefix, and instructions in the IC context will
have the 0x66 prefix. Make use of the existing special-case handling
for the 0x66 prefix being in the wrong place, to cope with this.
llvm-svn: 199650
Aside from cleaning up the code, this also adds support for the -code16
environment and actually enables the MODE_16BIT mode that was previously
not accessible.
There is no point adding any testing for 16-bit yet though; basically
nothing will work because we aren't handling the OpSize prefix correctly
for 16-bit mode.
llvm-svn: 199649
various opt verifier commandline options.
Mostly mechanical wiring of the verifier to the new pass manager.
Exercises one of the more unusual aspects of it -- a pass can be either
a module or function pass interchangably. If this is ever problematic,
we can make things more constrained, but for things like the verifier
where there is an "obvious" applicability at both levels, it seems
convenient.
This is the next-to-last piece of basic functionality left to make the
opt commandline driving of the new pass manager minimally functional for
testing and further development. There is still a lot to be done there
(notably the factoring into .def files to kill the current boilerplate
code) but it is relatively uninteresting. The only interesting bit left
for minimal functionality is supporting the registration of analyses.
I'm planning on doing that on top of the .def file switch mostly because
the boilerplate for the analyses would be significantly worse.
llvm-svn: 199646
ADDITIONAL_HEADERS is intended to add header files for IDEs as hint.
For example:
add_llvm_library(LLVMSupport
Host.cpp
ADDITIONAL_HEADERS
Unix/Host.inc
Windows/Host.inc
)
llvm-svn: 199639
coding standards, and instead fix the existing section.
Thanks to Daniel Jasper for pointing out we already had a section
devoted to this topic. Instead of adding sections, just hack on this
section some. Also fix the example in the anonymous namespace section
below it to agree with the new advice.
As a re-cap, this switches the LLVM preferred style to never indent
namespaces. Having two approaches just led to endless (and utterly
pointless) debates about what was "small enough". This wasn't helping
anyone. The no-indent rule is easy to understand and doesn't really make
anything harder to read. Moreover, with tools like clang-format it is
considerably nicer to have simple consistent rules.
llvm-svn: 199637
type units were enabled. The crux of the issue is that the
addDwarfTypeUnitType routine can end up being indirectly recursive. In
this case, the reference into the dense map (TU) became invalid by the
time we popped all the way back and used it to add the DIE type
signature.
Instead, use early return in the case where we can bypass the recursive
step and creating a type unit. Then use the pointer to the new type unit
to set up the DIE type signature in the case where we have to.
I tried really hard to reduce a testcase for this, but it's really
annoying. You have to get this to be mid-recursion when the densemap
grows. Even if we got a test case for this today, it'd be very unlikely
to continue exercising this pattern.
llvm-svn: 199630
This logic hadn't been updated to handle FastMathFlags, and it took me a while to detect it because it doesn't show up in a simple search for CreateFAdd.
llvm-svn: 199629
Have I mentioned that functions returning true on error and false on
success are confusing? They're more confusing when their name is
"verify". Anyways...
llvm-svn: 199622
For FCMEQ, FCMGE, FCMGT, FCMLE and FCMLT, floating point zero will be
printed as #0.0 instead of #0. To support the history codes using #0,
we consider to let asm parser accept both #0.0 and #0.
llvm-svn: 199621
(and to mention namespace ending comments). This is based on a quick
discussion on the developer mailing list where there was essentially no
objections to a simple and consistent rule. This should avoid future
debates about whether or not a namespace is "big enough" to indent. It
also matches clang-format's current behavior with LLVM source code which
hasn't really seen any opposition in code reviews that I spot checked.
llvm-svn: 199620
This was due to arithmetic overflow in the getNumBits() computation. Now we
cast BitWidth to a uint64_t so that does not occur during the computation. After
the computation is complete, the uint64_t is truncated when the function
returns.
I know that this is not something that is likely to happen, but it *IS* a valid
input and we should not blow up.
llvm-svn: 199609
In LLVM build tree, they points corresponding INTDIR.
In Clang standalone tree, they points external dir (llvm-config's --bindir and --libdir).
llvm-svn: 199595
intrinsics.
Reported on the list by Evan with a couple of attempts to fix, but it
took a while to dig down to the root cause. There are two overlapping
bugs here, both centering around the circumstance of discovering
a memcpy operand which is known to be completely outside the bounds of
the alloca.
First, we need to kill the *other* side of the memcpy if it was added to
this alloca. Otherwise we'll factor it into our slicing and try to
rewrite it even though we know for a fact that it is dead. This is made
more tricky because we can visit the sides in either order. So we have
to both kill the other side and skip instructions marked as dead. The
latter really should be goodness in every case, but here is a matter of
correctness.
Second, we need to actually remove the *uses* of the alloca by the
memcpy when queuing it for later deletion. Otherwise it may still be
using the alloca when we go to promote it (if the rewrite re-uses the
existing alloca instruction). Do this by factoring out the
use-clobbering used when for nixing a Phi argument and re-using it
across the operands of a to-be-deleted instruction.
llvm-svn: 199590
Ensure that the tag types are reflected on a replacement. This is particularly
important for the compatibility tag which has multiple representations where the
last definition wins.
llvm-svn: 199577
This moves the ARM build attributes definitions and support routines into the
Support library. The support routines simply permit the conversion of the value
to and from a string representation.
The movement is prompted in order to permit access to the constants and string
representations from readobj in order to facilitate decoding of the attributes
section.
llvm-svn: 199575
a reduction.
Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.
Fixes PR18526.
radar://15851149
llvm-svn: 199570
This makes the 'verifyFunction' and 'verifyModule' functions totally
independent operations on the LLVM IR. It also cleans up their API a bit
by lifting the abort behavior into their clients and just using an
optional raw_ostream parameter to control printing.
The implementation of the verifier is now just an InstVisitor with no
multiple inheritance. It also is significantly more const-correct, and
hides the const violations internally. The two layers that force us to
break const correctness are building a DomTree and dispatching through
the InstVisitor.
A new VerifierPass is used to implement the legacy pass manager
interface in terms of the other pieces.
The error messages produced may be slightly different now, and we may
have slightly different short circuiting behavior with different usage
models of the verifier, but generally everything works equivalently and
this unblocks wiring the verifier up to the new pass manager.
llvm-svn: 199569
one, but not create one. This is useful in the verifier when we want to
query the constant if it exists but not create one. To be used in an
upcoming commit.
llvm-svn: 199568
Summary:
The only current use of this flag is to mark the alloca as dynamic, even
if its in the entry block. The stack adjustment for the alloca can
never be folded into the prologue because the call may clear it and it
has to be allocated at the top of the stack.
Reviewers: majnemer
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2571
llvm-svn: 199525
This patch adds two new target-independent calling conventions for runtime
calls - PreserveMost and PreserveAll.
The target-specific implementation for X86-64 is defined as following:
- Arguments are passed as for the default C calling convention
- The same applies for the return value(s)
- PreserveMost preserves all GPRs - except R11
- PreserveAll preserves all GPRs and all XMMs/YMMs - except R11
Reviewed by Lang and Philip
llvm-svn: 199508
Summary:
$rs and $rt were the wrong way round in the .td and the testcase wasn't
strict enough to detect the mistake.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D2554
llvm-svn: 199498
IIImul -> II_MUL
IIImult -> II_MULT, II_MULTU, II_MADD, II_MADDU, II_MSUB, II_MSUBU, II_DMULT, II_DMULTU
No functional change since the InstrItinData's have been duplicated.
llvm-svn: 199495
Fix MLA defs to use register class GPRnopc.
Add encoding tests for multiply instructions.
(Alias for MUL/SMLAL/UMLAL added by r199026.)
Patch by Zhaoshi.
llvm-svn: 199491
and tweak comments prior to more invasive surgery. Also clean up some
other non-doxygen comments, and run clang-format over the parts that are
going to change dramatically in subsequent commits so that those don't
get cluttered with formatting changes.
No functionality changed.
llvm-svn: 199489
the verifier after ensuring the CFG is at least usefully formed.
This fixes a number of problems:
1) The PreVerifier was missing the controls the Verifier provides over
*how* an invalid module is handled -- it just aborted the program!
Now it uses the same logic as the Verifier which is significantly
more library-friendly.
2) The DominatorTree used previously could have been cached and not
updated due to bugs in prior passes and we would silently use the
stale tree. This could cause dominance errors to not be as quickly
diagnosed.
3) We can now (in the next patch) pull the functionality of the verifier
apart from the pass infrastructure so that you can verify IR without
having any form of pass manager. This in turn frees the code to share
logic between old and new pass manager variants.
Along the way I fixed at least one annoying bug -- the state for
'Broken' wasn't being cleared from run to run causing all functions
visited after the first broken function to be marked as broken
regardless of whether *they* were a problem. Fortunately, I don't really
know much of a way to observe this peculiarity.
In case folks are worried about the runtime cost, its negligible.
I looked at running the entire regression test suite (which should be
a relatively good use of the verifier) before and after but was unable
to even measure the time spent on the verifier and there was no
regresion from before to after. I checked both with debug builds and
optimized builds.
llvm-svn: 199487
accidentally pick that up while using Clang and run into subtle bugs
down the road related to C++11 features not fully implemented in that
version of the standard library.
llvm-svn: 199484
This makes things a lot easier, because we can now talk about the
"argument allocation", which allocates all the memory for the call in
one shot.
The only functional change is to the verifier for a feature that hasn't
shipped yet.
llvm-svn: 199434
When registering a pass, a pass can now specify a second construct that takes as
argument a pointer to TargetMachine.
The PassInfo class has been updated to reflect that possibility.
If such a constructor exists opt will use it instead of the default constructor
when instantiating the pass.
Since such IR passes are supposed to be rare, no specific support has been
added to this commit to allow an easy registration of such a pass.
In other words, for such pass, the initialization function has to be
hand-written (see CodeGenPrepare for instance).
Now, codegenprepare can be tested using opt:
opt -codegenprepare -mtriple=mytriple input.ll
llvm-svn: 199430
Adding a doxygen comment for each bit of API to indicate at which
LTO_API_VERSION each was available, manually gleaned from successive
git-blames. A few notes:
- LTO_API_VERSION was set to 3 at its introduction.
- I've indicated all the API introduced before LTO_API_VERSION was
around as available "prior to LTO_API_VERSION=3".
- A number of API changes neglected to bump LTO_API_VERSION. These I've
indicated as available at the *next* bump of LTO_API_VERSION.
llvm-svn: 199429
with raw_ostream's write_escaped() method.
For example darwin's otool(1) program that uses the llvm
disassembler now produces disassembly like this:
leaq 0x7b(%rip), %rdi ## literal pool for: "%f\ntoto\n"
and not print the new lines which messes up the output.
rdar://15145300
llvm-svn: 199407
This is necessary because the classes are shared between all implementations.
No functional change since the InstrItinData's have been duplicated.
llvm-svn: 199394
IIArith -> II_ADD, II_ADDU, II_AND, II_CL[ZO], II_DADDIU, II_DADDU,
II_DROTR, II_DROTR32, II_DROTRV, II_DSLL, II_DSLL32, II_DSLLV,
II_DSR[AL], II_DSR[AL]32, II_DSR[AL]V, II_DSUBU, II_LUI, II_MOV[ZFNT],
II_NOR, II_OR, II_RDHWR, II_ROTR, II_ROTRV, II_SLL, II_SLLV, II_SR[AL],
II_SR[AL]V, II_SUBU, II_XOR
No functional change since the InstrItinData's have been duplicated.
This is necessary because the classes are shared between all schedulers.
Once this patch series is committed there will be an InstrItinClass for
each mnemonic with minimal grouping. This does increase the size of the
itinerary tables for each MIPS scheduler but we have a few options for dealing
with that later. These options include reducing the number of classes once
we see the best way to simplify them, or by extending tablegen to be able
to compress the table by eliminating duplicates entries, etc.
llvm-svn: 199391
Affects:
DMULT, DMULTu, MADD, MADD_MM, MADDU, MADDU_MM, MSUB, MSUB_MM, MSUBU,
MSUBU_MM, MULT, MULTu
Does not affect MULT_MM, MULTu_MM since they are currently miscategorised
as IIImul.
llvm-svn: 199381
There are two attempted optimisations in reMaterializeTrivialDef, trying to
avoid promoting the size of a register too much when rematerializing.
Unfortunately, both appear to be flawed. First, we see if the original register
would have worked, but this is inadequate. Consider:
v1 = SOMETHING (v1 is QQ)
v2:Q0 = COPY v1:Q1 (v1, v2 are QQ)
...
uses of v2
In this case even though v2 *could* be used directly as the output of
SOMETHING, this would set the wrong bits of the QQ register involved. The
correct rematerialization must be:
v2:Q0_Q1 = SOMETHING (v2 promoted to QQQ)
...
uses of v2:Q1_Q2
For the second optimisation, if the correct remat is "v2:idx = SOMETHING" then
we can't necessarily expect v2 itself to be valid for SOMETHING, but we do try
to hunt for a class between v1 and v2 that works. Unfortunately, this is also
wrong:
v1 = SOMETHING (v1 is QQ)
v2:Q0_Q1 = COPY v1 (v1 is QQ, v2 is QQQ)
...
uses of v2 as a QQQ
The canonical rematerialization here is "v2:Q0_Q1 = SOMETHING". However current
logic would decide that v2 could be a QQ (no interest is taken in later uses).
This patch, therefore, always accepts the widened register class without trying
to be clever. Generally there is no penalty to this (e.g. in the common GR32 <
GR64 case, expanding the width doesn't matter because it's not like you were
going to do anything else with the high bits of a GR32 register). It can
increase register pressure in cases like the ARM VFP regs though (multiple
non-overlapping but equivalent subregisters). This situation can be
spotted by the fact that both source and destination in the
not-quite-coalesced pair have a sub-register index and
rematerialisation is skipped in that situation.
Unfortunately, no in-tree targets actually expose this as far as I can tell
(there are so few isAsCheapAsAMove instructions for it to trigger on) so I've
been unable to produce a test. It was exposed in our ARM64 SPEC tests though,
and I will be adding a test there that we should be able to contribute
soon(TM).
rdar://problem/15775279
llvm-svn: 199376
flag from clang, and disable zero-base shadow support on all platforms
where it is not the default behavior.
- It is completely unused, as far as we know.
- It is ABI-incompatible with non-zero-base shadow, which means all
objects in a process must be built with the same setting. Failing to
do so results in a segmentation fault at runtime.
- It introduces a backward dependency of compiler-rt on user code,
which is uncommon and complicates testing.
This is the LLVM part of a larger change.
llvm-svn: 199371
This patch adds the capability to dump export table contents. An example
output is this:
Export Table:
Ordinal RVA Name
5 0x2008 exportfn1
6 0x2010 exportfn2
By adding this feature to llvm-objdump, we will be able to use it to check
export table contents in LLD's tests. Currently we are doing binary
comparison in the tests, which is fragile and not readable to humans.
llvm-svn: 199358
The generation of the native_export_file end up in
several different makefiles. All those makefiles
write the same file, but can be executed concurrently...
and bad things happen!
llvm-svn: 199356
Move copying of global initializers below the cloning of functions.
The BlockAddress doesn't have access to the correct basic blocks until the
functions have been cloned. This causes the BlockAddress to point to the old
values. Just wait until the functions have been cloned before copying the
initializers.
PR13163
llvm-svn: 199354
DataRefImpl (a union of two integers and a pointer) is not the ideal data type
to represent a reference to an import directory entity. We should just use the
pointer to the import table and an offset instead to simplify. No functionality
change.
llvm-svn: 199349
than it needs to be by 1 bit but I need to finish some other things so
that all the boundary cases will work in that situation. constpool.c
in test-suite will fail to assemble under our new internal test-suite sync
without this change.
llvm-svn: 199343
If a binary does not depend on any DLL, it does not contain import table at
all. Printing the section title without contents looks wrong, so we shouldn't
print it in that case.
llvm-svn: 199340
ARM assembly syntax uses @ for a comment, execpt for the second
parameter of the .symver directive which requires @ as part of the
symbol name. This commit fixes the parsing of this directive by
adding a special case for ARM for this one argumnet.
To make the change we had to move the AllowAtInIdentifier variable
to the MCAsmLexer interface (from AsmLexer) and expose a setter for
the value. The ELFAsmParser then toggles this value when parsing
the second argument to the .symver directive for a target that
uses @ as a comment symbol
llvm-svn: 199339
Add a hook in the C API of LTO so that clients of the code generator can set
their own handler for the LLVM diagnostics.
The handler is defined like this:
typedef void (*lto_diagnostic_handler_t)(lto_codegen_diagnostic_severity_t
severity, const char *diag, void *ctxt)
- severity says how bad this is.
- diag is a string that contains the diagnostic message.
- ctxt is the registered context for this handler.
This hook is more general than the lto_get_error_message, since this function
keeps only the latest message and can only be queried when something went wrong
(no warning for instance).
<rdar://problem/15517596>
llvm-svn: 199338
which catch buggy versions of libstdc++. While libc++ would pass them,
we don't actually update the state in the configure script to use libc++
when we pass --enable-libcpp, the logic for that is in the
Makefiles. So just completely skip the library test when that configure
flag is passed.
Hopefully this will be enough to fix the darwin bots at last, and thanks
to Duncan Smith for getting things set up so I can watch the bots myself
on lab.llvm.org and see any failures!
llvm-svn: 199334
This fixes a regression intruced by r199135.
Revision 199135 tried to simplify part of the logic in method
DAGCombiner::SimplifyVBinOp introducing calls to method BuildVectorSDNode::isConstant().
However, that revision wrongly changed the check performed by method
SimplifyVBinOp to identify dag nodes that can be folded.
Before revision 199135, that method only tried to simplify vector binary operations
if both operands were build_vector of Constant/ConstantFP/Undef only.
After revision 199135, method SimplifyVBinop tried to
simplify also vector binary operations with only one constant operand.
This fixes the problem restoring the old behavior of SimplifyVBinOp.
llvm-svn: 199328
I did write a version returning ErrorOr<OwningPtr<Binary> >, but it is too
cumbersome to use without std::move. I will keep the patch locally and submit
when we switch to c++11.
llvm-svn: 199326
enable flag that selects the C++ standard library to use with the host
toolchain. Otherwise we end up testing the wrong config.
I'm not really happy about this placement, but its pragmatic and should
unblock the Apple builders.
llvm-svn: 199325
libstdc++v4.6. This is quite hard to test directly, so we test for it by
checking a known missing feature in that version that was added in v4.7.
This should prevent users from upgrading Clang but not GCC and hosting
with a too-old GCC's libstdc++ and getting strange and hard to debug
errors when we switch to C++11 by default.
Also, switch several of the macros I introduced to use AC_LANG_SOURCE
rather than AC_LANG_PROGRAM as we don't need configure's help writing
our main function (and we don't need such a function at all for most of
the tests).
llvm-svn: 199313
MSVC on x64 requires that we create image relative symbol
references to refer to RTTI data. Seeing as how there is no way to
explicitly make reference to a given relocation type in LLVM IR, pattern
match expressions of the form &foo - &__ImageBase.
Differential Revision: http://llvm-reviews.chandlerc.com/D2523
llvm-svn: 199312
Disabling remote MCJIT tests on ARM again, as they're still failing when
self-hosting on ARM, despite all my tests. At least now we have more info
on what message it's breaking and what is going on. Investigating.
llvm-svn: 199310
There has been an old FIXME to find the right cut-off for when it's worth
analyzing and potentially transforming a switch to a lookup table.
The switches always have two or more cases. I could not measure any speed-up
by transforming a switch with two cases. A switch with three cases gets a nice
speed-up, and I couldn't measure any compile-time regression, so I think this
is the right threshold.
In a Clang self-host, this causes 480 new switches to be transformed,
and reduces the final binary size with 8 KB.
llvm-svn: 199294
The GNU as behavior is a bit different and very strange. It will mark any
label that contains an instruction. We can implement that, but using the
type looks more natural since gas will not mark a function if a .word is
used to output the instructions!
llvm-svn: 199287
When expanding neon pseudo stores, it may miss the implicit uses of sub
regs, which may cause post RA scheduler reorder instructions that
breakes anti dependency.
For example:
VST1d64QPseudo %R0<kill>, 16, %Q9_Q10, pred:14, pred:%noreg
will be expanded to
VST1d64Q %R0<kill>, 16, %D18, pred:14, pred:%noreg;
An instruction that defines %D20 may be scheduled before the store by
mistake.
This patches adds implicit uses for such case. For the example above, it
emits:
VST1d64Q %R0<kill>, 8, %D18, pred:14, pred:%noreg, %Q9_Q10<imp-use>
llvm-svn: 199282
The changes caused by folding an sp-adjustment into a "pop" previously
disrupted the forward search for the final real instruction in a
terminating block. This switches to a backward search (skipping debug
instrs).
This fixes PR18399.
Patch by Zhaoshi.
llvm-svn: 199266
We should set them to expand for now since there are no patterns
dealing with them. Actually, there are no instructions either so I
doubt they'll ever be acceptable.
llvm-svn: 199265
MCJIT remote execution (ChildTarget+RemoteTargetExternal) protocol was in
dire need of refactoring. It was fail-prone, had no error reporting and
implemented the same message logic on every single function.
This patch rectifies it, and makes it work on ARM, where it was randomly
failing. Other architectures shall profit from this change as well, making
their buildbots and releases more reliable.
llvm-svn: 199261