loads.
This handles many more cases than just the AA metadata, some of them
suggested by Hal in his review of the AA metadata handling patch. I've
tried to test this behavior where tractable to do so.
I'll point out that I have specifically *not* included a test for
debuginfo because it was going to require 2 or 3 times as much work to
craft some input which would survive the "helpful" stripping of debug
info metadata that doesn't match the desired schema. This is another
good example of why the current state of write-ability for our debug
info metadata is unacceptable. I spent over 30 minutes trying to conjure
some test case that would survive, even copying from other debug info
tests, but it always failed to survive with no explanation of why or how
I might fix it. =[
llvm-svn: 220165
up to where it actually works as intended. The problem is that
a GlobalAlias isa GlobalValue and so the prior block handled all of the
cases.
This allows us to constant fold based on the actual constant expression
in the global alias. As an example, see the last function in the newly
added test case which explicitly aligns an unaligned pointer using
constant expression math. Without this change, we fail to see that and
fold an alignment test to zero.
llvm-svn: 220164
The following implements the transformation:
(sub (or A B) (xor A B)) --> (and A B).
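The identity holds because (or A B) equals (xor A B) plus (and A B): the xor and the and never share a set bit. As a sanity check only, and not the InstCombine code itself, the following sketch verifies it exhaustively for 8-bit values. The helper name subOrXorIsAnd is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if (A | B) - (A ^ B) == (A & B) for every pair of 8-bit values.
// Hypothetical helper; it only checks the arithmetic identity.
bool subOrXorIsAnd() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t A = static_cast<uint8_t>(a);
      uint8_t B = static_cast<uint8_t>(b);
      uint8_t Lhs = static_cast<uint8_t>((A | B) - (A ^ B));
      uint8_t Rhs = static_cast<uint8_t>(A & B);
      if (Lhs != Rhs)
        return false;
    }
  }
  return true;
}
```

Calling subOrXorIsAnd() should return true.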
Patch by Ankur Garg!
Differential Revision: http://reviews.llvm.org/D5719
llvm-svn: 220163
The following implements the optimization for sequences of the form:
icmp eq/ne (shl Const2, A), Const1
Such sequences can be transformed to:
icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2))
This handles only the equality operators for now. Other operators need
to be handled.
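A rough way to see why the shift amount can be recovered: shifting a nonzero constant left by A adds exactly A trailing zeros, as long as the result is still nonzero. The sketch below checks that for every 8-bit Const2 and every shift amount. It is an illustration under those assumptions (nonzero result, Const1 actually reachable from Const2), not the patch's implementation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Counts trailing zero bits. Precondition: V != 0.
unsigned trailingZeros(uint8_t V) {
  unsigned N = 0;
  while ((V & 1u) == 0) {
    V = static_cast<uint8_t>(V >> 1);
    ++N;
  }
  return N;
}

// For every nonzero 8-bit Const2 and shift amount A in [0, 8): if
// Const1 = Const2 << A (truncated to 8 bits) is nonzero, check that
// A == TrailingZeros(Const1) - TrailingZeros(Const2).
bool shiftAmountIsRecoverable() {
  for (unsigned C2 = 1; C2 < 256; ++C2) {
    for (unsigned A = 0; A < 8; ++A) {
      uint8_t C1 = static_cast<uint8_t>(C2 << A);
      if (C1 == 0)
        continue; // All set bits were shifted out; A is not unique here.
      unsigned Recovered =
          trailingZeros(C1) - trailingZeros(static_cast<uint8_t>(C2));
      if (Recovered != A)
        return false;
    }
  }
  return true;
}
```

The skipped case (a zero result) is one reason a real implementation needs extra checks before rewriting the compare.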
Patch by Ankur Garg!
llvm-svn: 220162
by my refactoring of this code.
The method isSafeToLoadUnconditionally assumes that the load will
proceed with the preferred type alignment. Given that, it has to ensure
that the alloca or global is at least that aligned. It has always done
this historically when a datalayout is present, but has never checked it
when the datalayout is absent. When I refactored the code in r220156,
I exposed this path when datalayout was present and that turned the
latent bug into a patent bug.
This fixes the issue by just removing the special case which allows
folding things without datalayout. This isn't worth the complexity of
trying to tease apart when it is or isn't safe without actually knowing
the preferred alignment.
llvm-svn: 220161
make much more sense and in theory be more correct.
If you trace the code alllll the way back to when it was first
introduced, the comments make it slightly more clear what was going on
here. At that time, the only way Base != V was if DL (then TD) was
non-null. As a consequence, if DL *was* null, that meant we were loading
directly from the alloca or global found above the test. After
refactoring, this has become at least terribly subtle and potentially
incorrect. There are many forms of pointer manipulation that can be
traversed without DataLayout, and some of them would in fact change the
size of object being loaded vs. allocated.
Rather than this subtlety, I've hoisted the actual 'return true' bits
into the code which actually found an alloca or global and based them on
the loaded pointer being that alloca or global. This is both more clear
and safer. I've also added comments about exactly why this set of
predicates is used.
I've also corrected a misleading comment about globals -- if overridden
they may not just have a different size, they may be null and completely
unsafe to load from!
Hopefully this confuses the next reader a bit less. I don't have any
test cases or anything, the patch is motivated strictly to improve the
readability of the code.
llvm-svn: 220156
...)) and (load (cast ...)): canonicalize toward the former.
Historically, we've tried to load using the type of the *pointer*, and
tried to match that type as closely as possible removing as many pointer
casts as we could and trading them for bitcasts of the loaded value.
This is deeply and fundamentally wrong.
Repeat after me: memory does not have a type! This was a hard lesson for
me to learn working on SROA.
There is only one thing that should actually drive the type used for
a pointer, and that is the type which we need to use to load from that
pointer. Matching up pointer types to the loaded value types is very
useful because it minimizes the physical size of the IR required for
no-op casts. Similarly, the only thing that should drive the type used
for a loaded value is *how that value is used*! Again, this minimizes
casts. And in fact, the *only* thing motivating types in any part of
LLVM's IR are the types used by the operations in the IR. We should
match them as closely as possible.
I've ended up removing some tests here as they were testing bugs or
behavior that is no longer present. Mostly though, this is just cleanup
to let the tests continue to function as intended.
The only fallout I've found so far from this change was SROA and I have
fixed it to not be impeded by the different type of load. If you find
more places where this change causes optimizations not to fire, those
too are likely bugs where we are assuming that the type of pointers is
"significant" for optimization purposes.
llvm-svn: 220138
This test is pretty awesome. It is claiming to test devirtualization.
However, the code in question is not in fact devirtualized by LLVM. If
you take the original C++ test case and run it through Clang at -O3 we
fail to devirtualize it completely. It also isn't a sufficiently focused
test case.
The *reason* we fail to devirtualize it isn't because of any missing
instcombine though. Instead, it is because we fail to emit an available
externally vtable and thus the vtable is just an external and completely
opaque. If I cause the vtable to be emitted, we successfully
devirtualize things.
Anyways, I'm just removing it because it is providing negative value at
this point: it isn't representative of the output of Clang really, LLVM
isn't doing the transform it claims to be testing, LLVM's failure to do
the transform isn't actually an LLVM bug at all and we shouldn't be
testing for it here, and finally the test is written in such a way that
it will trivially pass even when the point of the test is failing.
llvm-svn: 220137
cases where the alloca type, the load types, and the store types used
all disagree.
Previously, the only way that vector-based promotion occurred was if the
alloca type was a vector type. This was one of the *very* few remaining
uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns
out it was a bad idea.
The alloca type can change very easily based on the mixture of types
loaded and stored to that alloca. We shouldn't be relying on it as
a signal for very much. Instead, the source of truth should be loads and
stores. We should canonicalize the loads and stores as much as possible
and then rely on them exclusively in SROA.
When looking at loads and stores, we may find many different candidate
vector types. This change will let SROA try all of them to find a vector
type which is a viable way to promote the entire alloca to a vector
register.
With this change, it becomes possible to do better canonicalization and
optimization of loads and stores without breaking SROA in random ways,
and that should allow fixing a core source of performance loss in hot
numerical loops such as those in Eigen.
llvm-svn: 220116
The previous tests claimed to test constant offsets in the function name,
but the tests weren't actually testing them.
Clone the tests, and do testing of all combinations of the following:
1) with/without constant pointer offset
2) 32/64-bit addressing modes
3) Usage and non-usage of the return value from the atomicrmw
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220103
The function name now matches what it's actually testing.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 220102
TL;DR: Indexing maps with [] creates missing entries.
The long version:
When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime. Trouble is, we don't first check to see if this is a dynamic alloca.
In the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime. 0 was used for the frame index of the stack protector, which, given that it now has a lifetime, is coloured and merged with other stack slots.
PEI would later trigger an assert because it expects the stack protector to not be dead.
This fix ensures that we only get frame indices for static allocas, i.e., those in the map. Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken.
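The pitfall is easy to reproduce with std::map, which shares this operator[] behaviour. The sketch below is a stand-in, not the SelectionDAG code: FrameIndexMap, the string keys, and both lookup helpers are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the static-alloca-to-frame-index map.
using FrameIndexMap = std::map<std::string, int>;

// Buggy pattern: operator[] default-constructs a missing entry, so an
// unknown key silently comes back as frame index 0 and grows the map.
int lookupWithBrackets(FrameIndexMap &M, const std::string &Name) {
  return M[Name];
}

// Safe pattern: find() never inserts. Returns -1 when the key is absent.
int lookupWithFind(const FrameIndexMap &M, const std::string &Name) {
  auto It = M.find(Name);
  return It == M.end() ? -1 : It->second;
}
```

With a map holding two entries, lookupWithFind on a missing key returns -1 and leaves the size at 2, while lookupWithBrackets returns 0 and grows the map to 3 entries.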
rdar://problem/18672951
llvm-svn: 220099
This reverts commit r219899.
This also updates byval-tail-call.ll to make it clear what was breaking.
Adding r219899 again will cause the load/store to disappear.
llvm-svn: 220093
With VSX enabled, LLVM crashes when compiling
test/CodeGen/PowerPC/fma.ll. I traced this to the liveness test
that's revised in this patch. The interval test is designed to only
work for virtual registers, but in this case the AddendSrcReg is
physical. Since there is already a walk of the MIs between the
AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg
in that loop. At Hal Finkel's request, I converted the liveness test
to an assert restricted to virtual registers.
I've changed the fma.ll test to have VSX and non-VSX variants so we
can test both kinds of multiply-adds.
llvm-svn: 220090
The generic code trying to use findCommutedOpIndices won't
understand that it needs to swap the modifier operands also,
so it should fail if they are set.
llvm-svn: 220064
When the input to a store instruction was a zero vector, the backend
always selected a normal vector store regardless of the non-temporal
hint. This is fixed by this patch.
This fixes PR19370.
llvm-svn: 220054
We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.
While we're at it, avoid writing to std::vector::end()...
Bug found with random testing and a lot of coffee.
llvm-svn: 220051
Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This
patch adds that support.
As with lxvd2x/stxvd2x, this involves straightforward overriding of
the patterns normally recognized for lvx/stvx, with preference given
to the VSX patterns when VSX is enabled.
In addition, the logic for permitting misaligned memory accesses is
modified so that v4f32 and v4i32 are treated the same as v2f64 and
v2i64 when VSX is enabled. Finally, the DAG generation for unaligned
loads is changed to just use a normal LOAD (which will become lxvw4x)
on P8 and later hardware, where unaligned loads are preferred over
lvsl/lvx/lvx/vperm.
A number of tests now generate the VSX loads/stores instead of
lvx/stvx, so this patch adds VSX variants to those tests. I've also
added <4 x float> tests to the vsx.ll test case, and created a
vsx-p8.ll test case to be used for testing code generation for the
P8Vector feature. For now, that simply tests the unaligned load/store
behavior.
This has been tested along with a temporary patch to enable the VSX
and P8Vector features, with no new regressions encountered with or
without the temporary patch applied.
llvm-svn: 220047
v2: use dyn_cast
fixup comments
v3: use cast
Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
DSE's overlap checking contained special logic, used only when no DataLayout
was available, which inferred a complete overwrite when the pointee types were
equal. This logic seems fine for regular loads/stores, but does not work for
memcpy and friends. Instead of fixing this, I'm just removing it.
Philosophically, transformations should not contain enhanced behavior used only
when data layout is lacking (data layout should be strictly additive), and
maintaining these rarely-tested code paths seems not worthwhile at this stage.
Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case
(slightly reduced from that provided by Aliaksei) replaces the original
contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few
other tests have been updated to have a data layout.
llvm-svn: 220035
The only difference from r219829 is using
getOrCreateSectionSymbol(*ELFSec)
instead of
GetOrCreateSymbol(ELFSec->getSectionName())
in ELFObjectWriter which causes us to use the correct section symbol even if
we have multiple sections with the same name.
Original messages:
r219829:
Correctly handle references to section symbols.
When processing assembly like
  .long .text
we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.
This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.
The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.
r219835:
Allow forward references to section symbols.
llvm-svn: 220021
Patch by Bill Seurer; committed on his behalf.
These test cases generate slightly different code sequences when VSX
is activated and thus fail. The update turns off VSX explicitly for
the existing checks and then adds a second set of checks for most of
them that test the VSX instruction output.
llvm-svn: 220019
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:
  // This could point off the end of the block if we've already got constant
  // pool entries following this block; only the last one is in the water list.
  // Back past any possible branches (allow for a conditional and a maximally
  // long unconditional).
  if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
    BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
    DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
  }
The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.
This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.
rdar://problem/18581150
llvm-svn: 220015
Revert "Correctly handle references to section symbols."
Revert "Allow forward references to section symbols."
Rui found a regression I am debugging.
llvm-svn: 220010
llvm-symbolizer will consult one of the .dSYM paths passed via -dsym-hint
if it fails to find the .dSYM bundle at the default location.
llvm-svn: 220004
This code is based on the existing LLVM Go bindings project hosted at:
https://github.com/go-llvm/llvm
Note that all contributors to the gollvm project have agreed to relicense
their changes under the LLVM license and submit them to the LLVM project.
Differential Revision: http://reviews.llvm.org/D5684
llvm-svn: 219976
This is in preparation for another patch that makes patchpoints invokable.
Reviewers: atrick, ributzka
Reviewed By: ributzka
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5657
llvm-svn: 219967
'AS'.
Using 'S' as this was a terrible idea. Arguably, 'AS' is not much
better, but it at least follows the idea of using initialisms and
removes active confusion about the AllocaSlices variable and a Slice
variable.
llvm-svn: 219963
clang-modernize.
I did have to clean up the variable types and whitespace a bit because
the use of auto made the code much less readable here.
llvm-svn: 219962
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
monotonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it
does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
It happens to mostly work for the other targets because they are extremely
conservative, but Power for example had to switch to AtomicExpand to be
able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound! This is noted
in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the literature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
x.store(1);
Thread 1:
y.store(1);
Thread 2:
r1 = x.load();
r2 = y.load();
Thread 3:
r3 = y.load();
r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it.
This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erases this logic from SelectionDAGBuilder as it is dead-code.
Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more aggressively on these
platforms, the long comment should make it clear why they should first override
emitLeading/TrailingFence.
Test Plan: make check-all, no functional change
Reviewers: jfb, t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5474
llvm-svn: 219957
iterators.
There are a ton of places where it essentially wants ranges
rather than just iterators. This is just the first step that adds the
core slice range typedefs and uses them in a couple of places. I still
have to explicitly construct them because they've not been punched
throughout the entire set of code. More range-based cleanups incoming.
llvm-svn: 219955
Summary:
Currently, call slot optimization requires that if the destination is an
argument, the argument has the sret attribute. This is to ensure that
the memory access won't trap. In addition to sret, we can also allow the
optimization to happen for arguments that have the new dereferenceable
attribute, which gives the same guarantee.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5832
llvm-svn: 219950
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().
In the simplest case, this:
y = sqrt(x * x);
becomes this:
y = fabs(x);
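The two forms agree for ordinary inputs but not for all of them, which is presumably why the rewrite is gated on relaxed FP semantics. The sketch below is plain C++ (not the LibCallSimplifier code) with hypothetical helper names, and it assumes IEEE-754 doubles evaluated without excess precision.

```cpp
#include <cassert>
#include <cmath>

// The original form: the square overflows to infinity for large |x|.
double viaSqrt(double X) { return std::sqrt(X * X); }

// The rewritten form: exact for every finite input.
double viaFabs(double X) { return std::fabs(X); }
```

For x = -3.0 both helpers return 3.0. For x = 1e200 the product overflows, so viaSqrt returns infinity while viaFabs returns 1e200.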
This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.
Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.
Differential Revision: http://reviews.llvm.org/D5787
llvm-svn: 219944
When the constant divisor was larger than 32 bits, the optimized code
generated by the AArch64 backend was wrong, because the shift was defined
as a shift of a 32-bit constant '(1<<Lg2(divisor))' and we would lose the
upper 32 bits.
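The effect of the narrow constant can be modelled in a few lines. This is a sketch of the arithmetic only, not the FastISel code. The helper names are hypothetical, and the truncation is written as an explicit cast because shifting a 32-bit value by 32 or more is undefined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Models a power-of-two constant that is kept in only 32 bits: any bit at
// position 32 or above is dropped. Precondition: Lg2 < 64.
uint64_t pow2Truncated(unsigned Lg2) {
  return static_cast<uint32_t>(uint64_t{1} << Lg2);
}

// The intended value, computed at full 64-bit width. Precondition: Lg2 < 64.
uint64_t pow2Full(unsigned Lg2) { return uint64_t{1} << Lg2; }
```

For Lg2 = 33, pow2Full gives 8589934592 while pow2Truncated gives 0. For small exponents such as 5 the two agree, which would explain why the bug only showed up for large divisors.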
This fixes rdar://problem/18678801.
llvm-svn: 219934
Summary:
In order to support big endian targets for the BuildPairF64 nodes we
just need to swap the low/high pair registers. Additionally, for the
ExtractElementF64 nodes we have to calculate the correct stack offset
with respect to the node's register/operand that we want to extract.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5753
llvm-svn: 219931
Make tail recursion elimination a bit more aggressive. This allows us to get
tail recursion on functions that are just branches to a different function. The
fact that the function takes a byval argument does not restrict it from being
optimised into just a tail call.
llvm-svn: 219899
Philip Reames and I had a long conversation about this, mostly because it is
not obvious why the current logic is correct. Hopefully, these comments will
prevent such confusion in the future.
llvm-svn: 219882
For pointer-typed function arguments, enhanced alignment can be asserted using
the 'align' attribute. When inlining, if this enhanced alignment information is
not otherwise available, preserve it using @llvm.assume-based alignment
assumptions.
llvm-svn: 219876
Clang CodeGen had a utility function for creating pointer alignment assumptions
using the @llvm.assume intrinsic. This functionality will also be needed by the
inliner (to preserve function-argument alignment attributes when inlining), so
this moves the utility function into IRBuilder where it can be used both by
Clang CodeGen and also other LLVM-level code.
llvm-svn: 219875
In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4
respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative
VTs).
Since DQ has native support for these instructions, I peeled off the non-"Alt"
part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are
derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ.
Fixes <rdar://problem/18426089>
llvm-svn: 219874
The new attributes are NumElts and the CD8TupleForm. This prepares the code
to enable x8 and x2 inserts.
NFC, no change in X86.td.expanded except for the new attributes.
llvm-svn: 219871
It's the W bit that selects between 32 or 64 elt type and not the opcode. The
opcode selects between the width of the insert (128 or 256).
llvm-svn: 219870
This CL introduces MachOObjectFile::getUuid(). This function returns an ArrayRef to the object file's UUID, or an empty ArrayRef if the object file doesn't contain an LC_UUID load command.
The new function will be used by llvm-symbolizer.
llvm-svn: 219866
The SelectDS1Addr1Offset complex pattern always tries to store constant
lds pointers in the offset operand and store a zero value in the addr operand.
Since the addr operand does not accept immediates, the zero value
needs to first be copied to a register.
This newly created zero value will not go through normal instruction
selection, so we need to manually insert a V_MOV_B32_e32 in the complex
pattern.
This bug was hidden by the fact that if there was another zero value
in the DAG that had not been selected yet, then the CSE done by the DAG
would use the unselected node for the addr operand rather than the one
that was just created. This would lead to the zero value being selected
and the DAG automatically inserting a V_MOV_B32_e32 instruction.
llvm-svn: 219848
The original fix for the build break was correct. LLVM_ATTRIBUTE_USED
removes the warning message because it keeps the function in the object
file. LLVM_ATTRIBUTE_UNUSED indicates that it may or may not be used
depending on build settings.
llvm-svn: 219846
Store `User::NumOperands` (and `MDNode::NumOperands`) in `Value`.
On 64-bit host architectures, this reduces `sizeof(User)` and all
subclasses by 8, and has no effect on `sizeof(Value)` (or, incidentally,
on `sizeof(MDNode)`).
On 32-bit host architectures, this increases `sizeof(Value)` by 4.
However, it has no effect on `sizeof(User)` and `sizeof(MDNode)`, so the
only concrete subclasses of `Value` that actually see the increase are
`BasicBlock`, `Argument`, `InlineAsm`, and `MDString`. Moreover, I'll
be shocked and confused if this causes a tangible memory regression.
This has no functionality change (other than memory footprint).
llvm-svn: 219845
A follow-up commit will modify the memory-layout of `Value`, `User`, and
`MDNode`. First fix the comments to be doxygen-friendly (and to follow
the coding standards).
- Use "\brief" instead of "repeatedName -".
- Add a brief intro where it was missing.
- Remove duplicated comments from source files (and a couple of
noisy/trivial comments altogether).
llvm-svn: 219844
If x is known to have the range [a, b) in a loop predicated by (icmp
ne x, a), its range can be sharpened to [a + 1, b). Get
ScalarEvolution and hence IndVars to exploit this fact.
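The sharpening rule itself is simple interval arithmetic. The sketch below uses a hypothetical HalfOpenRange type over plain integers and ignores wraparound, so it is only a model of the idea and not ScalarEvolution's ConstantRange logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open interval [Lo, Hi) over signed integers.
struct HalfOpenRange {
  int64_t Lo;
  int64_t Hi;
};

// Given x in R and the guard x != Excluded, returns a tighter range when
// the excluded value is exactly the lower bound. Otherwise returns R.
HalfOpenRange sharpen(HalfOpenRange R, int64_t Excluded) {
  if (R.Lo < R.Hi && Excluded == R.Lo)
    return {R.Lo + 1, R.Hi};
  return R;
}
```

For example, sharpen({0, 10}, 0) yields [1, 10), while sharpen({0, 10}, 5) leaves the range unchanged because removing an interior point does not give a contiguous interval.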
This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.
phabricator: http://reviews.llvm.org/D5639
llvm-svn: 219834
Truncate the operands of a switch instruction to a narrower type if the upper
bits are known to be all ones or zeros.
rdar://problem/17720004
llvm-svn: 219832
This is mostly a copy of the existing FastISel GEP code, but we have to
duplicate it for AArch64, because otherwise we would bail out even for simple
cases. This is because the standard fastEmit functions don't cover MUL at all
and ADD is lowered very inefficiently.
The original commit had a bug in the add emit logic, which has been fixed.
llvm-svn: 219831
When processing assembly like
  .long .text
we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.
This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.
The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.
llvm-svn: 219829
This adds the MCInstPrinter to the LLVMHexagonDesc library and removes
the dependency LLVMHexagonAsmPrinter had on LLVMHexagonDesc. This is
a prerequisite needed by the disassembler.
Phabricator Revision: http://reviews.llvm.org/D5734
llvm-svn: 219826
1. Use const with autos.
2. Don't bother with explicit const in cast ops because they do it automagically.
Thanks, David B. / Aaron B. / Reid K.
llvm-svn: 219817
The SLP vectorizer should not vectorize ephemeral values. These are used to
express information to the optimizer, and vectorizing them does not lead to
faster code (because the ephemeral values are dropped prior to code generation,
vectorized or not), and obscures the information the instructions are
attempting to communicate (the logic that interprets the arguments to
@llvm.assume generically does not understand vectorized conditions).
Also, uses by ephemeral values are free (because they, and the necessary
extractelement instructions, will be dropped prior to code generation).
llvm-svn: 219816
We need to make sure that we visit all operands of an instruction before moving
deeper in the operand graph. We had been pushing operands onto the back of the work
set, and popping them off the back as well, meaning that we might visit an
instruction before visiting all of its uses that sit in between it and the call
to @llvm.assume.
To provide an explicit example, given the following:
  %q0 = extractelement <4 x float> %rd, i32 0
  %q1 = extractelement <4 x float> %rd, i32 1
  %q2 = extractelement <4 x float> %rd, i32 2
  %q3 = extractelement <4 x float> %rd, i32 3
  %q4 = fadd float %q0, %q1
  %q5 = fadd float %q2, %q3
  %q6 = fadd float %q4, %q5
  %qi = fcmp olt float %q6, %q5
  call void @llvm.assume(i1 %qi)
%q5 is used by both %qi and %q6. When we visit %qi, it will be marked as
ephemeral, and we'll queue %q6 and %q5. %q6 will be marked as ephemeral and
we'll queue %q4 and %q5. Under the old system, we'd then visit %q4, which
would become ephemeral, %q1 and then %q0, which would become ephemeral as
well, and now we have a problem. We'd visit %rd, but it would not be marked as
ephemeral because we've not yet visited %q2 and %q3 (because we've not yet
visited %q5).
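The difference between the two visiting orders can be reproduced on a toy model of the IR above. This is a simplified sketch, not LLVM's implementation: ToyGraph, buildExample, and collectEphemeral are hypothetical, values are plain strings, and a value is visited at most once and marked ephemeral only if all of its users already are. The exact order in which the toy LIFO variant goes wrong differs from the narrative above, but the failure is the same kind.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy def-use graph keyed by value name (not LLVM's data structures).
struct ToyGraph {
  std::map<std::string, std::vector<std::string>> Operands;
  std::map<std::string, std::vector<std::string>> Users;

  void addInst(const std::string &Def, const std::vector<std::string> &Ops) {
    Operands[Def] = Ops;
    for (const std::string &Op : Ops)
      Users[Op].push_back(Def);
  }
};

// Mirrors the example IR: four extracts, three fadds, one fcmp, one assume.
ToyGraph buildExample() {
  ToyGraph G;
  G.addInst("q0", {"rd"});
  G.addInst("q1", {"rd"});
  G.addInst("q2", {"rd"});
  G.addInst("q3", {"rd"});
  G.addInst("q4", {"q0", "q1"});
  G.addInst("q5", {"q2", "q3"});
  G.addInst("q6", {"q4", "q5"});
  G.addInst("qi", {"q6", "q5"});
  G.addInst("assume", {"qi"});
  return G;
}

// Collects ephemeral values starting from the assume. Operands are always
// pushed on the back; Fifo selects whether they are popped from the front
// (breadth-first) or from the back (the old, depth-first behaviour).
std::set<std::string> collectEphemeral(const ToyGraph &G, bool Fifo) {
  std::set<std::string> Eph{"assume"};
  std::set<std::string> Visited{"assume"};
  const std::vector<std::string> &Roots = G.Operands.at("assume");
  std::deque<std::string> Work(Roots.begin(), Roots.end());

  while (!Work.empty()) {
    std::string V;
    if (Fifo) {
      V = Work.front();
      Work.pop_front();
    } else {
      V = Work.back();
      Work.pop_back();
    }
    if (!Visited.insert(V).second)
      continue; // Already considered once; never reconsidered.

    bool AllUsersEphemeral = true;
    auto UIt = G.Users.find(V);
    if (UIt != G.Users.end())
      for (const std::string &U : UIt->second)
        if (!Eph.count(U)) {
          AllUsersEphemeral = false;
          break;
        }
    if (!AllUsersEphemeral)
      continue;

    Eph.insert(V);
    auto OIt = G.Operands.find(V);
    if (OIt != G.Operands.end())
      for (const std::string &Op : OIt->second)
        Work.push_back(Op);
  }
  return Eph;
}
```

On this graph the FIFO variant marks every value, including rd, as ephemeral. The LIFO variant reaches q5 before q6 has been marked, gives up on it, and as a result never marks q2, q3, or rd.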
This will be covered by a test case in a follow-up commit that enables
ephemeral-value awareness in the SLP vectorizer.
llvm-svn: 219815
Summary:
Currently an error is thrown if bundle alignment mode is set more than once
per module (either via the API or the .bundle_align_mode directive). This
change allows setting it multiple times as long as the alignment doesn't
change.
Also nested bundle_lock groups are currently not allowed. This change allows
them, with the effect that the group stays open until all nests are exited,
and if any of the bundle_lock directives has the align_to_end flag, the
group becomes align_to_end.
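The nesting rule can be pictured as a depth counter with a sticky flag. The class below is a hypothetical model written for illustration, not the MC layer's API. Resetting the flag when the outermost group closes is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical model of nested bundle_lock groups.
class BundleLockState {
  unsigned Depth = 0;      // Number of bundle_lock directives still open.
  bool AlignToEnd = false; // Set once any nested lock asks for align_to_end.

public:
  void lock(bool WantAlignToEnd) {
    ++Depth;
    AlignToEnd = AlignToEnd || WantAlignToEnd;
  }

  // Returns false if there is no open group to close.
  bool unlock() {
    if (Depth == 0)
      return false;
    if (--Depth == 0)
      AlignToEnd = false; // Assumption: the flag belongs to the whole group.
    return true;
  }

  bool isOpen() const { return Depth > 0; }
  bool isAlignToEnd() const { return AlignToEnd; }
};
```

After lock(false) followed by lock(true), the group reports align_to_end and stays open until the second unlock().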
These changes make the bundle alignment simpler to use in the compiler, and
also better match the corresponding support in GNU as.
Reviewers: jvoung, eliben
Differential Revision: http://reviews.llvm.org/D5801
llvm-svn: 219811
Follow-up to r219801. Post-commit review pointed out that all comments
require a `\brief` description [1], so I converted many and recrafted a
few to be briefer or to include a brief intro. (If I'm going to clean
them up, I should do it right!)
[1]: http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments
llvm-svn: 219808
A number of comment cleanups:
- Remove duplicated function and class names from comments.
- Remove duplicated comments from source file (some of which were
out-of-sync).
- Move any unduplicated comments from source file to header.
- Remove some noisy comments entirely (e.g., a comment for
`DIDescriptor::print()` saying "print descriptor" just gets in the
way of reading the code).
llvm-svn: 219801
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
The old heuristics caused instructions to sink unnecessarily, and might create
register pressure.
This is the second try of the fix. The first one (D4814) caused a performance
regression due to failing to sink instructions out of loops (PR21115). This
patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
one regardless of whether the target block post-dominates the source.
Thanks Alexey Volkov for reporting PR21115!
Test Plan:
Added a NVPTX codegen test to verify that our change prevents the backend from
over-sinking. It also shows the unnecessary register pressure caused by
over-sinking.
Added an X86 test to verify we can sink instructions out of loops regardless of
the dominance relationship. This test is reduced from Alexey's test in PR21115.
Updated an affected test in X86.
Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
performance. Results are attached separately in the review thread.
Reviewers: Jiangning, resistor, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D5633
llvm-svn: 219773
Early attempts to support AAPCS bare metal MachO targets based the decision on
the CPU being compiled for. This was not a particularly great idea and we've
got a better option now, but this check remained.
No functional change for any target we care about.
llvm-svn: 219767
This is a follow up to commit r219742. It removes the CCInMI variable
and accesses the CC in CSCINC directly. In the case of a conditional
branch accessing the CC with CCInMI was wrong.
llvm-svn: 219748
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.
Examples:
1. Convert csinc w9, wzr, wzr, <CC>; tbnz w9, #0, 0x44
to b.<invCC>
2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
to b.<CC>
rdar://problem/18506500
llvm-svn: 219742
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).
Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).
llvm-svn: 219741
Patch to provide shuffle decodes and asm comments for the sse pslldq/psrldq SSE2/AVX2 byte shift instructions.
Differential Revision: http://reviews.llvm.org/D5598
llvm-svn: 219738
Thumb1 has legitimate reasons for preferring 32-bit alignment of types
i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
a multiple of 4. However, this is a trade-off between code size and RAM usage;
the DataLayout string is not the best place to represent it even if desired.
So this patch removes the extra Thumb requirements, hopefully making ARM and
Thumb completely compatible in this respect.
llvm-svn: 219734
There's no hard requirement on LLVM to align local variables to 32-bits, so the
Thumb1 frame handling needs to be able to deal with variables that are only
naturally aligned without falling over.
llvm-svn: 219733
This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail
out even for simple cases, because the standard fastEmit functions don't cover
MUL and ADD is lowered inefficiently.
llvm-svn: 219726
Before, ARM and Thumb mode code had different preferred alignments, which could
lead to some rather unexpected results. There's justification for reducing it
from the default 64-bits (wasted space), but I don't think there is for going
below 32-bits.
There's no actual ABI change here, just to reassure people.
llvm-svn: 219719
The CFL-AA implementation was missing a visit* routine for va_arg instructions,
causing it to assert when run on a function that had one. For now, handle these
in a conservative way.
Fixes PR20954.
llvm-svn: 219718
Eliminate library calls and intrinsic calls to fabs when the input
is a squared value.
Note that no unsafe-math / fast-math assumptions are needed for
this optimization.
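A quick numeric sanity check of the claim above, as a Python sketch rather than
anything taken from the patch: for non-NaN doubles the product x*x always has a
clear sign bit, so fabs has nothing left to do. The helper name and the sample
values are made up for illustration.

```python
import math

def square_sign_bit_is_clear(x: float) -> bool:
    """Return True if x*x has a clear sign bit, i.e. fabs(x*x) is a no-op."""
    square = x * x
    # copysign exposes the sign bit, so it distinguishes +0.0 from -0.0.
    return math.copysign(1.0, square) == 1.0

# Non-NaN inputs, including signed zeros, infinities, overflow and underflow.
samples = [0.0, -0.0, 1.5, -1.5, 1e308, -1e308, math.inf, -math.inf, 5e-324]
all_clear = all(square_sign_bit_is_clear(x) for x in samples)
print(all_clear)
```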
Differential Revision: http://reviews.llvm.org/D5777
llvm-svn: 219717
Sign-/zero-extend folding depended on the load and the integer extend both
being selected by FastISel. This cannot always be guaranteed and SelectionDAG
might interfere. This commit adds additional checks to load and integer extend
lowering to catch this.
Related to rdar://problem/18495928.
llvm-svn: 219716
We assumed that A must be greater than B because the right hand side of
a remainder operator must be nonzero.
However, it is possible for A to be less than B if Pow2 is a power of
two greater than 1.
Take for example:
i32 %A = 0
i32 %B = 31
i32 Pow2 = 2147483648
((Pow2 << 0) >>u 31) is non-zero but A is less than B.
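The counterexample can be replayed with plain integer arithmetic. This is only
a Python model of i32 shifts (the helper names are hypothetical), not the
InstCombine code:

```python
MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers

def shl32(value: int, amount: int) -> int:
    """Left shift that wraps to 32 bits."""
    return (value << amount) & MASK32

def lshr32(value: int, amount: int) -> int:
    """Logical (unsigned) right shift on a 32-bit value."""
    return (value & MASK32) >> amount

A, B, POW2 = 0, 31, 2147483648
result = lshr32(shl32(POW2, A), B)
# The divisor is non-zero even though A < B.
print(result != 0, A < B)
```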
This fixes PR21274.
llvm-svn: 219713
This effectively reverts the revert in 219707, after fixing the test to work
with the new function name format and the renamed intrinsic.
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219710
Reapply r216913, a fix for PR20832 by Andrea Di Biagio. The commit was reverted
because of buildbot failures, and credit goes to Ulrich Weigand for isolating
the underlying issue (which can be confirmed by Valgrind, which does helpfully
light up like the fourth of July). Uli explained the problem with the original
patch as:
It seems the problem is calling multiplySignificand with an addend of category
fcZero; that is not expected by this routine. Note that for fcZero, the
significand parts are simply uninitialized, but the code in (or rather, called
from) multiplySignificand will unconditionally access them -- in effect using
uninitialized contents.
This version avoids using a category == fcZero addend within
multiplySignificand, which avoids this problem (the Valgrind output is also now
clean).
Original commit message:
[APFloat] Fixed a bug in method 'fusedMultiplyAdd'.
When folding a fused multiply-add builtin call, make sure that we propagate the
correct result in the case where the addend is zero, and the two other operands
are finite non-zero.
Example:
  define double @test() {
    %1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0)
    ret double %1
  }
Before this patch, the instruction simplifier wrongly folded the builtin call
in function @test to constant 'double 7.0'.
With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and
propagates the expected result (i.e. 56.0).
Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra
test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence
of NaN/Inf operands.
This fixes PR20832.
llvm-svn: 219708
v2: Add SI lowering
Add test
v3: Place work dimensions after the kernel arguments.
v4: Calculate offset while lowering arguments
v5: rebase
v6: change prefix to AMDGPU
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219705
Summary:
In order to facilitate use of common code, checking by reviewers of other fast-isel ports, and hopefully to eventually move most of Mips and other fast-isel ports into target independent code, I've tried to get the two implementations to line up.
There is no functional code change. Just methods moved in the file to be in the same order as in AArch64.
Test Plan: No functional change.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, aemerson, rfuhler
Differential Revision: http://reviews.llvm.org/D5692
llvm-svn: 219703
Let me tell you a tale...
Originally committed in r211723 after discovering a nasty case of weird
scoping due to inlining, this was reverted in r211724 after it fired in
ASan/compiler-rt.
(minor diversion where I accidentally committed/reverted again in
r211871/r211873)
After further testing and fixing bugs in ArgumentPromotion (r211872) and
Inlining (r212065) it was recommitted in r212085. Reverted in r212089
after the sanitizer buildbots still showed problems.
Fixed another bug in ArgumentPromotion (r212128) found by this
assertion.
Recommitted in r212205, reverted in r212226 after it crashed some more
on sanitizer buildbots.
Fix clang some more in r212761.
Recommitted in r212776, reverted in r212793. ASan failures.
Recommitted in r213391, reverted in r213432, trying to reproduce flakey
ASan build failure.
Fixed bugs in r213805 (ArgPromo + DebugInfo), r213952
(LiveDebugVariables strips dbg_value intrinsics in functions not
described by debug info).
Recommitted in r214761, reverted in r214999, flakey failure on Windows
buildbot.
Fixed DeadArgElimination + DebugInfo bug in r219210.
Recommitted in r219215, reverted in r219512, failure on ObjC++ atomic
properties in the test-suite on Darwin.
Fixed ObjC++ atomic properties issue in Clang in r219690.
[This commit is provided 'as is' with no hope that this is the last time
I commit this change either expressed or implied]
llvm-svn: 219702
When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value
constraints, we should check for intrinsics that dominate the edge's branch,
not just any potential context instructions. An assumption that dominates the
edge's branch represents a truth on that edge. This is specifically useful, for
example, if multiple predecessors assume a pointer to be nonnull, allowing us
to simplify a later null comparison.
The test case, and an initial patch, were provided by Philip Reames. Thanks!
llvm-svn: 219688
and TargetRegisterInfo in the peephole optimizer. This
makes it easier to grab subtarget dependent variables off
of the MachineFunction rather than the TargetMachine.
llvm-svn: 219669
E.g., currently we generate the following instructions if the immediate is too wide:
MOV X0, WideImmediate
ADD X1, BaseReg, X0
LDR X2, [X1, 0]
Using the [Base+XReg] addressing mode saves one ADD, as follows:
MOV X0, WideImmediate
LDR X2, [BaseReg, X0]
Differential Revision: http://reviews.llvm.org/D5477
llvm-svn: 219665
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.
llvm-svn: 219656
Rather than define our own standards, we adopt a set of best practices that
are already in use by the Go community.
Differential Revision: http://reviews.llvm.org/D5761
llvm-svn: 219646
the IR going into it and to clean up the IR produced by the vectorizers.
Note that these are *off by default* right now while folks collect data
on whether the performance tradeoff is reasonable.
In a build of the 'opt' binary, I see about 2% compile time regression
due to this change on average. This is in my mind essentially the worst
expected case: very little of the opt binary is going to *benefit* from
these extra passes.
I've seen several benchmarks improve in performance by small amounts due
to running these passes, and there are certain (rare) cases where these
passes make a huge difference by either enabling the vectorizer at all
or by hoisting runtime checks out of the outer loop. My primary
motivation is to prevent people from seeing runtime check overhead in
benchmarks where the existing passes and optimizers would be able to
eliminate that.
I've chosen the sequence of passes based on the kinds of things that
seem likely to be relevant for the code at each stage: rotating loops for
the vectorizer, finding correlated values, loop invariants, and
unswitching opportunities from any runtime checks, and cleaning up
commonalities exposed by the SLP vectorizer.
I'll be pinging existing threads where some of these issues have come up
and will start new threads to get folks to benchmark and collect data on
whether this is the right tradeoff or we should do something else.
llvm-svn: 219644
This goes with the earlier commit to remove the static destructor from ManagedStatic.cpp by controlling the allocation and de-allocation of the mutex.
Summary: This is part of the ongoing work to remove static constructors and destructors.
Reviewers: chandlerc, rnk
Reviewed By: rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D5473
llvm-svn: 219640
We assumed that negation operations of the form (0 - %Z) resulted in a
negative number. This isn't true if %Z was originally negative.
Substituting the negative number into the remainder operation may result
in undefined behavior because the dividend might be INT_MIN.
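To make the two failure modes concrete, here is a small Python model of
wrapping i32 negation. It is a sketch under the assumption that (0 - %Z) wraps
like ordinary two's complement arithmetic; `to_i32` and `neg_i32` are names
invented for this example.

```python
INT_MIN = -(2**31)

def to_i32(value: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value

def neg_i32(z: int) -> int:
    """Model (0 - z) on i32 with wrap-around."""
    return to_i32(0 - z)

# A negative input negates to a positive value, and INT_MIN negates to itself.
print(neg_i32(5), neg_i32(-5), neg_i32(INT_MIN) == INT_MIN)
```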
This fixes PR21256.
llvm-svn: 219639
This patch adds a new llvm_call_once function which is used by the ManagedStatic implementation to safely initialize a global to avoid static construction and destruction.
llvm-svn: 219638
We have a transform that changes:
(x lshr C1) udiv C2
into:
x udiv (C2 << C1)
However, it is unsafe to do so if C2 << C1 discards any of C2's bits.
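The hazard is easy to reproduce with 32-bit unsigned arithmetic. The Python
sketch below models both forms and a guard that detects discarded bits; the
function names and the chosen constants are illustrative assumptions, not the
code from the patch.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers

def original(x: int, c1: int, c2: int) -> int:
    """(x lshr C1) udiv C2"""
    return (x >> c1) // c2

def transformed(x: int, c1: int, c2: int) -> int:
    """x udiv (C2 << C1), with the shift wrapping to 32 bits."""
    return x // ((c2 << c1) & MASK32)

def shift_discards_bits(c1: int, c2: int) -> bool:
    """True if C2 << C1 loses any of C2's bits in 32-bit arithmetic."""
    return ((c2 << c1) & MASK32) >> c1 != c2

# Safe case: 3 << 2 == 12 keeps every bit of C2.
print(original(1000, 2, 3), transformed(1000, 2, 3))
# Unsafe case: 0x80000001 << 1 wraps to 2, so the two forms disagree.
print(original(0xFFFFFFFF, 1, 0x80000001), transformed(0xFFFFFFFF, 1, 0x80000001))
```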
This fixes PR21255.
llvm-svn: 219634
Summary:
Make Mips fast-isel track the form of AArch64 where practical.
This makes it easier for people to review the code, to borrow similar code, and to see how to eventually move a lot of this
target code for fast-isels into target independent code.
These are just cosmetic changes. Should be no functional difference.
Test Plan:
make check
test-suite for 4 flavors mips32 r1/r2 , -O0/-O2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: aemerson, llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D5595
llvm-svn: 219633
Broken parent scope pointers in inlined DIVariables can cause
ensureAbstractVariableIsCreated to insert new abstract scopes, thus
invalidating the iterator in this loop and leading to hard-to-debug
crashes. Useful when manually reducing IR for testcases.
llvm-svn: 219628
Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result. The details are quite complex and hard to
determine statically, since branches in the code may exist in some
circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.
The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions and the
simplest way to do this is to insert a NOP.
This patch implements this work-around in the backend, enabled via the option
-aarch64-fix-cortex-a53-835769.
The work-around code generation is not enabled by default.
llvm-svn: 219603
Summary: [asan-asm-instrumentation] Fixed memory references which include %rsp as a base or an index register.
Reviewers: eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5599
llvm-svn: 219602
This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.
This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too.
Differential Revision: http://reviews.llvm.org/D5701
llvm-svn: 219584
A helper routine, MultiplyOverflows, was a less efficient
reimplementation of APInt's smul_ov and umul_ov. While we are here,
clean up the code so it's more uniform.
No functionality change intended.
llvm-svn: 219583
On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
variables private and clean up uses to go through the existing accessors.
NFC.
llvm-svn: 219573
Consider the case where X is 2. (2 <<s 31)/s-2147483648 is zero but we
would fold to X. Note that this is valid when we are in the unsigned
domain because we require NUW: 2 <<u 31 results in poison.
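The X = 2 case can be checked by hand with a Python model of i32 arithmetic.
This sketch assumes the shift simply wraps to 32 bits and that signed division
truncates toward zero; the helpers are hypothetical names.

```python
INT_MIN = -(2**31)

def to_i32(value: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value

def sdiv_i32(a: int, b: int) -> int:
    """Signed division that truncates toward zero, as sdiv does."""
    quotient = abs(a) // abs(b)
    return to_i32(-quotient if (a < 0) != (b < 0) else quotient)

X = 2
shifted = to_i32(X << 31)              # every set bit is shifted out
quotient = sdiv_i32(shifted, INT_MIN)
# The real quotient differs from X, so folding to X would be a miscompile.
print(shifted, quotient, quotient == X)
```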
This fixes PR21245.
llvm-svn: 219568
consider:
C1 = INT_MIN
C2 = -1
C1 * C2 overflows without a doubt but consider the following:
%X = i32 INT_MIN
This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1.
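The same numbers, worked through in a Python sketch that models sdiv as
truncating division on i32 (helper names are made up for illustration). It
shows that the two-step division is well defined while C1 * C2 does not fit in
32 bits:

```python
INT_MIN = -(2**31)
INT_MAX = 2**31 - 1

def sdiv_i32(a: int, b: int) -> int:
    """Signed division that truncates toward zero (no wrap needed here)."""
    quotient = abs(a) // abs(b)
    return -quotient if (a < 0) != (b < 0) else quotient

C1, C2, X = INT_MIN, -1, INT_MIN
step1 = sdiv_i32(X, C1)       # X /s C1
step2 = sdiv_i32(step1, C2)   # (X /s C1) /s C2
product = C1 * C2             # exact mathematical product
product_fits = INT_MIN <= product <= INT_MAX
print(step1, step2, product_fits)
```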
N. B. Move the unsigned version of this transform to InstSimplify, it
doesn't create any new instructions.
This fixes PR21243.
llvm-svn: 219567
consider:
mul i32 nsw %x, -2147483648
this instruction will not result in poison if %x is 1
however, if we transform this into:
shl i32 nsw %x, 31
then we will be generating poison because we just shifted into the sign
bit.
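One way to see the difference is to compare the exact (unbounded) results
against the signed 32-bit range. The Python sketch below treats "nsw is
violated" as "the exact result does not fit in i32", which is a simplified
model of the poison rules and not a quote of them:

```python
INT_MIN = -(2**31)
INT_MAX = 2**31 - 1

def fits_i32(value: int) -> bool:
    """True if the exact result is representable as a signed 32-bit integer."""
    return INT_MIN <= value <= INT_MAX

x = 1
mul_is_fine = fits_i32(x * INT_MIN)  # mul nsw %x, -2147483648
shl_is_fine = fits_i32(x << 31)      # shl nsw %x, 31
print(mul_is_fine, shl_is_fine)
```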
This fixes PR21242.
llvm-svn: 219566
getSmallConstantTripCount even when it isn't the exiting block.
I missed this in my first audit, very sorry. This was found in LNT and
elsewhere. I don't have a test case, but it was completely obvious from
inspection that this was the problem. I'll see if I can reduce a test
case, but I'm not really hopeful, and the value seems quite low.
llvm-svn: 219562
Summary: Implement the most basic form of conditional branches in Mips fast-isel.
Test Plan:
br1.ll
run 4 flavors of test-suite. mips32 r1/r2 and at -O0/O2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D5583
llvm-svn: 219556
routines and fix all of the bugs they expose.
I hit a test case that crashed even without these asserts due to passing
a non-exiting latch to the ExitingBlock parameter of the trip count
computation machinery. However, when I add the nice asserts, it turns
out we have plenty of coverage of these bugs, they just didn't manifest
in crashers.
The core problem seems to stem from an assumption that the latch *is*
the exiting block. While this is often true, and somewhat the "normal"
way to think about loops, it isn't necessarily true. The correct way to
call the trip count routines in a *generic* fashion (that is, without
a particular exit in mind) is to just use the loop's single exiting
block if it has one. The trip count can't be computed generically unless
it does. This works great for the loop vectorizer. The loop unroller
actually *wants* to select the latch when it has to choose between
multiple exits because for unrolling it is the latch trips that matter.
But if this is the desire, it needs to explicitly guard for non-exiting
latches and check for the generic trip count in that case.
I've added the asserts, and added convenience APIs for querying the trip
count generically that check for a single exit block. I've kept the APIs
consistent between computing trip count and trip multiples.
Thanks to Mark for the help debugging and tracking down the *right* fix
here!
llvm-svn: 219550
The LLVM Lang Ref states for signed/unsigned int to float conversions:
"If the value cannot fit in the floating point value, the results are undefined."
And for FP to signed/unsigned int:
"If the value cannot fit in ty2, the results are undefined."
This matches the C definitions.
The existing behavior pins to infinity or a max int value, but that may just
lead to more confusion as seen in:
http://llvm.org/bugs/show_bug.cgi?id=21130
Returning undef will hopefully lead to a less silent failure.
Differential Revision: http://reviews.llvm.org/D5603
llvm-svn: 219542
1) Explicitly provide important arguments to llvm-symbolizer,
not relying on defaults.
2) Be more defensive about symbolizer output.
This might fix weird failures on ninja-x64-msvc-RA-centos6 buildbot.
llvm-svn: 219541
In fact, symbolization is now expected to work only on Linux and
FreeBSD/NetBSD, where we have dl_iterate_phdr and can learn the
main executable name without argv0 (it will be possible on BSD systems
after http://reviews.llvm.org/D5693 lands). #ifdef-out the code for
all the remaining Unix systems.
Reviewed in http://reviews.llvm.org/D5610
llvm-svn: 219534
Currently this only functions to match simple cases
where ds_read2_* / ds_write2_* instructions can be used.
In the future it might match some of the other weird
load patterns, such as direct to LDS loads.
Currently enabled only with a subtarget feature to enable
easier testing.
llvm-svn: 219533
It also makes it more aggressive in querying range information by
adding a call to isKnownPredicateWithRanges to
isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond.
phabricator: http://reviews.llvm.org/D5638
Reviewed by: atrick, hfinkel
llvm-svn: 219532
is over a subset of condition codes.
This fixes the -Werror build which warns about use of uninitialized
variables in the default case.
llvm-svn: 219531
This invariant is violated (& the assertions fire) on some Objective C++
in the test-suite. Reverting while I investigate.
This reverts commit r219215.
llvm-svn: 219523
I was quite surprised to find this feature being used. Fortunately the uses
I found look fairly simple. In fact, they are just a very verbose version
of the regular ar commands.
Start implementing it then by parsing the script and setting the command
variables as if we had a regular command line.
This patch adds just enough support to create an empty archive and do a bit
of error checking. In followup patches I will implement at least addmod
and addlib.
From the description in the manual, even the more general case should not
be too hard to implement if needed. The features that don't map 1:1 to
the simple command line are
* Reading from multiple archives.
* Creating multiple archives.
llvm-svn: 219521
ScalarEvolution in the presence of multiple exits. Previously all
loops exits had to have identical counts for a loop trip count to be
considered computable. This pessimization was implemented by calling
getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock)
inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME
in the comments of that function). The pessimization was added to fix
a corner case involving undefined behavior (pr/16130). This patch more
precisely handles the undefined behavior case allowing the pessimization
to be removed.
ControlsExit replaces IsSubExpr to more precisely track the case where
undefined behavior is expected to occur. Because undefined behavior is
tracked more precisely we can remove MustExit from ExitLimit. MustExit
was used to track the case where the limit was computed potentially
assuming undefined behavior even if undefined behavior didn't necessarily
occur.
llvm-svn: 219517
Fixes a logic error in the MachineScheduler found by Steve Montgomery (and
confirmed by Andy). This has gone unfixed for months because the fix has been
found to introduce some small performance regressions. However, Andy has
recommended that, at this point, we fix this to avoid further dependence on the
incorrect behavior (and then follow-up separately on any regressions), and I
agree.
Fixes PR18883.
llvm-svn: 219512
Summary: Add the ability to convert 64 or 32 bit floating point values to integer in mips fast-isel
Test Plan:
fpintconv.ll
ran 4 flavors of test-suite with no errors, mips32 r1/r2 O0/O2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, rfuhler, mcrosier
Differential Revision: http://reviews.llvm.org/D5562
llvm-svn: 219511
This change depends on the ApplePropertyString helper that I sent separately.
Not sure how you want this tested: as a tool test by adding a binary to dump, or as an llvm test starting from an IR file?
Reviewers: dblaikie, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5689
llvm-svn: 219507
DW_AT_specification and DW_AT_abstract_origin resolving was only performed
on subroutine DIEs because it used the getSubroutineName method. Introduce
a more generic getName() and use it to dump the reference attributes.
Testcases have been updated to check the printed names instead of the offsets
except when the name could be ambiguous.
Reviewers: dblaikie, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5625
llvm-svn: 219506
The current VSX feature for PowerPC specifies availability of the VSX
instructions added with the 2.06 architecture version. With 2.07, the
architecture adds new instructions to both the Category:Vector and
Category:VSX instruction sets. Additionally, unaligned vector storage
operations have improved performance.
This patch adds a feature to provide access to the new instructions
and performance capabilities of Power8. For compatibility with GCC,
the feature is controlled via a new -mpower8-vector switch, and the
feature causes the __POWER8_VECTOR__ builtin define to be generated by
the preprocessor.
There is a companion patch for cfe being committed at the same time.
llvm-svn: 219501
This is dangerous for numerous reasons. The primary risk here is with
floating point or double types where if the wrong header files are
included in a strange order this can implicitly convert to integers and
then call the C abs function on the integers. There is a secondary risk
that even impacts integers where if the namespace the code is written in
ever defines an abs overload for types within that namespace the global
abs will be hidden. The correct form is to call std::abs or write 'using
std::abs' for builtin types (and only the latter is correct in any
generic context).
I've also added the requisite header to be a bit more explicit here.
llvm-svn: 219484
We, I suppose naïvely, believed the COFF specification with regard to
auxiliary symbol records which defined sections: they specified that the
symbol value should be zero. However, dumpbin and MinGW's objdump do
not consider the symbol value as a restriction. Relaxing this allows us
to properly dump MinGW linked executables.
llvm-svn: 219479
to what we actually want ilogb implementation. This makes everything
*much* easier to deal with and is actually what we want when using it
anyways.
llvm-svn: 219474
instead
We used to transform this:
  define void @test6(i1 %cond, i8* %ptr) {
  entry:
    br i1 %cond, label %bb1, label %bb2
  bb1:
    br label %bb2
  bb2:
    %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
    store i8 2, i8* %ptr.2, align 8
    ret void
  }
into this:
  define void @test6(i1 %cond, i8* %ptr) {
    %ptr.2 = select i1 %cond, i8* null, i8* %ptr
    store i8 2, i8* %ptr.2, align 8
    ret void
  }
because the simplifycfg transformation into selects would happen to run
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).
The existing transformation that removes unreachable control flow in simplifycfg
is:
/// If BB has an incoming value that will always trigger undefined behavior
/// (eg. null pointer dereference), remove the branch leading here.
static bool removeUndefIntroducingPredecessor(BasicBlock *BB)
Now we generate:
  define void @test6(i1 %cond, i8* %ptr) {
    store i8 2, i8* %ptr.2, align 8
    ret void
  }
I did not see any impact on the test-suite + externals.
rdar://18596215
llvm-svn: 219462
Long section names are represented as a slash followed by a numeric
ASCII string. This number is an offset into a string table.
Print the appropriate entry in the string table instead of the less
enlightening /4.
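The lookup described above can be sketched in a few lines of Python. This is
not the dumper's code: `resolve_section_name` is a hypothetical helper, and the
toy string table assumes the usual COFF layout of a 4-byte little-endian size
field followed by NUL-terminated strings, with offsets counted from the start
of the table.

```python
import struct

def resolve_section_name(raw_name: bytes, string_table: bytes) -> str:
    """Map an 8-byte section name field to its printable name."""
    name = raw_name.rstrip(b"\0").decode("ascii")
    if name.startswith("/") and name[1:].isdigit():
        # "/<decimal>" is an offset into the string table.
        offset = int(name[1:])
        end = string_table.index(b"\0", offset)
        return string_table[offset:end].decode("ascii")
    return name

# Toy string table holding a single long section name at offset 4.
strings = b".debug_info_long\0"
table = struct.pack("<I", 4 + len(strings)) + strings

print(resolve_section_name(b"/4\0\0\0\0\0\0", table))
print(resolve_section_name(b".text\0\0\0", table))
```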
N.B. yaml2obj already does the right thing, this test exercises both
sides of the (de-)serialization.
llvm-svn: 219458
code using it more readable.
Also add a copySign static function that works more like the standard
function by accepting the value and sign-carrying value as arguments.
No interesting logic here, but tests added to cover the basic API
additions and make sure they do something plausible.
llvm-svn: 219453
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))
This has 2 benefits: less code / faster code and one less estimate instruction
that may lose precision.
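Both formulations are algebraically the same square root, which the Python
sketch below checks numerically. It uses an exact 1.0 / math.sqrt(x) as a
stand-in for the hardware reciprocal square root estimate, so it says nothing
about the precision of the real estimate instructions; the function names are
invented for the example.

```python
import math

def sqrt_old(x: float) -> float:
    """Old form: y = 1 / (1 / sqrt(x)), which needs two reciprocal steps."""
    return 1.0 / (1.0 / math.sqrt(x))

def sqrt_new(x: float) -> float:
    """New form: y = x * (1 / sqrt(x)), one reciprocal step and a multiply."""
    return x * (1.0 / math.sqrt(x))

for x in (0.25, 2.0, 9.0, 1e10):
    print(x, sqrt_old(x), sqrt_new(x))
```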
The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf
or vector sqrtf and 4 less flops for a double-precision sqrt.
We also eliminate a constant load and extra register usage.
Differential Revision: http://reviews.llvm.org/D5682
llvm-svn: 219445
The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem.
llvm-svn: 219441
This patch removes the PBQPBuilder class and its subclasses and replaces them
with a composable constraints class: PBQPRAConstraint. This allows constraints
that are only required for optimisation (e.g. coalescing, soft pairing) to be
mixed and matched.
This patch also introduces support for target writers to supply custom
constraints for their targets by overriding a TargetSubtargetInfo method:
std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const;
This patch should have no effect on allocations.
llvm-svn: 219421
LLVM assumes INSERT_SUBREG will always have register operands, so
we need to legalize non-register operands, like FrameIndexes, to
avoid random assertion failures.
llvm-svn: 219420
The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x
incorrectly use the XForm_1 instruction format, rather than the
XX1Form instruction format. This is likely a pasto when creating
these instructions, which were based on lvx and so forth. This patch
uses the correct format.
The existing reformatting test (test/MC/PowerPC/vsx.s) missed this
because the two formats differ only in that XX1Form has an extension
to the target register field in bit 31. The tests for these
instructions used a target register of 7, so the default of 0 in bit
31 for XForm_1 didn't expose a problem. For register numbers 32-63
this would be noticeable. I've changed the test to use higher
register numbers to verify my change is effective.
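Why register 7 could not expose the bug is easiest to see by splitting a VSX
register number into the part both formats share and the extension bit. The
Python sketch below is only a model of that split (a 5-bit field plus one extra
bit); it does not reproduce the actual instruction encodings, and the helper
name is hypothetical.

```python
def split_vsx_register(reg: int):
    """Split a register number 0-63 into (5-bit field, extension bit)."""
    if not 0 <= reg <= 63:
        raise ValueError("VSX register numbers are 0-63")
    return reg & 0x1F, reg >> 5

# Register 7 has extension bit 0, so a format that always emits 0 there
# happens to agree with the correct one. Register 39 needs the bit set.
print(split_vsx_register(7), split_vsx_register(39))
```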
llvm-svn: 219416
This introduces access to the AbstractSPDies map from DwarfDebug so
DwarfCompileUnit can access it. Eventually this'll sink down to
DwarfFile, but it'll still be generically accessible - not much
encapsulation to provide it. (constructInlinedScopeDIE could stay
further up, in DwarfFile to avoid exposing this - but I don't think
that's particularly better)
llvm-svn: 219411
This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we
wrongly computed the distance between the highest bits set of two negative
values.
This fixes PR21222.
Differential Revision: http://reviews.llvm.org/D5700
llvm-svn: 219406
While getSectionContents was updated to do the right thing,
getSectionSize wasn't. Move the logic to getSectionSize and leverage it
from getSectionContents.
llvm-svn: 219391
It is not useful to return the data beyond VirtualSize if it's less than
SizeOfRawData.
An implementation detail of COFF requires the section size to be rounded
up to a multiple of FileAlignment; this means that SizeOfRawData is not
representative of how large the section is. Instead, we should cap it
to VirtualSize when this occurs as it represents the true size of the
section.
Note that this is only relevant in executable files because this
rounding doesn't occur in object files (and VirtualSize is always zero).
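The capping rule can be summarized in a short Python sketch. `section_size`
and its `is_image` parameter are hypothetical names for illustration, and the
sample sizes are made up; the real logic lives in the COFF object file reader.

```python
def section_size(virtual_size: int, size_of_raw_data: int, is_image: bool) -> int:
    """Size of the section contents worth returning to the caller."""
    # In executables SizeOfRawData is rounded up to FileAlignment, so the
    # bytes past VirtualSize are only padding.
    if is_image and virtual_size < size_of_raw_data:
        return virtual_size
    # Object files have VirtualSize == 0 and no rounding, so use the raw size.
    return size_of_raw_data

print(section_size(0x1234, 0x1400, is_image=True))
print(section_size(0, 0x200, is_image=False))
```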
llvm-svn: 219388
(& add a few accessors/make a couple of things public for this - it's a
bit of a toss-up, but I think I prefer it this way, keeping some more of
the meaty code down in DwarfCompileUnit - if only to make for smaller
implementation files, etc)
I think we could simplify range handling a bit if we removed the range
lists from each unit and just put a single range list on DwarfDebug,
similar to address pooling.
llvm-svn: 219370
No functional change.
This is the current AVX512_maskable multiclass hierarchy:

                 maskable_custom
                    /      \
                   /        \
       maskable_common    maskable_in_asm
          /      \
         /        \
    maskable     maskable_3src
llvm-svn: 219363
This adds the Pat<>'s for the intrinsics. These are necessary because we
don't lower these intrinsics to SDNodes but match them directly. See the
rationale in the previous commit.
llvm-svn: 219362
These derive from the new asm-only masking definitions.
Unfortunately I wasn't able to find a ISel pattern that we could legally
generate for the masking variants. The problem is that since the destination
is v4* we would need VK4 register classes and v4i1 value types to express the
masking. These are however not legal types/classes in AVX512f but only in VL,
so things get complicated pretty quickly. We can revisit this question later
if we have a more pressing need to express something like this.
So the ISel patterns are empty for the masking instructions and the next patch
will add Pat<>s instead to match the intrinsics calls with instructions.
llvm-svn: 219361
No functional change.
No change in X86.td.expanded except for the appearance of the new attributes.
The new attributes will be used in the subsequent patch.
llvm-svn: 219360
This change modifies the fatal signal handler used in LLVM tools.
It now attempts to find the llvm-symbolizer binary and communicates
with it in order to turn instruction addresses into
function/file/line info entries. This should significantly improve
the readability of stack traces in Debug builds.
This feature only works on selected platforms (including Darwin
and Linux). If the symbolization fails for some reason, the signal
handler will fall back to the original behavior.
Reviewed in http://reviews.llvm.org/D5610
llvm-svn: 219354
One of many steps to generalize subprogram emission to both the DWO and
non-DWO sections (to emit -gmlt-like data under fission). Once the
functions are pushed down into DwarfCompileUnit some of the data
structures will be pushed at least into DwarfFile so that they can be
unique per-file, allowing emission to both files independently.
llvm-svn: 219345
Summary:
I had forgotten to check for NotSlowIncDec in the patterns that can generate
inc/dec for the above pattern (added in D4796).
This currently applies to Atom Silvermont, KNL and SKX.
Test Plan: New checks on atomic_mi.ll
Reviewers: jfb, nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5677
llvm-svn: 219336
A function with discardable linkage cannot be discarded if it's a member
of a COMDAT group without considering all the other COMDAT members as
well. This sort of thing is already handled by GlobalOpt/GlobalDCE.
This fixes PR21206.
llvm-svn: 219335
There are two methods in SectionRef that can fail:
* getName: The index into the string table can be invalid.
* getContents: The section might point to invalid contents.
Every other method will always succeed and returning a std::error_code just
complicates the code. For example, a section can have an invalid alignment,
but if we are able to get to the section structure at all and create a
SectionRef, we will always be able to read that invalid alignment.
llvm-svn: 219314
Summary:
We currently emit a DW_AT_APPLE_property_attribute with a value that is a
bitfield describing the various attributes applied to an ObjectiveC property.
While trying to add testing to one of my dwarfdump patches that would pretty
print that, I realized this information looks totally broken and has maybe
never been correct.
As with all DWARF information, we have an enum in Dwarf.h that describes this
attribute (enum ApplePropertyAttributes). It seems however that the attribute
value is set from another definition of these flags in Sema/DeclSpec.h (enum
ObjCPropertyAttributeKind). And these 2 enums aren't in sync.
This patch updates the Dwarf.h values to the ones we are (and have been for
a very long time) emitting. We change some publicly (and even documented
in SourceLevelDebugging.rst) values, but I doubt this could be an issue as
the information has been wrong for so long...
Reviewers: echristo, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5653
llvm-svn: 219311
This must be enforced for all v6M cores, not just the cortex-m0,
regardless of the user-specified alignment.
Patch by Charlie Turner.
llvm-svn: 219300
Interchangeable commit ids can now be used on this git-svnrevert, which
will figure out what kind of commit that is (if you use format rNNNN for SVN
commits) and make sure the right ids are used in the right places.
It's a little bit more robust and user-friendly.
llvm-svn: 219290
We won't link in pthreads if we weren't built with LLVM_ENABLE_THREADS
which means we won't get access to pthread_sigmask. Use sigprocmask
instead.
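A minimal sketch of masking a signal with sigprocmask, which lives in
libc and so does not need the pthreads library. The helper names are
hypothetical and SIGINT is only an example signal. Note that POSIX leaves
sigprocmask unspecified in multi-threaded processes, so this assumes a
single-threaded caller.

```cpp
#include <signal.h>

// Blocks SIGINT for the calling process and stores the previous mask in Old.
bool blockSigint(sigset_t *Old) {
  sigset_t Set;
  sigemptyset(&Set);
  sigaddset(&Set, SIGINT);
  return sigprocmask(SIG_BLOCK, &Set, Old) == 0;
}

// Reports whether SIGINT is currently blocked.
bool isSigintBlocked() {
  sigset_t Cur;
  // A null "set" argument queries the current mask without changing it.
  if (sigprocmask(SIG_BLOCK, nullptr, &Cur) != 0)
    return false;
  return sigismember(&Cur, SIGINT) == 1;
}

// Restores a mask previously saved by blockSigint.
bool restoreMask(const sigset_t *Old) {
  return sigprocmask(SIG_SETMASK, Old, nullptr) == 0;
}
```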
llvm-svn: 219288
The icmp-select-icmp optimization targets select-icmp.eq
only. This is now ensured by testing the branch predicate
explicitly. This commit also includes the test case for pr21199.
llvm-svn: 219282
COFF normally doesn't allow us to describe the alignment of COMMON
symbols.
It turns out that most linkers use the symbol size as a hint as to how
aligned the symbol should be.
However the BFD folks have added a .drectve command, which we
now support as of r219229, that allows us to specify the alignment
precisely. With this in mind, stop rounding sizes up.
llvm-svn: 219281
thing we do inside selection dag. This code needs to be
migrated to queries on the function rather than global
data, but this organizes things before we start grabbing
the subtarget.
llvm-svn: 219271
mach-o supports "fat" files which are a header/table-of-contents followed by a
concatenation of mach-o files built for different architectures. Currently,
MemoryBuffer has no easy way to map a subrange (slice) of a file which lld
will need to select a mach-o slice of a fat file. The new function provides
an easy way to map a slice of a file into a MemoryBuffer. Test case included.
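To illustrate the idea of a slice, here is a small in-memory sketch. It is
not the MemoryBuffer API and maps nothing from disk; SliceView and getSlice
are hypothetical names, and the buffer stands in for a fat file whose table
of contents has already supplied an offset and size.

```cpp
#include <cstddef>
#include <string>

// Hypothetical non-owning view of a byte range. Data is null when the
// requested range is invalid.
struct SliceView {
  const char *Data;
  std::size_t Size;
};

// Returns a view of [Offset, Offset + Size) inside Buffer, or {nullptr, 0}
// if the range does not lie entirely within the buffer.
SliceView getSlice(const std::string &Buffer, std::size_t Offset,
                   std::size_t Size) {
  // Compare against Buffer.size() - Offset so Offset + Size cannot overflow.
  if (Offset > Buffer.size() || Size > Buffer.size() - Offset)
    return SliceView{nullptr, 0};
  return SliceView{Buffer.data() + Offset, Size};
}
```

The bounds check matters because the offsets come from the file's own
header, which may be corrupt.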
llvm-svn: 219260
Summary:
Fix pr21099
The pseudocode of what we were doing (spread through two functions) was:
  if (operand.doesNotFitIn32Bits())
    Opc.initializeWithFoo();
  if (operand < 0)
    operand = -operand;
  if (operand.doesFitIn8Bits())
    Opc.initializeWithBar();
  else if (operand.doesFitIn32Bits())
    Opc.initializeWithBlah();
  doStuff(Opc);
So for operand == INT32_MIN, Opc was never initialized because the operand changes
from fitting in 32 bits to not fitting, causing the various bugs/error messages
noted by pr21099.
This patch adds an extra test at the beginning for this case, and an
llvm_unreachable to give a better error message if the operand ends up
not fitting in 32 bits at the end.
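The INT32_MIN corner case can be reproduced in isolation. This is a sketch
with hypothetical helper names, not the patched code; the operand is held
in 64 bits so that the negation itself is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if V is representable as a signed 32-bit value.
bool fitsIn32Bits(int64_t V) {
  return V >= std::numeric_limits<int32_t>::min() &&
         V <= std::numeric_limits<int32_t>::max();
}

// Mirrors the "operand = -operand" step from the pseudocode above.
int64_t magnitude(int64_t Operand) {
  // Negating INT64_MIN would overflow; callers here only pass 32-bit values.
  assert(Operand != std::numeric_limits<int64_t>::min());
  return Operand < 0 ? -Operand : Operand;
}

// True when the operand fits in 32 bits before negation but not after it,
// which is the situation that left Opc uninitialized.
bool escapesAfterNegation(int64_t Operand) {
  return fitsIn32Bits(Operand) && !fitsIn32Bits(magnitude(Operand));
}
```

Among 32-bit inputs only INT32_MIN satisfies escapesAfterNegation, because
its magnitude, 2147483648, is one past INT32_MAX.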
Test Plan: new test + make check
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5655
llvm-svn: 219257
It would be more convenient to pass DWARFSection into DWARFUnitSection
constructor, instead of passing its components (Data and RelocAddrMap)
as separate arguments.
llvm-svn: 219252
This is somewhat the inverse of how similar bugs in DAE and ArgPromo
manifested and were addressed. In those passes, individual call sites
were visited explicitly, and then the old function was deleted. This
left the debug info with a null llvm::Function* that needed to be
updated to point to the new function.
In the case of DFSan, it RAUWs the old function with the wrapper, which
includes debug info. So now the debug info refers to the wrapper, which
doesn't actually have any instructions with debug info in it, so it is
ignored entirely - resulting in a DW_TAG_subprogram with no high/low pc,
etc. Instead, fix up the debug info to refer to the original function
after the RAUW messed it up.
Reviewed/discussed with Peter Collingbourne on the llvm-dev mailing
list.
llvm-svn: 219249