This fixes PR15289. This bug was introduced (recently) in r175215; collecting
all std::vector references for candidate pairs to delete at once is invalid
because subsequent lookups in the owning DenseMap could invalidate the
references.
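A minimal sketch of the safer pattern, assuming a stand-in container: remember the keys and look each vector up only at the moment it is modified, so no reference is held across other map operations. The names PairMap and eraseFromAll are hypothetical, and std::unordered_map is used only so the example is self-contained (it keeps element references stable, unlike the DenseMap described above).

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Stand-in for the owning map (hypothetical alias, not the BBVectorize type).
using PairMap = std::unordered_map<int, std::vector<int>>;

// Removes Victim from the vector stored under every key in Keys.
// Each vector is looked up right before it is edited; no reference to a
// mapped vector outlives the next map operation.
inline std::size_t eraseFromAll(PairMap &M, const std::vector<int> &Keys,
                                int Victim) {
  std::size_t Removed = 0;
  for (int K : Keys) {
    auto It = M.find(K);
    if (It == M.end())
      continue; // key not present, nothing to delete
    std::vector<int> &V = It->second; // used immediately, never stored
    const std::size_t Before = V.size();
    V.erase(std::remove(V.begin(), V.end(), Victim), V.end());
    Removed += Before - V.size();
  }
  return Removed;
}
```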
bugpoint was able to reduce a useful test case. Unfortunately, because whether
or not this asserts depends on memory layout, this test case will sometimes
appear to produce valid output. Nevertheless, running under valgrind will
reveal the error.
llvm-svn: 175397
GCC warns about the attribute being ignored if it occurs after void*.
There seems to be some kind of incompatibility between clang and gcc here, but
I can't fathom who's right.
void* LLVM_LIBRARY_VISIBILITY foo(); // clang: hidden, gcc: default
LLVM_LIBRARY_VISIBILITY void *bar(); // clang: hidden, gcc: hidden
void LLVM_LIBRARY_VISIBILITY qux(); // clang: hidden, gcc: hidden
llvm-svn: 175394
arguably better than forward iterators for this use case, they are confusing and
there are some implementation problems with reverse iterators and MI bundles.
llvm-svn: 175393
MachineBasicBlock::SplitCriticalEdge. Since this is an iterator rather than
an instr_iterator, the isBundled() check only passes if getFirstTerminator()
returned end() and the garbage memory happens to lean that way.
Multiple successors can be present without any terminator instructions in the
case of exception handling with a fallthrough.
llvm-svn: 175383
terminators that actually have register uses when splitting critical edges.
This commit also introduces a method repairIntervalsInRange() on LiveIntervals,
which allows for repairing LiveIntervals in a small range after an arbitrary
target hook modifies, inserts, and removes instructions. It's pretty limited
right now, but I hope to extend it to support all of the things that are done
by the convertToThreeAddress() target hooks.
llvm-svn: 175382
(or (bool?A:B),(bool?C:D)) --> (bool?(or A,C):(or B,D))
By the time the OR is visited, both the SELECTs have been visited and not
optimized and the OR itself hasn't been transformed so we do this transform in
the hopes that the new ORs will be optimized.
The transform is explicitly disabled for vector-selects until "codegen matures
to handle them better".
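As a sanity check of the scalar identity above, the following sketch compares both forms exhaustively over 4-bit operands. It assumes both selects share the same condition; the function names are made up for illustration and are not DAGCombiner code.

```cpp
#include <cstdint>

// Left-hand side: or (Cond ? A : B), (Cond ? C : D)
constexpr std::uint32_t orOfSelects(bool Cond, std::uint32_t A,
                                    std::uint32_t B, std::uint32_t C,
                                    std::uint32_t D) {
  return (Cond ? A : B) | (Cond ? C : D);
}

// Right-hand side: Cond ? (or A, C) : (or B, D)
constexpr std::uint32_t selectOfOrs(bool Cond, std::uint32_t A,
                                    std::uint32_t B, std::uint32_t C,
                                    std::uint32_t D) {
  return Cond ? (A | C) : (B | D);
}

// Checks that both forms agree for every 4-bit A, B, C, D and both conditions.
inline bool transformHolds() {
  for (int Cond = 0; Cond < 2; ++Cond)
    for (std::uint32_t A = 0; A < 16; ++A)
      for (std::uint32_t B = 0; B < 16; ++B)
        for (std::uint32_t C = 0; C < 16; ++C)
          for (std::uint32_t D = 0; D < 16; ++D)
            if (orOfSelects(Cond != 0, A, B, C, D) !=
                selectOfOrs(Cond != 0, A, B, C, D))
              return false;
  return true;
}
```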
Patch by Muhammad Tauqir!
llvm-svn: 175380
Avoids malloc and is a lot denser. We lose iteration over target independent
attributes, but that's a strange interface anyways and didn't have any users
outside of AttrBuilder.
llvm-svn: 175370
as well as 16/32 bit variants to do and so I want this to look nice
when I do it. I've been experimenting with this. No new test cases
are needed.
llvm-svn: 175369
GNU as rejects them and there are configure scripts in the wild that check if
the assembler rejects ".align 3" to determine whether the alignment is in bytes
or powers of two.
llvm-svn: 175360
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175356
It's completely unnecessary and can be replaced with proper
SReg_64 handling instead.
This actually fixes a piglit test on SI.
v2: use correct register class in addRegisterClass,
set special classes as not allocatable
v3: revert setting special classes as not allocatable
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175355
Seems to be a lot simpler, and also paves the
way for further improvements.
v2: rebased on master, use 0 in BUFFER_LOAD_FORMAT_XYZW,
use VGPR0 in dummy EXP, avoid compiler warning, break
after encoding the first literal.
v3: correctly use V_ADD_F32_e64
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175354
Mark all the operands that can also have an immediate.
v2: SOFFSET is also an SSrc_32 operand
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175353
Previously it only worked by coincidence.
v2: fix 64bit versions, use 0x80 (inline 0) instead of SGPR0
for the unused SRC2
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175352
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175351
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175350
Stop adding more instructions than necessary.
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175349
Generate more than one loop if it seems to make sense.
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175348
Using the new NearestCommonDominator class.
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175347
Using the new NearestCommonDominator class.
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175346
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175345
If the frame pointer is omitted, and any stack changes occur in the inline
assembly, e.g.: "pusha", then any C local variable or C argument references
will be incorrect.
I pass no judgement on anyone who would do such a thing. ;)
rdar://13218191
llvm-svn: 175334
Input/Output rewrite to the same location. Make sure the SizeDirective rewrite
is performed first. This also ensures the sort algorithm is stable.
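One way to picture the ordering, as a sketch under assumptions: rewrites are sorted by location with a stable sort, and at equal locations a size-directive rewrite is ranked ahead of the others. The types Rewrite and RewriteKind and the helper orderRewrites are hypothetical and do not mirror the real asm-rewrite structures.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical rewrite kinds; only the relative priority matters here.
enum class RewriteKind { SizeDirective, Input, Output };

struct Rewrite {
  RewriteKind Kind;
  unsigned Loc; // source offset the rewrite applies to
  int Id;       // original position, kept only to observe the ordering
};

// Size directives get the lowest rank so they sort first at a given location.
inline int priority(RewriteKind K) {
  return K == RewriteKind::SizeDirective ? 0 : 1;
}

// Sorts by location, then by priority. std::stable_sort keeps the original
// relative order of rewrites that compare equal.
inline void orderRewrites(std::vector<Rewrite> &Rs) {
  std::stable_sort(Rs.begin(), Rs.end(),
                   [](const Rewrite &L, const Rewrite &R) {
                     if (L.Loc != R.Loc)
                       return L.Loc < R.Loc;
                     return priority(L.Kind) < priority(R.Kind);
                   });
}
```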
llvm-svn: 175317
With bundle alignment, instructions all get their own MCFragments
(unless they are in a bundle-locked group). For instructions with
fixups, this is an MCDataFragment. Emitting actual data (e.g. for
.long) attempts to re-use MCDataFragments, which we don't want in
this case since it leads to fragments which exceed the bundle size.
So, don't reuse them in this case.
Also adds a test and fixes some formatting.
llvm-svn: 175316
If two functions require different features (e.g., `-mno-sse' vs. `-msse') then
we want to honor that, especially during LTO. We can do that by resetting the
subtarget's features depending upon the 'target-feature' attribute.
llvm-svn: 175314
functions. Set AddedComplexity to determine the order in which patterns are
matched.
This simplifies selection of floating point loads/stores.
No functionality change intended.
llvm-svn: 175300
of the old jit and which we don't intend to support in mips16 or micromips.
This dependency is for the testing of whether an instruction is a pseudo.
llvm-svn: 175297
- add sincos to runtime library if target triple environment is GNU
- added canCombineSinCosLibcall() which checks that sincos is in the RTL and
if the environment is GNU then unsafe fpmath is enabled (required to
preserve errno)
- extended sincos-opt lit test
Reviewed by: Hal Finkel
llvm-svn: 175283
Several functions and variable names used the term 'tree' to refer
to what is actually a DAG. Correcting this mistake will, hopefully,
prevent confusion in the future.
No functionality change intended.
llvm-svn: 175278
This lets us work with a smaller constant, which is friendlier to targets that can compare against immediates.
It also avoids inserting a shift in favor of a trunc, which can be free on some targets.
This used to work until LLVM-3.1, but regressed with the 3.2 release.
llvm-svn: 175270
This is essentially a stripped-down version of the ConstantIslands pass (which
always had these two functions), providing just the features necessary for
correctness.
In particular there needs to be a way to resolve the situation where a
conditional branch's destination block ends up out of range.
This issue crops up when self-hosting for AArch64.
llvm-svn: 175269
blocks. We still don't have consensus if we should try to change clang or
the standard, but llvm should work with compilers that implement the current
standard and mangle those functions.
llvm-svn: 175267
This implements the review suggestion to simplify the AArch64 backend. If we
later discover that we *really* need the extra complexity of the
ConstantIslands pass for performance reasons it can be resurrected.
llvm-svn: 175258
In the near future litpools will be in a different section, which means that
any access to them is at least two instructions. This makes the case for a
movz/movk pair (if total offset <= 32-bits) even more compelling.
llvm-svn: 175257
For some basic blocks, it is possible to generate many candidate pairs for
relatively few pairable instructions. When many (tens of thousands) of these pairs
are generated for a single instruction group, the time taken to generate and
rank the different vectorization plans can become quite large. As a result, we now
cap the number of candidate pairs within each instruction group. This is done by
closing out the group once the threshold is reached (set now at 3000 pairs).
Although this will limit the overall compile-time impact, this may not be the best
way to achieve this result. It might be better, for example, to prune excessive
candidate pairs after the fact to prevent the generation of short, but highly-connected
groups. We can experiment with this in the future.
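The capping idea can be sketched roughly as follows. This is a simplified model, not the BBVectorize code: it assumes each instruction contributes a known number of candidate pairs and closes the current group as soon as the running total reaches the limit. The helper groupSizes and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Given the number of candidate pairs contributed by each instruction, returns
// how many instructions end up in each group when a group is closed as soon as
// its running pair count reaches MaxPairs.
inline std::vector<std::size_t>
groupSizes(const std::vector<std::size_t> &PairsPerInst, std::size_t MaxPairs) {
  std::vector<std::size_t> Sizes;
  std::size_t InGroup = 0; // instructions in the current group
  std::size_t Pairs = 0;   // candidate pairs in the current group
  for (std::size_t N : PairsPerInst) {
    ++InGroup;
    Pairs += N;
    if (Pairs >= MaxPairs) { // threshold reached: close out the group
      Sizes.push_back(InGroup);
      InGroup = 0;
      Pairs = 0;
    }
  }
  if (InGroup != 0)
    Sizes.push_back(InGroup); // trailing, partially filled group
  return Sizes;
}
```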
This change reduces the overall compile-time slowdown of the csa.ll test case in
PR15222 to ~5x. If 5x is still considered too large, a lower limit can be
used as the default.
This represents a functionality change, but only for very large inputs
(thus, there is no regression test).
llvm-svn: 175251
not matter but makes it more gcc compatible which avoids possible subtle
problems. Also, turned back on a disabled check in helloworld.ll.
llvm-svn: 175237
assembler should also accept a two arg form, as the documentation specifies that
the first (destination) register is optional.
This patch uses TwoOperandAliasConstraint to add the two argument form.
It also fixes an 80-column formatting problem in:
test/MC/ARM/neon-bitwise-encoding
<rdar://problem/12909419> Clang rejects ARM NEON assembly instructions
llvm-svn: 175221
1. Define and use function terminateSearch.
2. Use MachineBasicBlock::iterator instead of MachineBasicBlock::instr_iterator.
3. Delete the line which checks whether an instruction is a pseudo.
llvm-svn: 175219
All instances of std::multimap have now been replaced by
DenseMap<K, std::vector<V> >, and this yields a speedup of 5% on the
csa.ll test case from PR15222.
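For illustration only, the shape of that data-structure change looks like the sketch below. std::unordered_map stands in for DenseMap so the example compiles without LLVM headers, and the aliases and the toVectorMap helper are hypothetical.

```cpp
#include <map>
#include <unordered_map>
#include <vector>

using MultiMap = std::multimap<int, int>;
// Stand-in for DenseMap<K, std::vector<V> >.
using VecMap = std::unordered_map<int, std::vector<int>>;

// Rebuilds a one-key-to-many-values multimap as a map from key to a vector of
// values. All values for a key then sit contiguously behind a single lookup.
inline VecMap toVectorMap(const MultiMap &MM) {
  VecMap Out;
  for (const auto &KV : MM)
    Out[KV.first].push_back(KV.second);
  return Out;
}
```

With this layout, visiting every value for a key is one hash lookup followed by a linear walk over a vector, instead of an equal_range search over tree nodes.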
No functionality change intended.
llvm-svn: 175216
This is another commit on the road to removing std::multimap from
BBVectorize. This gives an ~1% speedup on the csa.ll test case
in PR15222.
No functionality change intended.
llvm-svn: 175215
This patch doesn't introduce any functionality changes.
It adds some new fields to the Hexagon instruction classes and
changes their layout to support instruction encoding.
llvm-svn: 175205
The important fix is that the constant interpolation value is stored in the
parameter slot P0, which is encoded as 2.
In addition, drop the SI_INTERP_CONST pseudo instruction, pass the parameter
slot as an operand to V_INTERP_MOV_F32 instead of hardcoding it there, and
add a special operand class for the parameter slots for type checking and
pretty printing.
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175193
It fixes around 100 tfb piglit tests and 16 glean tests.
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
llvm-svn: 175183
This allows MachineInstScheduler to reorder them, and thus make scheduling more
efficient.
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
llvm-svn: 175182
This fixes a couple of regressions on (probably not just) cayman
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Tom Stellard <thomas.stellard at amd.com>
llvm-svn: 175180
If vector types have legal register classes, then LLVM bypasses LegalizeTypes
on them, which causes faults currently since the code to handle them isn't in
place.
This fixes test failures when AArch64 is the default target.
llvm-svn: 175172
The parser will now accept instructions with alignment specifiers written like
vld1.8 {d16}, [r0:64]
, while also still accepting the incorrect syntax
vld1.8 {d16}, [r0, :64]
llvm-svn: 175164
up so that we can apply the direct object emitter patch. This patch
should be a nop right now and its test is to not break what is already
there.
llvm-svn: 175126
of the copy is a subregister def. The current code assumes that it can do a full
def of the destination register, but it is not checking that the def operand is
read-undef. It also doesn't clear the subregister index of the destination in
the new instruction to reflect the full subregister def.
These issues were found running 'make check' with my next commit that enables
rematerialization in more cases.
llvm-svn: 175122
Since functions with internal linkage don't have language linkage, it is valid
to overload them:
extern "C" {
static int foo();
static int foo(int);
}
So we mangle them.
llvm-svn: 175120
It's possible (e.g. after an LTO build) that an internal global may be used for
debugging purposes. If that's the case appending a '.b' to it makes it hard to
find that variable. Steal the name from the old GV before deleting it so that
they can find that variable again.
llvm-svn: 175104
if the offset fits in 11 bits. This makes use of the fact that the abi
requires sp to be 8 byte aligned so the actual offset can fit in 8
bits. It will be shifted left and sign extended before being actually used.
The assembler or direct object emitter will shift right the 11 bit
signed field by 3 bits. We don't need to deal with that here.
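The arithmetic can be modelled as below. This is a sketch of the encoding described above, assuming a signed 11-bit byte offset that is a multiple of 8; the function names are hypothetical and no particular instruction is implied.

```cpp
#include <cassert>
#include <cstdint>

// True when Offset is 8-byte aligned and representable in 11 signed bits.
constexpr bool fitsSpOffset(std::int32_t Offset) {
  return Offset >= -1024 && Offset <= 1023 && Offset % 8 == 0;
}

// Drops the three low zero bits, leaving the 8-bit field that gets encoded.
inline std::uint8_t encodeSpOffset(std::int32_t Offset) {
  assert(fitsSpOffset(Offset) && "offset must be aligned and in range");
  return static_cast<std::uint8_t>(Offset / 8);
}

// Sign-extends the 8-bit field and shifts it left by 3 (multiplies by 8).
constexpr std::int32_t decodeSpOffset(std::uint8_t Field) {
  return (Field >= 128 ? static_cast<std::int32_t>(Field) - 256
                       : static_cast<std::int32_t>(Field)) *
         8;
}
```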
llvm-svn: 175073
This happens when there is both stack realignment and a dynamic alloca in the
function. If we overwrite %esi (rep;movsl uses fixed registers) we'll lose the
base pointer and the next register spill will write into oblivion.
Fixes PR15249 and unbreaks Firefox on i386/FreeBSD. Mozilla uses dynamic allocas
and FreeBSD uses a 4 byte stack alignment.
llvm-svn: 175057
RegisterCoalescer used to depend on LiveDebugVariable. LDV removes DBG_VALUEs
without emitting them at the end.
We fix this by removing LDV from RegisterCoalescer. Also add an assertion to
make sure we call emitDebugValues if DBG_VALUEs are removed at
runOnMachineFunction.
rdar://problem/13183203
Reviewed by Andy & Jakob
llvm-svn: 175023
This is complicated by backward labels (e.g., 0b can be both a backward label
and a binary zero). The current implementation assumes [0-9]b is always a
label and thus it's possible for 0b and 1b to not be interpreted correctly for
ms-style inline assembly. However, this is relatively simple to fix in the
inline assembly (i.e., drop the [bB]).
This patch also limits backward labels to [0-9]b, so that only 0b and 1b are
ambiguous.
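The ambiguity can be illustrated with a small classifier. This is a sketch of the rule as described above, not the lexer code: a token is treated as a backward label reference when it matches [0-9]b, and as a binary literal when it is a run of 0/1 digits followed by b or B. All function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Matches the restricted backward-label form: one decimal digit followed by 'b'.
inline bool isBackwardLabelRef(const std::string &Tok) {
  return Tok.size() == 2 && Tok[0] >= '0' && Tok[0] <= '9' && Tok[1] == 'b';
}

// Matches a binary literal: one or more binary digits followed by 'b' or 'B'.
inline bool isBinaryLiteral(const std::string &Tok) {
  if (Tok.size() < 2 || (Tok.back() != 'b' && Tok.back() != 'B'))
    return false;
  for (std::size_t I = 0; I + 1 < Tok.size(); ++I)
    if (Tok[I] != '0' && Tok[I] != '1')
      return false;
  return true;
}

// A token is ambiguous when both readings apply.
inline bool isAmbiguous(const std::string &Tok) {
  return isBackwardLabelRef(Tok) && isBinaryLiteral(Tok);
}
```

Under these rules only 0b and 1b satisfy both predicates, which matches the statement above.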
Part of rdar://12470373
llvm-svn: 174983
DAGCombiner::ReduceLoadWidth was converting (trunc i32 (shl i64 v, 32))
into (shl i32 v, 32) into undef. To prevent this, check the shift count
against the final result size.
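A small model of why the check is needed, using plain integers in place of DAG nodes. The narrowing helper here falls back to the wide computation when the shift count is too large; that fallback is an assumption of this sketch, and all names are hypothetical.

```cpp
#include <cstdint>

// Reference semantics of (trunc i32 (shl i64 V, Amt)); requires Amt < 64.
constexpr std::uint32_t truncOfWideShl(std::uint64_t V, unsigned Amt) {
  return static_cast<std::uint32_t>(V << Amt);
}

// The shift may be performed in the narrow type only if the shift count is
// smaller than the narrow width; otherwise the narrow shift is undefined.
constexpr bool canNarrowShl(unsigned Amt, unsigned NarrowBits) {
  return Amt < NarrowBits;
}

// Narrows the shift to 32 bits when that is legal, else computes it wide.
constexpr std::uint32_t narrowShl(std::uint64_t V, unsigned Amt) {
  return canNarrowShl(Amt, 32) ? static_cast<std::uint32_t>(V) << Amt
                               : truncOfWideShl(V, Amt);
}
```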
Patch by: Kevin Schoedel
Reviewed by: Nadav Rotem
llvm-svn: 174972
Vectors were being manually scalarized by the backend. Instead,
let the target-independent code do all of the work. The manual
scalarization was from a time before good target-independent support
for scalarization in LLVM. However, this forces us to specially-handle
vector loads and stores, which we can turn into PTX instructions that
produce/consume multiple operands.
llvm-svn: 174968
'R600/SI: Use proper instructions for array/shadow samplers.' removed two
cases from TEX_SHADOW. Vincent Lejeune reported on IRC that this broke some
shadow array piglit tests with the r600g driver. Reinstating the removed
cases should fix this, and still works with radeonsi as well.
I will follow up with some lit tests which would have caught the regression.
NOTE: This is a candidate for the Mesa stable branch.
Tested-by: Vincent Lejeune <vljn@ovi.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 174963
The bitcode writer emits a reference to the attribute group that the object at
the given index refers to. The bitcode reader is modified to read this in and
map it back to the attribute group.
llvm-svn: 174952
live ranges should always be extended, and the only successor that should be
considered for extension of other ranges is the target of the split edge.
llvm-svn: 174935
Sorry for the lack of a test case. I tried writing one for i386 as I know selects are illegal on this target, but they are actually considered legal by isel and expanded later.
I can't see any targets to trigger this, but checking for the legality of a node before forming it is general goodness.
llvm-svn: 174934
Check for reverse shuffles in the CostModel analysis pass and query
TargetTransform info accordingly. This allows us to write test cases for
reverse shuffles.
radar://13171406
llvm-svn: 174932
Lower reverse shuffles to a vrev64 and a vext instruction instead of the default
legalization of storing and loading to the stack. This is important because we
generate reverse shuffles in the loop vectorizer when we reverse store to an
array.
uint8_t Arr[N];
for (i = 0; i < N; ++i)
Arr[N - i - 1] = ...
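To see why the two instructions suffice, here is a plain C++ model for a 16 x i8 vector. It only imitates the lane semantics (reverse within each 64-bit half, then extract starting at byte 8 of the concatenation of the vector with itself); it does not use NEON intrinsics, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;

// Models vrev64 on bytes: reverses the byte order inside each 64-bit half.
inline Vec16 rev64(Vec16 V) {
  std::reverse(V.begin(), V.begin() + 8);
  std::reverse(V.begin() + 8, V.end());
  return V;
}

// Models vext on bytes: takes 16 bytes starting at index Imm from the
// concatenation of Lo followed by Hi. Requires Imm <= 16.
inline Vec16 ext(const Vec16 &Lo, const Vec16 &Hi, unsigned Imm) {
  Vec16 Out{};
  for (unsigned I = 0; I < 16; ++I)
    Out[I] = (I + Imm < 16) ? Lo[I + Imm] : Hi[I + Imm - 16];
  return Out;
}

// Full reverse: reverse each half, then swap the two halves.
inline Vec16 reverseShuffle(const Vec16 &V) {
  const Vec16 R = rev64(V);
  return ext(R, R, 8);
}
```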
radar://13171760
llvm-svn: 174929
When building the pairable-instruction dependency map, don't search
past the last pairable instruction. For large blocks that have been
divided into multiple instruction groups, searching past the last
instruction in each group is very wasteful. This gives a 32% speedup
on the csa.ll test case from PR15222 (when using 50 instructions
in each group).
No functionality change intended.
llvm-svn: 174915
This map is queried only for instructions in pairs of pairable
instructions; so make sure that only pairs of pairable
instructions are added to the map. This gives a 3.5% speedup
on the csa.ll test case from PR15222.
No functionality change intended.
llvm-svn: 174914
MipsCodeEmitter.cpp.
JALR and NOP are expanded by function emitPseudoExpansionLowering, which is not
called when the old JIT is used.
This fixes the following tests which have been failing on
llvm-mips-linux builder:
LLVM :: ExecutionEngine__2003-01-04-LoopTest.ll
LLVM :: ExecutionEngine__2003-05-06-LivenessClobber.ll
LLVM :: ExecutionEngine__2003-06-04-bzip2-bug.ll
LLVM :: ExecutionEngine__2005-12-02-TailCallBug.ll
LLVM :: ExecutionEngine__2003-10-18-PHINode-ConstantExpr-CondCode-Failure.ll
LLVM :: ExecutionEngine__hello2.ll
LLVM :: ExecutionEngine__stubs.ll
LLVM :: ExecutionEngine__test-branch.ll
LLVM :: ExecutionEngine__test-call.ll
LLVM :: ExecutionEngine__test-common-symbols.ll
LLVM :: ExecutionEngine__test-loadstore.ll
LLVM :: ExecutionEngine__test-loop.ll
llvm-svn: 174912
This eliminates one more linear search over a range of
std::multimap entries. This gives a 22% speedup on the
csa.ll test case from PR15222.
No functionality change intended.
llvm-svn: 174893
The modifiers don't seem to have any effect with V_MOV_B32, supposedly it's
meant to just move bits untouched.
Fixes 46 piglit tests with radeonsi, though unfortunately 11 of those had
just regressed because they started using the clamp modifier.
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 174890
This flag makes asan use a small (<2G) offset for 64-bit asan shadow mapping.
On x86_64 this saves us a register, thus achieving ~2/3 of the
zero-base-offset's benefits in both performance and code size.
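For context, the shadow address is computed as (address >> 3) + offset, so an offset that fits in a 32-bit immediate can be folded into the instruction rather than held in a register. The sketch below shows that arithmetic; the constant 0x7fff8000 is used as an assumed example of a small offset, and the function names are hypothetical.

```cpp
#include <cstdint>

// Eight application bytes map to one shadow byte.
constexpr unsigned kShadowScale = 3;
// Assumed example of a small (< 2G) shadow offset.
constexpr std::uint64_t kSmallOffset = 0x7fff8000ULL;

// Maps an application address to its shadow address.
constexpr std::uint64_t memToShadow(std::uint64_t Addr, std::uint64_t Offset) {
  return (Addr >> kShadowScale) + Offset;
}

// True when the offset can be encoded as a non-negative signed 32-bit
// immediate.
constexpr bool fitsInSImm32(std::uint64_t V) { return V <= 0x7fffffffULL; }
```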
Thanks Jakub Jelinek for the idea.
llvm-svn: 174886
This does two things:
It removes a call to abs() which may have "long long" parameter on Windows,
which is not necessarily available in C++03.
It also corrects the signedness of Amount, which was relying on
implementation-defined conversions previously.
Code was already tested (albeit in an implementation-defined way) so no extra
tests.
llvm-svn: 174885
Previous code had a confusing comment which was mostly an implementation
detail. This condition corresponds to "lsb up to register width" and "width not
ridiculous".
llvm-svn: 174877
This gives a DiagnosticType to all AsmOperands in sight. This replaces all
"invalid operand" diagnostics with something more specific. The messages given
should still be sufficiently vague that they're not usually actively misleading
when LLVM guesses your instruction incorrectly.
llvm-svn: 174871
This is currently a bit hairier than it needs to be, since depending on where the
split block resides the end ListEntry of the split block may be the end ListEntry
of the original block or a new entry. Some changes to the SlotIndexes updating
should make it possible to eliminate the two cases here.
This also isn't as optimized as it could be. In the future LiveInterval should
probably get a flag that indicates whether the LiveInterval is within a single
basic block. We could ignore all such intervals when splitting an edge.
llvm-svn: 174870