Optimize the codegen of select and branch instructions to directly use the
EFLAGS from the {s|u}{add|sub|mul}.with.overflow intrinsics.
llvm-svn: 211645
In assembly the expression a=b is parsed as an assignment, so it should be
printed as one.
This remove a truly horrible hack for producing a label with "a=.". It would
be used by codegen but would never be reached by the asm parser. Sorry I
missed this when it was first committed.
llvm-svn: 211639
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.
It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, which I'm not sure how it fits in here.
llvm-svn: 211637
Most of this is just tests that were silently succeeding in spite of
schema changes I made over a year ago. Cleaning them up as they lead to
failures in a change I'm working on/will come soon.
test/DebugInfo/2010-01-19-DbgScope.ll was removed as it tested miscoping
where a DebugLoc described a location not in the current function. The
test case doesn't describe why this is a valid situation and should be
supported, so I'm removing it and shortly going to commit changes that
make this firmly unsupported/assert-fail.
llvm-svn: 211628
PR20071 identifies a problem in PowerPC's fast-isel implementation for
floating-point conversion to integer. The fctiduz instruction was added in
Power ISA 2.06 (i.e., Power7 and later). However, this instruction is being
generated regardless of which 64-bit PowerPC target is selected.
The intent is for fast-isel to punt to DAG selection when this instruction is
not available. This patch implements that change. For testing purposes, the
existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests
for the expected code generation. Additionally, the existing test
fast-isel-conversion-p5.ll was found to be incorrectly expecting the
unavailable instruction to be generated. I've removed these test variants
since we have adequate coverage in fast-isel-conversion.ll.
llvm-svn: 211627
Summary:
This new debug emission kind supports emitting line location
information in all instructions, but stops code generation
from emitting debug info to the final output.
This mode is useful when the backend wants to track source
locations during code generation, but it does not want to
produce debug info. This is currently used by optimization
remarks (-pass-remarks, -pass-remarks-missed and
-pass-remarks-analysis).
To prevent debug info emission, DIBuilder never inserts the
annotation 'llvm.dbg.cu' when LocTrackingOnly is enabled.
Reviewers: echristo, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4234
llvm-svn: 211609
Summary:
This instruction is re-encoded in MIPS32r6/MIPS64r6 without changing the
restrictions. We hadn't implemented it for earlier ISA's so it has been added to those too.
Differential Revision: http://reviews.llvm.org/D4265
llvm-svn: 211590
Referencing a dllimport variable requires actually instructions, not
just a relocation. This fixes PR19955.
Differential Revision: http://reviews.llvm.org/D4249
llvm-svn: 211571
V' bit in the P2 byte of the EVEX prefix provides the top bit of the NDD and
NDS register fields. This was simply not used in the decoder until now.
Fixes <rdar://problem/17402661>
llvm-svn: 211565
The extends the select lowering coverage by emiting pseudo cmov
instructions. These insturction will be later on lowered to control-flow to
simulate the select.
llvm-svn: 211545
This extends the select lowering to support floating-point selects. The
lowering depends on SSE instructions and that the conditon comes from a
floating-point compare. Under this conditions it is possible to emit an
optimized instruction sequence that doesn't require any branches to
simulate the select.
llvm-svn: 211544
to match llvm-size and other UNIX systems for their nm(1).
Tweak test cases that used llvm-nm with standard input to add a "-" to
indicate that and add a test case to check the default of a.out for llvm-nm.
llvm-svn: 211529
According Nick Kledzik (http://llvm.org/bugs/show_bug.cgi?id=19430#c2):
"... mach-o no longer needs names in the __eh_frame section (and has not for
years)."
Iain Sandoe confirms it is also unnecessary for their old darwin support.
llvm-svn: 211500
The PPCFrameLowering::determineFrameLayout routine currently ensures
that every function that allocates a stack frame provides space for the
parameter save area (via PPCFrameLowering::getMinCallFrameSize).
This is actually not necessary. There may be functions that never call
another routine but still allocate a frame; those do not require the
parameter save area. In the future, with the ELFv2 ABI, even some
routines that do call other functions do not need to allocate the
parameter save area.
While it is not a bug to allocate the parameter area when it is not
needed, it is better to avoid it to save stack space.
Note that when any particular function call requires the parameter save
area, this space will already have been included by ABI code in the size
the CALLSEQ_START insn is annotated with, and therefore included in the
size returned by MFI->getMaxCallFrameSize().
This means that determineFrameLayout simply does not need to care about
the parameter save area. (It still needs to ensure that every frame
provides the linkage area.) This is implemented by this patch.
Note that this exposed a bug in the new fast-isel code where the parameter
area was *not* included in the CALLSEQ_START size; this is also fixed.
A couple of test cases needed to be adapted for the new (smaller) stack
frame size those tests now see.
llvm-svn: 211495
Current 64-bit SVR4 code seems to have some remnants of Darwin code
in AltiVec argument handing. This had the effect that AltiVec arguments
(or subsequent arguments) were not correctly placed in the parameter area
in some cases.
The correct behaviour with the 64-bit SVR4 ABI is:
- All AltiVec arguments take up space in the parameter area, just like
any other arguments, whether vararg or not.
- They are always 16-byte aligned, skipping a parameter area doubleword
(and the associated GPR, if any), if necessary.
This patch implements the correct behaviour and adds a test case.
(Verified against GCC behaviour via the ABI compat test suite.)
llvm-svn: 211492
Strictly, it's unpredictable. But we don't quite model that yet and an error is
better than ignoring the issue. This one somehow got left out before though.
rdar://problem/15997748
llvm-svn: 211490
Correct the section flags for code built for Windows on ARM with
`-ffunction-sections`. Windows on ARM uses solely Thumb-2 instructions, and
indicates that the function is thumb by placing it in a text section that has
IMAGE_SCN_MEM_16BIT flag set.
When we encounter a .section directive, a new section is constructed. This may
be a text segment. In order to identify that we need the additional flag,
expose the target triple through the ObjectFileInfo as this information is lost
otherwise.
Since any modern ARM targeting environment on Windows would be Thumb-2 (Windows
ARM NT or Windows Embedded Compact), introducing a new flag to indicate the
section attribute seems to be a bit overkill. Simply depend on the target
triple. Since there is one location that this information is currently needed,
creating a target specific assembly parser and delegating the parsing of section
switches also feels a bit heavy handed. If it turns out that this information
ends up changing additional behaviour, then it may be worth considering that
alternative.
llvm-svn: 211481
User may initialize a var with non-zero value and specify .bss section.
E.g. : int a __attribute__((section(".bss"))) = 2;
This patch converts an assertion to error report for better user
experience.
Differential Revision: http://reviews.llvm.org/D4199
llvm-svn: 211455
We handle this by spilling the whole thing to the stack and doing the
insertion as a store.
PR19492. This happens in real code because the vectorizer creates v2i128 when AVX is enabled.
llvm-svn: 211435
This patch adds ISel patterns to select SSE3/AVX ADDSUB instructions
from a sequence of "vadd + vsub + blend".
Example:
///
typedef float float4 __attribute__((ext_vector_type(4)));
float4 foo(float4 A, float4 B) {
float4 X = A - B;
float4 Y = A + B;
return (float4){X[0], Y[1], X[2], Y[3]};
}
///
Before this patch, (with flag -mcpu=corei7) llc produced the following
assembly sequence:
movaps %xmm0, %xmm2
addps %xmm1, %xmm2
subps %xmm1, %xmm0
blendps $10, %xmm2, %xmm0
With this patch, we now get a single
addsubps %xmm1, %xmm0
llvm-svn: 211427
the tool is given multiple files. Also fix the same issue with Mach-O
universal files. And fix the newline spacing to separate the output
in these cases.
llvm-svn: 211405
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI. It handles all corner cases (I hope), including stack
realignment.
Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.
Patch by Vadim Chugunov!
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D4081
llvm-svn: 211399
Summary:
Different range metadata can lead to different optimizations in later
passes, possibly breaking the semantics of the merged function. So range
metadata must be taken into consideration when comparing Load
instructions.
Thanks!
llvm-svn: 211391
This adds support for several missing PPC64 relocations in the
straight-forward manner to RuntimeDyldELF.cpp.
Note that this actually fixes a failure of a large-model test case on
PowerPC, allowing the XFAIL to be removed.
llvm-svn: 211382
When small arguments (structures < 8 bytes or "float") are passed in a
stack slot in the ppc64 SVR4 ABI, they must reside in the least
significant part of that slot. On BE, this means that an offset needs
to be added to the stack address of the parameter, but on LE, the least
significant part of the slot has the same address as the slot itself.
This changes the PowerPC back-end ABI code to only add the small
argument stack slot offset for BE. It also adds test cases to verify
the correct behavior on both BE and LE.
llvm-svn: 211368
Targets can assume that a target streamer is present, so they have to be able
to construct a null streamer in order to set the target streamer in it to.
Fixes a crash when using the null streamer with arm.
llvm-svn: 211358
This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and
vectorizes them as vector shuffles if they are profitable.
These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86.
Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015
llvm-svn: 211339
We would previously put dllimport variables in switch lookup tables, which
doesn't work because the address cannot be used in a constant initializer.
This is basically the same problem that we have in PR19955.
Putting TLS variables in switch tables also desn't work, because the
address of such a variable is not constant.
Differential Revision: http://reviews.llvm.org/D4220
llvm-svn: 211331
fat files) to print “ (for architecture XYZ)” for fat files with more than
one architecture to be like what the darwin tools do for fat files.
Also clean up the Mach-O printing of archive membernames in llvm-nm to use
the darwin form of "libx.a(foo.o)".
llvm-svn: 211316
for assembly files we can't depend on the offset within the section
after a string since it could be different between producers etc.
Relax these tests accordingly.
llvm-svn: 211308
The address pool was being emitted before location lists. The latter
could add more entries to the pool which would be lost/never emitted.
llvm-svn: 211284
Use the MCStreamer base implementations for file ID tracking instead of
overriding them as no-ops.
Avoids assertions when streaming Dwarf debug info, and fixes ASM parsing of loc
and file directives.
llvm-svn: 211282
Summary:
With this patch, range metadata can be added to call/invoke including
IntrinsicInst. Previously, it could only be added to load.
Rename computeKnownBitsLoad to computeKnownBitsFromRangeMetadata because
range metadata is not only used by load.
Update the language reference to reflect this change.
Test Plan:
Add several tests in range-2.ll to confirm the verifier is happy with
having range metadata on call/invoke.
Add two tests in AddOverFlow.ll to confirm annotating range metadata to
call/invoke can benefit InstCombine.
Reviewers: meheff, nlewycky, reames, hfinkel, eliben
Reviewed By: eliben
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4187
llvm-svn: 211281
Currently, llvm always emits a DWARF CIE with a version of 1, even when emitting
DWARF 3 or 4, which both support CIE version 3. This patch makes it emit the
newer CIE version when we are emitting DWARF 3 or 4. This will not reduce
compatibility, as we already emit other DWARF3/4 features, and is worth doing as
the DWARF3 spec removed some ambiguities in the interpretation of call frame
information.
It also fixes a minor bug where the "return address" field of the CIE was
encoded as a ULEB128, which is only valid when the CIE version is 3. There are
no test changes for this, because (as far as I can tell) none of the platforms
that we test have a return address register with a DWARF register number >127.
llvm-svn: 211272
Patch by David Chisnall
His work was sponsored by: DARPA, AFRL
Some small modifications to the original patch: we now error if
it's not possible to expand an instruction (mips-expansions-bad.s has some
examples). Added some comments to the expansions.
llvm-svn: 211271
This patch enables transforms for following patterns.
(x + (~(y & c) + 1) --> x - (y & c)
(x + (~((y >> z) & c) + 1) --> x - ((y>>z) & c)
Differential Revision: http://reviews.llvm.org/D3733
llvm-svn: 211266
Before this change, the backend was unable to fold a build_vector dag
node with UNDEF operands into a single horizontal add/sub.
This patch teaches how to combine a build_vector with UNDEF operands into a
horizontal add/sub when possible. The algorithm conservatively avoids to combine
a build_vector with only a single non-UNDEF operand.
Added test haddsub-undef.ll to verify that we correctly fold horizontal binop
even in the presence of UNDEFs.
llvm-svn: 211265
* Find factorization opportunities using identity values.
* Find factorization opportunities by treating shl(X, C) as mul (X, shl(C))
* Keep NSW flag while simplifying instruction using factorization.
This fixes PR19263.
Differential Revision: http://reviews.llvm.org/D3799
llvm-svn: 211261
InstCombineMulDivRem has:
// Canonicalize (X+C1)*CI -> X*CI+C1*CI.
InstCombineAddSub has:
// W*X + Y*Z --> W * (X+Z) iff W == Y
These two transforms could fight with each other if C1*CI would not fold
away to something simpler than a ConstantExpr mul.
The InstCombineMulDivRem transform only acted on ConstantInts until
r199602 when it was changed to operate on all Constants in order to
let it fire on ConstantVectors.
To fix this, make this transform more careful by checking to see if we
actually folded away C1*CI.
This fixes PR20079.
llvm-svn: 211258
We would get confused by '@' characters in symbol names, we would
mistake the text following them for the variant kind.
When an identifier a string, the variant kind will never show up inside
of it. Instead, check to see if there is a variant following the
string.
This fixes PR19965.
llvm-svn: 211249
These will be used for custom lowering and for library
implementations of various math functions, so it's useful
to expose these as builtins.
llvm-svn: 211247
fat files containing archives.
Also fix a bug in MachOUniversalBinary::ObjectForArch::ObjectForArch()
where it needed a >= when comparing the Index with the number of
objects in a fat file. As the index starts at 0.
llvm-svn: 211230
The difference from rint isn't really relevant here,
so treat them as equivalent. OpenCL doesn't have nearbyint,
so this is sort of pointless other than for completeness.
llvm-svn: 211229
This contains all the previous patches + getlod support on top of it.
It doesn't use SDNodes anymore, so it's quite small.
It also adds v16i8 to SReg_128, which is used for the sampler descriptor.
Reviewed-by: Tom Stellard
llvm-svn: 211228
During an indirect function call sequence on the 64-bit SVR4 ABI,
generate code must load and then restore the TOC register.
This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved. Instead, the are two
special instruction patterns:
let RST = 2, DS = 2 in
def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
"ld 2, 8($reg)", IIC_LdStLD,
[(PPCload_toc i64:$reg)]>, isPPC64;
let RST = 2, DS = 10, RA = 1 in
def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
"ld 2, 40(1)", IIC_LdStLD,
[(PPCtoc_restore)]>, isPPC64;
Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations. The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).
This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source. This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).
llvm-svn: 211193
As requested by Hal Finkel, this adds back a test for calls to
a known-constant function pointer value, and verifies that the
64-bit SVR4 indirect function call sequence is used.
llvm-svn: 211190
Note that I followed the AVX2 convention here and didn't add LLVM intrinsics
for stores. These can be generated with the nontemporal hint on LLVM IR
stores (see new test). The GCC builtins are lowered directly into nontemporal
stores.
<rdar://problem/17082571>
llvm-svn: 211176
The PowerPC back-end uses BLA to implement calls to functions at
known-constant addresses, which is apparently used for certain
system routines on Darwin.
However, with the 64-bit SVR4 ABI, this is actually incorrect.
An immediate function pointer value on this platform is not
directly usable as a target address for BLA:
- in the ELFv1 ABI, the function pointer value refers to the
*function descriptor*, not the code address
- in the ELFv2 ABI, the function pointer value refers to the
global entry point, but BL(A) would only be correct when
calling the *local* entry point
This bug didn't show up since using immediate function pointer
values is not usually done in the 64-bit SVR4 ABI in the first
place. However, I ran into this issue with a certain use case
of LLVM as JIT, where immediate function pointer values were
uses to implement callbacks from JITted code to helpers in
statically compiled code.
Fixed by simply not using BLA with the 64-bit SVR4 ABI.
llvm-svn: 211174
All tests in test/tools/llvm-cov fail on big-endian targets and are
supposed to be XFAILed there. However, including "powerpc64" in the
XFAIL line is now incorrect, since that matches both powerpc64- and
powerpc64le- targets, and the tests pass on the latter.
Update the XFAIL lines to use powerpc64- instead (like mips64-).
llvm-svn: 211172
Summary:
The assembler tries to reuse the destination register for memory operations whenever
it can but it's not possible to do so if the destination register is not a GPR.
Example:
ldc1 $f0, sym
should expand to:
lui $at, %hi(sym)
ldc1 $f0, %lo(sym)($at)
It's entirely wrong to expand to:
lui $f0, %hi(sym)
ldc1 $f0, %lo(sym)($f0)
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D4173
llvm-svn: 211169
Summary:
This patch doesn't really change the logic behind expandMemInst but it allows
us to assemble .S files that use .set noat with some macros. For example:
.set noat
lw $k0, offset($k1)
Can expand to:
lui $k0, %hi(offset)
addu $k0, $k0, $k1
lw $k0, %lo(offset)($k0)
with no need to access $at.
Reviewers: dsanders, vmedic
Reviewed By: dsanders, vmedic
Differential Revision: http://reviews.llvm.org/D4159
llvm-svn: 211165
Summary:
Added negative test case so that we can be sure we handle erroneous situations
while parsing the .cpsetup directive.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3681
llvm-svn: 211160
It looks like there are two versions of LowerCallTo here: the
SelectionDAGBuilder one is designed to operate on LLVM IR, and the
TargetLowering one in the case where everything is at DAG level.
Previously, only the SelectionDAGBuilder variant could handle demoting
an impossible return to sret semantics (before delegating to the
TargetLowering version), but this functionality is also useful for
certain libcalls (e.g. 128-bit operations on 32-bit x86). So this
commit moves the sret handling down a level.
rdar://problem/17242889
llvm-svn: 211155
ReconstructShuffle() may wrongly creat a CONCAT_VECTOR trying to
concat 2 of v2i32 into v4i16. This commit is to fix this issue and
try to generate UZP1 instead of lots of MOV and INS.
Patch is initalized by Kevin Qin, and refactored by Tim Northover.
llvm-svn: 211144
This patch is a follow up to r211040 & r211052. Rather than bailing out of fast
isel this patch will generate an alternate instruction (movabsq) instead of the
leaq. While this will always have enough room to handle the 64 bit displacment
it is generally over kill for internal symbols (most displacements will be
within 32 bits) but since we have no way of communicating the code model to the
the assmebler in order to avoid flagging an absolute leal/leaq as illegal when
using a symbolic displacement.
llvm-svn: 211130
This optimizes predicates for certain compares, such as fcmp oeq %x, %x to
fcmp ord %x, %x. The latter one is more efficient to generate.
The same optimization is applied to conditional branches.
llvm-svn: 211126
and the -l option for the long format. Also when the object is a Mach-O
file and the format is berkeley produce output like darwin’s default size(1)
summary berkeley derived output.
Like System V format, there are also some small changes in how and where
the file names and archive member names are printed for darwin and
Mach-O.
Like the changes to llvm-nm these are the first steps in seeing if it is
possible to make llvm-size produce the same output as darwin's size(1).
llvm-svn: 211117
mark the old JIT tests as unsupported for powerpc64 - CMake style.
This follows the style used for hexagon/arm64/aarch64.
The equivalent tests still run under the supported MCJIT/*
llvm-svn: 211111
This patch add code to remove unreachable blocks from function
as they may cause jump threading to stuck in infinite loop.
Differential Revision: http://reviews.llvm.org/D3991
llvm-svn: 211103
To make sure branches are in range, we need to do a better job of estimating
the length of an inline assembly block than "it's probably 1 instruction, who'd
write asm with more than that?".
Fortunately there's already a (highly suspect, see how many ways you can think
of to break it!) callback for this purpose, which is used by the other targets.
rdar://problem/17277590
llvm-svn: 211095
Multiplication by an integer with a number of trailing zero bits leaves
the same number of lower bits of the result initialized to zero.
This change makes MSan take this into account in the case of multiplication by
a compile-time constant.
We don't handle the general, non-constant, case because
(a) it's not going to be cheap (computation-wise);
(b) multiplication by a partially uninitialized value in user code is
a bad idea anyway.
Constant case must be handled because it appears from LLVM optimization of a
completely valid user code, as the test case in compiler-rt demonstrates.
llvm-svn: 211092
Summary:
As a starting step, we only use one simple heuristic: if the sign bits
of both a and b are zero, we can prove "add a, b" do not unsigned
overflow, and thus convert it to "add nuw a, b".
Updated all affected tests and added two new tests (@zero_sign_bit and
@zero_sign_bit2) in AddOverflow.ll
Test Plan: make check-all
Reviewers: eliben, rafael, meheff, chandlerc
Reviewed By: chandlerc
Subscribers: chandlerc, llvm-commits
Differential Revision: http://reviews.llvm.org/D4144
llvm-svn: 211084
r199771 accidently broke the logic that makes sure that SROA only splits
load on byte boundaries. If such a split happens, some bits get lost
when reassembling loads of wider types, causing data corruption.
Move the width check up to reject such splits early, avoiding the
corruption. Fixes PR19250.
Patch by: Björn Steinbrink <bsteinbr@gmail.com>
llvm-svn: 211082
Make use of helper functions to simplify the branch and compare instruction
selection in FastISel. Also add test cases for compare and conditonal branch.
llvm-svn: 211077
[This is resubmitting r210721, which was reverted due to suspected breakage
which turned out to be unrelated].
Some extra review comments were addressed. See D4090 and D4147 for more details.
The Clang change that produces this metadata was committed in r210667
Patch by Mark Heffernan.
llvm-svn: 211076
Summary:
This patches allows non conversions like i1=i2; where both are global ints.
In addition, arithmetic and other things start to work since fast-isel will use
existing patterns for non fast-isel from tablegen files where applicable.
In addition i8, i16 will work in this limited context for assignment without the need
for sign extension (zero or signed). It does not matter how i8 or i16 are loaded (zero or sign extended)
since only the 8 or 16 relevant bits are used and clang will ask for sign extension before using them in
arithmetic. This is all made more complete in forthcoming patches.
for example:
int i, j=1, k=3;
void foo() {
i = j + k;
}
Keep in mind that this pass is not enabled right now and is an experimental pass
It can only be enabled with a hidden option to llvm of -mips-fast-isel.
Test Plan: Run test-suite, loadstore2.ll and I will run some executable tests.
Reviewers: dsanders
Subscribers: mcrosier
Differential Revision: http://reviews.llvm.org/D3856
llvm-svn: 211061
Rafael opened http://llvm.org/bugs/show_bug.cgi?id=19893 to track non-optimal
code generation for forming a function address that is local to the compile
unit. The existing code was treating both local and non-local functions
identically.
This patch fixes the problem by properly identifying local functions and
generating the proper addis/addi code. I also noticed that Rafael's earlier
changes to correct the surrounding code in PPCISelLowering.cpp were also
needed for fast instruction selection in PPCFastISel.cpp, so this patch
fixes that code as well.
The existing test/CodeGen/PowerPC/func-addr.ll is modified to test the new
code generation. I've added a -O0 run line to test the fast-isel code as
well.
Tested on powerpc64[le]-unknown-linux-gnu with no regressions.
llvm-svn: 211056
Added comment to clarify why we r211040 choose to bail out of fast isel instead
of generating a more complicated relocation, and fix mislabelled register in the
comments of the asan test case.
llvm-svn: 211052
ARM v7M has ldrex/strex but not ldrexd/strexd. This means 32-bit
operations should work as normal, but 64-bit ones are almost certainly
doomed.
Patch by Phoebe Buckheister.
llvm-svn: 211042
On x86_86 the lea instruction can only use a 32 bit immediate value. When
the code is compiled statically the RIP register is not used, meaning the
immediate is all that can be used for the relocation, which is not sufficient
in the case of targets more than +/- 2GB away. This patch bails out of fast
isel in those cases and reverts to DAG which does the right thing.
Test case included.
llvm-svn: 211040
When LowerSwitch transforms a switch instruction into a tree of ifs it
is actually performing a binary search into the various case ranges, to
see if the current value falls into one cases range of values.
So, if we have a program with something like this:
switch (a) {
case 0:
do0();
break;
case 1:
do1();
break;
case 2:
do2();
break;
default:
break;
}
the code produced is something like this:
if (a < 1) {
if (a == 0) {
do0();
}
} else {
if (a < 2) {
if (a == 1) {
do1();
}
} else {
if (a == 2) {
do2();
}
}
}
This code is inefficient because the check (a == 1) to execute do1() is
not needed.
The reason is that because we already checked that (a >= 1) initially by
checking that also (a < 2) we basically already inferred that (a == 1)
without the need of an extra basic block spawned to check if actually (a
== 1).
The patch addresses this problem by keeping track of already
checked bounds in the LowerSwitch algorithm, so that when the time
arrives to produce a Leaf Block that checks the equality with the case
value / range the algorithm can decide if that block is really needed
depending on the already checked bounds .
For example, the above with "a = 1" would work like this:
the bounds start as LB: NONE , UB: NONE
as (a < 1) is emitted the bounds for the else path become LB: 1 UB:
NONE. This happens because by failing the test (a < 1) we know that the
value "a" cannot be smaller than 1 if we enter the else branch.
After the emitting the check (a < 2) the bounds in the if branch become
LB: 1 UB: 1. This is because by checking that "a" is smaller than 2 then
the upper bound becomes 2 - 1 = 1.
When it is time to emit the leaf block for "case 1:" we notice that 1
can be squeezed exactly in between the LB and UB, which means that if we
arrived to that block there is no need to emit a block that checks if (a
== 1).
Patch by: Marcello Maggioni <hayarms@gmail.com>
llvm-svn: 211038
This makes llvm-nm ignore members that are not sufficiently aligned for
lib/Object to handle.
These archives are invalid. GNU AR is able to handle this, but in general
just warns about broken archive members.
We should probably start warning too, but for now just make sure llvm-nm
exits with an 0.
llvm-svn: 211036
Summary:
There is no change to the restrictions, just the result register is stored
once in the encoding rather than twice. The rt field is zero in
MIPS32r6/MIPS64r6.
Depends on D4119
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4120
llvm-svn: 211019
Summary:
The linked-load, store-conditional operations have been re-encoded such
that have a 9-bit offset instead of the 16-bit offset they have prior to
MIPS32r6/MIPS64r6.
While implementing this, I noticed that the atomic load/store pseudos always
emit a sign extension using sll and sra. I have improved this to use seb/seh
when they are available (MIPS32r2/MIPS64r2 and above).
Depends on D4118
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4119
llvm-svn: 211018
Summary:
There is very little difference between the big and little endian cases in
test/CodeGen/Mips/atomic.ll. Merge them together using multiple
FileCheck prefixes.
Depends on D4117
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4118
llvm-svn: 211013
Summary:
The error message for the invalid.s cases isn't very helpful. It happens because
there is an instruction with a wider immediate that would have matched if the
NotMips32r6 predicate were true. I have some WIP to improve the message but it
affects most error messages for removed/re-encoded instructions on
MIPS32r6/MIPS64r6 and should therefore be a separate commit.
Depens on D4115
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4117
llvm-svn: 211012
As a follow-up to r210375 which canonicalizes addrspacecast
instructions, this patch canonicalizes addrspacecast constant
expressions.
Given clang uses ConstantExpr::getAddrSpaceCast to emit addrspacecast
cosntant expressions, this patch is also a step towards having the
frontend emit canonicalized addrspacecasts.
Piggyback a minor refactor in InstCombineCasts.cpp
Update three affected tests in addrspacecast-alias.ll,
access-non-generic.ll and constant-fold-gep.ll and added one new test in
constant-fold-address-space-pointer.ll
llvm-svn: 211004
I haven't nailed this down entirely, but this is about as small of a
test case as I can seem to construct and adequately demonstrates the
crasher. I'll continue investigating the root cause/fix(es).
llvm-svn: 210993
There's probably no acatual change in behaviour here, just updating
the LowerFP_TO_INT function to be more similar to the reverse
implementation and updating costs to current CodeGen.
llvm-svn: 210985
This would assert if a constant address space was extern
and therefore didn't have an initializer. If the initializer
was undef, it would hit the unreachable unhandled initializer case.
An extern global should never really occur since we don't have
machine linking, but bugpoint likes to remove initializers.
llvm-svn: 210967
This patch is to move GlobalMerge pass from Transform/Scalar
to CodeGen, because GlobalMerge depends on TargetMachine.
In the mean time, the macro INITIALIZE_TM_PASS is also moved
to CodeGen/Passes.h. With this fix we can avoid making
libScalarOpts depend on libCodeGen.
llvm-svn: 210951
Rather than relying on abstract variables looked up at the time the
concrete variable is created, look them up at the end of the module to
ensure they're referenced even if they're created after the concrete
definition. This completes/matches the work done in r209677 to handle
this for the subprograms themselves.
llvm-svn: 210946
This doesn't fix the abstract variable handling yet, but it introduces a
similar delay mechanism as was added for subprograms, causing
DW_AT_location to be reordered to the beginning of the attribute list
for local variables, and fixes all the test fallout for that.
A subsequent commit will remove the abstract variable handling in
DbgVariable and just do the abstract variable lookup at module end to
ensure that abstract variables introduced after their concrete
counterparts are appropriately referenced by the concrete variable.
llvm-svn: 210943
In an effort to fix concrete variables referencing abstract origins
where the concrete variable preceeds the first inlined usage, the
addition of attributes such as name, file, etc will be delayed until the
end of the module (to wait to see if any inlined instances have
occurred, thus necessitating an abstract definition that the concrete
definition should also reference).
These test cases don't actually need to care about this ordering of
attributes, so update them to be more resilient to such changes coming
in the near future.
llvm-svn: 210940
This silently broke a long time ago when I unified some aspects of the
debug info schema. I'm just cleaning these up if/when they become a
problem.
llvm-svn: 210939
Init-order and use-after-return modes can currently be enabled
by runtime flags. use-after-scope mode is not really working at the
moment.
The only problem I see is that users won't be able to disable extra
instrumentation for init-order and use-after-scope by a top-level Clang flag.
But this instrumentation was implicitly enabled for quite a while and
we didn't hear from users hurt by it.
llvm-svn: 210924
Lowering this new node allows us to fold the almost universal
comparison for success before it's even formed. Instead we can create
a copy from EFLAGS and an X86ISD::SETCC operation since all "cmpxchg"
instructions set the zero-flag to the correct value.
rdar://problem/13201607
llvm-svn: 210923
This also simplifies the IR we create slightly: instead of working out
where success & failure should go manually, it turns out we can just
always jump to a success/failure block created for the purpose. Later
phases will sort out the mess without much difficulty.
llvm-svn: 210917
This has two benefits: it makes the result more suitable for direct
insertaion into the struct to emulate the new cmpxchg, and it means
the name we give the instruction matches its actual effect better.
llvm-svn: 210916
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.
As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.
At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.
By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.
Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.
Summary for out of tree users:
------------------------------
+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.
llvm-svn: 210903
Summary:
cache and pref were added in MIPS-III, and MIPS32 but were re-encoded in
MIPS32r6/MIPS64r6 to use a 9-bit offset rather than the 16-bit offset
available to earlier cores.
Resolved the decoding conflict between pref and lwc3.
Depends on D4115
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4116
llvm-svn: 210900
Summary:
These MIPS-3D instructions have never been implemented in LLVM so we only
add testcases.
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4115
llvm-svn: 210899
Summary:
b(ge|lt)zal have been removed in MIPS32r6/MIPS64r6. However, bal (an alias
for 'bgezal $zero, $offset') still remains with the same encoding it had
prior to MIPS32r6/MIPS64r6.
Updated the MipsNaCLELFStreamer, and MipsLongBranch to correctly handle the
MIPS32r6/MIPS64r6 BAL instruction in addition to the existing BAL_BR pseudo.
No changes were required to the CodeGen test that looks for BAL
(test/CodeGen/Mips/longbranch.ll) since the new instruction has the same
syntax.
Depends on D4113
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4114
llvm-svn: 210898
Summary:
It's not emitted by the code generator so we only need assembler tests.
Also added missing daddi aliases from dsub mnemonics, and removed a couple
duplicate dsub tests.
Depends on D4112
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4113
llvm-svn: 210897
When targetting Thumb1 on a processor which has a VFP unit (which
is not accessible from Thumb1), we were converting the fastcc calling
convention to AAPCS-VFP, which is not possible.
llvm-svn: 210889
This adds support for the cvttss2si/cvttsd2si intrinsics. Preceding
insertelement instructions are folded into the conversion instruction (if
possible).
llvm-svn: 210870
Previous algorithm for constructing [Address ranges]->[Compile Units]
mapping was wrong. It somewhat relied on the assumption that address ranges
for different compile units may not overlap. It is not so.
For example, two compile units may contain the definition of the same
linkonce_odr function. These definitions will be merged at link-time,
resulting in equivalent .debug_ranges entries for both these units
Instead of sorting and merging original address ranges (from .debug_ranges
and .debug_aranges), implement a different approach: save endpoints
of all ranges, and then use a sweep-line approach to construct
the desired mapping. If we find that certain address maps to
several compilation units, we just pick any of them.
llvm-svn: 210860
This commit adds MachineMemOperands to load and store instructions. This allows
the peephole optimizer to fold load instructions. Unfortunatelly the peephole
optimizer currently doesn't run at -O0.
llvm-svn: 210858
Enable value forwarding for loads from `calloc()` without an intervening
store.
This change extends GVN to handle the following case:
%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This load is trivially constant zero
%3 = load i32* %2, align 4
This is analogous to the handling for `malloc()` in the same places.
`malloc()` returns `undef`; `calloc()` returns a zero value. Note that
it is correct to return zero even for out of bounds GEPs since the
result of such a GEP would be undefined.
Patch by Philip Reames!
llvm-svn: 210828
Delete all unused ones, and add new AMDGPU named intrinsics for
the ones that are. Handle the old AMDIL names for comptability (although
remove their GCCBuiltin names) and add tests since there weren't any
for these before.
llvm-svn: 210827
Windows on ARM uses COFF/PE which is intrinsically position independent. For
the case of 32-bit immediates, use a pair-wise relocation as otherwise we may
exceed the range of operators. This fixes a code generation crash when using
-Oz when targeting Windows on ARM.
llvm-svn: 210814
Turns out that DW_AT_ranges_base attribute sets the offset for
DW_AT_ranges values specified in the .dwo file, but not for DW_AT_ranges specified
in the skeleton compile unit DIE in the main executable. This is extremely confusing,
and would hopefully be fixed in DWARF-5 when it's finalized. For now this
behavior makes sense, as otherwise Fission would break DWARF consumers who
doesn't know anything about DW_AT_ranges_base.
llvm-svn: 210809
Moritz's changes have improved codegen a lot, but further testing showed significant correctness problems. Disable by default until these have been worked out.
Patch by Moritz Roth!
llvm-svn: 210789
Summary:
Also tightened up the acceptable condition operand for these instructions
on MIPS-I to MIPS-III. Support for $fcc[1-7] was added in MIPS-IV. Prior
to that only $fcc0 is acceptable.
We currently don't optimize (BEQZ (NOT $a), $target) and similar. It's
probably best to do this in InstCombine.
Depends on D4111
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4112
llvm-svn: 210787
Summary:
These instructions are not implemented for any MIPS ISA so we only need
testcases.
Depends on D4110
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4111
llvm-svn: 210786
Summary:
Folded mips64-fp-indexed-ls.ll into fp-indexed-ls.ll. To do so, the zext's in
mips64-fp-indexed-ls.ll were changed to implicit sign extensions (performed
by getelementptr). This does not affect the purpose of the test.
Depends on D4004
Reviewers: zoran.jovanovic, jkolek, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D4110
llvm-svn: 210784
Summary: We haven't implemented this instruction so we only add a test case.
Reviewers: vmedic, zoran.jovanovic, jkolek
Reviewed By: jkolek
Differential Revision: http://reviews.llvm.org/D4004
llvm-svn: 210779
Summary:
c.cond.fmt has been replaced by cmp.cond.fmt. Where c.cond.fmt wrote to
dedicated condition registers, cmp.cond.fmt writes 1 or 0 to normal FGR's
(like the GPR comparisons).
mov[fntz] have been replaced by seleqz and selnez. These instructions
conditionally zero a register based on a bool in a GPR. The results can
then be or'd together to act as a select without, for example, requiring a third
register read port.
mov[fntz].[ds] have been replaced with sel.[ds]
MIPS64r6 currently generates unnecessary sign-extensions for most selects.
This is because the result of a SETCC is currently an i32. Bits 32-63 are
undefined in i32 and the behaviour of seleqz/selnez would otherwise depend
on undefined bits. Later, we will fix this by making the result of SETCC an
i64 on MIPS64 targets.
Depends on D3958
Reviewers: jkolek, vmedic, zoran.jovanovic
Reviewed By: vmedic, zoran.jovanovic
Differential Revision: http://reviews.llvm.org/D4003
llvm-svn: 210777
Summary:
To make this work for both AFGR64 and FGR64 register sets, I've had to make the
instruction definition consistent with the white lie (that it reads the lower
32-bits of the register) when they are generated by expandBuildPairF64().
Corrected the definition of hasMips32r2() and hasMips64r2() to include
MIPS32r6 and MIPS64r6.
Depends on D3956
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3957
llvm-svn: 210771
Summary:
This patch updates both the assembler and the code generator.
MIPS32r6/MIPS64r6 replaces them with maddf.[ds] and msubf.[ds] which are fused
multiply-add/sub operations. We don't emit these yet, this patch only prevents the removed instructions from being emitted.
Depends on D3955
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3956
llvm-svn: 210763
Summary:
This patch disables madd/maddu/msub/msubu in both the assembler and code
generator.
Depends on D3896
Reviewers: jkolek, zoran.jovanovic, vmedic
Reviewed By: vmedic
Differential Revision: http://reviews.llvm.org/D3955
llvm-svn: 210762
This patch adds target combine rules to match:
- [AVX] Horizontal add/sub of packed single/double precision floating point
values from 256-bit vectors;
- [AVX2] Horizontal add/sub of packed integer values from 256-bit vectors.
llvm-svn: 210761
Summary:
The accumulator-based (HI/LO) multiplies and divides from earlier ISA's have
been removed and replaced with GPR-based equivalents. For example:
div $1, $2
mflo $3
is now:
div $3, $1, $2
This patch disables the accumulator-based multiplies and divides for
MIPS32r6/MIPS64r6 and uses the GPR-based equivalents instead.
Renamed expandPseudoDiv to insertDivByZeroTrap to better describe the
behaviour of the function.
MipsDelaySlotFiller now invalidates the liveness information when moving
instructions to the delay slot. Without this, divrem.ll will abort since
%GP ends up used before it is defined.
Reviewers: vmedic, zoran.jovanovic, jkolek
Reviewed By: jkolek
Differential Revision: http://reviews.llvm.org/D3896
llvm-svn: 210760
The verifier follows GlobalAlias operands so that it can detect cycles of
alias definitions. It was doing this in a way that caused it to also recurse
through initializers for the GlobalValue aliasees, and it would fail when
an initializer refers to a global that is a declaration and not a definition.
This patch causes it to stop recursing when it hits a global definition.
<rdar://problem/17277451>
llvm-svn: 210734
See http://reviews.llvm.org/D4090 for more details.
The Clang change that produces this metadata was committed in r210667
Patch by Mark Heffernan.
llvm-svn: 210721
Previously we would always print the offset as decimal, regardless of
the formatting requested. Now we use the formatImm() helper so the value
is printed as the client (LLDB in the motivating example) requested.
Before:
ldr.w r8, [sp, #180] @ always
After:
ldr.w r8, [sp, #0xb4] @ when printing hex immediates
ldr.w r8, [sp, #0180] @ when printing decimal immediates
rdar://17237103
llvm-svn: 210701
Previously there was a separate mode entirely (--hdis vs.
--disassemble). It makes a bit more sense for the immediate printing
style to be a flag for --disassmeble rather than an entirely different
thing.
llvm-svn: 210700
This is the same problem fixed in r210664 for more types.
The test passes without this fix. For some reason
I'm only hitting this when creating selects lowered
to v2i32 selects.
llvm-svn: 210692
The idea of this patch is to turn llvm/Support/system_error.h into a
transitional header that just brings in the erorr_code api to the llvm
namespace. I will remove it shortly afterwards.
The cases where the general idea needed some tweaking:
* std::errc is a namespace in msvc, so we cannot use "using std::errc". I could
add an #ifdef, but there were not that many uses, so I just added std:: to
them in this patch.
* Template specialization had to be moved to the std namespace in this
patch set already.
* The msvc implementation of default_error_condition doesn't seem to
provide the same transformations as we need. Not too surprising since
the standard doesn't actually say what "equivalent" means. I fixed the
problem by keeping our old mapping and using it at error_code
construction time.
Despite these shortcomings I think this is still a good thing. Some reasons:
* The different implementations of system_error might improve over time.
* It removes 925 lines of code from llvm already.
* It removes 6313 bytes from the text segment of the clang binary when
it is built with gcc and 2816 bytes when building with clang and
libstdc++.
llvm-svn: 210687
There seem to be only 2 places that produce these,
and it's kind of tricky to hit them.
Also fixes failure to bitcast between i64 and v2f32,
although this for some reason wasn't actually broken in the
simple bitcast testcase, but did in the scalar_to_vector one.
llvm-svn: 210664
Summary:
MIPS32r6/MIPS64r6 support has not been added yet.
inlineasm-cnstrnt-reg.ll:
Explicitly specify the CPU since it will not work on MIPS32r6/MIPS64r6
when -integrated-as is the default. We can't change the mnemonic since the
LO register is an implicit def of mtlo and MIPS32r6/MIPS64r6 has no
instructions that use LO.
2008-08-01-AsmInline.ll:
Explicitly specify the CPU since MIPS32r6/MIPS64r6 will correctly emit
different code and this is a regression test.
mips64instrs.ll and mips64muldiv.ll
Check registers and the way the multiply is used in m1
divrem.ll
Check registers and use multiple filecheck prefixes to limit redundancy
Reviewers: vmedic, jkolek, zoran.jovanovic, matheusalmeida
Reviewed By: matheusalmeida
Subscribers: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3894
llvm-svn: 210656
Summary: These instructions are available in ISAs >= mips32/mips64. For mips32r6/mips64r6, jr.hb has a new encoding format.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D4019
llvm-svn: 210654
This patch moves part of the logic implemented by the target specific
combine rules added at r210477 to a separate helper function.
This should make easier to add more rules for matching AVX/AVX2 horizontal
adds/subs.
This patch also fixes a problem caused by a wrong check performed on indices
of extract_vector_elt dag nodes in input to the scalar adds/subs.
New tests have been added to verify that we correctly check indices of
extract_vector_elt dag nodes when selecting a horizontal operation.
llvm-svn: 210644
This commit is to improve global merge pass and support global symbol merge.
The global symbol merge is not enabled by default. For aarch64, we need some
more back-end fix to make it really benifit ADRP CSE.
llvm-svn: 210640
This reverts commit r206683.
The code was confusing SEH register numbers with DWARF register numbers.
The test case it was committed with was obviously incorrect. The
disassembler was roundtripping '.seh_pushreg %rsi' as '.seh_pushreg
%rbp', and other exciting things.
Noticed by Vadim Chugunov.
llvm-svn: 210574