Commit Graph

30335 Commits

Author SHA1 Message Date
Adrian Prantl 87b7eb9d0f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Reed Kotler b9dc248e9e Add fptrunc to mips fast-sel
Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel

Test Plan:
fptrunc.ll
checked also with 4 internal mips build bot flavors mip32r1/miprs32r2 and at -O0 and -O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: rfuhler

Differential Revision: http://reviews.llvm.org/D5553

llvm-svn: 218785
2014-10-01 18:47:02 +00:00
Adrian Prantl b458dc2eee Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl 25a7174e7a Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Tom Stellard 79243d9664 R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table
llvm-svn: 218776
2014-10-01 17:15:17 +00:00
Toma Tabacu c4c202a9a7 [mips] Rename emit and parse functions for the .cpload assembler directive. NFC.
Summary: It's better if we have a consistent name for .cpload-related functions.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5437

llvm-svn: 218768
2014-10-01 14:53:19 +00:00
Tom Stellard 3a35d8f4c2 R600/SI: Add a generic pseudo EXP instruction
llvm-svn: 218767
2014-10-01 14:44:45 +00:00
Tom Stellard 0c238c2fbe R600/SI: Add generic pseudo MTBUF instructions
llvm-svn: 218766
2014-10-01 14:44:43 +00:00
Tom Stellard c470c96e6b R600/SI: Add generic pseudo SMRD instructions
llvm-svn: 218765
2014-10-01 14:44:42 +00:00
Oliver Stannard d4e0a4fd2c [ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.

llvm-svn: 218763
2014-10-01 13:13:18 +00:00
Chandler Carruth 6c02c031b8 [x86] Fix a few more tiny patterns with the new vector shuffle lowering
that keep cropping up in the regression test suite.

This also addresses one of the issues raised on the mailing list with
failing to form 'movsd' in as many cases as we realistically should.
There will be corresponding patches forthcoming for v4f32 at least. This
was a lot of fuss for a relatively small gain, but all the fuss was on
my end trying different ways of holding the pieces of the x86 fragment
patterns *just right*. Now that it works, the code is reasonably simple.

In the new test cases I'm adding here, v2i64 sticks out as just plain
horrible. I've not come up with any great ideas here other than that it
would be nice to recognize when we're *going* to take a domain crossing
hit and cross earlier to get the decent instructions. At least with AVX
it is slightly less silly....

llvm-svn: 218756
2014-10-01 11:14:02 +00:00
Chandler Carruth 048486109b [x86] Delete some extraneous logic from the new vector shuffle lowering.
Nothing was relying on this and there are potentially some edge cases
that it would not be correct under. Removing it seems better than trying
to "fix" it as nothing was relying on it.

llvm-svn: 218755
2014-10-01 11:13:57 +00:00
Tom Coxon e493f177ee [AArch64] Allow access to all system registers with MRS/MSR instructions.
The A64 instruction set includes a generic register syntax for accessing
implementation-defined system registers. The syntax for these registers is:
    S<op0>_<op1>_<CRn>_<CRm>_<op2>

The encoding space permitted for implementation-defined system registers
is:
    op0 op1  CRn   CRm   op2
    11  xxx  1x11  xxxx  xxx

The full encoding space can now be accessed:
    op0 op1  CRn   CRm   op2
    xx  xxx  xxxx  xxxx  xxx

This is useful to anyone needing to write assembly code supporting new
system registers before the assembler has learned the official names for
them.

llvm-svn: 218753
2014-10-01 10:13:59 +00:00
Asiri Rathnayake 530b3edab6 Add missing natual vector cast.
Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.

llvm-svn: 218751
2014-10-01 09:59:45 +00:00
Oliver Stannard 37e4daab05 [ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM)
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target features.

llvm-svn: 218747
2014-10-01 09:02:17 +00:00
Daniel Sanders 92db6b78f7 [mips] Fix disassembly of [ls][wd]c[23], cache, and pref
Fixes PR21015, and PR20993.                                                       
                                                                                  
Patch by Jun Koi

llvm-svn: 218745
2014-10-01 08:26:55 +00:00
Sasa Stankovic 7072a7968f [mips] For indirect calls we don't need $gp to point to .got. Mips linker
doesn't generate lazy binding stub for a function whose address is taken in
the program.

Differential Revision: http://reviews.llvm.org/D5067

llvm-svn: 218744
2014-10-01 08:22:21 +00:00
Nick Lewycky 5f75f4ddb9 Fix typo in comment from r218733
llvm-svn: 218739
2014-10-01 03:37:34 +00:00
Chandler Carruth 26cb9b8d2d [x86] Teach the new vector shuffle lowering to be even more aggressive
in exposing the scalar value to the broadcast DAG fragment so that we
can catch even reloads and fold them into the broadcast.

This is somewhat magical I'm afraid but seems to work. It is also what
the old lowering did, and I've switched an old test to run both
lowerings demonstrating that we get the same result.

Unlike the old code, I'm not lowering f32 or f64 scalars through this
path when we only have AVX1. The target patterns include pretty heinous
code to re-cast those as shuffles when the scalar happens to not be
spilled because AVX1 provides no broadcast mechanism from registers
what-so-ever. This is terribly brittle. I'd much rather go through our
generic lowering code to get this. If needed, we can add a peephole to
get even more opportunities to broadcast-from-spill-slots that are
exposed post-RA, but my suspicion is this just doesn't matter that much.

llvm-svn: 218734
2014-10-01 03:19:43 +00:00
Chandler Carruth 846baf2ca1 [x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is
the same speed as pshufd but we can fold loads into the pmovzx
instructions.

This fixes some regressions that came up in the regression test suite
for the new vector shuffle lowering.

llvm-svn: 218733
2014-10-01 02:25:54 +00:00
Adam Nemet 05d8c8e682 [AVX512] Remove space before \t in AsmStrings.
llvm-svn: 218725
2014-10-01 00:41:32 +00:00
Chandler Carruth b9d3fa1e65 [x86] Teach the new vector shuffle lowering about VBROADCAST and
VPBROADCAST.

This has the somewhat expected pervasive impact. I don't know why
I forgot about this. Everything seems good with lots of significant
improvements in the tests.

llvm-svn: 218724
2014-10-01 00:41:21 +00:00
Sanjay Patel 8fde95cb2b Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...

llvm-svn: 218698
2014-09-30 20:28:48 +00:00
Juergen Ributzka c110c0b99a Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test for any of the lower 32bits we have to first
extract the subregister and use the 32bit version of the TBZ instruction (TBZW).

Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

llvm-svn: 218693
2014-09-30 19:59:35 +00:00
Matt Arsenault 9706978077 R600/SI: Fix printing of clamp and omod
No tests for omod since nothing uses it yet, but
this should get rid of the remaining annoying trailing
zeros after some instructions.

llvm-svn: 218692
2014-09-30 19:49:48 +00:00
Matt Arsenault 272c50a1fe R600/SI: Update VOP3b to not include obsolete operands
abs / neg are now part of the srcN_modifiers operands

llvm-svn: 218691
2014-09-30 19:49:43 +00:00
Reed Kotler 3ebdcc9ea7 Add numeric extend, trunctate to mips fast-isel
Summary:
 Add numeric extend, trunctate to mips fast-isel

 Reactivates D4827



Test Plan:
fpext.ll
loadstoreconv.ll

Reviewers: dsanders

Subscribers: mcrosier

Differential Revision: http://reviews.llvm.org/D5251

llvm-svn: 218681
2014-09-30 16:30:13 +00:00
Tom Coxon 2c13e71728 [AArch64] Remove unnecessary whitespace. (Test commit)
llvm-svn: 218680
2014-09-30 16:23:16 +00:00
Robert Khasanov 28a7df0b5f [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}.
Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>

llvm-svn: 218670
2014-09-30 12:15:52 +00:00
Robert Khasanov 5aa4445bde [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ}
Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1.
Now cmp intrinsics lower in the following way:
 (i8 (int_x86_avx512_mask_pcmpeq_q_128
             (v2i64 %a), (v2i64 %b), (i8 %mask))) ->
 (i8 (bitcast
   (v8i1 (insert_subvector undef,
           (v2i1 (and (PCMPEQM %a, %b),
                      (extract_subvector
                         (v8i1 (bitcast %mask)), 0))), 0))))

llvm-svn: 218669
2014-09-30 11:41:54 +00:00
Robert Khasanov b25e562d14 [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW.
Added new operand type for intrinsics (IIT_V64)

llvm-svn: 218668
2014-09-30 11:32:22 +00:00
Robert Khasanov a27c8e0fd9 [AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ.
Added CMP_MASK intrinsic type

llvm-svn: 218667
2014-09-30 11:19:50 +00:00
Job Noorman a9372a2755 Make sure aggregates are properly alligned on MSP430.
llvm-svn: 218665
2014-09-30 11:15:44 +00:00
Chandler Carruth aaf8e03d92 [x86] Revert r218588, r218589, and r218600. These patches were pursuing
a flawed direction and causing miscompiles. Read on for details.

Fundamentally, the premise of this patch series was to map
VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because
we are going to *have* to lower to VSELECT nodes for some blends to
trigger the instruction selection patterns of variable blend
instructions. This doesn't actually work out so well.

In order to match performance with the existing VECTOR_SHUFFLE
lowering code, we would need to re-slice the blend in order to fit it
into either the integer or floating point blends available on the ISA.
When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources)
this works well because the X86 backend ensures that these types of
operands to VSELECT get sign extended into '-1' and '0' for true and
false, allowing us to re-slice the bits in whatever granularity without
changing semantics.

However, if the VSELECT condition comes from some other source, for
example code lowering vector comparisons, it will likely only have the
required bit set -- the high bit. We can't blindly slice up this style
of VSELECT. Reid found some code using Halide that triggers this and I'm
hopeful to eventually get a test case, but I don't need it to understand
why this is A Bad Idea.

There is another aspect that makes this approach flawed. When in
VECTOR_SHUFFLE form, we have very distilled information that represents
the *constant* blend mask. Converting back to a VSELECT form actually
can lose this information, and so I think now that it is better to treat
this as VECTOR_SHUFFLE until the very last moment and only use VSELECT
nodes for instruction selection purposes.

My plan is to:
1) Clean up and formalize the target pre-legalization DAG combine that
   converts a VSELECT with a constant condition operand into
   a VECTOR_SHUFFLE.
2) Remove any fancy lowering from VSELECT during *legalization* relying
   entirely on the DAG combine to catch cases where we can match to an
   immediate-controlled blend instruction.

One additional step that I'm not planning on but would be interested in
others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV
which encodes a fully legalized VSELECT node. Then it would be easy to
write isel patterns only in terms of this to ensure VECTOR_SHUFFLE
legalization only ever forms the fully legalized construct and we can't
cycle between it and VSELECT combining.

llvm-svn: 218658
2014-09-30 02:52:28 +00:00
Matt Arsenault 06a711dce5 Fix missing C++ mode comment
llvm-svn: 218654
2014-09-30 01:05:27 +00:00
Juergen Ributzka 6ac12439d0 [FastISel][AArch64] Fold sign-/zero-extends into the load instruction.
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.

Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.

This fixes rdar://problem/18495928.

llvm-svn: 218653
2014-09-30 00:49:58 +00:00
Juergen Ributzka 0616d9d41a [FastISel][AArch64] Factor out scale factor calculation. NFC.
Factor out the code that determines the implicit scale factor of memory
operations for a given value type.

llvm-svn: 218652
2014-09-30 00:49:54 +00:00
Eric Christopher a2db922c0e Simplify conditional.
llvm-svn: 218643
2014-09-29 23:31:13 +00:00
Adam Nemet 6bddb8c3a5 [AVX512] Use X86VectorVTInfo in the masking helper classes and the FMAs
No functionality change.

Makes the code more compact (see the FMA part).

This needs a new type attribute MemOpFrag in X86VectorVTInfo.  For now I only
defined this in the simple cases.  See the commment before the attribute.

Diff of X86.td.expanded before and after is empty except for the appearance of
the new attribute.

llvm-svn: 218637
2014-09-29 22:54:41 +00:00
Eric Christopher 6a0551e43a Add soft-float to the key for the subtarget lookup in the TargetMachine
map, this makes sure that we can compile the same code for two different
ABIs (hard and soft float) in the same module.

Update one testcase accordingly (and fix some confusing naming) and
add a new testcase as well with the ordering swapped which would
highlight the problem.

llvm-svn: 218632
2014-09-29 21:57:54 +00:00
Eric Christopher 9b270d4dc9 Fix spelling and reflow comments.
llvm-svn: 218631
2014-09-29 21:57:52 +00:00
Dave Estes 5f9daea101 [AArch64] Refines the Cortex-A57 Machine Model
Primarily refines all of the instructions with accurate latency
and micro-op information. Refinements largely focus on the NEON
instructions.

Additionally, a few advanced features are modeled, including
forwarding for MAC instructions and hazards for floating point SQRT
and DIV.

Lastly, the issue-width is reduced to three so that the scheduler
will better accommodate the narrower decode and dispatch width.

llvm-svn: 218627
2014-09-29 21:27:36 +00:00
Matt Arsenault 1fd0c62821 Fix include order
llvm-svn: 218611
2014-09-29 15:53:15 +00:00
Matt Arsenault 9783e00d1e R600/SI: Fix hardcoded values for modifiers.
Move enums to SIDefines.h

llvm-svn: 218610
2014-09-29 15:50:26 +00:00
Matt Arsenault 3d4233fe48 R600/SI: Also fix fsub + fadd a, a to mad combines
llvm-svn: 218609
2014-09-29 14:59:38 +00:00
Matt Arsenault 02cb0ff7db R600/SI: Fix using mad with multiplies by 2
These turn into fadds, so combine them into the target
mad node.

fadd (fadd (a, a), b) -> mad 2.0, a, b

llvm-svn: 218608
2014-09-29 14:59:34 +00:00
Chad Rosier 70d54ac848 [AArch64] Improve cost model to handle sdiv by a pow-of-two.
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.

http://reviews.llvm.org/D5469
PR20714

llvm-svn: 218607
2014-09-29 13:59:31 +00:00
Oliver Stannard a4eba5ad70 [Thumb2] ldrexd and strexd are not defined on v7M
The Thumb2 ldrexd and strexd instructions are not defined for
M-class architectures.

llvm-svn: 218603
2014-09-29 10:57:29 +00:00
Chandler Carruth 6cbf43167b [x86] Make the new vector shuffle lowering lower blends as VSELECT
nodes, and rely exclusively on its logic. This removes a ton of
duplication from the blend lowering and centralizes it in one place.

One downside is that it requires a bunch of hacks to make this work with
the current legalization framework. We have to manually speculate one
aspect of legalizing VSELECT nodes to get everything to work nicely
because the existing legalization framework isn't *actually* bottom-up.

The other grossness is that we somewhat duplicate the analysis of
constant blends. I'm on the fence here. If reviewers thing this would
look better with VSELECT when it has constant operands dumping over tho
VECTOR_SHUFFLE, we could go that way. But it would be a substantial
change because currently all of the actual blend instructions are
matched via patterns in the TD files based around VSELECT nodes (despite
them not being perfect fits for that). Suggestions welcome, but at least
this removes the rampant duplication in the backend.

llvm-svn: 218600
2014-09-29 09:57:07 +00:00
Chandler Carruth b1cc7a8542 [x86] Delete a bunch of really bad and totally unnecessary code in the
X86 target-specific DAG combining that tried to convert VSELECT nodes
into VECTOR_SHUFFLE nodes that it "knew" would lower into
immediate-controlled blend nodes.

Turns out, we have perfectly good lowering of all these VSELECT nodes,
and indeed that lowering already knows how to handle lowering through
BLENDI to immediate-controlled blend nodes. The code just wasn't getting
used much because this thing forced the world to go through the vector
shuffle lowering. Yuck.

This also exposes that I was too aggressive in avoiding domain crossing
in v218588 with that lowering -- when the other option is to expand into
two 128-bit vectors, it is worth domain crossing. Restore that behavior
now that we have nice tests covering it.

The test updates here fall into two camps. One is where previously we
ended up with an unsigned encoding of the blend operand and now we get
a signed encoding. In most of those places there were elaborate comments
explaining exactly what these operands really mean. Rather than that,
just switch these tests to use the nicely decoded comments that make it
obvious that the final shuffle matches.

The other updates are just removing pointless domain crossing by
blending integers with PBLENDW rather than BLENDPS.

llvm-svn: 218589
2014-09-29 02:01:20 +00:00
Chandler Carruth d639c7a829 [x86] Refactor all of the VSELECT-as-blend lowering code to avoid domain
crossing and generally work more like the blend emission code in the new
vector shuffle lowering.

My goal is to have the new vector shuffle lowering just produce VSELECT
nodes that are either matched here to BLENDI or are legal and matched in
the .td files to specific blend instructions. That seems much cleaner as
there are other ways to produce a VSELECT anyways. =]

No *observable* functionality changed yet, mostly because this code
appears to be near-dead. The behavior of this lowering routine did
change though. This code being mostly dead and untestable will change
with my next commit which will also point some new tests at it.

llvm-svn: 218588
2014-09-29 01:32:54 +00:00
Chandler Carruth 2f9e56e527 [x86] Improve naming and comments for VSELECT lowering.
No functionality changed.

llvm-svn: 218586
2014-09-29 00:51:58 +00:00
Chandler Carruth c7129276cd [x86] Add the dispatch skeleton to the new vector shuffle lowering for
AVX-512.

There is no interesting logic yet. Everything ends up eventually
delegating to the generic code to split the vector and shuffle the
halves. Interestingly, that logic does a significantly better job of
lowering all of these types than the generic vector expansion code does.
Mostly, it lets most of the cases fall back to nice AVX2 code rather
than all the way back to SSE code paths.

Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next
up will be to incrementally add direct support for the basic instruction
set to each type (adding tests first).

llvm-svn: 218585
2014-09-29 00:37:27 +00:00
Chandler Carruth 32a3ebda14 [x86] Make the split-and-lower routine fully generic by relaxing the
assertion, making the name generic, and improving the documentation.

Step 1 in adding very primitive support for AVX-512. No functionality
changed yet.

llvm-svn: 218584
2014-09-29 00:21:49 +00:00
Chandler Carruth 24e3b69cbd [x86] Teach the new vector shuffle lowering to fall back on AVX-512
vectors.

Someone will need to build the AVX512 lowering, which should follow
AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added
a dummy test which is a port of the v8f32 and v8i32 tests from AVX and
AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this
is enough information for someone to implement proper lowering here. If
not, I'll be happy to help, but right now the AVX-512 support isn't
a priority for me.

llvm-svn: 218583
2014-09-28 23:53:10 +00:00
Chandler Carruth abe742e8fb [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2
lowerings.

This was hopelessly broken. First, the x86 backend wants '-1' to be the
element value representing true in a boolean vector, and second the
operand order for VSELECT is backwards from the actual x86 instructions.
To make matters worse, the backend is just using '-1' as the true value
to get the high bit to be set. It doesn't actually symbolically map the
'-1' to anything. But on x86 this isn't quite how it works: there *only*
the high bit is relevant. As a consequence weird non-'-1' values like
0x80 actually "work" once you flip the operands to be backwards.

Anyways, thanks to Hal for helping me sort out what these *should* be.

llvm-svn: 218582
2014-09-28 23:23:55 +00:00
Chandler Carruth 6578f9208b [x86] Fix a really silly bug that I introduced fixing another bug in the
new vector shuffle target DAG combines -- it helps to actually test for
the value you want rather than just using an integer in a boolean
context.

Have I mentioned that I loathe implicit conversions recently? :: sigh ::

llvm-svn: 218576
2014-09-28 06:11:04 +00:00
Chandler Carruth b10c6b8e9e [x86] Fix yet another bug in the new vector shuffle lowering's handling
of widening masks.

We can't widen a zeroing mask unless both elements that would be merged
are either zeroed or undef. This is the only way to widen a mask if it
has a zeroed element.

Also clean up the code here by ordering the checks in a more logical way
and by using the symoblic values for undef and zero. I'm actually torn
on using the symbolic values because the existing code is littered with
the assumption that -1 is undef, and moreover that entries '< 0' are the
special entries. While that works with the values given to these
constants, using the symbolic constants actually makes it a bit more
opaque why this is the case.

llvm-svn: 218575
2014-09-28 03:30:25 +00:00
Chandler Carruth f4b9e6b9d9 [x86] Fix yet another issue with widening vector shuffle elements.
I spotted this by inspection when debugging something else, so I have no
test case what-so-ever, and am not even sure it is possible to
realistically trigger the bug. But this is what was intended here.

llvm-svn: 218565
2014-09-27 08:40:33 +00:00
Chandler Carruth 4d03be1717 [x86] Fix terrible bugs everywhere in the new vector shuffle lowering
and in the target shuffle combining when trying to widen vector
elements.

Previously only one of these was correct, and we didn't correctly
propagate zeroing target shuffle masks (which have a different sentinel
value from undef in non- target shuffle masks now). This isn't just
a missed optimization, this caused us to drop zeroing shuffles on the
floor and miscompile code. The added test case is one example of that.

There are other fixes to the test suite as a consequence of this as well
as restoring the undef elements in some of the masks that were lost when
I brought sanity to the actual *value* of the undef and zero sentinels.

I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW
combining code, but that code really needs to go. It was a nice initial
attempt, but it isn't very principled and the recursive shuffle combiner
is much more powerful.

llvm-svn: 218562
2014-09-27 04:42:44 +00:00
Chandler Carruth 81e6b29f03 [x86] Flip the sentinel values used in the target shuffle mask decoding
to significantly more sane sentinels. Notably, everywhere else in the
backend's representation of shuffles uses '-1' to represent undef. The
target shuffle masks really shouldn't diverge from that, especially as
in a few places they are manipulated by shared code.

This causes us to lose some undef lanes in various test masks. I want to
get these back, but technically it isn't invalid and there are a *lot*
of bugs here so I want to try to establish a saner baseline for fixing
some of the bugs by aligning the specific senitnel values used.

llvm-svn: 218561
2014-09-27 04:42:39 +00:00
Sanjay Patel bdf1e38856 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484

llvm-svn: 218553
2014-09-26 23:01:47 +00:00
Chandler Carruth f572f3b2c0 [x86] Fix a moderately terrifying bug in the new 128-bit shuffle logic
that managed to elude all of my fuzz testing historically. =/

Something changed to allow this code path to actually be exercised and
it was doing bad things. It is especially heavily exercised by the
patterns that emerge when doing AVX shuffles that end up lowered through
the 128-bit code path.

llvm-svn: 218540
2014-09-26 20:41:45 +00:00
Matt Arsenault 2dd3129b0a R600/SI: Use break instead of continue
If an instruction doesn't have src1, it doesn't have src2

llvm-svn: 218536
2014-09-26 17:55:14 +00:00
Matt Arsenault a276c3e053 R600/SI: Add a note about the order of the operands to div_scale
llvm-svn: 218534
2014-09-26 17:55:09 +00:00
Matt Arsenault ee522bf23e R600/SI: Move finding SGPR operand to move to separate function
llvm-svn: 218533
2014-09-26 17:55:06 +00:00
Matt Arsenault 6a0919fb9b R600/SI Allow same SGPR to be used for multiple operands
Instead of moving the first SGPR that is different than the first,
legalize the operand that requires the fewest moves if one
SGPR is used for multiple operands.

This saves extra moves and is also required for some instructions
which require that the same operand be used for multiple operands.

llvm-svn: 218532
2014-09-26 17:55:03 +00:00
Matt Arsenault cb0ac3d1fb R600/SI: Partially move operand legalization to post-isel hook.
Disable the SGPR usage restriction parts of the DAG legalizeOperands.
It now should only be doing immediate folding until it can be replaced
later. The real legalization work is now done by the other
SIInstrInfo::legalizeOperands

llvm-svn: 218531
2014-09-26 17:54:59 +00:00
Matt Arsenault 92befe7996 R600/SI: Implement findCommutedOpIndices
The base implementation of commuteInstruction is used
in some cases, but it turns out this has been broken for a
long time since modifiers were inserted between the real operands.

The base implementation of commuteInstruction also fails on immediates,
which also needs to be fixed.

llvm-svn: 218530
2014-09-26 17:54:54 +00:00
Matt Arsenault 5885bef6cf R600/SI: Don't move operands that are required to be SGPRs
e.g. v_cndmask_b32 requires the condition operand be an SGPR.
If one of the source operands were an SGPR, that would be considered
the one SGPR use and the condition operand would be illegally moved.

llvm-svn: 218529
2014-09-26 17:54:52 +00:00
Matt Arsenault 0bea8d830e R600/SI: Don't assert on exotic operand types
This needs a test, but I'm not sure if it is currently possible and
I originally hit it due to a bug. Right now the only global address
operands have no reason to be VALU instructions, although it
theoretically could be a problem.

llvm-svn: 218528
2014-09-26 17:54:46 +00:00
Matt Arsenault aff65fbca5 R600/SI: Fix using wrong operand indices when commuting
No test since the current SIISelLowering::legalizeOperands
effectively hides this, and the general uses seem to only fire
on SALU instructions which don't have modifiers between
the operands.

When trying to use legalizeOperands immediately after
instruction selection, it now sees a lot more patterns
it did not see before which break on this.

llvm-svn: 218527
2014-09-26 17:54:43 +00:00
Matt Arsenault e50c1c4a64 R600/SI: Remove apparently dead code in legalizeOperands
No tests hit this, and I don't see any way a GlobalAddress
node would survive beyond lowering on SI. It it would, the
move should probably be inserted by selection.

llvm-svn: 218526
2014-09-26 17:54:38 +00:00
Chandler Carruth acd1906446 [x86] The mnemonic is SHUFPS not SHUPFS. =[ I'm very bad at spelling
sadly.

llvm-svn: 218524
2014-09-26 17:27:40 +00:00
Chandler Carruth 0c9ee10d01 [x86] In the new vector shuffle lowering, when trying to do another
layer of tie-breaking sorting, it really helps to check that you're in
a tie first. =] Otherwise the whole thing cycles infinitely. Test case
added, another one found through fuzz testing.

llvm-svn: 218523
2014-09-26 17:24:26 +00:00
Chandler Carruth 5afd4c2603 [x86] Fix a large collection of bugs that crept in as I fleshed out the
AVX support.

New test cases included. Note that none of the existing test cases
covered these buggy code paths. =/ Also, it is clear from this that
SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[

These were all detected by fuzz-testing. (I <3 fuzz testing.)

llvm-svn: 218522
2014-09-26 17:11:02 +00:00
Renato Golin 36c626e33f Elide repeated register operand in Thumb1 instructions
This patch makes the ARM backend transform 3 operand instructions such as
'adds/subs' to the 2 operand version of the same instruction if the first
two register operands are the same.

Example: 'adds r0, r0, #1' will is transformed to 'adds r0, #1'.

Currently for some instructions such as 'adds' if you try to assemble
'adds r0, r0, #8' for thumb v6m the assembler would throw an error message
because the immediate cannot be encoded using 3 bits.

The backend should be smart enough to transform the instruction to
'adds r0, #8', which allows for larger immediate constants.

Patch by Ranjeet Singh.

llvm-svn: 218521
2014-09-26 16:14:29 +00:00
Andrea Di Biagio 196e873cdc [X86][SchedModel] SSE reciprocal square root instruction latencies.
The SSE rsqrt instruction (a fast reciprocal square root estimate) was
grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very
slow) SSE sqrt instruction. For code which uses rsqrt (possibly with
newton-raphson iterations) this poor scheduling was affecting performances.

This patch splits off the rsqrt instruction from the sqrt instruction scheduling
classes and creates new IIC_SSE_RSQER* classes with latency values based on
Agner's table.

Differential Revision: http://reviews.llvm.org/D5370

Patch by Simon Pilgrim.

llvm-svn: 218517
2014-09-26 12:56:44 +00:00
Daniel Sanders 13496c4102 Fix unused variable warning added in r218509
llvm-svn: 218510
2014-09-26 10:45:26 +00:00
Daniel Sanders b3ca3388ca [mips] Generalize the handling of f128 return values to support f128 arguments.
Summary:
This will allow us to handle f128 arguments without duplicating code from
CCState::AnalyzeFormalArguments() or CCState::AnalyzeCallOperands().

No functional change.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5292

llvm-svn: 218509
2014-09-26 10:06:12 +00:00
Robert Khasanov 6d62c0202b [AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables.
Added lowering tests for these instructions.

llvm-svn: 218508
2014-09-26 09:48:50 +00:00
David Majnemer ec44e4d053 Fix build breakage on MSVC 2013
llvm-svn: 218499
2014-09-26 04:47:54 +00:00
David Majnemer de36075b41 Target: Fix build breakage.
No functional change intended.

llvm-svn: 218497
2014-09-26 02:57:05 +00:00
Eric Christopher a9353d1798 Add the first backend support for on demand subtarget creation
based on the Function. This is currently used to implement
mips16 support in the mips backend via the existing module
pass resetting the subtarget.

Things to note:

a) This involved running resetTargetOptions before creating a
new subtarget so that code generation options like soft-float
could be recognized when creating the new subtarget. This is
to deal with initialization code in isel lowering that only
paid attention to the initial value.

b) Many of the existing testcases weren't using the soft-float
feature correctly. I've corrected these based on the check
values assuming that was the desired behavior.

c) The mips port now pays attention to the target-cpu and
target-features strings when generating code for a particular
function. I've removed these from one function where the
requested cpu and features didn't match the check lines in
the testcase.

llvm-svn: 218492
2014-09-26 01:44:08 +00:00
Eric Christopher 3976f78247 Move resetTargetOptions from taking a MachineFunction to a Function
since we are accessing the TargetMachine that we're a member
function of.

llvm-svn: 218489
2014-09-26 01:28:10 +00:00
Matt Arsenault 3a99759498 R600/SI: Fix emitting trailing whitespace after s_waitcnt
llvm-svn: 218486
2014-09-26 01:09:46 +00:00
Adam Nemet ce465421d7 [AVX512] Simplify use of !con()
No change in X86.td.expanded.

llvm-svn: 218485
2014-09-26 00:53:12 +00:00
Adam Nemet f7988d7364 [AVX512] Pull pattern for subvector extract into the instruction definition
No functional change.

I initially thought that pulling the Pat<> into the instruction pattern was
not possible because it was doing a transform on the index in order to convert
it from a per-element (extract_subvector) index into a per-chunk (vextract*x4)
index.

Turns out this also works inside the pattern because the vextract_extract
PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the
index in $idx goes through the same conversion.

The existing test CodeGen/X86/avx512-insert-extract.ll extended in the
previous commit provides coverage for this change.

llvm-svn: 218480
2014-09-25 23:48:49 +00:00
Adam Nemet 55536c6a8f [AVX512] Refactor subvector extracts
No functional change.

These are now implemented as two levels of multiclasses heavily relying on the
new X86VectorVTInfo class.  The multiclass at the first level that is called
with float or int provides the 128 or 256 bit subvector extracts.  The second
level provides the register and memory variants and some more Pat<>s.

I've compared the td.expanded files before and after.  One change is that
ExeDomain for 64x4 is SSEPackedDouble now.  I think this is correct, i.e. a
bugfix.

(BTW, this is the change that was blocked on the recent tablegen fix.  The
class-instance values X86VectorVTInfo inside vextract_for_type weren't
properly evaluated.)

Part of <rdar://problem/17688758>

llvm-svn: 218478
2014-09-25 23:48:45 +00:00
Adam Nemet 6ea09eb148 [AVX512] Fix typo
F->I in VEXTRACTF32x4rr.

llvm-svn: 218477
2014-09-25 23:48:42 +00:00
Tom Stellard 1fa1ce6112 ARM: Remove unneeded check for MI->hasPostISelHook()
llvm-svn: 218459
2014-09-25 18:59:23 +00:00
Tom Stellard 7980fc8562 R600/SI: Add support for global atomic add
llvm-svn: 218457
2014-09-25 18:30:26 +00:00
Robin Morisset 810739d174 Lower idempotent RMWs to fence+load
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).

This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.

Details of the optimization are discussed in D5091.

Test Plan: make check-all + a new test

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5422

llvm-svn: 218455
2014-09-25 17:27:43 +00:00
Sid Manning 31f7125562 Add missing attributes !cmp.[eq,gt,gtu] instructions.
These instructions do not indicate they are extendable or the
number of bits in the extendable operand.  Rename to match
architected names.  Add a testcase for the intrinsics.

llvm-svn: 218453
2014-09-25 13:09:54 +00:00
Daniel Sanders 621589e7c0 Add llvm_unreachables() for [ASZ]ExtUpper to X86FastISel.cpp to appease the buildbots.
llvm-svn: 218452
2014-09-25 13:08:51 +00:00
Daniel Sanders ae275e38a2 [mips] Add CCValAssign::[ASZ]ExtUpper and CCPromoteToUpperBitsInType and handle struct's correctly on big-endian N32/N64 return values.
Summary:
The N32/N64 ABI's require that structs passed in registers are laid out
such that spilling the register with 'sd' places the struct at the lowest
address. For little endian this is trivial but for big-endian it requires
that structs are shifted into the upper bits of the register.

We also require that structs passed in registers have the 'inreg'
attribute for big-endian N32/N64 to work correctly. This is because the
tablegen-erated calling convention implementation only has access to the
lowered form of struct arguments (one or more integers of up to 64-bits
each) and is unable to determine the original type.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5286

llvm-svn: 218451
2014-09-25 12:15:05 +00:00
Renato Golin f5dd1dacb6 Add aliases for VAND imm to VBIC ~imm
On ARM NEON, VAND with immediate (16/32 bits) is an alias to VBIC ~imm with
the same type size. Adding that logic to the parser, and generating VBIC
instructions from VAND asm files.

This patch also fixes the validation routines for NEON splat immediates which
were wrong.

Fixes PR20702.

llvm-svn: 218450
2014-09-25 11:31:24 +00:00
Chandler Carruth 0a6e961efd [x86] Teach the new vector shuffle lowering to use AVX2 instructions for
v4f64 and v8f32 shuffles when they are lane-crossing. We have fully
general lane-crossing permutation functions in AVX2 that make this easy.

Part of this also changes exactly when and how these vectors are split
up when we don't have AVX2. This isn't always a win but it usually is
a win, so on the balance I think its better. The primary regressions are
all things that just need to be fixed anyways such as modeling when
a blend can be completely accomplished via VINSERTF128, etc.

Also, this highlights one of the few remaining big features: we do
a really poor job of inserting elements into AVX registers efficiently.

This completes almost all of the big tricks I have in mind for AVX2. The
only things left that I plan to add:

1) element insertion smarts
2) palignr and other fairly specialized lowerings when they happen to
   apply

llvm-svn: 218449
2014-09-25 11:03:55 +00:00
Chandler Carruth e91d68c475 [x86] Teach the new vector shuffle lowering a fancier way to lower
256-bit vectors with lane-crossing.

Rather than immediately decomposing to 128-bit vectors, try flipping the
256-bit vector lanes, shuffling them and blending them together. This
reduces our worst case shuffle by a pretty significant margin across the
board.

llvm-svn: 218446
2014-09-25 10:21:15 +00:00
Oliver Stannard 3256b26ef2 [Thumb2] BXJ should be undefined for v7M, v8A
The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not
defined for v7M or v8A. It is defined for all other Thumb2-supporting
architectures (v6T2, v7A and v7R).

llvm-svn: 218445
2014-09-25 10:02:05 +00:00
Chandler Carruth 02387122e0 [x86] Fix an oversight in the v8i32 path of the new vector shuffle
lowering where it only used the mask of the low 128-bit lane rather than
the entire mask.

This allows the new lowering to correctly match the unpack patterns for
v8i32 vectors.

For reference, the reason that we check for the the entire mask rather
than checking the repeated mask is because the repeated masks don't
abide by all of the invariants of normal masks. As a consequence, it is
safer to use the full mask with functions like the generic equivalence
test.

llvm-svn: 218442
2014-09-25 04:10:27 +00:00
Chandler Carruth 8140158cb5 [x86] Rearrange the code for v16i16 lowering a bit for clarity and to
reduce the amount of checking we do here.

The first realization is that only non-crossing cases between 128-bit
lanes are handled by almost the entire function. It makes more sense to
handle the crossing cases first.

THe second is that until we actually are going to generate fancy shared
lowering strategies that use the repeated semantics of the v8i16
lowering, we should waste time checking for repeated masks. It is
simplest to directly test for the entire unpck masks anyways, so we
gained nothing from this.

This also matches the structure of v32i8 more closely.

No functionality changed here.

llvm-svn: 218441
2014-09-25 04:03:22 +00:00
Chandler Carruth d8f528adb8 [x86] Implement AVX2 support for v32i8 in the new vector shuffle
lowering.

This completes the basic AVX2 feature support, but there are still some
improvements I'd like to do to really get the last mile of performance
here.

llvm-svn: 218440
2014-09-25 02:52:12 +00:00
Chandler Carruth d355369dbb [x86] Remove the defunct X86ISD::BLENDV entry -- we use vector selects
for this now.

Should prevent folks from running afoul of this and not knowing why
their code won't instruction select the way I just did...

llvm-svn: 218436
2014-09-25 01:16:01 +00:00
Chandler Carruth a577bc26b6 [x86] Fix the v16i16 blend logic I added in the prior commit and add the
missing test cases for it.

Unsurprisingly, without test cases, there were bugs here. Surprisingly,
this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV.
It isn't wired to anything. Oops. I'll fix than next.

llvm-svn: 218434
2014-09-25 01:13:38 +00:00
Akira Hatanaka 8cc48bd159 [X86,AVX] Add an isel pattern for X86VBroadcast.
This fixes PR21050 and rdar://problem/18434607.

llvm-svn: 218431
2014-09-25 00:26:15 +00:00
Chandler Carruth 98443d89b9 [x86] Implement v16i16 support with AVX2 in the new vector shuffle
lowering.

This also implements the fancy blend lowering for v16i16 using AVX2 and
teaches the X86 backend to print shuffle masks for 256-bit PSHUFB
and PBLENDW instructions. It also makes the mask decoding correct for
PBLENDW instructions. The yaks, they are legion.

Tests are updated accordingly. There are some missing tests for the
VBLENDVB lowering, but I'll add those in a follow-up as this commit has
accumulated enough cruft already.

llvm-svn: 218430
2014-09-25 00:24:19 +00:00
Chandler Carruth edcba62b4a [x86] Factor out the logic to generically decombose a vector shuffle
into unblended shuffles and a blend.

This is the consistent fallback for the lowering paths that have fast
blend operations available, and its getting quite repetitive.

No functionality changed.

llvm-svn: 218399
2014-09-24 18:20:09 +00:00
Moritz Roth f5d0c7c2c0 [Thumb] Make load/store optimizer less conservative.
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.

This is effectively a (heavily bug-fixed) rewrite of r208992.

llvm-svn: 218386
2014-09-24 16:35:50 +00:00
Oliver Stannard 1ae8b476f4 [Thumb] 32-bit encodings of 'cps' are not valid for v7M
v7M only allows the 16-bit encoding of the 'cps' (Change Processor
State) instruction, and does not have the 32-bit encoding which is
valid from v6T2 onwards.

llvm-svn: 218382
2014-09-24 14:20:01 +00:00
Aaron Ballman f086a14d53 Silencing an "enumeral and non-enumeral type in conditional expression" warning. NFC.
llvm-svn: 218381
2014-09-24 13:54:56 +00:00
Chandler Carruth e7e9c04ddf [x86] Teach the instruction lowering to add comments describing constant
pool data being loaded into a vector register.

The comments take the form of:

  # ymm0 = [a,b,c,d,...]
  # xmm1 = <x,y,z...>

The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.

My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.

Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.

llvm-svn: 218377
2014-09-24 09:39:41 +00:00
Chandler Carruth 7b688c6884 [x86] More refactoring of the shuffle comment emission. The previous
attempt didn't work out so well. It looks like it will be much better
for introducing extra logic to find a shuffle mask if the finding logic
is totally separate. This also makes it easy to sink the opcode logic
completely out of the routine so we don't re-dispatch across it.

Still no functionality changed.

llvm-svn: 218363
2014-09-24 03:06:37 +00:00
Chandler Carruth edf50212df [x86] Bypass the shuffle mask comment generation when not using verbose
asm. This can be somewhat expensive and there is no reason to do it
outside of tests or debugging sessions. I'm also likely to make it
significantly more expensive to support more styles of shuffles.

llvm-svn: 218362
2014-09-24 03:06:34 +00:00
Chandler Carruth ab8b37a9d2 [x86] Hoist the logic for extracting the relevant bits of information
from the MachineInstr into the caller which is already doing a switch
over the instruction.

This will make it more clear how to compute different operands to feed
the comment selection for example.

Also, in a drive-by-fix, don't append an empty comment string (which is
a no-op ultimately).

No functionality changed.

llvm-svn: 218361
2014-09-24 02:24:41 +00:00
Matt Arsenault 2c41987490 R600/SI: Add new helper isSGPRClassID
Move these into header since they are trivial

llvm-svn: 218360
2014-09-24 02:17:12 +00:00
Matt Arsenault 262407bc2f R600/SI: Fix hardcoded and wrong operand numbers.
Also fix leftover debug printing

llvm-svn: 218359
2014-09-24 02:17:09 +00:00
Matt Arsenault 69612d6027 R600/SI: Enable named operand table for SALU instructions
llvm-svn: 218358
2014-09-24 02:17:06 +00:00
Chandler Carruth 0b682d42de [x86] Start refactoring the comment printing logic in the MC lowering of
vector shuffles.

This is just the beginning by hoisting it into its own function and
making use of early exit to dramatically simplify the flow of the
function. I'm going to be incrementally refactoring this until it is
a bit less magical how this applies to other instructions, and I can
teach it how to dig a shuffle mask out of a register. Then I plan to
hook it up to VPERMD so we get our mask comments for it.

No functionality changed yet.

llvm-svn: 218357
2014-09-24 02:16:12 +00:00
Tom Stellard 744b99b476 R600/SI: Enable selecting SALU inside branches
We can do this now that the FixSGPRLiveRanges pass is working.

llvm-svn: 218353
2014-09-24 01:33:28 +00:00
Tom Stellard deb3f9e643 R600/SI: Move PHIs that define SGPRs to the VALU in most cases
This fixes a bug that is uncovered by a future commit and will
be tested by the test/CodeGen/R600/sgpr-control-flow.ll test case.

llvm-svn: 218352
2014-09-24 01:33:26 +00:00
Tom Stellard 60024a0558 R600/SI: Fix the FixSGPRLiveRanges pass
The previous implementation was extending the live range of SGPRs
by modifying the live intervals directly.  This was causing a lot
of machine verification errors when the machine scheduler was enabled.

The new implementation adds pseudo instructions with implicit uses to
extend the live ranges of SGPRs, which works much better.

llvm-svn: 218351
2014-09-24 01:33:24 +00:00
Tom Stellard be507fb5d3 R600/SI: Mark EXEC_LO and EXEC_HI as reserved
These registers can be allocated and used like other 32-bit registers,
but it seems like a likely source for bugs.

llvm-svn: 218350
2014-09-24 01:33:23 +00:00
Tom Stellard 9a88593ed0 R600/SI: Fix SIRegisterInfo::getPhysRegSubReg()
Correctly handle special registers: EXEC, EXEC_LO, EXEC_HI, VCC_LO,
VCC_HI, and M0.  The previous implementation would assertion fail
when passed these registers.

llvm-svn: 218349
2014-09-24 01:33:22 +00:00
Tom Stellard 96468903d4 R600/SI: Implement VGPR register spilling for compute at -O0 v3
VGPRs are spilled to LDS.  This still needs more testing, but
we need to at least enable it at -O0, because the fast register
allocator spills all registers that are live at the end of blocks
and without this some future commits will break the
flat-address-space.ll test.

v2: Only calculate thread id once

v3: Move insertion of spill instructions to
    SIRegisterInfo::eliminateFrameIndex()
llvm-svn: 218348
2014-09-24 01:33:17 +00:00
Chandler Carruth 9bd10e7492 [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with
the native AVX2 instructions.

Note that the test case is really frustrating here because VPERMD
requires the mask to be in the register input and we don't produce
a comment looking through that to the constant pool. I'm going to
attempt to improve this in a subsequent commit, but not sure if I will
succeed.

llvm-svn: 218347
2014-09-24 01:24:44 +00:00
Chandler Carruth fd11815a7d [x86] Fix a really terrible bug in the repeated 128-bin-lane shuffle
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.

Fortunately, this almost always DTRT and disabled detecting repeated
patterns. But not always. =/ This patch introduces a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.

llvm-svn: 218346
2014-09-24 01:03:57 +00:00
Chandler Carruth df2e421845 [x86] Teach the new vector shuffle lowering to lower v4i64 vector
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.

Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.

llvm-svn: 218338
2014-09-23 22:39:02 +00:00
Jim Grosbach 57fd2623c3 AArch64: allow constant expressions for shifted reg literals
e.g., add w1, w2, w3, lsl #(2 - 1)

This sort of thing comes up in pre-processed assembly playing macro games.
Still validate that it's an assembly time constant. The early exit error check
was just a bit overzealous and disallowed a left paren.

rdar://18430542

llvm-svn: 218336
2014-09-23 22:16:02 +00:00
Chandler Carruth 9a94bd6fa4 [x86] Teach the rest of the 'target shuffle' machinery about blends and
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.

Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.

llvm-svn: 218335
2014-09-23 22:14:14 +00:00
Tom Stellard 73ae1cb59a R600/SI: Clean up checks for legality of immediate operands
There are new register classes VCSrc_* which represent operands that
can take an SGPR, VGPR or inline constant.  The VSrc_* class is now used
to represent operands that can take an SGPR, VGPR, or a 32-bit
immediate.

This allows us to have more accurate checks for legality of
immediates, since before we had no way to distinguish between operands
that supported any 32-bit immediate and operands which could only
support inline constants.

llvm-svn: 218334
2014-09-23 21:26:25 +00:00
Robin Morisset 6dbbbc28b0 [X86] Make wide loads be managed by AtomicExpand
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.

Only one functionnal change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).

Test Plan: make check-all (lots of tests for this functionality already exist)

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5404

llvm-svn: 218332
2014-09-23 20:59:25 +00:00
Robin Morisset 2212996936 [Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.

I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.

Test Plan: new test + make check-all

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5180

llvm-svn: 218331
2014-09-23 20:46:49 +00:00
Robin Morisset dedef3325f Add AtomicExpandPass::bracketInstWithFences, and use it whenever getInsertFencesForAtomic would trigger in SelectionDAGBuilder
Summary:
The goal is to eventually remove all the code related to getInsertFencesForAtomic
in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works
mostly by accident because the backends are overly conservative), and repeats the
same logic that goes in emitLeading/TrailingFence.

In this patch, I make AtomicExpandPass insert the fences as it knows better
where to put them. Because this requires getting the fences and not just
passing an IRBuilder around, I had to change the return type of
emitLeading/TrailingFence.
This code only triggers on ARM for now. Because it is earlier in the pipeline
than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so
SelectionDAGBuilder does not add barriers anymore on ARM.

If this patch is accepted I plan to implement emitLeading/TrailingFence for all
backends that setInsertFencesForAtomic(true), which will allow both making them
less conservative and simplifying SelectionDAGBuilder once they are all using
this interface.

This should not cause any functionnal change so the existing tests are used
and not modified.

Test Plan: make check-all, benefits from existing tests of atomics on ARM

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5179

llvm-svn: 218329
2014-09-23 20:31:14 +00:00
Lang Hames a633a6cdb1 [MCJIT] Remove PPCRelocations.h - it's no longer used.
This was overlooked in r218320, which removed the relocation headers for other
targets. Thanks to Ulrich Weigand for catching it.

llvm-svn: 218327
2014-09-23 19:17:48 +00:00
Robin Morisset a7b357fed1 Just add a fixme about a possibly faster implementation of some atomic loads on some ARM processors
llvm-svn: 218326
2014-09-23 18:33:21 +00:00
Matt Arsenault 4364fef82f Fix typo
llvm-svn: 218324
2014-09-23 18:30:57 +00:00
Chandler Carruth adcfec995c [x86] Teach the new shuffle lowering's blend functionality to use AVX2's
VPBLENDD where appropriate even on 128-bit vectors.

According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.

Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.

llvm-svn: 218322
2014-09-23 18:16:12 +00:00
Lang Hames d5f496d57c [MCJIT] Nuke MachineRelocation and MachineCodeEmitter. Now that the old JIT is
gone they're no longer needed.

llvm-svn: 218320
2014-09-23 18:08:47 +00:00
Oliver Stannard c546625c4f Fix segfault in AArch64 backend with -g and -mbig-endian
Fix a null pointer dereference when trying to swap the endianness of
fixups in the .eh_frame section in the AArch64 backend.

llvm-svn: 218311
2014-09-23 15:38:11 +00:00
Sid Manning bd8bd484c3 Loop instead of individual def's for each GPR.
Differential Revision: http://reviews.llvm.org/D5450

llvm-svn: 218305
2014-09-23 13:55:50 +00:00
Chandler Carruth 40592d2dec [x86] Teach the vector comment parsing and printing to correctly handle
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.

A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.

The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.

llvm-svn: 218301
2014-09-23 11:15:19 +00:00
Chandler Carruth 6d5916a2d7 [x86] Teach the AVX1 path of the new vector shuffle lowering one more
trick that I missed.

VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.

However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.

There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.

llvm-svn: 218300
2014-09-23 10:08:29 +00:00
Chandler Carruth ed5dfff865 [x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the
td pattern). Currently we only model the immediate operand variation of
VPERMILPS and VPERMILPD, we should make that clear in the pseudos used.
Will be adding support for the variable mask variant in my next commit.

llvm-svn: 218282
2014-09-22 22:29:42 +00:00
Kaelyn Takata cecdff6512 Fix a "typo" from my previous commit.
llvm-svn: 218281
2014-09-22 22:17:59 +00:00
Kaelyn Takata ba0a1e0520 Silence unused variable warnings in the new stub functions that occur
when assertions are disabled.

llvm-svn: 218280
2014-09-22 22:14:13 +00:00
Chandler Carruth 252debeb0b [x86] Stub out the integer lowering of 256-bit vectors with AVX2
support. No interesting functionality yet, but this will let me
implement one vector type at a time.

llvm-svn: 218277
2014-09-22 21:45:57 +00:00
Juergen Ributzka 27e959d7b2 [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1).
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.

This should fix a bug found by Chad.

llvm-svn: 218275
2014-09-22 21:08:53 +00:00
Ehsan Akhgari bb6bb07d18 ms-inline-asm: Fix parsing label names inside bracket expressions
Summary:
This fixes a couple of issues.  One is ensuring that AOK_Label rewrite
rules have a lower priority than AOK_Skip rules, as AOK_Skip needs to
be able to skip the brackets properly.  The other part of the fix ensures
that we don't overwrite Identifier when looking up the identifier, and
that we use the locally available information to generate the AOK_Label
rewrite in ParseIntelIdentifier.  Doing that in CreateMemForInlineAsm
would be problematic since the Start location there may point to the
beginning of a bracket expression, and not necessarily the beginning of
an identifier.

This also means that we don't need to carry around the InternlName field,
which helps simplify the code.

Test Plan: This will be tested on the clang side.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5445

llvm-svn: 218270
2014-09-22 20:40:36 +00:00
Sanjay Patel 7939d7229d Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2).
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, 
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.

Differential Revision: http://reviews.llvm.org/D5347

llvm-svn: 218263
2014-09-22 18:54:01 +00:00