Commit Graph

12952 Commits

Author SHA1 Message Date
Chandler Carruth 3bedf4407b [x86] Update the order of instructions after I switched to a bitcast
helper that skips creating a cast when it isn't necessary.

It's really somewhat concerning that this was caused by the the presence
of a no-op bitcast, but...

llvm-svn: 238642
2015-05-30 06:02:37 +00:00
Chandler Carruth 2599da3cfd [x86] Restore the bitcasts I removed when refactoring this to avoid
shifting vectors of bytes as x86 doesn't have direct support for that.

This removes a bunch of redundant masking in the generated code for SSE2
and SSE3.

In order to avoid the really significant code size growth this would
have triggered, I also factored the completely repeatative logic for
shifting and masking into two lambdas which in turn makes all of this
much easier to read IMO.

llvm-svn: 238637
2015-05-30 04:05:11 +00:00
Chandler Carruth 6ba9730a4e [x86] Implement a faster vector population count based on the PSHUFB
in-register LUT technique.

Summary:
A description of this technique can be found here:
http://wm.ite.pl/articles/sse-popcount.html

The core of the idea is to use an in-register lookup table and the
PSHUFB instruction to compute the population count for the low and high
nibbles of each byte, and then to use horizontal sums to aggregate these
into vector population counts with wider element types.

On x86 there is an instruction that will directly compute the horizontal
sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily.
Various tricks are used to get vNi32 and vNi16 from the vNi8 that the
LUT computes.

The base implemantion of this, and most of the work, was done by Bruno
in a follow up to D6531. See Bruno's detailed post there for lots of
timing information about these changes.

I have extended Bruno's patch in the following ways:

0) I committed the new tests with baseline sequences so this shows
   a diff, and regenerated the tests using the update scripts.

1) Bruno had noticed and mentioned in IRC a redundant mask that
   I removed.

2) I introduced a particular optimization for the i32 vector cases where
   we use PSHL + PSADBW to compute the the low i32 popcounts, and PSHUFD
   + PSADBW to compute doubled high i32 popcounts. This takes advantage
   of the fact that to line up the high i32 popcounts we have to shift
   them anyways, and we can shift them by one fewer bit to effectively
   divide the count by two. While the PSHUFD based horizontal add is no
   faster, it doesn't require registers or load traffic the way a mask
   would, and provides more ILP as it happens on different ports with
   high throughput.

3) I did some code cleanups throughout to simplify the implementation
   logic.

4) I refactored it to continue to use the parallel bitmath lowering when
   SSSE3 is not available to preserve the performance of that version on
   SSE2 targets where it is still much better than scalarizing as we'll
   still do a bitmath implementation of popcount even in scalar code
   there.

With #1 and #2 above, I analyzed the result in IACA for sandybridge,
ivybridge, and haswell. In every case I measured, the throughput is the
same or better using the LUT lowering, even v2i64 and v4i64, and even
compared with using the native popcnt instruction! The latency of the
LUT lowering is often higher than the latency of the scalarized popcnt
instruction sequence, but I think those latency measurements are deeply
misleading. Keeping the operation fully in the vector unit and having
many chances for increased throughput seems much more likely to win.

With this, we can lower every integer vector popcount implementation
using the LUT strategy if we have SSSE3 or better (and thus have
PSHUFB). I've updated the operation lowering to reflect this. This also
fixes an issue where we were scalarizing horribly some AVX lowerings.

Finally, there are some remaining cleanups. There is duplication between
the two techniques in how they perform the horizontal sum once the byte
population count is computed. I'm going to factor and merge those two in
a separate follow-up commit.

Differential Revision: http://reviews.llvm.org/D10084

llvm-svn: 238636
2015-05-30 03:20:59 +00:00
Chandler Carruth c2e400de83 [x86] Restructure the parallel bitmath lowering of popcount into
a separate routine, generalize it to work for all the integer vector
sizes, and do general code cleanups.

This dramatically improves lowerings of byte and short element vector
popcount, but more importantly it will make the introduction of the
LUT-approach much cleaner.

The biggest cleanup I've done is to just force the legalizer to do the
bitcasting we need. We run these iteratively now and it makes the code
much simpler IMO. Other changes were minor, and mostly naming and
splitting things up in a way that makes it more clear what is going on.

The other significant change is to use a different final horizontal sum
approach. This is the same number of instructions as the old method, but
shifts left instead of right so that we can clear everything but the
final sum with a single shift right. This seems likely better than
a mask which will usually have to read the mask from memory. It is
certaily fewer u-ops. Also, this will be temporary. This and the LUT
approach share the need of horizontal adds to finish the computation,
and we have more clever approaches than this one that I'll switch over
to.

llvm-svn: 238635
2015-05-30 03:20:55 +00:00
Reid Kleckner e6531a5588 [WinEH] Adjust the 32-bit SEH prologue to better match reality
It turns out that _except_handler3 and _except_handler4 really use the
same stack allocation layout, at least today. They just make different
choices about encoding the LSDA.

This is in preparation for lowering the llvm.eh.exceptioninfo().

llvm-svn: 238627
2015-05-29 22:57:46 +00:00
Reid Kleckner 173a72524f Disable FP elimination in funcs using 32-bit MSVC EH personalities
The value in 'ebp' acts as an implicit argument to the outlined
handlers, and is recovered with frameaddress(1).

llvm-svn: 238619
2015-05-29 21:58:11 +00:00
Matthias Braun 165d467125 MachineCopyPropagation: Remove the copies instead of using KILL instructions.
For some history here see the commit messages of r199797 and r169060.

The original intent was to fix cases like:

%EAX<def> = COPY %ECX<kill>, %RAX<imp-def>
%RCX<def> = COPY %RAX<kill>

where simply removing the copies would have RCX undefined as in terms of
machine operands only the ECX part of it is defined. The machine
verifier would complain about this so 169060 changed such COPY
instructions into KILL instructions so some super-register imp-defs
would be preserved. In r199797 it was finally decided to always do this
regardless of super-register defs.

But this is wrong, consider:
R1 = COPY R0
...
R0 = COPY R1
getting changed to:
R1 = KILL R0
...
R0 = KILL R1

It now looks like R0 dies at the first KILL and won't be alive until the
second KILL, while in reality R0 is alive and must not change in this
part of the program.

As this only happens after register allocation there is not much code
still performing liveness queries so the issue was not noticed.  In fact
I didn't manage to create a testcase for this, without unrelated changes
I am working on at the moment.

The fix is simple: As of r223896 the MachineVerifier allows reads from
partially defined registers, so the whole transforming COPY->KILL thing
is not necessary anymore. This patch also changes a similar (but more
benign case as the def and src are the same register) case in the
VirtRegRewriter.

Differential Revision: http://reviews.llvm.org/D10117

llvm-svn: 238588
2015-05-29 18:19:25 +00:00
Nemanja Ivanovic 376e17364f Add support for VSX FMA single-precision instructions to the PPC back end
This patch corresponds to review:
http://reviews.llvm.org/D9941

It adds the various FMA instructions introduced in the version 2.07 of
the ISA along with the testing for them. These are operations on single
precision scalar values in VSX registers.

llvm-svn: 238578
2015-05-29 17:13:25 +00:00
Alex Lorenz 09b832cac5 MIR Serialization: use correct line and column numbers for LLVM IR errors.
This commit translates the line and column numbers for LLVM IR
errors from the numbers in the YAML block scalar to the numbers 
in the MIR file so that the MIRParser users can report LLVM IR 
errors with the correct line and column numbers.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10108

llvm-svn: 238576
2015-05-29 17:05:41 +00:00
Reid Kleckner 1d3d4adbb9 [WinEH] Emit EH tables for __CxxFrameHandler3 on 32-bit x86
Small (really small!) C++ exception handling examples work on 32-bit x86
now.

This change disables the use of .seh_* directives in WinException when
CFI is not in use. It also uses absolute symbol references in the tables
instead of imagerel32 relocations.

Also fixes a cache invalidation bug in MMI personality classification.

llvm-svn: 238575
2015-05-29 17:00:57 +00:00
Jingyue Wu 995dde2799 [NVPTXFavorNonGenericAddrSpaces] recursively trace into GEP and BitCast
Summary:
This patch allows NVPTXFavorNonGenericAddrSpaces to remove addrspacecast
from longer chains consisting of GEPs and BitCasts. For example, it can
now optimize

  %0 = addrspacecast [10 x float] addrspace(3)* @a to [10 x float]*
  %1 = gep [10 x float]* %0, i64 0, i64 %i
  %2 = bitcast float* %1 to i32*
  %3 = load i32* %2 ; emits ld.u32

to

  %0 = gep [10 x float] addrspace(3)* @a, i64 0, i64 %i
  %1 = bitcast float addrspace(3)* %0 to i32 addrspace(3)*
  %3 = load i32 addrspace(3)* %1 ; emits ld.shared.f32

Test Plan: @ld_int_from_global_float in access-non-generic.ll

Reviewers: broune, eliben, jholewinski, meheff

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10074

llvm-svn: 238574
2015-05-29 17:00:27 +00:00
Colin LeMahieu 68d967d92e [Hexagon] Disassembling, printing, and emitting instructions a whole-bundle at a time which is the semantic unit for Hexagon. Fixing tests to use the new format. Disabling tests in the direct object emission path for a followup patch.
llvm-svn: 238556
2015-05-29 14:44:13 +00:00
Quentin Colombet 5f834c2260 Add a test for the MachineCopyPropagation change landed in r238518.
llvm-svn: 238537
2015-05-29 01:40:00 +00:00
Chandler Carruth 39691c41bf [x86] Move the vector popcount tests into non-ISA files, and instead
organize them by the width of vector.

This makes it a lot easier to see that we're covering all of the vector
types but not doing so excessively. This also adds tests across the
spectrum of SSE versions in addition to the AVX versions.

If you're really tired of seeing the *massive* sprawl of scalarized code
for this, don't worry, I'm just about to land Bruno's patch that
dramatically improve the situation for SSSE3 and newer.

llvm-svn: 238520
2015-05-28 22:46:48 +00:00
Alex Lorenz 78d7831b0f MIR Serialization: print and parse machine function names.
This commit introduces a serializable structure called
'llvm::yaml::MachineFunction' that stores the machine
function's name. This structure will mirror the machine 
function's state in the future.

This commit prints machine functions as YAML documents
containing a YAML mapping that stores the state of a machine
function. This commit also parses the YAML documents
that contain the machine functions.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D9841

llvm-svn: 238519
2015-05-28 22:41:12 +00:00
David Majnemer 4e6438c534 Add testcase for r238503.
llvm-svn: 238515
2015-05-28 22:12:27 +00:00
Reid Kleckner fe4d491bd9 [WinEH] Start inserting state number stores for C++ EH
This moves all the state numbering code for C++ EH to WinEHPrepare so
that we can call it from the X86 state numbering IR pass that runs
before isel.

Now we just call the same state numbering machinery and insert a bunch
of stores. It also populates MachineModuleInfo with information about
the current function.

llvm-svn: 238514
2015-05-28 22:00:24 +00:00
Reid Kleckner 80956a0142 Disable x86 tail call optimizations that jump through GOT
For x86 targets, do not do sibling call optimization when materializing
the callee's address would require a GOT relocation. We can still do
tail calls to internal functions, hidden functions, and protected
functions, because they do not require this kind of relocation. It is
still possible to get GOT relocations when the user explicitly asks for
it with musttail or -tailcallopt, both of which are supposed to
guarantee TCO.

Based on a patch by Chih-hung Hsieh.

Reviewers: srhines, timmurray, danalbert, enh, void, nadav, rnk

Subscribers: joerg, davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D9799

llvm-svn: 238487
2015-05-28 20:44:28 +00:00
Daniel Sanders b34dab3d00 Revert r238427 - [mips] Make TTypeEncoding indirect to allow .eh_frame to be read-only.
It caused a smaller number of failures than the previous attempt at committing but still caused a couple on the llvm-linux-mips builder. Reverting while I investigate the remainder.

llvm-svn: 238483
2015-05-28 20:30:32 +00:00
Peter Collingbourne 450fbee6b2 Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing these as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach
the register allocator to allocate the registers in ascending order. This
never got implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achiveing better codegen is to create LDM/STM
instructions with identical sets of virtual registers, let the register
allocator pick arbitrary registers and order register lists when printing an
MCInst. This approach also avoids the need to repeatedly calculate offsets
which ultimately ought to be eliminated pre-RA in order to decrease register
pressure.

This is implemented by lowering the memcpy intrinsic to a series of SD-only
MCOPY pseudo-instructions which performs a memory copy using a given number
of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a
little unusual, but it avoids the need to encode register lists in the SD,
and we can take advantage of SD use lists to decide whether to use the _UPD
variant of the instructions.

Fixes PR9199.

Differential Revision: http://reviews.llvm.org/D9508

llvm-svn: 238473
2015-05-28 20:02:45 +00:00
Daniel Sanders 3985530328 [mips] Make TTypeEncoding indirect to allow .eh_frame to be read-only.
Summary:
Following on from r209907 which made personality encodings indirect, do the
same for TType encodings. This fixes the case where a try/catch block needs
to generate references to, for example, std::exception in the
.gcc_except_table.

Reviewers: petarj

Reviewed By: petarj

Subscribers: srhines, joerg, tberghammer, llvm-commits

Differential Revision: http://reviews.llvm.org/D9669

llvm-svn: 238427
2015-05-28 14:52:15 +00:00
Chandler Carruth 88862eaefa [x86] Refactor the tests for popcnt.
Extracted from the D6531 patch by Bruno Cardoso Lopes, and re-generated
to reflect the current state of the world. This should let Bruno's D6531
actually show the delta between the approaches by running the x86 test
case update script after re-building.

llvm-svn: 238391
2015-05-28 02:40:15 +00:00
Alex Lorenz 2bdb4e1063 Resubmit r237954 (MIR Serialization: print and parse LLVM IR using MIR format).
This commit a 3rd attempt at comitting the initial MIR serialization patch.
The first commit (r237708) was reverted in 237730. Then the second commit
(r237954) was reverted in r238007, as the MIR library under CodeGen caused
a circular dependency where the CodeGen library depended on MIR and MIR
library depended on CodeGen.

This commit has fixed the dependencies between CodeGen and MIR by
reorganizing the MIR serialization code - the code that prints out
MIR has been moved to CodeGen, and the MIR library has been renamed
to MIRParser. Now the CodeGen library doesn't depend on the
MIRParser library, thus the circular dependency no longer exists.

--Original Commit Message--

MIR Serialization: print and parse LLVM IR using MIR format.

This commit is the initial commit for the MIR serialization project.
It creates a new library under CodeGen called 'MIR'. This new
library adds a new machine function pass that prints out the LLVM IR
using the MIR format. This pass is then added as a last pass when a
'stop-after' option is used in llc. The new library adds the initial
functionality for parsing of MIR files as well. This commit also
extends the llc tool so that it can recognize and parse MIR input files.

Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames

Differential Revision: http://reviews.llvm.org/D9616 

llvm-svn: 238341
2015-05-27 18:02:19 +00:00
Elena Demikhovsky 86c7b46680 AVX-512: Fixed a bug in extracting subvector from v64i1
By Igor Breger (igor.breger@intel.com)

llvm-svn: 238322
2015-05-27 14:09:33 +00:00
Daniel Sanders 8ef465f4bb Revert r238190 and r238197: [mips] Make TTypeEncoding indirect to allow .eh_frame to be read-only.
This broke the llvm-mips-linux builder and several of our out-of-tree builders.
Initial investigations show that the commit probably isn't the problem but
reverting anyway while I investigate.

llvm-svn: 238302
2015-05-27 08:44:01 +00:00
Elena Demikhovsky 3948c590e3 AVX-512: Implemented all forms of sign-extend and zero-extend instructions for KNL and SKX
Implemented DAG lowering for all these forms.
Added tests for DAG lowering and encoding.

By Igor Breger (igor.breger@intel.com)

llvm-svn: 238301
2015-05-27 08:15:19 +00:00
Quentin Colombet aa8020752e [X86] Implement the support for shrink-wrapping.
With this patch the x86 backend is now shrink-wrapping capable
and this functionality can be tested by using the
-enable-shrink-wrap switch.

The next step is to make more test and enable shrink-wrapping by
default for x86.

Related to <rdar://problem/20821487>

llvm-svn: 238293
2015-05-27 06:28:41 +00:00
Rafael Espindola 2fb8401b2a Print "lock \t foo" instead of "lock \n foo".
This gets gas and llc -filetype=obj to agree on the order of prefixes.

For llvm-mc we need to fix the asm parser to know that it makes a difference
on which line the "lock" is in.

Part of pr23594.

llvm-svn: 238232
2015-05-26 18:35:10 +00:00
Jan Vesely b670d37105 R600: Use SIGN_EXTEND_INREG for SEXT loads
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 238229
2015-05-26 18:07:22 +00:00
Diego Novillo bfecc06656 Revert "Re-commit changes in r237579 with fix for bug breaking windows builds."
This reverts commit r238201 to fix linking problems in x86 Linux
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150525/278413.html

llvm-svn: 238223
2015-05-26 17:45:38 +00:00
Luke Cheeseman a5d053d6f4 Re-commit changes in r237579 with fix for bug breaking windows builds.
llvm-svn: 238201
2015-05-26 13:40:31 +00:00
Elena Demikhovsky b2b901c607 AVX-512: fixed a bug in lowering VSELECT for 512-bit vector
https://llvm.org/bugs/show_bug.cgi?id=23634

llvm-svn: 238195
2015-05-26 11:32:39 +00:00
Daniel Sanders 58ee4c9451 [mips] Make TTypeEncoding indirect to allow .eh_frame to be read-only.
Summary:
Following on from r209907 which made personality encodings indirect, do the
same for TType encodings. This fixes the case where a try/catch block needs
to generate references to, for example, std::exception in the
.gcc_except_table.

This commit uses DW_EH_PE_sdata8 for N64 as far as is possible at the moment.
However, it is possible to end up with DW_EH_PE_sdata4 when a TargetMachine is
not available. There's no risk of issues with inconsistency here since the
tables are self describing but it does mean there is a small chance of the
PC-relative offset being out of range for particularly large programs.

Reviewers: petarj

Reviewed By: petarj

Subscribers: srhines, joerg, tberghammer, llvm-commits

Differential Revision: http://reviews.llvm.org/D9669

llvm-svn: 238190
2015-05-26 10:19:18 +00:00
Simon Pilgrim 0be4fa761f [X86][AVX2] Vectorized i16 shift operators
Part of D9474, this patch extends AVX2 v16i16 types to 2 x 8i32 vectors and uses i32 shift variable shifts before packing back to i16.

Adds AVX2 tests for v8i16 and v16i16 

llvm-svn: 238149
2015-05-25 17:49:13 +00:00
Tom Stellard ec87f841c6 R600/SI: Fix bug with v_interp_p1_f32 instructions on 16 bank lds chips
The src and dst register cannot be the same on chips with 16 lds banks.

llvm-svn: 238147
2015-05-25 16:15:54 +00:00
Kit Barton 6646033e6e This patch adds support for the vector quadword add/sub instructions introduced
in POWER8:

vadduqm
vaddeuqm
vaddcuq
vaddecuq
vsubuqm
vsubeuqm
vsubcuq
vsubecuq
In addition to adding the instructions themselves, it also adds support for the
v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and
IntrinsicEmitter.cpp).

http://reviews.llvm.org/D9081

llvm-svn: 238144
2015-05-25 15:49:26 +00:00
Michael Kuperstein f145228676 [X86] When pattern-matching scalar FMA3 intrinsics, don't re-arrange the first and second operands.
The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source.
The existing pattern switches src1 and src2 around, to match the "213" order, which ends up tying the original src2 to the dest. Since the actual scalar fma3 instructions copy the high elements from the dest register, the wrong values are copied.

This modifies the pattern to leave src1 and src2 in their original order.

Differential Revision: http://reviews.llvm.org/D9908

llvm-svn: 238131
2015-05-25 12:35:25 +00:00
Elena Demikhovsky 1c1391ba24 Added promotion to EXTRACT_SUBVECTOR operand.
I encountered with this case in one of KNL tests for i1 vectors.
v16i1 = EXTRACT_SUBVECTOR v32i1, x

llvm-svn: 238130
2015-05-25 11:33:13 +00:00
Matt Arsenault 65ad1602b0 Add target hook to allow merging stores of nonzero constants
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.

Most of the new testcases aren't optimally merged, and are for
later improvements.

llvm-svn: 238108
2015-05-24 00:51:27 +00:00
Hal Finkel 5f2a1379ef [PowerPC] Fix fast-isel when compare is split from branch
When the compare feeding a branch was in a different BB from the branch, we'd
try to "regenerate" the compare in the block with the branch, possibly trying
to make use of values not available there. Copy a page from AArch64's play book
here to fix the problem (at least in terms of correctness).

Fixes PR23640.

llvm-svn: 238097
2015-05-23 12:18:10 +00:00
Akira Hatanaka ddf76aa36f Stop resetting NoFramePointerElim in TargetMachine::resetTargetOptions.
This is part of the work to remove TargetMachine::resetTargetOptions.

In this patch, instead of updating global variable NoFramePointerElim in
resetTargetOptions, its use in DisableFramePointerElim is replaced with a call
to TargetFrameLowering::noFramePointerElim. This function determines on a
per-function basis if frame pointer elimination should be disabled.

There is no change in functionality except that cl:opt option "disable-fp-elim"
can now override function attribute "no-frame-pointer-elim". 

llvm-svn: 238080
2015-05-23 01:14:08 +00:00
Akira Hatanaka aa115fbb7b Remove unnecessary command line option "-disable-fp-elim".
This option currently has no effect as function attribute
"no-frame-pointer-elim=false" overrides it.

llvm-svn: 238077
2015-05-23 00:31:56 +00:00
Rafael Espindola 445712264d Revert "make reciprocal estimate code generation more flexible by adding command-line options"
This reverts commit r238051.

It broke some bots:

http://lab.llvm.org:8011/builders/llvm-ppc64-linux1/builds/18190

llvm-svn: 238075
2015-05-23 00:22:44 +00:00
Ahmed Bougacha 236f9040d0 [AArch64][CGP] Sink zext feeding stxr/stlxr into the same block.
The usual CodeGenPrepare trickery, on a target-specific intrinsic.
Without this, the expansion of atomics will usually have the zext
be hoisted out of the loop, defeating the various patterns we have
to catch this precise case.

Differential Revision: http://reviews.llvm.org/D9930

llvm-svn: 238054
2015-05-22 21:37:17 +00:00
Ahmed Bougacha 3d2d9d1d91 [AArch64] Robustize atomic cmpxchg test a little more. NFC.
We changed the test to test non-constant values in r238049.
We can also use CHECK-NEXT to be a little stricter.

llvm-svn: 238052
2015-05-22 21:35:14 +00:00
Sanjay Patel ba2ba80302 make reciprocal estimate code generation more flexible by adding command-line options
This patch adds a class for processing many recip codegen possibilities.
The TargetRecip class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.

The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other CPUs continue to *not* use reciprocal estimates by default with -ffast-math.

Differential Revision: http://reviews.llvm.org/D8982

llvm-svn: 238051
2015-05-22 21:10:06 +00:00
Ahmed Bougacha df94265963 [AArch64] Robustize atomic cmpxchg test. NFC.
Constants are easy to get right the wrong way.

llvm-svn: 238049
2015-05-22 21:08:15 +00:00
Quentin Colombet 494eb606cd Reapply r238011 with a fix for the trap instruction.
The problem was that I slipped a change required for shrink-wrapping, namely I
used getFirstTerminator instead of the getLastNonDebugInstr that was here before
the refactoring, whereas the surrounding code is not yet patched for that.

Original message:
[X86] Refactor the prologue emission to prepare for shrink-wrapping.

- Add a late pass to expand pseudo instructions (tail call and EH returns).
 Instead of doing it in the prologue emission.
- Factor some static methods in X86FrameLowering to ease code sharing.

NFC.

Related to <rdar://problem/20821487>

llvm-svn: 238035
2015-05-22 18:10:47 +00:00
NAKAMURA Takumi 263b27997d Revert r237954, "Resubmit r237708 (MIR Serialization: print and parse LLVM IR using MIR format)."
It brought cyclic dependencies between LLVMCodeGen and LLVMMIR.

llvm-svn: 238007
2015-05-22 07:17:07 +00:00
Peter Collingbourne 7e814d100b Revert r237590, "ARM: allow jump tables to be placed as constant islands."
Caused a miscompile of the Android port of Chromium, details
forthcoming.

llvm-svn: 237972
2015-05-21 23:20:55 +00:00
Chad Rosier ce8e5abbaf [AArch64] Enhance the load/store optimizer with target-specific alias analysis.
Phabricator: http://reviews.llvm.org/D9863
llvm-svn: 237963
2015-05-21 21:36:46 +00:00
Alex Lorenz c37baf82a9 Resubmit r237708 (MIR Serialization: print and parse LLVM IR using MIR format).
This commit is a 2nd attempt at committing the initial MIR serialization patch.
The first commit (r237708) made the incremental buildbots unstable and was 
reverted in r237730. The original commit didn't add a terminating null 
character to the LLVM IR source which was passed to LLParser, and this 
sometimes caused the test 'llvmIR.mir' to fail with a parsing error because 
the LLVM IR source didn't have a null character immediately after the end 
and thus LLLexer encountered some garbage characters that ultimately caused 
the error.

This commit also includes the other test fixes I committed in
r237712 (llc path fix) and r237723 (remove target triple) which
also got reverted in r237730.

--Original Commit Message--

MIR Serialization: print and parse LLVM IR using MIR format.

This commit is the initial commit for the MIR serialization project.
It creates a new library under CodeGen called 'MIR'. This new
library adds a new machine function pass that prints out the LLVM IR 
using the MIR format. This pass is then added as a last pass when a 
'stop-after' option is used in llc. The new library adds the initial 
functionality for parsing of MIR files as well. This commit also 
extends the llc tool so that it can recognize and parse MIR input files.

Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames

Differential Revision: http://reviews.llvm.org/D9616

llvm-svn: 237954
2015-05-21 20:54:45 +00:00
Bill Schmidt e13ac91c5d [PPC64] Handle vpkudum mask pattern correctly when vpkudum isn't available
My recent patch to add support for ISA 2.07 vector pack/unpack
instructions didn't properly check for availability of the vpkudum
instruction when recognizing it as a special vector shuffle case.
This causes us to leave the vector shuffle in place (rather than
converting it to a vector permute) so that it can be recognized later
as a vpkudum, but that pattern is invalid for processors prior to
POWER8.  Thus LLVM crashes with an "unable to select" message.  We
observed this since one of our buildbots is configured to generate
code for a POWER7.

This patch fixes the problem by checking for availability of the
vpkudum instruction during custom lowering of vector shuffles.

I've added a test case variant for the vpkudum pattern when the
instruction isn't available.

llvm-svn: 237952
2015-05-21 20:48:49 +00:00
Nemanja Ivanovic f02def6cbc Add support for VSX scalar single-precision arithmetic in the PPC target
http://reviews.llvm.org/D9891
Following up on the VSX single precision loads and stores added earlier, this
adds support for elementary arithmetic operations on single precision values
in VSX registers. These instructions utilize the new VSSRC register class.
Instructions added:
xsaddsp
xsdivsp
xsmulsp
xsresp
xsrsqrtesp
xssqrtsp
xssubsp

llvm-svn: 237937
2015-05-21 19:32:49 +00:00
Elena Demikhovsky 4aed59fc89 AVX-512: Enabled SSE intrinsics on AVX-512.
Predicate UseAVX depricates pattern selection on AVX-512.
This predicate is necessary for DAG selection to select EVEX form.
But mapping SSE intrinsics to AVX-512 instructions is not ready yet.
So I replaced UseAVX with HasAVX for intrinsics patterns.

llvm-svn: 237903
2015-05-21 14:01:32 +00:00
Simon Pilgrim e054199354 [X86][SSE] Improve support for 128-bit vector sign extension
This patch improves support for sign extension of the lower lanes of vectors of integers by making use of the SSE41 pmovsx* sign extension instructions where possible, and optimizing the sign extension by shifts on pre-SSE41 targets (avoiding the use of i64 arithmetic shifts which require scalarization).

It converts SIGN_EXTEND nodes to SIGN_EXTEND_VECTOR_INREG where necessary, that more closely matches the pmovsx* instruction than the default approach of using SIGN_EXTEND_INREG which splits the operation (into an ANY_EXTEND lowered to a shuffle followed by shifts) making instruction matching difficult during lowering. Necessary support for SIGN_EXTEND_VECTOR_INREG has been added to the DAGCombiner.

Differential Revision: http://reviews.llvm.org/D9848

llvm-svn: 237885
2015-05-21 10:05:03 +00:00
Andrew Kaylor a6c5b9682e [WinEH] C++ EH state numbering fixes
Differential Revision: http://reviews.llvm.org/D9787

llvm-svn: 237854
2015-05-20 23:22:24 +00:00
Reid Kleckner 2632f0df48 [WinEH] Store pointers to the LSDA in the exception registration object
We aren't yet emitting the LSDA yet, so this will still fail to
assemble.

llvm-svn: 237852
2015-05-20 23:08:04 +00:00
Hans Wennborg a8f8df5dd2 Revert r237828 "[X86] Remove unused node after morphing it from shr to and."
This caused assertions during DAG combine: PR23601.

llvm-svn: 237843
2015-05-20 22:31:55 +00:00
Davide Italiano 141b2891cb [Target/ARM] Only enable OptimizeBarrierPass at -O1 and above.
Ideally this is going to be and LLVM IR pass (shared, among others
with AArch64), but for the time being just enable it if consumers
ask us for optimization and not unconditionally.

Discussed with Tim Northover on IRC.

llvm-svn: 237837
2015-05-20 21:40:38 +00:00
Benjamin Kramer a74480d1eb [X86] Remove unused node after morphing it from shr to and.
In some cases it won't get cleaned up properly leading to crashes
downstream. PR23353.

Based on a patch by Davide Italiano.

llvm-svn: 237828
2015-05-20 20:10:26 +00:00
Pawel Bylica 8011da9628 Fix icmp lowering
Summary:
During icmp lowering it can happen that a constant value can be larger than expected (see the code around the change).
APInt::getMinSignedBits() must be checked again as the shift before can change the constant sign to positive.
I'm not sure it is the best fix possible though.

Test Plan: Regression test included.

Reviewers: resistor, chandlerc, spatel, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D9147

llvm-svn: 237812
2015-05-20 17:21:09 +00:00
Elena Demikhovsky f61727d880 AVX-512: fixed algorithm of building vectors of i1 elements
fixed extract-insert i1 element,
load i1, zextload i1 should be with "and $1, %reg" to prevent loading garbage.
added a bunch of new tests.

llvm-svn: 237793
2015-05-20 14:32:03 +00:00
Daniel Sanders 69c6008e49 Revert r237789 - [mips] The naming convention for private labels is ABI dependant.
It works, but I've noticed that I missed several callers of createMCAsmInfo()
and many don't have a TargetMachine to provide.

llvm-svn: 237792
2015-05-20 14:18:59 +00:00
Daniel Sanders 196390e8af [mips] Fix ehframe-indirect.ll test.
Summary:
-check-prefix replaces the default CHECK prefix rather than adding to it and
must be explicitly re-added.

Also added the N32 cases.

Reviewers: petarj

Reviewed By: petarj

Subscribers: tberghammer, llvm-commits

Differential Revision: http://reviews.llvm.org/D9668

llvm-svn: 237790
2015-05-20 13:19:19 +00:00
Daniel Sanders b718eca643 [mips] The naming convention for private labels is ABI dependant.
Summary:
For N32/N64, private labels begin with '.L' but for O32 they begin with '$'.

MCAsmInfo now has an initializer function which can be used to provide information from the TargetMachine to control the assembly syntax.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: jfb, sandeep, llvm-commits, rafael

Differential Revision: http://reviews.llvm.org/D9821

llvm-svn: 237789
2015-05-20 13:16:42 +00:00
Igor Laevsky 423bc9ec4c [StatepointLowering] Support of the gc.relocates for invoke statepoints.
This change implements support for lowering of the gc.relocates tied to the invoke statepoint.
This is acomplished by storing frame indices of the lowered values in "StatepointRelocatedValues" map inside FunctionLoweringInfo instead of storing them in per-basic block structure StatepointLowering.
After this change StatepointLowering is used only during "LowerStatepoint" call and it is not necessary to store it as a field in SelectionDAGBuilder anymore.

Differential Revision: http://reviews.llvm.org/D7798

llvm-svn: 237786
2015-05-20 11:37:25 +00:00
David Majnemer 402c5def11 [X86] Implement the local-exec TLS model for Windows targets
We know that _tls_index is zero for local-exec TLS variables because
they are always defined in the executable.

llvm-svn: 237772
2015-05-20 04:45:26 +00:00
Alex Lorenz de1970fe66 Revert r237708 (MIR serialization) - incremental buildbots became unstable.
The incremental buildbots entered a pass-fail cycle where during the fail
cycle one of the tests from this commit fails for an unknown reason. I
have reverted this commit and will investigate the cause of this problem.

llvm-svn: 237730
2015-05-19 21:41:28 +00:00
Alex Lorenz edde50331d Fix MIR testcase committed in r237708 - remove target triple.
Remove the target specific triple and datalayout from the 
llvm IR in the MIR testcase file.

llvm-svn: 237723
2015-05-19 20:51:48 +00:00
Alex Lorenz 06e3bf670f Fix llc path in MIR testcases committed in r237708.
I've committed testcases with local llc path by mistake.

llvm-svn: 237712
2015-05-19 18:45:41 +00:00
Alex Lorenz c5e0d4d146 MIR Serialization: print and parse LLVM IR using MIR format.
This commit is the initial commit for the MIR serialization project.
It creates a new library under CodeGen called 'MIR'. This new
library adds a new machine function pass that prints out the LLVM IR 
using the MIR format. This pass is then added as a last pass when a 
'stop-after' option is used in llc. The new library adds the initial 
functionality for parsing of MIR files as well. This commit also 
extends the llc tool so that it can recognize and parse MIR input files.

Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames

Differential Revision: http://reviews.llvm.org/D9616

llvm-svn: 237708
2015-05-19 18:17:39 +00:00
Daniel Sanders c8cd58fa26 [mips] Correct and improve special-case shuffle instructions.
Summary:
The documentation writes vectors highest-index first whereas LLVM-IR writes
them lowest-index first. As a result, instructions defined in terms of
left_half() and right_half() had the halves reversed.

In addition to correcting them, they have been improved to allow shuffles
that use the same operand twice or in reverse order. For example, ilvev
used to accept masks of the form:
  <0, n, 2, n+2, 4, n+4, ...>
but now accepts:
  <0, 0, 2, 2, 4, 4, ...>
  <n, n, n+2, n+2, n+4, n+4, ...>
  <0, n, 2, n+2, 4, n+4, ...>
  <n, 0, n+2, 2, n+4, 4, ...>

One further improvement is that splati.[bhwd] is now the preferred instruction
for splat-like operations. The other special shuffles are no longer used
for splats. This lead to the discovery that <0, 0, ...> would not cause
splati.[hwd] to be selected and this has also been fixed.

This fixes the enc-3des test from the test-suite on Mips64r6 with MSA.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9660

llvm-svn: 237689
2015-05-19 12:24:52 +00:00
Michael Kuperstein e88a021bf4 [X86] ABI change for x86-32: pass 3 vector arguments in-register instead of 4, except on Darwin.
This changes the ABI used on 32-bit x86 for passing vector arguments. 
Historically, clang passes the first 4 vector arguments in-register, and additional vector arguments on the stack, regardless of platform. That is different from the behavior of gcc, icc, and msvc, all of which pass only the first 3 arguments in-register.
The 3-register convention is documented, unofficially, in Agner's calling convention guide, and, officially, in the recently released version 1.0 of the i386 psABI.

Darwin is kept as is because the OS X ABI Function Call Guide explicitly documents the current (4-register) behavior.

This fixes PR21510

Differential revision: http://reviews.llvm.org/D9644

llvm-svn: 237682
2015-05-19 11:06:56 +00:00
Reid Kleckner 7bc6b6210c Re-land r237175: [X86] Always return the sret parameter in eax/rax ...
This reverts commit r237210.

Also fix X86/complex-fca.ll to match the code that we used to generate
on win32 and now generate everwhere to conform to SysV.

llvm-svn: 237639
2015-05-18 23:35:09 +00:00
Tim Northover 12c41af07c ARM: allow jump tables to be placed as constant islands.
Previously, they were forced to immediately follow the actual branch
instruction. This was usually OK (the LEAs actually accessing them got emitted
nearby, and weren't usually separated much afterwards). Unfortunately, a
sufficiently nasty phi elimination dumps many instructions right before the
basic block terminator, and this can increase the range too much.

This patch frees them up to be placed as usual by the constant islands pass,
and consequently has to slightly modify the form of TBB/TBH tables to refer to
a PC-relative label at the final jump. The other jump table formats were
already position-independent.

rdar://20813304

llvm-svn: 237590
2015-05-18 17:10:40 +00:00
Oliver Stannard 6cb23465e0 Revert r237579, as it broke windows buildbots
llvm-svn: 237583
2015-05-18 16:39:16 +00:00
James Y Knight 807563df22 Add support for the Sparc implementation-defined "ASR" registers.
(Note that register "Y" is essentially just ASR0).

Also added some test cases for divide and multiply, which had none before.

Differential Revision: http://reviews.llvm.org/D8670

llvm-svn: 237580
2015-05-18 16:29:48 +00:00
Oliver Stannard 0c553afe6a [LLVM - ARM/AArch64] Add ACLE special register intrinsics
This patch implements LLVM support for the ACLE special register intrinsics in
section 10.1, __arm_{w,r}sr{,p,64}.

This patch is intended to lower the read/write_register instrinsics, used to
implement the special register intrinsics in the clang patch for special
register intrinsics (see http://reviews.llvm.org/D9697), to ARM specific
instructions MRC,MCR,MSR etc. to allow reading an writing of coprocessor
registers in AArch32 and AArch64. This is done by inspecting the register
string passed to the intrinsic and then lowering to the appropriate
instruction.

Patch by Luke Cheeseman.

Differential Revision: http://reviews.llvm.org/D9699

llvm-svn: 237579
2015-05-18 16:23:33 +00:00
Elena Demikhovsky b8573cba02 AVX-512: Added intrinsics for ADDSS/D, MULSS/D, SUBSS/D, DIVSS/D
instructions. These intrinsics are comming with rounding mode.
Added intrinsics for MAXSS/D, MINSS/D - with and without  sae.

By Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 237560
2015-05-18 07:24:19 +00:00
Elena Demikhovsky 08ce53c0ea AVX-512: Added patterns for scalar-to-vector broadcast
llvm-svn: 237558
2015-05-18 07:06:23 +00:00
Hal Finkel 8340de142c [PowerPC] Add extra r2 read deps on @toc@l relocations
If some commits are happy, and some commits are sad, this is a sad commit. It
is sad because it restricts instruction scheduling to work around a binutils
linker bug, and moreover, one that may never be fixed. On 2012-05-21, GCC was
updated not to produce code triggering this bug, and now we'll do the same...

When resolving an address using the ELF ABI TOC pointer, two relocations are
generally required: one for the high part and one for the low part. Only
the high part generally explicitly depends on r2 (the TOC pointer). And, so,
we might produce code like this:

.Ltmp526:
        addis 3, 2, .LC12@toc@ha
.Ltmp1628:
        std 2, 40(1)
        ld 5, 0(27)
        ld 2, 8(27)
        ld 11, 16(27)
        ld 3, .LC12@toc@l(3)
        rldicl 4, 4, 0, 32
        mtctr 5
        bctrl
        ld 2, 40(1)

And there is nothing wrong with this code, as such, but there is a linker bug
in binutils (https://sourceware.org/bugzilla/show_bug.cgi?id=18414) that will
misoptimize this code sequence to this:
        nop
        std     r2,40(r1)
        ld      r5,0(r27)
        ld      r2,8(r27)
        ld      r11,16(r27)
        ld      r3,-32472(r2)
        clrldi  r4,r4,32
        mtctr   r5
        bctrl
        ld      r2,40(r1)
because the linker does not know (and does not check) that the value in r2
changed in between the instruction using the .LC12@toc@ha (TOC-relative)
relocation and the instruction using the .LC12@toc@l(3) relocation.
Because it finds these instructions using the relocations (and not by
scanning the instructions), it has been asserted that there is no good way
to detect the change of r2 in between. As a result, this bug may never be
fixed (i.e. it may become part of the definition of the ABI). GCC was
updated to add extra dependencies on r2 to instructions using the @toc@l
relocations to avoid this problem, and we'll do the same here.

This is done as a separate pass because:
 1. These extra r2 dependencies are not really properties of the
    instructions, but rather due to a linker bug, and maybe one day we'll be
    able to get rid of them when targeting linkers without this bug (and,
    thus, keeping the logic centralized here will make that
    straightforward).
 2. There are ISel-level peephole optimizations that propagate the @toc@l
    relocations to some user instructions, and so the exta dependencies do
    not apply only to a fixed set of instructions (without undesirable
    definition replication).

The test case was reduced with the help of bugpoint, with minimal cleaning. I'm
looking forward to our upcoming MI serialization support, and with that, much
better tests can be created.

llvm-svn: 237556
2015-05-18 06:25:59 +00:00
Elena Demikhovsky a8200603d4 AVX-512: fixed extended load to 512-bit register
llvm-svn: 237537
2015-05-17 08:08:06 +00:00
Elena Demikhovsky 1d6a495d6d AVX-512: fixed a bug in mask operations - (i1 1) pattern
Filling k-reg with all-ones value was wrong,
(i1 1) should switch on only one bit in mask register

llvm-svn: 237536
2015-05-17 07:28:51 +00:00
Bill Schmidt 5ed84cdba8 [PPC64] Add vector pack/unpack support from ISA 2.07
This patch adds support for the following new instructions in the
Power ISA 2.07:

  vpksdss
  vpksdus
  vpkudus
  vpkudum
  vupkhsw
  vupklsw

These instructions are available through the vec_packs, vec_packsu,
vec_unpackh, and vec_unpackl built-in interfaces.  These are
lane-sensitive instructions, so the built-ins have different
implementations for big- and little-endian, and the instructions must
be marked as killing the vector swap optimization for now.

The first three instructions perform saturating pack operations.  The
fourth performs a modulo pack operation, which means it can be
represented with a vector shuffle, and conversely the appropriate
vector shuffles may cause this instruction to be generated.  The other
instructions are only generated via built-in support for now.

Appropriate tests have been added.

There is a companion patch to clang for the rest of this support.

llvm-svn: 237499
2015-05-16 01:02:12 +00:00
James Molloy cfb0443af6 Mark SMIN/SMAX/UMIN/UMAX nodes as legal and add patterns for them.
The new [SU]{MIN,MAX} SDNodes can be lowered directly to instructions for
most NEON datatypes - the big exclusion being v2i64.

llvm-svn: 237455
2015-05-15 16:15:57 +00:00
Brendon Cahoon 7c8a3b0ef6 [Hexagon] Generate hardware loop for a vectorized loop
The induction variable in the vectorized loop wasn't
recognized properly, so a hardware loop wasn't generated.

Differential Revision: http://reviews.llvm.org/D9722

llvm-svn: 237388
2015-05-14 20:36:19 +00:00
Brendon Cahoon 485bea74ad [Hexagon] Remove dead constant assignment in hardware loop pass
After converting a loop to a hardware loop, the pass should remove
any unnecessary instructions from the old compare-and-branch
code. This patch removes a dead constant assignment that was
used in the compare instruction.

Differential Revision: http://reviews.llvm.org/D9720

llvm-svn: 237373
2015-05-14 17:31:40 +00:00
Brendon Cahoon 9376e9998e [Hexagon] Check for underflow/wrap in hardware loop pass
If the loop trip count may underflow or wrap, the compiler should
not generate a hardware loop since the trip count will be
incorrect.

llvm-svn: 237365
2015-05-14 14:15:08 +00:00
Vasileios Kalintiris 70b744e4a1 [mips] Do not place users of $ra in the delay slot of call instructions.
Summary:
When we are trying to fill the delay slot of a call instruction, we must avoid
filler instructions that use the $ra register. This fixes the test
MultiSource/Applications/JM/lencod when we enable the forward delay slot filler.

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9670

llvm-svn: 237362
2015-05-14 13:17:56 +00:00
Artyom Skrobov a70dfe18d3 Re-apply r237247 - [AArch64] Codegen VMAX/VMIN for safe math cases
No longer breaks SPEC2000/2006

llvm-svn: 237361
2015-05-14 12:59:46 +00:00
Elena Demikhovsky d5b3e376d2 AVX-512: Added i1 type handling for calling conventions.
i1 type is a legal type on AVX-512 and can be passed as parameter or return value.
i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here).
The result code is similar to the previous X86 targets, where i1 is allways promoted to i8.

llvm-svn: 237350
2015-05-14 09:04:45 +00:00
Ahmed Bougacha 6402ad27c0 [CodeGen] Use standard -not gnueabi- naming for f16 libcalls on Darwin.
Other targets probably should as well.  Since r237161, compiler-rt has
both, but I don't see why anything other than gnueabi would use a
gnueabi naming scheme.

llvm-svn: 237324
2015-05-14 01:00:51 +00:00
Brendon Cahoon d11c92a41c [Hexagon] Generate loop1 instruction for nested loops
loop1 is for the outer loop and loop0 is for the inner loop.

Differential Revision: http://reviews.llvm.org/D9680

llvm-svn: 237266
2015-05-13 17:56:03 +00:00
Brendon Cahoon 254e656862 [Hexagon] Generate hardware loop when loop has a critical edge
The hardware loop pass should try to generate a hardware loop
instruction when the original loop has a critical edge.

Differential Revision: http://reviews.llvm.org/D9678

llvm-svn: 237258
2015-05-13 14:54:24 +00:00
Silviu Baranga 780a3b3be7 Revert r237247 - [AArch64] Codegen VMAX/VMIN.. as it is causing failures in SPEC2000/2006
llvm-svn: 237256
2015-05-13 14:03:18 +00:00
Artyom Skrobov b526681e08 [AArch64] Codegen VMAX/VMIN for safe math cases
llvm-svn: 237247
2015-05-13 12:01:09 +00:00
Sanjoy Das a1d39ba940 [Statepoints] Support for "patchable" statepoints.
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`.  `id` gets propagated to the ID field
in the generated StackMap section.  If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.

This change brings statepoints one step closer to patchpoints.  With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.

PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`.  This can be made more sophisticated
later.

Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9546

llvm-svn: 237214
2015-05-12 23:52:24 +00:00
Saleem Abdulrasool ee13fbe848 CodeGen: ignore DEBUG_VALUE nodes in KILL tagging
DEBUG_VALUE nodes do not take part in code generation.  Ignore them when
performing KILL updates.  Addresses PR23486.

llvm-svn: 237211
2015-05-12 23:36:18 +00:00
Chandler Carruth 942fba95e2 Revert r237175: [X86] Always return the sret parameter in eax/rax ...
This commit broke an x86 test and the bots have been broken for well
over an hour now so I'm just reverting.

llvm-svn: 237210
2015-05-12 23:34:27 +00:00
Reid Kleckner b465563b46 [X86] Always return the sret parameter in eax/rax, even on 32-bit
Summary:
This rule was always in the old SysV i386 ABI docs and the new ones that
H.J. Lu has put together, but we never noticed:

  EAX   scratch register; also used to return integer and pointer values
        from functions; also stores the address of a returned struct or union

Fixes PR23491.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9715

llvm-svn: 237175
2015-05-12 20:56:32 +00:00
Sundeep Kushwaha 83a4039c17 [PATCH] [HEXAGON] Add a test program to verify calling convention
for large struct return by value.

Differential Revision: http://reviews.llvm.org/D9709

llvm-svn: 237170
2015-05-12 20:13:10 +00:00
Pat Gavlin c7dc6d6ee7 [Statepoints] Split the calling convention and statepoint flags operand to STATEPOINT into two separate operands.
Differential Revision: http://reviews.llvm.org/D9623

llvm-svn: 237166
2015-05-12 19:50:19 +00:00
Petar Jovanovic e0de8f4efb [Mips] Return false for isFPCloseToIncomingSP()
On Mips, frame pointer points to the same side of the frame as the stack
pointer. This function is used to decide where to put register scavenging
spill slot. So far, it was put on the wrong side of the frame, and thus it
was too far away from $fp when frame was larger than 2^15 bytes.

Patch by Vladimir Radosavljevic.

http://reviews.llvm.org/D8895

llvm-svn: 237153
2015-05-12 17:14:05 +00:00
Tom Stellard 28d13a4b12 R600/SI: add pass to mark CF live ranges as non-spillable
Spilling can insert instructions almost anywhere, and this can mess
up control flow lowering in a multitude of ways, due to instruction
reordering. Let's sort this out the easy way: never spill registers
involved with control flow, i.e. saved EXEC masks.

Unfortunately, this does not work at all with optimizations disabled,
as the register allocator ignores spill weights. This should be
addressed in a future commit.

The test was reduced from the "stacks" shader of [1]. Some issues
trigger the machine verifier while another one is checked manually.

[1] http://madebyevan.com/webgl-path-tracing/

v2: only insert pass with optimizations enabled, merge test runs.

Patch by: Grigori Goronzy

llvm-svn: 237152
2015-05-12 17:13:02 +00:00
Sunil Srivastava d79dfcbc37 Changed renaming of local symbols by inserting a dot vefore the numeric suffix.
One code change and several test changes to match that
details in http://reviews.llvm.org/D9481

llvm-svn: 237150
2015-05-12 16:47:30 +00:00
Tom Stellard 8f96dfc9ea R600/SI: Remove M0Reg register class
It is no longer used.

llvm-svn: 237142
2015-05-12 15:00:52 +00:00
Tom Stellard 381a94a764 R600/SI: Remove explicit m0 operand from DS instructions
Instead add m0 as an implicit operand.  This helps avoid spills
of the m0 register in some cases.

llvm-svn: 237141
2015-05-12 15:00:49 +00:00
Tom Stellard 7b222e110f R600/SI: Make sendmsg test more strict
We want to make sure that the m0 copies are being cse'd.

llvm-svn: 237134
2015-05-12 14:18:16 +00:00
Elena Demikhovsky fae20d3565 AVX-512, X86: Added lowering for shift operations for SKX.
The other changes in the LowerShift() are not functional,
just to make the code more convenient.
So, the functional changes for SKX only.

llvm-svn: 237129
2015-05-12 13:25:46 +00:00
John Brawn 70605f7d22 [ARM] Use AEABI aligned function variants
AEABI defines aligned variants of memcpy etc. that can be faster than
the default version due to not having to do alignment checks. When
emitting target code for these functions make use of these aligned
variants if possible. Also convert memset to memclr if possible.

Differential Revision: http://reviews.llvm.org/D8060

llvm-svn: 237127
2015-05-12 13:13:38 +00:00
Igor Laevsky 87ef5eaf46 Reverse ordering of base and derived pointer during safepoint lowering.
According to the documentation in StackMap section for the safepoint we should have:
"The first Location in each pair describes the base pointer for the object. The second is the derived pointer actually being relocated."
But before this change we emitted them in reverse order - derived pointer first, base pointer second.

llvm-svn: 237126
2015-05-12 13:12:14 +00:00
Vasileios Kalintiris b48c905613 [mips][FastISel] Handle calls with non legal types i8 and i16.
Summary: Allow calls with non legal integer types based on i8 and i16 to be processed by mips fast-isel.

Based on a patch by Reed Kotler.

Test Plan:
"Make check" test forthcoming.
Test-suite passes at O0/O2 and with mips32 r1/r2

Reviewers: rkotler, dsanders

Subscribers: llvm-commits, rfuhler

Differential Revision: http://reviews.llvm.org/D6770

llvm-svn: 237121
2015-05-12 12:29:17 +00:00
Vasileios Kalintiris c07f848785 [mips][FastISel] Simplify callabi.ll by using multiple check prefixes.
Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9635

llvm-svn: 237119
2015-05-12 12:17:11 +00:00
Vasileios Kalintiris 32cd69a2eb [mips][FastISel] Allow computation of addresses from constant expressions.
Summary:
Try to compute addresses when the offset from a memory location is a constant
expression.

Based on a patch by Reed Kotler.

Test Plan:
Passes test-suite for -O0/O2 and mips 32 r1/r2

Reviewers: rkotler, dsanders

Subscribers: llvm-commits, aemerson, rfuhler

Differential Revision: http://reviews.llvm.org/D6767

llvm-svn: 237117
2015-05-12 12:08:31 +00:00
Elena Demikhovsky c1ac5d7bd5 AVX-512: select operation for i1 vectors
like: select i1 %cond, <16 x i1> %a, <16 x i1> %b.
I added pseudo-CMOV patterns to resolve the "select".
Added tests for KNL and SKX.

llvm-svn: 237106
2015-05-12 09:36:52 +00:00
Michael Kuperstein 6f5ff6905c [X86] DAGCombine should not assume arbitrary vector types are simple
The X86-specific DAGCombine for stores should not assume vector types are always simple.
This fixes PR23476.

Differential Revision: http://reviews.llvm.org/D9659

llvm-svn: 237097
2015-05-12 07:33:07 +00:00
Eric Christopher 824f42f209 Migrate existing backends that care about software floating point
to use the information in the module rather than TargetOptions.

We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.

For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.

Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.

llvm-svn: 237079
2015-05-12 01:26:05 +00:00
Andrew Kaylor cc14f387e8 [WinEH] Handle nested landing pads that return directly to the parent function.
Differential Revision: http://reviews.llvm.org/D9684

llvm-svn: 237063
2015-05-11 23:06:02 +00:00
Andrew Kaylor 762a6bea1f [WinEH] Update exception numbering to give handlers their own base state.
Differential Revision: http://reviews.llvm.org/D9512

llvm-svn: 237014
2015-05-11 19:41:19 +00:00
Pirama Arumuga Nainar af171e7720 [X86] Updates to X86 backend for f16 promotion
Summary:
r235215 adds support for f16 to be considered as a load/store type and
promote f16 operations to f32.

This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:
1. Set loadextaction for f16 vectors to expand.
2. Handle FP_EXTEND in a switch statement when handling v2f32
3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or
(store (SINT_TO_FP )) to a FILD.

Tests included.

Reviewers: ab, srhines, delena

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9092

llvm-svn: 237004
2015-05-11 17:14:39 +00:00
Elena Demikhovsky 3822af0715 AVX-512: Changed CC parameter in "cmp" intrinsic
from i8 to i32 according to the Intel Spec

by Igor Breger (igor.breger@intel.com)

llvm-svn: 236979
2015-05-11 09:03:14 +00:00
Elena Demikhovsky 0d7e9364d1 AVX-512: Added SKX instructions and intrinsics:
{add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae

By Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236971
2015-05-11 06:05:05 +00:00
Elena Demikhovsky f40342d6a2 AVX-512: fixed UINT_TO_FP operation for 512-bit types.
llvm-svn: 236955
2015-05-10 14:23:52 +00:00
Elena Demikhovsky 75d1489326 AVX-512: fixed a bug in i1 vectors lowering
llvm-svn: 236947
2015-05-10 10:33:32 +00:00
NAKAMURA Takumi 7f52d6d82c llvm/test/CodeGen/AArch64/tailcall_misched_graph.ll: s/REQUIRE/REQUIRES/
llvm-svn: 236928
2015-05-09 05:59:00 +00:00
James Y Knight fca02be3c1 Fix MergeConsecutiveStore for non-byte-sized memory accesses.
The bug showed up as a compile-time assertion failure:
  Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed
when building msan tests on x86-64.

Prior to r236850, this bug was masked due to a bogus alignment check,
which also accidentally rejected non-byte-sized accesses. Afterwards,
an invalid ElementSizeBytes == 0 got further into the function, and
triggered the assertion failure.

It would probably be a good idea to allow it to handle merging stores
of unusual widths as well, but for now, to un-break it, I'm just
making the minimal fix.

Differential Revision: http://reviews.llvm.org/D9626

llvm-svn: 236927
2015-05-09 03:13:37 +00:00
Pete Cooper d54fb89901 [Fast-ISel] Don't mark the first use of a remat constant as killed.
When emitting something like 'add x, 1000' if we remat the 1000 then we should be able to
mark the vreg containing 1000 as killed.  Given that we go bottom up in fast-isel, a later
use of 1000 will be higher up in the BB and won't kill it, or be impacted by the lower kill.

However, rematerialised constant expressions aren't generated bottom up.  The local value save area
grows downwards.  This means that if you remat 2 constant expressions which both use 1000 then the
first will kill it, then the second, which is *lower* in the BB will read a killed register.

This is the case in the attached test where the 2 GEPs both need to generate 'add x, 6680' for the constant offset.

Note that this commit only makes kill flag generation conservative.  There's nothing else obviously wrong with
the local value save area growing downwards, and in fact it needs to for handling arbitrarily complex constant expressions.

However, it would be nice if there was a solution which would let us generate more accurate kill flags, or just kill flags completely.

llvm-svn: 236922
2015-05-09 00:51:03 +00:00
Arnold Schwaighofer f54b73d681 ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.

Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.

Fix for PR23459.

rdar://20740035

llvm-svn: 236916
2015-05-08 23:52:00 +00:00
Hans Wennborg ae0254dabc Switch lowering: cluster adjacent fall-through cases even at -O0
It's cheap to do, and codegen is much faster if cases can be merged
into clusters.

llvm-svn: 236905
2015-05-08 21:23:39 +00:00
Pete Cooper e4bb07ecff [Fast-ISel] Clear kill flags on registers replaced by updateValueMap.
When selecting an extract instruction, we don't actually generate code but instead work out which register we are reading, and rewrite uses of the extract def to the source register.  This is done via updateValueMap,.

However, its possible that the source register we are rewriting *to* to also have uses.  If those uses are after a kill of the value we are rewriting *from* then we have uses after a kill and the verifier fails.

This code checks for the case where the to register is also used, and if so it clears all kill on the from register.  This is conservative, but better that always clearing kills on the from register.

llvm-svn: 236897
2015-05-08 20:46:54 +00:00
Brendon Cahoon bece8edcdd [Hexagon] Generate more hardware loops
Refactored parts of the hardware loop pass to generate
more. Also, added more tests.

Differential Revision: http://reviews.llvm.org/D9568

llvm-svn: 236896
2015-05-08 20:18:21 +00:00
Pete Cooper 7f7c9f1dad [X86] Fast-ISel was incorrectly always killing the source of a truncate.
A trunc from i32 to i1 on x86_64 generates an instruction such as

%vreg19<def> = COPY %vreg9:sub_8bit<kill>; GR8:%vreg19 GR32:%vreg9

However, the copy here should only have the kill flag on the 32-bit path, not the 64-bit one.
Otherwise, we are killing the source of the truncate which could be used later in the program.

llvm-svn: 236890
2015-05-08 18:29:42 +00:00
Pat Gavlin cc0431d1c0 Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware.
This changes the shape of the statepoint intrinsic from:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)

to:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)

This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.

In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.

Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.

Differential Revision: http://reviews.llvm.org/D9501

llvm-svn: 236888
2015-05-08 18:07:42 +00:00
Pete Cooper 85b1c48b20 Clear kill flags on all used registers when sinking instructions.
The test here was sinking the AND here to a lower BB:

	%vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8
	TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8

which meant that vreg8 was read after it was killed.

This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND.

llvm-svn: 236886
2015-05-08 17:54:32 +00:00
Brendon Cahoon df43e68629 [Hexagon] Update AnalyzeBranch, etc target hooks
Improved the AnalyzeBranch, InsertBranch, and RemoveBranch
functions in order to handle more of our branch instructions.
This requires changes to analyzeCompare and PredicateInstructions.
Specifically, we've added support for new value compare jumps,
improved handling of endloop, added more compare instructions,
and improved support for predicate instructions.

Differential Revision: http://reviews.llvm.org/D9559

llvm-svn: 236876
2015-05-08 16:16:29 +00:00
Andrea Di Biagio 84e22b9096 [X86] Teach 'getTargetShuffleMask' how to look through ISD::WrapperRIP when decoding a PSHUFB mask.
The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes
where the mask node is a load from constant pool, and the constant pool node
is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching
it how to also look through X86ISD::WrapperRIP.

This helps function combineX86ShufflesRecusively to combine more shuffle
sequences containing PSHUFB nodes if we are in RIPRel PIC mode.

Before this change, llc (with -relocation-model=pic -march=x86-64) was unable
to decode a pshufb where the mask was loaded from a constant pool. For example,
the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its
operand, so instead of generating a single 'movaps' the backend always
generated a sub-optimal 'movdqa + pshufb' sequence.

Added test x86-fold-pshufb.ll.

llvm-svn: 236863
2015-05-08 15:11:07 +00:00
James Y Knight 38514d2177 Fix test added in r236850 for OSX builders.
Need to specify triple so that llvm emits the asm syntax that the
test expected.

llvm-svn: 236855
2015-05-08 14:04:54 +00:00
James Y Knight 284e7b3d6c Fix alignment checks in MergeConsecutiveStores.
1) check whether the alignment of the memory is sufficient for the
*merged* store or load to be efficient.

Not doing so can result in some ridiculously poor code generation, if
merging creates a vector operation which must be aligned but isn't.

2) DON'T check that the alignment of each load/store is equal. If
you're merging 2 4-byte stores, the first *might* have 8-byte
alignment, but the second certainly will have 4-byte alignment. We do
want to allow those to be merged.

llvm-svn: 236850
2015-05-08 13:47:01 +00:00
Vasileios Kalintiris 42544d6472 [mips] Emit the .insn directive for empty basic blocks.
Summary:
In microMIPS, labels need to know whether they are on code or data. This is
indicated with STO_MIPS_MICROMIPS and can be inferred by being followed
by instructions. For empty basic blocks, we can ensure this by emitting the
.insn directive after the label.

Also, this fixes some failures in our out-of-tree microMIPS buildbots, for the
exception handling regression tests under: SingleSource/Regression/C++/EH

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9530

llvm-svn: 236815
2015-05-08 09:10:15 +00:00
Eric Christopher 81fa35ca24 Now that we have a soft-float attribute, use it instead of the
hard coded command line option for the Mips soft float tests.

llvm-svn: 236801
2015-05-08 00:57:22 +00:00
Pete Cooper ba593ad3f3 Clear kill flags in tail duplication.
If we duplicate an instruction then we must also clear kill flags on any uses we rewrite.
Otherwise we might be killing a register which was used in other BBs.

For example, here the entry BB ended up with these instructions, the ADD having been tail duplicated.

	%vreg24<def> = t2ADDri %vreg10<kill>, 1, pred:14, pred:%noreg, opt:%noreg; GPRnopc:%vreg24 rGPR:%vreg10
	%vreg22<def> = COPY %vreg10; GPR:%vreg22 rGPR:%vreg10

	The copy here is inserted after the add and so needs vreg10 to be live.

llvm-svn: 236782
2015-05-07 21:48:26 +00:00
Pete Cooper f52123b454 [AArch64] Fix sext/zext folding in address arithmetic.
We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there.

Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use.

llvm-svn: 236764
2015-05-07 19:21:36 +00:00
Nemanja Ivanovic f3c94b1e3c Add VSX Scalar loads and stores to the PPC back end
This patch corresponds to review:
http://reviews.llvm.org/D9440

It adds a new register class to the PPC back end to contain single precision
values in VSX registers. Additionally, it adds scalar loads and stores for
VSX registers.

llvm-svn: 236755
2015-05-07 18:24:05 +00:00
Diego Novillo de5b8016ab Fix information loss in branch probability computation.
Summary:
This addresses PR 22718. When branch weights are too large, they were
being clamped to the range [1, MaxWeightForBB]. But this clamping is
only applied to edges that go outside the range, so it distorts the
relative branch probabilities.

This patch changes the weight calculation to scale every branch so the
relative probabilities are preserved. The scaling is done differently
now. First, all the branch weights are added up, and if the sum exceeds
32 bits, it computes an integer scale to bring all the weights within
the range.

The patch fixes an existing test that had slightly wrong branch
probabilities due to the previous clamping. It now gets branch weights
scaled accordingly.

Reviewers: dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9442

llvm-svn: 236750
2015-05-07 17:22:06 +00:00
Sanjay Patel a9f6d3505d [x86] eliminate unnecessary shuffling/moves with unary scalar math ops (PR21507)
Finish the job that was abandoned in D6958 following the refactoring in
http://reviews.llvm.org/rL230221:

1. Uncomment the intrinsic def for the AVX r_Int instruction.
2. Add missing r_Int entries to the load folding tables; there are already
   tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I
   haven't added any more in this patch.
3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ).

So instead of this:

  movaps	%xmm0, %xmm1
  rcpss	%xmm1, %xmm1
  movss	%xmm1, %xmm0

We should now get:

  rcpss	%xmm0, %xmm0

And instead of this:

  vsqrtss	%xmm0, %xmm0, %xmm1
  vblendps	$1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3]

We should now get:

  vsqrtss	%xmm0, %xmm0, %xmm0


Differential Revision: http://reviews.llvm.org/D9504

llvm-svn: 236740
2015-05-07 15:48:53 +00:00
Elena Demikhovsky 29792e9a80 AVX-512: Added all forms of FP compare instructions for KNL and SKX.
Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec.

By Igor Breger (igor.breger@intel.com)

llvm-svn: 236714
2015-05-07 11:24:42 +00:00
NAKAMURA Takumi 386c22c0ff llvm/test/CodeGen/X86/llc-override-mcpu-mattr.ll: Tweak not to be affected by x64 Calling Convention.
llvm-svn: 236710
2015-05-07 10:18:28 +00:00
Akira Hatanaka 3058d0f080 Let llc and opt override "-target-cpu" and "-target-features" via command line
options.

This commit fixes a bug in llc and opt where "-mcpu" and "-mattr" wouldn't
override function attributes "-target-cpu" and "-target-features" in the IR.

Differential Revision: http://reviews.llvm.org/D9537

llvm-svn: 236677
2015-05-06 23:54:14 +00:00
Pete Cooper 27483915e8 Handle dead defs in the if converter.
We had code such as this:
  r2 = ...
  t2Bcc

label1:
  ldr ... r2

label2;
  return r2<dead, def>

The if converter was transforming this to
   r2<def> = ...
   return [pred] r2<dead,def>
   ldr <r2, kill>
   return

which fails the machine verifier because the ldr now reads from a dead def.

The fix here detects dead defs in stepForward and passes them back to the caller in the clobbers list.  The caller then clears the dead flag from the def is the value is live.

llvm-svn: 236660
2015-05-06 22:51:04 +00:00
Pete Cooper 54085cdc7b Fix incorrect kill flags in fastisel.
If called twice in the same BB on the same constant, FastISel::fastEmit_ri_ was marking the materialized vreg as killed on each use, instead of only the last use.

Change this to only mark the last use as killed by making earlier uses check if the vreg is already used elsewhere.

llvm-svn: 236650
2015-05-06 22:09:29 +00:00
Pete Cooper d31583ddfb [x86] Fix register class of folded load index reg.
When folding a load in to another instruction, we need to fix the class of the index register
Otherwise, it could be something like GR64 not GR64_NOSP and would fail the machine verifier.

llvm-svn: 236644
2015-05-06 21:37:19 +00:00
Tim Northover e4310fe946 CodeGen: move over-zealous assert into actual if statement.
It's quite possible to encounter an insertvalue instruction that's more deeply
nested than the value we're looking for, but when that happens we really
mustn't compare beyond the end of the index array.

Since I couldn't see any guarantees about what comparisons std::equal makes, we
probably need to directly check the size beforehand. In practice, I suspect
most std::equal implementations would probably bail early, which would be OK.
But just in case...

rdar://20834485

llvm-svn: 236635
2015-05-06 20:07:38 +00:00
Duncan P. N. Exon Smith 653c1099b4 DwarfDebug: Emit number of bytes in .debug_loc entry directly
Emit the number of bytes in a `.debug_loc` entry directly.  The old code
created temp labels (expensive), emitted the difference between them,
and then emitted one on each side of the relevant bytes.

(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`
(the optimized version of ld64's `-save-temps` when linking the
`verify-uselistorder` executable in an LTO bootstrap).  I've hacked
`MCContext::Allocate()` to just call `malloc()` instead of using the
`BumpPtrAllocator` so that the heap profile is easier to read.  As far
as peak memory is concerned, `MCContext::Allocate()` is equivalent to a
leak, since it only gets freed at process teardown.

In my heap profile, this patch drops memory usage of
`DwarfDebug::emitDebugLoc()` from 132.56 MB (11.4%) down to 29.86 MB
(2.7%) at peak memory.  Some of that must be noise from `SmallVector`
(or other) allocations -- peak memory only dropped from 1160 MB down to
1100 MB -- but this nevertheless shaves 5% off the top.)

llvm-svn: 236629
2015-05-06 19:11:20 +00:00
Pawel Bylica 3b0adaf6b0 Readd the regression test from r236584. Calling convention fixed to linux.
llvm-svn: 236610
2015-05-06 16:43:21 +00:00
Pete Cooper d927c6eaf8 [ARM] Fast-Isel was incorrectly selecting <2 x double> adds.
With neon enabled, we reach SelectBinaryFPOp and are able to get registers for a <2 x double> add.

However, we shouldn't actually attempt arithmetic on it as ARMIselLowering says "v2f64 is legal so that QR subregs can be extracted as f64 elements, but neither Neon nor VFP support any arithmetic operations on it."

This commit disables SelectBinaryFPOp for any vector types.  There's already a FIXME to try handle neon.  Doing so would require fixing this conditional which isn't safe for vectors 'VT == MVT::f64 || VT == MVT::i64'

llvm-svn: 236609
2015-05-06 16:39:17 +00:00
Bill Schmidt 5fe2e25f7c [PPC64LE] Adjust vector splats during VSX swap optimization
The initial code drop for VSX swap optimization permitted the
optimization only when all operations in a web of related computation
are lane-insensitive.  For some lane-sensitive operations, we can
still permit the optimization provided that we make adjustments to
those operations.  This patch adds special handling for vector splats
so that their presence doesn't kill the optimization.

Vector splats are lane-sensitive since they identify by number a
vector element to be used as the source of a splat.  When swap
optimizations take place, the desired vector element will move to the
opposite doubleword of the quadword vector.  We thus replace the index
I by (I + N/2) % N, where N is the number of elements in the vector.

A new test case is added to test that swap optimization succeeds when
vector splats are present, and that the proper input element is used
as the source of the splat.

An ancillary change removes SH_BUILDVEC as one of the kinds of special
handling that may be required by VSX swap optimization.  From
experience with GCC, I had expected to need some modifications for
vector build operations, but I did not find that to be the case.

llvm-svn: 236606
2015-05-06 15:40:46 +00:00
Artyom Skrobov 3f8eae92a4 [ARM] generate VMAXNM/VMINNM for a compare followed by a select, in safe math mode too
llvm-svn: 236590
2015-05-06 11:44:10 +00:00
Pawel Bylica b25491faf4 Revert regression test from r236584.
Temporary remove a regression test added in r236584. It fails on Windows.

llvm-svn: 236586
2015-05-06 10:41:46 +00:00
Pawel Bylica 9f1fb9d1ef SelectionDAG: Handle out-of-bounds index in extract vector element
Summary: This patch correctly handles undef case of EXTRACT_VECTOR_ELT node where the element index is constant and not less than vector size.

Test Plan:
CodeGen for X86 test included.
Also one incorrect regression test fixed.

Reviewers: qcolombet, chandlerc, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D9250

llvm-svn: 236584
2015-05-06 10:19:14 +00:00
Ahmed Bougacha e8d0c4ccea [ARM][FastISel] Use TST #1 instead of CMP #0 for select.
Since r234249, i1 are sext instead of zext; because of that, doing
"CMP rN, #0; IT EQ/NE" isn't correct anymore.

"TST #1" is the conservatively correct alternative - the tradeoff being
that it doesn't have a 16-bit encoding -, so use that instead.

llvm-svn: 236569
2015-05-06 04:14:02 +00:00
Sanjoy Das 77b16b78e9 [Statepoints] Remove broken test case.
statepoint-indirect-return.ll breaks on linux systems.  Delete the test
case to make the bots green while I figure out what the right fix is.

llvm-svn: 236568
2015-05-06 02:51:46 +00:00
Sanjoy Das c6bf3e9f12 [StatepointLowering] Don't create temporary instructions. NFCI.
Summary:
Instead of creating a temporary call instruction and lowering that, use
SelectionDAGBuilder::lowerCallOperands.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9480

llvm-svn: 236563
2015-05-06 02:36:20 +00:00
Pete Cooper d0dae3e577 [X86 fast-isel] Constrain the index reg class to not include SP.
The index reg on instructions with complex address modes is a GPR64_NOSP.  Constrain it to appease the machine verifier.

llvm-svn: 236557
2015-05-05 23:41:53 +00:00
Pete Cooper ce9ad757c7 Fix IfConverter to handle regmask machine operands.
Note, this is a recommit of r236515 after fixing an error in r236514.  The buildbot ran fast enough that it picked up r236514 prior to r236515 and threw an error.  r236515 itself ran 'make check' without errors.

Original commit message follows:

A regmask (typically seen on a call) clobbers the set of registers it lists.  The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.

These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier.  Otherwise, uses after the if converted call could think they are reading an undefined register.

Reviewed by Matthias Braun and Quentin Colombet.

llvm-svn: 236550
2015-05-05 22:09:41 +00:00
Peter Collingbourne 85a0e23bc8 Thumb2SizeReduction: Check the correct set of registers for LDMIA.
The register set for LDMIA begins at offset 3, not 4. We were previously
missing the short encoding of this instruction in the case where the base
register was the first register in the register set.

Also clean up some dead code:

- The isARMLowRegister check is redundant with what VerifyLowRegs does;
  replace with an assert.
- Remove handling of LDMDB instruction, which has no short encoding (and
  does not appear in ReduceTable).

Differential Revision: http://reviews.llvm.org/D9485

llvm-svn: 236535
2015-05-05 20:07:10 +00:00
Ulrich Weigand 9958c489bb [DAGCombiner] Account for getVectorIdxTy() when narrowing vector load
This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
the element number from getVectorIdxTy() to PtrTy before doing pointer
arithmetic on it.  This is needed on z, where element numbers are i32
but pointers are i64.

Original patch by Richard Sandiford.

llvm-svn: 236530
2015-05-05 19:34:10 +00:00
Ulrich Weigand af2c618e2b [DAGCombiner] Fix ReplaceExtractVectorEltOfLoadWithNarrowedLoad for BE
For little-endian, the function would convert (extract_vector_elt (load X), Y)
to X + Y*sizeof(elt).  For big-endian it would instead use
X + sizeof(vec) - Y*sizeof(elt).  The big-endian case wasn't right since
vector index order always follows memory/array order, even for big-endian.
(Note that the current handling has to be wrong for Y==0 since it would
access beyond the end of the vector.)

Original patch by Richard Sandiford.

llvm-svn: 236529
2015-05-05 19:33:37 +00:00
Ulrich Weigand 2693c0a491 [LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal.
E.g. it would load a v4i8 as an i32 if i32 was legal.

This patch extends that behavior to promoted integers as well as legal ones.
If the integer type for the full vector width is TypePromoteInteger,
the element type is going to be TypePromoteInteger too, and it's still
better to use a single promoting load or truncating store rather than N
individual promoting loads or truncating stores.  E.g. if you have a v2i8
on a target where i16 is promoted to i32, it's better to load the v2i8 as
an i16 rather than load both i8s individually.

Original patch by Richard Sandiford.

llvm-svn: 236528
2015-05-05 19:32:57 +00:00
Ulrich Weigand c1708b2618 [SystemZ] Add vector intrinsics
This adds intrinsics to allow access to all of the z13 vector instructions.
Note that instructions whose semantics can be described by standard LLVM IR
do not get any intrinsics.

For each instructions whose semantics *cannot* (fully) be described, we
define an LLVM IR target-specific intrinsic that directly maps to this
instruction.

For instructions that also set the condition code, the LLVM IR intrinsic
returns the post-instruction CC value as a second result.  Instruction
selection will attempt to detect code that compares that CC value against
constants and use the condition code directly instead.

Based on a patch by Richard Sandiford.

llvm-svn: 236527
2015-05-05 19:31:09 +00:00
Ulrich Weigand 5211f9ff4d [SystemZ] Mark v1i128 and v1f128 as unsupported
The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be
passed in vector registers.  We do not yet support those types, and
some infrastructure is missing before we can do so.

In order to prevent accidentally generating code violating the ABI,
this patch adds checks to detect those types and error out if user
code attempts to use them.

llvm-svn: 236526
2015-05-05 19:30:05 +00:00
Ulrich Weigand cd2a1b5341 [SystemZ] Handle sub-128 vectors
The ABI allows sub-128 vectors to be passed and returned in registers,
with the vector occupying the upper part of a register.  We therefore
want to legalize those types by widening the vector rather than promoting
the elements.

The patch includes some simple tests for sub-128 vectors and also tests
that we can recognize various pack sequences, some of which use sub-128
vectors as temporary results.  One of these forms is based on the pack
sequences generated by llvmpipe when no intrinsics are used.

Signed unpacks are recognized as BUILD_VECTORs whose elements are
individually sign-extended.  Unsigned unpacks can have the equivalent
form with zero extension, but they also occur as shuffles in which some
elements are zero.

Based on a patch by Richard Sandiford.

llvm-svn: 236525
2015-05-05 19:29:21 +00:00
Ulrich Weigand 49506d78e7 [SystemZ] Add CodeGen support for scalar f64 ops in vector registers
The z13 vector facility includes some instructions that operate only on the
high f64 in a v2f64, effectively extending the FP register set from 16
to 32 registers.  It's still better to use the old instructions if the
operands happen to fit though, since the older instructions have a shorter
encoding.

Based on a patch by Richard Sandiford.

llvm-svn: 236524
2015-05-05 19:28:34 +00:00
Ulrich Weigand 80b3af7ab3 [SystemZ] Add CodeGen support for v4f32
The architecture doesn't really have any native v4f32 operations except
v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32
elements being used.  Even so, using vector registers for <4 x float>
and scalarising individual operations is much better than generating
completely scalar code, since there's much less register pressure.
It's also more efficient to do v4f32 comparisons by extending to 2
v2f64s, comparing those, then packing the result.

This particularly helps with llvmpipe.

Based on a patch by Richard Sandiford.

llvm-svn: 236523
2015-05-05 19:27:45 +00:00
Ulrich Weigand cd808237b2 [SystemZ] Add CodeGen support for v2f64
This adds ABI and CodeGen support for the v2f64 type, which is natively
supported by z13 instructions.

Based on a patch by Richard Sandiford.

llvm-svn: 236522
2015-05-05 19:26:48 +00:00
Ulrich Weigand ce4c109585 [SystemZ] Add CodeGen support for integer vector types
This the first of a series of patches to add CodeGen support exploiting
the instructions of the z13 vector facility.  This patch adds support
for the native integer vector types (v16i8, v8i16, v4i32, v2i64).

When the vector facility is present, we default to the new vector ABI.
This is characterized by two major differences:
- Vector types are passed/returned in vector registers
  (except for unnamed arguments of a variable-argument list function).
- Vector types are at most 8-byte aligned.

The reason for the choice of 8-byte vector alignment is that the hardware
is able to efficiently load vectors at 8-byte alignment, and the ABI only
guarantees 8-byte alignment of the stack pointer, so requiring any higher
alignment for vectors would require dynamic stack re-alignment code.

However, for compatibility with old code that may use vector types, when
*not* using the vector facility, the old alignment rules (vector types
are naturally aligned) remain in use.

These alignment rules are not only implemented at the C language level
(implemented in clang), but also at the LLVM IR level.  This is done
by selecting a different DataLayout string depending on whether the
vector ABI is in effect or not.

Based on a patch by Richard Sandiford.

llvm-svn: 236521
2015-05-05 19:25:42 +00:00
Pete Cooper 05b84d4168 Revert "Fix IfConverter to handle regmask machine operands."
This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515).

This is to get the bots green while i investigate the failures.

llvm-svn: 236517
2015-05-05 18:49:05 +00:00
Pete Cooper 6ebc207703 Fix IfConverter to handle regmask machine operands.
A regmask (typically seen on a call) clobbers the set of registers it lists.  The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.

These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier.  Otherwise, uses after the if converted call could think they are reading an undefined register.

Reviewed by Matthias Braun and Quentin Colombet.

llvm-svn: 236515
2015-05-05 18:31:36 +00:00
Reid Kleckner 0738a9c02e Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236360.

This change exposed a bug in WinEHPrepare by opting win32 code into EH
preparation. We already knew that WinEHPrepare has bugs, and is the
status quo for x64, so I don't think that's a reason to hold off on this
change. I disabled exceptions in the sanitizer tests in r236505 and an
earlier revision.

llvm-svn: 236508
2015-05-05 17:44:16 +00:00
Quentin Colombet 61b305edfd [ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false

true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false

false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret

With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

llvm-svn: 236507
2015-05-05 17:38:16 +00:00
Kit Barton d4eb73c00e This patch adds ABI support for v1i128 data type.
It adds v1i128 to the appropriate register classes and checks parameter passing
and return values.

This is related to http://reviews.llvm.org/D9081, which will add instructions
that exploit the v1i128 datatype.

Phabricator review: http://reviews.llvm.org/D9475

llvm-svn: 236503
2015-05-05 16:10:44 +00:00
Daniel Sanders eda60d217b [mips] Generate code for insert/extract operations when using the N64 ABI and MSA.
Summary:
When using the N64 ABI, element-indices use the i64 type instead of i32.
In many cases, we can use iPTR to account for this but additional patterns
and pseudo's are also required.

This fixes most (but not quite all) failures in the test-suite when using
N64 and MSA together.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9342

llvm-svn: 236494
2015-05-05 10:32:24 +00:00
Daniel Sanders 4160c802d9 [mips][msa] Test basic operations for the N32 ABI too.
Summary:
This required adding instruction aliases for dneg.

N64 will be enabled shortly but requires additional bugfixes.

Reviewers: vkalintiris

Reviewed By: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9341

llvm-svn: 236489
2015-05-05 08:48:35 +00:00
Reid Kleckner 9dad227b85 [X86] Fix assertion while DAG combining offsets and ExternalSymbols
ExternalSymbol nodes do not contain offsets, unlike GlobalValue nodes.

llvm-svn: 236471
2015-05-04 23:22:36 +00:00
Sanjay Patel ec2d7358b9 zap windows line endings; NFC
llvm-svn: 236460
2015-05-04 21:27:27 +00:00
Tim Northover 851ff69b42 CodeGen: match up correct insertvalue indices when assessing tail calls.
When deciding whether a value comes from the aggregate or inserted value of an
insertvalue instruction, we compare the indices against those of the location
we're interested in. One of the lists needs reversing because the input data is
backwards (so that modifications take place at the end of the SmallVector), but
we were reversing both before leading to incorrect results.

Should fix PR23408

llvm-svn: 236457
2015-05-04 20:41:51 +00:00
Elena Demikhovsky d41e506342 AVX-512: added a test for encoding
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236421
2015-05-04 12:59:15 +00:00
Elena Demikhovsky 60eb9db7bb AVX-512: added calling convention for i1 vectors in 32-bit mode.
Fixed some bugs in extend/truncate for AVX-512 target.
Removed VBROADCASTM (masked broadcast) node, since it is not used any more.

llvm-svn: 236420
2015-05-04 12:40:50 +00:00
Elena Demikhovsky 52266388f8 AVX-512: added integer "add" and "sub" instructions with saturation for SKX
with intrinsics and tests

by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236418
2015-05-04 12:35:55 +00:00
Elena Demikhovsky 2557a22be7 AVX-512: Added VPACK* instructions forms for KNL and SKX
and their intrinsics
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236414
2015-05-04 09:14:02 +00:00
Elena Demikhovsky 1b60ed7069 Masked gather and scatter intrinsics - enabled codegen for KNL.
llvm-svn: 236394
2015-05-03 07:12:25 +00:00
Simon Pilgrim 017ca19384 [DAGCombiner] Enabled vector float/double -> int constant folding
llvm-svn: 236387
2015-05-02 13:04:07 +00:00
Simon Pilgrim e170a4f5fa Line ending fix
llvm-svn: 236386
2015-05-02 11:50:47 +00:00
Simon Pilgrim 7d6df82dd1 [SSE] Added vector int (i32 and i64) -> float/double conversion tests
llvm-svn: 236385
2015-05-02 11:42:47 +00:00
Simon Pilgrim 6e3b7bad11 [SSE] Added vector float/double -> i32 and i64 conversion tests
llvm-svn: 236384
2015-05-02 11:18:47 +00:00
Eric Christopher a2d44dee73 Rework test to use FileCheck by making sure we have no xmm registers
with numbers.

llvm-svn: 236373
2015-05-02 01:06:17 +00:00
Reid Kleckner 83d89fa546 Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236359. Things are still broken despite testing. :(

llvm-svn: 236360
2015-05-01 22:50:14 +00:00
Reid Kleckner 51476acd77 Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86"
This reverts commit r236340.

llvm-svn: 236359
2015-05-01 22:40:25 +00:00
Colin LeMahieu bb0d7cbee1 [Hexagon] r236351 fix does not work on builder configurations yet.
llvm-svn: 236358
2015-05-01 22:39:20 +00:00
Quentin Colombet 0de2346859 [AArch64][FastISel] Variant of the logical instructions that use two input
registers cannot write on SP.

rdar://problem/20748715

llvm-svn: 236352
2015-05-01 21:34:57 +00:00