Commit Graph

34654 Commits

Author SHA1 Message Date
Craig Topper 87990ee4ec [X86] Remove special validation for INT immediate operand from AsmParser. Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior.
This also fixes a bug where negative immediates below -128 were not being reported as errors.

llvm-svn: 249989
2015-10-11 18:27:24 +00:00
Craig Topper a71630729d [X86] Simplify immediate range checking code.
llvm-svn: 249979
2015-10-11 16:38:14 +00:00
Simon Pilgrim 52d47e5704 [X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).

llvm-svn: 249976
2015-10-11 14:15:17 +00:00
Craig Topper 55b1f29203 Change isUIntN/isIntN calls with constant N to use the template version. NFC
llvm-svn: 249952
2015-10-10 20:17:07 +00:00
Jonas Paulsson 63a2b6862e [SystemZ] Fixes in the backend I/R.
expandPostRAPseudo():
STX -> 2 * STD: The first STD should not have the kill flag set for the address.

SystemZElimCompare:
BRC -> BRCT conversion: Don't forget to remove the CC<use,kill> operand.

Needed to make SystemZ/asm-17.ll pass with -verify-machineinstrs, which
now runs with this flag.

Reviewed by Ulrich Weigand.

llvm-svn: 249945
2015-10-10 07:14:24 +00:00
Craig Topper 7143d8001a Use range-based for loops. NFC.
llvm-svn: 249941
2015-10-10 05:25:06 +00:00
Craig Topper 7d5b23101c Use emplace_back instead of a constructor call and push_back. NFC
llvm-svn: 249940
2015-10-10 05:25:02 +00:00
David Majnemer bfa5b98201 [WinEH] Remove more dead code
wineh-parent is dead, so is ValueOrMBB.

llvm-svn: 249920
2015-10-10 00:04:29 +00:00
Reid Kleckner 14e773500e [WinEH] Delete the old landingpad implementation of Windows EH
The new implementation works at least as well as the old implementation
did.

Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.

llvm-svn: 249918
2015-10-09 23:34:53 +00:00
David Majnemer 35d27b21a1 [WinEH] Insert the catchpad return before CSR restoration
x64 catchpads use rax to inform the unwinder where control should go
next.  However, we must initialize rax before the epilogue sequence so
as to not perturb the unwinder.

llvm-svn: 249910
2015-10-09 22:18:45 +00:00
James Y Knight 692e037499 Fix assert when emitting llvm.pow.f86.
This occurred due to introducing the invalid i64 type after type
legalization had already finished, in an attempt to workaround bitcast
f64 -> v2i32 not doing constant folding.

The *right* thing is to actually fix bitcast, but that has other
complications. So, for now, just get rid of the broken workaround, and
check in a test-case showing that it doesn't crash, with TODOs for
emitting proper code.

llvm-svn: 249908
2015-10-09 21:36:19 +00:00
James Y Knight 5b8217bc05 Fix assert in X86 backend.
When running combine on an extract_vector_elt, it wants to look through
a bitcast to check if the argument to the bitcast was itself an
extract_vector_elt with particular operands.

However, it called getOperand() on the argument to the bitcast *before*
checking that the opcode was EXTRACT_VECTOR_ELT, assert-failing if there
were zero operands for the actual opcode.

Fix, and add trivial test.

llvm-svn: 249891
2015-10-09 20:10:14 +00:00
Dan Gohman ee1588ce96 [WebAssembly] Rename floating-point operators to match their spec names.
llvm-svn: 249859
2015-10-09 17:50:00 +00:00
Duncan P. N. Exon Smith 769e1a972d AArch64: Make getNextNode() cleanup in r249764 more clear
After r249764, if you didn't see the full context, it looked like
`std::next(I)` would get the same result as
`++MachineBasicBlock::iterator(I)`.  However, `I` is a `MachineInstr*`
(not a `MachineBasicBlock::iterator`).

Use the `getIterator()` helper I added later (r249782) to make this code
more clear.

llvm-svn: 249852
2015-10-09 16:54:54 +00:00
Jun Bum Lim 0aace13d18 Improve ISel across lane float min/max reduction
In vectorized float min/max reduction code, the final "reduce" step
is sub-optimal. In AArch64, this change wll combine :

  svn0 = vector_shuffle t0, undef<2,3,u,u>
  fmin = fminnum t0,svn0
  svn1 = vector_shuffle fmin, undef<1,u,u,u>
  cc = setcc fmin, svn1, ole
  n0 = extract_vector_elt cc, #0
  n1 = extract_vector_elt fmin, #0
  n2 = extract_vector_elt fmin, #1
  result = select n0, n1,n2
into :
  result = llvm.aarch64.neon.fminnmv t0

This change extends r247575.

llvm-svn: 249834
2015-10-09 14:11:25 +00:00
Jonas Paulsson ee3685fd45 [SystemZ] Remove unused code in SystemZElimCompare.cpp
The Reference IndirectDef and IndirectUse members were unused and therefore
removed.

llvm-svn: 249824
2015-10-09 11:27:44 +00:00
Nemanja Ivanovic d389657399 Vector element extraction without stack operations on Power 8
This patch corresponds to review:
http://reviews.llvm.org/D12032

This patch builds onto the patch that provided scalar to vector conversions
without stack operations (D11471).
Included in this patch:

    - Vector element extraction for all vector types with constant element number
    - Vector element extraction for v16i8 and v8i16 with variable element number
    - Removal of some unnecessary COPY_TO_REGCLASS operations that ended up
      unnecessarily moving things around between registers

Not included in this patch (will be in upcoming patch):

    - Vector element extraction for v4i32, v4f32, v2i64 and v2f64 with
      variable element number
    - Vector element insertion for variable/constant element number

Testing is provided for all extractions. The extractions that are not
implemented yet are just placeholders.

llvm-svn: 249822
2015-10-09 11:12:18 +00:00
Jonas Paulsson 5b3bab40b2 [SystemZ] Remove superfluous braces in SystemZShortenInst.cpp
llvm-svn: 249812
2015-10-09 07:19:20 +00:00
Jonas Paulsson 18d877f79b [SystemZ] Minor bugfixes.
LLCH, LLHH and CLIH had the wrong register classes for the def-operand.
Tie operands if changing opcode to an instruction with tied ops.
Comment typo fix.

These fixes were needed in order to make regression test case
SystemZ/asm-18.ll pass with -verify-machineinstrs (not used by
default).

Reviewed by Ulrich Weigand.

llvm-svn: 249811
2015-10-09 07:19:16 +00:00
Jonas Paulsson 0a9049ba82 [SystemZ] Bugfix in SystemZAsmParser.cpp.
Let parseRegister() allow RegFP Group if expecting RegV Group, since the
%f register prefix yields the FP group even while used with vector instructions.

Reviewed by Ulrich Weigand.

llvm-svn: 249810
2015-10-09 07:19:12 +00:00
Saleem Abdulrasool 1825fac3c9 ARM: tweak WoA frame lowering
Accept r11 when targeting Windows on ARM rather than just low registers.
Because we are in a thumb-2 only mode, this may be slightly more expensive in
code size, but results in better code for the environment since it spills the
frame register, which is generally desired for fast stack walking as per the
ABI.

llvm-svn: 249804
2015-10-09 03:19:03 +00:00
Reid Kleckner ae44e871cd Revert "Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"""
This reverts commit r249794.

Apparently my checkouts are full of unexpected surprises today.

llvm-svn: 249796
2015-10-09 01:13:17 +00:00
Reid Kleckner b510401785 Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64""
This reverts commit r249032.

TODO write commit msg

llvm-svn: 249794
2015-10-09 01:11:37 +00:00
Duncan P. N. Exon Smith d389165c14 AArch64: Stop using MachineInstr::getNextNode()
Stop using `getNextNode()` to get an insertion point (at least, in this
one place).  Instead, use iterator logic directly.

The `getNextNode()` interface isn't actually supposed to work for
creating iterators; it's supposed to return `nullptr` (not a real
iterator) if this is the last node.  It's currently broken and will
"happen" to work, but if we ever fix the function, we'll get some
strange failures in places like this.

llvm-svn: 249764
2015-10-08 22:43:26 +00:00
Duncan P. N. Exon Smith a3da44882f PowerPC: Don't use getNextNode() for insertion point
Stop using `getNextNode()` to create an insertion point for machine
instructions (at least, in this one place).  Instead, use an iterator.
As a drive-by, clean up dump statements to use iterator logic.

The `getNextNode()` interface isn't actually supposed to work for
insertion points; it's supposed to return `nullptr` if this is the last
node.  It's currently broken and will "happen" to work, but if we ever
fix the function, we'll get some strange failures.

llvm-svn: 249758
2015-10-08 22:20:37 +00:00
Evgeniy Stepanov 5fe279e727 Add Triple::isAndroid().
This is a simple refactoring that replaces Triple.getEnvironment()
checks for Android with Triple.isAndroid().

llvm-svn: 249750
2015-10-08 21:21:24 +00:00
Eric Christopher 11e5983658 Move the MMX subtarget feature out of the SSE set of features and into
its own variable.

This is needed so that we can explicitly turn off MMX without turning
off SSE and also so that we can diagnose feature set incompatibilities
that involve MMX without SSE.

Rationale:

// sse3
__m128d test_mm_addsub_pd(__m128d A, __m128d B) {
  return _mm_addsub_pd(A, B);
}

// mmx
void shift(__m64 a, __m64 b, int c) {
  _mm_slli_pi16(a, c);
  _mm_slli_pi32(a, c);
  _mm_slli_si64(a, c);
  _mm_srli_pi16(a, c);
  _mm_srli_pi32(a, c);
  _mm_srli_si64(a, c);
  _mm_srai_pi16(a, c);
  _mm_srai_pi32(a, c);
}

clang -msse3 -mno-mmx file.c -c

For this code we should be able to explicitly turn off MMX
without affecting the compilation of the SSE3 function and then
diagnose and error on compiling the MMX function.

This matches the existing gcc behavior and follows the spirit of
the SSE/MMX separation in llvm where we can (and do) turn off
MMX code generation except in the presence of intrinsics.

Updated a couple of tests, but primarily tested with a couple of tests
for turning on only mmx and only sse.

This is paired with a patch to clang to take advantage of this behavior.

llvm-svn: 249731
2015-10-08 20:10:06 +00:00
Alexei Starovoitov 87f83e6926 [bpf] Do not expand UNDEF SDNode during insn selection lowering
o Before this patch, BPF backend will expand UNDEF node
    to i64 constant 0.
  o For second pass of dag combiner, legalizer will run through
    each to-be-processed dag node.
  o If any new SDNode is generated and has an undef operand,
    dag combiner will put undef node, newly-generated constant-0 node,
    and any node which uses these nodes in the working list.
  o During this process, it is possible undef operand is
    generated again, and this will form an infinite loop
    for dag combiner pass2.
  o This patch allows UNDEF to be a legal type.

Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
llvm-svn: 249718
2015-10-08 18:52:40 +00:00
Reid Kleckner b2244cb8f0 [WinEH] Relax assertion in the presence of stack realignment
The code is correct as is, but we should test it.

llvm-svn: 249715
2015-10-08 18:41:52 +00:00
Ulrich Weigand f4d14f781f [SystemZ] Fix another assertion failure in tryBuildVectorShuffle
This fixes yet another scenario where tryBuildVectorShuffle would
attempt to create a BUILD_VECTOR node with an invalid combination
of types.  This can happen if the incoming BUILD_VECTOR has elements
of a type different from the vector element type, which is allowed
in certain cases as long as they are all the same type.

When one of these elements is used in the residual vector, and
UNDEF elements are added to fill up the residual vector, those
UNDEFs then have to use the type of the original element, not
the vector element type, or else the resulting BUILD_VECTOR
will have an invalid type combination.

llvm-svn: 249706
2015-10-08 17:46:59 +00:00
Frederic Riss 263b772bda [X86] Disable X86CallFrameOptimization on Darwin in presence of EH
We emit 1 compact unwind encoding per function, and this can’t represent
the varying stack pointer that will be generated by X86CallFrameOptimization.
Disable the optimization on Darwin.

(It might be possible to split the function into multiple ranges
and emit 1 compact unwind info per range. The compact unwind emission
code isn’t ready for that and this kind of info certainly isn’t
tested/used anywhere. It might be worth exploring this path if we want
to get the space savings at some point though)

llvm-svn: 249694
2015-10-08 15:45:08 +00:00
Igor Breger defab3c1ef AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation.
This instructions doesn't have intrincis.
Added tests for lowering and encoding.

Differential Revision: http://reviews.llvm.org/D12317

llvm-svn: 249688
2015-10-08 12:55:01 +00:00
Michael Kuperstein 04e79329d0 [X86] Fix wrong treatment of multi-lane blends in BUILD_VECTORtoBlendMask()
This fixes two separate bugs:
1) The mask for the high lane was not set correctly. That fixes PR24532.
2) The transformation should bail out if it believes it involves more than
2 lanes, as it does not currently do anything sensible in this case.

Differential Revision: http://reviews.llvm.org/D13505

llvm-svn: 249669
2015-10-08 08:13:02 +00:00
Jonas Paulsson 5d3fbd3733 [SystemZ] SystemZElimCompare pass improved.
Compare elimination extended to recognize load-and-test instructions used
for comparison and eliminate them the same way as with compare instructions.

Test case fp-cmp-05.ll updated to expect optimized results now also for z13.

The order of instruction shortening and compare elimination passes have been
changed so that opcodes do not have to be handled in both passes.

Reviewed by Ulrich Weigand.

llvm-svn: 249666
2015-10-08 07:40:23 +00:00
Jonas Paulsson 29d9d8d955 [SystemZ] Bugfix: check CC reg liveness in SystemZShortenInst.
The following instruction shortening transformations would introduce a
definition of the CC reg, so therefore liveness of CC reg must be checked:

WFADB -> ADBR
WFSDB -> SDBR

Also add the CC reg implicit def operand to the MI in case of change of opcode.

Reviewed by Ulrich Weigand.

llvm-svn: 249665
2015-10-08 07:40:19 +00:00
Jonas Paulsson 7c5ce10a07 [SystemZ] Use load-and-test for fp compare with 0 if vector support is present.
Since the LTxBRCompare instructions can't be used with vector registers, a
normal load-and-test instruction (with a modelled def operand) is used instead.

Reviewed by Ulrich Weigand.

llvm-svn: 249664
2015-10-08 07:40:16 +00:00
Jonas Paulsson 2c96dd64fc [SystemZ] More minor fixing in SystemZElimCompare.cpp
Don't use subreg indices since they are not used after regalloc.

Reviewed by Ulrich Weigand.

llvm-svn: 249663
2015-10-08 07:40:11 +00:00
Jonas Paulsson 9e1f3bd1bd [SystemZ] Minor fixes in SystemZElimCompare.cpp
Reviewed by Ulrich Weigand.

llvm-svn: 249662
2015-10-08 07:39:55 +00:00
Justin Bogner 468c998031 CodeGen: print and verify after TargetPassConfig::insertPass by default
In r224059, we started verifying after addPass, but missed doing so on
insertPass. There isn't a good reason for the discrepancy, and
skipping the verifier in these cases causes bugs.

This also exposes a verifier error that was introduced in r249087, but
the verifier doesn't run until after the register coalescer, when the
issue happens to have been resolved. I've skipped the verifier after
SIFixSGPRLiveRangesID to avoid the failures for now and will follow up
with Matt for a proper fix.

llvm-svn: 249643
2015-10-08 00:36:22 +00:00
Reid Kleckner 97797419e6 [WinEH] Fix 32-bit funclet epilogues in the presence of dynamic allocas
In particular, passing non-trivially copyable objects by value on win32
uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue
and end up returning to outer space.

llvm-svn: 249637
2015-10-07 23:55:01 +00:00
Rafael Espindola 284093033f git-clang-format r249548.
Sorry for missing this the first time.

llvm-svn: 249610
2015-10-07 20:32:24 +00:00
Vasileios Kalintiris b876b58d38 [mips][FastISel] Factor out common code from switch statement. NFC
llvm-svn: 249603
2015-10-07 20:06:30 +00:00
Vasileios Kalintiris 6ae1b35cda [mips][FastISel] Use ternary operator to select opcode. NFC
llvm-svn: 249594
2015-10-07 19:43:31 +00:00
Vasileios Kalintiris daad571ba4 [mips][FastISel] Simple refactoring of MipsFastISel::emitLogicalOP(). NFC.
llvm-svn: 249580
2015-10-07 18:14:24 +00:00
Chad Rosier 7c6ac2b8f9 [AArch64] Fold a floating-point divide by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249579
2015-10-07 17:51:37 +00:00
Matt Arsenault fc0ad42516 AMDGPU: Fix missing implicit m0 uses on movrel instructions
llvm-svn: 249577
2015-10-07 17:46:32 +00:00
Chad Rosier fa30c9b436 [AArch64] Fold a floating-point multiply by power of two into fp conversion.
Part of http://reviews.llvm.org/D13442

llvm-svn: 249576
2015-10-07 17:39:18 +00:00
Chad Rosier 169865ffda [ARM] Promote helper function to SelectionDAG.
I'll be using the function in a similar combine for AArch64.  The helper was
also improved to handle undef values.

Part of http://reviews.llvm.org/D13442

llvm-svn: 249572
2015-10-07 17:28:58 +00:00
Kevin B. Smith 9c7408807f Test commit access. Fixed comment to have correct input parameter name and
period termination.

llvm-svn: 249571
2015-10-07 17:24:25 +00:00
Oliver Stannard d3d114ba54 [ARM] Use correct half-precision functions in EABI mode
The ARM RTABI defines the half- to single-precision float conversion functions
with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore
we need to emit the __aeabi version when compiling with an eabi or eabihf
triple, and the __gnu version with a gnueabi or gnueabihf triple.

llvm-svn: 249565
2015-10-07 16:58:49 +00:00
Chad Rosier 17436bf64e [ARM] Prevent PerformVDIVCombine from combining a vcvt/vdiv with 8 lanes.
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 249560
2015-10-07 16:15:40 +00:00
Jeroen Ketema aebca09543 [ARM][AArch64] Only lower to interleaved load/store if the target has NEON
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.

Differential Revision: http://reviews.llvm.org/D13508

llvm-svn: 249550
2015-10-07 14:53:29 +00:00
Rafael Espindola 30d77777e7 Use non virtual destructors for sections.
llvm-svn: 249548
2015-10-07 13:46:06 +00:00
Chad Rosier db71abf2d4 [ARM] Push more complex check down to reduce compile time. NFC.
llvm-svn: 249547
2015-10-07 13:40:44 +00:00
Rafael Espindola 665b0d3a4e Don't repeat names in comments and don't indent in namespaces. NFC.
llvm-svn: 249546
2015-10-07 13:38:49 +00:00
Scott Egerton 9004cc7942 Revert: r249536 - Testing commit access with a trival whitespace change.
llvm-svn: 249537
2015-10-07 10:57:06 +00:00
Scott Egerton be6b54b691 Testing commit access with a trival whitespace change.
llvm-svn: 249536
2015-10-07 10:49:49 +00:00
Michael Kuperstein 259f1508f0 [X86] Emit .cfi_escape GNU_ARGS_SIZE when adjusting the stack before calls
When outgoing function arguments are passed using push instructions, and EH
is enabled, we may need to indicate to the stack unwinder that the stack
pointer was adjusted before the call.

This should fix the exception handling issues in PR24792.

Differential Revision: http://reviews.llvm.org/D13132

llvm-svn: 249522
2015-10-07 07:01:31 +00:00
Igor Breger 1a6fd1cc0f AVX512: Change encoding of vpshuflw and vpshufhw instructions. Implement WIG as W0 and not W1, like all other instruction have been implemented.
Add encoding tests.

Differential Revision: http://reviews.llvm.org/D13471

llvm-svn: 249521
2015-10-07 06:31:18 +00:00
Matt Arsenault 10e6a61892 AMDGPU: Add comment for VOP2b operand class
Because of the constant bus requirement, it is never legal to
use a literal constant for these instructions despite the encoding
allowing it. This was already doing the right thing, but note why.

llvm-svn: 249500
2015-10-07 01:36:00 +00:00
Matt Arsenault 187276fa94 AMDGPU: Properly register passes
llvm-svn: 249495
2015-10-07 00:42:53 +00:00
Matt Arsenault 284192730a AMDGPU: Use explicit register size indirect pseudos
This stops using an unknown reg class operand.

Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.

With the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will take care
of normally later, which will allow removing the weird check
of build_vector users. Without this, when removed v_movrels_b32 would
still be emitted even though all of the values were only stored in
SGPRs.

llvm-svn: 249494
2015-10-07 00:42:51 +00:00
Matt Arsenault 922b7bf808 AMDGPU: Remove inferRegClassFromUses / inferRegClassFromDefs
I'm not sure why this would be necessary, and no tests fail with
them removed. Looking at the uses is suspect as well because
the use reg classes will likely change when the users are moved
as a result of moving this instruction.

llvm-svn: 249493
2015-10-07 00:42:31 +00:00
Hans Wennborg 083ca9bb32 Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups.
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D13321

llvm-svn: 249482
2015-10-06 23:24:35 +00:00
Tom Stellard 0fbf899c0f AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments()
Summary:
We currently ignore the calling convention, so there is no real reason to
assert on the calling convention of functions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13367

llvm-svn: 249468
2015-10-06 21:16:34 +00:00
Chad Rosier dca46b426f [ARM] Minor refactoring. NFC.
llvm-svn: 249465
2015-10-06 20:58:42 +00:00
Chad Rosier aed910b7d7 [ARM] Minor refactoring. NFC.
llvm-svn: 249464
2015-10-06 20:51:26 +00:00
Chad Rosier 9df4aff86d [ARM] Minor refactoring. NFC.
llvm-svn: 249463
2015-10-06 20:45:45 +00:00
Joseph Tremoulet 2afea5438f [WinEH] Recognize CoreCLR personality function
Summary:
 - Add CoreCLR to if/else ladders and switches as appropriate.
 - Rename isMSVCEHPersonality to isFuncletEHPersonality to better
   reflect what it captures.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13449

llvm-svn: 249455
2015-10-06 20:28:16 +00:00
Chad Rosier a087fd21da [ARM] Minor refactoring to improve readability. NFC.
llvm-svn: 249454
2015-10-06 20:23:42 +00:00
Krzysztof Parzyszek 8d2b2cfa29 [Hexagon] Remove ZeroOrMore from option flags
llvm-svn: 249438
2015-10-06 18:29:36 +00:00
Tom Stellard 88e0b25181 AMDGPU/SI: Add 64-bit versions of v_nop and v_clrexcp
Summary:
The assembly printing of these is still missing the encoding size
suffix, but this will be fixed in a later commit.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13436

llvm-svn: 249424
2015-10-06 15:57:53 +00:00
Krzysztof Parzyszek fb33824efd [Hexagon] Add an early if-conversion pass
llvm-svn: 249423
2015-10-06 15:49:14 +00:00
Daniel Sanders 1b3341724c [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Summary:
This fixes 7 tests during fast LLVM test-suite run:
* MultiSource/Benchmarks/McCat/18-imp/imp
* MultiSource/Applications/oggenc/oggenc
* MultiSource/Benchmarks/MallocBench/gs/gs
* MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer
* MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame
* MultiSource/Benchmarks/Bullet/bullet

Error message was in the form of:
fatal error: error in backend: Cannot select: 0x95c3288: f32 = fsqrt 0x95c0190 [ORD=9] [ID=18]
  0x95c0190: f32 = fadd 0x95bef30, 0x95c4d00 [ORD=8] [ID=17]
    0x95bef30: f32 = fmul 0x95c4988, 0x95c4988 [ORD=5] [ID=16]
...

There was problem with selecting sqrt instruction in LLVM backend.

To fix the issue changes are made in TableGen definition for sqrt instruction in MipsInstrFPU.td and new test file sqrt.ll is added to LLVM regression tests.

Patch by Zlatko Buljan

Reviewers: zoran.jovanovic, hvarga, dsanders

Subscribers: llvm-commits, petarj

Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249416
2015-10-06 15:17:25 +00:00
Daniel Sanders add9057fa7 Revert r249123 - [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
The author was not credited and most of the commit message is missing. Will re-commit with this fixed.

llvm-svn: 249415
2015-10-06 15:13:16 +00:00
Alexei Starovoitov 4e01a38da0 [bpf] Avoid extra pointer arithmetic for stack access
For the program like below
struct key_t {
  int pid;
  char name[16];
};
extern void test1(char *);
int test() {
  struct key_t key = {};
  test1(key.name);
  return 0;
}
For key.name, the llc/bpf may generate the below code:
  R1 = R10  // R10 is the frame pointer
  R1 += -24 // framepointer adjustment
  R1 |= 4   // R1 is then used as the first parameter of test1
OR operation is not recognized by in-kernel verifier.

This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
  R1 = R10
  R1 += -20

Patch by Yonghong Song <yhs@plumgrid.com>

llvm-svn: 249371
2015-10-06 04:00:53 +00:00
Craig Topper 79dd1bf094 [X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted.
Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.

llvm-svn: 249370
2015-10-06 02:50:24 +00:00
Craig Topper d69d495333 [X86] Remove unnecessary AddComplexity directive. The instruction is already wrapped in the equivalent earlier. NFC
llvm-svn: 249369
2015-10-06 02:50:21 +00:00
Dan Gohman e51c058ecc [WebAssembly] Switch to a more traditional assembly syntax
This new syntax is built around putting each instruction on its own line
in a "mnemonic op, op, op" like syntax. It also uses conventional data
section directives like ".byte" and so on rather than requiring everything
to be in hierarchical S-expression format. This is a more natural syntax
for a ".s" file format from the perspective of LLVM MC and related tools,
while remaining easy to translate into other forms as needed.

llvm-svn: 249364
2015-10-06 00:27:55 +00:00
David Majnemer e4f9b09b51 [WinEH] Update CATCHRET's operand to match its successor
The CATCHRET operand did not match the MachineFunction's CFG.  This
mismatch happened because FrameLowering created a new MachineBasicBlock
and updated the CFG but forgot to update the CATCHRET operand.

Let's make sure this doesn't happen again by strengthing the funclet
membership analysis: it can now reason about the membership of all basic
blocks, not just those inside of funclets.

llvm-svn: 249344
2015-10-05 20:09:16 +00:00
Tom Stellard d585cd85a3 AMDGPU/SI: Add a helper for creating aliases for the _e32 instructions
Summary:
We are currently only using these aliases for VOPC instructions,
but this helper will make it easier to use them everywhere.

These aliases allow for the automatic matching of instructions
with forced 32-bit encoding.  Eventually, we should be able to remove
the custom C++ logic we have for this in the assembler.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13396

llvm-svn: 249330
2015-10-05 17:57:39 +00:00
Scott Douglass 953f908173 [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.

Fixes PR9199 and PR23768.

[This is based on Peter Collingbourne's r238473 which was reverted.]

Differential Revision: http://reviews.llvm.org/D13239

Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
2015-10-05 14:49:54 +00:00
Zoran Jovanovic 5a8dffc618 [mips][microMIPS] Implement JALRC16, JRCADDIUSP and JRC16 instructions
Differential Revision: http://reviews.llvm.org/D11219

llvm-svn: 249317
2015-10-05 14:00:09 +00:00
Alexandros Lamprineas 1bab191f25 [MC layer][AArch64] llvm-mc accepts 4-bit immediate values for
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.

Differential Revision: http://reviews.llvm.org/D13011

llvm-svn: 249313
2015-10-05 13:42:31 +00:00
Daniel Sanders d5a89418c5 [mips] Changed the way symbols are handled in dla and la instructions to allow simple expressions.
Summary:
An instruction like "(d)la $5, symbol+8" previously would have crashed the
assembler as it contains an expression. This is now fixed.
A few tests cases have also been changed to reflect these changes, however
these should only be syntax changes. Some new test cases have also been
added.

Patch by Scott Egerton.

Reviewers: vkalintiris, dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12760

llvm-svn: 249311
2015-10-05 13:19:29 +00:00
Rafael Espindola e3a20f57d9 Fix pr24486.
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.

With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.

In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.

This patch is a joint work by Maxim Ostapenko andy myself.

llvm-svn: 249303
2015-10-05 12:07:05 +00:00
Joerg Sonnenberger 726e624c0c [SPARCv9] Add support for the rdpr/wrpr instructions.
llvm-svn: 249262
2015-10-04 09:11:22 +00:00
Igor Breger 78741a1b1e AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12690

llvm-svn: 249261
2015-10-04 07:20:41 +00:00
Jeroen Ketema 321fc30afc Fix typo in README
llvm-svn: 249253
2015-10-04 00:46:16 +00:00
Simon Pilgrim bc707d04a4 [X86] Lower SEXTLOAD using SIGN_EXTEND_VECTOR_INREG. NCI.
The custom lowering in LowerExtendedLoad is doing the equivalent shuffle, so make use of existing lowering code to reduce duplication.

llvm-svn: 249243
2015-10-03 18:55:43 +00:00
Dan Gohman dc51b96b7f [WebAssembly] Implement the remaining conversion operations.
This is a temporary assembly syntax that will likely evolve along with
broader upcoming syntax changes.

llvm-svn: 249225
2015-10-03 02:10:28 +00:00
Tom Stellard dc9088a10e AMDGPU/SI: Remove unused tablegen multiclass
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13395

llvm-svn: 249221
2015-10-03 00:29:50 +00:00
Dan Gohman 6a050f30de [WebAssembly] Rename setlocal to set_local to match the spec.
llvm-svn: 249218
2015-10-03 00:01:53 +00:00
Dan Gohman e3e4a5ff52 [WebAssembly] Fix CFG stackification of nested loops.
llvm-svn: 249187
2015-10-02 21:11:36 +00:00
Dan Gohman 9cc692b06e [WebAssembly] Support calls marked as "tail", fastcc, and coldcc.
llvm-svn: 249184
2015-10-02 20:54:23 +00:00
Dan Gohman baba8c648b [WebAssembly] Add a resize_memory intrinsic.
llvm-svn: 249178
2015-10-02 20:10:26 +00:00
Dan Gohman 72f1692a2c [WebAssembly] Add a memory_size intrinsic.
llvm-svn: 249171
2015-10-02 19:21:15 +00:00
Matt Arsenault d092a068ba AMDGPU/SI: Add verifier check for exec reads
Make sure we aren't accidentally not setting
these in the instruction definitions.

llvm-svn: 249170
2015-10-02 18:58:37 +00:00
Roman Divacky 4b5507a037 Actually switch the arch when we see .arch. PR21695
llvm-svn: 249165
2015-10-02 18:25:25 +00:00
Tim Northover 8d67b8e053 ARM: diagnose invalid local fixups on Thumb1
We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.

llvm-svn: 249164
2015-10-02 18:07:18 +00:00
Tim Northover 956b008db6 ARM: correctly align constant pool value on Thumb1 targets.
Since we're using tLDRpci to access it, the constant pool's address must be 0
(mod 4).

llvm-svn: 249163
2015-10-02 18:07:13 +00:00
Chad Rosier 1f385618c0 [ARM] Typo. NFC.
llvm-svn: 249153
2015-10-02 16:42:59 +00:00
Andrea Di Biagio 77f62652c1 Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types."
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Originally reviewed here: http://reviews.llvm.org/D13347

llvm-svn: 249147
2015-10-02 16:08:05 +00:00
Andrea Di Biagio 45874e67a1 Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
r249121 caused a Clang test failure (avx2-buitins.c).
Revert r249121 while I keep investigating on the reason why that test failed.

llvm-svn: 249124
2015-10-02 13:06:19 +00:00
Zoran Jovanovic 9ffdfa5986 [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249123
2015-10-02 13:06:02 +00:00
Andrea Di Biagio cb33456122 [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Differential Revision: http://reviews.llvm.org/D13347

llvm-svn: 249121
2015-10-02 12:45:37 +00:00
Matt Arsenault b733f00510 AMDGPU: Fix unused variable warning in release build
llvm-svn: 249091
2015-10-01 22:40:35 +00:00
Matt Arsenault b87fc22915 AMDGPU: Move SIFixSGPRLiveRanges to be a regalloc pass
Replace LiveInterval usage with LiveVariables. LiveIntervals
computes far more information than is needed for this pass
which just needs to find if an SGPR is live out of the
defining block.

LiveIntervals are not usually available that early, requiring
computing them twice which is very expensive. The extra run of
LiveIntervals/LiveVariables/SlotIndexes was costing in total
about 5% of compile time.

Continuing to use LiveIntervals is problematic. It seems
there is an option (early-live-intervals) to run the analysis
about where it should go to avoid recomputing LiveVariables,
but it seems to be completely broken with subreg liveness
enabled. There are also problems from trying to recompute
LiveIntervals since this seems to undo LiveVariables
and clearing kill flags, causing TwoAddressInstructions
to make bad decisions.

Insert the pass right after live variables and preserve it.
The tricky case to worry about might be phis since
LiveVariables doesn't count a register as live out if
in the successor block it is only used in a phi,
but I don't think this is a concern right now
because SIFixSGPRCopies replaces SGPR phis.

llvm-svn: 249087
2015-10-01 22:10:03 +00:00
Joerg Sonnenberger c8d50d6347 Fix relocation used for GOT references in non-PIC mode. Fix relocations
for "set" pseudo op in PIC mode.

Differential Revision: http://reviews.llvm.org/D13173

llvm-svn: 249086
2015-10-01 22:08:20 +00:00
Matt Arsenault d2c7589f93 AMDGPU: Merge if and switch
llvm-svn: 249082
2015-10-01 21:51:59 +00:00
Matt Arsenault db7f0ef367 AMDGPU: Remove dead code
There's no point in checking VReg_1 because all uses
of it should already have been removed by SILowerI1Copies.

llvm-svn: 249081
2015-10-01 21:51:57 +00:00
Matt Arsenault d1d499aa56 AMDGPU: Make SIInsertWaits about a factor of 4 faster
This was the slowest target custom pass and was spending 80%
of the time in getMinimalPhysRegClass which was called
for every register operand.

Try to use the statically known register class when possible from
the instruction's MCOperandInfo. There are a few pseudo instructions
which are not well behaved with unknown register classes which still
require the expensive physical register class search.

There are a few other possibilities for making this even faster,
such as not inspecting implicit operands. For now those are checked
because it is technically possible to have a scalar load into
exec or vcc which can be implicitly used.

llvm-svn: 249079
2015-10-01 21:43:15 +00:00
Reid Kleckner fc64fae6e3 [WinEH] Emit __C_specific_handler tables for the new IR
We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.

With this, some basic __try / __except examples work.

llvm-svn: 249078
2015-10-01 21:38:24 +00:00
Tom Stellard e9f8b24985 AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass
Summary:
Instead of asserting when the kernel metadata is different than we expect,
we should just skip lowering that function.  This fixes assertion
failures with OpenCL argument metadata from older LLVM releases.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13356

llvm-svn: 249073
2015-10-01 21:16:05 +00:00
David Majnemer f828a0ccc7 [WinEH] Make FuncletLayout more robust against catchret
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of.  Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.

llvm-svn: 249052
2015-10-01 18:44:59 +00:00
Chad Rosier f11d040f01 [AArch64] Deprecate a command-line option used for testing.
Support for pairing unscaled loads and stores has been enabled since the
original ARM64 port.  This feature is no longer experimental, AFAICT.

llvm-svn: 249049
2015-10-01 18:17:12 +00:00
Jonas Paulsson 12629324a4 [SystemZ] Add some generic (floating point support) load instructions.
Add generic instructions for load complement, load negative and load positive
for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so
give scheduler more freedom. SystemZElimCompare pass will convert them when it
can to the CC-setting variants.

Regression tests updated to expect the new opcodes in places where the old ones
where used. New test case SystemZ/fp-cmp-05.ll checks that
SystemZCompareElim.cpp can handle the new opcodes.

README.txt updated (bullet removed).

Note that fp128 is not yet handled, because it is relatively rare, and is a
bit trickier, because of the fact that l.dfr would operate on the sign bit of
one of the subregisters of a fp128, but we would not want to copy the other
sub-reg in case src and dst regs are not the same.

Reviewed by Ulrich Weigand.

llvm-svn: 249046
2015-10-01 18:12:28 +00:00
Tom Stellard e0e582c9aa AMDGPU: Add MEM_RAT STORE_TYPED.
v2: Add test (Matt).
    Fix capitalization of isEOP (Matt).
    Move pattern to class parameter (Matt).
    Make the instruction available to Cayman (Matt).
    Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED.

Patch by: Zoltan Gilian

llvm-svn: 249042
2015-10-01 17:51:34 +00:00
Tom Stellard c0f0fba2c4 AMDGPU: Factor out EOP query.
v2: Fix brace placement and capitalization (Matt).

Patch by: Zoltan Gilian

llvm-svn: 249041
2015-10-01 17:51:29 +00:00
NAKAMURA Takumi 1ed20db720 Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"
It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll

llvm-svn: 249032
2015-10-01 17:00:56 +00:00
Ulrich Weigand cf1670a095 [SystemZ] Add assembly instructions for obtaining clock values as well as CPU features
Provide assembler support for STCK, STCKF, STCKE, and STFLE.

Author: joncmu
Differential Revision: http://reviews.llvm.org/D13299

llvm-svn: 249015
2015-10-01 14:43:48 +00:00
Chad Rosier b7c5b91068 [AArch64] Hoist commonly failing check. NFC.
llvm-svn: 249011
2015-10-01 13:43:05 +00:00
Chad Rosier 0b15e7c618 [AArch64] Rename variable to improve readability. NFC.
llvm-svn: 249008
2015-10-01 13:33:31 +00:00
Chad Rosier 7a83d770ae [AArch64] Update comment to reflect reality.
llvm-svn: 249007
2015-10-01 13:09:44 +00:00
Zoran Jovanovic 2960f3a346 [mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructions
Differential Revision: http://reviews.llvm.org/D10337

llvm-svn: 249004
2015-10-01 12:49:27 +00:00
Scott Douglass 290183d734 [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer
Differential Revision: http://reviews.llvm.org/D13240

llvm-svn: 249002
2015-10-01 11:56:19 +00:00
Tom Stellard 1f0e7bbc5b AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init order
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12451

llvm-svn: 248978
2015-10-01 02:02:46 +00:00
Ahmed Bougacha 23a0d1a1d6 [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.
The custom code produces incorrect results if later reassociated.

Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:

  movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
  pand %xmm0, %xmm1
  por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
  psrld $16, %xmm0
  por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
  addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
  addps %xmm1, %xmm0

Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).

In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).

However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.

When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.

Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.

One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.

Fixes PR24512.

...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.

llvm-svn: 248965
2015-10-01 00:11:07 +00:00
Reid Kleckner 6dec87a8a0 [WinEH] Emit int3 after noreturn calls on Win64
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call.  Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

llvm-svn: 248959
2015-09-30 23:09:23 +00:00
Sanjay Patel a114a10bbe [x86] enable machine combiner reassociations for 256-bit vector logical integer insts
llvm-svn: 248955
2015-09-30 22:25:55 +00:00
David Blaikie 757908e545 Fix -Wsign-compare warning
llvm-svn: 248942
2015-09-30 20:37:48 +00:00
Chad Rosier 11c825f7db [AArch64] Remove an unnecessary restriction on pre-index instructions.
Previously, the index was constrained to the size of the memory operation for
no apparent reason.  This change removes that constraint so that we can form
pre-index instructions with any valid offset.

llvm-svn: 248931
2015-09-30 19:44:40 +00:00
Hal Finkel 4c45775880 [PowerPC] Disable shrink wrapping
Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for
now until the problem can be fixed.

llvm-svn: 248924
2015-09-30 17:29:03 +00:00
Artyom Skrobov 72ca6b8f3f [ARM] Support for ARMv6-Z / ARMv6-ZK missing
As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.

In particular, there were no tests for TrustZone being supported in these
architectures.

The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.

Differential Revision: http://reviews.llvm.org/D13236

llvm-svn: 248921
2015-09-30 17:25:52 +00:00
Chad Rosier 4f04e2ec87 [AArch64] Use helper function to improve readability. NFC.
llvm-svn: 248914
2015-09-30 16:50:41 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Simon Pilgrim 3d11c994f7 [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

llvm-svn: 248878
2015-09-30 08:17:50 +00:00
Marek Olsak d1a69a2839 AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is set
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.

This is a candidate for stable llvm 3.7.

Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 248858
2015-09-29 23:37:32 +00:00
Reid Kleckner a13dfd539b [WinEH] Setup RBP correctly in Win64 funclet prologues
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

llvm-svn: 248857
2015-09-29 23:32:01 +00:00
David Majnemer 91b0ab9172 [WinEH] Ensure that funclets obey the x64 ABI
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow.  However, we'd insert code to
initialize the return address after stack adjustments.  Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

llvm-svn: 248839
2015-09-29 22:33:36 +00:00
Maksim Panchenko cce239c45d HHVM calling conventions.
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

llvm-svn: 248832
2015-09-29 22:09:16 +00:00
Chad Rosier 4315012769 [AArch64] Add support for pre- and post-index LDPSWs.
llvm-svn: 248825
2015-09-29 20:39:55 +00:00
David Majnemer a80c151286 [WinEH] Teach AsmPrinter about funclets
Summary:
Funclets have been turned into functions by the time they hit the object
file.  Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

llvm-svn: 248824
2015-09-29 20:12:33 +00:00
Chad Rosier dabe2534ed [AArch64] Add integer pre- and post-index halfword/byte loads and stores.
llvm-svn: 248817
2015-09-29 18:26:15 +00:00
Nemanja Ivanovic 2c84b29464 Addition of interfaces the BE to conform to Table A-2 of ELF V2 ABI V1.1
This patch corresponds to review:
http://reviews.llvm.org/D13191

Back end portion of the fifth round of additions to altivec.h.

llvm-svn: 248809
2015-09-29 17:41:53 +00:00
Chad Rosier 32d4d37e61 [AArch64] Scale offsets by the size of the memory operation. NFC.
The immediate in the load/store should be scaled by the size of the memory
operation, not the size of the register being loaded/stored.  This change gets
us one step closer to forming LDPSW instructions.  This change also enables
pre- and post-indexing for halfword and byte loads and stores.

llvm-svn: 248804
2015-09-29 16:07:32 +00:00
Chad Rosier a4d3217e81 [AArch64] Remove some redundant cases. NFC.
llvm-svn: 248800
2015-09-29 14:57:10 +00:00
Jeroen Ketema 740f9d79ca Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)

Differential Revision: http://reviews.llvm.org/D12337

llvm-svn: 248786
2015-09-29 10:12:57 +00:00
NAKAMURA Takumi 0c12a3949e [CMake] X86AsmParser: Prune redundant LINK_LIBS.
It is described in LLVMBuild.txt.

llvm-svn: 248771
2015-09-29 01:25:01 +00:00
Sanjay Patel 3a14f1a338 add a FIXME for a CPU model check that should have an attribute instead
llvm-svn: 248746
2015-09-28 22:00:24 +00:00