Commit Graph

39766 Commits

Author SHA1 Message Date
Simon Dardis aff4d141b9 [mips] Macro expansion for ld, sd for O32
ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads
or stores using the specified source or destination register and the next
register.

This patch does not add support for the cases where the offset is greater than
a 16 bit signed immediate as that would lead to a wrong/misleading error
message as the assembler would report "instruction requires a CPU feature
not currently enabled" for ld & sd for MIPS64 when their offset is not a signed
16 bit number.

This fixes PR/29159.

Thanks to Sean Bruno for reporting this issue!

Reviewers: vkalintiris, seanbruno, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D24556

llvm-svn: 284481
2016-10-18 14:28:00 +00:00
Michael Zuckerman 1bee6340ef [x86][inline-asm][avx512] allow swapping of '{k<num>}' & '{z}' marks
Committing on behalf of Coby Tayree: After check-all and LGTM

Desc:

AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}')
 Currently, the following forms are allowed:
 {k<num>}
 {k<num>}{z}

This patch allows the following forms:
 {z}{k<num>}

and ignores the next form:
 {z}

Justification would be quite simple - GCC

Differential Revision: http://reviews.llvm.org/D25013

llvm-svn: 284479
2016-10-18 13:52:39 +00:00
Vasileios Kalintiris 3955b75ba9 [mips][FastISel] Instantiate the MipsFastISel class only for targets that support FastISel.
Summary:
Instead of instantiating the MipsFastISel class and checking if the
target is supported in the overriden methods, we should perform that
check before creating the class. This allows us to enable FastISel *only*
for targets that truly support it, ie. MIPS32 to MIPS32R5.

Reviewers: sdardis

Subscribers: ehostunreach, llvm-commits

Differential Revision: https://reviews.llvm.org/D24824

llvm-svn: 284475
2016-10-18 13:05:42 +00:00
Javed Absar e7c338081a [ARM] Assign cost of scaling for Cortex-R52
This patch assigns cost of the scaling used in addressing for Cortex-R52.

On Cortex-R52 a negated register offset takes longer than a non-negated
register offset, in a register-offset addressing mode.

Differential Revision: http://reviews.llvm.org/D25670

Reviewer: jmolloy
llvm-svn: 284460
2016-10-18 09:08:54 +00:00
Simon Pilgrim 4ddc92b6cd [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459
2016-10-18 07:42:15 +00:00
Dean Michael Berris 156f6cafc2 [XRay] Support for for tail calls for ARM no-Thumb
This patch adds simplified support for tail calls on ARM with XRay instrumentation.

Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14  -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program

    #include <cstdio>
    #include <cassert>
    #include <xray/xray_interface.h>

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
      std::printf("In fC()\n");
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
      std::printf("In fB()\n");
      fC();
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
      std::printf("In fA()\n");
      fB();
    }

    // Avoid infinite recursion in case the logging function is instrumented (so calls logging
    //   function again).
    [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
    {
      printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
    }

    int main(int argc, char* argv[]) {
      __xray_set_handler(simplyPrint);

      printf("Patching...\n");
      __xray_patch();
      fA();

      printf("Unpatching...\n");
      __xray_unpatch();
      fA();

      return 0;
    }

gives the following output:

    Patching...
    XRay: functionId=3 type=0.
    In fA()
    XRay: functionId=3 type=1.
    XRay: functionId=2 type=0.
    In fB()
    XRay: functionId=2 type=1.
    XRay: functionId=1 type=0.
    XRay: functionId=1 type=1.
    In fC()
    Unpatching...
    In fA()
    In fB()
    In fC()

So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().

Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).

Differential Revision: https://reviews.llvm.org/D25030

llvm-svn: 284456
2016-10-18 05:54:15 +00:00
Craig Topper 448358b5f1 [X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself.
This is especially important for 32-bit targets with 64-bit shuffle elements.

llvm-svn: 284453
2016-10-18 04:48:33 +00:00
Craig Topper 7268bf99ab [AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself.
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25666

llvm-svn: 284451
2016-10-18 04:00:32 +00:00
Craig Topper 175a415e78 [AVX-512] Add support for decoding shuffle mask from constant pool for masked VPERMILPS/PD.
llvm-svn: 284450
2016-10-18 03:36:52 +00:00
Konstantin Zhuravlyov 98a3ac7106 [AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it
Differential Revision: https://reviews.llvm.org/D25694

llvm-svn: 284435
2016-10-17 22:40:15 +00:00
Tim Northover 020d104496 GlobalISel: support wider range of load/store sizes in AArch64.
llvm-svn: 284406
2016-10-17 18:36:53 +00:00
Tom Stellard 083f1626f5 AMDGPU/SI: LowerParameter() should be computing align based on memory type
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25203

llvm-svn: 284398
2016-10-17 16:56:19 +00:00
Tom Stellard bc6c523cce AMDGPU/SI: Fix LowerParameter() for i16 arguments
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25198

llvm-svn: 284397
2016-10-17 16:21:45 +00:00
Craig Topper 1f5178ff9f [X86] Fix shuffle decoding assertions to print the right number of required operands. Update the checks themselves to be >= to the same number instead of > one less than the required number.
llvm-svn: 284365
2016-10-17 06:41:18 +00:00
Craig Topper 5b24cd31f5 [AVX-512] Add shuffle combining support for vpermi2var shuffles derived from existing support for vpermt2var.
llvm-svn: 284357
2016-10-17 04:26:47 +00:00
Craig Topper 715ad7fef5 [AVX-512] Add support for turning a 256-bit load that goes to both halfs of an insert_subvector into a subvector broadcast.
Differential Revision: https://reviews.llvm.org/D25650

llvm-svn: 284353
2016-10-16 23:29:51 +00:00
Craig Topper aa1370ac57 [AVX-512] Fix the operand order for vpermi2var_qi intrinsics to match the other vpermi2var intrinsics.
llvm-svn: 284329
2016-10-16 04:54:35 +00:00
Craig Topper 4729fe8bb6 [AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS.
llvm-svn: 284328
2016-10-16 04:54:31 +00:00
Craig Topper f18b9201f5 [AVX-512] Move (v4i64 (X86SubVBroadcast (v2i64))) alternate patterns under a HasVLX predicate. Similar for floating point.
llvm-svn: 284327
2016-10-16 04:54:26 +00:00
Davide Italiano e9cdb24f67 [ArmFastISel] Kill dead code. NFCI.
llvm-svn: 284320
2016-10-16 01:09:39 +00:00
Konstantin Zhuravlyov 8ea0246e93 [MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG.
Differential Revision: https://reviews.llvm.org/D24577

llvm-svn: 284312
2016-10-15 22:01:18 +00:00
Craig Topper dde865afb5 [AVX-512] Add shuffle comments for vbroadcast instructions.
llvm-svn: 284305
2016-10-15 16:26:07 +00:00
Craig Topper 51e052f741 [AVX-512] Rename VPBROADCASTI32X2 and VPBROADCASTF32X2 instruction classes to match the mnemonic which does not include a 'P'.
llvm-svn: 284304
2016-10-15 16:26:02 +00:00
Tom Stellard 961811c906 AMDGPU/SI: Handle s_getreg hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25526

llvm-svn: 284298
2016-10-15 00:58:14 +00:00
Tim Northover 69fa84a6e9 GlobalISel: rename legalizer components to match others.
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.

The only functional change is the name of a couple of command-line options.

llvm-svn: 284287
2016-10-14 22:18:18 +00:00
Guozhi Wei 0cd65429be [PPC] Shorter sequence to load 64bit constant with same hi/lo words
This is a patch to implement pr30640.

When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.

This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.

Differential Revision: https://reviews.llvm.org/D25521

llvm-svn: 284276
2016-10-14 20:41:50 +00:00
Tom Stellard 09c2bd6bd4 AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)

Reviewers: arsenm, nhaehnle

Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24672

llvm-svn: 284267
2016-10-14 19:14:29 +00:00
Krzysztof Parzyszek 775a20913d The real fix for post-r284255 failures
llvm-svn: 284264
2016-10-14 19:06:25 +00:00
Krzysztof Parzyszek a22daa0fa6 Workaround to eliminate check-llvm failures after r284255
llvm-svn: 284262
2016-10-14 18:36:42 +00:00
David L Kreitzer 01a057a0c4 Add a pass to optimize patterns of vectorized interleaved memory accesses for
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.

Patch by Farhana Aleen

Differential revision: http://reviews.llvm.org/D24681

llvm-svn: 284260
2016-10-14 18:20:41 +00:00
Tom Stellard 64a9d0876c AMDGPU/SI: Don't allow unaligned scratch access
Summary: The hardware doesn't support this.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25523

llvm-svn: 284257
2016-10-14 18:10:39 +00:00
Krzysztof Parzyszek 445bd12621 [RDF] Switch RegisterRef to be a pair (Register, LaneMask)
Use PackedRegisterRef to store the register information in the graph nodes.

This commit also removes support for virtual registers. It has never been
tested or used. It will be possible to add it back if there is a need.

llvm-svn: 284255
2016-10-14 17:57:55 +00:00
David L Kreitzer d5c6755d83 [safestack] Use non-thread-local unsafe stack pointer for Contiki OS
Patch by Michael LeMay

Differential revision: http://reviews.llvm.org/D19852

llvm-svn: 284254
2016-10-14 17:56:00 +00:00
Eric Christopher c39f8b0a3a Revert "In preparation for removing getNameWithPrefix off of
TargetMachine," as it's causing sanitizer/memory issues until I
can track down this set.

This reverts commit r284203

llvm-svn: 284252
2016-10-14 17:28:23 +00:00
Pierre Gousseau b6d652adb5 [X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero.
This change adds transformations such as:
  zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
  To:
  srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.

Differential Revision: https://reviews.llvm.org/D23446

llvm-svn: 284248
2016-10-14 16:41:38 +00:00
Nicolai Haehnle 67624af0cc AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25289

llvm-svn: 284224
2016-10-14 10:30:00 +00:00
Simon Dardis b3fd189cb5 [mips] Fix aui/daui/dahi/dati for MIPSR6
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.

Reviewers: dsanders, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D21473

llvm-svn: 284218
2016-10-14 09:31:42 +00:00
Nicolai Haehnle bd15c3267f AMDGPU: Fix use-after-frees
Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25312

llvm-svn: 284215
2016-10-14 09:03:04 +00:00
Michael Zuckerman 174d2e784b [x86][ms-inline-asm] use of "jmp short" in asm is not supported
Committing in the name of Ziv Izhar: After check-all and LGTM .

The following patch is for compatability with Microsoft.
Microsoft ignores the keyword "short" when used after a jmp, for example:
__asm {
      jmp short label
      label:
      }

A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly.
link to test: https://reviews.llvm.org/D24958

Differential Revision: https://reviews.llvm.org/D24957

llvm-svn: 284211
2016-10-14 08:09:40 +00:00
Eric Christopher 2bd52b5d91 In preparation for removing getNameWithPrefix off of TargetMachine,
sink the current behavior into the callers and sink
TargetMachine::getNameWithPrefix into TargetMachine::getSymbol.

llvm-svn: 284203
2016-10-14 05:47:41 +00:00
Eric Christopher 445c952bd0 Tidy the calls to getCurrentSection().first -> getCurrentSectionOnly to help
readability a bit.

llvm-svn: 284202
2016-10-14 05:47:37 +00:00
Konstantin Zhuravlyov c96b5d7073 [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562

llvm-svn: 284196
2016-10-14 04:37:34 +00:00
Konstantin Zhuravlyov 2a2ac37c2c [AMDGPU] Add 32-bit lo/hi got and pc relative variant kinds and emit appropriate relocations
Differential Revision: https://reviews.llvm.org/D25548

llvm-svn: 284195
2016-10-14 04:21:32 +00:00
Saleem Abdulrasool 7705c4f1be CodeGen: use MSVC division on windows itanium
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.

llvm-svn: 284175
2016-10-13 23:00:11 +00:00
Saleem Abdulrasool 06383dd272 CodeGen: adjust floating point operations in Windows itanium
Windows itanium is equivalent to MSVC except in C++ mode.  Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.

llvm-svn: 284173
2016-10-13 22:38:15 +00:00
Sriraman Tallam f29fa586e1 New llc option pie-copy-relocations to optimize access to extern globals.
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.

Differential Revision: https://reviews.llvm.org/D24849

llvm-svn: 284160
2016-10-13 20:54:39 +00:00
Nirav Dave a81682aad4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Quentin Colombet b3f5a8c644 [AArch64][RegisterBankInfo] Switch to fully static opds mapping for G_BITCAST.
NFC.

llvm-svn: 284146
2016-10-13 18:46:38 +00:00
Igor Breger 8409c356ad [X86][AVX512] Fix sext v32i1 -> v32i8 lowering.
Fix PR30600.

Differential Revision: https://reviews.llvm.org/D25554

llvm-svn: 284134
2016-10-13 17:20:38 +00:00