Commit Graph

39498 Commits

Author SHA1 Message Date
Konstantin Zhuravlyov d7bdf24f32 [AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier instruction
Differential Revision: https://reviews.llvm.org/D24985

llvm-svn: 282875
2016-09-30 16:50:36 +00:00
Konstantin Zhuravlyov 4658e5f7b0 [AMDGPU] Do not run scalar optimization passes at "-O0"
Differential Revision: https://reviews.llvm.org/D25055

llvm-svn: 282873
2016-09-30 16:39:24 +00:00
Dylan McKay 4a25499b13 [AVR] Add the ELF object file writer
Summary: This adds the ELF32 writer for AVR.

Reviewers: kparzysz

Subscribers: beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25031

llvm-svn: 282856
2016-09-30 14:09:20 +00:00
Dylan McKay 1a7bd84a92 [AVR] Add the assembly instruction printer
Summary:
This change adds the AVR assembly instruction printer.

No tests are included in this patch. I have left them downstream so we can
add them once `llc` successfully runs (there are very few components left
to upstream before that).

Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25028

llvm-svn: 282854
2016-09-30 14:01:50 +00:00
Craig Topper f3e671e020 [AVX-512] Store address operand should be an input operand for the special stack spilling pseudos for XMM16-31 and YMM16-31 without VLX.
llvm-svn: 282843
2016-09-30 05:35:47 +00:00
Craig Topper 1c01cbe9ee [AVX-512] Add the special stack spilling pseudos for XMM16-31 and YMM16-31 without VLX to the isFrameLoadOpcode and isFrameStoreOpcode.
llvm-svn: 282842
2016-09-30 05:35:45 +00:00
Craig Topper 3f37a4180b Revert r282835 "[AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not."
Turns out this doesn't pass verify-machineinstrs.

llvm-svn: 282841
2016-09-30 05:35:42 +00:00
Craig Topper de03ff7063 [X86] Add AVX-512 VTs to findRepresentativeClass as well as v16i16 which was also missing. Change register class to include the extra 16 AVX512 registers.
I'm not completely sure what this method does or why all the 256-bit VTs returned VR128RegClass when the comments on the method definition say it should return the largest super register class. I just figured AVX-512 should be similar.

llvm-svn: 282836
2016-09-30 04:31:37 +00:00
Craig Topper bc6e97b8f4 [AVX-512] Always use the full 32 register vector classes for addRegisterClass regardless of whether AVX512/VLX is enabled or not.
If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of.

I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change.

llvm-svn: 282835
2016-09-30 04:31:33 +00:00
Matt Arsenault 5d8eb25e78 AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares, which only have u64. I'm not sure
why there are both (I'm guessing it's for the one-bit inputs we
don't use), but for consistency always use the
unsigned one.

llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Reid Kleckner 147f91c88e [X86] Don't preserve Win64 SSE CSRs when SSE is disabled
Code that doesn't use floating point and doesn't use SSE (kernel code)
shouldn't save and restore SSE registers.

Fixes PR30503

llvm-svn: 282819
2016-09-30 00:17:49 +00:00
Quentin Colombet 4b36e0c409 [AArch64][RegisterBankInfo] Use static mapping for 3-operands instrs.
This uses a TableGen'ed like structure for all 3-operands instrs.
The output of the RegBankSelect pass should be identical but the
RegisterBankInfo will do less dynamic allocations.

llvm-svn: 282817
2016-09-30 00:10:00 +00:00
Quentin Colombet fdd303afe2 [AArch64][RegisterBankInfo] Add static value mapping for 3-op instrs.
This is the kind of input TableGen should generate at some point.
NFC.

llvm-svn: 282816
2016-09-30 00:09:58 +00:00
Quentin Colombet eb8d3da9a0 [AArch64][RegisterBankInfo] Check the statically created ValueMapping.
Make sure that the ValueMappings contain the value we expect at the
indices we expect.

NFC.

llvm-svn: 282815
2016-09-30 00:09:43 +00:00
Douglas Katzman 3ace13adfa [X86] Avoid "unused" warnings if no asserts
llvm-svn: 282732
2016-09-29 17:26:12 +00:00
Simon Pilgrim 97a4820ccd [X86][SSE] Added common helper for shuffle mask constant pool decodes.
The shuffle mask decodes have a large amount of repeated code extracting/splitting mask values from Constant data.

This patch pulls all of this duplicated code into a single helper function to identify undef elements and combine/split constant integer data into the requested shuffle mask elements.

Updated PSHUFB/VPERMIL/VPERMIL2/VPPERM decoders to use it (VPERMV/VPERMV3 could be converted as well in the future).

llvm-svn: 282720
2016-09-29 15:25:48 +00:00
Dylan McKay 7e91886a3f Revert "[AVR] Add instruction selection lowering code"
I accidentally committed it.

llvm-svn: 282712
2016-09-29 12:49:18 +00:00
Dylan McKay b79c01a423 [AVR] Add instruction selection lowering code
Summary: This adds AVRISelLowering.cpp

Reviewers: kparzysz, arsenm

Subscribers: wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25034

llvm-svn: 282711
2016-09-29 12:44:38 +00:00
Craig Topper d875d6b9b4 [AVX-512] Support spills of XMM16-31 and YMM16-31 when VLX isn't available.
This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used.

Fixes one of the cases from PR29112.

llvm-svn: 282690
2016-09-29 06:07:09 +00:00
Craig Topper 7eb0e7ce1f [AVX-512] Replicate pattern from AVX to select VMOVDDUP for (v2f64 (X86VBroadcast f64:)). Add AVX512VL to command line of existing AVX2 test that hits this condition.
llvm-svn: 282688
2016-09-29 05:54:43 +00:00
Craig Topper e7f2611160 [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table.
llvm-svn: 282687
2016-09-29 05:54:39 +00:00
Craig Topper cb3ae5a03d [X86] Remove AddedComplexity adjustments that don't seem to be needed.
llvm-svn: 282686
2016-09-29 05:54:34 +00:00
Craig Topper 816a1d7783 [X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables.
llvm-svn: 282684
2016-09-29 05:54:28 +00:00
Eric Christopher 20ac943748 Remove an unnecessary duplicate initialization of TLOF from the Mips
AsmPrinter. This was reinitializing the Mangler after we moved the
Mangler down to TLOF and causing us to have two different unnamed
global values accessed with the same name.

This should fix the problems on the ubsan tests here:
http://lab.llvm.org:8011/builders/clang-cmake-mips/builds/15307

llvm-svn: 282675
2016-09-29 02:03:52 +00:00
Eric Christopher b4b75a531e Update comment about initializing TLOF with a pointer at the previous
line or the other commented out place.

llvm-svn: 282673
2016-09-29 02:03:47 +00:00
Matt Arsenault e6740754f0 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
to work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Lei Liu 361615cfd0 AArch64: Set shift bit of TLSLE HI12 add instruction
Summary: The AArch64 LLVM assembler emits the add instruction without the shift bit when calculating the upper 12 bits of the address of TLS variables in the local-exec model. This generates an incorrect code sequence for accessing TLS variables with a thread offset larger than 0x1000.

Reviewers: t.p.northover, peter.smith, rovka

Subscribers: salim.nasser, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24702

llvm-svn: 282661
2016-09-29 01:05:48 +00:00
Quentin Colombet 40cbc27ff3 [RegisterBankInfo] Uniquely generate OperandsMapping.
This is a step toward statically allocating InstructionMapping. Like the
previous few commits, the goal is to move toward a TableGen'ed like
structure with no dynamic allocation at all.

This should already improve compile time by getting rid of a bunch of
memmove of SmallVectors.

llvm-svn: 282643
2016-09-28 22:20:49 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Artem Belevich 3e1211581c [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

llvm-svn: 282607
2016-09-28 17:25:38 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failures with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Dylan McKay 1f69cdb321 [AVR] Rename the builtin calling convention names
'BUILTIN' is clearer than 'RT' in this context.

llvm-svn: 282602
2016-09-28 16:04:40 +00:00
Marina Yatsina 76bfc6670b [x86] Accept 'retn' as an alias to 'ret[lqw]'\'ret' (At&t\Intel)
Implement 'retn' simply by aliasing it to the relevant 'ret' instruction

Commit on behalf of coby

Differential Revision: https://reviews.llvm.org/D24346

llvm-svn: 282601
2016-09-28 15:52:56 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and
  find all possible stores by looking down through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Dylan McKay 536239f144 [AVR] Import the LLVM namespace inside AVRMCTargetDesc.cpp
llvm-svn: 282598
2016-09-28 15:35:26 +00:00
Dylan McKay e762094864 [AVR] Add AVRMCTargetDesc.cpp
Summary:
This adds the AVRMCTargetDesc file in tree. It allows creation of the
core classes used in the backend.

Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25023

llvm-svn: 282597
2016-09-28 15:31:12 +00:00
Dylan McKay d6e7fc6d9a [AVR] Update the signature of createAVRAsmBackend
It has been recently changed to also take a MCTargetOptions structure.

llvm-svn: 282594
2016-09-28 14:35:07 +00:00
Dylan McKay f010a2b41a [AVR] Enable the assembly parser
We very recently landed the code. This commit enables the parser.

It also adds a missing include to AVRAsmParser.cpp

llvm-svn: 282593
2016-09-28 14:34:42 +00:00
Dylan McKay 0fe1e63837 [AVR] Merge most recent changes to AVRInstrInfo.td
This adds two new things:

- Operand types per fixup
- Atomic pseudo operations

llvm-svn: 282588
2016-09-28 13:44:02 +00:00
Dylan McKay b967d16c43 [AVR] Update the data layout
The previous data layout caused issues when dealing with atomics.

For example, it is illegal to load a 16-bit value with less than 16 bits
of alignment.

This changes the data layout so that all types are aligned by at least
their own width.

Interestingly, this also _slightly_ decreased register pressure in some
cases.

llvm-svn: 282587
2016-09-28 13:29:10 +00:00
Dylan McKay 1f877f06b9 [AVR] Add assembly parser
Summary: This patch adds the AVRAsmParser library.

Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, mgorny, kparzysz, simoncook, jtbandes, llvm-commits

Differential Revision: https://reviews.llvm.org/D20046

llvm-svn: 282584
2016-09-28 13:02:57 +00:00
Guy Blank 2bdc74a471 [X86][FastISel] Use a COPY from K register to a GPR instead of a K operation
The KORTEST was introduced due to a bug where a TEST instruction used a K register,
but it turns out that the opposite case of KORTEST using a GPR is now happening.

The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.

Differential Revision: https://reviews.llvm.org/D24953

llvm-svn: 282580
2016-09-28 11:22:17 +00:00
Simon Pilgrim 55b8eaa505 Strip trailing whitespace
llvm-svn: 282579
2016-09-28 11:08:00 +00:00
Jonas Paulsson 58c5a7f55a [SystemZ] Implementation of getUnrollingPreferences().
This commit enables more unrolling for SystemZ by implementing the
SystemZTargetTransformInfo::getUnrollingPreferences() method.

It has been found that it is better to only unroll moderately, so the
DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order
to set this to a lower value for SystemZ (4).

Reviewers: Evgeny Stupachenko, Ulrich Weigand.
https://reviews.llvm.org/D24451

llvm-svn: 282570
2016-09-28 09:41:38 +00:00
Quentin Colombet c0f11a9fb8 [AArch64][RegisterBankInfo] Switch to statically allocated ValueMapping.
Another step toward TableGen'ed like structure for the RegisterBankInfo
of AArch64. By doing this, we also save a bit of compile time for the
exact same output.

llvm-svn: 282550
2016-09-27 22:55:04 +00:00
Quentin Colombet caae9cd246 [AArch64][RegisterBankInfo] Fix copy/paste in comments.
NFC.

llvm-svn: 282549
2016-09-27 22:54:57 +00:00
Sanjay Patel 764ae8bd72 [x86] add folds for FP logic with vector zeros
The 'or' case shows up in copysign. The copysign code also had
redundant checking for a scalar zero operand with 'and', so I
removed that.

I'm not sure how to test vector 'and', 'andn', and 'xor' yet,
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.

llvm-svn: 282546
2016-09-27 22:28:13 +00:00
Geoff Berry b124331db7 [TargetRegisterInfo, AArch64] Add target hook for isConstantPhysReg().
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant.  Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true.  This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.

Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy

Subscribers: junbuml, aemerson, mcrosier, rengolin

Differential Revision: https://reviews.llvm.org/D24570

llvm-svn: 282543
2016-09-27 22:17:27 +00:00
Sanjay Patel 43ef1ad0ba [x86] use isNullFPConstant(); NFCI
Also, put the related FP logic functions together to see the similarities. 

llvm-svn: 282522
2016-09-27 18:48:02 +00:00
Krzysztof Parzyszek 586fc12e32 [RDF] Add "dead" flag to node attributes
llvm-svn: 282520
2016-09-27 18:24:33 +00:00