Commit Graph

39488 Commits

Author SHA1 Message Date
Reid Kleckner 147f91c88e [X86] Don't preserve Win64 SSE CSRs when SSE is disabled
Code that doesn't use floating point and doesn't use SSE (kernel code)
shouldn't save and restore SSE registers.

Fixes PR30503

llvm-svn: 282819
2016-09-30 00:17:49 +00:00
Quentin Colombet 4b36e0c409 [AArch64][RegisterBankInfo] Use static mapping for 3-operands instrs.
This uses a TableGen'ed like structure for all 3-operands instrs.
The output of the RegBankSelect pass should be identical but the
RegisterBankInfo will do less dynamic allocations.

llvm-svn: 282817
2016-09-30 00:10:00 +00:00
Quentin Colombet fdd303afe2 [AArch64][RegisterBankInfo] Add static value mapping for 3-op instrs.
This is the kind of input TableGen should generate at some point.
NFC.

llvm-svn: 282816
2016-09-30 00:09:58 +00:00
Quentin Colombet eb8d3da9a0 [AArch64][RegisterBankInfo] Check the statically created ValueMapping.
Make sure that the ValueMappings contain the value we expect at the
indices we expect.

NFC.

llvm-svn: 282815
2016-09-30 00:09:43 +00:00
Douglas Katzman 3ace13adfa [X86] Avoid "unused" warnings if no asserts
llvm-svn: 282732
2016-09-29 17:26:12 +00:00
Simon Pilgrim 97a4820ccd [X86][SSE] Added common helper for shuffle mask constant pool decodes.
The shuffle mask decodes have a large amount of repeated code extracting/splitting mask values from Constant data.

This patch pulls all of this duplicated code into a single helper function to identify undef elements and combine/split constant integer data into the requested shuffle mask elements.

Updated PSHUFB/VPERMIL/VPERMIL2/VPPERM decoders to use it (VPERMV/VPERMV3 could be converted as well in the future).

llvm-svn: 282720
2016-09-29 15:25:48 +00:00
Dylan McKay 7e91886a3f Revert "[AVR] Add instruction selection lowering code"
I accidentally comitted it.

llvm-svn: 282712
2016-09-29 12:49:18 +00:00
Dylan McKay b79c01a423 [AVR] Add instruction selection lowering code
Summary: This adds AVRISelLowering.cpp

Reviewers: kparzysz, arsenm

Subscribers: wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25034

llvm-svn: 282711
2016-09-29 12:44:38 +00:00
Craig Topper d875d6b9b4 [AVX-512] Support spills of XMM16-31 and YMM16-31 when VLX isn't available.
This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used.

Fixes one of the cases from PR29112.

llvm-svn: 282690
2016-09-29 06:07:09 +00:00
Craig Topper 7eb0e7ce1f [AVX-512] Replicate pattern from AVX to select VMOVDDUP for (v2f64 (X86VBroadcast f64:)). Add AVX512VL to command line of existing AVX2 test that hits this condition.
llvm-svn: 282688
2016-09-29 05:54:43 +00:00
Craig Topper e7f2611160 [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table.
llvm-svn: 282687
2016-09-29 05:54:39 +00:00
Craig Topper cb3ae5a03d [X86] Remove AddedComplexity adjustments that don't seem to be needed.
llvm-svn: 282686
2016-09-29 05:54:34 +00:00
Craig Topper 816a1d7783 [X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables.
llvm-svn: 282684
2016-09-29 05:54:28 +00:00
Eric Christopher 20ac943748 Remove an unnecessary duplicate initialization of TLOF from the Mips
AsmPrinter. This was reinitializing the Mangler after we moved the
Mangler down to TLOF and causing us to have two different unnamed
global values accessed with the same name.

This should fix the problems on the ubsan tests here:
http://lab.llvm.org:8011/builders/clang-cmake-mips/builds/15307

llvm-svn: 282675
2016-09-29 02:03:52 +00:00
Eric Christopher b4b75a531e Update comment about initializing TLOF with a pointer at the previous
line or the other commented out place.

llvm-svn: 282673
2016-09-29 02:03:47 +00:00
Matt Arsenault e6740754f0 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Lei Liu 361615cfd0 AArch64: Set shift bit of TLSLE HI12 add instruction
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model.  This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.

Reviewers: t.p.northover, peter.smith, rovka

Subscribers: salim.nasser, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D24702

llvm-svn: 282661
2016-09-29 01:05:48 +00:00
Quentin Colombet 40cbc27ff3 [RegisterBankInfo] Uniquely generate OperandsMapping.
This is a step toward statically allocate InstructionMapping. Like the
previous few commits, the goal is to move toward a TableGen'ed like
structure with no dynamic allocation at all.

This should already improve compile time by getting rid of a bunch of
memmove of SmallVectors.

llvm-svn: 282643
2016-09-28 22:20:49 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Artem Belevich 3e1211581c [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

llvm-svn: 282607
2016-09-28 17:25:38 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Dylan McKay 1f69cdb321 [AVR] Rename the builtin calling convention names
'BUILTIN' is clearer than 'RT' in this context.

llvm-svn: 282602
2016-09-28 16:04:40 +00:00
Marina Yatsina 76bfc6670b [x86] Accept 'retn' as an alias to 'ret[lqw]'\'ret' (At&t\Intel)
Implement 'retn' simply by aliasing it to the relevant 'ret' instruction

Commit on behalf of coby

Differential Revision: https://reviews.llvm.org/D24346

llvm-svn: 282601
2016-09-28 15:52:56 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Dylan McKay 536239f144 [AVR] Import the LLVM namespace inside AVRMCTargetDesc.cpp
llvm-svn: 282598
2016-09-28 15:35:26 +00:00
Dylan McKay e762094864 [AVR] Add AVRMCTargetDesc.cpp
Summary:
This adds the AVRMCTargetDesc file in tree. It allows creation of the
core classes used in the backend.

Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, mgorny

Differential Revision: https://reviews.llvm.org/D25023

llvm-svn: 282597
2016-09-28 15:31:12 +00:00
Dylan McKay d6e7fc6d9a [AVR] Update the signature of createAVRAsmBackend
It has been recently changed to also take a MCTargetOptions structure.

llvm-svn: 282594
2016-09-28 14:35:07 +00:00
Dylan McKay f010a2b41a [AVR] Enable the assembly parser
We very recently landed the code. This commit enables the parser.

It also adds a missing include to AVRAsmParser.cpp

llvm-svn: 282593
2016-09-28 14:34:42 +00:00
Dylan McKay 0fe1e63837 [AVR] Merge most recent changes to AVRInstrInfo.td
This adds two new things:

- Operand types per fixup
- Atomic pseudo operations

llvm-svn: 282588
2016-09-28 13:44:02 +00:00
Dylan McKay b967d16c43 [AVR] Update the data layout
The previous data layout caused issues when dealing with atomics.

Foe example, it is illegal to load a 16-bit value with less than 16-bits
of alignment.

This changes the data layout so that all types are aligned by at least
their own width.

Interestingly, this also _slightly_ decreased register pressure in some
cases.

llvm-svn: 282587
2016-09-28 13:29:10 +00:00
Dylan McKay 1f877f06b9 [AVR] Add assembly parser
Summary: This patch adds the AVRAsmParser library.

Reviewers: arsenm, kparzysz

Subscribers: wdng, beanz, mgorny, kparzysz, simoncook, jtbandes, llvm-commits

Differential Revision: https://reviews.llvm.org/D20046

llvm-svn: 282584
2016-09-28 13:02:57 +00:00
Guy Blank 2bdc74a471 [X86][FastISel] Use a COPY from K register to a GPR instead of a K operation
The KORTEST was introduced due to a bug where a TEST instruction used a K register.
but, turns out that the opposite case of KORTEST using a GPR is now happening

The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.

Differential Revision: https://reviews.llvm.org/D24953

llvm-svn: 282580
2016-09-28 11:22:17 +00:00
Simon Pilgrim 55b8eaa505 Strip trailing whitespace
llvm-svn: 282579
2016-09-28 11:08:00 +00:00
Jonas Paulsson 58c5a7f55a [SystemZ] Implementation of getUnrollingPreferences().
This commit enables more unrolling for SystemZ by implementing the
SystemZTargetTransformInfo::getUnrollingPreferences() method.

It has been found that it is better to only unroll moderately, so the
DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order
to set this to a lower value for SystemZ (4).

Reviewers: Evgeny Stupachenko, Ulrich Weigand.
https://reviews.llvm.org/D24451

llvm-svn: 282570
2016-09-28 09:41:38 +00:00
Quentin Colombet c0f11a9fb8 [AArch64][RegisterBankInfo] Switch to statically allocated ValueMapping.
Another step toward TableGen'ed like structure for the RegisterBankInfo
of AArch64. By doing this, we also save a bit of compile time for the
exact same output.

llvm-svn: 282550
2016-09-27 22:55:04 +00:00
Quentin Colombet caae9cd246 [AArch64][RegisterBankInfo] Fix copy/paste in comments.
NFC.

llvm-svn: 282549
2016-09-27 22:54:57 +00:00
Sanjay Patel 764ae8bd72 [x86] add folds for FP logic with vector zeros
The 'or' case shows up in copysign. The copysign code also had 
redundant checking for a scalar zero operand with 'and', so I 
removed that. 

I'm not sure how to test vector 'and', 'andn', and 'xor' yet, 
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.

llvm-svn: 282546
2016-09-27 22:28:13 +00:00
Geoff Berry b124331db7 [TargetRegisterInfo, AArch64] Add target hook for isConstantPhysReg().
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant.  Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true.  This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.

Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy

Subscribers: junbuml, aemerson, mcrosier, rengolin

Differential Revision: https://reviews.llvm.org/D24570

llvm-svn: 282543
2016-09-27 22:17:27 +00:00
Sanjay Patel 43ef1ad0ba [x86] use isNullFPConstant(); NFCI
Also, put the related FP logic functions together to see the similarities. 

llvm-svn: 282522
2016-09-27 18:48:02 +00:00
Krzysztof Parzyszek 586fc12e32 [RDF] Add "dead" flag to node attributes
llvm-svn: 282520
2016-09-27 18:24:33 +00:00
Krzysztof Parzyszek 1d32220721 [RDF] Special treatment of exception handling registers
A landing pad can have live-in registers that are defined by the runtime,
not the program (exception pointer register and exception selector
register). Make sure to recognize that case and not link these registers
with any defs in the program.
Each landing pad will have phi nodes added at the beginning to provide
definitions of these registers, but the uses of those phi nodes will not
have any reaching defs.

llvm-svn: 282519
2016-09-27 18:18:44 +00:00
Konstantin Zhuravlyov da4687c531 [AMDGPU] Enable changing instprinter's behavior based on the per-function
subtarget

This is a prerequisite for coming waitcnt changes

Differential Revision: https://reviews.llvm.org/D24939

llvm-svn: 282489
2016-09-27 14:42:48 +00:00
Simon Dardis d2ed8abb15 [mips] Disable tail calls temporarily
Disable tail calls while the remaining bugs are fixed. Enable only for tests.

Reviewers: vkalintiris

Differential Review: https://reviews.llvm.org/D24912

llvm-svn: 282487
2016-09-27 13:15:54 +00:00
Simon Dardis 0486d585c5 [mips] Add rsqrt, recip for MIPS
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.

Reviewers: vkalintiris, zoran.jovanoic

Differential Review: https://reviews.llvm.org/D24499

llvm-svn: 282485
2016-09-27 12:25:15 +00:00
Nemanja Ivanovic 6f22b41398 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.

llvm-svn: 282478
2016-09-27 08:42:12 +00:00
Craig Topper 789888002a [X86] Use std::max to calculate alignment instead of assuming RC->getSize() will not return a value greater than 32. I think it theoretically could be 64 for AVX-512.
llvm-svn: 282471
2016-09-27 06:44:25 +00:00
Davide Italiano a9f85d68cc [CodeGen] Add support for emitting .init_array instead of .ctors on FreeBSD.
PR: 30494
llvm-svn: 282451
2016-09-26 22:53:15 +00:00
Derek Schuff 92d300eb8f [WebAssembly] Use the frame pointer instead of the stack pointer
When we have dynamic allocas we have a frame pointer, and
when we're lowering frame indexes we should make sure we use it.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D24889

llvm-svn: 282442
2016-09-26 21:18:03 +00:00
Nirav Dave 6477ce2697 Add support for Code16GCC
[X86] The .code16gcc directive parses X86 assembly input in 32-bit mode and
outputs in 16-bit mode. Teach parser to switch modes appropriately.

Reviewers: dwmw2, craig.topper

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20109

llvm-svn: 282430
2016-09-26 19:33:36 +00:00
Andrew Kaylor 595307a468 Add optimization bisect support to an optional Mips pass
Differential Revision: https://reviews.llvm.org/D19513

llvm-svn: 282428
2016-09-26 19:05:37 +00:00