Commit Graph

15568 Commits

Author SHA1 Message Date
Manman Ren 5751814eda Swift Calling Convention: swifterror target support.
Differential Revision: http://reviews.llvm.org/D18716

llvm-svn: 265997
2016-04-11 21:08:06 +00:00
Tom Stellard 0ffdf65eaa Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute"
This reverts commit r263720.

Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute.

llvm-svn: 265992
2016-04-11 20:38:40 +00:00
Adrian Prantl b01a4d48ac More upgrading of old- and very-old-style debug info in testcases.
llvm-svn: 265953
2016-04-11 15:53:44 +00:00
Petar Jovanovic e578e970cb [mips] Make Static a default relocation model for MIPS codegen
This change follows up defaults for GCC and Clang, so LLVM does not differ
from them. While number of the test files are touched with this change, they
all keep the old (expected) behaviour with the explicit option:
"-relocation-model=pic"
The tests that have not been touched are insensitive to relocation model.

Differential Revision: http://reviews.llvm.org/D17995

llvm-svn: 265949
2016-04-11 15:24:23 +00:00
Ulrich Weigand 848a513d0a [SystemZ] Support conditional indirect sibling calls via BCR
This adds a conditional variant of CallBR instruction, CallBCR. Also,
it can be fused with integer comparisons, resulting in one of the new
C*BCall instructions.

In addition to CallBRCL limitations, this has another one: it won't
trigger if the function to call isn't already in %r1 - see f22 in the
test for an example (it's also why the loads in tests are volatile).

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18928

llvm-svn: 265933
2016-04-11 12:12:32 +00:00
Simon Pilgrim ac0cac3b0d [X86] Added extra widening tests for and/xor/or bit operations
Add tests for bitcasting an illegal vector to/from a legal scalar

Additional tests requested for D18944

llvm-svn: 265930
2016-04-11 11:10:36 +00:00
Simon Pilgrim ff434aaa3a [X86] Added extra widening tests for and/xor/or bit operations
To make sure we're dealing with both cases of legal/illegal number of vector elements and legal/illegal vector element types

llvm-svn: 265929
2016-04-11 10:58:52 +00:00
Simon Pilgrim d839fe9cfb [X86] Regenerated sdglue test checks
llvm-svn: 265927
2016-04-11 10:22:05 +00:00
Simon Pilgrim da730769fb [X86] Added widening tests for and/xor/or bit operations
Part of additional tests requested for D18944

llvm-svn: 265925
2016-04-11 10:16:27 +00:00
Simon Pilgrim 2d463b1a0f [X86][AVX512] Add vector integer division by constant tests
Added sdiv/srem and udiv/urem tests cases for 512 bit vectors.

llvm-svn: 265903
2016-04-10 17:14:26 +00:00
Simon Pilgrim d263fdc512 [X86][AVX512BW] Add support for v64i8 multiplies
Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.

I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.

Differential Revision: http://reviews.llvm.org/D18937

llvm-svn: 265902
2016-04-10 17:02:48 +00:00
Simon Pilgrim bab9979ee9 [X86][AVX512] Regenerated mask op tests
llvm-svn: 265898
2016-04-10 14:16:03 +00:00
Charles Davis 2f65f35c27 [CodeGen] Don't assume that fixed stack objects are aligned in a stack-realigned function.
Summary:
After we make the adjustment, we can assume that for local allocas, but
not for stack parameters, the return address, or any other fixed stack
object (which has a negative offset and therefore lies prior to the
adjusted SP).

Fixes PR26662.

Reviewers: hfinkel, qcolombet, rnk

Subscribers: rnk, llvm-commits

Differential Revision: http://reviews.llvm.org/D18471

llvm-svn: 265886
2016-04-09 23:34:42 +00:00
Sanjay Patel 4abae4e0fa [x86] use BMI 'andn' for logic + compare ops
With BMI, we can use 'andn' to save an instruction when the result is only used in a compare.
This is related to one of the potential sequences to check 'isfinite' in:
https://llvm.org/bugs/show_bug.cgi?id=27164

Differential Revision: http://reviews.llvm.org/D18910

llvm-svn: 265875
2016-04-09 16:02:52 +00:00
Simon Pilgrim 1cc5712763 [X86][XOP] Support for VPPERM 2-input shuffle mask decoding
This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle.

The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.

Differential Revision: http://reviews.llvm.org/D18441

llvm-svn: 265874
2016-04-09 14:51:26 +00:00
Sanjay Patel c0a627524d [x86] show missed opportunities to use andn
llvm-svn: 265850
2016-04-08 21:26:11 +00:00
Sanjay Patel a699ecc55c [x86] regenerate checks for BMI tests
llvm-svn: 265841
2016-04-08 20:29:39 +00:00
James Y Knight fec5c64b63 Add trailing colons to labels in a test.
This will avoid matching on the FILENAME if it happened to contain, say,
"f4" anywhere in the file path.

llvm-svn: 265837
2016-04-08 19:49:03 +00:00
Nirav Dave 66f485f4e2 Fix Load Control Dependence in MemCpy Generation
In Memcpy lowering we had missed a dependence from the load of the
operation to successor operations. This causes us to potentially
construct an in initial DAG with a memory dependence not fully
represented in the chain sub-DAG but rather require looking at the
entire DAG breaking alias analysis by allowing incorrect repositioning
of memory operations.

To work around this, r200033 changed DAGCombiner::GatherAllAliases to be
conservative if any possible issues to happen. Unfortunately this check
forbade many non-problematic situations as well. For example, it's
common for incoming argument lowering to add a non-aliasing load hanging
off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would
find that other (unvisited) use of the EntryNode chain, and just give up
entirely. Furthermore, the check was incomplete: it would not actually
detect all such potentially problematic DAG constructions, because
GatherAllAliases did not guarantee to visit all chain nodes going up to
the root EntryNode. This is in general fine -- giving up early will just
miss a potential optimization, not generate incorrect results. But, for
this non-chain dependency detection code, it's possible that you could
have a load attached to a higher-up chain node than any which were
visited. If that load aliases your store, but the only dependency is
through the value operand of a non-aliasing store, it would've been
missed by this code, and potentially reordered.

With the dependence added, this check can be removed and Alias Analysis
can be much more aggressive. This fixes code quality regression in the
Consecutive Store Merge cleanup (D14834).

Test Change:

ppc64-align-long-double.ll now may see multiple serializations
of its stores

Differential Revision: http://reviews.llvm.org/D18062

llvm-svn: 265836
2016-04-08 19:44:40 +00:00
Kevin B. Smith e0a6fc3bcc [X86] Fix PR23155 by turning on X86FixupBWInsts by default.
Differential Revision: http://reviews.llvm.org/D18866

llvm-svn: 265830
2016-04-08 18:58:29 +00:00
Colin LeMahieu efe3732883 Revert r265817
lld tests need to be addressed.

llvm-svn: 265822
2016-04-08 18:15:37 +00:00
Colin LeMahieu 4a1975ba8e [llvm-objdump] Printing hex instead of dec by default
Differential Revision: http://reviews.llvm.org/D18770

llvm-svn: 265817
2016-04-08 17:55:03 +00:00
Ulrich Weigand fa2dffbc1a [SystemZ] Support conditional sibling calls via BRCL
This adds a conditional variant of CallJG instruction, CallBRCL.
It can be used for conditional sibling calls. Unfortunately, due
to IfCvt limitations, it only really works well for functions without
arguments.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18864

llvm-svn: 265814
2016-04-08 17:22:19 +00:00
Quentin Colombet bd19c8a39e [AArch64] Add a test case for the default mapping of RegBankSelect.
llvm-svn: 265811
2016-04-08 17:11:51 +00:00
Quentin Colombet 876ddf8107 [MIR] Teach the parser how to deal with register banks.
llvm-svn: 265802
2016-04-08 16:40:43 +00:00
Sam Parker 2d5126cdf5 [ARM] Enable SMLAW[B|T] and SMLUW[B|T] instruction selection
Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt,
smulwb and smulwt instructions for the ARM backend. Also updated the smul
CodeGen test and removed the smulw one.

Differential Revision: http://reviews.llvm.org/D18892

llvm-svn: 265793
2016-04-08 16:02:53 +00:00
Hans Wennborg 5a7723c7a2 Revert r265547 "Recommit r265309 after fixed an invalid memory reference bug happened"
It caused PR27275: "ARM: Bad machine code: Using an undefined physical register"

Also reverting the following commits that were landed on top:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"

llvm-svn: 265790
2016-04-08 15:17:43 +00:00
Simon Pilgrim 872dd6c3fe [X86][SSE] Added 32-bit tests for vector lzcnt/tzcnt tests
v2i64 tests are particularly bad on 32-bit targets.

llvm-svn: 265789
2016-04-08 15:01:31 +00:00
Chuang-Yu Cheng 98c1894755 CXX_FAST_TLS calling convention: performance improvement for PPC64
This is the same change on PPC64 as r255821 on AArch64. I have even borrowed
his commit message.

The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D17533

llvm-svn: 265781
2016-04-08 12:04:32 +00:00
Zlatko Buljan 53a037f5cc [mips][microMIPS] Add CodeGen support for ADD, ADDIU*, ADDU* and DADD* instructions
Differential Revision: http://reviews.llvm.org/D16454

llvm-svn: 265772
2016-04-08 07:27:26 +00:00
Dmitry Polukhin 24c4f28ac9 [IFUNC] Fix ifunc-asm.ll test
It seems that llc cannot be called used in assembler tests so test that
checks asm for particular target needs to be moved to codegen.

llvm-svn: 265770
2016-04-08 06:45:19 +00:00
Amaury Sechet c53ad4f3b2 Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule
Summary:
EHPad BB are not entered the classic way and therefor do not need to be placed after their predecessors. This patch make sure EHPad BB are not chosen amongst successors to form chains, and are selected as last resort when selecting the best candidate.

EHPad are scheduled in reverse probability order in order to have them flow into each others naturally.

Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17625

llvm-svn: 265726
2016-04-07 21:29:39 +00:00
Jan Vesely 43b7b5b846 AMDGPU/SI: Implement atomic load/store for i32 and i64
Standard load/store instructions with GLC bit set.

Reviewers: tstellardAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18760

llvm-svn: 265709
2016-04-07 19:23:11 +00:00
Tom Stellard 9112758077 AMDGPU/SI: Add latency for export instructions
Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18599

llvm-svn: 265708
2016-04-07 18:30:05 +00:00
Simon Pilgrim fba9352f31 [X86][SSE] Added bitmask pattern shuffle tests
Based on OR(AND(MASK,V0),AND(~MASK,V1)) style patterns

llvm-svn: 265697
2016-04-07 17:23:55 +00:00
Kevin B. Smith 3802c4af59 [X86]: Fix for PR27251.
Differential Revision: http://reviews.llvm.org/D18850

llvm-svn: 265690
2016-04-07 16:15:34 +00:00
Ulrich Weigand 2eb027d21f [SystemZ] Implement conditional returns
Return is now considered a predicable instruction, and is converted
to a newly-added CondReturn (which maps to BCR to %r14) instruction by
the if conversion pass.

Also, fused compare-and-branch transform knows about conditional
returns, emitting the proper fused instructions for them.

This transform triggers on a *lot* of tests, hence the huge diffstat.
The changes are mostly jX to br %r14 -> bXr %r14.

Author: koriakin

Differential Revision: http://reviews.llvm.org/D17339

llvm-svn: 265689
2016-04-07 16:11:44 +00:00
Ehsan Amiri 4701a91e59 [PPC] Enable transformations in PPCPassConfig::addIRPasses at O2
http://reviews.llvm.org/D18562

A large number of testcases has been modified so they pass after this test.
One testcase is deleted, because I realized even after undoing the original
change that was committed with this testcase, the testcase still passes. So
I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll)
is an arbitrary change to keep it passing. Given the original intention of the
testcase, and the fact that fixing it will require some time to change the testcase,
we concluded that this quick change will be enough.

llvm-svn: 265683
2016-04-07 15:30:55 +00:00
Simon Pilgrim d54bae6525 [X86][SSE] Add support for VZEXT constant folding
llvm-svn: 265646
2016-04-07 07:52:45 +00:00
Ahmed Bougacha 1cf67fb9cb [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
  (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
  (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
  (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
  (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)

Original Message:

We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265636
2016-04-07 02:07:10 +00:00
Ahmed Bougacha 70bde5445b [X86] Refresh and tweak EFLAGS reuse tests. NFC.
The non-1 and EQ/NE tests were misguided.

llvm-svn: 265635
2016-04-07 02:06:53 +00:00
Hans Wennborg ab16be799c Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
Third time's the charm? The previous attempt (r265345) caused ASan test
failures on X86, as broken CFI caused stack traces to not work.

This version of the patch makes sure not to merge with stack adjustments
that have CFI, and to not add merged instructions' offests to the CFI
about to be generated.

This is already covered by the lit tests; I just got the expectations
wrong previously.

llvm-svn: 265623
2016-04-07 00:05:49 +00:00
Ehsan Amiri 322eca3849 [PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP
http://reviews.llvm.org/D18405

When the integer value loaded is never used directly as integer we should use VSX 
or Floating Point Facility integer loads and avoid extra direct move

llvm-svn: 265593
2016-04-06 20:12:29 +00:00
Nicolai Haehnle df3a20cd80 AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

llvm-svn: 265589
2016-04-06 19:40:20 +00:00
Hans Wennborg 6849f8f15f Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC."
It caused ASan 32-bit tests to hang (PR27245).

llvm-svn: 265559
2016-04-06 16:44:38 +00:00
Hans Wennborg a7e396b5ef Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)""
It seems to be causing ASan tests to crash, probably due to
miscompiling the run-time somehow.

llvm-svn: 265551
2016-04-06 16:10:20 +00:00
Wei Mi 18293bef4e Recommit r265309 after fixed an invalid memory reference bug happened
when DenseMap growed and moved memory. I verified it fixed the bootstrap
problem on x86_64-linux-gnu but I cannot verify whether it fixes
the bootstrap error on clang-ppc64be-linux. I will watch the build-bot
result closely.

Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.

analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.

To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.

Differential Revision: http://reviews.llvm.org/D15302

llvm-svn: 265547
2016-04-06 15:41:07 +00:00
Chuang-Yu Cheng 6e1408a891 [ppc64] Temporary disable sibling call optimization on ppc64 due to breaking test case
r265506 breaks print-stack-trace.cc test case of compiler-rt in bootstrap
test.

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/1708

llvm-svn: 265528
2016-04-06 10:48:36 +00:00
Chuang-Yu Cheng 2e5973ef74 [ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi
This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and
add a couple of test cases. This patch also passed llvm/clang bootstrap
test, and spec2006 build/run/result validation.

Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617

Great thanks to Tom's (tjablin) help, he contributed a lot to this patch.
Thanks Hal and Kit's invaluable opinions!

Reviewers: hfinkel kbarton

http://reviews.llvm.org/D16315

llvm-svn: 265506
2016-04-06 02:04:38 +00:00
Sanjoy Das 65a60670e8 Lower @llvm.experimental.deoptimize as a noreturn call
While preserving the return value for @llvm.experimental.deoptimize at
the IR level is useful during mid-level optimization, doing so at the
machine instruction level requires generating some extra code and a
return that is non-ideal.  This change has LLVM lower

```
  %val = call @llvm.experimental.deoptimize
  ret %val
```

to effectively

```
  call @__llvm_deoptimize()
  unreachable
```

instead.

llvm-svn: 265502
2016-04-06 01:33:49 +00:00
Davide Italiano ea04026c13 [DebugInfo] Fix tests so that each subprogram belongs to a CU.
llvm-svn: 265490
2016-04-05 23:37:08 +00:00
Manman Ren 802cd6f9d7 Swift Calling Convention: swiftcc for ARM.
Differential Revision: http://reviews.llvm.org/D18769

llvm-svn: 265482
2016-04-05 22:44:44 +00:00
Evgeniy Stepanov dde29e2799 Faster stack-protector for Android/AArch64.
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.

llvm-svn: 265481
2016-04-05 22:41:50 +00:00
Manman Ren f8bdd88cd9 Swift Calling Convention: add swiftcc.
Differential Revision: http://reviews.llvm.org/D17863

llvm-svn: 265480
2016-04-05 22:41:47 +00:00
Ahmed Bougacha 50e6cd4a3a [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265450
2016-04-05 20:02:57 +00:00
Ahmed Bougacha b76e7253e9 [X86] Add tests for ATOMIC_LOAD_OP EFLAGS reuse. NFC.
llvm-svn: 265448
2016-04-05 20:02:44 +00:00
Quentin Colombet 811da0efbc [AArch64][Test] Do not override the suffixes for test cases.
llvm-svn: 265441
2016-04-05 19:26:42 +00:00
Sanjay Patel 0484879fe7 [x86] regenerate checks
utils/update_test_checks.py was improved with:
http://reviews.llvm.org/rL265414
to include the first line of the function (expected to be
a comment line). This ensures that nothing bad has happened
before the first actual line of checked asm. It also matches
the existing behavior of the old script.

llvm-svn: 265416
2016-04-05 17:12:19 +00:00
JF Bastien c6ba5ead5e WebAssembly: fix cfg-stackify test
It was broken by reshuffling induced by r265397 'Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge'.

llvm-svn: 265415
2016-04-05 17:01:52 +00:00
Jacques Pienaar 42991b3e5a [lanai] LanaiSetflagAluCombiner more conservative
Summary: LanaiSetflagAluCombiner could previously combine instructions across basic building blocks even when not legal. Make the LanaiSetflagAluCombiner more conservative to avoid this.

Reviewers: eliben

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D18746

llvm-svn: 265411
2016-04-05 16:18:13 +00:00
Konstantin Zhuravlyov e63e02cb0c [AMDGPU] Emit linkonce and linkonce_odr symbols
Differential Revision: http://reviews.llvm.org/D18726

llvm-svn: 265408
2016-04-05 16:00:58 +00:00
Chuang-Yu Cheng f0eba83571 Add missing test for the "Don't delete empty preheaders" added in r265397
Author: Tom Jablin (tjablin)
llvm-svn: 265402
2016-04-05 14:21:32 +00:00
Chuang-Yu Cheng d3fb38cae5 Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge
Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
basic blocks. This pass can delete loop preheaders which frequently creates
critical edges. A preheader can be a convenient place to spill registers to
the stack. If the entrance to a loop body is a critical edge, then spills
may occur in the loop body rather than immediately before it. This patch
protects loop preheaders from deletion in CodeGenPrepare even if they are
nearly empty.

Since the patch alters the CFG, it affects a large number of test cases.
In most cases, the changes are merely cosmetic (basic blocks have different
names or instruction orders change slightly). I am somewhat concerned about
the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
deleted, then the MIPS backend does not take advantage of a branch delay
slot. Consequently, I would like some close review by a MIPS expert.

The patch also partially subsumes D16893 from George Burgess IV. George
correctly notes that CodeGenPrepare does not actually preserve the dominator
tree. I think the dominator tree was usually not valid when CodeGenPrepare
ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
now always valid before CodeGenPrepare.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng

http://reviews.llvm.org/D16984

llvm-svn: 265397
2016-04-05 14:06:20 +00:00
Simon Dardis d9d41f531e [mips] MIPSR6 Compact jump support
This patch adds support for compact jumps similiar to the previous compact
branch support for MIPSR6. Unlike compact branches, compact jumps do not
have a forbidden slot.

As MipsInstrInfo::getEquivalentCompactForm can determine the correct
expansion for jumps and branches for both microMIPS and MIPSR6, remove the
unnecessary distinction in the delay slot filler.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders
llvm-svn: 265390
2016-04-05 12:50:29 +00:00
Justin Holewinski c79979299a [NVPTX] Handle ldg created from sign-/zero-extended load
Reviewers: jingyue

Subscribers: jholewinski

Differential Revision: http://reviews.llvm.org/D18053

llvm-svn: 265389
2016-04-05 12:38:01 +00:00
Teresa Johnson f4cf1c3eb4 Don't fold double constant to an integer if dest type not integral
Summary:
I encountered this issue when constant folding during inlining tried to
fold away a bitcast of a double to an x86_mmx, which is not an integral
type. The test case exposes the same issue with a smaller code snippet
during early CSE.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18528

llvm-svn: 265367
2016-04-04 23:50:46 +00:00
Matthias Braun 571e3481e7 test: Always treat .mir files as tests even outside of CodeGen/MIR
We missed a handful of .mir tests that existed outside the
test/CodeGen/MIR directory.

Also fix the three powerpc .mir tests that nobody noticed were broken.

llvm-svn: 265350
2016-04-04 21:23:44 +00:00
Hans Wennborg a47a692341 Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
The original commit miscompiled things on 32-bit Windows, e.g. a Clang
boostrap. It turns out that mergeSPUpdates() was a bit too generous in
what it interpreted as a stack adjustment, causing the following code:

        addl    $12, %esp
        leal    -4(%ebp), %esp

To be "optimized" into simply:

        addl    $8, %esp

This commit tightens up mergeSPUpdates() and includes a new test
(test14 in movtopush.ll) for this situation.

llvm-svn: 265345
2016-04-04 21:02:46 +00:00
Sean Silva f4517a15b8 Beef up some dllexport tests.
Adds some dllexport tests to verify that:
  - Variables in bss are exported appropriately
  - Non-dllexport symbols aliased to dllexport symbols are not exported
  - Symbols declared as dllexport but are not defined are not exported

We plan to enable dllimport/dllexport support for the PS4, and these
additional tests are for points we noticed in our internal testing.

Patch by Warren Ristow!

Differential Revision: http://reviews.llvm.org/D18682

llvm-svn: 265333
2016-04-04 19:10:55 +00:00
Matthias Braun 870c34f0cf ARM, AArch64, X86: Check preserved registers for tail calls.
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.

This situation happens with calling conventions like preserver_mostcc or
cxx_fast_tls. It was explicitely handled for fast_tls and failing for
preserve_most. This patch generalizes the check to any calling
convention.

Related to rdar://24207743

Differential Revision: http://reviews.llvm.org/D18680

llvm-svn: 265329
2016-04-04 18:56:13 +00:00
Wei Mi fb5252cac1 Revert r265309 and r265312 because they caused some errors I need to investigate.
llvm-svn: 265317
2016-04-04 17:45:03 +00:00
Derek Schuff 1dbf7a571f Add MachineFunctionProperty checks for AllVRegsAllocated for target passes
Summary:
This adds the same checks that were added in r264593 to all
target-specific passes that run after register allocation.

Reviewers: qcolombet

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18525

llvm-svn: 265313
2016-04-04 17:09:25 +00:00
Wei Mi ffbc9c7f3b Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.

analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.

To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.

Differential Revision: http://reviews.llvm.org/D15302

llvm-svn: 265309
2016-04-04 16:42:40 +00:00
Ulrich Weigand a9ac6d6cc2 [SystemZ] Support ATOMIC_FENCE
A cross-thread sequentially consistent fence should be lowered into
z/Architecture's BCR serialization instruction, instead of causing a
fatal error in the back-end.

Author: bryanpkc
Differential Revision: http://reviews.llvm.org/D18644

llvm-svn: 265292
2016-04-04 12:45:44 +00:00
Ulrich Weigand f557d08325 [SystemZ] Support llvm.frameaddress/llvm.returnaddress intrinsics
Enable the SystemZ back-end to lower FRAMEADDR and RETURNADDR, which
previously would cause the back-end to crash.  Currently, only a
frame count of zero is supported.

Author: bryanpkc
Differential Revision: http://reviews.llvm.org/D18514

llvm-svn: 265291
2016-04-04 12:44:55 +00:00
Elena Demikhovsky e99c561391 AVX-512: Truncating store for i1 vectors
Implemented truncstore for KNL and skylake-avx512.
Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.

Differential Revision: http://reviews.llvm.org/D18740

llvm-svn: 265283
2016-04-04 07:17:47 +00:00
Simon Pilgrim d74f6e22f2 [X86][SSE] Refreshed MOVMSK sign bit tests
llvm-svn: 265267
2016-04-03 18:59:42 +00:00
Elena Demikhovsky 5e426f7356 AVX-512: Load and Extended Load for i1 vectors
Implemented load+{sign|zero}_extend for i1 vectors
Fixed failures in i1 vector load.
Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.

Differential Revision: http://reviews.llvm.org/D18737

llvm-svn: 265259
2016-04-03 08:41:12 +00:00
Zoran Jovanovic 2b7cc5a4ae [mips][microMIPS] Revert commits r264245 and r264248.
Commit r264245 was the reason for failing tests in LLVM test suite.
Commit r264248 depends on the first one.

llvm-svn: 265249
2016-04-02 23:06:13 +00:00
Simon Pilgrim b2e837d875 [X86][SSE] Added 1024-bit vector comparison tests
More examples of PR22603, poor vector splitting for AVX512F targets as well as missing uses of PACKSS/MOVMSK

llvm-svn: 265248
2016-04-02 21:33:09 +00:00
Simon Pilgrim 8fd7d852d5 [X86][AVX512] Added AVX512 comparison tests
llvm-svn: 265247
2016-04-02 21:24:42 +00:00
Simon Pilgrim a2c8da9e06 [X86][AVX] Added vector float truncation (double2float) tests
llvm-svn: 265222
2016-04-02 14:09:17 +00:00
Tim Northover 5dad9df9f7 AArch64: avoid clobbering SP for dead MOVimm pseudos.
We were producing ORR, which actually defines a GPR32sp rather than a GPR32.

Should fix PR23209.

llvm-svn: 265198
2016-04-01 23:14:52 +00:00
Adrian Prantl cf0961f5ea Add missing emissionKind flags to the DICompileUnits of several old testcases.
llvm-svn: 265192
2016-04-01 22:18:43 +00:00
Simon Pilgrim 3243b21dae [X86][SSE] Regenerated vector float tests - fabs / floor(etc.) / fneg / float2double
llvm-svn: 265186
2016-04-01 21:30:48 +00:00
Simon Pilgrim 1e5bf0a256 [X86][SSE] Vector i64 load tests
llvm-svn: 265185
2016-04-01 21:06:17 +00:00
Simon Pilgrim 275b2bcb76 [X86][SSE] Regenerated comparison mask and float immediate tests
llvm-svn: 265184
2016-04-01 21:00:00 +00:00
Simon Pilgrim a372a0f295 [X86][SSE] Regenerated the vec_extract tests.
llvm-svn: 265183
2016-04-01 20:55:19 +00:00
Simon Pilgrim f739d8a2ed [X86][SSE] Regenerated the vec_insert tests.
llvm-svn: 265179
2016-04-01 19:42:23 +00:00
Simon Pilgrim 6121118663 [X86][SSE] Regenerated vec_partial tests.
llvm-svn: 265173
2016-04-01 18:30:29 +00:00
Sanjay Patel 9b5b5c82ca [x86] add an SSE2 + fast-unaligned accesses run for memset nonzero tests
Was there really no other way to splat a byte in SSE2?
    punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
    pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
    pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]

llvm-svn: 265172
2016-04-01 18:29:25 +00:00
Simon Pilgrim 858194640e [X86][SSE] Regenerated vec_logical tests.
llvm-svn: 265171
2016-04-01 18:28:23 +00:00
Tom Stellard 354a43c7bc AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}
Summary:
Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.

32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.

Patch by: Vedran Miletić

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: jvesely, scchan, kanarayan, arsenm

Differential Revision: http://reviews.llvm.org/D17280

llvm-svn: 265170
2016-04-01 18:27:37 +00:00
Simon Pilgrim 1b14082488 [X86][SSE] Regenerated vector sdiv to shifts tests
Added SSE + AVX1 tests as well as AVX2

llvm-svn: 265169
2016-04-01 18:18:40 +00:00
Sanjay Patel d3e3d48cb9 [x86] add an SSE1 run for these tests
Note however that this is identical to the existing SSE2 run.
What we really want is yet another run for an SSE2 machine that
also has fast unaligned 16-byte accesses.

llvm-svn: 265167
2016-04-01 18:11:30 +00:00
Simon Pilgrim b8283631a5 [X86][SSE] Regenerated vec_setcc tests.
llvm-svn: 265164
2016-04-01 17:55:02 +00:00
Simon Pilgrim 41e31ff6bd [X86][SSE] Regenerated the vec_set tests.
Replaced lots of dodgy greps with actual codegen

llvm-svn: 265163
2016-04-01 17:40:25 +00:00
Sanjay Patel 9f413364d5 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 -
where we noticed that an intermediate splat was being generated for memsets of
non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.

Note that the SSE1 path is not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

llvm-svn: 265161
2016-04-01 17:36:45 +00:00
Sanjay Patel a05e0ff223 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to D18566 - where we noticed that an intermediate splat was being
generated for memsets of non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The tests that were added in the last patch are now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
In the new tests, the splat via shuffling looks ok to me, but there might be some
room for improvement depending on uarch there.

Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

Differential Revision: http://reviews.llvm.org/D18676

llvm-svn: 265148
2016-04-01 16:27:14 +00:00
Simon Pilgrim 7ec092d0f8 [X86][AVX512] Regenerated intrinsics tests
llvm-svn: 265135
2016-04-01 11:57:51 +00:00
Andrey Turetskiy 958eb46443 [X86] Introduce Lakemont CPU.
Add a new Intel MCU CPU Lakemont, which doesn't support X87.

Differential Revision: http://reviews.llvm.org/D18650

llvm-svn: 265128
2016-04-01 10:16:15 +00:00
Sean Silva 3de5ef96c9 Improve CHECK-NOT robustness of dllexport tests
This changes some dllexport tests, to verify that some symbols that
should not be exported are not, in a way that improves the robustness
of CHECK-SAME interaction with CHECK-NOT.

We plan to enable dllimport/dllexport support for the PS4, and these
changes are for points we noticed in our internal testing.

Patch by Warren Ristow!

llvm-svn: 265106
2016-04-01 03:54:03 +00:00
Sanjoy Das 9d41a8f269 Don't use an i64 return type with webkit_jscc
Re-enable an assertion enabled by Justin Lebar in rL265092.  rL265092
was breaking test/CodeGen/X86/deopt-intrinsic.ll because webkit_jscc
does not like non-i64 return types.  Change the test case to not do
that.

llvm-svn: 265099
2016-04-01 02:51:21 +00:00
Chuang-Yu Cheng 35c6181982 Fix Sub-register Rewriting in Aggressive Anti-Dependence Breaker
Previously, HandleLastUse would delete RegRef information for sub-registers
if they were dead even if their corresponding super-register were still live.

If the super-register were later renamed, then the definitions of the
sub-register would not be updated appropriately. This patch alters the
behavior so that RegInfo information for sub-registers is only deleted when
the sub-register and super-register are both dead.

This resolves PR26775. This is the mirror image of Hal's r227311 commit.

Author: Tom Jablin (tjablin)
Reviewers: kbarton uweigand nemanjai hfinkel

http://reviews.llvm.org/D18448

llvm-svn: 265097
2016-04-01 02:05:29 +00:00
Justin Lebar efcc81cbb4 [NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect.
Summary:
Previously the NVVMReflect pass would read its configuration from
command-line flags or a static configuration given to the pass at
instantiation time.

This doesn't quite work for clang's use-case.  It needs to pass a value
for __CUDA_FTZ down on a per-module basis.  We use a module flag for
this, so the NVVMReflect pass needs to be updated to read said flag.

Reviewers: tra, rnk

Subscribers: cfe-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D18672

llvm-svn: 265090
2016-04-01 01:09:07 +00:00
Adrian Prantl b8089516a5 testcase gardening: update the emissionKind enum to the new syntax. (NFC)
llvm-svn: 265081
2016-04-01 00:16:49 +00:00
Adrian Prantl b939a25707 Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.

http://reviews.llvm.org/D18612
<rdar://problem/25427165>

llvm-svn: 265077
2016-03-31 23:56:58 +00:00
Tim Shen 2e24d0c0c1 Move asm-printer-topological-order.ll to PowerPC backend
llvm-svn: 265067
2016-03-31 22:32:10 +00:00
Tim Shen 800ed436e5 [AsmPrinter] Print aliases in topological order
Print aliases in topological order, that is, for any alias a = b,
b must be printed before a. This is because on some targets (e.g. PowerPC)
linker expects aliases in such an order to generate correct TOC information.

GCC also prints aliases in topological order.

llvm-svn: 265064
2016-03-31 22:08:19 +00:00
Jun Bum Lim 760afcb338 [AArch64] Allow loads with imp-def to be handled in getMemOpBaseRegImmOfsWidth()
Summary:
This change will allow loads with imp-def to be clustered in machine-scheduler pass.
areMemAccessesTriviallyDisjoint() can also handle loads with imp-def.

Reviewers: mcrosier, jmolloy, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18665

llvm-svn: 265051
2016-03-31 20:53:47 +00:00
Hal Finkel 0f02ccd955 [PowerPC] Cleanup test/CodeGen/PowerPC/qpx-load-splat.ll
Removing unnecessary attributes and metadata...

llvm-svn: 265049
2016-03-31 20:45:00 +00:00
Sanjay Patel 61e13249d8 [x86] add memset tests to show another potential improvement
llvm-svn: 265048
2016-03-31 20:40:32 +00:00
Hal Finkel fc35391f2b [PowerPC] Add a late MI-level pass for QPX load/splat simplification
Chapter 3 of the QPX manual states that, "Scalar floating-point load
instructions, defined in the Power ISA, cause a replication of the source data
across all elements of the target register." Thus, if we have a load followed
by a QPX splat (from the first lane), the splat is redundant. This adds a late
MI-level pass to remove the redundant splats in some of these cases
(specifically when both occur in the same basic block).

This optimization is scheduled just prior to post-RA scheduling. It can't happen
before anything that might replace the load with some already-computed quantity
(i.e. store-to-load forwarding).

llvm-svn: 265047
2016-03-31 20:39:41 +00:00
Hans Wennborg 132cd62121 Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
I think it might have caused these build breakages:
http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio

llvm-svn: 265046
2016-03-31 20:27:30 +00:00
Simon Pilgrim aab59b7a28 [X86][SSE] Some basic tests for variable shuffles
We don't really support non-constant shuffle masks, but these tests are for cases where BUILD_VECTOR is made up from vector extracts (as well as undef/zero scalars).

llvm-svn: 265045
2016-03-31 20:26:30 +00:00
Benjamin Kramer 569efd2cfd [ARM] Expand v1i64 and v2i64 ctpop.
The default is legal, which results in 'Cannot select' errors. This is
triggered during selfhost due to a recent cost model change.

llvm-svn: 265040
2016-03-31 19:42:04 +00:00
Hans Wennborg e97fb414e8 [X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)
For code such as:

  void f(int, int);
  void g() {
      f(1, 2);
  }

compiled for 32-bit X86 Linux, Clang would previously generate:

  subl    $12, %esp
  subl    $8, %esp
  pushl   $2
  pushl   $1
  calll   f
  addl    $16, %esp
  addl    $12, %esp
  retl

This patch fixes that by merging adjacent stack adjustments in
eliminateCallFramePseudoInstr().

Differential Revision: http://reviews.llvm.org/D18627

llvm-svn: 265039
2016-03-31 19:26:24 +00:00
Sanjay Patel 92d5ea5e07 [x86] use SSE/AVX ops for non-zero memsets (PR27100)
Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast
targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping
into a codegen sinkhole while trying to splat a byte into an XMM reg.

Follow-on bugs exposed by the current codegen are:
https://llvm.org/bugs/show_bug.cgi?id=27141
https://llvm.org/bugs/show_bug.cgi?id=27143

Differential Revision: http://reviews.llvm.org/D18566

llvm-svn: 265029
2016-03-31 17:30:06 +00:00
Hans Wennborg a7543ba10c More checks in win32-seh-nested-finally.ll after comment on r264966
llvm-svn: 265027
2016-03-31 16:42:10 +00:00
Ulrich Weigand 6ad762d36f [PowerPC] Attempt to fix fast-isel-i64offset.ll failure
The test case added in r265023 is failing on ninja-x64-msvc-RA-centos6.
Update the test to make less specific assumptions on code generation.

llvm-svn: 265026
2016-03-31 16:38:57 +00:00
Ulrich Weigand 3707ba8030 [PowerPC] Correctly compute 64-bit offsets in fast isel
PPCSimplifyAddress contains this code:

  IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
                                            : Type::getInt64Ty(*Context));

to determine the type to be used for an index register, if one needs
to be created.  However, the "VT" here is the type of the data being
loaded or stored, *not* the type of an address.  This means that if
a data element of type i32 is accessed using an index that does not
not fit into 32 bits, a wrong address is computed here.

Note that PPCFastISel is only ever used on 64-bit currently, so the type
of an address is actually *always* MVT::i64.  Other parts of the code,
even in this same PPCSimplifyAddress routine, already rely on that fact.
Thus, this patch changes the code to simply unconditionally use
Type::getInt64Ty(*Context) as OffsetTy.

llvm-svn: 265023
2016-03-31 15:37:06 +00:00
Jun Bum Lim cf9744367b [AArch64] Handle missing store pair opportunity
Summary:
This change will handle missing store pair opportunity where the first store
instruction stores zero followed by the non-zero store. For example, this change
will convert :

  str wzr, [x8]
  str w1, [x8, #4]
into:
  stp wzr, w1, [x8]

Reviewers: jmolloy, t.p.northover, mcrosier

Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18570

llvm-svn: 265021
2016-03-31 14:47:24 +00:00
Ulrich Weigand 1931b01a64 [PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel
The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
from a F4RC to a F8RC register class during conversion of a
floating-point number to integer. There is actually no support in the
common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
PowerPC back-end has special code there to simply ignore
COPY_TO_REGCLASS.

This is correct *if and only if* the source and destination registers of
COPY_TO_REGCLASS are the same (except for the different register class).
But nothing guarantees this to be the case, and if the register
allocator does end up allocating source and destination to different
registers after all, the back-end simply generates incorrect code. I've
included a test case that shows such incorrect code generation.

However, it seems that COPY_TO_REGCLASS is actually not intended to be
used at the MI layer at all. It is used during SelectionDAG, but always
lowered to a plain COPY before emitting MI. Other back-end's fast isel
passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
for the PowerPC back-end to emit it here.

This patch changes the PowerPC back-end to directly emit COPY instead of
COPY_TO_REGCLASS and removes the special handling in the instruction
printers.

Differential Revision: http://reviews.llvm.org/D18605

llvm-svn: 265020
2016-03-31 14:44:50 +00:00
Davide Italiano 936a2b09f3 [DebugInfo] Subprograms should belong to a CU.
Start fixing tests accordingly. There are still
about 35 failures before we can enable this check
in the IR verifier.

llvm-svn: 264990
2016-03-31 03:40:07 +00:00
Hal Finkel 851b33a0b1 [PowerPC] Load two floats directly instead of using one 64-bit integer load
When dealing with complex<float>, and similar structures with two
single-precision floating-point numbers, especially when such things are being
passed around by value, we'll sometimes end up loading both float values by
extracting them from one 64-bit integer load. It looks like this:

  t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
      t16: i64 = srl t13, Constant:i32<32>
    t17: i32 = truncate t16
  t18: f32 = bitcast t17
    t19: i32 = truncate t13
  t20: f32 = bitcast t19

The problem, especially before the P8 where those bitcasts aren't legal (and
get expanded via the stack), is that it would have been better to use two
floating-point loads directly. Here we add a target-specific DAGCombine to do
just that. In short, we turn:

	ld 3, 0(5)
	stw 3, -8(1)
	rldicl 3, 3, 32, 32
	stw 3, -4(1)
	lfs 3, -4(1)
	lfs 0, -8(1)

into:

        lfs 3, 4(5)
        lfs 0, 0(5)

llvm-svn: 264988
2016-03-31 02:56:05 +00:00
Sean Silva 24d7e2e869 Fix case confusion.
The test case was defining and using a function 'notExported()', but
the FileCheck checks were checking for the name 'not_exported'. This
changes the test to use 'notExported' across the board. Also, the test
defined a function 'not_defined()', but doesn't have any checks related
to it. For consistency, this name is changed to 'notDefined'. A later
commit will add checks for 'notDefined'.

Patch by Warren Ristow!

llvm-svn: 264984
2016-03-31 01:47:33 +00:00
Hans Wennborg be0df2b102 Add some more triples after r264966
llvm-svn: 264972
2016-03-30 23:55:22 +00:00
Hans Wennborg 6596977130 [X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325)
The size savings are significant, and from what I can tell, both ICC and GCC do this.

Differential Revision: http://reviews.llvm.org/D18573

llvm-svn: 264966
2016-03-30 23:38:01 +00:00
Matt Arsenault 2fe4fbc184 AMDGPU: Add frexp_exp intrinsic
llvm-svn: 264944
2016-03-30 22:28:52 +00:00
Simon Pilgrim c49bd2ede0 [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.

llvm-svn: 264922
2016-03-30 20:52:24 +00:00
Tom Stellard 1d5e6d4bdc AMDGPU/SI: Improve MachineSchedModel definition
This patch contains a few improvements to the model, including:

- Using a single resource with a defined buffers size for each memory unit.
- Setting the IssueWidth correctly.
- Fixing latency values for memory instructions.

shader-db stats:

16429 shaders in 3231 tests
Totals:
SGPRS: 318232 -> 312328 (-1.86 %)
VGPRS: 208996 -> 209346 (0.17 %)
Code Size: 7147044 -> 7166440 (0.27 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
Max Waves: 49182 -> 49243 (0.12 %)
Wait states: 0 -> 0 (0.00 %)A

Differential Revision: http://reviews.llvm.org/D18453

llvm-svn: 264877
2016-03-30 16:35:13 +00:00
Tom Stellard 0bc954e3bc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

llvm-svn: 264876
2016-03-30 16:35:09 +00:00
Simon Pilgrim b87ffe8519 [X86][XOP] BITREVERSE lowering using VPPERM
XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.

llvm-svn: 264870
2016-03-30 14:14:00 +00:00
Simon Pilgrim 9490b56a89 [X86][SSE] Test the legalization of vector comparison results
We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.

llvm-svn: 264867
2016-03-30 13:55:00 +00:00
Simon Pilgrim ab305a9d4c [X86][SSE] Added tests for clearing upper bits of vector elements
Patterns based on PR6455

llvm-svn: 264857
2016-03-30 11:43:26 +00:00
Chandler Carruth 8e06a10d1f [x86] Fix a horrible bug in our lowering of x86 floating point atomic
operations.

Specifically, we had code that tried to badly approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered with doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.

This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.

Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leveragable write of a 64bit quantity to an
unintended offset (the first element of the array intead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.

However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occured widely in the wild.

But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.

I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse codegeneration. But that's not for this
change....

llvm-svn: 264845
2016-03-30 08:41:59 +00:00
Adrian Prantl 4a09777b37 Upgrade some wildly anachronistic debug info in testcases.
llvm-svn: 264797
2016-03-29 22:34:30 +00:00
Sanjay Patel f48f7d74e2 use FileCheck and auto-check-generation script for exact checking
1. Removed the run line for mingw32 and made the Darwin triples unknown.
   This is a test of 32-bit vs. 64-bit platform and the underlying hardware.
   We have other tests for checking behavioral differences of the OS platform.

2. Changed the CPU specifiers to the attributes they were meant to represent.
   Any CPU that doesn't have SSE4.2 is assumed to have slow unaligned 16-byte accesses,
   so it won't use those here.
 
3. Although the stores really could all be CHECK-DAG, I left them as CHECK-NEXT to
   show the strange behavior of the instruction scheduler in the SLOW_32 case.

4. The odd-looking instructions are due to the use of a null pointer in the IR, so
   we have integer immediate store addresses. Cute.

llvm-svn: 264796
2016-03-29 22:27:39 +00:00
Nirav Dave 2aab7f4358 Add support for no-jump-tables
Add function soft attribute to the generation of Jump Tables in CodeGen
as initial step towards clang support of gcc's no-jump-table support

Reviewers: hans, echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18321

llvm-svn: 264756
2016-03-29 17:46:23 +00:00
Manman Ren f46262e0b7 Swift Calling Convention: add swiftself attribute.
Differential Revision: http://reviews.llvm.org/D17866

llvm-svn: 264754
2016-03-29 17:37:21 +00:00
Sanjay Patel 7c61507991 [x86] add tests to show current memset codegen
llvm-svn: 264748
2016-03-29 17:09:27 +00:00
Sanjay Patel 32de9628e3 regenerate checks
llvm-svn: 264738
2016-03-29 16:11:29 +00:00
Junmo Park 32ddc544c3 fix CHECK_NOT -> CHECK-NOT
llvm-svn: 264706
2016-03-29 07:53:07 +00:00
Elena Demikhovsky 95629caaa9 AVX-512: fixed a bug in fp_to_uint pattern on KNL
Fixed fp_to_uint instruction selection on KNL.
One pattern was missing for <4 x double> to <4 x i32>

Differential Revision: http://reviews.llvm.org/D18512

llvm-svn: 264701
2016-03-29 06:33:41 +00:00
Hal Finkel fa7057a415 [PowerPC] Refactor popcnt[dw] target features
Instead of using two feature bits, one to indicate the availability of the
popcnt[dw] instructions, and another to indicate whether or not they're fast,
use a single enum. This allows more consistent control via target attribute
strings, and via Clang's command line.

llvm-svn: 264690
2016-03-29 01:36:01 +00:00
Kyle Butt 5e241b11ed [Codegen] Decrease minimum jump table density.
Minimum density for both optsize and non optsize are now options
-sparse-jump-table-density (default 10) for non optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at google
at the cost of a small codesize increase. For code compiled with -Os,
the old behavior continues

llvm-svn: 264689
2016-03-29 00:23:41 +00:00
Sanjay Patel 1f867c6f9c fix checks: *_DAG -> *-DAG
llvm-svn: 264676
2016-03-28 22:11:06 +00:00
Sanjay Patel 53f9ae288a fix CHECK_NEXT -> CHECK-NEXT
llvm-svn: 264674
2016-03-28 22:03:07 +00:00
Sanjay Patel d9ebff8bab fix CHECK_DAG -> CHECK-DAG
llvm-svn: 264673
2016-03-28 22:00:38 +00:00
Sanjay Patel 52fe9d0efe fix CHECK_NEXT -> CHECK-NEXT
llvm-svn: 264672
2016-03-28 21:58:27 +00:00