Commit Graph

20263 Commits

Author SHA1 Message Date
John Brawn 9009d2905d [ARM] Fix lowering of misaligned memcpy/memset
Currently getOptimalMemOpType returns i32 for large enough sizes without
checking for alignment, leading to poor code generation when misaligned accesses
aren't permitted as we generate a word store then later split it up into byte
stores. This means we inadvertantly go over the MaxStoresPerMemcpy limit and for
memset we splat the memset value into a word then immediately split it up
again.

Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type
to use, but also fix a bug there where it wasn't correctly checking if
misaligned memory accesses are allowed.

Differential Revision: https://reviews.llvm.org/D33442

llvm-svn: 303990
2017-05-26 13:59:12 +00:00
Amaury Sechet ba9d8ba82a nits in wide-integer-cmp.ll . NFC
llvm-svn: 303989
2017-05-26 13:56:54 +00:00
John Brawn 57b2492b38 [ARM] Add tests for 6-M memcpy/memset code generation
Differential Revision: https://reviews.llvm.org/D33495

llvm-svn: 303987
2017-05-26 13:52:36 +00:00
Matthias Braun c93c063993 Revert "LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI"
Tentatively revert this to see if it fixes the buildbot stage2
breakages.

This reverts commit r303938.
This reverts commit r303954.

llvm-svn: 303960
2017-05-26 02:25:20 +00:00
Matthias Braun 9e6826de77 Test for r303938
llvm-svn: 303954
2017-05-26 01:29:25 +00:00
Tim Shen a2b85da879 [PPC] Fix atomics lowering in DAG lowering.
I forgot to forward the chain, causing some missing instruction
dependencies. The test crashes the compiler without this patch.

Inspired by the test case, D33519 also tries to remove the extra sync.

Differential Revision: https://reviews.llvm.org/D33573

llvm-svn: 303931
2017-05-25 22:58:35 +00:00
Andrew Kaylor f466001eef Add constrained intrinsics for some libm-equivalent operations
Differential revision: https://reviews.llvm.org/D32319

llvm-svn: 303922
2017-05-25 21:31:00 +00:00
Matthias Braun 1527baab0c CodeGen: Rename DEBUG_TYPE to match passnames
Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the passname where possible.

llvm-svn: 303921
2017-05-25 21:26:32 +00:00
Nico Weber b3d83a092a Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.
llvm-svn: 303902
2017-05-25 19:19:29 +00:00
Manoj Gupta d536180fdc [AArch64]: add 'a' inline asm operand modifier.
Summary:
This is used in the Linux kernel, and effectively just means "print an
address". This brings back r193593.

Reviewed by: Renato Golin

Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls

Subscribers: aemerson, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D33558

llvm-svn: 303901
2017-05-25 19:07:57 +00:00
Tim Corringham 32d0d38679 [AMDGPU] add intrinsic for s_getpc
Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32862

llvm-svn: 303859
2017-05-25 14:04:14 +00:00
Oren Ben Simhon 7bf27f03f2 [X86] Adding vpopcntd and vpopcntq instructions
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).

Differential Revision: https://reviews.llvm.org/D33169

llvm-svn: 303858
2017-05-25 13:45:23 +00:00
Tony Jiang 0a429f040e [PowerPC] Fix a performance bug for PPC::XXSLDWI.
There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI
instruction, this patch recognizes them and does the selection to improve the
PPC performance.

llvm-svn: 303822
2017-05-24 23:48:29 +00:00
Zaara Syeda 932978315b P9: D-form vector load/store. Differential Revision: https://reviews.llvm.org/D33248
llvm-svn: 303780
2017-05-24 17:50:37 +00:00
Krzysztof Parzyszek 6a0005d1b4 Move machine-cse-physreg.mir to test/CodeGen/Thumb
llvm-svn: 303778
2017-05-24 17:20:47 +00:00
Vadzim Dambrouski b07351f4f8 [MSP430] Fix PR33050: Don't use ADD16ri to lower FrameIndex.
Use ADDframe pseudo instruction instead.
This will fix machine verifier error, and will help to fix PR32146.

Differential Revision: https://reviews.llvm.org/D33452

llvm-svn: 303758
2017-05-24 15:08:30 +00:00
Mikael Holmen 2676f8269a MachineCSE: Respect interblock physreg liveness
Summary:
This is a fix for PR32538. MachineCSE first looks at MO.isDead(), but
if it is not marked dead, MachineCSE still wants to do its own check
to see if it is trivially dead. This check for the trivial case
assumed that physical registers cannot be live out of a block.

Patch by Mattias Eriksson.

Reviewers: qcolombet, jbhateja

Reviewed By: qcolombet, jbhateja

Subscribers: jbhateja, llvm-commits

Differential Revision: https://reviews.llvm.org/D33408

llvm-svn: 303731
2017-05-24 09:35:23 +00:00
Daniel Sanders 35b72229b1 Explicitly set CPU and -slow-incdec to try to fix r303678's test on llvm-clang-x86_64-expensive-checks-win.
llvm-svn: 303727
2017-05-24 07:02:37 +00:00
Daniel Sanders b40e16fbef Revert r303720: Tweak r303678's test to try to fix llvm-clang-x86_64-expensive-checks-win.
It doesn't fix that builder.

llvm-svn: 303721
2017-05-24 06:44:55 +00:00
Daniel Sanders 2087483748 Tweak r303678's test to try to fix llvm-clang-x86_64-expensive-checks-win.
I suspect this buildbot has slow-incdec set by default, most likely due to
the default CPU having this set. This feature bit can prevent optsize from
having an effect on this IR.

llvm-svn: 303720
2017-05-24 06:05:14 +00:00
Vadzim Dambrouski 49dd6e68c2 [MSP430] Add subtarget features for hardware multiplier.
Also add more processors to make -mcpu option behave similar to gcc.

Differential Revision: https://reviews.llvm.org/D33335

llvm-svn: 303695
2017-05-23 21:49:42 +00:00
Simon Pilgrim c910a70b21 [AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045)
This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045

Differential Revision: https://reviews.llvm.org/D33451

llvm-svn: 303691
2017-05-23 21:27:15 +00:00
Francis Visoiu Mistrih 1c98701e57 AsmPrinter: mark the beginning and the end of a function in verbose mode
llvm-svn: 303690
2017-05-23 21:22:16 +00:00
Changpeng Fang 1dbace195d AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca
Summary:
  Promoting Alloca to Vector and Promoting Alloca to LDS are two independent handling of Alloca and should not affect each other.
As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage
related checking out and replace it after the calling convention checking.

Reviewer:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33139

llvm-svn: 303684
2017-05-23 20:25:41 +00:00
Stanislav Mekhanoshin 53a21292f8 [AMDGPU] Combine and (srl) into shl (bfe)
Perform DAG combine:
and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
Where nb is a number of trailing zeroes in mask.

It replaces two instructions with two and BFE is generally a more
expensive one. However this is only done if we are selecting a byte
or word at an aligned boundary which results in a proper SDWA
operand pattern. It is only done if SDWA is supported.

TODO: improve SDWA pass to actually convert this pattern. It is not
done now because we have an immediate in the instruction, which has
be moved into a VGPR.

Differential Revision: https://reviews.llvm.org/D33455

llvm-svn: 303681
2017-05-23 19:54:48 +00:00
Oleg Ranevskyy 09df0020fc [ARM] Temporarily disable globals promotion to constant pools to prevent miscompilation
Summary:
A temporary workaround for PR32780 - rematerialized instructions accessing the same promoted global through different constant pool entries.

The patch turns off the globals promotion optimization leaving all its code in place, so that it can be easily turned on once PR32780 is fixed.

Since this is a miscompilation issue causing generation of misbehaving code, and the problem is very subtle, the patch might be valuable enough to get into 4.0.1.

Reviewers: efriedma, jmolloy

Reviewed By: efriedma

Subscribers: aemerson, javed.absar, llvm-commits, rengolin, asl, tstellar

Differential Revision: https://reviews.llvm.org/D33446

llvm-svn: 303679
2017-05-23 19:38:37 +00:00
Daniel Sanders 452c8aec61 [globalisel][tablegen] Add support for (set $dst, 1) and test X86's OptForSize predicate.
Summary:
It's rare but a small number of patterns use IntInit's at the root of the match.
On X86, one such rule is enabled by the OptForSize predicate and causes the
compiler to use the smaller:
	%0 = MOV32r1
instead of the usual:
	%0 = MOV32ri 1

This patch adds support for matching IntInit's at the root and uses this as a
test case for the optsize attribute that was implemented in r301750

Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls, aditya_nandakumar

Reviewed By: qcolombet

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32791

llvm-svn: 303678
2017-05-23 19:33:16 +00:00
Stanislav Mekhanoshin a96ec3f360 [AMDGPU] Convert shl (add) into add (shl)
shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
This allows to fold a constant into an address in some cases as
well as to eliminate second shift if the expression is used as
an address and second shift is a result of a GEP.

Differential Revision: https://reviews.llvm.org/D33432

llvm-svn: 303641
2017-05-23 15:59:58 +00:00
Florian Hahn abb4218b98 [AArch64] Make instruction fusion more aggressive.
Summary:
This patch makes instruction fusion more aggressive by
* adding artificial edges between the successors of FirstSU and
  SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps.
* updating PostGenericScheduler::tryCandidate to keep clusters together,
   similar to GenericScheduler::tryCandidate.

This change increases the number of AES instruction pairs generated on
 Cortex-A57 and Cortex-A72. This doesn't change code at all in
 most benchmarks or general code, but we've seen improvement on kernels
 using AESE/AESMC and AESD/AESIMC. 

Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB

Reviewed By: evandro

Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33230

llvm-svn: 303618
2017-05-23 09:33:34 +00:00
Igor Breger 617be6e475 [GlobalISel][X86] G_LOAD/G_STORE vec256/512 support
Summary: mark G_LOAD/G_STORE vec256/512 legal for AVX/AVX512. Implement instruction selection.

Reviewers: zvi, guyblank

Reviewed By: zvi

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33268

llvm-svn: 303617
2017-05-23 08:23:51 +00:00
Akira Hatanaka e8ae3346a3 [AArch64] Fix PRR33100.
This commit fixes a bug introduced in r301019 where optimizeLogicalImm
would replace a logical node's immediate operand that was CSE'd and
was also an operand of another node.

This commit fixes the bug by replacing the logical node instead of its
immediate operand.

rdar://problem/32295276

llvm-svn: 303607
2017-05-23 06:08:37 +00:00
Amaury Sechet beef4d7887 Update expected result for or-branch.ll . NFC
llvm-svn: 303606
2017-05-23 05:42:54 +00:00
Stanislav Mekhanoshin 5fa289f0d8 [AMDGPU] Narrow lshl from 64 to 32 bit if possible
Turn expensive 64 bit shift into 32 bit if shift does not overflow int:
shl (ext x) => zext (shl x)

Differential Revision: https://reviews.llvm.org/D33367

llvm-svn: 303569
2017-05-22 16:58:10 +00:00
Nirav Dave 3465861177 [X86] Remove target feature info from mul-i256.ll test. NFC.
llvm-svn: 303558
2017-05-22 15:04:08 +00:00
Simon Atanasyan e0b726f2fa [mips] Support micromips attribute passed by front-end
This patch adds handling of the `micromips` and `nomicromips` attributes
passed by front-end. The patch depends on D33363.

Differential revision: https://reviews.llvm.org/D33364

llvm-svn: 303545
2017-05-22 12:47:41 +00:00
Strahinja Petrovic ab9573f37c [MIPS] Add support to match more patterns for DINS instruction
This patch adds support for recognizing patterns to match
DINS instruction.

Differential Revision: https://reviews.llvm.org/D31465

llvm-svn: 303537
2017-05-22 09:06:44 +00:00
Amaury Sechet 89d733a505 Regenerate expected result for test constant-combines.ll . NFC
llvm-svn: 303533
2017-05-22 07:49:16 +00:00
Zvi Rackover fdddf671d8 [X86] Add (ix bitcast(vsetcc)) test cases with illegal types. NFC.
llvm-svn: 303530
2017-05-22 06:39:12 +00:00
Amaury Sechet 1b195c35a0 Add a test case for large integer subtraction via subcarry. NFC
llvm-svn: 303528
2017-05-22 06:06:45 +00:00
Amaury Sechet 822ba7eddc Add test case for subcarry optimization.
llvm-svn: 303525
2017-05-22 02:31:42 +00:00
Igor Breger 014fc566e7 [GlobalISel][X86] Fix G_TRUNC instruction selection.
Updated tests with -verify-machineinstrs flag.
It fixes 3 tests failed with machine verifier enabled and listed
in PR27481

llvm-svn: 303502
2017-05-21 11:13:56 +00:00
Hiroshi Inoue 37e63b1b21 Summary
PPC backend eliminates compare instructions by using record-form instructions in PPCInstrInfo::optimizeCompareInstr, which is called from peephole optimization pass.
This patch improves this optimization to eliminate more compare instructions in two types of common case.


- comparison against a constant 1 or -1

The record-form instructions set CR bit based on signed comparison against 0. So, the current implementation does not exploit the record-form instruction for comparison against a non-zero constant.
This patch enables record-form optimization for constant of 1 or -1 if possible; it changes the condition "greater than -1" into "greater than or equal to 0" and "less than 1" into "less than or equal to 0".
With this patch, compare can be eliminated in the following code sequence, as an example.

uint64_t a, b;
if ((a | b) & 0x8000000000000000ull) { ... }
else { ... }


- andi for 32-bit comparison on PPC64

Since record-form instructions execute 64-bit signed comparison and so we have limitation in eliminating 32-bit comparison, i.e. with cmplwi, using the record-form. The original implementation already has such checks but andi. is not recognized as an instruction which executes implicit zero extension and hence safe to convert into record-form if used for equality check.

%1 = and i32 %a, 10
%2 = icmp ne i32 %1, 0
br i1 %2, label %foo, label %bar

In this simple example, LLVM generates andi. + cmplwi + beq on PPC64.
This patch make it possible to eliminate the cmplwi for this case.
I added andi. for optimization targets if it is safe to do so.

Differential Revision: https://reviews.llvm.org/D30081

llvm-svn: 303500
2017-05-21 06:00:05 +00:00
Amaury Sechet 77cfb4a85f [DAGCombine] (addcarry 0, 0, X) -> (ext/trunc X)
Summary:
While this makes some case better and some case worse - so it's unclear if it is a worthy combine just by itself - this is a useful canonicalisation.

As per discussion in D32756 .

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32916

llvm-svn: 303441
2017-05-19 18:20:44 +00:00
Matthias Braun 420713c40b Fix typo in test
llvm-svn: 303436
2017-05-19 17:25:20 +00:00
Simon Pilgrim 63892402ba [X86][FMA] Tests showing missed fmsubadd opportunities (PR30633)
llvm-svn: 303435
2017-05-19 17:19:26 +00:00
Guy Blank 548e22a1a7 [X86][AVX512] Make i1 illegal in the CodeGen
This patch defines the i1 type as illegal in the X86 backend for AVX512.
For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended.
This should produce better scalar code for i1 types since GPRs will be used instead of mask registers.

Differential Revision: https://reviews.llvm.org/D32273

llvm-svn: 303421
2017-05-19 12:35:15 +00:00
Volkan Keles 6a36c64720 [GlobalISel] IRTranslator: Translate ConstantStruct
Reviewers: qcolombet, ab, t.p.northover, aditya_nandakumar, dsanders

Reviewed By: qcolombet

Subscribers: rovka, kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D33317

llvm-svn: 303412
2017-05-19 09:47:02 +00:00
Matthias Braun d6e75ed93e LiveIntervalAnalysis: Fix missing case in pruneSubRegValues()
pruneSubRegValues() needs to remove subregister ranges starting at
instructions that later get removed by eraseInstrs(). It missed to check
one case in which eraseInstrs() would remove an instruction.

Fixes http://llvm.org/PR32688

llvm-svn: 303396
2017-05-19 00:18:03 +00:00
Hans Wennborg b00ffd8cb7 Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB."
This also reverts follow-ups r303292 and r303298.

It broke some Chromium tests under MSan, and apparently also internal
tests at Google.

llvm-svn: 303369
2017-05-18 18:50:05 +00:00
Francis Visoiu Mistrih 8b61764cbb [LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
  `getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
  `addRequired<TargetPassConfig>` and call
  `getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

llvm-svn: 303360
2017-05-18 17:21:13 +00:00