20221 Commits

Author SHA1 Message Date
Amaury Sechet 77cfb4a85f [DAGCombine] (addcarry 0, 0, X) -> (ext/trunc X)
Summary:
While this makes some cases better and some cases worse (so it is unclear whether it is a worthwhile combine by itself), this is a useful canonicalisation.

As per discussion in D32756.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32916

llvm-svn: 303441
2017-05-19 18:20:44 +00:00
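
The identity behind the `(addcarry 0, 0, X)` fold in r303441 can be checked on plain integers. The following is a minimal sketch, not the DAGCombiner code: `AddCarryResult`, `addCarry`, and `foldedAddCarryZero` are hypothetical names, and the 32-bit width is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of an add-with-carry node on 32-bit values.
struct AddCarryResult {
  uint32_t sum;
  bool carryOut;
};

AddCarryResult addCarry(uint32_t a, uint32_t b, bool carryIn) {
  // Add in 64 bits so the carry out of bit 31 stays visible.
  uint64_t wide = static_cast<uint64_t>(a) + b + (carryIn ? 1u : 0u);
  return {static_cast<uint32_t>(wide), (wide >> 32) != 0};
}

// With both addends zero, the sum is just the incoming carry widened to
// the result type, and no carry can be produced.
uint32_t foldedAddCarryZero(bool carryIn) { return carryIn ? 1u : 0u; }
```

For both values of the incoming carry, `addCarry(0, 0, c).sum` equals `foldedAddCarryZero(c)` and `carryOut` is false, which is why the addition can be replaced by a width conversion of the carry.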
Matthias Braun 420713c40b Fix typo in test
llvm-svn: 303436
2017-05-19 17:25:20 +00:00
Simon Pilgrim 63892402ba [X86][FMA] Tests showing missed fmsubadd opportunities (PR30633)
llvm-svn: 303435
2017-05-19 17:19:26 +00:00
Guy Blank 548e22a1a7 [X86][AVX512] Make i1 illegal in the CodeGen
This patch defines the i1 type as illegal in the X86 backend for AVX512.
For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended.
This should produce better scalar code for i1 types since GPRs will be used instead of mask registers.

Differential Revision: https://reviews.llvm.org/D32273

llvm-svn: 303421
2017-05-19 12:35:15 +00:00
Volkan Keles 6a36c64720 [GlobalISel] IRTranslator: Translate ConstantStruct
Reviewers: qcolombet, ab, t.p.northover, aditya_nandakumar, dsanders

Reviewed By: qcolombet

Subscribers: rovka, kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D33317

llvm-svn: 303412
2017-05-19 09:47:02 +00:00
Matthias Braun d6e75ed93e LiveIntervalAnalysis: Fix missing case in pruneSubRegValues()
pruneSubRegValues() needs to remove subregister ranges starting at
instructions that later get removed by eraseInstrs(). It failed to check
one case in which eraseInstrs() would remove an instruction.

Fixes http://llvm.org/PR32688

llvm-svn: 303396
2017-05-19 00:18:03 +00:00
Hans Wennborg b00ffd8cb7 Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB."
This also reverts follow-ups r303292 and r303298.

It broke some Chromium tests under MSan, and apparently also internal
tests at Google.

llvm-svn: 303369
2017-05-18 18:50:05 +00:00
Francis Visoiu Mistrih 8b61764cbb [LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
  `getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
  `addRequired<TargetPassConfig>` and call
  `getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

llvm-svn: 303360
2017-05-18 17:21:13 +00:00
Sam Kolton ebfdaf7394 [AMDGPU] SDWA operands should not intersect with potential MIs
Summary:
There should be no intersection between SDWA operands and potential MIs. E.g.:
```
v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0
v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0
v_add_u32 v3, v4, v2
```
In this example, we could fold the 2nd instruction into the 3rd (v_add_u32_sdwa) and then try to fold the 1st instruction into the 2nd (which was already destroyed). So if an SDWAOperand is also a potential MI, do not apply it.

Reviewers: vpykhtin, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32804

llvm-svn: 303347
2017-05-18 12:12:03 +00:00
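
The rule described in r303347 can be modelled on instruction ids alone. This is a rough sketch under stated assumptions, not the AMDGPU pass: `SdwaOperand`, `parentInstr`, `targetInstr`, and `selectApplicableOperands` are hypothetical names, and each operand is assumed to record the instruction it comes from and the instruction it would be folded into.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: an operand produced by `parentInstr` that could be
// folded into `targetInstr`, turning the target into an SDWA instruction.
struct SdwaOperand {
  int parentInstr;
  int targetInstr;
};

std::vector<SdwaOperand> selectApplicableOperands(
    const std::vector<SdwaOperand>& operands) {
  // Every fold target is a "potential MI" in this model.
  std::set<int> potentialInstrs;
  for (const SdwaOperand& op : operands)
    potentialInstrs.insert(op.targetInstr);

  // Skip operands whose own instruction is also a fold target, so an
  // instruction is never both consumed by a fold and rewritten by another.
  std::vector<SdwaOperand> applicable;
  for (const SdwaOperand& op : operands)
    if (potentialInstrs.count(op.parentInstr) == 0)
      applicable.push_back(op);
  return applicable;
}
```

Numbering the three instructions of the example 1 to 3, the operands are `{1, 2}` and `{2, 3}`. Instruction 2 is both a source and a target, so only `{1, 2}` is kept.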
Igor Breger 842b5b36ba [GlobalISel][X86] G_ADD/G_SUB vector legalizer/selector support.
Summary: G_ADD/G_SUB vector legalizer/selector support.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33232

llvm-svn: 303345
2017-05-18 11:10:56 +00:00
Zvi Rackover d17d13d2a9 [X86] Add explicit triple to test invocation
llvm-svn: 303340
2017-05-18 09:32:56 +00:00
Lama Saba 2ea271b54a [X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+:
  " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
    dispatch via port 1:
  - LEA that has all three source operands: base, index, and offset
  - LEA that uses base and index registers where the base is EBP, RBP, or R13
  - LEA that uses RIP relative addressing mode
  - LEA that uses 16-bit addressing mode "
  This patch currently handles the first 2 cases only.
 
Differential Revision: https://reviews.llvm.org/D32277

llvm-svn: 303333
2017-05-18 08:11:50 +00:00
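
The two cases that r303333 says it handles can be written as a small predicate. This is a minimal sketch of the classification only, not the X86 pass: `Reg`, `LeaAddress`, and the function names are hypothetical, and a nonzero `disp` stands in for "has an offset" (a symbolic displacement is not modelled).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced register set for illustration.
enum class Reg { None, RAX, RBX, RCX, EBP, RBP, R13, RIP };

// Hypothetical model of an LEA address: base + index + disp.
struct LeaAddress {
  Reg base;
  Reg index;
  int64_t disp;
};

// Case 1: all three source operands are present.
bool hasThreeSourceOperands(const LeaAddress& a) {
  return a.base != Reg::None && a.index != Reg::None && a.disp != 0;
}

// Case 2: base and index are both used and the base is EBP, RBP, or R13.
bool hasSlowBaseWithIndex(const LeaAddress& a) {
  bool slowBase =
      a.base == Reg::EBP || a.base == Reg::RBP || a.base == Reg::R13;
  return slowBase && a.index != Reg::None;
}

// RIP-relative and 16-bit addressing are deliberately not covered,
// matching the commit's "first 2 cases only".
bool isSlowLeaHandled(const LeaAddress& a) {
  return hasThreeSourceOperands(a) || hasSlowBaseWithIndex(a);
}
```

Under this model, `{RAX, RBX, 8}` and `{RBP, RCX, 0}` are flagged, while `{RAX, RBX, 0}` and a RIP-relative address are not.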
Serguei Katkov 00211c3faa Fix buildbot failure after rL303327: [BPI] Reduce the probability of unreachable edge to minimal value greater than 0.
One more test is updated to meet new branch probability for unreachable branches.

llvm-svn: 303329
2017-05-18 07:20:52 +00:00
Zvi Rackover c20c6d07cf [X86] Adding tests for scalar bitcasts from vsetcc. NFC.
llvm-svn: 303328
2017-05-18 07:04:48 +00:00
Matt Arsenault 2b1f9aa577 AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely neither do sret or other
on-stack return values.

llvm-svn: 303308
2017-05-17 21:56:25 +00:00
Kyle Butt f6c61ef64d CodeGen: Power: Add lowering for shifts of v1i128.
When legalizing vector operations on vNi128, they will be split to v1i128
because that is a legal type on ppc64, but then the compiler will crash in
selection dag because it fails to select for these operations. This patch fixes
shift operations. Logical shift right and left shift can be performed in the
vector unit, but algebraic shift right requires being split.

Differential Revision: https://reviews.llvm.org/D32774

llvm-svn: 303307
2017-05-17 21:54:41 +00:00
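
To illustrate what splitting an algebraic (arithmetic) 128-bit right shift involves, as mentioned in r303307, here is a sketch on two 64-bit halves. It is not the PowerPC lowering: `Int128Halves`, `ashr64`, and `ashr128` are hypothetical names, and reducing the shift amount modulo 128 is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation of a 128-bit value as two 64-bit halves.
struct Int128Halves {
  uint64_t hi;
  uint64_t lo;
};

// Arithmetic shift right of a 64-bit pattern, amt in [0, 63], written
// with unsigned operations only so the behaviour is fully defined.
uint64_t ashr64(uint64_t x, unsigned amt) {
  if (amt == 0) return x;
  uint64_t sign = (x >> 63) ? ~0ull : 0ull;
  return (x >> amt) | (sign << (64 - amt));
}

Int128Halves ashr128(Int128Halves v, unsigned amt) {
  amt &= 127;  // assumption: shift amount is taken modulo 128
  if (amt == 0) return v;
  if (amt < 64) {
    // Low half receives the bits shifted out of the high half.
    return {ashr64(v.hi, amt), (v.lo >> amt) | (v.hi << (64 - amt))};
  }
  // amt in [64, 127]: high half becomes pure sign fill.
  return {ashr64(v.hi, 63), ashr64(v.hi, amt - 64)};
}
```

The sign bit has to be replicated into the high half and, for large shift amounts, into the low half as well, which is the part a logical shift does not need.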
Michael Liao ab12984634 Fix PR33028
- '-verify-machineinstrs' now complains about allocatable live-in physical
  registers on non-entry or non-landing-pad basic blocks.
- Refactor the XBEGIN translation to define EAX on a dedicated fallback code
  path due to XABORT. Add a pseudo instruction to define EAX explicitly to
  avoid adding a physical register live-in.

Differential Revision: https://reviews.llvm.org/D33168

llvm-svn: 303306
2017-05-17 21:48:00 +00:00
Matt Arsenault a53292779a AMDGPU: Remove old intrinsic uses
llvm-svn: 303305
2017-05-17 21:38:21 +00:00
Dehao Chen 00549e47bd update the test that should have been updated in r303292. (NFC)
llvm-svn: 303298
2017-05-17 20:44:08 +00:00
Matt Arsenault 98f2946ab3 AMDGPU: Make better use of op_sel with high components
Handle more general swizzles.

llvm-svn: 303296
2017-05-17 20:30:58 +00:00
Dehao Chen 02828a93e8 Only enable LiveRangeShrink for x86.
Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for architectures with high register pressure.

Reviewers: MatzeB, qcolombet

Reviewed By: qcolombet

Subscribers: jholewinski, jyknight, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33294

llvm-svn: 303292
2017-05-17 20:18:13 +00:00
Matt Arsenault 786eeea23e AMDGPU: Try to use op_sel when selecting packed instructions
Avoids instructions to pack a vector when the source is really
a scalar being broadcast.

Also be smarter and look for per-component fneg.

Doesn't yet handle scalar from upper half of register
or other swizzles.

llvm-svn: 303291
2017-05-17 20:00:00 +00:00
Matt Arsenault ee324ffc1f AMDGPU: Fix min3/max3 combines for f16/i16
Fix missing instruction definitions for min3/max3.

llvm-svn: 303284
2017-05-17 19:25:06 +00:00
Sanjay Patel 9c8f7a2eff [x86] Update tests in psubus.ll; NFC
Remove unnecessary memops to minimize tests.

Patch by Yulia Koval!

Differential Revision: https://reviews.llvm.org/D32643

llvm-svn: 303258
2017-05-17 13:39:16 +00:00
Krzysztof Parzyszek 2b0533126e [PPC] Properly update register save area offsets
The variables MinGPR/MinG8R were not updated properly when resetting the
offsets, which in the included testcase led to saving the CR register
in the same location as R30.

This fixes another issue reported in PR26519.

Differential Revision: https://reviews.llvm.org/D33017

llvm-svn: 303257
2017-05-17 13:25:09 +00:00
Igor Breger 28f290fab8 [GlobalISel][X86] Support add i64 in IA32.
Summary: support G_UADDE instruction selection.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D33096

llvm-svn: 303255
2017-05-17 12:48:08 +00:00
Daniel Cederman 4af795b499 [Sparc] Remove execute permissions from non-executable text files
Reviewers: jyknight, lero_chris, venkatra

Reviewed By: jyknight

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27127

llvm-svn: 303245
2017-05-17 11:05:20 +00:00
Amara Emerson c9916d7e97 Re-commit r302678, fixing PR33053.
The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions
which didn't have a lowering.

llvm-svn: 303211
2017-05-16 21:29:22 +00:00
Tim Shen 0fbbef43e0 [PPC] Add -ppc-asm-full-reg-names to atomic-2.ll. NFC.
Differential Revision: https://reviews.llvm.org/D32763

llvm-svn: 303209
2017-05-16 20:58:55 +00:00
Tim Shen 3bef27cc6f [PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync.
Summary:
This fixes pr32392.

The lowering pipeline is:
llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in
expandPostRAPseudo.

expandPostRAPseudo is chosen because earlier passes are likely to
eliminate instructions like `cmpw 3, 3` (early CSE) and `bne- 7, .+4`
(some branch pass(es)).

Differential Revision: https://reviews.llvm.org/D32763

llvm-svn: 303205
2017-05-16 20:18:06 +00:00
Reid Kleckner 0ad69fc89f Revert "[X86] Replace slow LEA instructions in X86"
This reverts commit r303183; it broke various buildbots and introduced
sanitizer errors.

llvm-svn: 303199
2017-05-16 19:55:03 +00:00
Nirav Dave da8f221273 Elide stores which are overwritten without being observed.
Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This causes small improvements when dealing with intrinsics
lowered to stores.

Test notes:

* Many testcases overwrite store addresses multiple times and needed
  minor changes, mainly making stores volatile to prevent the
  optimization from optimizing the test away.

* Many X86 test cases optimized out instructions associated with
  va_start.

* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
  dependencies to check and can probably be removed and potentially
  replaced with another test.

Reviewers: rnk, john.brawn

Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33206

llvm-svn: 303198
2017-05-16 19:43:56 +00:00
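
The store elision in r303198 can be pictured on a flat list of memory operations. This is a simplified sketch, not the SelectionDAG combine: `MemOp` and `elideOverwrittenStores` are hypothetical names, and "immediately chained" is approximated as "the next operation in the list".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical memory operation: a load or store to an integer address.
struct MemOp {
  enum Kind { Store, Load } kind;
  int address;
  int value;  // ignored for loads
  bool isVolatile;
};

std::vector<MemOp> elideOverwrittenStores(const std::vector<MemOp>& ops) {
  std::vector<MemOp> kept;
  for (std::size_t i = 0; i < ops.size(); ++i) {
    const MemOp& cur = ops[i];
    // The very next operation is a store to the same address.
    bool overwrittenNext = i + 1 < ops.size() &&
                           ops[i + 1].kind == MemOp::Store &&
                           ops[i + 1].address == cur.address;
    // A non-volatile store that is overwritten before anything can
    // observe it has no visible effect and is dropped.
    if (cur.kind == MemOp::Store && !cur.isVolatile && overwrittenNext)
      continue;
    kept.push_back(cur);
  }
  return kept;
}
```

Two back-to-back stores to the same address collapse to the second one. Marking the first store volatile, or placing a load between the two, keeps both, which mirrors why the affected tests were changed to use volatile stores.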
Renato Golin d69570e017 Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove"
Revert "[ARM] Mark LEApcrel as not having side effects"

This reverts commit r303054 and r303053, as they broke the ARM
self-hosting buildbots:

http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550

http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349

http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845

Offline investigation under way.

llvm-svn: 303193
2017-05-16 17:59:07 +00:00
Lama Saba 52e892577d [X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+:
  " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
    dispatch via port 1:
  - LEA that has all three source operands: base, index, and offset
  - LEA that uses base and index registers where the base is EBP, RBP, or R13
  - LEA that uses RIP relative addressing mode
  - LEA that uses 16-bit addressing mode "
  This patch currently handles the first 2 cases only.
 
Differential Revision: https://reviews.llvm.org/D32277

llvm-svn: 303183
2017-05-16 16:01:36 +00:00
Igor Breger 3a45504498 [GlobalISel][X86] Split memop test file. NFC
llvm-svn: 303169
2017-05-16 13:37:31 +00:00
Francis Visoiu Mistrih ebbc7159e9 [ShrinkWrapping] Handle restores on no-return paths
Shrink-wrapping uses post-dominators to find a restore point that
post-dominates all the uses of CSR / stack.

The way dominator trees are modeled in LLVM today is that unreachable
blocks are not present in a generic dominator tree, so, an unreachable node is
dominated by anything: include/llvm/Support/GenericDomTree.h:467.

For post-dominators, a no-return block is considered "unreachable", so
calling findNearestCommonDominator on an unreachable node A and a
reachable node B returns B, which can be wrong. If we find such a node,
we bail out since there is no good restore point available.

rdar://problem/30186931

llvm-svn: 303130
2017-05-15 23:13:35 +00:00
Tim Northover 203c6f055d AArch64: use linker-private symbols for globals in MachO.
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).

llvm-svn: 303118
2017-05-15 21:51:38 +00:00
Hans Wennborg bd6e9e77a7 Revert r302678 "[AArch64] Enable use of reduction intrinsics."
This caused PR33053.

Original commit message:

> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns is replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247

llvm-svn: 303115
2017-05-15 20:59:32 +00:00
Kyle Butt 7d531daece CodeGen: BlockPlacement: Increase tail duplication size for O3.
At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2 instructions is
conservative, and can be relaxed at O3.

Benchmark results:
llvm test-suite:
6% improvement in aha, due to duplication of loop latch
3% improvement in hexxagon

2% slowdown in lpbench. Seems related, but couldn't completely diagnose.

Internal Google benchmark:
Produces 4% improvement on internal Google protocol buffer serialization
benchmarks.

Differential-Revision: https://reviews.llvm.org/D32324
llvm-svn: 303084
2017-05-15 17:30:47 +00:00
Simon Pilgrim 55ff57861a [NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)
Follow up to D33147

NVPTXTargetLowering::LowerCall was trusting the default argument values.

Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33189

llvm-svn: 303082
2017-05-15 17:17:44 +00:00
Florian Hahn af91e7e6d2 [AArch64] Enable FeatureFuseAES on Cortex-A72.
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.

llvm-svn: 303073
2017-05-15 15:15:22 +00:00
Dmitry Preobrazhensky 167f8b69e3 [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64
See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D33123

llvm-svn: 303070
2017-05-15 14:28:23 +00:00
Dinar Temirbulatov aa2b7a6faa Test commit.
llvm-svn: 303059
2017-05-15 13:14:04 +00:00
John Brawn 9486becf09 [ARM] Mark LEApcrel instructions as isAsCheapAsAMove
Doing this means that if an LEApcrel is used in two places we will rematerialize
instead of generating two MOVs. This is particularly useful for printfs using
the same format string, where we want to generate an address into a register
that's going to get corrupted by the call.

Differential Revision: https://reviews.llvm.org/D32858

llvm-svn: 303054
2017-05-15 11:57:54 +00:00
John Brawn 43132c46a6 [ARM] Mark LEApcrel as not having side effects
Doing this lets us hoist it out of loops, and I've also marked it as
rematerializable the same as the thumb1 and thumb2 counterparts.

It looks like it being marked as such was just a mistake, as the commit that
made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the
LEApcrelJT instructions were marked as having side-effects, so it looks like
the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was
accidentally marked as such also.

Differential Revision: https://reviews.llvm.org/D32857

llvm-svn: 303053
2017-05-15 11:50:21 +00:00
Ayman Musa c5490e5a29 [X86] Relocate code of replacement of subtarget unsupported masked memory intrinsics to run also on -O0 option.
Currently, when masked load, store, gather or scatter intrinsics are used, we check in the CodeGenPrepare pass whether the subtarget supports these intrinsics; if not, we replace them with scalar code. This is a functional transformation, not an optimization (it is not optional).

CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0).

Functional transformations should run at all optimization levels, so here I created a new pass which runs at all optimization levels and does no more than this transformation.

Differential Revision: https://reviews.llvm.org/D32487

llvm-svn: 303050
2017-05-15 11:30:54 +00:00
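
What "replace them with scalar code" means for a masked load, as in r303050, can be shown with a per-lane loop. This is a sketch of the semantics only, not the pass added by the commit: `scalarizedMaskedLoad4` is a hypothetical name and the 4-lane `int` vector is an assumption for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar expansion of a 4-lane masked load of ints.
std::array<int, 4> scalarizedMaskedLoad4(const int* ptr,
                                         const std::array<bool, 4>& mask,
                                         const std::array<int, 4>& passThru) {
  // Disabled lanes keep the pass-through value.
  std::array<int, 4> result = passThru;
  for (std::size_t lane = 0; lane < 4; ++lane) {
    // Memory is read only for lanes whose mask bit is set.
    if (mask[lane]) result[lane] = ptr[lane];
  }
  return result;
}
```

Because skipping the disabled lanes is part of the operation's meaning and not a speed-up, the expansion has to happen at -O0 as well.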
Igor Breger 06c61e8639 [GlobalISel][X86] G_BR instruction select test
llvm-svn: 303036
2017-05-15 07:03:38 +00:00
Craig Topper c27dc9f797 [X86] Add avx512vl command lines to the 128/256-bit vector-lzcnt tests so we can see what compare instructions are being used in the lookup table code.
I noticed the 512-bit lzcnts don't use the X86 specific lookup table code and instead use the EXPAND case in LegalizeDAG. I was toying around with fixing this and noticed it would require compare instructions that generate i1 masks and then converting from mask to vector. Then I noticed that we don't test which compares are used with avx512vl and no avx512cd.

llvm-svn: 303020
2017-05-14 19:38:11 +00:00
Craig Topper 87804dfe76 [X86] Cleanup some of the check-prefixes in the vector-lzcnt tests.
Remove an unneeded prefix from the 32-bit command line. Make all the 64-bit triples match. Replace ALL with X64 and remove it from the 32-bit test.

llvm-svn: 303019
2017-05-14 19:38:09 +00:00
Simon Pilgrim f3ee9c6997 [X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts.
llvm-svn: 303009
2017-05-14 11:46:26 +00:00