Commit Graph

29161 Commits

Author SHA1 Message Date
Roman Lebedev 1f63d7fef9 [NFC][ARM] addsubcarry-promotion.ll: whoops - replace '.' with '-' in check-prefix
Does not affect update_llc_test_checks, or the actual output,
but is not accepted by the actual FileCheck.

Sorry, i should have noticed this before committing,
not the very next second after..

llvm-svn: 361398
2019-05-22 15:42:33 +00:00
Roman Lebedev 1b45bdf5ba [NFC][ARM] Autogenerate addsubcarry-promotion.ll test
Being affected by upcoming patch

llvm-svn: 361397
2019-05-22 15:34:51 +00:00
Roman Lebedev 6a53135698 [NFC][X86] Autogenerate negative-offset.ll test
Being affected by upcoming patch

llvm-svn: 361396
2019-05-22 15:34:43 +00:00
Roman Lebedev 406421b332 [NFC][X86][AArch64] Rewrite sink-addsub-of-const.ll tests to have full permutation coverage
Somehow missed some patterns initially..
While there, add comments.

llvm-svn: 361390
2019-05-22 14:42:41 +00:00
Roman Lebedev 7c72ca012d UpdateTestChecks: sparc march handling
Summary:
Another target that prefers to use `-march` in tests
```
llvm/test/CodeGen/SPARC$ grep -ri mtriple | wc -l
25
llvm/test/CodeGen/SPARC$ grep -ri march | wc -l
165
```

This test is being affected by a further patch,
so regenerate it to better visualize the changes

Reviewers: RKSimon, dcederman, gberry

Reviewed By: RKSimon

Subscribers: jyknight, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62242

llvm-svn: 361381
2019-05-22 13:04:34 +00:00
Roman Lebedev 4bf35671b5 [NFC][SystemZ] Autogenerate alloca-03.ll test to make test changes more visible
The check lines are being affected by an upcoming patch,
regenerate the checklines to visualize the changes better.

llvm-svn: 361380
2019-05-22 13:04:24 +00:00
Anton Afanasyev df00c6a54f [MIR] Add simple PRE pass to MachineCSE
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.

The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.

First step: https://reviews.llvm.org/D54839

Fixes llvm.org/PR38917

llvm-svn: 361356
2019-05-22 07:41:34 +00:00
Fangrui Song 1c61471ab1 [PPC64] Parse -elfv1 -elfv2 when specified on target triple
Summary:
For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2].

The following results are expected:

ELFv1 when using:
-target powerpc64-unknown-freebsd12.0
-target powerpc64-unknown-freebsd12.0 -mabi=elfv1
-target powerpc64-unknown-freebsd12.0-elfv1

ELFv2 when using:
-target powerpc64-unknown-freebsd12.0 -mabi=elfv2
-target powerpc64-unknown-freebsd12.0-elfv2

[1] https://wiki.freebsd.org/powerpc/llvm-elfv2
[2] https://clang.llvm.org/docs/CrossCompilation.html

Patch by Alfredo Dal'Ava Júnior!

Differential Revision: https://reviews.llvm.org/D61950

llvm-svn: 361355
2019-05-22 07:29:59 +00:00
Nikita Popov 15df05152d [X86] Don't compare i128 through vector if construction not cheap (PR41971)
Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the
combineVectorSizedSetCCEquality() transform more conservative by
checking that the bitcast to the vector type will be cheap/free
for both operands. I'm considering it cheap if it's a constant,
a load or already a vector. I've dropped the explicit check for
f128 because it should fall out naturally (in the cases where
it'd be detrimental).

Differential Revision: https://reviews.llvm.org/D62220

llvm-svn: 361352
2019-05-22 06:47:06 +00:00
Chen Zheng 9970665f60 [PowerPC] [ISEL] select x-form instruction for unaligned offset
Differential Revision: https://reviews.llvm.org/D62173

llvm-svn: 361346
2019-05-22 02:57:31 +00:00
Pengfei Wang 6a0d432e9e [X86] [CET] Deal with return-twice function such as vfork, setjmp when
CET-IBT enabled

Return-twice functions will indirectly jump after the caller's position.
So when CET-IBT is enable, we should make sure these is endbr*
instructions follow these Return-twice function caller. Like GCC does.

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D61881

llvm-svn: 361342
2019-05-22 00:50:21 +00:00
Matt Arsenault 2cba91b8db AMDGPU: Assume calls read exec
llvm-svn: 361333
2019-05-21 23:23:16 +00:00
Matt Arsenault eea81c20fe AMDGPU: Add some tests for inlineasm behavior
llvm-svn: 361332
2019-05-21 23:23:12 +00:00
Matt Arsenault dd1ffa00a5 AMDGPU: Assume call pseudos are convergent
There should probably be nonconvergent versions, but my guess is it
doesn't matter in practice.

llvm-svn: 361331
2019-05-21 23:23:10 +00:00
Matt Arsenault 60ba03e210 AMDGPU: Fix not marking new gfx10 SGPRs as CSRs
llvm-svn: 361330
2019-05-21 23:23:05 +00:00
Dan Gohman a49496fb2a [WebAssembly] Add the signature for the new llround builtin function
r360889 added new llround builtin functions. This patch adds their
signatures for the WebAssembly backend.

It also adds wasm32 support to utils/update_llc_test_checks.py, since
that's the script other targets are using for their testcases for this
feature.

Differential Revision: https://reviews.llvm.org/D62207

llvm-svn: 361327
2019-05-21 23:06:34 +00:00
Roman Lebedev 675307b1f1 [NFC][AMDGPU] Autogenerate llvm.amdgcn.s.barrier.ll test
llvm-svn: 361320
2019-05-21 21:49:14 +00:00
Roman Lebedev 21e8ec8d4f [NFC][X86] Autogenerate ragreedy-hoist-spill.ll test
llvm-svn: 361319
2019-05-21 21:49:10 +00:00
Roman Lebedev 079d8b425f [NFC][Thumb2] Autogenerate thumb2-ldr_pre.ll test
llvm-svn: 361318
2019-05-21 21:49:05 +00:00
Nikita Popov d34d96770e [X86] Add large integer comparison tests for PR41971; NFC
In these cases we would prefer a direct comparison over going through
a vector type.

llvm-svn: 361315
2019-05-21 21:27:08 +00:00
Yi-Hong Lyu 00e85f7535 Move csr-save-restore-order.ll to the right place
llvm-svn: 361306
2019-05-21 20:28:31 +00:00
Roman Lebedev a7e88f8570 [NFC][X86][AArch64] Add tests for sinking of add/sub by constant through add/sub
Looks we can transform all 8 variants of the pattern:
https://rise4fun.com/Alive/auH

This comes up as an issue on the path towards
https://bugs.llvm.org/show_bug.cgi?id=41952

llvm-svn: 361303
2019-05-21 20:14:54 +00:00
Stanislav Mekhanoshin 44d17ca02e Fix register coalescer failure to prune value
Register coalescer fails for the test in the patch with the assertion in
JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join
live intervals for two adjacent instructions and erase the copy:

    %2:vreg_256 = COPY %1
    %3:vreg_256 = COPY killed %1

The LI needs to be adjusted to kill subrange for the erased instruction
and extend the subrange of the original def. That was done for the main
interval only but not for the subrange. As a result subrange had a VNI
pointing to the erased slot resulting in the above failure.

Differential Revision: https://reviews.llvm.org/D62162

llvm-svn: 361293
2019-05-21 19:32:41 +00:00
Leonard Chan 0bada7ce6c [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D55720

llvm-svn: 361289
2019-05-21 19:17:19 +00:00
Simon Pilgrim 4b82e50315 [X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support
Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692

llvm-svn: 361270
2019-05-21 15:20:24 +00:00
Roman Lebedev d8db224ecb [NFC][X86][AArch64] Shift amount masking: tests that show that 'neg' doesn't last
Meaning if we were to produce 'neg' in dagcombine, we will get an
endless cycle; some inverse transform would need to be guarded somehow.

Also, the 'and (sub 0, x), 31' variant is sticky,
doesn't get optimized in any way.

https://bugs.llvm.org/show_bug.cgi?id=41952

llvm-svn: 361254
2019-05-21 13:04:56 +00:00
Simon Pilgrim bc03bee66b [X86][SSE] Add shuffle tests for 'splat3' patterns.
Test codegen from shuffles for { dst[0] = dst[1] = dst[2] = *src++; dst += 3 } 'splatting' memcpy patterns generated by loop-vectorizer.

llvm-svn: 361243
2019-05-21 11:42:28 +00:00
Roman Lebedev 2aee73f591 [NFC][X86][AArch64] Add some more tests for shift amount masking
The negation creation should be more eager:
https://bugs.llvm.org/show_bug.cgi?id=41952

llvm-svn: 361241
2019-05-21 11:14:01 +00:00
Florian Hahn 4a8835c655 [AArch64] Skip mask checks for masks with an odd number of elements.
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.

Fixes PR41951

Reviewers: t.p.northover, dmgreen, samparker

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D60690

llvm-svn: 361235
2019-05-21 10:05:26 +00:00
Sam Parker 3141bbd52d [ARM][CGP] Skip nuw in PrepareConstants
PrepareConstants step converts add/sub with 'negative' immediates to
sub/add with a 'positive' imm to make promotion more simple. nuw
already states that the add shouldn't cause an unsigned wrap, so
it shouldn't need any tweaking. Plus, we also don't allow a sub with
a 'negative' immediate to be safe wrap, so this functionality has
been removed. The PrepareConstants step now just handles the add
instructions that we've determined would be safe if they wrap around
zero.

Differential Revision: https://reviews.llvm.org/D62057

llvm-svn: 361227
2019-05-21 07:56:47 +00:00
Dylan McKay e967308da4 Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess
Summary:
The endianess used in the calling convention does not always match the
endianess of the target on all architectures, namely AVR.

When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianess that function arguments must be laid
out in.

This approach was recommended by Eli Friedman.

Originally reported in https://github.com/avr-rust/rust/issues/129.

Patch by Carl Peto.

Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma

Reviewed By: efriedma

Subscribers: JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62003

llvm-svn: 361222
2019-05-21 06:38:02 +00:00
QingShan Zhang 690fa1b51b [NFC][PowerPC] Add a test to verify if the scheduler schedule the addi before the load.
llvm-svn: 361221
2019-05-21 06:32:31 +00:00
Nikita Popov e44691bf9f Move thumbv7k test from AArch64 to ARM
As pointed out by charukcs on rL361166, this test uses an ARM triple.

llvm-svn: 361220
2019-05-21 06:24:36 +00:00
Chen Zheng e64bcada5f [PowerPC] test cases for selecting x-form instruction for unaligned offset - NFC
llvm-svn: 361219
2019-05-21 05:06:09 +00:00
Matt Arsenault 6dd08e335f AMDGPU: Force skip branches over calls
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.

Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.

llvm-svn: 361202
2019-05-20 22:04:42 +00:00
Martin Storsjo 4ed18e5ef5 [AArch64] Handle lowering lround on windows, where long is 32 bit
Differential Revision: https://reviews.llvm.org/D62108

llvm-svn: 361192
2019-05-20 19:53:28 +00:00
Craig Topper e97e52757c [X86] Add test case for r361177.
That commit makes sure we flush PendingExports in SelectDAGBuilder
before we create INLINEASM_BR. Unfortunatley, I haven't yet found
a CodeGen failure without that change.

This commit uses the debug output from SelectionDAG to at least
ensure we build the DAG correctly.

llvm-svn: 361179
2019-05-20 17:37:52 +00:00
Nikita Popov 9060b6df97 [SDAG] Vector op legalization for overflow ops
Fixes issue reported by aemerson on D57348. Vector op legalization
support is added for uaddo, usubo, saddo and ssubo (umulo and smulo
were already supported). As usual, by extracting TargetLowering methods
and calling them from vector op legalization.

Vector op legalization doesn't really deal with multiple result nodes,
so I'm explicitly performing a recursive legalization call on the
result value that is not being legalized.

There are some existing test changes because expansion happens
earlier, so we don't get a DAG combiner run in between anymore.

Differential Revision: https://reviews.llvm.org/D61692

llvm-svn: 361166
2019-05-20 16:09:22 +00:00
Matt Arsenault 7c8ec18964 RegAlloc: Fix verifier error with undef identity copies
The code did not match the example in the comment, and was checking
the undef flag on the copy dest instead of source. The existing tests
were only hitting the > 2 operands case.

llvm-svn: 361156
2019-05-20 14:09:36 +00:00
Sander de Smalen f83cccf917 Match types of accumulator and result for llvm.experimental.vector.reduce.fadd/fmul
The scalar start/accumulator value of the fadd- and fmul reduction
should match the result type of the reduction, as well as the vector
element-type of the input vector. Although this was not explicitly
specified in the LangRef, it was taken for granted in code implementing
the reductions. The patch also fixes the LangRef by adding this
constraint.

Reviewed By: aemerson, nikic

Differential Revision: https://reviews.llvm.org/D60260

llvm-svn: 361133
2019-05-20 09:54:06 +00:00
Carl Ritson 34e95ce259 [AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt values
Summary:
Avoid introducing hazard mitigation when lgkmcnt is reduced to 0.
Clarify code comments to explain assumptions made for this hazard
mitigation.  Expand and correct test cases to cover variants of
s_waitcnt.

Reviewers: nhaehnle, rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62058

llvm-svn: 361124
2019-05-20 07:20:12 +00:00
Roman Lebedev 1a5d623ded [NFC][AArch64] Autogenerate fcopysign.ll test
llvm-svn: 361106
2019-05-18 20:24:40 +00:00
Roman Lebedev 13ac317e4c [NFC][AArch64] Autogenerate bitfield-insert.ll, selectcc-to-shiftand.ll tests
Investigating bit-extract (ubfx) pattern with shifted mask.

llvm-svn: 361105
2019-05-18 17:42:06 +00:00
Roman Lebedev d1be3c446e [NFC][AArch64] Add some ubfx tests with immediates
Shows the regression in D62100

llvm-svn: 361102
2019-05-18 13:49:44 +00:00
Roman Lebedev 98092f37d0 UpdateTestChecks: fix AMDGPU handling
Summary:
Was looking into supporting `(srl (shl x, c1), c2)` with c1 != c2 in dagcombiner,
this test changes, but makes `update_llc_test_checks.py` unhappy.

**Many** AMDGPU tests specify `-march`, not `-mtriple`, which results in `update_llc_test_checks.py`
defaulting to x86 asm function detection heuristics, which don't work here.
I propose to fix this by adding an infrastructure to map from `-march` to `-mtriple`,
in the UpdateTestChecks tooling.

Reviewers: RKSimon, MaskRay, arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62099

llvm-svn: 361101
2019-05-18 13:00:03 +00:00
Roman Lebedev 822b9c971b UpdateTestChecks: arm64-eabi handlind
Summary:
Was looking into supporting `(srl (shl x, c1), c2)` with c1 != c2 in dagcombiner,
this test changes, but makes `update_llc_test_checks.py` unhappy

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62097

llvm-svn: 361100
2019-05-18 12:59:56 +00:00
Matt Arsenault 2f29220d6d AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFP
llvm-svn: 361082
2019-05-17 23:05:18 +00:00
Matt Arsenault 02b5ca8cd1 GlobalISel: Implement lower for S64->S32 [SU]ITOFP
This is ported from the custom AMDGPU DAG implementation. I think this
is a better default expansion than what the DAG currently uses, at
least if the target has CTLZ.

This implements the signed version in terms of the unsigned
conversion, which is implemented with bit operations. SelectionDAG has
several other implementations that should eventually be ported
depending on what instructions are legal.

llvm-svn: 361081
2019-05-17 23:05:13 +00:00
Matt Arsenault f3cedf4823 GlobalISel: Define integer min/max instructions
Doesn't attempt to emit them for anything yet, but some legalizations
I want to port use them.

llvm-svn: 361061
2019-05-17 18:36:31 +00:00
Simon Pilgrim 065431c82b [X86][SSE] Fold movmsk(not(x)) -> not(movmsk)
Helps to improve folding of comparisons with movmsk results.

llvm-svn: 361056
2019-05-17 17:56:25 +00:00
Simon Pilgrim 2c2f8e74b9 [X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp.
Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns. 

llvm-svn: 361052
2019-05-17 17:25:55 +00:00
Roman Lebedev 64c756b991 [DAGCombiner] visitShiftByConstant(): drop bogus signbit check
Summary:
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
   https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
   https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
   https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
   https://rise4fun.com/Alive/ml6

So, why is it there then?

This changes the testcase that was touched by @spatel in rL347478,
but i'm not sure that test tests anything particular?

Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin

Reviewed By: spatel

Subscribers: javed.absar, llvm-commits, spatel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61918

llvm-svn: 361044
2019-05-17 15:52:58 +00:00
Simon Pilgrim 62c7032c18 [X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold.
Prep work for the removal of the remaining x86 CTTZ vector lowering.

llvm-svn: 361035
2019-05-17 14:04:56 +00:00
Matt Arsenault a510b570c2 AMDGPU/GlobalISel: Legalize G_FCEIL
llvm-svn: 361028
2019-05-17 12:20:05 +00:00
Matt Arsenault 6aebcd5499 AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNC
llvm-svn: 361027
2019-05-17 12:20:01 +00:00
Matt Arsenault 6aafc5e19d AMDGPU/GlobalISel: Legalize G_FRINT
llvm-svn: 361026
2019-05-17 12:19:57 +00:00
Matt Arsenault 1448f5689e AMDGPU/GlobalISel: Legalize G_FCOPYSIGN
llvm-svn: 361025
2019-05-17 12:19:52 +00:00
Matt Arsenault 568f193847 AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.load
llvm-svn: 361023
2019-05-17 12:02:34 +00:00
Matt Arsenault a3b5a386fa AMDGPU/GlobalISel: Use subreg index instead of extra unmerge
This saves instructions and extra steps, but I'm not sure about
introducing subregister indexes at this point.

llvm-svn: 361022
2019-05-17 12:02:31 +00:00
Matt Arsenault b3dc73634c AMDGPU/GlobalISel: Use waterfall loop for buffer_load
This adds support for more complex waterfall loops that need to handle
operands > 32-bits, and multiple operands.

llvm-svn: 361021
2019-05-17 12:02:27 +00:00
Clement Courbet 632dfdda16 Re-land r360859: "[MergeICmps] Simplify the code."
With a fix for PR41917: The predecessor list was changing under our feet.

-  for (BasicBlock *Pred : predecessors(EntryBlock_)) {
+  while (!pred_empty(EntryBlock_)) {
+    BasicBlock* const Pred = *pred_begin(EntryBlock_);

llvm-svn: 361009
2019-05-17 09:43:45 +00:00
Rhys Perry c4bc61bad7 [AMDGPU] detect WaW hazards when moving/merging load/store instructions
Summary:
    In order to combine memory operations efficiently, the load/store
    optimizer might move some instructions around. It's usually safe
    to move instructions down past the merged instruction because the
    pass checks if memory operations can be re-ordered.

    Though, the current logic doesn't handle Write-after-Write hazards.

    This fixes a reflection issue with Monster Hunter World and DXVK.

    v2: - rebased on top of master
        - clean up the test case
        - handle WaW hazards correctly

    Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130

Original patch by Samuel Pitoiset.

Reviewers: tpr, arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D61313

llvm-svn: 361008
2019-05-17 09:32:23 +00:00
Jonas Paulsson 9427961c89 [SystemZ] Bugfix in SystemZTargetLowering::combineIntDIVREM()
Make sure to not unroll a vector division/remainder (with a constant splat
divisor) after type legalization, since the scalar type may then be illegal.

Review: Ulrich Weigand
https://reviews.llvm.org/D62036

llvm-svn: 360965
2019-05-17 00:50:35 +00:00
Nico Weber d764e7c660 Revert r360859: "Reland r360771 "[MergeICmps] Simplify the code.""
It caused PR41917.

llvm-svn: 360963
2019-05-17 00:43:53 +00:00
Tim Renouf e3cbdaf1b5 [CodeGen] Fixed de-optimization of legalize subvector extract
The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU
3-dword memory instructions, caused a de-optimization problem for code
with such a load that then bitcasts via vector of i8, because v12i8 is
not an MVT so it legalizes the bitcast by widening it.

This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.

Differential Revision: https://reviews.llvm.org/D60457

Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64
llvm-svn: 360942
2019-05-16 21:49:06 +00:00
Craig Topper f09b9d419f [X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil.
This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate
in patterns without masking.

llvm-svn: 360915
2019-05-16 16:53:50 +00:00
Matt Arsenault 99e6f4d11a AMDGPU: Introduce TokenFactor for ABI register copies in call sequence
The call was missing chain dependencies on the pre-call copies. I
don't think this was causing any real issues however.

llvm-svn: 360906
2019-05-16 15:10:27 +00:00
Matt Arsenault df24c92c0f AMDGPU: Assume xnack is enabled by default
This is the conservatively correct default. It is always safe to
assume xnack is enabled, but not the converse.

Introduce a feature to blacklist targets where xnack can never be
meaningfully enabled. I'm not sure the targets this is applied to is
100% correct.

llvm-svn: 360903
2019-05-16 14:48:34 +00:00
Alex Bradbury 3966b02cc8 [RISCV][NFC] Add nounwind attribute to functions missing it in test/CodeGen/RISCV
This is in preparation for emitting CFI directives.

llvm-svn: 360897
2019-05-16 13:56:23 +00:00
Adhemerval Zanella 2d28db6b9f [AArch64] Handle ISD::LROUND and ISD::LLROUND
This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas
instruction. It currently only handles the scalar version.

llvm-svn: 360894
2019-05-16 13:30:18 +00:00
Adhemerval Zanella 73643b5041 [CodeGen] Add lround/llround builtins
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics.  The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.

The idea is to optimize lround/llround generation for AArch64
in a subsequent patch.  Current semantic is just route it to libm
symbol.

llvm-svn: 360889
2019-05-16 13:15:27 +00:00
Matt Arsenault 828b685ebe RegAllocFast: Improve hinting heuristic
Trace through multiple COPYs when looking for a physreg source. Add
hinting for vregs that will be copied into physregs (we only hinted
for vregs getting copied to a physreg previously).  Give hinted a
register a bonus when deciding which value to spill.  This is part of
my rewrite regallocfast series. In fact this one doesn't even have an
effect unless you also flip the allocation to happen from back to
front of a basic block. Nonetheless it helps to split this up to ease
review of D52010

Patch by Matthias Braun

llvm-svn: 360887
2019-05-16 12:50:39 +00:00
Roman Lebedev 62650cf464 [NFC] Fixup FileCheck option name in tests added in rL360881
llvm-svn: 360884
2019-05-16 12:39:34 +00:00
Roman Lebedev ec6608d547 [NFC][CodeGen] Add some more tests for pulling binops through shifts
The ashr variant may see relaxation in https://reviews.llvm.org/D61918

llvm-svn: 360881
2019-05-16 12:26:53 +00:00
Matt Arsenault a8f88c388f AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xor
Bool values should use the scc/vcc regbank since r350611.

llvm-svn: 360877
2019-05-16 12:06:41 +00:00
Clement Courbet c4fdd717ef Reland r360771 "[MergeICmps] Simplify the code."
This revision does not seem to be the culprit.

llvm-svn: 360859
2019-05-16 06:18:02 +00:00
Matt Arsenault 55146d3139 GlobalISel: Add G_FCOPYSIGN
llvm-svn: 360850
2019-05-16 04:08:39 +00:00
Craig Topper e43bdf144c [X86] Delay creating index register negations during address matching until after we know for sure the match will succeed
If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel.

Differential Revision: https://reviews.llvm.org/D61047

llvm-svn: 360823
2019-05-15 21:59:53 +00:00
Reid Kleckner 4882490349 [codeview] Fix SDNode representation of annotation labels
Before this change, they were erroneously constructed with the EH_LABEL
SDNode opcode, which caused other passes to interact with them in
incorrect ways. See the FIXME about fastisel that this addresses in the
existing test case.

Fixes PR41890

llvm-svn: 360818
2019-05-15 21:46:05 +00:00
Mandeep Singh Grang 814435fe87 [AArch64] only indicate CFI on Windows if we emitted CFI
Summary:
Otherwise, we emit directives for CFI without any actual CFI opcodes to
go with them, which causes tools to malfunction.  The technique is
similar to what the x86 backend already does.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40876

Patch by: froydnj (Nathan Froyd)

Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric

Reviewed By: rnk

Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61960

llvm-svn: 360816
2019-05-15 21:23:41 +00:00
Nicolai Haehnle 664ceeda68 RegAlloc: try to fail more gracefully when out of registers
Summary:
The emitError path allows the program to continue, unlike report_fatal_error.
This is friendlier to use cases where LLVM is embedded in a larger program,
because the caller may be able to deal with the error somewhat gracefully.

Change the number of requested NOP bytes in the AArch64 and PowerPC
test cases to avoid triggering an unrelated assertion. The compilation
still fails, as verified by the test.

Change-Id: Iafb9ca341002a597b82e59ddc7a1f13c78758e3d

Reviewers: arsenm, MatzeB

Subscribers: qcolombet, nemanjai, wdng, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61489

llvm-svn: 360786
2019-05-15 17:29:58 +00:00
Ryan Taylor 29257eb76c [AMDGPU] Increases available SGPR for Calling Convention
Summary:
SGPR in CC can be either hw initialized or set by other chained shaders
and so this increases the SGPR count availalbe to CC to 105.

Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61261

llvm-svn: 360778
2019-05-15 14:43:55 +00:00
Clement Courbet eaf4413d2d Revert r360771 "[MergeICmps] Simplify the code."
Breaks a bunch of builbdots.

llvm-svn: 360776
2019-05-15 14:21:59 +00:00
Clement Courbet 157ae639fa [MergeICmps] Simplify the code.
Instead of patching the original blocks, we now generate new blocks and
delete the old blocks. This results in simpler code with a less twisted
control flow (see the change in `entry-block-shuffled.ll`).

This will make https://reviews.llvm.org/D60318 simpler by making it more
obvious where control flow created and deleted.

Reviewers: gchatelet

Subscribers: hiraditya, llvm-commits, spatel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61736

llvm-svn: 360771
2019-05-15 13:04:24 +00:00
David Green 0582b22f10 [ARM] Don't use the Machine Scheduler for cortex-m at minsize
The new cortex-m schedule in rL360768 helps performance, but can increase the
amount of high-registers used. This, on average, ends up increasing the
codesize by a fair amount (because less instructions are converted from T2 to
T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use
the existing DAG scheduler with the RegPressure scheduling preference (at least
until the issues around T2 vs T1 instructions can be improved).

I have also made sure that the Sched::RegPressure dag scheduler is always
chosen for MinSize.

The test shows one case where we increase the number of registers used.

Differential Revision: https://reviews.llvm.org/D61882

llvm-svn: 360769
2019-05-15 12:58:02 +00:00
David Green d2d0f46cd2 [ARM] Cortex-M4 schedule
This patch adds a simple Cortex-M4 schedule, renaming the existing M3
schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM:
https://developer.arm.com/docs/ddi0439/latest

Most of these are 1, with the important exception being loads taking 2
cycles. A few others are also higher, but I don't believe they make a
large difference. I've repurposed the M3 schedule as the latencies are
mostly the same between the two cores, with the M4 having more FP and
DSP instructions. We also turn on MISched and UseAA for the cores that
now use this.

It also adds some schedule Write's to various instruction to make things
simpler.

Differential Revision: https://reviews.llvm.org/D54142

llvm-svn: 360768
2019-05-15 12:41:58 +00:00
Craig Topper 384d46c0d5 [X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp.
They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects set
which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think
this makes sense since we are using it as a fence.

This also seems to hide the operation from the speculative load hardening pass
so I've reverted r360511.

llvm-svn: 360747
2019-05-15 04:15:46 +00:00
Fangrui Song f4dfd63c74 [IR] Disallow llvm.global_ctors and llvm.global_dtors of the 2-field form in textual format
The 3-field form was introduced by D3499 in 2014 and the legacy 2-field
form was planned to be removed in LLVM 4.0

For the textual format, this patch migrates the existing 2-field form to
use the 3-field form and deletes the compatibility code.
test/Verifier/global-ctors-2.ll checks we have a friendly error message.

For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the
2-field form (add i8* null as the third field).

Reviewed By: rnk, dexonsmith

Differential Revision: https://reviews.llvm.org/D61547

llvm-svn: 360742
2019-05-15 02:35:32 +00:00
Philip Reames 445f942fc4 Use an offset from TOS for idempotent rmw locked op lowering
This was the portion split off D58632 so that it could follow the redzone API cleanup. Note that I changed the offset preferred from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised.

Differential Revision: https://reviews.llvm.org/D61862

llvm-svn: 360719
2019-05-14 22:32:42 +00:00
Roman Lebedev 7baf528aba [NFC][CodeGen][X86][AArch64] Add and-const-mask + const-shift pattern tests
Unlike instcombine, we currently don't turn and+shift into shift+and.
We probably should, likely unconditionally.

While i'm adding only all-ones (potentially shifted) mask,
this obviously isn't limited to any particular mask pattern:
https://rise4fun.com/Alive/kmX

Related to https://bugs.llvm.org/show_bug.cgi?id=41874

llvm-svn: 360706
2019-05-14 20:17:04 +00:00
Stanislav Mekhanoshin 05791d90c9 [AMDGPU] Fixed handling of imemdiate i1 literals
This bug was exposed by the rL360395.

Differential Revision: https://reviews.llvm.org/D61812

llvm-svn: 360689
2019-05-14 16:18:00 +00:00
Stanislav Mekhanoshin 7b20032628 [AMDGPU] gfx1010 Strengthen some SMEM WAR hazard unit tests. NFC.
Tighten conditions on SMEM WAR hazard unit tests to ensure rejection
of workaround insertion where a s_waitcnt is present in dependency
chain. The current workaround code already conforms to these revise
tests.

llvm-svn: 360686
2019-05-14 16:04:03 +00:00
Simon Pilgrim c2d9cfd925 [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758)
D61068 handled vector shifts, this patch does the same for scalars where there are similar number of pipes for shifts as bit ops - this is true almost entirely for AMD targets where the scalar ALUs are well balanced.

This combine avoids AND immediate mask which usually means we reduce encoding size.

Some tests show use of (slow, scaled) LEA instead of SHL in some cases, but thats due to particular shift immediates - shift+mask generate these just as easily.

Differential Revision: https://reviews.llvm.org/D61830

llvm-svn: 360684
2019-05-14 15:21:28 +00:00
Lei Huang 22561972af [PowerPC] Custom lower known CR bit spills
For known CRBit spills, CRSET/CRUNSET, it is more efficient to load and spill
the known value instead of extracting the bit.

eg. This sequence is currently used to spill a CRUNSET:
    crclr   4*cr5+lt
    mfocrf  r3,4
    rlwinm  r3,r3,20,0,0
    stw     r3,132(r1)

This patch custom lower it to:
    li  r3,0
    stw r3,132(r1)

Differential Revision: https://reviews.llvm.org/D61754

llvm-svn: 360677
2019-05-14 14:27:06 +00:00
Diana Picus a568222ddd [IRTranslator] Don't hardcode GEP index type
When breaking up loads and stores of aggregates, the IRTranslator uses
LLT::scalar(64) for the index type of the G_GEP instructions that
compute the addresses. This is unnecessarily large for 32-bit targets.
Use the int ptr type provided by the DataLayout instead.

Note that we're already doing the right thing when translating
getelementptr instructions from the IR. This is just an oversight when
generating new ones while translating loads/stores.

Both x86 and AArch64 already have tests confirming that the old
behaviour is preserved for 64-bit targets.

Differential Revision: https://reviews.llvm.org/D61852

llvm-svn: 360656
2019-05-14 09:25:17 +00:00
Philip Reames 3098e44daa [X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets
This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence.

Differential Revision: https://reviews.llvm.org/D61863

llvm-svn: 360649
2019-05-14 04:43:37 +00:00
Jinsong Ji b7b3d866a4 [PowerPC][NFC] Fix typos in triples
Found by bzEq (Kai Luo).

llvm-svn: 360643
2019-05-14 03:11:24 +00:00
Craig Topper cc761e6fae [X86] Use X86 instead of X32 as a check prefix in atomic-idempotent.ll. NFC
X32 can refer to a 64-bit ABI that uses 32-bit ints, longs, and pointers.

I plan to add gnux32 command lines to this test so this prepares for that.

Also remove some check lines that have a prefix that is not in any run lines.

llvm-svn: 360642
2019-05-14 03:07:56 +00:00
Sanjay Patel 3a13d970aa [SDAG, x86] allow targets to override test for binop opcodes
This follows the pattern of the existing isCommutativeBinOp().

x86 shows improvements from vector narrowing for the min/max opcodes.

llvm-svn: 360639
2019-05-14 00:39:40 +00:00
Nikita Popov 323dc634b9 [WebAssembly] Don't assume that zext/sext result is i32/i64 in fast isel (PR41841)
Usually this will abort fast-isel at the instruction using the
non-legal result, but if the only use is in a different basic block,
we'll incorrectly assume that the zext/sext is to i32 (rather than
i128 in this case).

Differential Revision: https://reviews.llvm.org/D61823

llvm-svn: 360616
2019-05-13 19:40:18 +00:00
Stanislav Mekhanoshin d9930d499a [AMDGPU] gfx1010 tests. NFC.
llvm-svn: 360615
2019-05-13 19:30:06 +00:00
Robert Lougher 91a9d4ef4b Revert [X86] Avoid SFB - Fix inconsistent codegen with/without debug info
Revert r360436 as it is causing clang-x64-windows-msvc buildbot to fail.

llvm-svn: 360606
2019-05-13 17:36:46 +00:00
Nick Desaulniers c33f754e74 [TargetLowering] Handle multi depth GEPs w/ inline asm constraints
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.

Link: https://github.com/ClangBuiltLinux/linux/issues/469

Reviewers: echristo, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61560

llvm-svn: 360604
2019-05-13 17:27:44 +00:00
Simon Pilgrim 73aee29095 [X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts
Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts.

Differential Revision: https://reviews.llvm.org/D61782

llvm-svn: 360596
2019-05-13 16:10:11 +00:00
Simon Pilgrim cf5a8eb7cd [X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433)
Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op.

This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately.

Differential Revision: https://reviews.llvm.org/D61782

llvm-svn: 360594
2019-05-13 16:02:45 +00:00
Simon Pilgrim d3cedee3c6 [TargetLowering] Add SimplifyDemandedBits support for ZERO_EXTEND_VECTOR_INREG
More work for PR39709.

llvm-svn: 360592
2019-05-13 15:51:26 +00:00
Craig Topper c6a6c10742 [X86] Add test case for mask register variant of PR41619 which should be fixed after r360552
llvm-svn: 360591
2019-05-13 15:45:20 +00:00
Sanjay Patel 05dafb1c97 [DAGCombiner] narrow vector binop with inserts/extract
We catch most of these patterns (on x86 at least) by matching
a concat vectors opcode early in combining, but the pattern may
emerge later using insert subvector instead.

The AVX1 diffs for add/sub overflow show another missed narrowing
pattern. That one may be falling though the cracks because of
combine ordering and multiple uses.

llvm-svn: 360585
2019-05-13 14:31:14 +00:00
Sanjay Patel 83e61bc5e2 [x86] add test for insert/extract binop; NFC
This pattern is visible in the c-ray benchmark with an AVX target.

llvm-svn: 360582
2019-05-13 13:32:16 +00:00
Kevin P. Neal 5987749e33 Add constrained fptrunc and fpext intrinsics.
The new fptrunc and fpext intrinsics are constrained versions of the
regular fptrunc and fpext instructions.

Reviewed by:	Andrew Kaylor, Craig Topper, Cameron McInally, Conner Abbot
Approved by:	Craig Topper
Differential Revision: https://reviews.llvm.org/D55897

llvm-svn: 360581
2019-05-13 13:23:30 +00:00
Ulrich Weigand 8e42f6ddc8 [SystemZ] Model floating-point control register
This adds the FPC (floating-point control register) as a reserved
physical register and models its use by SystemZ instructions.

Note that only the current rounding modes and the IEEE exception
masks are modeled.  *Changes* of the FPC due to exceptions (in
particular the IEEE exception flags and the DXC) are not modeled.

At this point, this patch is mostly NFC, but it will prevent
scheduling of floating-point instructions across SPFC/LFPC etc.

llvm-svn: 360570
2019-05-13 09:47:26 +00:00
Sam Parker a33e311a3b [ARM][ParallelDSP] Relax alias checks
When deciding the safety of generating smlad, we checked for any
writes within the block that may alias with any of the loads that
need to be widened. This is overly conservative because it only
matters when there's a potential aliasing write to a location
accessed by a pair of loads.

Now we check for aliasing writes only once, during setup. If two
loads are found to have an aliasing write between them, we don't add
these loads to LoadPairs. This means that later during the transform,
we can safely widened a pair without worrying about aliasing.

However, to maintain correctness, we also need to change the way that
wide loads are inserted because the order is now important.

The MatchSMLAD method has also been changed, absorbing
MatchReductions and AddMACCandidate to hopefully improve readability.

Differential Revision: https://reviews.llvm.org/D6102

llvm-svn: 360567
2019-05-13 09:23:32 +00:00
Clement Courbet 9afc4764dd [DAGCombiner] Fix invalid alias analysis.
Summary:
When we know for sure whether two addresses do or do not alias, we
should immediately return from DAGCombiner::isAlias().

I think this comes from a bad copy/paste, Sorry for not catching that during the
code review.

Fixes PR41855.

Reviewers: niravd, gchatelet, EricWF

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61846

llvm-svn: 360566
2019-05-13 09:07:37 +00:00
Clement Courbet c4e37fd9b2 [DAGCombiner][NFC] Commit test to show fix in D61846.
llvm-svn: 360561
2019-05-13 08:15:34 +00:00
Yonghong Song 98fe9c9869 [BPF] emit BTF sections only if debuginfo available
Currently, without -g, BTF sections may still be emitted with
data sections, e.g., for linux kernel bpf selftest
test_tcp_check_syncookie_kern.c issue discovered by Martin
as shown below.

-bash-4.4$ bpftool btf dump file test_tcp_check_syncookie_kern.o
[1] VAR 'results' type_id=0, linkage=global-alloc
[2] VAR '_license' type_id=0, linkage=global-alloc
[3] DATASEC 'license' size=0 vlen=1
        type_id=2 offset=0 size=4
[4] DATASEC 'maps' size=0 vlen=1
        type_id=1 offset=0 size=28

Let disable BTF generation if no debuginfo, which is
the original design.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D61826

llvm-svn: 360556
2019-05-13 05:00:23 +00:00
Craig Topper 61e556d2bd Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
I've included a new fix in X86RegisterInfo to prevent PR41619 without
reintroducing r359392. We might be able to improve that in the base class
implementation of shouldRewriteCopySrc somehow. But this hopefully enables
forward progress on SimplifyDemandedBits improvements for now.

Original commit message:

This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb
but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

llvm-svn: 360552
2019-05-13 04:03:35 +00:00
Simon Pilgrim a7fc763082 [X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded.
Removes unnecessary vzeroupper noted in D61806

llvm-svn: 360543
2019-05-12 15:16:29 +00:00
Sanjay Patel a09e686821 [DAGCombiner] try to move bitcast after extract_subvector
I noticed that we were failing to narrow an x86 ymm math op in a case similar
to the 'madd' test diff. That is because a bitcast is sitting between the math
and the extract subvector and thwarting our pattern matching for narrowing:

       t56: v8i32 = add t59, t58
      t68: v4i64 = bitcast t56
    t73: v2i64 = extract_subvector t68, Constant:i64<2>
  t96: v4i32 = bitcast t73

There are a few wins and neutral diffs in the other tests.

Differential Revision: https://reviews.llvm.org/D61806

llvm-svn: 360541
2019-05-12 14:43:20 +00:00
Simon Pilgrim fda6bffd3b [X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW SimplifyDemandedVectorElts as well.
See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits.

This helps us with target shuffles and hops in particular.

llvm-svn: 360535
2019-05-11 21:35:50 +00:00
Simon Pilgrim 605a840747 [DAG] Add SimplifyDemandedBits support for BITREVERSE
Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC

llvm-svn: 360534
2019-05-11 20:56:05 +00:00
Simon Pilgrim 3fa632a112 [X86] Updated shift-mask test targets for D61830
llvm-svn: 360533
2019-05-11 20:28:20 +00:00
Simon Pilgrim 91e697c145 [X86] Add scalar shl+lshr -> shift+mask tests (PR40758)
As discussed on D61068, many x86 targets can perform 2 immediate shifts quicker than a shift + mask

llvm-svn: 360530
2019-05-11 19:16:46 +00:00
Simon Pilgrim 6f7c62d70f [X86] Add avx512f tests for boolean reduction
llvm-svn: 360529
2019-05-11 19:14:19 +00:00
Simon Pilgrim e4c5b6d9bd [X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling.
Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts

llvm-svn: 360526
2019-05-11 16:07:12 +00:00
Craig Topper c9d7484aa3 [X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel to fix a machine verifier error after adding test cases.
Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it.

llvm-svn: 360524
2019-05-11 16:00:28 +00:00
Simon Pilgrim 4871a3057e [X86][SSE] Tweaked HADD/HSUB SimplifyDemandedVectorElts
Try to ensure we LHS and RHS test coverage

llvm-svn: 360519
2019-05-11 14:47:54 +00:00
Simon Pilgrim 1db0cc9e1b [X86][SSE] Add integer HADD/HSUB SimplifyDemandedVectorElts tests
llvm-svn: 360518
2019-05-11 14:08:34 +00:00
Simon Pilgrim 67ad4c2f27 [X86][SSE] Add HADD/HSUB SimplifyDemandedVectorElts tests
Shows missed opportunities to simplify args.

Will add integer HADD/HSUB tests in a future commit.

llvm-svn: 360517
2019-05-11 12:46:38 +00:00
Craig Topper 31f7adb94f [X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1.
We were checking for SSE4.1 for FP types, but not integer 128-bit types.

Fixes PR41837.

llvm-svn: 360512
2019-05-11 04:19:33 +00:00
Craig Topper bdef12df8d [X86] Add a test case for idempotent atomic operations with speculative load hardening. Fix an additional issue found by the test.
This test covers the fix from r360475 as well.

llvm-svn: 360511
2019-05-11 04:00:27 +00:00
Mircea Trofin ff3bed0e61 Skip over prefetches
Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61789

llvm-svn: 360471
2019-05-10 21:27:55 +00:00
Nikita Popov 9f7537bd48 [SDAG] Recursively legalize both vector mulo results
Split out from D61692 per RKSimon's suggestion. Vector op
legalization will automatically recursively legalize the returned
SDValue, but we need to take care of the other results ourselves.
Otherwise it will end up getting legalized only during op
legalization, by which point it might be too late (though I'm not
aware of any specific cases right now).

There are codegen differences because expansion occurs earlier now
and we don't get a DAGCombiner run in between.

Differential Revision: https://reviews.llvm.org/D61744

llvm-svn: 360470
2019-05-10 20:42:48 +00:00
Momchil Velikov c396f09ce9 Adjust MachineScheduler to use ProcResource counts
This fix allows the scheduler to take into account the number of instances of
each ProcResource specified. Previously a declaration in a scheduler of
ProcResource<1> would be treated identically to a declaration of
ProcResource<2>. Now the hazard recognizer would report a hazard only after all
of the resource instances are busy.

Patch by Jackson Woodruff and Momchil Velikov.

Differential Revision: https://reviews.llvm.org/D51160

llvm-svn: 360441
2019-05-10 16:54:32 +00:00
Robert Lougher 986b6b86bb [X86] Avoid SFB - Fix inconsistent codegen with/without debug info
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969

The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.

This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D61680

llvm-svn: 360436
2019-05-10 15:55:06 +00:00
Simon Pilgrim a0b1518a4a [X86][SSE] Add getHopForBuildVector vector splitting
If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free).

Fixes some of the regressions noted in D61782

llvm-svn: 360435
2019-05-10 15:46:04 +00:00
Lei Huang 1ac6e9636c [PowerPC] custom lower `v2f64 fpext v2f32`
Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32.

eg. For the following IR
  %0 = load <2 x float>, <2 x float>* %Ptr, align 8
  %1 = fpext <2 x float> %0 to <2 x double>
  ret <2 x double> %1

Pre custom lowering:
  ld r3, 0(r3)
  mtvsrd f0, r3
  xxswapd vs34, vs0
  xscvspdpn f0, vs0
  xxsldwi vs1, vs34, vs34, 3
  xscvspdpn f1, vs1
  xxmrghd vs34, vs0, vs1

After custom lowering:
  lfd f0, 0(r3)
  xxmrghw vs0, vs0, vs0
  xvcvspdp vs34, vs0

Differential Revision: https://reviews.llvm.org/D57857

llvm-svn: 360429
2019-05-10 14:04:06 +00:00
Tim Northover 6c1e3f9493 SelectionDAG: accommodate atomic floating stores.
We were applying a pointer truncation to floating types, which crashed LLVM.
That is Not A Good Thing(TM).

llvm-svn: 360421
2019-05-10 11:23:04 +00:00
Sam Clegg ea38ac5ba3 [WebAssembly] Don't assume that strongly defined symbols are DSO-local
The current PIC model for WebAssembly is more like ELF in that it
allows symbol interposition.

This means that more functions end up being addressed via the GOT
and fewer directly added to the wasm table.

One effect is a reduction in the number of wasm table entries similar
to the previous attempt in https://reviews.llvm.org/D61539 which was
reverted.

Differential Revision: https://reviews.llvm.org/D61772

llvm-svn: 360402
2019-05-10 01:52:08 +00:00
Mircea Trofin 5c31c05fbd [llvm] X86DiscriminateMemOps: insert debug info when missing
Reviewers: davidxl

Reviewed By: davidxl

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61735

llvm-svn: 360396
2019-05-10 00:12:51 +00:00
Stanislav Mekhanoshin 64196850f0 [AMDGPU] Pattern for v_xor3_b32
This also allows three op patterns to use increased constant bus
limit of GFX10.

Differential Revision: https://reviews.llvm.org/D61763

llvm-svn: 360395
2019-05-10 00:09:01 +00:00
Philip Reames bd588dfd59 [X86] Improve lowering of idemptotent RMW operations
The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local.

Differential Revision: https://reviews.llvm.org/D58632

llvm-svn: 360393
2019-05-09 23:23:42 +00:00
Simon Pilgrim 93bfa5af48 [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)
As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2*shuffle+add/sub - so if we can reduce 2*shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count.

This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2*shuffle+add+shuffle to hadd+2*shuffle - I've opened PR41813 to cover this.

Differential Revision: https://reviews.llvm.org/D61308

llvm-svn: 360360
2019-05-09 17:45:01 +00:00
Nemanja Ivanovic 80808ed0f6 [PowerPC][NFC] Add test for D60506 to show differences in code-gen
Differential revision: https://reviews.llvm.org/D61723

llvm-svn: 360338
2019-05-09 12:26:39 +00:00
Sam Parker d7b650cc72 [ARM][CGP] Guard against signext args and sitofp
Add an Argument that has the SExtAttr attached, as well as SIToFP
instructions, as values that generate sign bits. SIToFP doesn't
strictly do this and could be treated as a sink to be sign-extended.

Differential Revision: https://reviews.llvm.org/D61381

llvm-svn: 360331
2019-05-09 11:56:16 +00:00
Diana Picus 3531453371 [ARM GlobalISel] Map DBG_VALUE for types != s32
...and make sure we fail elegantly for unsupported values.

s64 goes into DPR, anything <= 32 into GPR.

llvm-svn: 360321
2019-05-09 09:49:36 +00:00
Leonard Chan 95b7abdcc5 [SelectionDAG] Expand ADD/SUBCARRY
This patch allows for expansion of ADDCARRY and SUBCARRY when the target does not support it.

Differential Revision: https://reviews.llvm.org/D61411

llvm-svn: 360303
2019-05-09 01:17:48 +00:00
Stanislav Mekhanoshin 327626368c [AMDGPU] gfx1010 tests. NFC.
Added tests which now pass after code commits.

llvm-svn: 360300
2019-05-08 23:31:32 +00:00
Sanjay Patel 902b3ecdad [SelectionDAG] fold 'fneg undef' to undef
This is extracted from the original draft of D61419 with some additional tests.
We don't currently get this in IR (it's conservatively turned into a NaN),
but presumably that'll get updated as we add real IR support for 'fneg'
rather than 'fsub -0.0, x'.

The x86-32 run shows the following, and I haven't looked further to see why,
but that seems to be independent:
  Legalizing: t1: f32 = undef
  Trying to expand node
  Creating fp constant: t4: f32 = ConstantFP<0.000000e+00>

Differential Revision: https://reviews.llvm.org/D61516

llvm-svn: 360296
2019-05-08 22:19:52 +00:00
Matt Arsenault 01434f9377 AMDGPU: Select VOP3 form of add
The VOP3 form should always be the preferred selection, to be shrunk
later. This should only be an optimization issue, but this partially
works around a problem from clobbering VCC when SIFixSGPRCopies
rewrites an SCC defining operation directly to VCC.

3 of the testcases are regressions from failing to fold the immediate
in cases it should. These can be avoided by improving the VCC liveness
handling in SIFoldOperands. Simply increasing the threshold to
computeRegisterLiveness works, although this is common enough that VCC
liveness should probably be tracked throughout the pass. The hack of
leaving behind an implicit_def instruction to avoid breaking iterator
wastes instruction count, which inhibits finding the VCC def in long
chains of adds. Doing this however exposes different, worse looking
regressions from poor scheduling behavior. This could probably be
avoided around by forcing the shrink of the addc here, but the
scheduler should probably be fixed.

The r600 add test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 360293
2019-05-08 22:09:57 +00:00
Stanislav Mekhanoshin 1dbf721315 [AMDGPU] gfx1010 exp modifications
Differential Revision: https://reviews.llvm.org/D61701

llvm-svn: 360287
2019-05-08 21:23:37 +00:00
Craig Topper c5db081f8d [X86] Add a non-ambiguous check prefix to lwp-intrinsics.ll for the case when only the feature is specified and not the CPUs.
Not sure why the script doesn't notice this. We just weren't checking the +lwp command in 32-bit mode in 2 of the test cases.

llvm-svn: 360282
2019-05-08 19:20:53 +00:00
Quentin Colombet 157427245a [RegAllocFast] Scan physcial reg definitions before assigning virtual reg definitions
When assigning the definitions of an instruction we were updating
the available registers while walking the definitions. Some of
those definitions may be from physical registers and thus, they are
not available for other definitions to take, but by the time we see
that we may have already assign these registers to another
virtual register.

Fix that by walking through all the definitions and mark as unavailable
the physical register definitions, then do the virtual register assignments.

PR41790

llvm-svn: 360278
2019-05-08 18:30:26 +00:00
Philip Reames 8186e39082 [Tests] Landing tests for D58632 to show diffs in review
llvm-svn: 360274
2019-05-08 17:28:38 +00:00
Craig Topper 493aec3ef5 [FastISel][X86] Support FNeg instruction in target independent fast isel handling
This patch adds support for calling selectFNeg for FNeg instructions in addition to the fsub idiom

Differential Revision: https://reviews.llvm.org/D61624

llvm-svn: 360273
2019-05-08 17:27:08 +00:00
Simon Pilgrim e3eec06dde [AMDGPU] Reapplied BFE canonicalization from D60462
This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should be have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch.

llvm-svn: 360263
2019-05-08 15:49:10 +00:00
Tim Northover 18adcf331b ARM: disallow SP as Rn for Thumb2 TST & TEQ instructions
Using SP in this position is unpredictable in ARMv7. CMP and CMN are not
affected, and of course v8 relaxes this requirement, but that's handled
elsewhere.

llvm-svn: 360242
2019-05-08 10:59:08 +00:00
QingShan Zhang 5f7c86147d [NFC][PowerPC] Add test for store combine optimization.
llvm-svn: 360229
2019-05-08 07:56:59 +00:00
QingShan Zhang 0e71a6e755 [CodeGenPrepare] Don't split the store if it is volatile
We shouldn't split the store when it is volatile.

Differential Revision: https://reviews.llvm.org/D61169

llvm-svn: 360228
2019-05-08 07:32:12 +00:00
Philip Reames b61eaebb6b [Tests] Expand coverage of small memset zero idioms
llvm-svn: 360210
2019-05-07 23:48:42 +00:00
Reid Kleckner 6bf108d77a [COFF] Use COFF stubs for extern_weak functions
Summary:
A COFF stub indirects the reference to a symbol through memory. A
.refptr.$sym global variable pointer is created to refer to $sym.
Typically mingw uses these for external global variable declarations,
but we can use them for weak function declarations as well.

Updates the dso_local classification to add a special case for
extern_weak symbols on COFF in both clang and LLVM.

Fixes PR37598

Reviewers: smeenai, mstorsjo

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D61615

llvm-svn: 360207
2019-05-07 23:06:21 +00:00
Austin Kerbow 8a3d3a9af6 [AMDGPU] Check MI bundles for hazards
Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually.

Reviewers: arsenm, msearles, rampitec

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61564

llvm-svn: 360199
2019-05-07 22:12:15 +00:00
Eric Christopher 4727221734 Make sure that the DAG combiner doesn't merge stores that we explicitly
asked not be greater than preferred vector width for the vectorizer.
Test for both 128 and 256 with a skylake architecture.

llvm-svn: 360183
2019-05-07 19:25:34 +00:00
Philip Reames 800e6e34ae [Tests] Yet more combination of tests for unordered.atomic memset
llvm-svn: 360177
2019-05-07 17:45:52 +00:00
Jinsong Ji cc63db4ff0 [PowerPC][NFC] Update build-vector-tests.ll using utils/update_llc_test_checks.py
build-vector-tests.ll is a huge testcase, it is hard to maintain: eg:
any fundamental changes might need to update hundreds of lines. We should
leverage the script to maintain it.

This patch simply run utils/update_llc_test_checks.py on it. There
should be no missing test points.

llvm-svn: 360175
2019-05-07 17:29:44 +00:00
Nemanja Ivanovic b4f028f0f3 [PowerPC] Use the two-constant NR algorithm for refining estimates
The single-constant algorithm produces infinities on a lot of denormal values.
The precision of the two-constant algorithm is actually sufficient across the
range of denormals. We will switch to that algorithm for now to avoid the
infinities on denormals. In the future, we will re-evaluate the algorithm to
find the optimal one for PowerPC.

Differential revision: https://reviews.llvm.org/D60037

llvm-svn: 360144
2019-05-07 13:48:03 +00:00
George Rimar 5c922f6988 [llvm-objdump] - Print relocation record in a GNU format.
This fixes the https://bugs.llvm.org/show_bug.cgi?id=41355.

Previously with -r we printed relocation section name instead of the target section name.
It was like this: "RELOCATION RECORDS FOR [.rel.text]"
Now it is: "RELOCATION RECORDS FOR [.text]"

Also when relocation target section has more than one relocation section,
we did not combine the output. Now we do.

Differential revision: https://reviews.llvm.org/D61312

llvm-svn: 360143
2019-05-07 13:14:18 +00:00
Diana Picus 0a47fb8884 [ARM GlobalISel] Widen G_SELECT operands
...except for the condition operand.

llvm-svn: 360135
2019-05-07 11:39:30 +00:00
Simon Pilgrim b0f51266b8 [X86][AVX] Fold concat(packus(),packus()) -> packus(concat(),concat()) (PR34773)
Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining.

llvm-svn: 360134
2019-05-07 11:17:39 +00:00
Diana Picus d6d3808fa4 [ARM GlobalISel] Widen G_INTTOPTR/G_PTRTOINT
We actually have a couple of G_PTRTOINT to s8 when building clang, so
we should do something about them.

llvm-svn: 360130
2019-05-07 10:48:01 +00:00
Diana Picus d18bac5d19 [ARM GlobalISel] Widen G_GEP index operand
llvm-svn: 360127
2019-05-07 10:11:57 +00:00
Nicolai Haehnle 79ea85c6af AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand
Summary:
No test case because I don't know of a way to trigger this, but I
accidentally caused this to fail while working on a different change.

Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61490

llvm-svn: 360123
2019-05-07 09:19:09 +00:00
Craig Topper c6d445f9c1 [FastISel][X86] If selectFNeg fails, fall back to SelectionDAG not treating it as an fsub.
Summary:
If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd.

I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that.

Reviewers: andrew.w.kaylor, cameron.mcinally, spatel, efriedma

Reviewed By: cameron.mcinally

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61622

llvm-svn: 360111
2019-05-07 04:25:24 +00:00
Amy Huang 987b969bab Fix bug in getCompleteTypeIndex in codeview debug info
Summary:
When there are multiple instances of a forward decl record type, only the first one is emitted with a type index, because
the type is added to a map with a null type index. Avoid this by reordering so that forward decl types aren't added to the map.

Reviewers: rnk

Subscribers: aprantl, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61460

llvm-svn: 360101
2019-05-06 23:37:03 +00:00
Eli Friedman 2ea088173d [ARM] Glue register copies to tail calls.
This generally follows what other targets do. I don't completely
understand why the special case for tail calls existed in the first
place; even when the code was committed in r105413, call lowering didn't
work in the way described in the comments.

Stack protector lowering breaks if the register copies are not glued to
a tail call: we have to insert the stack protector check before the tail
call, and we choose the location based on the assumption that all
physical register dependencies of a tail call are adjacent to the tail
call. (See FindSplitPointForStackProtector.) This is sort of fragile,
but I don't see any reason to break that assumption.

I'm guessing nobody has seen this before just because it's hard to
convince the scheduler to actually schedule the code in a way that
breaks; even without the glue, the only computation that could actually
be scheduled after the register copies is the computation of the call
address, and the scheduler usually prefers to schedule that before the
copies anyway.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41417

Differential Revision: https://reviews.llvm.org/D60427

llvm-svn: 360099
2019-05-06 23:21:59 +00:00
Craig Topper 39f1a97417 [FastISel] Pass the fneg input operand to hasTrivialKill in FastISel::selectFNeg.
We're trying to calculate the kill flag for OpReg which is the input so we need to pass the input here.

llvm-svn: 360097
2019-05-06 23:09:09 +00:00
Craig Topper 24cfb7a992 [X86] Add test case to show that we don't set the kill flag properly for fast isel handling of fneg.
llvm-svn: 360096
2019-05-06 23:08:17 +00:00
Stanislav Mekhanoshin 971cb8b633 [AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32
GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for
all targets.

Differential Revision: https://reviews.llvm.org/D61525

llvm-svn: 360094
2019-05-06 22:27:05 +00:00
Philip Reames 1b31390fc6 [Tests] Add tests for optimized lowerings of element.unordered.atomic memset/memcmove/memcopy
llvm-svn: 360093
2019-05-06 22:25:59 +00:00
Philip Reames 03a979a45a [Tests] Rename tests before adding new ones
llvm-svn: 360092
2019-05-06 22:16:55 +00:00
Philip Reames 4bcf10fc0f [Tests] Autogen a test in advance of updates
llvm-svn: 360091
2019-05-06 22:12:07 +00:00
Philip Reames 2f53d79bff Fix pr33010, a 2 year old crashing regression
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>.  The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this.  Instead, simply prevent it's creation.

Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.  

llvm-svn: 360090
2019-05-06 22:09:31 +00:00
Craig Topper 77e69d8850 [X86] Add more test cases for fast-isel handling of fneg.
The fneg double case is falling back to a subsd in 32-bit mode if you write a test that doesn't trigger a fast-isel abort on the return value.

The subsd lowering has different behavior with respect to nans than using an xor. This is inconsisent with what we would do in SelectionDAG
and can lead to differences between -O0 and -O2.

llvm-svn: 360088
2019-05-06 22:04:26 +00:00
Stanislav Mekhanoshin 1bc001dec4 [AMDGPU] gfx1010 memory legalizer
Differential Revision: https://reviews.llvm.org/D61535

llvm-svn: 360087
2019-05-06 21:57:02 +00:00
Jordan Rupprecht 8f14e7cacf Revert "Re-commit r357452: SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)"
This reverts r357452 (git commit 21eb771dcb).

This was causing strange optimization-related test failures on an internal test. Will followup with more details offline.

llvm-svn: 360086
2019-05-06 21:55:05 +00:00
Craig Topper d10a200ceb [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.

Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.

After this patch we should support the d/q for parsing, but not print it when its unneeded.

llvm-svn: 360085
2019-05-06 21:39:51 +00:00
Martin Storsjo 899f3cd581 [AArch64] Default to SEH exception handling on MinGW
The SEH implementation is pretty mature at this point.

Differential Revision: https://reviews.llvm.org/D61590

llvm-svn: 360080
2019-05-06 21:18:15 +00:00
Craig Topper ad56843dd7 [SelectionDAG][X86] Support inline assembly returning an mmx register into a type with fewer than 64 bits.
It's possible to use the 'y' mmx constraint with a type narrower than 64-bits.

This patch supports this by bitcasting the mmx type to 64-bits and then
truncating to the desired type.

There are probably other missing type combinations we need to support, but this
is the case we have a bug report for.

Fixes PR41748.

Differential Revision: https://reviews.llvm.org/D61582

llvm-svn: 360069
2019-05-06 19:50:14 +00:00
Amara Emerson 3d1128cc9e [GlobalISel] Handle <1 x T> vector return types properly.
After support for dealing with types that need to be extended in some way was
added in r358032 we didn't correctly handle <1 x T> return types. These types
don't have a GISel direct representation, instead we just see them as scalars.
When we need to pad them into <2 x T> types however we need to use a
G_BUILD_VECTOR instead of trying to do a G_CONCAT_VECTOR.

This fixes PR41738.

llvm-svn: 360068
2019-05-06 19:41:01 +00:00
Craig Topper 55a71b575c Revert r359392 and r358887
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"

Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.

Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead
moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.

Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.

This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.

llvm-svn: 360066
2019-05-06 19:29:24 +00:00
Nikita Popov cfe786a195 [SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635)
This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635
by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is
legal but former is not) for zero-or-all-ones boolean reductions (which
are detected based on sign bits).

Differential Revision: https://reviews.llvm.org/D61398

llvm-svn: 360054
2019-05-06 16:17:17 +00:00
Nemanja Ivanovic 70afe4f7e1 [PowerPC] Fix erroneous condition for converting uint-to-fp vector conversion
A condition for exiting the legalization of v4i32 conversion to v2f64 through
extract/convert/build erroneously checks for the extract having type i32.
This is not adequate as smaller extracts are actually legalized to i32 as well.
Furthermore, an early exit is missing which means that we only check that
both extracts are from the same vector if that check fails.
As a result, both cases in the included test case fail - the first gets a
select error and the second generates incorrect code.

The culprit commit is r274535.

llvm-svn: 360043
2019-05-06 13:35:49 +00:00
Fangrui Song 4c3d579096 [CodeGen] Move X86 tests under the X86 directory
llvm-svn: 360029
2019-05-06 10:21:17 +00:00
Guillaume Chatelet 3cfb48b877 [NFC] Update memcpy tests
Summary: Runs utils/update_llc_test_checks.py on a few memcpy files

Reviewers: courbet

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61507

Remove cfi noise by adding nounwind

llvm-svn: 360023
2019-05-06 09:46:50 +00:00
Fangrui Song 041c377a59 [X86] Move files to correct directories after D60552
llvm-svn: 360022
2019-05-06 09:24:36 +00:00
Pengfei Wang b5d3430d3d [NFC] This is a test for the commit access.
Summary: Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>

Reviewers: LuoYuanke

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61580

llvm-svn: 360019
2019-05-06 08:31:18 +00:00
Sanjay Patel 5ab41a7a05 [CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try)
This is a subset of the original commit from rL359879
which was reverted because it could crash when using the 'RemovedInstructions'
structure that enables delayed deletion of dead instructions. The motivating
compile-time win does not require that change though. We should get most of
that win from this change alone.

Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359969
2019-05-04 12:46:32 +00:00
Stanislav Mekhanoshin 51d1415a16 AMDGPU] gfx1010 hazard recognizer
Differential Revision: https://reviews.llvm.org/D61536

llvm-svn: 359961
2019-05-04 04:30:57 +00:00
Stanislav Mekhanoshin 28a1936f6d [AMDGPU] gfx1010: use fmac instructions
Differential Revision: https://reviews.llvm.org/D61527

llvm-svn: 359959
2019-05-04 04:20:37 +00:00
Sanjay Patel dd2e91a181 [x86] add tests for fneg IR with undef; NFC
llvm-svn: 359941
2019-05-03 22:47:29 +00:00
Jessica Paquette 910630c1e4 [AArch64][GlobalISel] Use fcsel instead of csel for G_SELECT on FPRs
This saves us some unnecessary copies.

If the inputs to a G_SELECT are floating point, we should use fcsel rather than
csel.

Changes here are...

- Teach selectCopy about s1-to-s1 copies across register banks.
- AArch64RegisterBankInfo about G_SELECT in general.
- Teach the instruction selector about the FCSEL instructions.

Also add two tests:

- select-select.mir to show that we get the expected FCSEL
- regbank-select.mir (unfortunately named) to show the register banks on
G_SELECT are properly preserved

And update fast-isel-select.ll to show that we do the same thing as other
instruction selectors in these cases.

llvm-svn: 359940
2019-05-03 22:37:46 +00:00
Stanislav Mekhanoshin d9dcf392c7 [AMDGPU] gfx1010 wait count insertion
Differential Revision: https://reviews.llvm.org/D61534

llvm-svn: 359938
2019-05-03 21:53:53 +00:00
Stanislav Mekhanoshin 41bbe101a2 [AMDGPU] gfx1010 s_code_end generation
Also add some missing metadata in the streamer.

Differential Revision: https://reviews.llvm.org/D61531

llvm-svn: 359937
2019-05-03 21:26:39 +00:00
Mandeep Singh Grang 5dc8aeb26d [COFF, ARM64] Fix ABI implementation of struct returns
Summary:
Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values

Related clang patch: D60349

Reviewers: rnk, efriedma, TomTan, ssijaric

Reviewed By: rnk, efriedma

Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60348

llvm-svn: 359934
2019-05-03 21:12:36 +00:00
Matt Arsenault b6c599afd3 Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
This reverts commit r359912.

This should pass now, since the clang test was made less fragile in
r359918.

llvm-svn: 359919
2019-05-03 19:06:57 +00:00
Nico Weber bb852a9672 Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail.

llvm-svn: 359912
2019-05-03 18:08:03 +00:00
Evgeniy Stepanov 46ec57e576 Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block"
This reverts commit r359879, which introduced a compiler crash.

llvm-svn: 359908
2019-05-03 17:31:49 +00:00
Matt Arsenault daf2d653fa RegAllocFast: Add heuristic to detect values not live-out of a block
Add an improved/new heuristic to catch more cases when values are not
live out of a basic block.

Patch by Matthias Braun

llvm-svn: 359906
2019-05-03 17:03:24 +00:00
Matt Arsenault 657ef48a88 AMDGPU: Select VOP3 form of sub
The VOP3 form should always be the preferred selection form to be
shrunk later.

The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 359899
2019-05-03 15:37:07 +00:00
Matt Arsenault cfd0ca38b0 AMDGPU: Support shrinking add with FI in SIFoldOperands
Avoids test regression in a future patch

llvm-svn: 359898
2019-05-03 15:21:53 +00:00
Sanjay Patel d0336b1e3f [x86] add tests for fneg with undefs; NFC
This was originally part of D61419.

llvm-svn: 359896
2019-05-03 15:09:53 +00:00
Matt Arsenault ca7a582bf3 AMDGPU: Add baseline test for future patch
llvm-svn: 359893
2019-05-03 14:54:38 +00:00
Matt Arsenault 0446fbe45e AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.

Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.

llvm-svn: 359891
2019-05-03 14:40:10 +00:00
Simon Pilgrim 4d4f779fa2 [X86] Add X64 common prefixes and regenerate mul i64 tests
Noticed while reviewing D61472

llvm-svn: 359886
2019-05-03 14:07:38 +00:00
Matt Arsenault 6d0c59605c AMDGPU: Forgot to commit test file for r358890
llvm-svn: 359885
2019-05-03 13:55:40 +00:00
Matt Arsenault 2c8936fd26 AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.

llvm-svn: 359883
2019-05-03 13:42:56 +00:00
Matt Arsenault 2636460f0e AMDGPU: Fix test verification
This should run the verifier, and needs to enable trackRegLiveness.

llvm-svn: 359882
2019-05-03 13:42:55 +00:00
Sanjay Patel 8ff072e48e [CodeGenPrepare] limit overflow intrinsic matching to a single basic block
Using/updating a dominator tree to match math overflow patterns may be very
expensive in compile-time (because of the way CGP uses a DT), so just handle
the single-block case.

Also, we were restarting the iterator loops when doing the overflow intrinsic
transforms by marking the dominator tree for update. That was done to prevent
iterating over a removed instruction. But we can postpone the deletion using
the existing "RemovedInsts" structure, and that means we don't need to update
the DT.

See post-commit thread for rL354298 for more details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html

Differential Revision: https://reviews.llvm.org/D61075

llvm-svn: 359879
2019-05-03 13:09:18 +00:00
Anton Afanasyev 6d08b8dbae Revert "[MIR] Add simple PRE pass to MachineCSE"
This reverts commit 9c20156de3.
It breaks stage 2 of clang-ppc64be-linux-multistage.

llvm-svn: 359875
2019-05-03 12:36:22 +00:00
Anton Afanasyev 9c20156de3 [MIR] Add simple PRE pass to MachineCSE
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.

The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.

First step: https://reviews.llvm.org/D54839

Fixes llvm.org/PR38917

Reviewers: RKSimon

Subscribers: hfinkel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D56772

llvm-svn: 359870
2019-05-03 10:30:59 +00:00
Quentin Colombet c9256cc6ba [IRTranslator] Use the alloc size instead of the store size when translating allocas
We use to incorrectly use the store size instead of the alloc size when
creating the stack slot for allocas.
On aarch64 this can be demonstrated by allocating weirdly sized types.

For instance, in the added test case, we use an alloca for i19. We used
to allocate a slot of size 24-bit (19 rounded up to the next byte),
whereas we really want to use a full 32-bit slot for this type.

llvm-svn: 359856
2019-05-03 01:23:56 +00:00
Eli Friedman 0b61d220c9 [AArch64][Windows] Compute function length correctly in unwind tables.
The primary fix here is to WinException.cpp: we need to exclude jump
tables when computing the length of a function, or else we fail to
correctly compute the length. (We can only compute the number of bytes
consumed by certain assembler directives after the entire file is
parsed. ".p2align" is one of those directives, and is used by jump table
generation.)

The secondary fix, to MCWin64EH, is to make sure we don't silently
miscompile if we hit a similar situation in the future.

It's possible we could extend ARM64EmitUnwindInfo so it allows function
bodies that contain assembler directives, but that's a lot more
complicated; see the FIXME in MCWin64EH.cpp.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41581 .

Differential Revision: https://reviews.llvm.org/D61095

llvm-svn: 359849
2019-05-03 00:10:45 +00:00
Craig Topper e1e38d4248 [X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type
The default impementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar type integer types since the only legal mask types are vXi1. So we end up just getting whatever the first register class that contains the register. Currently this appears to be VK1, but its really dependent on the order tablegen outputs the register classes.

Some code in the caller ends up looking up the type for this register class and find v1i1 then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT which isn't legal. This bad any_extend sticks around until isel where it selects a MOVZX32rr8 with a v1i1 input or maybe a i8 input. Not sure but eventually we pick up a copy from VK1 to GR8 in MachineIR which isn't supported. This leads to a failure in physical register copying.

This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg.

Fixes PR41678

Differential Revision: https://reviews.llvm.org/D61453

llvm-svn: 359837
2019-05-02 22:26:40 +00:00
Sanjay Patel 1972826178 [DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try)
The original patch was committed at rL359398 and reverted at rL359695 because of
infinite looping.

This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.

Original commit message:

This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359793
2019-05-02 15:02:08 +00:00
Sanjay Patel 284472be6d [SelectionDAG] remove constant folding limitations based on FP exceptions
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.

There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.

Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.

Differential Revision: https://reviews.llvm.org/D61331

llvm-svn: 359791
2019-05-02 14:47:59 +00:00
Simon Pilgrim df8daf0ef4 [X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold
Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway.

Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result:
https://godbolt.org/z/0R-U-K

Differential Revision: https://reviews.llvm.org/D61426

llvm-svn: 359786
2019-05-02 14:00:55 +00:00
Diana Picus 06a61ccc42 [ARM GlobalISel] Select extensions to < 32 bits
Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in
the exact same way as 32 bits. This overwrites the higher bits, but that
should be ok since all legal users of types smaller than 32 bits ignore
those bits anyway.

llvm-svn: 359768
2019-05-02 09:28:00 +00:00
Diana Picus 7da389818d [ARM GlobalISel] Rename some inst selector tests. NFC
Prepare to add support for extensions to types smaller than 32 bits.

llvm-svn: 359767
2019-05-02 09:24:47 +00:00
Diana Picus 53bcf6f2e7 [ARM GlobalISel] Legalize extensions to < 32 bits
Make it legal to extend from e.g. s1 to s8 or s16.

llvm-svn: 359766
2019-05-02 09:21:46 +00:00
Stanislav Mekhanoshin 64399da8b8 [AMDGPU] gfx1010 lost VOP2 forms of some add/sub
Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.

Differential Revision:

llvm-svn: 359757
2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin 5cf8167735 [AMDGPU] gfx1010 allows VOP3 to have a literal
Differential Revision: https://reviews.llvm.org/D61413

llvm-svn: 359756
2019-05-02 04:01:39 +00:00
Jessica Paquette d010a3b63e Fix erroneous flag in GISel line for arm64-fast-isel-materialize.ll
Accidentally put a fast-isel-abort=2 instead of the GISel abort line.

This test doesn't actually fall back at all for GISel though, so remove the
fallback checks entirely.

llvm-svn: 359737
2019-05-01 22:50:11 +00:00
Jessica Paquette a3843fe6f4 [GlobalISel][AArch64] Use fmov for G_FCONSTANT when possible
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.

Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.

llvm-svn: 359734
2019-05-01 22:39:43 +00:00
Nikita Popov c89667db2c [AArch64] Add tests for bool vector reductions; NFC
Baseline tests for PR41635.

llvm-svn: 359723
2019-05-01 20:18:36 +00:00
Sanjay Patel 9f68614494 [PowerPC] add test that could infinite loop with reordered transforms; NFC
This is a slightly reduced version of the test from D61384.
Adding this as a preliminary step, so I can update D61149 with
the proposed fix.

llvm-svn: 359709
2019-05-01 17:34:30 +00:00
Simon Pilgrim 9f04d97cd7 [X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions
We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).

Differential Revision: https://reviews.llvm.org/D61263

llvm-svn: 359707
2019-05-01 17:13:35 +00:00
Stanislav Mekhanoshin 3b7925f035 [AMDGPU] gfx1010 GCNRegBankReassign pass
Reassign registers to reduce register bank conflicts.

Differential Revision: https://reviews.llvm.org/D61344

llvm-svn: 359704
2019-05-01 16:49:31 +00:00
Stanislav Mekhanoshin c29d491596 [AMDGPU] gfx1010 GCNNSAReassign pass
Convert NSA into non-NSA images.

Differential Revision: https://reviews.llvm.org/D61341

llvm-svn: 359700
2019-05-01 16:40:49 +00:00
Stanislav Mekhanoshin 692560dc98 [AMDGPU] gfx1010 MIMG implementation
Differential Revision: https://reviews.llvm.org/D61339

llvm-svn: 359698
2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin a224f68a10 [AMDGPU] gfx1010 DS implementation
Differential Revision: https://reviews.llvm.org/D61332

llvm-svn: 359696
2019-05-01 16:11:11 +00:00
Sanjay Patel 64d5751254 Revert "[DAGCombiner] try repeated fdiv divisor transform before building estimate"
This reverts commit fb9a5307a9 (rL359398)
because it can cause an infinite loop due to opposing combines.

llvm-svn: 359695
2019-05-01 16:06:21 +00:00
Simon Pilgrim 6711b9699a [X86][SSE] Add demanded elts support X86ISD::PMULDQ\PMULUDQ
Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode

llvm-svn: 359686
2019-05-01 14:50:50 +00:00
Simon Pilgrim 3d6899e369 [X86][SSE] Add SSE vector shift support to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359680
2019-05-01 13:51:09 +00:00
Simon Pilgrim ba372c6e62 [X86][SSE] Split 512-bit -> 128-bit vector directly in SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 359678
2019-05-01 12:48:42 +00:00
Simon Pilgrim 951a6b4579 [X86][SSE] Add 512-bit vector support to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359677
2019-05-01 12:37:41 +00:00
Simon Pilgrim 37c2419cc7 [X86][SSE] Add X86ISD::PACKSS\PACKUS to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359673
2019-05-01 11:29:36 +00:00
Simon Pilgrim 7244437050 [X86][SSE] Add scalar horizontal add/sub tests for element extractions from upper lanes
As suggested on D61263

llvm-svn: 359671
2019-05-01 11:17:11 +00:00
Simon Pilgrim 3353cee06c [X86][SSE] Add X86ISD::UNPCKL\UNPCK to SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359670
2019-05-01 11:08:03 +00:00
Simon Pilgrim f7b978a71b [X86][SSE] Move extract_subvector(pshufb) fold to SimplifyDemandedVectorEltsForTargetNode
This lets us hit more cases than combineExtractSubvector and allows us reuse more code.

llvm-svn: 359669
2019-05-01 10:58:38 +00:00
Simon Pilgrim 99eefe94b5 [X86][SSE] Extract i1 elements from vXi1 bool vectors
This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.

Differential Revision: https://reviews.llvm.org/D61189

llvm-svn: 359666
2019-05-01 10:02:22 +00:00
Fangrui Song 6afcdcf9ab [llvm-readobj] Change -t to --symbols in tests. NFC
-t is --symbols in llvm-readobj but --section-details (unimplemented) in readelf.
The confusing option should not be used since we aim for improving
compatibility.

Keep just one llvm-readobj -t use case in test/tools/llvm-readobj/symbols.test

llvm-svn: 359661
2019-05-01 09:28:24 +00:00
Fangrui Song e29e30b139 [llvm-readobj] Change -long-option to --long-option in tests. NFC
We use both -long-option and --long-option in tests. Switch to --long-option for consistency.

In the "llvm-readelf" mode, -long-option is discouraged as it conflicts with grouped short options and it is not accepted by GNU readelf.

While updating the tests, change llvm-readobj -s to llvm-readobj -S to reduce confusion ("s" is --section-headers in llvm-readobj but --symbols in llvm-readelf).

llvm-svn: 359649
2019-05-01 05:27:20 +00:00
Stanislav Mekhanoshin a6322941ff [AMDGPU] gfx1010 VMEM and SMEM implementation
Differential Revision: https://reviews.llvm.org/D61330

llvm-svn: 359621
2019-04-30 22:08:23 +00:00
Simon Pilgrim 07ab4e7db8 [X86][SSE] Fold extract_subvector(extend(x)) -> extend_vector_inreg(x)
This adds any extend support - folding to zero_extend_vector_inreg (PMOVZX) for legality

Minor improvement for PR39709

llvm-svn: 359608
2019-04-30 20:31:07 +00:00
Dan Gohman 3a7532e645 [WebAssembly] Support f16 libcalls
Add support for f16 libcalls in WebAssembly. This entails adding signatures
for the remaining F16 libcalls, and renaming gnu_f2h_ieee/gnu_h2f_ieee to
truncsfhf2/extendhfsf2 for consistency between f32 and f64/f128 (compiler-rt
already supports this).

Differential Revision: https://reviews.llvm.org/D61287

Reviewer: dschuff
llvm-svn: 359600
2019-04-30 19:17:59 +00:00
Sanjay Patel 3ec1c51716 [AArch64] add more tests for constant folding failures; NFC
llvm-svn: 359592
2019-04-30 18:15:18 +00:00
Craig Topper 3958719dda [X86] If PreprocessISelDAG reorders a load before a call, make sure we remove dead nodes from the graph
The reordering can leave at least a dead TokenFactor in the graph. This cause the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today so I can't say for sure that any of the other failures in that bug were caused by this issue.

This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.

Differential Revision: https://reviews.llvm.org/D61164

llvm-svn: 359582
2019-04-30 17:56:47 +00:00
Craig Topper 965d1306ae [X86] Initial cleanups on the FixupLEAs pass. Separate Atom LEA creation from other LEA optimizations.
This removes some of the class variables. Merge basic block processing into
runOnMachineFunction to keep the flags local.

Pass MachineBasicBlock around instead of an iterator. We can get the iterator in
the few places that need it. Allows a range-based outer for loop.

Separate the Atom optimization from the rest of the optimizations. This allows
fixupIncDec to create INC/DEC and still allow Atom to turn it back into LEA
when profitable by its heuristics.

I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index
register is equal to the destination register. This is profitable regardless of
the various slow flags. But again we would want Atom to be able to undo that.

Differential Revision: https://reviews.llvm.org/D60993

llvm-svn: 359581
2019-04-30 17:56:28 +00:00
Sanjay Patel 0387bf5269 [SelectionDAG] remove div-by-zero constant folding restriction
We don't have this restriction in IR, so it should not be here
either simply out of consistency. Code that wants to handle FP
exceptions is expected to use the 'strict' variants of these
nodes.

We don't get the frem case because frem by 0.0 produces NaN (invalid),
and that's the remaining check here (so the removed check for frem
was dead code AFAIK).

This is the only place in SDAG that uses "HasFPExceptions", so I
think we should remove that entirely as a follow-up patch.

llvm-svn: 359566
2019-04-30 14:37:15 +00:00
Sanjay Patel c16fd75e44 [AArch64] add tests for fdiv/frem constant folding (PR41668); NFC
llvm-svn: 359561
2019-04-30 13:43:17 +00:00
Simon Pilgrim 22641cc194 Fix for bug 41512: lower INSERT_VECTOR_ELT(ZeroVec, 0, Elt) to SCALAR_TO_VECTOR(Elt) for all SSE flavors
Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) insead of much simpler movd.
INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32.
So such inefficient lowering leads to significant performance digradations in ceratin cases switching from SSSE3 to SSE4.
https://bugs.llvm.org/show_bug.cgi?id=41512

Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable since latter is closer match to desired behavior and always efficiently lowered to movd and alike.

Committed on behalf of @Serge_Preis (Serge Preis)

Differential Revision: https://reviews.llvm.org/D60852

llvm-svn: 359545
2019-04-30 10:18:25 +00:00
Diana Picus 59a4c0481a [ARM GlobalISel] Widen small shift operands
The legalizer was already widening the shift amount. Add tests for that
behaviour, and also support widening the shifted value.

llvm-svn: 359542
2019-04-30 09:24:43 +00:00
Diana Picus 1e88ac213b [ARM GlobalISel] Be more careful about bailing out
Bail out on function arguments/returns with types aggregating an
unsupported type. This fixes cases where we would happily and
incorrectly lower functions taking e.g. [1 x i64] parameters, when we
don't even support plain i64 yet.

llvm-svn: 359540
2019-04-30 09:05:25 +00:00
Markus Lavin a475da36eb [DebugInfo] DW_OP_deref_size in PrologEpilogInserter.
The PrologEpilogInserter need to insert a DW_OP_deref_size before
prepending a memory location expression to an already implicit
expression to avoid having the existing expression act on the memory
address instead of the value behind it.

The reason for using DW_OP_deref_size and not plain DW_OP_deref is that
big-endian targets need to read the right size as simply truncating a
larger read would yield the wrong result (LSB bytes are not at the lower
address).

This re-commit fixes issues reported in the first one. Namely deref was
inserted under wrong conditions and additionally the deref_size argument
was incorrectly encoded.

Differential Revision: https://reviews.llvm.org/D59687

llvm-svn: 359535
2019-04-30 07:58:57 +00:00
Kang Zhang d43b66b318 [NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll
Summary:
When checking the same output, we can use the `-check-prefixes` to simplify the check.
For example, if we want to check below output.
```
; GENERIC-LABEL: .globl  foo
; BASIC-LABEL: .globl  foo
; PWR-LABEL: .globl  foo
; GENERIC: .p2align  2
; BASIC: .p2align  4
; PWR: .p2align  4
; GENERIC: @foo
; BASIC: @foo
; PWR: @foo

```
If we use `-check-prefixes`
```
... -check-prefixes=CHECK,GENERAL
... -check-prefixes=CHECK,BASIC
... -check-prefixes=CHECK,PWR
```
Above check can be simplify to:
```
; CHECK-LABEL: .globl  foo
; GENERIC: .p2align  2
; BASIC: .p2align  4
; PWR: .p2align  4
; CHECK: @foo
```

Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D61227

llvm-svn: 359533
2019-04-30 03:39:05 +00:00
Zi Xuan Wu 49d60fdc2e [DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combine ISD::TRUNC node
Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry), 
if adde is not legal for the target. Even it's at type-legalize phase. 
Because adde is special and will not be legalized at operation-legalize phase later.

This fixes: PR40922
https://bugs.llvm.org/show_bug.cgi?id=40922

Differential Revision: https://reviews.llvm.org//D60854

llvm-svn: 359532
2019-04-30 03:01:14 +00:00
Ahsan Saghir 3962d6da17 Add __builtin_dcbf support for PPC
Summary:
This patch adds support for __builtin_dcbf for PPC.

__builtin_dcbf copies the contents of a modified block from the data cache
to main memory and flushes the copy from the data cache.

Differential revision: https://reviews.llvm.org/D59843

llvm-svn: 359517
2019-04-29 23:25:33 +00:00
Dan Gohman 8306cb5702 [WebAssembly] Define the signature for __stack_chk_fail
The WebAssembly backend needs to know the signatures of all runtime
libcall functions. This adds the signature for __stack_chk_fail which was
previously missing.

Also, make the error message for a missing libcall include the name of
the function.

Differential Revision: https://reviews.llvm.org/D59521

Reviewed By: sbc100

llvm-svn: 359505
2019-04-29 21:09:44 +00:00
Roland Froese 728e139700 [PowerPC] Try harder to avoid load/move-to VSR for partial vector loads
Change the PPCISelLowering.cpp function that decides to avoid update form in
favor of partial vector loads to know about newer load types and to not be
confused by the chain operand.

Differential Revision: https://reviews.llvm.org/D60102

llvm-svn: 359504
2019-04-29 21:08:35 +00:00
Jessica Paquette 7f6fe7c02c [GlobalISel][AArch64] Select llvm.aarch64.crypto.sha1h
This was falling back and gives us a reason to create a selectIntrinsic function
which we would need eventually anyway. Update arm64-crypto.ll to show that we
correctly select it.

Also factor out the code for finding an intrinsic ID.

llvm-svn: 359501
2019-04-29 20:58:17 +00:00
Martin Storsjo c0d138d147 [X86] Run CFIInstrInserter on Windows if Dwarf is used
This is necessary since SVN r330706, as tail merging can include
CFI instructions since then.

This fixes PR40322 and PR40012.

Differential Revision: https://reviews.llvm.org/D61252

llvm-svn: 359496
2019-04-29 20:25:51 +00:00
Simon Pilgrim 028485d7b9 [X86][SSE] isHorizontalBinOp - add support for target shuffles
Add target shuffle decoding to isHorizontalBinOp as well as ISD::VECTOR_SHUFFLE support.

This does mean we can go through bitcasts so we need to bitcast the extracted args to ensure they are the correct type

Fixes PR39936 and should help with PR39920/PR39921

Differential Revision: https://reviews.llvm.org/D61245

llvm-svn: 359491
2019-04-29 19:52:59 +00:00
Don Hinton 54dbcfe5f0 Fix additional cases of more that two dashes for options in tests.
llvm-svn: 359484
2019-04-29 18:58:52 +00:00
Bjorn Pettersson 820994572c [DAG] Refactor DAGCombiner::ReassociateOps
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.

Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.

Reviewers: spatel, craig.topper, tstellar

Reviewed By: spatel

Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61199

llvm-svn: 359476
2019-04-29 17:50:10 +00:00
Kevin P. Neal a25c928302 Add AVX support to this test.
Requested by Craig Topper and Andrew Kaylor as part of D55897.

llvm-svn: 359461
2019-04-29 16:06:04 +00:00
Simon Pilgrim 9d4ed24f25 [X86][SSE] Add scalar horizontal add/sub tests for non-0/1 element extractions
llvm-svn: 359454
2019-04-29 14:26:27 +00:00
Simon Pilgrim c570b2a2e5 [X86][SSE] Moved haddps test from phaddsub.ll to haddsub.ll (D61245)
Also merged duplicate PR39921 + PR39936 tests

llvm-svn: 359437
2019-04-29 11:30:47 +00:00
Diogo N. Sampaio d95abb170b [ARM] Add bitcast/extract_subvec. of fp16 vectors
Summary:
This patch adds some basic operations for fp16
vectors, such as bitcast from fp16 to i16,
required to perform extract_subvector (also added
here) and extract_element.

Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard

Reviewed By: ostannard

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60618

llvm-svn: 359433
2019-04-29 10:28:07 +00:00
Diogo N. Sampaio 2078eb745d [ARM] Add v4f16 and v8f16 types to the CallingConv
Summary:
The Procedure Call Standard for the Arm Architecture
states that float16x4_t and float16x8_t behave just
as uint16x4_t and uint16x8_t for argument passing.
This patch adds the fp16 vectors to the
ARMCallingConv.td file.

Reviewers: miyuki, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60720

llvm-svn: 359431
2019-04-29 10:10:37 +00:00
Simon Pilgrim 65f12f66f6 [X86] Add PR39921 HADD pairwise reduction test and AVX2 test coverage
llvm-svn: 359409
2019-04-28 21:04:47 +00:00
Simon Pilgrim 85bacd0f95 [X86][AVX] Add fast-hops target for add/fadd reduction tests
llvm-svn: 359408
2019-04-28 20:04:08 +00:00
Simon Pilgrim e375257e95 [X86] Add PR39936 HADD Tests
llvm-svn: 359407
2019-04-28 20:03:11 +00:00
Simon Pilgrim d394195221 [X86][AVX] Enabled AVX512F tests and add PR40815 test case
llvm-svn: 359401
2019-04-28 15:04:30 +00:00
Simon Pilgrim 22d1476bfa [X86][AVX] Combine non-lane crossing binary shuffles using X86ISD::VPERMV3
Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results.

llvm-svn: 359400
2019-04-28 14:31:01 +00:00
Sanjay Patel ce8cfe96f7 [SelectionDAG] include FP min/max variants as binary operators
The x86 test diffs don't look great because of extra move ops,
but FP min/max should clearly be included in the list.

llvm-svn: 359399
2019-04-28 13:19:29 +00:00
Sanjay Patel fb9a5307a9 [DAGCombiner] try repeated fdiv divisor transform before building estimate
This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359398
2019-04-28 12:23:43 +00:00
Simon Pilgrim 93ad48210c [X86][SSE] Optimize llvm.experimental.vector.reduce.xor.vXi1 parity reduction (PR38840)
An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0.

Differential Revision: https://reviews.llvm.org/D61230

llvm-svn: 359396
2019-04-28 10:46:17 +00:00
Simon Pilgrim fed302ae37 [X86][AVX] Add AVX512DQ coverage for masked memory ops tests (PR34584)
llvm-svn: 359395
2019-04-28 10:02:34 +00:00
Craig Topper bd35a30940 [X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead
Summary:
The register form of these instructions are CodeGenOnly instructions that cover
GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for
the opposite bitcast. Due to the patterns using bitcasts these instructions get
marked as "bitcast" machine instructions as well. The peephole pass is able to
look through these as well as other copies to try to avoid register bank copies.

Because FR32/FR64/VR128 are all coalescable to each other we can end up in a
situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to
GR32->GR64 which the copyPhysReg code can't handle.

To prevent this, this patch removes one set of the 'bitcast' instructions. So
now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that
converts from GR32/GR64->VR128 has no special significance to the peephole pass
and won't be looked through.

I guess the other option would be to add support to copyPhysReg to just promote
the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined
anyway. But removing the CodeGenOnly instruction in favor of one that won't be
optimized seemed safer.

I deleted the peephole test because it couldn't be made to work with the bitcast
instructions removed.

The load version of the instructions were unnecessary as the pattern that selects
them contains a bitcasted load which should never happen.

Fixes PR41619.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61223

llvm-svn: 359392
2019-04-28 06:25:33 +00:00
Simon Pilgrim 03c4e2663c Revert rL359389: [X86][SSE] Add support for <64 x i1> bool reduction
Minor generalization of the existing <32 x i1> pre-AVX2 split code.
........
Causing irregular buildbot failures.

llvm-svn: 359391
2019-04-27 20:44:08 +00:00
Simon Pilgrim 1a4a43250e [X86][AVX] Add additional SSE/AVX expandload and compressstore targets
llvm-svn: 359390
2019-04-27 20:20:02 +00:00
Simon Pilgrim 4118be3af6 [X86][SSE] Add support for <64 x i1> bool reduction
Minor generalization of the existing <32 x i1> pre-AVX2 split code.

llvm-svn: 359389
2019-04-27 20:04:44 +00:00
Simon Pilgrim 399746eaf6 [X86][AVX] Cleanup and add additional expandload and compressstore tests
sort order by types and add vXi32/vXi16/vXi8 test coverage

llvm-svn: 359388
2019-04-27 19:57:34 +00:00
Simon Pilgrim 2a2d422400 [X86][AVX512] Improve vector bool reductions
As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern.

llvm-svn: 359386
2019-04-27 17:32:46 +00:00
Simon Pilgrim 913bfd3363 [X86] Add vector boolean reduction tests (PR38840)
AND/OR/XOR tests for the @llvm.experimental.vector.reduce intrinsics

AND/OR are pretty good (pre-AVX512), XOR (not so common but used for parity reduction) is still pretty bad.

llvm-svn: 359385
2019-04-27 16:49:54 +00:00
Simon Pilgrim 5cf616530a Fix check-prefixes typo
llvm-svn: 359382
2019-04-27 15:41:14 +00:00
Simon Pilgrim 3879b2cd45 [X86][SSE] Add initial test case for subvector insert/extract of illegal types
Suggested by @nikic on D59188

llvm-svn: 359379
2019-04-27 15:30:06 +00:00
Simon Pilgrim acc1e6d1c6 [X86][AVX] Merge mask select with shuffles across extract_subvector (PR40332)
Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector.

We can extend this in the future to handle more opcodes and non-zero selections.

llvm-svn: 359378
2019-04-27 13:35:32 +00:00
Craig Topper 063b471ff7 [X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled
Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store.

Reviewers: spatel, RKSimon, reames, jfb, efriedma

Reviewed By: RKSimon

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60546

llvm-svn: 359368
2019-04-27 03:38:15 +00:00
Mark Searles 76c5b62988 Revert "AMDGPU: Split block for si_end_cf"
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.

We discovered some internal test failures, so reverting for now.

Differential Revision: https://reviews.llvm.org/D61213

llvm-svn: 359363
2019-04-27 00:51:18 +00:00
Jessica Paquette 76f64b665b [GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for extracts
getConstantVRegValWithLookThrough does the same thing as the
getConstantValueForReg function, and has more visibility across GISel. Plus, it
supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code
reuse and more functionality for free by using it.

Add some test cases to select-extract-vector-elt.mir to show that we can now
look through those instructions.

llvm-svn: 359351
2019-04-26 21:53:13 +00:00
Nick Desaulniers 7ab164c4a4 [AsmPrinter] refactor to support %c w/ GlobalAddress'
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.

Refactors a few subclasses to support the target independent %a, %c, and
%n.

The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.

It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.

Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449

Reviewers: echristo, void

Reviewed By: void

Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60887

llvm-svn: 359337
2019-04-26 18:45:04 +00:00
Simon Pilgrim 27e01e675c [X86][AVX] Fold extract_subvector(broadcast(x)) -> broadcast(x) iff x has one use
llvm-svn: 359332
2019-04-26 18:02:14 +00:00
Jessica Paquette 67ab9eb193 [AArch64][GlobalISel] Select G_BSWAP for vectors of s32 and s64
There are instructions for these, so mark them as legal. Select the correct
instruction in AArch64InstructionSelector.cpp.

Update select-bswap.mir and arm64-rev.ll to reflect the changes.

llvm-svn: 359331
2019-04-26 18:00:01 +00:00
Stanislav Mekhanoshin 61beff020e [AMDGPU] gfx1010 VOP3 and VOP3P implementation
Differential Revision: https://reviews.llvm.org/D61202

llvm-svn: 359328
2019-04-26 17:56:03 +00:00
Stanislav Mekhanoshin 8f3da70eed [AMDGPU] gfx1010 VOP2 changes
Differential Revision: https://reviews.llvm.org/D61156

llvm-svn: 359316
2019-04-26 16:37:51 +00:00
Sanjay Patel 8224bc081c [x86] add tests for fmin/fmax; NFC
'maximum' and 'minimum' still crash, so they are commented out.

llvm-svn: 359306
2019-04-26 13:36:37 +00:00
Simon Pilgrim 5d6ef94c36 [X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758)
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).

Differential Revision: https://reviews.llvm.org/D61068

llvm-svn: 359293
2019-04-26 10:49:13 +00:00
Simon Pilgrim 5e161df9f8 [X86][AVX] Combine shuffles extracted from a common vector
A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead. Fixes one of the test cases from PR34380.

Differential Revision: https://reviews.llvm.org/D60512

llvm-svn: 359292
2019-04-26 09:56:14 +00:00
Hans Wennborg 5d5ee4aff7 Fix alignment in AArch64InstructionSelector::emitConstantPoolEntry()
The code was using the alignment of a pointer to the value, not the
alignment of the constant itself.

Maybe we got away with it so far because the pointer alignment is
fairly high, but we did end up under-aligning <16 x i8> vectors,
which was caught in the Chromium build after lld stopped over-aligning
the .rodata.cst16 section in r356428. (See crbug.com/953815)

Differential revision: https://reviews.llvm.org/D61124

llvm-svn: 359287
2019-04-26 08:31:00 +00:00
Artem Belevich 16737538f4 PTX 6.3 extends `wmma` instruction to support s8/u8/s4/u4/b1 -> s32.
All of the new instructions are still handled mostly by tablegen. I've slightly
refactored the code to drive intrinsic/instruction generation from a master
list of supported variants, so all irregularities have to be implemented in one place only.

The test generation script wmma.py has been refactored in a similar way.

Differential Revision: https://reviews.llvm.org/D60015

llvm-svn: 359247
2019-04-25 22:27:57 +00:00
Artem Belevich 8d825b38ed [NVPTX] generate correct MMA instruction mnemonics with PTX63+.
PTX 6.3 requires using ".aligned" in the MMA instruction names.
In order to generate correct name, now we pass current
PTX version to each instruction as an extra constant operand
and InstPrinter adjusts its output accordingly.

Differential Revision: https://reviews.llvm.org/D59393

llvm-svn: 359246
2019-04-25 22:27:46 +00:00
Sanjay Patel 7a2718181e [x86] add tests for vector fdiv reciprocal estimate; NFC
llvm-svn: 359238
2019-04-25 20:35:47 +00:00
Jessica Paquette f54258c888 [GlobalISel][AArch64] Make G_EXTRACT_VECTOR_ELT legal for v8s16s
This case was missing before, so we couldn't legalize it.

Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir.

llvm-svn: 359231
2019-04-25 20:00:57 +00:00
Jessica Paquette 8184b6e7f6 [GlobalISel][AArch64] Add generic legalization rule for extends
This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows
extends whenever the types will fit in registers (or the source is an s1).

Update tests. Add GISel checks throughout all of arm64-vabs.ll,
where we now select a good portion of the code. Add GISel checks to
arm64-subvector-extend.ll, which has a good number of vector extends in it.

Differential Revision: https://reviews.llvm.org/D60889

llvm-svn: 359222
2019-04-25 18:42:00 +00:00
Craig Topper f9c30eddd0 [SelectionDAG][X86] Use stack load/store in PromoteIntRes_BITCAST when the input needs to be be split and the output type is a vector.
We had special case handling here, but it uses a scalar any_extend for the
promotion then bitcasts to the final type. This won't split up the input data
into multiple promoted elements like we need.

This patch falls back to doing the conversion through memory.

Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll
changes. The changes to vector-half-conversions.ll are fixing a previously
unknown miscompile from this issue.

Differential Revision: https://reviews.llvm.org/D61114

llvm-svn: 359219
2019-04-25 18:19:59 +00:00
Jessica Paquette ba55767f51 [GlobalISel][AArch64] Legalize G_FNEARBYINT
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.

Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.

llvm-svn: 359204
2019-04-25 16:44:40 +00:00
Jessica Paquette bd7ac30b15 [GlobalISel] Add IRTranslator support for G_FNEARBYINT
Translate llvm.nearbyint into G_FNEARBYINT as a simple intrinsic. Update
arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D60922

llvm-svn: 359203
2019-04-25 16:39:28 +00:00
Jessica Paquette f13b6a74ce [GlobalISel] Add a G_FNEARBYINT opcode
For eventually selecting llvm.nearbyint. Equivalent to the SelectionDAG
nearbyint node.

Update legalizer-info-validation.mir.

Differential Revision: https://reviews.llvm.org/D60921

llvm-svn: 359201
2019-04-25 16:36:03 +00:00
Simon Pilgrim 0a7d1b3ce1 [X86][SSE] combineBitcastvxi1 - add support for bitcasting to non-scalar integers
Truncate the movmsk scalar integer result to the equivalent scalar integer width as before but then bitcast to the requested type.

We still have the issue identified in PR41594 but D61114 should handle this.

llvm-svn: 359176
2019-04-25 09:34:36 +00:00
Simon Atanasyan a0291110da [MIPS] Use custom bitcast lowering to avoid excessive instructions
On Mips32r2 bitcast can be expanded to two sw instructions and an ldc1
when using bitcast i64 to double or an sdc1 and two lw instructions when
using bitcast double to i64. By introducing custom lowering that uses
mtc1/mthc1 we can avoid excessive instructions.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D61069

llvm-svn: 359171
2019-04-25 07:47:28 +00:00
Alina Sbirlea 733c8c40c8 Enable LoopVectorization by default.
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.

Reviewers: chandlerc, jgorbe

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61091

llvm-svn: 359167
2019-04-25 04:49:48 +00:00
Amy Huang 68c9199493 Recommitting r358783 and r358786 "[MS] Emit S_HEAPALLOCSITE debug info" with fixes for buildbot error (undefined assembler label).
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D61083

llvm-svn: 359149
2019-04-24 23:02:48 +00:00
Sanjay Patel 6f41bf948b [DAGCombiner] scale repeated FP divisor by splat factor
If we have a vector FP division with a splatted divisor, use the existing transform
that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then
potentially be converted into a scalar FP division by existing combines (rL358984)
as seen in the tests here.

That can be a potentially big perf difference if scalar fdiv has better timing
(including avoiding possible frequency throttling for vector ops).

Differential Revision: https://reviews.llvm.org/D61028

llvm-svn: 359147
2019-04-24 22:28:58 +00:00
Joerg Sonnenberger 8372b467f1 [PowerPC] Allow using initial-exec TLS with PIC
Using initial-exec TLS variables is a reasonable performance
optimisation for system libraries. Use the correct PIC mechanism to get
hold of the GOT to avoid text relocations.

Differential Revision: https://reviews.llvm.org/D61026

llvm-svn: 359146
2019-04-24 22:12:22 +00:00
Craig Topper af194e9380 [X86] Prevent folding a load into an AND if that AND is really a ZEXT_INREG that should use movzx.
This can save a 32-bit immediate move.

We would shrink the load and fold it if it was non-volatile, but that's trickier to check for.

llvm-svn: 359129
2019-04-24 19:28:38 +00:00
Craig Topper 882ca6d484 [X86] Remove dead nodes left after ReplaceAllUsesWith calls during address matching
ReplaceAllUsesWith doesn't remove the node that was replaced. So its left around in the graph messing up use counts on other nodes.

One thing to note, is that this isn't valid if the node being deleted is the root node of an LEA match that gets rejected. In that case the node needs to stay alive because the isel table walking code would still have a reference to it that its going to try to match next. I don't think that's the case here though because the nodes being deleted here should be "and", "srl", and "zero_extend" none of which can be the root node of an LEA match.

Differential Revision: https://reviews.llvm.org/D61048

llvm-svn: 359121
2019-04-24 18:02:07 +00:00
Simon Pilgrim 10daecba1d [X86][SSE] Add tests for bitcasting vXi1 bool vectors to non-simple types.
llvm-svn: 359116
2019-04-24 17:25:45 +00:00
Stanislav Mekhanoshin cee607e414 [AMDGPU] Add gfx1010 target definitions
Differential Revision: https://reviews.llvm.org/D61041

llvm-svn: 359113
2019-04-24 17:03:15 +00:00
Sanjay Patel b1b3368907 [x86] make sure horizontal op and broadcast types match to simplify (PR41414)
If the types don't match, we can't just remove the shuffle.
There may be some other opportunity for optimization here,
but this should prevent the crashing seen in:
https://bugs.llvm.org/show_bug.cgi?id=41414

llvm-svn: 359095
2019-04-24 14:05:08 +00:00
Simon Pilgrim 039a563e6a [X86][SSE] Add masked bit test cases for PR26697
llvm-svn: 359082
2019-04-24 10:34:15 +00:00
Francis Visoiu Mistrih 7fee2b89fd [Remarks] Add string deduplication using a string table
* Add support for uniquing strings in the remark streamer and emitting the string table in the remarks section.

* Add parsing support for the string table in the RemarkParser.

From this remark:

```
--- !Missed
Pass:     inline
Name:     NoDefinition
DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
            Line: 7, Column: 3 }
Function: printArgsNoRet
Args:
  - Callee:   printf
  - String:   ' will not be inlined into '
  - Caller:   printArgsNoRet
    DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
                Line: 6, Column: 0 }
  - String:   ' because its definition is unavailable'
...
```

to:

```
--- !Missed
Pass: 0
Name: 1
DebugLoc: { File: 3, Line: 7, Column: 3 }
Function: 2
Args:
  - Callee:   4
  - String:   5
  - Caller:   2
    DebugLoc: { File: 3, Line: 6, Column: 0 }
  - String:   6
...
```

And the string table in the .remarks/__remarks section containing:

```
inline\0NoDefinition\0printArgsNoRet\0
test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c\0printf\0
will not be inlined into \0 because its definition is unavailable\0
```

This is mostly supposed to be used for testing purposes, but it gives us
a 2x reduction in the remark size, and is an incremental change for the
updates to the remarks file format.

Differential Revision: https://reviews.llvm.org/D60227

llvm-svn: 359050
2019-04-24 00:06:24 +00:00
Jessica Paquette 4fe7574d5d [AArch64][GlobalISel] Select G_INTRINSIC_ROUND
Add selection support for G_INTRINSIC_ROUND, add a selection test, and add
check lines to arm64-vfloatintrinsics.ll and f16-instructions.ll.

llvm-svn: 359046
2019-04-23 23:03:03 +00:00
Jessica Paquette 9766bf1854 [AArch64][GlobalISel] Mark G_INTRINSIC_ROUND as a pre-isel floating point opcode
Add G_INTRINSIC_ROUND to isPreISelGenericFloatingPointOpcode to ensure that its
input and output are assigned the correct register bank.

Add a regbankselect test to verify that we get what we expect here.

llvm-svn: 359044
2019-04-23 22:47:00 +00:00
Francis Visoiu Mistrih 1646851b87 [CGP] Look through bitcasts when duplicating returns for tail calls
The simple case of:

```
int *callee();
void *caller(void *a) {
  if (a == NULL)
    return callee();
  return a;
}
```

would generate a regular call instead of a tail call because we don't
look through the bitcast of the call to `callee` when duplicating the
return blocks.

Differential Revision: https://reviews.llvm.org/D60837

llvm-svn: 359041
2019-04-23 21:57:46 +00:00
Francis Visoiu Mistrih eea9da5921 [X86] Add codegen prepare test exercising a bitcast + tail call
In preparation of https://reviews.llvm.org/D60837, add this test where
we don't perform a tail call because we don't look through a bitcast.

llvm-svn: 359040
2019-04-23 21:57:43 +00:00
Heejin Ahn b9f282d384 [WebAssembly] Emit br_table for most switch instructions
Summary:
Always convert switches to br_tables unless there is only one case,
which is equivalent to a simple branch. This reduces code size for wasm,
and we defer possible jump table optimizations to the VM.
Addresses PR41502.

Reviewers: kripken, sunfish

Subscribers: dschuff, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60966

llvm-svn: 359038
2019-04-23 21:30:30 +00:00
Heejin Ahn ace7a086ca [WebAssembly] Make LBB markers not affected by test order
Summary:
This way we can change the order of tests or delete some of them without
affecting tests for other functions.

Reviewers: tlively

Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60929

llvm-svn: 359036
2019-04-23 21:17:03 +00:00
Amy Huang fc79ab9857 Revert "[MS] Emit S_HEAPALLOCSITE debug info" because of ToTWin64(db)
buildbot failure.

This reverts commit d07d6d6177 and
c774f687b6.

llvm-svn: 359034
2019-04-23 21:12:58 +00:00
Jessica Paquette 3cc6d1f542 [AArch64][GlobalISel] Legalize G_INTRINSIC_ROUND
Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing
switch case to AArch64LegalizerInfo.cpp.

llvm-svn: 359033
2019-04-23 21:11:57 +00:00
Craig Topper 26518466ef [X86] Autogenerate complete checks. NFC
Prep for D60993

llvm-svn: 359031
2019-04-23 20:52:00 +00:00
Jessica Paquette 991cb39242 [AArch64][GlobalISel] Actually select G_INTRINSIC_TRUNC
Apparently FileCheck wasn't actually matching the fallback check lines in
arm64-vfloatintrinsics.ll properly. So, there were selection fallbacks for
G_INTRINSIC_TRUNC there.

Actually hook it up into AArch64InstructionSelector.cpp and write a proper
selection test.

I guess I'll figure out the FileCheck magic to make the fallback checks work
properly in arm64-vfloatintrinsics.ll.

llvm-svn: 359030
2019-04-23 20:46:19 +00:00
Jessica Paquette ede0b2e695 [AArch64][GlobalISel] Teach regbankselect about G_INTRINSIC_TRUNC
Add it to isPreISelGenericFloatingPointOpcode, and add a regbankselect test.

Update arm64-vfloatintrinsics.ll now that we can select it.

llvm-svn: 359022
2019-04-23 18:20:47 +00:00
Jessica Paquette 56342642a0 [AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNC
Same patch as G_FCEIL etc.

Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct
rule in AArch64LegalizerInfo.cpp, and add a test.

llvm-svn: 359021
2019-04-23 18:20:44 +00:00
Stanislav Mekhanoshin c464dddccb [AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cpp
The second argument is flags, not subreg.

Differential Revision: https://reviews.llvm.org/D61031

llvm-svn: 359017
2019-04-23 17:59:26 +00:00
Jessica Paquette df5ce782ad [AArch64][GlobalISel] Legalize G_FMA for more vector types
Same as G_FCEIL, G_FABS, etc. Just move it into that rule.

Add a legalizer test for G_FMA, which we didn't have before and update
arm64-vfloatintrinsics.ll.

llvm-svn: 359015
2019-04-23 17:37:56 +00:00
Jessica Paquette e50e6d2563 [AArch64][GlobalISel] Add G_FMA to isPreISelGenericFloatingPointOpcode
Noticed an unnecessary fallback in arm64-vmul caused by this.

Also add a regbankselect test for G_FMA.

llvm-svn: 359013
2019-04-23 17:17:06 +00:00
Sanjay Patel 7c0bd5a27c [x86] fix test checks for fdiv combine; NFC
Must have picked up some transient code changes when originally generating this.

llvm-svn: 359008
2019-04-23 16:31:30 +00:00
Sanjay Patel 171b74e31c [x86] add tests for vector fdiv with splat divisor; NFC
llvm-svn: 359006
2019-04-23 16:16:16 +00:00
Sanjay Patel 12a561fa1b [x86] use psubus for more vsetcc lowering (PR39859)
Circling back to a leftover bit from PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859#c1

...we have this counter-intuitive (based on the test diffs) opportunity to use 'psubus'.
This appears to be the better perf option for both Haswell and Jaguar based on llvm-mca.
We already do this transform for the SETULT predicate, so this makes the code more
symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect
anything but pre-SSE4.1 subtargets.

  $ cat before.s
	movdqa	-16(%rip), %xmm2    ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
	pxor	%xmm0, %xmm2
	pcmpgtw	-32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255]
	pand	%xmm2, %xmm0
	pandn	%xmm1, %xmm2
	por	%xmm2, %xmm0

  $ cat after.s
	movdqa	-16(%rip), %xmm2    ## xmm2 = [256,256,256,256,256,256,256,256]
	psubusw	%xmm0, %xmm2
	pxor	%xmm3, %xmm3
	pcmpeqw	%xmm2, %xmm3
	pand	%xmm3, %xmm0
	pandn	%xmm1, %xmm3
	por	%xmm3, %xmm0

  $ llvm-mca before.s -mcpu=haswell
  Iterations:        100
  Instructions:      600
  Total Cycles:      909
  Total uOps:        700

  Dispatch Width:    4
  uOps Per Cycle:    0.77
  IPC:               0.66
  Block RThroughput: 1.8

  $ llvm-mca after.s -mcpu=haswell
  Iterations:        100
  Instructions:      700
  Total Cycles:      409
  Total uOps:        700

  Dispatch Width:    4
  uOps Per Cycle:    1.71
  IPC:               1.71
  Block RThroughput: 1.8

Differential Revision: https://reviews.llvm.org/D60838

llvm-svn: 358999
2019-04-23 15:20:17 +00:00
Joerg Sonnenberger 6e7cc49d5c [SPARC] Use the correct register set for the "r" asm constraint.
64bit mode must use 64bit registers, otherwise assumptions about the top
half of the registers are made. Problem found by Takeshi Nakayama in
NetBSD.

llvm-svn: 358998
2019-04-23 15:15:33 +00:00