Commit Graph

27229 Commits

Author SHA1 Message Date
Sanjin Sijaric b694030647 [ARM64][Windows] Share unwind codes between epilogues
There are cases where we have multiple epilogues that have the exact same unwind
code sequence.  In that case, the epilogues can share the same unwind codes in
the .xdata section.  This should get us past the assert "SEH unwind data
splitting not yet implemented" in many cases.

We still need to add support for generating multiple .pdata/.xdata sections for
those functions that need to be split into fragments.

Differential Revision: https://reviews.llvm.org/D56813

llvm-svn: 351421
2019-01-17 09:45:17 +00:00
Thomas Lively cbda16eb8e [WebAssembly] Parse llvm.ident into producers section
llvm-svn: 351413
2019-01-17 02:29:55 +00:00
Thomas Lively 3cfcc94c09 Revert "[WebAssembly] Parse llvm.ident into producers section"
This reverts commit eccdbba3a02a33e13b5262e92200a33e2ead873d.

llvm-svn: 351410
2019-01-17 00:39:49 +00:00
Sanjin Sijaric 685565ae9a [SEH] [ARM64] Retrieve the frame pointer from SEH funclets
The Windows ARM64 runtime passes the establisher frame to funclets as the first
argument.

llvm-svn: 351404
2019-01-17 00:24:38 +00:00
Thomas Lively a56c23c5ba [WebAssembly] Parse llvm.ident into producers section
Summary:
Everything before the word "version" is the tool, and everything after
the word "version" is the version.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56742

llvm-svn: 351399
2019-01-16 23:46:14 +00:00
Craig Topper 59abdf5f3f [X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv
Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out of range shift amount as undefined behavior and will constant fold to undef. While the intrinsics are defined to return 0 for out of range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits.

This was previously believed safe because undefs frequently get turned into 0 either from the constant pool or a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, its ignored in detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in bounds shift amounts are the same, we might fold it to single element broadcast from the constant pool. This would not put 0s in the elements with out of bounds shift amounts.

We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change.

Patch by zhutianyang and Craig Topper

Differential Revision: https://reviews.llvm.org/D56695

llvm-svn: 351381
2019-01-16 21:46:32 +00:00
Changpeng Fang fe9269f804 AMDGPU: Adjust the chain for loads writing to the HI part of a register.
Summary:
  For these loads that write to the HI part of a register, we should chain them to the op that writes to the LO part
of the register to maintain the appropriate order.

Reviewers:
  rampitec, arsenm

Differential Revision:
  https://reviews.llvm.org/D56454

llvm-svn: 351379
2019-01-16 21:32:53 +00:00
Craig Topper e5b7cc8aa0 [X86] Add a one use check to the setcc inversion code in combineVSelectWithAllOnesOrZeros
If we're going to generate a new inverted setcc, we should make sure we will be able to remove the old setcc.

Differential Revision: https://reviews.llvm.org/D56765

llvm-svn: 351378
2019-01-16 21:29:29 +00:00
Craig Topper 238ad13b3a [X86] Add test case for D56765. NFC
llvm-svn: 351377
2019-01-16 21:29:26 +00:00
Nikita Popov 5fa75a0488 [X86] Add additional saturating add/sub vector tests; NFC
Additional tests for vNi32 and vNi64. I've added these for
usub.sat before, this covers uadd.sat, ssub.sat and sadd.sat.

llvm-svn: 351375
2019-01-16 20:53:23 +00:00
Mandeep Singh Grang 33c49c0c82 [COFF, ARM64] Implement support for SEH extensions __try/__except/__finally
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.

We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.

Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric

Reviewed By: rnk, efriedma

Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53540

llvm-svn: 351370
2019-01-16 19:52:59 +00:00
Andrea Di Biagio c5f0f5309e [X86][BtVer2] Update latency of horizontal operations.
On Jaguar, horizontal adds/subs have local forwarding disable.
That means, we pay a compulsory extra cycle of write-back stage, and the value
is not available until the end of that stage.

This patch changes the latency of horizontal operations by adding an extra
cycle. With this patch, latency numbers now match what is reported by perf.

I plan to send another patch to also 'fix' the latency of shuffle operations (on
Jaguar, local forwarding is disabled for vector shuffles too).

Differential Revision: https://reviews.llvm.org/D56777

llvm-svn: 351366
2019-01-16 18:18:01 +00:00
Simon Pilgrim d40ab8db30 [X86] Regenerate test
Split check-prefixes to support a future commit

llvm-svn: 351362
2019-01-16 18:00:09 +00:00
Sanjay Patel 546c9f6d8f [x86] add tests for extracted scalar casts (PR39974); NFC
https://bugs.llvm.org/show_bug.cgi?id=39974

llvm-svn: 351354
2019-01-16 16:11:30 +00:00
Marek Olsak c5cec5e1fa AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52944

llvm-svn: 351351
2019-01-16 15:43:53 +00:00
Sanjay Patel 0dbecd05ed [x86] lower shuffle of extracts to AVX2 vperm instructions
I was trying to prevent shuffle regressions while matching more horizontal ops 
and ended up here:
  shuf (extract X, 0), (extract X, 4), Mask --> extract (shuf X, undef, Mask'), 0

The affected tests were added for:
https://bugs.llvm.org/show_bug.cgi?id=34380

This patch won't change the examples in the bug report itself, but we should be 
able to extend this to catch more types.

Differential Revision: https://reviews.llvm.org/D56756

llvm-svn: 351346
2019-01-16 14:15:18 +00:00
Anton Korobeynikov cbdb4effae [MSP430] Emit a separate section for every interrupt vector
This is LLVM part of D56663

Linker scripts shipped by TI require to have every
interrupt vector in a separate section with a specific name:

 SECTIONS
 {
   __interrupt_vector_XX   : { KEEP (*(__interrupt_vector_XX )) } > VECTXX
   ...
 }

Follow the requirement emit the section for every vector
which contain address of interrupt handler:

  .section  __interrupt_vector_XX,"ax",@progbits
  .word %isr%

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56664

llvm-svn: 351345
2019-01-16 14:03:41 +00:00
Simon Pilgrim 1bdc049767 [X86][SSE] Add additional PR40318 shuffle test cases
llvm-svn: 351333
2019-01-16 13:15:59 +00:00
Sam Parker dd8cd6d26b [DAGCombine] Fix ReduceLoadWidth for shifted offsets
ReduceLoadWidth can trigger using a shifted mask is used and this
requires that the function return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 351310
2019-01-16 08:40:12 +00:00
Aditya Nandakumar 500e3ead9f [GISel]: Add support for CSEing continuously during GISel passes.
https://reviews.llvm.org/D52803

This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.

llvm-svn: 351283
2019-01-16 00:40:37 +00:00
Mandeep Singh Grang 436735c3fe [EH] Rename llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp
Summary:
Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc.
Refer D53541 for the context. Clang counterpart D56748.

Reviewers: rnk, efriedma

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56747

llvm-svn: 351281
2019-01-16 00:37:13 +00:00
Craig Topper 34ac509ac8 [X86] Add avx512 scatter intrinsics that use a vXi1 mask instead of a scalar integer.
We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value like an icmp. The bitcasts can be subject to code motion and interfere with basic block at a time isel in bad ways.

llvm-svn: 351275
2019-01-15 23:36:25 +00:00
Changpeng Fang 20fe3d2f35 AMDGPU: Raise the priority of MAD24 in instruction selection.
Summary:
  We have seen performance regression when v_add3 is generated. The major reason is that the v_mad pattern
is broken when v_add3 is generated. We also see the register pressure increased. While we could not properly
estimate register pressure during instruction selection, we can give mad a higher priority.

In this work, we raise the priority for mad24 in selection and resolve the performance regression.

Reviewers:
  rampitec

Differential Revision:
  https://reviews.llvm.org/D56745

llvm-svn: 351273
2019-01-15 23:12:36 +00:00
Roman Lebedev fb4eed381d X86DAGToDAGISel::matchBitExtract() with truncation (PR36419)
Summary:
Previously in D54095 i have added support for extraction of `lshr` from `X` if we are to produce `BEXTR`.
That was good, but the fix was partial, there was still [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]].

That pattern can also appear, roughly, when you have a large (64-bit) storage, and the consume bits from it.
It will not be unexpected if you will be doing further computations in 32-bit width.
And then the current code breaks, as the tests show.

The basic idea/pattern here is following:
1. We have `i64` input
2. We perform `i64` right-shift on it.
3. We `trunc`ate that shifted value
4. We do all further work (masking) in `i32`

Since we see `trunc`ation and not `lshr`, we give up, and stop trying to extract that right-shift.
BUT. The mask is `i32`, therefore we can extend both of the operands of the masking (`and`) to `i64`
and truncate the result after masking: https://rise4fun.com/Alive/K4B
```
Name: @bextr64_32_b1 -> @bextr64_32_b0
  %shiftedval = lshr i64 %val, %numskipbits
  %truncshiftedval = trunc i64 %shiftedval to i32
  %widenumlowbits1 = zext i8 %numlowbits to i32
  %notmask1 = shl nsw i32 -1, %widenumlowbits1
  %mask1 = xor i32 %notmask1, -1
  %res = and i32 %truncshiftedval, %mask1
=>
  %shiftedval = lshr i64 %val, %numskipbits
  %widenumlowbits = zext i8 %numlowbits to i64
  %notmask = shl nsw i64 -1, %widenumlowbits
  %mask = xor i64 %notmask, -1
  %wideres = and i64 %shiftedval, %mask
  %res = trunc i64 %wideres to i32
```

Thus, we are again able to extract that `lshr` into `BEXTR`'s control.

Now, the perf (via `llvm-exegesis`) of the snippet suggests that it is not a good idea:
```
$ cat /tmp/old.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RSI
# LLVM-EXEGESIS-LIVEIN EDX
# LLVM-EXEGESIS-LIVEIN RDI
movq %rsi, %rcx
shrq %cl, %rdi
shll $8, %edx
bextrl %edx, %edi, %eax
$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-1e0082.o
---
mode:            latency
key:
  instructions:
    - 'MOV64rr RCX RSI'
    - 'SHR64rCL RDI RDI'
    - 'SHL32ri EDX EDX i_0x8'
    - 'BEXTR32rr EAX EDI EDX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 0.6638, per_snippet_value: 2.6552 }
error:           ''
info:            ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...
$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-43e346.o
---
mode:            uops
key:
  instructions:
    - 'MOV64rr RCX RSI'
    - 'SHR64rCL RDI RDI'
    - 'SHL32ri EDX EDX i_0x8'
    - 'BEXTR32rr EAX EDI EDX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: PdFPU0, value: 0, per_snippet_value: 0 }
  - { key: PdFPU1, value: 0, per_snippet_value: 0 }
  - { key: PdFPU2, value: 0, per_snippet_value: 0 }
  - { key: PdFPU3, value: 0, per_snippet_value: 0 }
  - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error:           ''
info:            ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...
```
vs
```
$ cat /tmp/new.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RDX
# LLVM-EXEGESIS-LIVEIN SIL
# LLVM-EXEGESIS-LIVEIN RDI
shlq $8, %rdx
movzbl %sil, %eax
orq %rdx, %rax
bextrq %rax, %rdi, %rax
$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8944f1.o
---
mode:            latency
key:
  instructions:
    - 'SHL64ri RDX RDX i_0x8'
    - 'MOVZX32rr8 EAX SIL'
    - 'OR64rr RAX RAX RDX'
    - 'BEXTR64rr RAX RDI RAX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 0.7454, per_snippet_value: 2.9816 }
error:           ''
info:            ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...
$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-da403c.o
---
mode:            uops
key:
  instructions:
    - 'SHL64ri RDX RDX i_0x8'
    - 'MOVZX32rr8 EAX SIL'
    - 'OR64rr RAX RAX RDX'
    - 'BEXTR64rr RAX RDI RAX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: PdFPU0, value: 0, per_snippet_value: 0 }
  - { key: PdFPU1, value: 0, per_snippet_value: 0 }
  - { key: PdFPU2, value: 0, per_snippet_value: 0 }
  - { key: PdFPU3, value: 0, per_snippet_value: 0 }
  - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error:           ''
info:            ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...
```
^ latency increased (worse).

Except //maybe// not really.
Like with all synthetic benchmarks, they //may// be misleading.

Let's take a look on some actual real-world hotpath.
In this case it's 'my' [[ https://github.com/darktable-org/rawspeed | RawSpeed ]]'s `BitStream<>::peekBitsNoFill()`, in [[ e3316dc851/src/librawspeed/decompressors/VC5Decompressor.cpp (L814) | GoPro VC5 decompressor ]]:
```
raw.pixls.us-unique/GoPro/HERO6 Black$ /usr/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-clangs1-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR
RUNNING: /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmplwbKEM
2018-12-22 21:23:03
Running /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench
Run on (8 X 4012.81 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 3.41, 2.41, 2.03
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean           40 ms         40 ms        128   0.322244          7.96974        12M       37.4457M        298.534M      3.12047       24.8778   0.040465
GOPR9172.GPR/threads:8/real_time_median         39 ms         39 ms        128   0.312606          7.99155        12M        38.387M        306.788M      3.19891       25.5656   0.039115
GOPR9172.GPR/threads:8/real_time_stddev          4 ms          3 ms        128  0.0271557         0.130575          0        2.4941M        21.3909M     0.207842       1.78257   3.81081m
RUNNING: /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWAkan9
2018-12-22 21:23:08
Running /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Run on (8 X 4013.1 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 3.78, 2.50, 2.06
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean           39 ms         39 ms        128   0.311533          7.97323        12M       38.6828M        308.471M      3.22356        25.706  0.0390928
GOPR9172.GPR/threads:8/real_time_median         38 ms         38 ms        128   0.304231          7.99005        12M       39.4437M        315.527M      3.28698        26.294  0.0380316
GOPR9172.GPR/threads:8/real_time_stddev          3 ms          3 ms        128  0.0229149         0.133814          0       2.26225M        19.1421M     0.188521       1.59517   3.13671m
Comparing /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Benchmark                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
GOPR9172.GPR/threads:8/real_time_mean                  -0.0339         -0.0316            40            39            40            39
GOPR9172.GPR/threads:8/real_time_median                -0.0277         -0.0274            39            38            39            38
GOPR9172.GPR/threads:8/real_time_stddev                -0.1769         -0.1267             4             3             3             3
```
I.e. this results in //roughly// -3% improvements in perf.

While this will help [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]], it won't address it fully.

Reviewers: RKSimon, craig.topper, andreadb, spatel

Reviewed By: craig.topper

Subscribers: courbet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56052

llvm-svn: 351253
2019-01-15 21:31:18 +00:00
Craig Topper 82015b633b [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar
In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen.

I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics.

I'll work on scatter and gatherpf/scatterpf next.

Differential Revision: https://reviews.llvm.org/D56527

llvm-svn: 351234
2019-01-15 20:12:33 +00:00
Craig Topper 99fcbf67d0 [Nios2] Remove Nios2 backend
As mentioned here http://lists.llvm.org/pipermail/llvm-dev/2019-January/129121.html This backend is incomplete and has not been maintained in several months.

Differential Revision: https://reviews.llvm.org/D56691

llvm-svn: 351231
2019-01-15 19:59:19 +00:00
Nikita Popov d3b86b79fa Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Reapplying with updated SLPVectorizer tests.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351219
2019-01-15 18:43:41 +00:00
Simon Pilgrim b8f08c8d7b [X86] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306)
If we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.

We were already doing this for SSSE3 targets as we have PSHUFB, but its worth doing for all targets.

llvm-svn: 351203
2019-01-15 16:56:55 +00:00
Simon Pilgrim fa7a8c7ddc [X86] Add PR40318 shuffle test case
The other test case is already covered by the PR40306 test case, which was mainly concerned with SSSE3 codegen.

llvm-svn: 351201
2019-01-15 16:31:10 +00:00
James Y Knight 693d39dd12 Remove irrelevant references to legacy git repositories from
compiler identification lines in test-cases.

(Doing so only because it's then easier to search for references which
are actually important and need fixing.)

llvm-svn: 351200
2019-01-15 16:18:52 +00:00
Sanjay Patel fad5bdaf95 [DAGCombiner] reduce buildvec of zexted extracted element to shuffle
The motivating case for this is shown in the first regression test. We are 
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.

That's a special-case for a more general pattern as shown here. In all tests, 
we're avoiding the vector-scalar-vector moves in favor of vector ops.

We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.

Differential Revision: https://reviews.llvm.org/D56281

llvm-svn: 351198
2019-01-15 16:11:05 +00:00
Roman Lebedev 58000f804a [NFC][X86] extract-bits.ll: add test with truncation with extra-use.
That extra-use *should* prevent D56052 from looking past the trunc.

llvm-svn: 351182
2019-01-15 10:36:20 +00:00
Craig Topper 2581249f05 [X86] Upgrade some avx512bw shift intrinsics that were removed a while ago. NFC
Masking was removed from these intrinsics and I guess we didn't update the tests then.

llvm-svn: 351165
2019-01-15 07:15:20 +00:00
Craig Topper 3c5423b26e [X86] Add test cases for D56695. NFC
llvm-svn: 351162
2019-01-15 06:39:51 +00:00
Craig Topper a3cfdcc21f [X86] Switch the triple on avx2-intrinsics-x86.ll to be -unknown-unknown instead of darwin so the constant pool entries will be filtered better by the script.
Darwin uses LCPI instead of .LCPI so the filter doesn't work.

This is silly, but it will help reduce some future some test diffs.

llvm-svn: 351161
2019-01-15 06:39:49 +00:00
Thomas Lively 6bf2b40051 [WebAssembly] Expand SIMD shifts while V8's implementation disagrees
Summary:
V8 currently implements SIMD shifts as taking an immediate operation,
which disagrees with the spec proposal and the toolchain
implementation. As a stopgap measure to get things working, unroll all
vector shifts. Since this is a temporary measure, there are no tests.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56520

llvm-svn: 351151
2019-01-15 02:16:03 +00:00
Marek Olsak 33eb4d947d AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.

And we can also get the mask from and i1, or i1, xor i1.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52060

llvm-svn: 351150
2019-01-15 02:13:18 +00:00
Evandro Menezes f793fe1402 [AArch64] Adjust the feature set for Exynos
Enable the fusion of arithmetic and logic instructions for Exynos M4.

llvm-svn: 351149
2019-01-15 01:53:49 +00:00
Reid Kleckner fe5e5dcab0 [X86] Avoid clobbering ESP/RSP in the epilogue.
Summary:
In r345197 ESP and RSP were added to GR32_TC/GR64_TC, allowing them to
be used for tail calls, but this also caused `findDeadCallerSavedReg` to
think they were acceptable targets for clobbering. Filter them out.

Fixes PR40289.

Patch by Geoffry Song!

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D56617

llvm-svn: 351146
2019-01-15 01:24:18 +00:00
Evandro Menezes 111033d917 [AArch64] Fix typo (NFC)
Fix another typo, this time in the `RUN` line, which used a syntax not
universally supported, in test case added by D56572.

llvm-svn: 351144
2019-01-15 00:58:59 +00:00
Evandro Menezes 4b6c290fbd [AArch64] Fix typo (NFC)
Fix typo in test case added by D56572 (rL351139).

llvm-svn: 351143
2019-01-15 00:20:57 +00:00
Eli Friedman 295e346da4 [EarlyIfConversion] Don't if-convert unconditional branches.
A block ending in an unconditional branch can have two successors if one
is a landing pad.  In practice, I think this only has an effect on
Windows because landing pads are never empty for Itanium unwinding.

(Alternatively, I could add a check to
AArch64InstrInfo::canInsertSelect, but this seems more obvious.)

Differential Revision: https://reviews.llvm.org/D56468

llvm-svn: 351142
2019-01-15 00:19:46 +00:00
Eli Friedman 33aecc8182 [AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 .
Otherwise, with D56544, the intrinsic will be expanded to an integer
csel, which is probably not what the user expected.  This matches the
general convention of using "v1" types to represent scalar integer
operations in vector registers.

While I'm here, also add some error checking so we don't generate
illegal ABS nodes.

Differential Revision: https://reviews.llvm.org/D56616

llvm-svn: 351141
2019-01-15 00:15:24 +00:00
Evandro Menezes bf59cb02c3 [AArch64] Add new target feature to fuse arithmetic and logic operations
This feature enables the fusion of some arithmetic and logic instructions
together.

Differential revision: https://reviews.llvm.org/D56572

llvm-svn: 351139
2019-01-14 23:54:36 +00:00
Nikita Popov 5885eec35a Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
This reverts commit r351125.

I missed test changes in an SLPVectorizer test, due to the cost model
changes. Reverting for now.

llvm-svn: 351129
2019-01-14 22:18:39 +00:00
Thomas Lively 2a47e03ee4 [WebAssembly][FastISel] Do not assume naive CmpInst lowering
Summary:
Fixes https://bugs.llvm.org/show_bug.cgi?id=40172. See
test/CodeGen/WebAssembly/PR40172.ll for an explanation.

Reviewers: dschuff, aheejin

Subscribers: nikic, llvm-commits, sunfish, jgravelle-google, sbc100

Differential Revision: https://reviews.llvm.org/D56457

llvm-svn: 351127
2019-01-14 22:03:43 +00:00
Nikita Popov 8e9a8432a8 [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351125
2019-01-14 21:43:30 +00:00
Simon Pilgrim bfe2ee453a [X86][SSSE3] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306)
If we have PSHUFB and we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.

llvm-svn: 351103
2019-01-14 19:07:26 +00:00
Sanjay Patel b23ff7a0e2 [x86] lower extracted add/sub to horizontal vector math
add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0

This is the integer sibling to D56011.

There's an additional restriction to only to do this transform in the
case where we don't have extra extracts from the source vector. Without
that, we can fail to match larger horizontal patterns that are more 
beneficial than this minimal case. An improvement to the more general 
h-op lowering may allow us to remove the restriction here in a follow-up.

llvm-svn: 351093
2019-01-14 18:44:02 +00:00
Dan Gohman 8da641455c [WebAssembly] Remove tests for old intrinsics.
This is a followup to r351084 which removes the tests for the old
intrinsic names.

llvm-svn: 351086
2019-01-14 18:25:29 +00:00