Commit Graph

48740 Commits

Author SHA1 Message Date
Jonas Paulsson 5ffb27b166 [SystemZ] Increase the amount of inlining.
Implement getInliningThresholdMultiplier() and have it return 3.

Review: Ulrich Weigand
llvm-svn: 339563
2018-08-13 13:31:30 +00:00
Daniel Cederman 1bfbc62022 [Sparc] Add support for the cycle counter available in GR740
Summary: The GR740 provides an up cycle counter in the
registers ASR22 and ASR23. As these registers can not be
read together atomically we only use the value of ASR23
for llvm.readcyclecounter(). The ASR23 register holds the
32 LSBs of the up-counter.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D48638

llvm-svn: 339551
2018-08-13 10:49:48 +00:00
Luke Geeson 4ce41d2bb7 [ARM] Added FP16 VREV Vector Instrinsic CodeGen support
llvm-svn: 339546
2018-08-13 08:37:41 +00:00
Matt Arsenault 13b0db9285 AMDGPU: Check NSZ MI flag when folding omod
I'm not sure the exact nsz flag combination that
is OK. I think as long as it's on either, this is OK.
For now just check it on the omod multiply.

llvm-svn: 339513
2018-08-12 08:44:25 +00:00
Matt Arsenault b5acec1f79 AMDGPU: Use splat vectors for undefs when folding canonicalize
If one of the elements is undef, use the canonicalized constant
from the other element instead of 0.

Splat vectors are more useful for other optimizations, such
as matching vector clamps. This was breaking on clamps
of half3 from the undef 4th component.

llvm-svn: 339512
2018-08-12 08:42:54 +00:00
Matt Arsenault 3ead7d7389 AMDGPU: Fix packing undef parts of build_vector
llvm-svn: 339511
2018-08-12 08:42:46 +00:00
Craig Topper ed8a114c86 [X86] Remove unnecessary AddedComplexity line. NFC
The use of the or_is_add predicate already gives enough of a complexity boost to get the patterns ordered properly.

llvm-svn: 339507
2018-08-12 03:22:18 +00:00
Craig Topper b3e3477649 [X86] Remove the AL/AX/EAX/RAX short immediate forms from the macro fusion shouldScheduleAdjacent. NFC
These instructions are only created by the backend during MCInst lowering.

llvm-svn: 339499
2018-08-11 06:42:51 +00:00
Craig Topper c6cf169940 [X86] Add the mem-reg form of CMP to the macro fusion shouldScheduleAdjacent.
Unlike the other arithmetic instructions the mem-reg form of compare is just a load and not a RMW operation. According to the Intel optimization manual, this form is also supported by macro fusion.

llvm-svn: 339498
2018-08-11 06:42:50 +00:00
Craig Topper 616eeb827d [X86] Remove ADD8mi and ADDmr from the macro fusion shouldScheduleAdjacent.
The are RMW of memory operations. They aren't eligible for macro fusion.

llvm-svn: 339497
2018-08-11 06:42:49 +00:00
Craig Topper 570d47a010 [X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG.
Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode.

This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test.

llvm-svn: 339496
2018-08-11 05:33:00 +00:00
Richard Trieu 01f99f3cd6 Fix WebAssembly instruction printer after r339474
Treat the stack variants of control instructions the same as regular
instructions.  Otherwise, the vector ControlFlowStack will be the wrong
size and have out-of-bounds access.  This was detected by MemorySanitizer.

llvm-svn: 339495
2018-08-11 04:18:05 +00:00
Tom Stellard 8adc86a7dc AMDGPU/GlobalISel: Define instruction mapping for G_INSERT
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49625

llvm-svn: 339491
2018-08-11 00:51:54 +00:00
Eli Friedman 6b84a48953 Fix unused lambda capture warning from r339472.
llvm-svn: 339479
2018-08-10 22:03:25 +00:00
Wouter van Oortmerssen ab26bd0647 [WebAssembly] Added default stack-only instruction mode for MC.
Summary:
Moved Explicit Locals pass to last.
Made that pass obligatory.
Made it convert from register to stack based instructions, and removed the registers.
Fixes to related code that was expecting register based instructions.
Added the correct testing flag to all tests, depending on what the
format they were expecting so far.
Translated one test to stack format as example: reg-stackify-stack.ll

tested:
llvm-lit -v `find test -name WebAssembly`
unittests/MC/*

Reviewers: dschuff, sunfish

Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100

Differential Revision: https://reviews.llvm.org/D50568

llvm-svn: 339474
2018-08-10 21:32:47 +00:00
Eli Friedman e1687a89e8 [ARM] Adjust AND immediates to make them cheaper to select.
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.

Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.

The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)

According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.

Differential Revision: https://reviews.llvm.org/D50030

llvm-svn: 339472
2018-08-10 21:21:53 +00:00
Matt Arsenault 940e6075e4 AMDGPU: More canonicalized operations
llvm-svn: 339464
2018-08-10 19:20:17 +00:00
Matt Arsenault 3dcf4ce435 AMDGPU: Combine and of seto/setuo and fp_class
Clear the nan (or non-nan) test bits from the mask.

llvm-svn: 339462
2018-08-10 18:58:56 +00:00
Matt Arsenault 8ad00d30fa AMDGPU: Match isfinite pattern to class instructions
llvm-svn: 339460
2018-08-10 18:58:41 +00:00
Matt Arsenault 5bb9d798b4 AMDGPU: Add LLVM_FALLTHROUGH
llvm-svn: 339458
2018-08-10 17:57:12 +00:00
Sam Parker 8c4b964c5a [ARM] Disallow zexts in ARMCodeGenPrepare
Enabling ARMCodeGenPrepare by default caused a whole load of
failures. This is due to zexts and truncs not being handled properly.
ZExts are messy so it's just easier to disable for now and truncs
are allowed only as 'sinks'. I still need to figure out why allowing
them as 'sources' causes so many failures. The other main changes are
that we are explicit in the types that we converting to, it's now
always 'TypeSize'. Type support is also now performed while checking
for valid opcodes as it unnecessarily complicated having the checks
are different stages.
    
I've moved the tests around too, so we have the zext and truncs in
their own file as well as the overflowing opcode tests.

Differential Revision: https://reviews.llvm.org/D50518

llvm-svn: 339432
2018-08-10 13:57:13 +00:00
Simon Pilgrim 130b00bc43 [X86][SSE] Pull out repeated shift getOpcode() calls. NFCI.
llvm-svn: 339425
2018-08-10 11:42:42 +00:00
Heejin Ahn 5831e9cc79 [WebAssembly] Gate i64x2 and f64x2 on -wasm-enable-unimplemented
Summary:
i64x2 and f64x2 operations are not implemented in V8, so we normally
do not want to emit them. However, they are in the SIMD spec proposal,
so we still want to be able to test them in the toolchain. This patch
adds a flag to enable their emission.

Reviewers: aheejin, dschuff

Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D50423

Patch by Thomas Lively (tlively)

llvm-svn: 339407
2018-08-09 23:58:51 +00:00
Craig Topper 9a8136f7b4 [X86] Qualify one of the heuristics in combineMul to only apply to positive multiply amounts.
This seems to slightly help the performance of one of our internal benchmarks. We probably need better heuristics here.

llvm-svn: 339406
2018-08-09 23:27:42 +00:00
Heejin Ahn 41b25c6cf4 [WebAssembly] Fix wasm backend compilation on gcc 5.4: variable name cannot match class
Summary:
gcc does not like

const Region *Region;

It wants a different name for the variable.

Is there a better convention for what name to use in such a case?

Reviewers: sbc100, aheejin

Subscribers: aheejin, jgravelle-google, dschuff, llvm-commits

Differential Revision: https://reviews.llvm.org/D50472

Patch by Alon Zakai (kripken)

llvm-svn: 339398
2018-08-09 22:35:23 +00:00
Reid Kleckner fce7f73bec [MC] Move EH DWARF encodings from MC to CodeGen, NFC
Summary:
The TType encoding, LSDA encoding, and personality encoding are all
passed explicitly by CodeGen to the assembler through .cfi_* directives,
so only the AsmPrinter needs to know about them.

The FDE CFI encoding however, controls the encoding of the label
implicitly created by the .cfi_startproc directive. That directive seems
to be special in that it doesn't take an encoding, so the assembler just
has to know how to encode one DSO-local label reference from .eh_frame
to .text.

As a result, it looks like MC will continue to have to know when the
large code model is in use. Perhaps we could invent a '.cfi_startproc
[large]' flag so that this knowledge doesn't need to pollute the
assembler.

Reviewers: davide, lliu0, JDevlieghere

Subscribers: hiraditya, fedor.sergeev, llvm-commits

Differential Revision: https://reviews.llvm.org/D50533

llvm-svn: 339397
2018-08-09 22:24:04 +00:00
Ana Pazos 10de234905 [RISC-V] Fixed alias for addi x2, x2, 0
A missing check for non-zero immediate in MCOperandPredicate
caused c.addi16sp sp, 0 to be selected which is not a valid
instruction.

llvm-svn: 339381
2018-08-09 20:51:53 +00:00
Krzysztof Parzyszek 75c2ca3638 [Hexagon] Map ISD::TRAP to J2_trap0(#0)
llvm-svn: 339365
2018-08-09 18:03:45 +00:00
Evandro Menezes 8c4366273c [ARM] Adjust the feature set for Exynos
Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`,
`FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`,
`FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for
all Exynos processors.

llvm-svn: 339356
2018-08-09 16:34:38 +00:00
Evandro Menezes 9a92fe0c9e [ARM] Replace processor check with feature
Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a
processor check.  Otherwise, NFC.

llvm-svn: 339354
2018-08-09 16:13:24 +00:00
Andrea Di Biagio f3bde0485c [MC][PredicateExpander] Extend the grammar to support simple switch and return statements.
This patch introduces tablegen class MCStatement.

Currently, an MCStatement can be either a return statement, or a switch
statement.

```
MCStatement:
   MCReturnStatement
   MCOpcodeSwitchStatement
```

A MCReturnStatement expands to a return statement, and the boolean expression
associated with the return statement is described by a MCInstPredicate.

An MCOpcodeSwitchStatement is a switch statement where the condition is a check
on the machine opcode. It allows the definition of multiple checks, as well as a
default case. More details on the grammar implemented by these two new
constructs can be found in the diff for TargetInstrPredicates.td.

This patch makes it easier to read the body of auto-generated TargetInstrInfo
predicates.

In future, I plan to reuse/extend the MCStatement grammar to describe more
complex target hooks. For now, this is just a first step (mostly a minor
cosmetic change to polish the new predicates framework).

Differential Revision: https://reviews.llvm.org/D50457

llvm-svn: 339352
2018-08-09 15:32:48 +00:00
Sjoerd Meijer 806f70d229 [ARM] FP16: codegen support for VTRN
Differential Revision: https://reviews.llvm.org/D50454

llvm-svn: 339340
2018-08-09 12:45:09 +00:00
Simon Pilgrim 511c3fc529 [X86][SSE] Remove PMULDQ/PMULUDQ by zero
Exposed by D50328

Differential Revision: https://reviews.llvm.org/D50328

llvm-svn: 339337
2018-08-09 12:37:36 +00:00
Simon Pilgrim 01ae462fef [X86][SSE] Combine (some) target shuffles with multiple uses
As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses.

This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool.

However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move.

This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit.

Differential Revision: https://reviews.llvm.org/D50328

llvm-svn: 339335
2018-08-09 12:30:02 +00:00
Andrew V. Tischenko 24f63bcb34 [X86] Improved sched models for X86 XCHG*rr and XADD*rr instructions.
Differential Revision: https://reviews.llvm.org/D49861

llvm-svn: 339321
2018-08-09 09:23:26 +00:00
Jonas Hahnfeld 20526bf483 [NVPTX] Select atomic loads and stores
According to PTX ISA .volatile has the same memory synchronization
semantics as .relaxed.sys, so it can be used to implement monotonic
atomic loads and stores. This is important for OpenMP's atomic
construct where
 - 'read's and 'write's are lowered to atomic loads and stores, and
 - an update of float or double types are lowered into a cmpxchg loop.
(Note that PTX could do better because it has atom.add.f{32,64} but
LLVM's atomicrmw instruction only allows integer types.)

Higher levels of atomicity (like acquire and release) need additional
synchronization properties which were added with PTX ISA 6.0 / sm_70.
So using these instructions still results in an error.

Differential Revision: https://reviews.llvm.org/D50391

llvm-svn: 339316
2018-08-09 07:45:49 +00:00
Roger Ferrer Ibanez 577a97e2b9 [RISCV] Add "lla" pseudo-instruction to assembler
This pseudo-instruction is similar to la but uses PC-relative addressing
unconditionally. This is, la is only different to lla when using -fPIC. This
pseudo-instruction seems often forgotten in several specs but it is definitely
mentioned in binutils opcodes/riscv-opc.c. The semantics are defined both in
page 37 of the "RISC-V Reader" book but also in function macro found in
gas/config/tc-riscv.c.

This is a very first step towards adding PIC support for Linux in the RISC-V
backend.

The lla pseudo-instruction expands to a sequence of auipc + addi with a couple
of pc-rel relocations where the second points to the first one. This is
described in
https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#pc-relative-symbol-addresses

For now, this patch only introduces support of that pseudo instruction at the
assembler parser.

Differential Revision: https://reviews.llvm.org/D49661

llvm-svn: 339314
2018-08-09 07:08:20 +00:00
Eli Friedman 5b45a39056 [ARM] Avoid spilling lr with Thumb1 tail calls.
Normally, if any registers are spilled, we prefer to spill lr on Thumb1
so we can fold the "bx lr" into the "pop".  However, if there are tail
calls involved, restoring lr is expensive, so skip the optimization in
that case.

The spill of r7 in the new test also isn't necessary, but that's
mostly orthogonal to this patch. (It's the same code in
ARMFrameLowering, but it's not related to tail calls.)

Differential Revision: https://reviews.llvm.org/D49459

llvm-svn: 339283
2018-08-08 20:03:10 +00:00
Krzysztof Parzyszek 1df7059150 [Hexagon] Diagnose misaligned absolute loads and stores
Differential Revision: https://reviews.llvm.org/D50405

llvm-svn: 339272
2018-08-08 17:00:09 +00:00
Matt Arsenault 935f3b70fe AMDGPU: Error more gracefully on libcalls
I think this is the only situation where the callsite
will have a null instruction.

llvm-svn: 339271
2018-08-08 16:58:39 +00:00
Matt Arsenault e719139b10 AMDGPU: Fix shifts for i128
llvm-svn: 339270
2018-08-08 16:58:33 +00:00
Zaara Syeda b2595b988b [PowerPC] Improve codegen for vector loads using scalar_to_vector
This patch aims to improve the codegen for vector loads involving the
scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used
for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X)
to utilize:

LXSD and LXSDX for i64 and f64
LXSIWAX for i32 (sign extension to i64)
LXSIWZX for i32 and f64

Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950

llvm-svn: 339260
2018-08-08 15:20:43 +00:00
Alex Bradbury 07224dfb47 [RISCV] Add mnemonic alias: move, sbreak and scall.
Further improve compatibility with the GNU assembler.

Differential Revision: https://reviews.llvm.org/D50217
Patch by Kito Cheng.

llvm-svn: 339255
2018-08-08 14:53:45 +00:00
Alex Bradbury 7d8d87c143 [RISCV] Add InstAlias definitions for add[w], and, xor, or, sll[w], srl[w], sra[w], slt and sltu with immediate
Match the GNU assembler in supporting immediate operands for these 
instructions even when the reg-reg mnemonic is used.

Differential Revision: https://reviews.llvm.org/D50046
Patch by Kito Cheng.

llvm-svn: 339252
2018-08-08 14:45:44 +00:00
Sjoerd Meijer f8c394f0f5 [ARM] FP16: codegen support for VEXT
Differential Revision: https://reviews.llvm.org/D50427

llvm-svn: 339241
2018-08-08 13:26:38 +00:00
Sjoerd Meijer db5908deb9 [ARM] FP16: vector vmov and vdup support
This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50329

llvm-svn: 339238
2018-08-08 13:11:31 +00:00
Sjoerd Meijer 920a453485 [ARM] FP16: vector VMUL variants
This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50326

llvm-svn: 339232
2018-08-08 10:27:34 +00:00
Benjamin Kramer 83996e4dee [Wasm] Don't iterate over MachineBasicBlock::successors while erasing from it
This will read out of bounds. Found by asan.

llvm-svn: 339230
2018-08-08 10:13:19 +00:00
Sjoerd Meijer b33a4c02cc [ARM] FP16: support vector INT_TO_FP and FP_TO_INT
This adds codegen support for the different vcvt_f16 variants.

Differential Revision: https://reviews.llvm.org/D50393

llvm-svn: 339227
2018-08-08 09:45:34 +00:00
Sjoerd Meijer b264944ed5 [ARM] FP16: support the vector vmin and vmax variants
Differential Revision: https://reviews.llvm.org/D50238

llvm-svn: 339221
2018-08-08 07:20:15 +00:00
Jan Vesely 7b2c98ab59 AMDGPU: Remove broken i16 ternary patterns
Fixup test to check for GCN prefix
These patterns always zero extend the result even though it might need sign extension.
This has been broken since the addition of i16 support.
It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence:
        v_mad_i16 v2, v10, v9, v11
        v_med3_i32 v2, v2, v8, v7

Fixes mad_sat(char) piglit on VI.

Differential Revision: https://reviews.llvm.org/D49836

llvm-svn: 339190
2018-08-07 21:54:37 +00:00
Derek Schuff 51ed131ed2 [WebAssembly] Update SIMD binary arithmetic
Add missing SIMD types (v2f64) and binary ops. Also adds
tablegen support for automatically prepending prefix byte to SIMD
opcodes.

Differential Revision: https://reviews.llvm.org/D50292

Patch by Thomas Lively

llvm-svn: 339186
2018-08-07 21:24:01 +00:00
Krzysztof Parzyszek e7ce247dd7 [Hexagon] Allow use of gather intrinsics even with no-packets
Vgather requires must be in a packet with a store, which contradicts
the no-packets feature. As a consequence, gather/scatter could not be
used with no-packets. Relax this, and allow gather packets as exceptions
to the no-packets requirements.

llvm-svn: 339177
2018-08-07 20:33:47 +00:00
Heejin Ahn 7fb68d2679 [WebAssembly] CFG sort support for exception handling
Summary:
This patch extends CFGSort pass to support exception handling. Once it
places a loop header, it does not place blocks that are not dominated by
the loop header until all the loop blocks are sorted. This patch extends
the same algorithm to exception 'catch' part, using the information
calculated by WebAssemblyExceptionInfo class.

Reviewers: dschuff, sunfish

Subscribers: sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D46500

llvm-svn: 339172
2018-08-07 20:19:23 +00:00
Craig Topper deb2899b2d [SelectionDAG][X86][SystemZ] Add a generic nonvolatile_store/nonvolatile_load pattern fragment in TargetSelectionDAG.td
Differential Revision: https://reviews.llvm.org/D50358

llvm-svn: 339156
2018-08-07 17:34:59 +00:00
Sjoerd Meijer b39cd886b9 [ARM] FP16: codegen support for VACGT
Differential Revision: https://reviews.llvm.org/D50236

llvm-svn: 339148
2018-08-07 15:11:47 +00:00
Jonas Paulsson 5438f1debc [SystemZ] Comment update.
Update the comment in nextGroup since the ProcResourceCounters are not anymore
always decremented with '1'.

llvm-svn: 339140
2018-08-07 13:48:09 +00:00
Jonas Paulsson 25cbfdd423 [SystemZ] NFC: Remove redundant check in SystemZHazardRecognizer.
Remove the redundant check against zero when updating ProcResourceCounters in
nextGroup(), as pointed out in https://reviews.llvm.org/D50187.

Review: Ulrich Weigand.
llvm-svn: 339139
2018-08-07 13:44:11 +00:00
Aleksandar Beserminji 949a17c016 [mips] Handle branch expansion corner cases
When potential jump instruction and target are in the same segment, use
jump instruction with immediate field.

In cases where offset does not fit immediate value of a bc/j instructions,
offset is stored into register, and then jump register instruction is used.

Differential Revision: https://reviews.llvm.org/D48019

llvm-svn: 339126
2018-08-07 10:45:45 +00:00
Matt Arsenault 96b678427a AMDGPU: Add feature vi-insts
This is necessary to add a VI specific builtin,
__builtin_amdgcn_s_dcache_wb. We already have an
overly specific feature for one of these builtins,
for s_memrealtime. I'm not sure whether it's better
to add more of those, or to get rid of that and merge
it with vi-insts.

Alternatively, maybe this logically goes with scalar-stores?

llvm-svn: 339104
2018-08-07 07:28:46 +00:00
Craig Topper 9de1797c50 [SelectionDAG][X86] Rename MaskedLoadSDNode::getSrc0 to getPassThru.
Src0 doesn't really convey any meaning to what the operand is. Passthru matches what's used in the documentation for the intrinsic this comes from.

llvm-svn: 339101
2018-08-07 06:52:49 +00:00
Craig Topper 17989208a9 [SelectionDAG][X86] Rename getValue to getPassThru for gather SDNodes.
getValue is more meaningful name for scatter than it is for gather. Split them and use getPassThru for gather.

llvm-svn: 339096
2018-08-07 06:13:40 +00:00
Heejin Ahn e8653bb89a [WebAssembly] Enable atomic expansion for unsupported atomicrmws
Summary:
Wasm does not have direct counterparts to some of LLVM IR's atomicrmw
instructions (min, max, umin, umax, and nand). This enables atomic
expansion using cmpxchg instruction within a loop for those atomicrmw
instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49440

llvm-svn: 339084
2018-08-07 00:22:22 +00:00
Derek Schuff 2c78385960 [WebAssembly] Replace SIMD expression types with V128
Summary:
The spec only defines a SIMD expression type of V128 and
leaves interpretation of different vector types to the instructions.

Differential Revision: https://reviews.llvm.org/D50367

Patch by Thomas Lively

llvm-svn: 339082
2018-08-06 23:16:50 +00:00
Matt Arsenault 08f3fe4fae AMDGPU: cvt_pk_rtz_f16 canonicalizes
llvm-svn: 339078
2018-08-06 23:01:31 +00:00
Matt Arsenault e94ee833f9 AMDGPU: Handle some vector operations in isCanonicalized
llvm-svn: 339077
2018-08-06 22:45:51 +00:00
Matt Arsenault a29e76244a AMDGPU: Push fcanonicalize through partially constant build_vector
This usually avoids some re-packing code, and may
help find canonical sources.

llvm-svn: 339072
2018-08-06 22:30:44 +00:00
Matt Arsenault f2a167fb1d AMDGPU: Refactor fcanonicalize combine
This will make more complex combines easier.

llvm-svn: 339070
2018-08-06 22:10:26 +00:00
Matt Arsenault d49ab0b214 AMDGPU: Treat more custom operations as canonicalizing
Everything should quiet, and I think everything should
flush.

I assume the min3/med3/max3 follow the same rules
as regular min/max for flushing, which should at
least be conservatively correct.

There are still more operations that need to
be handled.

llvm-svn: 339065
2018-08-06 21:58:11 +00:00
Matt Arsenault ce6d61fba8 AMDGPU: Conversions always produce canonical results
Not sure why this was checking for denormals for f16.
My interpretation of the IEEE standard is conversions
should produce a canonical result, and the ISA manual
says denormals are created when appropriate.

llvm-svn: 339064
2018-08-06 21:51:52 +00:00
Matt Arsenault f8768bfc84 AMDGPU: Fix implementation of isCanonicalized
If denormals are enabled, denormals are canonical.
Also fix a few other issues. minnum/maxnum are supposed
to canonicalize. Temporarily improve workaround for the
instruction behavior change in gfx9.

Handle selects and fcopysign.

The tests were also largely broken, since they were
checking for a flush used on some targets after the
store of the result.

llvm-svn: 339061
2018-08-06 21:38:27 +00:00
Easwaran Raman 10fd92dd94 [X86] Recognize a splat of negate in isFNEG
Summary:
Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB)
instructions in more cases. For example, the following sequence
a = _mm256_broadcast_ss(f)
d = _mm256_fnmadd_ps(a, b, c)

generates an fsub and fma without this patch and an fnma with this
change.

Reviewers: craig.topper

Subscribers: llvm-commits, davidxl, wmi

Differential Revision: https://reviews.llvm.org/D48467

llvm-svn: 339043
2018-08-06 19:23:38 +00:00
Craig Topper 0076477a4c [X86] When using "and $0" and "orl $-1" to store 0 and -1 for minsize, make sure the store isn't volatile
If the store is volatile this might be a memory mapped IO access. In that case we shouldn't generate a load that didn't exist in the source

Differential Revision: https://reviews.llvm.org/D50270

llvm-svn: 339041
2018-08-06 18:44:26 +00:00
Matt Arsenault 0d1b3934e2 AMDGPU: Fold v_lshl_or_b32 with 0 src0
Appears from expansion of some packed cases.

llvm-svn: 339025
2018-08-06 15:40:20 +00:00
Bryan Chan e023706471 [AArch64] Fix assertion failure on widened f16 BUILD_VECTOR
Summary:
Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the
same type. This fixes an assertion failure in VerifySDNode.

Reviewers: SjoerdMeijer, t.p.northover, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D50202

llvm-svn: 339013
2018-08-06 14:14:41 +00:00
Tim Northover 9956e4a24b ARM-MachO: don't add Thumb bit for addend to non-external relocation.
ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes
out that part of any addend in an object file. But it only does that for
symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting
that extra bit in other cases.

llvm-svn: 339007
2018-08-06 11:32:44 +00:00
David Bolvansky c0aa4b75a4 Enrich inline messages
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.

Patch by: yrouban (Yevgeny Rouban)


Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00

Reviewed By: tejohnson, xbolva00

Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith

Differential Revision: https://reviews.llvm.org/D49412

llvm-svn: 338969
2018-08-05 14:53:08 +00:00
Craig Topper 3c869cb5e5 [X86] Add isel patterns for atomic_load+sub+atomic_sub.
Despite the comment removed in this patch, this is beneficial when the RHS of the sub is a register.

llvm-svn: 338930
2018-08-03 22:08:30 +00:00
Craig Topper d7391eefdf [X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns and the normal instructions instead
At one point in time acquire implied mayLoad and mayStore as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile which preserves the ordering.

So from what I can tell we shouldn't need additional pseudos since they aren't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions.

Differential Revision: https://reviews.llvm.org/D50212

llvm-svn: 338925
2018-08-03 21:40:44 +00:00
Matt Arsenault c3dc8e65e2 DAG: Enhance isKnownNeverNaN
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.

Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.

llvm-svn: 338910
2018-08-03 18:27:52 +00:00
Artem Belevich 0a11b6366a [NVPTX] Handle __nvvm_reflect("__CUDA_ARCH").
Summary:
libdevice in recent CUDA versions relies on __nvvm_reflect() to select
GPU-specific bitcode. This patch addresses the requirement.

Reviewers: jlebar

Subscribers: jholewinski, sanjoy, hiraditya, bixia, llvm-commits

Differential Revision: https://reviews.llvm.org/D50207

llvm-svn: 338908
2018-08-03 18:05:24 +00:00
Craig Topper feb2a58860 [X86] Add a DAG combine for the __builtin_parity idiom used by clang to enable better codegen
Clang uses "ctpop & 1" to implement __builtin_parity. If the popcnt instruction isn't supported this generates a large amount of code to calculate the population count. Instead we can bisect the data down to a single byte using xor and then check the parity flag.

Even when popcnt is supported, its still a good idea to split 64-bit data on 32-bit targets using an xor in front of a single popcnt. Otherwise we get two popcnts and an add before the and.

I've specifically targeted this at the sizes supported by clang builtins, but we could generalize this if we think that's useful.

Differential Revision: https://reviews.llvm.org/D50165

llvm-svn: 338907
2018-08-03 18:00:29 +00:00
Nicholas Wilson e408a89a3a [WebAssembly] Cleanup of the way globals and global flags are handled
Differential Revision: https://reviews.llvm.org/D44030

llvm-svn: 338894
2018-08-03 14:33:37 +00:00
Jonas Paulsson f107b7275c [SystemZ] Improve handling of instructions which expand to several groups
Some instructions expand to more than one decoder group.

This has been hitherto ignored, but is handled with this patch.

Review: Ulrich Weigand
https://reviews.llvm.org/D50187

llvm-svn: 338849
2018-08-03 10:43:05 +00:00
Sjoerd Meijer d62c5ec2fe [ARM] FP16: support vector zip and unzip
This is addressing PR38404.

Differential Revision: https://reviews.llvm.org/D50186

llvm-svn: 338835
2018-08-03 09:24:29 +00:00
Sjoerd Meijer 9b30213828 [ARM] FP16: support VFMA
This is addressing PR38404.

llvm-svn: 338830
2018-08-03 09:12:56 +00:00
Craig Topper a7a12399a1 [X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in the Select method in X86ISelDAGToDAG.cpp instead.
There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function.

The test changes are because we have a some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change.

llvm-svn: 338824
2018-08-03 07:01:10 +00:00
Craig Topper e902b7d0b0 [X86] Support fp128 and/or/xor/load/store with VEX and EVEX encoded instructions.
Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place.

To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows use to reuse all the existing patterns for v2i64.

I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously.

llvm-svn: 338821
2018-08-03 06:12:56 +00:00
Craig Topper a80352c04e [X86] When post-processing the DAG to remove zero extending moves for YMM/ZMM, make sure the producing instruction is VEX/XOP/EVEX encoded.
If the producing instruction is legacy encoded it doesn't implicitly zero the upper bits. This is important for the SHA instructions which don't have a VEX encoded version. We might also be able to hit this with the incomplete f128 support that hasn't been ported to VEX.

llvm-svn: 338812
2018-08-03 04:49:42 +00:00
Craig Topper b2cc9a1d44 [X86] Add R13D to the isInefficientLEAReg in FixupLEAs.
I'm assuming the R13 restriction extends to R13D. Guessing this restriction is related to the funny encoding of this register as base always requiring a displacement to be encoded.

llvm-svn: 338806
2018-08-03 03:45:19 +00:00
Craig Topper 2c095444a4 [X86] Prevent promotion of i16 add/sub/and/or/xor to i32 if we can fold an atomic load and atomic store.
This makes them consistent with i8/i32/i64. Which still seems to be more aggressive on folding than icc, gcc, or MSVC.

llvm-svn: 338795
2018-08-03 00:37:34 +00:00
Tim Renouf 366a49d986 [AMDGPU] Minor change to d16 buffer load implementation
Summary:
By not reconstructing the operand list of the SDNode, this change makes
it easier to add the forthcoming new tbuffer and buffer intrinsics.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49995

Change-Id: I0cb79ef0801532645d7dd954a6d7355139db7b38
llvm-svn: 338784
2018-08-02 23:33:01 +00:00
Tim Renouf abd85fb1f5 [AMDGPU] Reworked SIFixWWMLiveness
Summary:
I encountered some problems with SIFixWWMLiveness when WWM is in a loop:

1. It sometimes gave invalid MIR where there is some control flow path
   to the new implicit use of a register on EXIT_WWM that does not pass
   through any def.

2. There were lots of false positives of registers that needed to have
   an implicit use added to EXIT_WWM.

3. Adding an implicit use to EXIT_WWM (and adding an implicit def just
   before the WWM code, which I tried in order to fix (1)) caused lots
   of the values to be spilled and reloaded unnecessarily.

This commit is a rework of SIFixWWMLiveness, with the following changes:

1. Instead of considering any register with a def that can reach the WWM
   code and a def that can be reached from the WWM code, it now
   considers three specific cases that need to be handled.

2. A register that needs liveness over WWM to be synthesized now has it
   done by adding itself as an implicit use to defs other than the
   dominant one.

Also added the following fixmes:

FIXME: We should detect whether a register in one of the above
categories is already live at the WWM code before deciding to add the
implicit uses to synthesize its liveness.

FIXME: I believe this whole scheme may be flawed due to the possibility
of the register allocator doing live interval splitting.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46756

Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94
llvm-svn: 338783
2018-08-02 23:31:32 +00:00
Craig Topper 63873db5c4 [X86] Allow 'atomic_store (neg/not atomic_load)' to isel to a RMW instruction.
There was a FIXMe in the td file about a type inference issue that was easy to fix.

llvm-svn: 338782
2018-08-02 23:30:38 +00:00
Tim Renouf f1c7b92a6a [AMDGPU] Avoid using divergent value in mubuf addr64 descriptor
Summary:
This fixes a problem where a load from global+idx generated incorrect
code on <=gfx7 when the index is divergent.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47383

Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed
llvm-svn: 338779
2018-08-02 22:53:57 +00:00
Krzysztof Parzyszek d91a9e27a9 [Hexagon] Simplify CFG after atomic expansion
This will remove suboptimal branching from the generated ll/sc loops.
The extra simplification pass affects a lot of testcases, which have
been modified to accommodate this change: either by modifying the
test to become immune to the CFG simplification, or (less preferablt)
by adding option -hexagon-initial-cfg-clenaup=0.

llvm-svn: 338774
2018-08-02 22:17:53 +00:00
Heejin Ahn 4128cb0b6b [WebAssembly] Support for atomic.wait / atomic.wake instructions
Summary:
This adds support for atomic.wait / atomic.wake instructions in the wasm
thread proposal.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49395

llvm-svn: 338770
2018-08-02 21:44:24 +00:00
Sam Clegg 41d7047de5 [WebAssembly] Ensure bitcasts that would result in invalid wasm are removed by FixFunctionBitcasts
Rather than allowing invalid bitcasts to be lowered to wasm
call instructions that won't validate, generate wrappers that
contain unreachable thereby delaying the error until runtime.

Differential Revision: https://reviews.llvm.org/D49517

llvm-svn: 338744
2018-08-02 17:38:06 +00:00
Craig Topper 0423881820 [X86] Allow fake unary unpckhpd and movhlps to be commuted for execution domain fixing purposes
These instructions perform the same operation, but the semantic of which operand is destroyed is reversed. If the same register is used as both operands we can change the execution domain without worrying about this difference.

Unfortunately, this really only works in cases where the input register is killed by the instruction. If its not killed, the two address isntruction pass inserts a copy that will become a move instruction. This makes the instruction use different physical registers that contain the same data at the time the unpck/movhlps executes. I've considered using a unary pseudo instruction with tied operand to trick the two address instruction pass. We could then expand the pseudo post regalloc to get the same physical register on both inputs.

Differential Revision: https://reviews.llvm.org/D50157

llvm-svn: 338735
2018-08-02 16:48:01 +00:00
Matt Arsenault 36cdcfadcf AMDGPU: Fix scalarizing v4f16 fcanonicalize
llvm-svn: 338714
2018-08-02 13:43:42 +00:00