Commit Graph

51455 Commits

Author SHA1 Message Date
Aakanksha Patil c56d2afc63 AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV)
A previous patch for "uniform-work-group-size" attribute was found to break
some RADV and possibly radeon SI tests and had to be retracted.
This patch fixes that.

Differential Revision: http://reviews.llvm.org/D58993

llvm-svn: 355574
2019-03-07 00:54:04 +00:00
Simon Atanasyan 83b88441ad [mips] Replace assertion by error message while lowering `RETURNADDR` and `FRAMEADDR`
MIPS target supports lowering `RETURNADDR` and `FRAMEADDR` for a current
frame only. It's better to show an error message then crash on assertion
if `__builtin_return_address` is invoked with non-zero argument.

llvm-svn: 355558
2019-03-06 22:40:28 +00:00
Abderrazek Zaafrani 5ced596198 [AArch64] Improve FP16 instruction selection for vector round and vector conver from half instructions
https://reviews.llvm.org/D58855

llvm-svn: 355545
2019-03-06 20:30:06 +00:00
Mitch Phillips 318028f00f Revert "[IR][ARM] Add function pointer alignment to datalayout"
This reverts commit 2391bfca97.

This reverts rL355522 (https://reviews.llvm.org/D57335).

Kills buildbots that use '-Werror' with the following error:
	/var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm/lib/IR/Value.cpp:657:7: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default]

See buildbots http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/30200/steps/check-llvm%20asan/logs/stdio for more information.

llvm-svn: 355537
2019-03-06 19:17:18 +00:00
Amara Emerson 21f44dfe9c [AArch64] Remove a stray test from the AArch64 directory.
llvm-svn: 355534
2019-03-06 18:54:07 +00:00
Simon Pilgrim 9d6347cfc1 [DAGCombine] Improve select (not Cond), N1, N2 -> select Cond, N2, N1 fold
Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well.

Requested by @spatel on D59006

llvm-svn: 355533
2019-03-06 18:52:52 +00:00
Guozhi Wei 11308bdb43 [PPC] Adjust the computed branch offset for the possible shorter distance
In file PPCBranchSelector.cpp we tend to over estimate code size due to large
alignment and inline assembly. Usually it causes larger computed branch offset,
it is not big problem. But sometimes it may also causes smaller computed branch
offset than actual branch offset. If the offset is close to the limit of
encoding, it may cause problem at run time.
Following is a simplified example.

           actual        estimated
           address        address
 ...
bne Far      100            10c
.p2align 4
Near:        110            110
 ...
Far:        8108           8108

Actual offset:    0x8108 - 0x100 = 0x8008
Computed offset:  0x8108 - 0x10c = 0x7ffc

The computed offset is at most ((1 << alignment) - 4) bytes smaller than actual
offset. So we add this number to the offset for safety.

Differential Revision: https://reviews.llvm.org/D57718

llvm-svn: 355529
2019-03-06 18:22:22 +00:00
Krzysztof Parzyszek 9c005bbdd4 [Hexagon] Avoid creating 5-instruction packets with vgather pseudos
Change the resource usage of the vgather pseudos from SLOT0+LD to
SLOT0+SLOT1.

llvm-svn: 355524
2019-03-06 17:43:50 +00:00
Michael Platings 2391bfca97 [IR][ARM] Add function pointer alignment to datalayout
Use this feature to fix a bug on ARM where 4 byte alignment is
incorrectly assumed.

Differential Revision: https://reviews.llvm.org/D57335

llvm-svn: 355522
2019-03-06 17:24:11 +00:00
Ryan Taylor 67f36903ae [AMDGPU] Add support for 64 bit buffer atomic artihmetic instructions
Summary:
This adds support for 64 bit buffer atomic arithmetic instructions but does not include
cmpswap as that depends on a fix to the way the register pairs are handled

Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58918

llvm-svn: 355520
2019-03-06 17:02:06 +00:00
Strahinja Petrovic 94fccc93de [PowerPC] Add secure plt support for TLS symbols
This patch supports secure plt mode for TLS symbols.

Differential Revision: https://reviews.llvm.org/D45520

llvm-svn: 355513
2019-03-06 15:00:10 +00:00
Simon Pilgrim 468bb2e601 [X86][SSE] VSELECT(XOR(Cond,-1), LHS, RHS) --> VSELECT(Cond, RHS, LHS)
As noticed on D58965

DAGCombiner::visitSELECT has something similar, so we should be able to move this to DAGCombiner and support VSELECT as well at some point.

Differential Revision: https://reviews.llvm.org/D58974

llvm-svn: 355494
2019-03-06 10:54:43 +00:00
Craig Topper c0e01d29a4 [X86] Enable the add with 128 -> sub with -128 encoding trick with X86ISD::ADD when the carry flag isn't used.
This allows us to use an 8-bit sign extended immediate instead of a 16 or 32 bit immediate.

Also do similar for 0x80000000 with 64-bit adds to avoid having to use a movabsq.

llvm-svn: 355485
2019-03-06 07:36:38 +00:00
Craig Topper 97a1c4c340 [X86] Suppress load folding for add/sub with 128 immediate.
128 won't fit in a sign extended 8-bit immediate, but we can negate it to -128 and use the other operation. This results in a shorter encoding since the move would have used 16 or 32 bits for the immediate.

llvm-svn: 355484
2019-03-06 07:36:36 +00:00
Craig Topper 112ea336c3 [X86] Remove periods from the end of SubtargetFeature descriptions since the help printer adds a period.
Most features don't have periods already, but some did. When there is a period it causes llc -mattr=+help to print 2 periods.

llvm-svn: 355474
2019-03-06 02:36:48 +00:00
Heejin Ahn 3c20b34d24 [WebAssembly] Remove trailing whitespaces in tests (NFC)
Reviewers: sbc100

Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58955

llvm-svn: 355472
2019-03-06 02:00:22 +00:00
Florian Hahn 13bbcb3264 [ARM] Sink zext/sext operands for add and sub to enable vsubl generation.
This uses the infrastructure added in rL353152 to sink zext and sexts to
sub/add users, to enable vsubl/vaddl generation when NEON is available.

See https://bugs.llvm.org/show_bug.cgi?id=40025.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D58063

llvm-svn: 355460
2019-03-06 00:10:03 +00:00
Heejin Ahn 5c644c9bca [WebAssembly] Simplify iterator navigations (NFC)
Summary:
- Replaces some uses of `MachineFunction::iterator(MBB)` with
  `MBB->getIterator()` and `MachineBasicBlock::iterator(MI)` with
  `MI->getIterator()`, which are simpler.
- Replaces some uses of `std::prev` of `std::next` that takes a
  MachineFunction or MachineBasicBlock iterator with `getPrevNode` and
  `getNextNode`, which are also simpler.

Reviewers: sbc100

Subscribers: dschuff, sunfish, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58913

llvm-svn: 355444
2019-03-05 21:05:09 +00:00
Heejin Ahn ef9d6aea45 [WebAssembly] Disable MachineBlockPlacement pass
Summary:
This pass hurts code size for wasm and sometimes generates irreducible
control flow.
Context: https://github.com/emscripten-core/emscripten/pull/8233

Reviewers: kripken, dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58953

llvm-svn: 355437
2019-03-05 20:35:34 +00:00
Craig Topper 57fd733140 Revert r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary."
This caused the first matcher in the isel table for many targets to Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures.

llvm-svn: 355433
2019-03-05 19:18:16 +00:00
Guozhi Wei f124e75656 [X86] In X86DomainReassignment.cpp add enclosed registers to EnclosedEdges
The variable X86DomainReassignment::EnclosedEdges is used to store registers that have been enclosed in some closure, so those registers will be ignored when create new closures. But there is no registers has ever been put into this set, so a single register can be enclosed in multiple closures, it significantly increase compile time.

This patch adds a register into EnclosedEdges when it is enclosed into a closure.

Differential Revision: https://reviews.llvm.org/D58646

llvm-svn: 355430
2019-03-05 18:54:34 +00:00
Matt Arsenault 870397739e AMDGPU: Preserve undef flag when expanding SI_IF
Fixes undefined value verifier error.

llvm-svn: 355426
2019-03-05 18:38:00 +00:00
Craig Topper 4a9dd7c39b [X86] Enable 8-bit SHL to convert to LEA
Differential Revision: https://reviews.llvm.org/D58870

llvm-svn: 355425
2019-03-05 18:37:41 +00:00
Craig Topper 216bf7f03b [X86] Allow 8-bit INC/DEC to be converted to LEA.
We already do this for 16/32/64 as well as 8-bit add with register/immediate. Might as well do it for 8-bit INC/DEC too.

Differential Revision: https://reviews.llvm.org/D58869

llvm-svn: 355424
2019-03-05 18:37:37 +00:00
Craig Topper 572e94ca02 [X86] Enable 8-bit OR with disjoint bits to convert to LEA
We already support 8-bits adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64.

Differential Revision: https://reviews.llvm.org/D58863

llvm-svn: 355423
2019-03-05 18:37:33 +00:00
Jessica Paquette 00d5847b5c Revert "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"
This broke test-suite::aarch64_neon_intrinsics.test

Reverting while I look into it.

Example failure:
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-quick/builds/17740

llvm-svn: 355408
2019-03-05 15:47:00 +00:00
Carl Ritson 9e3f7d8ad0 [AMDGPU] Fix DPP operand order in atomic optimizer
Summary:
Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.

Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30

Reviewers: sheredom, tpr

Reviewed By: sheredom

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58900

llvm-svn: 355394
2019-03-05 12:21:44 +00:00
Heejin Ahn c7397613d2 [WebAssembly] Rename a variable in LateEHPrepare (NFC)
llvm-svn: 355387
2019-03-05 11:11:34 +00:00
Oliver Stannard 4a9086b537 [ARM] Fix select_cc lowering for fp16
When lowering a select_cc node where the true and false values are of type f16,
we can't use a general conditional move because the FP16 instructions do not
support conditional execution. Instead, we must ensure that the condition code
is one of the four supported by the VSEL instruction.

Differential revision: https://reviews.llvm.org/D58813

llvm-svn: 355385
2019-03-05 10:42:34 +00:00
David Stuttard 81eec58a0d [AMDGPU] Omit KILL instructions from hazard recognizer
Summary:
In some cases the KILL was causing a hazard to be introduced as these were
scheduled into hazard slots, but don't result in an instruction.

KILL shouldn't be considered for hazard recognition.

Change-Id: Ib6d2a2160f8c94cd0ce611ab198c7e4f46aeffcf

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58898

llvm-svn: 355384
2019-03-05 10:25:16 +00:00
Chen Zheng 9cfe7e81f1 [PowerPC] fix killed/dead flag after convert x-form to d-form tranformation.
Differential Revision: https://reviews.llvm.org/D58428

llvm-svn: 355378
2019-03-05 04:56:54 +00:00
Scott Linder efec1396ac [AMDGPU] Implement AMDGPUMCInstrAnalysis
Implement MCInstrAnalysis for AMDGPU, with default implementations save
for `evaluateBranch`.

Differential Revision: https://reviews.llvm.org/D58400

llvm-svn: 355373
2019-03-05 03:02:00 +00:00
Craig Topper 6a6ce5be84 [X86] Reduce some patterns by using FP instructions for integer types even when AVX2 is available and execution domain fixing will do the right thing
We have quite a few cases of using FP instructions for integer operations when only AVX1 is available. Then we switch to integer instructions with AVX2. In a lot of these cases execution domain fixing will take care of turning FP instructions into integer if its profitable.

With this patch we just keep on using the FP instructions even with AVX2. I've only handled some cases that don't require messing with patterns that are defined in the instruction definition. Those will require more subtle multiclass work possibly involving null_frag, hasSideEffects = 0, etc.

Differential Revision: https://reviews.llvm.org/D58470

llvm-svn: 355361
2019-03-05 01:14:25 +00:00
Yonghong Song d82247cb80 [BPF] Do not generate BTF sections unnecessarily
If There is no types/non-empty strings, do not generate
.BTF section. If there is no func_info/line_info, do
not generate .BTF.ext section.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D58936

llvm-svn: 355360
2019-03-05 01:01:21 +00:00
Jessica Paquette caf62b1d47 [GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT
This adds instruction selection support for G_EXTRACT_VECTOR_ELT for cases
where the index is defined by a G_CONSTANT.

It also factos out the lane copy opcode selection part into its own function,
`getLaneCopyOpcode`. This is used by both `selectUnmergeValues` and
`selectExtractElt`.

Differential Revision: https://reviews.llvm.org/D58469

llvm-svn: 355344
2019-03-04 22:35:32 +00:00
Jessica Paquette 0632e12f89 [GlobalISel][AArch64] Legalize vector G_SELECT
Just scalarize it, and add a test showing it works.

Differential Revision: https://reviews.llvm.org/D58747

llvm-svn: 355339
2019-03-04 21:12:46 +00:00
Amara Emerson 8acb0d9c82 Re-commit r355104: "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
The code to materialize a mask from a constant pool load tried to use a 128 bit
LDR to load a 64 bit constant pool entry, which was 8 byte aligned. This resulted
in a link failure in the NEON tests in the test suite since the LDR address was
unaligned. This change fixes that to instead emit a 64 bit LDR if the entry is
64 bit, before converting back to a 128 bit register for the TBL.

llvm-svn: 355326
2019-03-04 19:16:00 +00:00
Wouter van Oortmerssen f3feb6adb9 [WebAssembly] Add support for data sections in the assembler.
Summary:
This is quite minimal so far, introduce them with .section,
fill them with .int8 or .asciz, end with .size

Reviewers: dschuff, sbc100, aheejin

Subscribers: jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58660

llvm-svn: 355321
2019-03-04 17:18:04 +00:00
Dmitry Preobrazhensky 6023d5990d [AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, v_readlane_b32 and v_writelane_b32
See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D58713

llvm-svn: 355312
2019-03-04 12:48:32 +00:00
Jeremy Morse 09d8ea5282 [X86] Avoid codegen changes when DBG_VALUE appears between lowered selects
X86TargetLowering::EmitLoweredSelect presently detects sequences of CMOV pseudo
instructions without accounting for debug intrinsics. This leads to different
codegen with and without option -g, if a DBG_VALUE instruction lands in the
middle of several lowered selects.

Work around this by skipping over debug instructions when looking for CMOV
sequences, and sinking those debug insts into the EmitLoweredSelect sunk block.
This might slightly shift where variables appear in the instruction sequence,
but won't re-order assignments.

Differential Revision: https://reviews.llvm.org/D58672

llvm-svn: 355307
2019-03-04 10:56:02 +00:00
Oliver Stannard 181afc7f3b [ARM] Fix selection of VLDR.16 instruction with imm offset
The isScaledConstantInRange function takes upper and lower bounds which are
checked after dividing by the scale, so the bounds checks for half, single and
double precision should all be the same. Previously, we had wrong bounds checks
for half precision, so selected an immediate the instructions can't actually
represent.

Differential revision: https://reviews.llvm.org/D58822

llvm-svn: 355305
2019-03-04 09:17:38 +00:00
Jonas Hahnfeld 65a401f6a9 [AArch64/ARM] Fix two compiler warnings in InstructionSelector, NFCI
1) GCC complains that KnownValid is set but not used.
2) In ARMInstructionSelector::selectGlobal() the code is mixing "enumeral
   and non-enumeral type in conditional expression". Solve this by casting
   to unsigned which is the final type anyway.

Differential Revision: https://reviews.llvm.org/D58834

llvm-svn: 355304
2019-03-04 08:51:32 +00:00
Heejin Ahn 195a62e9ae [WebAssembly] Delete ThrowUnwindDest map from WasmEHFuncInfo
Summary:
Before when we implemented the first EH proposal, 'catch <tag>'
instruction may not catch an exception so there were multiple EH pads an
exception can unwind to. That means a BB could have multiple EH pad
successors.

Now after we switched to the new proposal, every 'catch' instruction
catches an exception, and there is only one catchpad per catchswitch, so
we at most have one EH pad successor, making `ThrowUnwindDest` map in
`WasmEHInfo` unnecessary.

Keeping `ThrowUnwindDest` map in `WasmEHInfo` has its own problems,
because other optimization passes can split a BB that contains possibly
throwing calls (previously invokes), and we have to update the map every
time that happens, which is not easy for common CodeGen passes.

This also correctly updates successor info in LateEHPrepare when we add
a rethrow instruction.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58486

llvm-svn: 355296
2019-03-03 22:35:56 +00:00
Simon Pilgrim e48be5d698 Remove unused variable. NFCI.
llvm-svn: 355289
2019-03-03 14:23:07 +00:00
Simon Pilgrim d8e91a54c0 [X86] getShuffleScalarElt - peek through insert/extract subvector nodes.
llvm-svn: 355288
2019-03-03 14:11:05 +00:00
Simon Pilgrim 11149ea433 [X86] Pull out combineToConsecutiveLoads helper. NFCI.
llvm-svn: 355287
2019-03-03 13:53:27 +00:00
Craig Topper ce68659772 [X86] Prefer VPBLENDD for v2i64/v4i64 blends with AVX2.
We were using VPBLENDW for v2i64 and VBLENDPD for v4i64. VPBLENDD has better throughput than VPBLENDW on some CPUs so it makes sense to use it when possible. VBLENDPD will probably become VBLENDD during execution domain fixing, but we might as well use integer in isel while we can.

This should work around some issues with the domain fixing pass prefering PBLENDW when we start with PBLENDW. There may still be some v8i16 cases that could use PBLENDD.

llvm-svn: 355281
2019-03-03 00:18:07 +00:00
Thomas Lively 43876ae7bc [WebAssembly] Expand operations not supported by SIMD
Summary:
This prevents crashes in instruction selection when these operations
are used. The tests check that the scalar version of the instruction
is used where applicable, although some expansions do not use the
scalar version.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58859

llvm-svn: 355261
2019-03-02 03:32:25 +00:00
Amaury Sechet f24abf6511 [X86] Improve use of SHLD/SHRD
Summary:
This extends the variety of pattern that can generate a SHLD instead of using two shifts.

This fixes a regression that would be introduced by D57367 or D33587

Reviewers: RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57389

llvm-svn: 355260
2019-03-02 02:44:16 +00:00
Thomas Lively 7ec62fde78 Revert "[WebAssembly][WIP] Expand operations not supported by SIMD"
This was accidentally committed without tests or review.

llvm-svn: 355254
2019-03-02 00:55:16 +00:00
Thomas Lively f5a8c28e7e [WebAssembly][WIP] Expand operations not supported by SIMD
llvm-svn: 355247
2019-03-02 00:18:07 +00:00
Craig Topper 4cfc39179e [TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary.
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.

By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.

This removes something like 40,000 bytes from the X86 isel table.

Differential Revision: https://reviews.llvm.org/D58595

llvm-svn: 355224
2019-03-01 20:18:38 +00:00
Vlad Tsyrklevich 8925138007 Revert "[MIPS GlobalISel] Fix mul operands"
This reverts commit r355178, it is causing ASan failures on the
sanitizer bots.

llvm-svn: 355219
2019-03-01 18:58:22 +00:00
Thomas Lively d295f51469 Revert "[WebAssembly] Lower SIMD shifts since they are fixed in V8"
They weren't fixed in V8. Oops.

llvm-svn: 355208
2019-03-01 17:43:55 +00:00
Oliver Stannard 82fbbc21fd [ARM] Fix FP16 stack loads/stores for Thumb2 with frame pointer
The new addressing mode added for the v8.2A FP16 instructions uses bit 8 of the
immediate to encode the sign of the offset, like the other FP loads/stores, so
need to be treated the same way.

Differential revision: https://reviews.llvm.org/D58816

llvm-svn: 355201
2019-03-01 14:20:28 +00:00
Oliver Stannard e019e6223b [ARM] Consider undefined-on-NaN conditions in checkVSELConstraints
This function was not checking for the condition code variants which are
undefined if either input is NaN, so we were missing selection of the VSEL
instruction in some cases when using -fno-honor-nans or -ffast-math.

Differential revision: https://reviews.llvm.org/D58812

llvm-svn: 355199
2019-03-01 13:58:25 +00:00
Diana Picus 54829ec5d0 [ARM GlobalISel] Support G_CTLZ for Thumb2
Same as ARM mode but with different opcode.

llvm-svn: 355191
2019-03-01 10:12:28 +00:00
Stanislav Mekhanoshin bb98841399 [AMDGPU] Mark ds instructions as meybeAtomic
These were not recognized as potential atomics by memory legalizer.
The test was working not because legalizer did a right thing, but
because it has skipped all these instructions. When I have fixed
DS desciption test started to fail because region address has
changed from 4 to 2 a while ago.

Differential Revision: https://reviews.llvm.org/D58802

llvm-svn: 355179
2019-03-01 07:59:17 +00:00
Petar Avramovic 9bf43b5c26 [MIPS GlobalISel] Fix mul operands
Unsigned mul high for MIPS32 is selected into two PseudoInstructions:
PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for
some of its operands. Registers in this class have appropriate hi and lo
register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc.
mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td.
In functions where mul and PseudoMULTu are present fastRegisterAllocator
will "run out of registers during register allocation" because
'calcSpillCost' for $ac0 will return spillImpossible because subregisters
$lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is
to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction.

Differential Revision: https://reviews.llvm.org/D58715

llvm-svn: 355178
2019-03-01 07:35:57 +00:00
Petar Avramovic a48285a190 [MIPS GlobalISel] Select G_UMULH
Legalize G_UMULO and select G_UMULH for MIPS32.

Differential Revision: https://reviews.llvm.org/D58714

llvm-svn: 355177
2019-03-01 07:25:44 +00:00
Thomas Lively c4b674955c [WebAssembly] Lower SIMD shifts since they are fixed in V8
Reviewers: sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58800

llvm-svn: 355163
2019-03-01 01:38:54 +00:00
Tom Stellard 33634d1b25 AMDGPU/GlobalISel: Implement select for G_INSERT
Re-commit r344310.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 355159
2019-03-01 00:50:26 +00:00
Thomas Lively ae79f42a2f [WebAssembly] Fix crash when @llvm.global_dtors is external
Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58799

llvm-svn: 355157
2019-03-01 00:12:13 +00:00
Tom Stellard 41f32196a0 AMDGPU/GlobalISel: Implement select for G_EXTRACT
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D49714

llvm-svn: 355156
2019-02-28 23:37:48 +00:00
Joerg Sonnenberger 01530291ea [PPC] Secure PLT only has meaning for PIC
llvm-svn: 355154
2019-02-28 23:33:09 +00:00
Eli Friedman d19a7060c6 [AArch64] [Windows] Don't skip constructing UnwindHelp.
In certain cases, the first non-frame-setup instruction in a function is
a branch.  For example, it could be a cbz on an argument.  Make sure we
correctly allocate the UnwindHelp, and find an appropriate register to
use to initialize it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40184

Differential Revision: https://reviews.llvm.org/D58752

llvm-svn: 355136
2019-02-28 20:38:45 +00:00
Abderrazek Zaafrani abfd10807c [AArch64] Improve FP16 vector convert from short instructions.
https://reviews.llvm.org/D58563

llvm-svn: 355134
2019-02-28 20:21:46 +00:00
Sanjay Patel 7fc6ef7dd7 [x86] scalarize extract element 0 of FP math
This is another step towards ensuring that we produce the optimal code for reductions,
but there are other potential benefits as seen in the tests diffs:

  1. Memory loads may get scalarized resulting in more efficient code.
  2. Memory stores may get scalarized resulting in more efficient code.
  3. Complex ops like fdiv/sqrt get scalarized which may be faster instructions depending on uarch.
  4. Even simple ops like addss/subss/mulss/roundss may result in faster operation/less frequency throttling when scalarized depending on uarch.

The TODO comment suggests 1 or more follow-ups for opcodes that can currently result in regressions.

Differential Revision: https://reviews.llvm.org/D58282

llvm-svn: 355130
2019-02-28 19:47:04 +00:00
Jiong Wang 0a039660fa bpf: disassembler support for XADD under sub-register mode
Like the other load/store instructions, "w" register is preferred when
disassembling BPF_STX | BPF_W | BPF_XADD.

v1 -> v2:
 - Updated testcase insn-unit.s (Yonghong)

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 355127
2019-02-28 19:22:34 +00:00
Jiong Wang 3da8bcd0a0 bpf: enable sub-register code-gen for XADD
Support sub-register code-gen for XADD is like supporting any other Load
and Store patterns.

No new instruction is introduced.

  lock *(u32 *)(r1 + 0) += w2

has exactly the same underlying insn as:

  lock *(u32 *)(r1 + 0) += r2

BPF_W width modifier has guaranteed they behave the same at runtime. This
patch merely teaches BPF back-end that BPF_W width modifier could work
GPR32 register class and that's all needed for sub-register code-gen
support for XADD.

test/CodeGen/BPF/xadd.ll updated to include sub-register code-gen tests.

A new testcase test/CodeGen/BPF/xadd_legal.ll is added to make sure the
legal case could pass on all code-gen modes. It could also test dead Def
check on GPR32. If there is no proper handling like what has been done
inside BPFMIChecking.cpp:hasLivingDefs, then this testcase will fail.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 355126
2019-02-28 19:21:28 +00:00
Jiong Wang 3d7c265e11 bpf: improve dead Defs check for XADD
BPF XADD semantics require all Defs of XADD are dead, meaning any result of
XADD insn is not used.

However, BPF backend hasn't enabled sub-register liveness track, so when
the source and destination operands of XADD are GPR32, there is no
sub-register dead info. If we rely on the generic
MachineInstr::allDefsAreDead, then we will raise false alarm on GPR32 Def.
This was fine as there was no sub-register code-gen support for XADD which
will be added by the next patch.

To support GPR32 Def, ideally we could just enable sub-registr liveness
track on BPF backend, then allDefsAreDead could work on GPR32 Def. This
requires implementing TargetSubtargetInfo::enableSubRegLiveness on BPF.

However, sub-register liveness tracking module inside LLVM is actually
designed for the situation where one register could be split into more
than one sub-registers for which case each sub-register could have their
own liveness and kill one of them doesn't kill others. So, tracking
liveness for each make sense.

For BPF, each 64-bit register could only have one 32-bit sub-register. This
is exactly the case which LLVM think brings no benefits for doing
sub-register tracking, because the live range of sub-register must always
equal to its parent register, therefore liveness tracking is disabled even
the back-end has implemented enableSubRegLiveness. The detailed information
is at r232695:

  Author: Matthias Braun <matze@braunis.de>
  Date:   Thu Mar 19 00:21:58 2015 +0000
  Do not track subregister liveness when it brings no benefits

Hence, for BPF, we enhance MachineInstr::allDefsAreDead. Given the solo
sub-register always has the same liveness as its parent register, LLVM is
already attaching a implicit 64-bit register Def whenever the there is
a sub-register Def. The liveness of the implicit 64-bit Def is available.
For example, for "lock *(u32 *)(r0 + 4) += w9", the MachineOperand info
could be:

  $w9 = XADDW32 killed $r0, 4, $w9(tied-def 0),
                       implicit killed $r9, implicit-def dead $r9

Even though w9 is not marked as Dead, the parent register r9 is marked as
Dead correctly, and it is safe to use such information or our purpose.

v1 -> v2:
 - Simplified code logic inside hasLiveDefs. (Yonghong)

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 355124
2019-02-28 19:20:26 +00:00
Craig Topper 38427c47b9 [X86] Don't peek through bitcasts before checking ISD::isBuildVectorOfConstantSDNodes in combineTruncatedArithmetic
We don't have any combines that can look through a bitcast to truncate a build vector of constants. So the truncate will stick around and give us something like this pattern (binop (trunc X), (trunc (bitcast (build_vector)))) which has two truncates in it. Which will be reversed by hoistLogicOpWithSameOpcodeHands in the generic DAG combiner. Thus causing an infinite loop.

Even if we had a combine for (truncate (bitcast (build_vector))), I think it would need to be implemented in getNode otherwise DAG combiner visit ordering would probably still visit the binop first and reverse it. Or combineTruncatedArithmetic would need to do its own constant folding.

Differential Revision: https://reviews.llvm.org/D58705

llvm-svn: 355116
2019-02-28 18:49:29 +00:00
Amara Emerson 8d70e6425c Revert "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
Seems to break some neon intrinsics tests.

llvm-svn: 355115
2019-02-28 18:47:29 +00:00
Thomas Lively f3b4f99007 [WebAssembly] Remove uses of ThreadModel
Summary:
In the clang UI, replaces -mthread-model posix with -matomics as the
source of truth on threading. In the backend, replaces
-thread-model=posix with the atomics target feature, which is now
collected on the WebAssemblyTargetMachine along with all other used
features. These collected features will also be used to emit the
target features section in the future.

The default configuration for the backend is thread-model=posix and no
atomics, which was previously an invalid configuration. This change
makes the default valid because the thread model is ignored.

A side effect of this change is that objects are never emitted with
passive segments. It will instead be up to the linker to decide
whether sections should be active or passive based on whether atomics
are used in the final link.

Reviewers: aheejin, sbc100, dschuff

Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, steven_wu, dexonsmith, rupprecht, jfb, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D58742

llvm-svn: 355112
2019-02-28 18:39:08 +00:00
Amara Emerson 85c3afd7f6 [AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1.
This extends the existing support for shufflevector to handle cases like
<2 x float>, which we can implement by concating the vectors and using a TBL1.

Differential Revision: https://reviews.llvm.org/D58684

llvm-svn: 355104
2019-02-28 16:43:11 +00:00
Kadir Cetinkaya 1b1b1a6135 [Target][ARM] Add a usage for SrcSz to unbreak build-bots without assertions
llvm-svn: 355101
2019-02-28 15:55:11 +00:00
Bjorn Pettersson d30f308a9f Add support for computing "zext of value" in KnownBits. NFCI
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.

This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.

Reviewers: craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58650

llvm-svn: 355099
2019-02-28 15:45:29 +00:00
Stefan Pintilie a073a18460 [PowerPC] Removed STATISTIC that was causing build errors.
llvm-svn: 355087
2019-02-28 12:40:28 +00:00
Stefan Pintilie bd5429ef38 [PowerPC] Move the stack pointer update instruction later in the prologue and earlier in the epilogue.
Move the stdu instruction in the prologue and epilogue.
This should provide a small performance boost in functions that are able to do
this. I've kept this change rather conservative at the moment and functions
with frame pointers or base pointers will not try to move the stack pointer
update.

Differential Revision: https://reviews.llvm.org/D42590

llvm-svn: 355085
2019-02-28 12:23:28 +00:00
Simon Pilgrim 134bc19079 [X86][AVX] Remove superfluous insert_subvector(zero, bitcast(x)) -> bitcast(insert_subvector(zero, x)) fold
This is caught by other existing bitcast folds.

llvm-svn: 355084
2019-02-28 11:39:52 +00:00
Diana Picus cf0ff638bc [ARM GlobalISel] Make arm_i32imm an IntImmLeaf
This gets rid of some duplication in the TableGen definition, but it
forces us to keep both a pointer and a reference to the subtarget in the
ARMInstructionSelector. That is pretty ugly but it might be a reasonable
trade-off, since the TableGen descriptions should outlive the code in
the selector (or in the worst case we can update to use just the
reference when we get rid of DAGISel).

Differential Revision: https://reviews.llvm.org/D58031

llvm-svn: 355083
2019-02-28 11:13:05 +00:00
Simon Pilgrim 87aeff8bbb [X86][AVX] Fold vf64 concat_vectors(movddup(x),movddup(x)) -> broadcast(x)
llvm-svn: 355078
2019-02-28 10:53:58 +00:00
Diana Picus 3b7beafc77 [ARM GlobalISel] Support global variables for Thumb2
Add the same level of support as for ARM mode (i.e. still no TLS
support).

In most cases, it is sufficient to replace the opcodes with the
t2-equivalent, but there are some idiosyncrasies that I decided to
preserve because I don't understand the full implications:
* For ARM we use LDRi12 to load from constant pools, but for Thumb we
  use t2LDRpci (I'm not sure if the ideal would be to use t2LDRi12 for
  Thumb as well, or to use LDRcp for ARM).
* For Thumb we don't have an equivalent for MOV|LDRLIT_ga_pcrel_ldr, so
  we have to generate MOV|LDRLIT_ga_pcrel plus a load from GOT.

The tests are in separate files because they're hard enough to read even
without doubling the number of checks.

llvm-svn: 355077
2019-02-28 10:42:47 +00:00
Craig Topper 6ca7398a1e [X86] Use PreprocessISelDAG to convert vector sra/srl/shl to the X86 specific variable shift ISD opcodes.
These allows use to use the same set of isel patterns for sra/srl/shl which are undefined for out of range shifts and intrinsic shifts which aren't undefined.

Doing this late allows DAG combine to have every opportunity to optimize the sra/srl/shl nodes.

This removes about 7000 bytes from the isel table and simplies the td files.

llvm-svn: 355071
2019-02-28 07:21:26 +00:00
Craig Topper 240315aa64 [X86] Use X86::LAST_VALID_COND instead of assuming X86::COND_S is the last encoding. NFC
llvm-svn: 355059
2019-02-28 01:00:31 +00:00
Matt Arsenault 09a09ef8b7 AMDGPU: Fix typo
llvm-svn: 355056
2019-02-28 00:52:33 +00:00
Matt Arsenault 5d567dc137 AMDGPU: Enable function calls by default
Fixes some crashes on illegal call situations which are unfortunately
still valid IR.

llvm-svn: 355051
2019-02-28 00:40:32 +00:00
Abderrazek Zaafrani 2fc498a652 [AArch64] Generate FP16 vector compare instructions.
https://reviews.llvm.org/D58561

llvm-svn: 355050
2019-02-28 00:31:38 +00:00
Matt Arsenault aa03bcd23c AMDGPU: Fix crashes in invalid call cases
We have to at least tolerate calls to kernels, possibly with a
mismatched calling convention on the callsite.

llvm-svn: 355049
2019-02-28 00:28:44 +00:00
Matt Arsenault d3093c2f1f GlobalISel: Implement fewerElementsVector for phi
llvm-svn: 355048
2019-02-28 00:16:32 +00:00
Matt Arsenault 72bcf15dbf GlobalISel: Implement moreElementsVector for phi
llvm-svn: 355047
2019-02-28 00:01:05 +00:00
Joerg Sonnenberger 6a198366a0 Default to Secure PLT on PPC for NetBSD and OpenBSD.
This matches the default settings of clang.

llvm-svn: 355038
2019-02-27 21:53:14 +00:00
Philip Reames 288a95fc8c Seperate volatility and atomicity/ordering in SelectionDAG
At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..).

This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.

To make sure this patch itself is NFC, it is build on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll)..

Previously landed

    D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
    D57596 [CodeGen] Be conservative about atomic accesses as for volatile
    D57802 Be conservative about unordered accesses for the moment
    rL353959: [Tests] First batch of cornercase tests for unordered atomics.
    rL353966: [Tests] RMW folding tests w/unordered atomic operations.
    rL353972: [Tests] More unordered atomic lowering tests.
    rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
    rL354740: [Hexagon, SystemZ] Be super conservative about atomics
    rL354800: [Lanai] Be super conservative about atomics
    rL354845: [ARM] Be super conservative about atomics

Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)

Differential Revision: https://reviews.llvm.org/D57601

llvm-svn: 355025
2019-02-27 20:20:08 +00:00
Simon Pilgrim 1001a6ab03 [X86][AVX] Pull out some INSERT_SUBVECTOR combines into a combineConcatVectorOps helper. NFCI
A lot of the INSERT_SUBVECTOR combines can be more generally handled as if they have come from a CONCAT_VECTORS node.

I've been investigating adding a CONCAT_VECTORS combine to X86, but this is a much easier first step that avoids the issue of handling a number of pre-legalization issues that I've encountered.

Differential Revision: https://reviews.llvm.org/D58583

llvm-svn: 355015
2019-02-27 18:46:32 +00:00
Dmitry Preobrazhensky 7904231edb [AMDGPU][MC] Added register size check for VOP3/SDWA/DPP operands
See bug 37943: https://bugs.llvm.org/show_bug.cgi?id=37943

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D58287

llvm-svn: 354974
2019-02-27 13:58:48 +00:00
Dmitry Preobrazhensky ef92035827 [AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode
See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D58288

llvm-svn: 354969
2019-02-27 13:12:12 +00:00
Simon Pilgrim 71bb6850cf [X86][AVX] Only combine loads to broadcasts for legal types
Thanks to @echristo for spotting this.

llvm-svn: 354961
2019-02-27 11:17:25 +00:00
Yonghong Song cc290a9e91 [BPF] Don't fail for static variables
Currently, the LLVM will print an error like
  Unsupported relocation: try to compile with -O2 or above,
  or check your static variable usage
if user defines more than one static variables in a single
ELF section (e.g., .bss or .data).

There is ongoing effort to support static and global
variables in libbpf and kernel. This patch removed the
assertion so user programs with static variables won't
fail compilation.

The static variable in-section offset is written to
the "imm" field of the corresponding to-be-relocated
bpf instruction. Below is an example to show how the
application (e.g., libbpf) can relate variable to relocations.

  -bash-4.4$ cat g1.c
  static volatile long a = 2;
  static volatile int b = 3;
  int test() { return a + b; }
  -bash-4.4$ clang -target bpf -O2 -c g1.c
  -bash-4.4$ llvm-readelf -r g1.o

  Relocation section '.rel.text' at offset 0x158 contains 2 entries:
      Offset             Info             Type               Symbol's Value  Symbol's Name
  0000000000000000  0000000400000001 R_BPF_64_64            0000000000000000 .data
  0000000000000018  0000000400000001 R_BPF_64_64            0000000000000000 .data
  -bash-4.4$ llvm-readelf -s g1.o

  Symbol table '.symtab' contains 6 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g1.c
       2: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    4 a
       3: 0000000000000008     4 OBJECT  LOCAL  DEFAULT    4 b
       4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
       5: 0000000000000000    64 FUNC    GLOBAL DEFAULT    2 test
  -bash-4.4$ llvm-objdump -d g1.o

  g1.o:   file format ELF64-BPF

  Disassembly of section .text:
  0000000000000000 test:
       0:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r1 = 0 ll
       2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
       3:       18 02 00 00 08 00 00 00 00 00 00 00 00 00 00 00         r2 = 8 ll
       5:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
       6:       0f 10 00 00 00 00 00 00         r0 += r1
       7:       95 00 00 00 00 00 00 00         exit
  -bash-4.4$

  . from symbol table, static variable "a" is in section #4, offset 0.
  . from symbol table, static variable "b" is in section #4, offset 8.
  . the first relocation is against symbol #4:
    4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
    and in-section offset 0 (see llvm-objdump result)
  . the second relocation is against symbol #4:
    4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
    and in-section offset 8 (see llvm-objdump result)
  . therefore, the first relocation is for variable "a", and
    the second relocation is for variable "b".

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 354954
2019-02-27 05:36:15 +00:00
Heejin Ahn 82da1ffc16 [WebAssembly] Fix ScopeTops info in CFGStackify for EH pads
Summary:
When creating `ScopeTops` info for `try` ~ `catch` ~ `end_try`, we
should create not only `end_try` -> `try` mapping but also `catch` ->
`try` mapping as well. If this is not created, `block` and `end_block`
markers later added may span across an existing `catch`, resulting in
the incorrect code like:
```
try
  block     --|  (X)
catch         |
  end_block --|
end_try
```

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58605

llvm-svn: 354945
2019-02-27 01:35:14 +00:00
Heejin Ahn cf699b4534 [WebAssembly] Remove unnecessary instructions after TRY marker placement
Summary:
This removes unnecessary instructions after TRY marker placement. There
are two cases:
- `end`/`end_block` can be removed if they overlap with `try`/`end_try`
  and they have the same return types.
- `br` right before `catch` that branches to after `end_try` can be
  deleted.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58591

llvm-svn: 354939
2019-02-27 00:50:53 +00:00
Jonas Paulsson 129826cd9f [SystemZ] Pass regalloc hints to help Load-and-Test transformations.
Since there is no "Load-and-Test-High" instruction, the 32 bit load of a
register to be compared with 0 can only be implemented with LT if the virtual
GRX32 register ends up in a low part (GR32 register).

This patch detects these cases and passes the GR32 registers (low parts) as
(soft) hints in getRegAllocationHints().

Review: Ulrich Weigand.
llvm-svn: 354935
2019-02-27 00:18:28 +00:00
Stanislav Mekhanoshin da1628eb67 [AMDGPU] Fixed hang during DAG combine
SITargetLowering::reassociateScalarOps() does not touch constants
so that DAGCombiner::ReassociateOps() does not revert the combine.
However a global address is not a ConstantSDNode.

Switched to the method used by DAGCombiner::ReassociateOps() itself
to detect constants.

Differential Revision: https://reviews.llvm.org/D58695

llvm-svn: 354926
2019-02-26 20:56:25 +00:00
Reid Kleckner 8fda7e15e6 [X86] Fix bug in vectorcall calling convention
Original implementation can't correctly handle __m256 and __m512 types
passed by reference through stack. This patch fixes it.

Patch by Wei Xiao!

Differential Revision: https://reviews.llvm.org/D57643

llvm-svn: 354921
2019-02-26 19:48:16 +00:00
Petar Avramovic bd39569913 [MIPS GlobalISel] Select G_UADDO
Lower G_UADDO.
Legalize G_UADDO for MIPS32

Differential Revision: https://reviews.llvm.org/D58671

llvm-svn: 354900
2019-02-26 17:22:42 +00:00
Ganesh Gopalasubramanian e172d7008d [X86] AMD znver2 enablement
This patch enables the following

1) AMD family 17h "znver2" tune flag (-march, -mcpu).
2) ISAs that are enabled for "znver2" architecture.
3) For the time being, it uses the znver1 scheduler model.
4) Tests are updated.
5) Scheduler descriptions are yet to be put in place.

Reviewers: craig.topper

Differential Revision: https://reviews.llvm.org/D58343

llvm-svn: 354897
2019-02-26 16:55:10 +00:00
Jonas Paulsson c110b5b69f [SystemZ] Wait with selection of legal vector/FP constants until Select().
This patch aims to make sure that any such constant that can be generated
with a vector instruction (for example VGBM) is recognized as such during
legalization and kept as a target independent node through post-legalize
DAGCombining.

Two new functions named isVectorConstantLegal() and loadVectorConstant()
replace old ways of handling vector/FP constants.

A new struct named SystemZVectorConstantInfo is used to cache the results of
isVectorConstantLegal() and pass them onto loadVectorConstant().

Support for fp128 constants in the presence of FeatureVectorEnhancements1
(z14) has been added.

Review: Ulrich Weigand
https://reviews.llvm.org/D58270

llvm-svn: 354896
2019-02-26 16:47:59 +00:00
Simon Atanasyan 8cb497027d [mips] Emit `.module softfloat` directive
This change fixes crash on an assertion in case of using
`soft float` ABI for mips32r6 target.

llvm-svn: 354882
2019-02-26 14:45:17 +00:00
Igor Kudrin 2d3faad706 [llvm-objdump] Implement -Mreg-names-raw/-std options.
The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.

The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
  which is the default behavior of llvm-objdump.

Differential Revision: https://reviews.llvm.org/D57680

llvm-svn: 354870
2019-02-26 12:15:14 +00:00
Luke Cheeseman 9e285bef2b [ARM] Add Cortex-M35P
- Add LLVM backend support for Cortex-M35P
- Documentation can be found at
  https://developer.arm.com/products/processors/cortex-m/cortex-m35p

Differentail Revision: https://reviews.llvm.org/D57763

llvm-svn: 354868
2019-02-26 12:02:12 +00:00
Dan Gohman c71132c0be [WebAssembly] Properly align fp128 arguments in outgoing varargs arguments
For outgoing varargs arguments, it's necessary to check the OrigAlign field
of the corresponding OutputArg entry to determine argument alignment, rather
than just computing an alignment from the argument value type. This is
because types like fp128 are split into multiple argument values, with
narrower types that don't reflect the ABI alignment of the full fp128.

This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase.

Differential Revision: https://reviews.llvm.org/D58656

llvm-svn: 354846
2019-02-26 05:20:19 +00:00
Philip Reames 38b14e33a8 [ARM] Be super conservative about atomics
As requested during review of D57601 <https://reviews.llvm.org/D57601> https://reviews.llvm.org/D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Differential Revision: https://reviews.llvm.org/D58490

Note: D58498 landed in several pieces as individual backends were approved.  This is the last chunk.
llvm-svn: 354845
2019-02-26 04:30:33 +00:00
Heejin Ahn d2a56ac661 [WebAssembly] Fix a bug deleting instruction in a ranged for loop
Summary: We shouldn't delete elements while iterating a ranged for loop.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58519

llvm-svn: 354844
2019-02-26 04:08:49 +00:00
Reid Kleckner 2f055f026a [X86] Fix bug in x86_intrcc with arg copy elision
Summary:
Use a custom calling convention handler for interrupts instead of fixing
up the locations in LowerMemArgument. This way, the offsets are correct
when constructed and we don't need to account for them in as many
places.

Depends on D56883

Replaces D56275

Reviewers: craig.topper, phil-opp

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56944

llvm-svn: 354837
2019-02-26 02:11:25 +00:00
Matt Arsenault 752579736e RegBankSelect: Handle slightly more complex value mappings
Try to use concat_vectors. Also remove unnecessary assert on
pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers
for AMDGPU.

llvm-svn: 354828
2019-02-25 22:24:13 +00:00
Matt Arsenault f4bfe4cd17 AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes
llvm-svn: 354825
2019-02-25 21:32:48 +00:00
Matt Arsenault 82b103998b AMDGPU/GlobalISel: Clamp max implicit_def elements
llvm-svn: 354818
2019-02-25 20:46:06 +00:00
Matt Arsenault f97ace5639 AMDGPU: Remove IntrReadMem from memtime/memrealtime intrinsics
EarlyCSE with MemorySSA was able to use this to merge multiple calls
with no intervening store.

llvm-svn: 354814
2019-02-25 20:16:11 +00:00
Craig Topper 316c58e8f1 [X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it
If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case.

Differential Revision: https://reviews.llvm.org/D58475

llvm-svn: 354811
2019-02-25 19:42:47 +00:00
Matt Arsenault fd6fd00773 AMDGPU: Correct definitions for bitset instructions
These really read and write the result register, so these need a tied
input.

llvm-svn: 354809
2019-02-25 19:24:46 +00:00
Nikita Popov fcbd7f6495 [Mips] Fix missing masking in fast-isel of br (PR40325)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending
(and x, 1) the condition before branching on it.

To avoid regressing trivial cases, I'm combining emission of cmp+br
sequences for the single-use + same block case (similar to what we
do in x86). icmpbr1.ll still regresses due to the cross-bb usage
of the condition.

Differential Revision: https://reviews.llvm.org/D58576

llvm-svn: 354808
2019-02-25 18:54:17 +00:00
Amara Emerson 6bcfa1c419 [AArch64][GlobalISel] Refactor selectBuildVector to use MachineIRBuilder. NFC.
This is a preparatory change as I want to use emitScalarToVector() elsewhere,
and in general we want to transition to MIRBuilder instead of using BuildMI
directly.

Differential Revision: https://reviews.llvm.org/D58528

llvm-svn: 354807
2019-02-25 18:52:54 +00:00
Philip Reames a64de6720b [Lanai] Be super conservative about atomics
As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.

llvm-svn: 354800
2019-02-25 17:36:10 +00:00
David Green b504f104b2 [ARM] Add some more missing T1 opcodes for the peephole optimisier
This adds a few extra Thumb1 opcodes to improve the peephole opimisers
ability to remove redundant cmp instructions. tADC and tSBC require
a small fixup to prevent MOVS being moved past the instruction, giving
the wrong flags.

Differential Revision: https://reviews.llvm.org/D58281

llvm-svn: 354791
2019-02-25 15:50:54 +00:00
Luke Cheeseman 59f77e7891 [AArch64] Add support for Cortex-A76 and Cortex-A76AE
- Add LLVM backend support for Cortex-A76 and Cortex-A76AE
- Documentation can be found at
  https://developer.arm.com/products/processors/cortex-a/cortex-a76

llvm-svn: 354788
2019-02-25 15:08:27 +00:00
Simon Pilgrim c61f1e8e6c [X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483)
Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results.

Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB.

Differential Revision: https://reviews.llvm.org/D58597

llvm-svn: 354771
2019-02-25 11:19:37 +00:00
Simon Tatham b70fc0c5fd [ARM] Make fullfp16 instructions not conditionalisable.
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.

In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.

(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)

So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.

I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.

Reviewers: SjoerdMeijer, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57823

llvm-svn: 354768
2019-02-25 10:39:53 +00:00
Kang Zhang 4faa4090c9 [PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts
Summary:
Fast selection of llvm fptoi & fptrunc instructions is not handled well about
VSX instruction support.
We'd use VSX float convert integer instruction instead of non-vsx float convert
integer instruction if the operand register class is VSSRC or VSFRC because i32
and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is
openeded.
For float trunc instruction, we do this silimar work like float convert integer
instruction to try to use VSX instruction.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D58430

llvm-svn: 354762
2019-02-25 02:46:16 +00:00
Simon Pilgrim cfaf663a35 [X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637)
Its proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step.

llvm-svn: 354757
2019-02-24 19:57:52 +00:00
Craig Topper 3fe4bd464c [X86] Fix tls variable lowering issue with large code model
Summary:
The problem here is the lowering for tls variable. Below is the DAG for the code.
SelectionDAG has 11 nodes:

t0: ch = EntryToken
      t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64
        t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
      t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
    t12: i64 = add t8, t11
  t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64
t6: ch = CopyToReg t0, Register:i32 %0, t4
And when mcmodel is large, below instruction can NOT be folded.

  t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0"

When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails.

The patch is to fold the load and X86ISD::WrapperRIP.

Fixes PR26906

Patch by LuoYuanke

Reviewers: craig.topper, rnk, annita.zhang, wxiao3

Reviewed By: rnk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58336

llvm-svn: 354756
2019-02-24 19:33:37 +00:00
Craig Topper 5532a98737 [X86][SSE] Use pblendw for v4i32/v2i64 during isel.
Summary:

Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was a integer. So use pblendw instead.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58574

llvm-svn: 354755
2019-02-24 19:23:41 +00:00
Craig Topper ce2bd19c49 [X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and Skylake.
Summary:
The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate.

The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate.

Reviewers: RKSimon, andreadb

Reviewed By: RKSimon

Subscribers: gbedwell, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58581

llvm-svn: 354754
2019-02-24 19:23:39 +00:00
Craig Topper be3348573e [LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents
Summary:
When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.

We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.

Reviewers: spatel, RKSimon, nikic

Reviewed By: nikic

Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58567

llvm-svn: 354753
2019-02-24 19:23:36 +00:00
Simon Pilgrim 4f4f9abdfa [X86][AVX] Rename lowerShuffleByMerging128BitLanes to lowerShuffleAsLanePermuteAndRepeatedMask. NFC.
Name better matches the other similar 'lane permute' and 'repeated mask' functions we have.

llvm-svn: 354749
2019-02-24 17:30:06 +00:00
Heejin Ahn 20cf0749cb [WebAssembly] Rename a variable in CFGStackify (NFC)
llvm-svn: 354744
2019-02-24 08:30:06 +00:00
Heejin Ahn 25d924b41f [WebAssembly] Merge two identical switch case routines into one (NFC)
llvm-svn: 354743
2019-02-24 08:19:55 +00:00
Philip Reames 33d7e49bb7 [Hexagon, SystemZ] Be super conservative about atomics
As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.  

llvm-svn: 354740
2019-02-24 00:45:09 +00:00
Craig Topper be9eeb5526 Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
And its follow ups r354511, r354640.

A follow patch will fix the issue that caused it to be reverted.

llvm-svn: 354737
2019-02-23 21:41:42 +00:00
Craig Topper ccc860cb81 Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract"
r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."

These were reverted in r354713 as their context depended on other patches that were reverted for a bug.

llvm-svn: 354734
2019-02-23 19:51:32 +00:00
Nikita Popov e661f946a7 [WebAssembly] Fix select of and (PR40805)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by
patterns added in D53676.

I'm removing the patterns entirely here, as they are not correct
in the general case. If necessary something more specific can be
added in the future.

Differential Revision: https://reviews.llvm.org/D58575

llvm-svn: 354733
2019-02-23 18:59:01 +00:00
Simon Pilgrim f383a47b7d [X86][AVX] combineInsertSubvector - remove concat_vectors(load(x),load(x)) --> sub_vbroadcast(x)
D58053/rL354340 added this to EltsFromConsecutiveLoads directly

llvm-svn: 354732
2019-02-23 18:53:03 +00:00
Simon Pilgrim 398d0b9e96 Fix MSVC constant truncation warnings. NFCI.
llvm-svn: 354731
2019-02-23 18:49:02 +00:00
Simon Pilgrim e08f177ea2 [X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x)
For AVX1, limit this to i32/f32/i64/f64 loading cases only.

llvm-svn: 354730
2019-02-23 18:34:05 +00:00
Simon Pilgrim 31793733a0 [X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place
Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 its just a single VPERMPD), followed by a BLENDPD.

llvm-svn: 354729
2019-02-23 17:10:47 +00:00
Craig Topper 75afc0105c [X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel.
Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t.

This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost.

The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.

llvm-svn: 354724
2019-02-23 08:34:10 +00:00
Jordan Rupprecht 6387fa2715 [NFC] Fix typos: preceeding -> preceding
llvm-svn: 354715
2019-02-23 01:28:32 +00:00
Reid Kleckner e3876637cf Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
r354363 caused https://crbug.com/934963#c1, which has a plain C reduced
test case.

I also had to revert some dependent changes:
- r354648
- r354647
- r354640
- r354511

llvm-svn: 354713
2019-02-23 01:19:42 +00:00
Craig Topper a9697f24cf [X86] Enable custom splitting of v8i64/v16i32 sext/zext for avx/avx2 when input type will be promoted by the type legalize to 128-bits.
If the the input type will be promoted to 128 bits its better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register.

Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization.

llvm-svn: 354709
2019-02-23 00:35:02 +00:00
Konstantin Zhuravlyov 9a278bf6b5 Revert "AMDGPU/NFC: Cleanup subtarget predicates"
It breaks one of our downstream merges, so revert it
temporarily while investigating failures downstream

llvm-svn: 354700
2019-02-22 23:21:06 +00:00
Sam Clegg 8fffa1dfa3 [WebAssembly] Remove unneeded MCSymbolRefExpr variants
We record the type of the symbol (event/function/data/global) in the
MCWasmSymbol and so it should always be clear how to handle a relocation
based on the symbol itself.

The exception is a function which still needs the special @TYPEINDEX
then the relocation contains the signature rather than the address
of the functions.

Differential Revision: https://reviews.llvm.org/D58472

llvm-svn: 354697
2019-02-22 22:29:34 +00:00
Matt Arsenault 476e26b5d3 AMDGPU: Use removeAllRegUnitsForPhysReg
llvm-svn: 354686
2019-02-22 19:03:36 +00:00
Sam Clegg a5e68748bf [WebAssembly] Remove debug statement submitted in rL354657
Subscribers: dschuff, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58549

llvm-svn: 354684
2019-02-22 19:00:03 +00:00
Sanjay Patel a9e289174a [x86] allow narrowing of vector UINT_TO_FP
As discussed in:
D56864
D58197

Always use the narrow (128-bit) instruction when possible.
We already had the signed int version of this transform.

llvm-svn: 354675
2019-02-22 15:47:45 +00:00
Sanjay Patel 1baf7896cc [x86] simplify code in combineExtractSubvector; NFC
Only the 1st fold is attempted pre-legalization, but it requires
legal (simple) types too, so we don't need an EVT in any of the code.

llvm-svn: 354674
2019-02-22 15:28:22 +00:00
Petar Jovanovic 6083106b12 [mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM
Filling a delay slot in 32bit jump instructions with a 16bit instruction
can cause issues. According to the documentation such an operation is
unpredictable.
This patch adds opcode Mips::PseudoIndirectBranch_MM alongside
Mips::PseudoIndirectBranch and other instructions that are expanded to jr
instruction and do not allow a 16bit instruction in their delay slots.

Patch by Mirko Brkusanin.

Differential Revision: https://reviews.llvm.org/D58507

llvm-svn: 354672
2019-02-22 14:53:58 +00:00
David Green acb628b2af [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs
This adds a number of missing Thumb1 opcodes so that the peephole optimiser can
remove redundant CMP instructions.

Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri
instruction can be used with frame indices. In thumb1 we use tADDframe.

Differential Revision: https://reviews.llvm.org/D57833

llvm-svn: 354667
2019-02-22 12:23:31 +00:00
Diana Picus 35e1c6663c [ARM GlobalISel] Support floating point for Thumb2
This is exactly the same as arm mode, so for the instruction selector
tests we just extract them to a new file and run with the same checks
for both arm and thumb mode.

For the legalizer we need to update the tests for soft float a bit, but
only because BL and tBL are slightly different. We could be pedantic and
check that we get a well-formed BL for arm mode and a tBL for thumb, but
for the purposes of the legalizer test it's sufficient to just skip over
the predicate operands in the checks. Also note that we have the
pedantic checks in the divmod test, so we're covered.

llvm-svn: 354665
2019-02-22 09:54:54 +00:00
Heejin Ahn 85631d8b50 [WebAssembly] Remove getBottom function from CFGStackify (NFC)
Summary:
This removes `getBottom` function and the bookeeping map of <begin
marker instruction, bottom BB>.

Reviewers: dschuff

Subscribers: sunfish, sbc100, jgravelle-google, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58319

llvm-svn: 354657
2019-02-22 07:19:30 +00:00
Craig Topper 3a391fc0e8 [X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit.
Type legalization is causing two nodes to be created here, but we can use a single node to extend from v8i16 to v2i64.

llvm-svn: 354648
2019-02-22 01:49:53 +00:00
Craig Topper 427404c769 [X86] Fix some copy/paste mistakes that caused a VR128 to be used as the address of a load in an isel pattern
This was introduced in r354511.

Fixes PR40811.

llvm-svn: 354640
2019-02-22 00:04:35 +00:00
Matt Arsenault aa6fb4c45e AMDGPU: Remove debugger related subtarget features
As far as I know these aren't needed anymore.

llvm-svn: 354634
2019-02-21 23:27:46 +00:00
Craig Topper 2b34fdc67f [X86] Remove hasSideEffects=1 from the X87 pseudos with folded load.
This was done in r321424 to prevent scheduling from reordering things. But now that we model FPCW as a dependency, I don't think the same scheduling we were trying to prevent can occur.

llvm-svn: 354628
2019-02-21 22:00:15 +00:00
Konstantin Zhuravlyov c2650178a1 AMDGPU/NFC: Cleanup subtarget predicates
Differential Revision: https://reviews.llvm.org/D58522

llvm-svn: 354620
2019-02-21 20:43:43 +00:00
Sanjay Patel 234a5e8ea4 [x86] vectorize more cast ops in lowering to avoid register file transfers
This is a follow-up to D56864.

If we're extracting from a non-zero index before casting to FP,
then shuffle the vector and optionally narrow the vector before doing the cast:

cast (extelt V, C) --> extelt (cast (extract_subv (shuffle V, [C...]))), 0

This might be enough to close PR39974:
https://bugs.llvm.org/show_bug.cgi?id=39974

Differential Revision: https://reviews.llvm.org/D58197

llvm-svn: 354619
2019-02-21 20:40:39 +00:00
Amara Emerson 1abe05c0dd Re-land "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR""
Thanks to Richard Trieu for pointing out that the failures were due to a
use-after-free of an ArrayRef.

llvm-svn: 354616
2019-02-21 20:20:16 +00:00
Krzysztof Parzyszek f6e875bacf [Hexagon] Use misaligned load instead of trap0(#0) for __builtin_trap
The trap instruction is intercepted by various runtime environments,
and instead of a crash it creates confusion.

This reapplies r354606 with a fix.

llvm-svn: 354611
2019-02-21 19:42:39 +00:00
Krzysztof Parzyszek 948c9f93c4 Revert r354606, it breaks asan tests
llvm-svn: 354609
2019-02-21 19:33:58 +00:00
Krzysztof Parzyszek 5f47fac3a2 [Hexagon] Use misaligned load instead of trap0(#0) for __builtin_trap
The trap instruction is intercepted by various runtime environments,
and instead of a crash it creates confusion.

llvm-svn: 354606
2019-02-21 18:39:22 +00:00
Mark Searles 599ce44d3f [AMDGPU] remove unused AssemblerPredicates
An internal build is hitting asserts complaining about too many subtarget
features:
  llvm/utils/TableGen/Types.cpp:42:
    const char* llvm::getMinimalTypeForEnumBitfield(uint64_t):
    Assertion `MaxIndex <= 64 && "Too many bits"' failed.

  llvm/utils/TableGen/AsmMatcherEmitter.cpp:1476:
    void {anonymous}::AsmMatcherInfo::buildInfo():
    Assertion `SubtargetFeatures.size() <= 64 && "Too many subtarget features!"'
    failed.

The short-term solution is to remove a few unused AssemblerPredicates to get
under the limit.

The long-term solution seems to be to revisit these asserts. E.g., rather than
hardcoded '64', use the standard sized std::bitset like the other places that
track subtarget features.

Differential Revision: https://reviews.llvm.org/D58516

llvm-svn: 354604
2019-02-21 18:19:54 +00:00
Matt Arsenault 2e0ee47712 AMDGPU/GlobalISel: Make phis legal
llvm-svn: 354592
2019-02-21 15:48:13 +00:00
Nirav Dave dce91c1edb [X86] Fix copy-paste error in @ccz flag.
@ccz operand should be equivalent to @cce.

llvm-svn: 354588
2019-02-21 15:28:31 +00:00
Matt Arsenault b10fa8df3f AMDGPU/GlobalISel: Fix bit count ops for non-power-of-2 types
llvm-svn: 354587
2019-02-21 15:22:20 +00:00
Alex Bradbury db67be889d [RISCV][NFC] IsEligibleForTailCallOptimization -> isEligibleForTailCallOptimization
Also clang-format the modified hunks.

llvm-svn: 354584
2019-02-21 14:31:41 +00:00
Alex Bradbury 047170cfc3 [RISCV] Add implied zero offset load/store alias patterns
Allow load/store instructions with implied zero offset for compatibility with
GNU assembler.

Differential Revision: https://reviews.llvm.org/D57141
Patch by James Clarke.

llvm-svn: 354581
2019-02-21 14:09:34 +00:00
Diana Picus dcaa939ab7 [ARM GlobalISel] Support G_FRAME_INDEX for Thumb2
Same as arm mode.

llvm-svn: 354579
2019-02-21 13:00:02 +00:00
Simon Pilgrim e6b338cbef [X86][SSE] combineX86ShufflesRecursively - moved to generic op input index lookup. NFCI.
We currently bail if the target shuffle decodes to more than 2 input vectors, this change alters the input index to work for any number of inputs for when we drop that requirement.

llvm-svn: 354575
2019-02-21 12:24:49 +00:00
David Green 7a183a86be Revert 354564: [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs
I believe it's causing bootstrap failures for A32 code. I'll take a look at
what's wrong.

llvm-svn: 354569
2019-02-21 11:03:13 +00:00
David Spickett 8ac2b181a1 [AArch64] Print instruction before atomic semantic annotations
Commit r353303 added annotations when acquire semantics
were dropped from an instruction.

printAnnotation was called before printInstruction.
So if you didn't set a separate comment output stream
you got <comment><instr> instead of <instr><comment>
as expected.

To fix this move the new printAnnotation to after
the instruction is printed.

Differential Revision: https://reviews.llvm.org/D58059

llvm-svn: 354565
2019-02-21 10:42:49 +00:00
David Green 89efe24eba [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs
This adds a number of missing Thumb1 opcodes so that the peephole optimiser can
remove redundant CMP instructions.

Differential Revision: https://reviews.llvm.org/D57833

llvm-svn: 354564
2019-02-21 10:30:09 +00:00
Sam Parker 6ed47bee27 [ARM] Negative constants mishandled in ARM CGP
During type promotion, sometimes we convert negative an add with a
negative constant into a sub with a positive constant. The loop that
performs this transformation has two issues:
- it iterates over a set, causing non-determinism.
- it breaks, instead of continuing, when it finds the first
  non-negative operand.

Differential Revision: https://reviews.llvm.org/D58452

llvm-svn: 354557
2019-02-21 09:33:18 +00:00
Sam Clegg 1634516e35 [WebAssembly] Default to something reasonable in WebAssemblyAddMissingPrototypes
Previously if we couldn't derive a prototype for a "no-prototype"
function from C we would leave it as is:

  void foo(...)

With this change we instead give is an empty signature and remove
the "no-prototype" attribute.

This fixes the current wasm waterfall test failure.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58488

llvm-svn: 354544
2019-02-21 03:27:00 +00:00
Stanislav Mekhanoshin 42e229e130 [AMDGPU] fix commuted case of sub combine
Differential Revision: https://reviews.llvm.org/D58481

llvm-svn: 354543
2019-02-21 02:58:00 +00:00
Amara Emerson 71f2a5e60f Revert "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR"
This reverts r354521 because it broke the bots, but passes on Darwin somehow.

llvm-svn: 354532
2019-02-21 00:31:13 +00:00
Sam Clegg 6028c969ac [WebAssembly] Don't error on conflicting uses of prototype-less functions
When we can't determine with certainty the signature of a function
import we pick the fist signature we find rather than error'ing out.

The resulting program might not do what is expected since we might pick
the wrong signature.  However since undefined behavior in C to use the
same function with different signatures this seems better than refusing
to compile such programs.

Fixes PR40472

Differential Revision: https://reviews.llvm.org/D58304

llvm-svn: 354523
2019-02-20 22:40:57 +00:00
Amara Emerson a946d057b4 [AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR
This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and
implements them with a very pessimistic TBL2 instruction in the selector.

For TBL2, support is also needed to generate constant pool entries and load from
them in order to materialize the mask register.

Currently supports <2 x s64> and <4 x s32> result types.

Differential Revision: https://reviews.llvm.org/D58466

llvm-svn: 354521
2019-02-20 22:11:39 +00:00
Tom Stellard 79b5c3842b AMDGPU/GlobalISel: Move SMRD selection logic to TableGen
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52922

llvm-svn: 354516
2019-02-20 21:02:37 +00:00
Nikita Popov c3b496de7a [SDAG] Support vector UMULO/SMULO
Second part of https://bugs.llvm.org/show_bug.cgi?id=40442.

This adds an extra UnrollVectorOverflowOp() method to SDAG, because
the general UnrollOverflowOp() method can't deal with multiple results.

Additionally we need to expand UMULO/SMULO during vector op
legalization, as it may result in unrolling, which may need additional
type legalization.

Differential Revision: https://reviews.llvm.org/D57997

llvm-svn: 354513
2019-02-20 20:41:44 +00:00
Craig Topper 31823fba2e [X86] Add more load folding patterns for blend instructions as a follow up to r354363.
This avoids depending on the peephole pass to do load folding.

Also adds some load folding for some insert_subvector patterns that use blend.

All of this was found by temporarily adding TB_NO_FORWARD to the blend immediate entries in the load folding tables.

I've added -disable-peephole to some of the affected tests from that experiment to ensure we're testing isel patterns.

llvm-svn: 354511
2019-02-20 20:18:20 +00:00
Simon Pilgrim dca47c659c [X86][SSE] combineX86ShufflesRecursively - begin generalizing the number of shuffle inputs. NFCI.
We currently bail if the target shuffle decodes to more than 2 input vectors, this is some initial cleanup that still has the limit but generalizes the opindices to an array that will be necessary when we drop the limit.

llvm-svn: 354489
2019-02-20 17:58:29 +00:00
Matt Arsenault 75e30c4d5d GlobalISel: Fix fewerElementsVector for ctlz with different result type
Also complete the set of related operations.

llvm-svn: 354480
2019-02-20 16:42:52 +00:00
Matt Arsenault c4d07554e4 GlobalISel: Implement moreElementsVector for g_insert results
llvm-svn: 354477
2019-02-20 16:11:22 +00:00
Krzysztof Parzyszek 6128ac5a8f [Hexagon] Split vector pairs for ISD::SIGN_EXTEND and ISD::ZERO_EXTEND
llvm-svn: 354473
2019-02-20 15:05:19 +00:00
Petar Avramovic dee5846b4a [MIPS MSA] Avoid some DAG combines for vector shifts
DAG combiner combines two shifts into shift + and with bitmask.
Avoid such combines for vectors since leaving two vector shifts
as they are produces better end results.

Differential Revision: https://reviews.llvm.org/D58225

llvm-svn: 354461
2019-02-20 13:42:44 +00:00
Craig Topper e4025c5eb1 [X86] Remove FeatureSlowIncDec from Sandy Bridge and later Intel Core CPUs
Summary:
Inc and Dec were at one point slow on Intel CPUs due to their tendency to cause partial flag stalls on P6 derived CPU cores. This is because these instructions are defined to preserve the carry flag. This partial flag stall issue persisted until Sandy Bridge when flag merging was changed to be handled as a data dependency instead of as a stall until retirement. Sandy Bridge and later CPUs rename the C flag separately from OSPAZ so there is no flag merge needed on INC/DEC to preserve the C flag.

Given these improvements I don't know why INC/DEC was ever considered slow on Sandy Bridge. If anything they should have been disabled on the earlier CPUs instead.

Note after this patch, INC/DEC are still considered slow on Silvermont, Goldmont, Knights Landing and our generic "x86-64" CPU.

Reviewers: spatel, RKSimon, chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D58412

llvm-svn: 354436
2019-02-20 05:39:11 +00:00
Eric Christopher 2534592b9f Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)"
As this has broken the lto bootstrap build for 3 days and is
showing a significant regression on the Dither_benchmark results (from
the LLVM benchmark suite) -- specifically, on the
BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and
BENCHMARK_FLOYD_DITHER_512; the others are unchanged.  These have
regressed by about 28% on Skylake, 34% on Haswell, and over 40% on
Sandybridge.

This reverts commit r353923.

llvm-svn: 354434
2019-02-20 04:42:07 +00:00
Kito Cheng 303217e8b4 [RISCV] Implement pseudo instructions for load/store from a symbol address.
Summary:
Those pseudo-instructions are making load/store instructions able to
load/store from/to a symbol, and its always using PC-relative addressing
to generating a symbol address.

Reviewers: asb, apazos, rogfer01, jrtc27

Differential Revision: https://reviews.llvm.org/D50496

llvm-svn: 354430
2019-02-20 03:31:32 +00:00
Chen Zheng ffece2dfcf [PowerPC] exploit P9 instruction maddld.
Differential Revision: https://reviews.llvm.org/D58364

llvm-svn: 354427
2019-02-20 02:30:06 +00:00
Heejin Ahn 20ea1826f7 [WebAssembly] Refactor atomic operation definitions (NFC)
Summary:
- Make `ATOMIC_I`, `ATOMIC_NRI`, `AtomicLoad`, `AtomicStore` classes and
  make other operations inherit from them
- Factor the common opcode prefix '0xfe' out from the opcodes into the
  common class
- Reorder instructions in the order of increasing opcodes

Reviewers: tlively

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58338

llvm-svn: 354421
2019-02-20 01:29:34 +00:00
Heejin Ahn 3477bd12a0 [WebAssembly] Fix load/store name detection for atomic instructions
Summary:
Fixed a bug in the routine in AsmParser that determines whether the
current instruction is a load or a store. Atomic instructions' prefixes
are not `atomic_` but `atomic.`, and all atomic instructions are also
memory instructions. Also fixed the printing format of atomic
instructions to match other memory instructions and added encoding tests
for atomic instructions.

Reviewers: aardappel, tlively

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58337

llvm-svn: 354419
2019-02-20 01:14:36 +00:00
Wouter van Oortmerssen 8a28ce1a12 [WebAssembly] Fixed disassembler not knowing about OPERAND_EVENT
Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58414

llvm-svn: 354416
2019-02-20 00:55:59 +00:00
Thomas Lively 2e1504091e [WebAssembly] Update MC for bulk memory
Summary:
Rename MemoryIndex to InitFlags and implement logic for determining
data segment layout in ObjectYAML and MC. Also adds a "passive" flag
for the .section assembler directive although this cannot be assembled
yet because the assembler does not support data sections.

Reviewers: sbc100, aardappel, aheejin, dschuff

Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57938

llvm-svn: 354397
2019-02-19 22:56:19 +00:00
Craig Topper 8eade09249 [X86] Mark FP32_TO_INT16_IN_MEM/FP32_TO_INT32_IN_MEM/FP32_TO_INT64_IN_MEM as clobbering EFLAGS to prevent mis-scheduling during conversion from SelectionDAG to MIR.
After r354178, these instruction expand to a sequence that uses an OR instruction. That OR clobbers EFLAGS so we need to state that to avoid accidentally using the clobbered flags.

Our tests show the bug, but I didn't notice because the SETcc instructions didn't move after r354178 since it used to be safe to do the fp->int conversion first.

We should probably convert this whole sequence to SelectionDAG instead of a custom inserter to avoid mistakes like this.

Fixes PR40779

llvm-svn: 354395
2019-02-19 22:37:00 +00:00
Jinsong Ji 58bab8e690 PowerPC: Fix typos in comments
llvm-svn: 354382
2019-02-19 21:25:13 +00:00
Craig Topper 6d0190f21c [X86] Don't consider functions ABI compatible for ArgumentPromotion pass if they view 512-bit vectors differently.
The use of the -mprefer-vector-width=256 command line option mixed with functions
using vector intrinsics can create situations where one function thinks 512 vectors
are legal, but another fucntion does not.

If a 512 bit vector is passed between them via a pointer, its possible ArgumentPromotion
might try to pass by value instead. This will result in type legalization for the two
functions handling the 512 bit vector differently leading to runtime failures.

Had the 512 bit vector been passed by value from clang codegen, both functions would
have been tagged with a min-legal-vector-width=512 function attribute. That would
make them be legalized the same way.

I observed this issue in 32-bit mode where a union containing a 512 bit vector was
being passed by a function that used intrinsics to one that did not. The caller
ended up passing in zmm0 and the callee tried to read it from ymm0 and ymm1.

The fix implemented here is just to consider it a mismatch if two functions
would handle 512 bit differently without looking at the types that are being
considered. This is the easist and safest fix, but it can be improved in the future.

Differential Revision: https://reviews.llvm.org/D58390

llvm-svn: 354376
2019-02-19 20:12:20 +00:00
Simon Pilgrim 0b3b9424ca [X86][SSE] Generalize X86ISD::BLENDI support to more value types
D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required.

With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors.

This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts.

I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case).

The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV.

Reapplied after reversion at rL353699 - AVX2 isel fix was applied at rL354358, additional test at rL354360/rL354361

Differential Revision: https://reviews.llvm.org/D57888

llvm-svn: 354363
2019-02-19 18:05:42 +00:00
Simon Pilgrim dce9c2a811 [X86][AVX2] Hide VPBLENDD instructions behind AVX2 predicate
This was the cause of the regression in D57888 - the commuted load pattern wasn't hidden by the predicate so once we enabled v4i32 blends on SSE41+ targets then isel was incorrectly matched against AVX2+ instructions.

llvm-svn: 354358
2019-02-19 17:23:55 +00:00
Craig Topper 51a2e88990 [X86] Bugfix for nullptr check by klocwork
klocwork critical issues in CG files:

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D58363

llvm-svn: 354357
2019-02-19 17:16:23 +00:00
Craig Topper d8acfe69f0 X86AsmParser AVX-512: Return error instead of hitting assert
When parsing a sequence of tokens beginning with {, it will hit an assert and crash if the token afterwards is not an identifier. Instead of this, return a more verbose error as seen elsewhere in the function.

Patch by Brandon Jones (BrandonTJones)

Differential Revision: https://reviews.llvm.org/D57375

llvm-svn: 354356
2019-02-19 17:13:40 +00:00
Craig Topper 236e1ce1d9 [X86] Filter out tuning feature flags and a few ISA feature flags when checking for function inline compatibility.
Tuning flags don't have any effect on the available instructions so aren't a good reason to prevent inlining.

There are also some ISA flags that don't have any intrinsics our ABI requirements that we can exclude. I've put only the most basic ones like cmpxchg16b and lahfsahf. These are interesting because they aren't present in all 64-bit CPUs, but we have codegen workarounds when they aren't present.

Loosening these checks can help with scenarios where a caller has a more specific CPU than a callee. The default tuning flags on our generic 'x86-64' CPU can currently make it inline compatible with other CPUs. I've also added an example test for 'nocona' and 'prescott' where 'nocona' is just a 64-bit capable version of 'prescott' but in 32-bit mode they should be completely compatible.

I've based the implementation here of the similar code in AMDGPU.

Differential Revision: https://reviews.llvm.org/D58371

llvm-svn: 354355
2019-02-19 17:05:11 +00:00
Matt Arsenault b4c95b338b GlobalISel: Implement moreElementsVector for select
llvm-svn: 354354
2019-02-19 17:03:09 +00:00
Matt Arsenault 4d88427a58 GlobalISel: Implement moreElementsVector for G_EXTRACT source
llvm-svn: 354348
2019-02-19 16:44:22 +00:00
Simon Pilgrim 9d575db85e [X86][AVX] Update VBROADCAST folds to always use v2i64 X86vzload
The VBROADCAST combines and SimplifyDemandedVectorElts improvements mean that we now more consistently use shorter (128-bit) X86vzload input operands.

Follow up to D58053

llvm-svn: 354346
2019-02-19 16:33:17 +00:00
Matt Arsenault 26b7e859ef GlobalISel: Implement moreElementsVector for bit ops
llvm-svn: 354345
2019-02-19 16:30:19 +00:00
Simon Pilgrim d6add74915 Cast from SDValue directly instead of superfluous getNode(). NFCI.
llvm-svn: 354343
2019-02-19 16:20:09 +00:00
Simon Pilgrim 952abcefe4 [X86][AVX] EltsFromConsecutiveLoads - Add BROADCAST lowering support
This patch adds scalar/subvector BROADCAST handling to EltsFromConsecutiveLoads.

It mainly shows codegen changes to 32-bit code which failed to handle i64 loads, although 64-bit code is also using this new path to more efficiently combine to a broadcast load.

Differential Revision: https://reviews.llvm.org/D58053

llvm-svn: 354340
2019-02-19 15:57:09 +00:00
Alex Bradbury 6aae216109 [RISCV][NFC] Move some std::string to StringRef
llvm-svn: 354333
2019-02-19 14:42:00 +00:00
Diana Picus 19dbc6245f [ARM GlobalISel] Support G_PHI for Thumb2
Same as arm mode.

llvm-svn: 354310
2019-02-19 10:26:47 +00:00
Craig Topper c81e0c67ba [X86] Remove command line strings from the ProcIntel* features.
These should always follow the CPU string. There's no reason to control them independently.

llvm-svn: 354304
2019-02-19 03:04:14 +00:00
Jessica Paquette b53e0f4b81 [GlobalISel][AArch64] Legalize + select some llvm.ctlz.* intrinsics
Legalize/select llvm.ctlz.*

Add select-ctlz to show that we actually select them. Update arm64-clrsb.ll and
arm64-vclz.ll to show that we perform valid transformations in optimized builds,
and document where GISel can improve.

Differential Revision: https://reviews.llvm.org/D58155

llvm-svn: 354299
2019-02-18 23:33:24 +00:00
Sanjay Patel d8b4efcb6b [CGP] form usub with overflow from sub+icmp
The motivating x86 cases for forming the intrinsic are shown in PR31754 and PR40487:
https://bugs.llvm.org/show_bug.cgi?id=31754
https://bugs.llvm.org/show_bug.cgi?id=40487
..and those are shown in the IR test file and x86 codegen file.

Matching the usubo pattern is harder than uaddo because we have 2 independent values rather than a def-use.

This adds a TLI hook that should preserve the existing behavior for uaddo formation, but disables usubo
formation by default. Only x86 overrides that setting for now although other targets will likely benefit
by forming usbuo too.

Differential Revision: https://reviews.llvm.org/D57789

llvm-svn: 354298
2019-02-18 23:33:05 +00:00
Changpeng Fang 4cabf6d3b5 AMDGPU: Use MachineInstr::mayAlias to replace areMemAccessesTriviallyDisjoint in LoadStoreOptimizer pass.
Summary:
  This is to fix a memory dependence bug in LoadStoreOptimizer.

Reviewers:
  arsenm, rampitec

Differential Revision:
  https://reviews.llvm.org/D58295

llvm-svn: 354295
2019-02-18 23:00:26 +00:00
Matt Arsenault fbe92a53d0 GlobalISel: Implement widenScalar for g_extract scalar results
llvm-svn: 354293
2019-02-18 22:39:27 +00:00
Sanjay Patel fff628274d [x86] split more v8f32/v8i32 shuffles in lowering
Similar to D57867 - this is a small patch with lots of test diffs.
With half-vector-width narrowing potential, using an extract + 128-bit vshufps
is a win because it replaces a 256-bit shuffle with a 128-bit shufle.

This seems like it should be a win even for targets with 'fast-variable-shuffle',
but we are intentionally deferring that to an independent change to make sure
that is true.

Differential Revision: https://reviews.llvm.org/D58181

llvm-svn: 354279
2019-02-18 16:46:12 +00:00
Craig Topper ce3c5ac6a6 [X86] In FP_TO_INTHelper, when moving data from SSE register to X87 register file via the stack, use the same stack slot we use for the integer conversion.
No need for a separate stack slot. The lifetimes don't overlap.

Also fix the MachinePointerInfo for the final load after the integer conversion to indicate it came from the stack slot.

llvm-svn: 354234
2019-02-17 19:23:49 +00:00
Craig Topper db5aa955cb [X86] When type legalizing the result of a i64 fp_to_uint on 32-bit targets. Generate all of the ops as i64 and let them be legalized.
No need to manually split everything. We can let the type legalizer work for us.

The test change seems to be caused by some DAG ordering issue that was previously circumventing a one use check in LowerSELECT where FP selects are turned into blends if the setcc has one use. But it was running after an integer select and the same setcc had been legalized to cmov and X86SISD::CMP. This dropped the use count of the setcc, but wasn't what was intended.

llvm-svn: 354197
2019-02-16 08:25:42 +00:00
Craig Topper 61da80584d [X86] Don't prevent load folding for cvtsi2ss/cvtsi2sd based on hasPartialRegUpdate.
Preventing the load fold won't fix the partial register update since the
input we can fold is a GPR. So it will do nothing to prevent a false dependency
on an XMM register.

llvm-svn: 354193
2019-02-16 03:34:54 +00:00
Craig Topper db2f084aa9 [X86] Don't set exception mask bits when modifying FPCW to change rounding mode for fp->int conversion
When we need to do an fp->int conversion using x87 instructions, we need to temporarily change the rounding mode to 0b11 and perform a store. To do this we save the old value of the fpcw to the stack, then set the fpcw to 0xc7f, do the store, then restore fpcw. But the 0xc7f value forces the exception mask bits 1. While this is what they would be in the default FP environment, as we move to support changing the FP environments, we shouldn't make this assumption.

This patch changes the code to explicitly OR 0xc00 with the old value so that only the rounding mode is changed. Unfortunately, this requires two stack temporaries instead of one. One to hold the old value and one to hold the new value. Without two stack temporaries we would need an additional GPR. We already need one to do the OR operation in. This is similar to what gcc and icc do for this operation. Though they are both better at reusing the stack temporaries when there are multiple truncates in a function(or at least in a basic block)

Differential Revision: https://reviews.llvm.org/D57788

llvm-svn: 354178
2019-02-15 21:59:33 +00:00
Nirav Dave 7875841121 [X86] Fix LowerAsmOutputForConstraint.
Summary:
Update Flag when generating cc output.

Fixes PR40737.

Reviewers: rnk, nickdesaulniers, craig.topper, spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58283

llvm-svn: 354163
2019-02-15 20:01:55 +00:00
Craig Topper 9c6a9276da [X86] Move all the SSE legality checks out of FP_TO_INTHelper and up to LowerFP_TO_INT. NFCI
These checks aren't needed on the call to FP_TO_INTHelper from the type legalizer for splitting i64. We always want to use X87 FIST/FISTT to memory there.

Moving up the SSE checks will allow this routine to focus on what it cares about and makes its return semantics cleaner.

llvm-svn: 354161
2019-02-15 19:21:39 +00:00
Jonas Paulsson c0eef3542b Recommit "[SystemZ] Do not emit VEXTEND or VROUND nodes without vector support."
It seems there were some problem with using a .mir test. For some reason
doing '-stop-before=codegenprepare' and then '-start-before=codegenprepare'
on the output .mir file results in the NoVRegs Property after instruction
selection.

Recommitting the same test as an .ll file instead.

llvm-svn: 354160
2019-02-15 19:13:55 +00:00
Simon Pilgrim 6ce08672fb [X86][AVX] lowerShuffleAsLanePermuteAndPermute - fully populate the lane shuffle mask (PR40730)
As detailed on PR40730, we are not correctly filling in the lane shuffle mask (D53148/rL344446) - we fill in for the correct src lane but don't add it to the correct mask element, so any reference to the correct element is likely to see an UNDEF mask index.

This allows constant folding to propagate UNDEFs prior to the lane mask being (correctly) lowered to vperm2f128.

This patch fixes the issue by fully populating the lane shuffle mask - this is more than is necessary (if we only filled in the required mask elements we might be able to match other shuffle instructions - broadcasts etc.), but its the most cautious approach as this needs to be cherrypicked into the 8.0.0 release branch.

Differential Revision: https://reviews.llvm.org/D58237

llvm-svn: 354117
2019-02-15 11:39:21 +00:00
Diana Picus c0f964eb2f [ARM GlobalISel] Style fix. NFCI
Add the opcode for ADDrr / t2ADDrr to the Opcode cache, as we did for
all other opcodes where the handling is otherwise the same between arm
mode and thumb2.

llvm-svn: 354115
2019-02-15 10:50:02 +00:00
Diana Picus a00425ff0d [ARM GlobalISel] Support branches for Thumb2
Just like arm mode, but with different opcodes.

llvm-svn: 354113
2019-02-15 10:24:03 +00:00
Alex Bradbury 22531c4a14 [RISCV] Add assembler support for LA pseudo-instruction
This patch also introduces the emitAuipcInstPair helper, which is then used
for both emitLoadAddress and emitLoadLocalAddress.

Differential Revision: https://reviews.llvm.org/D55325
Patch by James Clarke.

llvm-svn: 354111
2019-02-15 09:53:32 +00:00
Alex Bradbury 8eb87e59a6 [RISCV] Support assembling %got_pcrel_hi operator
Differential Revision: https://reviews.llvm.org/D55279
Patch by James Clarke.

llvm-svn: 354110
2019-02-15 09:43:46 +00:00
Sam Parker 3c17cb7bc4 [ARM CGP] Fix ConvertTruncs
ConvertTruncs is used to replace a trunc for an AND mask, however
this function wasn't working as expected. By performing the change
later, we can create a wide type integer mask instead of a narrow -1
value, which could then be simply removed (incorrectly). Because we
now perform this action later, it's necessary to cache the trunc type
before we perform the promotion.

Differential Revision: https://reviews.llvm.org/D57686

llvm-svn: 354108
2019-02-15 09:04:39 +00:00
Matt Arsenault 2a5488b877 X86: Replace isSafeToClobberEFLAGS implementation
Also use modifiesRegister instead of looping over operands.

llvm-svn: 354098
2019-02-15 04:01:39 +00:00
Francis Visoiu Mistrih 3fb7d4f55f Revert "[SystemZ] Do not emit VEXTEND or VROUND nodes without vector support."
This reverts commit aa0b77d339.

This fails to pass the machine verifier:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/13579/

llvm-svn: 354096
2019-02-15 03:01:09 +00:00
Konstantin Zhuravlyov 1e126c503b AMDGPU: Set ABI version to 1 for code object v3
Differential Revision: https://reviews.llvm.org/D57811

llvm-svn: 354085
2019-02-14 23:56:04 +00:00
Matt Arsenault 530d05e94a GlobalISel: Add alignment to LegalityQuery MMOs
This allows targets to specify the minimum alignment required for the
load/store.

llvm-svn: 354071
2019-02-14 22:41:09 +00:00
Matt Arsenault 9e5e868d95 AMDGPU/GlobalISel: Fix RegBankSelect for GEP.
This is basically a pointer typed add, so shouldn't be any different.
This was assuming everything was an SGPR, which is not true.

Also cleanup legality for GEP. I don't seem to be seeing the problem
the hack marking s64 as a legal pointer type the comment mentions.

llvm-svn: 354067
2019-02-14 22:24:28 +00:00
Stanislav Mekhanoshin 871821f786 [AMDGPU] Ressociate 'add (add x, y), z' to use SALU
Reassociate adds to collect scalar operands in a single
instruction when possible. That will result in a scalar
add followed by vector instead of two vector adds, thus
better utilizing SALU.

Differential Revision: https://reviews.llvm.org/D58220

llvm-svn: 354066
2019-02-14 22:11:25 +00:00
Matt Arsenault d3d496338e AMDGPU/GlobalISel: Handle split for 64-bit VALU select
llvm-svn: 354065
2019-02-14 21:58:12 +00:00
Nirav Dave 5ffdc43dc9 [X86] cleanup inline asm register generation. NFCI.
llvm-svn: 354042
2019-02-14 18:06:21 +00:00
Jonas Paulsson aa0b77d339 [SystemZ] Do not emit VEXTEND or VROUND nodes without vector support.
Review: Ulrich Weigand
https://reviews.llvm.org/D58240

llvm-svn: 354039
2019-02-14 17:58:48 +00:00
Petar Avramovic 14c7ecfe84 [MIPS GlobalISel] Select phi instruction for integers
Select G_PHI for integers for MIPS32.

Differential Revision: https://reviews.llvm.org/D58183

llvm-svn: 354025
2019-02-14 12:36:19 +00:00
Petar Avramovic 5d9b8eed85 [MIPS GlobalISel] Select branch instructions
Select G_BR and G_BRCOND for MIPS32.
Unconditional branch G_BR does not have register operand,
for that reason we only add tests.
Since conditional branch G_BRCOND compares register to zero on MIPS32,
explicit extension must be performed on i1 condition in order to set
high bits to appropriate value.

Differential Revision: https://reviews.llvm.org/D58182

llvm-svn: 354022
2019-02-14 11:39:53 +00:00
David Green 743abf2bd9 [ARM] Ensure we update the correct flags in the peephole optimiser
The Arm peephole optimiser code keeps track of both an MI and a SubAdd that can
be used to optimise away a CMP. In the rare case that both are found and not
ruled-out as valid, we could end up setting the flags on the wrong one.

Instead make sure we are using SubAdd if it exists, as it will be closer to the
CMP.

The testcase here is a little theoretical, with a dead def of cpsr. It should
hopefully show the point.

Differential Revision: https://reviews.llvm.org/D58176

llvm-svn: 354018
2019-02-14 11:09:24 +00:00
Craig Topper 1d158dd930 [X86] Make (f80 (sint_to_fp (i16))) use fistps/fisttps instead of fistpl/fisttpl when SSE is enabled.
When SSE is enabled sint_to_fp with i16 is blindly promoted to i32, but that changes the behavior of f80 conversion.

Move the promotion to i16 to LowerFP_TO_INT so we can limit it based on the floating point type.

llvm-svn: 354003
2019-02-14 01:41:43 +00:00
Dylan McKay 8a56d10a2f [AVR] Fix a typo - 's/analisys/analysis'
llvm-svn: 353987
2019-02-13 22:31:37 +00:00
Thomas Lively bba3f06d05 [WebAssembly] memory.fill
Summary:
memset lowering, fix argument types in memcpy lowering, and
test encodings. Depends on D57736.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57791

llvm-svn: 353986
2019-02-13 22:25:18 +00:00
Thomas Lively de7a0a1526 [WebAssembly] Bulk memory intrinsics and builtins
Summary:
implements llvm intrinsics and clang intrinsics for
memory.init and data.drop.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D57736

llvm-svn: 353983
2019-02-13 22:11:16 +00:00
Petr Hosek fcbec02ea6 [AArch64] Support reserving arbitrary general purpose registers
This is a follow up to D48580 and D48581 which allows reserving
arbitrary general purpose registers with the exception of registers
with special purpose (X8, X16-X18, X29, X30) and registers used by LLVM
(X0, X19). This change also generalizes some of the existing logic to
rely entirely on values generated from tablegen.

Differential Revision: https://reviews.llvm.org/D56305

llvm-svn: 353957
2019-02-13 17:28:47 +00:00
Diana Picus aa4118a873 [ARM GlobalISel] Support G_SELECT for Thumb2
Same as arm mode, but slightly different opcodes.

llvm-svn: 353938
2019-02-13 11:25:32 +00:00
Anton Afanasyev ca9aff9353 [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)
Try to use 64-bit SLP vectorization. In addition to horizontal instrs
this change triggers optimizations for partial vector operations (for instance,
using low halfs of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
<2 x float>).

Fixes llvm.org/PR32433

llvm-svn: 353923
2019-02-13 08:26:43 +00:00
Craig Topper 9b61f48e4b [X86] Use default expansion for (i64 fp_to_uint f80) when avx512 is enabled on 64-bit targets to match what happens without avx512.
In 64-bit mode prior to avx512 we use Expand, but with avx512 we need to make f32/f64 conversions Legal so we use Custom and then do our own expansion for f80. But this seems to produce codegen differences relative to avx2. This patch corrects this.

llvm-svn: 353921
2019-02-13 07:42:34 +00:00
Craig Topper 3099e442a6 [X86] Refactor the FP_TO_INTHelper interface. NFCI
-Pull the final stack load creation from the two callers into the helper.
-Return a single SDValue instead of a std::pair.
-Remove the Replace flag which isn't really needed.

llvm-svn: 353920
2019-02-13 07:42:31 +00:00
Matt Arsenault 4cd9509e1d AMDGPU: Try to use function specific ST
Subtargets are a function level property, so ideally we would
eliminate everywhere that needs to check the global one. Rename the
function to try avoiding confusion.

llvm-svn: 353900
2019-02-12 23:44:13 +00:00
Matt Arsenault d24296e282 AMDGPU: Ignore CodeObjectV3 when inlining
This was inhibiting inlining of library functions when clang was
invoking the inliner directly. This is covering a bit of a mess with
subtarget feature handling, and this shouldn't be a subtarget
feature. The behavior is different depending on whether you are using
a -mattr flag in clang, or llc, opt.

llvm-svn: 353899
2019-02-12 23:30:11 +00:00
Jonas Paulsson 749dc51e45 [SystemZ] Remember to cast value to void to disable warning.
Hopefully fixes buildbot problems.

llvm-svn: 353898
2019-02-12 23:13:18 +00:00
Konstantin Zhuravlyov 6220d62e5c AMDGPU/NFC: Remove SubtargetFeatureISAVersion since it is not used anywhere
llvm-svn: 353892
2019-02-12 22:49:49 +00:00
Konstantin Zhuravlyov acb231c8d8 AMDGPU: Remove duplicate processor (gfx900)
llvm-svn: 353889
2019-02-12 22:29:25 +00:00
Sean Fertile 9850a48275 Fix undefined behaviour in PPCInstPrinter::printBranchOperand.
Fix the undefined behaviour introduced by my previous patch r353865 (left
shifting a potentially negative value), which was caught by the bots that run
UBSan.

llvm-svn: 353874
2019-02-12 20:03:04 +00:00
Nikita Popov a3be17ea1c [AArch64] Expand v8i8 cttz (PR39729)
Fix for https://bugs.llvm.org/show_bug.cgi?id=39729.

Rather than adding just a case for v8i8 I'm setting cttz to expand
for all vector types.

Differential Revision: https://reviews.llvm.org/D58008

llvm-svn: 353872
2019-02-12 18:55:53 +00:00
Jonas Paulsson 34bead750c [SystemZ] Use VGM whenever possible to load FP immediates.
isFPImmLegal() has been extended to recognize certain FP immediates that can
be built with VGM (Vector Generate Mask).

These scalar FP immediates (that were previously loaded from the constant
pool) are now selected as VGMF/VGMG in Select().

Review: Ulrich Weigand
https://reviews.llvm.org/D58003

llvm-svn: 353867
2019-02-12 18:06:06 +00:00
Sean Fertile c069452027 [PowerPC] Fix printing of negative offsets in call instruction dissasembly.
llvm-svn: 353865
2019-02-12 17:48:22 +00:00
Jessica Paquette 0e71e73faa [GlobalISel][AArch64] Select llvm.bswap* for non-vector types
This teaches the IRTranslator to emit G_BSWAP when it runs into
Intrinsic::bswap. This allows us to select G_BSWAP for non-vector types in
AArch64.

Add a select-bswap.mir test, and add global isel checks to a couple existing
tests in test/CodeGen/AArch64.

This doesn't handle every bswap case, since some of these rely on known bits
stuff. This just lets us handle the naive case.

Differential Revision: https://reviews.llvm.org/D58081

llvm-svn: 353861
2019-02-12 17:28:17 +00:00
Simon Pilgrim 5338f41ced [X86][AVX] Enable shuffle combining support for zero_extend
A more limited version of rL352997 that had to be disabled in rL353198 - allow extension of any 128/256/512 bit vector that at least uses byte sized scalars.

llvm-svn: 353860
2019-02-12 17:22:35 +00:00
Matt Arsenault 00ccd13c73 AMDGPU/GlobalISel: Only make f16 constants legal on f16 targets
We could deal with it, but there's no real point.

llvm-svn: 353845
2019-02-12 14:54:55 +00:00
Craig Topper 7670ede434 [X86] Collapse FP_TO_INT16_IN_MEM/FP_TO_INT32_IN_MEM/FP_TO_INT64_IN_MEM into a single opcode using memory VT to distinquish. NFC
llvm-svn: 353798
2019-02-12 06:14:18 +00:00
Craig Topper d7303ecd0b [X86] Remove the value type operand from the floating point load/store MemIntrinsicSDNodes. Use the MemoryVT instead. NFCI
We already have the memory VT, we can just match from that during isel.

llvm-svn: 353797
2019-02-12 06:14:16 +00:00
Matt Arsenault 18ec382698 GlobalISel: Implement moreElementsVector for implicit_def
llvm-svn: 353754
2019-02-11 22:00:39 +00:00
Craig Topper 75eb0af874 [X86] Correct the memory operand for the FLD emitted in FP_TO_INTHelper for 32-bit SSE targets.
We were using DstTy, but that represents the integer type we are converting to which is i64 in this
case. The FLD is part of an intermediate step to get from the SSE registers to the x87 registers.
If the floating point type is f32, the memory operand should reflect a 4 byte access not an 8 byte
access. The store we used to get from SSE to the stack is using the corect size.

While there, consistenly use TheVT in place of Op.getOperand(0).getValueType() throughout the function.

llvm-svn: 353745
2019-02-11 20:38:10 +00:00
Jessica Paquette e57fe23f70 [AArch64][GlobalISel] Add isel support for a couple vector exts/truncs
Add support for

- v4s16 <-> v4s32
- v2s64 <-> v2s32

And update tests that use them to show that we generate the correct
instructions.

Differential Revision: https://reviews.llvm.org/D57832

llvm-svn: 353732
2019-02-11 18:56:39 +00:00
Roland Froese 732fe22454 [PowerPC] Avoid scalarization of vector truncate
The PowerPC code generator currently scalarizes vector truncates that would fit in a vector register, resulting in vector extracts, scalar operations, and vector merges. This patch custom lowers a vector truncate that would fit in a register to a vector shuffle instead.

Differential Revision: https://reviews.llvm.org/D56507

llvm-svn: 353724
2019-02-11 17:29:14 +00:00
Jessica Paquette ebdb021031 [GlobalISel][AArch64] Select G_FFLOOR
This teaches the legalizer about G_FFLOOR, and lets us select G_FFLOOR in
AArch64.

It updates the existing floating point tests, and adds a select-floor.mir test.

Differential Revision: https://reviews.llvm.org/D57486

llvm-svn: 353722
2019-02-11 17:22:58 +00:00
Matt Arsenault 9dba67f431 GlobalISel: Add G_FCANONICALIZE instruction
llvm-svn: 353719
2019-02-11 17:05:20 +00:00
Benjamin Kramer 711950c116 Move some classes into anonymous namespaces. NFC.
llvm-svn: 353710
2019-02-11 15:16:21 +00:00
Benjamin Kramer 582c16013d [AMDGPU] Remove unused variable
llvm-svn: 353704
2019-02-11 14:49:54 +00:00
Neil Henning 8c10fa1a90 [AMDGPU] Fix DPP sequence in atomic optimizer.
This commit fixes the DPP sequence in the atomic optimizer (which was
previously missing the row_shr:3 step), and works around a read_register
exec bug by using a ballot instead.

Differential Revision: https://reviews.llvm.org/D57737

llvm-svn: 353703
2019-02-11 14:44:14 +00:00
Sam McCall e825ba9165 Revert "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
This reverts commit r353610.
It causes a miscompile visible in macro expansion in a bootstrapped clang.

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190211/626590.html

llvm-svn: 353699
2019-02-11 14:05:36 +00:00
Sam Parker 8ff143033a [ARM] Add v8m.base pattern for add negative imm
The v8m.base ISA contains movw, which can operate on an unsigned
16-bit value. Add the pattern that converts an add with a negative
value, that could fit into 16-bits when negated, into a sub with that
positive value.

Differential Revision: https://reviews.llvm.org/D57942

llvm-svn: 353692
2019-02-11 11:35:42 +00:00
Valery Pykhtin ded96df01e [AMDGPU] Enable DPP combiner pass by default.
Related revisions: https://reviews.llvm.org/D55444, https://reviews.llvm.org/D55314

llvm-svn: 353691
2019-02-11 11:15:03 +00:00
Sjoerd Meijer 150ccb889e [ARM] LoadStoreOptimizer: reoder limit
The whole design of generating LDMs/STMs is fragile and unreliable: it depends on
rescheduling here in the LoadStoreOptimizer that isn't register pressure aware
and regalloc that isn't aware of generating LDMs/STMs.
This patch adds a (hidden) option to control the total number of instructions that
can be re-ordered. I appreciate this looks only a tiny bit better than a hard-coded
constant, but at least it allows more easy experimentation with different values
for now. Ideally we calculate this reorder limit based on some heuristics, and take
register pressure into account. I might be looking into that next.

Differential Revision: https://reviews.llvm.org/D57954

llvm-svn: 353678
2019-02-11 09:37:42 +00:00
Sjoerd Meijer 0cc50c6b87 [ARM] LoadStoreOptimizer: just a clean-up. NFC.
Differential Revision: https://reviews.llvm.org/D57955

llvm-svn: 353670
2019-02-11 08:47:59 +00:00
Craig Topper 5b1beda001 [X86] Removed unused SDTypeProfile. NFC
llvm-svn: 353659
2019-02-11 07:30:48 +00:00
Simon Pilgrim f6e6c369c0 [X86] EltsFromConsecutiveLoads - replace SmallBitVector with APInt (NFC).
Minor refactor to simplify some incoming patches to improve broadcast loads.

llvm-svn: 353655
2019-02-10 22:45:48 +00:00
Sanjay Patel 833550fc74 [x86] narrow 256-bit horizontal ops via demanded elements
256-bit horizontal math ops are an x86 monstrosity (and thankfully have
not been extended to 512-bit AFAIK).

The two 128-bit halves operate on separate halves of the inputs. So if we
don't demand anything in the upper half of the result, we can extract the
low halves of the inputs, do the math, and then insert that result into a
256-bit output.

All of the extract/insert is free (ymm<-->xmm), so we're left with a
narrower (cheaper) version of the original op.

In the affected tests based on:
https://bugs.llvm.org/show_bug.cgi?id=33758
https://bugs.llvm.org/show_bug.cgi?id=38971
...we see that the h-op narrowing can result in further narrowing of other
math via existing generic transforms.

I originally drafted this patch as an exact pattern match starting from
extract_vector_elt, but I thought we might see diffs starting from
extract_subvector too, so I changed it to a more general demanded elements
solution. There are no extra existing regression test improvements from
that switch though, so we could go back.

Differential Revision: https://reviews.llvm.org/D57841

llvm-svn: 353641
2019-02-10 15:22:06 +00:00
Craig Topper f37ea96922 [X86] Move some vector InstAliases out from under unnecessary 'let Predicates'. NFCI
We don't have any assembler predicates for vector ISAs so this isn't necessary. It just adds extra lines and identation.

llvm-svn: 353631
2019-02-10 02:34:31 +00:00
Simon Pilgrim 6bf7b30b10 [X86] CombineOr - fold to generic funnel shifts
As discussed on D57389, this is a first step towards moving the SHLD/SHRD matching code to DAGCombiner using FSHL/FSHR instead.

There's a bit of work to do before I can do that, so this just folds to FSHL/FSHR in the existing code (handling the different SHRD/FSHR argument ordering), which fixes the issue we had with i16 shift amounts not being correctly masked.

llvm-svn: 353626
2019-02-09 20:34:59 +00:00
Simon Pilgrim 690a2889d8 [X86][SSE] Generalize X86ISD::BLENDI support to more value types
D42042 introduced the ability for the ExecutionDomainFixPass to more easily change between BLENDPD/BLENDPS/PBLENDW as the domains required.

With this ability, we can avoid most bitcasts/scaling in the DAG that was occurring with X86ISD::BLENDI lowering/combining, blend with the vXi32/vXi64 vectors directly and use isel patterns to lower to the float vector equivalent vectors.

This helps the shuffle combining and SimplifyDemandedVectorElts be more aggressive as we lose track of fewer UNDEF elements than when we go up/down through bitcasts.

I've introduced a basic blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) fold, there are more generalizations I can do there (e.g. widening/scaling and handling the tricky v16i16 repeated mask case).

The vector-reduce-smin/smax regressions will be fixed in a future improvement to SimplifyDemandedBits to peek through bitcasts and support X86ISD::BLENDV.

Differential Revision: https://reviews.llvm.org/D57888

llvm-svn: 353610
2019-02-09 13:13:59 +00:00
Stanislav Mekhanoshin 0e858b028d [AMDGPU] Split dot-insts feature
Differential Revision: https://reviews.llvm.org/D57971

llvm-svn: 353587
2019-02-09 00:34:21 +00:00
Craig Topper fcb63c4c6c [X86] Add FPCW as an implicit use on floating point load instructions.
These instructions can generate a stack overflow exception so technically they read the stack overflow exception mask bit.

llvm-svn: 353564
2019-02-08 20:50:09 +00:00
Craig Topper 784929d045 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Matt Arsenault 564f0f832c AMDGPU: Eliminate GPU specific SubtargetFeatures
Inline compatability is determined from the individual feature
bits. These are just sets of the separate features, but will always be
treated as incompatible unless they are specifically ignored.

Defining the ISA version number here in tablegen would be nice, but it
turns out this wasn't actually used.

llvm-svn: 353558
2019-02-08 19:59:32 +00:00
Matt Arsenault d7047276ec AMDGPU: Remove GCN features and predicates
These are no longer necessary since the R600 tablegen files are split
out now.

llvm-svn: 353548
2019-02-08 19:18:01 +00:00
Craig Topper 41a1792b15 [X86] Remove isReMaterializable from X87 floating point constant loads and constant pool loads.
Summary: These instructions update FPSW so they aren't generically safe to rematerialize into any location if FPSW is live for a comparison result. They also use FPCW for exception masking control. Though the only exception they can generate is stack overflow and we manage the stack ourselves so that's not really going to occur.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57934

llvm-svn: 353536
2019-02-08 17:07:54 +00:00
Sanjay Patel e9cc26a56a [x86] fix formatting; NFC
(test commit #2 migrating to git)

llvm-svn: 353533
2019-02-08 16:48:40 +00:00
Carl Ritson 494b8ac95a [AMDGPU] Fix CS scratch setup on pre-GCN3 ASICs
Summary:
Prior to GCN3 s_load_dword offsets are in dwords rather than bytes.
Thus the scratch buffer descriptor offset must be adjusted for pre-GCN3 ASICs.

Reviewers: nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: sheredom, arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56496

llvm-svn: 353530
2019-02-08 15:41:11 +00:00
Matt Arsenault b0a227049f AMDGPU/GlobalISel: Fix shift legalization for non-power-of-2
clampScalar doesn't do anything for non-power-of-2 in range.
There should probably be a combination rule to reduce the number
of matching rules.

llvm-svn: 353526
2019-02-08 15:06:24 +00:00
Dmitry Preobrazhensky 942c273d64 [AMDGPU][MC] Added support of lds_direct operand
See bug 39293: https://bugs.llvm.org/show_bug.cgi?id=39293

Reviewers: artem.tamazov, rampitec

Differential Revision: https://reviews.llvm.org/D57889

llvm-svn: 353524
2019-02-08 14:57:37 +00:00
Matt Arsenault 0f2debb1c2 AMDGPU/GlobalISel: Fix non-power-of-2 implicit_def
llvm-svn: 353522
2019-02-08 14:46:27 +00:00
Petar Avramovic c98b26d326 [MIPS GlobalISel] Select any extending load and truncating store
Make behavior of G_LOAD in widenScalar same as for G_ZEXTLOAD and
G_SEXTLOAD. That is perform widenScalarDst to size given by the target
and avoid additional checks in common code. Targets can reorder or add
additional rules in LegalizeRuleSet for the opcode to achieve desired
behavior.

Select extending load that does not have specified type of extension
into zero extending load.

Select truncating store that stores number of bytes indicated by size
in MachineMemoperand.

Differential Revision: https://reviews.llvm.org/D57454

llvm-svn: 353520
2019-02-08 14:27:23 +00:00
Matt Arsenault dc88a2ce35 AMDGPU/GlobalISel: Don't use a copy in addrspacecast lowering
llvm-svn: 353516
2019-02-08 14:16:11 +00:00
Dmitry Preobrazhensky 62a0318dff [AMDGPU][MC][CODEOBJECT] Added predefined symbols to access GPU minor and stepping numbers
Added the following Code Object v3 symbols:
    .amdgcn.gfx_generation_minor
    .amdgcn.gfx_generation_stepping

Reviewers: artem.tamazov, kzhuravl

Differential Revision: https://reviews.llvm.org/D57826

llvm-svn: 353515
2019-02-08 13:51:31 +00:00
Valery Pykhtin 7fe97f8c7c [AMDGPU] Fix DPP combiner
Differential revision: https://reviews.llvm.org/D55444

dpp move with uses and old reg initializer should be in the same BB.
bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity.
Added add, subrev, and, or instructions to the old folding function.
Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.

The pass is still disabled by default.

llvm-svn: 353513
2019-02-08 11:59:48 +00:00
Petar Avramovic 56dc218dc1 [MIPS GlobalISel] Select mul
Legalize and select G_MUL for s32 and smaller types for MIPS32.

Differential Revision: https://reviews.llvm.org/D57816

llvm-svn: 353506
2019-02-08 10:11:33 +00:00
Sam Parker 5b09834bc3 [ARM] Add OptMinSize to ARMSubtarget
In many places in the backend, we like to know whether we're
optimising for code size and this is performed by checking the
current machine function attributes. A subtarget is created on a
per-function basis, so it's possible to know when we're compiling for
code size on construction so record this in the new object.

Differential Revision: https://reviews.llvm.org/D57812

llvm-svn: 353501
2019-02-08 07:57:42 +00:00
Heejin Ahn df6770f0c9 [WebAssembly] Fix parseImmediate's memory alignment requirement
This fixes the current failure in the x86-64 ubsan bot caused by
r353496.

llvm-svn: 353499
2019-02-08 04:06:56 +00:00
Matt Arsenault a8b4339c2f AMDGPU/GlobalISel: Legalize addrspacecast
Use a placeholder constant for now on targets
that need the load from the queue ptr.

llvm-svn: 353497
2019-02-08 02:40:47 +00:00
Wouter van Oortmerssen 0d9f3f7f95 [WebAssembly] Fixed Disassembler ignoring endian swap on big endian.
Summary: This fixes: https://bugs.llvm.org/show_bug.cgi?id=40620

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57933

llvm-svn: 353496
2019-02-08 01:43:23 +00:00
Craig Topper 738180cc7f Fix the lowering issue of intrinsics llvm.localaddress on X86
Patch by Yuanke Luo

Reviewers: craig.topper, annita.zhang, smaslov, rnk, wxiao3

Reviewed By: rnk

Subscribers: efriedma, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57501

llvm-svn: 353492
2019-02-08 01:14:12 +00:00
Craig Topper c782f18835 [X86] Add FPCW as a register and start using it as an implicit use on floating point instructions.
Summary:
FPCW contains the rounding mode control which we manipulate to implement fp to integer conversion by changing the roudning mode, storing the value to the stack, and then changing the rounding mode back. Because we didn't model FPCW and its dependency chain, other instructions could be scheduled into the middle of the sequence.

This patch introduces the register and adds it as an implciit def of FLDCW and implicit use of the FP binary arithmetic instructions and store instructions. There are more instructions that need to be updated, but this is a good start. I believe this fixes at least the reduced test case from PR40529.

Reviewers: RKSimon, spatel, rnk, efriedma, andrew.w.kaylor

Subscribers: dim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57735

llvm-svn: 353489
2019-02-08 00:44:39 +00:00
Eli Friedman 29c0609301 [AArch64] Fix condition for "high-vector" DUP optimizations.
AArch64 NEON has a bunch of instructions with a "2" suffix that extract
the top half of the source vectors, instead of the bottom half.  We have
some DAGCombines to try to take advantage of that.  However, they
assumed that any EXTRACT_VECTOR was extracting the high half of the
vector in question.

This issue has apparently existed since the AArch64 backend was merged.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40632 .

Differential Revision: https://reviews.llvm.org/D57862

llvm-svn: 353486
2019-02-08 00:23:35 +00:00
Sanjay Patel 81f859d169 [x86] fix formatting; NFC
llvm-svn: 353477
2019-02-07 22:36:55 +00:00
Simon Pilgrim fe3ac70b18 [DAGCombiner] (add (umax X, C), -C) --> (usubsat X, C) (PR40111)
Move the (add (umax X, C), -C) --> (usubsat X, C) X86 combine into generic DAGCombiner

First of a number of saturated arithmetic folds that can be moved out of X86-specific code for PR40111.

Differential Revision: https://reviews.llvm.org/D57754

llvm-svn: 353457
2019-02-07 20:14:43 +00:00
Matt Arsenault fbec8fe93b GlobalISel: Implement narrowScalar for shift main type
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.

This shows a disadvantage of targets not specifically which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.

llvm-svn: 353455
2019-02-07 19:37:44 +00:00
Matt Arsenault d914189a2e AMDGPU/GlobalISel: Restrict g_implicit_def legality
llvm-svn: 353452
2019-02-07 19:10:15 +00:00
Matt Arsenault c0f7569aab AMDGPU/GlobalISel: Legalize fsqrt
llvm-svn: 353438
2019-02-07 18:14:39 +00:00
Matt Arsenault 93fdec739b AMDGPU/GlobalISel: Legalize some f16 operations
llvm-svn: 353436
2019-02-07 18:03:11 +00:00
Matt Arsenault c83b82363c GlobalISel: Implement fewerElementsVector for shifts
Introduce a new function which handles instructions with multiple type
indices, but have the same number of vector elements.

Also legalize v2s16 shifts when applicable.

llvm-svn: 353432
2019-02-07 17:38:00 +00:00
Matt Arsenault 91be65be65 GlobalISel: Try to make legalize rules more useful for vectors
Mostly keep the existing functions on scalars, but add versions which
also operate based on the vector element size.

llvm-svn: 353430
2019-02-07 17:25:51 +00:00
Sanjay Patel a5c4a5e958 [x86] split more 256/512-bit shuffles in lowering
This is intentionally a small step because it's hard to know exactly 
where we might introduce a conflicting transform with the code that 
tries to form wider shuffles. But I think this is safe - if we have 
a wide shuffle with 2 operands, then we should do better with an 
extract + narrow shuffle.

Differential Revision: https://reviews.llvm.org/D57867

llvm-svn: 353427
2019-02-07 17:10:49 +00:00
Nirav Dave 84e5bf0c95 [X86] Simplify casing. NFC.
llvm-svn: 353417
2019-02-07 15:43:40 +00:00
Sam Parker 67756c09f2 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373

llvm-svn: 353403
2019-02-07 13:32:54 +00:00
Diana Picus 75a04e2a77 [ARM GlobalISel] Support G_ICMP for Thumb2
Mark as legal and use the t2* equivalents of the arm mode instructions,
e.g. t2CMPrr instead of plain CMPrr.

llvm-svn: 353392
2019-02-07 11:05:33 +00:00
David Green 7e6da81633 [ARM] Reformat isRedundantFlagInstr for D57833. NFC
llvm-svn: 353386
2019-02-07 10:51:04 +00:00
Jiong Wang 66b18e5755 [BPF] add code-gen support for JMP32 instructions
JMP32 instructions has been added to eBPF ISA. They are 32-bit variants of
existing BPF conditional jump instructions, but the comparison happens on
low 32-bit sub-register only, therefore some unnecessary extensions could
be saved.

JMP32 instructions will only be available for -mcpu=v3. Host probe hook has
been updated accordingly.

JMP32 instructions will only be enabled in code-gen when -mattr=+alu32
enabled, meaning compiling the program using sub-register mode.

For JMP32 encoding, it is a new instruction class, and is using the
reserved eBPF class number 0x6.

This patch has been tested by compiling and running kernel bpf selftests
with JMP32 enabled.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 353384
2019-02-07 10:43:09 +00:00
Tim Northover 638110a208 AArch64: implement copy for paired GPR registers.
When doing 128-bit atomics using CASP we might need to copy a GPRPair to a
different register, but that was unimplemented up to now.

llvm-svn: 353383
2019-02-07 10:35:34 +00:00
Nirav Dave c6bfa103a5 [X86][DAG] Avoid creating dangling bitcast.
combineExtractWithShuffle may leave a dangling bitcast which may
prevent further optimization in later passes. Avoid constructing it
unless it is used.

llvm-svn: 353333
2019-02-06 19:45:47 +00:00
Jonas Paulsson b21dde0530 [SystemZ] Improved handling of the @llvm.ctlz intrinsic.
Since SystemZ supports counting of leading zeros with the FLOGR instruction,
isCheapToSpeculateCtlz() should return true, which it now does.

ISD::CTLZ_ZERO_UNDEF i32 is now handled the same way as ISD::CTLZ is, which
is needed since promotion to i64 is required and CTLZ_ZERO_UNDEF is only
expanded to CTLZ if it is Legal or Custom.

Review: Ulrich Weigand
https://reviews.llvm.org/D57710

llvm-svn: 353330
2019-02-06 19:23:31 +00:00
Jonas Paulsson 8cda83a5db [SystemZ] Wait with VGBM selection until after DAGCombine2.
Don't lower BUILD_VECTORs to BYTE_MASK, but instead expose the BUILD_VECTORs
to the DAGCombiner and select them to VGBM in Select(). This allows the
DAGCombiner to understand the constant vector values.

For floating point, only all-zeros vectors are now generated with VGBM, as it
turned out to be somewhat complicated to handle any arbitrary constants,
while in practice this is very rare and hardly needed.

The SystemZ ISD opcodes z_byte_mask, z_vzero and z_vones have been removed.

Review: Ulrich Weigand
https://reviews.llvm.org/D57152

llvm-svn: 353325
2019-02-06 18:59:19 +00:00
Tim Northover 474f5d9b55 AArch64: enforce even/odd register pairs for CASP instructions.
ARMv8.1a CASP instructions need the first of the pair to be an even register
(otherwise the encoding is unallocated). We enforced this during assembly, but
not CodeGen before.

llvm-svn: 353308
2019-02-06 15:26:35 +00:00
Nirav Dave e5c37958f9 [InlineAsm][X86] Add backend support for X86 flag output parameters.
Allow custom handling of inline assembly output parameters and add X86
flag parameter support.

llvm-svn: 353307
2019-02-06 15:26:29 +00:00
Ulrich Weigand 17a0012687 [SystemZ] Do not return INT_MIN from strcmp/memcmp
The IPM sequence currently generated to compute the strcmp/memcmp
result will return INT_MIN for the "less than zero" case.  While
this is in compliance with the standard, strictly speaking, it
turns out that common applications cannot handle this, e.g. because
they negate a comparison result in order to implement reverse
compares.

This patch changes code to use a different sequence that will result
in -2 for the "less than zero" case (same as GCC).  However, this
requires that the two source operands of the compare instructions
are inverted, which breaks the optimization in removeIPMBasedCompare.
Therefore, I've removed this (and all of optimizeCompareInstr), and
replaced it with a mostly equivalent optimization in combineCCMask
at the DAGcombine level.

llvm-svn: 353304
2019-02-06 15:10:13 +00:00
Tim Northover 71025a2f3e AArch64: annotate atomics with dropped acquire semantics when printing.
A quirk of the v8.1a spec is that when the writeback regiser for an atomic
read-modify-write instruction is wzr/xzr, the instruction no longer enforces
acquire ordering. However, it's still written with the misleading 'a' mnemonic.

So this adds an annotation when disassembling such instructions, mentioning the
change.

llvm-svn: 353303
2019-02-06 15:07:59 +00:00
Sanjay Patel e84fbb67a1 [x86] vectorize cast ops in lowering to avoid register file transfers
The proposal in D56796 may cross the line because we're trying to avoid vectorization 
transforms in generic DAG combining. So this is an alternate, later, x86-specific 
translation of that patch.

There are several potential follow-ups to enhance this:
1. Allow extraction from non-zero element index.
2. Peek through extends of smaller width integers.
3. Support x86-specific conversion opcodes like X86ISD::CVTSI2P

Differential Revision: https://reviews.llvm.org/D56864

llvm-svn: 353302
2019-02-06 14:59:39 +00:00
Heejin Ahn 4367587fc6 [WebAssembly] Tidy up `let` statements in .td files (NFC)
Summary:
- Delete {} for one-line `let` statements
- Don't indent within `let` blocks
- Add comments after `let` block's closing braces

Reviewers: tlively

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57730

llvm-svn: 353248
2019-02-06 00:17:03 +00:00
Aditya Nandakumar fef7619b05 [NFC][GlobalISel]: Add a convenience method to MachineInstrBuilder to simplify getOperand(i).getReg()
https://reviews.llvm.org/D57608

It's a common pattern in GISel to have a MachineInstrBuilder from which we get various regs
(commonly MIB->getOperand(0).getReg()). This adds a helper method and the above can be
replaced with MIB.getReg(0).

llvm-svn: 353223
2019-02-05 22:14:40 +00:00
Thomas Lively 315056692d [WebAssembly] Lower memmove to memory.copy
Summary: The lowering is identical to the memcpy lowering.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57727

llvm-svn: 353216
2019-02-05 20:57:40 +00:00
Scott Linder e2c5847414 [AMDGPU] Consider XOR in waterfall loop as a terminator
Ensure the XOR in the waterfall loop for indirect addressing is considered a terminator.

Differential Revision: https://reviews.llvm.org/D57703

llvm-svn: 353207
2019-02-05 19:50:32 +00:00
Matt Arsenault a3f9b71c09 AMDGPU: Fix assert on trunc from bitcast of build_vector
The v2i64 argument is lowered to a bitcast of v4i32 build_vector.
This would then attempt to use the i32-element as the source of the
vector truncate. This really would need to collect 2 elements from the
build_vector to produce the intended truncate.

llvm-svn: 353202
2019-02-05 19:23:57 +00:00
Simon Pilgrim b0afc69435 [X86][SSE] Disable ZERO_EXTEND shuffle combining
rL352997 enabled ZERO_EXTEND from non-shuffle-able value types. I've disabled it for now to fix a regression identified by @asbirlea until I can fix this properly.

llvm-svn: 353198
2019-02-05 19:15:48 +00:00
Anton Korobeynikov b26134bf92 Enable integrated assembler on MSP430 by default.
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56787

llvm-svn: 353192
2019-02-05 18:01:45 +00:00
Oliver Stannard 78dc38ec94 [AArch64][Outliner] Don't outline BTI instructions
We can't outline BTI instructions, because they need to be the very first
instruction executed after an indirect call or branch. If we outline them, then
an indirect call might go to the branch to the outlined function, which will
fault.

Differential revision: https://reviews.llvm.org/D57753

llvm-svn: 353190
2019-02-05 17:21:57 +00:00
Simon Pilgrim 822d2e35e7 [X86][AVX] Attempt to combine shuffles to subvector broadcast load
llvm-svn: 353189
2019-02-05 17:02:49 +00:00
Matt Arsenault 86504fb20e AArch64/GlobalISel: Don't clamp from 2 to 2
This is equivalent to clampMaxNumElements, but saves a check.

llvm-svn: 353188
2019-02-05 16:57:18 +00:00
Simon Pilgrim 62af24cc93 [X86][SSE] Add SimplifyDemandedVectorElts support for X86ISD::BLENDV
llvm-svn: 353165
2019-02-05 12:27:29 +00:00
Simon Pilgrim 9e595e3663 [X86][AVX] Attempt to share broadcasts of different widths (PR39454)
If we have broadcasts of different vector widths, keep the longest vector width and extract subvectors for the shorter vectors (which should be free).

Differential Revision: https://reviews.llvm.org/D57663

llvm-svn: 353154
2019-02-05 10:58:43 +00:00
Florian Hahn 3b251963c3 [CGP] Add support for sinking operands to their users, if they are free.
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.

I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.

The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.

Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.

Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.

This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.

As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D57377

llvm-svn: 353152
2019-02-05 10:27:40 +00:00
Diana Picus e24b104a11 [ARM GlobalISel] Support G_GEP for Thumb2
Same as ARM, but use a different opcode in the instruction selection.

llvm-svn: 353151
2019-02-05 10:21:37 +00:00
Craig Topper f86eb00f12 [X86] Connect the default fpsr and dirflag clobbers in inline assembly to the registers we have defined for them.
Summary:
We don't currently map these constraints to physical register numbers so they don't make it to the MachineIR representation of inline assembly.

This could have problems for proper dependency tracking in the machine schedulers though I don't have a test case that shows that.

Reviewers: rnk

Reviewed By: rnk

Subscribers: eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57641

llvm-svn: 353141
2019-02-05 06:13:06 +00:00
Heejin Ahn e37ba2cb96 [WebAssembly] Fix indentation after adding IsCanonical property (NFC)
llvm-svn: 353132
2019-02-05 01:59:49 +00:00
Wouter van Oortmerssen 1a91cb0402 [WebAssembly] Make disassembler always emit most canonical name.
Summary:
There are a few instructions that all map to the same opcode, so
when disassembling, we have to pick one. That was just the first one
before (the except_ref variant in the case of "call"), now it is the
one marked as IsCanonical in tablegen, or failing that, the shortest
name (which is typically the "canonical" one).

Also introduced a canonical "end" instruction for this purpose.

Reviewers: dschuff, tlively

Subscribers: sbc100, jgravelle-google, aheejin, llvm-commits, sunfish

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57713

llvm-svn: 353131
2019-02-05 01:19:45 +00:00
Thomas Lively d99af23765 [WebAssembly] memory.copy
Summary: Depends on D57495.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish

Differential Revision: https://reviews.llvm.org/D57498

llvm-svn: 353127
2019-02-05 00:49:55 +00:00
Matt Arsenault cba0c6d0c9 AMDGPU: Don't rematerialize mov with implicit operands
This was pulling the mov used for register indexing on gfx9 out of the
loop.

llvm-svn: 353101
2019-02-04 22:26:21 +00:00
Craig Topper c45e39b35f [CodeGen][ARC][SystemZ][WebAssembly] Use MachineInstr::isInlineAsm in more places instead of just comparing opcode. NFCI
I'm looking at adding a second INLINEASM opcode for better modeling asm-goto
as a terminator. Using the existing predicate will reduce teh number of
places that will need to use the new opcode.

llvm-svn: 353095
2019-02-04 21:24:13 +00:00
Scott Linder d19d197221 [AMDGPU] Support emitting GOT relocations for function calls
Differential Revision: https://reviews.llvm.org/D57416

llvm-svn: 353083
2019-02-04 20:00:07 +00:00
Heejin Ahn 18c56a0762 [WebAssembly] clang-tidy (NFC)
Summary:
This patch fixes clang-tidy warnings on wasm-only files.
The list of checks used is:
`-*,clang-diagnostic-*,llvm-*,misc-*,-misc-unused-parameters,readability-identifier-naming,modernize-*`
(LLVM's default .clang-tidy list is the same except it does not have
`modernize-*`. But I've seen in multiple CLs in LLVM the modernize style
was recommended and code was fixed based on the style, so I added it as
well.)

The common fixes are:
- Variable names start with an uppercase letter
- Function names start with a lowercase letter
- Use `auto` when you use casts so the type is evident
- Use inline initialization for class member variables
- Use `= default` for empty constructors / destructors
- Use `using` in place of `typedef`

Reviewers: sbc100, tlively, aardappel

Subscribers: dschuff, sunfish, jgravelle-google, yurydelendik, kripken, MatzeB, mgorny, rupprecht, llvm-commits

Differential Revision: https://reviews.llvm.org/D57500

llvm-svn: 353075
2019-02-04 19:13:39 +00:00
Roman Lebedev b7ecc9b624 [X86] X86DAGToDAGISel::matchBitExtract(): prepare 'control' in 32 bits
Summary:
Noticed while looking at D56052.
```
  // The 'control' of BEXTR has the pattern of:
  // [15...8 bit][ 7...0 bit] location
  // [ bit count][     shift] name
  // I.e. 0b000000011'00000001 means  (x >> 0b1) & 0b11
```
I.e. we do not care about any of the bits aside from the low 16 bits.
So there is no point in doing the `slh`,`or` in 64 bits,
let's just do everything in 32 bits, and anyext if needed.

We could do that in 16 even, but we intentionally don't
zext to i16 (longer encoding IIRC),
so i'm guessing the same applies here.

Reviewers: craig.topper, andreadb, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D56715

llvm-svn: 353073
2019-02-04 19:04:26 +00:00
Craig Topper 8ea72a8201 [X86] Add ST0 as an implicit def/use of x87 load/store instructions during FP stackifying.
These instructions implicitly operate on ST0, but we don't currently add that information to the MachineInstr. We also don't add it the tablegen definitions either.

For the most part this doesn't cause any problems because the stackifying occurs after register allocation. All the instructions are marked as having side effects so the postRA scheduler won't reorder them amongst themselves.

But nothing stops inline assembly using X87 instructions from being reordered around other x87 instructions if that inline assembly wasn't marked volatile.

The two test cases I've identified so far in PR40539 involve loads and stores used to set up the inline assembly or capture the results of the inline assembly ending up in the wrong order.

This patch adds implicit ST0 uses/defs to the load/store instructions to prevent this from happening.

I plan to fix all of the FP instructions, but the binops are bit trickier to get right. So I've chosen fixing the known test cases as a good first step.

I think we also need to update the tablegen descriptions so MS inline assembly infers the right clobbers, but I haven't checked that yet.

Differential Revision: https://reviews.llvm.org/D57644

llvm-svn: 353070
2019-02-04 18:43:55 +00:00
Wouter van Oortmerssen 0b3cf247c4 [WebAssembly] Make segment/size/type directives optional in asm
Summary:
These were "boilerplate" that repeated information already present
in .functype and end_function, that needed to be repeated to Please
the particular way our object writing works, and missing them would
generate errors.

Instead, we generate the information for these automatically so the
user can concern itself with writing more canonical wasm functions
that always work as expected.

Reviewers: dschuff, sbc100

Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D57546

llvm-svn: 353067
2019-02-04 18:03:11 +00:00
Sam Clegg d1152a267c [WebAssembly] Rename relocations from R_WEBASSEMBLY_ to R_WASM_
See https://github.com/WebAssembly/tool-conventions/pull/95.

This is less typing and IMHO more readable, and it also fits with
our naming around the binary format which tends to use the short name.
e.g.

include/llvm/BinaryFormat/Wasm.h
tools/llvm-objdump/WasmDump.cpp
etc..

Differential Revision: https://reviews.llvm.org/D57611

llvm-svn: 353062
2019-02-04 17:28:46 +00:00
Craig Topper bf7593ec4a [X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two arguments where on is %st.
All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written. And we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read.

This patch changes things to always print both operands making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior.

llvm-svn: 353061
2019-02-04 17:28:18 +00:00
Simon Pilgrim 6e5350a367 [X86][SSE] SimplifyDemandedBitsForTargetNode - PCMPGT(0,X) sign mask
For PCMPGT(0, X) patterns where we only demand the sign bit (e.g. BLENDV or MOVMSK) then we can use X directly.

Differential Revision: https://reviews.llvm.org/D57667

llvm-svn: 353051
2019-02-04 15:43:36 +00:00
Matt Arsenault 10547230f3 AMDGPU/GlobalISel: Legalize select for v4s16
Also add some more select tests to help show future legalization
changes.

llvm-svn: 353045
2019-02-04 14:04:52 +00:00
Andrea Di Biagio edbf06a767 [AsmPrinter] Remove hidden flag -print-schedule.
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".

These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.

When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.

Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.

Differential Revision: https://reviews.llvm.org/D57244

llvm-svn: 353043
2019-02-04 12:51:26 +00:00
Simon Pilgrim 9899967464 Use auto for dyn_cast case to save a line. NFCI.
llvm-svn: 353041
2019-02-04 12:32:39 +00:00
David Green b4f36a2196 [ARM] Mark 255 and 65535 as cheap for Thumb1 "And"
This prevents Constant Hoisting from pulling the constant out of the block,
allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap
by the default getIntImmCost, but I've added it for clarity.

Differential Revision: https://reviews.llvm.org/D57671

llvm-svn: 353040
2019-02-04 11:58:48 +00:00
Craig Topper b5e945c260 Recommit r352660 "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."
We now print ST0 as 'st' when generating the clobber list for MS inline assembly in clang. This matches what the gcc reg name list expects.

Original commit message:

This fixes the test case in PR35982 by preventing MMX instructions that read MM0-7 from being moved below EMMS/FEMMS by the post RA scheduler.

Though as discussed in bugzilla, this is not a complete fix. There is still the possibility of reordering in IR or by the pre-RA scheduler.

Differential Revision: https://reviews.llvm.org/D57298

llvm-svn: 353016
2019-02-04 04:44:20 +00:00
Craig Topper 7a2944efe1 [X86] Print %st(0) as %st when its implicit to the instruction. Continue printing it as %st(0) when its encoded in the instruction.
This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior.

llvm-svn: 353015
2019-02-04 04:15:10 +00:00
Craig Topper f77b858dc3 Revert r352985 "[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly"
Looking into gcc and objdump behavior more this was overly aggressive. If the register is encoded in the instruction we should print %st(0), if its implicit we should print %st.

I'll be making a more directed change in a future patch.

llvm-svn: 353013
2019-02-04 04:15:02 +00:00
Simon Pilgrim 1fce5a8b75 [X86][AVX] Support shuffle combining for VBROADCAST with smaller vector sources
getTargetShuffleMask can only do this safely if we're extracting the lowest subvector from a vector of the same result type.

llvm-svn: 352999
2019-02-03 16:51:33 +00:00
Simon Pilgrim 18b73a655b [X86][AVX] Support shuffle combining for VPMOVZX with smaller vector sources
llvm-svn: 352997
2019-02-03 16:10:18 +00:00
Simon Pilgrim a2a3e5b811 [X86][AVX] More aggressively simplify BROADCAST source operand
Aim to use scalar source or lowest 128-bit vector directly.

We're still missing some VZMOVL_LOAD combines.

llvm-svn: 352994
2019-02-03 14:39:41 +00:00
Craig Topper 5a570dd437 [X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly
Summary:
When calculating clobbers for MS style inline assembly we fail if the asm clobbers stack top because we print st(0) and try to pass it through the gcc register name check. This was found with when I attempted to make a emms/femms clobber all ST registers. If you use emms/femms in MS inline asm we would try to use st(0) as the clobber name but clang would think that wasn't a valid clobber name.

This also matches what objdump disassembly prints. It's also what is printed by gcc -S.

Reviewers: RKSimon, rnk, efriedma, spatel, andreadb, lebedev.ri

Reviewed By: rnk

Subscribers: eraman, gbedwell, lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D57621

llvm-svn: 352985
2019-02-03 07:53:39 +00:00
Craig Topper 950ca192f6 [X86] Lower ISD::UADDO to use the Z flag instead of C flag when the RHS is a constant 1 to encourage INC formation.
Summary:
Add an additional combine to combineCarryThroughADD to reverse it back to the C flag to avoid regressions.

I believe this catches the cases that D57547 got.

Reviewers: RKSimon, spatel

Reviewed By: spatel

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57637

llvm-svn: 352984
2019-02-03 07:25:06 +00:00
Fangrui Song b21ed3c57a [AMDGPU] Fix -Wunused-variable after rL352978
llvm-svn: 352982
2019-02-03 03:51:52 +00:00
Matt Arsenault 888aa5dedd GlobalISel: Implement widenScalar for G_UNMERGE_VALUES
For the scalar case only.

Also move the similar G_MERGE_VALUES handling to a separate function
and cleanup to make them look more similar.

llvm-svn: 352979
2019-02-03 00:07:33 +00:00
Matt Arsenault 0e5d856eb8 GlobalISel: Implement widenScalar for G_EXTRACT vector sources
Handle the basic element extract case.

llvm-svn: 352978
2019-02-02 23:56:00 +00:00
Matt Arsenault eb2603cfb2 AMDGPU/GlobalISel: Avoid reporting illegal extloads as legal
This avoids breaking a test in a future commit.

llvm-svn: 352977
2019-02-02 23:39:13 +00:00
Matt Arsenault 58f9d3df97 AMDGPU/GlobalISel: Legalize icmp for pointer types
llvm-svn: 352976
2019-02-02 23:35:15 +00:00
Matt Arsenault 2065c94dd3 AMDGPU/GlobalISel: Legalize constant for pointer types
llvm-svn: 352975
2019-02-02 23:33:49 +00:00
Matt Arsenault 2491f82679 AMDGPU/GlobalISel: Legalize select for pointer types
llvm-svn: 352974
2019-02-02 23:31:50 +00:00
Matt Arsenault cbaada6bc1 GlobalISel: Legalization for inttoptr/ptrtoint
llvm-svn: 352973
2019-02-02 23:29:55 +00:00
Simon Pilgrim dbf302c9f1 [X86][AVX] Enable INSERT_SUBVECTOR(SRC0, SHUFFLE(SRC1)) shuffle combining
Push the insert_subvector up through the shuffle operands to help find more cross-lane shuffles.

The is exposes a couple of minor issues that will be fixed shortly:
Missed broadcast folds - we have a mixture of vzext_load lengths that need cleaning up
combine-sdiv.ll - AVX1 SimplifyDemandedVectorElts failure (hits max depth due to a couple of extra bitcasts).

llvm-svn: 352963
2019-02-02 18:08:04 +00:00
Simon Pilgrim bd42f97946 [SDAG] Add SDNode/SDValue getConstantOperandAPInt helper. NFCI.
We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer and inserts a i128 -1 constant or something and the whole thing asserts.......

I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary.

llvm-svn: 352961
2019-02-02 17:35:06 +00:00
Yonghong Song fa3654008b [BPF] [BTF] Process FileName with absolute path correctly
In IR, sometimes the following attributes for DIFile may be
generated:
  filename: /home/yhs/test.c
  directory: /tmp
The /tmp may represent the working directory of the compilation
process.

In such cases, since filename is with absolute path,
the directory should be ignored by BTF. The filename alone is
enough to get the source.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 352952
2019-02-02 05:54:59 +00:00
Yonghong Song 329010e1b5 Revert "[BPF] [BTF] Process FileName with absolute path correctly"
This reverts commit r352939.

Some tests failed. Revert to unblock others.

llvm-svn: 352941
2019-02-01 23:49:52 +00:00
Mandeep Singh Grang dc1e778369 [AArch64] Fix unused variable [NFC]
llvm-svn: 352940
2019-02-01 23:42:34 +00:00
Yonghong Song 5233fb8f5e [BPF] [BTF] Process FileName with absolute path correctly
In IR, sometimes the following attributes for DIFile may be
generated:
  filename: /home/yhs/test.c
  directory: /tmp
The /tmp may represent the working directory of the compilation
process.

In such cases, since filename is with absolute path,
the directory should be ignored by BTF. The filename alone is
enough to get the source.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 352939
2019-02-01 23:23:17 +00:00
Dan Gohman f726e4454c [WebAssembly] Add codegen support for the import_field attribute
This adds the LLVM side of https://reviews.llvm.org/D57602 -- the
import_field attribute. See that patch for details.

Differential Revision: https://reviews.llvm.org/D57603

llvm-svn: 352931
2019-02-01 22:27:34 +00:00
Mandeep Singh Grang 70d484d94e [COFF, ARM64] Fix localaddress to handle stack realignment and variable size objects
Summary: This fixes using the correct stack registers for SEH when stack realignment is needed or when variable size objects are present.

Reviewers: rnk, efriedma, ssijaric, TomTan

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D57183

llvm-svn: 352923
2019-02-01 21:41:33 +00:00
Simon Pilgrim e95550f508 [X86][AVX] Add VMOVDDUP-VPBROADCASTQ execution domain mapping
Noticed in D57514.

Differential Revision: https://reviews.llvm.org/D57519

llvm-svn: 352922
2019-02-01 21:41:30 +00:00
James Y Knight 7716075a17 [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
James Y Knight 7976eb5838 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Roland Froese 7f29195c3f test commit (add blank line) NFC
llvm-svn: 352897
2019-02-01 18:55:43 +00:00
Tim Corringham fa3e4e5b53 [AMDGPU] Fix for vector element insertion
Summary:
Incorrect code was generated when lowering insertelement operations
for vectors with 8 or 16 bit elements.  The value being inserted was
not adjusted for the position of the element within the 32 bit word
and so only the low element within each 32 bit word could receive
the intended value.

Fixed by simply replicating the value to each element of a
congruent vector before the mask and or operation used to
update the intended element.

A number of affected LIT tests have been updated appropriately.

before the mask & or into the intended

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: llvm-commits, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57588

llvm-svn: 352885
2019-02-01 16:51:09 +00:00
Simon Pilgrim 85184017e9 [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle
As suggested on PR40318, this patch uses PSLLDQ/PSRLDQ to lower shuffles to zero out the ends of a vector, leaving a sequential inner section.

For pre-SSSE3 we do this for shuffles with zeros at either end (requiring up to 3 shifts), but once PSHUFB is available I've limited this to shuffles with a single zeroable end (2 shifts).

Differential Revision: https://reviews.llvm.org/D56784

llvm-svn: 352883
2019-02-01 16:02:12 +00:00
Simon Pilgrim 1a529f58f9 [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, BITCAST(SHUFFLE(EXTRACT_SUBVECTOR(SRC1)))
Enable peeking through one use bitcasts to the subvector shuffle.

This still depends on the subvector being the same scalar-size but D57514 has already helped with the more tricky patterns

llvm-svn: 352879
2019-02-01 15:31:01 +00:00