Commit Graph

26447 Commits

Author SHA1 Message Date
Scott Linder 11ef7984b0 [AMDGPU] Add a pass to promote bitcast calls
AMDGPU currently only supports direct calls, but at lower optimisation levels it
fails to lower statically direct calls which appear indirect due to a bitcast.

Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize"
calls.

Differential Revision: https://reviews.llvm.org/D52741

llvm-svn: 345382
2018-10-26 13:18:36 +00:00
Simon Pilgrim 11c01f402f Regenerate test
llvm-svn: 345379
2018-10-26 12:33:56 +00:00
George Rimar 088d96b43d [Codegen] - Implement basic .debug_loclists section emission (DWARF5).
.debug_loclists is the DWARF 5 version of the .debug_loc.
With that patch, it will be emitted when DWARF 5 is used.

Differential revision: https://reviews.llvm.org/D53365

llvm-svn: 345377
2018-10-26 11:25:12 +00:00
Li Jia He f6fb752fe8 [PowerPC] Fix some missed optimization opportunities in combineSetCC
For both operands are bool, short, int, long, long long, add the following optimization.
1. 0-x == y --> x+y ==0
2. 0-x != y --> x+y != 0

Review: nemanjai
Differential Revision: https://reviews.llvm.org/D53360

llvm-svn: 345366
2018-10-26 06:48:53 +00:00
Li Jia He 9521467318 [PowerPC][NFC] Add tests for some missed optimization opportunities in combineSetCC
For both operands are bool, short, int, long, long long, add the following optimization test case.
1. 0-x == y --> x+y ==0
2. 0-x != y --> x+y != 0

Review: nemanjai
Differential Revision: https://reviews.llvm.org/D53358

llvm-svn: 345365
2018-10-26 05:02:10 +00:00
Nemanja Ivanovic 6a74bfba20 [PowerPC] Keep vector int to fp conversions in vector domain
At present a v2i16 -> v2f64 convert is implemented by extracts to scalar,
scalar converts, and merge back into a vector. Use vector converts instead,
with the int data permuted into the proper position and extended if necessary.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D53346

llvm-svn: 345361
2018-10-26 03:19:13 +00:00
Fangrui Song 5300c2e0ea [Pipeliner] Mark swp-art-deps-rec.ll as REQUIRES: asserts after rL345319
llvm-svn: 345359
2018-10-26 03:15:56 +00:00
Vlad Tsyrklevich 21beeb29ea Revert "[AArch64] Create proper memoperand for multi-vector stores"
This reverts commit r345315, it was causing test failures on
sanitizer-x86_64-linux-fast.

llvm-svn: 345356
2018-10-26 02:00:14 +00:00
Jonas Paulsson e2c5cbc164 [SystemZ] Pass the DAG pointer from SystemZAddressingMode::dump().
In order to print the IR slot number for the memory operand, the DAG pointer
must be passed to SDNode::dump().

The isel-debug.ll test updated to also check for the IR Value reference being
printed correctly.

Review: Ulrich Weigand
https://reviews.llvm.org/D53333

llvm-svn: 345347
2018-10-26 00:02:33 +00:00
Heejin Ahn 24faf859e5 Reland "[WebAssembly] LSDA info generation"
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.

In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)

This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info MachineFunction in
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer

Reviewers: dschuff, sbc100, rnk

Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52748

llvm-svn: 345345
2018-10-25 23:55:10 +00:00
Heejin Ahn 3103d3dcd1 [WebAssembly] Support EH instructions in InstPrinter
Summary: This adds support for exception handling instructions to InstPrinter.

Reviewers: dschuff, aardappel

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53634

llvm-svn: 345343
2018-10-25 23:45:48 +00:00
Jonas Paulsson f213f81d9c Fix in MachineOperand::printIRValueReference().
Handle the case where getCurrentFunction() returns nullptr by passing -1 to
printIRSlotNumber(). This will result in <badref> being printed instead of an
assertion failure.

Review: Francis Visoiu Mistrih
https://reviews.llvm.org/D53333

llvm-svn: 345342
2018-10-25 23:39:07 +00:00
Bryan Chan f0923f16f8 [AArch64] Implement FP16FML intrinsics
Add LLVM intrinsics for the ARMv8.2-A FP16FML vector-form instructions. Add a
DAG pattern to define the indexed-form intrinsics in terms of the vector-form
ones, similarly to how the Dot Product intrinsics were implemented.

Based on a patch by Gao Yiling.

Differential Revision: https://reviews.llvm.org/D53632

llvm-svn: 345337
2018-10-25 23:36:41 +00:00
Heejin Ahn 1d13e6be37 Address comments
- Add llvm-mc test case (and delete the old one)
- Change report_fatal_error to assertions

llvm-svn: 345334
2018-10-25 23:35:14 +00:00
Heejin Ahn 1147d91402 [WebAssembly] Error out when block/loop markers mismatch
Summary:
Currently InstPrinter ignores if there are mismatches between block/loop
and end markers by skipping the case if ControlFlowStack is empty. I
guess it is better to explicitly error out in this case, because this
signals invalid input.

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53620

llvm-svn: 345333
2018-10-25 23:35:13 +00:00
Sanjay Patel c14aafdacc [x86] add tests for missed load folding; NFC
llvm-svn: 345325
2018-10-25 22:23:27 +00:00
Sumanth Gundapaneni ada0f511ba [Pipeliner] Ignore Artificial dependences while computing recurrences.
The artificial dependencies are not real dependencies. In some cases, they
form circuits with bigger MII. However, they are used to schedule instructions
better.

Differential Revision: https://reviews.llvm.org/D53450

llvm-svn: 345319
2018-10-25 21:27:08 +00:00
Craig Topper 813064bf4d [X86] Change X86 backend to look for 'min-legal-vector-width' attribute instead of 'required-vector-width' when determining whether 512-bit vectors should be legal.
The required-vector-width attribute was only used for backend testing and has never been generated by clang.

I believe clang is now generating min-legal-vector-width for vector uses in user code.

With this I believe passing -mprefer-vector-width=256 to clang should prevent use of zmm registers in the generated assembly unless the user used a 512-bit intrinsic in their source code.

llvm-svn: 345317
2018-10-25 21:16:06 +00:00
Francis Visoiu Mistrih 5be9e6de89 [CodeGen] Remove operands from FENTRY_CALL
FENTRY_CALL is actually not taking any input / output operands. The
machine verifier complains now because the target description says that:

* It needs 1 unknown output
* It needs 1 or more variable inputs

llvm-svn: 345316
2018-10-25 21:12:15 +00:00
David Greene 53e869da7d [AArch64] Create proper memoperand for multi-vector stores
Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.

Differential Revision: https://reviews.llvm.org/D52816

llvm-svn: 345315
2018-10-25 21:10:39 +00:00
Volkan Keles f28e81f6aa [AArch64][GlobalISel] Simplify a legalizer test. NFC.
llvm-svn: 345307
2018-10-25 20:01:19 +00:00
Thomas Lively 0aad98fd07 [WebAssembly] Use target-independent saturating add
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53721

llvm-svn: 345299
2018-10-25 19:06:13 +00:00
Craig Topper 4a825e7b29 [X86] Add some non-AVX512VL command lines to the *vl-vec-test-testn.ll tests.
This will expose some regressions in the WIP and/or/xor promotion removal patch.

llvm-svn: 345297
2018-10-25 18:23:48 +00:00
Craig Topper ce0bc3814b [X86] Add KNL command lines to movmsk-cmp.ll.
Some of this code looks pretty bad and we should probably still be using movmskb more with avx512f.

llvm-svn: 345293
2018-10-25 18:06:25 +00:00
Volkan Keles 60c6affcb0 [GlobalISel] LegalizerHelper: Fix the incorrect alignment when splitting loads/stores in narrowScalar
Reviewers: dsanders, bogner, jpaquette, aemerson, ab, paquette

Reviewed By: dsanders

Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53664

llvm-svn: 345292
2018-10-25 17:52:19 +00:00
Craig Topper 5d787ac4be [X86] Remove some uarch tuning flags from KNL that look to have been inherited from SNB/IVB incorrectly
KNL is based on a modified Silvermont core so I don't think these features apply. I think the LEA flag is probably also wrong, but I'm less sure as I barely understand the 3 LEA flags we have currently.

Differential Revision: https://reviews.llvm.org/D53671

llvm-svn: 345285
2018-10-25 17:28:57 +00:00
Volkan Keles 3a103b1d25 [AArch64][GlobalISel] Fix the LegalityPredicate for lowerIf for G_LOAD/G_STORE
Summary:
Currently, Legalizer is trying to lower G_LOAD with a vector type
that has more than two elements due to the incorrect LegalityPredicate.

This patch fixes the issue by removing the multiplication by 8
as `MemDesc.Size` already contains the size in bits.

Reviewers: dsanders, aemerson

Reviewed By: dsanders

Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53679

llvm-svn: 345282
2018-10-25 17:23:25 +00:00
Simon Pilgrim 8f11ddc397 [ARM] Regenerate vdup tests
llvm-svn: 345276
2018-10-25 15:33:47 +00:00
John Brawn 958865202d [AArch64] Add EXT patterns for 64-bit EXT of a subvector of a 128-bit vector
If we have a 64-bit EXT where one of the operands is a subvector of a 128-bit
vector then in some cases we can eliminate an extract_subvector by converting
to a 128-bit EXT of the 128-bit vector.

Differential Revision: https://reviews.llvm.org/D53582

llvm-svn: 345275
2018-10-25 15:31:51 +00:00
John Brawn 49e61d90ca [AArch64] Do 64-bit vector move of 0 and -1 by extracting from the 128-bit move
Currently a vector move of 0 or -1 will use different instructions depending on
the size of the vector. Using a single instruction (the 128-bit one) for both
gives more opportunity for Machine CSE to eliminate instructions.

Differential Revision: https://reviews.llvm.org/D53579

llvm-svn: 345270
2018-10-25 14:56:48 +00:00
Francis Visoiu Mistrih 7d55dd673b [X86] Fix llc invocation on MIR test case
The current state of the llc invocation is:

* Running all the passes from dwarfehprepare to stack coloring
(included)
* It runs it from the LLVM IR included in the file
* It *ADDS* the generated MI from ISel to the MI in the MIR file
* The machine verifier doesn't like it.

Differential Revision: https://reviews.llvm.org/D53698

llvm-svn: 345266
2018-10-25 14:11:07 +00:00
Amara Emerson cbd86d8429 [GlobalISel] Use the target preferred type for G_EXTRACT_VECTOR_ELT index.
Allows for better imported pattern re-use.

llvm-svn: 345265
2018-10-25 14:04:54 +00:00
Simon Pilgrim 838eb24014 [TargetLowering] Improve vXi64 UINT_TO_FP vXf64 support (P38226)
As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well.

Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets.

The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it.

Differential Revision: https://reviews.llvm.org/D53649

llvm-svn: 345256
2018-10-25 11:15:57 +00:00
Carlos Alberto Enciso 9a24e1a7cd [DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG.
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.

Differential Revision: https://reviews.llvm.org/D53287

llvm-svn: 345250
2018-10-25 09:58:59 +00:00
Thomas Lively 325c9c5e84 [WebAssembly] Set LoadExt and TruncStore actions for SIMD types
Summary: Fixes part of the problem reported in bug 39275.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton

Differential Revision: https://reviews.llvm.org/D53542

llvm-svn: 345230
2018-10-25 01:46:07 +00:00
Reid Kleckner a6c6698217 [X86] Adjust MIR test case to pacify machine verifier
llvm-svn: 345227
2018-10-24 23:52:33 +00:00
Reid Kleckner 24d12c28e7 [X86] Fix pipeline tests when enabling MIR verification, NFC
llvm-svn: 345226
2018-10-24 23:52:22 +00:00
Heejin Ahn ac764aa88e [WebAssembly] Fix immediate of rethrow when throwing to caller
Summary:
Currently when assigning depths 'rethrow' does not take the whole
control flow stack into accounts but only considers EH pad stacks. When
assigning depth immmediates to rethrows, in normal cases it is done
correctly but when a rethrow instruction throws up to a caller, i.e., we
convert a pseudo RETHROW_TO_CALLER instruction to a rethrow, it
mistakenly compute the whole stack depth.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53619

llvm-svn: 345223
2018-10-24 23:31:24 +00:00
Thomas Lively ed9513472c [WebAssembly] Retain shuffle types during custom lowering
Summary:
Changing the node type in lowering was violating assumptions made in
the DAG combiner, so don't change the node type any more. This fixes
one of the issues reported in bug 39275.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton

Differential Revision: https://reviews.llvm.org/D53537

llvm-svn: 345221
2018-10-24 23:27:40 +00:00
Reid Kleckner 49a24278ba [ELF] Fix large code model MIR verifier errors
Instead of using the MOVGOT64r pseudo, use the existing
MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to
create a "scratch register operand" for the pseudo to use, and the
register allocator can make better decisions.

Fixes some X86 verifier errors tracked in PR27481.

llvm-svn: 345219
2018-10-24 22:57:28 +00:00
Thomas Lively 30f1d69115 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Thomas Lively 43bc46207a [SelectionDAG] DAG combiner for fminnan and fmaxnan
Summary: Depends on D52765.

Reviewers: aheejin, dschuff

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52768

llvm-svn: 345210
2018-10-24 22:18:54 +00:00
Reid Kleckner 9c5bda652c [X86] Add *SP to tailcall register class to fix verifier error
It's possible to do a tail call to a stack argument. LLVM already
calculates the right stack offset to call through.

Fixes the sibcall* and musttail* verifier failures tracked at PR27481.

llvm-svn: 345197
2018-10-24 21:09:34 +00:00
Tim Northover 1c353419ab AArch64: add a pass to compress jump-table entries when possible.
llvm-svn: 345188
2018-10-24 20:19:09 +00:00
Simon Pilgrim c5bb362b13 [X86][SSE] Add SimplifyDemandedBitsForTargetNode PMULDQ/PMULUDQ handling
Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes.

This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398.

Differential Revision: https://reviews.llvm.org/D53643

llvm-svn: 345182
2018-10-24 19:11:28 +00:00
Peter Collingbourne 4bb928c110 ARM: Use BKPT instead of TRAP to implement llvm.debugtrap.
The BKPT instruction is specified to cause a software breakpoint,
and at least on Linux results in a SIGTRAP. This makes it more
suitable for implementing debugtrap than TRAP (aka UDF #254), which
is specified to cause an undefined instruction exception and results
in a SIGILL on Linux.

Moreover, BKPT is not marked as a terminator, which is not only
consistent with the IR instruction but allows the analyzeBlock
function to correctly analyze a basic block containing the instruction,
which fixes an assertion failure in the machine block placement pass
previously triggered by the included test case.

Because BKPT is only supported starting with ARMv5T, we continue to
use UDF #254 when targeting v4T.

Differential Revision: https://reviews.llvm.org/D53614

llvm-svn: 345171
2018-10-24 18:10:38 +00:00
Craig Topper 2417273255 [X86] Bring back the MOV64r0 pseudo instruction
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.

My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at.

There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.

Differential Revision: https://reviews.llvm.org/D52757

llvm-svn: 345165
2018-10-24 17:32:09 +00:00
Robert Lougher 18bfb3a5ec [CodeGen] skip lifetime end marker in isInTailCallPosition
A lifetime end intrinsic between a tail call and the return should not
prevent the call from being tail call optimized.

Differential Revision: https://reviews.llvm.org/D53519

llvm-svn: 345163
2018-10-24 17:03:19 +00:00
Simon Pilgrim 84cc110732 [X86][SSE] Update PMULDQ schedule tests to survive more aggressive SimplifyDemandedBits
llvm-svn: 345136
2018-10-24 13:13:36 +00:00
Tim Renouf 2a1b1d94b6 [AMDGPU] Defined gfx909 Raven Ridge 2
Differential Revision: https://reviews.llvm.org/D53418

Change-Id: Ie3d054f2e956c2768988c0f4c0ffd29a47294eef
llvm-svn: 345120
2018-10-24 08:14:07 +00:00
Saleem Abdulrasool 4005f9a860 ARM: handle checking aliases with out-of-bounds GEPs
A global alias may use indices which are not considered in bounds.  In
such a case, accessing the base object will fail as it only peers
through inbounds accesses.  This pattern is used by the swift compiler
to create references to preceeding members in the type metadata.  This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format.  Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.

llvm-svn: 345107
2018-10-24 00:00:52 +00:00
Matthias Braun 4f82406c46 SelectionDAG: Reuse bigger sized constants in memset expansion.
When implementing memset's today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...

We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. In many targets we could just access the lower byte of the
64bit register instead.

- Ideally this would be handled by the ConstantHoist pass but it runs
  too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
  however SelectionDAG constantfolding would constantfold the
  "trunc(bigconstnat)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
  stop DAGCombiner from constant folding in this situation. (Similar to
  how ConstantHoisting marks things as Opaque to avoid folding
  ADD/SUB/etc.)

Differential Revision: https://reviews.llvm.org/D53181

llvm-svn: 345102
2018-10-23 23:19:23 +00:00
Craig Topper e01d516ac7 [X86] Autogenerate comple checks. NFC
llvm-svn: 345087
2018-10-23 21:58:49 +00:00
Peter Collingbourne abd820a92b CGP: Clear data structures at the end of a loop iteration instead of the beginning.
Clearing LargeOffsetGEPMap at the end fixes a bug where if a large
offset GEP is in a dead basic block, we fail an assertion when trying
to delete the block due to the asserting VH in LargeOffsetGEPMap.

Differential Revision: https://reviews.llvm.org/D53464

llvm-svn: 345082
2018-10-23 21:23:18 +00:00
Simon Pilgrim b6c57075c0 [X86][SSE] Revert rL343922 combinePMULDQ AddToWorklist (PR39398)
We can't add the MULDQ node back to the worklist after the demanded bits change has been committed in case the node has been removed entirely. This will have to wait until we have SimplifyDemandedBitsForTargetNode.

llvm-svn: 345070
2018-10-23 19:07:53 +00:00
Simon Pilgrim d705ba97dd [LegalizeDAG] Share Vector/Scalar CTLZ Expansion
As suggested on D53258, this patch shares common CTLZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering.

Extension to D53474

llvm-svn: 345060
2018-10-23 17:48:30 +00:00
Stefan Pintilie 927e8bf316 [Power9] Add __float128 support in the backend for bitcast to a i128
Add support to allow bit-casting from f128 to i128 and then
extracting 64 bits from the result.

Differential Revision: https://reviews.llvm.org/D49507

llvm-svn: 345053
2018-10-23 17:11:36 +00:00
Roman Lebedev 06e4db07af Experimental re-land of [X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern
This initially landed in rL345014, but was reverted in rL345017
due to sanitizer-x86_64-linux-fast buildbot failure in
check-lld (ELF/relocatable-versioned.s) test.

While i'm not yet quite sure what is the problem, one obvious
thing here is that extra truncation roundtrip.
Maybe that's it? If not, will re-revert.

Differential Revision: https://reviews.llvm.org/D53521

llvm-svn: 345027
2018-10-23 13:19:31 +00:00
Roman Lebedev c29dbbdb10 Revert "[X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern"
*Seems* to be breaking sanitizer-x86_64-linux-fast buildbot,
the ELF/relocatable-versioned.s test:

==17758==MemorySanitizer CHECK failed: /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191 "((kBlockMagic)) == ((((u64*)addr)[0]))" (0x6a6cb03abcebc041, 0x0)
    #0 0x59716b in MsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/msan/msan.cc:393
    #1 0x586635 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79
    #2 0x57d5ff in __sanitizer::InternalFree(void*, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191
    #3 0x7fc21b24193f  (/lib/x86_64-linux-gnu/libc.so.6+0x3593f)
    #4 0x7fc21b241999 in exit (/lib/x86_64-linux-gnu/libc.so.6+0x35999)
    #5 0x7fc21b22c2e7 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e7)
    #6 0x57c039 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld+0x57c039)

This reverts commit r345014.

llvm-svn: 345017
2018-10-23 10:34:57 +00:00
Roman Lebedev 1c95b2f779 [X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern
Summary:
Continuation of D52348.

We also get the `c) x &  (-1 >> (32 - y))` pattern here, because of the D48768.
I will add extra-uses into those tests and follow-up with a patch to handle those patterns too.

Reviewers: RKSimon, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53521

llvm-svn: 345014
2018-10-23 09:08:44 +00:00
Craig Topper f50f086743 [X86] Regenerate test checks to show fma comments. NFC
llvm-svn: 344999
2018-10-23 04:18:08 +00:00
Heejin Ahn a40303aa03 [WebAssembly] Fix assembly printing of br_table
Summary: In `br_table's stack version asm string, \t was missing.

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53516

llvm-svn: 344981
2018-10-23 00:28:14 +00:00
Wouter van Oortmerssen a569c20587 [WebAssembly] Added test for inline assembly roundtrip.
Summary:
Due to previous work to make WebAssembly MC by default stack-only
inline assembly now "just works" (previously it didn't since it had
no way to know types of registers), so no further work required.

So far we only have tests (in inline-asm.ll) which test with
non-existing instructions, so this adds a test that roundtrips
both the inline assembly and its surrounding code thru the assembler.

Reviewers: dschuff, sunfish

Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits

Differential Revision: https://reviews.llvm.org/D52914

llvm-svn: 344977
2018-10-23 00:12:49 +00:00
Leonard Chan 0acfc6be38 [Intrinsic] Unigned Saturation Addition Intrinsic
Add an intrinsic that takes 2 integers and perform unsigned saturation
addition on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D53340

llvm-svn: 344971
2018-10-22 23:08:40 +00:00
Matthias Braun a0beeffeed X86: Do not optimize branches with undef eflags inputs
analyzeBranch()/insertBranch() etc. do not properly deal with an undef
flag on the eflags input and used to produce invalid MIR.  I don't see
this ever affecting real world inputs (I don't think it is possible to
produce undef flags with llvm IR), so I simply changed the code to bail
out in this case.

rdar://42122367

llvm-svn: 344970
2018-10-22 22:52:23 +00:00
Craig Topper c8e183f9ee Recommit r344877 "[X86] Stop promoting integer loads to vXi64"
I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.

Original commit message:

Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344965
2018-10-22 22:14:05 +00:00
Sanjay Patel fb41544af8 [x86] add test for PR25498 and complete checks; NFC
Might as well test the actual codegen instead of just the absence of crashing.

llvm-svn: 344955
2018-10-22 21:11:15 +00:00
Justin Bogner 912adfba7e Reapply "[MachineCopyPropagation] Reimplement CopyTracker in terms of register units"
Recommits r342942, which was reverted in r343189, with a fix for an
issue where we would propagate unsafely if we defined only the upper
part of a register.

Original message:

  Change the copy tracker to keep a single map of register units
  instead of 3 maps of registers. This gives a very significant
  compile time performance improvement to the pass. I measured a
  30-40% decrease in time spent in MCP on x86 and AArch64 and much
  more significant improvements on out of tree targets with more
  registers.

llvm-svn: 344942
2018-10-22 19:51:31 +00:00
Craig Topper 8d8dcfe690 Revert r344877 "[X86] Stop promoting integer loads to vXi64"
Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure out.

llvm-svn: 344921
2018-10-22 16:59:24 +00:00
Matt Arsenault 687ec75d10 DAG: Change behavior of fminnum/fmaxnum nodes
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.

There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.

llvm-svn: 344914
2018-10-22 16:27:27 +00:00
Roman Lebedev 13c5ab2e27 [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern
Summary:
Trivial continuation of D52304.
While this pattern is not canonical, we do select it in the BZHI case,
so this should not be any different.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52348

llvm-svn: 344902
2018-10-22 13:54:17 +00:00
Nemanja Ivanovic 674581afbb [PowerPC][NFC] Fix bugs in r+r to r+i conversion
The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of
the corresponding X-Forms since they only target the Altivec registers.
Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only
load into the upper 32 VSX registers. Similarly with the remaining affected
instructions.

There is currently no way that I can see to trigger the bug, but as we add other
ways of exploiting these instructions, there may very well be instances that do.

This is an NFC patch in practical terms since the changes it introduces can not
be triggered without an MIR test.

Differential revision: https://reviews.llvm.org/D53323

llvm-svn: 344894
2018-10-22 11:22:59 +00:00
Craig Topper 290c081d91 [X86] Add patterns for vector and/or/xor/andn with other types than vXi64.
This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables.

This unfortunately prevents matching of andn by fast-isel for these types since the requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now.

Interestinly it looks like fast-isel can't handle instructions with constant vector arguments so the the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests.

This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files.

llvm-svn: 344884
2018-10-22 06:30:22 +00:00
Craig Topper 321df5b0d4 [X86] Stop promoting integer loads to vXi64
Summary:
Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344877
2018-10-21 21:30:26 +00:00
Sanjay Patel e439cc2745 [DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016)
This is a late backend subset of the IR transform added with:
D52439

We can confirm that the conversion to a 'trunc' is correct by running:
$ opt -instcombine -data-layout="e"
(assuming the IR transforms are correct; change "e" to "E" for big-endian)

As discussed in PR39016:
https://bugs.llvm.org/show_bug.cgi?id=39016
...the pattern may emerge during legalization, so that's we are waiting for an 
insertelement to become a scalar_to_vector in the pattern matching here.

The DAG allows for fun variations that are not possible in IR. Result types for 
extracts and scalar_to_vector don't necessarily match input types, so that means 
we have to be a bit more careful in the transform (see code comments).

The tests show that we don't handle cases that require a shift (as we did in the
IR version). I've left that as a potential follow-up because I'm not sure if 
that's a real concern at this late stage.

Differential Revision: https://reviews.llvm.org/D53201

llvm-svn: 344872
2018-10-21 20:13:29 +00:00
Simon Pilgrim eb806d5f30 [X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 unary shuffle lowering
llvm-svn: 344868
2018-10-21 17:07:50 +00:00
David Blaikie 2df23a4e2e DebugInfo: Use DW_OP_addrx in DWARFv5
Reuse addresses in the address pool, even in non-split cases.

llvm-svn: 344838
2018-10-20 08:54:05 +00:00
Thomas Lively 5ea17d450e [WebAssembly] Implement vector sext_inreg and tests with comparisons
Summary: Depends on D53251.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53252

llvm-svn: 344826
2018-10-20 01:35:23 +00:00
Thomas Lively 55735d522d [WebAssembly] Custom lower i64x2 constant shifts to avoid wrap
Summary: Depends on D53057.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53251

llvm-svn: 344825
2018-10-20 01:31:18 +00:00
Roman Tereshin 8d6ff4c0af [MachineCSE][GlobalISel] Making sure MachineCSE works mid-GlobalISel (again)
Change of approach, it looks like it's a much better idea to deal with
the vregs that have LLTs and reg classes both properly, than trying to
avoid creating those across all GlobalISel passes and all targets.

The change mostly touches MachineRegisterInfo::constrainRegClass,
which is apparently only used by MachineCSE. The changes are NFC for
any pipeline but one that contains MachineCSE mid-GlobalISel.

NOTE on isCallerPreservedOrConstPhysReg change in MachineCSE:

    There is no test covering it as the only way to insert a new pass
(MachineCSE) from a command line I know of is llc's -run-pass option,
which only works with MIR, but MIRParser freezes reserved registers upon
MachineFunctions creation, making it impossible to reproduce the state
that exposes the issue.

Reviwed By: aditya_nandakumar

Differential Revision: https://reviews.llvm.org/D53144

llvm-svn: 344822
2018-10-20 00:06:15 +00:00
Changpeng Fang f95f763ea5 AMDGPU: Add support pattern for SUB of one bit
Summary:
  Add selection patterns to support one bit Sub.

Reviewers:
  rampitec, arsenm

Differential Revision:
  https://reviews.llvm.org/D52946

llvm-svn: 344815
2018-10-19 21:09:21 +00:00
Aditya Nandakumar cd04e366d7 [GISel]: Allow PHIs to be DCEd
https://reviews.llvm.org/D53304

Currently dead phis are not cleaned up during DCE. This patch allows
dead PHI and G_PHI insts to be deleted.

Reviewed by: dsanders

llvm-svn: 344811
2018-10-19 20:11:52 +00:00
Thomas Lively 11a332d08d [WebAssembly] Handle undefined lane indices in SIMD patterns
Summary:
Undefined indices in shuffles can be used when not all lanes of the
output vector will be used. This happens for example in the expansion
of vector reduce operations. Regardless, undefs are legal as lane
indices in IR and should be supported.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53057

llvm-svn: 344803
2018-10-19 19:08:06 +00:00
Krzysztof Parzyszek 6bfc6577f2 [Hexagon] Remove support for V4
llvm-svn: 344791
2018-10-19 17:31:11 +00:00
Fangrui Song d243ac6c4b [pipeliner] Fix test added in rL344748 to require asserts
llvm-svn: 344775
2018-10-19 06:20:01 +00:00
Eli Friedman b09c778715 Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP")
Still causing failures on the polly-aosp buildbot; I'll follow up
with a reduced testcase.

llvm-svn: 344752
2018-10-18 19:34:30 +00:00
Sumanth Gundapaneni 62ac69d45c [Pipeliner] copyToPhi DAG Mutation to improve scheduling.
In a loop, create artificial dependences between the source of a
COPY/REG_SEQUENCE to the use in next iteration.

Eg:
SRC ----Data Dep--> COPY
COPY ---Anti Dep--> PHI (implies, to be used in next iteration)
PHI ----Data Dep--> USE

This patches creates
USE ----Artificial Dep---> SRC

This will effectively schedule the COPY late to eliminate additional copies.
Before this patch, the schedule can be
SRC, COPY, USE : The COPY is used in next iteration and it needs to be
preserved.

After this patch, the schedule can be
USE, SRC, COPY : The COPY is used in next iteration and the live interval is
reduced.

Differential Revision: https://reviews.llvm.org/D53303

llvm-svn: 344748
2018-10-18 15:51:16 +00:00
Kristina Brooks 312fcc116b [X86] Support for the mno-tls-direct-seg-refs flag
Allows to disable direct TLS segment access (%fs or %gs). GCC supports
a similar flag, it can be useful in some circumstances, e.g. when a thread
context block needs to be updated directly from user space. More info
and specific use cases: https://bugs.llvm.org/show_bug.cgi?id=16145

There is another revision for clang as well.
Related: D53102

All X86 CodeGen tests appear to pass:
```
[46/47] Running lit suite /SourceCache/llvm-trunk-8.0/test/CodeGen
Testing Time: 23.17s
  Expected Passes    : 3801
  Expected Failures  : 15
  Unsupported Tests  : 8021
```

Reviewed by: Craig Topper.

Patch by nruslan (Ruslan Nikolaev).

Differential Revision: https://reviews.llvm.org/D53103

llvm-svn: 344723
2018-10-18 03:14:37 +00:00
Nicolai Haehnle 4821937d2e AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI
Summary:
To workaround a hardware issue in the (base + offset) calculation
when base is negative. The impact on code quality should be limited
since SILoadStoreOptimizer still runs afterwards and is able to
combine loads/stores based on known sign information.

This fixes visible corruption in Hitman on SI (easily reproducible
by running benchmark mode).

Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923

Reviewers: arsenm, mareko

Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53160

llvm-svn: 344698
2018-10-17 15:37:48 +00:00
Nicolai Haehnle 0823050b9f StructurizeCFG: Simplify inserted PHI nodes
Summary:
This improves subsequent divergence analysis in some cases.

Change-Id: I5e95e7ec7fd3fa80d414d1a53a02fea23e3d67d3

Reviewers: arsenm, rampitec

Subscribers: jvesely, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D53316

llvm-svn: 344697
2018-10-17 15:37:41 +00:00
Nicolai Haehnle c4a2ff0950 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 344696
2018-10-17 15:37:30 +00:00
Sam Parker 2ef3c0dad6 [ARM] bottom-top mul support in ARMParallelDSP
Previously reverted in rL343082.

Original commit message:

On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 344693
2018-10-17 13:02:48 +00:00
Petar Jovanovic 8a08412533 [MIPS GlobalISel] Legalize constants
Legalize s1, s8, s16 and s64 G_CONSTANT for MIPS32.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D53077

llvm-svn: 344684
2018-10-17 10:30:03 +00:00
Sjoerd Meijer 64cfb74a61 [ARM] Do not fuse VADD and VMUL, continued (2/2)
This is patch 2/2, following up on D53314, and is the functional change
to prevent fusing mul + add sequences into VFMAs.

Differential revision: https://reviews.llvm.org/D53315

llvm-svn: 344683
2018-10-17 10:05:44 +00:00
Sjoerd Meijer ff3ab33ec8 [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33.  This is a serie of 2
patches, that is trying to achieve the same for VFMA.  The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:

 --------------------------------------------------------
 | Opt   |  < rL342874   |  >= rL342874   |             |
 |------------------------------------------------------|
 |-O3    |     vmla      |      vmul      |     vmul    |
 |       |               |      vadd      |     vadd    |
 |------------------------------------------------------|
 |-Ofast |     vfma      |      vfma      |     vmul    |
 |       |               |                |     vadd    |
 |------------------------------------------------------|
 |-Oz    |     vmla      |      vmla      |     vmla    |
 --------------------------------------------------------

This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2.  This also fixes a typo in the regression test added in rL342874.

Differential revision: https://reviews.llvm.org/D53314

llvm-svn: 344671
2018-10-17 07:26:35 +00:00
Craig Topper e0a992918b [X86] Match (cmp (and (shr X, C), mask), 0) to BEXTR+TEST.
Without this we match the CMP+AND to a TEST and then match the SHR separately. I'm trusting analyzeCompare to remove the TEST during the peephole pass. Otherwise we need to check the flag users to see if they only use the Z flag.

This recovers a case lost by r344270.

Differential Revision: https://reviews.llvm.org/D53310

llvm-svn: 344649
2018-10-16 22:29:36 +00:00
Krasimir Georgiev 547d824da6 Revert "[WebAssembly] LSDA info generation"
This reverts commit r344575.
Newly introduced test eh-lsda.ll.test fails with use-after-free under
ASAN build.

llvm-svn: 344639
2018-10-16 18:50:09 +00:00
Leonard Chan 699b3b54da [Intrinsic] Signed Saturation Addition Intrinsic
Add an intrinsic that takes 2 integers and perform saturation addition on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D53053

llvm-svn: 344629
2018-10-16 17:35:41 +00:00
Aleksandar Beserminji a5949439ca [mips][micromips] Fix how values in .gcc_except_table are calculated
When a landing pad is calculated in a program that is compiled
for micromips, it will point to an even address. Such an error will
cause a segmentation fault, as the instructions in micromips are
aligned on odd addresses. This patch sets the last bit of the offset
where a landing pad is, to 1, which will effectively be
an odd address and point to the instruction exactly.

Differential Revision: https://reviews.llvm.org/D52985

llvm-svn: 344591
2018-10-16 08:27:28 +00:00
Heejin Ahn 0981eaab47 [WebAssembly] LSDA info generation
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.

In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)

This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info MachineFunction in
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer

Reviewers: dschuff, sbc100, rnk

Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52748

llvm-svn: 344575
2018-10-16 00:09:12 +00:00
Nicolai Haehnle 6d92ca61f9 StructurizeCFG,AMDGPU: Test case of a redundant phi and codegen consequences
Change-Id: I9681f9e41ca30f82576f3d1f965c3a550a34b171
llvm-svn: 344569
2018-10-15 22:37:46 +00:00
Craig Topper 2909a3d9d0 [X86] Fix a bad bitcast in the load form of vXi16 uniform shift patterns for EVEX encoded instructions.
llvm-svn: 344563
2018-10-15 21:51:32 +00:00
Craig Topper 548e8b5b47 [X86] Add test cases showing failure to fold load into vpsrlw when EVEX encoded instructions are used.
There's a bad bitcast being used in the isel patterns for the vXi16 shift instructions.

llvm-svn: 344562
2018-10-15 21:51:29 +00:00
Craig Topper 125c54ba10 [X86] Disable the peephole pass on avx2-intrinsics-x86.ll and avx512bw-intrinsics.ll to ensure any load folding tests are testing isel not load folding tables.
llvm-svn: 344561
2018-10-15 21:51:26 +00:00
Craig Topper da0e544953 [X86] Regenerate avx2-intrinsics-x86.ll to compress the 32 vs 64 bit mode checks.
llvm-svn: 344560
2018-10-15 21:51:22 +00:00
Simon Pilgrim 095a7fe635 [AARCH64] Improve vector popcnt lowering with ADDLP
AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors.

This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258.

Differential Revision: https://reviews.llvm.org/D53259

llvm-svn: 344554
2018-10-15 21:15:58 +00:00
Konstantin Zhuravlyov 94dfcc2eb2 AMDGPU: Generate .amdgcn_target for object code v3
Differential Revision: https://reviews.llvm.org/D53221

llvm-svn: 344552
2018-10-15 20:37:47 +00:00
Sanjay Patel 4cf1da0e02 [SelectionDAG] allow FP binops in SimplifyDemandedVectorElts
This is intended to make the backend on par with functionality that was 
added to the IR version of SimplifyDemandedVectorElts in:
rL343727
...and the original motivation is that we need to improve demanded-vector-elements 
in several ways to avoid problems that would be exposed in D51553.

Differential Revision: https://reviews.llvm.org/D52912

llvm-svn: 344541
2018-10-15 18:05:34 +00:00
Sanjay Patel 8bd74785f0 [DAGCombiner] allow undef elts in vector fmul matching
llvm-svn: 344534
2018-10-15 16:54:07 +00:00
Sanjay Patel 7cf5733f7f [AArch64] add tests for fmul x, -2.0 with undef elts; NFC
Also, add tests with commuted operands. There was no coverage for that case.

llvm-svn: 344531
2018-10-15 16:44:00 +00:00
Sanjay Patel 9e7e0fd828 [DAGCombiner] allow undef elts in vector fma matching
llvm-svn: 344528
2018-10-15 15:56:39 +00:00
Sanjay Patel 475a53649e [x86] add tests for fma with undef elts; NFC
llvm-svn: 344527
2018-10-15 15:47:37 +00:00
Sanjay Patel 4e970ff022 [DAGCombiner] allow undef elts in vector fma matching
llvm-svn: 344525
2018-10-15 15:38:38 +00:00
Sanjay Patel b06ac18ee9 [x86] add tests for fma with undef elts; NFC
llvm-svn: 344523
2018-10-15 15:28:44 +00:00
Simon Pilgrim 5abb607ebe [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281)
As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case......

Differential Revision: https://reviews.llvm.org/D53257

llvm-svn: 344512
2018-10-15 13:20:41 +00:00
Nicolai Haehnle 1bb242e201 AMDGPU: Test showing a scalar buffer load deficiency
Change-Id: I5b64a565f22a8482aa0712488d85e45163ac3d12
llvm-svn: 344506
2018-10-15 11:37:04 +00:00
Bjorn Pettersson 064944352e [TwoAddressInstructionPass] Replace subregister uses when processing tied operands
Summary:
TwoAddressInstruction pass typically rewrites
  %1:short = foo %0.sub_lo:long
as
  %1:short = COPY %0.sub_lo:long
  %1:short = foo %1:short
when having tied operands.

If there are extra un-tied operands that uses the same reg and
subreg, such as the second and third inputs to fie here:
  %1:short = fie %0.sub_lo:long, %0.sub_hi:long, %0.sub_lo:long
then there was a bug which replaced the register %0 also for
the un-tied operand, but without changing the subregister indices.
So we used to get:
  %1:short = COPY %0.sub_lo:long
  %1:short = fie %1, %1.sub_hi:short, %1.sub_lo:short
With this fix we instead get:
  %1:short = COPY %0.sub_lo:long
  %1:short = fie %1, %0.sub_hi:long, %1

Reviewers: arsenm, JesperAntonsson, kparzysz, MatzeB

Reviewed By: MatzeB

Subscribers: bjope, kparzysz, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D36224

llvm-svn: 344492
2018-10-15 08:36:03 +00:00
Craig Topper b44b22c6fe [X86] Autogenerate checks. NFC
llvm-svn: 344490
2018-10-15 05:31:24 +00:00
Craig Topper 06aea1720a [X86] Move promotion of vector and/or/xor from legalization to DAG combine
Summary:
I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64, but the other nodes hadn't legalized yet, I didn't look too hard at fixing that.

This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better.

In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53107

llvm-svn: 344487
2018-10-15 01:51:58 +00:00
Craig Topper 671779456a [X86] Add 128 MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction.
We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector.

llvm-svn: 344486
2018-10-15 01:51:53 +00:00
Craig Topper b5000974fe [X86] Autogenerate complete checks. NFC
llvm-svn: 344485
2018-10-15 01:51:50 +00:00
Simon Pilgrim 861cd0ba44 [X86][AVX] Enable lowerVectorShuffleAsLanePermuteAndPermute v16i16/v32i8 shuffle lowering
Extends D53148 from v4f64 now that we have test coverage for v16i16/v32i8 shuffles.

llvm-svn: 344481
2018-10-14 17:34:20 +00:00
Simon Pilgrim 9afb1e66e5 [ARM] Regenerate cttz tests
Improve codegen view as part of PR32655

llvm-svn: 344479
2018-10-14 16:49:04 +00:00
Craig Topper ec4b75f47a [X86] Type legalize v2f32 stores by widening to v4f32, casting to v2f64, extracting f64 and storing.
Summary: This is similar to what D52528 did for loads. It should match what generic type legalization does in 64-bit mode where it uses a v2i64 cast and an i64 store.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53173

llvm-svn: 344470
2018-10-14 03:36:27 +00:00
Simon Pilgrim 2ac03ec2c4 [AARCH64] Regenerate popcnt tests
Improve codegen view as part of PR32655

llvm-svn: 344466
2018-10-13 21:50:15 +00:00
Simon Pilgrim 247ea88090 [ARM] Regenerate popcnt tests
Improve codegen view as part of PR32655

llvm-svn: 344465
2018-10-13 21:32:49 +00:00
Craig Topper 189e5b4ab6 [LegalizeTypes] Prevent an assertion from PromoteIntRes_BSWAP and PromoteIntRes_BITREVERSE if the shift amount is too large for the VT returned by getShiftAmountTy
Summary:
getShiftAmountTy for X86 returns MVT::i8. If a BSWAP or BITREVERSE is created that requires promotion and the difference between the original VT and the promoted VT is more than 255 then we won't able to create the constant.

This patch adds a check to replace the result from getShiftAmountTy to MVT::i32 if the difference won't fit. This should get legalized later when the shift is ultimately expanded since its clearly an illegal type that we're only promoting to make it a power of 2 bit width. Alternatively we could base the decision completely on the largest shift amount the promoted VT could use.

Vectors should be immune here because getShiftAmountTy always returns the incoming VT for vectors. Only the scalar shift amount can be changed by the targets.

Reviewers: eli.friedman, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53232

llvm-svn: 344460
2018-10-13 17:47:20 +00:00
Simon Pilgrim 1c6d320351 [X86][SSE] combineIncDecVector - use isConstantSplat
Use isConstantSplat instead of ISD::isConstantSplatVector to let us us peek through to illegal types (in this case for i686 targets to recognise i64 constants)

llvm-svn: 344452
2018-10-13 14:45:44 +00:00
Simon Pilgrim f64e654d62 [X86][SSE] Improve CTTZ lowering when CTLZ is legal
If we have better CTLZ support than CTPOP, then use cttz(x) = width - ctlz(~x & (x - 1)) - and remove the CTTZ_ZERO_UNDEF handling as it no longer gives better codegen.

Similar to rL344447, this is also closer to LegalizeDAG's approach

llvm-svn: 344448
2018-10-13 13:05:19 +00:00
Simon Pilgrim afead139cf [X86][SSE] Change CTTZ vector lowering to cttz(x) = ctpop(~x & (x - 1))
This patch changes the vector CTTZ lowering from:

cttz(x) = ctpop((x & -x) - 1)

to:

cttz(x) = ctpop(~x & (x - 1))

Not only does this make better use of the PANDN instruction, but it also matches the LegalizeDAG method which should allow us to remove the x86 specific code at some point in the future (we need to fix some issues with the bitcasted logic ops and CTPOP lowering first).

Differential Revision: https://reviews.llvm.org/D53214

llvm-svn: 344447
2018-10-13 12:12:06 +00:00
Simon Pilgrim f3952413f7 [X86][AVX] Add lowerVectorShuffleAsLanePermuteAndPermute for v4f64 shuffles (PR39161)
Add shuffle lowering for the case where we can shuffle the lanes into place followed by an in-lane permute.

This is mainly for cases where we can have non-repeating permutes in each lane, but for now I've just enabled it for v4f64 unary shuffles to fix PR39161 - there is no test coverage for other shuffles that might benefit yet.

We now have several cross-lane shuffle lowering methods that all do something similar - I've looked at merging some of these (notably by making the repeated mask mechanism in lowerVectorShuffleByMerging128BitLanes optional), but there is a lot of assertions/assumptions in the way that makes this tricky - I ended up going for adding yet another relatively simple method instead.

Differential Revision: https://reviews.llvm.org/D53148

llvm-svn: 344446
2018-10-13 11:38:10 +00:00
Arnaud A. de Grandmaison 162435e7b5 [AArch64] Swap comparison operands if that enables some folding.
Summary:
AArch64 can fold some shift+extend operations on the RHS operand of
comparisons, so swap the operands if that makes sense.

This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751

Reviewers: efriedma, t.p.northover, javed.absar

Subscribers: mcrosier, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53067

llvm-svn: 344439
2018-10-13 07:43:56 +00:00
Thomas Lively 3afc346dd0 [WebAssembly] SIMD min and max
Summary: Depends on D52324 and D52764.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52325

llvm-svn: 344438
2018-10-13 07:26:10 +00:00
Alex Bradbury 748d080e62 [RISCV] Eliminate unnecessary masking of promoted shift amounts
SelectionDAGBuilder::visitShift will always zero-extend a shift amount when it
is promoted to the ShiftAmountTy. This results in zero-extension (masking)
which is unnecessary for RISC-V as the shift operations only read the lower 5
or 6 bits (RV32 or RV64).

I initially proposed adding a getExtendForShiftAmount hook so the shift amount
can be any-extended (D52975). @efriedma explained this was unsafe, so I have
instead eliminate the unnecessary and operations at instruction selection time
in a manner similar to X86InstrCompiler.td.

Differential Revision: https://reviews.llvm.org/D53224

llvm-svn: 344432
2018-10-12 23:18:52 +00:00
Craig Topper 3e76b2d736 [X86] Improve type legalization of (v2i32/v4i16/v8i16 (bitcast (v2f32))) to avoid a stack stack temporary.
llvm-svn: 344425
2018-10-12 22:00:04 +00:00
Craig Topper 435e38a5df [LegalizeVectorTypes] When widening the result of a bitcast from a scalar type, use a scalar_to_vector to turn the scalar into a vector intead of a build vector full of mostly undefs.
This is more consistent with what we usually do and matches some code X86 custom emits in some cases that I think I can cleanup.

The MIPS test change just looks to be an instruction ordering change.

llvm-svn: 344422
2018-10-12 21:59:55 +00:00
Craig Topper 1bb0c6041a [LegalizeVectorTypes] When widening the operands to a concat_vectors, see if we can use the widened operand 0 if the width matches and the other operands are undef.
This saves a conversion to extracts and build_vector. We already do this when both the result and the input need to be widened to the same type.

This changed the sse-intrinsics-fast-isel test because we don't lower (insert_vector_elt (scalar_to_vector X), Y, 1) well. We turn it into (vector_shuffle (scalar_to_vector X), (scalar_to_vector Y), <0, 4, 2, 3>) losing track of the fact that the upper elts could be undef.

We should probably find a way to prevent the scalarization of the <2 x f32> load on these tests.

llvm-svn: 344404
2018-10-12 19:37:49 +00:00
Simon Pilgrim 1d6b938132 Regenerate test. NFCI.
llvm-svn: 344399
2018-10-12 19:03:54 +00:00
Sanjay Patel e28c8ecd72 [x86] add and use fast horizontal vector math subtarget feature
This is the planned follow-up to D52997. Here we are reducing horizontal vector math codegen 
by default. AMD Jaguar (btver2) should have no difference with this patch because it has 
fast-hops. (If we want to set that bit for other CPUs, let me know.)

The code changes are small, but there are many test diffs. For files that are specifically 
testing for hops, I added RUNs to distinguish fast/slow, so we can see the consequences 
side-by-side. For files that are primarily concerned with codegen other than hops, I just 
updated the CHECK lines to reflect the new default codegen.

To recap the recent horizontal op story:

1. Before rL343727, we were producing hops for all subtargets for a variety of patterns. 
   Hops were likely not optimal for all targets though.
2. The IR improvement in r343727 exposed a hole in the backend hop pattern matching, so 
   we reduced hop codegen for all subtargets. That was bad for Jaguar (PR39195).
3. We restored the hop codegen for all targets with rL344141. Good for Jaguar, but 
   probably bad for other CPUs.
4. This patch allows us to distinguish when we want to produce hops, so everyone can be 
   happy. I'm not sure if we have the best predicate here, but the intent is to undo the 
   extra hop-iness that was enabled by r344141.

Differential Revision: https://reviews.llvm.org/D53095

llvm-svn: 344361
2018-10-12 16:41:02 +00:00
Nick Desaulniers 47bab69a2e [MC][ELF] fix newly added test
Summary:
Reland of
- r344197 "[MC][ELF] compute entity size for explicit sections"
- r344206 "[MC][ELF] Fix section_mergeable_size.ll"
after being reverted in r344278 due to build breakages from not
specifying a target triple.

Move test from test/CodeGen/Generic/ to test/MC/ELF/.
Add explicit target triple so we don't try to run
this test on non ELF targets.

Reported: https://reviews.llvm.org/D53056#1261707

Reviewers: fhahn, rnk, espindola, NoQ

Reviewed By: fhahn, rnk

Subscribers: NoQ, MaskRay, rengolin, emaste, arichardson, llvm-commits, pirama, srhines

Differential Revision: https://reviews.llvm.org/D53146

llvm-svn: 344360
2018-10-12 16:35:44 +00:00
Zachary Turner 9f169afab2 Make YAML quote forward slashes.
If you have the string /usr/bin, prior to this patch it would not
be quoted by our YAML serializer.  But a string like C:\src would
be, due to the presence of a backslash.  This makes the quoting
rules of basically every single file path different depending on
the path syntax (posix vs. Windows).

While technically not required by the YAML specification to quote
forward slashes, when the behavior of paths is inconsistent it
makes it difficult to portably write FileCheck lines that will
work with either kind of path.

Differential Revision: https://reviews.llvm.org/D53169

llvm-svn: 344359
2018-10-12 16:31:20 +00:00
Zachary Turner 9c544199cf Revert "Make YAML quote forward slashes."
This reverts commit b86c16ad8c97dadc1f529da72a5bb74e9eaed344.

This is being reverted because I forgot to write a useful
commit message, so I'm going to resubmit it with an actual
commit message.

llvm-svn: 344358
2018-10-12 16:31:08 +00:00
Zachary Turner ec234052a6 Make YAML quote forward slashes.
llvm-svn: 344357
2018-10-12 16:24:09 +00:00
Sanjay Patel f5b1892348 [AArch64][x86] add tests for trunc disguised as vector ops (PR39016); NFC
These correspond to the IR transform from:
D52439

llvm-svn: 344353
2018-10-12 15:22:14 +00:00
Simon Pilgrim b8339c0167 [SelectionDAG] Move VectorLegalizer::ExpandCTLZ codegen into SelectionDAGLegalize
Generalize SelectionDAGLegalize's CTLZ expansion to handle vectors - lets VectorLegalizer::ExpandCTLZ to just pass the expansion on instead of repeating the same codegen.

llvm-svn: 344349
2018-10-12 14:45:57 +00:00
Simon Pilgrim 78b5a3c3ef [X86][SSE] LowerVectorCTPOP - pull out repeated byte sum stage.
Pull out repeated byte sum stage for popcount of vector elements > 8bits.

This allows us to simplify the LUT/BITMATH popcnt code to always assume vXi8 vectors, and also improves avx512bitalg codegen which only has access to vpopcntb/vpopcntw.

llvm-svn: 344348
2018-10-12 14:18:47 +00:00
Hiroshi Inoue 9552dd187a [PowerPC] avoid masking already-zero bits in BitPermutationSelector
The current BitPermutationSelector generates a code to build a value by tracking two types of bits: ConstZero and Variable.
ConstZero means a bit we need to mask off and Variable is a bit we copy from an input value.

This patch add third type of bits VariableKnownToBeZero caused by AssertZext node or zero-extending load node.
VariableKnownToBeZero means a bit comes from an input value, but it is known to be already zero. So we do not need to mask them.
VariableKnownToBeZero enhances flexibility to group bits, since we can avoid redundant masking for these bits.

This patch also renames "HasZero" to "NeedMask" since now we may skip masking even when we have zeros (of type VariableKnownToBeZero).

Differential Revision: https://reviews.llvm.org/D48025

llvm-svn: 344347
2018-10-12 14:02:20 +00:00
Simon Pilgrim bb37e81b65 [X86][AVX] Regenerate tzcnt tests
llvm-svn: 344341
2018-10-12 13:24:51 +00:00
Simon Pilgrim 29279f29c8 [X86][SSE] Add extract_subvector(PSHUFB) -> PSHUFB(extract_subvector()) combine
Fixes PR32160 by reducing the size of PSHUFB if we only use one of the lanes.

This approach can probably be generalized to handle any target shuffle (and any subvector index) but we have no test coverage at the moment.

llvm-svn: 344336
2018-10-12 12:10:34 +00:00
Simon Pilgrim bc760b2dc5 [X86][AVX] Add examples of shuffles that can be reduced to a cross-lane shuffle followed by a in-lane permute
Suitable for lowering by D53148

llvm-svn: 344332
2018-10-12 10:26:59 +00:00
Simon Pilgrim c844bc84dd [X86] Ignore float/double non-temporal loads (PR39256)
Scalar non-temporal loads were asserting instead of just being ignored.

Reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10895

llvm-svn: 344331
2018-10-12 10:20:16 +00:00
Stefan Maksimovic 285c0f4fdc [mips] Mark fmaxl as a long double emulation routine
Failure was discovered upon running
projects/compiler-rt/test/builtins/Unit/divtc3_test.c
in a stage2 compiler build.

When compiling projects/compiler-rt/lib/builtins/divtc3.c,
a call to fmaxl within the divtc3 implementation had its
return values read from registers $2 and $3 instead of $f0 and $f2.
Include fmaxl in the list of long double emulation routines
to have its return value correctly interpreted as f128.

Almost exact issue here: https://reviews.llvm.org/D17760

Differential Revision: https://reviews.llvm.org/D52649

llvm-svn: 344326
2018-10-12 08:18:38 +00:00
Sanjay Patel 56b6660d2e [DAGCombiner] rearrange extract_element+bitcast fold; NFC
I want to add another pattern here that includes scalar_to_vector,
so this makes that patch smaller. I was hoping to remove the
hasOneUse() check because it shouldn't be necessary for common
codegen, but an AMDGPU test has a comment suggesting that the
extra check makes things better on one of those targets.

llvm-svn: 344320
2018-10-11 23:56:56 +00:00
Tom Stellard a894043910 Revert "AMDGPU/GlobalISel: Implement select for G_INSERT"
This reverts commit r344310.

The test case was failing on some bots.

llvm-svn: 344317
2018-10-11 23:36:46 +00:00
Tom Stellard 4733be6e7b AMDGPU/GlobalISel: Implement select for G_INSERT
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53116

llvm-svn: 344310
2018-10-11 22:49:54 +00:00
Sanjay Patel 99c02f6301 [x86] add tests for extract_element; NFC
The transform for this pattern has an unnecessary one-use limitation.

llvm-svn: 344303
2018-10-11 22:04:36 +00:00
Sanjay Patel 7ad8e32f03 [x86] regenerate CHECKs; NFC
llvm-svn: 344301
2018-10-11 21:44:38 +00:00
Craig Topper 35d513c7e4 [X86] Type legalize v2f32 loads by using an f64 load and a scalar_to_vector.
On 64-bit targets the generic legalize will use an i64 load and a scalar_to_vector for us. But on 32-bit targets i64 isn't legal and the generic legalizer will end up emitting two 32-bit loads. We have DAG combines that try to put those two loads back together with pretty good success.

This patch instead uses f64 to avoid the splitting entirely. I've made it do the same for 64-bit mode for consistency and to keep the load in the fp domain.

There are a few things in here that look like regressions in 32-bit mode, but I believe they bring us closer to the 64-bit mode codegen. And that the 64-bit mode code could be better. I think those issues should be looked at separately.

Differential Revision: https://reviews.llvm.org/D52528

llvm-svn: 344291
2018-10-11 20:36:06 +00:00
Sumanth Gundapaneni a4a9155e4f [Hexagon] Restrict compound instructions with constant value.
Having a constant value operand in the compound instruction
is not always profitable. This patch improves coremark by ~4% on
Hexagon.

Differential Revision: https://reviews.llvm.org/D53152

llvm-svn: 344284
2018-10-11 19:48:15 +00:00
Artem Dergachev 2ce1d6faf8 Revert r344197 "[MC][ELF] compute entity size for explicit sections"
Revert r344206 "[MC][ELF] Fix section_mergeable_size.ll"

They were causing failures on too many important buildbots for too long.
Please revert eagerly if your fix takes more than a couple of hours to land!

llvm-svn: 344278
2018-10-11 18:43:08 +00:00
Nirav Dave f1f2a2a31a [DAG] Fix Big Endian in Load-Store forwarding
Summary:
Correct offset calculation in load-store forwarding for big-endian
targets.

Reviewers: rnk, RKSimon, waltl

Subscribers: sdardis, nemanjai, hiraditya, jrtc27, atanasyan, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D53147

llvm-svn: 344272
2018-10-11 18:28:59 +00:00
Craig Topper fb2ac8969e [X86] Restore X86ISelDAGToDAG::matchBEXTRFromAnd. Teach address matching to create a BEXTR pattern from a (shl (and X, mask >> C1) if C1 can be folded into addressing mode.
This is an alternative to D53080 since I think using a BEXTR for a shifted mask is definitely an improvement when the shl can be absorbed into addressing mode. The other cases I'm less sure about.

We already have several tricks for handling an and of a shift in address matching. This adds a new case for BEXTR.

I've moved the BEXTR matching code back to X86ISelDAGToDAG to allow it to match. I suppose alternatively we could directly emit a X86ISD::BEXTR node that isel could pattern match. But I'm trying to view BEXTR matching as an isel concern so DAG combine can see 'and' and 'shift' operations that are well understood. We did lose a couple cases from tbm_patterns.ll, but I think there are ways to recover that.

I've also put back the manual load folding code in matchBEXTRFromAnd that I removed a few months ago in r324939. This gives us some more freedom to make decisions based on the ability to fold a load. I haven't done anything with that yet.

Differential Revision: https://reviews.llvm.org/D53126

llvm-svn: 344270
2018-10-11 18:06:07 +00:00
Alex Bradbury 686ef92141 [RISCV] Re-generate test/CodeGen/RISCV/vararg.ll after r344142
The improved load-store forwarding committed in r344142 broke this test.

llvm-svn: 344238
2018-10-11 11:11:58 +00:00
Roman Lebedev 4225f4adff [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ~(-1 << nbits) pattern
Summary:
As discussed in D48491, we can't really do this in the TableGen,
since we need to produce *two* instructions. This only implements
one single pattern. The other 3 patterns will be in follow-ups.

I'm not sure yet if we want to also fuse shift into here
(i.e `(x >> start) & ...`)

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D52304

llvm-svn: 344224
2018-10-11 07:51:13 +00:00
Fangrui Song f953ea5fb6 [MC][ELF] Fix section_mergeable_size.ll
Some targets use %progbits instead of @progbits.

Updating that check with a {{[@%]}}progbits regex to make those bots happy.

llvm-svn: 344206
2018-10-11 00:08:59 +00:00
Thomas Lively 2ebacb107b [WebAssembly] Saturating float to int intrinsics
Summary:
Although the saturating float to int instructions are already
emitted from normal IR, the fpto{s,u}i instructions produce poison
values if the argument cannot fit in the result type. These intrinsics
are therefore necessary to get guaranteed defined saturating behavior.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D53004

llvm-svn: 344204
2018-10-11 00:01:25 +00:00
Nick Desaulniers 335315697a [MC][ELF] compute entity size for explicit sections
Summary:
Global variables might declare themselves to be in explicit sections.
Calculate the entity size always to prevent assembler warnings
"entity size for SHF_MERGE not specified" when sections are to be
marked merge-able.

Fixes PR31828.

Reviewers: rnk, echristo

Reviewed By: rnk

Subscribers: llvm-commits, pirama, srhines

Differential Revision: https://reviews.llvm.org/D53056

llvm-svn: 344197
2018-10-10 22:52:32 +00:00
Roman Lebedev 62cd430602 [NFC][X86][AArch64] extract-bits.ll: add tests with constants+storing results.
As noted in https://reviews.llvm.org/D53080#inline-467678,
this *may* get pessimized by that diff.

llvm-svn: 344182
2018-10-10 20:50:52 +00:00
Roman Lebedev 33d84c6dac [X86] Move X86DAGToDAGISel::matchBEXTRFromAnd() into X86ISelLowering
Summary:
As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=38938 | PR38938 ]],
we fail to emit `BEXTR` if the mask is shifted.
We can't deal with that in `X86DAGToDAGISel` `before the address mode for the inc is selected`,
and we can't really do it in the normal DAGCombine, because we don't have generic `ISD::BitFieldExtract` node,
and if we simply turn the shifted mask into a normal mask + shift-left, it will be folded back.
So it would seem X86ISelLowering is the place to handle this.

This patch only moves the matchBEXTRFromAnd()
from X86DAGToDAGISel to X86ISelLowering.
It does not add support for the 'shifted mask' pattern.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52426

llvm-svn: 344179
2018-10-10 20:40:12 +00:00
Volkan Keles da5578c5d0 [GlobalISel] Fix the artifact combiner to fold G_IMPLICIT_DEF properly
Summary:
GlobalISel generates incorrect code because the legalizer artifact
combiner assumes `G_[SZ]EXT (G_IMPLICIT_DEF)` is equivalent to
`G_IMPLICIT_DEF `.

Replace `G_[SZ]EXT (G_IMPLICIT_DEF)` with 0 because the top bits
will be 0 for G_ZEXT and 0/1 for the G_SEXT.

Reviewers: aditya_nandakumar, dsanders, aemerson, javed.absar

Reviewed By: aditya_nandakumar

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D52996

llvm-svn: 344163
2018-10-10 18:01:48 +00:00
Nirav Dave 07acc992dc [DAGCombine] Improve Load-Store Forwarding
Summary:
Extend analysis forwarding loads from preceeding stores to work with
extended loads and truncated stores to the same address so long as the
load is fully subsumed by the store.

Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are
deleted as they've no longer seem to be relevant.

Reviewers: RKSimon, rnk, kparzysz, javed.absar

Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D49200

llvm-svn: 344142
2018-10-10 14:15:52 +00:00
Sanjay Patel 6cca8af227 [x86] allow single source horizontal op matching (PR39195)
This is intended to restore horizontal codegen to what it looked like before IR demanded elements improved in:
rL343727

As noted in PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195
...horizontal ops can be worse for performance than a shuffle+regular binop, so I've added a TODO. Ideally, we'd 
solve that in a machine instruction pass, but a quicker solution will be adding a 'HasFastHorizontalOp' feature
bit to deal with it here in the DAG.

Differential Revision: https://reviews.llvm.org/D52997

llvm-svn: 344141
2018-10-10 13:39:59 +00:00
Carlos Alberto Enciso c0952c8a08 Revert "[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG."
This reverts commit r344120.

It was causing buildbot failures.

llvm-svn: 344135
2018-10-10 12:09:34 +00:00
Carlos Alberto Enciso e7a347e5f8 [DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG.
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines. 

Differential Revision: https://reviews.llvm.org/D52887

llvm-svn: 344120
2018-10-10 08:29:55 +00:00
Craig Topper 02c62aa58a [X86] Remove FeatureRTM from Skylake processor list
Summary:
There are a LOT of Skylakes and later without TSX-NI. Examples:
- SKL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3-20-GHz-
- KBL: https://ark.intel.com/products/97540/Intel-Core-i7-7560U-Processor-4M-Cache-up-to-3-80-GHz-
- KBL-R: https://ark.intel.com/products/149091/Intel-Core-i7-8565U-Processor-8M-Cache-up-to-4-60-GHz-
- CNL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3_20-GHz

This feature seems to be present only on high-end desktop and server
chips (I can't find any SKX without). This commit leaves it disabled
for all processors, but can be re-enabled for specific builds with
-mrtm.

Patch by Thiago Macieira

Reviewers: erichkeane, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53041

llvm-svn: 344116
2018-10-10 07:43:35 +00:00
Nemanja Ivanovic ec527dacca [PowerPC][NFC] Add a test case for extract and store patterns
An upcoming patch will change the codegen for these patterns. This test case is
added now so that the patch can show the differences in codegen.

llvm-svn: 344112
2018-10-10 04:18:35 +00:00
Dylan McKay 30ef1d60f9 [AVR] Fix the 'call.ll' CodeGen test
Commit r343851 changed the format of the generated instructions.

An unnecessary load has been removed. Previously, a value would be moved
from r24 into a temporary register just to be copied into r30 before the
indirect call. Now, codegen immediately loads r24 into r30, saving a
MOVW instruction.

llvm-svn: 344111
2018-10-10 03:21:42 +00:00
QingShan Zhang bc1586352e [PowerPC] Fix the assert of ISD::SIGN_EXTEND_INREG when type is v2i16 and v2i8
For ISD::SIGN_EXTEND_INREG operation of v2i16 and v2i8 types will cause assert because they are registered as custom operation. 
So that the type legalization phase will enter the custom hook, which do not handle ISD::SIGN_EXTEND_INREG operation and fall throw into unreachable assert.

Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52449

llvm-svn: 344109
2018-10-10 02:33:48 +00:00
Thomas Lively 108e98ec32 [WebAssembly] Fix fneg lowering
Summary:
Subtraction from zero and floating point negation do not have the same
semantics, so fix lowering.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52948

llvm-svn: 344107
2018-10-10 01:09:09 +00:00
Thomas Lively 409f5840a7 [WebAssembly] Handle V128 register class in explicit locals pass
Summary:
Also add tests to catch crashes in passes that are not normally run in
tests.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52959

llvm-svn: 344094
2018-10-09 23:33:16 +00:00
Nemanja Ivanovic 72d4866e57 [DAGCombiner] Expand combining of FP logical ops to sign-setting FP ops
We already do the following combines:
(bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X
(bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X

When the target has "bit preserving fp logic". This patch just extends it
to also combine:
(bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X)

As some targets have fnabs and even those that don't can efficiently lower
both the fabs and the fneg.

Differential revision: https://reviews.llvm.org/D44548

llvm-svn: 344093
2018-10-09 23:20:11 +00:00
Nemanja Ivanovic c62dfe512e [PowerPC][NFC] Commit nabs test case in preparation for committing D44548
This just adds the test case so that the different code gen is clearly visible
when the DAG Combine lands.

llvm-svn: 344091
2018-10-09 23:02:53 +00:00
Rong Xu 3d2efdfdea Recommit r343993: [X86] condition branches folding for three-way conditional codes
Fix the memory issue exposed by sanitizer.

llvm-svn: 344085
2018-10-09 22:03:40 +00:00
Nemanja Ivanovic 87873d04c3 [PowerPC] Implement hasBitPreservingFPLogic for types that can be supported
This is the PPC-specific non-controversial part of
https://reviews.llvm.org/D44548 that simply enables this combine for PPC
since PPC has these instructions.
This commit will allow the target-independent portion to be truly target
independent.

llvm-svn: 344077
2018-10-09 20:35:15 +00:00
Craig Topper f6d8400869 [X86] When lowering unsigned v2i64 setcc without SSE42, flip the sign bits in the v2i64 type then bitcast to v4i32.
This may give slightly better opportunities for DAG combine to simplify with the operations before the setcc. It also matches the type the xors will eventually be promoted to anyway so it saves a legalization step.

Almost all of the test changes are because our constant pool entry is now v2i64 instead of v4i32 on 64-bit targets. On 32-bit targets getConstant should be emitting a v4i32 build_vector and a v4i32->v2i64 bitcast.

There are a couple test cases where it appears we now combine a bitwise not with one of these xors which caused a new constant vector to be generated. This prevented a constant pool entry from being shared. But if that's an issue we're concerned about, it seems we need to address it another way that just relying a bitcast to hide it.

This came about from experiments I've been trying with pushing the promotion of and/or/xor to vXi64 later than LegalizeVectorOps where it is today. We run LegalizeVectorOps in a bottom up order. So the and/or/xor are promoted before their users are legalized. The bitcasts added for the promotion act as a barrier to computeKnownBits if we try to use it during vector legalization of a later operation. So by moving the promotion out we can hopefully get better results from computeKnownBits/computeNumSignBits like in LowerTruncate on AVX512. I've also looked at running LegalizeVectorOps in a top down order like LegalizeDAG, but thats showing some other issues.

llvm-svn: 344071
2018-10-09 19:05:50 +00:00
Craig Topper 4ea569622a [X86] Autogenerate complete checks. NFC
llvm-svn: 344060
2018-10-09 17:52:07 +00:00
Sanjay Patel a875030ab3 [AArch64][x86] add tests for bitcasted fnabs; NFC
Alternate target coverage for  D44548.

llvm-svn: 344059
2018-10-09 17:20:26 +00:00
Sanjay Patel f5fac1826a [x86] use demanded bits to simplify masked store codegen
As noted in D52747, if we prefer IR to use trunc for bool vectors rather 
than and+icmp, we can expose codegen shortcomings as seen here with masked store.

Replace a hard-coded PCMPGT simplification with the more general demanded bits call
to improve things.

Differential Revision: https://reviews.llvm.org/D52964

llvm-svn: 344048
2018-10-09 14:04:14 +00:00
Simon Pilgrim 23f880317a [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG and CONCAT_VECTORS support to SimplifyDemandedBits
Fix for AVX1 masked load/store regression on D52964

llvm-svn: 344043
2018-10-09 13:13:35 +00:00
Nemanja Ivanovic c8fe772ac4 Fix buildbot failures with the newly added test case (triple was missing).
llvm-svn: 344037
2018-10-09 11:17:47 +00:00
Nemanja Ivanovic 4c0b110e3e [PowerPC] Remove self-copies in pre-emit peephole
There are occasionally instances where AADB rewrites registers in such a way
that a reg-reg copy becomes a self-copy. Such an instruction is obviously
redundant and can be removed. This patch does precisely that.

Note that this will not remove various nop's that we insert (which are
themselves just self-copies). The reason those are left alone is that all of
them have their own opcodes (that just encode to a self-copy).

What prompted this patch is the fact that these self-copies sometimes end up
using registers that make the instruction a priority-setting nop, thereby
having a significant effect on performance.

Differential revision: https://reviews.llvm.org/D52432

llvm-svn: 344036
2018-10-09 10:54:04 +00:00
Simon Pilgrim 720db8ed7b [X86][AVX1] Enable *_EXTEND_VECTOR_INREG lowering of 256-bit vectors
As discussed on D52964, this adds 256-bit *_EXTEND_VECTOR_INREG lowering support for AVX1 targets to help improve SimplifyDemandedBits handling.

Differential Revision: https://reviews.llvm.org/D52980

llvm-svn: 344019
2018-10-09 07:42:01 +00:00
Matthias Braun 1c96c9687f ExpandPostRAPseudos: Fix alldefsAreDead() not removing operands
One case left around nonsensical operands for the KILL instruction
which the machine verifier checks for nowadays. While this should not
hurt in release builds we should fix the machine verifier errors anyway.

llvm-svn: 344008
2018-10-09 00:07:34 +00:00
Petar Jovanovic aa97890d66 [MIPS GlobalISel] Legalize i64 add
Custom legalize s64 G_ADD for MIPS32.

Patch by Petar Avramovic.

Differential Revision: https://reviews.llvm.org/D52652

llvm-svn: 344007
2018-10-08 23:59:37 +00:00
Rong Xu 47fd015163 [X86] Revert r343993 condition branches folding for three-way conditional codes
Some buildbots failed.

llvm-svn: 343998
2018-10-08 22:08:43 +00:00
Sanjay Patel 6205ba0e7f [x86] add tests for phaddd/phaddw; NFC
More tests related to PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195

If we limit the horizontal codegen, it may require different
constraints for FP and integer.

llvm-svn: 343994
2018-10-08 19:48:18 +00:00
Rong Xu 67b1b328f7 [X86] condition branches folding for three-way conditional codes
This patch implements a pass that optimizes condition branches on x86 by
taking advantage of the three-way conditional code generated by compare
instructions.

Currently, it tries to hoisting EQ and NE conditional branch to a dominant
conditional branch condition where the same EQ/NE conditional code is
computed. An example:
bb_0:
  cmp %0, 19
  jg bb_1
  jmp bb_2
bb_1:
  cmp %0, 40
  jg bb_3
  jmp bb_4
bb_4:
  cmp %0, 20
  je bb_5
  jmp bb_6
Here we could combine the two compares in bb_0 and bb_4 and have the
following code:

bb_0:
  cmp %0, 20
  jg bb_1
  jl bb_2
  jmp bb_5
bb_1:
  cmp %0, 40
  jg bb_3
  jmp bb_6

For the case of %0 == 20 (bb_5), we eliminate two jumps, and the control height
for bb_6 is also reduced. bb_4 is gone after the optimization.

This optimization is motivated by the branch pattern generated by the switch
lowering: we always have pivot-1 compare for the inner nodes and we do a pivot
compare again the leaf (like above pattern).

This pass currently is enabled on Intel's Sandybridge and later arches. Some
reviewers pointed out that on some arches (like AMD Jaguar), this pass may
increase branch density to the point where it hurts the performance of the
branch predictor.

Differential Revision: https://reviews.llvm.org/D46662

llvm-svn: 343993
2018-10-08 18:52:39 +00:00
Scott Linder 823549a6ec [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions
Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.

Recommits r341413 with changes to update the MachineDominatorTree when present.

Differential Revision: https://reviews.llvm.org/D51742

llvm-svn: 343992
2018-10-08 18:47:01 +00:00
Simon Pilgrim 6fc8d05565 [X86][AVX2] Enable ZERO_EXTEND_VECTOR_INREG lowering of 256-bit vectors
Some necessary yak shaving before lowering *_EXTEND_VECTOR_INREG 256-bit vectors on AVX1 targets as suggested by D52964.

Differential Revision: https://reviews.llvm.org/D52970

llvm-svn: 343991
2018-10-08 18:40:50 +00:00
Tom Stellard 14d8807d9a AMDGPU/GlobalISel: Select amdgcn.cvt.pkrtz to 64-bit instructions
Summary: The 32-bit variants do not exist on VI+.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52958

llvm-svn: 343985
2018-10-08 17:49:29 +00:00