Commit Graph

127407 Commits

Author SHA1 Message Date
Matt Arsenault 1237aa2996 AMDGPU/GlobalISel: Fix selection of 16-bit shifts
llvm-svn: 373945
2019-10-07 19:10:44 +00:00
Matt Arsenault 09ec6918bc AMDGPU/GlobalISel: Select VALU G_AMDGPU_FFBH_U32
llvm-svn: 373944
2019-10-07 19:10:43 +00:00
Matt Arsenault 0b2ea91d6d AMDGPU/GlobalISel: Use S_MOV_B64 for inline constants
This hides some defects in SIFoldOperands when the immediates are
split.

llvm-svn: 373943
2019-10-07 19:07:19 +00:00
Matt Arsenault 578fa2819f AMDGPU/GlobalISel: Widen 16-bit G_MERGE_VALUEs sources
Continue making a mess of merge/unmerge legality.

llvm-svn: 373942
2019-10-07 19:05:58 +00:00
Matt Arsenault b4cbf9862c AMDGPU/GlobalISel: Select more G_INSERT cases
At minimum handle the s64 insert type, which are emitted in real cases
during legalization.

We really need TableGen to emit something to emit something like the
inverse of composeSubRegIndices do determine the subreg index to use.

llvm-svn: 373938
2019-10-07 18:43:31 +00:00
Matt Arsenault 27269054d2 GlobalISel: Add target pre-isel instructions
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.

Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.

llvm-svn: 373937
2019-10-07 18:43:29 +00:00
Jordan Rose fdaa742174 Second attempt to add iterator_range::empty()
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.

https://reviews.llvm.org/D68439

llvm-svn: 373935
2019-10-07 18:14:24 +00:00
Erich Keane 8a410bcef0 Fix Calling Convention through aliases
r369697 changed the behavior of stripPointerCasts to no longer include
aliases.  However, the code in CGDeclCXX.cpp's createAtExitStub counted
on the looking through aliases to properly set the calling convention of
a call.

The result of the change was that the calling convention mismatch of the
call would be replaced with a llvm.trap, causing a runtime crash.

Differential Revision: https://reviews.llvm.org/D68584

llvm-svn: 373929
2019-10-07 17:28:03 +00:00
Florian Hahn 90b7dc9e71 [Remarks] Pass StringBlockValue as StringRef.
After changing the remark serialization, we now pass StringRefs to the
serializer. We should use StringRef for StringBlockVal, to avoid
creating temporary objects, which then cause StringBlockVal.Value to
point to invalid memory.

Reviewers: thegameg, anemet

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D68571

llvm-svn: 373923
2019-10-07 17:05:09 +00:00
Wei Mi 283df8cf74 Fix build errors caused by rL373914.
llvm-svn: 373919
2019-10-07 16:45:47 +00:00
Wenlei He b3342e180e [llvm-profdata] Minor format fix
Summary: Minor format fix for output of "llvm-profdata -show"

Reviewers: wmi

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68440

llvm-svn: 373917
2019-10-07 16:30:31 +00:00
Simon Pilgrim 9c2e123043 [X86][SSE] getTargetShuffleInputs - move VT.isSimple/isVector checks inside. NFCI.
Stop all the callers from having to check the value type before calling getTargetShuffleInputs.

llvm-svn: 373915
2019-10-07 16:15:20 +00:00
Wei Mi b523790ae1 [SampleFDO] Add compression support for any section in ExtBinary profile format
Previously ExtBinary profile format only supports compression using zlib for
profile symbol list. In this patch, we extend the compression support to any
section. User can select some or all of the sections to compress. In an
experiment, for a 45M profile in ExtBinary format, compressing name table
reduced its size to 24M, and compressing all the sections reduced its size
to 11M.

Differential Revision: https://reviews.llvm.org/D68253

llvm-svn: 373914
2019-10-07 16:12:37 +00:00
Simon Atanasyan 55ac745828 [Mips] Always save RA when disabling frame pointer elimination
This ensures that frame-based unwinding will continue to work when
calling a noreturn function; there is not much use having the caller's
frame pointer saved if you don't also have the caller's program counter.

Patch by James Clarke.

Differential Revision: https://reviews.llvm.org/D68542

llvm-svn: 373907
2019-10-07 14:01:37 +00:00
Simon Atanasyan 19ede2f53b [Mips] Fix evaluating J-format branch targets
J/JAL/JALX/JALS are absolute branches, but stay within the current
256 MB-aligned region, so we must include the high bits of the
instruction address when calculating the branch target.

Patch by James Clarke.

Differential Revision: https://reviews.llvm.org/D68548

llvm-svn: 373906
2019-10-07 14:01:22 +00:00
whitequark b63db94fa5 [LLVM-C] Add bindings to create macro debug info
Summary: The C API doesn't have the bindings to create macro debug information.

Reviewers: whitequark, CodaFi, deadalnix

Reviewed By: whitequark

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58334

llvm-svn: 373903
2019-10-07 13:57:13 +00:00
Mirko Brkusanin b5fbdf1f5e Test commit
Fix comment.

llvm-svn: 373901
2019-10-07 13:23:12 +00:00
Kevin P. Neal 1c3d19c82d [FPEnv] Add constrained intrinsics for lrint and lround
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.

Reviewed by:	andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by:	craig.topper
Differential Revision:	https://reviews.llvm.org/D64746

llvm-svn: 373900
2019-10-07 13:20:00 +00:00
Nico Weber 0fedc26a0d Revert r373888 "[IA] Recognize hexadecimal escape sequences"
It broke MC/AsmParser/directive_ascii.s on all bots:

    Assertion failed: (Index < Length && "Invalid index!"), function operator[],
        file ../../llvm/include/llvm/ADT/StringRef.h, line 243.

llvm-svn: 373898
2019-10-07 11:46:26 +00:00
Bill Wendling 6942327a8f [IA] Recognize hexadecimal escape sequences
Summary:
Implement support for hexadecimal escape sequences to match how GNU 'as'
handles them. I.e., read all hexadecimal characters and truncate to the
lower 16 bits.

Reviewers: nickdesaulniers

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68483

llvm-svn: 373888
2019-10-07 09:54:53 +00:00
Martin Storsjo dfc1aee25b Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine"
This reverts SVN r373833, as it caused a failed assert "Non-zero loop
cost expected" on building numerous projects, see PR43582 for details
and reproduction samples.

llvm-svn: 373882
2019-10-07 08:21:37 +00:00
Craig Topper 2c4f078877 [X86] Support LEA64_32r in processInstrForSlow3OpLEA and use INC/DEC when possible.
Move the erasing and iterator updating inside to match the
other slow LEA function.

I've adapted code from optTwoAddrLEA and basically rebuilt the
implementation here. We do lose the kill flags now just like
optTwoAddrLEA. This runs late enough in the pipeline that
shouldn't really be a problem.

llvm-svn: 373877
2019-10-07 06:27:55 +00:00
Simon Pilgrim b4ba3cbda0 [X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217)
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.

This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.

Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.

Fixes PR43217

Differential Revision: https://reviews.llvm.org/D68544

llvm-svn: 373871
2019-10-06 21:11:45 +00:00
Simon Pilgrim d84cd7caa8 Fix signed/unsigned warning. NFCI
llvm-svn: 373870
2019-10-06 19:54:20 +00:00
Amy Kwan e36415cacf [NFC][PowerPC] Reorganize CRNotPat multiclass patterns in PPCInstrInfo.td
This is patch aims to group together the `CRNotPat` multi class instantiations
within the `PPCInstrInfo.td` file.

Integer instantiations of the multi class are grouped together into a section,
and the floating point patterns are separated into its own section.

Differential Revision: https://reviews.llvm.org/D67975

llvm-svn: 373869
2019-10-06 19:45:53 +00:00
Simon Pilgrim 739c9f0b79 [X86][SSE] Remove resolveTargetShuffleInputs and use getTargetShuffleInputs directly.
Move the resolveTargetShuffleInputsAndMask call to after the shuffle mask combine before the undef/zero constant fold instead.

llvm-svn: 373868
2019-10-06 19:07:00 +00:00
Simon Pilgrim 42010dc810 [X86][SSE] Don't merge known undef/zero elements into target shuffle masks.
Replaces setTargetShuffleZeroElements with getTargetShuffleAndZeroables which reports the Zeroable elements but doesn't merge them into the decoded target shuffle mask (the merging has been moved up into getTargetShuffleInputs until we can get rid of it entirely).

This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs but instead we cache them in a parallel Zeroable mask.

llvm-svn: 373867
2019-10-06 19:06:45 +00:00
Craig Topper 570ae49d03 [X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8 truncate when v8i64 isn't legal
Summary:
The default legalization for v16i64->v16i8 tries to create a multiple stage truncate concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops. So it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2 uop truncates. The unpcks are all single uop instructions.

I tried to handle this by just custom splitting the v16i64->v16i8 shuffle. And hoped that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split to produce the VTRUNCs directly.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68428

llvm-svn: 373864
2019-10-06 18:43:08 +00:00
Craig Topper 842dde6be4 [LegalizeTypes][X86] When splitting a vselect for type legalization, don't split a setcc condition if the setcc input is legal and vXi1 conditions are supported
Summary: The VSELECT splitting code tries to split a setcc input as well. But on avx512 where mask registers are well supported it should be better to just split the mask and use a single compare.

Reviewers: RKSimon, spatel, efriedma

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68359

llvm-svn: 373863
2019-10-06 18:43:03 +00:00
Whitney Tsang dcb75bf843 [LOOPGUARD] Remove asserts in getLoopGuardBranch
Summary: The assertion in getLoopGuardBranch can be a 'return nullptr'
under if condition.
Authored By: DTharun
Reviewer: Whitney, fhahn
Reviewed By: Whitney, fhahn
Subscribers: fhahn, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D66084

llvm-svn: 373857
2019-10-06 16:39:43 +00:00
Simon Pilgrim 5c876303ec [X86][SSE] resolveTargetShuffleInputs - call getTargetShuffleInputs instead of using setTargetShuffleZeroElements directly. NFCI.
llvm-svn: 373855
2019-10-06 15:42:25 +00:00
Sanjay Patel f643fabb52 Revert [DAGCombine] Match more patterns for half word bswap
This reverts r373850 (git commit 25ba49824d)

This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680

llvm-svn: 373853
2019-10-06 15:27:34 +00:00
Xiangling Liao ee68f1ec67 [NFC] Replace 'isDarwin' with 'IsDarwin'
Summary: Replace 'isDarwin' with 'IsDarwin' based on LLVM naming convention.

Differential Revision: https://reviews.llvm.org/D68336

llvm-svn: 373852
2019-10-06 14:44:22 +00:00
Sanjay Patel aab8b3ab9c [InstCombine] fold fneg disguised as select+fmul (PR43497)
Extends rL373230 and solves the motivating bug (although in a narrow way):
https://bugs.llvm.org/show_bug.cgi?id=43497

llvm-svn: 373851
2019-10-06 14:15:48 +00:00
Amaury Sechet 25ba49824d [DAGCombine] Match more patterns for half word bswap
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68250

llvm-svn: 373850
2019-10-06 14:14:55 +00:00
Simon Pilgrim 2dee7e5561 [X86][AVX] combineExtractSubvector - merge duplicate variables. NFCI.
llvm-svn: 373849
2019-10-06 13:25:10 +00:00
Sanjay Patel c38881a6b7 [InstCombine] don't assume 'inbounds' for bitcast pointer to GEP transform (PR43501)
https://bugs.llvm.org/show_bug.cgi?id=43501
We can't declare a GEP 'inbounds' in general. But we may salvage that information if
we have known dereferenceable bytes on the source pointer.

Differential Revision: https://reviews.llvm.org/D68244

llvm-svn: 373847
2019-10-06 13:08:08 +00:00
Simon Pilgrim 032dd9b086 [X86][SSE] matchVectorShuffleAsBlend - use Zeroable element mask directly.
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.

This allows us to remove createTargetShuffleMask.

This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.

llvm-svn: 373846
2019-10-06 12:38:38 +00:00
David Zarzycki 7653ff398d [X86] Enable AVX512BW for memcmp()
llvm-svn: 373845
2019-10-06 10:25:52 +00:00
Matt Arsenault e59296a051 AMDGPU/GlobalISel: Fall back on weird G_EXTRACT offsets
llvm-svn: 373842
2019-10-06 01:41:22 +00:00
Matt Arsenault 786a3953ba AMDGPU/GlobalISel: RegBankSelect mul24 intrinsics
llvm-svn: 373841
2019-10-06 01:37:39 +00:00
Matt Arsenault c0ec72d4f8 AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics
llvm-svn: 373840
2019-10-06 01:37:38 +00:00
Matt Arsenault bcd6b1d209 AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS
llvm-svn: 373839
2019-10-06 01:37:37 +00:00
Matt Arsenault a5b9c75674 GlobalISel: Partially implement lower for G_EXTRACT
Turn into shift and truncate. Doesn't yet handle pointers.

llvm-svn: 373838
2019-10-06 01:37:35 +00:00
Matt Arsenault 69c65a8609 AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics
This wasn't updated for the immarg handling change.

llvm-svn: 373837
2019-10-06 01:37:34 +00:00
Craig Topper 2decdf42b9 [FastISel] Copy the inline assembly dialect to the INLINEASM instruction.
Fixes PR43575.

llvm-svn: 373836
2019-10-05 23:21:17 +00:00
Simon Pilgrim 8815be04ec [X86][AVX] Push sign extensions of comparison bool results through bitops (PR42025)
As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.

This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so its just the original SETCC ops that gets extended.

Differential Revision: https://reviews.llvm.org/D68226

llvm-svn: 373834
2019-10-05 20:49:34 +00:00
Sanjay Patel e2321bb448 [SLP] avoid reduction transform on patterns that the backend can load-combine
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).

Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 373833
2019-10-05 18:03:58 +00:00
Simon Pilgrim 9ecacb0d54 [X86] lowerShuffleAsLanePermuteAndRepeatedMask - variable renames. NFCI.
Rename some variables to match lowerShuffleAsRepeatedMaskAndLanePermute - prep work toward adding some equivalent sublane functionality.

llvm-svn: 373832
2019-10-05 16:08:30 +00:00
Simon Pilgrim f609c0a303 BranchFolding - IsBetterFallthrough - assert non-null pointers. NFCI.
Silences static analyzer null dereference warnings.

llvm-svn: 373823
2019-10-05 13:20:30 +00:00
Mehdi Amini 482f4d9aa9 Expose ProvidePositionalOption as a public API
The motivation is to reuse the key value parsing logic here to
parse instance specific pass options within the context of MLIR.
The primary functionality exposed is the "," splitting for
arrays and the logic for properly handling duplicate definitions
of a single flag.

Patch by: Parker Schuh <parkers@google.com>

Differential Revision: https://reviews.llvm.org/D68294

llvm-svn: 373815
2019-10-05 01:37:04 +00:00
Philip Reames d5a4dad206 Fix a *nasty* miscompile in experimental unordered atomic lowering
This is an omission in rL371441.  Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't be ordered w/respect to side effects which followed before the end of the block.

Included test case is how I spotted this.  We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with.  I'm sure it showed up a bunch of other ways as well.

Spotted via manual inspecting of assembly differences in a corpus w/and w/o the new experimental mode.  Finding this with testing would have been "unpleasant".  

llvm-svn: 373814
2019-10-05 00:32:10 +00:00
Ana Pazos ea835f5ce8 [RISCV] Added missing ImmLeaf predicates
simm9_lsb0 and simm12_lsb0 operand types were missing predicates.

llvm-svn: 373812
2019-10-04 23:42:07 +00:00
Aditya Kumar 6a2673605e Invalidate assumption cache before outlining.
Subscribers: llvm-commits

Tags: #llvm

Reviewers: compnerd, vsk, sebpop, fhahn, tejohnson

Reviewed by: vsk

Differential Revision: https://reviews.llvm.org/D68478

llvm-svn: 373807
2019-10-04 22:46:42 +00:00
Reid Kleckner 67cfa79c01 Revert [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks
This reverts r371177 (git commit f879c68755)

It caused PR43566 by removing empty, address-taken MachineBasicBlocks.
Such blocks may have references from blockaddress or other operands, and
need more consideration to be removed.

See the PR for a test case to use when relanding.

llvm-svn: 373805
2019-10-04 22:24:21 +00:00
Roman Lebedev fb5af8b9b9 [InstCombine] Fold 'icmp eq/ne (?trunc (lshr/ashr %x, bitwidth(x)-1)), 0' -> 'icmp sge/slt %x, 0'
We do indeed already get it right in some cases, but only transitively,
with one-use restrictions. Since we only need to produce a single
comparison, it makes sense to match the pattern directly:
  https://rise4fun.com/Alive/kPg

llvm-svn: 373802
2019-10-04 22:16:22 +00:00
Roman Lebedev f304d4d185 [InstCombine] Right-shift shift amount reassociation with truncation (PR43564, PR42391)
Initially (D65380) i believed that if we have rightshift-trunc-rightshift,
we can't do any folding. But as it usually happens, i was wrong.

https://rise4fun.com/Alive/GEw
https://rise4fun.com/Alive/gN2O

In https://bugs.llvm.org/show_bug.cgi?id=43564 we happen to have
this very sequence, of two right shifts separated by trunc.
And "just" so that happens, we apparently can fold the pattern
if the total shift amount is either 0, or it's equal to the bitwidth
of the innermost widest shift - i.e. if we are left with only the
original sign bit. Which is exactly what is wanted there.

llvm-svn: 373801
2019-10-04 22:16:11 +00:00
Jessica Paquette 784892c964 [MachineOutliner] Disable outlining from noreturn functions
Outlining from noreturn functions doesn't do the correct thing right now. The
outliner should respect that the caller is marked noreturn. In the event that
we have a noreturn function, and the outlined code is in tail position, the
outliner will not see that the outlined function should be tail called. As a
result, you end up with a regular call containing a return.

Fixing this requires that we check that all candidates live inside noreturn
functions. So, for the sake of correctness, don't outline from noreturn
functions right now.

Add machine-outliner-noreturn.mir to test this.

llvm-svn: 373791
2019-10-04 21:24:12 +00:00
Huihui Zhang da9e252491 [NFC] Add { } to silence compiler warning [-Wmissing-braces].
../llvm-project/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:355:48: warning: suggest braces around initialization of subobject [-Wmissing-braces]
      return addMappingFromTable<1>(MI, MRI, { 0 }, Table);
                                               ^
                                               {}

llvm-svn: 373784
2019-10-04 20:04:34 +00:00
Eli Friedman 23ae13d51f [ScheduleDAG] When a node is cloned, add an edge between the nodes.
InstrEmitter's virtual register handling assumes that clones are emitted
after the cloned node.  Make sure this assumption actually holds.

Fixes a "Node emitted out of order - early" assertion on the testcase.

This is probably a very rare case to actually hit in practice; even
without the explicit edge, the scheduler will usually end up scheduling
the nodes in the expected order due to other constraints.

Differential Revision: https://reviews.llvm.org/D68068

llvm-svn: 373782
2019-10-04 19:51:40 +00:00
Martin Storsjo 5b2e0ba28e [JITLink] Silence GCC warnings. NFC.
Use parentheses in an expression with mixed && and ||.

Differential Revision: https://reviews.llvm.org/D68447

llvm-svn: 373779
2019-10-04 19:47:42 +00:00
Craig Topper 87aa59a0c7 [X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these based on the immediate in MCInstLower
The immediate form of VPCMP can represent these completely. The
vpcmpgt/eq are just shorter encodings.

This patch removes the isel patterns and just swaps the opcodes
and removes the immediate in MCInstLower. This matches where we do
some other encodings tricks.

Removes over 10K bytes from the isel table.

Differential Revision: https://reviews.llvm.org/D68446

llvm-svn: 373766
2019-10-04 18:02:46 +00:00
Craig Topper 074fa390d2 [X86] Add DAG combine to form saturating VTRUNCUS/VTRUNCS from VTRUNC
We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC

Differential Revision: https://reviews.llvm.org/D68432

llvm-svn: 373765
2019-10-04 17:53:18 +00:00
James Molloy 9baac83a2e [ModuloSchedule] Do not remap terminators
This is a trivial point fix. Terminator instructions aren't scheduled, so
we shouldn't expect to be able to remap them.

This doesn't affect Hexagon and PPC because their terminators are always
hardware loop backbranches that have no register operands.

llvm-svn: 373762
2019-10-04 17:15:25 +00:00
Dmitry Preobrazhensky 434d59250e [AMDGPU][MC][GFX10][WS32] Corrected decoding of dst operand for v_cmp_*_sdwa opcodes
See bug 43484: https://bugs.llvm.org/show_bug.cgi?id=43484

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68349

llvm-svn: 373745
2019-10-04 13:04:17 +00:00
Simon Pilgrim 84f5cd75b3 Fix MSVC "not all control paths return a value" warning. NFCI.
llvm-svn: 373741
2019-10-04 12:45:27 +00:00
Dmitry Preobrazhensky 9bd763679f [AMDGPU][MC][GFX10] Enabled decoding of 'null' operand
See bug 43485: https://bugs.llvm.org/show_bug.cgi?id=43485

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68348

llvm-svn: 373740
2019-10-04 12:38:36 +00:00
Tim Northover a7d90af1be ARM-Darwin: keep the frame register reserved even if not updated.
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.

llvm-svn: 373738
2019-10-04 12:29:32 +00:00
Dmitry Preobrazhensky 94d040706d [AMDGPU][MC][GFX10] Corrected definition of FLAT GLOBAL/SCRATCH instructions
See bug 43483: https://bugs.llvm.org/show_bug.cgi?id=43483

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D68347

llvm-svn: 373736
2019-10-04 12:10:22 +00:00
Simon Pilgrim 329ae6ad71 Fix MSVC "not all control paths return a value" warning. NFCI.
llvm-svn: 373730
2019-10-04 11:24:51 +00:00
Simon Pilgrim 7de9a5ce60 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 373729
2019-10-04 11:24:35 +00:00
Jeremy Morse 61800a75b7 [DebugInfo] LiveDebugValues: move DBG_VALUE creation into VarLoc class
Rather than having a mixture of location-state shared between DBG_VALUEs
and VarLoc objects in LiveDebugValues, this patch makes VarLoc the
master record of variable locations. The refactoring means that the
transfer of locations from one place to another is always a performed by
an operation on an existing VarLoc, that produces another transferred
VarLoc. DBG_VALUEs are only created at the end of LiveDebugValues, once
all locations are known. As a plus, there is now only one method where
DBG_VALUEs can be created.

The test case added covers a circumstance that is now impossible to
express in LiveDebugValues: if an already-indirect DBG_VALUE is spilt,
previously it would have been restored-from-spill as a direct DBG_VALUE.
We now don't lose this information along the way, as VarLocs always
refer back to the "original" non-transfer DBG_VALUE, and we can always
work out whether a location was "originally" indirect.

Differential Revision: https://reviews.llvm.org/D67398

llvm-svn: 373727
2019-10-04 10:53:47 +00:00
Jeremy Morse 0ca48de26c [DebugInfo] LiveDebugValues: defer DBG_VALUE creation during analysis
When transfering variable locations from one place to another,
LiveDebugValues immediately creates a DBG_VALUE representing that
transfer. This causes trouble if the variable location should
subsequently be invalidated by a loop back-edge, such as in the added
test case: the transfer DBG_VALUE from a now-invalid location is used
as proof that the variable location is correct. This is effectively a
self-fulfilling prophesy.

To avoid this, defer the insertion of transfer DBG_VALUEs until after
analysis has completed. Some of those transfers are still sketchy, but
we don't propagate them into other blocks now.

Differential Revision: https://reviews.llvm.org/D67393

llvm-svn: 373720
2019-10-04 09:38:05 +00:00
Matt Arsenault d7cad4fb41 AMDGPU/GlobalISel: Fix using wrong addrspace for aperture
This was always passing the destination flat address space, when it
should be picking between the two valid source options.

llvm-svn: 373716
2019-10-04 08:35:38 +00:00
Matt Arsenault 412e0bf8f3 AMDGPU/GlobalISel: Select G_PTRTOINT
llvm-svn: 373715
2019-10-04 08:35:37 +00:00
Matt Arsenault be9521acaa AMDGPU/GlobalISel: Support wave32 waterfall loops
llvm-svn: 373714
2019-10-04 08:35:35 +00:00
David Zarzycki 03b216d854 [X86] Enable inline memcmp() to use AVX512
llvm-svn: 373706
2019-10-04 07:42:34 +00:00
Martin Storsjo b8f790234f Revert "[Symbolize] Use the local MSVC C++ demangler instead of relying on dbghelp. NFC."
This reverts SVN r373698, as it broke sanitizer tests, e.g. in
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/52441.

llvm-svn: 373701
2019-10-04 07:22:37 +00:00
Piotr Sobczak 165e469145 [AMDGPU][SILoadStoreOptimizer] NFC: Refactor code
Summary:
This patch fixes a potential aliasing problem in InstClassEnum,
where local values were mixed with machine opcodes.

Introducing InstSubclass will keep them separate and help extending
InstClassEnum with other instruction types (e.g. MIMG) in the future.

This patch also makes getSubRegIdxs() more concise.

Reviewers: nhaehnle, arsenm, tstellar

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68384

llvm-svn: 373699
2019-10-04 07:09:40 +00:00
Martin Storsjo 1ca074b86a [Symbolize] Use the local MSVC C++ demangler instead of relying on dbghelp. NFC.
This allows making a couple llvm-symbolizer tests run in all
environments.

Differential Revision: https://reviews.llvm.org/D68133

llvm-svn: 373698
2019-10-04 07:05:42 +00:00
Lang Hames 7f379a3366 [JITLink] Explicitly destroy bumpptr-allocated blocks to avoid a memory leak.
llvm-svn: 373693
2019-10-04 05:24:40 +00:00
Lang Hames 6fd9129aaf [JITLink] Fix an unused variable warning.
llvm-svn: 373692
2019-10-04 05:24:39 +00:00
Lang Hames 4e920e58e6 [JITLink] Switch from an atom-based model to a "blocks and symbols" model.
In the Atom model the symbols, content and relocations of a relocatable object
file are represented as a graph of atoms, where each Atom represents a
contiguous block of content with a single name (or no name at all if the
content is anonymous), and where edges between Atoms represent relocations.
If more than one symbol is associated with a contiguous block of content then
the content is broken into multiple atoms and layout constraints (represented by
edges) are introduced to ensure that the content remains effectively contiguous.
These layout constraints must be kept in mind when examining the content
associated with a symbol (it may be spread over multiple atoms) or when applying
certain relocation types (e.g. MachO subtractors).

This patch replaces the Atom model in JITLink with a blocks-and-symbols model.
The blocks-and-symbols model represents relocatable object files as bipartite
graphs, with one set of nodes representing contiguous content (Blocks) and
another representing named or anonymous locations (Symbols) within a Block.
Relocations are represented as edges from Blocks to Symbols. This scheme
removes layout constraints (simplifying handling of MachO alt-entry symbols,
and hopefully ELF sections at some point in the future) and simplifies some
relocation logic.

llvm-svn: 373689
2019-10-04 03:55:26 +00:00
Shiva Chen ff55e2e047 [RISCV] Split SP adjustment to reduce the offset of callee saved register spill and restore
We would like to split the SP adjustment to reduce the instructions in
prologue and epilogue as the following case. In this way, the offset of
the callee saved register could fit in a single store.

    add     sp,sp,-2032
    sw      ra,2028(sp)
    sw      s0,2024(sp)
    sw      s1,2020(sp)
    sw      s3,2012(sp)
    sw      s4,2008(sp)
    add     sp,sp,-64

Differential Revision: https://reviews.llvm.org/D68011

llvm-svn: 373688
2019-10-04 02:00:57 +00:00
Peter Collingbourne 71662116fd LowerTypeTests: Rename local functions to avoid collisions with identically named functions in ThinLTO modules.
Without this we can encounter link errors or incorrect behaviour
at runtime as a result of the wrong function being referenced.

Differential Revision: https://reviews.llvm.org/D67945

llvm-svn: 373678
2019-10-03 23:42:44 +00:00
Alina Sbirlea 145cdad119 [MemorySSA] Don't hoist stores if interfering uses (as calls) exist.
llvm-svn: 373674
2019-10-03 22:20:04 +00:00
Sanjay Patel 288079aafd [DAGCombiner] add operation legality checks before creating shift ops (PR43542)
As discussed on llvm-dev and:
https://bugs.llvm.org/show_bug.cgi?id=43542
...we have transforms that assume shift operations are legal and transforms to
use them are profitable, but that may not hold for simple targets.

In this case, the MSP430 target custom lowers shifts by repeating (many)
simpler/fixed ops. That can be avoided by keeping this code as setcc/select.

Differential Revision: https://reviews.llvm.org/D68397

llvm-svn: 373666
2019-10-03 21:34:04 +00:00
Nico Weber 204623e05c Reland r349624: Let TableGen write output only if it changed, instead of doing so in cmake
Move the write-if-changed logic behind a flag and don't pass it
with the MSVC generator. msbuild doesn't have a restat optimization,
so not doing write-if-change there doesn't have a cost, and it
should fix whatever causes PR43385.

llvm-svn: 373664
2019-10-03 21:22:28 +00:00
David Blaikie 2ac586c58f DebugInfo: Generalize rnglist emission as a precursor to reusing it for loclist emission
llvm-svn: 373663
2019-10-03 20:56:23 +00:00
Nick Desaulniers ede784ff5a [AArch64InstPrinter] prefer bfi to bfc for < armv8.2-a
Summary:
Fixes pr/42576.

Link: https://github.com/ClangBuiltLinux/linux/issues/697

Reviewers: t.p.northover

Reviewed By: t.p.northover

Subscribers: kristof.beyls, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68356

llvm-svn: 373655
2019-10-03 20:10:02 +00:00
Jinsong Ji 4a6881eabc [PowerPC] Adjust the naming and operand order of fnmsub patterns
Summary:
This is follow up patch of https://reviews.llvm.org/D67595.
Adjust naming and the Commutable operands for additional patterns
to make it easier to read.

The testcase update also show that we can save some unecessary fmr as
well.

Reviewers: #powerpc, steven.zhang, hfinkel, nemanjai

Reviewed By: #powerpc, nemanjai

Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68112

llvm-svn: 373652
2019-10-03 19:36:42 +00:00
Jordan Rupprecht b2b43c8576 [NFC] Fix unused variable in release builds
llvm-svn: 373646
2019-10-03 18:35:44 +00:00
Craig Topper 185ee6ec7c [X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes.
This patch recognizes the shuffle pattern we get from a
v8i64->v8i8 truncate when v8i64 isn't a legal type.

With VLX we can use two VTRUNCs, unpckldq, and a insert_subvector.

Diffrential Revision: https://reviews.llvm.org/D68374

llvm-svn: 373645
2019-10-03 18:34:42 +00:00
Simon Pilgrim eb8d85e5db [X86] matchShuffleWithSHUFPD - use Zeroable element mask directly. NFCI.
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.

This only leaves one user of createTargetShuffleMask which we can hopefully get rid of in a similar manner.

This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.

llvm-svn: 373641
2019-10-03 18:13:50 +00:00
Matt Arsenault ed77b27441 AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT
llvm-svn: 373639
2019-10-03 17:59:03 +00:00
Matt Arsenault 233ff982c7 AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect
Register indexing 64-bit elements is possible on the SALU, but not the
VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
loop handling to allow moving a range of instructions.

llvm-svn: 373638
2019-10-03 17:55:27 +00:00
Matt Arsenault 56271fe180 AMDGPU/GlobalISel: Allow VGPR to index SGPR register
We can still do a waterfall loop over the index if using a VGPR to
index an SGPR. The result will still be a VGPR, but we can avoid the
wide copy of the source register to a VGPR.

llvm-svn: 373637
2019-10-03 17:50:32 +00:00
Matt Arsenault 3d23e58dbe AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and
This would try to do FewerElements to v9s8

llvm-svn: 373635
2019-10-03 17:50:29 +00:00
Tom Stellard e6f5171305 AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions
Summary:
This adds a pre-pass to this optimization that scans through the basic
block and generates lists of mergeable instructions with one list per unique
address.

In the optimization phase instead of scanning through the basic block for mergeable
instructions, we now iterate over the lists generated by the pre-pass.

The decision to re-optimize a block is now made per list, so if we fail to merge any
instructions with the same address, then we do not attempt to optimize them in
future passes over the block.  This will help to reduce the time this pass
spends re-optimizing instructions.

In one pathological test case, this change reduces the time spent in the
SILoadStoreOptimizer from 0.2s to 0.03s.

This restructuring will also make it possible to implement further solutions in
this pass, because we can now add less expensive checks to the pre-pass and
filter instructions out early which will avoid the need to do the expensive
scanning during the optimization pass. For example, checking for adjacent
offsets is an inexpensive test we can move to the pre-pass.

Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65961

llvm-svn: 373630
2019-10-03 17:11:47 +00:00
James Molloy 9972c992eb [ModuloSchedule] removeBranch() *before* creating the trip count condition
The Hexagon code assumes there's no existing terminator when inserting its
trip count condition check.

This causes swp-stages5.ll to break. The generated code looks good to me,
it is likely a permutation. I have disabled the new codegen path to keep
everything green and will investigate along with the other 3-4 tests
that have different codegen.

Fixes expensive-checks build.

llvm-svn: 373629
2019-10-03 17:10:32 +00:00