Commit Graph

124005 Commits

Author SHA1 Message Date
Fangrui Song f955d5f623 SlotIndexes: delete unused functions
llvm-svn: 364154
2019-06-23 16:05:29 +00:00
Sanjay Patel 13a5ae58fc [InstCombine] squash is-power-of-2 that uses ctpop
This is another intermediate IR step towards solving PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

We can test if a value is power-of-2-or-0 using ctpop(X) < 2,
so combining that with a non-zero check of the input is the
same as testing if exactly 1 bit is set:

(X != 0) && (ctpop(X) u< 2) --> ctpop(X) == 1

Differential Revision: https://reviews.llvm.org/D63660

llvm-svn: 364153
2019-06-23 14:22:37 +00:00
Fangrui Song 6620e3b2f6 SlotIndexes: simplify IdxMBBPair operators
llvm-svn: 364152
2019-06-23 13:16:03 +00:00
Craig Topper 6ddc7912b0 [SelectionDAG] Remove the code that attempts to calculate the alignment for the second half of a split masked load/store.
The code divides the alignment by 2 if the original alignment is
equal to the original VT size. But this wouldn't be correct
if the alignment was larger than the VT size.

The memory operand object already takes care of calling MinAlign
on the base alignment and the memory pointer offset. So we don't
need any special code at all.

llvm-svn: 364151
2019-06-23 07:00:46 +00:00
Craig Topper cadd826d0a [X86][SelectionDAG] Cleanup and simplify masked_load/masked_store in tablegen. Use more precise PatFrags for scalar masked load/store.
Rename masked_load/masked_store to masked_ld/masked_st to discourage
their direct use. We need to check truncating/extending and
compressing/expanding before using them. This revealed that
our scalar masked load/store patterns were misusing these.

With those out of the way, renamed masked_load_unaligned and
masked_store_unaligned to remove the "_unaligned". We didn't
check the alignment anyway so the name was somewhat misleading.

Make the aligned versions inherit from masked_load/store instead
from a separate identical version. Merge the 3 different alignments
PatFrags into a single version that uses the VT from the SDNode to
determine the size that the alignment needs to match.

llvm-svn: 364150
2019-06-23 06:06:04 +00:00
Keno Fischer 5f4ae7c457 [Support] Fix build under Emscripten
Summary:
Emscripten's libc doesn't define MNT_LOCAL, thus causing a build
failure in the fallback path. However, to the best of my knowledge,
it also doesn't support remote file system mounts, so we may simply
return `true` here (as we do for e.g. Fuchsia). With this fix, the
core LLVM libraries build correctly under emscripten (though some
of the tools and utils do not).

Reviewers: kripken
Differential Revision: https://reviews.llvm.org/D63688

llvm-svn: 364143
2019-06-23 00:29:59 +00:00
Don Hinton 64b0924531 Revert [CommandLine] Remove OptionCategory and SubCommand caches from the Option class.
This reverts r364134 (git commit a5b83bc9e3)

Caused errors in the asan bot, so the GeneralCategory global needs to
be changed to ManagedStatic.

Differential Revision: https://reviews.llvm.org/D62105

llvm-svn: 364141
2019-06-22 23:32:36 +00:00
Simon Pilgrim a962c1bc0f [X86][SSE] Fold extract_subvector(vselect(x,y,z),0) -> vselect(extract_subvector(x,0),extract_subvector(y,0),extract_subvector(z,0))
llvm-svn: 364136
2019-06-22 17:57:01 +00:00
Philip Reames 8deb84c8ef Exploit a zero LoopExit count to eliminate loop exits
This turned out to be surprisingly effective. I was originally doing this just for completeness sake, but it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than it's isKnownPredicate reasoning.

Once this is in, I'm thinking about trying to build on the same infrastructure to eliminate provably untaken checks. There may be something generally interesting here.

Differential Revision: https://reviews.llvm.org/D63618

llvm-svn: 364135
2019-06-22 17:54:25 +00:00
Don Hinton a5b83bc9e3 [CommandLine] Remove OptionCategory and SubCommand caches from the Option class.
Summary:
This change processes `OptionCategory`s and `SubCommand`s as they
 are seen instead of caching them in the Option class and processing
them later.  Doing so simplifies the work needed to be done by the Global
parser and significantly reduces the size of the Option class to a mere 64
bytes.

Removing  the `OptionCategory` cache saved 24 bytes, and removing
the `SubCommand` cache saved an additional 48 bytes, for a total of a
72 byte reduction.

Reviewers: beanz, zturner, MaskRay, serge-sans-paille

Reviewed By: serge-sans-paille

Subscribers: serge-sans-paille, tstellar, zturner, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62105

llvm-svn: 364134
2019-06-22 17:22:50 +00:00
Hubert Tong 6f3222ed94 [NFC] Fix indentation in PPCAsmPrinter.cpp
After r248261, the indentation switches, inside a namespace definition,
between indenting and not indenting one level in for that namespace; the
abomination occurs in the middle of a class definition. Fix that.

llvm-svn: 364133
2019-06-22 16:03:29 +00:00
Hubert Tong d801cb1f54 [PowerPC][NFC] Move comment to the relevant function
A comment that applies to a virtual destructor was placed on a class
constructor. Move the comment to where it belongs.

llvm-svn: 364132
2019-06-22 16:02:02 +00:00
Nikita Popov 8c8e40f763 [NewGVN] Fix copy/paste mistake in cast
llvm-svn: 364130
2019-06-22 10:20:13 +00:00
Nikita Popov e96fda726e [NewGVN] Remove dead SwitchEdges variable; NFC
llvm-svn: 364129
2019-06-22 10:20:07 +00:00
Peter Collingbourne 8cd780b432 AArch64: Add support for reading pc using llvm.read_register.
This is useful for allowing code to efficiently take an address
that can be later mapped onto debug info. Currently the hwasan
pass achieves this by taking the address of the current function:
http://llvm-cs.pcc.me.uk/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#921

but this costs two instructions (plus a GOT entry in PIC code) per function
with stack variables. This will allow the cost to be reduced to a single
instruction.

Differential Revision: https://reviews.llvm.org/D63471

llvm-svn: 364126
2019-06-22 03:03:25 +00:00
Fangrui Song 01d649c249 [CMake] Delete redundant DEPENDS/LINK_LIBS from LineEditor/XRay
The link dependencies are already specified in LLVMBuild.txt

llvm-svn: 364125
2019-06-22 01:50:21 +00:00
Fangrui Song 43e14390b0 Make GlobalISel depend on SelectionDAG after D63169
GlobalISel/IRTranslator.cpp now references SelectionDAG/FunctionLoweringInfo.cpp.
This fixes a link error in -DBUILD_SHARED_LIBS=on builds:

    ld.lld: error: undefined symbol: llvm::FunctionLoweringInfo::clear()
    >>> referenced by IRTranslator.cpp:2198 (../lib/CodeGen/GlobalISel/IRTranslator.cpp:2198)
    >>>               lib/CodeGen/GlobalISel/CMakeFiles/LLVMGlobalISel.dir/IRTranslator.cpp.o:(llvm::IRTranslator::finalizeFunction())

llvm-svn: 364124
2019-06-22 01:30:17 +00:00
Peter Collingbourne 4608868d2f AArch64: Prefer FP-relative debug locations in HWASANified functions.
To help produce better diagnostics for stack use-after-return, we'd like
to be able to determine the addresses of each HWASANified function's local
variables given a small amount of information recorded on entry to the
function. Currently we require all HWASANified functions to use frame pointers
and record (PC, FP) on function entry. This works better than recording SP
because FP cannot change during the function, unlike SP which can change
e.g. due to dynamic alloca.

However, most variables currently end up using SP-relative locations in their
debug info. This prevents us from recomputing the address of most variables
because the distance between SP and FP isn't recorded in the debug info. To
address this, make the AArch64 backend prefer FP-relative debug locations
when producing debug info for HWASANified functions.

Differential Revision: https://reviews.llvm.org/D63300

llvm-svn: 364117
2019-06-22 00:06:51 +00:00
Tom Tan 7ecb5145ba [COFF, ARM64] Fix encoding of debugtrap for Windows
On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is
mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break
point instruction on ARM64 which is recognized by Windows debugger and
exception handling code, so llvm.debugtrap should map to it instead of
redirecting to llvm.trap (brk #1) as the default implementation.

Differential Revision: https://reviews.llvm.org/D63635

llvm-svn: 364115
2019-06-21 23:38:05 +00:00
Reid Kleckner 592a193285 Revert [SLP] Look-ahead operand reordering heuristic.
This reverts r364084 (git commit 5698921be2)

It caused crashes while compiling a file in Chrome. Reduction
forthcoming.

llvm-svn: 364111
2019-06-21 23:10:25 +00:00
Julian Lettner 19c4d660f4 [ASan] Use dynamic shadow on 32-bit iOS and simulators
The VM layout on iOS is not stable between releases. On 64-bit iOS and
its derivatives we use a dynamic shadow offset that enables ASan to
search for a valid location for the shadow heap on process launch rather
than hardcode it.

This commit extends that approach for 32-bit iOS plus derivatives and
their simulators.

rdar://50645192
rdar://51200372
rdar://51767702

Reviewed By: delcypher

Differential Revision: https://reviews.llvm.org/D63586

llvm-svn: 364105
2019-06-21 21:01:39 +00:00
Matt Arsenault 22e3dc60a0 AMDGPU: Fix not using s33 for scratch wave offset in kernels
Fixes missing piece from r363990.

llvm-svn: 364099
2019-06-21 20:04:02 +00:00
Craig Topper 4649a051bf [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0)
128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg.

This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns.

Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining.

I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.

Differential Revision: https://reviews.llvm.org/D63512

llvm-svn: 364095
2019-06-21 19:10:21 +00:00
Craig Topper 4569cdbcf5 [X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types.
We don't have any Custom handling during type legalization. Only
operation legalization.

Fixes PR42355

llvm-svn: 364093
2019-06-21 18:50:00 +00:00
Craig Topper ce6c06dfdd [X86] Add a debug print of the node in the default case for unhandled opcodes in ReplaceNodeResults.
This should be unreachable, but bugs can make it reachable. This
adds a debug print so we can see the bad node in the output when
the llvm_unreachable triggers.

llvm-svn: 364091
2019-06-21 18:49:21 +00:00
Simon Pilgrim 5dba4ed208 [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle
Subvector shuffling often ends up as insert/extract subvector.

llvm-svn: 364090
2019-06-21 18:35:04 +00:00
Amara Emerson 6e71b34fe6 [AArch64][GlobalISel] Implement selection support for the new G_JUMP_TABLE and G_BRJT ops.
With this we can now fully code generate jump tables, which is important for code size.

Differential Revision: https://reviews.llvm.org/D63223

llvm-svn: 364086
2019-06-21 18:10:41 +00:00
Amara Emerson fe4625fb24 [GlobalISel][IRTranslator] Change switch table translation to generate jump tables and range checks.
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG to in order to generate jump tables and range checks where appropriate.

Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.

For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.

There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.

Actual arm64 support for selection of jump tables is coming in a later patch.

Differential Revision: https://reviews.llvm.org/D63169

llvm-svn: 364085
2019-06-21 18:10:38 +00:00
Simon Pilgrim 5698921be2 [SLP] Look-ahead operand reordering heuristic.
This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364084
2019-06-21 17:57:01 +00:00
Craig Topper 6af1be9664 [X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl.
We already use vmovq for v2i64/v2f64 vzmovl. But we were using a
blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or
movsd+xorpd under optsize.

I think the blend with 0 or movss/d is only needed for
vXi32 where we don't have an instruction that can move 32
bits from one xmm to another while zeroing upper bits.

movq is no worse than blendpd on any known CPUs.

llvm-svn: 364079
2019-06-21 17:24:21 +00:00
Simon Pilgrim 0da13ed1f6 [DAGCombine] narrowExtractedVectorBinOp - pull out repeated getOpcode(). NFCI.
llvm-svn: 364076
2019-06-21 16:44:51 +00:00
Amara Emerson 8f25a021dd [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal.
We sometimes get poor code size because constants of types < 32b are legalized
as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the
localizer can no longer sink them (although it's possible to extend it to do so).

On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.

There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.

Differential Revision: https://reviews.llvm.org/D63587

llvm-svn: 364075
2019-06-21 16:43:50 +00:00
Stanislav Mekhanoshin bdf7f81b89 [AMDGPU] hazard recognizer for fp atomic to s_denorm_mode
This requires 3 wait states unless there is a wait or VALU in
between.

Differential Revision: https://reviews.llvm.org/D63619

llvm-svn: 364074
2019-06-21 16:30:14 +00:00
David Bolvansky dbcdad51ff [InstCombine] (1 << (C - x)) -> ((1 << C) >> x) if C is bitwidth - 1
Summary:
```
%a = sub i32 31, %x
%r = shl i32 1, %a
  =>
%d = shl i32 1, 31
%r = lshr i32 %d, %x

Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/btZm

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63652

llvm-svn: 364073
2019-06-21 16:25:32 +00:00
Simon Pilgrim 96e77ce626 [X86] isBinOp - move commutative ops to isCommutativeBinOp. NFCI.
TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate.

llvm-svn: 364072
2019-06-21 16:23:28 +00:00
Simon Pilgrim bdea88325f Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 364068
2019-06-21 16:11:18 +00:00
David Bolvansky 4b28478389 [InstCombine] cttz(abs(x)) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D63546

llvm-svn: 364064
2019-06-21 15:26:22 +00:00
Sanjay Patel ddb9093684 [GVNSink] prevent crashing on mismatched instructions (PR42346)
Patch based on suggestion by James Molloy (@jmolloy) in:
https://bugs.llvm.org/show_bug.cgi?id=42346

llvm-svn: 364062
2019-06-21 15:17:24 +00:00
Simon Pilgrim ca9933c22d [DAGCombine] narrowInsertExtractVectorBinOp - reuse "extract from insert" detection code.
Move the "extract from insert detection code" into a lambda helper function.

llvm-svn: 364059
2019-06-21 14:46:21 +00:00
Jay Foad d9d3c91b48 [Scalarizer] Propagate IR flags
Summary:
The motivation for this was to propagate fast-math flags like nnan and
ninf on vector floating point operations to the corresponding scalar
operations to take advantage of follow-on optimizations. But I think
the same argument applies to all of our IR flags: if they apply to the
vector operation then they also apply to all the individual scalar
operations, and they might enable follow-on optimizations.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63593

llvm-svn: 364051
2019-06-21 14:10:18 +00:00
Sam Elliott 96c8bc7956 [RISCV] Add RISCV-specific TargetTransformInfo
Summary:
LLVM Allows Targets to provide information that guides optimisations
made to LLVM IR. This is done with callbacks on a TargetTransformInfo object.

This patch adds a TargetTransformInfo class for RISC-V. This will allow us to
implement RISC-V specific callbacks as they become necessary.

This commit also adds the getIntImmCost callbacks, and tests them with a simple
constant hoisting test. Our immediate costs are on the conservative side, for
the moment, but we prevent hoisting in most circumstances anyway.

Previous review was on D63007

Reviewers: asb, luismarques

Reviewed By: asb

Subscribers: ributzka, MaskRay, llvm-commits, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, apazos, simoncook, johnrusso, rbar, hiraditya, mgorny

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63433

llvm-svn: 364046
2019-06-21 13:36:09 +00:00
Simon Tatham 0c7af66450 [ARM] Add MVE 64-bit GPR <-> vector move instructions.
These instructions let you load half a vector register at once from
two general-purpose registers, or vice versa.

The assembly syntax for these instructions mentions the vector
register name twice. For the move _into_ a vector register, the MC
operand list also has to mention the register name twice (once as the
output, and once as an input to represent where the unchanged half of
the output register comes from). So we can conveniently assign one of
the two asm operands to be the output $Qd, and the other $QdSrc, which
avoids confusing the auto-generated AsmMatcher too much. For the move
_from_ a vector register, there's no way to get round the fact that
both instances of that register name have to be inputs, so we need a
custom AsmMatchConverter to avoid generating two separate output MC
operands. (And even that wouldn't have worked if it hadn't been for
D60695.)

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62679

llvm-svn: 364041
2019-06-21 13:17:23 +00:00
Simon Tatham bafb105e96 [ARM] Add MVE vector instructions that take a scalar input.
This adds the `MVE_qDest_rSrc` superclass and all its instances, plus
a few other instructions that also take a scalar input register or two.

I've also belatedly added custom diagnostic messages to the operand
classes for odd- and even-numbered GPRs, which required matching
changes in two of the existing MVE assembly test files.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62678

llvm-svn: 364040
2019-06-21 13:17:08 +00:00
Paul Robinson 26cc5bcb1a Fix a crash with assembler source and -g.
llvm-mc or clang with -g normally produces debug info describing the
assembler source itself; however, if that source already contains some
.file/.loc directives, we should instead emit the debug info described
by those directives.  For certain assembler sources seen in the wild
(particularly in the Chrome build) this was causing a crash due to
incorrect assumptions about legal sequences of assembler source text.

Fixes PR38994.

Differential Revision: https://reviews.llvm.org/D63573

llvm-svn: 364039
2019-06-21 13:10:19 +00:00
Simon Pilgrim 36a999ffb8 [X86] X86ISD::ANDNP is a (non-commutative) binop
The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but its getting better.

llvm-svn: 364038
2019-06-21 12:42:39 +00:00
Simon Tatham a6b6a15701 [ARM] Add a batch of similarly encoded MVE instructions.
Summary:
This adds the `MVE_qDest_qSrc` superclass and all instructions that
inherit from it. It's not the complete class of _everything_ with a
q-register as both destination and source; it's a subset of them that
all have similar encodings (but it would have been hopelessly unwieldy
to call it anything like MVE_111x11100).

This category includes add/sub with carry; long multiplies; halving
multiplies; multiply and accumulate, and some more complex
instructions.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62677

llvm-svn: 364037
2019-06-21 12:13:59 +00:00
Simon Pilgrim 9184b009cf [X86] createMMXBuildVector - call with BuildVectorSDNode directly. NFCI.
llvm-svn: 364030
2019-06-21 11:25:06 +00:00
Fangrui Song d5cf95e41c [ARM] Fix -Wimplicit-fallthrough after D62675
llvm-svn: 364028
2019-06-21 11:19:11 +00:00
Simon Tatham 7d76f8acf0 [ARM] Add MVE vector compare instructions.
Summary:
These take a pair of vector register to compare, and a comparison type
(written in the form of an Arm condition suffix); they output a vector
of booleans in the VPR register, where predication can conveniently
use them.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62676

llvm-svn: 364027
2019-06-21 11:14:51 +00:00
Simon Pilgrim c26b8f2afc [X86] combineAndnp - use isNOT instead of manually checking for (XOR x, -1)
llvm-svn: 364026
2019-06-21 11:13:15 +00:00
Fangrui Song 22e478f054 [Symbolize] Avoid lifetime extension and simplify std::map find/insert. NFC
llvm-svn: 364025
2019-06-21 11:05:26 +00:00
Simon Pilgrim b5733581c4 [X86] foldVectorXorShiftIntoCmp - use isConstOrConstSplat. NFCI.
Use the isConstOrConstSplat helper instead of inspecting the build vector manually.

llvm-svn: 364024
2019-06-21 10:54:30 +00:00
Simon Pilgrim 771c33e375 [X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern
llvm-svn: 364022
2019-06-21 10:44:15 +00:00
Simon Tatham c9b2cd4674 [ARM] Add a batch of MVE floating-point instructions.
Summary:
This includes floating-point basic arithmetic (add/sub/multiply),
complex add/multiply, unary negation and absolute value, rounding to
integer value, and conversion to/from integer formats.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62675

llvm-svn: 364013
2019-06-21 09:35:07 +00:00
Fangrui Song dc8de6037c Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC
llvm-svn: 364006
2019-06-21 05:40:31 +00:00
Fangrui Song ddd056c984 [MIPS GlobalISel] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63541
llvm-svn: 364003
2019-06-21 01:51:50 +00:00
Amara Emerson bc0d08e0ee [GlobalISel][Localizer] Allow localization of G_INTTOPTR and chains of instructions.
G_INTTOPTR can prevent the localizer from moving G_CONSTANTs, but since it's
essentially a side effect free cast instruction we can remat both instructions.
This patch changes the localizer to enable localization of the chains by
iterating over the entry block instructions in reverse order. That way, uses will
localized first, and then the defs are free to be localized as well.

This also changes the previous SmallPtrSet of localized instructions to use a
SetVector instead. We're dealing with pointers and need deterministic iteration
order.

Overall, this change improves ARM64 -O0 CTMark code size by around 0.7% geomean.

Differential Revision: https://reviews.llvm.org/D63630

llvm-svn: 364001
2019-06-21 00:36:19 +00:00
Cameron McInally 1c0bd6dd2c [Reassociate] Remove bogus assert reported in PR42349.
Also, add a FIXME for the unsafe transform on a unary FNeg. A unary FNeg can only be transformed to a FMul by -1.0 when the nnan flag is present. The unary FNeg project is a WIP, so the unsafe transformation is acceptable until that work is complete.

The bogus assert with introduced in D63445.

llvm-svn: 363998
2019-06-20 23:03:55 +00:00
Sanjay Patel b342f026a4 [InstSimplify] simplify power-of-2 (single bit set) sequences
As discussed in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314

Improving the canonicalization for these patterns:
rL363956
...means we should adjust/enhance the related simplification.

https://rise4fun.com/Alive/w1cp

  Name: isPow2 or zero
  %x = and i32 %xx, 2048
  %a = add i32 %x, -1
  %r = and i32 %a, %x
  =>
  %r = i32 0

llvm-svn: 363997
2019-06-20 22:55:28 +00:00
Matt Arsenault d88db6d7fc AMDGPU: Always use s33 for global scratch wave offset
Every called function could possibly need this to calculate the
absolute address of stack objectst, and this avoids inserting a copy
around every call site in the kernel. It's also somewhat cleaner to
keep this in a callee saved SGPR.

llvm-svn: 363990
2019-06-20 21:58:24 +00:00
Eli Friedman 25f08a17c3 [ARM GlobalISel] Add support for s64 G_ADD and G_SUB.
Teach RegisterBankInfo to use the correct register class, and tell the
legalizer it's legal.  Everything else just works.

The one thing that's slightly weird about this compared to SelectionDAG
isel is that legalization can't distinguish between i64 and <1 x i64>,
so we might end up with more NEON instructions than the user expects.

Differential Revision: https://reviews.llvm.org/D63585

llvm-svn: 363989
2019-06-20 21:56:47 +00:00
Jinsong Ji 8b1abe568e [PowerPC][NFC] Fix comments for AltVSXFMARel mapping.
llvm-svn: 363987
2019-06-20 21:36:06 +00:00
Rainer Orth 6fde832b82 [profile] Solaris ld supports __start___llvm_prof_data etc. labels
Currently, many profiling tests on Solaris FAIL like

  Command Output (stderr):
  --
  Undefined                       first referenced
   symbol                             in file
  __llvm_profile_register_names_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o
  __llvm_profile_register_function    /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o

Solaris 11.4 ld supports the non-standard GNU ld extension of adding
__start_SECNAME and __stop_SECNAME labels to sections whose names are valid
as C identifiers.  Given that we already use Solaris 11.4-only features
like ld -z gnu-version-script-compat and fully working .preinit_array
support in compiler-rt, we don't need to worry about older versions of
Solaris ld.

The patch documents that support (although the comment in
lib/Transforms/Instrumentation/InstrProfiling.cpp
(needsRuntimeRegistrationOfSectionRange) is quite cryptic what it's
actually about), and adapts the affected testcase not to expect the
alternativeq __llvm_profile_register_functions and __llvm_profile_init.
It fixes all affected tests.

Tested on amd64-pc-solaris2.11.

Differential Revision: https://reviews.llvm.org/D41111

llvm-svn: 363984
2019-06-20 21:27:06 +00:00
Matt Arsenault 740322f1eb AMDGPU: Add intrinsics for DS GWS semaphore instructions
llvm-svn: 363983
2019-06-20 21:11:42 +00:00
Alina Sbirlea d0b11698cd [LICM & MSSA] Limit unsafe sinking and hoisting.
Summary:
The getClobberingMemoryAccess API checks for clobbering accesses in a loop by walking the backedge. This may check if a memory access is being
clobbered by the loop in a previous iteration, depending how smart AA got over the course of the updates in MemorySSA (it does not occur when built from scratch).
If no clobbering access is found inside the loop, it will optimize to an access outside the loop. This however does not mean that access is safe to sink.
Given:
```
for i
  load a[i]
  store a[i]
```
The access corresponding to the load can be optimized to outside the loop, and the load can be hoisted. But it is incorrect to sink it.
In order to sink the load, we'd need to check no Def clobbers the Use in the same iteration. With this patch we currently restrict sinking to either
Defs not existing in the loop, or Defs preceding the load in the same block. An easy extension is to ensure the load (Use) post-dominates all Defs.

Caught by PR42294.

This issue also shed light on the converse problem: hoisting stores in this same scenario would be illegal. With this patch we restrict
hoisting of stores to the case when their corresponding Defs are dominating all Uses in the loop.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63582

llvm-svn: 363982
2019-06-20 21:09:09 +00:00
Matt Arsenault 8ad1decf45 AMDGPU: Insert mem_viol check loop around GWS pre-GFX9
It is necessary to emit this loop around GWS operations in case the
wave is preempted pre-GFX9.

llvm-svn: 363979
2019-06-20 20:54:32 +00:00
Sanjay Patel 273d97e6bf [InstCombine] fix typo in comment; NFC
llvm-svn: 363974
2019-06-20 20:23:32 +00:00
Leonard Chan 97dc622ab3 [clang][NewPM] Do not eliminate available_externally durng `-O2 -flto` runs
This fixes CodeGen/available-externally-suppress.c when the new pass manager is
turned on by default. available_externally was not emitted during -O2 -flto
runs when it should still be retained for link time inlining purposes. This can
be fixed by checking that we aren't LTOPrelinking when adding the
EliminateAvailableExternallyPass.

Differential Revision: https://reviews.llvm.org/D63580

llvm-svn: 363971
2019-06-20 19:44:51 +00:00
Philip Reames a7fd8a806f [LFTR] Fix a (latent?) bug related to nested loops
I can't actually come up with a test case this triggers on without an out of tree change, but in theory, it's a bug in the recently added multiple exit LFTR support.  The root issue is that an exiting block common to two loops can (in theory) have computable exit counts for both loops.  Rewriting the exit of an inner loop in terms of the outer loops IV would cause the inner loop to either a) run forever, or b) terminate on the first iteration.

In practice, we appear to get lucky and not have the exit count computable for the outer loop, except when it's trivially zero.  Given we bail on zero exit counts, we don't appear to ever trigger this.  But I can't come up with a reason we *can't* compute an exit count for the outer loop on the common exiting block, so this may very well be triggering in some cases.

llvm-svn: 363964
2019-06-20 18:45:06 +00:00
Craig Topper 9e1665f2d6 [X86] Add BLSI to isUseDefConvertible.
Summary:
BLSI sets the C flag is the input is not zero. So if its followed
by a TEST of the input where only the Z flag is consumed, we can
replace it with the opposite check of the C flag.

We should be able to do the same for BLSMSK and BLSR, but the
naive test case for those is being optimized to a subo by
CodeGenPrepare.

Reviewers: spatel, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63589

llvm-svn: 363957
2019-06-20 17:52:53 +00:00
Sanjay Patel 63311bfb83 [InstCombine] canonicalize check for power-of-2
The form that compares against 0 is better because:
1. It removes a use of the input value.
2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS).

This is a root cause for PR42314, but probably doesn't completely answer the codegen request:
https://bugs.llvm.org/show_bug.cgi?id=42314

Alive proof:
https://rise4fun.com/Alive/9kG

  Name: is power-of-2
  %neg = sub i32 0, %x
  %a = and i32 %neg, %x
  %r = icmp eq i32 %a, %x
  =>
  %dec = add i32 %x, -1
  %a2 = and i32 %dec, %x
  %r = icmp eq i32 %a2, 0

  Name: is not power-of-2
  %neg = sub i32 0, %x
  %a = and i32 %neg, %x
  %r = icmp ne i32 %a, %x
  =>
  %dec = add i32 %x, -1
  %a2 = and i32 %dec, %x
  %r = icmp ne i32 %a2, 0

llvm-svn: 363956
2019-06-20 17:41:15 +00:00
Simon Pilgrim 801c0f12b0 [DAGCombiner] Use getAPIntValue() instead of getZExtValue() where possible.
Better handling of out-of-i64-range values due to large integer types or from fuzz tests.

llvm-svn: 363955
2019-06-20 17:36:23 +00:00
Jordan Rupprecht 02508decf4 [DAGCombiner][NFC] Remove unused var
llvm-svn: 363954
2019-06-20 17:30:01 +00:00
Amy Huang 7fac5c8d94 Store a pointer to the return value in a static alloca and let the debugger use that
as the variable address for NRVO variables.

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D63361

llvm-svn: 363952
2019-06-20 17:15:21 +00:00
David Bolvansky 01511192b2 [InstCombine] cttz(-x) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63534

llvm-svn: 363951
2019-06-20 17:04:14 +00:00
Matt Arsenault 5dc457cbe4 AMDGPU: Fix ignoring DisableFramePointerElim in leaf functions
The attribute can specify elimination for leaf or non-leaf, so it
should always be considered. I copied this bug from AArch64, which
probably should also be fixed.

llvm-svn: 363949
2019-06-20 17:03:23 +00:00
Evandro Menezes aa10f05044 [CodeGen] Fix formatting and comments (NFC)
llvm-svn: 363947
2019-06-20 16:34:00 +00:00
Matt Arsenault b7f87c0ecf AMDGPU: Treat undef as an inline immediate
This should only matter in vectors with an undef component, since a
full undef vector would have been folded out.

llvm-svn: 363941
2019-06-20 16:01:09 +00:00
Simon Tatham 232db11020 [ARM] Add a batch of MVE integer instructions.
This includes integer arithmetic of various kinds (add/sub/multiply,
saturating and not), and the immediate forms of VMOV and VMVN that
load an immediate into all lanes of a vector.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62674

llvm-svn: 363936
2019-06-20 15:16:56 +00:00
Stanislav Mekhanoshin 0846c125f9 [AMDGPU] gfx1010 core wave32 changes
Differential Revision: https://reviews.llvm.org/D63204

llvm-svn: 363934
2019-06-20 15:08:34 +00:00
Simon Pilgrim 1d8093249f [DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

llvm-svn: 363929
2019-06-20 14:42:27 +00:00
Simon Pilgrim 98a0ac5c0f [DAGCombine] Add TODOs for some combines that should support non-uniform vectors
We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc.

llvm-svn: 363924
2019-06-20 12:48:49 +00:00
Simon Pilgrim a4d705e0ef [X86] LowerAVXExtend - handle ANY_EXTEND_VECTOR_INREG lowering as well.
llvm-svn: 363922
2019-06-20 11:31:54 +00:00
Simon Pilgrim a487628270 [DAGCombine] Reduce scope of ShAmtVal variable. NFCI.
Fixes cppcheck warning.

Use the more capable getAPIntVal() instead of getZExtValue() as well since I'm here.

llvm-svn: 363921
2019-06-20 10:56:37 +00:00
Petar Avramovic 153bd24eda [MIPS GlobalISel] Select integer to floating point conversions
Select G_SITOFP and G_UITOFP for MIPS32.

Differential Revision: https://reviews.llvm.org/D63542

llvm-svn: 363912
2019-06-20 09:05:02 +00:00
Petar Avramovic 4b4dae1c76 [MIPS GlobalISel] Select floating point to integer conversions
Select G_FPTOSI and G_FPTOUI for MIPS32.

Differential Revision: https://reviews.llvm.org/D63541

llvm-svn: 363911
2019-06-20 08:52:53 +00:00
Craig Topper b4ea64570c [X86] Remove memory instructions form isUseDefConvertible.
The caller of this is looking for comparisons of the input
to these instructions with 0. But the memory instructions
input is an addess not a value input in a register.

llvm-svn: 363907
2019-06-20 04:58:40 +00:00
Craig Topper 451f7feb64 [X86] Add v64i8/v32i16 to several places in X86CallingConv.td where they seemed obviously missing.
llvm-svn: 363906
2019-06-20 04:29:00 +00:00
Matt Arsenault c67c484f36 AMDGPU: Don't clobber VCC in MUBUF addr64 emulation
Introducing VCC defs during SIFixSGPRCopies is generally
problematic. Avoid it by starting with the VOP3 form with the general
condition register. This is the easiest to fix instance, but doesn't
solve any specific problems I'm looking at.

llvm-svn: 363904
2019-06-20 00:51:28 +00:00
Eli Friedman d88e28d13e [llvm-objdump] Switch between ARM/Thumb based on mapping symbols.
The ARMDisassembler changes allow changing between ARM and Thumb mode
based on the MCSubtargetInfo, rather than the Target, which simplifies
the other changes a bit.

I'm not really happy with adding more target-specific logic to
tools/llvm-objdump/, but there isn't any easy way around it: the logic
in question specifically applies to disassembling an object file, and
that code simply isn't located in lib/Target, at least at the moment.

Differential Revision: https://reviews.llvm.org/D60927

llvm-svn: 363903
2019-06-20 00:29:40 +00:00
Matt Arsenault e4c2e9b016 AMDGPU: Consolidate some getGeneration checks
This is incomplete, and ideally these would all be removed, but it's
better to localize them to the subtarget first with comments about
what they're for.

llvm-svn: 363902
2019-06-19 23:54:58 +00:00
Thomas Preud'homme a2ef1ba32f [FileCheck] Stop qualifying expressions as numeric
Summary:
Stop referring to "numeric expression", using simply the term
"expression" instead. Likewise for numeric operation since operations
are only used in numeric expressions.

Reviewers: jhenderson, jdenny, probinson, arichardson

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63500

llvm-svn: 363901
2019-06-19 23:47:24 +00:00
Thomas Preud'homme baae41ff76 FileCheck: Return parse error w/ Error & Expected
Summary:
Make use of Error and Expected to bubble up diagnostics and force
checking of errors in the callers.

Reviewers: jhenderson, jdenny, probinson, arichardson

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63125

llvm-svn: 363900
2019-06-19 23:47:10 +00:00
Matt Arsenault e24b34e9c9 AMDGPU: Undo sub x, c canonicalization for v2i16
Should avoid regression from D62341

llvm-svn: 363899
2019-06-19 23:37:43 +00:00
Simon Pilgrim 046d49a8dc [DAGCombine] Use ConstantSDNode::getAPIntValue() instead of getZExtValue().
Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something.....

llvm-svn: 363887
2019-06-19 22:14:24 +00:00
Simon Atanasyan f61c43c636 [mips] Mark the `lwupc` instruction as MIPS64 R6 only
The "The MIPS64 Instruction Set Reference Manual" [1] states that
the `lwupc` is MIPS64 Release 6 only. It should not be supported
for 32-bit CPUs.

[1] https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00087-2B-MIPS64BIS-AFP-6.06.pdf

llvm-svn: 363886
2019-06-19 22:08:06 +00:00
Simon Atanasyan 0121432602 [mips] Add (GPR|PTR)_64 predicates to PseudoReturn64 and PseudoIndirectHazardBranch64
This patch is one of a series of patches. The goal is to make P5600
scheduler model complete and turn on the `CompleteModel` flag.

llvm-svn: 363885
2019-06-19 22:07:46 +00:00
Philip Reames eda1ba65ca LFTR for multiple exit loops
Teach IndVarSimply's LinearFunctionTestReplace transform to handle multiple exit loops. LFTR does two key things 1) it rewrites (all) exit tests in terms of a common IV potentially eliminating one in the process and 2) it moves any offset/indexing/f(i) style logic out of the loop.

This turns out to actually be pretty easy to implement. SCEV already has all the information we need to know what the backedge taken count is for each individual exit. (We use that when computing the BE taken count for the loop as a whole.) We basically just need to iterate through the exiting blocks and apply the existing logic with the exit specific BE taken count. (The previously landed NFC makes this super obvious.)

I chose to go ahead and apply this to all loop exits instead of only latch exits as originally proposed. After reviewing other passes, the only case I could find where LFTR form was harmful was LoopPredication. I've fixed the latch case, and guards aren't LFTRed anyways. We'll have some more work to do on the way towards widenable_conditions, but that's easily deferred.

I do want to note that I added one bit after the review.  When running tests, I saw a new failure (no idea why didn't see previously) which pointed out LFTR can rewrite a constant condition back to a loop varying one.  This was theoretically possible with a single exit, but the zero case covered it in practice.  With multiple exits, we saw this happening in practice for the eliminate-comparison.ll test case because we'd compute a ExitCount for one of the exits which was guaranteed to never actually be reached.  Since LFTR ran after simplifyAndExtend, we'd immediately turn around and undo the simplication work we'd just done.  The solution seemed obvious, so I didn't bother with another round of review.

Differential Revision: https://reviews.llvm.org/D62625

llvm-svn: 363883
2019-06-19 21:58:25 +00:00
Alina Sbirlea 109d2ea153 [MemorySSA] Cleanup trivial phis.
Summary:
This is unfortunately needed for correctness, if we are to extend the tolerance of the update API to the way simple loop unswitch is doing cloning.

In simple loop unswitch (as opposed to loop unswitch), not all blocks are cloned. This can create unreachable cloned blocks (no predecessor), which are later cleaned up.

In MemorySSA, the  APIs for supporting these kind of updates (clone + update exit blocks), make certain assumption on the integrity of the CFG. When cloning, if something was not cloned, it's values in MemorySSA default to LiveOnEntry. When updating exit blocks, it is safe to assume that we can first insert phis in the blocks merging two clones, then add additional phis in the IDF of the blocks that received phis. This no longer holds true if one of the clones being merged comes from an unreachable block. We'd conservatively need to add all phis before filling in their incoming definitions. In practice this restriction can be relaxed if we clean up trivial phis after the first round of insertion.

Reviewers: george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63354

llvm-svn: 363880
2019-06-19 21:33:09 +00:00
Alina Sbirlea 238b8e62b6 [MemorySSA] Use GraphDiff info when computing IDF.
Summary:
When computing IDF for insert updates, ensure we use the snapshot CFG offered by GraphDiff.
Caught by D63389.

Reviewers: kuhar, george.burgess.iv

Subscribers: jlebar, Prazek, llvm-commits, Szelethus

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63443

llvm-svn: 363879
2019-06-19 21:17:31 +00:00