53263 Commits

Author SHA1 Message Date
Tom Stellard cc0bc941d4 AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructions
Summary:
The LoadStoreOptimizer was creating instructions with 2
MachineMemOperands, which meant they were assumed to alias with all other instructions,
because MachineInstr::mayAlias() returns true when an instruction has multiple
MachineMemOperands.

This was preventing these instructions from being merged again, and was
giving the scheduler less freedom to reorder them.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65036

llvm-svn: 367237
2019-07-29 16:40:58 +00:00
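A toy model of the problem described in the LoadStoreOptimizer commit above, assuming (as the message states) that an instruction carrying more than one memory operand is treated as aliasing everything. The names `MemOperand`, `Instr`, `may_alias`, and `combine_mmos` are hypothetical and are not LLVM's API; a real MachineMemOperand carries far more than an offset and a size.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemOperand:
    """Hypothetical stand-in for a memory operand: bytes [offset, offset + size)."""
    offset: int
    size: int


@dataclass
class Instr:
    name: str
    mem_operands: List[MemOperand]


def may_alias(a: Instr, b: Instr) -> bool:
    """Conservative query: anything without exactly one operand aliases everything."""
    if len(a.mem_operands) != 1 or len(b.mem_operands) != 1:
        return True
    x, y = a.mem_operands[0], b.mem_operands[0]
    return x.offset < y.offset + y.size and y.offset < x.offset + x.size


def combine_mmos(first: MemOperand, second: MemOperand) -> MemOperand:
    """Merge two operands into a single operand covering both byte ranges."""
    start = min(first.offset, second.offset)
    end = max(first.offset + first.size, second.offset + second.size)
    return MemOperand(start, end - start)


lo, hi = MemOperand(0, 4), MemOperand(4, 4)
other = Instr("load@64", [MemOperand(64, 4)])

merged_two_mmos = Instr("merged", [lo, hi])
merged_one_mmo = Instr("merged", [combine_mmos(lo, hi)])

print(may_alias(merged_two_mmos, other))  # conservative answer
print(may_alias(merged_one_mmo, other))   # answer derived from the single range
```

With one combined operand the merged instruction can be proven independent of the unrelated load, which is the property the commit relies on for further merging and scheduling.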
Jay Foad 3bdcedbf3d [AMDGPU] Fix typo in error message
llvm-svn: 367235
2019-07-29 16:17:13 +00:00
Simon Pilgrim 5ab948f823 [X86] combineX86ShufflesRecursively - start recursion at depth = 0. NFCI.
As discussed on rL367171, we have a problem where the depth recursion used in combineX86ShufflesRecursively was subtly different to computeKnownBits etc. - it starts at Depth=1 instead of Depth=0 like the others and has a different maximum recursion depth.

This NFC patch fixes the recursion depth to start at 0, so we can more easily reuse depth values in calls from combineX86ShufflesRecursively and its helper functions in computeKnownBits etc.

llvm-svn: 367232
2019-07-29 15:57:06 +00:00
Francis Visoiu Mistrih d42289e291 [RISCV] Fix uninitialized variable after call to evaluateConstantImm
For llvm/test/MC/RISCV/rv64i-aliases-invalid.s, UBSan reports:

lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9: runtime error:
load of value 3879186881, which is not a valid value for type
'RISCVMCExpr::VariantKind'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp:371:9 in

It turns out that evaluateConstantImm does not set `VK` and it remains
uninitialized when doing comparisons in `isImmXLenLI()`.

Differential Revision: https://reviews.llvm.org/D65347

llvm-svn: 367230
2019-07-29 15:52:13 +00:00
Jay Foad dcb7532479 [DivergenceAnalysis] Add methods for querying divergence at use
Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is uniform at its
definition, a use of it in another basic block can be divergent because
of divergent control flow between the def and the use.

This patch adds new isDivergent(Use) methods to DivergenceAnalysis,
LegacyDivergenceAnalysis and GPUDivergenceAnalysis.

This might allow D63953 or other similar workarounds to be removed.

Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue

Reviewed By: nhaehnle

Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65141

llvm-svn: 367218
2019-07-29 10:22:09 +00:00
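The distinction between divergence at the definition and divergence at a use can be sketched with a small lane simulation. This is only an illustration under assumed semantics (a do-while loop with a per-lane trip count); `run_lanes` is a hypothetical helper and has nothing to do with the DivergenceAnalysis implementation.

```python
from typing import Dict, List


def run_lanes(trip_counts: List[int]) -> Dict[str, object]:
    """Simulate lanes running `i = 0; do { i += 1 } while (i < trip_count[lane])`.

    Reports whether `i` was uniform across the active lanes at every
    definition, and the value each lane sees at a use placed after the loop.
    """
    value_at_use = [0] * len(trip_counts)
    active = set(range(len(trip_counts)))
    uniform_at_def = True
    i = 0
    while active:
        i += 1  # the definition: every active lane computes the same value
        defined = {lane: i for lane in active}
        uniform_at_def &= len(set(defined.values())) == 1
        for lane in list(active):
            value_at_use[lane] = defined[lane]
            if i >= trip_counts[lane]:  # divergent loop exit
                active.remove(lane)
    return {
        "uniform_at_def": uniform_at_def,
        "divergent_at_use": len(set(value_at_use)) > 1,
        "value_at_use": value_at_use,
    }


result = run_lanes([1, 3, 2, 3])
print(result)
```

Inside the loop all active lanes agree on `i`, yet after the divergent exit each lane holds the value from its own last iteration, so a query at the use gives a different answer than a query at the definition.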
Sam Parker 414dd1c946 [NFC][ARM][ParallelDSP] Cleanup of BinOpChain
- Remove some unused typedefs.
- Rename BinOpChain struct to MulCandidate.
- Remove the size method of MulCandidate.
- Store only the first input of the ValueList provided to
  MulCandidate, as it's the only value we care about. This means we
  don't have to perform any ugly (and unnecessary) iterations of the 
  list later on.

llvm-svn: 367208
2019-07-29 08:41:51 +00:00
David Stuttard 20235ef3e7 [AMDGPU] Enable v4f16 and above for v_pk_fma instructions
Summary:
If isel is presented with <2 x half> vectors then it will correctly select
v_pk_fma style instructions.
If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for
other instruction types (such as fadd, fmul etc.)

Added extra support to enable this. Updated one of the tests to include a test
for this (as well as extending the test to GFX9)

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65325

Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee
llvm-svn: 367206
2019-07-29 08:15:10 +00:00
Sam Parker 8538060103 [NFC][ARM][ParallelDSP] Remove AreSymmetrical
We explicitly search for a parallel mac and we only care about its
inputs, checking for symmetry doesn't add anything here.

llvm-svn: 367205
2019-07-29 08:12:24 +00:00
Sam Parker 11ad33ede6 [NFC][ARM][ParallelDSP] Remove PopulateLoads
We no longer have to check what loads are used, all this
is performed at the start of the transform, so it's not
doing anything now.

llvm-svn: 367204
2019-07-29 08:07:23 +00:00
Craig Topper eb1beabad9 [X86] Don't use PMADDWD for vector add reductions of multiplies if the mul inputs have an additional user.
The pmaddwd inserts a truncate, if that truncate would end up
creating additional instructions instead of making a zext
narrower, then we shouldn't do it.

I've restricted this to only sse4.1 targets since on prior
targets the zext will be done in stages. So the truncate will
probably not create additional instructions. Might need some
more investigation of mul shrinking and the other pmaddwd
transform to be sure this is the right decision.

There might be a slight regression on AVX1 targets due to add
splitting. Hard to say for sure. Maybe we need to look into
using the vector reduction flag to use 2 narrow loads and a
blend instead of extracting and inserting.

llvm-svn: 367198
2019-07-29 01:36:58 +00:00
Craig Topper 894916cac9 [X86] In combineLoopMAddPattern and combineLoopSADPattern, preserve the vector reduction flag on the final add. Handle unrolled loops by letting DAG combine revisit.
This reverts r340478 and r340631 and replaces them with a simpler
method of just letting DAG combine revisit the nodes to handle
the other operand.

llvm-svn: 367195
2019-07-28 18:45:42 +00:00
David Green b8b8b46a51 [ARM] MVE VPNOT
This adds the patterns required to transform xor P0, -1 to a VPNOT. The
instruction operands have to change a little for this, adding an in and an out
VCCR reg and using a custom DecodeMVEVPNOT for the decode.

Differential Revision: https://reviews.llvm.org/D65133

llvm-svn: 367192
2019-07-28 14:07:48 +00:00
David Green 9cf344e739 [ARM] Better patterns for fp <> predicate vectors
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.

Differential Revision: https://reviews.llvm.org/D65103

llvm-svn: 367191
2019-07-28 13:53:39 +00:00
Simon Pilgrim 353a848473 [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler (Reapplied)
Recommit rL367100 which was reverted at rL367141. Until PR42777 is fixed, we no longer get the benefits of peeking through bitcasts but it does still remove a GetDemandedBits user and gives us the equivalent combines.

llvm-svn: 367172
2019-07-27 13:30:29 +00:00
Amara Emerson 7bc4fad0fb [AArch64][GlobalISel] Implement narrowing of G_SEXT.
We need this to narrow a sext to s128.

Differential Revision: https://reviews.llvm.org/D65357

llvm-svn: 367164
2019-07-26 23:46:38 +00:00
Jessica Paquette aa8b9993c2 [AArch64][GlobalISel] Select @llvm.aarch64.stlxr for 32-bit pointers
Add partial instruction selection for intrinsics like this:

```
declare i32 @llvm.aarch64.stlxr(i64, i32*)
```

(This only handles the case where a G_ZEXT is feeding the intrinsic.)

Also make sure that the added store instruction actually has the memory op from
the original G_STORE.

Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D65355

llvm-svn: 367163
2019-07-26 23:28:53 +00:00
Vlad Tsyrklevich 485b8789de Revert "[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler."
This reverts r367100, it appears to be causing test failures after
Nico's revert of r367091.

llvm-svn: 367141
2019-07-26 18:14:21 +00:00
Sean Fertile 9df6177d38 [PowerPC][AIX]Add lowering of MCSymbol MachineOperand.
Adds machine operand lowering for MCSymbolSDNodes to the PowerPC
backend. This is needed to produce call instructions in assembly for AIX
because the callee operand is a MCSymbolSDNode. The test is XFAIL'ed for
asserts due to a (valid) assertion in PEI that the AIX ABI isn't supported yet.

Differential Revision: https://reviews.llvm.org/D63738

llvm-svn: 367133
2019-07-26 17:25:27 +00:00
Michael Liao 711556e6a8 [AMDGPU] Fix typo.
llvm-svn: 367131
2019-07-26 17:13:59 +00:00
Cullen Rhodes 2cde8b5db6 [AArch64][SVE2] Rename bitperm feature to sve2-bitperm
Summary:
The bitperm feature flag is now prefixed with SVE2, as it is for all other SVE2
extensions

Patch by Maciej Gabka.

Reviewers: sdesmalen, rovka, chill, SjoerdMeijer, rengolin

Reviewed By: SjoerdMeijer, rengolin

Differential Revision: https://reviews.llvm.org/D65327

llvm-svn: 367124
2019-07-26 15:57:50 +00:00
Sam Parker 3da59e5513 [ARM][ParallelDSP] Combine structs
Combine OpChain and BinOpChain structs as OpChain is a base class to
BinOpChain that is never used.

llvm-svn: 367114
2019-07-26 14:11:40 +00:00
Sean Fertile 9bd22fec0d [PowerPC] Add getCRSaveOffset to improve readability. [NFC]
In preparation for AIX support in FrameLowering: replace a number of literal
'8' that represent the stack offset of the condition register save area with
a member in PPCFrameLowering.

Patch by Chris Bowler.

llvm-svn: 367111
2019-07-26 14:02:17 +00:00
Petar Avramovic cf21794566 [MIPS GlobalISel] Fix check for void return during lowerCall
A void return used to be represented by an unsigned virtual register with
value 0, but with the addition of the Register class and the changes to the
arguments of lowerCall this is no longer valid.
Check for void return by inspecting the Ty field in OrigRet.

Differential Revision: https://reviews.llvm.org/D65321

llvm-svn: 367107
2019-07-26 13:19:37 +00:00
Carl Ritson 0b28357053 [AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAG
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65328

llvm-svn: 367105
2019-07-26 13:11:44 +00:00
Petar Avramovic b1fc6f6130 [MIPS GlobalISel] Select inttoptr and ptrtoint
Select G_INTTOPTR and G_PTRTOINT for MIPS32.

Differential Revision: https://reviews.llvm.org/D65217

llvm-svn: 367104
2019-07-26 13:08:06 +00:00
Simon Pilgrim d93e8ece7b [X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler.
This removes a GetDemandedBits user and allows us to benefit from the DemandedElts propagated through SimplifyDemandedBits.

llvm-svn: 367100
2019-07-26 11:10:20 +00:00
Sam Parker 7440065bd8 [NFC][ARM][ParallelDSP] Cleanup isNarrowSequence
Remove unused logic.

llvm-svn: 367099
2019-07-26 10:57:42 +00:00
Carl Ritson 00e89b428b [AMDGPU] Add llvm.amdgcn.softwqm intrinsic
Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm
only if there is other WQM computation in the shader.

Reviewers: nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64935

llvm-svn: 367097
2019-07-26 09:54:12 +00:00
Momchil Velikov 898d953693 [AArch64] Define ETE and TRBE system registers
Embedded Trace Extension and Trace Buffer Extension are optional
future architecture extensions.
(cf. https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools)

Their system registers are documented here:
https://developer.arm.com/docs/ddi0601/a

ETE shares register names with ETM. One exception is the ETE
TRCEXTINSELR0 register, which has the same encoding as the ETM
TRCEXTINSELR register (but different semantics). This patch treats
them as aliases: the assembler will accept both names, emitting
identical encoding, and the disassembler will keep disassembling
to TRCEXTINSELR.

Differential Revision: https://reviews.llvm.org/D63707

llvm-svn: 367093
2019-07-26 09:19:08 +00:00
Sam Parker c760b5da11 [ARM][LowOverheadLoops] Add CPSR defs
Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair,
so add an implicit def to these pseudo instructions in case that WLS
and LE aren't generated.

Differential Revision: https://reviews.llvm.org/D65275

llvm-svn: 367089
2019-07-26 08:15:01 +00:00
Pengfei Wang 9ad565f70e [WinEH] Allocate space in funclets stack to save XMM CSRs
Summary:
This is an alternate approach to D57970.
Currently funclets reuse the same stack slots that are used in the
parent function for saving callee-saved xmm registers. If the parent
function modifies a callee-saved xmm register before an exception is
thrown, the catch handler will overwrite the original saved value.

This patch allocates space in funclets stack for saving callee-saved xmm
registers and uses RSP instead of RBP to access memory.

Reviewers: andrew.w.kaylor, LuoYuanke, annita.zhang, craig.topper,
RKSimon

Subscribers: rnk, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63396

Signed-off-by: pengfei <pengfei.wang@intel.com>
llvm-svn: 367088
2019-07-26 07:33:15 +00:00
Matt Arsenault a9ea8a9aae AMDGPU/GlobalISel: Handle most function return types
handleAssignments gives up pretty easily on structs, and i8 values for
some reason. The other case that doesn't work is when an implicit sret
needs to be inserted if the return size exceeds the number of return
registers.

llvm-svn: 367082
2019-07-26 02:36:05 +00:00
Amara Emerson c07fe307b4 [AArch64][GlobalISel] Simplify zext/sext selection, use MachineIRBuilder. NFC.
llvm-svn: 367075
2019-07-26 00:01:09 +00:00
Yonghong Song 329abf2939 [BPF] fix typedef issue for offset relocation
Currently, the CO-RE offset relocation does not work
if any struct/union member or array element is a typedef.
For example,
  typedef const int arr_t[7];
  struct input {
      arr_t a;
  };
  func(...) {
       struct input *in = ...;
       ... __builtin_preserve_access_index(&in->a[1]) ...
  }
The BPF backend calculated default offset is 0 while
4 is the correct answer. Similar issues exist for struct/union
typedef's.

When getting struct/union member or array element type,
we should trace down to the type by skipping typedef
and qualifiers const/volatile as this is what clang did
to generate getelementptr instructions.
(const/volatile member type qualifiers are already
ignored by clang.)

This patch fixes the issue by skipping typedef and
const/volatile/restrict BTF types for each access index.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D65259

llvm-svn: 367062
2019-07-25 21:47:27 +00:00
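The offset calculation in the typedef commit above can be sketched with a toy type model. Everything here is an assumption made for illustration: the classes, the `strip` and `offset_of` helpers, the 4-byte `int`, and the simplified index list (member index, then array index) are not the BPF backend's data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Int:
    size: int = 4


@dataclass
class Qualified:  # const / volatile / restrict wrapper
    qualifier: str
    base: "Type"


@dataclass
class Typedef:
    name: str
    base: "Type"


@dataclass
class Array:
    elem: "Type"
    count: int


@dataclass
class Struct:
    members: List[Tuple[str, "Type"]]


Type = Union[Int, Qualified, Typedef, Array, Struct]


def strip(ty: Type) -> Type:
    """Skip typedefs and qualifiers until a concrete type is reached."""
    while isinstance(ty, (Typedef, Qualified)):
        ty = ty.base
    return ty


def size_of(ty: Type) -> int:
    ty = strip(ty)
    if isinstance(ty, Int):
        return ty.size
    if isinstance(ty, Array):
        return ty.count * size_of(ty.elem)
    # Struct: padding is not modelled in this toy layout.
    return sum(size_of(member) for _, member in ty.members)


def offset_of(ty: Type, indices: List[int], skip_wrappers: bool = True) -> int:
    """Byte offset reached by walking `indices` through members and elements."""
    offset = 0
    for index in indices:
        if skip_wrappers:
            ty = strip(ty)
        if isinstance(ty, Struct):
            offset += sum(size_of(m) for _, m in ty.members[:index])
            ty = ty.members[index][1]
        elif isinstance(ty, Array):
            offset += index * size_of(ty.elem)
            ty = ty.elem
        else:
            break  # wrapper not recognised: remaining indices add nothing
    return offset


# typedef const int arr_t[7]; struct input { arr_t a; };
arr_t = Typedef("arr_t", Array(Qualified("const", Int()), 7))
struct_input = Struct([("a", arr_t)])

print(offset_of(struct_input, [0, 1], skip_wrappers=False))  # typedef hides the array
print(offset_of(struct_input, [0, 1]))                       # typedef skipped
```

Without stripping, the walk stops at the typedef and the element index contributes nothing, which mirrors the offset of 0 reported in the message; with stripping, the second element of `a` lands at byte 4.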
Amara Emerson e54dc6b8b5 [AArch64][GlobalISel] Fix G_SELECT legalization fallback after r366943.
Changes the order of legalization of G_ICMP suggested by Petar in D65079.

llvm-svn: 367060
2019-07-25 21:44:52 +00:00
Yonghong Song d8efec97be [BPF] fix CO-RE incorrect index access string
Currently, we expect the CO-RE offset relocation to record
a string encoding the original getelementptr access index,
so that the kernel bpf loader can decode it correctly.

For example,
  struct s { int a; int b; };
  struct t { int c; int d; };
  #define _(x) (__builtin_preserve_access_index(x))
  int get_value(const void *addr1, const void *addr2);
  int test(struct s *arg1, struct t *arg2) {
    return get_value(_(&arg1->b), _(&arg2->d));
  }

We expect two offset relocations:
  reloc 1: type s, access index 0, 1
  reloc 2: type t, access index 0, 1

Two globals are created to retain access indexes for the
above two relocations with global variable names.
The first global has a name "0:1:". Unfortunately,
the second global has the name "0:1:.1" as the llvm
internals automatically add suffix ".1" to a global
with the same name. Later on, the BPF backend peels the last
character and records "0:1" and "0:1:." in the
relocation table.

This is not desirable. BPF backend could use the global
variable suffix knowledge to generate correct access str.
This patch rather took an approach not relying on
that knowledge. It generates "s:0:1:" and "t:0:1:" to
avoid global variable suffixes and later on generate
correct index access string "0:1" for both records.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D65258

llvm-svn: 367030
2019-07-25 16:01:26 +00:00
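The name collision described in the access-string commit above can be reproduced with a few lines of string handling. This is a sketch under assumptions: `Globals` is a hypothetical stand-in for the automatic ".N" renaming the message mentions, and the way `new_scheme` extracts the index string is one plausible parse, not necessarily what the backend does.

```python
from typing import Dict, List, Tuple

Access = Tuple[str, List[int]]  # (type name, access indices)


class Globals:
    """Toy symbol table that renames duplicates by appending '.N'."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def create(self, name: str) -> str:
        seen = self._counts.get(name, 0)
        self._counts[name] = seen + 1
        return name if seen == 0 else f"{name}.{seen}"


def old_scheme(accesses: List[Access]) -> List[str]:
    table = Globals()
    records = []
    for _type_name, indices in accesses:
        name = table.create(":".join(map(str, indices)) + ":")
        records.append(name[:-1])  # peel the last character
    return records


def new_scheme(accesses: List[Access]) -> List[str]:
    table = Globals()
    records = []
    for type_name, indices in accesses:
        name = table.create(f"{type_name}:" + ":".join(map(str, indices)) + ":")
        # Keep only the text between the type prefix and the final ':'.
        first, last = name.index(":") + 1, name.rindex(":")
        records.append(name[first:last])
    return records


accesses = [("s", [0, 1]), ("t", [0, 1])]
print(old_scheme(accesses))
print(new_scheme(accesses))
```

Prefixing the type name keeps the two globals distinct, so no suffix is appended and both records decode to the same index string.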
Michael Liao 53f967f2bd [AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs.
Summary:
- As LCSSA is turned on just before isel, it may create PHI of the flow,
  which is consumed by pseudo structurized CFG instructions. When those
  PHIs are eliminated at O0, the COPY may be placed wrongly, as these
  pseudo structurized CFG instructions are considered part of the prologue of the MBB.
- Run extra `unreachable-mbb-elimination` at the end of isel to clean up
  PHIs.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64353

llvm-svn: 367023
2019-07-25 14:50:18 +00:00
Momchil Velikov a655f476b0 [AArch64][SVE] Allow explicit size specifier for predicate operand
... for the vector forms of `{SQ,UQ,}{INC,DEC}P` instructions. Also continue
supporting the existing behaviour of not requiring an explicit size
specifier. The preferred disassembly is *with* the specifier.

This is implemented by redefining instruction forms to require vector predicates
with explicit size and adding aliases, which allow a predicate with no size.

Differential Revision: https://reviews.llvm.org/D65145

llvm-svn: 367019
2019-07-25 13:56:04 +00:00
Matt Arsenault a85af76c72 AMDGPU: Don't assert on v4f16 arguments to shader calling conventions
llvm-svn: 367018
2019-07-25 13:55:07 +00:00
Simon Pilgrim 447fe31964 [X86] concatSubVectors - remove unnecessary args. NFCI.
All these args can be cheaply recomputed and it makes it much easier to use the function as a quick helper.

llvm-svn: 367014
2019-07-25 13:05:46 +00:00
Pablo Barrio 275954539d [ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1
Summary:
Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse N1.
Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the
Arm architecture. Neoverse N1 implements both AArch32 and AArch64.

Cortex-A65:
https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65

Cortex-A65AE:
https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65ae

Neoverse E1:
https://developer.arm.com/ip-products/processors/neoverse/neoverse-e1

Neoverse N1:
https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1

Patch by Diogo Sampaio and Pablo Barrio

Reviewers: samparker, LukeCheeseman, sbaranga, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64406

llvm-svn: 367007
2019-07-25 10:59:45 +00:00
Kai Luo 985e52a4c1 [PowerPC][NFC] Make `getDefMIPostRA` public
llvm-svn: 366995
2019-07-25 08:36:44 +00:00
Kai Luo 5c8af53806 [PowerPC][NFC] Added `getDefMIPostRA` method
Summary:
In PostRA phase, we often have to find out the most recent definition
of a register.  This patch adds getDefMIPostRA so that other methods
can use it rather than implementing it repeatedly.

Differential Revision: https://reviews.llvm.org/D65131

llvm-svn: 366990
2019-07-25 07:47:52 +00:00
Seiya Nuta 21277e3ec2 [MC] Add MCInstrAnalysis::evaluateMemoryOperandAddress
Summary:
Add a new method which tries to compute the target address referenced by an operand.

This patch supports x86_64 RIP-relative addressing for now.

It is necessary to print referenced symbol names in llvm-objdump.

Reviewers: andreadb, MaskRay, grosbach, jgalenson, craig.topper

Reviewed By: MaskRay, craig.topper

Subscribers: bcain, rupprecht, jhenderson, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63847

llvm-svn: 366987
2019-07-25 06:57:09 +00:00
Eli Friedman 82e109279d [ARM] Remove dead code from ARMConstantIslands.
tLDRHi is not a pc-relative load; it can't directly refer to a
constant pool or jump table.

llvm-svn: 366963
2019-07-24 23:36:14 +00:00
Jessica Paquette 728b18f29f [AArch64][GlobalISel] Select immediate modes for ADD when selecting G_GEP
Before, we weren't able to select things like this for G_GEP:

add	x0, x8, #8

And instead we'd materialize the 8.

This teaches GISel to do that. It gives some considerable code size savings
on 252.eon-- about 4%!

Differential Revision: https://reviews.llvm.org/D65248

llvm-svn: 366959
2019-07-24 23:11:01 +00:00
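A rough sketch of the decision behind the G_GEP commit above: use the immediate form of ADD when the offset is encodable, and otherwise materialize the constant. The encodability rule used here (a 12-bit unsigned immediate, optionally shifted left by 12) is the AArch64 ADD (immediate) encoding; the helper names, the scratch register, and the single-`mov` fallback are simplifications invented for this example and are not GlobalISel code.

```python
from typing import List, Optional, Tuple


def encode_add_imm(value: int) -> Optional[Tuple[int, int]]:
    """Return (imm12, shift) if `value` fits an AArch64 ADD immediate, else None."""
    if 0 <= value < (1 << 12):
        return value, 0
    if value > 0 and value % (1 << 12) == 0 and (value >> 12) < (1 << 12):
        return value >> 12, 12
    return None


def select_pointer_add(base: str, dst: str, offset: int, scratch: str = "x9") -> List[str]:
    """Emit toy assembly for `dst = base + offset`."""
    encoded = encode_add_imm(offset)
    if encoded is None:
        # Fallback: materialize the constant first (one mov is a simplification).
        return [f"mov {scratch}, #{offset}", f"add {dst}, {base}, {scratch}"]
    imm, shift = encoded
    suffix = f", lsl #{shift}" if shift else ""
    return [f"add {dst}, {base}, #{imm}{suffix}"]


print(select_pointer_add("x8", "x0", 8))
print(select_pointer_add("x8", "x0", 4097))
```

The first call yields the single instruction quoted in the message; the second shows the extra materialization that the immediate form avoids.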
Amara Emerson de81bd0faa [AArch64][GlobalISel] Don't try to use GISel if subtarget doesn't have neon or fp.
Throughout the legalizerinfo we currently make the assumption that the target
has neon and FP target features available. Fixing it will require a refactor of
the whole thing, so until then make sure we fall back.

Works around PR42734

Differential Revision: https://reviews.llvm.org/D65244

llvm-svn: 366957
2019-07-24 23:00:04 +00:00
Roman Lebedev 017e272c3a [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH

InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.

Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.

Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.

Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm

Reviewed By: spatel

Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62871

llvm-svn: 366955
2019-07-24 22:57:22 +00:00
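The fold in the commit above can be checked exhaustively at a small bit width. The sketch below assumes 8-bit values and shift amounts smaller than the width; `WIDTH`, `shl`, `lshr`, and `fold_holds` are names chosen for this example.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def shl(value: int, amount: int) -> int:
    """Left shift truncated to WIDTH bits."""
    return (value << amount) & MASK


def lshr(value: int, amount: int) -> int:
    """Logical right shift of a WIDTH-bit value."""
    return (value & MASK) >> amount


def fold_holds() -> bool:
    """Check both shift directions for every 8-bit X, C and in-range Y."""
    for x in range(1 << WIDTH):
        for c in range(1 << WIDTH):
            for y in range(WIDTH):
                # (X & (C l>> Y)) == 0  <=>  ((X << Y) & C) == 0
                if ((x & lshr(c, y)) == 0) != ((shl(x, y) & c) == 0):
                    return False
                # (X & (C << Y)) == 0  <=>  ((X l>> Y) & C) == 0
                if ((x & shl(c, y)) == 0) != ((lshr(x, y) & c) == 0):
                    return False
    return True


print(fold_holds())
```

Both sides test the same pairs of bits (bit i of X against bit i+Y or i-Y of C), and bits shifted out on one side correspond to positions that are zero-filled on the other, which is why the comparison against zero is preserved.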
Jessica Paquette 68499112cf [AArch64][GlobalISel] Fold G_MUL into XRO load addressing mode when possible
If we have a G_MUL, and either the LHS or the RHS of that mul is the legal
shift value for a load addressing mode, we can fold it into the load.

This gives some code size savings on some SPEC tests. The best are around 2%
on 300.twolf and 3% on 254.gap.

Differential Revision: https://reviews.llvm.org/D65173

llvm-svn: 366954
2019-07-24 22:49:42 +00:00
Amara Emerson 13af1ed8e3 [GlobalISel] Support for inlining memcpy, memset and memmove calls.
This introduces a new family of combiner helper routines that re-use the
target specific cost model from SelectionDAG, and generate inline implementations
of the memcpy family of intrinsics.

The combines are only enabled at optimization levels higher than -O0, and give
very substantial performance improvements.

Differential Revision: https://reviews.llvm.org/D65167

llvm-svn: 366951
2019-07-24 22:17:31 +00:00
Stanislav Mekhanoshin c43784ff26 [AMDGPU] Increase kernel padding
To support prefetch mode 3 we need to pad current
cacheline and fill 3 cachelines after. Current padding
is only sufficient for mode 2.

Differential Revision: https://reviews.llvm.org/D65236

llvm-svn: 366938
2019-07-24 19:40:13 +00:00
David Green cd7a6fa314 [ARM] Rewrite how VCMP are lowered, using a single node
This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and
VCMPZ with an extra operand as the condition code. I believe this will make
some combines simpler, allowing us to just look at these codes and not the
operands. It also helps fill in a missing VCGTUZ MVE selection without adding
extra nodes for it.

Differential Revision: https://reviews.llvm.org/D65072

llvm-svn: 366934
2019-07-24 17:36:47 +00:00
Simon Pilgrim 7d318b2bb1 [DAGCombine] matchBinOpReduction - add partial reduction matching
This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector:

e.g. <8 x i32> reduction pattern in a <16 x i32> vector:

<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>

matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.

I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think its worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.

Fixes the x86 partial reduction sum cases in PR33758 and PR42023.

Differential Revision: https://reviews.llvm.org/D65047

llvm-svn: 366933
2019-07-24 17:29:56 +00:00
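The three shuffle masks quoted in the partial-reduction commit above can be run on a toy vector model to see why the result lives entirely in the lower subvector. The helpers below (`shuffle`, `add`, `mask`, `reduce_low_half`) are assumptions for illustration, with `None` standing in for an undef lane; they are not SelectionDAG code.

```python
from typing import List, Optional

UNDEF = None
Lane = Optional[int]


def shuffle(vec: List[Lane], lanes: List[Lane]) -> List[Lane]:
    """Pick vec[m] for each mask entry; undef mask entries give undef lanes."""
    return [UNDEF if m is UNDEF else vec[m] for m in lanes]


def add(lhs: List[Lane], rhs: List[Lane]) -> List[Lane]:
    """Lane-wise add; any lane touching undef becomes undef."""
    return [UNDEF if a is UNDEF or b is UNDEF else a + b for a, b in zip(lhs, rhs)]


def mask(indices: List[int], width: int = 16) -> List[Lane]:
    """Pad the listed indices with undef up to the full vector width."""
    return indices + [UNDEF] * (width - len(indices))


def reduce_low_half(vec: List[Lane]) -> Lane:
    """Sum the low 8 lanes of a 16-lane vector with three shuffle+add stages."""
    for stage in ([4, 5, 6, 7], [2, 3], [1]):
        vec = add(vec, shuffle(vec, mask(stage)))
    return vec[0]


values = list(range(1, 17))
print(reduce_low_half(values), sum(values[:8]))
```

None of the stages ever reads lanes 8 to 15, so the same pyramid could operate on the extracted lower <8 x i32> subvector.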
David Green 047a0b6575 [ARM] Disable MVE fptosi and friends
This prevents us from trying to convert an i1 predicate vector to a float, or
vice-versa. Better patterns are possible, which will follow in a subsequent
commit. For now we just expand them.

Differential Revision: https://reviews.llvm.org/D65066

llvm-svn: 366931
2019-07-24 17:26:26 +00:00
Jessica Paquette c19c30776a [AArch64][GlobalISel] Make vector dup optimization look at last elt of ZeroVec
Fix an off-by-one error which made us not look at the last element of the
zero vector. This caused a miscompile in 188.ammp.

Differential Revision: https://reviews.llvm.org/D65168

llvm-svn: 366930
2019-07-24 17:18:51 +00:00
David Green b342bddbe2 [ARM] More MVE compare vector splat combines for ANDs
Adds some extra r register compare combines, this time for ANDs.

Differential Revision: https://reviews.llvm.org/D65062

llvm-svn: 366928
2019-07-24 17:08:09 +00:00
David Green 93b5f61295 [ARM] MVE compare vector splat combine
MVE VCMP instructions can use a general purpose register as the second operand.
This adds the combines for it, selecting from a compare of a vdup.

Differential Revision: https://reviews.llvm.org/D65061

llvm-svn: 366924
2019-07-24 16:58:41 +00:00
Dmitry Preobrazhensky 5e1dd02c90 [AMDGPU][MC][GFX10] Enabled GFX10 assembly with arbitrary wavesize assumed by the code
Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D65216

llvm-svn: 366921
2019-07-24 16:50:17 +00:00
David Green bab4d8ac5a [ARM] Better OR's for MVE compares
This adds a DeMorgan combine for OR's of compares to turn them into AND's,
helping prevent them from going into and out of gpr registers. It also fills in
the VCLE and VCLT nodes that MVE can select, allowing it to invert more
compares.

Differential Revision: https://reviews.llvm.org/D65059

llvm-svn: 366920
2019-07-24 16:42:09 +00:00
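The De Morgan rewrite mentioned in the commit above can be sketched on lists of booleans standing in for predicate vectors. This assumes integer lanes only (inverting a floating-point compare is not this simple once NaNs are involved), and the condition-code tables and function names are invented for the example.

```python
import operator
from itertools import product

# Each condition code maps to its comparison and to its logical inverse.
COMPARE = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
           "le": operator.le, "ge": operator.ge, "lt": operator.lt}
INVERSE = {"eq": "ne", "ne": "eq", "gt": "le", "le": "gt", "ge": "lt", "lt": "ge"}


def vcmp(cc, lhs, rhs):
    """Lane-wise compare producing a list of booleans (a toy predicate)."""
    return [COMPARE[cc](a, b) for a, b in zip(lhs, rhs)]


def or_of_compares(cc1, cc2, a, b, c, d):
    return [p or q for p, q in zip(vcmp(cc1, a, b), vcmp(cc2, c, d))]


def demorgan_form(cc1, cc2, a, b, c, d):
    """not(and(inverted compare, inverted compare)): no OR on predicates."""
    inverted = zip(vcmp(INVERSE[cc1], a, b), vcmp(INVERSE[cc2], c, d))
    return [not (p and q) for p, q in inverted]


def forms_agree() -> bool:
    """Compare both forms over all condition-code pairs and small integer lanes."""
    lanes = list(product(range(-1, 2), repeat=2))  # every (lhs, rhs) in {-1, 0, 1}
    lhs = [pair[0] for pair in lanes]
    rhs = [pair[1] for pair in lanes]
    return all(
        or_of_compares(cc1, cc2, lhs, rhs, rhs, lhs)
        == demorgan_form(cc1, cc2, lhs, rhs, rhs, lhs)
        for cc1 in COMPARE for cc2 in COMPARE
    )


print(forms_agree())
```

The rewrite needs an inverse for every condition code, which is consistent with the message's remark that adding the LE and LT forms lets more compares be inverted.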
Stanislav Mekhanoshin 5cdacea297 [AMDGPU] Add all vgpr classes to asm parser
Differential Revision: https://reviews.llvm.org/D65158

llvm-svn: 366917
2019-07-24 16:21:18 +00:00
Matt Arsenault 0e7d8698b5 AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts
The G_ANYEXT handling can end up reaching selectCOPY, which mutates
the instruction in place.

llvm-svn: 366915
2019-07-24 16:05:53 +00:00
David Green 69fba7434e [ARM] Better AND's for MVE compares
Add a number of folds to convert and(vcmp, vcmp) into a single VPT block, where
the second vcmp becomes predicated on the first.

The VCMP; VPST; VCMP will eventually be converted to VPT; VCMP in the
VPTBlockPass.

Differential Revision: https://reviews.llvm.org/D65058

llvm-svn: 366910
2019-07-24 14:42:05 +00:00
David Green 4fc78c496e [ARM] MVE floating point compares and selects
Much like integers, this adds MVE floating point compares and select. It
requires a lot more buildvector/shuffle code because we may need to expand the
compares without mve.fp, and requires support for and/or because of the way we
lower llvm condition codes.

Some original code by David Sherwood

Differential Revision: https://reviews.llvm.org/D65054

llvm-svn: 366909
2019-07-24 14:28:22 +00:00
David Green a4a4698c16 [ARM] Basic And/Or/Xor handling for MVE predicates
This adds some basic, "worst case" handling for MVE predicate Or/And/Xor. It
does this by going into and out of GPRs, doing the operation on scalars.

Code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65053

llvm-svn: 366907
2019-07-24 14:17:54 +00:00
Simi Pallipurath 724888af45 [ARM] Make sure that the constant pool does not keep in the middle of an IT block.
This change makes sure that llvm does not emit an invalid IT block
by putting the constant pool in the middle of an IT block.

We have code to try to avoid putting a constant island in the middle of an
IT block, but it only works if we see an IT between the instruction currently
referencing the CPE and the possible insertion point. If the first instruction
we look at is the VLDRD after the IT, we never see the IT and do not
realize that the instruction doing the load could be in an IT block itself.

Differential Revision: https://reviews.llvm.org/D64621

Change-Id: I24cecb37cded75e8992870bd997f6226853bd920
llvm-svn: 366905
2019-07-24 13:54:14 +00:00
Sjoerd Meijer a19f5a76e6 Test commit. NFC.
Removed 2 trailing whitespaces in 2 files that used to be in different
repos to test my new github monorepo workflow.

llvm-svn: 366904
2019-07-24 13:30:36 +00:00
David Green c7e55d4f52 [ARM] MVE predicate register support
This adds support code for building and shuffling i1 predicate registers. It
generally uses two basic principles, either converting the predicate into a
scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or
by converting the register to a full vector register and back.

Some of the code here is not super efficient but will hopefully cover most
cases of moving i1 vectors around and can be improved in subsequent patches.

Some code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65052

llvm-svn: 366890
2019-07-24 11:51:36 +00:00
David Green b9d96ceca0 [ARM] MVE integer compares and selects
This adds the very basics for MVE vector predication, adding integer VCMP and
VSEL instruction support. This is done through predicate registers (MVT::v16i1,
MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom
lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).

An extra VCNE was added, as this can be handled sensibly by MVE's expanded
number of VCMP condition codes. (There are also VCLE and VCLT which are added
later).

VPSEL is also added here, simply selecting on the vselect.

Original code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65051

llvm-svn: 366885
2019-07-24 11:08:14 +00:00
Sam Parker aeb21b96a0 [ARM][ParallelDSP] Fix pointer operand reordering
While combining two loads into a single load, we often need to
reorder the pointer operands for the new load. This reordering was
broken in the cases where there was a chain of values that built up
the pointer.

Differential Revision: https://reviews.llvm.org/D65193

llvm-svn: 366881
2019-07-24 09:38:39 +00:00
Chen Zheng 8b7e82be12 [PowerPC][NFC] use opcode instead of MachineInstr for instrHasImmForm().
llvm-svn: 366867
2019-07-24 04:50:23 +00:00
Fangrui Song 305ace7cc8 [AArch64] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r366857
llvm-svn: 366866
2019-07-24 01:59:44 +00:00
Amara Emerson 511f7f5785 [AArch64][GlobalISel] Add support for s128 loads, stores, extracts, truncs.
We need to be able to load and store s128 for memcpy inlining, where we want to
generate Q register mem ops. Making these legal also requires that we add some
support in other instructions. Regbankselect should also know about these since
they have no GPR register class that can hold them, so need special handling to
live on the FPR bank.

Differential Revision: https://reviews.llvm.org/D65166

llvm-svn: 366857
2019-07-23 22:05:13 +00:00
Jessica Paquette a2fae1e3e9 [GlobalISel][AArch64] Save a copy on G_SELECT by fixing condition to GPR
The condition can never be fed by FPRs, so it should always be on a GPR.

Differential Revision: https://reviews.llvm.org/D65157

llvm-svn: 366854
2019-07-23 21:39:50 +00:00
Eli Friedman b27fc95e89 [ARM] Add opt-bisect support to ARMParallelDSP.
llvm-svn: 366851
2019-07-23 20:48:46 +00:00
Yi-Hong Lyu 41a010a4ef [PowerPC] Remove redundant load immediate instructions
Currently PowerPC backend emits code like this:

  r3 = li 0
  std r3, 264(r1)
  r3 = li 0
  std r3, 272(r1)

This patch fixes that and other cases where a register already contains a value that is loaded so we will get:

  r3 = li 0
  std r3, 264(r1)
  std r3, 272(r1)
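
The before/after sequences above can be modelled with a small sketch. This is not the PowerPC pass itself, only an illustration of the idea under stated assumptions: instructions are hypothetical `(opcode, operands...)` tuples, `li` is the only instruction whose result value is tracked, and any opcode other than `std` is assumed to overwrite its first operand.

```python
def remove_redundant_li(instrs):
    """Drop `li reg, imm` when `reg` is already known to hold `imm`.

    Illustrative only: the tuple format and the opcode handling are
    assumptions of this sketch, not the backend's data structures.
    """
    known = {}  # register name -> immediate it is known to hold
    out = []
    for op, *args in instrs:
        if op == "li":
            reg, imm = args
            if known.get(reg) == imm:
                continue  # register already holds this value, skip the reload
            known[reg] = imm
        elif op != "std":
            # Assume any other instruction redefines its first operand.
            known.pop(args[0], None)
        out.append((op, *args))
    return out


before = [
    ("li", "r3", 0),
    ("std", "r3", "264(r1)"),
    ("li", "r3", 0),
    ("std", "r3", "272(r1)"),
]
after = remove_redundant_li(before)
```

Here `after` keeps the first `li` and both stores, matching the shape of the second listing.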

Differential Revision: https://reviews.llvm.org/D64220

llvm-svn: 366840
2019-07-23 19:11:07 +00:00
Craig Topper 76bc3d6e07 [X86] In lowerVectorShuffle, instead of creating a new node to canonicalize the shuffle mask by commuting, just commute the mask and swap V1/V2.
LegalizeDAG tries to legalize the DAG by legalizing nodes before
their operands.

If we create a new node, we end up legalizing it after its operands.
This prevents some of the optimizations that can be done when the
operand is a build_vector since the build_vector will have been
legalized to something else.

Differential Revision: https://reviews.llvm.org/D65132

llvm-svn: 366835
2019-07-23 18:46:15 +00:00
Jessica Paquette 2b404d01e8 [GlobalISel][AArch64] Teach GISel to handle shifts in load addressing modes
When we select the XRO variants of loads, we can pull in very specific shifts
(of the size of an element). E.g.

```
ldr x1, [x2, x3, lsl #3]
```

This teaches GISel to handle these when they're coming from shifts
specifically.

This adds a new addressing mode function, `selectAddrModeShiftedExtendXReg`
which recognizes this pattern.

This also packs this up with `selectAddrModeRegisterOffset` into
`selectAddrModeXRO`. This is intended to be equivalent to `selectAddrModeXRO`
in AArch64ISelDAGtoDAG.
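
As a rough illustration of the addressing mode described above, the sketch below computes the effective address of `ldr x1, [x2, x3, lsl #3]` and checks that a shift amount matches the access size. The function names are hypothetical and do not mirror the GISel implementation; the foldability rule shown (shift equals log2 of the access size) is this sketch's reading of "shifts of the size of an element".

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def xro_address(base, index, shift):
    """Effective address of a register-offset load: base + (index << shift)."""
    return (base + (index << shift)) & MASK64


def shift_matches_access(shift, access_bytes):
    """True when the shift scales the index by exactly the access size."""
    return (1 << shift) == access_bytes


# An 8-byte load of element 3 from an array starting at 0x1000.
addr = xro_address(0x1000, 3, 3)
```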

Also update load-addressing-modes to show that all of the cases here work.

Differential Revision: https://reviews.llvm.org/D65119

llvm-svn: 366819
2019-07-23 16:09:42 +00:00
Sam Parker 57e87dd81b [ARM][LowOverheadLoops] Fix branch target codegen
While lowering test.set.loop.iterations, it wasn't checked how the
brcond was using the result and so the wls could branch to the loop
preheader instead of not entering it. The same was true for
loop.decrement.reg.
    
So brcond and br_cc are now lowered manually when using the hwloop
intrinsics. During this we now check whether the result has been
negated and whether we're using SETEQ or SETNE and 0 or 1. We can
then figure out which basic block the WLS and LE should be targeting.

Differential Revision: https://reviews.llvm.org/D64616

llvm-svn: 366809
2019-07-23 14:08:46 +00:00
Simon Pilgrim c60c12fb10 Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI.
llvm-svn: 366808
2019-07-23 14:04:54 +00:00
David Green fdedf240f8 [ARM] Rename NEONModImm to VMOVModImm. NFC
Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE.

llvm-svn: 366790
2019-07-23 09:19:24 +00:00
Zi Xuan Wu 57d17ec2e1 [PowerPC] Replace float load/store pair with integer load/store pair when it's only used in load/store
Replace float load/store pair with integer load/store pair when it's only used in load/store,
because float load/store instructions cost more cycles than integer load/store.

A typical scenario is a call passing more than 13 float arguments, where the extra arguments must be passed on the stack.
So we need a load/store pair to do such a memory operation if the variable is a global variable.

Differential Revision: https://reviews.llvm.org/D64195

llvm-svn: 366775
2019-07-23 03:34:40 +00:00
Matt Arsenault 827427f65b AMDGPU: Don't use SDNodeXForm for DS offset output
The xform has no real value when it's used on a complex pattern
output. The complex pattern was already creating TargetConstants with
i16, so this was just unnecessary machinery.

This allows global isel to import the simple cases once the complex
pattern is implemented.

llvm-svn: 366743
2019-07-22 21:38:11 +00:00
Craig Topper 510e6fadaa [X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it.
The build_vector will become a constant pool load. By using the
desired type initially, it ensures we don't generate a bitcast
of the constant pool load which will need to be folded with
the load.

While experimenting with another patch, I noticed that when the
load type and the constant pool type don't match, then
SimplifyDemandedBits can't handle it. While we should probably
fix that, this was a simple way to fix the issue I saw.

llvm-svn: 366732
2019-07-22 19:58:49 +00:00
Jason Liu 8dd563ef4b [NFC][PowerPC]Change ADDIStocHA to ADDIStocHA8 to follow 64-bit naming convention
Summary:

Since we are planning to add ADDIStocHA for 32-bit in a later patch, we decided
to change the 64-bit one first to follow the naming convention with 8 after the opcode.

Patch by: Xiangling_L

Differential Revision: https://reviews.llvm.org/D64814

llvm-svn: 366731
2019-07-22 19:55:33 +00:00
Sean Fertile 942537d9fa Stubs out TLOF for AIX and add support for common vars in assembly output.
Stubs out a TargetLoweringObjectFileXCOFF class, implementing only
SelectSectionForGlobal for common symbols. Also adds an override of
EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors
and adds support for emitting common globals.

llvm-svn: 366727
2019-07-22 19:15:29 +00:00
Sean Fertile 324d33dd4e [PowerPC] Fix comment on MO_PLT Target Operand Flag. [NFC]
Patch by Xiangling Liao.

llvm-svn: 366724
2019-07-22 18:47:59 +00:00
Sam Parker 4379a40088 [ARM][LowOverheadLoops] Revert remaining pseudos
ARMLowOverheadLoops would assert a failure if it did not find all the
pseudo instructions that comprise the hardware loop. Instead of doing
this, iterate through all the instructions of the function and revert
any remaining pseudo instructions that haven't been converted.

Differential Revision: https://reviews.llvm.org/D65080

llvm-svn: 366691
2019-07-22 14:16:40 +00:00
Matt Arsenault 937d0ee5d8 AMDGPU/GlobalISel: Remove unnecessary code
The minnum/maxnum cases are dead, and the cvt is handled by the
default.

llvm-svn: 366685
2019-07-22 13:05:25 +00:00
David Green 8876a312a8 [ARM] Fix for MVE VPT block pass
We need to ensure that the number of T's is correct when adding multiple
instructions into the same VPT block.

Differential revision: https://reviews.llvm.org/D65049

llvm-svn: 366684
2019-07-22 12:51:38 +00:00
Simon Pilgrim b3d719e1cf [X86] EltsFromConsecutiveLoads - support common source loads (REAPPLIED)
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.

A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offset loads are then matched against a previous load that has already been confirmed to be a consecutive match.

Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.

Fixed out of bounds load assert identified in rL366501

Differential Revision: https://reviews.llvm.org/D64551

llvm-svn: 366681
2019-07-22 12:44:10 +00:00
Christudasan Devadasan 006cf8c03d Added address-space mangling for stack related intrinsics
Modified the following 3 intrinsics:
int_addressofreturnaddress,
int_frameaddress & int_sponentry.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D64561

llvm-svn: 366679
2019-07-22 12:42:48 +00:00
Oliver Stannard 6771a89fa0 [IPRA][ARM] Make use of the "returned" parameter attribute
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.

Differential revision: https://reviews.llvm.org/D64986

llvm-svn: 366669
2019-07-22 08:44:36 +00:00
Jay Foad 298500ae33 [AMDGPU] Save some work when an atomic op has no uses
Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an
atomic op (i.e. the value that was in memory before the atomic op was
performed) is not used. NFC.

Reviewers: arsenm, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64981

llvm-svn: 366667
2019-07-22 07:19:44 +00:00
Simon Pilgrim 86fa3270ef [X86] SimplifyDemandedVectorEltsForTargetNode - Move SUBV_BROADCAST narrowing handling. NFCI.
Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes.

llvm-svn: 366660
2019-07-21 19:04:44 +00:00
Simon Pilgrim adec0f2252 [X86][SSE] Use PSADBW to improve vXi8 sum reduction (PR42674)
As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero, to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result.
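
A minimal sketch of why PSADBW against zero sums the bytes, assuming a single 8-byte group of unsigned lanes. The helper names are hypothetical and only model the arithmetic, not the DAG combine.

```python
def psadbw_with_zero(lanes):
    """Model PSADBW of one 8-byte group against an all-zero vector.

    The sum of absolute differences |b - 0| over unsigned bytes is just
    the sum of the bytes; 8 * 255 = 2040 always fits in the i16 result.
    """
    assert len(lanes) == 8 and all(0 <= b <= 0xFF for b in lanes)
    return sum(abs(b - 0) for b in lanes) & 0xFFFF


def reduce_add_v8i8(lanes):
    """i8 sum reduction: keep the bottom byte, discarding the overflow."""
    return psadbw_with_zero(lanes) & 0xFF


wide = psadbw_with_zero([200] * 8)   # 1600 in the 16-bit result
narrow = reduce_add_v8i8([200] * 8)  # 1600 mod 256
```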

llvm-svn: 366636
2019-07-20 15:20:11 +00:00
Jessica Paquette 41affad967 [GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs
Sometimes, you can end up with cross-bank copies between same-sized GPRs and
FPRs, which feed into G_STOREs. When these copies feed only into stores, they
aren't necessary; we can just store using the original register bank.

This provides some minor code size savings for some floating point SPEC
benchmarks. (Around 0.2% for 453.povray and 450.soplex)

This issue doesn't seem to show up due to regbankselect or anything similar. So,
this patch introduces an early select function, `contractCrossBankCopyIntoStore`
which performs the contraction when possible. The selector then continues
normally and selects the correct store opcode, eliminating needless copies
along the way.

Differential Revision: https://reviews.llvm.org/D65024

llvm-svn: 366625
2019-07-20 01:55:35 +00:00
Guanzhong Chen 5204f7611f [WebAssembly] Compute and export TLS block alignment
Summary:
Add immutable WASM global `__tls_align` which stores the alignment
requirements of the TLS segment.

Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang.

The expected usage has now changed to:

    __wasm_init_tls(memalign(__builtin_wasm_tls_align(),
                             __builtin_wasm_tls_size()));

Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton

Reviewed By: tlively

Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65028

llvm-svn: 366624
2019-07-19 23:34:16 +00:00
Matt Arsenault f3bfb85bce AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces
llvm-svn: 366621
2019-07-19 22:28:44 +00:00
Stanislav Mekhanoshin 05d9e6a2a3 [AMDGPU] Autogenerate register sequences in tuples
Differential Revision: https://reviews.llvm.org/D65007

llvm-svn: 366619
2019-07-19 21:43:42 +00:00
Stanislav Mekhanoshin 7b5a54e369 [AMDGPU] Fixed occupancy calculation for gfx10
Differential Revision: https://reviews.llvm.org/D65010

llvm-svn: 366616
2019-07-19 21:29:51 +00:00
Matt Arsenault 5e23f42820 AMDGPU: Avoid custom predicates for stores with glue
llvm-svn: 366613
2019-07-19 21:01:30 +00:00
Matt Arsenault e3401a9b86 AMDGPU: Redefine setcc condition PatLeafs
Avoid using custom code predicates.

llvm-svn: 366609
2019-07-19 20:24:40 +00:00
Matt Arsenault 48c0df5d46 AMDGPU: Don't rely on m0 being -1 for GWS offsets
This only works if the high bits of m0 are also 0, so m0 would have to
be set to 0xffff.

llvm-svn: 366608
2019-07-19 20:01:24 +00:00
Matt Arsenault 85f3890126 AMDGPU: Force s_waitcnt after GWS instructions
This is apparently required to be the immediately following
instruction, so force it into a bundle with a waitcnt.

llvm-svn: 366607
2019-07-19 19:47:30 +00:00
Stanislav Mekhanoshin 01fcf9238f [AMDGPU] Allow register tuples to set asm names
This change reverts most of the previous register name generation.
The real problem is that RegisterTuple does not generate asm names.
Added optional operand to RegisterTuple. This way we can simplify
register name access and dramatically reduce the size of static
tables for the backend.

Differential Revision: https://reviews.llvm.org/D64967

llvm-svn: 366598
2019-07-19 18:05:01 +00:00
Matt Arsenault 7df225dfc2 AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads
The DAG lowering sets dereferenceable and invariant, not nontemporal.

llvm-svn: 366597
2019-07-19 17:52:56 +00:00
Matt Arsenault 08494f6231 AMDGPU/GlobalISel: Selection for fminnum/fmaxnum
v2f16 case doesn't work yet because the VOP3P complex patterns haven't
been ported yet.

llvm-svn: 366585
2019-07-19 14:42:40 +00:00
Matt Arsenault b60a2ae40e AMDGPU/GlobalISel: Support arguments with multiple registers
Handles structs used directly in argument lists.

llvm-svn: 366584
2019-07-19 14:29:30 +00:00
Matt Arsenault fecf43eba3 AMDGPU/GlobalISel: Rewrite lowerFormalArguments
This should now handle everything except structs passed as multiple
registers.

I think most of the packing logic should be handled by
handleAssignments, but I'm unclear on what the contract is for
multiple registers. This is copying how x86 handles this.

This does change the behavior of the test_sgpr_alignment0 amdgpu_vs
test. I don't think shader arguments should try to follow the
alignment, and registers need to be repacked. I also don't think it
matters, since I think the pointers are packed to the beginning of the
argument list anyway.

llvm-svn: 366582
2019-07-19 14:15:18 +00:00
Matt Arsenault 1022c0dfde AMDGPU: Decompose all values to 32-bit pieces for calling conventions
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.

This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.

llvm-svn: 366578
2019-07-19 13:57:44 +00:00
Dmitry Preobrazhensky 4ccb7f8c45 [AMDGPU][MC] Corrected parsing of branch offsets
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D64629

llvm-svn: 366571
2019-07-19 13:12:47 +00:00
Than McIntosh e238a4c757 [X86] for split stack, not save/restore nested arg if unused
Summary:
For split-stack, if the nested argument (i.e. R10) is not used, no need to save/restore it in the prologue.

Reviewers: thanm

Reviewed By: thanm

Subscribers: mstorsjo, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64673

llvm-svn: 366569
2019-07-19 12:54:44 +00:00
Oliver Stannard 8780c0dda2 Don't update NoTrappingFPMath and FPDenormalMode in resetTargetOptions
We'd like to remove this whole function, because these are properties of
functions, not the target as a whole. These two are easy to remove
because they are only used for emitting ARM build attributes, which
expects them to represent the defaults for the whole module, not just
the last function generated.

This is needed to get correct build attributes when using IPRA on ARM,
because IPRA causes resetTargetOptions to get called before
ARMAsmPrinter::emitAttributes.

Differential revision: https://reviews.llvm.org/D64929

llvm-svn: 366562
2019-07-19 10:37:37 +00:00
Mikhail Maltsev 0b001f94a5 [ARM] Add <saturate> operand to SQRSHRL and UQRSHLL
Summary:
According to the new Armv8-M specification
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf the
instructions SQRSHRL and UQRSHLL now have an additional immediate
operand <saturate>. The new assembly syntax is:

SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm
UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm

where <saturate> can be either 64 (the existing behavior) or 48, in
which case the result is saturated to 48 bits.

The new operand is encoded as follows:
  #64 Encoded as sat = 0
  #48 Encoded as sat = 1
sat is bit 7 of the instruction bit pattern.

This patch adds a new assembler operand class MveSaturateOperand which
implements parsing and encoding. Decoding is implemented in
DecodeMVEOverlappingLongShift.
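
The operand encoding described above can be sketched as follows. Only the mapping (#64 to sat = 0, #48 to sat = 1) and the bit position (bit 7) come from the text; the function names are hypothetical and the instruction word used in the example is a placeholder, not a real SQRSHRL encoding.

```python
SAT_BIT = 7  # "sat is bit 7 of the instruction bit pattern"

_SAT_ENCODING = {64: 0, 48: 1}


def encode_saturate(operand):
    """Map the #<saturate> immediate to the sat bit value."""
    if operand not in _SAT_ENCODING:
        raise ValueError("saturate operand must be 48 or 64")
    return _SAT_ENCODING[operand]


def set_saturate(insn_bits, operand):
    """Return insn_bits with bit 7 replaced by the encoded operand."""
    cleared = insn_bits & ~(1 << SAT_BIT)
    return cleared | (encode_saturate(operand) << SAT_BIT)


def decode_saturate(insn_bits):
    """Recover the #<saturate> immediate from bit 7."""
    return 48 if (insn_bits >> SAT_BIT) & 1 else 64


placeholder = 0x0000_0000  # not a real opcode, just a carrier for bit 7
word48 = set_saturate(placeholder, 48)
```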

Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer

Reviewed By: simon_tatham

Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64810

llvm-svn: 366555
2019-07-19 09:46:28 +00:00
Jay Foad 7d06ffff46 [AMDGPU] Simplify the exclusive scan used for optimized atomics
Summary:
Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3 and then doing
a 3-way ADD, because:

1. It simplifies the compiler a little.
2. It minimizes vgpr pressure because each instruction is now of the
   form vn = vn + vn << c.
3. It is more friendly to the DPP combiner, which currently can't
   combine into an ADD3 instruction.

Because of #2 and #3 the end result is improved from this:

  v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
  v_mov_b32_dpp v1, v3  row_shr:3 row_mask:0xf bank_mask:0xf
  v_add3_u32 v1, v4, v5, v1
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

To this:

  v_add_u32_dpp v1, v1, v1  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

I.e. two fewer computational instructions, one extra nop where we could
schedule something else.
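
The power-of-two structure of the scan can be illustrated with a plain list in place of a wavefront. This is a generic shift-and-add prefix sum, offered as a sketch: the real code uses DPP `row_shr` and `row_bcast` controls with row and bank masks, which this model does not reproduce, and the function names are hypothetical.

```python
def inclusive_scan_pow2(values):
    """Prefix sum using only shifts by 1, 2, 4, ... lanes.

    Each step has the form v = v + (v shifted by c lanes); lanes with
    no source lane add 0.
    """
    lanes = list(values)
    shift = 1
    while shift < len(lanes):
        lanes = [
            lanes[i] + (lanes[i - shift] if i >= shift else 0)
            for i in range(len(lanes))
        ]
        shift *= 2
    return lanes


def exclusive_scan_pow2(values):
    """Exclusive variant: shift the inclusive result up by one lane."""
    inclusive = inclusive_scan_pow2(values)
    return [0] + inclusive[:-1] if inclusive else []


scan = inclusive_scan_pow2([1, 2, 3, 4, 5, 6, 7, 8])
```

For 64 lanes the loop runs six times, with shifts 1, 2, 4, 8, 16 and 32, the same sequence the summary lists.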

Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64411

llvm-svn: 366543
2019-07-19 08:40:37 +00:00
Hsiangkai Wang 18ccfadd46 [DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame.
It is necessary to generate fixups in .debug_frame or .eh_frame when
relaxation is enabled, because the address delta may change after
relaxation.

There is an opcode with 6-bit data in the debug frame encoding. So, we
also need 6-bit fixup types.

Differential Revision: https://reviews.llvm.org/D58335

llvm-svn: 366524
2019-07-19 02:03:34 +00:00
Amara Emerson cf12c7815f [GlobalISel] Translate calls to memcpy et al to G_INTRINSIC_W_SIDE_EFFECTs and legalize later.
I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't
do that unless we delay lowering to actual function calls. This patch changes
the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and
then has each target specify, through the new custom legalizer hook for
intrinsics, that it wants them expanded into a libcall.

Differential Revision: https://reviews.llvm.org/D64895

llvm-svn: 366516
2019-07-19 00:24:45 +00:00
Stanislav Mekhanoshin a9c71e01e7 [AMDGPU] Drop Reg32 and use regular AsmName
This reduces the generated AMDGPUGenAsmWriter.inc by ~100Kb.

Differential Revision: https://reviews.llvm.org/D64952

llvm-svn: 366505
2019-07-18 22:18:33 +00:00
Jessica Paquette 7a1dcc5ff1 [GlobalISel][AArch64] Add support for base register + offset register loads
Add support for folding G_GEPs into loads of the form

```
ldr reg, [base, off]
```

when possible. This can save an add before the load. Currently, this is only
supported for loads of 64 bits into 64 bit registers.

Add a new addressing mode function, `selectAddrModeRegisterOffset` which
performs this folding when it is profitable.

Also add a test for addressing modes for G_LOAD.

Differential Revision: https://reviews.llvm.org/D64944

llvm-svn: 366503
2019-07-18 21:50:11 +00:00
Reid Kleckner ba9c9e62cb Revert [X86] EltsFromConsecutiveLoads - support common source loads
This reverts r366441 (git commit 48104ef7c9)

This causes clang to fail to compile some file in Skia. Reduction soon.

llvm-svn: 366501
2019-07-18 21:26:41 +00:00
Guanzhong Chen df4479200b [WebAssembly] Fix __builtin_wasm_tls_base intrinsic
Summary:
Properly generate the outchain for the `__builtin_wasm_tls_base` intrinsic.

Also marked the intrinsic pure, per @sunfish's suggestion.

Reviewers: tlively, aheejin, sbc100, sunfish

Reviewed By: tlively

Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits, sunfish

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64949

llvm-svn: 366499
2019-07-18 21:17:52 +00:00
Guanzhong Chen 801fa8e6b9 [WebAssembly] Implement __builtin_wasm_tls_base intrinsic
Summary:
Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local
block and scan through it for memory leaks.

Reviewers: tlively, aheejin, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64900

llvm-svn: 366475
2019-07-18 17:53:22 +00:00
Peter Collingbourne aa6a7df64a MC: AArch64: Add support for prel_g* relocation specifiers.
Differential Revision: https://reviews.llvm.org/D64683

llvm-svn: 366462
2019-07-18 16:54:33 +00:00
Peter Collingbourne 76427f849f AArch64: Unify relocation restrictions between MOVK/MOVN/MOVZ.
There doesn't seem to be a practical reason for these instructions to have
different restrictions on the types of relocations that they may be used
with, notwithstanding the language in the ELF AArch64 spec that implies that
specific relocations are meant to be used with specific instructions.

For example, we currently forbid the first instruction in the following
sequence, despite it currently being used by clang to generate a global
reference under -mcmodel=large:

	movz	x0, #:abs_g0_nc:foo
	movk	x0, #:abs_g1_nc:foo
	movk	x0, #:abs_g2_nc:foo
	movk	x0, #:abs_g3:foo

Therefore, allow MOVK/MOVN/MOVZ to accept the union of the set of relocations
that they currently accept individually.
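
For readers unfamiliar with the sequence above, the sketch below models how one MOVZ followed by three MOVKs assembles a 64-bit address from 16-bit chunks. The helper names are hypothetical; the relocation specifiers (`abs_g0_nc` through `abs_g3`) are represented simply by which 16-bit slice of the address each instruction receives.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def movz(imm16, shift):
    """MOVZ: write imm16 at the given shift, zeroing all other bits."""
    return (imm16 & 0xFFFF) << shift


def movk(reg, imm16, shift):
    """MOVK: replace one 16-bit field of reg, keeping the other bits."""
    cleared = reg & ~(0xFFFF << shift) & MASK64
    return cleared | ((imm16 & 0xFFFF) << shift)


def materialize(address):
    """Build a 64-bit address the way the movz/movk sequence does."""
    reg = movz(address & 0xFFFF, 0)          # abs_g0_nc
    for shift in (16, 32, 48):               # abs_g1_nc, abs_g2_nc, abs_g3
        reg = movk(reg, (address >> shift) & 0xFFFF, shift)
    return reg


value = materialize(0x123456789ABCDEF0)
```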

Differential Revision: https://reviews.llvm.org/D64466

llvm-svn: 366461
2019-07-18 16:51:53 +00:00
Hsiangkai Wang 657277e0f1 Revert "[DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame."
This reverts commit 17e3cbf5fe656483d9016d0ba9e1d0cd8629379e.

llvm-svn: 366444
2019-07-18 15:06:50 +00:00
Hsiangkai Wang e43ce1a958 [DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame.
It is necessary to generate fixups in .debug_frame or .eh_frame when
relaxation is enabled, because the address delta may change after
relaxation.

There is an opcode with 6-bit data in the debug frame encoding. So, we
also need 6-bit fixup types.

Differential Revision: https://reviews.llvm.org/D58335

llvm-svn: 366442
2019-07-18 14:47:34 +00:00
Simon Pilgrim 48104ef7c9 [X86] EltsFromConsecutiveLoads - support common source loads
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.

A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offset loads are then matched against a previous load that has already been confirmed to be a consecutive match.

Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.

Differential Revision: https://reviews.llvm.org/D64551

llvm-svn: 366441
2019-07-18 14:33:25 +00:00
Sanjay Patel e654785912 [x86] try harder to form LEA from ADD to avoid flag conflicts (PR40483)
LEA doesn't affect flags, so use it more liberally to replace an ADD when
we know that the ADD operands affect flags.

In the motivating example from PR40483:
https://bugs.llvm.org/show_bug.cgi?id=40483
...this lets us avoid duplicating a math op just to avoid flag conflict.

As mentioned in the TODO comments, this heuristic can be extended to
fire more often if that leads to more improvements.

Differential Revision: https://reviews.llvm.org/D64707

llvm-svn: 366431
2019-07-18 12:48:01 +00:00
Diogo N. Sampaio 11512e742b [ARM][DAGCOMBINE][FIX] PerformVMOVRRDCombine
Summary:
PerformVMOVRRDCombine omits adding an offset
of 4 to the PointerInfo when converting a
f64 = load[M]
to
{i32, i32} = {load[M], load[M + 4]}

Adding the offset allows the machine scheduler
to break dependencies with the second load.
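
A small sketch of the split itself, assuming little-endian memory and using a hypothetical helper: the two i32 loads read disjoint 4-byte ranges, at M and at M + 4, which is the fact the PointerInfo offset has to record.

```python
import struct


def split_f64_load(memory, addr):
    """Replace one 8-byte load at addr with two 4-byte loads.

    Returns (low_word, high_word), assuming little-endian layout. The
    second load reads from addr + 4, not from addr.
    """
    lo = int.from_bytes(memory[addr:addr + 4], "little")
    hi = int.from_bytes(memory[addr + 4:addr + 8], "little")
    return lo, hi


memory = struct.pack("<d", 1.0)  # 1.0 is 0x3FF0000000000000
lo, hi = split_f64_load(memory, 0)
roundtrip = struct.unpack("<d", struct.pack("<II", lo, hi))[0]
```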

 - pr42638

Reviewers: eli.friedman, dmgreen, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64870

llvm-svn: 366423
2019-07-18 10:05:56 +00:00
Alex Bradbury b8d352a08b [RISCV] Reset NoPHIS MachineFunctionProperty in emitSelectPseudo
We inserted PHIs where there were none before, so the property must be
reset. This error was found on an EXPENSIVE_CHECKS build.

llvm-svn: 366412
2019-07-18 07:52:41 +00:00
Craig Topper 8da0402210 [X86] Disable combineConcatVectors for vXi1 vectors.
I'm not convinced the code this calls is properly vetted for
vXi1 vectors. Experimental vector widening legalization testing
for D55251 is now hitting an assertion failure inside
EltsFromConsecutiveLoads. This is occurring from a v2i1 load
having a store size different than its VT size. Hopefully
this commit will keep such issues from happening.

llvm-svn: 366405
2019-07-18 06:18:06 +00:00
Alex Bradbury 8aba95d64c [RISCV] Avoid signed integer overflow UB in RISCVMatInt::generateInstSeq
Found by UBSan.

llvm-svn: 366398
2019-07-18 04:02:58 +00:00
Alex Bradbury ad73a436dc [RISCV] Don't access an invalidated iterator in RISCVInstrInfo::removeBranch
Issue found by ASan.

llvm-svn: 366397
2019-07-18 03:23:47 +00:00
Fangrui Song f358cf8de2 [AArch64] Add dependency from AArch64CodeGen to TransformUtils to fix -DBUILD_SHARED_LIBS=on link error after D64173/r366361
This fixes:

ld.lld: error: undefined symbol: llvm::findAllocaForValue(llvm::Value*, llvm::DenseMap<llvm::Value*, llvm::AllocaInst*, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseMapPair<llvm::Value*, llvm::AllocaInst*> >&)
>>> referenced by AArch64StackTagging.cpp

llvm-svn: 366396
2019-07-18 01:53:08 +00:00
Stanislav Mekhanoshin 7872d76a16 [AMDGPU] Simplify AMDGPUInstPrinter::printRegOperand()
Differential Revision: https://reviews.llvm.org/D64892

llvm-svn: 366385
2019-07-17 22:58:43 +00:00
Craig Topper 61fff7a337 [X86] Make sure we mark 128/256 MLOAD as Legal with VLX when min-legal-vector-width=256 is in effect.
This started triggering an assertion after r364718 when we made
these Custom under AVX2.

llvm-svn: 366382
2019-07-17 22:26:00 +00:00
Stanislav Mekhanoshin 9c7f4264d3 [AMDGPU] Stop special casing flat_scratch for register name
Differential Revision: https://reviews.llvm.org/D64885

llvm-svn: 366376
2019-07-17 21:35:11 +00:00
Evgeniy Stepanov f45fd429b7 Speculative fix for stack-tagging.ll failure.
Depending on the evaluation order of function call arguments,
the current code may insert a use before def.

llvm-svn: 366375
2019-07-17 21:27:44 +00:00
Evgeniy Stepanov 851339fb29 Basic MTE stack tagging instrumentation.
Summary:
Use MTE intrinsics to tag stack variables in functions with
sanitize_memtag attribute.

Reviewers: pcc, vitalybuka, hctim, ostannard

Subscribers: srhines, mgorny, javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64173

llvm-svn: 366361
2019-07-17 19:24:12 +00:00
Evgeniy Stepanov d752f5e953 Basic codegen for MTE stack tagging.
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.

Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.

Differential Revision: https://reviews.llvm.org/D64172

llvm-svn: 366360
2019-07-17 19:24:02 +00:00
Momchil Velikov 0e2b74a2b0 Revert [AArch64] Add support for Transactional Memory Extension (TME)
This reverts r366322 (git commit 4b8da3a503)

llvm-svn: 366355
2019-07-17 17:43:32 +00:00
Daniil Fukalov d912a9ba9b [AMDGPU] Tune inlining parameters for AMDGPU target
Summary:
Since the target gains no significant advantage from vectorization,
the vector instructions threshold bonus should be optional.

The default value of the amdgpu-inline-arg-alloca-cost parameter and the
target's InliningThresholdMultiplier value are tuned accordingly.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64642

llvm-svn: 366348
2019-07-17 16:51:29 +00:00
Matt Arsenault 06eed42213 AMDGPU: Use getTargetConstant
Avoids creating an extra intermediate mov.

llvm-svn: 366340
2019-07-17 15:35:36 +00:00
Alex Bradbury ab009a602e [AsmPrinter] Make the encoding of call sites in .gcc_except_table configurable and use for RISC-V
The original behavior was to always emit the offsets to each call site in the
call site table as uleb128 values, however on some architectures (eg RISCV)
these uleb128 offsets into the code cannot always be resolved until link time
(because relaxation will invalidate any calculated offsets), and there are no
appropriate relocations for uleb128 values. As a consequence it needs to be
possible to specify an alternative.

This also switches RISCV to use DW_EH_PE_udata4 for call site encodings in
.gcc_except_table
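
To make the difference between the two encodings concrete, here is a minimal sketch of ULEB128 next to a fixed 4-byte encoding. The function names are hypothetical, and the udata4 helper assumes little-endian byte order. The point is that a ULEB128 value changes length as the offset grows, whereas the fixed-width form always occupies 4 bytes.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 (7 bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_udata4(value):
    """Fixed-width 4-byte unsigned encoding (little-endian assumed)."""
    return value.to_bytes(4, "little")


short = encode_uleb128(127)   # one byte
longer = encode_uleb128(128)  # two bytes: the size depends on the value
```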

Differential Revision: https://reviews.llvm.org/D63415
Patch by Edward Jones.

llvm-svn: 366329
2019-07-17 14:00:35 +00:00
Jay Foad 70235c642e [AMDGPU] Optimize atomic AND/OR/XOR
Summary: Extend the atomic optimizer to handle AND, OR and XOR.

Reviewers: arsenm, sheredom

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64809

llvm-svn: 366323
2019-07-17 13:40:03 +00:00
Momchil Velikov 4b8da3a503 [AArch64] Add support for Transactional Memory Extension (TME)
TME is a future architecture technology, documented in

https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
https://developer.arm.com/docs/ddi0601/a

More about the future architectures:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture

This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".

It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)

Patch by Javed Absar and Momchil Velikov

Differential Revision: https://reviews.llvm.org/D64416

llvm-svn: 366322
2019-07-17 13:23:27 +00:00
Justin Hibbits 0257c6b659 PowerPC: Fix register spilling for SPE registers
Summary:
Missed in the original commit, use the correct callee-saved register
list for spilling, instead of the standard SVR432 list.  This avoids
needlessly spilling the SPE non-volatile registers when they're not used.

As part of this, also add where missing, and sort, the spill opcode
checks for SPE and SPE4 register classes.

Reviewers: nemanjai, hfinkel, joerg

Subscribers: kbarton, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D56703

llvm-svn: 366319
2019-07-17 12:30:48 +00:00
Justin Hibbits 5214956eaa PowerPC/SPE: Fix load/store handling for SPE
Summary:
Pointed out in a comment for D49754, register spilling will currently
spill SPE registers at almost any offset.  However, the instructions
`evstdd` and `evldd` require a) 8-byte alignment, and b) a limit of 256
(unsigned) bytes from the base register, as the offset must fit into a
5-bit field, which ranges from 0-31 (indexed in double-words).
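
The constraint described above can be sketched as a small predicate. This is an illustration only, not the code from the patch; the name `fitsSPEImmOffset` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper (not taken from the patch): returns true when a byte
// offset can be encoded in the immediate form of evldd/evstdd as described
// above: non-negative, 8-byte aligned, and at most 31 double-words.
constexpr bool fitsSPEImmOffset(std::int64_t byteOffset) {
  return byteOffset >= 0 && byteOffset % 8 == 0 && byteOffset / 8 <= 31;
}
```

With these assumptions the largest encodable byte offset is 31 * 8 = 248; anything else would need a different addressing form, such as the indexed evlddx/evstddx instructions.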

The update to the register spill test is taken partially from the test
case shown in D49754.

Additionally, pointed out by Kei Thomsen, globals will currently use
evldd/evstdd, though the offset isn't known at compile time, so may
exceed the 8-bit (unsigned) offset permitted.  This fixes that as well,
by forcing it to always use evlddx/evstddx when accessing globals.

Part of the patch contributed by Kei Thomsen.

Reviewers: nemanjai, hfinkel, joerg

Subscribers: kbarton, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D54409

llvm-svn: 366318
2019-07-17 12:30:04 +00:00
Petar Avramovic 1e62635d05 [MIPS GlobalISel] ClampScalar and select pointer G_ICMP
Add narrowScalar to half of original size for G_ICMP.
ClampScalar G_ICMP's operands 2 and 3 to s32.
Select G_ICMP for pointers for MIPS32. Pointer comparison is the same
as for integers, so it is enough to declare pointers as a legal type.

Differential Revision: https://reviews.llvm.org/D64856

llvm-svn: 366317
2019-07-17 12:08:01 +00:00
Nicolai Haehnle 8b7041a5c6 AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC
Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630

Reviewers: rampitec, mareko

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64807

Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc
llvm-svn: 366314
2019-07-17 11:22:57 +00:00