Commit Graph

52451 Commits

Author SHA1 Message Date
Simon Atanasyan 0121432602 [mips] Add (GPR|PTR)_64 predicates to PseudoReturn64 and PseudoIndirectHazardBranch64
This patch is one of a series of patches. The goal is to make P5600
scheduler model complete and turn on the `CompleteModel` flag.

llvm-svn: 363885
2019-06-19 22:07:46 +00:00
Matt Arsenault 4d000d2488 AMDGPU: Fix folding immediate into readfirstlane through reg_sequence
The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conservative and
not necessary. It's not actually important if DefMI really defined the
register, because the fold that will be done cares about the def of
the value that will be folded.

For some reason copies aren't making it through the reg_sequence,
although they should.

llvm-svn: 363876
2019-06-19 20:44:15 +00:00
Peter Collingbourne 2742eeb78e hwasan: Shrink outlined checks by 1 instruction.
Turns out that we can save an instruction by folding the right shift into
the compare.

Differential Revision: https://reviews.llvm.org/D63568

llvm-svn: 363874
2019-06-19 20:40:03 +00:00
Matt Arsenault 4d55d024be Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.

llvm-svn: 363870
2019-06-19 19:55:27 +00:00
Sanjay Patel b5640b6fe8 [x86] avoid vector load narrowing with extracted store uses (PR42305)
This is an exception to the rule that we should prefer xmm ops to ymm ops.
As shown in PR42305:
https://bugs.llvm.org/show_bug.cgi?id=42305
...the store folding opportunity with vextractf128 may result in better
perf by reducing the instruction count.

Differential Revision: https://reviews.llvm.org/D63517

llvm-svn: 363853
2019-06-19 18:13:47 +00:00
Simon Pilgrim 0018b78ef6 [X86][SSE] combineToExtendVectorInReg - add ANY_EXTEND support TODO. NFCI.
So I don't forget - there's a load of yak shaving to do first.

llvm-svn: 363847
2019-06-19 17:42:37 +00:00
Simon Pilgrim 34279db355 [X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG.
We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension.

llvm-svn: 363841
2019-06-19 17:21:15 +00:00
Simon Tatham 2f5188fd58 [ARM] Add MVE vector bit-operations (register inputs).
This includes all the obvious bitwise operations (AND, OR, BIC, ORN,
MVN) in register-to-register forms, and the immediate forms of
AND/OR/BIC/ORN; byte-order reverse instructions; and the VMOVs that
access a single lane of a vector.

Some of those VMOVs (specifically, the ones that access a 32-bit lane)
share an encoding with existing instructions that were disassembled as
accessing half of a d-register (e.g. `vmov.32 r0, d1[0]`), but in
8.1-M they're now written as accessing a quarter of a q-register (e.g.
`vmov.32 r0, q0[2]`). The older syntax is still accepted by the
assembler.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62673

llvm-svn: 363838
2019-06-19 16:43:53 +00:00
Evandro Menezes 567f6c150d [AVR] Change limit type to match the argument type (NFC)
llvm-svn: 363832
2019-06-19 16:12:12 +00:00
Evandro Menezes 56c45e93ab [Hexagon] Change limit type to match the argument type (NFC)
llvm-svn: 363831
2019-06-19 16:12:01 +00:00
Simon Pilgrim cdc0236e3a [X86] getExtendInVec - take a ISD::*_EXTEND opcode instead of a IsSigned bool flag. NFCI.
Prep work to support ANY_EXTEND/ANY_EXTEND_VECTOR_INREG without needing another flag.

llvm-svn: 363818
2019-06-19 15:18:24 +00:00
Simon Pilgrim d4754cac89 [X86] Add *_EXTEND -> *_EXTEND_VECTOR_INREG opcode conversion helper. NFCI.
Given a *_EXTEND or *_EXTEND_VECTOR_INREG opcode, convert it to *_EXTEND_VECTOR_INREG.

llvm-svn: 363812
2019-06-19 14:54:02 +00:00
Simon Pilgrim 2b309027ed [X86] Merge extract_subvector(*_EXTEND) and extract_subvector(*_EXTEND_VECTOR_INREG) handling. NFCI.
llvm-svn: 363808
2019-06-19 14:25:27 +00:00
Ulrich Weigand 3641b10f3d [SystemZ] Support vector load/store alignment hints
Vector load/store instructions support an optional alignment field
that the compiler can use to provide known alignment info to the
hardware.  If the field is used (and the information is correct),
the hardware may be able (on some models) to perform faster memory
accesses than otherwise.

This patch adds support for alignment hints in the assembler and
disassembler, and fills in known alignment during codegen.

llvm-svn: 363806
2019-06-19 14:20:00 +00:00
Simon Pilgrim 128ce93c60 Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on
SI/CI.
........
Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/

llvm-svn: 363797
2019-06-19 13:00:54 +00:00
Lewis Revill 18737e81eb [RISCV] Allow parsing immediates that use tilde & exclaim
This patch allows immediates (and CSR alias immediates) which start with
a tilde token or an exclaim (!) token to be parsed as intended.

Differential Revision: https://reviews.llvm.org/D57320

llvm-svn: 363783
2019-06-19 10:27:24 +00:00
Lewis Revill 218aa0edb1 [RISCV] Fix failure to parse parenthesized immediates
Since the parser attempts to parse an operand as a register with
parentheses before parsing it as an immediate, immediates in
parentheses should not be parsed by parseRegister. However in the case
where the immediate does not start with an identifier, the LParen is not
unlexed and so the RParen causes an unexpected token error.

This patch adds the missing UnLex, and modifies the existing UnLex to
not use a buffered token, as it should always be unlexing an LParen.

Differential Revision: https://reviews.llvm.org/D57319

llvm-svn: 363782
2019-06-19 10:11:13 +00:00
Clement Courbet 4ef7c2868a [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr
Summary:
llvm.x86.sse.stmxcsr only writes to memory.
llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE.

Reviewers: craig.topper, RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62896

llvm-svn: 363773
2019-06-19 08:44:31 +00:00
Lewis Revill 39263ac5d1 [RISCV] Add lowering of global TLS addresses
This patch adds lowering for global TLS addresses for the TLS models of
InitialExec, GlobalDynamic, LocalExec and LocalDynamic.

LocalExec support required using a 4-operand add instruction, which uses
the fourth operand to express a relocation on the symbol. The necessary
fixup is emitted when the instruction is emitted.

Differential Revision: https://reviews.llvm.org/D55305

llvm-svn: 363771
2019-06-19 08:40:59 +00:00
Chen Zheng c5b918de58 [NFC] move some hardware loop checking code to a common place for other using.
Differential Revision: https://reviews.llvm.org/D63478

llvm-svn: 363758
2019-06-19 01:26:31 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Thomas Lively 1885747498 [WebAssembly] Optimize ISel for SIMD Boolean reductions
Summary:
Converting the result *.{all,any}_true to a bool at the source level
generates LLVM IR that compares the result to 0. This check is
redundant since these instructions already return either 0 or 1 and
therefore conform to the BooleanContents setting for WebAssembly. This
CL adds patterns to detect and remove such redundant operations on the
result of Boolean reductions.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63529

llvm-svn: 363756
2019-06-19 00:02:13 +00:00
Huihui Zhang d16779a732 [ARM] Comply with rules on ARMv8-A thumb mode partial deprecation of IT.
Summary:
When identifing instructions that can be folded into a MOVCC instruction,
checking for a predicate operand is not enough, also need to check for
thumb2 function, with restrict-IT, is the machine instruction eligible for
ARMv8 IT or not.

Notes in ARMv8-A Architecture Reference Manual, section "Partial deprecation of IT"
  https://usermanual.wiki/Pdf/ARM20Architecture20Reference20ManualARMv8.1667877052.pdf

"ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to
instructions other than a single subsequent 16-bit instruction from a restricted set
are deprecated, as are explicit references to the PC within that single 16-bit
instruction. This permits the non-deprecated forms of IT and subsequent instructions
to be treated as a single 32-bit conditional instruction."

Reviewers: efriedma, lebedev.ri, t.p.northover, jmolloy, aemerson, compnerd, stoklund, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63474

llvm-svn: 363739
2019-06-18 20:55:09 +00:00
Sam Elliott 9f155bc6e5 [RISCV] Prevent re-ordering some adds after shifts
Summary:
DAGCombine will normally turn a `(shl (add x, c1), c2)` into `(add (shl x, c2), c1 << c2)`, where `c1` and `c2` are constants. This can be prevented by a callback in TargetLowering.

On RISC-V, materialising the constant `c1 << c2` can be more expensive than materialising `c1`, because materialising the former may take more instructions, and may use a register, where materialising the latter would not.

This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where:
- `c1` fits into the immediate field in an `addi` instruction.
- `c1` takes fewer instructions to materialise than `c1 << c2`.

In future, DAGCombine could do the check to see whether `c1` fits into an add immediate, which might simplify more targets hooks than just RISC-V.

Reviewers: asb, luismarques, efriedma

Reviewed By: asb

Subscribers: xbolva00, lebedev.ri, craig.topper, lewis-revill, Jim, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62857

llvm-svn: 363736
2019-06-18 20:38:08 +00:00
Stanislav Mekhanoshin bb1c8b6f5c [AMDGPU] gfx10 wave32 patterns
Differential Revision: https://reviews.llvm.org/D63511

llvm-svn: 363729
2019-06-18 20:00:24 +00:00
Stanislav Mekhanoshin ab4f2ea793 [AMDGPU] gfx1010 disassembler changes for wave32
Differential Revision: https://reviews.llvm.org/D63506

llvm-svn: 363721
2019-06-18 19:10:59 +00:00
Craig Topper 10e6128c62 [X86] Remove unnecessary line that makes v4f32 FP_ROUND Legal. NFC
FP_ROUND defaults to Legal for all MVT types and nothing changes
the v4f32 entry way from this default. If we needed this line
we'd also need one for v8f32 with AVX512 which we don't have.

llvm-svn: 363719
2019-06-18 19:04:03 +00:00
Simon Atanasyan 796e7f8724 [mips] Add more strict predicates to the RSQRT_S_MM and TAILCALL_MM
This patch is one of a series of patches. The goal is to make P5600
scheduler model complete and turn on the `CompleteModel` flag.

llvm-svn: 363703
2019-06-18 17:00:08 +00:00
Simon Atanasyan 60a9d0c248 [mips] Add PTR_64 and GPR_64 predicates to some MIPS 64-bit instructions
Add `IsGP64bit` and `IsPTR64bit` to the list of `UnsupportedFeatures`
of the P5600 scheduling definitions. Also mark some MIPS 64-bit
instructions by PTR_64 and GPR_64 predicates. This reduces number
of "No schedule information for" and "lacks information for" errors
in case of marking this scheduler model as complete.

This patch is one of a series of patches. The goal is to make P5600
scheduler model complete and turn on the `CompleteModel` flag.

Differential Revision: https://reviews.llvm.org/D63237

llvm-svn: 363702
2019-06-18 16:59:57 +00:00
Simon Atanasyan 9086ba8763 [mips] Set the hasNoSchedulingInfo flag for the `MipsAsmPseudoInst`
Set the hasNoSchedulingInfo flag for the`MipsAsmPseudoInst`. These
pseudo-instructions are never used by codegen. This flag allows to
reduce number of "No schedule information for" and "lacks information
for" errors in case of marking a scheduler model as complete.

This patch is one of a series of patches. The goal is to make P5600
scheduler model complete and turn on the `CompleteModel` flag.

Differential Revision: https://reviews.llvm.org/D63236

llvm-svn: 363701
2019-06-18 16:59:47 +00:00
Simon Tatham cfc70782d7 [ARM] Add MVE vector shift instructions.
This includes saturating and non-saturating shifts, both with
immediate shift count and with the shift counts given by another
vector register; VSHLC (in which the bits shifted out of each active
vector lane are shifted in to the next active lane); and also VMOVL,
which is enough like an immediate shift that it didn't fit too badly
in this category.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62672

llvm-svn: 363696
2019-06-18 16:19:59 +00:00
Simon Tatham faaf1a5366 [ARM] Add MVE integer vector min/max instructions.
Summary:
These form a small family of their own, to go with the floating-point
VMINNM/VMAXNM instructions added in a previous commit.

They introduce the first of many special cases in the mnemonic
recognition code, because VMIN with the E suffix used by the VPT
predication system needs to avoid being interpreted as the nonexistent
instruction 'VMI' with an ordinary 'NE' condition suffix.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62671

llvm-svn: 363695
2019-06-18 15:51:46 +00:00
Simon Pilgrim 9c8593934a [X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x)
Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here.

llvm-svn: 363693
2019-06-18 15:30:50 +00:00
Simon Tatham ed4a602515 [ARM] Rename MVE instructions in Tablegen for consistency.
Summary:
Their names began with a mishmash of `MVE_`, `t2` and no prefix at
all. Now they all start with `MVE_`, which seems like a reasonable
choice on the grounds that (a) NEON is the thing they're most at risk
of being confused with, and (b) MVE implies Thumb-2, so a prefix
indicating MVE is strictly more specific than one indicating Thumb-2.

Reviewers: ostannard, SjoerdMeijer, dmgreen

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63492

llvm-svn: 363690
2019-06-18 15:05:42 +00:00
Lewis Revill 74c8364954 [RISCV] Lower calls through PLT
This patch adds support for generating calls through the procedure
linkage table where required for a given ExternalSymbol or GlobalAddress
callee.

Differential Revision: https://reviews.llvm.org/D55304

llvm-svn: 363686
2019-06-18 14:29:45 +00:00
Matt Arsenault 8d35dcd703 AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on
SI/CI.

llvm-svn: 363678
2019-06-18 13:19:57 +00:00
Matt Arsenault f39f3bd056 AMDGPU: Change API for checking for exec modification
Invert the name and return value to better reflect the imprecise
nature.

Force passing in the DefMI, since it's known in the 2 users and could
possibly fail for an arbitrary vreg.

Allow specifying a specific user instruction. Scan through use
instructions, instead of use operands. Add scan thresholds instead of
searching infinitely.

Stop using a set to track seen uses. I didn't understand this usage,
or why it would not check the last use. I don't think the use list has
any particular order.

llvm-svn: 363675
2019-06-18 12:48:36 +00:00
Matt Arsenault bcb5ea0042 AMDGPU: Fold readlane from copy of SGPR or imm
These may be inserted to assert uniformity somewhere.

llvm-svn: 363670
2019-06-18 12:23:46 +00:00
Matt Arsenault e75e197ad8 AMDGPU: Remove unnecessary check for virtual register
The copy was found by searching the uses of a virtual register, so
it's already known to be virtual.

llvm-svn: 363669
2019-06-18 12:23:45 +00:00
Matt Arsenault 23f03f5059 AMDGPU: Fix iterator crash in AMDGPUPromoteAlloca
The lifetime intrinsic was erased, which was the next iterator.

llvm-svn: 363668
2019-06-18 12:23:44 +00:00
Matt Arsenault d5ce8ec778 AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale
llvm-svn: 363667
2019-06-18 12:23:42 +00:00
Sjoerd Meijer 7a7009f7c8 [ARM] Some Thumb2ITBlock clean ups. NFC
Some more refactoring, like registering the IT Block pass, less cryptic
variable names, and some simplification of loops.

Differential Revision: https://reviews.llvm.org/D63419

llvm-svn: 363666
2019-06-18 12:13:11 +00:00
Jonas Paulsson 5c64a8c4c6 [SystemZ] Fix AHIMuxK pseudo expansion.
Do not emit a copy if the source and destination registers are the same.

Review: Ulrich Weigand
llvm-svn: 363665
2019-06-18 12:10:02 +00:00
Valery Pykhtin 7e854e1cdd [AMDGPU] Speed up live-in virtual register set computaion in GCNScheduleDAGMILive.
Differential revision: https://reviews.llvm.org/D62401

llvm-svn: 363661
2019-06-18 11:43:17 +00:00
Simon Pilgrim 7dd529e54d [X86] Replace any_extend* vector extensions with zero_extend* equivalents
First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal.

In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep......

Differential Revision: https://reviews.llvm.org/D63326

llvm-svn: 363655
2019-06-18 09:50:13 +00:00
Craig Topper f4284f8a9d [X86] Move code that shrinks immediates for ((x << C1) op C2) into a helper function. NFCI
Preliminary step for D59909

llvm-svn: 363645
2019-06-18 04:23:58 +00:00
Craig Topper 587427716c [X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions.
The isel patterns for these use a bitcast and load/store, but
DAG combine should have canonicalized those away.

For the purposes of the memory folding table these opcodes can be
replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes.

llvm-svn: 363644
2019-06-18 03:23:15 +00:00
Craig Topper 8582ecd8d9 [X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class.
Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt.

Use the new versions in patterns that previously used a COPY_TO_REGCLASS
to VR128. These patterns expect the upper bits to be zero. The
current set up appears to work, but I'm not sure we should be
enforcing upper bits being zero through a COPY_TO_REGCLASS.

I wanted to flip the arrangement and use a COPY_TO_REGCLASS to
FR32/FR64 for the patterns that need an f32/f64 result, but that
complicated fastisel and globalisel.

I've been doing some experiments with reducing some isel patterns
and ended up in a situation where I had a
(SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our
post-isel peephole was unable to avoid using an instruction for
the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128
instruction removes the COPY_TO_REGCLASS that was breaking this.

llvm-svn: 363643
2019-06-18 03:23:11 +00:00
Amara Emerson 146882242f [GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block.
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.

The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.

One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced attempt to prevent code size regressions when localizing these.

Overall, these changes improve CTMark code size on arm64 by 1.2%.

Full code size results:

Program                                         baseline       new       diff
------------------------------------------------------------------------------
 test-suite...-typeset/consumer-typeset.test    1249984      1217216     -2.6%
 test-suite...:: CTMark/ClamAV/clamscan.test    1264928      1232152     -2.6%
 test-suite :: CTMark/SPASS/SPASS.test          1394092      1361316     -2.4%
 test-suite...Mark/mafft/pairlocalalign.test    731320       714928      -2.2%
 test-suite :: CTMark/lencod/lencod.test        1340592      1324200     -1.2%
 test-suite :: CTMark/kimwitu++/kc.test         3853512      3820420     -0.9%
 test-suite :: CTMark/Bullet/bullet.test        3406036      3389652     -0.5%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    8017000      8016992     -0.0%
 test-suite...TMark/7zip/7zip-benchmark.test    2856588      2856588      0.0%
 test-suite...:: CTMark/sqlite3/sqlite3.test    765704       765704       0.0%
 Geomean difference                                                      -1.2%

Differential Revision: https://reviews.llvm.org/D63303

llvm-svn: 363632
2019-06-17 23:20:29 +00:00
Craig Topper 971ad74ba2 Use VR128X instead of FR32X/FR64X for the register class in VMOVSSZmrk/VMOVSDZmrk.
Removes COPY_TO_REGCLASS from some patterns.

llvm-svn: 363630
2019-06-17 23:08:29 +00:00