Commit Graph

1173 Commits

Author SHA1 Message Date
Craig Topper 9cc9120969 [X86] Turn FP_ROUND/STRICT_FP_ROUND into X86ISD::VFPROUND/STRICT_VFPROUND during PreprocessISelDAG to remove some duplicate isel patterns. 2020-01-11 11:06:52 -08:00
Craig Topper d60b3b4817 [X86] Add isel patterns for bitcasting between v32i1/v64i1 and float/double.
We have to do an intermediate jump to a GPR to make the cast.

Fixes PR43750.
2020-01-08 10:06:01 -08:00
James Henderson d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
Craig Topper 3186b18b99 [X86] Reorder X86any* PatFrags to put the strict node first so that chain property will be inferred for the instruction by the tablegen backend.
Also use X86any_vfpround instead of X86vfpround in some instruction
definitions so the strict version can be used to infer the chain
property.

Without these changes we don't propagate strict FP chain through
isel for some instructions.
2020-01-03 00:11:55 -08:00
Liu, Chen3 8af492ade1 add strict float for round operation
Differential Revision: https://reviews.llvm.org/D72026
2020-01-01 20:42:12 +08:00
Craig Topper ecbaf152f8 [X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl.
Summary:
Previously we did this with isel patterns that used garbage in
the widened part of the source. But that's not valid for strictfp.
So now we custom widen and use zeroes for the widened elements for
strictfp.

This replaces D71864.

Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3

Reviewed By: pengfei

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71879
2019-12-26 22:04:40 -08:00
Liu, Chen3 1a7b69f5dd add custom operation for strict fpextend/fpround
Differential Revision: https://reviews.llvm.org/D71892
2019-12-27 08:28:33 +08:00
Craig Topper f953882113 [X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl.
Previously we widened these through isel patterns, but that
didn't work for STRICT_ nodes. Those need to be padded with
zeroes in the upper bits which is harder to do in isel patterns.
2019-12-26 14:46:56 -08:00
Craig Topper 90ff34e6ab [X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, but not AVX512VL.
Previously we were widening with isel patterns, but that wasn't
exception safe for strict FP. So now we widen to v4i32->v4f64
during type legalization. And then let op legalization further
widen to v8i32->v8f64.

The vec_int_to_fp.ll changes are caused by us no longer narrowing
extracts of strict_uint_to_fp to the v4i32->v2f64 instruction
without AVX512VL only to have isel rewiden it. Now we just keep
it wide throughout. So we don't have an opportunity to narrow
the load.
2019-12-26 13:40:56 -08:00
Wang, Pengfei 472bded3ed [X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend
Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend

Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71871
2019-12-26 08:15:13 +08:00
Craig Topper a21beccea2 [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP.
Differential Revision: https://reviews.llvm.org/D71850
2019-12-24 10:07:04 -08:00
Ulrich Weigand 0d3f782e41 [FPEnv][X86] More strict int <-> FP conversion fixes
Fix several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:

- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
  compare can raise exceptions and therefore needs to be a strict compare.
  I've made it signaling (even though quiet would also be correct) as
  signaling is the more usual default for an LT. This code exists both
  in common code and in the X86 target.

- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
  it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
  of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
  that ends up not chosen. I've fixed the algorithm to use only a single
  STRICT_SINT_TO_FP instead.

- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
  the wrong thing because it calls getOperationAction using the result VT.
  But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to
  be called using the operand VT.

- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
  STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D71840
2019-12-23 21:11:45 +01:00
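The STRICT_UINT_TO_FP point above can be illustrated with a small model. This is a minimal sketch, not the code from the patch: the helper names `sint_to_fp` and `uint_to_fp` are hypothetical, and the halve-with-sticky-bit technique shown is one common way to get by with a single signed conversion. The select is done on the integer input, so only one conversion ever runs.

```python
SIGN_BIT = 1 << 63
MASK64 = (1 << 64) - 1

def sint_to_fp(v):
    """Model of a signed i64 -> double conversion (hypothetical helper)."""
    assert -(1 << 63) <= v < (1 << 63)
    return float(v)

def uint_to_fp(x):
    """Convert an unsigned 64-bit value using exactly one signed conversion."""
    assert 0 <= x <= MASK64
    is_large = x >= SIGN_BIT
    # Halve large inputs so they fit in the signed range; OR in the low bit
    # as a sticky bit so the rounding decision is unchanged.
    halved = (x >> 1) | (x & 1)
    # Select on the integer input, not between two converted results.
    src = halved if is_large else x
    f = sint_to_fp(src)
    # Doubling is exact in binary floating point here (no overflow possible).
    return f + f if is_large else f
```

Dropping the sticky bit would misround inputs such as `2**63 + 1025`, where the halved value would land exactly on a rounding tie.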
Liu, Chen3 2f932b5729 Enable STRICT_FP_TO_SINT/UINT on X86 backend
This patch is mainly for custom lowering the vector operation.

Differential Revision: https://reviews.llvm.org/D71592
2019-12-19 14:49:13 +08:00
Wang, Pengfei 1949235d13 [X86] Add strict fma support
Summary: Add strict fma support

Reviewers: craig.topper, RKSimon, LiuChen3

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71604
2019-12-18 11:44:00 +08:00
Wang, Pengfei 21bc8631fe [FPEnv][X86] Constrained FCmp intrinsics enabling on X86
Summary: This is a follow-up to D69281; it enables X86 backend support for FP comparison.

Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70582
2019-12-11 08:23:09 +08:00
Liu, Chen3 bbf7860b93 add support for strict operation fpextend/fpround/fsqrt on X86 backend
Differential Revision: https://reviews.llvm.org/D71184
2019-12-10 09:04:28 +08:00
Liu, Chen3 3041434450 Add strict fp support for instructions fadd/fsub/fmul/fdiv
Differential Revision: https://reviews.llvm.org/D68757
2019-12-06 09:44:33 +08:00
Wang, Pengfei c8995de069 [X86] Model DAZ and FTZ
Summary: This is a follow-up of D70881. It models DAZ and FTZ for related instructions.

Reviewers: craig.topper, RKSimon, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70938
2019-12-04 08:22:45 +08:00
Wang, Pengfei c1c673303d [X86] Model MXCSR for all AVX512 instructions
Summary: Model MXCSR for all AVX512 instructions

Reviewers: craig.topper, RKSimon, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70881
2019-12-04 08:07:38 +08:00
Craig Topper 40dfc6dff1 [X86] Add floating point execution domain to comi/ucomi/cvtss2si/cvtsd2si/cvttss2si/cvttsd2si/cvtsi2ss/cvtsi2sd instructions. 2019-11-30 11:26:28 -08:00
Craig Topper 8f28f26860 [X86] Add SSEPackedSingle/Double execution domain to COMI/UCOMI SSE/AVX instructions. 2019-11-27 15:21:38 -08:00
Craig Topper 3687ddef2c [X86] Add proper execution domain information to the avx512vnni instructions. 2019-11-25 17:07:35 -08:00
Simon Pilgrim 4d0e7b628a [X86][AVX] Add plausible schedule classes to MASKPAIR/VP2INTERSECT/VDPBF16PS instructions
These are really just placeholders that use approximately the right resources - once we have CPU scheduler models that support these instructions they will need revisiting.

In the meantime this means that all instructions have a class of some kind, meaning models can be more easily flagged as complete.
2019-11-13 12:02:01 +00:00
Craig Topper 87aa59a0c7 [X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these based on the immediate in MCInstLower
The immediate form of VPCMP can represent these completely. The
vpcmpgt/eq are just shorter encodings.

This patch removes the isel patterns and just swaps the opcodes
and removes the immediate in MCInstLower. This matches where we do
some other encoding tricks.

Removes over 10K bytes from the isel table.

Differential Revision: https://reviews.llvm.org/D68446

llvm-svn: 373766
2019-10-04 18:02:46 +00:00
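The opcode swap described above can be sketched as a small mapping. This is an illustration only, assuming the VPCMP imm8 predicate encoding in which 0 means EQ and 6 means NLE (signed greater-than); the function name `shorten_vpcmp` and the use of mnemonic strings are hypothetical and do not mirror the MCInstLower code.

```python
# Predicates that have a shorter dedicated encoding (assumed imm8 values).
SHORT_FORMS = {0: "vpcmpeq", 6: "vpcmpgt"}
SIGNED_VPCMP = ("vpcmpb", "vpcmpw", "vpcmpd", "vpcmpq")

def shorten_vpcmp(mnemonic, imm):
    """Return (mnemonic, imm); imm is None when the short form drops it."""
    if mnemonic not in SIGNED_VPCMP:
        return mnemonic, imm          # only the signed forms are handled here
    short = SHORT_FORMS.get(imm)
    if short is None:
        return mnemonic, imm          # no shorter encoding for this predicate
    # Keep the element-size suffix (b/w/d/q) and drop the immediate.
    return short + mnemonic[-1], None
```

For example, `shorten_vpcmp("vpcmpd", 0)` yields the `vpcmpeqd` form with no immediate, while any other predicate keeps the immediate form.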
Craig Topper eb420aa379 [X86] Add DAG combine to turn (bitcast (vbroadcast_load)) into just a vbroadcast_load if the scalar size is the same.
This improves broadcast load folding of i64 elements on 32-bit
targets where i64 isn't legal.

Previously we had to represent these as vXf64 vbroadcast_loads and
a bitcast to vXi64. But we didn't have any isel patterns
looking for that.

This also allows us to remove or simplify some isel patterns that
were looking for bitcasted vbroadcast_loads.

llvm-svn: 373566
2019-10-03 05:30:02 +00:00
Craig Topper f849f41469 [X86] Add broadcast load folding patterns to NoVLX VPMULLQ/VPMAXSQ/VPMAXUQ/VPMINSQ/VPMINUQ patterns.
More fixes for PR36191.

llvm-svn: 373560
2019-10-03 03:16:27 +00:00
Craig Topper 241c72ddd9 [X86] Remove a couple redundant isel patterns that look to have been copy/pasted from right above them. NFC
llvm-svn: 373559
2019-10-03 03:16:21 +00:00
Craig Topper 8d6a863b02 [X86] Add broadcast load folding patterns to the NoVLX compare patterns.
These patterns use zmm registers for 128/256-bit compares when
the VLX instructions aren't available. Previously we only
supported registers, but as PR36191 notes we can fold broadcast
loads, but not regular loads.

llvm-svn: 373423
2019-10-02 04:45:02 +00:00
Craig Topper 105e82edde [X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector.
Summary:
This adds the ISD opcode and a DAG combine to create it. There are
probably some places where we can directly create it, but I'll
leave that for future work.

This updates all of the isel patterns to look for this new node.
I had to add a few additional isel patterns for aligned extloads
which we should probably fix with a DAG combine or something. This
does mean that the broadcast load folding for avx512 can no
longer match a broadcasted aligned extload.

There's still some work to do here for combining a broadcast of
a broadcast_load. We also need to improve extractelement or
demanded vector elements of a broadcast_load. I'll try to get
those done before I submit this patch.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68198

llvm-svn: 373349
2019-10-01 16:28:20 +00:00
Craig Topper 220cf53540 [X86] Consider isCodeGenOnly in the EVEX2VEX pass to make VMAXPD/PS map to the non-commutable VEX instruction. Use EVEX2VEX override to fix the scalar instructions.
Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD
were mapped to the same VEX instruction. But we should keep
the commutability when changing the opcode.

llvm-svn: 373303
2019-10-01 07:10:09 +00:00
Craig Topper 5951e3f813 [X86] Remove some redundant isel patterns. NFCI
These are all also implemented in avx512_logical_lowering_types
with support for masking.

llvm-svn: 373181
2019-09-30 06:47:03 +00:00
Craig Topper 494bfd9fed [X86] Enable isel to fold broadcast loads that have been bitcasted from FP into a vpternlog.
llvm-svn: 373157
2019-09-29 01:24:33 +00:00
Craig Topper b6a2207ba2 [X86] Move bitselect matching to vpternlog into X86ISelDAGToDAG.cpp
This allows us to reduce the use count on the condition node before
the match. This enables load folding for that operand without
relying on the peephole pass. This will be improved on for
broadcast load folding in a subsequent commit.

This still requires a bunch of isel patterns for vXi16/vXi8 types
though.

llvm-svn: 373156
2019-09-29 01:24:29 +00:00
Craig Topper 6195ed8397 [X86] Match (or (and A, B), (andn (A, C))) to VPTERNLOG with AVX512.
This uses a similar isel pattern as we used for vpcmov with XOP.

llvm-svn: 373154
2019-09-29 01:24:16 +00:00
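For the bitselect pattern above, the VPTERNLOG immediate is just the truth table of the three-input function. The sketch below assumes the usual imm8 indexing, where the bit index is formed as (A << 2) | (B << 1) | C; the helper names are hypothetical, and the operand order chosen by the actual isel pattern may differ.

```python
def ternlog_imm(f):
    """Build the 8-bit truth table for a 3-input bitwise function f(a, b, c)."""
    imm = 0
    for idx in range(8):
        a, b, c = (idx >> 2) & 1, (idx >> 1) & 1, idx & 1
        if f(a, b, c) & 1:
            imm |= 1 << idx
    return imm

def bitselect(a, b, c):
    # (or (and A, B), (andn A, C)): pick B where A is set, else C.
    return (a & b) | (~a & c)

print(hex(ternlog_imm(bitselect)))  # 0xca
```

Under this operand order the bitselect immediate is 0xCA; permuting the operands permutes the truth table and gives a different immediate.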
Craig Topper 3912ecb649 [X86] Remove CodeGenOnly instructions added in r373021, but keep the isel patterns and add COPY_TO_REGCLASS to them.
llvm-svn: 373031
2019-09-26 23:22:15 +00:00
Craig Topper c898724974 [X86] Add CodeGenOnly instructions for (f32 (X86selects $mask, (loadf32 addr), fp32imm0) to use masked MOVSS from memory.
Similar for f64 and having a non-zero passthru value.

We were previously not trying to fold the load at all. Using
a CodeGenOnly instruction allows us to use FR32X/FR64X as the
register class to avoid a bunch of COPY_TO_REGCLASS.

llvm-svn: 373021
2019-09-26 22:23:09 +00:00
Craig Topper ee78e44126 [X86] Mark the EVEX encoded PSADBW instructions as commutable to enable load folding of the other operand.
The SSE and VEX versions are already correct.

llvm-svn: 372941
2019-09-26 04:42:58 +00:00
Craig Topper c4802de31b [X86] Fix some VCVTPS2PH isel patterns where 'i32' was used instead of 'timm'
This seems to have completely omitted any check for the opcode
of the operand in the isel table.

llvm-svn: 372526
2019-09-22 20:08:57 +00:00
Craig Topper 80fda375b2 [X86][TableGen] Allow timm to appear in output patterns. Use it to remove ConvertToTarget opcodes from the X86 isel table.
We're now using a lot more TargetConstant nodes in SelectionDAG.
But we were still telling isel to convert some of them
to TargetConstants even though they already are. This is because
isel emits a conversion anytime the output pattern has an 'imm'.
I guess for patterns in instructions we take the 'timm' from the
'set' pattern, but for Pat patterns with explicit output we
previously had to say 'imm' since 'timm' wasn't allowed in outputs.

llvm-svn: 372525
2019-09-22 19:49:39 +00:00
Craig Topper a1d86857ff [X86] Update commutable EVEX vcmp patterns to use timm instead of imm.
We need to match TargetConstant, not Constant. This was broken
in r372338, but we lacked test coverage.

llvm-svn: 372523
2019-09-22 19:06:13 +00:00
Craig Topper 04682939eb [X86] Use sse_load_f32/f64 and timm in patterns for memory form of vgetmantss/sd.
Previously we only matched scalar_to_vector and scalar load, but
we should be able to narrow a vector load or match vzload.

Also need to match TargetConstant instead of Constant. The register
patterns were previously updated, but not the memory patterns.

llvm-svn: 372458
2019-09-21 06:44:29 +00:00
Matt Arsenault 3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg 13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to clean up target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, similar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to use "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
Matt Arsenault d8399d12cd GlobalISel: Don't materialize immarg arguments to intrinsics
Encode them directly as an imm argument to G_INTRINSIC*.

Since intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.

This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.

SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.

Most of the work here is to clean up target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.

The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, similar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.

This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to use "imm" in an output with a "timm" pattern source.

llvm-svn: 372285
2019-09-19 01:33:14 +00:00
Craig Topper 769dd59a27 [X86] Allow masked VBROADCAST instructions to be turned into BLENDM with a broadcast load to avoid a copy.
The BLENDM instructions allow 2 sources and an independent
destination while masked VBROADCAST has the destination tied
to the source.

llvm-svn: 372068
2019-09-17 04:41:10 +00:00
Craig Topper 359918dadf [X86] Enable commuting of EVEX VCMP for all immediate values during isel.
llvm-svn: 372065
2019-09-17 04:40:58 +00:00
Matt Arsenault b366329a34 DAG/GlobalISel: Correct type profile of bitcount ops
The result integer does not need to be the same width as the input.
AMDGPU, NVPTX, and Hexagon all have patterns working around the types
matching. GlobalISel defines these as being different type indexes.

llvm-svn: 371797
2019-09-13 00:11:14 +00:00
Philip Reames 0b4d67ca35 Rename nonvolatile_load/store to simple_load/store [NFC]
Implement the TODO from D66318.

llvm-svn: 371789
2019-09-12 23:03:39 +00:00
Craig Topper 72624b0e59 [X86] Use xorps to create fp128 +0.0 constants.
This matches what we do for f32/f64. gcc also does this for fp128.

llvm-svn: 371357
2019-09-09 01:35:00 +00:00
Craig Topper 9c11901256 [X86] Remove call to getZeroVector from materializeVectorConstant. Add isel patterns for zero vectors with all types.
The change to avx512-vec-cmp.ll is a regression, but should be
easy to fix. It occurs because the getZeroVector call was
canonicalizing both sides to the same node, then SimplifySelect
was able to simplify it. But since we only called getZeroVector
on some VTs, this isn't a robust way to combine this.

The change to vector-shuffle-combining-ssse3.ll is more
instructions, but removes a constant pool load so it's unclear
if it's a regression or not.

llvm-svn: 371350
2019-09-08 20:56:05 +00:00