Commit Graph

164 Commits

Author SHA1 Message Date
Eric Astor a57fdcdd40 x87 FPU state instructions do not use an f32 memory location
These instructions actually use a 512-byte location, where bytes 464-511 are ignored.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86942
2020-09-01 13:50:07 -04:00
Craig Topper 31c40f2d6b [X86] Add mayLoad/mayStore flags to some X87 instructions that don't have isel patterns to infer them from.
Should remove part of the differences in D81833 due to some
some of these getting isel patterns.
2020-06-23 23:40:30 -07:00
Craig Topper 269d843720 [X86] Replace TB with PS on instructions that are documented in the SDM with 'NP'
'NP' means that the instruction is not recognized with a 66, F2 or F3
prefix. It will either #UD or decode to a different instruction.

All of the cases are here should fall into the #UD variety since
we should be detecting the collision with other instructions when
we build the disassembler tables.
2020-06-11 12:20:29 -07:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Craig Topper 4175d7e22e [X86] Custom isel floating point X86ISD::CMP on pre-CMOV targets. Eliminate ConvertCmpIfNecessary
If we don't have cmov, X87 compares write to FPSW and we need to
move the bits to EFLAGS to use as JCC/SETCC/CMOV conditions.

Previously this was done by calling ConvertCmpIfNecessary in
multiple places which would emit the extra code for the FNSTSW,
a shift, a truncate, and a SAHF instructions. Isel would then
select trunc+X86ISD::CMP to a FUCOM instruction that produces FPSW.

This patch centralizes all of the handling into a single custom
isel handler. This allows us to remove ConvertCmpIfNecessary and
a couple target specific ISD opcodes.

Differential Revision: https://reviews.llvm.org/D73863
2020-02-06 10:43:06 -08:00
Craig Topper 016f42e3dc [X86] Add custom lowering for lrint/llrint to either cvtss2si/cvtsd2si or fist.
lrint/llrint are defined as rounding using the current rounding
mode. Numbers that can't be converted raise FE_INVALID and an
implementation defined value is returned. They may also write to
errno.

I believe this means we can use cvtss2si/cvtsd2si or fist to
convert as long as -fno-math-errno is passed on the command line.
Clang will leave them as libcalls if errno is enabled so they
won't become ISD::LRINT/LLRINT in SelectionDAG.

For 64-bit results on a 32-bit target we can't use cvtss2si/cvtsd2si
but we can use fist since it can write to a 64-bit memory location.
Though maybe we could consider using vcvtps2qq/vcvtpd2qq on avx512dq
targets?

gcc also does this optimization.

I think we might be able to do this with STRICT_LRINT/LLRINT as
well, but I've left that for future work.

Differential Revision: https://reviews.llvm.org/D73859
2020-02-04 16:15:40 -08:00
Craig Topper 028579b51e [X86] FUCOMI/FCOMI instructions should Def FPSW not FPCW.
These instructions can set the exception in FPSW. But I
don't think they can change FPCW. So this looks like a typo.

Differential Revision: https://reviews.llvm.org/D73864
2020-02-03 07:39:00 -08:00
Craig Topper 5fa2022ec0 [X86] Remove X86ISD::FILD_FLAG and stop gluing nodes together.
Summary:
I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added.

I had to change which type we use for FILD in BuildFILD when X86 was enabled because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other other places.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72805
2020-01-18 23:44:05 -06:00
Liu, Chen3 8fdafb7dce Insert wait instruction after X87 instructions which could raise
float-point exception.

This patch also modify some mayRaiseFPException flag which set in D68854.

Differential Revision: https://reviews.llvm.org/D72750
2020-01-16 12:12:51 +08:00
Craig Topper 52aaf4a275 [X86] Use SDNPOptInGlue instead of SDNPInGlue on a couple SDNodes.
At least one of these is used without a Glue. This doesn't seem
to change the X86GenDAGISel.inc output so maybe it doesn't matter?
2020-01-12 21:11:18 -08:00
Craig Topper 5fe5c0a60f [X86] Preserve fpexcept property when turning strict_fp_extend and strict_fp_round into stack operations.
We use the stack for X87 fp_round and for moving from SSE f32/f64 to
X87 f64/f80. Or from X87 f64/f80 to SSE f32/f64.

Note for the SSE<->X87 conversions the conversion always happens in the
X87 domain. The load/store ops in the X87 instructions are able
to signal exceptions.
2020-01-10 23:41:06 -08:00
Wang, Pengfei 21bc8631fe [FPEnv][X86] Constrained FCmp intrinsics enabling on X86
Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision.

Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70582
2019-12-11 08:23:09 +08:00
Craig Topper cfce8f2cfb [X86] Add strict fp support for operations of X87 instructions
This is the following patch of D68854.

This patch adds basic operations of X87 instructions, including +, -, *, / , fp extensions and fp truncations.

Patch by Chen Liu(LiuChen3)

Differential Revision: https://reviews.llvm.org/D68857
2019-11-26 10:59:41 -08:00
Pengfei Wang af3a7de20c [X86] add mayRaiseFPException flag and FPCW registers for X87 instructions
Summary:
This patch adds flag "mayRaiseFPException"  , FPCW and FPSW for X87 instructions which could raise
float exception.

Reviewers: pengfei, RKSimon, andrew.w.kaylor, uweigand, kpn, spatel, cameron.mcinally, craig.topper

Reviewed By: craig.topper

Subscribers: thakis, hiraditya, llvm-commits

Patch by LiuChen.

Differential Revision: https://reviews.llvm.org/D68854
2019-11-01 21:12:43 -07:00
Nico Weber a5bf48b84c Revert "[X86] add mayRaiseFPException flag and FPCW registers for X87 instructions"
This reverts commit a678677da4.
It broke CodeGen/ms-inline-asm.c on most bots.
2019-10-31 19:14:42 -04:00
Craig Topper a678677da4 [X86] add mayRaiseFPException flag and FPCW registers for X87 instructions
This patch adds flag "mayRaiseFPException" , FPCW and FPSW for X87 instructions which could raise
float exception.

Patch by LiuChen. With a couple small fixes from me.

Differential Revision: https://reviews.llvm.org/D68854
2019-10-31 15:05:29 -07:00
Craig Topper a0aef63208 [X86] Remove FSIN/FCOS isel patterns and the pseudo instructions that they selected for the FP stackifier.
We always expand these to libcalls so get rid of the last vestiges
of using the instructions.
2019-10-31 13:42:01 -07:00
Craig Topper f7e548c076 Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2"
With correct test checks this time.

If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integ

This matches what gcc and icc do for this case and removes an existing FIXME.

llvm-svn: 358214
2019-04-11 19:19:42 +00:00
Craig Topper 8200880c9a Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2"
I seem to have messed up the test checks.

llvm-svn: 358212
2019-04-11 19:04:38 +00:00
Craig Topper 1c2dfc3100 [X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2
If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness.

This matches what gcc and icc do for this case and removes an existing FIXME.

Differential Revision: https://reviews.llvm.org/D60156

llvm-svn: 358211
2019-04-11 18:40:21 +00:00
Craig Topper 2b34fdc67f [X86] Remove hasSideEffects=1 from the X87 pseudos with folded load.
This was done in r321424 to prevent scheduling from reordering things. But now that we model FPCW as a dependency, I don't think the same scheduling we were trying to prevent can occur.

llvm-svn: 354628
2019-02-21 22:00:15 +00:00
Craig Topper 8eade09249 [X86] Mark FP32_TO_INT16_IN_MEM/FP32_TO_INT32_IN_MEM/FP32_TO_INT64_IN_MEM as clobbering EFLAGS to prevent mis-scheduling during conversion from SelectionDAG to MIR.
After r354178, these instruction expand to a sequence that uses an OR instruction. That OR clobbers EFLAGS so we need to state that to avoid accidentally using the clobbered flags.

Our tests show the bug, but I didn't notice because the SETcc instructions didn't move after r354178 since it used to be safe to do the fp->int conversion first.

We should probably convert this whole sequence to SelectionDAG instead of a custom inserter to avoid mistakes like this.

Fixes PR40779

llvm-svn: 354395
2019-02-19 22:37:00 +00:00
Craig Topper 7670ede434 [X86] Collapse FP_TO_INT16_IN_MEM/FP_TO_INT32_IN_MEM/FP_TO_INT64_IN_MEM into a single opcode using memory VT to distinquish. NFC
llvm-svn: 353798
2019-02-12 06:14:18 +00:00
Craig Topper d7303ecd0b [X86] Remove the value type operand from the floating point load/store MemIntrinsicSDNodes. Use the MemoryVT instead. NFCI
We already have the memory VT, we can just match from that during isel.

llvm-svn: 353797
2019-02-12 06:14:16 +00:00
Craig Topper 5b1beda001 [X86] Removed unused SDTypeProfile. NFC
llvm-svn: 353659
2019-02-11 07:30:48 +00:00
Craig Topper fcb63c4c6c [X86] Add FPCW as an implicit use on floating point load instructions.
These instructions can generate a stack overflow exception so technically they read the stack overflow exception mask bit.

llvm-svn: 353564
2019-02-08 20:50:09 +00:00
Craig Topper 41a1792b15 [X86] Remove isReMaterializable from X87 floating point constant loads and constant pool loads.
Summary: These instructions update FPSW so they aren't generically safe to rematerialize into any location if FPSW is live for a comparison result. They also use FPCW for exception masking control. Though the only exception they can generate is stack overflow and we manage the stack ourselves so that's not really going to occur.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57934

llvm-svn: 353536
2019-02-08 17:07:54 +00:00
Craig Topper c782f18835 [X86] Add FPCW as a register and start using it as an implicit use on floating point instructions.
Summary:
FPCW contains the rounding mode control which we manipulate to implement fp to integer conversion by changing the roudning mode, storing the value to the stack, and then changing the rounding mode back. Because we didn't model FPCW and its dependency chain, other instructions could be scheduled into the middle of the sequence.

This patch introduces the register and adds it as an implciit def of FLDCW and implicit use of the FP binary arithmetic instructions and store instructions. There are more instructions that need to be updated, but this is a good start. I believe this fixes at least the reduced test case from PR40529.

Reviewers: RKSimon, spatel, rnk, efriedma, andrew.w.kaylor

Subscribers: dim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57735

llvm-svn: 353489
2019-02-08 00:44:39 +00:00
Craig Topper bf7593ec4a [X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two arguments where on is %st.
All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written. And we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read.

This patch changes things to always print both operands making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior.

llvm-svn: 353061
2019-02-04 17:28:18 +00:00
Craig Topper 7a2944efe1 [X86] Print %st(0) as %st when its implicit to the instruction. Continue printing it as %st(0) when its encoded in the instruction.
This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior.

llvm-svn: 353015
2019-02-04 04:15:10 +00:00
Craig Topper f77b858dc3 Revert r352985 "[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly"
Looking into gcc and objdump behavior more this was overly aggressive. If the register is encoded in the instruction we should print %st(0), if its implicit we should print %st.

I'll be making a more directed change in a future patch.

llvm-svn: 353013
2019-02-04 04:15:02 +00:00
Craig Topper 5a570dd437 [X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly
Summary:
When calculating clobbers for MS style inline assembly we fail if the asm clobbers stack top because we print st(0) and try to pass it through the gcc register name check. This was found with when I attempted to make a emms/femms clobber all ST registers. If you use emms/femms in MS inline asm we would try to use st(0) as the clobber name but clang would think that wasn't a valid clobber name.

This also matches what objdump disassembly prints. It's also what is printed by gcc -S.

Reviewers: RKSimon, rnk, efriedma, spatel, andreadb, lebedev.ri

Reviewed By: rnk

Subscribers: eraman, gbedwell, lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D57621

llvm-svn: 352985
2019-02-03 07:53:39 +00:00
Craig Topper 9dfe9b086e [X86] Add FPSW as a Def on some FP instructions that were missing it.
llvm-svn: 352607
2019-01-30 07:08:44 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Craig Topper c73095e264 [X86] Mark the FUCOMI instructions as requiring CMOV to be enabled. NFCI
These instructions were added on the PentiumPro along with CMOV.

This was already comprehended by the lowering process which should emit an alternate sequence using FCOM and FNSTW. This just makes it an explicit error if that doesn't work for some reason.

llvm-svn: 340844
2018-08-28 17:17:13 +00:00
Clement Courbet 2e41c5a79c [X86] Introduce WriteFLDC for x87 constant loads.
Summary:
{FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded.

 - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
 - For ZnVer1 and Atom, values were transferred form InstRWs.
 - For SLM and BtVer2, I've guessed some values :(

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D47585

llvm-svn: 333656
2018-05-31 14:22:01 +00:00
Clement Courbet b78ab5097d [X86] Extract latency of fldz/fld1 in separate classes.
Summary:
 - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
 - For ZnVer1 and Atom, values were transferred form `InstRW`s.
 - For SLM and BtVer2, values are from Agner.

This is split off from https://reviews.llvm.org/D47377

Reviewers: RKSimon, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D47523

llvm-svn: 333642
2018-05-31 11:41:27 +00:00
Simon Pilgrim 6e160c1813 [X86] Add WriteFCMOV scheduler class for x87 CMOVs
llvm-svn: 332173
2018-05-12 18:07:07 +00:00
Simon Pilgrim f3ae50fca2 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.

WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.

This removes all InstrRW overrides for these instructions.

NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.

NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629
2018-05-07 11:50:44 +00:00
Craig Topper 33dc01d105 [X86] Remove 'opaque ptr' from the intel syntax parser and printer.
Previously for instructions like fxsave we would print "opaque ptr" as part of the memory operand. Now we print nothing.

We also no longer accept "opaque ptr" in the parser. We still accept any size to be specified for these instructions, but we may want to consider only parsing when no explicit size is specified. This what gas does.

llvm-svn: 331243
2018-05-01 04:42:00 +00:00
Simon Pilgrim d14d2e7b18 [X86] Add WriteFSign/WriteFLogic scheduler classes
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.

This unearthed a couple of things that are also handled in this patch:

(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.

Differential Revision: https://reviews.llvm.org/D45629

llvm-svn: 330480
2018-04-20 21:16:05 +00:00
Simon Pilgrim 86e3c26924 [X86] Add FP comparison scheduler classes
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd

Differential Revision: https://reviews.llvm.org/D45656

llvm-svn: 330179
2018-04-17 07:22:44 +00:00
Simon Pilgrim 32d368147f [X86] Remove X87 schedule itineraries (PR37093)
First of a number of commits to remove x86 schedule itineraries entirely - approved off-line with @craig.topper

llvm-svn: 329893
2018-04-12 10:27:37 +00:00
Sanjay Patel 05daae75ad [x86] put nops into the WriteNop class and customize for Jaguar
1. Given that we already have a classification bucket with 'nop' in the name, 
   that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'.
2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably 
   llvm-mca) how to model the resource requirements better even though a nop has no 
   dependencies.

Differential Revision: https://reviews.llvm.org/D44608

llvm-svn: 327853
2018-03-19 14:26:50 +00:00
Simon Pilgrim e0434fad16 [X86][X87] Mark pseudo memory fold instructions as load/sideeffects (PR21160, PR34080, PR34454).
Match regular x87 memory fold instructions with load/sideeffects tags, to prevent the schedulers from re-ordering them across the fnstcw/fldcw sequences for truncating stores while they are still pseudo during the stack conversion pass.

llvm-svn: 321424
2017-12-24 12:20:21 +00:00
Craig Topper a16395008c [X86] Fix XSAVE64 and similar instructions to not be allowed by the assembler in 32-bit mode.
There was a top level "let Predicates =" in the .td file that was overriding the Requires on each instruction.

I've added an assert to the code emitter to catch more cases like this. I'm sure this isn't the only place where the right predicates aren't being applied. This assert already found that we don't block btq/btsq/btrq in 32-bit mode.

llvm-svn: 320830
2017-12-15 17:22:58 +00:00
Simon Pilgrim f621dcf8d7 [X86][X87] Tag x87 load/store instructions scheduler classes
llvm-svn: 320192
2017-12-08 20:31:48 +00:00
Simon Pilgrim 6415f56c79 [X86][X87] Tag x87 float compare instructions scheduler classes
llvm-svn: 320189
2017-12-08 20:10:31 +00:00
Simon Pilgrim bd5f7455a2 [X86][X87] X87 math binop pseudo instructions don't need scheduling info
llvm-svn: 320044
2017-12-07 14:07:18 +00:00
Simon Pilgrim 65f805fe30 [X86][X87] Tag FCMOV instruction scheduler classes
llvm-svn: 319804
2017-12-05 18:01:26 +00:00