llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	a7e45ea30d	[X86] Add memory operand to AESENC/AESDEC Key Locker instructions. This removes FIXMEs from selectAddr.	2020-10-03 21:42:16 -07:00
Craig Topper	39fc4a0b0a	[X86] Move ENCODEKEY128/256 handling from lowering to selection. We should avoid emitting MachineSDNodes from lowering. We can use the the implicit def handling in InstrEmitter to avoid manually copying from each xmm result register. We only need to manually emit the copies for the implicit uses.	2020-10-03 18:44:53 -07:00
Craig Topper	adccc0bfa3	[X86] Add X86ISD opcodes for the Key Locker AESENCKL and AESDECKL instructions Instead of emitting MachineSDNodes during lowering, emit X86ISD opcodes. These opcodes will either be selected by tablegen patterns or custom selection code. Emitting MachineSDNodes during lowering is uncommon so this makes things more consistent. It also allows selectAddr to be called to perform address matching during instruction selection. I had trouble getting tablegen to accept XMM0-XMM7 as results in an isel pattern for the WIDE instructions so I had to use custom instruction selection.	2020-10-03 16:55:19 -07:00
Craig Topper	46673763fe	[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract Fixes PR47482	2020-09-14 16:59:04 -07:00
Craig Topper	da1aaa0b70	Revert "[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract." I got the bug number wrong. This reverts commit `3251593890`.	2020-09-14 16:58:57 -07:00
Craig Topper	3251593890	[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract. Fixes PR47525	2020-09-14 16:28:37 -07:00
Simon Pilgrim	23d9f4b958	[X86] Fix llvm-qualified-auto warning by using auto*. NFC.	2020-09-03 14:21:17 +01:00
Craig Topper	ab7151f1cf	[X86] Make PreprocessISelDAG create X86ISD::VRNDSCALE nodes with i32 constants instead of i8. This is the type declared in X86InstrFragmentsSIMD.td. ISel pattern matching doesn't check so it doesn't matter in practice. Maybe for SelectionDAG CSE it would matter.	2020-08-17 17:25:51 -07:00
Craig Topper	9201efb3b9	[X86] Custom match X86ISD::VPTERNLOG in X86ISelDAGToDAG in order to reduce isel patterns. By factoring out the end of tryVPTERNLOG, we can use the same code to directly match X86ISD::VPTERNLOG. This allows us to remove around 3-4K worth of X86GenDAGISel.inc.	2020-08-10 23:15:58 -07:00
Wang, Pengfei	9512525947	[X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics. When we use mask compare intrinsics under strict FP option, the masked elements shouldn't raise any exception. So, we cann't replace the intrinsic with a full compare + "and" operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D85385	2020-08-11 10:28:41 +08:00
Craig Topper	966a58e329	[X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP.	2020-08-08 13:11:47 -07:00
Craig Topper	75f134eec1	[X86] Refactor the broadcast and load folding in tryVPTESTM to reduce some code. Now we try to load and broadcast together for operand 1. Followed by load and broadcast for operand 1. Previously we tried load operand 1, load operand 1, broadcast operand 0, broadcast operand 1. Now we have a single helper that tries load and broadcast for one operand that we can just call twice.	2020-07-31 23:57:13 -07:00
Craig Topper	1bd7046e4c	[X86] Use TargetLowering::getRegClassFor to simplify some code in tryVPTESTM. NFCI	2020-07-31 21:39:10 -07:00
Craig Topper	93c678a79b	[X86] Simplify vpternlog immediate selection. Rather than hardcoding immediate values for 12 different combinations in a nested pair of switches, we can perform the matched logic operation on 3 magic constants to calculate the immediate. Special thanks to this tweet https://twitter.com/rygorous/status/1187034321992871936 for making me realize I could do this.	2020-07-31 17:16:27 -07:00
Craig Topper	c4823b24a4	[X86] Add custom lowering for llvm.roundeven with sse4.1. We can use the roundss/sd/ps/pd instructions like we do for ceil/floor/trunc/rint/nearbyint. Differential Revision: https://reviews.llvm.org/D84592	2020-07-29 10:23:08 -07:00
Craig Topper	df12524e6b	[X86] Turn X86DAGToDAGISel::tryVPTERNLOG into a fully custom instruction selector that can handle bitcasts between logic ops Previously we just matched the logic ops and replaced with an X86ISD::VPTERNLOG node that we would send through the normal pattern match. But that approach couldn't handle a bitcast between the logic ops. Extending that approach would require us to peek through the bitcasts and emit new bitcasts to match the types. Those new bitcasts would then have to be properly topologically sorted. This patch instead switches to directly emitting the MachineSDNode and skips the normal tablegen pattern matching. We do have to handle load folding and broadcast load folding ourselves now. Which also means commuting the immediate control. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D83630	2020-07-26 12:19:08 -07:00
Eric Christopher	e958379581	Fold the opt size check into the assert to silence an unused variable warning.	2020-07-13 16:05:24 -07:00
Hiroshi Yamauchi	fb558ccae7	[PGO][PGSO] Add profile guided size optimization to X86ISelDAGToDAG. Differential Revision: https://reviews.llvm.org/D83331	2020-07-13 10:28:09 -07:00
Xiang1 Zhang	939d8309db	[X86-64] Support Intel AMX Intrinsic INTEL ADVANCED MATRIX EXTENSIONS (AMX). AMX is a new programming paradigm, it has a set of 2-dimensional registers (TILES) representing sub-arrays from a larger 2-dimensional memory image and operate on TILES. These intrinsics use direct TMM register number as its params. Spec can be found in Chapter 3 here https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D83111	2020-07-07 10:13:40 +08:00
Craig Topper	e75f2d5a8c	[X86] Add matching support for X86ISD::ANDNP to X86DAGToDAGISel::tryVPTERNLOG.	2020-07-03 17:50:35 -07:00
Craig Topper	52855ed099	[X86] Add back support for matching VPTERNLOG from back to back logic ops. I think this mostly looks ok. The only weird thing I noticed was a couple rotate vXi8 tests picked up an extra logic op where we have (and (or (and), (andn)), X). Previously we matched the (or (and), (andn)) to vpternlog, but now we match the (and (or), X) and leave the and/andn unmatched.	2020-07-02 22:11:52 -07:00
Craig Topper	1df1186ab1	[X86] Use some preprocessor macros to reduce the very similar repeated code in getVPTESTMOpc. NFCI This function picks X86 opcode name based on type, masking, and whether not a load or broadcast has been folded using multiple switch statements. The contents of the switches mostly just vary in a few characters in the instruction name. So use some macros to build the instruction names to reduce the repetiveness.	2020-06-30 14:38:22 -07:00
Guillaume Chatelet	a976ea3209	[Alignment][NFC] Migrate PPC, X86 and XCore backends to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82779	2020-06-30 08:08:45 +00:00
Craig Topper	d72cb4ce21	Recommit "[X86] Separate imm from relocImm handling." Fix the copy/paste mistake that caused it to fail previously	2020-06-15 10:59:43 -07:00
Hans Wennborg	f47a776628	Revert "[X86] Separate imm from relocImm handling." > relocImm was a complexPattern that handled both ConstantSDNode > and X86Wrapper. But it was only applied selectively because using > it would cause patterns to be not importable into FastISel or > GlobalISel. So it only got applied to flag setting instructions, > stores, RMW arithmetic instructions, and rotates. > > Most of the test changes are a result of making patterns available > to GlobalISel or FastISel. The absolute-cmp.ll change is due to > this fixing a pattern ordering issue to make an absolute symbol > match to an 8-bit immediate before trying a 32-bit immediate. > > I tried to use PatFrags to reduce the repetition, but I was getting > errors from TableGen. This caused "Invalid EmitNode" assertions, see the llvm-commits thread for discussion.	2020-06-15 16:14:59 +02:00
Craig Topper	8885a7640b	[X86] Separate imm from relocImm handling. relocImm was a complexPattern that handled both ConstantSDNode and X86Wrapper. But it was only applied selectively because using it would cause patterns to be not importable into FastISel or GlobalISel. So it only got applied to flag setting instructions, stores, RMW arithmetic instructions, and rotates. Most of the test changes are a result of making patterns available to GlobalISel or FastISel. The absolute-cmp.ll change is due to this fixing a pattern ordering issue to make an absolute symbol match to an 8-bit immediate before trying a 32-bit immediate. I tried to use PatFrags to reduce the repetition, but I was getting errors from TableGen.	2020-06-13 11:29:28 -07:00
Craig Topper	1385ab356a	[X86] Use X86AS enum constants to replace hardcoded numbers in more places. NFC	2020-06-10 22:31:21 -07:00
Craig Topper	324e13668e	[X86] Split imm handling out of selectMOV64Imm32 and add a separate isel pattern. This makes the pattern available to global isel.	2020-06-10 11:12:36 -07:00
Sanjay Patel	6f6d2d2383	[x86] refine conditions for immediate hoisting to save code-size As shown in PR46237: https://bugs.llvm.org/show_bug.cgi?id=46237 The size-savings win for hoisting an 8-bit ALU immediate (intentionally excluding store constants) requires extreme conditions; it may not even be possible when including REX prefix bytes on x86-64. I did draft a version of this patch that included use counts after the loop, but I suspect that accounting is not working as expected. I think that is because the number of constant uses are changing as we select instructions (for example as we transform shl/add into LEA). Differential Revision: https://reviews.llvm.org/D81468	2020-06-09 15:44:55 -04:00
Guozhi Wei	587af86f1d	[X86] Add a flag to guard the wide load As shown in http://lists.llvm.org/pipermail/llvm-dev/2020-May/141854.html, widen load can also cause stall. Add a flag to guard the widening code, so users can disable it and evaluate its performance impact. Differential Revision: https://reviews.llvm.org/D80943	2020-06-02 16:16:13 -07:00
Craig Topper	961c1b5f72	[X86] Remove DeleteNode calls from PreprocessISelDAG. Rely on the RemoveDeadNodes call at the end. Add a MadeChange flag so we don't call RemoveDeadNodes unless something changed.	2020-06-02 14:10:20 -07:00
Craig Topper	07e8a780d8	[X86] Add pseudo instructions to use MULX with a single destination when the low result isn't used. The instruction is defined to only produce high result if both destinations are the same. We can exploit this to avoid unnecessarily clobbering a register. In order to hide this from register allocation we use a pseudo instruction and expand the result during MCInst creation. Differential Revision: https://reviews.llvm.org/D80500	2020-05-30 16:01:01 -07:00
Craig Topper	8e7e6a8d6b	[X86] Restore selection of MULX on BMI2 targets. Looking back over gcc and icc behavior it looks like icc does use mulx32 on 32-bit targets and mulx64 on 64-bit targets. It's also used when dividing i32 by constant on 32-bit targets and i64 by constant on 64-bit targets. gcc uses it multiplies producing a 64 bit result on 32-bit targets and 128-bit results on a 64-bit target. gcc does not appear to use it for division by constant. After this patch clang is closer to the icc behavior. This basically reverts `d1c61861dd`, but there were no strong feelings at the time. Fixes PR45518. Differential Revision: https://reviews.llvm.org/D80498	2020-05-27 12:01:18 -07:00
Craig Topper	7940123084	[X86] Fix typo in comment. NFC	2020-05-24 00:29:24 -07:00
Arthur Eubanks	8a88755610	Reland [X86] Codegen for preallocated See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer fo the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). Aside from the tests added here, I checked that this codegen produces correct code for something like ``` struct A { A(); A(A&&); ~A(); }; void bar() { foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8); } ``` by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689	2020-05-20 11:25:44 -07:00
Arthur Eubanks	b8cbff51d3	Revert "[X86] Codegen for preallocated" This reverts commit `810567dc69`. Some tests are unexpectedly passing	2020-05-20 10:04:55 -07:00
Arthur Eubanks	810567dc69	[X86] Codegen for preallocated See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer fo the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). Aside from the tests added here, I checked that this codegen produces correct code for something like ``` struct A { A(); A(A&&); ~A(); }; void bar() { foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8); } ``` by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689	2020-05-20 09:20:38 -07:00
Jean-Michel Gorius	cd12e79e6d	[x86] Propagate memory operands during ISel DAG postprocessing Summary: Propagate memory operands when folding test instructions. This was split from D80062. Reviewers: craig.topper, rnk, lebedev.ri Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80140	2020-05-18 21:35:31 +02:00
Craig Topper	135b877874	[X86] Replace selectScalarSSELoad ComplexPattern with PatFrags to handle the 3 types of loads we currently match. This ensures we create mem operands for these instructions fixing PR45949. Unfortunately, it increases the size of X86GenDAGISel.inc, but some dag combine canonicalization could reduce the types of load we need to match.	2020-05-16 14:30:45 -07:00
Craig Topper	bebdc62c3f	[SelectionDAG] Remove ConstantPoolSDNode::getAlignment. Use getAlign instead. Differential Revision: https://reviews.llvm.org/D79459	2020-05-08 16:04:11 -07:00
Craig Topper	d1119980e5	[SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode. This patch stores the alignment for ConstantPoolSDNode as an Align and updates the getConstantPool interface to take a MaybeAlign. Removing getAlignment() will be done as a follow up. Differential Revision: https://reviews.llvm.org/D79436	2020-05-08 16:04:11 -07:00
Simon Pilgrim	33f043cc9f	X86ISelDAGToDAG.cpp - remove unnecessary includes. NFC. The X86 specific headers have to include these so we don't need to duplicate.	2020-04-26 14:50:53 +01:00
Craig Topper	8dfb9627b7	[X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead. This moves v32i16/v64i8 to a model consistent with how we treat integer types with avx1. This does change the ABI for types vXi16/vXi8 vectors larger than 512 bits to pass in multiple zmms instead of multiple ymms. We'd already hacked some code to make v64i8/v32i16 pass in zmm. Cost model is still a bit of a mess. In some place I tried to match existing behavior. But really we need to account for splitting and concating costs. Cost model for shuffles is especially pessimistic. Differential Revision: https://reviews.llvm.org/D76212	2020-04-15 12:17:18 -07:00
Craig Topper	d1da1b53ff	[X86] Cleanup ISD::BRIND handling code in X86DAGToDAGISel::Select. NFC -Drop llvm:: on MVT::i32 -Use getValueType instead of getSimpleValueType for an equality check just cause its shorter and doesn't matter. -Don't create a const SDValue & since its cheap to copy. -Remove explicit case from MVT enum to EVT. -Add message to assert.	2020-04-11 15:01:05 -07:00
Craig Topper	21a7d08e72	[X86] Move code that replaces ISD::VSELECT with X86ISD::BLENDV from X86DAGToDAGISel::Select to PreprocessISelDAG	2020-04-11 15:01:05 -07:00
Scott Constable	71e8021d82	[X86][NFC] Generalize the naming of "Retpoline Thunks" and related code to "Indirect Thunks" There are applications for indirect call/branch thunks other than retpoline for Spectre v2, e.g., https://software.intel.com/security-software-guidance/software-guidance/load-value-injection Therefore it makes sense to refactor X86RetpolineThunks as a more general capability. Differential Revision: https://reviews.llvm.org/D76810	2020-04-02 21:55:13 -07:00
Guillaume Chatelet	c7468c1696	[Alignment][NFC] Use Align in SelectionDAG::getMemIntrinsicNode Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, nemanjai, hiraditya, kbarton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77149	2020-04-01 09:32:05 +00:00
Craig Topper	cdd1cd7120	[X86] Don't form masked instructions if the operation has an additional user. This will cause the operation to be repeated in both a mask and another masked or unmasked form. This can a wasted of execution resources. Differential Revision: https://reviews.llvm.org/D60940	2020-03-27 10:44:22 -07:00
Simon Pilgrim	e20e6f26fa	Fix shadow variable warning. NFC.	2020-03-02 18:53:19 +00:00
Simon Pilgrim	2b624e04c7	Fix 'unsigned variable can never be negative' cppcheck warning. NFCI.	2020-03-02 18:53:18 +00:00

1 2 3 4 5 ...

992 Commits