llvm-project

Commit Graph

Author	SHA1	Message	Date
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Nico Weber	432a38838d	IWYU for llvm-config.h in llvm, additions. See r331124 for how I made a list of files missing the include. I then ran this Python script: for f in open('filelist.txt'): f = f.strip() fl = open(f).readlines() found = False for i in xrange(len(fl)): p = '#include "llvm/' if not fl[i].startswith(p): continue if fl[i][len(p):] > 'Config': fl.insert(i, '#include "llvm/Config/llvm-config.h"\n') found = True break if not found: print 'not found', f else: open(f, 'w').write(''.join(fl)) and then looked through everything with `svn diff \| diffstat -l \| xargs -n 1000 gvim -p` and tried to fix include ordering and whatnot. No intended behavior change. llvm-svn: 331184	2018-04-30 14:59:11 +00:00
Craig Topper	d656410293	[X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask instrinsics are also used in the same basic block. Summary: Previously the flag intrinsics always used the index instructions even if a mask instruction also exists. To fix fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a s ingle node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction. Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs. I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions. Reviewers: chandlerc, RKSimon, spatel Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46202 llvm-svn: 331091	2018-04-27 22:15:33 +00:00
Craig Topper	7e42af87a6	[X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits. Prefer to use the 32-bit AND with immediate instead. Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold. Fixes PR37063. llvm-svn: 329667	2018-04-10 03:44:15 +00:00
Craig Topper	88e38e3e3e	[X86] Remove more dead code left over from the handling of i8/i16 UMUL_LOHI/SMUL_LOHI that is no longer needed. NFC llvm-svn: 329152	2018-04-04 07:00:16 +00:00
Craig Topper	afa22edcf0	[X86] Remove dead code for handling i8/i16 UMUL_LOHI/SMUL_LOHI from X86ISelDAGToDAG.cpp. NFC These are promoted to i16/i32 multiplies by a DAG combine. llvm-svn: 329147	2018-04-04 04:38:55 +00:00
Nirav Dave	8c5f47ac40	[DAG, X86] Fix ISel-time node insertion ids As in SystemZ backend, correctly propagate node ids when inserting new unselected nodes into the DAG during instruction Seleciton for X86 target. Fixes PR36865. Reviewers: jyknight, craig.topper Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D44797 llvm-svn: 328233	2018-03-22 19:32:07 +00:00
Craig Topper	ad7c685791	[X86] Rename MOVSX32_NOREXrr8 to MOVSX32rr8_NOREX so that the scheduler model regular expressions will pick it up with the regular version. Do the same for MOVSX32_NOREXrm8, MOVZX32_NOREXrr8, and MOVZX32_NOREXrm8 llvm-svn: 327948	2018-03-20 05:00:20 +00:00
Nirav Dave	3264c1bdf6	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo. llvm-svn: 327898	2018-03-19 20:19:46 +00:00
Nirav Dave	5f0ab71b62	Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"" as it times out building test-suite on PPC. llvm-svn: 327778	2018-03-17 19:24:54 +00:00
Nirav Dave	982d3a56ea	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal. llvm-svn: 327777	2018-03-17 17:42:10 +00:00
Craig Topper	25007c4f32	[X86] Pass SelectionDAG into X86ISelAddressMode::dump and on to SDNode::dump. This prevents a crash in SelectionDAGDumper with -debug when trying to print mem operands if one of the registers in the addressing mode comes from a load. llvm-svn: 327744	2018-03-16 21:10:07 +00:00
Craig Topper	e6913ec340	[X86] Post process the DAG after isel to remove vector moves that were added to zero upper bits. We previously avoided inserting these moves during isel in a few cases which is implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist. Especially with AVX512F without AVX512VL using 512 bit vectors to implement some 128/256 bit operations. Since isel is done bottoms up, we'd have to check the VT and opcode and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations. So instead of doing that, this patch adds a post processing step that detects when the moves are unnecesssary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG. So then we just need to ensure the input to the move isn't one. Differential Revision: https://reviews.llvm.org/D44289 llvm-svn: 327724	2018-03-16 17:13:42 +00:00
Nirav Dave	042678bd55	Revert: r327172 "Correct load-op-store cycle detection analysis" r327171 "Improve Dependency analysis when doing multi-node Instruction Selection" r328170 "[DAG] Enforce stricter NodeId invariant during Instruction selection" Reverting patch as NodeId invariant change is causing pathological increases in compile time on PPC llvm-svn: 327197	2018-03-10 02:16:15 +00:00
Nirav Dave	0fab41782d	Correct load-op-store cycle detection analysis Add missing cycle dependency checks in load-op-store fusion. Fixes PR36274. Reviewers: craig.topper, bogner Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43154 llvm-svn: 327172	2018-03-09 20:58:07 +00:00
Nirav Dave	d668f69ee7	Improve Dependency analysis when doing multi-node Instruction Selection Relanding after fixing NodeId Invariant. Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles / dependencies pruning the search when topological property of NodeId allows. As part of this propogate the NodeId-based cutoffs to narrow hasPreprocessorHelper searches. Reviewers: craig.topper, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41293 llvm-svn: 327171	2018-03-09 20:57:42 +00:00
Nirav Dave	071699bf82	[DAG] Enforce stricter NodeId invariant during Instruction selection Instruction Selection makes use of the topological ordering of nodes by node id (a node's operands have smaller node id than it) when doing cycle detection. During selection we may violate this property as a selection of multiple nodes may induce a use dependence (and thus a node id restriction) between two unrelated nodes. If a selected node has an unselected successor this may allow us to miss a cycle in detection an invalid selection. This patch fixes this by marking all unselected successors of a selected node have negated node id. We avoid pruning on such negative ids but still can reconstruct the original id for pruning. In-tree targets have been updated to replace DAG-level replacements with ISel-level ones which enforce this property. This preemptively fixes PR36312 before triggering commit r324359 relands Reviewers: craig.topper, bogner, jyknight Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43198 llvm-svn: 327170	2018-03-09 20:57:15 +00:00
Craig Topper	48d5ed265c	[X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of return. An extract_element where the result type is larger than the scalar element type is semantically an any_extend of from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32 we need to mae sure those zeroes are explicit in the DAG. For these cases the best way to accomplish this is use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1(for i32) or v8i1(for i8). Then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smarter enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register. llvm-svn: 326308	2018-02-28 08:14:28 +00:00
Chandler Carruth	a1d6107b14	[DAG, X86] Revert r324797, r324491, and r324359. Sadly, r324359 caused at least PR36312. There is a patch out for review but it seems to be taking a bit and we've already had these crashers in tree for too long. We're hitting this PR in real code now and are blocked on shipping new compilers as a consequence so I'm reverting us back to green. Sorry for the churn due to the stacked changes that I had to revert. =/ llvm-svn: 325420	2018-02-17 02:26:25 +00:00
Craig Topper	2b2d8c5eb2	[X86] Use btc/btr/bts to implement xor/and/or that affects a single bit in the upper 32-bits of a 64-bit operation. We can't fold a large immediate into a 64-bit operation. But if we know we're only operating on a single bit we can use the bit instructions. For now only do this for optsize. Differential Revision: https://reviews.llvm.org/D37418 llvm-svn: 325287	2018-02-15 19:57:35 +00:00
Craig Topper	88939fefe8	[X86] Simplify X86DAGToDAGISel::matchBEXTRFromAnd by creating an X86ISD::BEXTR node and calling Select. Add isel patterns to recognize this node. This removes a bunch of special case code for selecting the immediate and folding loads. llvm-svn: 324939	2018-02-12 21:18:11 +00:00
Craig Topper	b424fafa9f	[X86] Don't look for TEST instruction shrinking opportunities when the root node is a X86ISD::SUB. I don't believe we ever create an X86ISD::SUB with a 0 constant which is what the TEST handling needs. The ternary operator at the end of this code shows up as only going one way in the llvm-cov report from the bots. llvm-svn: 324865	2018-02-12 03:02:02 +00:00
Craig Topper	3ccbd3f32f	[X86] Remove check for X86ISD::AND with no flag users from the TEST instruction immediate shrinking code. We turn X86ISD::AND with no flag users back to ISD::AND in PreprocessISelDAG. llvm-svn: 324864	2018-02-12 03:02:01 +00:00
Nirav Dave	c8c9d4fe35	[DAG] Make early exit hasPredecessorHelper return true. NFCI. All uses conservatively assume in early exit case that it will be a predecessor. Changing default removes checking code in all uses. llvm-svn: 324797	2018-02-10 02:41:22 +00:00
Nirav Dave	27721e8617	[DAG, X86] Improve Dependency analysis when doing multi-node Instruction Selection Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles / dependencies pruning the search when topological property of NodeId allows. As part of this propogate the NodeId-based cutoffs to narrow hasPreprocessorHelper searches. Reviewers: craig.topper, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41293 llvm-svn: 324359	2018-02-06 16:14:29 +00:00
Craig Topper	57e0643160	[X86] Teach X86DAGToDAGISel::shrinkAndImmediate to preserve upper 32 zeroes of a 64 bit mask. If the upper 32 bits of a 64 bit mask are all zeros, we have special isel patterns to use a 32-bit and instead of a 64-bit and by relying on the impliciting zeroing of 32 bit ops. This patch teachs shrinkAndImmediate not to break that optimization. Differential Revision: https://reviews.llvm.org/D42899 llvm-svn: 324249	2018-02-05 16:54:07 +00:00
Craig Topper	7e910a9e85	[X86] Turn X86ISD::AND nodes that have no flag users back into ISD::AND just before isel to enable test instruction matching Summary: EmitTest sometimes creates X86ISD::AND specifically to hide the AND from DAG combine. But this prevents isel patterns that look for (cmp (and X, Y), 0) from being able to see it. So we end up with an AND and a TEST. The TEST gets removed by compare instruction optimization during the peephole pass. This patch attempts to fix this by converting X86ISD::AND with no flag users back into ISD::AND during the DAG preprocessing just before isel. In order to do this correctly I had to make the X86ISD::AND node created by EmitTest in this case really have a flag output. Which arguably it should have had anyway so that the number of operands would be consistent for the opcode in all cases. Then I had to modify the ReplaceAllUsesWith to understand that we might be looking at an instruction with 2 outputs. Though in this case there are no uses to replace since we just created the node, but that's what the code did before so I just made it keep working. Reviewers: spatel, RKSimon, niravd, deadalnix Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42764 llvm-svn: 323982	2018-02-01 17:08:39 +00:00
Amaury Sechet	f9a9e9a251	[X86] Generate testl instruction through truncates. Summary: This was introduced in D42646 but ended up being reverted because the original implementation was buggy. Depends on D42646 Reviewers: craig.topper, niravd, spatel, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42741 llvm-svn: 323899	2018-01-31 19:20:06 +00:00
Amaury Sechet	f89f188ddb	[X86] Avoid using high register trick for test instruction Summary: It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger. Reviewers: craig.topper, niravd, spatel, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42646 llvm-svn: 323888	2018-01-31 16:48:54 +00:00
Eric Liu	0b69b5ed85	Revert "[X86] Avoid using high register trick for test instruction" This reverts commit r323690. This causes crash in llc. See the original commit thread for details. llvm-svn: 323761	2018-01-30 14:18:33 +00:00
Amaury Sechet	015184b79e	[X86] Avoid using high register trick for test instruction Summary: It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger. The main noteworthy regression I was able to observe was pattern of the type (setcc (trunc (and X, C)), 0) where C is such as it would benefit from the hi register trick. To prevent this, a new pattern is added to materialize such pattern using a 32 bits test. This has the added benefit of working with any constant that is materializable as a 32bits immediate, not just the ones that can leverage the high register trick, as demonstrated by the test case in test-shrink.ll using the constant 2049 . Reviewers: craig.topper, niravd, spatel, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42646 llvm-svn: 323690	2018-01-29 20:54:33 +00:00
Craig Topper	15d69739e2	[X86] Remove VPTESTM/VPTESTNM ISD opcodes. Use isel patterns matching cmpm eq/ne with immallzeros. llvm-svn: 323612	2018-01-28 00:56:30 +00:00
Craig Topper	513d3fa674	[X86] Remove X86ISD::PCMPGTM/PCMPEQM and instead just use X86ISD::PCMPM and pattern match the immediate value during isel. Legalization is still biased to turn LT compares in to GT by swapping operands to avoid needing extra isel patterns to commute. I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar. llvm-svn: 323604	2018-01-27 20:19:02 +00:00
Craig Topper	8f324bb1a4	[SelectionDAGISel] Add a debug print before call to Select. Adjust where blank lines are printed during isel process to make things more sensibly grouped. Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it means there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node SelectNodeCommon would reprint the root node before walking the isel table. It seems better to just print the message before the call to Select so all targets behave the same. And then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search. There were also some oddities in blank line behavior. Usually due to a \n after a call to SelectionDAGNode::dump which already inserted a new line. llvm-svn: 323551	2018-01-26 19:34:20 +00:00
Chandler Carruth	c58f2166ab	Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155	2018-01-22 22:05:25 +00:00
Sanjay Patel	74a1eef7c4	[x86] shrink 'and' immediate values by setting the high bits (PR35907) Try to reverse the constant-shrinking that happens in SimplifyDemandedBits() for 'and' masks when it results in a smaller sign-extended immediate. We are also able to detect dead 'and' ops here (the mask is all ones). In that case, we replace and return without selecting the 'and'. Other targets might want to share some of this logic by enabling this under a target hook, but I didn't see diffs for simple cases with PowerPC or AArch64, so they may already have some specialized logic for this kind of thing or have different needs. This should solve PR35907: https://bugs.llvm.org/show_bug.cgi?id=35907 Differential Revision: https://reviews.llvm.org/D42088 llvm-svn: 322957	2018-01-19 16:37:25 +00:00
Nirav Dave	72d32f24f5	[X86] Extend load-op-store fusion merge to ADC/SBB. Summary: Add handling of EFLAG input to X86 Load-op-store fusion checking. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D42128 llvm-svn: 322952	2018-01-19 15:37:57 +00:00
Craig Topper	af4eb17223	[SelectionDAG][X86] Explicitly store the scale in the gather/scatter ISD nodes Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend. Most of this patch is just making sure we copy the scale around everywhere. Differential Revision: https://reviews.llvm.org/D40055 llvm-svn: 322210	2018-01-10 19:16:05 +00:00
Craig Topper	d58c165545	[X86] Make v2i1 and v4i1 legal types without VLX Summary: There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type. It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway. This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added. I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all. There's definitely room for improvement with some follow up patches. Reviewers: RKSimon, zvi, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41560 llvm-svn: 321967	2018-01-07 18:20:37 +00:00
Craig Topper	eff84ed204	[X86] Improve the printing of address mode during isel matching. Fix some inconsistent new line behavior and only print the FrameIndex when the address mode is a FrameIndexBase addressing mode. llvm-svn: 321368	2017-12-22 17:18:10 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Michael Zolotukhin	ad24af7f58	Remove redundant includes from lib/Target/X86. llvm-svn: 320636	2017-12-13 21:31:19 +00:00
Matt Morehouse	9e658c974b	Revert "[X86] Improvement in CodeGen instruction selection for LEAs." This reverts r319543, due to ASan bot breakage. llvm-svn: 319591	2017-12-01 22:20:26 +00:00
Jatin Bhateja	328199ec26	[X86] Improvement in CodeGen instruction selection for LEAs. Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will now look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. 4/ Simplify LEA converts (lea (BASE,1,INDEX,0) --> add (BASE, INDEX) which offers better through put. PR32755 will be taken care of by this pathc. Previous patch revisions : r313343 , r314886 Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja Reviewed By: lsaba, RKSimon, jbhateja Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 319543	2017-12-01 14:07:38 +00:00
Craig Topper	f31b0b850b	[X86] Teach isel that X86ISD::CMPM_RND zeros the upper bits of the mask register. llvm-svn: 318933	2017-11-23 18:41:21 +00:00
Craig Topper	ee74044f93	[X86] Add an X86ISD::MSCATTER node for consistency with the X86ISD::MGATHER. This makes the fact that X86 needs an explicit mask output not part of the type constraint for the ISD::MSCATTER. This also gives the X86ISD::MGATHER/MSCATTER nodes a common base class simplifying the address selection code in X86ISelDAGToDAG.cpp llvm-svn: 318823	2017-11-22 08:10:54 +00:00
Craig Topper	c314f461dd	[X86] Allow X86ISD::Wrapper to be folded into the base of gather/scatter address If the base of our gather corresponds to something contained in X86ISD::Wrapper we should be able to fold it into the address. This patch refactors some of the address matching to more fully use the X86ISelAddressMode struct and the getAddressOperands helper. A new helper function matchVectorAddress is added to call matchWrapper or fall back to matchAddressBase. We should also be able to support constant offsets from a wrapper, but I'll look into that in a future patch. We may even be able to completely reuse matchAddress here, but I wanted to start simple and work up to it. Differential Revision: https://reviews.llvm.org/D39927 llvm-svn: 318057	2017-11-13 17:53:59 +00:00
Craig Topper	bb001c6ddc	[X86] Merge the template method selectAddrOfGatherScatterNode into selectVectorAddr. NFCI Just need to initialize a couple variables differently based on the node type. No need for a whole separate template method. llvm-svn: 317915	2017-11-10 19:26:04 +00:00
Craig Topper	61f81f9637	[X86] Preserve memory refs when folding loads into divides. This is similar to what we already do for multiplies. Without this we can't unfold and hoist an invariant load. llvm-svn: 317732	2017-11-08 22:26:39 +00:00
Craig Topper	55029d811f	[X86] Remove an if check on the result of a cast. NFC cast takes a non-null input and produces a non-null output. So this if can never fail. llvm-svn: 317731	2017-11-08 22:26:37 +00:00

1 2 3 4 5 ...

744 Commits