llvm-project

Commit Graph

Author	SHA1	Message	Date
Jessica Paquette	34b618bf7e	[MachineOutliner][NFC] Don't create outlined sequence from integer mapping Some gardening/refactoring. It's cleaner to copy the instructions into the MachineFunction using the first candidate instead of going to the mapper. Also, by doing this we can remove the Seq member from OutlinedFunction entirely. llvm-svn: 348390	2018-12-05 17:57:33 +00:00
Sanjay Patel	33a448f935	[DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893) Because we're potentially peeking through a bitcast in this transform, we need to use overall bitwidths rather than number of elements to determine when it's safe to proceed. Should fix: https://bugs.llvm.org/show_bug.cgi?id=39893 llvm-svn: 348383	2018-12-05 17:10:30 +00:00
Simon Pilgrim	8fdaf5c915	[TargetLowering] Remove ISD::ANY_EXTEND/ANY_EXTEND_VECTOR_INREG opcodes from SimplifyDemandedVectorElts These have no test coverage and the KnownZero flags can't be guaranteed unlike SIGN/ZERO_EXTEND cases. llvm-svn: 348361	2018-12-05 12:20:05 +00:00
Simon Pilgrim	180639afe5	[SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353	2018-12-05 11:12:12 +00:00
Simon Pilgrim	cd8a152b18	Remove superfluous comments. NFCI. As requested in D54698. llvm-svn: 348350	2018-12-05 10:45:44 +00:00
Simon Pilgrim	d24730cdda	[TargetLowering] SimplifyDemandedVectorElts - don't alter DemandedElts mask Fix potential issue with the ISD::INSERT_VECTOR_ELT case tweaking the DemandedElts mask instead of using a local copy - so later uses of the mask use the tweaked version..... Noticed while investigating adding zero/undef folding to SimplifyDemandedVectorElts and the altered DemandedElts mask was causing mismatches. llvm-svn: 348348	2018-12-05 10:37:45 +00:00
Craig Topper	6934202dc0	[MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM we end up setting the PhysRegDef and then later in the same loop visit the register again. Now we see that PhysRegDef is set from the earlier iteration so now set PhysRegClobber. This patch splits the loop so we have one that uses the previous value of PhysRegDef to update PhysRegClobber and second loop that updates PhysRegDef. The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at end the of the loops. And in the branch-relaxation test I think there is now "and vcc, exec, -1" instruction that wasn't there before. Differential Revision: https://reviews.llvm.org/D55102 llvm-svn: 348330	2018-12-05 03:41:26 +00:00
Amara Emerson	814a6794ba	[SelectionDAG] Split very large token factors for loads into 64k chunks. There's a 64k limit on the number of SDNode operands, and some very large functions with 64k or more loads can cause crashes due to this limit being hit when a TokenFactor with this many operands is created. To fix this, create sub-tokenfactors if we've exceeded the limit. No test case as it requires a very large function. rdar://45196621 Differential Revision: https://reviews.llvm.org/D55073 llvm-svn: 348324	2018-12-05 00:41:30 +00:00
Nirav Dave	ce26c27b2a	[SelectionDAG] Redefine isGAPlusOffset in terms of unwrapAddress. NFCI. llvm-svn: 348288	2018-12-04 17:59:43 +00:00
Matt Arsenault	43153024ab	MIR: Add method to stop after specific runs of passes Currently if you use -{start,stop}-{before,after}, it picks the first instance with the matching pass name. If you run the same pass multiple times, there's no way to distinguish them. Allow specifying a run index wih ,N to specify which you mean. llvm-svn: 348285	2018-12-04 17:45:12 +00:00
Simon Pilgrim	0add090e24	[TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686) PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction. The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment. The X87 cases don't need the strict flag but generates much nicer code with it.... Differential Revision: https://reviews.llvm.org/D53794 llvm-svn: 348251	2018-12-04 11:21:30 +00:00
Simon Pilgrim	666261cdc8	[TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes Add support for ISD::_EXTEND and ISD::_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch. llvm-svn: 348246	2018-12-04 10:41:06 +00:00
Sanjay Patel	d24f63477d	[DAGCombiner] narrow truncated vector binops when legal This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195	2018-12-03 21:57:35 +00:00
Craig Topper	e35b01f8ea	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158	2018-12-03 18:26:24 +00:00
Sanjay Patel	b205606d3e	[SelectionDAG] fold constant with undef vector per element This makes the SDAG behavior consistent with the way we do this in IR. It's possible that we were getting the wrong answer before. For example, 'xor undef, undef --> 0' but 'xor undef, C' --> undef. But the most practical improvement is likely as shown in the tests here - for FP, we were overconstraining undef lanes to NaN, and that can prevent vector simplifications/narrowing (see D51553). llvm-svn: 348090	2018-12-02 13:48:42 +00:00
Sanjay Patel	2daceedf92	[DAGCombiner] guard against an oversized shift crash This change prevents the crash noted in the post-commit comments for rL347478 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089	2018-12-02 13:33:56 +00:00
Simon Pilgrim	e017ed3245	[SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element. This patch relaxes this to demanding an element if we need any bit from it. Differential Revision: https://reviews.llvm.org/D54761 llvm-svn: 348073	2018-12-01 12:08:55 +00:00
Nicolai Haehnle	a9cc92c247	AMDGPU: Fix various issues around the VirtReg2Value mapping Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54340 llvm-svn: 348049	2018-11-30 22:55:29 +00:00
Sanjay Patel	1901a12e76	[SelectionDAG] fold FP binops with 2 undef operands to undef llvm-svn: 348016	2018-11-30 18:38:52 +00:00
Yonghong Song	f487334622	Revert "[BTF] Add BTF DebugInfo" This reverts commit 9c6b970db8bc63b28ce58a129bb1580a6a3c6caf. llvm-svn: 348004	2018-11-30 16:54:43 +00:00
Yonghong Song	81b77e9159	[BTF] Add BTF DebugInfo This patch adds BPF Debug Format (BTF) as a standalone LLVM debuginfo. The BTF related sections are directly generated from IR. The BTF debuginfo is generated only when the compilation target is BPF. What is BTF? ============ First, the BPF is a linux kernel virtual machine and widely used for tracing, networking and security. https://www.kernel.org/doc/Documentation/networking/filter.txt https://cilium.readthedocs.io/en/v1.2/bpf/ BTF is the debug info format for BPF, introduced in the below linux patch `69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)` in the patch set mentioned in the below lwn article. https://lwn.net/Articles/752047/ The BTF format is specified in the above github commit. In summary, its layout looks like struct btf_header type subsection (a list of types) string subsection (a list of strings) With such information, the kernel and the user space is able to pretty print a particular bpf map key/value. One possible example below: Withtout BTF: key: [ 0x01, 0x01, 0x00, 0x00 ] With BTF: key: struct t { a : 1; b : 1; c : 0} where struct is defined as struct t { char a; char b; short c; }; How BTF is generated? ===================== Currently, the BTF is generated through pahole. https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=68645f7facc2eb69d0aeb2dd7d2f0cac0feb4d69 and available in pahole v1.12 https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=4a21c5c8db0fcd2a279d067ecfb731596de822d4 Basically, the bpf program needs to be compiled with -g with dwarf sections generated. The pahole is enhanced such that a .BTF section can be generated based on dwarf. This format of the .BTF section matches the format expected by the kernel, so a bpf loader can just take the .BTF section and load it into the kernel. `8a138aed4a` The .BTF section layout is also specified in this patch: with file include/llvm/BinaryFormat/BTF.h. What use cases this patch tries to address? =========================================== Currently, only the bpf instruction stream is required to pass to the kernel. The kernel verifies it, jits it if configured to do so, attaches it to a particular kernel attachment point, and later executes when a particular event happens. This patch tries to expand BTF to support two more use cases below: (1). BPF supports subroutine calls. During performance analysis, it would be good to differentiate which call is hot instead of just providing a virtual address. This would require to pass a unique identifier for each subroutine to the kernel, the subroutine name is a natual choice. (2). If a particular jitted instruction is hot, we want user to know which source line this jitted instruction belongs to. This would require the source information is available to various profiling tools. Note that in a single ELF file, . there may be multiple loadable bpf programs, . for a particular to-be-loaded bpf instruction stream, its instructions may come from multiple PROGBITS sections, the bpf loader needs to merge them together to a single consecutive insn stream before loading to the kernel. For example: section .text: subroutines funcFoo section _progA: calling funcFoo section _progB: calling funcFoo The bpf loader could construct two loadable bpf instruction streams and load them into the kernel: . _progA funcFoo . _progB funcFoo So per ELF section function offset and instruction offset will need to be adjusted before passing to the kernel, and the kernel essentially expect only one code section regardless of how many in the ELF file. What do we propose and Why? =========================== To support the above two use cases, we propose to add an additional section, .BTF.ext, to the ELF file which is the input of the bpf loader. A different section is preferred since loader may need to manipulate it before loading part of its data to the kernel. The .BTF.ext section has a similar header to the .BTF section and it contains two subsections for func_info and line_info. . the func_info maps the func insn byte offset to a func type in the .BTF type subsection. . the line_info maps the insn byte offset to a line info. . both func_info and line_info subsections are organized by ELF PROGBITS AX sections. pahole is not a good place to implement .BTF.ext as pahole is mostly for structure hole information and more importantly, we want to pass the actual code to the kernel. . bpf program typically is small so storage overhead should be small. . in bpf land, it is totally possible that an application loads the bpf program into the kernel and then that application quits, so holding debug info by the user space application is not practical as you may not even know who loads this bpf program. . having source codes directly kept by kernel would ease deployment since the original source code does not need ship on every hosts and kernel-devel package does not need to be deployed even if kernel headers are used. LLVM is a good place to implement. . The only reliable time to get the source code is during compilation time. This will result in both more accurate information and easier deployment as stated in the above. . Another consideration is for JIT. The project like bcc (https://github.com/iovisor/bcc) use MCJIT to compile a C program into bpf insns and load them to the kernel. The llvm generated BTF sections will be readily available for such cases as well. Design and implementation of emiting .BTF/.BTF.ext sections =========================================================== The BTF debuginfo format is defined. Both .BTF and .BTF.ext sections are generated directly from IR when both "-target bpf" and "-g" are specified. Note that dwarf sections are still generated as dwarf is used by user space tools like llvm-objdump etc. for BPF target. This patch also contains tests to verify generated .BTF and .BTF.ext sections for all supported types, func_info and line_info subsections. The patch is also tested against linux kernel bpf sample tests and selftests. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D53736 llvm-svn: 347999	2018-11-30 16:22:59 +00:00
Than McIntosh	0e0a8a3fee	[CodeGen] Prefer static frame index for STATEPOINT liveness args Summary: If a given liveness arg of STATEPOINT is at a fixed frame index (e.g. a function argument passed on stack), prefer to use this fixed location even the address is also in a register. If we use the register it will generate a spill, which is not necessary since the fixed frame index can be directly recorded in the stack map. Patch by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, niravd, reames Reviewed By: reames Subscribers: cherryyz, reames, anna, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D53889 llvm-svn: 347998	2018-11-30 16:22:41 +00:00
Nicolai Haehnle	445b0b6260	TableGen/ISel: Allow PatFrag predicate code to access captured operands Summary: This simplifies writing predicates for pattern fragments that are automatically re-associated or commuted. For example, a followup patch adds patterns for fragments of the form (add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are automatically commuted to (add $z, (shl $x, $y)), which makes it basically impossible to refer to $x, $y, and $z generically in the PredicateCode. With this change, the PredicateCode can refer to $x, $y, and $z simply as `Operands[i]`. Test confirmed that there are no changes to any of the generated files when building all (non-experimental) targets. Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D51994 llvm-svn: 347992	2018-11-30 14:15:13 +00:00
Alex Bradbury	fca95cfee9	[SelectionDAG] Support result type promotion for FLT_ROUNDS_ For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the result of ISD::FLT_ROUNDS_. Differential Revision: https://reviews.llvm.org/D53820 llvm-svn: 347986	2018-11-30 13:18:33 +00:00
Alex Bradbury	bd24c7b045	[SelectionDAG] Support promotion of PREFETCH operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operands of ISD::PREFETCH. Differential Revision: https://reviews.llvm.org/D53281 llvm-svn: 347980	2018-11-30 10:06:31 +00:00
Alex Bradbury	36e0fd1d39	[SelectionDAG] Support promotion of FRAMEADDR/RETURNADDR operands For targets where i32 is not a legal type (e.g. 64-bit RISC-V), LegalizeIntegerTypes must promote the operand. Differential Revision: https://reviews.llvm.org/D53279 llvm-svn: 347978	2018-11-30 10:02:06 +00:00
Alex Bradbury	e0e62e97df	[TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend operands when it is able to do so. For some targets this is more expensive than a sign-extension, which is also a valid choice. Introduce the isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == MVT::i64, as it can be performed using a single instruction. Differential Revision: https://reviews.llvm.org/D52978 llvm-svn: 347977	2018-11-30 09:56:54 +00:00
Hsiangkai Wang	957578ddf7	[CodeGen] Fix bugs in BranchFolderPass when debug labels are generated. Skip DBG_VALUE and DBG_LABEL in branch folding algorithms. The bug is reported in https://bugs.chromium.org/p/chromium/issues/detail?id=898160. Differential Revision: https://reviews.llvm.org/D54199 llvm-svn: 347964	2018-11-30 08:07:29 +00:00
Hsiangkai Wang	d72f6f133a	[NFC] Refine doxygen format. Differential Revision: https://reviews.llvm.org/D54568 llvm-svn: 347963	2018-11-30 08:07:24 +00:00
Sanjay Patel	8d27144251	[DAGCombiner] narrow truncated binops The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. Differential Revision: https://reviews.llvm.org/D54640 llvm-svn: 347917	2018-11-29 20:58:26 +00:00
Alex Bradbury	66d9a752b9	[RISCV] Implement codegen for cmpxchg on RV32IA Utilise a similar ('late') lowering strategy to D47882. The changes to AtomicExpandPass allow this strategy to be utilised by other targets which implement shouldExpandAtomicCmpXchgInIR. All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. This is conservative but correct. Differential Revision: https://reviews.llvm.org/D48131 llvm-svn: 347914	2018-11-29 20:43:42 +00:00
Francis Visoiu Mistrih	0b8dd4488e	[MachineScheduler] Order FI-based memops based on stack direction It makes more sense to order FI-based memops in descending order when the stack goes down. This allows offsets to stay "consecutive" and allow easier pattern matching. llvm-svn: 347906	2018-11-29 20:03:19 +00:00
Craig Topper	129d529ab3	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00
Petr Pavlu	6bb80512db	[GlobalISel] Fix insertion of stack-protector epilogue * Tell the StackProtector pass to generate the epilogue instrumentation when GlobalISel is enabled because GISel currently does not implement the same deferred epilogue insertion as SelectionDAG. * Update StackProtector::InsertStackProtectors() to find a stack guard slot by searching for the llvm.stackprotector intrinsic when the prologue was not created by StackProtector itself but the pass still needs to generate the epilogue instrumentation. This fixes a problem when the pass would abort because the stack guard AllocInst pointer was null when generating the epilogue -- test CodeGen/AArch64/GlobalISel/arm64-irtranslator-stackprotect.ll. Differential Revision: https://reviews.llvm.org/D54518 llvm-svn: 347862	2018-11-29 13:22:53 +00:00
Petr Pavlu	e6406d568c	[GlobalISel] Make EnableGlobalISel always set when GISel is enabled Change meaning of TargetOptions::EnableGlobalISel. The flag was previously set only when a target switched on GlobalISel but it is now always set when the GlobalISel pipeline is enabled. This makes the flag consistent with TargetOptions::EnableFastISel and allows its use in other parts of the compiler to determine when GlobalISel is enabled. The EnableGlobalISel flag had previouly only one use in TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value to determine if GlobalISel was enabled by a target and returned false in such a case. To preserve the current behaviour, a new flag TargetOptions::GlobalISelAbort is introduced to separately record the abort behaviour. Differential Revision: https://reviews.llvm.org/D54518 llvm-svn: 347861	2018-11-29 12:56:32 +00:00
Serguei Katkov	2673f1783e	[CGP] Improve compile time for complex addressing mode This is a fix for PR39625 with improvement the compile time by reducing the number of intermediate Phi nodes created. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54932 llvm-svn: 347839	2018-11-29 06:45:18 +00:00
Francis Visoiu Mistrih	879087ce5b	[MachineScheduler] Add support for clustering mem ops with FI base operands Before this patch, the following stores in `merge_fail` would fail to be merged, while they would get merged in `merge_ok`: ``` void use(unsigned long long ); void merge_fail(unsigned key, unsigned index) { unsigned long long args[8]; args[0] = key; args[1] = index; use(args); } void merge_ok(unsigned long long dst, unsigned a, unsigned b) { dst[0] = a; dst[1] = b; } ``` The reason is that `getMemOpBaseImmOfs` would return false for FI base operands. This adds support for this. Differential Revision: https://reviews.llvm.org/D54847 llvm-svn: 347747	2018-11-28 12:00:28 +00:00
Francis Visoiu Mistrih	d7eebd6d83	[CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand Currently, instructions doing memory accesses through a base operand that is not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`. This means that functions such as `TII::shouldClusterMemOps` will bail out on instructions using an FI as a base instead of a register. The goal of this patch is to refactor all this to return a base operand instead of a base register. Then in a separate patch, I will add FI support to the mem op clustering in the MachineScheduler. Differential Revision: https://reviews.llvm.org/D54846 llvm-svn: 347746	2018-11-28 12:00:20 +00:00
Simon Atanasyan	af860d44fe	[DebugInfo] Rename EmitDebugThreadLocal back to EmitDebugValue. NFC This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr for PCOffset. In that case the expression is unrelated to thread locals and so emitting a value of the DIEExpr does not have to always mean emit-debug-thread-local. llvm-svn: 347744	2018-11-28 11:48:07 +00:00
Craig Topper	b955bf382c	[LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision: https://reviews.llvm.org/D54906 llvm-svn: 347593	2018-11-26 21:12:39 +00:00
Craig Topper	923f463ef2	[SelectionDAG] Teach BaseIndexOffset::match to unwrap the base after looking through an add/or We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C) Differential Revision: https://reviews.llvm.org/D54818 llvm-svn: 347591	2018-11-26 20:16:33 +00:00
Than McIntosh	30c804bbb1	[CodeGen] Support custom format of stack maps Summary: Add a hook to the GCMetadataPrinter for emitting stack maps in custom format. The hook will be called at stack map generation time. The default stack map format is used if there is no hook. For this to be useful a few data structures and accessors are exposed from the StackMaps class, so the custom printer can access the stack map data. This patch authored by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, apilipenko, reames Reviewed By: reames Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53892 llvm-svn: 347584	2018-11-26 18:43:48 +00:00
Erich Keane	e381120477	Delete dead code introduced in r347354. ParentTy is never used other than an assignment, and since it is a pointer, there is no side effect. Some versions of GCC notice and warn on this. Change-Id: I37dc1a18c7b58040419afb803621de13d8904a8f llvm-svn: 347581	2018-11-26 17:51:27 +00:00
Than McIntosh	b9e4852c92	[CodeGen] Take SPAdj into account for STATEPOINT liveness args Summary: STATEPOINT records its args' locations on stack relative to SP. If the SP is changed, take that into account. This patch authored by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D53603 llvm-svn: 347569	2018-11-26 16:16:09 +00:00
Aaron Ballman	0442a12a80	Remove an unnecessary file; NFC. This source file has not been needed since r346522 and was triggering diagnostics in MSVC about an object file which exports no public symbols (LNK4221). llvm-svn: 347565	2018-11-26 15:54:36 +00:00
Diana Picus	0528e2cfb3	[ARM GlobalISel] Support G_CTLZ and G_CTLZ_ZERO_UNDEF We can now select CLZ via the TableGen'erated code, so support G_CTLZ and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32. Legalizer: If the CLZ instruction is available, use it for both G_CTLZ and G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and lower G_CTLZ in terms of it. In order to achieve this we need to add support to the LegalizerHelper for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2). We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF if that is supported as a libcall, as opposed to just if it is Legal or Custom. Due to a minor refactoring of the helper function in charge of this, we will also allow the same behaviour for G_CTTZ and G_CTPOP. This is not going to be a problem in practice since we don't yet have support for treating G_CTTZ and G_CTPOP as libcalls (not even in DAGISel). Reg bank select: Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point. Instruction select: Nothing to do. llvm-svn: 347545	2018-11-26 11:07:02 +00:00
Diana Picus	30887bf6c3	Fix typo in comment. NFC llvm-svn: 347544	2018-11-26 11:06:53 +00:00
Sanjay Patel	7336e7c67a	[x86] limit transform for select-of-fp-constants This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526	2018-11-25 17:27:02 +00:00
Sanjay Patel	04435677d0	[SelectionDAG] move constant or splat functions to common location rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible. llvm-svn: 347523	2018-11-25 16:09:32 +00:00
Sanjay Patel	7e119c0400	[DAG] consolidate shift simplifications ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes. llvm-svn: 347502	2018-11-23 20:05:12 +00:00

1 2 3 4 5 ...

25375 Commits