llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	d52f5ed01a	[SLPVectorizer] Use getAPInt() for comparison. NFCI. Technically integers can assert on getZExtValue() if beyond i64 range, and a fuzzer usually find this.....	2019-10-30 16:16:55 +00:00
Xiangling Liao	5c9bdc79e1	[AIX] Lowering CPI/JTI/BA to MIR Enable lowering of constant pool index, jump table index, and bloack address to MIR on AIX. Differential Revision: https://reviews.llvm.org/D69264	2019-10-30 11:21:37 -04:00
David Tellenbach	70caa1fc30	[AArch64][MachineOutliner] Return address signing for outlined functions Summary: During AArch64 frame lowering instructions to enable return address signing are inserted into function if needed. Functions generated during machine outlining don't run through target frame lowering and hence are missing such instructions. This patch introduces the following changes: 1. If not all functions that potentially participate in function outlining agree on their return address signing scope and their return address signing key, outlining is disabled for these functions. 2. If not all functions that potentially participate in function outlining agree on their support for v8.3A features, outlining is disabled for these functions. 2. If all candidate functions agree on the signing scope, signing key and and their support for v8.3 features, the outlined function behaves as if it had the same scope and key attributes and as if it would provide the same v8.3A support as the original functions. Reviewers: olista01, paquette, t.p.northover, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69097	2019-10-30 15:20:16 +00:00
Jay Foad	86549c7528	[SelectionDAG] Add support for FP_ROUND in WidenVectorOperand. Summary: This is used on AMDGPU for rounding from v3f64 (which is illegal) to v3f32 (which is legal). Subscribers: jvesely, nhaehnle, tpr, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69339	2019-10-30 15:18:21 +00:00
Jay Foad	2da4b6e514	[IR] Allow fast math flags on calls with floating point array type. Summary: This extends the rules for when a call instruction is deemed to be an FPMathOperator, which is based on the type of the call (i.e. the return type of the function being called). Previously we only allowed floating-point and vector-of-floating-point types. Now we also allow arrays (nested to any depth) of floating-point and vector-of-floating-point types. This was motivated by llpc, the pipeline compiler for AMD GPUs (https://github.com/GPUOpen-Drivers/llpc). llpc has many math library functions that operate on vectors, typically represented as <4 x float>, and some that operate on matrices, typically represented as [4 x <4 x float>], and it's useful to be able to decorate calls to all of them with fast math flags. Reviewers: spatel, wristow, arsenm, hfinkel, aemerson, efriedma, cameron.mcinally, mcberg2017, jmolloy Subscribers: wdng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69161	2019-10-30 14:00:33 +00:00
Krzysztof Parzyszek	43144ffa91	LiveIntervals: Split live intervals on multiple dead defs This is a follow-up to D67448. Split live intervals with multiple dead defs during the initial execution of the live interval analysis, but do it outside of the function createAndComputeVirtRegInterval. Differential Revision: https://reviews.llvm.org/D68666	2019-10-30 08:50:46 -05:00
Pavel Labath	83a55c6a57	minidump: Rename some architecture constants The architecture enum contains two kinds of contstants: the "official" ones defined by Microsoft, and unofficial constants added by breakpad to cover the architectures not described by the first ones. Up until now, there was no big need to differentiate between the two. However, now that Microsoft has defined https://docs.microsoft.com/en-us/windows/win32/api/sysinfoapi/ns-sysinfoapi-system_info a constant for ARM64, we have a name clash. This patch renames all breakpad-defined constants with to include the prefix "BP_". This frees up the name "ARM64", which I'll re-introduce with the new "official" value in a follow-up patch. Reviewers: amccarth, clayborg Subscribers: lldb-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D69285	2019-10-30 14:46:00 +01:00
Djordje Todorovic	532815dd5c	[ARM][AArch64][DebugInfo] Improve call site instruction interpretation Extend the describeLoadedValue() with support for target specific ARM and AArch64 instructions interpretation. The patch provides specialization for ADD and SUB operations that include a register and an immediate/offset operand. Some of the instructions can operate with global string addresses or constant pool indexes but such cases are omitted since we currently lack flexible support for processing such operands at DWARF production stage. Patch by Nikola Prica Differential Revision: https://reviews.llvm.org/D67556	2019-10-30 13:58:14 +01:00
Kerry McLaughlin	5c2c94648e	[AArch64][SVE] Implement masked store intrinsics Summary: Adds support for codegen of masked stores, with non-truncating and truncating variants. Reviewers: huntergr, greened, dmgreen, rovka, sdesmalen Reviewed By: dmgreen, sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69378	2019-10-30 11:56:54 +00:00
Simon Pilgrim	81399002ae	[X86] combineOrShiftToFunnelShift - use isOperationLegalOrCustom to check FSHL/FSHR support Remove hard wired legality check.	2019-10-30 11:52:22 +00:00
Simon Pilgrim	26655376fe	[X86] combineOrShiftToFunnelShift - use getShiftAmountTy instead of hardwiring to MVT::i8	2019-10-30 11:52:22 +00:00
Kerry McLaughlin	e128c20864	[AArch64][SVE] Implement additional integer arithmetic intrinsics Summary: Add intrinsics for the following: - sxt[b\|h\|w] & uxt[b\|h\|w] - cls & clz - not & cnot Reviewers: huntergr, sdesmalen, dancgr Reviewed By: sdesmalen Subscribers: cameron.mcinally, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69567	2019-10-30 11:31:54 +00:00
Jay Foad	b592253ec6	[AMDGPU] Consolidate one more getGeneration check This one should have been done in r363902 when hasReadVCCZBug was introduced.	2019-10-30 11:16:42 +00:00
Guillaume Chatelet	119b436da1	[Alignment] Use Align for TFI.getStackAlignment() in X86ISelLowering Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, craig.topper, rnk Reviewed By: rnk Subscribers: rnk, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69034	2019-10-30 10:35:13 +01:00
Karl-Johan Karlsson	760ed8da98	[AddressSanitizer] Only instrument globals of default address space The address sanitizer ignore memory accesses from different address spaces, however when instrumenting globals the check for different address spaces is missing. This result in assertion failure. The fault was found in an out of tree target. The patch skip all globals of non default address space. Reviewed By: leonardchan, vitalybuka Differential Revision: https://reviews.llvm.org/D68790	2019-10-30 09:32:19 +01:00
QingShan Zhang	f15cf93899	[PowerPC] Clear the sideeffect bit for those instructions that didn't have the match pattern If the instruction have match pattern, llvm-tblgen will infer the sideeffect bit from the match pattern and it works well. If not, the tblgen will set it as true that hurt the scheduling. PowerPC has some instructions that didn't specify the match pattern(i.e. LXSD etc), which is manually selected post-ra according to the register pressure. We need to clear the sideeffect flag for these instructions. Differential Revision: https://reviews.llvm.org/D69232	2019-10-30 07:59:32 +00:00
David Zarzycki	f68925d450	[X86] Make memcmp vector lowering handle arbitrary expansions Teach combineVectorSizedSetCCEquality() to handle arbitrary memcmp expansions but do not change any default policy for now. This also fixes a bug in the memcmp expansion itself when large displacements are needed. https://reviews.llvm.org/D69507	2019-10-30 09:12:57 +02:00
Chris Bieneman	a34680a33e	Break out OrcError and RPC Summary: When createing an ORC remote JIT target the current library split forces the target process to link large portions of LLVM (Core, Execution Engine, JITLink, Object, MC, Passes, RuntimeDyld, Support, Target, and TransformUtils). This occurs because the ORC RPC interfaces rely on the static globals the ORC Error types require, which starts a cycle of pulling in more and more. This patch breaks the ORC RPC Error implementations out into an "OrcError" library which only depends on LLVM Support. It also pulls the ORC RPC headers into their own subdirectory. With this patch code can include the Orc/RPC/*.h headers and will only incur link dependencies on LLVMOrcError and LLVMSupport. Reviewers: lhames Reviewed By: lhames Subscribers: mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68732	2019-10-29 17:31:28 -07:00
Austin Kerbow	2b88b344f2	AMDGPU/GlobalISel: Legalize FDIV32 Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69581	2019-10-29 17:18:06 -07:00
Nick Terrell	6814232429	[LLD][ELF] Support --[no-]mmap-output-file with F_no_mmap Summary: Add a flag `F_no_mmap` to `FileOutputBuffer` to support `--[no-]mmap-output-file` in ELF LLD. LLD currently explicitly ignores this flag for compatibility with GNU ld and gold. We need this flag to speed up link time for large binaries in certain scenarios. When we link some of our larger binaries we find that LLD takes 50+ GB of memory, which causes memory pressure. The memory pressure causes the VM to flush dirty pages of the output file to disk. This is normally okay, since we should be flushing cold pages. However, when using BtrFS with compression we need to write 128KB at a time when we flush a page. If any page in that 128KB block is written again, then it must be flushed a second time, and so on. Since LLD doesn't write sequentially this causes write amplification. The same 128KB block will end up being flushed multiple times, causing the linker to many times more IO than necessary. We've observed 3-5x faster builds with -no-mmap-output-file when we hit this scenario. The bad scenario only applies to compressed filesystems, which group together multiple pages into a single compressed block. I've tested BtrFS, but the problem will be present for any compressed filesystem on Linux, since it is caused by the VM. Silently ignoring --no-mmap-output-file caused a silent regression when we switched from gold to lld. We pass --no-mmap-output-file to fix this edge case, but since lld silently ignored the flag we didn't realize it wasn't being respected. Benchmark building a 9 GB binary that exposes this edge case. I linked 3 times with --mmap-output-file and 3 times with --no-mmap-output-file and took the average. The machine has 24 cores @ 2.4 GHz, 112 GB of RAM, BtrFS mounted with -compress-force=zstd, and an 80% full disk. \| Mode \| Time \| \|---------\|-------\| \| mmap \| 894 s \| \| no mmap \| 126 s \| When compression is disabled, BtrFS performs just as well with and without mmap on this benchmark. I was unable to reproduce the regression with any binaries in lld-speed-test. Reviewed By: ruiu, MaskRay Differential Revision: https://reviews.llvm.org/D69294	2019-10-29 15:49:08 -07:00
Adrian Prantl	f919be3365	[DWARF5] Added support for deleted C++ special member functions. This patch adds support for deleted C++ special member functions in clang and llvm. Also added Defaulted member encodings for future support for defaulted member functions. Patch by Sourabh Singh Tomar! Differential Revision: https://reviews.llvm.org/D69215	2019-10-29 13:44:06 -07:00
Philip Reames	2460989eab	[SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and stores w/StoreSDNode) by default Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default. As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/a atomic marker on the MMO. The later parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other. This has been fairly heavily fuzzed, and I examined diffs in a reasonable large corpus of assembly by hand, so I'm reasonable sure this is correct for the common case. Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations. This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics. If there are problems with this patch, the most likely area will be legalization. Differential Revision: https://reviews.llvm.org/D69219	2019-10-29 12:46:24 -07:00
Craig Topper	772533d921	[X86] Narrow i64 compares with constant to i32 when the upper 32-bits are known zero. This catches some cases. There are probably ways to improve this. I tried doing it as a combine on the setcc, but that broke some cases involving flag reuse in place of test. I renamed the isX86CCUnsigned to isX86CCSigned and flipped its polarity to make it consistent with the similar functions for ISD::SETCC. This avoids calling EQ/NE as being signed or unsigned. Fixes PR43823. Differential Revision: https://reviews.llvm.org/D69499	2019-10-29 11:38:15 -07:00
Ehsan Amiri	1e9de0215f	[SVE][AArch64] Adding pattern matching for some SVE instructions. Adding patten matching for two SVE intrinsics: frecps and frsqrts. Also added patterns for fsub and fmul - these SDNodes directly correspond to machine instructions. Review: https://reviews.llvm.org/D68476 Patch authored by mgudim (Mikhail Gudim).	2019-10-29 13:17:30 -04:00
Fangrui Song	5503455ccb	[SLP] Fix -Wunused-variable. NFC	2019-10-29 09:38:55 -07:00
Sander de Smalen	d6a7da80aa	Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2) llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with EXPENSIVE_CHECKS enabled, causing the patch to be reverted in rG2c496bb5309c972d59b11f05aee4782ddc087e71. This patch relands the patch with a proper fix to the live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the callee-saves correctly so that the CalleeSaved info is taken from MIR directly, rather than letting it be recalculated by the PEI pass. I've done this by running `llc -stop-before=prologepilog` on the LLVM IR as captured in the test files, adding the extra MOV instructions that were manually added in the original test file, then running `llc -run-pass=prologepilog` and finally re-added the comments for the MOV instructions.	2019-10-29 16:13:07 +00:00
Alexey Bataev	f228b53716	[SLP] Generalization of stores vectorization. Stores are vectorized with maximum vectorization factor of 16. Patch tries to improve the situation and use maximal vectorization factor. Reviewers: spatel, RKSimon, mkuper, hfinkel Differential Revision: https://reviews.llvm.org/D43582	2019-10-29 11:46:36 -04:00
Simon Pilgrim	501cf25839	[X86] Pull out combineOrShiftToFunnelShift helper. NFCI.	2019-10-29 15:29:51 +00:00
Sanjay Patel	a22282be54	[InstCombine] make icmp vector canonicalization safe for constant with undef elements This is a fix for: https://bugs.llvm.org/show_bug.cgi?id=43730 ...and as shown there, we have existing test cases that show potential miscompiles. We could just bail out for vector constants that contain any undef elements, or we can do as shown here: allow the transform, but replace the undefs with a safe value. For most of the tests shown, this results in a full splat constant (no undefs) which is probably a win for further IR analysis because we conservatively don't match undefs in most cases. Codegen can probably recover these kinds of undef lanes via demanded elements analysis if that's profitable. Differential Revision: https://reviews.llvm.org/D69519	2019-10-29 10:58:14 -04:00
Krzysztof Parzyszek	99f51960fd	[Hexagon] Handle remaining registers in getRegisterByName() This fixes https://llvm.org/PR43829.	2019-10-29 08:56:01 -05:00
Sanjay Patel	09feea972d	[IR] move/change null-check to assert This should trigger a dereference before null-check warning, but I don't see it when building with clang. In any case, the current and known future users of this helper require non-null args, so I'm converting the 'if' to an assert.	2019-10-29 09:28:47 -04:00
Sanjay Patel	a1e8ad4f2f	[IR] move helper function to replace undef constant (elements) with fixed constants This is the NFC part of D69519. We had this functionality locally in instcombine, but it can be used elsewhere, so hoisting it to Constant class.	2019-10-29 08:52:10 -04:00
Greg Bedwell	1ba72a81ca	Fix some spelling mistakes in comments. NFC	2019-10-29 12:41:24 +00:00
Greg Bedwell	ed66be5c0c	Fix a spelling mistake in a comment. NFC (I'm currently trying to debug a strange error message I get when pushing to github, despite the pushes being successful).	2019-10-29 12:32:01 +00:00
Greg Bedwell	b1c4b4d5cb	Fix a spelling mistake in a comment. NFC	2019-10-29 12:19:52 +00:00
Andrea Di Biagio	67720e7bf7	Revert "[NFC] Replace a linked list in LiveDebugVariables pass with a DenseMap" This reverts commit `8af5ada093`. As Bjorn pointed out in D68816, the iteration over `UserVals` may not be safe. Reverting on behalf of Orlando.	2019-10-29 12:13:23 +00:00
Simon Pilgrim	ec82eb2d02	Fix unused variable warning. NFCI.	2019-10-29 12:12:28 +00:00
Florian Hahn	596e4ab97a	[LCSSA] Forget values we create LCSSA phis for Summary: Currently we only forget the loop we added LCSSA phis for. But SCEV expressions in other loops could also depend on the instruction we added a PHI for and currently we do not invalidate those expressions. This can happen when we use ScalarEvolution before converting a function to LCSSA form. The SCEV expressions will refer to the non-LCSSA value. If this SCEV expression is then used with the expander, we do not preserve LCSSA form. This patch properly forgets the values we created PHIs for. Those need to be recomputed again. This patch fixes PR43458. Currently SCEV::verify does not catch this mismatch and any test would need to run multiple passes to trigger the error (e.g. -loop-reduce -loop-unroll). I will also look into catching this kind of mismatch in the verifier. Also, we currently forget the whole loop in LCSSA and I'll check if we can be more surgical. Reviewers: efriedma, sanjoy.google, reames Reviewed By: efriedma Subscribers: zzheng, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68194	2019-10-29 12:05:09 +00:00
Simon Pilgrim	2c496bb530	Revert rG70f5aecedef9a6e347e425eb5b843bf797b95319 - "Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)" This fails on EXPENSIVE_CHECKS builds	2019-10-29 11:54:58 +00:00
Jeremy Morse	ec32dff0b0	[BranchFolding] skip debug instr to avoid code change Use the existing helper function in BranchFolding, "countsAsInstruction", to skip over non-instructions. Otherwise debug instructions can be identified as the last real instruction in a block, leading to different codegen decisions when debug is enabled as demonstrated by the test case. Patch by: yechunliang (Chris Ye)! Differential Revision: https://reviews.llvm.org/D66467	2019-10-29 11:45:38 +00:00
Jay Foad	dc63d6175a	[ConstantFold] Push extractelement into getelementptr's operands This fixes a minor oversight mentioned in the review of D69379: we should push extractelement into the operands of getelementptr regardless of whether that enables further folding.	2019-10-29 10:31:52 +00:00
Georgii Rymar	3fe7f1dcf4	[yaml2obj] - Make .symtab to be not mandatory section for SHT_REL[A] section. Before this change .symtab section was required for SHT_REL[A] section declarations. yaml2obj automatically defined it in case when YAML document did not have it. With this change it is now possible to produce an object that has a relocation section, but has no symbol table. It simplifies the code and also it is inline with how we handle Link fields for another special sections. Differential revision: https://reviews.llvm.org/D69260	2019-10-29 11:43:12 +03:00
Georgii Rymar	5b118a0471	[yaml2obj] - Improve handling of the SHT_GROUP section. Currently, when we do not specify "Info" field in a YAML description for SHT_GROUP section, yaml2obj reports an error: "error: unknown symbol referenced: '' by YAML section '.group1'" Also, we do not link it with a symbol table by default, though it is what we do for AddrsigSection, HashSection, RelocationSection. (http://www.sco.com/developers/gabi/latest/ch4.sheader.html#sh_link) The patch fixes missings mentioned. Differential revision: https://reviews.llvm.org/D69299	2019-10-29 11:09:12 +03:00
Lang Hames	5a955cc8b9	[JITLink] Tighten section sorting criteria to fix a flaky test case. Sections may have zero size and zero-sized sections may share a start address with other zero-sized sections. For the section overlap test to function correctly zero-sized sections must be ordered before any non-zero sized ones. This should fix the intermittent failures in the test/ExecutionEngine/JITLink/X86/MachO_zero_fill_alignment.s test case that have been observed on some builders.	2019-10-28 22:56:13 -07:00
Matt Arsenault	21bc8e5a13	AMDGPU: Make VReg_1 only include 1 artificial register When TableGen is inferring register classes from contexts, it uses a sorting function based on the number of registers in the class. Since this was being treated as an alias of VGPR_32, they had exactly the same size. The sort used wasn't a stable sort, and even if it were, I believe the tie breaker would effectively end up being the alphabetical ordering of the class name. There appear to be issues trying to use an empty set of registers, so add only one so this will always sort to the end. Also add a comment explaining how VReg_1 is a dirty hack for SelectionDAG. This does end up changing the behavior of i1 with inline asm and VGPR constraints, but the existing behavior was was already nonsensical and inconsistent. It should probably be disallowed anyway. Fixes bug 43699	2019-10-28 20:51:51 -07:00
Shiva Chen	c1498e37ab	[RISCV] Remove RA from reserved register to use as callee saved register Remove RA from reserved register list, so we could use it as callee saved register Differential Revision: https://reviews.llvm.org/D67698	2019-10-29 11:32:16 +08:00
Johannes Doerfert	1a74645a70	[Attributor] Make IntegerState more flexible To make IntegerState more flexible but also less error prone we split it up into (1) incrementing, (2) decrementing, and (3) bit-tracking states. This adds functionality compared to before and disallows misuse, e.g., "incrementing" updates on a bit-tracking state. Part of the change is a single operator in the base class which simplifies helper functions that deal with states. There are certain functional changes but all of which should actually be corrections.	2019-10-28 20:27:22 -05:00
Evgenii Stepanov	03e882050f	[msan] Remove more attributes from sanitized functions. Summary: MSan instrumentation adds stores and loads even to pure readonly/writeonly functions. It is removing some of those attributes from instrumented functions and call targets, but apparently not enough. Remove writeonly, argmemonly and speculatable in addition to readonly / readnone. Reviewers: pcc, vitalybuka Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69541	2019-10-28 17:57:28 -07:00
Nemanja Ivanovic	25a41ad242	[PowerPC] Emit scalar fp min/max instructions VSX provides floating point minimum and maximum instructions that conform to IEEE semantics. This legalizes the respective nodes and emits VSX code for them. Furthermore, on Power9 cores we have xsmaxcdp and xsmincdp instructions that conform to language semantics for the conditional operator even in the presence of NaNs. Differential revision: https://reviews.llvm.org/D62993	2019-10-28 19:13:33 -05:00
Amy Huang	742043047c	Recommit "Add a heap alloc site marker field to the ExtraInfo in MachineInstrs" Summary: Fixes some things from original commit at https://reviews.llvm.org/D69136. The main change is that the heap alloc marker is always stored as ExtraInfo in the machine instruction instead of in the PointerSumType because it cannot hold more than 4 pointer types. Add instruction marker to MachineInstr ExtraInfo. This does almost the same thing as Pre/PostInstrSymbols, except that it doesn't create a label until printing instructions. This allows for labels to be put around instructions that are deleted/duplicated somewhere. Use this marker to track heap alloc site call instructions. Reviewers: rnk Subscribers: MatzeB, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69536	2019-10-28 16:59:32 -07:00
Amara Emerson	0f6ed432d5	[AArch64][GlobalISel] Fix assertion fail in C++ selection for vector zext of <4 x s8> We bailed out of dealing with vectors only after the assertion, move it before. Fixes PR43794	2019-10-28 15:45:01 -07:00
Nemanja Ivanovic	97e3626070	[PowerPC] Do not emit HW loop if the body contains calls to lrint/lround These two intrinsics are lowered to calls so should prevent the formation of CTR loops. In a subsequent patch, we will handle all currently known intrinsics and prevent the formation of HW loops if any unknown intrinsics are encountered. Differential revision: https://reviews.llvm.org/D68841	2019-10-28 17:23:08 -05:00
jasonliu	d83a2faacd	[NFCI][XCOFF][AIX] Skip empty Section during object file generation This is a fix to D69112 where we common up the logic of writing CsectGroup. However, we forget to skip the Sections that are empty in that patch. Reviewed by: daltenty, xingxue Differential Revision: https://reviews.llvm.org/D69447	2019-10-28 22:04:23 +00:00
Puyan Lotfi	6b7615ae9a	[MachineOutliner][NFC] clang-formating the MachineOutliner.	2019-10-28 17:58:27 -04:00
Artem Belevich	d9972f8482	[NVPTX] Added llvm.nvvm.mma.m8n8k4.* intrinsics Differential Revision: https://reviews.llvm.org/D69324	2019-10-28 13:55:30 -07:00
Hiroshi Yamauchi	75f72f6b73	[PGO][PGSO] SizeOpts changes. Summary: (Split of off D67120) SizeOpts/MachineSizeOpts changes for profile guided size optimization. (A second try after previously committed as r375254 and reverted as r375375.) Subscribers: mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69409	2019-10-28 12:57:26 -07:00
Francis Visoiu Mistrih	c7557dd692	[Remarks] Remove references to ELF support There is no ELF support at the moment. Remove all the references to the `.remarks` section.	2019-10-28 12:50:46 -07:00
Francis Visoiu Mistrih	209d5a12c5	[Remarks] Emit the remarks section by default for certain formats Emit a remarks section by default for the following formats: * bitstream * yaml-strtab while still providing -remarks-section=<bool> to override the defaults.	2019-10-28 12:50:46 -07:00
Puyan Lotfi	a51fc8ddf8	[MachineOuliner][NFC] Refactoring code to make outline rerunning a cleaner diff. I want to add the ability to rerun the outliner in certain cases, and I thought this could be an NFC change that could make a subsequent change that allows for rerunning the outliner a cleaner diff. Differential Revision: https://reviews.llvm.org/D69482	2019-10-28 15:13:45 -04:00
David Tellenbach	e3a45a24d1	[ARM][Thumb2InstrInfo] Fix default `0` opcode when rewriting frame indices The static functions `positiveOffsetOpcode`, `negativeOffsetOpcode` and `immediateOffsetOpcode` (lib/Target/ARM/Thumb2InstrInfo.cpp) currently can return `0` as default opcode which is meaningless in this situation. This patch replaces this default value by llvm_unreachable. Reviewers: t.p.northover, tellenbach Reviewed By: tellenbach Subscribers: tellenbach, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69432 Patch By: Lorenzo Casalino <lorenzo.casalino93@gmail.com>	2019-10-28 18:58:45 +00:00
Nico Weber	e59f7488c7	Convert files added in `d157a9bc8b` to unix line endings. Ran: git show --diff-filter=A --stat `d157a9bc8b` \| grep '\|' \| \ awk '{ print $1 }' \| xargs dos2unix	2019-10-28 14:39:45 -04:00
Jay Foad	843c0adf0f	[ConstantFold] Fold extractelement of getelementptr Summary: Getelementptr has vector type if any of its operands are vectors (the scalar operands being implicitly broadcast to all vector elements). Extractelement applied to a vector getelementptr can be folded by applying the extractelement in turn to all of the vector operands. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69379	2019-10-28 18:32:39 +00:00
Craig Topper	3da269a248	[X86] Add a DAG combine to turn (and (bitcast (vXi1 (concat_vectors (vYi1 setcc), undef,))), C) into (bitcast (vXi1 (concat_vectors (vYi1 setcc), zero,))) The legalization of v2i1->i2 or v4i1->i4 bitcasts followed by a setcc can create an and after the bitcast. If we're lucky enough that the input to the bitcast is a concat_vectors where the first operand is a setcc that can natively 0 all the upper bits of ak-register, then we should replace the other operands of the concat_vectors with zero in order to remove the AND. With the AND removed we might be able to use a kortest on the result. Differential Revision: https://reviews.llvm.org/D69205	2019-10-28 11:27:01 -07:00
Sander de Smalen	70f5aecede	Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2) Fixed up test/DebugInfo/MIR/Mips/live-debug-values-reg-copy.mir that broke r375425.	2019-10-28 18:05:19 +00:00
Craig Topper	18824d25d8	[LV] Interleaving should not exceed estimated loop trip count. Currently we may do iterleaving by more than estimated trip count coming from the profile or computed maximum trip count. The solution is to use "best known" trip count instead of exact one in interleaving analysis. Patch by Evgeniy Brevnov. Differential Revision: https://reviews.llvm.org/D67948	2019-10-28 10:58:22 -07:00
Bjorn Pettersson	80cb2cecc6	[utils] InlineFunction: fix for debug info affecting optimizations Summary: Debug info affects output from "opt -inline", InlineFunction could not handle the llvm.dbg.value when it exist between alloca instructions. Problem was that the first alloca in a sequence of allocas was handled differently from the subsequence alloca instructions. Now all static alloca instructions are treated the same (being removed if the have no uses). So it does not matter if there are dbg instructions (or any other instructions) in between. Fix the issue: https://bugs.llvm.org/show_bug.cgi?id=43291k Patch by: yechunliang (Chris Ye) Reviewers: bjope, jmorse, vsk, probinson, jdoerfert, mtrofin, aprantl, fhahn Reviewed By: bjope Subscribers: uabelho, ormris, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68633	2019-10-28 18:19:07 +01:00
Austin Kerbow	d11b93ec6a	AMDGPU: Avoid overwriting saved PC Summary: An outstanding load with same destination sgpr as call could cause PC to be updated with junk value on return. Reviewers: arsenm, rampitec Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69474	2019-10-28 10:02:22 -07:00
Sean Fertile	582e3c09d4	[AIX] Refactor AIX Call Lowering to use CCState. NFCI. This patch reworks the AIX call lowering to use CCState. Some defensive errors are added in this patch to protect from emitting bad code for calling convention logic that has not been implemented by design. The use of CCState follows the precedent of other targets and enables the reuse of calling convention logic in LowerFormalArguments, which will be rewritten to also use CCState in a late patch. Patch by Chris Bowler. Differential Revision: https://reviews.llvm.org/D69101	2019-10-28 12:44:22 -04:00
David Green	bf21f0d489	[InstCombine] Extra combine for uadd_sat This is an extra fold for a canonical form of uadd_sat, as shown in D68651. It essentially selects uadd from an add and a select. Differential Revision: https://reviews.llvm.org/D69244	2019-10-28 15:21:16 +00:00
Andrew Paverd	d157a9bc8b	Add Windows Control Flow Guard checks (/guard:cf). Summary: A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on indirect function calls, using either the check mechanism (X86, ARM, AArch64) or or the dispatch mechanism (X86-64). The check mechanism requires a new calling convention for the supported targets. The dispatch mechanism adds the target as an operand bundle, which is processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as required by /guard:cf. This feature is enabled using the `cfguard` CC1 option. Reviewers: thakis, rnk, theraven, pcc Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65761	2019-10-28 15:19:39 +00:00
Jinsong Ji	a233e7d7cb	[AArch64] Fix unannotated fall-through between switch labels This is breaking buildbot with -Werror,-Wimplicit-fallthrough on. eg: http://lab.llvm.org:8011/builders/ppc64le-lld-multistage-test/builds/6881	2019-10-28 15:18:50 +00:00
Jeremy Morse	f5e1b718a6	[DebugInfo] MachineSink: find more DBG_VALUEs to sink In the Pre-RA machine sinker, previously we were relying on all DBG_VALUEs being immediately after the instruction that defined their operands. This isn't a valid assumption, as a variable location change doesn't necessarily correspond to where the value is computed. In this patch, we collect DBG_VALUEs that might need sinking as we walk through a block, and sink all of them if their defining instruction is sunk. This patch adds some copy propagation too, so that if we sink a copy inst, the now non-dominated paths can use the copy source for the variable location. Differential Revision: https://reviews.llvm.org/D58386	2019-10-28 14:32:50 +00:00
Sanjay Patel	1ebd4a2e3a	[DAGCombiner] widen any_ext of popcount based on target support This enhances D69127 (rGe6c145e0548e3b3de6eab27e44e1504387cf6b53) to handle the looser "any_extend" cast in addition to zext. This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688: https://bugs.llvm.org/show_bug.cgi?id=43688	2019-10-28 10:07:12 -04:00
Sanjay Patel	f2e93d10fe	[CVP] prevent propagating poison when substituting edge values into a phi (PR43802) This phi simplification transform was added with: D45448 However as shown in PR43802: https://bugs.llvm.org/show_bug.cgi?id=43802 ...we must be careful not to propagate poison when we do the substitution. There might be some more complicated analysis possible to retain the overflow flag, but it should always be safe and easy to drop flags (we have similar behavior in instcombine and other passes). Differential Revision: https://reviews.llvm.org/D69442	2019-10-28 08:58:28 -04:00
Jeremy Morse	ee50590e16	[DebugInfo] MachineSink: Insert undef DBG_VALUEs when sinking instructions When we sink DBG_VALUEs between blocks, we simply move the DBG_VALUE instruction to below the sunk instruction. However, we should also mark the variable as being undef at the original location, to terminate any earlier variable location. This patch does that -- plus, if the instruction being sunk is a copy, it attempts to propagate the copy through the DBG_VALUE, replacing the destination with the source. Differential Revision: https://reviews.llvm.org/D58238	2019-10-28 12:17:56 +00:00
Dmitry Preobrazhensky	b8042dbe2b	[AMDGPU][MC][GFX10] Added v_interp_[p1/p2/mov]_f32_e64 See https://bugs.llvm.org/show_bug.cgi?id=43747 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69348	2019-10-28 15:03:43 +03:00
David Green	ba2c625531	[Codegen][ARM] Add float softening for cbrt We would previously have no soft-float softening for cbrt, so could hit a crash failing to select. This fills in what appears to be missing. Differential Revision: https://reviews.llvm.org/D69345	2019-10-28 11:08:55 +00:00
Rafael Stahl	a483302fbe	minor doc typo fix / testing github commit	2019-10-28 12:08:40 +01:00
vhscampos	f6e11a36c4	[ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE Summary: Writing support for three ACLE functions: unsigned int __cls(uint32_t x) unsigned int __clsl(unsigned long x) unsigned int __clsll(uint64_t x) CLS stands for "Count number of leading sign bits". In AArch64, these two intrinsics can be translated into the 'cls' instruction directly. In AArch32, on the other hand, this functionality is achieved by implementing it in terms of clz (count number of leading zeros). Reviewers: compnerd Reviewed By: compnerd Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69250	2019-10-28 11:06:58 +00:00
Kerry McLaughlin	da720a38b9	[AArch64][SVE] Implement masked load intrinsics Summary: Adds support for codegen of masked loads, with non-extending, zero-extending and sign-extending variants. Reviewers: huntergr, rovka, greened, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68877	2019-10-28 10:06:14 +00:00
Sam Elliott	7214f7a79f	[RISCV] Lower llvm.trap and llvm.debugtrap Summary: Until this commit, these have lowered to a call to abort(). `llvm.trap()` now lowers to `unimp`, which should trap on all systems. `llvm.debugtrap()` now lowers to `ebreak`, which is exactly what this instruction is for. Reviewers: asb, luismarques Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69390	2019-10-28 09:54:33 +00:00
David Zarzycki	657e4240b1	[X86] Fix 48/96 byte memcmp code gen Detect scalar ISD::ZERO_EXTEND generated by memcmp lowering and convert it to ISD::INSERT_SUBVECTOR. https://reviews.llvm.org/D69464	2019-10-28 08:41:45 +02:00
Craig Topper	7af8d5267b	[X86] Use 64-bit version of source register in LowerPATCHABLE_EVENT_CALL and LowerPATCHABLE_TYPED_EVENT_CALL Summary: The PATCHABLE_EVENT_CALL uses i32 in the intrinsic. This results in the register allocator picking a 32-bit register. We need to use the 64-bit register when forming the MOV64rr instructions. Otherwise we print illegal assembly in the text output. I think prior to this it was impossible for SrcReg to be equal to DstReg so the NOP code was not reachable. While there use Register instead of unsigned. Also add a FIXME for what looks like a bug. Reviewers: dberris Reviewed By: dberris Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69365	2019-10-27 20:44:41 -07:00
Matt Arsenault	9b0b626d2c	Use isConvergent helper instead of directly checking attribute	2019-10-27 19:39:14 -07:00
Saleem Abdulrasool	418d1ea555	PM: silence `-Wpessimizing-move` from GCC 9.2.1 (NFC) Remove the explicit move enabling NVRO.	2019-10-27 18:33:09 -04:00
Sanjay Patel	85a2146c15	[SDAG] fold insert_vector_elt with undef index Similar to: rG4c47617627fb This makes the DAG behavior consistent with IR's insertelement. https://bugs.llvm.org/show_bug.cgi?id=42689 I've tried to maintain test intent for AArch64 and WebAssembly by replacing undef index operands with something else.	2019-10-27 15:28:43 -04:00
Craig Topper	f067dd839e	[LegalizeTypes] When promoting BITREVERSE/BSWAP don't take the shift amount into account when determining the shift amount VT. If the target's preferred shift amount VT can't hold any shift amount for the promoted VT, we should use i32. The specific shift amount shouldn't matter. The type will be adjusted later when the shift itself is type legalized. This avoids an assert in getNode. Fixes PR43820.	2019-10-27 12:20:35 -07:00
Craig Topper	73f255b83a	[TargetLowering] Add getBooleanContents contents check to "SETCC (SETCC), [0\|1], [EQ\|NE] -> SETCC" combine. This combine is only valid if the inner setcc produces a 0/1 result or the inner type is MVT::i1. I haven't seen this cause any issues, just happened to notice it while reviewing combines in this function. While there also fix another call to use the value type from the SDValue for the operand instead of calling SDNode::getValueType(0). Though its likely the use is result 0, its not guaranteed.	2019-10-27 10:07:15 -07:00
Greg Bedwell	4640223ebd	[MCA] Fix a spelling mistake in a comment. NFC	2019-10-27 10:06:22 +00:00
Craig Topper	1ce8a5b385	[X86] Only look up boolean reduction cost tables if the reduction is not pairwise. Summary: We don't pattern match pairwise shuffles in SelectionDAG. So we should only return the optimized costs if its not a pairwise shuffle. I think SLP vectorizer gives priority to non pairwise shuffle if the cost is the same. And the look up for reduction intrinsics passes false for the pairwise flag. So this probably has no real effect today. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69083	2019-10-26 16:41:19 -07:00
Roman Lebedev	9d77ad5754	[APInt] Introduce APIntOps::GetMostSignificantDifferentBit() Summary: Compare two values, and if they are different, return the position of the most significant bit that is different in the values. Needed for D69387. Reviewers: nikic, spatel, sanjoy, RKSimon Reviewed By: nikic Subscribers: xbolva00, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69439	2019-10-26 23:20:58 +03:00
David Zarzycki	11c920207a	[X86] Prefer KORTEST on Knights Landing or later for memcmp() PTEST and especially the MOVMSK instructions are slow on Knights Landing or later. As a bonus, this patch increases instruction parallelism by emitting: KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0 Instead of: KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0 https://reviews.llvm.org/D69157	2019-10-26 21:14:57 +03:00
Georgii Rymar	073ab70b72	[ObjectYAML] - Do not use auto. NFC. Using 'auto' when the type is not obvious is undesired. (it is just a test commit actually)	2019-10-26 15:08:49 +03:00
cdevadas	e921ede540	[AMDGPU] Fix Vreg_1 PHI lowering in SILowerI1Copies. There is a minor flaw in the implementation of function lowerPhis. This function replaces values of regclass Vreg_1 (boolean values) involved in PHIs into an SGPR. Currently it iterates over the MBBs and performs an inplace lowering of PHIs and fails to lower any incoming value that itself is another PHI of Vreg_1 regclass. The failure occurs only when the MBB where the incoming PHI value belongs is not visited/lowered yet. To fix this problem, collect all Vreg_1 PHIs upfront and then perform the lowering. Differential Revision: https://reviews.llvm.org/D69182	2019-10-26 14:37:45 +05:30
Craig Topper	a6a37e820c	[X86][GISel] Fix typo in comment. NFC	2019-10-26 00:27:53 -07:00
John McCall	27e2c8faec	Add Record::getValueAsOptionalDef(). Using `?` as an optional marker is very useful in Clang's AST-node emitters because otherwise we need a separate class just to encode the presence or absence of a base node reference.	2019-10-25 16:39:21 -07:00
Sanjay Patel	4c47617627	[SDAG] fold extract_vector_elt with undef index This makes the DAG behavior consistent with IR's extractelement after: rGb32e4664a715 https://bugs.llvm.org/show_bug.cgi?id=42689 I've tried to maintain test intent for WebAssembly. The AMDGPU test is trying to test for crashing or other bad behavior, but I'm not sure if that's possible after this change.	2019-10-25 19:27:26 -04:00
Stanislav Mekhanoshin	4c0251da14	[AMDGPU] Enable SGPR copy folding That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IMM. Its legal effective subreg class is SReg_32 while instruction expects more restricted SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand() passed the legality check and it was caught in the verifier. Borrowed code from the verifier to check for RC legality. Differential Revision: https://reviews.llvm.org/D69445	2019-10-25 15:08:30 -07:00
Yonghong Song	a27c998c00	[BPF] fix a CO-RE issue with -mattr=+alu32 Ilya Leoshkevich (<iii@linux.ibm.com>) reported an issue that with -mattr=+alu32 CO-RE has a segfault in BPF MISimplifyPatchable pass. The pattern will be transformed by MISimplifyPatchable pass looks like below: r5 = ld_imm64 @"b:0:0$0:0" r2 = ldw r5, 0 ... r2 ... // use r2 The pass will remove the intermediate 'ldw' instruction and replacing all r2 with r5 likes below: r5 = ld_imm64 @"b:0:0$0:0" ... r5 ... // use r5 Later, the ld_imm64 insn will be replaced with r5 = <patched immediate> for field relocation purpose. With -mattr=+alu32, the input code may become r5 = ld_imm64 @"b:0:0$0:0" w2 = ldw32 r5, 0 ... w2 ... // use w2 Replacing "w2" with "r5" is incorrect and will trigger compiler internal errors. To fix the problem, if the register class of ldw* dest register is sub_32, we just replace the original ldw* register with: w2 = w5 Directly replacing all uses of w2 with in-place constructed w5 for the use operand seems not working in all cases. The latest kernel will have -mattr=+alu32 on by default, so added this flag to all CORE tests. Tested with latest kernel bpf-next branch as well with this patch. Differential Revision: https://reviews.llvm.org/D69438	2019-10-25 14:27:25 -07:00
Jian Cai	a6b0219fc4	Revert "[ARM] Uses "Sun Style" syntax for section switching" This reverts commit `03de2f84fc`.	2019-10-25 14:03:07 -07:00

1 2 3 4 5 ...

128016 Commits