llvm-project

Commit Graph

Author	SHA1	Message	Date
Daniel Neilson	367c2aea4e	[InstCombine] Properly change GEP type when reassociating loop invariant GEP chains Summary: This is a fix to PR37005. Essentially, rL328539 ([InstCombine] reassociate loop invariant GEP chains to enable LICM) contains a bug whereby it will convert: %src = getelementptr inbounds i8, i8* %base, <2 x i64> %val %res = getelementptr inbounds i8, <2 x i8> %src, i64 %val2 into: %src = getelementptr inbounds i8, i8 %base, i64 %val2 %res = getelementptr inbounds i8, <2 x i8*> %src, <2 x i64> %val By swapping the index operands if the GEPs are in a loop, and %val is loop variant while %val2 is loop invariant. This fix recreates new GEP instructions if the index operand swap would result in the type of %src changing from vector to scalar, or vice versa. Reviewers: sebpop, spatel Reviewed By: sebpop Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45287 llvm-svn: 329331	2018-04-05 18:51:45 +00:00
Craig Topper	9eec2025c5	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329330	2018-04-05 18:38:45 +00:00
Sanjay Patel	37248d35c3	[InstCombine] add test for fneg+fsub with nsz; NFC There used to be a fold that would handle this case more generally, but it was removed at rL73243 to fix PR4374: https://bugs.llvm.org/show_bug.cgi?id=4374 llvm-svn: 329322	2018-04-05 17:40:51 +00:00
Simon Pilgrim	7f6f43fa3e	[X86][SSE] Add integer add/mul vector.reduce tests llvm-svn: 329321	2018-04-05 17:37:35 +00:00
Simon Pilgrim	de5d0ffe47	[X86][SSE] Add integer and/or/xor vector.reduce tests llvm-svn: 329320	2018-04-05 17:29:51 +00:00
Simon Pilgrim	57d324082c	[X86][SSE] Add integer min/max vector.reduce tests llvm-svn: 329319	2018-04-05 17:25:40 +00:00
Sanjay Patel	deaf4f354e	[InstCombine] use pattern matchers for fsub --> fadd folds This allows folding for vectors with undef elements. llvm-svn: 329316	2018-04-05 17:06:45 +00:00
Sam Clegg	cfd44a2e69	[WebAssembly] Allow for the creation of user-defined custom sections This patch adds a way for users to create their own custom sections to be added to wasm files. At the LLVM IR layer, they are defined through the "wasm.custom_sections" named metadata. The expected use case for this is bindings generators such as wasm-bindgen. Patch by Dan Gohman Differential Revision: https://reviews.llvm.org/D45297 llvm-svn: 329315	2018-04-05 17:01:39 +00:00
Sanjay Patel	7becb3ae4b	[InstCombine] add tests for fsub --> fadd; NFC llvm-svn: 329313	2018-04-05 16:51:09 +00:00
Andrea Di Biagio	c74ad502ce	[MC][Tablegen] Allow models to describe the retire control unit for llvm-mca. This patch adds the ability to describe properties of the hardware retire control unit. Tablegen class RetireControlUnit has been added for this purpose (see TargetSchedule.td). A RetireControlUnit specifies the size of the reorder buffer, as well as the maximum number of opcodes that can be retired every cycle. A zero (or negative) value for the reorder buffer size means: "the size is unknown". If the size is unknown, then llvm-mca defaults it to the value of field SchedMachineModel::MicroOpBufferSize. A zero or negative number of opcodes retired per cycle means: "there is no restriction on the number of instructions that can be retired every cycle". Models can optionally specify an instance of RetireControlUnit. There can only be up-to one RetireControlUnit definition per scheduling model. Information related to the RCU (RetireControlUnit) is stored in (two new fields of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp). This patch fixes PR36661. Differential Revision: https://reviews.llvm.org/D45259 llvm-svn: 329304	2018-04-05 15:41:41 +00:00
Sanjay Patel	2204520e49	[PatternMatch] define m_FNeg using m_FSub Using cstfp_pred_ty in the definition allows us to match vectors with undef elements. This replicates the change for m_Not from D44076 / rL326823 and continues towards making all pattern matchers allow undef elements in vectors. llvm-svn: 329303	2018-04-05 15:36:55 +00:00
Sanjay Patel	2eaa2a43f8	[InstCombine] add vector and vector undef tests for FP folds; NFC llvm-svn: 329294	2018-04-05 15:07:35 +00:00
Tim Northover	b30388bf11	ARM: Do not spill CSR to stack on entry to noreturn functions A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch by myeisha (pmb). llvm-svn: 329287	2018-04-05 14:26:06 +00:00
Sam Parker	0e7deb8104	[DAGCombine] Revert r329160 Again, broke the big endian stage 2 builders. llvm-svn: 329283	2018-04-05 13:46:17 +00:00
Florian Hahn	8eda4b0dc0	[LoopInterchange] Require asserts for test using -stats (NFC) This fixes a buildbot failure. llvm-svn: 329279	2018-04-05 13:07:39 +00:00
Florian Hahn	6e0043365b	[LoopInterchange] Add stats counter for number of interchanged loops. Reviewers: samparker, karthikthecool, blitz.opensource Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D45209 llvm-svn: 329269	2018-04-05 10:39:23 +00:00
Simon Dardis	47e66335ec	[mips] Regenerate test before posting patch for constant multiplication (NFC) llvm-svn: 329268	2018-04-05 10:30:17 +00:00
Florian Hahn	831a757728	[LoopInterchange] Preserve LoopInfo after interchanging. LoopInterchange relies on LoopInfo being up-to-date, so we should preserve it after interchanging. This patch updates restructureLoops to move the BBs of the interchanged loops to the right place. Reviewers: davide, efriedma, karthikthecool, mcrosier Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D45278 llvm-svn: 329264	2018-04-05 09:48:45 +00:00
Craig Topper	15303dda0d	[X86] Revert r329251-329254 It's failing on the bots and I'm not sure why. This reverts: [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. [X86] Remove some InstRWs for plain store instructions on Sandy Bridge. [X86] Auto-generate complete checks. NFC llvm-svn: 329256	2018-04-05 05:19:36 +00:00
Craig Topper	25c7110a37	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329254	2018-04-05 04:42:03 +00:00
Craig Topper	6c4e08c835	[X86] Remove some InstRWs for plain store instructions on Sandy Bridge. We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion. llvm-svn: 329252	2018-04-05 04:42:01 +00:00
Craig Topper	5c36557426	[X86] Auto-generate complete checks. NFC llvm-svn: 329251	2018-04-05 04:41:59 +00:00
Taewook Oh	e0db533feb	[CallSiteSplitting] Do not perform callsite splitting inside landing pad Summary: If the callsite is inside landing pad, do not perform callsite splitting. Callsite splitting uses utility function llvm::DuplicateInstructionsInSplitBetween, which eventually calls llvm::SplitEdge. llvm::SplitEdge calls llvm::SplitCriticalEdge with an assumption that the function returns nullptr only when the target edge is not a critical edge (and further assumes that if the return value was not nullptr, the predecessor of the original target edge always has a single successor because critical edge splitting was successful). However, this assumtion is not true because SplitCriticalEdge returns nullptr if the destination block is a landing pad. This invalid assumption results assertion failure. Fundamental solution might be fixing llvm::SplitEdge to not to rely on the invalid assumption. However, it'll involve a lot of work because current API assumes that llvm::SplitEdge never fails. Instead, this patch makes callsite splitting to not to attempt splitting if the callsite is in a landing pad. Attached test case will crash with assertion failure without the fix. Reviewers: fhahn, junbuml, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45130 llvm-svn: 329250	2018-04-05 04:16:23 +00:00
Teresa Johnson	70565e4cac	[gold] Add debug-pass-manager option, and use it to test new-pass-manager Summary: Follow up from r314963. Reviewers: pcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45293 llvm-svn: 329249	2018-04-05 03:16:57 +00:00
Gerolf Hoflehner	f41aa4fd85	[IR] Upgrade comment token in objc retain release marker Older compiler issued '#' instead of ';' llvm-svn: 329248	2018-04-05 02:44:46 +00:00
Puyan Lotfi	d6f7313c8f	[MIR-Canon] Improving performance by switching to named vregs. No more skipping thounsands of vregs. Much faster running time. llvm-svn: 329246	2018-04-05 00:27:15 +00:00
Puyan Lotfi	26c504fe1e	[MIR-Canon] Adding support for multi-def -> user distance reduction. llvm-svn: 329243	2018-04-05 00:08:15 +00:00
Sam Clegg	685c5e838a	[WebAssembly] Only write 32-bits for WebAssembly::OPERAND_OFFSET32 A bug was found where an offset of -1 would generate an encoding of max int64 which is invalid in the binary format. Differential Revision: https://reviews.llvm.org/D45280 llvm-svn: 329238	2018-04-04 22:27:58 +00:00
Peter Collingbourne	f11eb3ebe7	AArch64: Implement support for the shadowcallstack attribute. The implementation of shadow call stack on aarch64 is quite different to the implementation on x86_64. Instead of reserving a segment register for the shadow call stack, we reserve the platform register, x18. Any function that spills lr to sp also spills it to the shadow call stack, a pointer to which is stored in x18. Differential Revision: https://reviews.llvm.org/D45239 llvm-svn: 329236	2018-04-04 21:55:44 +00:00
Vitaly Buka	4296ea72ff	Don't inline @llvm.icall.branch.funnel Summary: @llvm.icall.branch.funnel is musttail with variable number of arguments. After inlining current backend can't separate call targets from call arguments. Reviewers: pcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45116 llvm-svn: 329235	2018-04-04 21:46:27 +00:00
Evgeniy Stepanov	1f1a7a719d	hwasan: add -hwasan-match-all-tag flag Sometimes instead of storing addresses as is, the kernel stores the address of a page and an offset within that page, and then computes the actual address when it needs to make an access. Because of this the pointer tag gets lost (gets set to 0xff). The solution is to ignore all accesses tagged with 0xff. This patch adds a -hwasan-match-all-tag flag to hwasan, which allows to ignore accesses through pointers with a particular pointer tag value for validity. Patch by Andrey Konovalov. Differential Revision: https://reviews.llvm.org/D44827 llvm-svn: 329228	2018-04-04 20:44:59 +00:00
Eric Fiselier	96bbec79b4	[Analysis] Support aligned new/delete functions. Summary: Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well. This allows the compiler to perform certain optimizations including eliding new/delete calls. Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer Reviewed By: bkramer Subscribers: ckennelly, llvm-commits Differential Revision: https://reviews.llvm.org/D44769 llvm-svn: 329218	2018-04-04 19:01:51 +00:00
Eric Fiselier	e03d45fa8e	Revert "[Analysis] Support aligned new/delete functions." This reverts commit bee3bbd9bdd3ab3364b8fb0cdb6326bc1ae740e0. llvm-svn: 329217	2018-04-04 18:23:00 +00:00
Eric Fiselier	0d5f3b0281	[Analysis] Support aligned new/delete functions. Summary: Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well. This allows the compiler to perform certain optimizations including eliding new/delete calls. Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer Reviewed By: bkramer Subscribers: ckennelly, llvm-commits Differential Revision: https://reviews.llvm.org/D44769 llvm-svn: 329215	2018-04-04 18:12:01 +00:00
Craig Topper	498875fab0	[X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models. The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existant BSWAP16r. llvm-svn: 329211	2018-04-04 17:54:19 +00:00
Lei Huang	09fda63af0	[Power9]Legalize and emit code for quad-precision fma instructions Legalize and emit code for the following quad-precision fma: * xsmaddqp * xsnmaddqp * xsmsubqp * xsnmsubqp Differential Revision: https://reviews.llvm.org/D44843 llvm-svn: 329206	2018-04-04 16:43:50 +00:00
Pavel Labath	6088c23431	Re-commit r329179 after fixing build&test issues - MSVC was not OK with a static_assert referencing a non-static member variable, even though it was just in a sizeof(expression). I move the assert into the emit function, where it is probably more useful. - Tests were failing in builds which did not have the X86 target configured. Since this functionality is not target-specific, I have removed the target specifiers from the .ll files. llvm-svn: 329201	2018-04-04 14:42:14 +00:00
Roman Lebedev	c0c9ba7ee0	[InstCombine] [NFC] Add tests for getting rid of select of bittest (PR36950 / PR17564) Summary: See [[ https://bugs.llvm.org/show_bug.cgi?id=36950 \| PR36950 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=17564 \| PR17564 ]], D45065, D45108 Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45107 llvm-svn: 329198	2018-04-04 14:10:13 +00:00
Dmitry Preobrazhensky	523872ea59	[AMDGPU][MC] Enabled instruction TBUFFER_LOAD_FORMAT_XYZ for SI/CI See bug 36958: https://bugs.llvm.org/show_bug.cgi?id=36958 Differential Revision: https://reviews.llvm.org/D45099 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329197	2018-04-04 13:54:55 +00:00
Simon Pilgrim	f1e668830f	[SLPVectorizer][X86] Regenerate some tests. NFCI llvm-svn: 329196	2018-04-04 13:53:51 +00:00
Simon Pilgrim	8139a88cb6	[X86][Btver2] Strip unnecessary check prefixes from resources tests llvm-svn: 329192	2018-04-04 13:25:45 +00:00
Nico Weber	55fcd07d25	Revert r329179 (and follow-up unsuccessful fix attempts 329184, 329186); it doesn't build. llvm-svn: 329190	2018-04-04 13:06:22 +00:00
Dmitry Preobrazhensky	a0b8cd038c	[AMDGPU][MC] Added support of 3-element addresses for MIMG instructions See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999 Differential Revision: https://reviews.llvm.org/D45084 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329187	2018-04-04 13:01:17 +00:00
Pavel Labath	69baab103a	[CodeGen] Generate DWARF v5 Accelerator Tables Summary: This patch adds a DwarfAccelTableEmitter class, which generates an accelerator table, as specified in DWARF v5 standard. At the moment it only generates a DIE offset column and (if we are indexing more than one compile unit) a CU column. Indexing type units is not currently supported, as we don't even have the ability to generate DWARF v5-compatible compile units. The implementation is not data-source agnostic like the one generating apple tables. This was not necessary as we currently only have one user of this code, and without a second user it was not obvious to me how to best abstract this. (The difference between these tables and the apple ones is that they need a lot more metadata about the debug info they are indexing). The generation is triggered by the --accel-tables argument, which supersedes the --dwarf-accel-tables arg -- the latter was a simple on-off switch, but not we can choose between two kinds of accelerator tables we can generate. This is tested by parsing the generated tables with llvm-dwarfdump and the DWARFVerifier, and I've also checked that GNU readelf is able to make sense of the tables. Differential Revision: https://reviews.llvm.org/D43286 llvm-svn: 329179	2018-04-04 12:28:20 +00:00
Simon Pilgrim	d152d55ab2	[X86][CostModel] Use generic SSE levels instead of particular CPUs for shuffle costs llvm-svn: 329168	2018-04-04 11:14:12 +00:00
Nicolai Haehnle	2f5a73820c	AMDGPU: Dimension-aware image intrinsics Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166	2018-04-04 10:58:54 +00:00
Nicolai Haehnle	eb7311ffb1	StructurizeCFG: Test for branch divergence correctly Fixes cases like the new test @nonuniform. In that test, %cc itself is a uniform value; however, when reading it after the end of the loop in basic block %if, its value is effectively non-uniform, so the branch is non-uniform. This problem was encountered in https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change in itself is not sufficient to fix that bug, as there is another issue in the AMDGPU backend. As discovered after committing an earlier version of this change, this exposes a subtle interaction between this pass and DivergenceAnalysis: since we remove and re-create branch instructions, we can no longer rely on DivergenceAnalysis for branches in subregions that were already processed by the pass. Explicitly remove branch instructions from DivergenceAnalysis to avoid dangling pointers as a matter of defensive programming, and change how we detect non-uniform subregions. Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4 Differential Revision: https://reviews.llvm.org/D43743 llvm-svn: 329165	2018-04-04 10:58:15 +00:00
Nicolai Haehnle	3ffd383a15	AMDGPU: Fix copying i1 value out of loop with non-uniform exit Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's last iteration. There are also useful and correct cases of an i1-value being copied between basic blocks, e.g. when a condition is computed outside of a loop and used inside it. The concept of dominators is not sufficient to capture what is going on, so I propose the notion of "lane-dominators". Fixes a bug encountered in Nier: Automata. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743 Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40547 llvm-svn: 329164	2018-04-04 10:57:58 +00:00
John Brawn	21d9b33d62	[AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y) Differential Revision: https://reviews.llvm.org/D44573 llvm-svn: 329163	2018-04-04 10:12:53 +00:00
Sam Parker	7ec722d603	[DAGCombine] Improve ReduceLoadWidth for SRL Recommitting rL321259. Previosuly this caused an issue with PPCBE but I didn't receieve a reproducer and didn't have the time to follow up. If the issue appears again, please provide a reproducer so I can fix it. Original commit message: If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 329160	2018-04-04 09:26:56 +00:00

1 2 3 4 5 ...

52164 Commits