llvm-project

Commit Graph

Author	SHA1	Message	Date
Heejin Ahn	c87b5e7e22	[WebAssembly] Fix subregion relationship in CFGSort Summary: The previous code for determining the innermost region in CFGSort was not correct. We determine subregion relationship by domination of their headers, i.e., if region A's header dominates region B's header, B is a subregion of A. Previously we assumed that if a BB belongs to both a loop and an exception, the region with fewer number of BBs is the innermost one. This may not be true, because while WebAssemblyException contains BBs in all its subregions (loops or exceptions), MachineLoop may not, because MachineLoop does not contain BBs that don't have a path to its header even if they are dominated by its header. Loop header <---\| \| \| Exception header \| \| \ \| A B \| \| \ \| \| C \| \| \| Loop latch \| \| \| -------------\| For example, in this CFG, the loop does not contain B and C, because they don't have a path back to the loops header. But for CFGSort we consider the exception here belongs to the loop and the exception should be a subregion of the loop and scheduled together. So here we should use `WE->contains(ML->getHeader())` (but not `ML->contains(WE->getHeader())`, for the stated region above). This also fixes some comments and deletes `Regions` vector in `RegionInfo` class, which was not used anywere. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77181	2020-04-01 08:12:41 -07:00
Georgii Rymar	f527e6f2e1	[llvm-readobj] - Do not crash when SHT_HASH table is broken. We have scenarios when the logic of --elf-hash-histogram/--hash-symbols/--hash-table options might crash when given a broken hash table. This patch adds pre-checks for tables for these 3 options and provides test cases. Differential revision: https://reviews.llvm.org/D77147	2020-04-01 18:03:02 +03:00
Jessica Clarke	616289ed29	[LegalizeTypes][RISCV] Correctly sign-extend comparison for ATOMIC_CMP_XCHG Summary: Currently, the comparison argument used for ATOMIC_CMP_XCHG is legalised with GetPromotedInteger, which leaves the upper bits of the value undefind. Since this is used for comparing in an LR/SC loop with a full-width comparison, we must sign extend it. We introduce a new getExtendForAtomicCmpSwapArg to complement getExtendForAtomicOps, since many targets have compare-and-swap instructions (or pseudos) that correctly handle an any-extend input, and the existing function determines the extension of the result, whereas we are concerned with the input. This is related to https://reviews.llvm.org/D58829, which solved the issue for ATOMIC_CMP_SWAP_WITH_SUCCESS, but not the simpler ATOMIC_CMP_SWAP. Reviewers: asb, lenary, efriedma Reviewed By: asb Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, evandro, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74453	2020-04-01 15:51:26 +01:00
Puyan Lotfi	e3033c0ce5	[llvm][clang][IFS] Enhancing the llvm-ifs yaml format for symbol lists. Prior to this change the clang interface stubs format resembled something ending with a symbol list like this: Symbols: a: { Type: Func } This was problematic because we didn't actually want a map format and also because we didn't like that an empty symbol list required "Symbols: {}". That is to say without the empty {} llvm-ifs would crash on an empty list. With this new format it is much more clear which field is the symbol name, and instead the [] that is used to express an empty symbol vector is optional, ie: Symbols: - { Name: a, Type: Func } or Symbols: [] or Symbols: This further diverges the format from existing llvm-elftapi. This is a good thing because although the format originally came from the same place, they are not the same in any way. Differential Revision: https://reviews.llvm.org/D76979	2020-04-01 10:49:06 -04:00
Simon Pilgrim	eb8880562e	[X86][SSE] combinePTESTCC - fold TESTZ(X,~Y) -> TESTC(Y,X)	2020-04-01 15:10:53 +01:00
Kai Wang	501522b5b2	[RISCV] Support RISC-V ELF attributes sections in llvm-readobj. Enable llvm-readobj to handle RISC-V ELF attribute sections. Differential Revision: https://reviews.llvm.org/D75833	2020-04-01 21:50:11 +08:00
David Green	a0c537834a	[ARM] Extra vmull loop tests. NFC	2020-04-01 14:07:45 +01:00
shchenz	e344f8b9db	Revert "[LSR] re-add testcase for wrongly phi node elimination - NFC" This reverts commit `f25a1b4f58`. ARM and hexagon fail at the new added case.	2020-04-01 12:58:06 +00:00
Pierre-vh	2effe8f5e7	[Target][ARM] Improvements to the VPT Block Insertion Pass This allows the MVE VPT Block insertion pass to remove VPNOTs in order to create more complex VPT blocks such as TE, TEET, TETE, etc. Differential Revision: https://reviews.llvm.org/D75993	2020-04-01 12:34:20 +01:00
shchenz	f25a1b4f58	[LSR] re-add testcase for wrongly phi node elimination - NFC Retest the case on X86/SystemZ/AArch64/PowerPC	2020-04-01 11:11:17 +00:00
Cullen Rhodes	84aa6cf1a9	[Transforms][SROA] Promote allocas with mem2reg for scalable types Summary: Aggregate types containing scalable vectors aren't supported and as far as I can tell this pass is mostly concerned with optimisations on aggregate types, so the majority of this pass isn't very useful for scalable vectors. This patch modifies SROA such that mem2reg is run on allocas with scalable types that are promotable, but nothing else such as slicing is done. The use of TypeSize in this pass has also been updated to be explicitly fixed size. When invoking the following methods in DataLayout: * getTypeSizeInBits * getTypeStoreSize * getTypeStoreSizeInBits * getTypeAllocSize we now called getFixedSize on the resultant TypeSize. This is quite an extensive change with around 50 calls to these functions, and also the first change of this kind (being explicit about fixed vs scalable size) as far as I'm aware, so feedback welcome. A test is included containing IR with scalable vectors that this pass is able to optimise. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D76720	2020-04-01 10:34:11 +00:00
Simon Pilgrim	918ccb64b0	[X86][SSE] Handle basic inversion of PTEST/TESTP operands (PR38522) PTEST/TESTP sets EFLAGS as: TESTZ: ZF = (Op0 & Op1) == 0 TESTC: CF = (~Op0 & Op1) == 0 TESTNZC: ZF == 0 && CF == 0 If we are inverting the 0'th operand of a PTEST/TESTP instruction we can adjust the comparisons to correct handle the inversion implicitly. Additionally, for "TESTZ" (ZF) cases, the allones case, PTEST(X,-1) can be simplified to PTEST(X,X). We can expand this for the TESTZ(X,~Y) pattern and also handle KTEST/KORTEST in the future. Differential Revision: https://reviews.llvm.org/D76984	2020-04-01 11:33:28 +01:00
shchenz	8b8cd150a4	Revert "[LSR] add testcase for wrongly phi node elimination - NFC" This reverts commit `dbf5e4f6c7`. The testcase has different behaviour on PowerPC and X86.	2020-04-01 10:28:43 +00:00
shchenz	dbf5e4f6c7	[LSR] add testcase for wrongly phi node elimination - NFC	2020-04-01 09:58:58 +00:00
Qiu Chaofan	d8b51789fd	[NFC] [PowerPC] Add test for frsp elimination	2020-04-01 17:54:24 +08:00
Bjorn Pettersson	ef49895da8	[X86] Do not assume types are legal in getFauxShuffleMask Summary: Make sure we do not assert on value types not being simple in getFauxShuffleMask when analysing operations such as "v8i16 = truncate v8i24". Reviewers: RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77136	2020-04-01 11:40:18 +02:00
Georgii Rymar	93fc0ba145	[yaml2obj] - Add NBucket and NChain fields for the SHT_HASH section. These fields allows to override nchain and nbucket fields of a SHT_HASH section. Differential revision: https://reviews.llvm.org/D76834	2020-04-01 12:28:16 +03:00
Florian Hahn	d307174e1d	[ConstantRange] Use APInt::or/APInt::and for single elements. Currently ConstantRange::binaryAnd/binaryOr results are too pessimistic for single element constant ranges. If both operands are single element ranges, we can use APInt's AND and OR implementations directly. Note that some other binary operations on constant ranges can cover the single element cases naturally, but for OR and AND this unfortunately is not the case. Reviewers: nikic, spatel, lebedev.ri Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D76446	2020-04-01 09:50:24 +01:00
Florian Hahn	e20cac3650	[Matrix] Add new test case with getelementptr constant exprs. The new test mostly ensures we keep doing the right thing for constant expressions while lowering matrix instructions.	2020-04-01 09:32:13 +01:00
Qiu Chaofan	95bcab8272	[DAGCombiner] Require ninf for sqrt recip estimation Currently, DAG combiner uses (fmul (rsqrt x) x) to estimate square root of x. However, this method would return NaN if x is +Inf, which is incorrect. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D76853	2020-04-01 16:23:43 +08:00
Florian Hahn	862766e01e	[Verifier] Verify matrix dimensions operands match vector size. This patch adds checks to the verifier to ensure the dimension arguments passed to the matrix intrinsics match the vector types for their arugments/return values. Reviewers: anemet, Gerolf, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D77129	2020-04-01 09:21:39 +01:00
Simon Pilgrim	f9f401dba1	[X86][AVX] Add additional 256/512-bit test cases for PACKSS/PACKUS shuffle patterns Also add lowerShuffleWithPACK call to lowerV32I16Shuffle - shuffle combining was catching it but we avoid a lot of temporary shuffle creations if we catch it at lowering first.	2020-04-01 08:19:03 +01:00
Simon Pilgrim	3c9064ed96	[X86] Run XOP vector rotation tests with/without AVX2 I noticed this while reviewing D77152 - by only testing bdver4 we weren't checking an XOP target that only had AVX1	2020-04-01 08:19:03 +01:00
Kai Luo	8eb40e41f6	[PowerPC] Don't generate ST_VSR_SCAL_INT if power8-vector is disabled Summary: In https://bugs.llvm.org/show_bug.cgi?id=45297, it fails selecting instructions for `PPCISD::ST_VSR_SCAL_INT`. The reason it generate the `PPCISD::ST_VSR_SCAL_INT` with `-power8-vector` in IR is PPC's combiner checks `hasP8Altivec` rather than `hasP8Vector`. This patch should resolve PR45297. Differential Revision: https://reviews.llvm.org/D76773	2020-04-01 02:15:25 +00:00
Shengchen Kan	d0efd7bfcf	[X86][MC] Disable Prefix padding after hardcode/prefix Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight, eli.friedman Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits, annita.zhang Tags: #llvm Differential Revision: https://reviews.llvm.org/D76475	2020-04-01 09:49:52 +08:00
Matt Arsenault	43e576593e	AMDGPU/GlobalISel: Fix insert point when lowering G_FMAD	2020-03-31 19:57:06 -04:00
Evgenii Stepanov	f9471b0010	Fix MSan false positive due to select folding. Summary: Select folding in JumpThreading can create a conditional branch on a code patch that did not have one in the original program. This is not a valid transformation in sanitize_memory functions. Note that JumpThreading does select folding in 3 different places. Two of them seem safe - they apply to a select instruction in a BB that ends with an unconditional branch to another BB, which (in turn) ends with a conditional branch or a switch with the same condition. Fixes PR45220. Reviewers: glider, dvyukov, efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76332	2020-03-31 15:25:42 -07:00
Fangrui Song	4af7560b37	[PPCInstPrinter] Print conditional branches as `bt 2, $target` instead of `bt 2, .+$imm` Follow-up of D76591. Reviewed By: #powerpc, sfertile Differential Revision: https://reviews.llvm.org/D76907	2020-03-31 15:05:38 -07:00
Joel E. Denny	8f8c4950fe	[FileCheck] Add missing %ProtectFileCheckOutput to FileCheck tests I'm committing this fixup without review because it's an obvious continuation of D65121 (committed at `f471eb8e99`).	2020-03-31 17:29:11 -04:00
Hubert Tong	478af4479a	[Object] Update ObjectFile::makeTriple for XCOFF Summary: When we encounter an XCOFF file, reflect that in the triple information. In addition to knowing the object file format, we know that the associated OS is AIX. This means that we can expect that there is no output difference in the processing of an XCOFF32 input file between cases where the triple is left unspecified by the user and cases where the user specifies `--triple powerpc-ibm-aix` explicitly. Reviewers: jhenderson, sfertile, jasonliu, daltenty Reviewed By: jasonliu Subscribers: wuzish, nemanjai, hiraditya, MaskRay, rupprecht, steven.zhang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77025	2020-03-31 17:26:30 -04:00
Daniel Frampton	494abe139a	[AArch64] Change AArch64 Windows EH UnwindHelp object to be a fixed object The UnwindHelp object is used during exception handling by runtime code. It must be findable from a fixed offset from FP. This change allocates the UnwindHelp object as a fixed object (as is done for x86_64) to ensure that both the generated code and runtime agree on the location of the object. Fixes https://bugs.llvm.org/show_bug.cgi?id=45346 Differential Revision: https://reviews.llvm.org/D77016	2020-03-31 14:21:21 -07:00
Daniel Frampton	522b4c4b88	[AArch64] Fix mismatch in prologue and epilogue for funclets on Windows The generated code for a funclet can have an add to sp in the epilogue for which there is no corresponding sub in the prologue. This patch removes the early return from emitPrologue that was preventing the sub to sp, and instead conditionalizes the appropriate parts of the rest of the function. Fixes https://bugs.llvm.org/show_bug.cgi?id=45345 Differential Revision: https://reviews.llvm.org/D77015	2020-03-31 14:21:18 -07:00
Anna Thomas	58a05675da	Revert "[InlineFunction] Handle return attributes on call within inlined body" This reverts commit `28518d9ae3`. There is a failure in MsgPackReader.cpp when built with clang. It complains about "signext and zeroext" are incompatible. Investigating offline if it is infact a UB in the MsgPackReader code.	2020-03-31 16:16:34 -04:00
Eli Friedman	dacf8d3562	[AArch64][SVE] Add support for fcmp. This also requires support for boolean "not", so I added boolean logic while I was there. Differential Revision: https://reviews.llvm.org/D76901	2020-03-31 12:04:39 -07:00
Guozhi Wei	6d20937c29	[CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall The attached test case is simplified from tcmalloc. Both function calls should be optimized as tailcall. But llvm can only optimize the first call. The second call can't be optimized because function dupRetToEnableTailCallOpts failed to duplicate ret into block case2. There 2 problems blocked the duplication: 1 Intrinsic call llvm.assume is not handled by dupRetToEnableTailCallOpts. 2 The control flow is more complex than expected, dupRetToEnableTailCallOpts can only duplicate ret into its predecessor, but here we have an intermediate block between call and ret. The solutions: 1 Since CodeGenPrepare is already at the end of LLVM IR phase, we can simply delete the intrinsic call to llvm.assume. 2 A general solution to the complex control flow is hard, but for this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so they can be combined through eliminateFallThrough. But this function is called too late, there is no more dupRetToEnableTailCallOpts after it. We can add an earlier call to eliminateFallThrough to solve it. Differential Revision: https://reviews.llvm.org/D76539	2020-03-31 11:55:51 -07:00
Stanislav Mekhanoshin	08682dcc86	[AMDGPU] Define 16 bit VGPR subregs We have loads preserving low and high 16 bits of their destinations. However, we always use a whole 32 bit register for these. The same happens with 16 bit stores, we have to use full 32 bit register so if high bits are clobbered the register needs to be copied. One example of such code is added to the load-hi16.ll. The proper solution to the problem is to define 16 bit subregs and use them in the operations which do not read another half of a VGPR or preserve it if the VGPR is written. This patch simply defines subregisters and register classes. At the moment there should be no difference in code generation. A lot more work is needed to actually use these new register classes. Therefore, there are no new tests at this time. Register weight calculation has changed with new subregs so appropriate changes were made to keep all calculations just as they are now, especially calculations of register pressure. Differential Revision: https://reviews.llvm.org/D74873	2020-03-31 11:49:06 -07:00
Anna Thomas	28518d9ae3	[InlineFunction] Handle return attributes on call within inlined body Consider a callee function that has a call (C) within it which feeds into the return. When we inline that callee into a callsite that has return attributes, we can backward propagate those attributes to the call (C) within that inlined callee body. This is safe to do so only if we can guarantee transfer of execution to successor in the window of instructions between return value (i.e. the call C) and the return instruction. See added test cases. Reviewed-By: reames, jdoerfert Differential Revision: https://reviews.llvm.org/D76140	2020-03-31 14:35:40 -04:00
Ulrich Weigand	c726c920e0	[SystemZ] Allow %r0 in address context for AsmParser Registers used in any address (as well as in a few other contexts) have special semantics when a "zero" register is used, which is why the back-end defines extra register classes ADDR32, ADDR64 etc to be used to prevent the register allocator from using %r0 there. However, when writing assembler code "by hand", you sometimes need to trigger that special semantics. However, currently the AsmParser will reject %r0 in those places. In some cases it may be possible to write that instruction differently - but in others it is currently not possible at all. This check in AsmParser simply seems overly strict, so this patch just removes the check completely. This brings the behaviour of AsmParser in line with the GNU assembler as well. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45092	2020-03-31 19:48:50 +02:00
Uday Bondhugula	dc817b2dea	[InstCombine] Deduce attributes for aligned_alloc in InstCombine Make InstCombine aware of the aligned_alloc library function. Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Depends on D76970. Differential Revision: https://reviews.llvm.org/D76971	2020-03-31 23:17:28 +05:30
zhizhouy	94d912296d	[NFC] Do not run CGProfilePass when not using integrated assembler Summary: CGProfilePass is run by default in certain new pass manager optimization pipeline. Assemblers other than llvm as (such as gnu as) cannot recognize the .cgprofile entries generated and emitted from this pass, causing build time error. This patch adds new options in clang CodeGenOpts and PassBuilder options so that we can turn cgprofile off when not using integrated assembler. Reviewers: Bigcheese, xur, george.burgess.iv, chandlerc, manojgupta Reviewed By: manojgupta Subscribers: manojgupta, void, hiraditya, dexonsmith, llvm-commits, tcwang, llozano Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D62627	2020-03-31 10:31:31 -07:00
Simon Pilgrim	30436a1ce7	[X86][SSE] Add additional PTEST/TESTP inversion tests	2020-03-31 18:02:27 +01:00
Simon Pilgrim	8b925440d1	[X86][SSE] Simplify PTEST/TESTP tests for D76984 We don't need to use an allones for the second operand - test the general case.	2020-03-31 18:02:27 +01:00
Sterling Augustine	21d9d0855b	New symbolizer option to print files relative to the compilation directory. Summary: New "--relative" option to allow printing files relative to the compilation directory. Reviewers: jhenderson Subscribers: MaskRay, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76733	2020-03-31 09:29:24 -07:00
Florian Hahn	b0cd7b2799	[SCCP] Limit use of range info for binops to integers for now. This fixes a crash when building the test suite.	2020-03-31 17:08:09 +01:00
Tyker	4aeb7e1ef4	[AssumeBundles] Preserve information in EarlyCSE Summary: this patch preserve information from various places in EarlyCSE into assume bundles. Reviewers: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76769	2020-03-31 17:47:04 +02:00
Tyker	7093b92a13	[AssumeBundles] Preserve Information from Load/Store Summary: This patch preserve dereferenceable, nonnull and alignment from loads and stores. Reviewers: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76759	2020-03-31 17:47:04 +02:00
Jonas Paulsson	f481d48893	[SystemZ] Improve foldMemoryOperandImpl(). Fold MS(G)RKC -> MS(G)C. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D76771	2020-03-31 17:17:51 +02:00
Georgii Rymar	b3f13bc165	[obj2yaml] - Teach tool to dump program headers. Currently obj2yaml does not dump program headers, this patch teaches it to do that. Differential revision: https://reviews.llvm.org/D75342	2020-03-31 18:10:19 +03:00
Simon Pilgrim	7e0e5fa499	Revert rGefe59d6717dcdf7777acb9b7a734e1a520bdf22a "[X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations" This might be causing an issue on the fuchsia-x86_64-linux buildbot - reverting to see what happens.	2020-03-31 15:47:30 +01:00
Simon Pilgrim	efe59d6717	[X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations If canLowerByDroppingEvenElements indicates that the shuffle is a N:1 compaction pattern and the inputs are suitably sign/zero extended then we can use a chain of PACKSS/PACKUS to compact. This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask can recognise PACKSS/PACKUS chains.	2020-03-31 14:48:48 +01:00
Sanjay Patel	fa61b5059a	[InstCombine] remove stray auto-generated test comment; NFC The script now includes extra info about command-line options used when generating its advertisement heading, but we don't want that here. This is a special-case because we have enhanced the check lines (as noted in the 2nd comment line).	2020-03-31 09:19:12 -04:00
Florian Hahn	b37543750c	[ValueLattice] Distinguish between constant ranges with/without undef. This patch updates ValueLattice to distinguish between ranges that are guaranteed to not include undef and ranges that may include undef. A constant range guaranteed to not contain undef can be used to simplify instructions to arbitrary values. A constant range that may contain undef can only be used to simplify to a constant. If the value can be undef, it might take a value outside the range. For example, consider the snipped below define i32 @f(i32 %a, i1 %c) { br i1 %c, label %true, label %false true: %a.255 = and i32 %a, 255 br label %exit false: br label %exit exit: %p = phi i32 [ %a.255, %true ], [ undef, %false ] %f.1 = icmp eq i32 %p, 300 call void @use(i1 %f.1) %res = and i32 %p, 255 ret i32 %res } In the exit block, %p would be a constant range [0, 256) including undef as %p could be undef. We can use the range information to replace %f.1 with false because we remove the compare, effectively forcing the use of the constant to be != 300. We cannot replace %res with %p however, because if %a would be undef %cond may be true but the second use might not be < 256. Currently LazyValueInfo uses the new behavior just when simplifying AND instructions and does not distinguish between constant ranges with and without undef otherwise. I think we should address the remaining issues in LVI incrementally. Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D76931	2020-03-31 12:50:20 +01:00
Denis Antrushin	06c58f11a9	[SCEV] Use backedge SCEV of PHI only if its input is loop invariant For the PHI node %1 = phi [%A, %entry], [%X, %latch] it is incorrect to use SCEV of backedge val %X as an exit value of PHI unless %X is loop invariant. This is because exit value of %1 is value of %X at one-before-last iteration of the loop. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D73181	2020-03-31 18:39:24 +07:00
Simon Pilgrim	98357dee1c	[X86] Combine concat(palignr,palignr) -> palignr(concat,concat) combineX86ShufflesRecursively should handle this someday	2020-03-31 11:06:35 +01:00
Daan Sprenkels	464b9aeafe	[InstCombine] Transform extelt-trunc -> bitcast-extelt Canonicalize the case when a scalar extracted from a vector is truncated. Transform such cases to bitcast-then-extractelement. This will enable erasing the truncate operation. This commit fixes PR45314. reviewers: spatel Differential revision: https://reviews.llvm.org/D76983	2020-03-31 11:53:41 +02:00
David Green	2c5f43f9dd	[ARM] Fix qdadd operand order qdadd is defined as sat(Rm + sat(2*Rn)). We had the Rm and Rn switched the wrong way around. Differential Revision: https://reviews.llvm.org/D77049	2020-03-31 10:11:36 +01:00
Guillaume Chatelet	c9d5c19597	[Alignment][NFC] Transitionning more getMachineMemOperand call sites Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77121	2020-03-31 08:36:18 +00:00
Sebastian Neubauer	5d3a69feca	[AMDGPU] New llvm.amdgcn.ballot intrinsic Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function in GLSL and other shader languages. It returns a bitfield containing the result of its boolean argument in all active lanes, and zero in all inactive lanes. This is intended to replace the existing llvm.amdgcn.icmp and llvm.amdgcn.fcmp intrinsics after a suitable transition period. Use the new intrinsic in the atomic optimizer pass. Differential Revision: https://reviews.llvm.org/D65088	2020-03-31 10:35:39 +02:00
Florian Hahn	0c9c58ada0	[SCCP] Use constant ranges for casts. For casts with constant range operands, we can use ConstantRange::castOp. Reviewers: davide, efriedma, mssimpso Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D71938	2020-03-31 09:22:04 +01:00
Kai Wang	581ba35291	[RISCV] ELF attribute section for RISC-V. Leverage ARM ELF build attribute section to create ELF attribute section for RISC-V. Extract the common part of parsing logic for this section into ELFAttributeParser.[cpp\|h] and ELFAttributes.[cpp\|h]. Differential Revision: https://reviews.llvm.org/D74023	2020-03-31 16:16:19 +08:00
Djordje Todorovic	bcbd60aeb5	[Mips] Make MipsBranchExpansion aware of BBIT family of branch Octeon branches (bbit0/bbit032/bbit1/bbit132) have an immediate operand, so it is legal to have such replacement within MipsBranchExpansion::replaceBranch(). According to the specification, a branch (e.g. bbit0 ) looks like: bbit0 rs p offset // p is an immediate operand if !rs<p> then branch Without this patch, an assertion triggers in the method, and the problem has been found in the real example. Differential Revision: https://reviews.llvm.org/D76842	2020-03-31 09:20:51 +02:00
Dylan McKay	7b808b105f	[AVR] Generalize the previous interrupt bugfix to signal handlers too	2020-03-31 19:33:34 +13:00
Dylan McKay	339b34266c	[AVR] Respect the 'interrupt' function attribute In the past, AVR functions were only lowered with interrupt-specific machine code if the function was defined with the "avr-interrupt" or "avr-signal" calling conventions. This patch modifies the backend so that if the function does not have a special calling convention, but does have an "interrupt" attribute, that function is interpreted as a function with interrupts. This also extracts the "is this function an interrupt" logic from several disparate places in the backend into one AVRMachineFunctionInfo attribute. Bug found by Wilhelm Meier.	2020-03-31 19:00:18 +13:00
Wei Mi	ebad678857	[SampleFDO] Port MD5 name table support to extbinary format. Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it. Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support. Differential Revision: https://reviews.llvm.org/D76255	2020-03-30 22:07:08 -07:00
QingShan Zhang	4eeb56d088	[PowerPC] Don't do the folding if the operand is R0/X0 We have this transformation in PowerPC peephole: Replace instruction: renamable $x28 = ADDI8 renamable $x7, -8 renamable $x28 = ADD8 killed renamable $x28, renamable $x0 STFD killed renamable $f0, -8, killed renamable $x28 :: (store 8 into %ir._ind_cast99.epil) with: renamable $x28 = ADDI8 renamable $x7, -16 STFDX killed renamable $f0, $x0, killed $x28 :: (store 8 into %ir._ind_cast99.epil) It is invalid as the '$x0' in STFDX is constant 0, not register r0. Reviewed By: Nemanjai Differential Revision: https://reviews.llvm.org/D77034	2020-03-31 02:50:19 +00:00
Jessica Paquette	d5ee72065b	[GlobalISel] Implement identity transforms for x op x -> x When we have ``` a = G_OR x, x ``` or ``` b = G_AND y, y ``` We can drop the G_OR/G_AND and just use x/y respectively. Also update arm64-fallback.ll because there was an or in there which hits this transformation. Differential Revision: https://reviews.llvm.org/D77105	2020-03-30 18:22:37 -07:00
Juneyoung Lee	519f5c3796	[LegalizeTypes] Add SoftenFloatRes_FREEZE Summary: This adds SoftenFloatRes_FREEZE. Reviewers: bkramer, JamesNagurne, craig.topper, efriedma Reviewed By: craig.topper Subscribers: AbigailLinden, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76980	2020-03-31 10:16:38 +09:00
Jessica Paquette	63d70ea6a0	[GlobalISel] Combine (x op 0) -> x for operations with a right identity of 0 Implement identity combines for operations like the following: ``` %a = G_SUB %b, 0 ``` This can just be replaced with %b. Over CTMark, this gives some minor size improvements at -O3. Differential Revision: https://reviews.llvm.org/D76640	2020-03-30 16:49:52 -07:00
Matt Arsenault	b8fc192d42	Revert "[GISel]: Fix incorrect IRTranslation while translating null pointer types" This reverts commit `b3297ef051`. This change is incorrect. The current semantic of null in the IR is a pointer with the bitvalue 0. It is not a cast from an integer 0, so this should preserve the pointer type.	2020-03-30 19:30:42 -04:00
Daan Sprenkels	5227fa0c72	Recommit "[InstCombine] Update assertions in InstCombine test; NFC"	2020-03-31 00:00:41 +02:00
Matt Arsenault	db9f0d1ce5	AMDGPU: Form v_cvt_ubyte* with f16 results We get 2 conversion instructions anyway. Previously we would get a conversion with SDWA reading from a byte source, which has a larger encoding.	2020-03-30 17:59:49 -04:00
Matt Arsenault	b27d255e1e	AMDGPU/GlobalISel: Form CVT_F32_UBYTE0	2020-03-30 17:45:55 -04:00
Matt Arsenault	bcb643c8af	AMDGPU/GlobalISel: Handle image atomics	2020-03-30 17:41:04 -04:00
Matt Arsenault	48eda37282	AMDGPU/GlobalISel: Start selecting image intrinsics Does not handled atomics yet.	2020-03-30 17:33:04 -04:00
Matt Arsenault	570a578e46	AMDGPU: Account for dmask when computing image mem size Only the number of elements in the dmask will really be accessed.	2020-03-30 17:30:58 -04:00
Jay Foad	cee65d51fe	AMDGPU: Implement getMemcpyLoopLoweringType Summary: Based on a patch by Matt Arsenault. Reviewers: rampitec, kerbowa, nhaehnle, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77057	2020-03-30 22:21:01 +01:00
Matt Arsenault	2641ba52a9	AMDGPU/GlobalISel: Round up image operations with 5, 6 or 7 addresses The instruction definitions are missing for these register types, so round up to 8 like the DAG.	2020-03-30 17:02:47 -04:00
Matt Arsenault	42d5609809	AMDGPU/GlobalISel: Start handling _L to _LZ optimization We currently don't have a way to map to the equivalent intrinsic opcode, so track immediate 0s in place of the address for the selection to know to change the final opcode.	2020-03-30 17:02:30 -04:00
Daan Sprenkels	273b0d7766	Revert "[InstCombine] Update assertions in InstCombine test; NFC" This reverts commit `4243bd494d`.	2020-03-30 22:41:33 +02:00
Daan Sprenkels	4243bd494d	[InstCombine] Update assertions in InstCombine test; NFC	2020-03-30 22:15:50 +02:00
Sanjay Patel	f2fbdf76d8	[InstCombine] do not exclude min/max from icmp with casted operand fold InstCombine has a mess of logic that tries to preserve min/max patterns, but AFAICT, this one is not necessary because we can always narrow the corresponding select in this sequence to match the narrow compare. The biggest danger for this patch is inducing infinite looping or assert from exceeding max iterations. If any bots hit that in the vicinity of this commit, this is the likely patch to blame.	2020-03-30 16:10:51 -04:00
Eli Friedman	9eb1b41811	[llvm-cov] Improve error message for missing profdata I got a report recently that a user was having trouble interpreting the meaning of the error message. Hopefully this is more readable; produces something like the following: error: No such file or directory: Could not read profile data! Differential Revision: https://reviews.llvm.org/D76796	2020-03-30 12:54:07 -07:00
Matt Arsenault	4919f2e1c5	AMDGPU/GlobalISel: Basic legalize rules for G_FSHR Only handles easy 32-bit cases.	2020-03-30 11:53:01 -07:00
Bill Wendling	fa496ce3c6	[Intrinsic] Give "is.constant" the "convergent" attribute Summary: Code frequently relies upon the results of "is.constant" intrinsics to DCE invalid code paths. We don't want the intrinsic to be made control- dependent on any additional values. For instance, we can't split a PHI into a "constant" and "non-constant" part via jump threading in order to "optimize" the constant part, because the "is.constant" intrinsic is meant to return "false". Reviewers: wmi, kazu, MaskRay Reviewed By: kazu Subscribers: jdoerfert, efriedma, joerg, lebedev.ri, nikic, xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75799	2020-03-30 11:47:12 -07:00
Matt Arsenault	23da702d69	GlobalISel: Translate llvm.fshl/llvm.fshr	2020-03-30 11:34:42 -07:00
Jakub Kuderski	77ce2e21a8	[AMDGPU] Add Relocation Constant Support Summary: This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf. The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry. `SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel. Author: csyonghe <yonghe@google.com> Reviewers: tpr, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76440	2020-03-30 13:49:20 -04:00
Sameer Sahasrabuddhe	3cbbded68c	Introduce unify-loop-exits pass. For each natural loop with multiple exit blocks, this pass creates a new block N such that all exiting blocks now branch to N, and then control flow is redistributed to all the original exit blocks. The bulk of the tranformation is a new function introduced in BasicBlockUtils that an redirect control flow from a set of incoming blocks to a set of outgoing blocks via a common "hub". This is a useful workaround for a limitation in the structurizer which incorrectly orders blocks when processing a nest of loops. This pass bypasses that issue by ensuring that each natural loop is recognized as a separate region. Since the structurizer is a region pass, it no longer sees a nest of loops in a single region, and instead processes each "level" in the nesting as a separate region. The AMDGPU backend provides a new option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewers: madhur13490, arsenm, nhaehnle Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D75865	2020-03-30 13:23:56 -04:00
Vedant Kumar	dcc410b5cf	[LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259) In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken count is a SCEV add expression, its type is defined by the type of the last operand of the add expression. In the test case from PR45259, this last operand happens to be a pointer, which (according to llvm::Type) does not have a primitive size in bits. In this case, LoopVectorize fails to truncate the SCEV and crashes as a result. Uing ScalarEvolution::getTypeSizeInBits makes the truncation work as expected. https://bugs.llvm.org/show_bug.cgi?id=45259 Differential Revision: https://reviews.llvm.org/D76669	2020-03-30 10:14:14 -07:00
Yuanfang Chen	ece79f4708	[X86] make sure POP has implicit def/use of stack pointer when materializing 8-bit immediates for minsize Summary: Otherwise PostRA list scheduler may reorder instruction, such as schedule this ''' pushq $0x8 pop %rbx lea 0x2a0(%rsp),%r15 ''' to ''' pushq $0x8 lea 0x2a0(%rsp),%r15 pop %rbx ''' by mistake. The patch is to prevent this to happen by making sure POP has implicit use of SP. Reviewers: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77031	2020-03-30 09:25:31 -07:00
Matt Arsenault	bb009498c2	AMDGPU/GlobalISel: Hack to fix i24 argument lowering I still think the call lowering type legalization logic split between the generic code and target is too confusing, but largely induced by the reliance on the DAG infrastructure.	2020-03-30 11:00:45 -04:00
Matt Arsenault	90a36bbd7c	AMDGPU/GlobalISel: Legalize 64-bit G_UDIV/G_UREM Mostly ported from the DAG version. This results in much worse code than the DAG version, largely due to a much worse expansion for G_UMULH.	2020-03-30 10:57:37 -04:00
Chris Jackson	f6b2c003f3	[DebugInfo] Ensure that a demanded bits optimisation in InstCombine does not result in an incorrect debuginfo variable value - Add an additional salvage and a test. Reviewers: aprantl, djtodoro Differential Revision: https://reviews.llvm.org/D76854 Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=44371	2020-03-30 15:39:22 +01:00
Florian Hahn	7899a111ea	Revert "[Darwin] Respect -fno-unroll-loops during LTO." As per post-commit comment at https://reviews.llvm.org/D76916, this should better be done at the TU level. This reverts commit `9ce198d6ed`.	2020-03-30 15:20:30 +01:00
Chris Jackson	135709aa90	[DebugInfo] Ensure dead store elimination can mark an operand value as undefined - Correct a debug info salvage and add a test Reviewers: aprantl, vsk Differential Revision: https://reviews.llvm.org/D76930 Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45080	2020-03-30 14:58:14 +01:00
Sanjay Patel	bc60cdcc3f	[InstCombine] add test for trunc-extelt; NFC Goes with D76983	2020-03-30 09:43:03 -04:00
Georgii Rymar	4cbfb98eb3	[llvm-readobj] - Improve test of --elf-hash-histogram option. This test missed the check of histograms printed for .hash sections. It was removed by mistake in D71606 where I tried to get rid of precompiled objects and did not realize that time that both SHT_GNU_HASH and SHT_HASH sections were tested and not just GNU version. Also it never tested aliases for the --elf-hash-histogram option. Differential revision: https://reviews.llvm.org/D76920	2020-03-30 15:46:45 +03:00
Georgii Rymar	821439a45a	[llvm-readobj][test] - Simplify hash-symbols test. We are able to reduce `-DBITS=32/64` to reduce this test case. I've rewrote the comments we had to generalize them and fix wrong computations they contained. Differential revision: https://reviews.llvm.org/D76924	2020-03-30 14:44:30 +03:00
Simon Pilgrim	e95d04f4f1	[X86][AVX] lowerV4X128Shuffle - attempt to widen to 2x256 to simplify shuffles If we are lowering to X86ISD::SHUF128 we are going to lose track of individual 128-bit lanes that are UNDEF, so if we can widen these to guarantee that they are sequential with their neighbour we should. This helps with later shuffle combines.	2020-03-30 12:22:26 +01:00
Florian Hahn	84c1fbab5d	[CVP] Add additional icmp for ranges with undef to test.	2020-03-30 10:59:25 +01:00
Qiu Chaofan	9aa884ccc2	[NFC] [PowerPC] Update and add tests for ori Use script to update test for ori with 32-bit imms, and add test for ori with 64-bit imms.	2020-03-30 17:46:12 +08:00
Sam Parker	94b195ff12	[ARM][LowOverheadLoops] Add horizontal reduction support Add a bit more logic into the 'FalseLaneZeros' tracking to enable horizontal reductions and also make the VADDV variants validForTailPredication. Differential Revision: https://reviews.llvm.org/D76708	2020-03-30 09:55:41 +01:00
David Green	c9eaed5149	[ARM] MVE VMOV.i64 In the original batch of MVE VMOVimm code generation VMOV.i64 was left out due to the way it was done downstream. It turns out that it's fairly simple though. This adds the codegen for it, similar to NEON. Bigendian is technically incorrect in this version, which John is fixing in a Neon patch.	2020-03-30 07:44:23 +01:00
Craig Topper	b4695351cb	[TTI][X86] Fix the value passed to IsUnsigned for cost modeling of experimental.vector.reduce.smin/smax/umin/umax. We were passing true for smax/smin and false for umax/umin.	2020-03-29 23:34:22 -07:00
Jun Ma	31a1d85c53	[Coroutines 2/2] Improve symmetric control transfer feature Differential Revision: https://reviews.llvm.org/D76913	2020-03-30 09:53:09 +08:00
Jun Ma	a94fa2c049	[Coroutines 1/2] Improve symmetric control transfer feature Differential Revision: https://reviews.llvm.org/D76911	2020-03-30 09:53:09 +08:00
Craig Topper	d74533a18b	[X86] Add sse4.1 RUNs lines to the min/max reduction cost model tests. Mostly this matches the sse4.2 we already had command lines for. Except in the i64 case since sse4.1 doesn't have pcmpgtq.	2020-03-29 16:05:35 -07:00
Daan Sprenkels	24562c6588	[InstCombine] Add tests for trunc (extelt x); (NFC) Baseline tests for D76983 (PR45314) Differential Revision: https://reviews.llvm.org/D77024	2020-03-29 17:30:54 -04:00
Craig Topper	2451e4c597	[X86] Add sse4.2 command lines to min/max reduction tests. SSE4.2 has the pcmpgtq instruction which we will use in vXi64 reductions when its available.	2020-03-29 13:51:03 -07:00
David Green	7c1a6873aa	[ARM] VMOV.64 immediate tests. NFC	2020-03-29 21:08:43 +01:00
Simon Pilgrim	9c8ec99c80	[X86][AVX] Combine 128/256-bit lane shuffles with zeroable upper subvectors to EXTRACT_SUBVECTOR (PR40720) As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128/SHUF128, and we can use the implicit zeroing of the uppers.	2020-03-29 19:51:38 +01:00
Uday Bondhugula	c0955edfd6	Introduce support for lib function aligned_alloc in TLI / memory builtins Aligned_alloc is a standard lib function and has been in glibc since 2.16 and in the C11 standard. It has semantics similar to malloc/calloc for several analyses/transforms. This patch introduces aligned_alloc in target library info and memory builtins. Subsequent ones will make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062 This change will also be useful to LLVM generators that need to allocate buffers of vector elements larger than 16 bytes (for eg. 256-bit ones), element boundary alignment for which is not typically provided by glibc malloc. Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D76970	2020-03-29 23:36:24 +05:30
Matt Arsenault	0b68ca5162	AMDGPU: Add some additional tests for v_cvt_ubyte* formation Use functions now that we have them for less boilerplate in the output.	2020-03-29 14:03:07 -04:00
Sanjay Patel	febcb24f14	[InstCombine] make test independent of branch undef/UB; NFC	2020-03-29 13:32:47 -04:00
Simon Pilgrim	443dcc0e00	[X86][AVX] Add tests for 512-bit shuffle patterns that could reduce to subvector extractions	2020-03-29 18:27:18 +01:00
Simon Pilgrim	b44f07045c	Remove unnecessary empty comments from test check lines. NFC.	2020-03-29 18:27:18 +01:00
Simon Pilgrim	7734e4b3a3	[X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720) As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128, and we can use the implicit zeroing of the upper half. I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.	2020-03-29 16:41:59 +01:00
Simon Pilgrim	10439f9e32	[X86][AVX] Add X86ISD::VALIGN target shuffle decode support Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.	2020-03-29 16:41:58 +01:00
Simon Pilgrim	a7115d51be	[X86] X86CallFrameOptimization - generalize slow push code path Replace the explicit isAtom() \|\| isSLM() test with the more general (and more specific) slowTwoMemOps() check to avoid the use of the PUSHrmm push from memory case. This is actually very tricky to test in anything but quite complex code, but the atomic-idempotent.ll tests seem to be the most straightforward to use. Differential Revision: https://reviews.llvm.org/D76239	2020-03-29 11:01:59 +01:00
Richard Diamond	4bf015c035	[AlignmentFromAssumptions] Fix a SCEV assertion resulting from address space differences. Summary: On targets with different pointer sizes, -alignment-from-assumptions could attempt to create SCEV expressions which use different effective SCEV types. The provided test illustrates the issue. In `getNewAlignment`, AASCEV would be the (only) alloca, which would have an effective SCEV type of i32. But PtrSCEV, the GEP in this case, due to being in the flat/default address space, will have an effective SCEV of i64. This patch resolves the issue by truncating PtrSCEV to AASCEV's effective type. Reviewers: hfinkel, jdoerfert Reviewed By: jdoerfert Subscribers: jvesely, nhaehnle, hiraditya, javed.absar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75471	2020-03-29 01:26:31 -05:00
Craig Topper	c0aa97b632	[X86] Add cost model test cases for fmin/fmax reduction.	2020-03-28 17:12:49 -07:00
Fangrui Song	fc93787d7e	[MC][PowerPC] Make .reloc support arbitrary relocation types Generalizes `ad7199f3e6` (R_PPC_NONE/R_PPC64_NONE).	2020-03-28 17:04:31 -07:00
Yonghong Song	ced0d1f42b	[BPF] support 128bit int explicitly in layout spec Currently, bpf does not specify 128bit alignment in its layout spec. So for a structure like struct ipv6_key_t { unsigned pid; unsigned __int128 saddr; unsigned short lport; }; clang will generate IR type %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] } Additional padding is to ensure later IR->MIR can generate correct stack layout with target layout spec. But it is common practice for a tracing program to be first compiled with target flag (e.g., x86_64 or aarch64) through clang to generate IR and then go through llc to generate bpf byte code. Tracing program often refers to kernel internal data structures which needs to be compiled with non-bpf target. But such a compilation model may cause a problem on aarch64. The bcc issue https://github.com/iovisor/bcc/issues/2827 reported such a problem. For the above structure, since aarch64 has "i128:128" in its layout string, the generated IR will have %struct.ipv6_key_t = type { i32, i128, i16 } Since bpf does not have "i128:128" in its spec string, the selectionDAG assumes alignment 8 for i128 and computes the stack storage size for the above is 32 bytes, which leads incorrect code later. The x86_64 does not have this issue as it does not have "i128:128" in its layout spec as it does permits i128 to be alignmented at 8 bytes at stack. Its IR type looks like %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] } The fix here is add i128 support in layout spec, the same as aarch64. The only downside is we may have less optimal stack allocation in certain cases since we require 16byte alignment for i128 instead of 8. But this is probably fine as i128 is not used widely and in most cases users should already have proper alignment. Differential Revision: https://reviews.llvm.org/D76587	2020-03-28 11:46:29 -07:00
Reid Kleckner	e5bf5037d8	[CodeGen] Fix sinking local values in lpads with phis There was already a test case for landingpads to handle this case, but I had forgotten to consider PHI instructions preceding the EH_LABEL in the landingpad. PR45261	2020-03-28 11:10:33 -07:00
Nikita Popov	30d712103f	[InstCombine] Use replaceOperand() API in GEP transforms To make sure that replaced operands get DCEd. This drops one iteration from gepphigep.ll, which is still not optimal. This was the last test case performing more than 3 iterations. NFC-ish, only worklist order should change.	2020-03-28 19:07:25 +01:00
Nikita Popov	672e8bfbfc	[InstCombine] Fix worklist management in foldXorOfICmps() Because this code does not use the IC-aware replaceInstUsesWith() helper, we need to manually push users to the worklist. This is NFC-ish, in that it may only change worklist order.	2020-03-28 18:25:21 +01:00
Nikita Popov	337b671b0d	[InstCombine] Change limit-max-iterations test case; NFC This particular case will stop needing multiple iterations in a followup change.	2020-03-28 18:25:20 +01:00
Martin Storsjö	e6112a56dd	[AsmPrinter] Emit .weak directive for weak linkage on COFF for symbols without a comdat MC already knows how to emulate the .weak directive (with its ELF semantics; i.e., an undefined weak symbol resolves to 0, and a defined weak symbol has lower link precedence than a strong symbol of the same name) using COFF weak externals. Plumb this through the ASM printer too, so that definitions marked with __attribute__((weak)) at the language level (which gets translated to weak linkage at the IR level) have the corresponding .weak directive emitted. Note that declarations marked with __attribute__((weak)) at the language level (which translates to extern_weak at the IR level) already have .weak directives emitted. Weak/linkonce symbols without an associated comdat (in particular, ones generated with __attribute__((weak)) in C/C++) were earlier emitted as normal unique globals, as the comdat is required to provide the linkonce semantics. This change makes sure they are emitted as .weak instead, allowing other symbols to override them. Rename the existing coff-weak.ll test to coff-linkonce.ll. I'm not quite sure what that test covers, since the behavior being tested in it (the emission of a one_only section) is just a result of passing -function-sections to llc; the linkonce_odr makes no difference. Add a new coff-weak.ll which tests the new directive emission. Based on an previous patch by Shoaib Meenai. Differential Revision: https://reviews.llvm.org/D44543	2020-03-28 18:48:58 +02:00
Martin Storsjö	8330dcadb8	[llvm-rc] Allow -1 for menu item IDs This seems to be used in some resource files, e.g. `f3217573d7/include/wx/msw/wx.rc (L28)`. MSVC rc.exe and GNU windres both allow any value here, and silently just truncate to uint16_t range. This just explicitly allows the -1 value and errors out on others - the same was done for control IDs in dialogs in `c1a67857ba`. Differential Revision: https://reviews.llvm.org/D76951	2020-03-28 14:32:08 +02:00
Simon Pilgrim	8c1dbd5c1e	[X86][SSE] Add testnzc(~X,Y) -> testnzc(X,Y) test cases	2020-03-28 10:56:57 +00:00
Simon Pilgrim	d34d2ec28b	[X86][SSE] Add original PR38522 test case	2020-03-28 10:56:57 +00:00
Simon Pilgrim	8d85da5f5a	[X86][SSE] Add combine tests for PTEST/TESTPS/TESTPD instructions Including some test coverage for PR38522	2020-03-28 10:56:57 +00:00
Serge Pavlov	f398739152	[FEnv] Constfold some unary constrained operations This change implements constant folding to constrained versions of intrinsics, implementing rounding: floor, ceil, trunc, round, rint and nearbyint. Differential Revision: https://reviews.llvm.org/D72930	2020-03-28 12:28:33 +07:00
Jessica Paquette	98d05f88d5	[GlobalISel] Fix equality for copies from physregs in matchEqualDefs When we see this: ``` %a = COPY $physreg ... SOMETHING implicit-def $physreg ... %b = COPY $physreg ``` The two copies are not equivalent, and so we shouldn't perform any folding on them. When we have two instructions which use a physical register check that they define the same virtual register(s) as well. e.g., if we run into this case ``` %a = COPY $physreg ... %b = COPY %a ``` we can say that the two copies are the same, and can be folded. Differential Revision: https://reviews.llvm.org/D76890	2020-03-27 17:52:21 -07:00
Kamlesh Kumar	aabc24acf0	[RISCV] Support llvm.thread.pointer Fixes https://bugs.llvm.org/show_bug.cgi?id=45303 (clang crashed on __builtin_thread_pointer) Reviewed By: lenary, MaskRay, luismarques Differential Revision: https://reviews.llvm.org/D76828	2020-03-27 17:30:12 -07:00
Nemanja Ivanovic	4821411347	[DAGCombine] Fix splitting indexed loads in ForwardStoreValueToDirectLoad() In DAGCombiner::visitLOAD() we perform some checks before breaking up an indexed load. However, we don't do the same checking in ForwardStoreValueToDirectLoad() which can lead to failures later during combining (see: https://bugs.llvm.org/show_bug.cgi?id=45301). This patch just adds the same checks to this function as well. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45301 Differential revision: https://reviews.llvm.org/D76778	2020-03-27 18:03:47 -05:00
Florian Hahn	9ce198d6ed	[Darwin] Respect -fno-unroll-loops during LTO. Currently -fno-unroll-loops is ignored when doing LTO on Darwin. This patch adds a new -lto-no-unroll-loops option to the LTO code generator and forwards it to the linker if -fno-unroll-loops is passed. Reviewers: thegameg, steven_wu Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D76916	2020-03-27 22:19:03 +00:00
Sanjay Patel	0f56bbc1a5	[InstCombine] reduce FP-casted and bitcasted signbit check PR45305: https://bugs.llvm.org/show_bug.cgi?id=45305 Alive2 proofs: http://volta.cs.utah.edu:8080/z/bVyrko http://volta.cs.utah.edu:8080/z/Vxpz9q	2020-03-27 17:33:59 -04:00
Sanjay Patel	e72730ee3a	[InstCombine] add tests for FP cast+bitcast signbit checks; NFC PR45305: https://bugs.llvm.org/show_bug.cgi?id=45305	2020-03-27 17:25:25 -04:00
Matt Arsenault	a8cc9047de	CodeGen: Add -denormal-fp-math-f32 flag Make the set of FP related attributes and command flags closer.	2020-03-27 14:00:39 -07:00
Jay Foad	a6dfd827e5	[AMDGPU] Fix getEUsPerCU for gfx10 in CU mode Summary: "Per CU" is a bit simplistic for gfx10, but I couldn't think of a better name. Reviewers: arsenm, rampitec, nhaehnle, dstuttard, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76861	2020-03-27 20:36:49 +00:00
Fangrui Song	152d14da64	[MC][X86] Make .reloc support arbitrary relocation types Generalizes D62014 (R_386_NONE/R_X86_64_NONE). Unlike ARM (D76746) and AArch64 (D76754), we cannot delete FK_NONE from getFixupKindSize because FK_NONE is still used by R_386_TLS_DESC_CALL/R_X86_64_TLSDESC_CALL.	2020-03-27 13:33:15 -07:00
Matt Arsenault	0fd8030be3	Fix line endings in test	2020-03-27 16:26:06 -04:00
Matt Arsenault	348735b723	AMDGPU: Stop setting attributes based on TargetOptions Having arbitrary passes looking at the TargetOptions is pretty messy. This was also disregarding if a function already had an explicit attribute setting on it. opt/llc now add the attributes to functions that don't specify the attribute. clang and lld do not call the function to do this, which they maybe should. This was also treating unsafe-fp-math as implying the others, and setting the other attributes based on it. This is not done anywhere else, and I'm not sure is correct based on the current description of the option bit. Effectively reverts `1d8cf2be89`	2020-03-27 13:13:43 -07:00
Fangrui Song	34d77516b8	[MC][AArch64] Make .reloc support arbitrary relocation types Depends on D76746. Generalizes D61973. Differential Revision: https://reviews.llvm.org/D76754	2020-03-27 12:30:52 -07:00
Fangrui Song	c389526171	[MC][ARM] Make .reloc support arbitrary relocation types Generalizes D61992. In GNU as, the .reloc directive supports arbitrary relocation types. A MCFixupKind value `V` larger than or equal to FirstLiteralRelocationKind is used to represent the relocation type whose number is V-FirstLiteralRelocationKind. This is useful for linker tests. Without the feature the assembler cannot produce certain relocation records (e.g. R_ARM_ALU_PC_G0/R_ARM_LDR_PC_G0) This helps move forward D75349 and D76575. Differential Revision: https://reviews.llvm.org/D76746	2020-03-27 12:29:49 -07:00
Craig Topper	cdd1cd7120	[X86] Don't form masked instructions if the operation has an additional user. This will cause the operation to be repeated in both a mask and another masked or unmasked form. This can a wasted of execution resources. Differential Revision: https://reviews.llvm.org/D60940	2020-03-27 10:44:22 -07:00
Simon Pilgrim	763c87309d	[X86][SSE] Add some additional v8i16 'truncation' style shuffle tests	2020-03-27 17:29:29 +00:00
Dennis Felsing	aa0be69e74	Export Segment.IsGapRegion to JSON Summary: So that external tools can make use of that information and not display such lines as uncovered. Fixes https://bugs.llvm.org/show_bug.cgi?id=45300 Reviewers: vsk Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D76763	2020-03-27 18:05:01 +01:00
jasonliu	d60d7d69de	[llvm-objdump][XCOFF][AIX] Implement -r option Summary: Implement several XCOFF hooks to get '-r' option working for llvm-objdump -r. Reviewer: DiggerLin, hubert.reinterpretcast, jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D75131	2020-03-27 16:05:42 +00:00
Sam Parker	d7084fa34a	[ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros Given that some instructions generate wider result elements than their inputs, flag them as being able to generate non zeros in the false lanes. Differential Revision: https://reviews.llvm.org/D76766	2020-03-27 15:26:13 +00:00

1 2 3 4 5 ...

70214 Commits