llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	bc598f0d61	[X86][AsmParser] Don't consider %eip as a valid register outside of 32-bit mode. This might make the error message added in r335668 unneeded, but I'm not sure yet. The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising so be explicit. llvm-svn: 336217	2018-07-03 17:40:51 +00:00
Sanjay Patel	8307bc407b	[Constants] add identity constants for fadd/fmul As the test diffs show, the current users of getBinOpIdentity() are InstCombine and Reassociate. SLP vectorizer is a candidate for using this functionality too (D28907). The InstCombine shuffle improvements are part of the planned enhancements noted in D48830. InstCombine actually has several other uses of getBinOpIdentity() via SimplifyUsingDistributiveLaws(), but we don't call that for any FP ops. Fixing that might be another part of removing the custom reassociation in InstCombine that is only done for fadd+fmul. llvm-svn: 336215	2018-07-03 17:12:59 +00:00
Sanjay Patel	2c38b7fd8b	[Reassociate] add tests for binop with identity constant; NFC llvm-svn: 336214	2018-07-03 16:44:18 +00:00
Sanjay Patel	5b4a003088	[Reassociate] regenerate checks; NFC llvm-svn: 336211	2018-07-03 16:01:41 +00:00
Sander de Smalen	128fdfa23f	[AArch64][SVE] Asm: Support for FP Complex ADD/MLA. The variants added in this patch are: - Predicated Complex floating point ADD with rotate, e.g. fcadd z0.h, p0/m, z0.h, z1.h, #90 - Predicated Complex floating point MLA with rotate, e.g. fcmla z0.h, p0/m, z1.h, z2.h, #180 - Unpredicated Complex floating point MLA with rotate (indexed operand), e.g. fcmla z0.h, z1.h, z2.h[0], #180 Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48824 llvm-svn: 336210	2018-07-03 16:01:27 +00:00
Amara Emerson	d912ffaba5	[AArch64][GlobalISel] Fix fallbacks introduced in r336120 due to unselectable stores. r336120 resulted in falling back to SelectionDAG more often due to the G_STORE MMOs not matching the vreg size. This fixes that by explicitly any-extending the value. llvm-svn: 336209	2018-07-03 15:59:26 +00:00
Sanjay Patel	5a6ba018d7	[Reassociate] add test for missing FP constant analysis; NFC llvm-svn: 336208	2018-07-03 15:56:04 +00:00
Sander de Smalen	8cd1f53334	[AArch64][SVE] Asm: Support for FMUL (indexed) Unpredicated FP-multiply of SVE vector with a vector-element given by vector[index], for example: fmul z0.s, z1.s, z2.s[0] which performs an unpredicated FP-multiply of all 32-bit elements in 'z1' with the first element from 'z2'. This patch adds restricted register classes for SVE vectors: ZPR_3b (only z0..z7 are allowed) - for indexed vector of 16/32-bit elements. ZPR_4b (only z0..z15 are allowed) - for indexed vector of 64-bit elements. Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48823 llvm-svn: 336205	2018-07-03 15:31:04 +00:00
Sander de Smalen	cbd224941f	[AArch64][SVE] Asm: Support for predicated unary operations. The patch includes support for the following instructions: ABS z0.h, p0/m, z0.h; NEG z0.h, p0/m, z0.h; (S|U)XTB z0.h, p0/m, z0.h; (S|U)XTB z0.s, p0/m, z0.s; (S|U)XTB z0.d, p0/m, z0.d; (S|U)XTH z0.s, p0/m, z0.s; (S|U)XTH z0.d, p0/m, z0.d; (S|U)XTW z0.d, p0/m, z0.d llvm-svn: 336204	2018-07-03 14:57:48 +00:00
Simon Pilgrim	74cc4cfa94	[DAGCombiner] visitSDIV - Permit MIN_SIGNED_VALUE in pow2 vector codegen Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code llvm-svn: 336198	2018-07-03 14:11:32 +00:00
Sanjay Patel	3074b9e53f	[InstCombine] fold shuffle-with-binop and common value This is the last significant change suggested in PR37806: https://bugs.llvm.org/show_bug.cgi?id=37806#c5 ...though there are several follow-ups noted in the code comments in this patch to complete this transform. It's possible that a binop feeding a select-shuffle has been eliminated by earlier transforms (or the code was just written like this in the 1st place), so we'll fail to match the patterns that have 2 binops from: D48401, D48678, D48662, D48485. In that case, we can try to materialize identity constants for the remaining binop to fill in the "ghost" lanes of the vector (where we just want to pass through the original values of the source operand). I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups. For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor). Differential Revision: https://reviews.llvm.org/D48830 llvm-svn: 336196	2018-07-03 13:44:22 +00:00
Sjoerd Meijer	173b7f0ec7	[AArch64] Armv8.4-A: system registers This adds the following system registers: - RAS registers, - MPAM registers, - Activity monitor registers, - Trace Extension registers, - Timing insensitivity of data processing instructions, - Enhanced Support for Nested Virtualization. Differential Revision: https://reviews.llvm.org/D48871 llvm-svn: 336193	2018-07-03 12:09:20 +00:00
Bjorn Pettersson	8dd6cf711f	[DebugInfo] Corrections for salvageDebugInfo Summary: When salvaging a dbg.declare/dbg.addr we should not add DW_OP_stack_value to the DIExpression (see test/Transforms/InstCombine/salvage-dbg-declare.ll). Consider this example %vla = alloca i32, i64 2 call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression()) Instcombine will turn it into %vla1 = alloca [2 x i32] %vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla, i64 0, i64 0 call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression()) If the GEP can be eliminated, then the dbg.declare will be salvaged and we should get %vla1 = alloca [2 x i32] call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression()) The problem was that salvageDebugInfo did not recognize dbg.declare as being indirect (%vla1 points to the value, it does not hold the value), so we incorrectly got call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value)) I also made sure that llvm::salvageDebugInfo and DIExpression::prependOpcodes do not add DW_OP_stack_value to the DIExpression in case no new operands are added to the DIExpression. That way we avoid unnecessarily turning a register location expression into an implicit location expression in some situations (see test11 in test/Transforms/LICM/sinking.ll). Reviewers: aprantl, vsk Reviewed By: aprantl, vsk Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D48837 llvm-svn: 336191	2018-07-03 11:29:00 +00:00
Benjamin Kramer	fd171f2f89	Revert "[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values" This reverts commit r336113. It causes crashes. llvm-svn: 336189	2018-07-03 11:15:17 +00:00
Sander de Smalen	7fc8543208	[AArch64][SVE] Asm: Support for saturating ADD/SUB instructions. The variants added are: signed Saturating ADD/SUB (immediate) e.g. sqadd z0.h, z0.h, #42 unsigned Saturating ADD/SUB (immediate) e.g. uqadd z0.h, z0.h, #42 signed Saturating ADD/SUB (vectors) e.g. sqadd z0.h, z0.h, z1.h unsigned Saturating ADD/SUB (vectors) e.g. uqadd z0.h, z0.h, z1.h llvm-svn: 336186	2018-07-03 09:48:22 +00:00
Petar Jovanovic	226e6117ae	[MIPS GlobalISel] Lower arguments using stack Lower more than 4 arguments using stack. This patch targets MIPS32. It supports only functions with arguments of type i32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D47934 llvm-svn: 336185	2018-07-03 09:31:48 +00:00
Chandler Carruth	3897ded691	[PM/LoopUnswitch] Fix PR37651 by correctly invalidating SCEV when unswitching loops. Original patch trying to address this was sent in D47624, but that didn't quite handle things correctly. There are two key principles used to select whether and how to invalidate SCEV-cached information about loops: 1) We must invalidate any info SCEV has cached before unswitching as we may change (or destroy) the loop structure by the act of unswitching, and make it hard to recover everything we want to invalidate within SCEV. 2) We need to invalidate all of the loops whose CFGs are mutated by the unswitching. Notably, this isn't the entire loop nest, this is every loop contained by the outermost loop reached by an exit block relevant to the unswitch. And we need to do this even when doing trivial unswitching. I've added more focused tests that directly check that SCEV starts off with imprecise information and after unswitching (and simplifying instructions) re-querying SCEV will produce precise information. These tests also specifically work to check that an outer loop's information becomes precise. However, the testing here is still a bit imperfect. Crafting test cases that reliably fail to be analyzed by SCEV before unswitching and succeed afterward proved ... very, very hard. It took me several hours and careful work to build these, and I'm not optimistic about necessarily coming up with more to cover more elaborate possibilities. Fortunately, the code pattern we are testing here in the pass is really straightforward and reliable. Thanks to Max Kazantsev for the initial work on this as well as the review, and to Hal Finkel for helping me talk through approaches to test this stuff even if it didn't come to much. Differential Revision: https://reviews.llvm.org/D47624 llvm-svn: 336183	2018-07-03 09:13:27 +00:00
Sander de Smalen	8fcc3f5feb	[AArch64][SVE] Asm: Support for vector element FP compare. Contains the following variants: - Compare with (elements from) other vector; instructions: fcmeq, fcmgt, fcmge, fcmne, fcmuo. aliases: fcmle, fcmlt. e.g. fcmle p0.h, p0/z, z0.h, z1.h => fcmge p0.h, p0/z, z1.h, z0.h - Compare absolute values with (absolute values from) other vector; instructions: facge, facgt. aliases: facle, faclt. e.g. facle p0.h, p0/z, z0.h, z1.h => facge p0.h, p0/z, z1.h, z0.h - Compare vector elements with #0.0; instructions: fcmeq, fcmgt, fcmge, fcmle, fcmlt, fcmne. e.g. fcmle p0.h, p0/z, z0.h, #0.0 llvm-svn: 336182	2018-07-03 09:07:23 +00:00
Shiva Chen	a0a52bf195	[DebugInfo] Fix PR37395. DbgLabelInst has no address as its operands. Differential Revision: https://reviews.llvm.org/D46738 Patch by Hsiangkai Wang. llvm-svn: 336176	2018-07-03 07:56:04 +00:00
Max Kazantsev	3097b76e8c	[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done This patch changes the order of transforms in InstCombineCompares to avoid performing range-based transforms that produce complex bit arithmetic before simpler things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri llvm-svn: 336172	2018-07-03 06:23:57 +00:00
Craig Topper	6121699b11	[X86] Add avx512vl command line to break-false-dep.ll llvm-svn: 336169	2018-07-03 04:43:49 +00:00
Teresa Johnson	f8182f1aef	[ThinLTO] Fix printing of aliases for distributed backend indexes Summary: When we import an alias (which will import a copy of the aliasee), but aren't going to import the aliasee directly, the distributed backend index will not contain the aliasee summary. Handle this in the summary assembly printer by printing "null" as the aliasee. Reviewers: davidxl, dexonsmith Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits Differential Revision: https://reviews.llvm.org/D48699 llvm-svn: 336160	2018-07-03 01:11:43 +00:00
Teresa Johnson	50615c72b4	Remove absolute path in test My test change in r336148 accidentally included an absolute path, clean that up to fix bot failures. llvm-svn: 336151	2018-07-02 23:02:07 +00:00
Teresa Johnson	8fc766681d	[ThinLTO] Fix printing of module paths for distributed backend indexes Summary: In the individual index files emitted for distributed ThinLTO backends, the module path ids are not contiguous. Assign slots to module paths in order to handle this better and also to get contiguous numbering in the summary assembly. Reviewers: davidxl, dexonsmith Subscribers: mehdi_amini, inglorion, eraman, llvm-commits, steven_wu Differential Revision: https://reviews.llvm.org/D48698 llvm-svn: 336148	2018-07-02 22:09:23 +00:00
Heejin Ahn	402b490843	[WebAssembly] Support for atomic stores Summary: Add support for atomic store instructions. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48839 llvm-svn: 336145	2018-07-02 21:22:59 +00:00
Vadzim Dambrouski	fd10286e04	[ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m. Reviewers: efriedma, rogfer01, javed.absar Reviewed By: efriedma, rogfer01 Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D48846 llvm-svn: 336144	2018-07-02 21:05:26 +00:00
Tim Shen	c7cef4bcc4	[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428). Summary: Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change SCEV is able to prove that the loop doesn't wrap-self (due to zext i16 to i64), disabling the entire loop versioning pass. Removed the zext and just use i64. Reviewers: sanjoy Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D48409 llvm-svn: 336140	2018-07-02 20:01:54 +00:00
Dan Gohman	b01d87622b	[WebAssembly] Fix fast-isel optimization of branch conditions. LLVM doesn't guarantee anything about the high bits of a register holding an i1 value at the IR level, so don't translate LLVM IR i1 values directly into WebAssembly conditional branch operands. WebAssembly's conditional branches do demand all 32 bits be valid. Fixes PR38019. llvm-svn: 336138	2018-07-02 19:45:57 +00:00
Krzysztof Parzyszek	fd97494984	[X86] Add phony registers for high halves of regs with low halves Add registers still missing after r328016 (D43353): - for bits 15-8 of SI, DI, BP, SP (H), and R8-R15 (BH), - for bits 31-16 of R8-R15 (*WH). Thanks to Craig Topper for pointing it out. llvm-svn: 336134	2018-07-02 19:05:09 +00:00
Fangrui Song	f50ad6c311	Replace unused output filenames with /dev/null in tests Similar to rLLD336129 llvm-svn: 336131	2018-07-02 18:16:44 +00:00
Farhana Aleen	3b416db19b	[SLP] Recognize min/max pattern using instructions producing same values. Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization. %1 = extractelement <2 x i32> %a, i32 0 %2 = extractelement <2 x i32> %a, i32 1 %cond = icmp sgt i32 %1, %2 %3 = extractelement <2 x i32> %a, i32 0 %4 = extractelement <2 x i32> %a, i32 1 %select = select i1 %cond, i32 %3, i32 %4 Author: FarhanaAleen Reviewed By: ABataev, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D47608 llvm-svn: 336130	2018-07-02 17:55:31 +00:00
Sanjay Patel	b999d74132	[InstCombine] reverse canonicalization of add --> or to allow more shuffle folding This extends D48485 to allow another pair of binops (add/or) to be combined either with or without a leading shuffle: or X, C --> add X, C (when X and C have no common bits set) Here, we need value tracking to determine that the 'or' can be reversed into an 'add', and we've added general infrastructure to allow extending to other opcodes or moving to where other passes could use that functionality. Differential Revision: https://reviews.llvm.org/D48662 llvm-svn: 336128	2018-07-02 17:42:29 +00:00
Francis Visoiu Mistrih	4d5b1073ba	[MC] Error on a .zerofill directive in a non-virtual section On Darwin, all virtual sections have zerofill type, and having a .zerofill directive in a non-virtual section is not allowed. Instead of asserting, show a nicer error. In order to use the equivalent of .zerofill in a non-virtual section, the usage of .zero or .space is required. This patch replaces the assert with an error. Differential Revision: https://reviews.llvm.org/D48517 llvm-svn: 336127	2018-07-02 17:29:43 +00:00
Dave Lee	d4f77a523b	nm: Add -no-weak flag for hiding weak symbols Summary: This adds a new -no-weak flag to nm to hide weak symbols in its output. This also adds a -W alias for this which is analogous to -U. Patch by Keith Smiley Reviewers: kastiglione, enderby, compnerd Reviewed By: kastiglione Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48751 llvm-svn: 336126	2018-07-02 17:24:37 +00:00
Simon Pilgrim	35f196c179	[SLPVectorizer][X86] Begin adding alternate tests for call operators Alternate opcode handling only supports binary operators, these tests demonstrate a missed opportunity to vectorize ceil/floor calls llvm-svn: 336125	2018-07-02 17:23:45 +00:00
Vedant Kumar	9b6c096fb5	Tighten up a test for -check-debugify, NFC Use an -implicit-check-not to make sure an error which should not occur in fact does not occur before the first CHECK line. Suggested by Paul Robinson in post-commit feedback for r335897. llvm-svn: 336123	2018-07-02 17:08:36 +00:00
Simon Pilgrim	ac193d4b5c	[CostModel][X86] Add cost tests for fp rounding intrinsics Add cost tests for fp ceil, floor, nearbyint, rint and trunc. llvm-svn: 336122	2018-07-02 17:07:01 +00:00
Craig Topper	56440b9745	[X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned. Similarly, don't fold fp128 loads into SSE instructions if the load isn't aligned, unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions. Should fix PR38001 llvm-svn: 336121	2018-07-02 17:01:54 +00:00
Amara Emerson	846f2436e8	[AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin. We currently don't any-extend vararg parameters before storing them to the stack locations on Darwin. However, SelectionDAG does this, and so there is user code in the wild which inadvertently relies on this extension. This can manifest in cases where the value stored is (int)0, but the actual parameter is interpreted by va_arg as a pointer, and so not extending to 64 bits causes the callee to load additional undefined bits. llvm-svn: 336120	2018-07-02 16:39:09 +00:00
Sam Clegg	7fecdef5b2	[WebAssembly] Convert remaining tests from elf to wasm output format Differential Revision: https://reviews.llvm.org/D48748 llvm-svn: 336116	2018-07-02 16:03:49 +00:00
Simon Pilgrim	2bc8e079f2	[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case. llvm-svn: 336113	2018-07-02 15:14:07 +00:00
Simon Pilgrim	a6be2437e7	[X86][SSE] Add v8i16 shift test for 2 shift values that doesn't match basic blend We have special case support for 2 shift values for basic blends, but irregular shift patterns end up using the generic lowering, despite shuffle lowering being good enough to handle more complex blends. llvm-svn: 336112	2018-07-02 14:53:41 +00:00
Sanjay Patel	284ba0c18f	[ValueTracking] allow undef elements when matching vector abs llvm-svn: 336111	2018-07-02 14:43:40 +00:00
Yaron Keren	d414c6c131	Disable failing test on x86_64-pc-windows-gnu, see PR38006. llvm-svn: 336110	2018-07-02 14:39:32 +00:00
Alex Bradbury	07ef10ccb6	[X86] Fix test/MC/AsmParser/exprs-invalid.s after rL336104 This was my mistake for only running test/MC/X86 and test/CodeGen/X86. Arguably .word should be removed from this test, as it is not supported universally. llvm-svn: 336107	2018-07-02 14:13:27 +00:00
Sanjay Patel	951f617e16	[InstCombine] adjust shuffle tests with IR flags; NFC Due to current limitations in constant analysis, we need flags on add or mul to show propagation for the potential transform suggested in these tests (no other binops currently report identity constants). llvm-svn: 336101	2018-07-02 13:40:54 +00:00
Florian Hahn	4ebba909a2	Recommit r328307: [IPSCCP] Use constant range information for comparisons of parameters. This version contains a fix to add values whose state in ParamState changed to the worklist if their state in ValueState did not change. To avoid adding the same value multiple times, mergeInValue returns true if it added the value to the worklist. The value is added to the worklist depending on its state in ValueState. Original message: For comparisons with parameters, we can use the ParamState lattice elements which also provide constant range information. This improves the code for PR33253 further and gets us closer to using ValueLatticeElement for all values. Also, as we are using the range information in the solver directly, we do not need tryToReplaceWithConstantRange afterwards anymore. Reviewers: dberlin, mssimpso, davide, efriedma Reviewed By: mssimpso Differential Revision: https://reviews.llvm.org/D43762 llvm-svn: 336098	2018-07-02 12:44:04 +00:00
Sanjay Patel	d980084597	[InstCombine] add tests for shuffle-binop; NFC This is another pattern mentioned in PR37806. llvm-svn: 336096	2018-07-02 12:30:46 +00:00
Simon Pilgrim	265793d52a	[SLPVectorizer] Fix alternate opcode + shuffle cost function to correctly handle SK_Select patterns. We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case. This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now... llvm-svn: 336095	2018-07-02 11:28:01 +00:00
Sander de Smalen	8d4c01a702	[AArch64][SVE] Asm: Support for (SQ)INCP/DECP (scalar, vector) Increments/decrements the result with the number of active bits from the predicate. The inc/dec variants added are: - incp x0, p0.h (scalar) - incp z0.h, p0 (vector) The unsigned saturating inc/dec variants added are: - uqincp x0, p0.h (scalar) - uqincp w0, p0.h (scalar, 32bit) - uqincp z0.h, p0 (vector) The signed saturating inc/dec variants added are: - sqincp x0, p0.h (scalar) - sqincp x0, p0.h, w0 (scalar, 32bit) - sqincp z0.h, p0 (vector) llvm-svn: 336091	2018-07-02 10:08:36 +00:00
Sander de Smalen	c504101781	[AArch64][SVE] Asm: Support for (saturating) vector INC/DEC instructions. Increment/decrement vector by multiple of predicate constraint element count. The variants added by this patch are: - INCH, INCW, INCD and (saturating): - SQINCH, SQINCW, SQINCD - UQINCH, UQINCW, UQINCD - SQDECH, SQDECW, SQDECD - UQDECH, UQDECW, UQDECD For example: incw z0.s, all, mul #4 llvm-svn: 336090	2018-07-02 09:31:11 +00:00
Petar Jovanovic	3af2c992dc	[Mips][FastISel] Do not duplicate condition while lowering branches This change fixes the issue that arises when we duplicate condition from the predecessor block. If the condition's arguments are not considered alive across the blocks, fast regalloc gets confused and starts generating reloads from the slots that have never been spilled to. This change also leads to smaller code given that, unlike on architectures with condition codes, on Mips we can branch directly on register value, thus we gain nothing by duplication. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D48642 llvm-svn: 336084	2018-07-02 08:56:57 +00:00
Sander de Smalen	8eea4f1c7d	[AArch64][SVE] Asm: Support for vector element compares (immediate). Compare vector elements with a signed/unsigned immediate, e.g. cmpgt p0.s, p0/z, z0.s, #-16 cmphi p0.s, p0/z, z0.s, #127 llvm-svn: 336081	2018-07-02 08:20:59 +00:00
Sander de Smalen	0325e304b9	Reapply r334980 and r334983. These patches were previously reverted as they led to buildbot time-outs caused by a large switch statement in printAliasInstr when using UBSan and O3. The issue has been addressed with a workaround (r335525). llvm-svn: 336079	2018-07-02 07:34:52 +00:00
Max Kazantsev	66da390506	[NFC] Test that shows unprofitability of instcombine with bit ranges llvm-svn: 336078	2018-07-02 06:55:00 +00:00
QingShan Zhang	3b2aa2b4b4	[PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's multiple for i64 pre-inc load/store For the below case, pre-inc prep thinks it's a good candidate to use pre-inc for the bucket, but the 64-bit integer load/store update (pre-inc) instructions on Power require the displacement field to be DS-form (a multiple of 4). Since it can't satisfy the constraint, we have to do some fix-ups later. In the case below, the original loads/stores could be well-formed, so this makes things worse. unsigned long long result = 0; unsigned long long foo(char *p, unsigned long long n) { for (unsigned long long i = 0; i < n; i++) { unsigned long long x1 = *(unsigned long long *)(p - 50000 + i); unsigned long long x2 = *(unsigned long long *)(p - 61024 + i); unsigned long long x3 = *(unsigned long long *)(p - 62048 + i); unsigned long long x4 = *(unsigned long long *)(p - 64096 + i); result = x1 * x2 * x3 * x4; } return result; } Patch by jedilyn(Kewen Lin). Differential Revision: https://reviews.llvm.org/D48813 llvm-svn: 336074	2018-07-02 05:46:09 +00:00
Piotr Padlewski	5b3db45e8f	Implement strip.invariant.group Summary: This patch introduces a new intrinsic, strip.invariant.group, that was described in the RFC: Devirtualization v2 Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073	2018-07-02 04:49:30 +00:00
Eric Christopher	53054141a7	Add an entry for rodata constant merge sections to the default section flags in the ELF assembler. This matches the defaults given in the rest of MC. Fixes PR37997 where we couldn't assemble our own assembly output without warnings. llvm-svn: 336072	2018-07-02 00:16:39 +00:00
Craig Topper	df99cdb95b	[X86] Fix a few test names in avx512-intrinsics-fast-isel.ll to match their clang intrinsic names. I thought I fixed these yesterday, but I guess I missed a few. llvm-svn: 336071	2018-07-01 23:49:06 +00:00
Sanjay Patel	279a1a39ad	[InstCombine] add abs tests with undef elts; NFC llvm-svn: 336065	2018-07-01 17:14:37 +00:00
Sanjay Patel	a9fdb9fd37	[PatternMatch] allow undef elements in vectors with m_Neg This is similar to the m_Not change from D44076. llvm-svn: 336064	2018-07-01 13:42:57 +00:00
David Green	963401d2be	[UnrollAndJam] New Unroll and Jam pass This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder Loop So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062	2018-07-01 12:47:30 +00:00
Paul Semel	8dabda70af	Revert "[llvm-readobj] Fix printing format" There is a problem with the formatting on the Windows build. I need to investigate this. llvm-svn: 336061	2018-07-01 11:54:09 +00:00
Simon Pilgrim	84f77ecba9	[SLPVectorizer][X86] Add some alternate tests for cast operators Alternate opcode handling only supports binary operators, these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations llvm-svn: 336060	2018-07-01 11:29:46 +00:00
Eugene Leviant	6e4134459b	[Evaluator] Improve evaluation of call instruction Recommit of r335324 after buildbot failure fix llvm-svn: 336059	2018-07-01 11:02:07 +00:00
Paul Semel	49997adc88	[llvm-readobj] Fix printing format We were printing every character, even those that weren't printable. It doesn't really make sense for this option. The string content was stuck to its address, so two spaces were added in between. Differential Revision: https://reviews.llvm.org/D48271 llvm-svn: 336058	2018-07-01 09:51:59 +00:00
Sanjay Patel	16a42ca274	[InstCombine] add tests for negate vector with undef elts; NFC llvm-svn: 336050	2018-06-30 14:11:46 +00:00
Simon Pilgrim	fae337704e	[DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119) The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1; -1 is the only power of two value for which the sdiv expansion recipe breaks. Thanks to @zvi for the original patch. Differential Revision: https://reviews.llvm.org/D45806 llvm-svn: 336048	2018-06-30 12:22:55 +00:00
Craig Topper	50a10ba6e0	[X86] Update some avx512 fast-isel tests to match their real clang IRgen. Especially of note was the test_mm_mask_set1_epi64 and other set1 tests that were truncating the element to be broadcasted to i8 and broadcasting that instead of a whole 64 bit value. Some of the others were just correcting mask sizes on parameters due to bugs in the clang test case they were generated from that have now been fixed. Some were converting i8 to <4 x i1>/<2 x i1> by truncating to i4/i2 and then bitcasting. But the clang codegen is bitcast to <8 x i1>, then extract to <4 x i1>/<2 x i1>. This is likely to incur less trouble from the integer type legalizer in the backend. llvm-svn: 336045	2018-06-30 07:25:29 +00:00
Craig Topper	db1d7f2b16	[X86] Change some check-prefixes from X32 to X86 to match the FileCheck command line. I think this test changed and these test cases were created around the same time and missed the change. llvm-svn: 336044	2018-06-30 06:45:10 +00:00
Craig Topper	8f6ace5bcd	[X86] Remove test cases from avx512vl-intrinsics-fast-isel.ll for intrinsics that don't really exist in clang. llvm-svn: 336043	2018-06-30 06:45:09 +00:00
Tom Stellard	eebbfc2809	AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal. Summary: We could split sizes that are not power of two into smaller sized G_IMPLICIT_DEF instructions, but this ends up generating G_MERGE_VALUES instructions which we then have to handle in the instruction selector. Since G_IMPLICIT_DEF is really a no-op it's easier just to keep everything that can fit into a register legal. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48777 llvm-svn: 336041	2018-06-30 04:09:44 +00:00
Jessica Paquette	8bda1881ca	[MachineOutliner] Add support for target-default outlining. This adds functionality to the outliner that allows targets to specify certain functions that should be outlined from by default. If a target supports default outlining, then it specifies that in its TargetOptions. In the case that it does, and the user hasn't specified that they never want to outline, the outliner will be added to the pass pipeline and will run on those default functions. This is a preliminary patch for turning the outliner on by default under -Oz for AArch64. https://reviews.llvm.org/D48776 llvm-svn: 336040	2018-06-30 03:56:03 +00:00
Craig Topper	59f2f38fe0	[X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead. llvm-svn: 336035	2018-06-30 01:32:04 +00:00
Chandler Carruth	7c557f804d	[instsimplify] Move the instsimplify pass to use more obvious file names and directory. Also cleans up all the associated naming to be consistent and removes the public access to the pass ID which was unused in LLVM. Also runs clang-format over parts that changed, which generally cleans up a bunch of formatting. This is in preparation for doing some internal cleanups to the pass. Differential Revision: https://reviews.llvm.org/D47352 llvm-svn: 336028	2018-06-29 23:36:03 +00:00
Heejin Ahn	5cc0e25324	[WebAssembly] Update comments for non-splat pow2 vector test case Summary: After rL335727, (sdiv X, 1) is treated as a special case, so we can safely transform 'sdiv's in non-splat pow2 vectors into 'shr's even when some of their entries are '1'. The test expectations have already been fixed in rL335771, but the comments were out of date. Also changed the filename from `vector_sdiv.ll` to `vector-sdiv.ll` to be consistent with other test file names. Reviewers: RKSimon Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D48692 llvm-svn: 336018	2018-06-29 21:27:20 +00:00
Alex Shlyapnikov	788764ca12	[HWASan] Do not retag allocas before return from the function. Summary: Retagging allocas before returning from the function might help detect use-after-return bugs, but it does not work at all in real life, when instrumented and non-instrumented code is intermixed. Consider the following code: F_non_instrumented() { T x; F1_instrumented(&x); ... } { F_instrumented(); F_non_instrumented(); } - F_instrumented call leaves the stack below the current sp tagged randomly for UAR detection - F_non_instrumented allocates its own vars on that tagged stack, not generating any tags, that is the address of x has tag 0, but the shadow memory still contains tags left behind by F_instrumented on the previous step - F1_instrumented verifies &x before using it and traps on tag mismatch, 0 vs whatever tag was set by F_instrumented Reviewers: eugenis Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D48664 llvm-svn: 336011	2018-06-29 20:20:17 +00:00
Sean Fertile	cd0d7634f6	Revert "Extend CFGPrinter and CallPrinter with Heat Colors" This reverts r335996 which broke graph printing in Polly. llvm-svn: 336000	2018-06-29 17:48:58 +00:00
Matt Arsenault	f5be3ad7f8	AMDGPU: Don't use struct type for argument layout This was introducing unnecessary padding after the explicit arguments, depending on the alignment of the total struct type. Also has the side effect of avoiding creating an extra GEP for the offset from the base kernel argument to the explicit kernel argument offset. llvm-svn: 335999	2018-06-29 17:31:42 +00:00
Craig Topper	87b107dd69	[X86] Limit the number of target specific nodes emitted in LowerShiftParts The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And it's just better to limit the number of places we emit target specific nodes. The changed test cases still aren't optimal. Differential Revision: https://reviews.llvm.org/D48619 llvm-svn: 335998	2018-06-29 17:24:07 +00:00
Sean Fertile	3b0535b424	Extend CFGPrinter and CallPrinter with Heat Colors Extends the CFGPrinter and CallPrinter with heat colors based on heuristics or profiling information. The colors are enabled by default and can be toggled on/off for CFGPrinter by using the option -cfg-heat-colors for both -dot-cfg[-only] and -view-cfg[-only]. Similarly, the colors can be toggled on/off for CallPrinter by using the option -callgraph-heat-colors for both -dot-callgraph and -view-callgraph. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D40425 llvm-svn: 335996	2018-06-29 17:13:58 +00:00
Jonas Devlieghere	a0857eaefe	[dsymutil] Make the CachedBinaryHolder the default Replaces all uses of the old binary holder with its cached variant. Differential revision: https://reviews.llvm.org/D48770 llvm-svn: 335991	2018-06-29 16:51:52 +00:00
Petar Jovanovic	cccc236a96	[mips] Support shrink-wrapping Except for -O0, it's enabled by default. Patch by Vladimir Stefanovic. Differential Revision: https://reviews.llvm.org/D47947 llvm-svn: 335989	2018-06-29 16:37:16 +00:00
Stanislav Mekhanoshin	20d4795d93	[AMDGPU] Enable LICM in the BE pipeline This allows hoisting the code that computes the reciprocal of a loop-invariant denominator in integer division after the codegen prepare expansion. Differential Revision: https://reviews.llvm.org/D48604 llvm-svn: 335988	2018-06-29 16:26:53 +00:00
Jessica Paquette	79917b9686	[MachineOutliner] Add always and never options to -enable-machine-outliner This is a recommit of r335887, which was erroneously committed earlier. To enable the MachineOutliner by default on AArch64, we need to be able to disable the MachineOutliner and also provide an option to "always" enable the outliner. This adds that capability. It allows the user to still use the old -enable-machine-outliner option, which defaults to "always". This is building up to allowing the user to specify "always" versus the target default outlining behaviour. https://reviews.llvm.org/D48682 llvm-svn: 335986	2018-06-29 16:12:45 +00:00
Sanjay Patel	4491c0dd45	[InstCombine] add more tests for shuffle-binop folds; NFC The mul+shl tests add coverage for the fold enabled with D48678. The and+or tests are not handled yet; that's D48662. llvm-svn: 335984	2018-06-29 15:28:11 +00:00
Alexey Bataev	2a03d4296a	[DEBUG_INFO, NVPTX] Do not emit .debug_loc section. Summary: .debug_loc section is not supported for NVPTX target. If there is an object whose location can change during its lifetime, we do not generate debug location info for this variable. Reviewers: echristo Subscribers: jholewinski, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D48730 llvm-svn: 335976	2018-06-29 14:23:28 +00:00
Sanjay Patel	da66753e01	[InstCombine] enhance shuffle-of-binops to allow different variable ops (PR37806) This was discussed in D48401 as another improvement for: https://bugs.llvm.org/show_bug.cgi?id=37806 If we have 2 different variable values, then we shuffle (select) those lanes, shuffle (select) the constants, and then perform the binop. This eliminates a binop. The new shuffle uses the same shuffle mask as the existing shuffle, so there's no danger of creating a difficult shuffle. All of the earlier constraints still apply, but we also check for extra uses to avoid creating more instructions than we'll remove. Additionally, we're disallowing the fold for div/rem because that could expose a UB hole. Differential Revision: https://reviews.llvm.org/D48678 llvm-svn: 335974	2018-06-29 13:44:06 +00:00
Roman Shirokiy	272eac85c7	Fix overconfident assert in ScalarEvolution::isImpliedViaMerge We can have AddRec with loops having many predecessors. This changes an assert to an early return. Differential Revision: https://reviews.llvm.org/D48766 llvm-svn: 335965	2018-06-29 11:46:30 +00:00
Sjoerd Meijer	3b599d75d5	[AArch64] Armv8.4-A: Virtualization system registers This adds the Secure EL2 extension. Differential Revision: https://reviews.llvm.org/D48711 llvm-svn: 335962	2018-06-29 11:03:15 +00:00
Simon Pilgrim	aab8660e23	[X86][SSE] Support v16i8/v32i8 vector rotations This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op. Unfortunately I haven't found a decent way to share much of this code with the shift equivalent. Differential Revision: https://reviews.llvm.org/D48655 llvm-svn: 335957	2018-06-29 09:36:39 +00:00
Roman Lebedev	8d081b78e4	SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction Summary: An alternative to D48597. Fixes PR37936 (https://bugs.llvm.org/show_bug.cgi?id=37936). The problem is as follows: 1. `indvars` marks `%dec` as `NUW`. 2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908). 3. `loop-reduce` tries to do some further modification, but crashes with a type assertion in cast, because `%dec` is no longer an `Instruction`. If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`, store that into a file, and then run `-loop-reduce`, there is no crash. So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV. But in this case we can just not crash if it's not an `Instruction`. This is just a local fix, unlike D48597, so there may very well be other problems. Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi Reviewed By: mkazantsev Subscribers: evstupac, javed.absar, spatel, llvm-commits Differential Revision: https://reviews.llvm.org/D48599 llvm-svn: 335950	2018-06-29 07:44:20 +00:00
Craig Topper	875e9f8fa4	[X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead. While there improve the coverage of the intrinsic testing and add fast-isel tests. llvm-svn: 335944	2018-06-29 05:43:26 +00:00
Sterling Augustine	0cf1f15e83	Require x86 for this test. llvm-svn: 335939	2018-06-28 23:22:14 +00:00
Jessica Paquette	0c5d3ffbb8	[MachineOutliner] Never add the outliner in -O0 This is a recommit of r335879. We shouldn't add the outliner when compiling at -O0 even if -enable-machine-outliner is passed in. This makes sure that we don't add it in this case. This also removes -O0 from the outliner DWARF test. llvm-svn: 335930	2018-06-28 21:49:24 +00:00
Sanjay Patel	019d3cd3b4	[InstCombine] adjust shuffle tests; NFC Use xor for the extra uses test because div/rem have other problems. llvm-svn: 335924	2018-06-28 21:14:02 +00:00
Jake Ehrlich	0f440d832f	[llvm-readobj] Add experimental support for SHT_RELR sections This change adds experimental support for SHT_RELR sections, proposed here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg Definitions for the new ELF section type and dynamic array tags, as well as the encoding used in the new section are all under discussion and are subject to change. Use with caution! Author: rahulchaudhry Differential Revision: https://reviews.llvm.org/D47919 llvm-svn: 335922	2018-06-28 21:07:34 +00:00
Martin Storsjo	2a9bd7b756	[COFF] Fix constant sharing regression for MinGW This fixes a regression since SVN r334523, where the object files built targeting MinGW were rejected by GNU binutils tools. Prior to that commit, we only put constants in comdat for MSVC configurations. Differential Revision: https://reviews.llvm.org/D48567 llvm-svn: 335918	2018-06-28 20:28:29 +00:00
Teresa Johnson	e87868b7e9	[ThinLTO] Port InlinerFunctionImportStats handling to new PM Summary: The InlinerFunctionImportStats will collect and dump stats regarding how many functions inlined into the module were imported by ThinLTO. Reviewers: wmi, dexonsmith Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D48729 llvm-svn: 335914	2018-06-28 20:07:47 +00:00
Eli Friedman	6613efbd4e	[ARM] Add missing Thumb2 assembler diagnostics. Mostly just adding checks for Thumb2 instructions which correspond to ARM instructions which already had diagnostics. While I'm here, also fix ARM-mode strd to check the input registers correctly. Differential Revision: https://reviews.llvm.org/D48610 llvm-svn: 335909	2018-06-28 19:53:12 +00:00
Sterling Augustine	052ce120d5	Some targets don't have lld built, so just use a binary copy of the input file. llvm-svn: 335908	2018-06-28 19:47:23 +00:00
Anastasis Grammenos	425df22ee3	[SROA] Preserve DebugLoc when rewriting alloca partitions When rewriting an alloca partition, copy the DL from the old alloca over to the new one. Differential Revision: https://reviews.llvm.org/D48640 llvm-svn: 335904	2018-06-28 18:58:30 +00:00
Sterling Augustine	bc78b62169	Handle absolute symbols as branch targets in disassembly. https://reviews.llvm.org/D48554 llvm-svn: 335903	2018-06-28 18:57:13 +00:00
Vedant Kumar	197e73fede	[Debugify] Do not report line 0 locations as errors The checking logic should not treat artificial locations as being somehow problematic. Producing these locations can be the desired behavior of some passes. See llvm.org/PR37961. llvm-svn: 335897	2018-06-28 18:21:11 +00:00
Craig Topper	90317d1d94	[X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc. This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy. Differential Revision: https://reviews.llvm.org/D48706 llvm-svn: 335895	2018-06-28 17:58:01 +00:00
Jonas Devlieghere	b757fc3878	Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"" Reverting because this is causing failures in the LLDB test suite on GreenDragon. LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression llvm-svn: 335894	2018-06-28 17:56:43 +00:00
Jonas Devlieghere	54a4724467	Revert "[OrcMCJIT] Fix test after r335508 causing it to fail on green dragon" This reverts commit a6b904daa1d55e31187c85e5b54ef2ddc37fa713. llvm-svn: 335893	2018-06-28 17:56:27 +00:00
Sanjay Patel	57bda365bf	[InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806) This is an enhancement to D48401 that was discussed in: https://bugs.llvm.org/show_bug.cgi?id=37806 We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other direction because that's generally better of course). This allows us to remove the shuffle as we do in the regular opcodes-are-the-same cases. This requires a small hack to make sure we don't introduce any extra poison: https://rise4fun.com/Alive/ZGv Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There are planned enhancements for opcode transforms such as or -> add. Note that there's a different fold needed if we've already managed to simplify away a binop as seen in the test based on PR37806, but we manage to get that one case here because this fold is positioned above the demanded elements fold currently. Differential Revision: https://reviews.llvm.org/D48485 llvm-svn: 335888	2018-06-28 17:48:04 +00:00
Simon Pilgrim	9c70d48cb2	[DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED) We could get away with it for constant folded cases, but not for rL335719. Thanks to Krzysztof Parzyszek for noticing. Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884. llvm-svn: 335886	2018-06-28 17:33:41 +00:00
Jessica Paquette	d6261bef7b	Revert "[MachineOutliner] Add always and never options to -enable-machine-outliner" I accidentally committed this instead of D48683 because I haven't had coffee yet. llvm-svn: 335883	2018-06-28 17:26:19 +00:00
Jessica Paquette	f3a44fe833	Revert "[MachineOutliner] Never add the outliner in -O0" This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091. It relies on r335872 since that introduces the machine outliner flags test. I meant to commit D48683 in that commit, but got mixed up and committed D48682 instead. So, I'm reverting this and r335872, since D48682 hasn't made it through review yet. llvm-svn: 335882	2018-06-28 17:26:18 +00:00
Jessica Paquette	c9d675266e	[MachineOutliner] Never add the outliner in -O0 We shouldn't add the outliner when compiling at -O0 even if -enable-machine-outliner is passed in. This makes sure that we don't add it in this case. This also updates machine-outliner-flags to reflect the change and improves the comment describing what that test does. llvm-svn: 335879	2018-06-28 17:05:57 +00:00
Matthias Braun	da5e7e11d1	SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O) Add NoTrapAfterNoreturn target option which skips emission of traps behind noreturn calls even if TrapUnreachable is enabled. Enable the feature on Mach-O to save code size. Comments suggest it is not possible to enable it for the other users of TrapUnreachable. rdar://41530228 Differential Revision: https://reviews.llvm.org/D48674 llvm-svn: 335877	2018-06-28 17:00:45 +00:00
Jessica Paquette	1ccb66c5fb	[MachineOutliner] Add always and never options to -enable-machine-outliner To enable the MachineOutliner by default on AArch64, we need to be able to disable the MachineOutliner and also provide an option to "always" enable the outliner. This adds that capability. It allows the user to still use the old -enable-machine-outliner option, which defaults to "always". This is building up to allowing the user to specify "always" versus the target-default outlining behaviour. llvm-svn: 335872	2018-06-28 16:39:42 +00:00
Haojian Wu	2103990e63	Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV" This reverts commit r335821. This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce. llvm-svn: 335871	2018-06-28 16:25:57 +00:00
Simon Pilgrim	83125594ed	[llvm-mca][x86] Add FMA4 resource tests We should be ensuring we have (near) complete test coverage of instructions, at least for the generic model. llvm-svn: 335870	2018-06-28 16:24:13 +00:00
Simon Pilgrim	12f9503d40	[llvm-mca][x86] Add 3dnow! resource tests We should be ensuring we have (near) complete test coverage of instructions, at least for the generic model. llvm-svn: 335869	2018-06-28 16:21:22 +00:00
Stanislav Mekhanoshin	67aa18f165	[AMDGPU] Early expansion of 32 bit udiv/urem This allows hoisting of common code, for instance if the denominator is loop invariant. The current change is expansion only; adding LICM to the target pass list is going to be a separate patch. Given this patch, changes to codegen are minor, as the expansion is similar to that on the DAG. DAG expansion still must remain for R600. Differential Revision: https://reviews.llvm.org/D48586 llvm-svn: 335868	2018-06-28 15:59:18 +00:00
Stanislav Mekhanoshin	298a61590a	[AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16 Differential Revision: https://reviews.llvm.org/D48677 llvm-svn: 335866	2018-06-28 15:24:46 +00:00
Alexey Bataev	5f6e51d54c	[DEBUG_INFO, NVPTX] Add test for .debug_loc section, NFC. llvm-svn: 335861	2018-06-28 15:14:58 +00:00
John Brawn	bdbbd8381f	Add a PhiValuesAnalysis pass to calculate the underlying values of phis This pass is being added in order to make the information available to BasicAA, which can't do caching of this information itself, but possibly this information may be useful for other passes. Incorporates code based on Daniel Berlin's implementation of Tarjan's algorithm. Differential Revision: https://reviews.llvm.org/D47893 llvm-svn: 335857	2018-06-28 14:13:06 +00:00
Benjamin Kramer	269eb21e1c	Revert "Add support for generating a call graph profile from Branch Frequency Info." This reverts commits r335794 and r335797. Breaks ThinLTO+FDO selfhost. llvm-svn: 335851	2018-06-28 13:15:03 +00:00
Sjoerd Meijer	c89ca5582a	[ARM] Parallel DSP Pass Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of this pass is to do some straightforward IR pattern matching to create ACLE DSP intrinsics, which map on these 32-bit SIMD operations. Currently, only the SMLAD instruction gets recognised. This instruction performs two multiplications with 16-bit operands, and stores the result in an accumulator. We will follow this up with patches to recognise SMLAD in more cases, and also to generate other DSP instructions (like e.g. SADD16). Patch by: Sam Parker and Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D48128 llvm-svn: 335850	2018-06-28 12:55:29 +00:00
Matt Arsenault	1fb9013368	AMDGPU: Error on calls from graphics shaders In principle nothing should stop these from working, but work is necessary to create an ABI for dealing with the stack related registers. llvm-svn: 335829	2018-06-28 10:18:36 +00:00
Matt Arsenault	513e0c0ea4	AMDGPU: Fix assert on aggregate type kernel arguments Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. Also fix a crash when there is only an empty struct argument. llvm-svn: 335827	2018-06-28 10:18:11 +00:00
Simon Pilgrim	abebe4c746	[DAGCombiner] Ensure we use the correct CC result type in visitSDIV We could get away with it for constant folded cases, but not for rL335719. Thanks to Krzysztof Parzyszek for noticing. llvm-svn: 335821	2018-06-28 09:54:28 +00:00
Florian Hahn	388af14f85	[SCCP] Mark CFG as preserved. SCCP does not change the CFG, so we can mark it as preserved. Reviewers: dberlin, efriedma, davide Reviewed By: davide Differential Revision: https://reviews.llvm.org/D47149 llvm-svn: 335820	2018-06-28 09:53:38 +00:00
Max Kazantsev	f5ba37182e	[IndVarSimplify] Ignore unreachable users of truncs If a trunc has a user in a block which is not reachable from entry, we can safely perform trunc elimination as if this user didn't exist. llvm-svn: 335816	2018-06-28 08:20:03 +00:00
Michael J. Spencer	5bf1ead377	Add support for generating a call graph profile from Branch Frequency Info. === Generating the CG Profile === The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight. After scanning all the functions, it generates an appending module flag containing the data. The format looks like: ``` !llvm.module.flags = !{!0} !0 = !{i32 5, !"CG Profile", !1} !1 = !{!2, !3, !4} ; List of edges !2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32 !3 = !{void (i1)* @freq, void ()* @a, i64 11} !4 = !{void (i1)* @freq, void ()* @b, i64 20} ``` Differential Revision: https://reviews.llvm.org/D48105 llvm-svn: 335794	2018-06-27 23:58:08 +00:00
Sameer AbuAsal	9b65ffb097	[RISCV] Add machine function pass to merge base + offset Summary: In r333455 we added a peephole to fix the corner cases that result from separating base + offset lowering of global address. The peephole didn't handle some of the cases because it only has a basic block view instead of a function level view. This patch replaces that logic with a machine function pass. In addition to handling the original cases it handles uses of the global address across blocks in function and folding an offset from LW/SW instruction. This pass won't run for OptNone compilation, so there will be a negative impact overall vs the old approach at O0. Reviewers: asb, apazos, mgrang Reviewed By: asb Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones Differential Revision: https://reviews.llvm.org/D47857 llvm-svn: 335786	2018-06-27 20:51:42 +00:00
Sanjay Patel	1ef49be8b6	[InstCombine] add tests for vector-select-of-binops with 2 variables; NFC llvm-svn: 335778	2018-06-27 20:23:47 +00:00
Fangrui Song	5dc371a7a6	[WebAssembly] Try fixing test/CodeGen/WebAssembly/vector_sdiv.ll llvm-svn: 335771	2018-06-27 19:35:50 +00:00
Craig Topper	6bea2c7f9b	[X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result. The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte. This should match the behavior of objdump from binutils. llvm-svn: 335768	2018-06-27 19:03:36 +00:00
Daniel Sanders	bdeb880d14	[globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64 Now that we have the ability to legalize based on MMOs, add support for legalizing based on AtomicOrdering and use it to correct the legalization of the atomic instructions. Also extend all() to be a variadic template as this ruleset now requires 3 and 4 argument versions. llvm-svn: 335767	2018-06-27 19:03:21 +00:00
Teresa Johnson	6835c284a4	[ThinLTO] Fix test Fix test changes added in r335760. Even though we are invoking llvm-lto2 in single threaded mode, the order of processing the modules in the backend is apparently not deterministic. Handle the expected debug messages in any order. (The determinism would be good to fix, but not related to this change.) This also undoes the change I made in r335764 to help debug this. llvm-svn: 335766	2018-06-27 19:00:35 +00:00
Teresa Johnson	6535b3562f	[ThinLTO] Modify test to help diagnose bot failures I am getting bot failures from r335760 that are difficult to diagnose since the stderr is getting redirected to FileCheck. Save and dump the debug output to stderr to help debug the issue. llvm-svn: 335764	2018-06-27 18:36:53 +00:00
Sanjay Patel	d052de856d	[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not: #include <stdio.h> int main(int argc) { float x; x = -0.8 * argc; printf("%f\n", (float)((int)x)); return 0; } $ clang -O0 -mavx fp.c ; ./a.out 0.000000 $ clang -O1 -mavx fp.c ; ./a.out -0.000000 Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option. Differential Revision: https://reviews.llvm.org/D48085 llvm-svn: 335761	2018-06-27 18:16:40 +00:00
Teresa Johnson	7e7b13d016	[ThinLTO] Print names in function import debug messages when available Summary: Rather than just print the GUID, when it is available in the index, print the global name as well in the function import thin link debug messages. Names will be available when the combined index is being built by the same process, e.g. a linker or "llvm-lto2 run". Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits Differential Revision: https://reviews.llvm.org/D48612 llvm-svn: 335760	2018-06-27 18:03:39 +00:00
Jessica Paquette	f472f6159a	[MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across It isn't safe to outline sequences of instructions where x16/x17/nzcv are live across the sequence. This teaches the outliner to check whether or not a specific candidate has x16/x17/nzcv live across it and discard the candidate in the case that that is true. https://bugs.llvm.org/show_bug.cgi?id=37573 https://reviews.llvm.org/D47655 llvm-svn: 335758	2018-06-27 17:43:27 +00:00
Sanjay Patel	7e45aebe55	[InstCombine] add more tests for shuffle with different binops; NFC llvm-svn: 335756	2018-06-27 17:21:57 +00:00
Craig Topper	812fcb35e7	[X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1 (or rotating a -1) and doing a binop. These instructions also ignore the upper bits of their index input so we can also remove an and if one is present on the index. Fixes PR37938. llvm-svn: 335754	2018-06-27 16:47:39 +00:00
Craig Topper	069628b4df	[X86] Add test cases for D48606. llvm-svn: 335753	2018-06-27 16:47:36 +00:00
Simon Pilgrim	8a02b25313	[X86][SSE] Add missing AVX512 rotation tests Increase coverage to make sure we're not doing anything stupid without AVX512BW llvm-svn: 335746	2018-06-27 16:00:53 +00:00
Craig Topper	31cbe75b3b	[X86] Rename the autoupgraded versions of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744	2018-06-27 15:57:53 +00:00
Stanislav Mekhanoshin	1a1687f1bb	[AMDGPU] Convert rcp to rcp_iflag If the source of an rcp instruction is the result of any conversion from an integer, convert it into an rcp_iflag instruction. No FP exception other than division by zero can ever happen if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742	2018-06-27 15:33:33 +00:00
Luke Geeson	316327150b	[AArch64] Reverting FP16 vcvth_n_s64_f16 to fix llvm-svn: 335737	2018-06-27 14:34:40 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h) and the default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extend the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectors. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjusts the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Ivan A. Kosarev	7231598fce	[NEON] Support vldNq intrinsics in AArch32 (LLVM part) This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for example. Currently, non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagates it to the other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that to-all-lanes instructions do support all sizes of data items, including bytes. The second problem with the current approach is that we need a separate vdup instruction to propagate the structure to each lane. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction. This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions. Additionally, this patch adds support for u64 and s64 dup NEON intrinsics. These are marked as AArch64-only in the ARM NEON Reference, but it seems there are no reasons to not support them in AArch32 mode. Please correct me if that is wrong. This is what we generate with this patch applied: vld2q_dup_f16: vld2.16 {d0[], d2[]}, [r0] vld2.16 {d1[], d3[]}, [r0] vld3q_dup_f16: vld3.16 {d0[], d2[], d4[]}, [r0] vld3.16 {d1[], d3[], d5[]}, [r0] vld4q_dup_f16: vld4.16 {d0[], d2[], d4[], d6[]}, [r0] vld4.16 {d1[], d3[], d5[], d7[]}, [r0] Differential Revision: https://reviews.llvm.org/D48439 llvm-svn: 335733	2018-06-27 13:57:52 +00:00
Simon Pilgrim	d3e583a52d	[DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs llvm-svn: 335727	2018-06-27 12:45:31 +00:00
Simon Pilgrim	41afbcb9ca	[X86][SSE] Include MIN_SIGNED element in non-uniform SDIV pow2 tests llvm-svn: 335721	2018-06-27 10:59:36 +00:00
Simon Pilgrim	dfbcc66adc	[DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0) Fixes PR37569. llvm-svn: 335719	2018-06-27 10:21:06 +00:00
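A quick way to see why the SDIV(%X, MIN_SIGNED) fold is sound: under LLVM's truncating `sdiv`, every i8 value other than MIN_SIGNED has a smaller magnitude than MIN_SIGNED, so its quotient truncates to 0, while MIN_SIGNED / MIN_SIGNED is 1. The sketch below checks this exhaustively for i8 only; `sdiv` and `folded` are hypothetical helpers that model the IR semantics, not the DAGCombiner code.

```python
BITS = 8
SMIN = -(1 << (BITS - 1))     # MIN_SIGNED for i8: -128
SMAX = (1 << (BITS - 1)) - 1  # 127


def sdiv(x: int, d: int) -> int:
    """Signed division truncating toward zero, as LLVM's sdiv does."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def folded(x: int) -> int:
    """The replacement: SELECT(x == MIN_SIGNED, 1, 0)."""
    return 1 if x == SMIN else 0


def fold_holds() -> bool:
    """Compare sdiv(x, MIN_SIGNED) against the select for every i8 value."""
    return all(sdiv(x, SMIN) == folded(x) for x in range(SMIN, SMAX + 1))
```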
Simon Pilgrim	0a566bc0ae	[DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector expansion (PR37569) llvm-svn: 335717	2018-06-27 09:41:22 +00:00
Luke Geeson	68cb233c0f	[AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns llvm-svn: 335715	2018-06-27 09:20:13 +00:00
Vedant Kumar	f6c0b41fb7	[InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms() This prevents InstCombine from creating mis-sized dbg.values when replacing a sequence of casts with a simpler cast. For example, in: (fptrunc (floor (fpext X))) -> (floorf X) We no longer emit dbg.value(X) (with a 32-bit float operand) to describe (fpext X) (which is a 64-bit float). This was diagnosed by the debugify check added in r335682. llvm-svn: 335696	2018-06-27 00:47:53 +00:00
Vedant Kumar	d13536e9f3	[Debugify] Handle failure to get fragment size when checking dbg.values It's not possible to get the fragment size of some dbg.values. Teach the mis-sized dbg.value diagnostic to detect this scenario and bail out. Tested with: $ find test/Transforms -print -exec opt -debugify-each -instcombine {} \; llvm-svn: 335695	2018-06-27 00:47:52 +00:00
Vedant Kumar	b9c1a234d2	[Debugify] Diagnose mis-sized dbg.values Report an error in -check-debugify when the size of a dbg.value operand doesn't match up with the size of the variable it describes. Eventually this check should be moved into the IR verifier. For the moment, it's useful to include the check in -check-debugify as a means of catching regressions and finding existing bugs. Here are some instances of bugs the new check finds in the -O2 pipeline (all in InstCombine): 1) A float is used where a double is expected: ERROR: dbg.value operand has size 32, but its variable has size 64: call void @llvm.dbg.value(metadata float %expf, metadata !12, metadata !DIExpression()), !dbg !15 2) An i8 is used where an i32 is expected: ERROR: dbg.value operand has size 8, but its variable has size 32: call void @llvm.dbg.value(metadata i8 %t4, metadata !14, metadata !DIExpression()), !dbg !24 3) A <4 x i32> is used where something twice as large is expected (perhaps a <4 x i64>, I haven't double-checked): ERROR: dbg.value operand has size 128, but its variable has size 256: call void @llvm.dbg.value(metadata <4 x i32> %4, metadata !40, metadata !DIExpression()), !dbg !95 Differential Revision: https://reviews.llvm.org/D48408 llvm-svn: 335682	2018-06-26 22:46:41 +00:00
Evgeniy Stepanov	289a7d4c7d	Revert "[asan] Instrument comdat globals on COFF targets" Causes false positive ODR violation reports on __llvm_profile_raw_version. llvm-svn: 335681	2018-06-26 22:43:48 +00:00
Michael Zolotukhin	d3b8bdef01	[JumpThreading] Don't try to rewrite a use if it's already valid. Summary: When recording the uses that need to be rewritten after cloning a loop, we need to check that the use is not already dominated by the original def. The initial assumption was that the cloned basic block will introduce a new path and thus the original def will only dominate the use if they are in the same BB, but as the reproducer from PR37745 shows, that is not always the case. This fixes PR37745. Reviewers: haicheng, Ka-Ka Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48111 llvm-svn: 335675	2018-06-26 22:19:48 +00:00
Simon Pilgrim	c9e60adcb5	[X86] Add test for SDIV by sign bit (minsigned) value llvm-svn: 335671	2018-06-26 22:03:00 +00:00
Lang Hames	6a94134b11	[ORC] Add LLJIT and LLLazyJIT, and replace OrcLazyJIT in LLI with LLLazyJIT. LLJIT is a prefabricated ORC based JIT class that is meant to be the go-to replacement for MCJIT. Unlike OrcMCJITReplacement (which will continue to be supported) it is not API or bug-for-bug compatible, but targets the same use cases: Simple, non-lazy compilation and execution of LLVM IR. LLLazyJIT extends LLJIT with support for function-at-a-time lazy compilation, similar to what was provided by LLVM's original (now long deprecated) JIT APIs. This commit also contains some simple utility classes (CtorDtorRunner2, LocalCXXRuntimeOverrides2, JITTargetMachineBuilder) to support LLJIT and LLLazyJIT. Both of these classes are works in progress. Feedback from JIT clients is very welcome! llvm-svn: 335670	2018-06-26 21:35:48 +00:00
Jessica Paquette	67599c2e1e	[X86][AsmParser] Recommit r335658 Recommit of r335658 so that it does not change the behaviour of any existing error output. llvm-svn: 335668	2018-06-26 21:30:34 +00:00
Jessica Paquette	0a80af0761	Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode" This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b. It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly. llvm-svn: 335660	2018-06-26 20:57:19 +00:00
Jessica Paquette	0e40d4bfc3	[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode Right now, when we use RIP-relative instructions in 32-bit mode, we'll just assert and crash. This adds an error message which tells the user that they can't do that in 32-bit mode, so that we don't crash (and also can see the issue outside of assert builds). llvm-svn: 335658	2018-06-26 20:33:46 +00:00
Stanislav Mekhanoshin	dacda79ee6	[AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic This intrinsic selects v_mad_f32 regardless of fp32 denorm support. Differential Revision: https://reviews.llvm.org/D48573 llvm-svn: 335654	2018-06-26 20:04:19 +00:00
Matt Arsenault	8c4a35237a	AMDGPU: Add pass to lower kernel arguments to loads This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register-legal types, which results in low-quality code for arbitrary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem altogether. It's possible to hack around the problem in the initial lowering, but the real problem is that the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure what the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lower the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer-typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist.
Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650	2018-06-26 19:10:00 +00:00
Matt Arsenault	7e991d30c0	ConstantFold: Don't fold global address vs. null for addrspace != 0 Not sure why this logic seems to be repeated in 2 different places, one called by the other. On AMDGPU addrspace(3) globals start allocating at 0, so these checks will be incorrect (not that real code actually tries to compare these addresses) llvm-svn: 335649	2018-06-26 18:55:43 +00:00
Vedant Kumar	2e6c5f96dc	[Debugify] Don't treat missing dbg.values as an error (PR37942) When checking the debug info in a module, don't treat a missing dbg.value as an error. The dbg.value may simply have been DCE'd, in which case the debugger has enough information to display the variable as <optimized out>. llvm-svn: 335647	2018-06-26 18:54:10 +00:00
Matt Arsenault	2c1a570aab	LoopUnroll: Allow analyzing intrinsic call costs I'm not sure why the code here is skipping calls since TTI does try to do something for general calls, but it should at least allow intrinsics. Skip intrinsics that should not be emitted as calls, which is by far the most common case on AMDGPU. llvm-svn: 335645	2018-06-26 18:51:17 +00:00
Brendon Cahoon	b7169c435a	[Hexagon] Add a "generic" cpu Add the generic processor for Hexagon so that it can be used with 3rd party programs that create a back-end with the "generic" CPU. This patch also enables the JIT for Hexagon. Differential Revision: https://reviews.llvm.org/D48571 llvm-svn: 335641	2018-06-26 18:44:05 +00:00
Simon Pilgrim	7f55af37f4	[DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion (PR37119) Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported. llvm-svn: 335637	2018-06-26 17:46:51 +00:00
Fangrui Song	ee15d3dcdb	Move `REQUIRES:` line to the top llvm-svn: 335635	2018-06-26 17:44:23 +00:00
Sanjay Patel	ad0bfb844d	[InstSimplify] fold shifts by sext bool https://rise4fun.com/Alive/c3Y llvm-svn: 335633	2018-06-26 17:31:38 +00:00
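A minimal model of why a shift by a sign-extended i1 can fold to its first operand, assuming the LangRef rule that a shift amount greater than or equal to the bit width yields poison: `sext i1 true` is all-ones, which is always out of range, so the only non-poison case is `sext i1 false`, a shift by zero. The sketch below (hypothetical helpers, i8 only, poison represented as `None`) checks that every non-poison result equals X; it is not the InstSimplify implementation.

```python
BITS = 8
MASK = (1 << BITS) - 1


def sext_bool(b: bool) -> int:
    """sext i1 -> i8: true becomes all-ones (255), false becomes 0."""
    return MASK if b else 0


def shl(x: int, amt: int):
    return None if amt >= BITS else (x << amt) & MASK  # None models poison


def lshr(x: int, amt: int):
    return None if amt >= BITS else x >> amt


def ashr(x: int, amt: int):
    if amt >= BITS:
        return None
    signed = x - (1 << BITS) if x & (1 << (BITS - 1)) else x
    return (signed >> amt) & MASK


def fold_holds() -> bool:
    """Every non-poison result of `shift X, (sext i1 b)` must equal X."""
    for shift in (shl, lshr, ashr):
        for x in range(1 << BITS):
            for b in (False, True):
                r = shift(x, sext_bool(b))
                if r is not None and r != x:
                    return False
    return True
```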
Sanjay Patel	3d1e4d6fa6	[InstSimplify] add tests for shifts by sext bool; NFC llvm-svn: 335631	2018-06-26 17:15:07 +00:00
Simon Pilgrim	1576df53a9	[X86][SSE] Add another sdiv by (nonuniform) minus one test (PR37119) Include a test that divides by -1 but not by 1 (another special case) llvm-svn: 335629	2018-06-26 17:06:05 +00:00
Sanjay Patel	3575f0c0b3	[InstCombine] fold urem with sext bool divisor Similar to other patches in this series: https://reviews.llvm.org/rL335512 https://reviews.llvm.org/rL335527 https://reviews.llvm.org/rL335597 https://reviews.llvm.org/rL335616 ...this is filling a gap in analysis that is exposed by an unrelated select-of-constants transform. I didn't see a way to unify the sext cases because each div/rem opcode results in a different fold. Note that in this case, the backend might want to convert the select into math: Name: sext urem %e = sext i1 %x to i32 %r = urem i32 %y, %e => %c = icmp eq i32 %y, -1 %z = zext i1 %c to i32 %r = add i32 %z, %y llvm-svn: 335622	2018-06-26 16:30:00 +00:00
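All four sext-bool divisor folds in this series rest on the same observation: `sext i1` produces either 0, which would be a division by zero (undefined behavior that the optimizer may assume does not happen), or all-ones (-1). Dividing by -1 is then simple for each opcode. The sketch below checks, for i8 only, the values the rewrites must produce, including the add-based `urem` form given in the message above. It models IR semantics with hypothetical helpers and does not reproduce the exact IR each commit emits.

```python
BITS = 8
MASK = (1 << BITS) - 1
SMIN = -(1 << (BITS - 1))


def to_signed(v: int) -> int:
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def sdiv(a: int, b: int) -> int:
    q = abs(a) // abs(b)  # truncate toward zero
    return q if (a < 0) == (b < 0) else -q


def srem(a: int, b: int) -> int:
    return a - b * sdiv(a, b)  # sign follows the dividend


def sext_bool_folds_hold() -> bool:
    d = MASK  # sext i1 true == all-ones; sext i1 false == 0 would be UB
    for y in range(1 << BITS):
        is_m1 = 1 if y == MASK else 0  # zext(icmp eq y, -1)
        sy = to_signed(y)
        checks = (
            y // d == is_m1,                    # udiv y, -1 == zext(y == -1)
            y % d == (y + is_m1) & MASK,        # urem y, -1 == y + zext(y == -1)
            srem(sy, -1) == 0,                  # srem y, -1 == 0
            sy == SMIN or sdiv(sy, -1) == -sy,  # sdiv y, -1 == 0 - y (SMIN / -1 overflows, UB)
        )
        if not all(checks):
            return False
    return True
```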
Simon Pilgrim	bbfc18b5b5	[SLPVectorizer] Recognise non uniform power of 2 constants Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them. As SLP works with arrays of values I don't think we can easily use the pattern match helpers here. Differential Revision: https://reviews.llvm.org/D48214 llvm-svn: 335621	2018-06-26 16:20:16 +00:00
Sanjay Patel	0f44759b0d	[InstCombine] add tests for urem with sext bool divisor; NFC llvm-svn: 335619	2018-06-26 16:01:24 +00:00
Sanjay Patel	2b7e31095d	[InstSimplify] fold srem with sext bool divisor llvm-svn: 335616	2018-06-26 15:32:54 +00:00
James Henderson	5507f6688d	[FileCheck] Add CHECK-EMPTY directive for checking for blank lines Prior to this change, there was no clean way of getting FileCheck to check that a line is completely empty. The expected way of using "CHECK: {{^$}}" does not work because the '^' matches the end of the previous match (this behaviour may be desirable in certain instances). For the same reason, "CHECK-NEXT: {{^$}}" will fail when the previous match was at the end of the line, as the pattern will match there. Using the recommended [[:space:]] to match an explicit new line could also match a space, and thus is not always desired. Literal '\n' matches also do not work. A workaround was suggested in the review, but it is a little clunky. This change adds a new directive that behaves the same as CHECK-NEXT, except that it only matches against empty lines (nothing, not even whitespace, is allowed). As with CHECK-NEXT, it will fail if more than one newline occurs before the next blank line. Example usage: ; test.txt foo bar ; CHECK: foo ; CHECK-EMPTY: ; CHECK-NEXT: bar Differential Revision: https://reviews.llvm.org/D28896 Reviewed by: probinson llvm-svn: 335613	2018-06-26 15:15:45 +00:00
Sanjay Patel	0e0dbebeed	[InstSimplify] add tests for srem with sext bool divisor; NFC llvm-svn: 335609	2018-06-26 14:47:31 +00:00
Krzysztof Parzyszek	70f027022c	Account for undef values from predecessors in extendSegmentsToUses It is legal for a PHI node not to have a live value in a predecessor as long as the end of the predecessor is jointly dominated by an undef value. llvm-svn: 335607	2018-06-26 14:37:16 +00:00
Than McIntosh	3190993a02	[X86,ARM] Retain split-stack prolog check for sibling calls Summary: If a routine with no stack frame makes a sibling call, we need to preserve the stack space check even if the local stack frame is empty, since the call target could be a "no-split" function (in which case the linker needs to be able to fix up the prolog sequence in order to switch to a larger stack). This fixes PR37807. Reviewers: cherry, javed.absar Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D48444 llvm-svn: 335604	2018-06-26 14:11:30 +00:00
Teresa Johnson	63ee0e73e4	[ThinLTO] Parse module summary index from assembly Summary: Adds assembly parsing support for the module summary index (follow on to r333335 which added the assembly writing support). I added support to llvm-as to invoke the index parsing, so that it can create either a bitcode file with a Module and a per-module index, or a combined index without a Module. I will send follow on patches soon to do the following: - add support to tools such as llvm-lto2 to parse the per-module indexes from assembly instead of bitcode when testing the thin link. - verification support. Depends on D47844 and D47842. Reviewers: pcc, dexonsmith, mehdi_amini Subscribers: inglorion, eraman, steven_wu, llvm-commits Differential Revision: https://reviews.llvm.org/D47905 llvm-svn: 335602	2018-06-26 13:56:49 +00:00
Sanjay Patel	7c45debaea	[InstCombine] fold udiv with sext bool divisor Note: I didn't add a hasOneUse() check because the existing, related fold doesn't have that check. I suspect that the improved analysis and codegen make these some of the rare canonicalization cases where we allow an increase in instructions. llvm-svn: 335597	2018-06-26 12:41:15 +00:00
Tim Northover	f2f9f2f505	ARM: add binary file git swallowed. Should fix bots. llvm-svn: 335596	2018-06-26 12:28:47 +00:00
Tim Northover	b73efb85ba	ARM: correctly decode VFP instructions following unpredictable t2IT When the condition code for an IT instruction is "AL" we get strange "15" predicates on subsequent instructions. These are dealt with for most instructions by treating them as "ARMCC::AL", but VFP takes a different path which didn't have this code. llvm-svn: 335594	2018-06-26 11:39:20 +00:00
Tim Northover	bf54858115	ARM: diagnose unpredictable IT instructions IT instructions are allowed to have the 'AL' predicate, but it must never result in an 'NV' predicated instruction. Essentially this means that all branches must be 't' rather than 'e' if the predicate is 'AL'. This patch adds a diagnostic for this during assembly (error because parsing hits an assertion if allowed to continue) and an annotation during disassembly. llvm-svn: 335593	2018-06-26 11:38:41 +00:00
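The rule above can be modelled directly: in an IT block the first instruction uses the base condition, each 'T' slot reuses it, and each 'E' slot uses its inverse, which in the ARM condition encoding is the same code with the low bit flipped. Since AL is 0b1110, its inverse is 0b1111 (NV), so any 'E' slot under an AL base is unpredictable. A minimal sketch; the function names are hypothetical and this is not the LLVM AsmParser code.

```python
# ARM condition-code encodings; AL is 0b1110 and its inverse 0b1111 is NV.
COND = {
    "EQ": 0x0, "NE": 0x1, "CS": 0x2, "CC": 0x3, "MI": 0x4, "PL": 0x5,
    "VS": 0x6, "VC": 0x7, "HI": 0x8, "LS": 0x9, "GE": 0xA, "LT": 0xB,
    "GT": 0xC, "LE": 0xD, "AL": 0xE,
}
NV = 0xF


def slot_predicates(cond: str, pattern: str) -> list:
    """Condition code for each instruction covered by an IT instruction.

    `pattern` is the x/y/z suffix after "IT" (e.g. "TE" for ITTE), assumed
    to contain only 'T' and 'E'. 'E' slots use the inverted condition,
    which is the base code with its low bit flipped.
    """
    base = COND[cond.upper()]
    return [base] + [base if c == "T" else base ^ 1 for c in pattern.upper()]


def is_unpredictable(cond: str, pattern: str) -> bool:
    """True if some slot would end up NV-predicated."""
    return NV in slot_predicates(cond, pattern)
```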
Florian Hahn	4a69b0bb36	[IPSCCP] Change dead blocks to unreachable after visiting all executable blocks. changeToUnreachable may remove PHI nodes from executable blocks we found values for and we would fail to replace them. By changing dead blocks to unreachable after we replaced constants in all executable blocks, we ensure such PHI nodes are replaced by their known value before. Fixes PR37780. Reviewers: efriedma, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D48421 llvm-svn: 335588	2018-06-26 10:15:02 +00:00
Bernard Ogden	56c6e7015b	[AArch64] Tighten up directives tests Move expected-fail cases from directive-cpu.s to directive-cpu-err.s. This allows us to remove the 'not' from the llvm-mc invocation in directive-cpu.s so that this test will fail in unexpected error cases. It also means that we are not relying on all stderr coming before any stdout, which seems fragile. Also make use of CHECK-NEXT to ensure that multiline error messages really do occur together. And add a test to verify that .cpu with an arch version as extension is rejected. Differential Revision: https://reviews.llvm.org/D47873 llvm-svn: 335586	2018-06-26 09:49:31 +00:00
Bernard Ogden	15aa0db052	[AArch64] Clean up LSE directive tests These were specifying an architecture version with .cpu directive, which is invalid. As the error for this case outputs the problem instruction we were still matching the expectations of FileCheck. This patch fixes up the LSE tests to do what they seem to intend. A follow-up patch will tighten up the directive tests. Differential Revision: https://reviews.llvm.org/D47872 llvm-svn: 335585	2018-06-26 09:36:13 +00:00
Bjorn Pettersson	550517bcab	Improve ConvertDebugDeclareToDebugValue Summary: This is a follow-up to r334830 and r335031. In the valueCoversEntireFragment check we now also handle the situation when there is a variable-length array (VLA) involved, and the length of the array has been reduced to a constant. The ConvertDebugDeclareToDebugValue functions that are related to PHI nodes and load instructions now avoid inserting dbg.value intrinsics when the value does not, for certain, cover the variable/fragment that should be described. In r334830 we assumed that the value always covered the entire var/fragment and we had assertions in the code to show that assumption. However, those asserts failed when compiling code with VLAs, so we removed the asserts in r335031. Now that we know the valueCoversEntireFragment check can also fail for PHI/Load instructions, we avoid inserting the faulty dbg.value intrinsic in such situations. Compared to the Store instruction scenario, we simply drop the dbg.value here (as the variable does not change its value due to PHI/Load, so an earlier dbg.value describing the variable should still be valid). Reviewers: aprantl, vsk, efriedma Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48547 llvm-svn: 335580	2018-06-26 06:17:00 +00:00
Gil Rapaport	da2e2caa6c	[InstCombine] (A + 1) + (B ^ -1) --> A - B Turn canonicalized subtraction back into (-1 - B) and combine it with (A + 1) into (A - B). This is similar to the folding already done for (B ^ -1) + Const into (-1 + Const) - B. Differential Revision: https://reviews.llvm.org/D48535 llvm-svn: 335579	2018-06-26 05:31:18 +00:00
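The fold is an exact identity in wrapping N-bit arithmetic: B ^ -1 is ~B, which equals -1 - B, so (A + 1) + (-1 - B) = A - B. An exhaustive i8 check, with hypothetical helper names standing in for the two IR forms:

```python
BITS = 8
MASK = (1 << BITS) - 1  # -1 in iN is all-ones


def lhs(a: int, b: int) -> int:
    """(A + 1) + (B ^ -1), wrapping modulo 2**BITS."""
    return ((a + 1) + (b ^ MASK)) & MASK


def rhs(a: int, b: int) -> int:
    """A - B, wrapping modulo 2**BITS."""
    return (a - b) & MASK


def fold_holds() -> bool:
    return all(lhs(a, b) == rhs(a, b)
               for a in range(1 << BITS) for b in range(1 << BITS))
```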
Dan Gohman	910ba33d0c	[WebAssembly] Fix lowering of varargs functions with non-legal fixed arguments. CallLoweringInfo's NumFixedArgs field gives the number of fixed arguments before legalization. The ISD::OutputArg "Outs" array holds legalized arguments, so when indexing into it to find the non-fixed arguments, we need to use the number of fixed arguments after legalization. Fixes PR37934. llvm-svn: 335576	2018-06-26 03:18:38 +00:00
Craig Topper	c42ed4e3c4	[X86] Use XOR for SUB (C, X) during isel if it will help fold an immediate Summary: Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining. This seems like the safest way to do this trick. We can consider doing it as a DAG combine later. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48557 llvm-svn: 335575	2018-06-26 03:11:15 +00:00
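The trick relies on a borrow-free subtraction identity: C - X equals C ^ X whenever every bit set in X is also set in C. One plausible reason it helps, assuming the usual x86 encodings, is that SUB can only take an immediate as the subtracted operand, whereas XOR is commutative and can carry C as its immediate. The message does not spell out the exact known-bits condition the patch checks, so this i8 sketch (hypothetical helper names) only verifies the identity itself.

```python
BITS = 8
MASK = (1 << BITS) - 1


def sub_equals_xor(c: int, x: int) -> bool:
    return ((c - x) & MASK) == (c ^ x)


def identity_holds() -> bool:
    """C - X == C ^ X whenever X's set bits are a subset of C's (no borrows)."""
    for c in range(1 << BITS):
        for x in range(1 << BITS):
            if (x & ~c & MASK) == 0 and not sub_equals_xor(c, x):
                return False
    return True
```

Without the subset condition the identity fails, e.g. 4 - 1 is 3 but 4 ^ 1 is 5.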
Teresa Johnson	519055336d	[ThinLTO] Add string saver onto index for value names Summary: Adds a string saver to the ModuleSummaryIndex so it can store value names in the case of adding a ValueInfo for a GUID when we don't have the name stored in a Module string table. This is motivated by the upcoming summary parser patch, where we will read value names from the summary entry and want to store them, even when a Module is not available. Currently this allows us to store the name in the legacy bitcode case, and I have added a test to show that. Reviewers: pcc, dexonsmith Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits Differential Revision: https://reviews.llvm.org/D47842 llvm-svn: 335570	2018-06-26 02:29:08 +00:00
Craig Topper	689e363ff2	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. This recommits r335562 and 335563 as a single commit. The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the signature of the builtin that software expects. By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away. llvm-svn: 335568	2018-06-26 01:37:02 +00:00
Teresa Johnson	9766fd64fb	[ThinLTO] Add per-module indexes to combined index consistently Summary: Without this change we only add module paths to the combined index when there is a module hash or at least one global value. Make this more consistent by adding the module to the index whenever there is a summary section, and it is a per-module summary (had a MODULE_CODE_SOURCE_FILENAME record). Since we will no longer add module paths lazily, add a new interface to get the module info from the index that asserts it is already added. Fixes PR37899. Reviewers: Vlad, pcc Subscribers: mehdi_amini, inglorion, steven_wu, llvm-commits Differential Revision: https://reviews.llvm.org/D48511 llvm-svn: 335567	2018-06-26 01:32:58 +00:00
Craig Topper	6f4fdfa9af	Revert r335562 and 335563 "[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction." These were supposed to have been squashed to a single commit. llvm-svn: 335566	2018-06-26 01:31:53 +00:00
Craig Topper	c2ee4a5035	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the signature of the builtin that software expects. By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away. llvm-svn: 335563	2018-06-26 00:43:46 +00:00
Eric Christopher	b7a52bb28a	Add a warning if someone attempts to add extra section flags to sections with well defined semantics like .rodata. llvm-svn: 335558	2018-06-25 23:53:54 +00:00
Chandler Carruth	1652996fd6	[PM/LoopUnswitch] Teach the new unswitch to handle nontrivial unswitching of switches. This works much like trivial unswitching of switches in that it reliably moves the switch out of the loop. Here we potentially clone the entire loop into each successor of the switch and re-point the cases at these clones. Due to the complexity of actually doing nontrivial unswitching, this patch doesn't create a dedicated routine for handling switches; it would duplicate far too much code. Instead, it generalizes the existing routine to handle both branches and switches as it largely reduces to looping in a few places instead of doing something once. This actually improves the results in some cases with branches due to being much more careful about how dead regions of code are managed. With branches, because exactly one clone is created and there are exactly two edges considered, somewhat sloppy handling of the dead regions of code was sufficient in most cases. But with switches, there are far more complicated patterns of dead code and so I've had to move to a more robust model generally. We still do as much pruning of the dead code early as possible because that allows us to avoid even cloning the code. This also surfaced another problem with the previous nontrivial unswitching: we weren't as precise in reconstructing loops as we could have been. This seems to have been mostly harmless, but resulted in pointless LCSSA PHI nodes and other unnecessary cruft. With switches, we have to get this right, and everything benefits from it. While the testing may seem a bit light here because we only have two real cases with actual switches, they do a surprisingly good job of exercising numerous edge cases. Also, because we share the logic with branches, most of the changes in this patch are reasonably well covered by existing tests.
The new unswitch now has all of the same fundamental power as the old one with the exception of the single unsound case of partial switch unswitching -- that really is just loop specialization and not unswitching at all. It doesn't fit into the canonicalization model in any way. We can add a loop specialization pass that runs late based on profile data if important test cases ever come up here. Differential Revision: https://reviews.llvm.org/D47683 llvm-svn: 335553	2018-06-25 23:32:54 +00:00
Craig Topper	53a41858c1	[X86] Update fpclass intrinsic tests to chain their calls to the intrinsic rather than joining them with add. The test cases try to test masked and unmasked instructions at the same time. Previously the masked version relied on an extra function parameter, and the two results were combined with 'add'. This patch gets rid of the second parameter and just passes the result of the first intrinsic into the mask argument of the second call, so there's no need for an 'add'. This configuration works a lot better with an upcoming patch to redefine the intrinsics to use vXi1 types for the output and mask argument. llvm-svn: 335551	2018-06-25 23:29:47 +00:00
Francis Visoiu Mistrih	48c4885fe7	[OrcMCJIT] Fix test after r335508 causing it to fail on green dragon http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/46572/console LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression Do the same thing as MCJIT/eh-lg-pic.ll. llvm-svn: 335548	2018-06-25 23:14:08 +00:00
Sanjay Patel	0c90400bf2	[InstCombine] add/move tests for udiv; NFC llvm-svn: 335544	2018-06-25 22:27:36 +00:00
Sanjay Patel	6a96d90acd	[InstCombine] fold sdiv with sext bool divisor llvm-svn: 335527	2018-06-25 21:39:41 +00:00
Sanjay Patel	46f9b8c333	[InstCombine] add tests for sdiv with sext bool divisor; NFC llvm-svn: 335526	2018-06-25 21:36:09 +00:00
Florian Hahn	b10b141a79	Revert r335513: [SCEVExp] Advance found insertion point llvm-svn: 335522	2018-06-25 20:55:26 +00:00
Florian Hahn	0b3ed5742a	Force vector width for scev-expander-debug.ll test llvm-svn: 335520	2018-06-25 20:40:50 +00:00
Lei Huang	5d109ee3d4	[PowerPC] Fix incorrectly encoded wait instruction Encoding for the wait instruction was wrong. Fix according to ISA 3.0. Differential Revision: https://reviews.llvm.org/D48550 llvm-svn: 335514	2018-06-25 19:28:27 +00:00
Florian Hahn	5947c17fd4	[SCEVExp] Advance found insertion point until we find a non-dbg instruction. This avoids creating unnecessary casts if the IP used to be a dbg info intrinsic. Fixes PR37727. Reviewers: vsk, aprantl, sanjoy, efriedma Reviewed By: vsk, efriedma Differential Revision: https://reviews.llvm.org/D47874 llvm-svn: 335513	2018-06-25 19:17:29 +00:00
Sanjay Patel	1e911fa746	[InstSimplify] fold div/rem of zexted bool I was looking at an unrelated fold and noticed that we don't have this simplification (because the other fold would break existing tests). Name: zext udiv %z = zext i1 %x to i32 %r = udiv i32 %y, %z => %r = %y Name: zext urem %z = zext i1 %x to i32 %r = urem i32 %y, %z => %r = 0 Name: zext sdiv %z = zext i1 %x to i32 %r = sdiv i32 %y, %z => %r = %y Name: zext srem %z = zext i1 %x to i32 %r = srem i32 %y, %z => %r = 0 https://rise4fun.com/Alive/LZ9 llvm-svn: 335512	2018-06-25 18:51:21 +00:00
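These four folds follow from the divisor being `zext i1`: it is either 0 (division by zero, which is undefined behavior the optimizer may assume does not happen) or 1. A minimal exhaustive check for i8, using hypothetical helpers that model truncating signed division rather than any LLVM API:

```python
BITS = 8


def to_signed(v: int) -> int:
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def sdiv(a: int, b: int) -> int:
    q = abs(a) // abs(b)  # truncate toward zero
    return q if (a < 0) == (b < 0) else -q


def srem(a: int, b: int) -> int:
    return a - b * sdiv(a, b)


def zext_bool_folds_hold() -> bool:
    d = 1  # zext i1 true; zext i1 false == 0 would be division by zero (UB)
    for y in range(1 << BITS):
        sy = to_signed(y)
        # udiv -> y, urem -> 0, sdiv -> y, srem -> 0
        if (y // d, y % d, sdiv(sy, d), srem(sy, d)) != (y, 0, sy, 0):
            return False
    return True
```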
Sanjay Patel	a46bcbec58	[InstSimplify] add tests for div/rem with bool divisor; NFC llvm-svn: 335509	2018-06-25 18:27:14 +00:00
Reid Kleckner	88fee5fdbc	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC code model is not implemented for MachO yet. Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335508	2018-06-25 18:16:27 +00:00
Sanjay Patel	2e8babb4fa	[InstCombine] add tests for add-of-sext-bool; NFC We canonicalize to select with a zext-add and either zext-sub or sext-sub, so this shows a pattern that's not conforming to the general trend. llvm-svn: 335506	2018-06-25 17:52:10 +00:00
Craig Topper	b9cb88a4b0	[X86] Allow base and index for gather instructions to appear in other order for Intel syntax. llvm-svn: 335500	2018-06-25 17:26:51 +00:00
Vedant Kumar	b725c69f12	[SelectionDAG] Remove debug locations from ConstantSD(FP)Nodes This removes debug locations from ConstantSDNode and ConstantSDFPNode. When this kind of node is materialized we no longer create a line table entry which jumps back to the constant's first point of use. This makes single-stepping behavior smoother, and it matches the model used by IR, where Constants have no locations. See this thread for more context: http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html I'd like to handle constant BuildVectorSDNodes and to try to eliminate passing SDLocs to SelectionDAG::getConstant*() in follow-up commits. Differential Revision: https://reviews.llvm.org/D48468 llvm-svn: 335497	2018-06-25 17:06:18 +00:00
Matt Arsenault	b1cc4f52ff	AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr Note a normal select test is not currently possible because this relies on input registers tracked in SIMachineFunctionInfo which are not currently serializable in MIR, but this does work end-to-end from the IR. llvm-svn: 335490	2018-06-25 16:17:48 +00:00
Matt Arsenault	921f7a27cc	StackSlotColoring: Decide colors per stack ID I thought I fixed this in r308673, but that fix was very broken. The assumption that any frame index can be used in place of another was more widespread than I realized. Even when stack slot sharing was disabled, this was still replacing uses of a frame index that has a different stack ID with a different stack slot. Really fix this by doing the coloring per stack ID, so all of the coloring is logically done in a separate namespace. This is a lot simpler than trying to figure out how to change the color if the stack ID is different. llvm-svn: 335488	2018-06-25 16:05:55 +00:00
Matt Arsenault	b3feccd7fa	AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers llvm-svn: 335485	2018-06-25 15:42:12 +00:00
David Green	8699492304	[DA] Delinearise AddRecs if we can prove they don't wrap We can prove that some delinearized subscripts do not wrap around to become negative by the fact that they are from inbound geps of load/store locations. This helps improve the delinearisation in cases where we can't prove that they are non-negative from SCEV alone. Differential Revision: https://reviews.llvm.org/D48481 llvm-svn: 335481	2018-06-25 15:13:26 +00:00
Matt Arsenault	73eeb42e50	AMDGPU: Respect align argument parameter This should avoid relying on the pointee type to get the alignment, particularly since pointee types are supposed to be removed at some point. Also fixes not getting the alignment for unsized types. llvm-svn: 335478	2018-06-25 14:29:04 +00:00
Krzysztof Parzyszek	4581f37e7c	Improve handling of COPY instructions with identical value numbers Testcases provided by Tim Renouf. Differential Revision: https://reviews.llvm.org/D48102 llvm-svn: 335472	2018-06-25 13:46:41 +00:00
Artur Pilipenko	ddc7f391d2	Revert change 335077 "[InlineSpiller] Fix a crash due to lack of forward progress from remat specifically for STATEPOINT" This change caused widespread assertion failures in our downstream testing: lib/CodeGen/LiveInterval.cpp:409: bool llvm::LiveRange::overlapsFrom(const llvm::LiveRange&, llvm::LiveRange::const_iterator) const: Assertion `!empty() && "empty range"' failed. llvm-svn: 335462	2018-06-25 12:58:13 +00:00
Artur Pilipenko	ab52071ddd	Revert change 335091. It adds an extra test for change 335077, which is also being reverted because it causes test failures in downstream testing. llvm-svn: 335461	2018-06-25 12:55:58 +00:00
Heejin Ahn	4934f76b58	[WebAssembly] Add WebAssemblyLateEHPrepare pass Summary: Add WebAssemblyLateEHPrepare pass that does several small jobs for exception handling. This runs before CFGSort, and is different from WasmEHPrepare pass that runs before ISel, even though the names are similar. Reviewers: dschuff, majnemer Subscribers: sbc100, jgravelle-google, sunfish, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D46803 llvm-svn: 335438	2018-06-25 01:07:11 +00:00
Craig Topper	4331d6218d	[X86] Remove the changes to combineScalarToVector made in r335037. They appear to be untested other than the test case in pr37879.ll, and I believe we should be using SimplifyDemandedElts here to handle these cases. llvm-svn: 335436	2018-06-25 00:21:53 +00:00
Sanjay Patel	962ee178fa	[DAGCombiner] eliminate setcc bool math when input is low-bit of some value This patch has the same motivating example as D48466: define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) { %c.0282 = and i32 %c.0282.in, 268435455 %a16 = lshr i64 32508, %x %a17 = and i64 %a16, 1 %tobool = icmp eq i64 %a17, 0 %. = select i1 %tobool, i32 1, i32 2 %.286 = select i1 %tobool, i32 27, i32 26 %shr97 = lshr i32 %c.0282, %. %shl98 = shl i32 %c.0282.in, %.286 %or99 = or i32 %shr97, %shl98 %shr100 = lshr i32 %d.0280, %. %shl101 = shl i32 %d.0280, %.286 %or102 = or i32 %shr100, %shl101 store i32 %or99, i32* %ptr0 store i32 %or102, i32* %ptr1 ret void } ...but I'm trying to kill the setcc bool math sooner rather than later. By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, we can create a universally good fold because we always eliminate the condition code intermediate value. Here are Alive proofs for these (currently instcombine folds the 'add' variants, but misses the 'sub' patterns): https://rise4fun.com/Alive/Gsyp Name: sub of zext cmp mask %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %z = zext i1 %c to i32 %r = sub i32 C1, %z => %optional_cast = zext i8 %a to i32 %r = add i32 %optional_cast, C1-1 Name: add of zext cmp mask %a = and i32 %x, 1 %c = icmp eq i32 %a, 0 %z = zext i1 %c to i8 %r = add i8 %z, C1 => %optional_cast = trunc i32 %a to i8 %r = sub i8 C1+1, %optional_cast All of the tests look like improvements or neutral to me. But it is possible that x86 test+set+bitop is better than what we now show here. I suspect we could do better by adding another fold for the 'sub' variants. We start with select-of-constant in IR in the larger motivating test, so that's why I included tests with selects. 
Proofs for those variants: https://rise4fun.com/Alive/Bx1 Name: true const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C2, i64 C1 => %z = zext i8 %a to i64 %r = sub i64 C2, %z Name: false const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C1, i64 C2 => %z = zext i8 %a to i64 %r = add i64 C1, %z Differential Revision: https://reviews.llvm.org/D48466 llvm-svn: 335433	2018-06-24 14:37:30 +00:00
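The four Alive proofs quoted in r335433 can be spot-checked without Alive. The sketch below is not part of the commit: it models LLVM's wrapping integer arithmetic with Python ints reduced modulo 2**bits, enumerates every i8 input and every i8 constant where the result type is i8, and uses a small, hypothetical sample of edge-case constants (`edge_constants`) for the i32/i64 results. It is a brute-force sanity check, not a proof for all wide constants.

```python
# Brute-force sanity check of the four Alive identities quoted in r335433.
# LLVM's wrapping integer arithmetic is modeled by reducing Python ints mod 2**bits.

def wrap(v, bits):
    """Reduce v to an unsigned value of the given bit width (two's-complement wrap)."""
    return v % (1 << bits)

def edge_constants(bits):
    """A small, hypothetical sample of constants for wide (i32/i64) types."""
    m = 1 << bits
    return [0, 1, 2, m // 2 - 1, m // 2, m - 2, m - 1]

def sub_of_zext_cmp_mask_holds():
    # src: %a = and i8 %x, 1; %c = icmp eq %a, 0; %z = zext %c to i32; %r = sub i32 C1, %z
    # tgt: %r = add i32 (zext %a), C1-1
    for x in range(256):
        a = x & 1
        z = int(a == 0)
        for c1 in edge_constants(32):
            if wrap(c1 - z, 32) != wrap(a + c1 - 1, 32):
                return False
    return True

def add_of_zext_cmp_mask_holds():
    # src: %a = and i32 %x, 1; %c = icmp eq %a, 0; %z = zext %c to i8; %r = add i8 %z, C1
    # tgt: %r = sub i8 C1+1, (trunc %a)
    for x in range(256):  # only bit 0 of the i32 %x matters, so low values suffice
        a = x & 1
        z = int(a == 0)
        for c1 in range(256):  # every i8 constant
            if wrap(z + c1, 8) != wrap(c1 + 1 - wrap(a, 8), 8):
                return False
    return True

def select_true_const_bigger_holds():
    # Pre: C2 == C1 + 1; src: %r = select (icmp eq (and i8 %x, 1), 0), i64 C2, i64 C1
    # tgt: %r = sub i64 C2, (zext %a)
    for x in range(256):
        a = x & 1
        for c1 in edge_constants(64):
            c2 = wrap(c1 + 1, 64)
            if (c2 if a == 0 else c1) != wrap(c2 - a, 64):
                return False
    return True

def select_false_const_bigger_holds():
    # Pre: C2 == C1 + 1; src: %r = select (icmp eq (and i8 %x, 1), 0), i64 C1, i64 C2
    # tgt: %r = add i64 C1, (zext %a)
    for x in range(256):
        a = x & 1
        for c1 in edge_constants(64):
            c2 = wrap(c1 + 1, 64)
            if (c1 if a == 0 else c2) != wrap(c1 + a, 64):
                return False
    return True

if __name__ == "__main__":
    for check in (sub_of_zext_cmp_mask_holds, add_of_zext_cmp_mask_holds,
                  select_true_const_bigger_holds, select_false_const_bigger_holds):
        print(check.__name__, check())
```

Each check should print True. Because `%a` is always 0 or 1, every identity reduces to two cases, which is why the folds can remove the `icmp` intermediate entirely.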
Jonas Devlieghere	fb54074112	[llvm-mt] Use WithColor for printing errors. Use the WithColor helper from support to print errors. llvm-svn: 335416	2018-06-23 16:49:07 +00:00
Craig Topper	d8d64a56b5	[X86] Make %eiz usage in 64-bit mode force a 0x67 address-size prefix. Fix some test CHECK lines. llvm-svn: 335414	2018-06-23 06:15:04 +00:00
Craig Topper	2545529034	[X86] Teach disassembler to use %eip instead of %rip when 0x67 prefix is used on a rip-relative address. llvm-svn: 335413	2018-06-23 06:03:48 +00:00
Craig Topper	68d64e3859	[X86][AsmParser] Improve base/index register checks. -Ensure EIP isn't used with an index register. -Ensure EIP isn't used as an index register. -Ensure the base register isn't a vector register. -Ensure eiz/riz usage matches the size of their base register. llvm-svn: 335412	2018-06-23 05:53:00 +00:00
Stanislav Mekhanoshin	d8c9374797	Fix invariant fdiv hoisting in LICM An FDiv is replaced with a multiplication by the reciprocal, and the invariant reciprocal is hoisted out of the loop, while the multiplication remains in the loop even if it is itself invariant. Swap the order of the checks for all-invariant operands and for an invariant denominator only to fix the issue. Differential Revision: https://reviews.llvm.org/D48447 llvm-svn: 335411	2018-06-23 04:01:28 +00:00
Reid Kleckner	f5890e4e43	[IR] Split Intrinsics.inc into enums and implementations Implements PR34259 Intrinsics.h is a very popular header. Most LLVM TUs care about things like dbg_value, but they don't care how they are implemented. After I split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU from scanning 1.7 MB of source that gets pre-processed away. It also means we can modify intrinsic properties without triggering a full rebuild, but that's probably less of a win. I think the next best thing to do would be to split out the target intrinsics into their own header. Very, very few TUs care about target-specific intrinsics. It's very hard to split up the target independent intrinsics like llvm.expect, assume, and dbg.value, though. llvm-svn: 335407	2018-06-23 02:02:38 +00:00
Fangrui Song	4ef42a83f9	[ELF] Change isSectionData to exclude SHF_EXECINSTR Summary: This affects what sections are displayed as "DATA" in llvm-objdump. The other user, llvm-size, is unaffected. Before, a "TEXT" section was also "DATA", which seems weird. The sh_flags condition matches that of bfd's SEC_DATA, but the sh_type condition uses (== SHT_PROGBITS) instead of bfd's (!= SHT_NOBITS). bfd's SEC_DATA is not appealing, as so many sections would be shown as DATA. Reviewers: jyknight, Bigcheese Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48472 llvm-svn: 335405	2018-06-23 00:15:33 +00:00
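The classification described in r335405 can be sketched as two predicates over an ELF section header. This is a Python model of the rules as the commit message states them, not LLVM's C++ `isSectionData`; `is_bfd_style_data` is only an approximation of the bfd rule the message contrasts with, and the section table is a hypothetical example. The `SHT_*`/`SHF_*` values are the standard ELF constants.

```python
# Sketch of the section classification described in r335405 (not LLVM's C++ code).
SHT_PROGBITS = 1
SHT_NOBITS = 8
SHT_INIT_ARRAY = 14
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4

def is_section_data(sh_type, sh_flags):
    """DATA after the change: allocated, non-executable, and of type SHT_PROGBITS."""
    return (sh_type == SHT_PROGBITS
            and bool(sh_flags & SHF_ALLOC)
            and not (sh_flags & SHF_EXECINSTR))

def is_bfd_style_data(sh_type, sh_flags):
    """Approximation of the bfd-style rule: same flags test, but any type except SHT_NOBITS."""
    return (sh_type != SHT_NOBITS
            and bool(sh_flags & SHF_ALLOC)
            and not (sh_flags & SHF_EXECINSTR))

# Hypothetical example sections: name -> (sh_type, sh_flags)
SECTIONS = {
    ".text": (SHT_PROGBITS, SHF_ALLOC | SHF_EXECINSTR),
    ".data": (SHT_PROGBITS, SHF_ALLOC | SHF_WRITE),
    ".rodata": (SHT_PROGBITS, SHF_ALLOC),
    ".bss": (SHT_NOBITS, SHF_ALLOC | SHF_WRITE),
    ".init_array": (SHT_INIT_ARRAY, SHF_ALLOC | SHF_WRITE),
}

if __name__ == "__main__":
    for name, (sh_type, sh_flags) in SECTIONS.items():
        print(f"{name:12} data={is_section_data(sh_type, sh_flags)} "
              f"bfd_style={is_bfd_style_data(sh_type, sh_flags)}")
```

Under this model `.text` is no longer DATA, while a section such as `.init_array` would count as DATA only under the broader bfd-style type test.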
Reid Kleckner	330f65b3e8	[RuntimeDyld] Implement the ELF PIC large code model relocations Prerequisite for https://reviews.llvm.org/D47211 which improves our ELF large PIC codegen. llvm-svn: 335402	2018-06-22 23:53:22 +00:00
Eli Friedman	203eaaf5ba	[LoopReroll] Rewrite induction variable rewriting. This gets rid of a bunch of weird special cases; instead, just use SCEV rewriting for everything. In addition to being simpler, this fixes a bug where we would use the wrong stride in certain edge cases. The one bit I'm not quite sure about is the trip count handling, specifically the FIXME about overflow. In general, I think we need to widen the exit condition, but that's probably not profitable if the new type isn't legal, so we probably need a check somewhere. That said, I don't think I'm making the existing problem any worse. As a followup to this, a bunch of IV-related code in root-finding could be cleaned up; with SCEV-based rewriting, there isn't any reason to assume a loop will have exactly one or two PHI nodes. Differential Revision: https://reviews.llvm.org/D45191 llvm-svn: 335400	2018-06-22 22:58:55 +00:00
Craig Topper	10e2f73793	[X86][AsmParser] Keep track of whether an explicit scale was specified while parsing an address in Intel syntax. Use it for improved error checking. This allows us to check these: -16-bit addressing doesn't support scale so we should error if we find one there. -Multiplying ESP/RSP by a scale even if the scale is 1 should be an error because ESP/RSP can't be an index. llvm-svn: 335398	2018-06-22 22:28:39 +00:00
Sanjay Patel	80b85a46db	[x86] add more tests for bit hacking opportunities with setcc; NFC Missed cases where the input and output are the same size in rL335391. llvm-svn: 335396	2018-06-22 22:07:26 +00:00
Sanjay Patel	0fe8ea568b	[PowerPC] add more tests for bit hacking opportunities with setcc; NFC Missed cases where the input and output are the same size in rL335390. llvm-svn: 335395	2018-06-22 22:06:33 +00:00
Craig Topper	1d707539e4	[X86][AsmParser] In Intel syntax make sure we support ESP/RSP being the second register in memory expressions like [EAX+ESP]. By default, the second register gets assigned to the index register slot. But ESP can't be an index register so we need to swap it with the other register. There's still a slight bug that we allow [EAX+ESP*1]. The existence of the multiply, even though it's with 1, should force ESP to the index register and trigger an error, but it doesn't currently. llvm-svn: 335394	2018-06-22 21:57:24 +00:00
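The base/index rule from r335394, together with the explicit-scale check that r335398 adds, can be sketched as a small function. This is a hypothetical Python model, not LLVM's X86AsmParser: `assign_base_index` and its `explicit_scale` parameter are illustrative names. An unscaled ESP/RSP in the second position is swapped into the base slot, while an explicit scale (even 1) pins it to the index slot, which is an error.

```python
# Hypothetical sketch of the Intel-syntax base/index assignment rule (not LLVM's parser).
STACK_POINTERS = {"esp", "rsp"}

def assign_base_index(regs, explicit_scale=None):
    """Map the registers of an operand such as [EAX+ESP] to (base, index).

    regs: one or two register names in source order.
    explicit_scale: the scale written on the second register (e.g. 1 for ESP*1), or None.
    """
    base = regs[0]
    index = regs[1] if len(regs) > 1 else None
    if index in STACK_POINTERS:
        # An explicit scale forces the register into the index slot, which ESP/RSP cannot use.
        if explicit_scale is not None:
            raise ValueError(f"{index} cannot be used as an index register")
        # Otherwise swap so that the stack pointer becomes the base register.
        base, index = index, base
    if index in STACK_POINTERS:
        # Still a stack pointer after the swap, e.g. [ESP+ESP].
        raise ValueError(f"{index} cannot be used as an index register")
    return base, index
```

For example, `assign_base_index(["eax", "esp"])` returns `("esp", "eax")`, while `assign_base_index(["eax", "esp"], explicit_scale=1)` raises.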
Sanjay Patel	705cde3ac8	[x86] add tests for bit hacking opportunities with setcc; NFC We likely gave up on folding some select-of-constants patterns in IR with rL331486, and we need to recover those in the DAG. The tests without select are based on our current DAGCombiner optimizations for select-of-constants. llvm-svn: 335391	2018-06-22 21:16:54 +00:00
Sanjay Patel	6e505e4388	[PowerPC] add tests for bit hacking opportunities with setcc; NFC We likely gave up on folding some select-of-constants patterns in IR with rL331486, and we need to recover those in the DAG. The tests without select are based on our current DAGCombiner optimizations for select-of-constants. llvm-svn: 335390	2018-06-22 21:16:29 +00:00
Craig Topper	a55cc4a2e9	[X86] Add test cases showing missed select simplification for MCU when icmp is in a slightly different form. These test cases show that the "(select (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) ^ y" doesn't work if the select condition is changed to (and (x, 0x1) != 1) llvm-svn: 335389	2018-06-22 21:09:31 +00:00
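Since `(and x, 0x1)` can only be 0 or 1, the conditions `== 0` and `!= 1` are interchangeable, so the fold quoted in r335389 is valid for both forms. The sketch below checks this exhaustively over i8 values of `y` and `z`. It is a Python model of the arithmetic (i8 values kept in [0, 255]), not the X86 DAG combine itself.

```python
# Exhaustive i8 check of the MCU select fold quoted in r335389 (a model, not the DAG combine):
#   (select (and x, 1) == 0, y, z ^ y)  ->  (-(and x, 1) & z) ^ y
MASK = 0xFF  # i8 values are modeled as Python ints in [0, 255]

def select_form(x, y, z, ne_one=False):
    a = x & 1
    cond = (a != 1) if ne_one else (a == 0)
    return y if cond else (z ^ y)

def folded_form(x, y, z):
    a = x & 1
    neg_a = -a & MASK  # 0x00 when a == 0, 0xFF when a == 1
    return (neg_a & z) ^ y

def fold_holds(ne_one):
    for x in range(4):  # bit 0 drives the select; bit 1 shows higher bits are ignored
        for y in range(256):
            for z in range(256):
                if select_form(x, y, z, ne_one) != folded_form(x, y, z):
                    return False
    return True

if __name__ == "__main__":
    print("== 0 form:", fold_holds(ne_one=False))
    print("!= 1 form:", fold_holds(ne_one=True))
```

Both forms should report True, which is why the missed simplification in these tests is a matching gap rather than a correctness limit.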
Aditya Nandakumar	e2a7f31064	[GISel]: Add G_ADDRSPACE_CAST Opcode Added IRTranslator support for addrspacecast. https://reviews.llvm.org/D48469 reviewed by: volkan llvm-svn: 335388	2018-06-22 20:58:51 +00:00
Craig Topper	9bc2c059c3	[X86] Don't accept (%si,%bp) 16-bit address expressions. The second register is the index register and should only be %si or %di if used with a base register. And in that case the base register should be %bp or %bx. This makes us compatible with gas. We do still need to support both orders with Intel syntax which uses [bp+si] and [si+bp] llvm-svn: 335384	2018-06-22 20:20:38 +00:00
Craig Topper	c26c62e0e5	[X86][AsmParser] Allow (%bp,%si) and (%bp,%di) to be encoded without using a zero displacement. (%bp) can't be encoded without a displacement. The encoding is instead used for displacement alone. So a 1 byte displacement of 0 must be used. But if there is an index register we can encode without a displacement. llvm-svn: 335379	2018-06-22 19:42:21 +00:00
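The 16-bit addressing rules from r335384 and r335379 follow from the 16-bit ModR/M table: only `%bx`/`%bp` may be the base and `%si`/`%di` the index, and `mod=00, rm=110` means "16-bit displacement only", so `(%bp)` alone needs `mod=01` with a zero disp8 while `(%bp,%si)` does not. The sketch below is a hypothetical Python encoder (`encode_mem16` is an illustrative name, not LLVM's X86MCCodeEmitter); it assumes AT&T operand order and does not range-check 16-bit displacements.

```python
import struct

# rm field values for 16-bit addressing, keyed by (base, index) in AT&T order.
RM16 = {
    ("bx", "si"): 0b000,
    ("bx", "di"): 0b001,
    ("bp", "si"): 0b010,
    ("bp", "di"): 0b011,
    ("si", None): 0b100,
    ("di", None): 0b101,
    ("bp", None): 0b110,  # with mod=00 this rm value means "disp16 only"
    ("bx", None): 0b111,
}

def encode_mem16(base, index=None, disp=0, reg=0):
    """Return the ModR/M byte plus displacement bytes for disp(base,index)."""
    if base is None and index is None:
        # Absolute address: mod=00, rm=110, followed by a 16-bit displacement.
        return bytes([(reg << 3) | 0b110]) + struct.pack("<H", disp & 0xFFFF)
    key = (base, index)
    if key not in RM16:
        # e.g. (%si,%bp): the index must be %si/%di and the base %bx/%bp.
        raise ValueError(f"invalid 16-bit address: ({base},{index})")
    rm = RM16[key]
    if disp == 0 and key != ("bp", None):
        mod, disp_bytes = 0b00, b""  # no displacement needed
    elif -128 <= disp <= 127:
        mod, disp_bytes = 0b01, struct.pack("<b", disp)  # (%bp) lands here with disp8 = 0
    else:
        mod, disp_bytes = 0b10, struct.pack("<H", disp & 0xFFFF)
    return bytes([(mod << 6) | (reg << 3) | rm]) + disp_bytes
```

With this model, `encode_mem16("bp")` yields `46 00` (a zero disp8 is required), while `encode_mem16("bp", "si")` yields just `02`.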
Simon Pilgrim	938dbe664b	[X86][SSE] Add sdiv by (nonuniform) minus one tests (PR37119) Test cases from D45806 llvm-svn: 335376	2018-06-22 18:31:57 +00:00
Craig Topper	cd18bb523c	[X86][AsmParser] Check for invalid 16-bit base register in Intel syntax. llvm-svn: 335373	2018-06-22 17:50:40 +00:00
Craig Topper	22d1db122a	[X86] Don't allow ESP/RSP to be used as an index register in assembly. Fixes PR37892 llvm-svn: 335370	2018-06-22 17:15:58 +00:00
Easwaran Raman	f997233890	[X86] Add a test to show missed opportunity to generate vfnmadd llvm-svn: 335367	2018-06-22 17:01:13 +00:00

54411 Commits