llvm-project

Commit Graph

Author	SHA1	Message	Date
Teresa Johnson	76b5b7493c	Handle link of NoDebug CU with a CU that has debug emission enabled Summary: This is an issue both with regular and Thin LTO. When we link together a DICompileUnit that is marked NoDebug (e.g when compiling with -g0 but applying an AutoFDO profile, which requires location tracking in the compiler) and a DICompileUnit with debug emission enabled, we can have failures during dwarf debug generation. Specifically, when we have inlined from the NoDebug compile unit into the debug compile unit, we can fail during construction of the abstract and inlined scope DIEs. This is because the SPMap does not include NoDebug CUs (they are skipped in the debug_compile_units_iterator). This patch fixes the failures by skipping locations from NoDebug CUs when extracting lexical scopes. Reviewers: dblaikie, aprantl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D29765 llvm-svn: 295384	2017-02-17 00:21:19 +00:00
Wei Mi	493fb266ed	[LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops other than current loop. This is good for the case when induction variable of outerloop being used in expr in innerloop. But it is very bad to allow such Reg from sibling loop because we may need to add lsr.iv in other sibling loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase below, one loop can be inserted with a bunch of lsr.iv because of LSR for other loops. // The induction variable j from a loop in the middle will have initial // value generated from previous sibling loop and exit value used by its // next sibling loop. void goo(long i, long j); long cond; void foo(long N) { long i = 0; long j = 0; i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); } The fix is to only allow formula with SCEVAddRecExpr type of Reg from current loop or its parents. Differential Revision: https://reviews.llvm.org/D30021 llvm-svn: 295378	2017-02-16 21:27:31 +00:00
Krzysztof Parzyszek	fb9503c080	[Hexagon] Start using regmasks on calls All the cool targets are doing it... llvm-svn: 295371	2017-02-16 20:25:23 +00:00
Simon Pilgrim	e5215751ff	[X86][SSE] Add PR31309 test case (load-extend i32 to i128). llvm-svn: 295363	2017-02-16 19:17:36 +00:00
Matt Arsenault	b95ddd7cea	AMDGPU: Remove llvm.AMDGPU.cube intrinsic llvm-svn: 295359	2017-02-16 19:09:04 +00:00
Matt Arsenault	eb65cda986	AMDGPU: Remove llvm.AMDGPU.rsq intrinsic llvm-svn: 295358	2017-02-16 19:08:58 +00:00
Hans Wennborg	35905d6a67	Re-apply r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)" The original commit was reverted in r283329 due to a miscompile in Chromium. That turned out to be the same issue as PR31257, which was fixed in r295262. llvm-svn: 295357	2017-02-16 19:04:42 +00:00
David Blaikie	b2fbb4b276	Refactor DebugHandlerBase a bit to common non-debug-having-function filtering llvm-svn: 295354	2017-02-16 18:48:33 +00:00
Matt Arsenault	920576042d	InstCombine: Canonicalize fast fmuladd to fmul + fadd llvm-svn: 295353	2017-02-16 18:46:24 +00:00
Andrea Di Biagio	42f7712e23	x86 interrupt calling convention: only save xmm registers if the target supports SSE The existing code always saves the xmm registers for 64-bit targets even if the target doesn't support SSE (which is common for kernels). Thus, the compiler inserts movaps instructions which lead to CPU exceptions when an interrupt handler is invoked. This commit fixes this bug by returning a register set without xmm registers from getCalleeSavedRegs and getCallPreservedMask for such targets. Patch by Philipp Oppermann. Differential Revision: https://reviews.llvm.org/D29959 llvm-svn: 295347	2017-02-16 18:25:37 +00:00
Sanjay Patel	8e55b685c2	[x86] add more tests of select of constants; NFC llvm-svn: 295346	2017-02-16 18:15:16 +00:00
Artur Pilipenko	85d758299e	[DAGCombiner] Support {a\|s}ext, {a\|z\|s}ext load nodes in load combine Resubmit -r295314 with PowerPC and AMDGPU tests updated. Support {a\|s}ext, {a\|z\|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295336	2017-02-16 17:07:27 +00:00
Sjoerd Meijer	cb2d950214	[AArch64] AArch64AsmParser clean up of isImmediate functions. NFC Regression test neon-diagnostics.s needed changing because it now produces a more specific diagnostic about the immediate ranges. One change in the expected error message is not obvious, but there multiple candidate and it happens to pick the immediate diagnostic. Differential Revision: https://reviews.llvm.org/D29939 llvm-svn: 295331	2017-02-16 15:52:22 +00:00
Diana Picus	1540b06ef8	[ARM] GlobalISel: Select floating point loads llvm-svn: 295321	2017-02-16 14:10:50 +00:00
Artur Pilipenko	a1b384c4ce	Rever -r295314 "[DAGCombiner] Support {a\|s}ext, {a\|z\|s}ext load nodes in load combine" This change causes some of AMDGPU and PowerPC tests to fail. llvm-svn: 295316	2017-02-16 13:04:46 +00:00
Artur Pilipenko	daaa0c0f7d	[DAGCombiner] Support {a\|s}ext, {a\|z\|s}ext load nodes in load combine Support {a\|s}ext, {a\|z\|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295314	2017-02-16 12:53:26 +00:00
Diana Picus	b1701e0b05	[ARM] GlobalISel: Select G_SEQUENCE and G_EXTRACT Since they're only used for passing around double precision floating point values into the general purpose registers, we'll lower them to VMOVDRR and VMOVRRD. llvm-svn: 295310	2017-02-16 12:19:57 +00:00
Diana Picus	6beef3c087	[ARM] GlobalISel: Select double G_FADD and copies Just use VADDD if available, bail out if not. llvm-svn: 295309	2017-02-16 12:19:52 +00:00
Diana Picus	9b32faa821	[ARM] GlobalISel: Assert that we don't use the FPR bank if we don't have VFP llvm-svn: 295308	2017-02-16 11:25:09 +00:00
Diana Picus	a93803b9fe	[ARM] GlobalISel: Add reg bank mappings for G_SEQUENCE and G_EXTRACT Support G_SEQUENCE and G_EXTRACT as needed for passing double precision floating point values in the soft-fp float mode. llvm-svn: 295306	2017-02-16 11:00:31 +00:00
Diana Picus	7f82c87022	[ARM] GlobalISel: Make the FPR bank 64-bit wide Also add mappings for single and double precision FP, and use them for G_FADD and G_LOAD. llvm-svn: 295302	2017-02-16 10:12:49 +00:00
Diana Picus	21c3d8e0fc	[ARM] GlobalISel: Legalize 64-bit G_FADD and G_LOAD For now we just mark them as legal all the time and let the other passes bail out if they can't handle it. In the future, we'll want to move more of the brains into the legalizer. llvm-svn: 295300	2017-02-16 09:09:49 +00:00
Diana Picus	ca6a890d7f	[ARM] GlobalISel: Lower double precision FP args For the hard float calling convention, we just use the D registers. For the soft-fp calling convention, we use the R registers and move values to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we make sure to honor the endianness of the target, since the CCAssignFn doesn't do that for us. For pure soft float targets, we still bail out because we don't support the libcalls yet. llvm-svn: 295295	2017-02-16 07:53:07 +00:00
Craig Topper	3731f4d173	[AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit. llvm-svn: 295294	2017-02-16 07:35:23 +00:00
Craig Topper	715873ead3	[AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked intrinsics with select instructions. For 512-bit add new unmasked intrinsics. The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO. llvm-svn: 295290	2017-02-16 06:31:54 +00:00
Matt Arsenault	d3e5cb77e4	AMDGPU: Remove llvm.SI.sendmsg llvm-svn: 295270	2017-02-16 02:01:17 +00:00
Matt Arsenault	d2c8a337aa	AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics Update test uses with expansion in terms of new intrinsics. llvm-svn: 295269	2017-02-16 02:01:13 +00:00
Hans Wennborg	a468601e0e	[X86] Re-enable conditional tail calls and fix PR31257. This reverts r294348, which removed support for conditional tail calls due to the PR above. It fixes the PR by marking live registers as implicitly used and defined by the now predicated tailcall. This is similar to how IfConversion predicates instructions. Differential Revision: https://reviews.llvm.org/D29856 llvm-svn: 295262	2017-02-16 00:04:05 +00:00
Tim Northover	9136617a3f	GlobalISel: legalize va_arg on AArch64. Uses a Custom implementation because the slot sizes being a multiple of the pointer size isn't really universal, even for the architectures that do have a simple "void *" va_list. llvm-svn: 295255	2017-02-15 23:22:50 +00:00
Tim Northover	4a652227dd	GlobalISel: support translating va_arg Since (say) i128 and [16 x i8] map to the same type in generic MIR, we also need to attach the required alignment info. llvm-svn: 295254	2017-02-15 23:22:33 +00:00
Daniel Berlin	3c1432fecf	Implement intrinsic mangling for literal struct types. Fixes PR 31921 Summary: Predicateinfo requires an ugly workaround to try to avoid literal struct types due to the intrinsic mangling not being implemented. This workaround actually does not work in all cases (you can hit the assert by bootstrapping with -print-predicateinfo), and can't be made to work without DFS'ing the type (IE copying getMangledStr and using a version that detects if it would crash). Rather than do that, i just implemented the mangling. It seems simple, since they are unified structurally. Looking at the overloaded-mangling testcase we have, it actually turns out the gc intrinsics will also crash if you try to use a literal struct. Thus, the testcase added fails before this patch, and works after, without needing to resort to predicateinfo. Reviewers: chandlerc, davide Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D29925 llvm-svn: 295253	2017-02-15 23:16:20 +00:00
Matt Arsenault	a78ca62c64	AMDGPU: Consolidate sendmsg/sendmsghalt handling and tests llvm-svn: 295244	2017-02-15 22:17:09 +00:00
Peter Collingbourne	50cbd7cc90	Re-apply r295110 and r295144 with a fix for the ASan issue. llvm-svn: 295241	2017-02-15 21:56:51 +00:00
Matt Arsenault	d122abead4	AMDGPU: Replace assert with report_fatal_error Also use a more refined condition. llvm-svn: 295239	2017-02-15 21:50:34 +00:00
Peter Collingbourne	9421c2dc54	AssumptionCache: Disable the verifier by default, move it behind a hidden cl::opt and verify from releaseMemory(). This is a short term solution to the problem that many passes currently fail to update the assumption cache. In the long term the verifier should not be controllable with a flag. We should either fix all passes to correctly update the assumption cache and enable the verifier unconditionally or somehow arrange for the assumption list to be updated automatically by passes. Differential Revision: https://reviews.llvm.org/D30003 llvm-svn: 295236	2017-02-15 21:10:09 +00:00
Arnold Schwaighofer	8d61e0030a	AddressSanitizer: don't track swifterror memory addresses They are register promoted by ISel and so it makes no sense to treat them as memory. Inserting calls to the thread sanitizer would also generate invalid IR. You would hit: "swifterror value can only be loaded and stored from, or as a swifterror argument!" llvm-svn: 295230	2017-02-15 20:43:43 +00:00
Tobias Edler von Koch	f454b9eadf	[LTO] Add ability to emit assembly to new LTO API Summary: Add a field to LTO::Config, CGFileType, to select the file type to emit (object or assembly). This is useful for testing and to implement -save-temps. Reviewers: tejohnson, mehdi_amini, pcc Reviewed By: mehdi_amini Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D29475 llvm-svn: 295226	2017-02-15 20:36:36 +00:00
Kyle Butt	7fbec9bdf1	Codegen: Make chains from trellis-shaped CFGs Lay out trellis-shaped CFGs optimally. A trellis of the shape below: A B \|\ /\| \| \ / \| \| X \| \| / \ \| \|/ \\| C D would be laid out A; B->C ; D by the current layout algorithm. Now we identify trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an increasing number of predecessors. A trellis is a a group of 2 or more predecessor blocks that all have the same successors. because of this we can tail duplicate to extend existing trellises. As an example consider the following CFG: B D F H / \ / \ / \ / \ A---C---E---G---Ret Where A,C,E,G are all small (Currently 2 instructions). The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret. The current code will copy C into B, E into D and G into F and yield the layout A,C,B(C),E,D(E),F(G),G,H,ret define void @straight_test(i32 %tag) { entry: br label %test1 test1: ; A %tagbit1 = and i32 %tag, 1 %tagbit1eq0 = icmp eq i32 %tagbit1, 0 br i1 %tagbit1eq0, label %test2, label %optional1 optional1: ; B call void @a() br label %test2 test2: ; C %tagbit2 = and i32 %tag, 2 %tagbit2eq0 = icmp eq i32 %tagbit2, 0 br i1 %tagbit2eq0, label %test3, label %optional2 optional2: ; D call void @b() br label %test3 test3: ; E %tagbit3 = and i32 %tag, 4 %tagbit3eq0 = icmp eq i32 %tagbit3, 0 br i1 %tagbit3eq0, label %test4, label %optional3 optional3: ; F call void @c() br label %test4 test4: ; G %tagbit4 = and i32 %tag, 8 %tagbit4eq0 = icmp eq i32 %tagbit4, 0 br i1 %tagbit4eq0, label %exit, label %optional4 optional4: ; H call void @d() br label %exit exit: ret void } here is the layout after D27742: straight_test: # @straight_test ; ... Prologue elided ; BB#0: # %entry ; A (merged with test1) ; ... More prologue elided mr 30, 3 andi. 3, 30, 1 bc 12, 1, .LBB0_2 ; BB#1: # %test2 ; C rlwinm. 3, 30, 0, 30, 30 beq 0, .LBB0_3 b .LBB0_4 .LBB0_2: # %optional1 ; B (copy of C) bl a nop rlwinm. 3, 30, 0, 30, 30 bne 0, .LBB0_4 .LBB0_3: # %test3 ; E rlwinm. 3, 30, 0, 29, 29 beq 0, .LBB0_5 b .LBB0_6 .LBB0_4: # %optional2 ; D (copy of E) bl b nop rlwinm. 3, 30, 0, 29, 29 bne 0, .LBB0_6 .LBB0_5: # %test4 ; G rlwinm. 3, 30, 0, 28, 28 beq 0, .LBB0_8 b .LBB0_7 .LBB0_6: # %optional3 ; F (copy of G) bl c nop rlwinm. 3, 30, 0, 28, 28 beq 0, .LBB0_8 .LBB0_7: # %optional4 ; H bl d nop .LBB0_8: # %exit ; Ret ld 30, 96(1) # 8-byte Folded Reload addi 1, 1, 112 ld 0, 16(1) mtlr 0 blr The tail-duplication has produced some benefit, but it has also produced a trellis which is not laid out optimally. With this patch, we improve the layouts of such trellises, and decrease the cost calculation for tail-duplication accordingly. This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have back edges, which is a negative, but it has a bigger compensating positive, which is that it handles the case where there are long strings of skipped blocks much better than the original layout. Both layouts handle runs of executed blocks equally well. Branch prediction also improves if there is any correlation between subsequent optional blocks. Here is the resulting concrete layout: straight_test: # @straight_test ; BB#0: # %entry ; A (merged with test1) mr 30, 3 andi. 3, 30, 1 bc 12, 1, .LBB0_4 ; BB#1: # %test2 ; C rlwinm. 3, 30, 0, 30, 30 bne 0, .LBB0_5 .LBB0_2: # %test3 ; E rlwinm. 3, 30, 0, 29, 29 bne 0, .LBB0_6 .LBB0_3: # %test4 ; G rlwinm. 3, 30, 0, 28, 28 bne 0, .LBB0_7 b .LBB0_8 .LBB0_4: # %optional1 ; B (Copy of C) bl a nop rlwinm. 3, 30, 0, 30, 30 beq 0, .LBB0_2 .LBB0_5: # %optional2 ; D (Copy of E) bl b nop rlwinm. 3, 30, 0, 29, 29 beq 0, .LBB0_3 .LBB0_6: # %optional3 ; F (Copy of G) bl c nop rlwinm. 3, 30, 0, 28, 28 beq 0, .LBB0_8 .LBB0_7: # %optional4 ; H bl d nop .LBB0_8: # %exit Differential Revision: https://reviews.llvm.org/D28522 llvm-svn: 295223	2017-02-15 19:49:14 +00:00
Arnold Schwaighofer	8eb1a48540	ThreadSanitizer: don't track swifterror memory addresses They are register promoted by ISel and so it makes no sense to treat them as memory. Inserting calls to the thread sanitizer would also generate invalid IR. You would hit: "swifterror value can only be loaded and stored from, or as a swifterror argument!" llvm-svn: 295215	2017-02-15 18:57:06 +00:00
Michael Kuperstein	ba80db39d7	[DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source We currently can't legalize those, but we should really not be creating them in the first place, since legalization would probably look similar to the way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD. This fixes PR311956. Differential Revision: https://reviews.llvm.org/D29961 llvm-svn: 295213	2017-02-15 18:37:26 +00:00
Sanjay Patel	056218644b	[Inline] add tests to show attribute information loss; NFC llvm-svn: 295209	2017-02-15 17:42:58 +00:00
Simon Pilgrim	da25d5c7b6	[X86][SSE] Propagate undef upper elements from scalar_to_vector during shuffle combining Only do this for integer types currently - floats types (in particular insertps) load folding often fails with this. llvm-svn: 295208	2017-02-15 17:41:33 +00:00
Stanislav Mekhanoshin	582a5237f9	[AMDGPU] Revert failed scheduling This patch reverts region's scheduling to the original untouched state in case if we have have decreased occupancy. In addition it switches to use TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206	2017-02-15 17:19:50 +00:00
Anna Thomas	94c8d4976c	Revert "[JumpThreading] Thread through guards" This reverts commit r294617. We fail on an assert while trying to get a condition from an unconditional branch. llvm-svn: 295200	2017-02-15 17:08:29 +00:00
Simon Pilgrim	d811bdd61a	[X86] Regenerate scalar stack reload test llvm-svn: 295195	2017-02-15 16:48:45 +00:00
Simon Pilgrim	a0e56d2d68	[X86] Regenerate i64 ext-load on 32-bit target tests llvm-svn: 295177	2017-02-15 14:06:17 +00:00
Simon Pilgrim	0f0e5bd3c6	[X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise ZERO inputs Add support for specifying an UNPCK input as ZERO, particularly improves ZEXT cases with non-zero offsets llvm-svn: 295169	2017-02-15 11:46:15 +00:00
Sagar Thakur	ec65792910	[LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit. Reviewed by sdardis, dberris Differential: D27697 llvm-svn: 295164	2017-02-15 10:48:11 +00:00
Daniel Jasper	eef9b03395	Revert r295110 and r295144. This fails under ASAN: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio llvm-svn: 295162	2017-02-15 09:56:08 +00:00
Craig Topper	fbc7805e25	[X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types Summary: We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs. As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast. I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512-bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable. This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect but was enough to fix a few test failures that were being caused. Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases were we don't remove a useless insert into element 1 before broadcasting element 0. Reviewers: delena, RKSimon, zvi Reviewed By: zvi Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D28747 llvm-svn: 295155	2017-02-15 06:58:47 +00:00

1 2 3 4 5 ...

42893 Commits