llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	2911b3a07a	[DA] Fix direction vectors for weakZeroSrcSIV Both weakZeroSrcSIV and weakZeroDstSIV are currently giving the same direction vectors. Fix weakZeroSrcSIVtest by flipping the directions it gives. Differential Revision: https://reviews.llvm.org/D46678 llvm-svn: 333658	2018-05-31 14:55:29 +00:00
Clement Courbet	2e41c5a79c	[X86] Introduce WriteFLDC for x87 constant loads. Summary: {FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded. - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form InstRWs. - For SLM and BtVer2, I've guessed some values :( Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47585 llvm-svn: 333656	2018-05-31 14:22:01 +00:00
Simon Dardis	d9a453832d	[mips] Guard all short instructions correctly. Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47533 llvm-svn: 333645	2018-05-31 12:47:01 +00:00
Alexandros Lamprineas	61f0ba1fcc	[InstCombine, ARM] Convert vld1 to llvm load Convert a vector load intrinsic into an llvm load instruction. This is beneficial when the underlying object being addressed comes from a constant, since we get constant-folding for free. Differential Revision: https://reviews.llvm.org/D46273 llvm-svn: 333643	2018-05-31 12:19:18 +00:00
Clement Courbet	b78ab5097d	[X86] Extract latency of fldz/fld1 in separate classes. Summary: - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form `InstRW`s. - For SLM and BtVer2, values are from Agner. This is split off from https://reviews.llvm.org/D47377 Reviewers: RKSimon, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47523 llvm-svn: 333642	2018-05-31 11:41:27 +00:00
Simon Pilgrim	346886bc0d	[X86][SSE] Add support for detecting SUB(SPLAT_BV, SPLAT) cases for shift-rotate patterns. This improves splat rotations (rotation by an uniform value), to avoid having to use the generic non-uniform shift code (extension to PR37426). llvm-svn: 333641	2018-05-31 11:25:16 +00:00
Pavel Labath	59870af66f	DWARFAcceleratorTable: fix equal_range iterators Summary: Both (Apple and DWARF5) implementations of the iterators had bugs which resulted in crashes if one attempted to iterate through the accelerator tables all the way. For the Apple tables, the issue was that we did not clear the DataOffset field when we reached the end, which made our iterator compare unequal to the "end" iterator. For the Dwarf5 tables, the problem was that we incremented the CurrentIndex pointer and then used the incremented (possibly invalid) pointer to check whether we have reached the end of the index list. The reason these bugs went undetected is because their only user (dwarfdump) only ever searched for the first match. Besides allowing us to test this fix, changing llvm-dwarfdump --find to display all matches seems like a good improvement (it makes the behavior consistent with the --name option), so I change llvm-dwarfdump to do that. The existing tests would be sufficient to test this fix with the new llvm-dwarfdump behavior, but I add a special test that demonstrates that the tool indeed displays multiple results. The find.test test needed to be tweaked a bit as the tool now does not print the ".debug_info contents" header (also consistent with how --name works). Reviewers: JDevlieghere, aprantl, dblaikie Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D47543 llvm-svn: 333635	2018-05-31 08:47:00 +00:00
Luke Geeson	2e09995d42	[AArch64] Reverted rL333427 fixing Clang UnitTest Failure llvm-svn: 333634	2018-05-31 08:27:53 +00:00
Roman Lebedev	c0ecd06428	Revert rL333106 / D46814: [InstCombine] Fold unfolded masked merge pattern with variable mask! In post-commit review, Eric Christopher notes that many new MSan warnings are being observed with this patch. The probable reason is: if 'y' is undef here and we could evaluate it twice and get different results. We can't increase the number of uses of a value. llvm-svn: 333631	2018-05-31 06:00:36 +00:00
Jan Vesely	f5016b79a6	AMDGPU/R600: Make sure functions are cacheline aligned v2: use "ensureAlignment" make functions cache line aligned Fixes GPU hangs since r333219: "AMDGPU: Split R600 AsmPrinter code into its own class" Differential Revision: https://reviews.llvm.org/D47516 llvm-svn: 333622	2018-05-31 04:08:08 +00:00
Roman Tereshin	5a65eb75c7	[GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...) Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333618	2018-05-31 01:56:05 +00:00
Sanjay Patel	e5bc441791	[InstCombine] don't change the size of a select if it would mismatch its condition operands' sizes Don't always: cast (select (cmp x, y), z, C) --> select (cmp x, y), (cast z), C' This is something that came up as far back as D26556, and I lost track of it. I suspect that this transform is part of the underlying problem that is inspiring some of the recent proposals that seek to match larger patterns that include a cast op. Even if that's not true, this transform causes problems for codegen (particularly with vector types). A transform to actively match the size of cmp and select operand sizes should follow. This patch just removes the harmful canonicalization in the other direction. Differential Revision: https://reviews.llvm.org/D47163 llvm-svn: 333611	2018-05-31 00:16:58 +00:00
Sanjay Patel	ceb595b04e	[InstCombine] don't negate constant expression with fsub (PR37605) X + (-C) would be transformed back into X - C, so infinite loop: https://bugs.llvm.org/show_bug.cgi?id=37605 llvm-svn: 333610	2018-05-30 23:55:12 +00:00
Vedant Kumar	e3c1fb8b12	[llvm-cov] Use the new PrintHTMLEscaped utility This removes some duplicate logic to escape characters in HTML output. llvm-svn: 333608	2018-05-30 23:35:14 +00:00
Vlad Tsyrklevich	178fdb1a3b	[LowerTypeTests] Discard extern_weak linkage for definitions Summary: Fix PR37625. It's possible for an extern_weak declaration to be emitted to the merged module when a definition exists in the ThinLTO portion of the build; discard the linkage on the declaration in that case. (otherwise we copy the linkage to the alias to the jumptable and fail) Reviewers: pcc Reviewed By: pcc Subscribers: mehdi_amini, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D47494 llvm-svn: 333604	2018-05-30 22:39:52 +00:00
Roman Tereshin	8f1753e994	[GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs This is to make it clear what kind of bugs the LegalizerInfo::verifier is able to catch and test its output Reviewers: aemerson, qcolombet Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D46338 llvm-svn: 333597	2018-05-30 22:10:04 +00:00
Peter Collingbourne	1651ac13be	llvm-objcopy: Set sh_link to 0 on unrecognized symtab-linked sections. Per discussion on the generic-abi mailing list: https://groups.google.com/forum/#!topic/generic-abi/MPr8TVtnVn4 An object file manipulation tool must either write out a symbol table with the same number of entries as the original symbol table and in the same order, or if this is impossible, refuse to operate on the object file if it has unrecognized sections that are linked to the symtab section. However, existing tools (namely GNU strip, GNU objcopy and ld.{bfd,gold,lld} -r) do not comply with this at present: they change symbol table indexes and set sh_link to 0 on the unrecognized symtab-linked sections. We intend to use the latter as a (temporary) signal that a tool has operated on a proposed new symtab-linked section and invalidated the symbol table indexes. However, llvm-objcopy currently keeps sh_link pointing to the new symtab section. This patch changes llvm-objcopy to set sh_link to 0 to match the behaviour of the other tools. Differential Revision: https://reviews.llvm.org/D47404 llvm-svn: 333581	2018-05-30 19:30:39 +00:00
Galina Kistanova	df917811ca	Reverted r333424 as it broke multiple build bots and left unfixed for a long time llvm-svn: 333578	2018-05-30 18:51:08 +00:00
Craig Topper	2fbbffa4bf	[X86] Update the fast-isel tests for _mm_rcp_ss, _mm_rsqrt_ss, and _mm_sqrt_ss to match clang codegen after r333572. llvm-svn: 333573	2018-05-30 18:30:44 +00:00
Jonas Devlieghere	f4ce54a123	[dsymutil] Escape HTML special characters in plist. When printing string in the Plist, we weren't escaping the characters which lead to invalid XML. This patch adds the escape logic to StringExtras. rdar://39785334 llvm-svn: 333565	2018-05-30 17:47:11 +00:00
Simon Pilgrim	3173f73554	[X86][AVX512BW] Fixed check prefix copy+paste typo in avx512bw-intrinsics.ll Prefix was for AVX512F instead of AVX512BW llvm-svn: 333560	2018-05-30 16:29:06 +00:00
Matt Arsenault	7b4826e6ce	AMDGPU: Use better alignment for kernarg lowering This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. llvm-svn: 333558	2018-05-30 16:17:51 +00:00
Karl-Johan Karlsson	ebaaa2ddae	[ValueTracking] Fix endless recursion in isKnownNonZero() Summary: The isKnownNonZero() function have checks that abort the recursion when it reaches the specified max depth. However one of the recursive calls was placed before the max depth check was done, resulting in a endless recursion that eventually triggered a segmentation fault. Fixed the problem by moving the max depth check above the first recursive call. Reviewers: Prazek, nlopes, spatel, craig.topper, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, bjope, llvm-commits Differential Revision: https://reviews.llvm.org/D47531 llvm-svn: 333557	2018-05-30 15:56:46 +00:00
Mark Searles	1054541490	[AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence for a loop. Previously, that kicked if greater than 2 passes over a loop, which doesn't account for loop with many bottom blocks. So, increase the threshold to (n+1), where n is the number of bottom blocks. This gives the pass an opportunity to consider the contribution of each bottom block, to the overall loop, before the forced convergence potentially kicks in. Differential Revision: https://reviews.llvm.org/D47488 llvm-svn: 333556	2018-05-30 15:47:45 +00:00
Gabor Buella	890e363e11	[X86] Lowering FMA intrinsics to native IR (LLVM part) Support for Clang lowering of fused intrinsics. This patch: 1. Removes bindings to clang fma intrinsics. 2. Introduces new LLVM unmasked intrinsics with rounding mode: int_x86_avx512_vfmadd_pd_512 int_x86_avx512_vfmadd_ps_512 int_x86_avx512_vfmaddsub_pd_512 int_x86_avx512_vfmaddsub_ps_512 supported with a new intrinsic type (INTR_TYPE_3OP_RM). 3. Introduces new x86 fmaddsub/fmsubadd folding. 4. Introduces new tests for code emitted by sequentions introduced in Clang part. Patch by tkrupa Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D47443 llvm-svn: 333554	2018-05-30 15:25:16 +00:00
Daniel Neilson	6b23fb764e	[AliasSet] Teach the alias set how to handle atomic memcpy/memmove/memset Summary: The atomic variants of the memcpy/memmove/memset intrinsics can be treated the same was as the regular forms, with respect to aliasing. Update the AliasSetTracker to treat the atomic forms the same was as the regular forms. llvm-svn: 333551	2018-05-30 14:43:39 +00:00
Alexandros Lamprineas	52457d33b2	[InstCombine, ARM, AArch64] Convert table lookup to shuffle vector Turning a table lookup intrinsic into a shuffle vector instruction can be beneficial. If the mask used for the lookup is the constant vector {7,6,5,4,3,2,1,0}, then the back-end generates byte reverse instructions instead. Differential Revision: https://reviews.llvm.org/D46133 llvm-svn: 333550	2018-05-30 14:38:50 +00:00
Simon Pilgrim	8df8b129ce	[X86][AVX512] Replace -cpu=knl with -mattr=+avx512f for avx512-intrinsics tests It was noticed on D47377 that these tests were being unnecessarily affected by scheduler changes. This adds vzeroupper at the end of some tests as we lose the 'FeatureFastPartialYMMorZMMWrite' feature from KNL, since Skylake+ don't support this its probably better. llvm-svn: 333549	2018-05-30 14:36:41 +00:00
Simon Pilgrim	173e225f1c	[X86][SSE] Remove unnecessary -cpu from sttni tests It was noticed on D47377 that these tests (for PR37246) were being unnecessarily affected by scheduler changes. llvm-svn: 333546	2018-05-30 14:11:57 +00:00
Simon Pilgrim	61b859dca7	[X86][SSE] Replace -cpu with equivalent -mattr for vec_cast tests It was noticed on D47377 that these tests were being unnecessarily affected by scheduler changes. llvm-svn: 333545	2018-05-30 14:01:21 +00:00
Krzysztof Parzyszek	8987174627	[Hexagon] Use vector align-left when shift amount fits in 3 bits This saves an instruction because for align-right the shift amount would need to be put in a register first. llvm-svn: 333543	2018-05-30 13:45:34 +00:00
Simon Dardis	39710e3555	[mips] Correct the predicates of arithmetic and logic instructions. As part of this effort, duplicate and correct the predicates of some aliases. Also disable code generation of some short form instructions for FastISel, as it would otherwise reject them. Reviewers: atanasyan, abeserminji, smaksimovic Differential Revision: https://reviews.llvm.org/D47075 llvm-svn: 333530	2018-05-30 11:33:35 +00:00
Tim Northover	d8949f5002	AArch64: print correct annotation for ADRP addresses. The immediate on an ADRP MCInst needs to be multiplied by 0x1000 to obtain the actual PC-offset that will be calculated. llvm-svn: 333525	2018-05-30 09:54:59 +00:00
Sander de Smalen	bdf09fe7a2	[AArch64][AsmParser] Fix segfault on illegal fpimm. Floating point immediate combining a negative sign and a hexadecimal number, e.g. #-0x0 caused the compiler to crash. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: javed.absar Differential Revision: https://reviews.llvm.org/D47483 llvm-svn: 333524	2018-05-30 09:54:19 +00:00
Daniel Cederman	60e6ce4155	[Sparc] Select correct register class for FP register constraints Summary: The fX version of floating-point registers only supports single precision. We need to map the name to dX for doubles and qX for long doubles if we want getRegForInlineAsmConstraint() to be able to pick the correct register class. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47258 llvm-svn: 333512	2018-05-30 06:07:55 +00:00
Hiroshi Inoue	3872c6c633	[PowerPC] fix broken JIT-compiled code with tail call optimization The relocation for branch instructions in the dynamic loader of ExecutionEngine assumes branch instructions with R_PPC64_REL24 relocation type are only bl. However, with the tail call optimization, b instructions can be also used to jump into another function. This patch makes the relocation to keep bits in the branch instruction other than the jump offset to avoid relocation rewrites a b instruction into bl. Differential Revision: https://reviews.llvm.org/D47456 llvm-svn: 333502	2018-05-30 04:48:29 +00:00
Sam Clegg	e1076e5a77	Fix use of `echo` command in test script On win32 we use lit's executeBuiltinEcho to implement the echo command and this version only currently supports flags that are separate. llvm-svn: 333495	2018-05-30 03:26:28 +00:00
Sam Clegg	105bdc2557	[WebAssembly] MC: Add compile-twice test and fix corresponding bug Differential Revision: https://reviews.llvm.org/D47398 llvm-svn: 333494	2018-05-30 02:57:20 +00:00
Chandler Carruth	71fd27043e	[PM/LoopUnswitch] When using the new SimpleLoopUnswitch pass, schedule loop-cleanup passes at the beginning of the loop pass pipeline, and re-enqueue loops after even trivial unswitching. This will allow us to much more consistently avoid simplifying code while doing trivial unswitching. I've also added a test case that specifically shows effective iteration using this technique. I've unconditionally updated the new PM as that is always using the SimpleLoopUnswitch pass, and I've made the pipeline changes for the old PM conditional on using this new unswitch pass. I added a bunch of comments to the loop pass pipeline in the old PM to make it more clear what is going on when reviewing. Hopefully this will unblock doing partial unswitching instead of just full unswitching. Differential Revision: https://reviews.llvm.org/D47408 llvm-svn: 333493	2018-05-30 02:46:45 +00:00
Shiva Chen	c3d0e89284	[RISCV] Support resolving fixup_riscv_call and add to MCFixupKindInfo table Resolving fixup_riscv_call by assembler when the linker relaxation diabled and the function and callsite within the same compile unit. And also adding static_assert after Infos array declaration to avoid missing any new fixup in MCFixupKindInfo in the future. Differential Revision: https://reviews.llvm.org/D47126 llvm-svn: 333487	2018-05-30 01:16:36 +00:00
Craig Topper	aba57bfebd	[X86] Rename the operands in the recently introduced MOVSS+FMA patterns so that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction. The order should be controlled in the input pattern. llvm-svn: 333463	2018-05-29 20:46:26 +00:00
Chandler Carruth	4cbcbb0761	[LoopInstSimplify] Re-implement the core logic of loop-instsimplify to be both simpler and substantially more efficient. Rather than use a hand-rolled iteration technique that isn't quite the same as RPO, use the pre-built RPO loop body traversal utility. Once visiting the loop body in RPO, we can assert that we visit defs before uses reliably. When this is the case, the only need to iterate is when simplifying a def that is used by a PHI node along a back-edge. With this patch, the first pass over the loop body is just a complete simplification of every instruction across the loop body. When we encounter a use of a simplified instruction that stems from a PHI node in the loop body that has already been visited (due to some cyclic CFG, potentially the loop itself, or a nested loop, or unstructured control flow), we recall that specific PHI node for the second iteration. Nothing else needs to be preserved from iteration to iteration. On the second and later iterations, only instructions known to have simplified inputs are considered, each time starting from a set of PHIs that had simplified inputs along the backedges. Dead instructions are collected along the way, but deleted in a batch at the end of each iteration making the iterations themselves substantially simpler. This uses a new batch API for recursively deleting dead instructions. This alsa changes the routine to visit subloops. Because simplification is fundamentally transitive, we may need to visit the entire loop body, including subloops, to handle knock-on simplification. I've added a basic test file that helps demonstrate that all of these changes work. It includes both straight-forward loops with simplifications as well as interesting PHI-structures, CFG-structures, and a nested loop case. Differential Revision: https://reviews.llvm.org/D47407 llvm-svn: 333461	2018-05-29 20:15:38 +00:00
Craig Topper	5439b3d1e5	[X86] Fix a potential crash that occur after r333419. The code could issue a truncate from a small type to larger type. We need to extend in that case instead. llvm-svn: 333460	2018-05-29 20:04:10 +00:00
Sam Clegg	b7c6239408	[WebAssembly] Add more error checking to object file parsing This should address some of the assert failures the fuzzer has been finding such as: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6719 Differential Revision: https://reviews.llvm.org/D47086 llvm-svn: 333459	2018-05-29 19:58:59 +00:00
Matt Arsenault	4b3829d8cf	AMDGPU: Fix broken check lines llvm-svn: 333458	2018-05-29 19:35:53 +00:00
Matt Arsenault	1ea0402e82	AMDGPU: Round up kernel argument allocation size AFAIK the driver's allocation will actually have to round this up anyway. It is useful to track the rounded up size, so that the end of the kernel segment is known to be dereferencable so a wider s_load_dword can be used for a short argument at the end of the segment. llvm-svn: 333456	2018-05-29 19:35:00 +00:00
Sameer AbuAsal	97684419e8	[RISCV] Add peepholes for Global Address lowering patterns Summary: Base and offset are always separated when a GlobalAddress node is lowered (rL332641) as an optimization to reduce instruction count. However, this optimization is not profitable if the Global Address ends up being used in only instruction. This patch adds peephole optimizations that merge an offset of an address calculation into the LUI %%hi and ADD %lo of the lowering sequence. The peephole handles three patterns: 1) ADDI (ADDI (LUI %hi(global)) %lo(global)), offset ---> ADDI (LUI %hi(global + offset)) %lo(global + offset). This generates: lui a0, hi (global + offset) add a0, a0, lo (global + offset) Instead of lui a0, hi (global) addi a0, hi (global) addi a0, offset This pattern is for cases when the offset is small enough to fit in the immediate filed of ADDI (less than 12 bits). 2) ADD ((ADDI (LUI %hi(global)) %lo(global)), (LUI hi_offset)) ---> offset = hi_offset << 12 ADDI (LUI %hi(global + offset)) %lo(global + offset) Which generates the ASM: lui a0, hi(global + offset) addi a0, lo(global + offset) Instead of: lui a0, hi(global) addi a0, lo(global) lui a1, (offset) add a0, a0, a1 This pattern is for cases when the offset doesn't fit in an immediate field of ADDI but the lower 12 bits are all zeros. 3) ADD ((ADDI (LUI %hi(global)) %lo(global)), (ADDI lo_offset, (LUI hi_offset))) ---> offset = global + offhi20<<12 + offlo12 ADDI (LUI %hi(global + offset)) %lo(global + offset) Which generates the ASM: lui a1, %hi(global + offset) addi a1, %lo(global + offset) Instead of: lui a0, hi(global) addi a0, lo(global) lui a1, (offhi20) addi a1, (offlo12) add a0, a0, a1 This pattern is for cases when the offset doesn't fit in an immediate field of ADDI and both the lower 1 bits and high 20 bits are non zero. Reviewers: asb Reviewed By: asb Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang llvm-svn: 333455	2018-05-29 19:34:54 +00:00
Daniel Neilson	3a6c50f4e0	[BasicAA] Teach the analysis about atomic memcpy Summary: A simple change to derive mod/ref info from the atomic memcpy intrinsic in the same way as from the regular memcpy intrinsic. llvm-svn: 333454	2018-05-29 19:23:50 +00:00
Douglas Yung	99feb567bf	Update CodeView register names in a test that was missed in r333421. llvm-svn: 333453	2018-05-29 19:21:22 +00:00
Konstantin Zhuravlyov	2ca6b1f2ba	AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as it is set by CP Differential Revision: https://reviews.llvm.org/D47392 llvm-svn: 333451	2018-05-29 19:09:13 +00:00

1 2 3 4 5 ...

53450 Commits