Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.
llvm-svn: 284175
Windows itanium is equivalent to MSVC except in C++ mode. Ensure that we
promote the 32-bit floating point operations to their 64-bit equivalents.
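For example, a minimal C-level sketch of the effect on a math routine (illustrative only; the actual change is in how the FP operations are lowered):
#include <math.h>
float sine32(float x) {
  /* With the float routines assumed unavailable from msvcrt, a call such as
   * sinf(x) is in effect lowered as the double routine with conversions. */
  return (float)sin((double)x);
}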
llvm-svn: 284173
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.
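For example (illustrative only), an access like the one below can then be compiled as a direct PC-relative reference when building as PIE, relying on the linker to create a copy relocation if the definition ends up in a shared library:
extern int external_counter;   /* defined elsewhere, possibly in a DSO */
int read_counter(void) { return external_counter; }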
Differential Revision: https://reviews.llvm.org/D24849
llvm-svn: 284160
Retrying after upstream changes.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a
simplified store merging search which only checks for parallel stores
through the chain subgraph. This cleanly separates the handling of
non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 284151
We don't need to return a MachineInstr* from these stack probe insertion
calls anyway. If we ever need to add it back, we can return an iterator
instead.
Based on a patch by David Kreitzer
This bug is a consequence of
r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines
We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion,
but only when inserting a stack probe call at the end of an MBB, which
isn't necessarily a common situation.
Differential Revision: https://reviews.llvm.org/D25566
llvm-svn: 284130
This patch assigns a cost to the scaling used in addressing modes.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.
For instance:
(1) LDR R0, [R1, R2 LSL #2]
(2) LDR R0, [R1, -R2 LSL #2]
Above, (1) takes fewer cycles than (2).
By assigning an appropriate scaling factor cost, we enable LLVM
to make the right trade-offs in the optimization and code-selection phases.
Differential Revision: http://reviews.llvm.org/D24857
Reviewers: jmolloy, rengolin
llvm-svn: 284127
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.
llvm-svn: 284119
The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are passed or returned in registers.
This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86.
Differential Revision: http://reviews.llvm.org/D25022
llvm-svn: 284108
This allows RegBankSelect in greedy mode to get rid of some of the cross
register bank copies when loads are involved in the chain of
computation.
llvm-svn: 284097
- Use storage class C_STAT for 'PrivateLinkage'. The storage class for
  PrivateLinkage should equal that of InternalLinkage.
- Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes
  x86_64). MM_WinCOFF has an empty GlobalPrefix ('\0'), so a PrivateGlobalPrefix
  of "L" may conflict with normal symbol names starting with 'L'.
Based on a patch by Han Sangjin! Manually updated test cases.
llvm-svn: 284096
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.
Reviewers: majnemer, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25293
llvm-svn: 284061
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.
Differential Revision: https://reviews.llvm.org/D25478
llvm-svn: 284036
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.
llvm-svn: 284031
Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.
Differential Revision: https://reviews.llvm.org/D25374
llvm-svn: 284015
This combiner breaks debug experience and should not be run when optimizations are disabled.
For example:
int main() {
  int j = 0;
  j += 2;
  if (j == 2)
    return 0;
  return 5;
}
When debugging this code compiled with /O0, it should be valid to break at the line "j += 2;" and edit the value of j. It should change the return value of the function.
Differential Revision: https://reviews.llvm.org/D19268
llvm-svn: 284014
Add unit tests for checking a few tricky instruction sizes. Also remove the old
tests for the instruction sizes, which were clunky and brittle.
Since this is the first set of target-specific unit tests, we need to add some
CMake plumbing. In the future, adding unit tests for a given target will be as
simple as creating a directory with the same name as the target under
unittests/Target. The tests are only run if the target is enabled in
LLVM_TARGETS_TO_BUILD.
Differential Revision: https://reviews.llvm.org/D24548
llvm-svn: 283990
Although Copies are not specific to preISel, we still have to assign them
a proper register class. However, given they are not constrained to
anything, we do not have to handle the source register at the copy. It
will be properly mapped when reaching the related definition.
In the process, the handling of G_ANYEXT is slightly modified, as those
end up being selected as copies. The difference is that when the register
sizes do not match on both sides, we need to insert a SUBREG_TO_REG
operation, otherwise the post-RA copy expansion will not be happy!
llvm-svn: 283972
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
B = Splat A
C = Splat B
=>
C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
B = Splat A
C = Splat B
=>
B = Splat A
C = COPY B
Fixes PR30663.
Reviewers: echristo, iteratee, kbarton, nemanjai
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25493
llvm-svn: 283961
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.
llvm-svn: 283953
Reverts r283938 to reinstate r283867 with a fix.
The original change had an ArrayRef referring to a destroyed temporary
initializer list. Use plain C arrays instead.
llvm-svn: 283942
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, especially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283934
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction.
An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect
different patterns. In this case, the SimplifyDemandedBits fold prevents us from
performing a zext to sext fold that would then be recognized as a negation of a sext.
llvm-svn: 283900
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.
This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.
In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.
We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.
There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.
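As an illustrative example (not one of the included tests), inline assembly like the following forces r8-r11 to be treated as clobbered in a Thumb1 function, so they have to be shuffled through low registers around the push/pop:
void clobber_high_regs(void) {
  /* Empty asm statement whose clobber list marks the high registers
   * r8-r11 as modified, making them callee-saves to preserve here. */
  __asm__ volatile("" ::: "r8", "r9", "r10", "r11");
}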
In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.
Differential Revision: https://reviews.llvm.org/D24228
llvm-svn: 283867
Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as
clobbering all registers, including floating-point registers that may
not be present on the target. This is technically true, as we could get
linked against code that does use the FP registers, but that will not
actually work, as the soft-float code cannot save and restore the FP
registers. SjLj exception handling can only work correctly if either all
or none of the code is built for a target with FP registers. Therefore,
we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a
soft-float target, it is only going to be linked against other
soft-float code, and so only clobbers the general-purpose registers.
This allows us to check that no non-savable registers are clobbered when
generating the prologue/epilogue.
Differential Revision: https://reviews.llvm.org/D25180
llvm-svn: 283866
This reverts commit r283842.
test/CodeGen/X86/tail-dup-repeat.ll causes an llc crash with our
internal testing. I'll share a link with you.
llvm-svn: 283857
This changes MachineRegisterInfo to be initialized after parsing all
instructions. This is in preparation for upcoming commits that allow the
register class to be specified on the operand or deduced from the
MCInstrDesc.
This commit removes the unused feature of having nonsequential register
numbers. This was confusing anyway as the vreg numbers would be
different after parsing when you had "holes" in your numbering.
This patch also introduces the concept of an incomplete virtual
register. An incomplete virtual register may be used during .mir parsing
to construct MachineOperands without knowing the exact register class
(or register bank) yet.
NFC except for some error messages.
Differential Revision: https://reviews.llvm.org/D22397
llvm-svn: 283848
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, especially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283842
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.
This patch changes this by attempting to split all live intervals before
performing recoloring.
This fixes LLVM bug PR14879.
I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.
Reviewers: qcolombet
Subscribers: MatzeB
Differential Revision: https://reviews.llvm.org/D25070
llvm-svn: 283838
This only adds the support for 64-bit vector OR. Adding more sizes is
not difficult, but it requires a bigger refactoring because ORs work on
any size, not necessarly the ones that match the width of the register
width. Right now, this is not expressed in the legalization, so don't
bother pushing the refactoring yet.
llvm-svn: 283831
Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal
type.
Patch by Edward Jones, thanks!
Differential Revision: https://reviews.llvm.org/D24459
llvm-svn: 283797
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce an alignment fault if the condition is not met.
The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless of the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non-double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.
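An illustrative C case (using GCC/Clang vector extensions, not taken from the tests) where the v2f64 access is only guaranteed 4-byte alignment and therefore must not use VLDM/VSTM:
typedef double v2f64 __attribute__((vector_size(16), aligned(4)));
void copy_v2f64(v2f64 *dst, const v2f64 *src) {
  /* Only 4-byte alignment is guaranteed here, so the load/store
   * should be emitted as VLD1/VST1 rather than VLDM/VSTM. */
  *dst = *src;
}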
Differential Revision: https://reviews.llvm.org/D25281
llvm-svn: 283763
Summary:
Rotate by 1 is translated to 1 micro-op, while rotate with imm8 is translated to 2 micro-ops.
Fixes pr30644.
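As an illustration (hypothetical functions, assuming 32-bit unsigned int), the two rotates below look symmetric at the source level but decode differently on the affected CPUs:
unsigned rot1(unsigned x) { return (x << 1) | (x >> 31); }  /* rotate by 1: 1 micro-op */
unsigned rot5(unsigned x) { return (x << 5) | (x >> 27); }  /* rotate by imm8: 2 micro-ops */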
Reviewers: delena, igorb, craig.topper, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D25399
llvm-svn: 283758
Commit in the name of: Coby Tayree
1. The 'v' constraint for (x86) non-AVX arches imitates the already implemented 'x' constraint, i.e. it allows XMM{0-15} & YMM{0-15} depending on the apparent arch & mode (32/64).
2. For the AVX512 arch it allows [X,Y,Z]MM{0-31} (mode dependent).
This patch applies the needed changes to clang
clang patch: https://reviews.llvm.org/D25004
Differential Revision: D25005
llvm-svn: 283717
The masked-expand-load node represents a load operation that loads a variable number of elements from memory according to the number of "true" bits in the mask, and expands the loaded elements according to their positions in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions.
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.
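A scalar C model of the expand-load semantics (illustrative only, not the intrinsic's definition):
void expandload_model(double dst[4], const double pass[4],
                      const double *mem, const int mask[4]) {
  int m = 0;
  for (int i = 0; i < 4; ++i)
    /* Consecutive memory elements go only into lanes whose mask bit
     * is set; other lanes keep the pass-through value. */
    dst[i] = mask[i] ? mem[m++] : pass[i];
}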
Differential Revision: https://reviews.llvm.org/D25322
llvm-svn: 283694
This seems to have been responsible for the XMM16-31 spills observed in PR29112. With this fixed the test case has been modified to no longer have a spill of XMM16.
llvm-svn: 283668
Avoid generating indexed vector instructions for Exynos. This is needed for
fmla/fmls/fmul/fmulx. For example, the instruction
fmla v0.4s, v1.4s, v2.s[1]
is less efficient than the instructions
dup v2.4s, v2.s[1]
fmla v0.4s, v1.4s, v2.4s
Patch written by Abderrazek Zaafrani.
Differential Revision: https://reviews.llvm.org/D21571
llvm-svn: 283663
Expanding MULHS exposed an issue where the condition
register was thought to be 16-bit. This caused an attempt to copy a
16-bit register to an 8-bit register.
Authored by Jake Goulding
llvm-svn: 283634
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, especially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283619
The code used llvm basic block predecessors to decide where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping between
predecessor llvm basic blocks and machine basic blocks.
Therefore the current approach does not work, as it assumes we can mark a
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.
Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.
When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.
This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.
rdar://28300923
llvm-svn: 283617
Reapplying r283383 after revert in r283442. The additional fix
is getting rid of a stray space in a function name, in the
refactoring part of the commit.
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).
Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.
Differential Revision: https://reviews.llvm.org/D25332
llvm-svn: 283550
MOVSD/MOVSS take a 128-bit register and an FR32/FR64 register input, but the commutation code wasn't taking this into account, leading to verification errors.
This patch inserts a vreg copy mi to ensure that the registers are correct.
Fix for PR30607
Differential Revision: https://reviews.llvm.org/D25280
llvm-svn: 283539
Summary:
There was a bug with sequences like
s_mov_b64 s[0:1], exec
s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
...
s_mov_b64_term exec, s[2:3]
because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.
Note that the test case also exposes an unrelated bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25306
llvm-svn: 283528
Per spec changes, this implements block signatures, and adds just enough
logic to produce correct block signatures at the ends of functions.
Differential Revision: https://reviews.llvm.org/D25144
llvm-svn: 283503
Per spec changes, store instructions in WebAssembly no longer have a return
value. Update the instruction descriptions.
Differential Revision: https://reviews.llvm.org/D25122
llvm-svn: 283501
Summary:
These nodes need legalization for 3-element vectors. This commit
handles the legalization and adds tests for zext and sext.
This fixes PR30614.
Reviewers: RKSimon, srhines
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25268
llvm-svn: 283496
When replacing FrameIndex with BasePtr, we must preserve BasePtr for
LEA64_32r since BasePtr is used later for stack adjustment if it is
the same as StackPtr.
Patch by H.J Lu <hjl.tools@gmail.com>
Differential Revision: https://reviews.llvm.org/D23575
llvm-svn: 283486
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.
This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.
Differential Revision: https://reviews.llvm.org/D24683
llvm-svn: 283480
This reverts commit r283383 because it broke some of the bots:
undefined reference to ` __aeabi_uldivmod'
It affected (at least) clang-cmake-armv7-a15-selfhost,
clang-cmake-armv7-a15-selfhost and clang-native-arm-lnt.
llvm-svn: 283442
Global variables are GlobalValues, so they have explicit alignment. Querying
DataLayout for the alignment was incorrect.
Testcase added.
llvm-svn: 283423
We can work around a shortcoming of FileCheck by using {{\[}} to match a square
bracket before a [[ sequence.
Thanks to Eli Friedman for the heads up!
llvm-svn: 283422
This came out of a discussion in https://reviews.llvm.org/D25285.
There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.
This also removes an entirely obsolete and bitrotted testcase for PR7662.
Reapplies 283390 with a forgotten testcase.
llvm-svn: 283400
This came out of a discussion in https://reviews.llvm.org/D25285.
There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.
This also removes an entirely obsolete and bitrotted testcase for PR7662.
llvm-svn: 283390
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).
Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.
Differential Revision: https://reviews.llvm.org/D24076
llvm-svn: 283383
This patch is related to r274263 or Phabricator/D21818.
This patch aims to improve the test case added in the previous commit to verify
specifically that the stack protector pass is adding the debug line info as
intended. Before, the test only verified that the verifier pass does not crash.
The current approach is to generate the assembly output and then look for the
.loc directive.
Differential Revision: https://reviews.llvm.org/D25290
llvm-svn: 283374
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.
Reviewers: bogner, tstellarAMD, mkuper
Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D25007
llvm-svn: 283347
This is not a valid encoding - these instructions cannot do PC-relative addressing.
The underlying problem here is the whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. It didn't realise TargetConstantPool was actually possible, so didn't handle it.
llvm-svn: 283323
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.
Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.
llvm-svn: 283292
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, especially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283274
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.
Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.
The ingredients of this patch are:
Remove the reciprocal estimate command-line debug option.
Add TargetRecip to TargetLowering.
Remove TargetRecip from TargetOptions.
Clean up the TargetRecip implementation to work with this new scheme.
Set the default reciprocal settings in TargetLoweringBase (everything is off).
Update the PowerPC defaults, users, and tests.
Update the x86 defaults, users, and tests.
Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.
Differential Revision: https://reviews.llvm.org/D24816
llvm-svn: 283252
AArch64InstrInfo::shouldScheduleAdjacent() determines whether two
instructions can benefit from macroop fusion on Apple CPUs. The list
turned out to be incomplete:
- the "rr" variants of the instructions were missing
- even the "rs" variants can have shift value == 0 and behave like the
"rr" variants
This also splits the MacropFusion target feature into
ArithmeticBccFusion and ArithmeticCbzFusion.
Differential Revision: https://reviews.llvm.org/D25142
llvm-svn: 283243
This patch corresponds to review:
The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then chooses the correct opcode based on the
register that was allocated for the operation.
llvm-svn: 283212
Treat soft-float as unsupported for fast-isel. Additionally, ensure we check
that lowering f32 arguments also considers the case of soft-float mode.
Reviewers: ehostunreach, vkalintiris, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D24505
llvm-svn: 283209
This patch corresponds to review:
https://reviews.llvm.org/D23155
This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:
Int to Fp conversions of 1 or 2-byte values loaded from memory
Building vectors of 1 or 2-byte integers with values loaded from memory
Storing individual 1 or 2-byte elements from integer vectors
This patch implements all of those uses.
llvm-svn: 283190
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, especially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
llvm-svn: 283164
This avoids llc using the host's OS/vendor as defaults and triggering
unwanted behaviour in the tests. This should deal with the buildbot
breakages on windows after r283140.
llvm-svn: 283149
Each shadow only represents data flow that is restricted to its reaching
def. Propagating more than that could lead to spurious register liveness,
resulting in extra (incorrect) block live-ins.
llvm-svn: 283143
Windows has no GOT relocations the way ELF/Darwin do. Some people use
x86_64-pc-win32-macho to build EFI firmware; do not produce GOT
relocations for this target.
Differential Revision: https://reviews.llvm.org/D24627
llvm-svn: 283140
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433
There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?
Differential Revision: https://reviews.llvm.org/D25165
llvm-svn: 283119
library call to __aeabi_uidivmod. This is an improved implementation of
r280808, see also D24133, that got reverted because isel was stuck in a loop.
That was caused by the optimisation incorrectly triggering on i64 ints, which
shouldn't happen because there is no 64bit hwdiv support; that put isel's type
legalization and this optimisation in a loop. A native ARM compiler and testing
now show that this is fixed.
Patch mostly by Pablo Barrio.
Differential Revision: https://reviews.llvm.org/D25077
llvm-svn: 283098
search loop, by Andrey Tischenko
PR27136 shows a failure to hoist a constant out of a loop. This test is used
as a starting point to fix the failure: it shows the current state of codegen
and what should be fixed.
Differential Revision: https://reviews.llvm.org/D25097
llvm-svn: 283091
Preemptively scrubbing these to avoid a bot fail as in PR30443:
https://llvm.org/bugs/show_bug.cgi?id=30443
I'm nearly done with a patch to fix these cases, so not trying very
hard to do better for the temporary win.
I plan to use better checks than what the script produces for the vectorized cases.
llvm-svn: 283072
To allow broadcast loads of a non-zero'th vector element, lowerVectorShuffleAsBroadcast can replace a load with a new load with an adjusted address, but unfortunately we weren't ensuring that the new load respected the same dependencies.
This patch adds a TokenFactor and updates all dependencies of the old load to reference the new load instead.
Bug found during internal testing.
Differential Revision: https://reviews.llvm.org/D25039
llvm-svn: 283070
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.
Fixes PR26970.
llvm-svn: 283060
Now we can commute to BLENDPD/BLENDPS on SSE41+ targets if necessary, so simplify the combine matching where we can.
This required me to add a couple of scalar math movsd/movss fold patterns that hadn't been needed in the past.
llvm-svn: 283038
Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget, this will help us select the instruction based on actual commutation requirements.
We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful.
I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets.
llvm-svn: 283037
- Remove OptForSize. Not all of the backend follows the same rules for creating broadcasts and there is no conflicting pattern.
- Don't stop selecting VEX VMOVDDUP when AVX512 is supported. We need VLX for EVEX VMOVDDUP.
- Only use VMOVDDUP for v2i64 broadcasts if AVX2 is not supported.
llvm-svn: 283020
We can't use Jcc to leave a Win64 function in general, because that
confuses the unwinder. However, for "leaf" functions, that is, functions
where the return address is always on top of the stack and which don't
have unwind info, it's OK.
Differential Revision: https://reviews.llvm.org/D24836
llvm-svn: 282920
Register stackification currently checks VNInfo for changes. Make that
more accurate by testing each intervening instruction for any other defs
to the same virtual register.
Patch by Jacob Gravelle
Differential Revision: https://reviews.llvm.org/D24942
llvm-svn: 282886
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.
This patch changes this by attempting to split all live intervals before
performing recoloring.
This fixes LLVM bug PR14879.
I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.
Reviewers: qcolombet
Subscribers: MatzeB
Differential Revision: https://reviews.llvm.org/D25070
llvm-svn: 282852
If AVX512 is disabled, the registers should already be marked reserved. Pattern predicates and register classes on instructions should take care of most of the rest. Loads/stores and physical register copies for XMM16-31 and YMM16-31 without VLX have already been taken care of.
I'm a little unclear why this changed the register allocation of the SSE2 run of the sad.ll test, but the registers selected appear to be valid after this change.
llvm-svn: 282835
For some reason there are both of these available, except
for scalar 64-bit compares which only have u64. I'm not sure
why there are both (I'm guessing it's for the one-bit inputs we
don't use), but for consistency always use the
unsigned one.
llvm-svn: 282832
This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT(store) or VBROADCAST(load) will be used.
Fixes one of the cases from PR29112.
llvm-svn: 282690
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.
This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.
llvm-svn: 282667
Normally, if-conversion would add implicit uses for redefined registers,
e.g. R0<def> = add_if ..., R0<imp-use>. However, if only subregisters of
R0 are known to be live but not R0 itself, such implicit uses will not be
added, causing prior definitions of such subregisters and R0 itself to
become dead.
llvm-svn: 282626
This addresses PR26055 (LiveDebugValues is very slow).
Contrary to the old LiveDebugVariables pass LiveDebugValues currently
doesn't look at the lexical scopes before inserting a DBG_VALUE
intrinsic. This means that we often propagate DBG_VALUEs much further
down than necessary. This is especially noticeable in large C++
functions with many inlined method calls that all use the same
"this"-pointer.
For example, in the following code it makes no sense to propagate the
inlined variable a from the first inlined call to f() into any of the
subsequent basic blocks, because the variable will always be out of
scope:
void sink(int a);
void __attribute((always_inline)) f(int a) { sink(a); }
void foo(int i) {
  f(i);
  if (i)
    f(i);
  f(i);
}
This patch reuses the LexicalScopes infrastructure we have for
LiveDebugVariables to take this into account.
The effect on compile time and memory consumption is quite noticeable:
I tested a benchmark that is a large C++ source with an enormous
amount of inlined "this"-pointers that would previously eat >24GiB
(most of them for DBG_VALUE intrinsics) and whose compile time was
dominated by LiveDebugValues. With this patch applied the memory
consumption is 1GiB and 1.7% of the time is spent in LiveDebugValues.
https://reviews.llvm.org/D24994
Thanks to Daniel Berlin and Keith Walker for reviewing!
llvm-svn: 282611
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a
simplified store merging search which only checks for parallel stores
through the chain subgraph. This cleanly separates the handling of
non-interfering loads/stores from the store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations.
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.ll -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do not do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill
behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 282600
The KORTEST was introduced due to a bug where a TEST instruction used a K register,
but it turns out that the opposite case of a KORTEST using a GPR is now happening.
The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.
Differential Revision: https://reviews.llvm.org/D24953
llvm-svn: 282580
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.
Differential Revision: https://reviews.llvm.org/D24625
llvm-svn: 282567
The 'or' case shows up in copysign. The copysign code also had
redundant checking for a scalar zero operand with 'and', so I
removed that.
I'm not sure how to test vector 'and', 'andn', and 'xor' yet,
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.
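For reference, a C sketch of the bit-level copysign pattern where the 'or' appears (assuming IEEE-754 double layout, illustrative only):
#include <stdint.h>
#include <string.h>
double copysign_bits(double x, double y) {
  uint64_t xb, yb, sign = 1ULL << 63;
  memcpy(&xb, &x, sizeof xb);
  memcpy(&yb, &y, sizeof yb);
  uint64_t r = (xb & ~sign) | (yb & sign);  /* and/andn results feed the 'or' */
  memcpy(&x, &r, sizeof x);
  return x;
}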
llvm-svn: 282546
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant. Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true. This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.
Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy
Subscribers: junbuml, aemerson, mcrosier, rengolin
Differential Revision: https://reviews.llvm.org/D24570
llvm-svn: 282543
Variables are sometimes missing their debug location information in
blocks in which the variables should be available. This would occur
when one or more predecessor blocks had not yet been visited by the
routine which propagated the information from predecessor blocks.
This is addressed by only considering predecessor blocks which have
already been visited.
The solution to this problem was suggested by Daniel Berlin on the
LLVM developer mailing list.
Differential Revision: https://reviews.llvm.org/D24927
llvm-svn: 282506
Disable tail calls until the remaining bugs are fixed. Enable them only for tests.
Reviewers: vkalintiris
Differential Review: https://reviews.llvm.org/D24912
llvm-svn: 282487
This patch corresponds to review:
https://reviews.llvm.org/D24396
This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.
llvm-svn: 282478
When we have dynamic allocas we have a frame pointer, and
when we're lowering frame indexes we should make sure we use it.
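A small illustrative case (hypothetical, not from the tests): the dynamic alloca below moves the stack pointer at runtime, so the fixed object must be addressed through the frame pointer when its frame index is lowered:
int touch_stack(int n) {
  char fixed[16];                   /* fixed stack object, reached via a frame index */
  char *dyn = __builtin_alloca(n);  /* dynamic alloca; stack pointer changes at runtime */
  fixed[0] = 1;
  dyn[0] = 2;
  return fixed[0] + dyn[0];
}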
Patch by Jacob Gravelle
Differential Revision: https://reviews.llvm.org/D24889
llvm-svn: 282442
Don't match the UXTW extended reg forms of ADD/ADDS/SUB/SUBS if the
32-bit to 64-bit zero-extend can be done for free by taking advantage
of the 32-bit defining instruction zeroing the upper 32-bits of the X
register destination. This enables better instruction selection in a
few cases, such as:
sub x0, xzr, x8
instead of:
mov x8, xzr
sub x0, x8, w9, uxtw
madd x0, x1, x1, x8
instead of:
mul x9, x1, x1
add x0, x9, w8, uxtw
cmp x2, x8
instead of:
sub x8, x2, w8, uxtw
cmp x8, #0
add x0, x8, x1, lsl #3
instead of:
lsl x9, x1, #3
add x0, x9, w8, uxtw
Reviewers: t.p.northover, jmolloy
Subscribers: mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D24747
llvm-svn: 282413
Many high-performance processors have a dedicated branch predictor for
indirect branches, commonly used with jump tables. As sophisticated as such
branch predictors are, they tend to have well defined limits beyond which
their effectiveness is hampered or even nullified. One such limit is the
number of possible destinations for a given indirect branch that such
branch predictors can handle.
This patch considers a limit that a target may set to the number of
destination addresses in a jump table.
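For illustration (hypothetical example), a dense switch like the one below is typically lowered to a jump table, i.e. a single indirect branch whose number of destinations this limit now caps:
int classify(int v) {
  switch (v) {
  case 0: return 10;
  case 1: return 11;
  case 2: return 12;
  case 3: return 13;
  case 4: return 14;
  case 5: return 15;
  default: return -1;
  }
}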
Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar
<aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.
Differential revision: https://reviews.llvm.org/D21940
llvm-svn: 282412
If a constant is unnamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.
llvm-svn: 282387
Summary:
Replace a LEA instruction of the form 'lea (%esp), %ebx' --> 'mov %esp, %ebx'
MOV is preferable over LEA because usually there are more issue-slots available to execute MOVs than LEAs. Latest processors also support zero-latency MOVs.
Fixes pr29022.
Reviewers: hfinkel, delena, igorb, myatsina, mkuper
Differential Revision: https://reviews.llvm.org/D24705
llvm-svn: 282385
This is similar to:
https://reviews.llvm.org/rL279958
By not prematurely lowering to loads, we should be able to more easily eliminate
the 'or' with zero instructions seen in copysign-constant-magnitude.ll.
We should also be able to extend this code to handle vectors.
llvm-svn: 282312
This patch corresponds to review:
https://reviews.llvm.org/D21135
This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld
In order to improve some build_vector and extractelement patterns.
llvm-svn: 282246
If a constant is unnamed_addr and is only used within one function, we can save
on the code size and runtime cost of an indirection by changing the global's storage
to inside the constant pool. For example, instead of:
ldr r0, .CPI0
bl printf
bx lr
.CPI0: &format_string
format_string: .asciz "hello, world!\n"
We can emit:
adr r0, .CPI0
bl printf
bx lr
.CPI0: .asciz "hello, world!\n"
This can cause significant code size savings when many small strings are used in one
function (4 bytes per string).
This recommit contains fixes for a nasty bug related to fast-isel fallback - because
fast-isel doesn't know about this optimization, if it runs and emits references to
a string that we inline (because fast-isel fell back to SDAG) we will end up
with an inlined string and also an out-of-line string, and we won't emit the
out-of-line string, causing backend failures.
It also contains fixes for emitting .text relocations which made the sanitizer
bots unhappy.
llvm-svn: 282241
According to MSDN (see the PR), functions which don't touch any callee-saved
registers (including %rsp) don't need any unwind info.
This patch makes LLVM not emit unwind info for such functions, to save
binary size.
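For example (illustrative only), a function like this touches no callee-saved registers and never adjusts %rsp, so it no longer gets unwind info:
int add3(int a, int b, int c) { return a + b + c; }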
Differential Revision: https://reviews.llvm.org/D24748
llvm-svn: 282185
Atomic comparison instructions use the sub-word load instruction on
Power8 and up, but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.
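A C11 example of the kind of operation affected (illustrative, not from the tests): the sub-word value loaded by the compare-and-swap is signed, so it must be sign-extended before the word-sized signed compare:
#include <stdatomic.h>
int cas_short(_Atomic short *p, short expected, short desired) {
  return atomic_compare_exchange_strong(p, &expected, desired);
}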
llvm-svn: 282182
This patch corresponds to review:
https://reviews.llvm.org/D19825
The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.
llvm-svn: 282143
VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors the bit from each source is concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.
We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands' bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value.
This refactors and reuses parts of the FMA3 commuting code which is also a three operand instruction.
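As a worked illustration (assuming the first source supplies the most-significant index bit), swapping the first two sources permutes the immediate as in the sketch below: bits 0 and 7 (and, for this operand pair, bits 1 and 6) map to themselves, while bits 2<->4 and 3<->5 trade places.
unsigned char commute_ternlog_imm_src12(unsigned char imm) {
  unsigned char out = 0;
  for (int i = 0; i < 8; ++i) {
    int a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
    int j = (b << 2) | (a << 1) | c;   /* index after swapping the first two sources */
    out |= ((imm >> i) & 1) << j;
  }
  return out;
}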
llvm-svn: 282132