llvm-project

Commit Graph

Author	SHA1	Message	Date
Chad Rosier	8787a81023	[AArch64] Fix a typo. NFC. llvm-svn: 265160	2016-04-01 17:34:38 +00:00
Sanjay Patel	a05e0ff223	[x86] avoid intermediate splat for non-zero memsets (PR27100) Follow-up to D18566 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The tests that were added in the last patch are now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. In the new tests, the splat via shuffling looks ok to me, but there might be some room for improvement depending on uarch there. Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up. This patch should resolve PR27100. Differential Revision: http://reviews.llvm.org/D18676 llvm-svn: 265148	2016-04-01 16:27:14 +00:00
Valery Pykhtin	5b3559c1ec	[AMDGPU] fix MADAK/MADMK instructions operand namings to match encoding fields. $vsrc1 -> $src1, $k -> $imm Differential Revision: http://reviews.llvm.org/D18659 llvm-svn: 265141	2016-04-01 13:13:12 +00:00
Andrea Di Biagio	8c48841907	[x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type. Since revision 235394, we no longer perform target specific combines on build_vector nodes. No functional change intended. llvm-svn: 265138	2016-04-01 12:25:44 +00:00
Sagar Thakur	48973d21e1	[MIPS][LLVM-MC] Fix JR encoding for MIPSR6 ISA Summary: The assembler was picking the wrong JR variant because the pre-R6 one was still enabled at R6. Author: nitesh.jain Reviewers: vkalintiris, dsanders Subscribers: dsanders, llvm-commits, mohit.bhakkad, sagar, bhushan, jaydeep Differential: D18387 llvm-svn: 265134	2016-04-01 11:55:33 +00:00
Andrey Turetskiy	958eb46443	[X86] Introduce Lakemont CPU. Add a new Intel MCU CPU Lakemont, which doesn't support X87. Differential Revision: http://reviews.llvm.org/D18650 llvm-svn: 265128	2016-04-01 10:16:15 +00:00
James Molloy	b876c72bcc	Fix for pr24346: arm asm label calculation error in sub Some ARM instructions encode 32-bit immediates as an 8-bit integer (0-255) and a 4-bit rotation (0-30, even) in their least significant 12 bits. The original fixup, FK_Data_4, patches the instruction by the value bit-to-bit, regardless of the encoding. For example, assuming the labels L1 and L2 are 0x0 and 0x104 respectively, the following instruction: add r0, r0, #(L2 - L1) ; expects 0x104, i.e., 260 would be assembled to the following, which adds 1 to r0, instead of 260: e2800104 add r0, r0, #4, 2 ; equivalently 1 The new fixup kind fixup_arm_mod_imm takes care of the encoding: e2800f41 add r0, r0, #260 Patch by Ting-Yuan Huang! llvm-svn: 265122	2016-04-01 09:40:47 +00:00
Oliver Stannard	a5520b02a5	[AArch64] Better errors for out-of-range fixups When a fixup that can be resolved by the assembler is out of range, we should report an error in the source, rather than crashing. Differential Revision: http://reviews.llvm.org/D18402 llvm-svn: 265120	2016-04-01 09:14:50 +00:00
Chuang-Yu Cheng	f8b592f213	[PPC64] Bug fix: when enabling sibling-call-opt and shrink-wrapping, the tail call branch instruction might disappear Bug Pattern: # BB#0: # %entry cmpldi 3, 0 beq- 0, .LBB0_2 # BB#1: # %exit lwz 4, 0(3) #TC_RETURNd8 LVComputationKind 0 .LBB0_2: # %cond.false mflr 0 std 0, 16(1) stdu 1, -96(1) .Ltmp0: .cfi_def_cfa_offset 96 .Ltmp1: .cfi_offset lr, 16 bl __assert_fail nop The branch instruction for the tail call return is not generated because the shrink-wrapping pass chooses a new Restore Point, %cond.false, so the %exit block is not sent to emitEpilogue. Thanks for Kit's opinions! Reviewers: nemanjai hfinkel tjablin kbarton http://reviews.llvm.org/D17606 llvm-svn: 265112	2016-04-01 06:44:32 +00:00
Michael Kuperstein	7bab713188	Use range-based for loops. NFC. llvm-svn: 265105	2016-04-01 03:45:08 +00:00
Matthias Braun	cc7fba40fe	AArch64ISelLowering: Remove unused variables/arguments; NFC llvm-svn: 265098	2016-04-01 02:49:17 +00:00
Justin Lebar	96418481bc	[NVPTX] Add a truncate DAG node to some calls. Summary: Previously, we were running afoul of the assertion EVT(CLI.Ins[i].VT) == InVals[i].getValueType() && "LowerCall emitted a value with the wrong type!" in SelectionDAGBuilder.cpp when running the NVPTX/i8-param.ll test. This is because our backend (for some reason) treats small return values as i32, but it wasn't ever truncating the i32 back down to the expected width in the DAG. Unclear to me whether this fixes any actual bugs -- in this test, at least, the generated code is unchanged. Reviewers: jingyue Subscribers: llvm-commits, tra, jholewinski Differential Revision: http://reviews.llvm.org/D17872 llvm-svn: 265091	2016-04-01 01:09:10 +00:00
Justin Lebar	efcc81cbb4	[NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect. Summary: Previously the NVVMReflect pass would read its configuration from command-line flags or a static configuration given to the pass at instantiation time. This doesn't quite work for clang's use-case. It needs to pass a value for __CUDA_FTZ down on a per-module basis. We use a module flag for this, so the NVVMReflect pass needs to be updated to read said flag. Reviewers: tra, rnk Subscribers: cfe-commits, jholewinski Differential Revision: http://reviews.llvm.org/D18672 llvm-svn: 265090	2016-04-01 01:09:07 +00:00
Justin Lebar	645c3014a1	[NVPTX] Annotate some instructions as hasSideEffects = 0. Summary: Tablegen tries to infer this from the selection DAG patterns defined for the instructions, but it can't always. An instructive example is CLZr64. CLZr32 is correctly inferred to have no side-effects, but the selection DAG pattern for CLZr64 is slightly more complicated, and in particular the ctlz DAG node is not at the root of the pattern. Thus tablegen can't infer that CLZr64 has no side-effects. Reviewers: jholewinski Subscribers: jholewinski, tra, llvm-commits Differential Revision: http://reviews.llvm.org/D17472 llvm-svn: 265089	2016-04-01 01:09:05 +00:00
Hans Wennborg	649159df3c	Follow-up to r265036: I got these iterators mixed up llvm-svn: 265076	2016-03-31 23:55:16 +00:00
Jun Bum Lim	760afcb338	[AArch64] Allow loads with imp-def to be handled in getMemOpBaseRegImmOfsWidth() Summary: This change will allow loads with imp-def to be clustered in machine-scheduler pass. areMemAccessesTriviallyDisjoint() can also handle loads with imp-def. Reviewers: mcrosier, jmolloy, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18665 llvm-svn: 265051	2016-03-31 20:53:47 +00:00
Hal Finkel	fc35391f2b	[PowerPC] Add a late MI-level pass for QPX load/splat simplification Chapter 3 of the QPX manual states that, "Scalar floating-point load instructions, defined in the Power ISA, cause a replication of the source data across all elements of the target register." Thus, if we have a load followed by a QPX splat (from the first lane), the splat is redundant. This adds a late MI-level pass to remove the redundant splats in some of these cases (specifically when both occur in the same basic block). This optimization is scheduled just prior to post-RA scheduling. It can't happen before anything that might replace the load with some already-computed quantity (i.e. store-to-load forwarding). llvm-svn: 265047	2016-03-31 20:39:41 +00:00
Hans Wennborg	132cd62121	Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" I think it might have caused these build breakages: http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio llvm-svn: 265046	2016-03-31 20:27:30 +00:00
Benjamin Kramer	569efd2cfd	[ARM] Expand v1i64 and v2i64 ctpop. The default is legal, which results in 'Cannot select' errors. This is triggered during selfhost due to a recent cost model change. llvm-svn: 265040	2016-03-31 19:42:04 +00:00
Hans Wennborg	e97fb414e8	[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140) For code such as: void f(int, int); void g() { f(1, 2); } compiled for 32-bit X86 Linux, Clang would previously generate: subl $12, %esp subl $8, %esp pushl $2 pushl $1 calll f addl $16, %esp addl $12, %esp retl This patch fixes that by merging adjacent stack adjustments in eliminateCallFramePseudoInstr(). Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265039	2016-03-31 19:26:24 +00:00
Hans Wennborg	e1a2e90ffa	Change eliminateCallFramePseudoInstr() to return an iterator This will become necessary in a subsequent change to make this method merge adjacent stack adjustments, i.e. it might erase the previous and/or next instruction. It also greatly simplifies the calls to this function from PrologEpilogInserter. Previously, that had a bunch of logic to resume iteration after the call; now it just continues with the returned iterator. Note that this changes the behaviour of PEI a little. Previously, it attempted to re-visit the new instruction created by eliminateCallFramePseudoInstr(). That code was added in r36625, but I can't see any reason for it: the new instructions will obviously not be pseudo instructions, they will not have FrameIndex operands, and we have already accounted for the stack adjustment. Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265036	2016-03-31 18:33:38 +00:00
Jacques Pienaar	4badd6aaf3	[lanai] isBrImm should accept any non-constant immediate. isBrImm should accept any non-constant immediate. Previously it was only accepting LanaiMCExpr ones which was wrong. Differential Revision: http://reviews.llvm.org/D18571 llvm-svn: 265032	2016-03-31 17:58:55 +00:00
Ehsan Amiri	99b017ae35	[PPC] basic support for Power 9 direct move instructions http://reviews.llvm.org/D18097 Initial support does not include any patterns to generate these instructions llvm-svn: 265031	2016-03-31 17:47:17 +00:00
Sanjay Patel	92d5ea5e07	[x86] use SSE/AVX ops for non-zero memsets (PR27100) Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping into a codegen sinkhole while trying to splat a byte into an XMM reg. Follow-on bugs exposed by the current codegen are: https://llvm.org/bugs/show_bug.cgi?id=27141 https://llvm.org/bugs/show_bug.cgi?id=27143 Differential Revision: http://reviews.llvm.org/D18566 llvm-svn: 265029	2016-03-31 17:30:06 +00:00
Ulrich Weigand	3707ba8030	[PowerPC] Correctly compute 64-bit offsets in fast isel PPCSimplifyAddress contains this code: IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context) : Type::getInt64Ty(*Context)); to determine the type to be used for an index register, if one needs to be created. However, the "VT" here is the type of the data being loaded or stored, not the type of an address. This means that if a data element of type i32 is accessed using an index that does not fit into 32 bits, a wrong address is computed here. Note that PPCFastISel is only ever used on 64-bit currently, so the type of an address is actually always MVT::i64. Other parts of the code, even in this same PPCSimplifyAddress routine, already rely on that fact. Thus, this patch changes the code to simply unconditionally use Type::getInt64Ty(*Context) as OffsetTy. llvm-svn: 265023	2016-03-31 15:37:06 +00:00
Nemanja Ivanovic	a621a7f9c3	[PowerPC] Basic support for P9 atomic loads and stores This patch corresponds to review: http://reviews.llvm.org/D18032 This patch provides asm implementation for the following instructions: lwat, ldat, stwat, stdat, ldmx, mcrxrx llvm-svn: 265022	2016-03-31 15:26:37 +00:00
Jun Bum Lim	cf9744367b	[AArch64] Handle missing store pair opportunity Summary: This change will handle missing store pair opportunity where the first store instruction stores zero followed by the non-zero store. For example, this change will convert : str wzr, [x8] str w1, [x8, #4] into: stp wzr, w1, [x8] Reviewers: jmolloy, t.p.northover, mcrosier Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18570 llvm-svn: 265021	2016-03-31 14:47:24 +00:00
Ulrich Weigand	1931b01a64	[PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel The fast isel pass currently emits a COPY_TO_REGCLASS node to convert from a F4RC to a F8RC register class during conversion of a floating-point number to integer. There is actually no support in the common code instruction printers to emit COPY_TO_REGCLASS nodes, so the PowerPC back-end has special code there to simply ignore COPY_TO_REGCLASS. This is correct if and only if the source and destination registers of COPY_TO_REGCLASS are the same (except for the different register class). But nothing guarantees this to be the case, and if the register allocator does end up allocating source and destination to different registers after all, the back-end simply generates incorrect code. I've included a test case that shows such incorrect code generation. However, it seems that COPY_TO_REGCLASS is actually not intended to be used at the MI layer at all. It is used during SelectionDAG, but always lowered to a plain COPY before emitting MI. Other back-ends' fast isel passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong for the PowerPC back-end to emit it here. This patch changes the PowerPC back-end to directly emit COPY instead of COPY_TO_REGCLASS and removes the special handling in the instruction printers. Differential Revision: http://reviews.llvm.org/D18605 llvm-svn: 265020	2016-03-31 14:44:50 +00:00
Daniel Sanders	85fd10bd93	[mips] Range check simm16 Summary: There are too many instructions to exhaustively test so addiu and lwc2 are used as representative examples. It should be noted that many memory instructions that should have simm16 range checking do not because it is also necessary to support the macro of the same name which accepts simm32. The range checks for these occur in the macro expansion. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18437 llvm-svn: 265019	2016-03-31 14:34:00 +00:00
Daniel Sanders	eab3146156	[mips] Range check simm11 and mem_simm11. Summary: ldc2/sdc2 now emit slightly worse diagnostics for MIPS-I. The problem is that they don't trigger the custom parser because all the candidates are disabled by feature bits. On all other subtargets, the diagnostics are accurate but are subject to the usual issues of needing to report multiple ways to correct the code (e.g. smaller offset, enable a CPU feature) but only being able to report one error. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18436 llvm-svn: 265018	2016-03-31 14:23:20 +00:00
Sam Kolton	1048fb1818	[AMDGPU] Disassembler: support for DPP Review: http://reviews.llvm.org/D18642 llvm-svn: 265015	2016-03-31 14:15:04 +00:00
Daniel Sanders	dc0602a2c2	[mips] Split mem_msa into range checked mem_simm10 and mem_simm10_lsl[123] Summary: Also, made test_mi10.s formatting consistent with the majority of the MC tests. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18435 llvm-svn: 265014	2016-03-31 14:12:01 +00:00
Nirav Dave	83ce54aac2	Prevent X86ISelLowering from merging volatile loads Change isConsecutiveLoads to check that loads are non-volatile as this is a requirement for any load merges. Propagate change to two callers. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18546 llvm-svn: 265013	2016-03-31 13:40:55 +00:00
Daniel Sanders	2e9f69d933	[mips] Range check simm9 and fix a bug this revealed. Summary: The bug was that microMIPS's [ls]w[lr]e instructions claimed to support a 12-bit offset when it is only 9-bit. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D18434 llvm-svn: 265010	2016-03-31 13:15:23 +00:00
Zlatko Buljan	6221be8e46	[mips][microMIPS] Implement MFC, MFHC and DMFC* instructions Differential Revision: http://reviews.llvm.org/D17334 llvm-svn: 265002	2016-03-31 08:51:24 +00:00
Jonas Paulsson	2ba315218b	Indentation fix in SystemZInstrInfo.cpp llvm-svn: 265000	2016-03-31 08:00:14 +00:00
Craig Topper	d2aa03a60a	[X86] Use MVT instead of EVT in code called after legalization. llvm-svn: 264992	2016-03-31 04:37:41 +00:00
Hal Finkel	851b33a0b1	[PowerPC] Load two floats directly instead of using one 64-bit integer load When dealing with complex<float>, and similar structures with two single-precision floating-point numbers, especially when such things are being passed around by value, we'll sometimes end up loading both float values by extracting them from one 64-bit integer load. It looks like this: t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64 t16: i64 = srl t13, Constant:i32<32> t17: i32 = truncate t16 t18: f32 = bitcast t17 t19: i32 = truncate t13 t20: f32 = bitcast t19 The problem, especially before the P8 where those bitcasts aren't legal (and get expanded via the stack), is that it would have been better to use two floating-point loads directly. Here we add a target-specific DAGCombine to do just that. In short, we turn: ld 3, 0(5) stw 3, -8(1) rldicl 3, 3, 32, 32 stw 3, -4(1) lfs 3, -4(1) lfs 0, -8(1) into: lfs 3, 4(5) lfs 0, 0(5) llvm-svn: 264988	2016-03-31 02:56:05 +00:00
Hans Wennborg	6596977130	[X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325) The size savings are significant, and from what I can tell, both ICC and GCC do this. Differential Revision: http://reviews.llvm.org/D18573 llvm-svn: 264966	2016-03-30 23:38:01 +00:00
Matthias Braun	8d41436004	CodeGen: Factor out code for tail call result compatibility check; NFC llvm-svn: 264959	2016-03-30 22:46:04 +00:00
Matt Arsenault	2fe4fbc184	AMDGPU: Add frexp_exp intrinsic llvm-svn: 264944	2016-03-30 22:28:52 +00:00
Aaron Ballman	ef0fe1eed8	Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929	2016-03-30 21:30:00 +00:00
Simon Pilgrim	c49bd2ede0	[X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros Fix for issue introduced in D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible. llvm-svn: 264922	2016-03-30 20:52:24 +00:00
Justin Lebar	e3804cc932	[NVPTX] Make NVVMReflect a function pass. Summary: Currently it's a module pass. Make it a function pass so that we can move it to PassManagerBuilder's EP_EarlyAsPossible extension point, which only accepts function passes. Reviewers: rnk Subscribers: tra, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D18615 llvm-svn: 264919	2016-03-30 20:40:11 +00:00
Chad Rosier	f7ac5f28ab	[AArch64] Fix warnings pointed out by Hal. llvm-svn: 264882	2016-03-30 18:08:51 +00:00
Tom Stellard	1d5e6d4bdc	AMDGPU/SI: Improve MachineSchedModel definition This patch contains a few improvements to the model, including: - Using a single resource with a defined buffer size for each memory unit. - Setting the IssueWidth correctly. - Fixing latency values for memory instructions. shader-db stats: 16429 shaders in 3231 tests Totals: SGPRS: 318232 -> 312328 (-1.86 %) VGPRS: 208996 -> 209346 (0.17 %) Code Size: 7147044 -> 7166440 (0.27 %) bytes LDS: 83 -> 83 (0.00 %) blocks Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave Max Waves: 49182 -> 49243 (0.12 %) Wait states: 0 -> 0 (0.00 %) Differential Revision: http://reviews.llvm.org/D18453 llvm-svn: 264877	2016-03-30 16:35:13 +00:00
Tom Stellard	0bc954e3bc	AMDGPU/SI: Enable lanemask tracking in misched Summary: This results in higher register usage, but should make it easier for the compiler to hide latency. This pass is a prerequisite for some more scheduler improvements, and I think the increased register usage with this patch is acceptable, because when combined with the scheduler improvements, the total register usage will decrease. shader-db stats: 2382 shaders in 478 tests Totals: SGPRS: 48672 -> 49088 (0.85 %) VGPRS: 34148 -> 34847 (2.05 %) Code Size: 1285816 -> 1289128 (0.26 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 492544 -> 573440 (16.42 %) bytes per wave Max Waves: 6856 -> 6846 (-0.15 %) Wait states: 0 -> 0 (0.00 %) Depends on D18451 Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18452 llvm-svn: 264876	2016-03-30 16:35:09 +00:00
Jonas Paulsson	f76123386a	[SystemZ] Add nop and nopr InstAliases. For compatibility with GAS, nop and nopr are recognized as aliases for bc and bcr, respectively. A mask of 0 turns these instructions effectively into no-operations. Reviewed by Ulrich Weigand. llvm-svn: 264875	2016-03-30 16:11:58 +00:00
Nirav Dave	8dd66e5753	Remove HasFnAttribute guards to getFnAttribute calls These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872	2016-03-30 15:41:12 +00:00
Simon Pilgrim	b87ffe8519	[X86][XOP] BITREVERSE lowering using VPPERM XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction. llvm-svn: 264870	2016-03-30 14:14:00 +00:00
Benjamin Kramer	9415e06da7	[NVPTX] Avoid temporary std::string and make single-use function local to the cpp file. No functionality change intended. llvm-svn: 264861	2016-03-30 12:31:51 +00:00
Chandler Carruth	8e06a10d1f	[x86] Fix a horrible bug in our lowering of x86 floating point atomic operations. Specifically, we had code that tried to badly approximate reconstructing all of the possible variations on addressing modes in two x86 instructions based on those in one pseudo instruction. This is not the first bug uncovered with doing this, so stop doing it altogether. Instead generically and pedantically copy every operand from the address over to both new instructions, and strip kill flags from any register operands. This fixes a subtle bug seen in the wild where we would mysteriously drop parts of the addressing mode, causing for example the index argument in the added test case to just be completely ignored. Hypothetically, this was an extremely bad miscompile because it actually caused a predictable and leverageable write of a 64-bit quantity to an unintended offset (the first element of the array instead of whatever other element was intended). As a consequence, in theory this could even have introduced security vulnerabilities. However, this was only something that could happen with an atomic floating point add. No other operation could trigger this bug, so it seems extremely unlikely to have occurred widely in the wild. But it did in fact occur, and frequently in scientific applications which were using relaxed atomic updates of a floating point value after adding a delta. Those would end up being quite badly miscompiled by LLVM, which is how we found this. Of course, this often looks like a race condition in the code, but it was actually a miscompile. I suspect that this whole RELEASE_FADD thing was a complete mistake. There is no such operation, and I worry that anything other than add will get remarkably worse code generation. But that's not for this change.... llvm-svn: 264845	2016-03-30 08:41:59 +00:00
Chandler Carruth	81c3ddeb1c	[x86] Extract a helper function to compute the full addressing mode from an x86 MachineInstr's operands. This will be super useful to fix some bad atomics code in my next commit. No functionality changed. llvm-svn: 264819	2016-03-30 03:10:24 +00:00
Adam Nemet	fb8fbba584	[Aarch64] Turn on the LoopDataPrefetch pass for Cyclone llvm-svn: 264811	2016-03-30 00:21:29 +00:00
Adam Nemet	b81f1e0db3	[PPC] Remove -ppc-loop-prefetch-distance in favor of -prefetch-distance After the previous change, this can now be overridden centrally in the pass. llvm-svn: 264807	2016-03-29 23:45:56 +00:00
Adam Nemet	1428d41f9a	[LoopDataPrefetch] Centralize the tuning cl::opts under the pass This is effectively NFC, minus the renaming of the options (-cyclone-prefetch-distance -> -prefetch-distance). The change was requested by Tim in D17943. llvm-svn: 264806	2016-03-29 23:45:52 +00:00
James Y Knight	7306cd47d4	[SPARC] Use AtomicExpandPass to expand AtomicRMW instructions. They were previously expanded to CAS loops in a custom isel expansion, but AtomicExpandPass knows how to do that generically. Testing is covered by the existing sparc atomics.ll testcases. llvm-svn: 264771	2016-03-29 19:09:54 +00:00
Manman Ren	f46262e0b7	Swift Calling Convention: add swiftself attribute. Differential Revision: http://reviews.llvm.org/D17866 llvm-svn: 264754	2016-03-29 17:37:21 +00:00
Konstantin Zhuravlyov	ecc7cbf611	Test commit access llvm-svn: 264736	2016-03-29 15:15:44 +00:00
Simon Dardis	9a3f32c00d	[mips] Test commit: Mark insertNoop as dead code (NFC) llvm-svn: 264728	2016-03-29 13:02:19 +00:00
Daniel Sanders	5d3840fdf9	[mips] Correct MIPS16 jal/jalx to have uimm26 offsets and add MC layer range checks. NFC. Summary: However, this has no effect at this time because the instructions affected are marked 'isCodeGenOnly=1' and have no alternative for the MC layer. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D18179 llvm-svn: 264712	2016-03-29 09:40:38 +00:00
Elena Demikhovsky	95629caaa9	AVX-512: fixed a bug in fp_to_uint pattern on KNL Fixed fp_to_uint instruction selection on KNL. One pattern was missing for <4 x double> to <4 x i32> Differential Revision: http://reviews.llvm.org/D18512 llvm-svn: 264701	2016-03-29 06:33:41 +00:00
Hal Finkel	fa7057a415	[PowerPC] Refactor popcnt[dw] target features Instead of using two feature bits, one to indicate the availability of the popcnt[dw] instructions, and another to indicate whether or not they're fast, use a single enum. This allows more consistent control via target attribute strings, and via Clang's command line. llvm-svn: 264690	2016-03-29 01:36:01 +00:00
Derek Schuff	ecabac6244	[WebAssembly] Remove duplicate disabling of passes Also put all the disabled passes together llvm-svn: 264684	2016-03-28 22:52:20 +00:00
Hal Finkel	69ada2f514	[PowerPC] Clarify a comment in PPCTTI about vector loads This should say that we could do unaligned vector loads on the P7 using VSX instructions, not that we should. llvm-svn: 264683	2016-03-28 22:39:35 +00:00
Simon Pilgrim	d3df400fa9	[X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all its scalar elements. If all of a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors. The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent. It's not in our interest to start making a general purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least. Differential Revision: http://reviews.llvm.org/D18492 llvm-svn: 264666	2016-03-28 21:33:52 +00:00
Haicheng Wu	6a6bc750d5	[AArch64] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize Mimic what x86 does when optimizing sdiv/udiv for minsize. llvm-svn: 264606	2016-03-28 18:17:07 +00:00
Hal Finkel	7059d41622	[PowerPC] On the A2, popcnt[dw] are very slow The A2 cores support the popcntw/popcntd instructions, but they're microcoded, and slower than our default software emulation. Specifically, popcnt[dw] take approximately 74 cycles, whereas our software emulation takes only 24-28 cycles. I've added a new target feature to indicate a slow popcnt[dw], instead of just removing the existing target feature from the a2/a2q processor models, because: 1. This allows us to return more accurate information via the TTI interface (I recognize that this currently makes no practical difference) 2. Is hopefully easier to understand (it allows the core's features to match its manual while still having the desired effect). llvm-svn: 264600	2016-03-28 17:52:08 +00:00
Derek Schuff	ad154c837e	Introduce MachineFunctionProperties and the AllVRegsAllocated property MachineFunctionProperties represents a set of properties that a MachineFunction can have at particular points in time. Existing examples of this idea are MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which will eventually be switched to use this mechanism. This change introduces the AllVRegsAllocated property; i.e. the property that all virtual registers have been allocated and there are no VReg operands left. With this mechanism, passes can declare that they require a particular property to be set, or that they set or clear properties by implementing e.g. MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class verifies that the requirements are met, and handles the setting and clearing based on the declarations. Passes can also directly query and update the current properties of the MF if they want to have conditional behavior. This change annotates the target-independent post-regalloc passes; future changes will also annotate target-specific ones. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D18421 llvm-svn: 264593	2016-03-28 17:05:30 +00:00
Tom Stellard	a76bcc2ea1	AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions Summary: This helps prevent load clustering from drastically increasing register pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16 bytes was chosen, because it seems like that was the original intent of setting the limit to 4 instructions, but more analysis could show that a different limit is better. This fix yields small decreases in register usage with shader-db, but also helps avoid a large increase in register usage when lane mask tracking is enabled in the machine scheduler, because lane mask tracking enables more opportunities for load clustering. shader-db stats: 2379 shaders in 477 tests Totals: SGPRS: 49744 -> 48600 (-2.30 %) VGPRS: 34120 -> 34076 (-0.13 %) Code Size: 1282888 -> 1283184 (0.02 %) bytes LDS: 28 -> 28 (0.00 %) blocks Scratch: 495616 -> 492544 (-0.62 %) bytes per wave Max Waves: 6843 -> 6853 (0.15 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18451 llvm-svn: 264589	2016-03-28 16:10:13 +00:00
Krzysztof Parzyszek	2d65ea74dc	[Hexagon] Improve handling of unaligned vector loads and stores llvm-svn: 264584	2016-03-28 15:43:03 +00:00
Krzysztof Parzyszek	bb63f66686	[Hexagon] Only use restore functions for single register at -Oz llvm-svn: 264581	2016-03-28 14:52:21 +00:00
Krzysztof Parzyszek	a34901aae9	[Hexagon] Speed up frame lowering when no optimizations are enabled - Do not optimize stack slots in optnone functions. - Get aligned-base register from HexagonMachineFunctionInfo instead of looking for ALIGNA instruction in the function's body. llvm-svn: 264580	2016-03-28 14:42:03 +00:00
Douglas Katzman	d0c11cf7ad	Sparc: silently ignore .proc assembler directive Differential Revision: http://reviews.llvm.org/D18463 llvm-svn: 264579	2016-03-28 14:00:11 +00:00
Jacques Pienaar	fcef3e4617	[lanai] Add Lanai backend. Add the Lanai backend to lib/Target. General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html). Differential Revision: http://reviews.llvm.org/D17011 llvm-svn: 264578	2016-03-28 13:09:54 +00:00
Chuang-Yu Cheng	d5eb774eb6	[Power9] Implement new altivec instructions: bcd* series This patch implements the following altivec instructions: - Decimal Convert From/to National/Zoned/Signed-QWord: bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq. - Decimal Copy-Sign/Set-Sign: bcdcpsgn. bcdsetsgn. - Decimal Shift/Unsigned-Shift/Shift-and-Round: bcds. bcdus. bcdsr. - Decimal (Unsigned) Truncate: bcdtrunc. bcdutrunc. Total 13 instructions Thanks Amehsan's advice! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D17838 llvm-svn: 264568	2016-03-28 09:04:23 +00:00
Chuang-Yu Cheng	80722719eb	[Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat This change implements the following vsx instructions: - Scalar Insert/Extract xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp - Vector Insert/Extract xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp xxextractuw xxinsertw - Scalar/Vector Test Data Class xststdcdp xststdcsp xststdcqp xvtstdcdp xvtstdcsp - Maximum/Minimum xsmaxcdp xsmaxjdp xsmincdp xsminjdp - Vector Byte-Reverse/Permute/Splat xxbrd xxbrh xxbrq xxbrw xxperm xxpermr xxspltib 30 instructions Thanks Nemanja for invaluable discussion! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16842 llvm-svn: 264567	2016-03-28 08:34:28 +00:00
Elena Demikhovsky	83f0647d85	AVX-512: Fixed ICMP instruction selection for i1 operands ICMP instruction selection fails on SKX and KNL for i1 operand. I use XOR to resolve: (A == B) is equivalent to (A xor B) == 0 Differential Revision: http://reviews.llvm.org/D18511 llvm-svn: 264566	2016-03-28 07:47:58 +00:00
Chuang-Yu Cheng	5663848996	[Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic This change implements the following vsx instructions: - quad-precision move xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp - quad-precision fp-arithmetic xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o) xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o) 22 instructions Thanks Nemanja and Kit for careful review and invaluable discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16110 llvm-svn: 264565	2016-03-28 07:38:01 +00:00
Hal Finkel	0b37175ca6	[PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc function calls (fmax[f], etc.) generally map to function calls after lowering. For some vector types with QPX at least, however, we can legally lower these, and we don't need to prohibit CTR-based loops on their account. It turned out, however, that the logic that checked the opcodes associated with intrinsics was broken (it would set the Opcode variable, but that variable was later checked only if set for some otherwise-external function call). This fixes the latter problem and adds the FMAX/MINNUM mappings. llvm-svn: 264532	2016-03-27 05:40:56 +00:00
Simon Pilgrim	dcdf85033c	[X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization llvm-svn: 264517	2016-03-26 18:32:13 +00:00
Simon Pilgrim	e4dbeb40c6	[X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264512	2016-03-26 15:44:55 +00:00
Simon Pilgrim	3eef33a806	[X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors Currently this is mainly to prevent scalarization of integer division by constants. Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264511	2016-03-26 15:27:20 +00:00
Simon Pilgrim	7379a70677	[X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies. Only pre-AVX512BW targets need to split v32i8 vectors. llvm-svn: 264509	2016-03-26 09:44:27 +00:00
David Majnemer	b549ab02b4	[PowerPC] Disable the CTR optimization in the presence of {min,max}num The minnum and maxnum intrinsics get lowered to libcalls which invalidates the CTR optimization. This fixes PR27083. llvm-svn: 264508	2016-03-26 09:42:31 +00:00
Simon Pilgrim	ff7b7141cd	[X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC. LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead. llvm-svn: 264506	2016-03-26 09:29:04 +00:00
Chuang-Yu Cheng	065969ec8e	[Power9] Implement new altivec instructions: permute, count zero, extend sign, negate, parity, shift/rotate, mul10 This change implements the following vector operations: - vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw - vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d - vnegd vnegw - vprtybd vprtybq vprtybw - vbpermd vpermr - vrlwnm vrlwmi vrldnm vrldmi vslv vsrv - vmul10cuq vmul10uq vmul10ecuq vmul10euq 28 instructions Thanks Nemanja, Kit for invaluable hints and discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan Phabricator: http://reviews.llvm.org/D15887 llvm-svn: 264504	2016-03-26 05:46:11 +00:00
David Majnemer	020e890a19	[X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr We forgot to add the second machine operand to our ADJCALLSTACKDOWN, resulting in crashes in PEI. This fixes PR27071. llvm-svn: 264465	2016-03-25 21:49:11 +00:00
Saleem Abdulrasool	750a90df6a	ARM: maintain BB ordering when expanding WIN__DBZCHK It is possible to have a fallthrough MBB prior to MBB placement. The original addition of the BB would result in reordering the BB as not preceding the successor. Because of the fallthrough nature of the BB, we could end up executing incorrect code or even a constant pool island! Insert the spliced BB into the same location to avoid that. Thanks to Tim Northover for invaluable hints and Fiora for the discussion on what may have been occurring! llvm-svn: 264454	2016-03-25 19:48:06 +00:00
Justin Bogner	f2a0d349a6	AMDGPU: Fix a use-after free and a missing break We're erasing MI here, but then immediately using it again inside the `if`. This moves the erase after we're done using it. Doing that reveals a second problem though - this case is missing a break, so we fall through to the default and dereference MI again. This is obviously a bug, though I don't know how to write a test that triggers it - all we do in the error case is print some extra debug output. Both of these issues crash on lots of tests under ASAN with the recycling allocator changes from PR26808 applied. llvm-svn: 264442	2016-03-25 18:33:16 +00:00
Hans Wennborg	5f916d3df4	[X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsize 64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes, respectively, whereas and/or with 8-bit immediate is only three bytes. Since these instructions imply an additional memory read (which the CPU could elide, but we don't think it does), restrict these patterns to minsize functions. Differential Revision: http://reviews.llvm.org/D18374 llvm-svn: 264440	2016-03-25 18:11:31 +00:00
Jonas Paulsson	5dd1e56de5	[SystemZ] Remove isBranch and isTerminator flags on BRCT and BRCTG. The BranchUnaryRI instruction class already sets these flags. Reviewed by Ulrich Weigand. llvm-svn: 264411	2016-03-25 15:42:30 +00:00
Chad Rosier	59bcbba6b4	[AArch64] Fix typo. NFC. llvm-svn: 264408	2016-03-25 14:37:43 +00:00
Simon Pilgrim	ac04923b0f	[X86][SSE] Don't duplicate Lower256IntArith functionality in LowerShift. NFC. LowerShift was using the same code as Lower256IntArith to split 256-bit vectors into 2 x 128-bit vectors, so now we just call Lower256IntArith. llvm-svn: 264403	2016-03-25 14:17:54 +00:00
Elena Demikhovsky	abc9c04ab7	fixed typo llvm-svn: 264395	2016-03-25 10:08:36 +00:00
Matt Arsenault	8c8fcb2585	AMDGPU: Cost model for basic integer operations This resolves bug 21148 by preventing promotion to i64 induction variables. llvm-svn: 264376	2016-03-25 01:16:40 +00:00
Hans Wennborg	4ae5119eeb	X86: Use push-pop for materializing 8-bit immediates for minsize (take 2) This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023). Differential Revision: http://reviews.llvm.org/D18246 llvm-svn: 264375	2016-03-25 01:10:56 +00:00
Matt Arsenault	9651813ee0	AMDGPU: Partially implement getArithmeticInstrCost for FP ops llvm-svn: 264374	2016-03-25 01:00:32 +00:00
Duncan P. N. Exon Smith	1d15a9f0c9	IR: Reserve an MDKind for !llvm.loop; NFC This reserves an MDKind for !llvm.loop, which allows callers to avoid a string-based lookup. I'm not sure why it was missing. There should be no functionality change here, just a small compile-time speedup. llvm-svn: 264371	2016-03-25 00:35:38 +00:00
Saleem Abdulrasool	0dab98d926	ARM: fix optimised division on WoA We did not have an explicit branch to the continuation BB. When the check was hoisted, this could permit control flow to fall through into the division trap. Add the explicit branch to the continuation basic block to ensure that code execution is correct. llvm-svn: 264370	2016-03-25 00:34:11 +00:00
Matt Arsenault	59767cea79	AMDGPU: TTI: Make insertelement free. We don't want to have a cost to scalarizing operations. llvm-svn: 264364	2016-03-25 00:14:11 +00:00
Eric Christopher	b979d51afa	Finish the incomplete 'd' inline asm constraint support for PPC by making sure we give it a register and mark it as a register constraint. llvm-svn: 264340	2016-03-24 21:04:52 +00:00
Krzysztof Parzyszek	01598de3ec	[Hexagon] Be sure to treat subregisters of a CSR as CSRs as well llvm-svn: 264331	2016-03-24 20:31:41 +00:00
Krzysztof Parzyszek	c9d4caa32c	[Hexagon] Add support for run-time stack overflow checking Patch by Sundeep Kushwaha. llvm-svn: 264328	2016-03-24 20:20:07 +00:00
Krzysztof Parzyszek	181fdbd174	[Hexagon] Generate PIC-specific versions of save/restore routines In PIC mode, the registers R14, R15 and R28 are reserved for use by the PLT handling code. This causes all functions to clobber these registers. While this is not new for regular function calls, it does also apply to save/restore functions, which do not follow the standard ABI conventions with respect to the volatile/non-volatile registers. Patch by Jyotsna Verma. llvm-svn: 264324	2016-03-24 19:18:48 +00:00
Simon Atanasyan	26fe92d19f	[MC][mips] Add MipsMCInstrAnalysis class and register it as MC instruction analyzer The `MipsMCInstrAnalysis` class overrides the `evaluateBranch` method and calculates target addresses for branch and calls instructions. That allows llvm-objdump to print functions' names in branch instructions in the disassemble mode. Differential Revision: http://reviews.llvm.org/D18209 llvm-svn: 264309	2016-03-24 17:18:14 +00:00
Simon Pilgrim	a6ba27fbde	[X86][XOP] Fixed instruction postfixes to more closely match operands Suggested by Sanjay in D18189 as the multiple folding options in XOP instructions can be tricky llvm-svn: 264305	2016-03-24 16:31:30 +00:00
Elena Demikhovsky	95f3173ce9	AVX-512: Generate KTEST instead of TEST for i1 vectors KTEST instruction may be used instead of TEST in this case: %int_sel3 = bitcast <8 x i1> %sel3 to i8 %res = icmp eq i8 %int_sel3, zeroinitializer br i1 %res, label %L2, label %L1 Differential Revision: http://reviews.llvm.org/D18444 llvm-svn: 264298	2016-03-24 15:53:45 +00:00
Tim Northover	4498eff9bb	CodeGen: extend RHS when splitting ATOMIC_CMP_SWAP_WITH_SUCCESS. If the operation's type has been promoted during type legalization, we need to account for the fact that the high bits of the comparison operand are likely unspecified. The LHS is usually zero-extended, but MIPS sign extends it, so we have to be slightly careful. Patch by Simon Dardis. llvm-svn: 264296	2016-03-24 15:38:38 +00:00
Tom Stellard	9babad25e5	AMDGPU/SI: Add Polaris support Patch By: Sonny Jiang llvm-svn: 264295	2016-03-24 15:31:05 +00:00
Simon Pilgrim	d7c4fce47d	[X86][XOP] Merged 128/256 bit 4op instruction definitions. NFCI. llvm-svn: 264294	2016-03-24 15:28:02 +00:00
Daniel Sanders	15f8fb6f83	[mips] Range check vsplat_simm5 and vsplat_simm10 Summary: Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D18177 llvm-svn: 264287	2016-03-24 14:53:40 +00:00
Nemanja Ivanovic	5ebc92dbe1	[PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode This patch corresponds to review: http://reviews.llvm.org/D17711 It disables direct moves on these operations in 32-bit mode since the patterns assume 64-bit registers. The final patch is slightly different from the Phabricator review as the bitcast operations needed to be disabled in 32-bit mode as well. This fixes PR26617. llvm-svn: 264282	2016-03-24 13:40:33 +00:00
Daniel Sanders	837f15187b	[mips] Range check simm10 Summary: Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D18148 llvm-svn: 264279	2016-03-24 13:26:59 +00:00
Simon Pilgrim	572ca71573	[X86][XOP] Support for VPPERM byte shuffle instruction This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode. Differential Revision: http://reviews.llvm.org/D18189 llvm-svn: 264260	2016-03-24 11:52:43 +00:00
Daniel Sanders	f692130216	[mips] Tidy up cnMIPS tablegen definitions. NFC. Summary: In particular, make the cnMIPS predicates much more obvious and prefer def ... : ... { let Foo = bar; } over: let Foo = bar in def ... : ...; Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18354 llvm-svn: 264258	2016-03-24 11:40:48 +00:00
Vasileios Kalintiris	b8a37205d2	Fix sequence point warning. NFC. llvm-svn: 264255	2016-03-24 10:53:28 +00:00
Zlatko Buljan	94af4cbcf4	[mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions Differential Revision: http://reviews.llvm.org/D17137 llvm-svn: 264248	2016-03-24 09:22:45 +00:00
Hrvoje Varga	2cb74ac3c3	[mips][microMIPS] Implement MTC, MTHC and DMTC* instructions Differential Revision: http://reviews.llvm.org/D17328 llvm-svn: 264246	2016-03-24 08:02:09 +00:00
Hrvoje Varga	dbea1a1e51	[mips][microMIPS] Fix for "Cannot copy registers" assertion Differential Revision: http://reviews.llvm.org/D17068 llvm-svn: 264245	2016-03-24 06:05:35 +00:00
Paul Robinson	f81836bd18	[PS4] Guarantee an instruction after a 'noreturn' call. We need the "return address" of a noreturn call to be within the bounds of the calling function; TrapUnreachable turns 'unreachable' into a 'ud2' instruction, which has that desired effect. Differential Revision: http://reviews.llvm.org/D18414 llvm-svn: 264224	2016-03-24 00:10:03 +00:00
Matt Arsenault	30d37a74da	AMDGPU: Remove atomic inc/dec patterns There is no benefit to these since materializing the constant 1 requires the same number of instructions as materializing uint_max llvm-svn: 264215	2016-03-23 23:23:38 +00:00
Matt Arsenault	0a30e456b4	AMDGPU: Promote alloca should skip volatiles llvm-svn: 264214	2016-03-23 23:17:29 +00:00
Matt Arsenault	f43c2a0b49	AMDGPU: Insert moves of frame index to value operands Strengthen tests of storing frame indices. Right now this just creates irrelevant scheduling changes. We don't want to have multiple frame index operands on an instruction. There seem to be various assumptions that at least the same frame index will not appear twice in the LocalStackSlotAllocation pass. There's no reason to have this happen, and it just makes it easy to introduce bugs where the immediate offset is applied to the storing instruction when it should really be applied to the value being stored as a separate add. This might not be sufficient. It might still be problematic to have an add fi, fi situation, but that's even less likely to happen in real code. llvm-svn: 264200	2016-03-23 21:49:25 +00:00
Cong Hou	94710840fb	Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case: jne .BB1 jp .BB1 jmp .BB2 .BB1: ... .BB2: ... AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed: jne .BB1 jnp .BB2 .BB1: ... .BB2: ... However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there are no defined negation conditions for them. In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization. Differential Revision: http://reviews.llvm.org/D11393 llvm-svn: 264199	2016-03-23 21:45:37 +00:00
Sanjay Patel	7876f180b5	[x86] make peekThroughBitcasts() a helper function This should be hoisted further up so it can be used in DAGCombiner and other backends, but I'm limiting the scope in the interest of patch minimalism. It's not quite NFC because some of the replaced code was using an 'if' check rather than a 'while' loop, so those cases would only look through a single bitcast. llvm-svn: 264186	2016-03-23 20:16:37 +00:00
Chad Rosier	85c8594056	[AArch64] Replace return 0 with return false. NFC. llvm-svn: 264185	2016-03-23 20:07:28 +00:00
Kyle Butt	613112826e	Codegen: [PPC] Word Rotates are Zero Extending. Add Word rotates to the list of instructions that are zero extending. This allows them to be used in dot form to compare with zero. llvm-svn: 264183	2016-03-23 19:51:22 +00:00
Artyom Skrobov	e6f1b7f094	Replace a string comparison in ARMSubtarget.h with a tablegen entry in ARM.td (NFC) Reviewers: rengolin, t.p.northover Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D18393 llvm-svn: 264165	2016-03-23 16:18:13 +00:00
Oliver Stannard	aa77b1e025	[AArch64] Replace some uses of report_fatal_error with reportError in AArch64 ELF object writer If we can't handle a relocation type, report it as an error in the source, rather than asserting. I've added a more descriptive message and a test for the only cases of this that I've been able to trigger. Differential Revision: http://reviews.llvm.org/D18388 llvm-svn: 264156	2016-03-23 13:45:03 +00:00
Andrey Turetskiy	6a3d561ea0	[X86] Introduction of FeatureX87. Add FeatureX87 in X86 backend to be able to define CPUs which don't have x87. Differential Revision: http://reviews.llvm.org/D13979 llvm-svn: 264148	2016-03-23 11:13:54 +00:00
Hrvoje Varga	c45baf212a	[mips][microMIPS] Delay slot filler modifications Differential Revision: http://reviews.llvm.org/D18181 llvm-svn: 264147	2016-03-23 10:29:38 +00:00
Valery Pykhtin	c0a77c5064	[AMDGPU] Fix missing assembler predicates. Differential Revision: http://reviews.llvm.org/D18351 llvm-svn: 264137	2016-03-23 04:27:26 +00:00
Tom Stellard	52ecd2d69b	AMDGPU: Cache information about register pressure sets We can statically decide whether or not a register pressure set is for SGPRs or VGPRs, so we don't need to re-compute this information in SIRegisterInfo::getRegPressureSetLimit(). Differential Revision: http://reviews.llvm.org/D14805 llvm-svn: 264126	2016-03-23 01:53:22 +00:00
Joerg Sonnenberger	772bb5b65d	Typo llvm-svn: 264110	2016-03-22 22:24:52 +00:00
Dan Gohman	665d7e3838	[WebAssembly] Implement the rotate instructions. llvm-svn: 264076	2016-03-22 18:01:49 +00:00
Simon Pilgrim	25fb4177fb	[X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions. We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT. Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718). Reapplied with a fix for PR26953 (missing vector widening legalization). Differential Revision: http://reviews.llvm.org/D17932 llvm-svn: 264062	2016-03-22 16:22:08 +00:00
Daniel Sanders	f3599eb683	[mips] Make simm6 consistent with the rest. NFC. Summary: Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18147 llvm-svn: 264057	2016-03-22 14:50:22 +00:00
Daniel Sanders	97297770a6	[mips] Range check simm7. Summary: Also renamed li_simm7 to li16_imm since it's not a simm7 and has an unusual encoding (it's a uimm7 except that 0x7f represents -1). Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18145 llvm-svn: 264056	2016-03-22 14:40:00 +00:00
Daniel Sanders	0f17d0da4a	[mips] Range check simm5. Summary: We can't check the error message for this one because there's another lw/sw available that covers a larger range. We therefore check the transition between the two sizes. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D18144 llvm-svn: 264054	2016-03-22 14:29:53 +00:00
Daniel Sanders	946dee3b5b	[mips] Range check vsplat_uimm[1234568]. Summary: Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18143 llvm-svn: 264053	2016-03-22 14:17:41 +00:00
Daniel Sanders	93fa4ce9b7	[mips] Range check uimm4_ptr, remove uimm6_ptr, and use correctly sized immediates in MSA copy/insert. Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18142 llvm-svn: 264052	2016-03-22 13:58:53 +00:00
Nicolai Haehnle	0a33abdfd2	AMDGPU: Fix dangling references introduced by r263982 Fixes Valgrind errors on the test cases that were reported as failing by buildbots. llvm-svn: 264000	2016-03-21 22:54:02 +00:00
Duncan P. N. Exon Smith	20be876a64	Fix -Wdocumentation warnings from r263853 Thanks to chapuni for catching this. llvm-svn: 263993	2016-03-21 22:13:44 +00:00
Nicolai Haehnle	a56e6b6a53	AMDGPU: Coding style fixes I meant to add these before committing r263982 as per the review, but I forgot to squash. llvm-svn: 263983	2016-03-21 20:39:24 +00:00
Nicolai Haehnle	213e87f2ee	AMDGPU: Add SIWholeQuadMode pass Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 llvm-svn: 263982	2016-03-21 20:28:33 +00:00
Krzysztof Parzyszek	b14f4fd0de	[Hexagon] Add handling fixups and instruction relaxation llvm-svn: 263981	2016-03-21 20:27:17 +00:00
Krzysztof Parzyszek	c6f1e1a709	[Hexagon] Properly encode registers in duplex instructions llvm-svn: 263980	2016-03-21 20:13:33 +00:00
Krzysztof Parzyszek	6514a887f4	[Hexagon] Fix reserving emergency spill slots for register scavenger - R10 and R11 are not reserved registers. - Check for reserved registers when finding unused caller-saved registers. llvm-svn: 263977	2016-03-21 19:57:08 +00:00
Dan Gohman	c8d7f14506	[WebAssembly] Implement the eqz instructions. llvm-svn: 263976	2016-03-21 19:54:41 +00:00
Tom Stellard	92339e888f	AMDGPU/SI: Fix threshold calculation for branching when exec is zero Summary: When control flow is implemented using the exec mask, the compiler will insert branch instructions to skip over the masked section when exec is zero if the section contains more than a certain number of instructions. The previous code would only count instructions in successor blocks, and this patch modifies the code to start counting instructions in all blocks between the start and end of the branch. Reviewers: nhaehnle, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18282 llvm-svn: 263969	2016-03-21 18:56:58 +00:00
Chad Rosier	cf173ffb46	[AArch64] Add a helpful assert. NFC. llvm-svn: 263965	2016-03-21 18:04:10 +00:00
Matt Arsenault	cb38a6bd35	AMDGPU: Remove SignBitIsZero for mubuf scratch offsets These instructions do not have the same negative base address problem that DS instructions do on SI. llvm-svn: 263964	2016-03-21 18:02:18 +00:00
Peter Collingbourne	86b9fbe980	ARM: Better codegen for 64-bit compares. This introduces a custom lowering for ISD::SETCCE (introduced in r253572) that allows us to emit a short code sequence for 64-bit compares. Before: push {r7, lr} cmp r0, r2 mov.w r0, #0 mov.w r12, #0 it hs movhs r0, #1 cmp r1, r3 it ge movge.w r12, #1 it eq moveq r12, r0 cmp.w r12, #0 bne .LBB1_2 @ BB#1: @ %bb1 bl f pop {r7, pc} .LBB1_2: @ %bb2 bl g pop {r7, pc} After: push {r7, lr} subs r0, r0, r2 sbcs.w r0, r1, r3 bge .LBB1_2 @ BB#1: @ %bb1 bl f pop {r7, pc} .LBB1_2: @ %bb2 bl g pop {r7, pc} Saves around 80KB in Chromium's libchrome.so. Some notes on this patch: - I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I introduced (nothing else needs them). However, they are necessary in order to avoid poor codegen, and they seem similar to existing combines in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to (brcond Compare)). - No support for Thumb-1. This is in principle possible, but we'd need to implement ARMISD::SUBE for Thumb-1. Differential Revision: http://reviews.llvm.org/D15256 llvm-svn: 263962	2016-03-21 18:00:02 +00:00
Renato Golin	2b6b7ffd6c	[ARM] Add Cortex-A32 support Adding Cortex-A32 as an available target in the ARM backend. Patch by Sam Parker. llvm-svn: 263956	2016-03-21 17:29:01 +00:00
Matt Arsenault	b96b57347a	AMDGPU: Add frexp_mant intrinsic llvm-svn: 263948	2016-03-21 16:11:05 +00:00
Chad Rosier	4aeab5fbf2	[AArch64] Fix a -Wdocumentation warning. NFC. llvm-svn: 263942	2016-03-21 13:43:58 +00:00
Jingyue Wu	1375560bdb	[NVPTX] Adds a new address space inference pass. Summary: The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable to convert the address space of a pointer induction variable. This patch adds a new pass called NVPTXInferAddressSpaces that overcomes that limitation using a fixed-point data-flow analysis (see the file header comments for details). The new pass is experimental and not enabled by default. Users can turn it on by setting the -nvptx-use-infer-addrspace flag of llc. Reviewers: jholewinski, tra, jlebar Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D17965 llvm-svn: 263916	2016-03-20 20:59:20 +00:00
Simon Pilgrim	fcc4532afa	[X86][SSE] Tidyup setTargetShuffleZeroElements to match computeZeroableShuffleElements Based on feedback for D14261 llvm-svn: 263911	2016-03-20 17:43:07 +00:00
Simon Pilgrim	c44472a5bc	[X86][SSE] Detect zeroable shuffle elements from different value types Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask. Differential Revision: http://reviews.llvm.org/D14261 llvm-svn: 263906	2016-03-20 15:45:42 +00:00
Igor Breger	3ea8af5108	AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR Differential Revision: http://reviews.llvm.org/D18211 llvm-svn: 263898	2016-03-20 13:09:43 +00:00
Michael Kuperstein	048cc3b7a8	Use a range-based for loop. NFC. llvm-svn: 263889	2016-03-20 00:16:13 +00:00
Manman Ren	a3a019cf90	[CXX_FAST_TLS] Fix issues in ARM. We need to be careful on which registers can be explicitly handled via copies. Prologue, Epilogue use physical registers and if one belongs to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites it after the explicit copies. llvm-svn: 263857	2016-03-18 23:44:37 +00:00
Manman Ren	4865d89653	[CXX_FAST_TLS] Disable tail call when calling conventions are mismatched. Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller and callee have mismatched calling conventions. llvm-svn: 263856	2016-03-18 23:41:51 +00:00
Manman Ren	2828c57b6f	[CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86. Since at O0, explicit copies via SplitCSR may not be removed even if they are unnecessary, we choose not to use SplitCSR at O0. llvm-svn: 263855	2016-03-18 23:38:49 +00:00
Duncan P. N. Exon Smith	c3fa1eded2	AArch64: Don't modify other modules in AArch64PromoteConstant Avoid modifying other modules in `AArch64PromoteConstant` when the constant is `ConstantData` (a horrible accident, I'm sure, caught by an experimental follow-up to r261464). Previously, this walked through all the users of a constant, but that reaches into other modules when the constant doesn't depend transitively on a `GlobalValue`! Since we're walking instructions anyway, just modify the instructions we actually see. As a drive-by, instead of storing `Use` and getting the instructions again via `Use::getUser()` (which is not a constant time lookup), store `std::pair<Instruction, unsigned>`. Besides being cheaper, this makes it easier to drop use-lists from `ConstantData` in the future. (I threw this in because I was touching all the code anyway.) Because the patch completely changes the traversal logic, it looks like a rewrite of the pass, but the core logic is all the same (or should be, minus the out-of-module changes). In other words, there should be NFC as long as the LLVMContext only has a single Module. I didn't think of a good way to test this, but I hope to submit a patch eventually that makes walking these use-lists illegal/impossible. llvm-svn: 263853	2016-03-18 23:30:54 +00:00
Alexei Starovoitov	7e453bb8be	BPF: emit an error message for unsupported signed division operation Signed-off-by: Yonghong Song <yhs@plumgrid.com> Signed-off-by: Alexei Starovoitov <ast@fb.com> llvm-svn: 263842	2016-03-18 22:02:47 +00:00
Nicolai Haehnle	fa771811b3	AMDGPU: add missing braces around multi-line if block This fixes an issue with rL263658 pointed out by Tom Stellard. llvm-svn: 263823	2016-03-18 20:32:04 +00:00
Chad Rosier	cdfd7e7201	[AArch64] Enable more load clustering in the MI Scheduler. This patch adds unscaled loads and sign-extend loads to the TII getMemOpBaseRegImmOfs API, which is used to control clustering in the MI scheduler. This is done to create more opportunities for load pairing. I've also added the scaled LDRSWui instruction, which was missing from the scaled instructions. Finally, I've added support in shouldClusterLoads for clustering adjacent sext and zext loads that too can be paired by the load/store optimizer. Differential Revision: http://reviews.llvm.org/D18048 llvm-svn: 263819	2016-03-18 19:21:02 +00:00
Nicolai Haehnle	95e8ffd398	AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format Summary: Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before the frontend patches land in Mesa. Eventually, we may want to automatically reduce the size of loads at the LLVM IR level, which requires such overloads, and in some cases Mesa can generate them directly. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18255 llvm-svn: 263792	2016-03-18 16:24:40 +00:00
Nicolai Haehnle	ad63638f6d	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 llvm-svn: 263791	2016-03-18 16:24:31 +00:00
Nicolai Haehnle	3003ba00a3	AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format Summary: We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend cannot easily make use of an explicit soffset parameter either. Furthermore, it is likely that in the future, LLVM will be in a better position than the frontend to choose an SGPR offset if possible. Since there aren't any frontend uses of these intrinsics in upstream repositories yet, I would like to take this opportunity to change the intrinsic signatures to a single offset parameter, which is then selected to immediate offsets or voffsets using a ComplexPattern. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18218 llvm-svn: 263790	2016-03-18 16:24:20 +00:00
Sam Kolton	a74cd526e9	[AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3 Review: http://reviews.llvm.org/D18267 llvm-svn: 263789	2016-03-18 15:35:51 +00:00
Ehsan Amiri	631ed04af0	adding another optimization opportunity to readme file llvm-svn: 263775	2016-03-18 04:02:25 +00:00
Adam Nemet	709e3046ee	[LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead Summary: It can hurt performance to prefetch ahead too much. Be conservative for now and don't prefetch ahead more than 3 iterations on Cyclone. Reviewers: hfinkel Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17949 llvm-svn: 263772	2016-03-18 00:27:43 +00:00
Adam Nemet	6d8beeca53	[LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses Summary: And use this TTI for Cyclone. As it was explained in the original RFC (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW prefetcher works up to 2KB strides. I am also adding tests for this and the previous change (D17943): * Cyclone prefetching accesses with a large stride * Cyclone not prefetching accesses with a small stride * Generic Aarch64 subtarget not prefetching either Reviewers: hfinkel Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17945 llvm-svn: 263771	2016-03-18 00:27:38 +00:00
Adam Nemet	53e758fc55	[Aarch64] Add pass LoopDataPrefetch for Cyclone Summary: This wires up the pass for Cyclone but keeps it off for now because we need a few more TTIs. The getPrefetchMinStride value is not very well tuned right now but it works well with CFP2006/433.milc which motivated this. Tests will be added as part of the upcoming large-stride prefetching patch. Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, hfinkel, rengolin Differential Revision: http://reviews.llvm.org/D17943 llvm-svn: 263770	2016-03-18 00:27:29 +00:00
Tim Shen	5cdf75084a	[PPC, FastISel] Fix ordered/unordered fcmp For fcmp, major concern about the following 6 cases is NaN result. The comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered), only one of which will be set. The result is generated by fcmpu instruction. However, bc instruction only inspects one of the first 3 bits, so when un is set, bc instruction may jump to an undesired place. More specifically, if we expect an unordered comparison and un is set, we expect to always go to true branch; in such case UEQ, UGT and ULT still give false, which are undesired; but UNE, UGE, ULE happen to give true, since they are tested by inspecting !eq, !lt, !gt, respectively. Similarly, for ordered comparison, when un is set, we always expect the result to be false. In such case OGT, OLT and OEQ are good, since they are actually testing GT, LT, and EQ respectively, which are false. OGE, OLE and ONE are tested through !lt, !gt and !eq, and these are true. llvm-svn: 263753	2016-03-17 22:27:58 +00:00
Tim Northover	498c56c240	ARM: stop asserting on weird <3 x Ty> vectors in ISelLowering. llvm-svn: 263741	2016-03-17 20:10:28 +00:00
Petar Jovanovic	0b44f24033	[PowerPC] Disable CTR loops optimization for soft float operations This patch prevents CTR loops optimization when using soft float operations inside loop body. Soft float operations use function calls, but function calls are not allowed inside CTR optimized loops. Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D17600 llvm-svn: 263727	2016-03-17 17:11:33 +00:00
Derek Schuff	d4207ba0f6	[WebAssembly] Stackify code emitted by eliminateFrameIndex and SP writeback Summary: MRI::eliminateFrameIndex can emit several instructions to do address calculations; these can usually be stackified. Because instructions with FI operands can have subsequent operands which may be expression trees, find the top of the leftmost tree and insert the code before it, to keep the LIFO property. Also use stackified registers when writing back the SP value to memory in the epilog; it's unnecessary because SP will not be used after the epilog, and it results in better code. Differential Revision: http://reviews.llvm.org/D18234 llvm-svn: 263725	2016-03-17 17:00:29 +00:00
Changpeng Fang	234fcb81d3	AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute Summary: ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed. Reviewers arsenm, tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18197 llvm-svn: 263720	2016-03-17 16:43:50 +00:00
Nicolai Haehnle	79cad857a0	AMDGPU: mark atomic instructions as sources of divergence Summary: As explained by the comment, threads will typically see different values returned by atomic instructions even if the arguments are equal. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18156 llvm-svn: 263719	2016-03-17 16:21:59 +00:00
Simon Pilgrim	0f37fbac51	[X86][SSE] Simplified blend-with-zero combining We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in an endless loop of contrasting combines. This patch stops the combine if we already have a blend in place (means we miss some domain corrections) llvm-svn: 263717	2016-03-17 15:59:36 +00:00
Saleem Abdulrasool	071a099102	ARM: Revert SVN r253865, 254158, fix windows division The two changes together weakened the test and caused a regression with division handling in MSVC mode. They were applied to avoid an assertion being triggered in the block frequency analysis. However, the underlying problem was simply being masked rather than solved properly. Address the actual underlying problem and revert the changes. Rather than analyze the cause of the assertion, the division failure was assumed to be an overflow. The underlying issue was a subtle bug in the BB construction in the emission of the div-by-zero check (WIN__DBZCHK). We did not construct the proper successor information in the basic blocks, nor did we update the PHIs associated with the basic block when we split them. This would result in assertions being triggered in the block frequency analysis pass. Although the original tests are being removed, the tests themselves performed very little in terms of validation but merely tested that we did not assert when generating code. Update this with new tests that actually ensure that we do not regress on the code generation. llvm-svn: 263714	2016-03-17 14:10:49 +00:00
Simon Atanasyan	58ee875296	[mips] Use `formatImm` call to print immediate value in the `MipsInstPrinter` That allows, for example, to print hex-formatted immediates using llvm-objdump --print-imm-hex command line option. Differential Revision: http://reviews.llvm.org/D18195 llvm-svn: 263704	2016-03-17 10:43:36 +00:00
Scott Egerton	d65377da78	[mips] Eliminate instances of "potentially uninitialised local variable" warnings, NFC Summary: This should eliminate all occurrences of this within LLVMMipsAsmParser. This patch is in response to http://reviews.llvm.org/D17983. I was unable to reproduce the warnings on my machine so please advise if this fixes the warnings. Reviewers: ariccio, vkalintiris, dsanders Subscribers: dblaikie, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18087 llvm-svn: 263703	2016-03-17 10:37:51 +00:00
James Y Knight	f44fc5219f	Tweak some atomics functions in preparation for larger changes; NFC. - Rename getATOMIC to getSYNC, as llvm will soon be able to emit both '__sync' libcalls and '__atomic' libcalls, and this function is for the '__sync' ones. - getInsertFencesForAtomic() has been replaced with shouldInsertFencesForAtomic(Instruction), so that the decision can be made per-instruction. This functionality will be used soon. - emitLeadingFence/emitTrailingFence are no longer called if shouldInsertFencesForAtomic returns false, and thus don't need to check the condition themselves. llvm-svn: 263665	2016-03-16 22:12:04 +00:00
Nicolai Haehnle	ef160de3e5	AMDGPU: Prevent uniform loops from becoming infinite Summary: Uniform loops where the branch leaving the loop is predicated on VCCNZ must be skipped if EXEC = 0, otherwise they will be infinite. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18137 llvm-svn: 263658	2016-03-16 20:14:33 +00:00
Colin LeMahieu	bb0cdfb9f7	[Hexagon] Adding missing break in switch statement. Extra operands would have been appended to the end. llvm-svn: 263657	2016-03-16 20:00:38 +00:00
Sanjay Patel	be37e62e0c	fix function names; NFC llvm-svn: 263646	2016-03-16 18:00:09 +00:00
Michel Danzer	302f83ac4e	AMDGPU: Verify instructions in non-debug builds as well And emit an error if it fails. This prevents illegal instructions from getting sent to the GPU, which would potentially result in a hang. This is a candidate for the stable branch(es). Reviewed-by: Marek Olšák <marek.olsak@amd.com> llvm-svn: 263627	2016-03-16 09:10:42 +00:00
Michel Danzer	beb79ceb19	AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 263626	2016-03-16 09:10:35 +00:00
Igor Breger	0ba7b04f5f	AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512bit vector because PCMPGT supported only for 128/256bit. Differential Revision: http://reviews.llvm.org/D18204 llvm-svn: 263624	2016-03-16 08:48:26 +00:00
Davide Italiano	dfdf278ebf	[MC] Rename TLSDESC as it's not ARM specific. Similarly to what was done for TLSCALL in r263515. llvm-svn: 263564	2016-03-15 17:29:52 +00:00
Changpeng Fang	01f6062227	AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS Summary: Static LDS size is saved in MachineFunctionInfo::LDSSize, We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter, we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize. Reviewers: arsenm tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18064 llvm-svn: 263563	2016-03-15 17:28:44 +00:00
Douglas Katzman	708eeb0519	Myriad: Add new sparc CPU kinds. llvm-svn: 263557	2016-03-15 16:41:47 +00:00
Davide Italiano	249c45d92e	[MC] Rename TLSCALL as it's not ARM specific. `MCSymbolRefExpr` variant kind for TLSCALL is prefixed with _ARM_ since this is how it was originally implemented. The X86_64 version is exactly the same so there's no reason to create a new variant, we can just rename the existing one to be machine-independent. This generalization is the first step to implement support for GNU2 TLS dialect in MC. Differential Revision: http://reviews.llvm.org/D18160 llvm-svn: 263515	2016-03-15 00:25:22 +00:00
Eric Christopher	da8b3f1914	Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware" as it seems to be causing crashes during code generation in halide. PR forthcoming. This reverts commit r263303. llvm-svn: 263512	2016-03-14 23:59:57 +00:00
Ulrich Weigand	52aa7fba3f	[SystemZ] Add missing isBranch flags to certain instruction Some instructions were missing isBranch, isCall, or isTerminator flags. This didn't really affect code generation since most of the affected patterns were used only for the AsmParser and/or disassembler. However, it could affect tools using the MC layer to disassemble and parse binary code (e.g. via MCInstrDesc::mayAffectControlFlow). llvm-svn: 263478	2016-03-14 20:16:30 +00:00
Chad Rosier	27c352d26d	[AArch64] Refactor AArch64FrameLowering::emitPrologue. NFC. http://reviews.llvm.org/D18125 Patch by Aditya Kumar. llvm-svn: 263461	2016-03-14 18:24:34 +00:00
Chad Rosier	6d98655070	[AArch64] Break the dependency between FP and SP when possible. When the SP is not changed because of realignment/VLAs etc., we restore the SP by using the previous value of SP and not the FP. Breaking the dependency will help in cases when the epilog of a callee is close to the epilog of the caller; for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of the callee. http://reviews.llvm.org/D18060 Patch by Aditya Kumar and Evandro Menezes. llvm-svn: 263458	2016-03-14 18:17:41 +00:00
Chad Rosier	7a21bb196b	[Mips] Fix -Wunused-private-field warning after r263444. llvm-svn: 263454	2016-03-14 18:10:20 +00:00
Sanjay Patel	7506852709	[DAG] use !isUndef() ; NFCI llvm-svn: 263453	2016-03-14 18:09:43 +00:00
Sanjay Patel	5719584129	[DAG] use isUndef() ; NFCI llvm-svn: 263448	2016-03-14 17:28:46 +00:00
Tom Stellard	331f981cc9	AMDGPU/SI: Handle wait states required for DPP instructions Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17543 llvm-svn: 263447	2016-03-14 17:05:56 +00:00
Sanjay Patel	62d707c8d9	[x86, AVX] replace masked load with full vector load when possible Converting masked vector loads to regular vector loads for x86 AVX should always be a win. I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any objections. 1. x86 already does this kind of optimization for multiple scalar loads -> vector load. 2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner. Differential Revision: http://reviews.llvm.org/D18094 llvm-svn: 263446	2016-03-14 16:54:43 +00:00
Daniel Sanders	e8efff373a	[mips] MIPS32R6 compact branch support Summary: MIPSR6 introduces a class of branches called compact branches. Unlike the traditional MIPS branches which have a delay slot, compact branches do not have a delay slot. The instruction following the compact branch is only executed if the branch is not taken and must not be a branch. It works by generating compact branches for MIPS32R6 when the delay slot filler cannot fill a delay slot. Then, inspecting the generated code for forbidden slot hazards (a compact branch with an adjacent branch or other CTI) and inserting nops to clear this hazard. Patch by Simon Dardis. Reviewers: vkalintiris, dsanders Subscribers: MatzeB, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D16353 llvm-svn: 263444	2016-03-14 16:24:05 +00:00
Marek Olsak	ed2213e6ef	AMDGPU/SI: Incomplete shader binaries need to finish execution at the end Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D18058 llvm-svn: 263441	2016-03-14 15:57:14 +00:00
Nicolai Haehnle	74127fe8d7	AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence Summary: When multiple threads perform an atomic op with the same arguments, they will usually see different return values. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18101 llvm-svn: 263440	2016-03-14 15:37:18 +00:00
Vasileios Kalintiris	42db3ff47f	[mips] Use range-based for loops. NFC. llvm-svn: 263438	2016-03-14 15:05:30 +00:00
Ulrich Weigand	cdce026b4d	[SystemZ] Avoid LER on z13 due to partial register dependencies On the z13, it turns out to be more efficient to access a full floating-point register than just the upper half (as done e.g. by the LE and LER instructions). Current code already takes this into account when loading from memory by using the LDE instruction in place of LE. However, we still generate LER, which shows the same performance issues as LE in certain circumstances. This patch changes the back-end to emit LDR instead of LER to implement FP32 register-to-register copies on z13. llvm-svn: 263431	2016-03-14 13:50:03 +00:00
Zlatko Buljan	fba68931ed	[mips] Fix an issue with long double when function roundl is defined Differential Revision: http://reviews.llvm.org/D17760 llvm-svn: 263428	2016-03-14 12:50:23 +00:00
Daniel Sanders	127d2d2b46	[mips] Range check uimm16_64 Summary: Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D17725 llvm-svn: 263427	2016-03-14 12:44:44 +00:00
Daniel Sanders	cfa3483c8e	[mips] Simplify ordering of range checked immediate classes. Summary: With the addition of checks to ensure that operands have a strict ordering it has become tricky to manage the order in the way I originally intended. This patch linearizes the ordering which simplifies the implementation but requires an order that is arbitrary in places. Here are some examples: * uimm4 < uimm5 < uimm6 * simm4 < uimm4 < simm5 < uimm5 * uimm5 < uimm5_plus1 (1..32) < uimm5_plus32 (32..63) < uimm6 The term 'superset' starts to break down here since the _plus classes are not true supersets of uimm5 (but they are still subsets of uimm6). * uimm5 < uimm5_64, and uimm5 < vsplat_uimm5 This is entirely arbitrary. We need an ordering and what we pick is unimportant since only one is possible for a given mnemonic. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D17723 llvm-svn: 263423	2016-03-14 11:46:30 +00:00
Nikolay Haustov	79af6b33e0	[AMDGPU] Assembler: SOP* instruction fixes s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit. s_rfe_b64 has just one destination operand and no source. Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that. Add s_memrealtime test and change comments in smem.s to follow common style. Change test for s_memtime to use non-zero register to make it really test encoding. Add tests for s_buffer_load*. Add tests for SOPC instructions (same for SI and VI) Differential Revision: http://reviews.llvm.org/D18040 llvm-svn: 263420	2016-03-14 11:17:19 +00:00
Daniel Sanders	19b7f76afa	[mips] Range check uimm6_lsl2. Summary: Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D17291 llvm-svn: 263419	2016-03-14 11:16:56 +00:00
Hans Wennborg	369ebfe4c9	Try to fix build of WebAssemblyRegStackify.cpp on Windows It's failing to build on VS2015 with: C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\lib\Target\WebAssembly\WebAssemblyRegStackify.cpp(520): error C2668: 'llvm::make_reverse_iterator': ambiguous call to overloaded function C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\include\llvm/ADT/STLExtras.h(217): note: could be 'std::reverse_iterator<llvm::MachineBasicBlock::iterator> llvm::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(IteratorTy)' with [ IteratorTy=llvm::MachineInstrBundleIterator<llvm::MachineInstr> ] C:\b\depot_tools\win_toolchain\vs_files\391bbf1220d3edcd3cc3fccdb56224181e3b13a7\win_sdk\bin\..\..\VC\include\xutility(1217): note: or 'std::reverse_iterator<llvm::MachineBasicBlock::iterator> std::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(_RanIt)' [found using argument-dependent lookup] with [ _RanIt=llvm::MachineInstrBundleIterator<llvm::MachineInstr> ] I don't have VS2015 locally at the moment, but hopefully this will help. llvm-svn: 263418	2016-03-14 11:04:15 +00:00
Igor Breger	a949100532	AVX512: icmp operation should be always lowered to CMPM (AVX-512) instruction on SKX. implemented by delena Differential Revision: http://reviews.llvm.org/D18054 llvm-svn: 263417	2016-03-14 10:26:39 +00:00
Valery Pykhtin	0f97f17152	[AMDGPU] AsmParser: Factor out parseRegister. NFC. llvm-svn: 263411	2016-03-14 07:43:42 +00:00
Valery Pykhtin	9e33c7f5d3	[AMDGPU] AsmParser: refactor post push_back vector access. NFC. llvm-svn: 263409	2016-03-14 05:25:44 +00:00
Valery Pykhtin	f91911c3ae	[AMDGPU] AsmParser: remove redundant isReg checks. NFC. llvm-svn: 263407	2016-03-14 05:01:45 +00:00
Mehdi Amini	ba9fba81d6	Remove PreserveNames template parameter from IRBuilder This reapplies r263258, which was reverted in r263321 because of issues on Clang side. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263393	2016-03-13 21:05:13 +00:00
Simon Pilgrim	035b19ecf5	[X86][SSE41] Avoid variable blend for constant v8i16 shifts The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering. llvm-svn: 263383	2016-03-13 18:35:59 +00:00
Craig Topper	955308fbee	[X86] Remove many operands that represent memory stores from outs to ins. These operands are the registers and immediates that specify the memory address not the memory itself thus they are inputs. llvm-svn: 263354	2016-03-13 02:56:31 +00:00
Nemanja Ivanovic	bd56e4e25a	Fix for PR 26378 This patch corresponds to review: http://reviews.llvm.org/D17712 We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This caused duplicate definition asserts when the pass is reused on the module (i.e. with -compile-twice or in JIT contexts). llvm-svn: 263338	2016-03-12 10:23:07 +00:00
Quentin Colombet	cf9732b417	[X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer. cmpxchg[8\|16]b uses RBX as one of its arguments. In other words, using this instruction clobbers RBX as it is defined to hold one of the inputs. When the backend uses dynamically allocated stack, RBX is used as a reserved register for the base pointer. Reserved registers have special semantic that only the target understands and enforces, because of that, the register allocator doesn’t use them, but also doesn’t try to make sure they are used properly (remember it does not know how they are supposed to be used). Therefore, when RBX is used as a reserved register but defined by something that is not compatible with that use, the register allocator will not fix the surrounding code to make sure it gets saved and restored properly around the broken code. This is the responsibility of the target to do the right thing with its reserved register. To fix that, when the base pointer needs to be preserved, we use a different pseudo instruction for cmpxchg that saves rbx. That pseudo takes two more arguments than the regular instruction: - One is the value to be copied into RBX to set the proper value for the comparison. - The other is the virtual register holding the save of the value of RBX as the base pointer. This saving is done as part of isel (i.e., we emit a copy from rbx). cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp This gets expanded into: rbx = copy input_for_rbx_reg cmpxchg <regular cmpxchg args> rbx = save_of_rbx_as_bp Note: The actual modeling of the pseudo is a bit more complicated to make sure the interferences that appear after the pseudo gets expanded are properly modeled before that expansion. This fixes PR26883. llvm-svn: 263325	2016-03-12 02:25:27 +00:00
Eric Christopher	35abd051c0	Temporarily revert: commit ae14bf6488e8441f0f6d74f00455555f6f3943ac Author: Mehdi Amini <mehdi.amini@apple.com> Date: Fri Mar 11 17:15:50 2016 +0000 Remove PreserveNames template parameter from IRBuilder Summary: Following r263086, we are now relying on a flag on the Context to discard Value names in release builds. Reviewers: chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18023 From: Mehdi Amini <mehdi.amini@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258 91177308-0d34-0410-b5e6-96231b3b80d8 until we can figure out what to do about clang and Release build testing. This reverts commit 263258. llvm-svn: 263321	2016-03-12 01:47:22 +00:00
Simon Pilgrim	33d57c7547	[X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions. We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT. Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718). Differential Revision: http://reviews.llvm.org/D17932 llvm-svn: 263303	2016-03-11 22:18:05 +00:00
Ahmed Bougacha	171f7b9986	[AArch64] Don't blindly lower f16/f128 FCCMPs. Instead, extend f16 (like we do when lowering a standalone SETCC), and let f128 be legalized to the RT calls. Fixes PR26803. llvm-svn: 263301	2016-03-11 22:02:58 +00:00
Dan Gohman	da323e88ea	[WebAssembly] Add `final` keywords to a few more subclasses, for consistency. llvm-svn: 263287	2016-03-11 19:45:37 +00:00
Simon Pilgrim	7b2164ffe0	Fix spelling. llvm-svn: 263266	2016-03-11 17:31:43 +00:00
Mehdi Amini	99eab3dd06	Remove PreserveNames template parameter from IRBuilder Summary: Following r263086, we are now relying on a flag on the Context to discard Value names in release builds. Reviewers: chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18023 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263258	2016-03-11 17:15:50 +00:00
Valery Pykhtin	a7f480b4e9	[AMDGPU] Fix VOPC instruction operand namings Differential Revision: http://reviews.llvm.org/D17966 llvm-svn: 263242	2016-03-11 14:53:28 +00:00
Simon Pilgrim	7ca9614c71	[X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction. It's not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for 256/512 bit vector cases. llvm-svn: 263239	2016-03-11 14:39:10 +00:00
Vasileios Kalintiris	e2cbc21b6f	[mips] MIPSR6 Instruction itineraries Summary: Defines instruction itineraries for common MIPSR6 instructions. Patch by Simon Dardis. Reviewers: vkalintiris Subscribers: MatzeB, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D17198 llvm-svn: 263229	2016-03-11 13:05:06 +00:00
Daniel Sanders	78e8902097	[mips] Range check simm4. Summary: Reviewers: vkalintiris Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D16811 llvm-svn: 263220	2016-03-11 11:37:50 +00:00
Nikolay Haustov	6560781c4f	[AMDGPU] Assembler: change v_madmk operands to have same order as mad. The constant is now at source operand 1 (previously at 2). This is also how it is in legacy AMD sp3 assembler. Update tests. Differential Revision: http://reviews.llvm.org/D17984 llvm-svn: 263212	2016-03-11 09:27:25 +00:00
Chandler Carruth	89c45a162f	[PM] Port GVN to the new pass manager, wire it up, and teach a couple of tests to run GVN in both modes. This is mostly the boring refactoring just like SROA and other complex transformation passes. There is some trickiness in that GVN's ValueNumber class requires hand holding to get to compile cleanly. I'm open to suggestions about a better pattern there, but I tried several before settling on this. I was trying to balance my desire to sink as much implementation detail into the source file as possible without introducing overly many layers of abstraction. Much like with SROA, the design of this system is made somewhat more cumbersome by the need to support both pass managers without duplicating the significant state and logic of the pass. The same compromise is struck here. I've also left a FIXME in a doxygen comment as the GVN pass seems to have pretty woeful documentation within it. I'd like to submit this with the FIXME and let those more deeply familiar backfill the information here now that we have a nice place in an interface to put that kind of documentation. Differential Revision: http://reviews.llvm.org/D18019 llvm-svn: 263208	2016-03-11 08:50:55 +00:00
Matt Arsenault	bafc9dc591	AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca Frontend authors are strongly encouraged to keep allocas in the entry block, so don't bother visiting every instruction in the other blocks of the function. llvm-svn: 263206	2016-03-11 08:20:50 +00:00
Matt Arsenault	6b6a2c37bc	AMDGPU: R600 code splitting cleanup Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final. llvm-svn: 263204	2016-03-11 08:00:27 +00:00
Matt Arsenault	9a19c240c0	AMDGPU: Materialize sign bits with bfrev If a constant is the same as the reverse of an inline immediate, this is 4 bytes smaller than having to embed a 32-bit literal. llvm-svn: 263201	2016-03-11 07:42:49 +00:00
Tim Northover	6092de5075	AArch64: only try to use scaled fcvt ops on legal vector types. Before we ended up calling getSimpleVectorType on a <3 x float>, which asserted. llvm-svn: 263169	2016-03-10 23:02:21 +00:00
Sanjay Patel	0181943b89	[x86] don't use a shuffle when a vselect will do; NFCI Looking at the IR definition of a masked load made me realize there was no reason to use a shuffle here, so we don't need to convert the format of the mask at all. llvm-svn: 263167	2016-03-10 22:35:33 +00:00
Simon Pilgrim	61eb49e437	[X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion). Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 263159	2016-03-10 20:40:26 +00:00
Peter Collingbourne	aba16fca5d	ARM: Support relative references using the PREL31 symbol variant. Differential Revision: http://reviews.llvm.org/D17937 llvm-svn: 263156	2016-03-10 19:30:18 +00:00
Tim Northover	00e2dcec02	AArch64: remove pseudo-instructions used only for their patterns. There's no real reason for these pseudos to exist, we should be writing real patterns even if it is slightly less convenient. NFC. llvm-svn: 263141	2016-03-10 18:46:12 +00:00
Nicolai Haehnle	b142770bfe	AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics Summary: They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa to implement the GL_ARB_shader_image_load_store extension. The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling and loads). However, this is not currently implemented. For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller" opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32 is actually implemented since GLSL also only has a vec4 variant of the store instructions, although it's conceivable that Mesa will want to be smarter about this in the future. BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which has a legacy name, pretends not to access memory, and does not capture the full flexibility of the instruction. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17277 llvm-svn: 263140	2016-03-10 18:43:50 +00:00
Michael Kuperstein	8be8de6d62	[X86] Correctly select registers to pop into for x86_64 When trying to replace an add to esp with pops, we need to choose dead registers to pop into. Registers clobbered by the call and not imp-def'd by it should be safe. Except that it's not enough to check the register itself isn't defined, we also need to make sure no overlapping registers are defined either. This fixes PR26711. Differential Revision: http://reviews.llvm.org/D18029 llvm-svn: 263139	2016-03-10 18:43:21 +00:00
Balaram Makam	e9b2725287	[AArch64] Optimize compare and branch sequence when the compare's constant operand is power of 2 Summary: Peephole optimization that generates a single TBZ/TBNZ instruction for test and branch sequences like in the example below. This handles the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC Examples: and w8, w8, #0x400 cbnz w8, L1 to tbnz w8, #10, L1 Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17942 llvm-svn: 263136	2016-03-10 17:54:55 +00:00

36959 Commits