llvm-project

Commit Graph

Author	SHA1	Message	Date
Artem Tamazov	751985a757	[AMDGPU][mc] Fix ds_min/max[_rtn]_f32 - extra source operand removed. Fixes Bug 28215. Lit tests updated. Differential Revision: https://reviews.llvm.org/D25837 llvm-svn: 284825	2016-10-21 14:49:22 +00:00
Sanjay Patel	cbaba93ce8	[DAG] use SDNode flags 'nsz' to enable fadd/fsub with zero folds As discussed in D24815, let's start the process of killing off the broken fast-math global state housed in TargetOptions and eliminate the need for function-level fast-math attributes. Here we enable two similar folds that are possible when we don't care about signed-zero: fadd nsz x, 0 --> x fsub nsz 0, x --> -x Note that although the test cases include a 'sin' function call, I'm side-stepping the FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these tests - isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node. Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't actually do anything today because Flags are silently dropped for any node that is not a binary operator. Differential Revision: https://reviews.llvm.org/D25297 llvm-svn: 284824	2016-10-21 14:36:58 +00:00
Simon Pilgrim	c98d99a600	[X86][AVX2] Begun generalizing lowering to VPERMD/VPERMPS in preparation for AVX512 support. llvm-svn: 284823	2016-10-21 13:00:47 +00:00
Simon Pilgrim	32b06235da	[X86][AVX512] Add mask/maskz writemask support to subvector broadcast shuffle decode comments llvm-svn: 284821	2016-10-21 12:14:24 +00:00
John Brawn	84b21835f1	[LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loops When we have a loop with a known upper bound on the number of iterations, and furthermore know that the number of iterations will be either exactly that upper bound or zero, then we can fully unroll up to that upper bound keeping only the first loop test to check for the zero iteration case. Most of the work here is in plumbing this 'max-or-zero' information from the part of scalar evolution where it's detected through to loop unrolling. I've also gone for the safe default of 'false' everywhere but howManyLessThans which could probably be improved. Differential Revision: https://reviews.llvm.org/D25682 llvm-svn: 284818	2016-10-21 11:08:48 +00:00
Bjorn Pettersson	9fcd605d1e	[AArch64] Corrected spill size for DDD register class. NFCI Summary: The spill size was incorrectly set to 196 bits, which isn't a multiple of 8. This problem was detected when experimenting with asserts that the spill size should be a multiple of the byte size. New corrected value for the spill size is set to 192 bits. Note that tablegen (RegisterInfoEmitter) will divide the size set in the RegisterClass definition by 8. So this change should not have any impact on the tablegen output (trunc(192/8) == trunc(196/8) == 24 bytes). Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: https://reviews.llvm.org/D25818 llvm-svn: 284814	2016-10-21 09:53:42 +00:00
Davide Italiano	d15477b09d	Revert "[GVN/PRE] Hoist global values outside of loops." There's no agreement about this patch. I personally find the PRE machinery of the current GVN hard enough to reason about that I'm not sure I'll try to land this again (instead of working on the rewrite). llvm-svn: 284796	2016-10-21 01:37:02 +00:00
Keno Fischer	b04df8eaa2	Fix cross-endianness RuntimeDyld relocation for ARM rL284780 fixed the PREL31 relocation and added a test for it. Being the first such test for ARM relocations, it exposed incorrect endianness assumptions (causing buildbot failures on big-endian hosts). Fix that by using the same helpers used for the x86 case. llvm-svn: 284789	2016-10-20 22:15:56 +00:00
Li Huang	fcfe8cd3ae	[SCEV] Add a threshold to restrict number of mul operands to be inlined into SCEV This is to avoid inlining too many multiplication operands into a SCEV, which could take exponential time in the worst case. Reviewers: Sanjoy Das, Mehdi Amini, Michael Zolotukhin Differential Revision: https://reviews.llvm.org/D25794 llvm-svn: 284784	2016-10-20 21:38:39 +00:00
Keno Fischer	c32ffe3916	Fix PREL31 relocation on ARM Summary: This is a 31bits relative relocation instead of a 32bits absolute relocation. Reviewers: t.p.northover, peter.smith, rengolin Subscribers: aemerson, llvm-commits, samparker Differential Revision: https://reviews.llvm.org/D25069 llvm-svn: 284780	2016-10-20 21:15:29 +00:00
Michael Kuperstein	b2443ed62b	[X86] Enable interleaved memory access by default This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779	2016-10-20 21:04:31 +00:00
Daniel Berlin	cd2deacac6	[MSSA] Avoid unnecessary use walks when calling getClobberingMemoryAccess Summary: This allows us to mark when uses have been optimized. This lets us avoid rewalking (IE when people call getClobberingAccess on everything), and also enables us to later relax the requirement of use optimization during updates with less cost. Reviewers: george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25172 llvm-svn: 284771	2016-10-20 20:13:45 +00:00
Kevin Enderby	c8bb42283e	Another additional error check for invalid Mach-O files for the load commands that use the MachO::twolevel_hints_command type which includes only the LC_TWOLEVEL_HINTS load command. This is not used in llvm libObject code or in llvm tool code. But does appear in one of the binary test files. While this load command is obsolete it is easier to add code for it in libObject than edit or change the binary test case. llvm-svn: 284769	2016-10-20 20:10:30 +00:00
Zachary Turner	4d49eb9fa0	[CodeView] Refactor serialization to use StreamInterface. This was all using ArrayRef<>s before which presents a problem when you want to serialize to or deserialize from an actual PDB stream. An ArrayRef<> is really just a special case of what can be handled with StreamInterface though (e.g. by using a ByteStream), so changing this to use StreamInterface allows us to plug in a PDB stream and get all the record serialization and deserialization for free on a MappedBlockStream. Subsequent patches will try to remove TypeTableBuilder and TypeRecordBuilder in favor of classes that operate on Streams as well, which should allow us to completely merge the reading and writing codepaths for both types and symbols. Differential Revision: https://reviews.llvm.org/D25831 llvm-svn: 284762	2016-10-20 18:31:19 +00:00
Konstantin Zhuravlyov	521e5ef4ce	[AMDGPU] Make note record name a static const member of target streamer Differential Revision: https://reviews.llvm.org/D25746 llvm-svn: 284760	2016-10-20 18:22:36 +00:00
Konstantin Zhuravlyov	08326b6256	[AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only) Differential Revision: https://reviews.llvm.org/D25693 llvm-svn: 284759	2016-10-20 18:12:38 +00:00
Dehao Chen	f03f51555a	Using branch probability to guide critical edge splitting. Summary: The original heuristic to break critical edge during machine sink is relatively conservative: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass. The performance impact on speccpu2006 on Intel sandybridge machines: spec/2006/fp/C++/444.namd 25.3 +0.26% spec/2006/fp/C++/447.dealII 45.96 -0.10% spec/2006/fp/C++/450.soplex 41.97 +1.49% spec/2006/fp/C++/453.povray 36.83 -0.96% spec/2006/fp/C/433.milc 23.81 +0.32% spec/2006/fp/C/470.lbm 41.17 +0.34% spec/2006/fp/C/482.sphinx3 48.13 +0.69% spec/2006/int/C++/471.omnetpp 22.45 +3.25% spec/2006/int/C++/473.astar 21.35 -2.06% spec/2006/int/C++/483.xalancbmk 36.02 -2.39% spec/2006/int/C/400.perlbench 33.7 -0.17% spec/2006/int/C/401.bzip2 22.9 +0.52% spec/2006/int/C/403.gcc 32.42 -0.54% spec/2006/int/C/429.mcf 39.59 +0.19% spec/2006/int/C/445.gobmk 26.98 -0.00% spec/2006/int/C/456.hmmer 24.52 -0.18% spec/2006/int/C/458.sjeng 28.26 +0.02% spec/2006/int/C/462.libquantum 55.44 +3.74% spec/2006/int/C/464.h264ref 46.67 -0.39% geometric mean +0.20% Manually checked 473 and 471 to verify the diff is in the noise range. Reviewers: rengolin, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24818 llvm-svn: 284757	2016-10-20 18:06:52 +00:00
Simon Pilgrim	365be4f95c	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors We weren't checking for uniform const costs before the general cost, resulting in very high estimates. llvm-svn: 284755	2016-10-20 18:00:35 +00:00
Pirama Arumuga Nainar	05b0f93ad3	Fix _EXTEND_VECTOR_INREG legalization Summary: While promoting _EXTEND_VECTOR_INREG nodes whose inputs are already promoted, perform the appropriate sign extension for the promoted node before doing the *_EXTEND_VECTOR_INREG operation. If not, the undefined high-order bits of the promoted operand may (a) be garbage (in case of zext) or (b) contribute the wrong sign-bit (in case of sext). Updated the promote-vec3.ll test after this change. The diff shows explicit zeroing in case of zext and intermediate sign extension in case of sext. Reviewers: RKSimon Subscribers: llvm-commits, srhines Differential Revision: https://reviews.llvm.org/D25790 llvm-svn: 284752	2016-10-20 17:56:36 +00:00
Sanjay Patel	0051efcf97	[Target] remove TargetRecip class; 2nd try This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284746	2016-10-20 16:55:45 +00:00
Simon Pilgrim	025e26dd32	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults. We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases. llvm-svn: 284744	2016-10-20 16:39:11 +00:00
Valery Pykhtin	e55fd41f73	[AMDGPU] add fcopysign(f64, f32) pattern Differential revision: https://reviews.llvm.org/D25827 llvm-svn: 284743	2016-10-20 16:17:54 +00:00
Benjamin Kramer	b2505005c7	Retire llvm::alignOf in favor of C++11 alignof. No functionality change intended. llvm-svn: 284733	2016-10-20 15:02:18 +00:00
Benjamin Kramer	26b2593b24	[GVN] Use defaulted members. No functional change. llvm-svn: 284726	2016-10-20 13:09:12 +00:00
Simon Dardis	226752c15d	[mips][mcjit] Add the majority of N32 support. The missing piece is relocation composition for %hi(%neg(%gp_rel(x))) and similar. Patch by: Daniel Sanders llvm-svn: 284724	2016-10-20 13:02:23 +00:00
Benjamin Kramer	2a8bef8769	Do a sweep over move ctors and remove those that are identical to the default. All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721	2016-10-20 12:20:28 +00:00
Pavel Labath	59838f7ea6	Reapply "Add Chrono.h - std::chrono support header" This is a resubmission of r284590. The mingw build should be fixed now. The problem was we were matching time_t with _localtime_64s, which was incorrect on _USE_32BIT_TIME_T systems. Instead I use localtime_s, which should always evaluate to the correct function. llvm-svn: 284720	2016-10-20 12:05:50 +00:00
Simon Pilgrim	618d3aedaf	[DAGCombiner] Add general constant vector support to (srl (shl x, c), c) -> (and x, cst2) We already supported scalar constant / splatted constant vector - now accepts any (non opaque) constant scalar / vector llvm-svn: 284717	2016-10-20 11:10:21 +00:00
Simon Pilgrim	25059360d5	Fix spelling mistake in comment. llvm-svn: 284714	2016-10-20 10:42:14 +00:00
Simon Pilgrim	071da46a35	Fix MSVC bool -> uint64_t promotion warning llvm-svn: 284713	2016-10-20 10:37:58 +00:00
Jonas Paulsson	8010b631d5	[SystemZ] Post-RA scheduler implementation Post-RA sched strategy and scheduling instruction annotations for z196, zEC12 and z13. This scheduler optimizes decoder grouping and balances processor resources (including side steering the FPd unit instructions). The SystemZHazardRecognizer keeps track of the scheduling state, which can be dumped with -debug-only=misched. Reviewers: Ulrich Weigand, Andrew Trick. https://reviews.llvm.org/D17260 llvm-svn: 284704	2016-10-20 08:27:16 +00:00
Peter Collingbourne	c7766778a0	X86: Allow expressions to appear as u8imm operands. llvm-svn: 284688	2016-10-20 01:58:34 +00:00
Peter Collingbourne	de1f039360	X86: Deduplicate some lowering code. NFCI. llvm-svn: 284686	2016-10-20 01:21:26 +00:00
Reid Kleckner	40d7230f2f	Use __func__ directly now that all supported compilers support it Remove the portability macro now that it is unused. llvm-svn: 284681	2016-10-20 00:22:23 +00:00
Victor Leschuk	2ede126b1b	DebugInfo: preparation to implement DW_AT_alignment - Add alignment attribute to DIVariable family - Modify bitcode format to match new DIVariable representation - Update tests to match these changes (also add bitcode upgrade test) - Expect that frontend passes non-zero align value only when it is not default (was forcibly aligned by alignas()/_Alignas()/__attribute__((aligned()))) Differential Revision: https://reviews.llvm.org/D25073 llvm-svn: 284678	2016-10-20 00:13:12 +00:00
Reid Kleckner	990504e625	Remove LLVM_NOEXCEPT and replace it with noexcept Now that we have dropped MSVC 2013, all supported compilers support noexcept and we can drop this portability macro. llvm-svn: 284672	2016-10-19 23:52:38 +00:00
Kevin Enderby	210030ba95	Next set of additional error checks for invalid Mach-O files for the load commands that use the MachO::thread_command type but are not used in llvm libObject code but used in llvm tool code. This includes the LC_UNIXTHREAD and LC_THREAD load commands. A quick note about the philosophy of the error checking in libObject for Mach-O files, the idea behind the checking is that we never will return a Mach-O file out of libObject that contains unknown things in the load commands. To do this the 32-bit ARM and PPC general thread states needed to be defined as two test case binaries contained them. If other thread states for other CPUs need to be added we will do that as needed. Going forward the LC_MAIN load command is used to set the entry point in Mach-O executables these days instead of an LC_UNIXTHREAD as was done in the past. So today only in core files are LC_THREAD load commands and thread states usually found. Other thread states have not yet been defined in include/Support/MachO.h at this time. But that can be added as needed with their corresponding checking also added. llvm-svn: 284668	2016-10-19 23:44:34 +00:00
Rong Xu	2c684cfd94	[PGO] Fix bogus warning for merging empty llvm profile file Profile runtime can generate an empty raw profile (when there is no function in the shared library). This empty profile is treated as a text format profile. A text format profile without the flag of "#IR" is thought to be a clang generated profile. So in llvm profile merging, we will get a bogus warning of "Merge IR generated profile with Clang generated profile." The fix here is to skip the empty profile (when the buffer size is 0) for profile merge. Reviewers: vsk, davidxl Differential Revision: http://reviews.llvm.org/D25687 llvm-svn: 284659	2016-10-19 22:51:17 +00:00
Mehdi Amini	db46b7d217	Add computeHostNumPhysicalCores() implementation for Darwin Differential Revision: https://reviews.llvm.org/D25800 llvm-svn: 284656	2016-10-19 22:36:07 +00:00
Wei Ding	3cb2a1e8d1	AMDGPU : Add a function to enable and disable IEEEBit for SC and shader respectively. Differential Revision: http://reviews.llvm.org/D25789 llvm-svn: 284655	2016-10-19 22:34:49 +00:00
Sanjay Patel	efd8885772	[InstSimplify] fold negation of sign-bit 0 - X --> X, if X is 0 or the minimum signed value 0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW I noticed this pattern might be created in the backend after the change from D25485, so we'll want to add a similar fold for the DAG. The use of computeKnownBits in InstSimplify may be something to investigate if the compile time of InstSimplify is noticeable. We could replace computeKnownBits with specific pattern matchers or limit the recursion. Differential Revision: https://reviews.llvm.org/D25785 llvm-svn: 284649	2016-10-19 21:23:45 +00:00
Hans Wennborg	2d55d67c62	Typo: nomed struct -> named struct llvm-svn: 284635	2016-10-19 20:10:03 +00:00
Reid Kleckner	f8d1d12fef	[GlobalMerge] Handle non-landingpad EH pads This code crashed on funclet-style EH instructions such as catchpad, catchswitch, and cleanuppad. Just treat all EH pad instructions equivalently and avoid merging the globals they reference through any use. llvm-svn: 284633	2016-10-19 19:56:22 +00:00
Artur Pilipenko	5c6ef75485	[IndVarSimplify] Teach calculatePostIncRange to take guards into account Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D25739 llvm-svn: 284632	2016-10-19 19:43:54 +00:00
Matthew Simpson	41fa838f07	[LV] Avoid emitting trivially dead instructions Some instructions from the original loop, when vectorized, can become trivially dead. This happens because of the way we structure the new loop. For example, we create new induction variables and induction variable "steps" in the new loop. Thus, when we go to vectorize the original induction variable update, it may no longer be needed due to the instructions we've already created. This patch prevents us from creating these redundant instructions. This reduces code size before simplification and allows greater flexibility in code generation since we have fewer unnecessary instruction uses. Differential Revision: https://reviews.llvm.org/D25631 llvm-svn: 284631	2016-10-19 19:22:02 +00:00
Chad Rosier	6e3a92ec88	[AliasSetTracker] Add support for memcpy and memmove. Differential Revision: https://reviews.llvm.org/D25776 llvm-svn: 284630	2016-10-19 19:09:03 +00:00
Artur Pilipenko	f2d5dc5dc6	[IndVarSimplify] Use control-dependent range information to prove non-negativity This change is motivated by the case when IndVarSimplify doesn't widen a comparison of IV increment because it can't prove IV increment being non-negative. We end up with a redundant trunc of the widened increment on this example. for.body: %i = phi i32 [ %start, %for.body.lr.ph ], [ %i.inc, %for.inc ] %within_limits = icmp ult i32 %i, 64 br i1 %within_limits, label %continue, label %for.end continue: %i.i64 = zext i32 %i to i64 %arrayidx = getelementptr inbounds i32, i32* %base, i64 %i.i64 %val = load i32, i32* %arrayidx, align 4 br label %for.inc for.inc: %i.inc = add nsw nuw i32 %i, 1 %cmp = icmp slt i32 %i.inc, %limit br i1 %cmp, label %for.body, label %for.end There is a range check inside of the loop which guarantees the IV to be non-negative. NSW on the increment guarantees that the increment is also non-negative. Teach IndVarSimplify to use the range check to prove non-negativity of loop increments. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D25738 llvm-svn: 284629	2016-10-19 18:59:03 +00:00
Chad Rosier	16970a847c	[AliasSetTracker] Return void for add() functions. NFC. Differential Revision: https://reviews.llvm.org/D25748 llvm-svn: 284628	2016-10-19 18:50:32 +00:00
Krzysztof Parzyszek	c87155037b	[AMDGPU] Stop using MCRegisterClass::getSize() Differential Review: https://reviews.llvm.org/D24675 llvm-svn: 284619	2016-10-19 17:40:36 +00:00
Teresa Johnson	ec544c552e	[ThinLTO] Default backend threads to heavyweight_hardware_concurrency Summary: Changes default backend parallelism from thread::hardware_concurrency to the new llvm::heavyweight_hardware_concurrency, which for X86 Linux defaults to the number of physical cores (and will fall back to thread::hardware_concurrency otherwise). This avoids oversubscribing the physical cores using hyperthreading. Reviewers: mehdi_amini, pcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25775 llvm-svn: 284618	2016-10-19 17:35:01 +00:00
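
Note on 618d3aedaf (DAGCombiner): the fold (srl (shl x, c), c) -> (and x, cst2) listed above can be checked numerically. The sketch below is not LLVM code. It assumes 32-bit lanes, models a non-splat constant shift vector as a plain Python list, and takes cst2 to be the all-ones value shifted right by c in each lane. All function names are hypothetical and exist only for this illustration.

```python
BITS = 32                      # assumed lane width
ALL_ONES = (1 << BITS) - 1     # 0xFFFFFFFF


def shl(x, c):
    """Logical shift left, truncated to the lane width."""
    return (x << c) & ALL_ONES


def srl(x, c):
    """Logical shift right on an unsigned lane value."""
    return (x & ALL_ONES) >> c


def mask_for(c):
    """Per-lane constant cst2: clears the top c bits, keeps the rest."""
    return ALL_ONES >> c


def srl_of_shl(values, shifts):
    """Unfolded form: shift each lane left, then right, by its own constant."""
    return [srl(shl(x, c), c) for x, c in zip(values, shifts)]


def and_with_mask(values, shifts):
    """Folded form: a single AND with a per-lane constant mask."""
    return [x & mask_for(c) for x, c in zip(values, shifts)]


if __name__ == "__main__":
    values = [0xFFFFFFFF, 0x12345678, 0x80000001, 0x00000000]
    shifts = [0, 4, 31, 7]     # a different shift amount in every lane
    print(srl_of_shl(values, shifts) == and_with_mask(values, shifts))
```

Because the mask is computed per lane, nothing in the check requires the shift amounts to be equal, which is the generalization the commit message describes (any non-opaque constant vector rather than only a splat).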


96047 Commits