llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	a78b768ed4	[AVX-512] Promote 512-bit integer loads to v8i64 similar to what is done for 128/256-bit vectors for overall consistency. llvm-svn: 278318	2016-08-11 06:04:07 +00:00
Craig Topper	14aa2665d3	[AVX-512] Add patterns to allow EVEX encoded stores of v16i16/v8i16/v16i8/v32i8 even when BWI is not supported. llvm-svn: 278317	2016-08-11 06:04:04 +00:00
Craig Topper	3563d0f622	[AVX-512] Fix the 128-bit and 256-bit nontemporal load patterns with elements type other than i64. These loads have all been promoted to v2i64/v4i64 loads so we need bitcasts or we end up selecting VMOVDQA32/VMOVDQU32 instead. llvm-svn: 278316	2016-08-11 06:04:00 +00:00
Tim Northover	357f1be2ca	GlobalISel: support same ConstantExprs as Instructions. It's more than just inttoptr, but the others can't be tested until we have support for non-trivial constants (they currently get unavoidably folded to a ConstantInt). llvm-svn: 278303	2016-08-10 23:02:41 +00:00
Tim Northover	2ff5935a95	GlobalISel: add tests forgotten in r278293. llvm-svn: 278296	2016-08-10 22:13:48 +00:00
Changpeng Fang	fb9c3818dd	AMDGPU/SI: Implement amdgcn image intrinsics with sampler Summary: This patch defines and implements amdgcn image intrinsics with sampler. 1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty, and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name to have three suffixes to overload each of these three types; 2. D128 as well as two other flags are implied in the three types, for example, if you use v8i32 as resource type, then r128 is 0! 3. don't expose TFE flag, and other flags are exposed in the instruction order: unrm, glc, slc, lwe and da. Differential Revision: http://reviews.llvm.org/D22838 Reviewed by: arsenm and tstellarAMD llvm-svn: 278291	2016-08-10 21:15:30 +00:00
Kyle Butt	81d32846b0	Codegen: Don't tail-duplicate blocks with un-analyzable fallthrough. If AnalyzeBranch can't analyze a block and it is possible to fallthrough, then duplicating the block doesn't make sense, as only one block can be the layout predecessor for the un-analyzable fallthrough. Submitted with a test case, but NOTE: the test case doesn't currently fail. However, the test case fails with D20505 and would have saved me some time debugging. llvm-svn: 278288	2016-08-10 21:03:27 +00:00
Kyle Butt	e1c931b171	CodeGen: If Convert blocks that would form a diamond when tail-merged. The following function currently relies on tail-merging for if conversion to succeed. The common tail of cond_true and cond_false is extracted, and this then forms a diamond pattern that can be successfully if converted. If this block does not get extracted, either because tail-merging is disabled or the threshold is higher, we should still recognize this pattern and if-convert it. Fixed a regression in the original commit. Need to un-reverse branches after reversing them, or other conversions go awry. define i32 @t2(i32 %a, i32 %b) nounwind { entry: %tmp1434 = icmp eq i32 %a, %b ; <i1> [#uses=1] br i1 %tmp1434, label %bb17, label %bb.outer bb.outer: ; preds = %cond_false, %entry %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ] %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ] br label %bb bb: ; preds = %cond_true, %bb.outer %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ] %tmp. = sub i32 0, %b_addr.021.0.ph %tmp.40 = mul i32 %indvar, %tmp. 
%a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph br i1 %tmp3, label %cond_true, label %cond_false cond_true: ; preds = %bb %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph %indvar.next = add i32 %indvar, 1 br i1 %tmp1437, label %bb17, label %bb cond_false: ; preds = %bb %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0 %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10 br i1 %tmp14, label %bb17, label %bb.outer bb17: ; preds = %cond_false, %cond_true, %entry %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ] ret i32 %a_addr.026.1 } Without tail-merging or diamond-tail if conversion: LBB1_1: @ %bb @ =>This Inner Loop Header: Depth=1 cmp r0, r1 ble LBB1_3 @ BB#2: @ %cond_true @ in Loop: Header=BB1_1 Depth=1 subs r0, r0, r1 cmp r1, r0 it ne cmpne r0, r1 bgt LBB1_4 LBB1_3: @ %cond_false @ in Loop: Header=BB1_1 Depth=1 subs r1, r1, r0 cmp r1, r0 bne LBB1_1 LBB1_4: @ %bb17 bx lr With diamond-tail if conversion, but without tail-merging: @ BB#0: @ %entry cmp r0, r1 it eq bxeq lr LBB1_1: @ %bb @ =>This Inner Loop Header: Depth=1 cmp r0, r1 ite le suble r1, r1, r0 subgt r0, r0, r1 cmp r1, r0 bne LBB1_1 @ BB#2: @ %bb17 bx lr llvm-svn: 278287	2016-08-10 20:45:56 +00:00
Matt Arsenault	57431c9680	AMDGPU: Change insertion point of si_mask_branch Insert before the skip branch if one is created. This is a somewhat more natural placement relative to the skip branches, and makes it possible to implement analyzeBranch for skip blocks. The test changes are mostly due to a quirk where the block label is not emitted if there is a terminator that is not also a branch. llvm-svn: 278273	2016-08-10 19:11:42 +00:00
Sanjay Patel	5ccc85fe83	[x86, AVX] allow FP vector select folding to bitwise logic ops (PR28895) This handles the case in: https://llvm.org/bugs/show_bug.cgi?id=28895 ...but we are not getting all of the possibilities yet. Eg, we use 'X86::FANDN' for scalar FP select combines. That enhancement is filed as: https://llvm.org/bugs/show_bug.cgi?id=28925 Differential Revision: https://reviews.llvm.org/D23337 llvm-svn: 278270	2016-08-10 19:00:11 +00:00
Nicolai Haehnle	02d784172c	LiveIntervalAnalysis: fix a crash in repairOldRegInRange Summary: See the new test case for one that was (non-deterministically) crashing on trunk and deterministically hit the assertion that I added in D23302. Basically, the machine function contains a sequence DS_WRITE_B32 %vreg4, %vreg14:sub0, ... DS_WRITE_B32 %vreg4, %vreg14:sub0, ... %vreg14:sub1<def> = COPY %vreg14:sub0 and SILoadStoreOptimizer::mergeWrite2Pair merges the two DS_WRITE_B32 instructions into one before calling repairIntervalsInRange. Now repairIntervalsInRange wants to repair %vreg14, in particular, and ends up trying to repair %vreg14:sub1 as well, but that only becomes active _after_ the range that is to be repaired, hence the crash due to LR.find(...) == LR.begin() at the start of repairOldRegInRange. I believe that just skipping those subranges is fine, but again, not too familiar with that code. Reviewers: MatzeB, kparzysz, tstellarAMD Subscribers: llvm-commits, MatzeB Differential Revision: https://reviews.llvm.org/D23303 llvm-svn: 278268	2016-08-10 18:51:14 +00:00
Kyle Butt	71b1ca1be4	Codegen: Tail Merge: Be less aggressive with special cases. This change makes it possible for tail-duplication and tail-merging to be disjoint. By being less aggressive when merging during layout, there are no overlapping cases between tail-duplication and tail-merging, provided the thresholds are disjoint. There is a remaining TODO to benchmark the succ_size() test for non-layout tail merging. llvm-svn: 278265	2016-08-10 18:36:18 +00:00
Krzysztof Parzyszek	0bbad0fc86	[Hexagon] Simplify the SplitConst32/64 pass llvm-svn: 278256	2016-08-10 18:05:47 +00:00
Krzysztof Parzyszek	3b946c90ef	[Hexagon] Add extra patterns for single-precision min/max instructions llvm-svn: 278252	2016-08-10 17:56:24 +00:00
Tim Northover	7552ef5a00	GlobalISel: avoid inserting redundant COPYs for bitcasts. If the value produced by the bitcast hasn't been referenced yet, we can simply reuse the input register avoiding an unnecessary COPY instruction. llvm-svn: 278245	2016-08-10 16:51:14 +00:00
Krzysztof Parzyszek	a3386501af	[Hexagon] Use integer instructions for floating point immediates Floating point instructions use general purpose registers, so the few instructions that can put floating point immediates into registers are, in fact, integer instructions. Use them explicitly instead of having pseudo-instructions specifically for dealing with floating point values. Simplify the constant loading instructions (from sdata) to have only two: one for 32-bit values and one for 64-bit values: CONST32 and CONST64. llvm-svn: 278244	2016-08-10 16:46:36 +00:00
Simon Pilgrim	b204f03004	[X86][XOP] Tweak vpermil2pd test to stop it being combined away The target shuffle combined to a BLENDPD pattern which we will shortly add support for. llvm-svn: 278233	2016-08-10 15:15:56 +00:00
Simon Pilgrim	f1f55198c1	[X86][SSE] Regenerate vector shift lowering tests llvm-svn: 278232	2016-08-10 15:13:49 +00:00
Sanjay Patel	2c677a9306	use different comparison predicates for better test coverage llvm-svn: 278229	2016-08-10 15:06:11 +00:00
Simon Pilgrim	ac8fa6c2c6	[X86][SSE] Add support for combining target shuffles to MOVSS/MOVSD Only do this on pre-SSE41 targets where we should be lowering to BLENDPS/BLENDPD instead llvm-svn: 278228	2016-08-10 14:15:41 +00:00
Simon Pilgrim	d99242c44d	[X86][SSE] Regenerate SSE1 tests Properly demonstrate the nasty codegen we get for vselect without integer vectors llvm-svn: 278215	2016-08-10 12:26:40 +00:00
Simon Pilgrim	cb5a189b90	Regenerate test llvm-svn: 278214	2016-08-10 12:24:19 +00:00
Simon Pilgrim	85c7ea86ae	[DAGCombine] Avoid INSERT_SUBVECTOR reinsertions (PR28678) If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector. i.e. INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx ) Differential Revision: https://reviews.llvm.org/D23330 llvm-svn: 278211	2016-08-10 10:50:53 +00:00
Sam Parker	62965c96df	[ARM] Improve sxta{b\|h} and uxta{b\|h} tests Created a Thumb2 predicated pattern matcher that uses Thumb2 and HasT2ExtractPack and used it to redefine the patterns for sxta{b\|h} and uxta{b\|h}. Also used the similar patterns to fill in isel pattern gaps for the corresponding instructions in the ARM backend. The patch is mainly changes to tests since most of this functionality appears not to have been tested. Differential Revision: https://reviews.llvm.org/D23273 llvm-svn: 278207	2016-08-10 09:34:34 +00:00
Tim Northover	d403a3d8ee	GlobalISel: support 'undef' constant. llvm-svn: 278174	2016-08-09 23:01:30 +00:00
Derek Schuff	66641322ce	[WebAssembly] Add -emscripten-cxx-exceptions-whitelist option This patch adds -emscripten-cxx-exceptions-whitelist option to WebAssemblyLowerEmscriptenExceptions pass. This option is the list of function names in which Emscripten-style exception handling is enabled. This is to support emscripten's EXCEPTION_CATCHING_WHITELIST which exists because of the performance impact of emscripten's non-zero-cost EH method. Patch by Heejin Ahn Differential Revision: https://reviews.llvm.org/D23292 llvm-svn: 278171	2016-08-09 22:37:00 +00:00
Tim Northover	5ed648e509	GlobalISel: first translation support for Constants. For now put them all in the entry block. This should be correct but may give poor runtime performance. Hopefully MachineSinking combined with isReMaterializable can solve those issues, but if not the interface is sound enough to support alternatives. llvm-svn: 278168	2016-08-09 21:28:04 +00:00
Sanjay Patel	d34b128fbc	add test cases for missed vselect optimizations (PR28895) llvm-svn: 278165	2016-08-09 21:07:17 +00:00
Sanjay Patel	b61346b8b0	regenerate checks and remove 'opt' run dependency llvm-svn: 278154	2016-08-09 20:09:16 +00:00
David Majnemer	adc688ce9c	[X86] Don't model UD2/UD2B as a terminator A UD2 might make its way into the program via a call to @llvm.trap. Obviously, calls are not terminators. However, we modeled the X86 instruction, UD2, as a terminator. Later on, this confuses the epilogue insertion machinery which results in the epilogue getting inserted before the UD2. For some platforms, like x64, the result is a violation of the ABI. Instead, model UD2/UD2B as a side effecting instruction which may observe memory. llvm-svn: 278144	2016-08-09 17:55:12 +00:00
Simon Pilgrim	76964e3140	[DAGCombiner] Better support for shifting large value type by constants As detailed on D22726, much of the shift combining code assumes constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths. This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required. Differential Revision: https://reviews.llvm.org/D23007 llvm-svn: 278141	2016-08-09 17:39:11 +00:00
Simon Pilgrim	27740d038c	[X86][XOP] Add support for combining target shuffles to VPERMIL2PD/VPERMIL2PS llvm-svn: 278120	2016-08-09 12:56:15 +00:00
Elena Demikhovsky	0e0e07f436	AVX-512: A new test for FMA intrinsic A new test that explores sub-optimal sequence of FMA intrinsic and FNEG operation. An upcoming patch will fix it. llvm-svn: 278117	2016-08-09 11:54:14 +00:00
Simon Pilgrim	aae7d4a1b6	[X86][XOP] Add support for combining target shuffles to VPPERM llvm-svn: 278114	2016-08-09 10:56:29 +00:00
Dean Michael Berris	3a25d84a51	[XRay] Test for xray_instr_map in object file. (NFC) This makes a trivial change in the emission of the per-function XRay tables, and makes sure that the xray_instr_map section does show up in the object file. llvm-svn: 278113	2016-08-09 10:42:11 +00:00
Simon Pilgrim	54c32ddf55	[X86][SSE] Fix memory folding of (v)roundsd / (v)roundss We only had partial memory folding support for the intrinsic definitions, and (as noted on PR27481) was causing FR32/FR64/VR128 mismatch errors with the machine verifier. This patch adds missing memory folding support for both intrinsics and the ffloor/fnearbyint/fceil/frint/ftrunc patterns and in doing so fixes the failing machine verifier stack folding tests from PR27481. Differential Revision: https://reviews.llvm.org/D23276 llvm-svn: 278106	2016-08-09 09:32:34 +00:00
Craig Topper	92a4ff1294	[AVX-512] Add support for execution domain switching masked logical ops between floating point and integer domain. This switches PS<->D and PD<->Q. llvm-svn: 278097	2016-08-09 05:26:07 +00:00
Craig Topper	9bd6241106	[X86] Remove the Fv packed logical operation alias instructions. Replace them with patterns to the regular instructions. This enables execution domain fixing which is why the tests changed. llvm-svn: 278090	2016-08-09 03:06:33 +00:00
Craig Topper	de06b51d3d	[X86] Remove unnecessary bitcast from the front of AVX1Only 256-bit logical operation patterns. llvm-svn: 278088	2016-08-09 03:06:26 +00:00
Matthias Braun	7313ca6dbf	X86InstrInfo: Update liveness in classifyLea() We need to update liveness information when we create COPYs in classifyLea(). This fixes http://llvm.org/28301 llvm-svn: 278086	2016-08-09 01:47:26 +00:00
Derek Schuff	53b9af02c8	[WebAssembly] Fix bugs in WebAssemblyLowerEmscriptenExceptions pass * Delete extra '_' prefixes from JS library function names. fixImports() function in JS glue code deals with this for wasm. * Change command-line option names in order to be consistent with asm.js. * Add missing lowering code for llvm.eh.typeid.for intrinsics * Delete commas in mangled function names * Fix a function argument attributes bug. Because we add the pointer to the original callee as the first argument of invoke wrapper, all argument attribute indices have to be incremented by one. Patch by Heejin Ahn Differential Revision: https://reviews.llvm.org/D23258 llvm-svn: 278081	2016-08-09 00:29:55 +00:00
Derek Schuff	b7d6d9e3cd	[WebAssembly] Fix CFI index to account for padding nullptr function The WebAssembly linker now creates a dummy function at index 0 to prevent miscomparisons with the NULL pointer, see https://github.com/WebAssembly/binaryen/pull/658. Thanks to pcc for pointing out this problem! Patch by Dominic Chen Differential Revision: https://reviews.llvm.org/D23137 llvm-svn: 278073	2016-08-08 23:56:01 +00:00
Charles Davis	e9c32c7ed3	Revert "[X86] Support the "ms-hotpatch" attribute." This reverts commit r278048. Something changed between the last time I built this--it takes awhile on my ridiculously slow and ancient computer--and now that broke this. llvm-svn: 278053	2016-08-08 21:20:15 +00:00
Charles Davis	0822aa118e	[X86] Support the "ms-hotpatch" attribute. Summary: Based on two patches by Michael Mueller. This is a target attribute that causes a function marked with it to be emitted as "hotpatchable". This particular mechanism was originally devised by Microsoft for patching their binaries (which they are constantly updating to stay ahead of crackers, script kiddies, and other ne'er-do-wells on the Internet), but is now commonly abused by Windows programs to hook API functions. This mechanism is target-specific. For x86, a two-byte no-op instruction is emitted at the function's entry point; the entry point must be immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding. This padding is where the patch code is written. The two byte no-op is then overwritten with a short jump into this code. The no-op is usually a `movl %edi, %edi` instruction; this is used as a magic value indicating that this is a hotpatchable function. Reviewers: majnemer, sanjoy, rnk Subscribers: dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D19908 llvm-svn: 278048	2016-08-08 21:01:39 +00:00
Krzysztof Parzyszek	341cf3fbe5	[Hexagon] Add pattern for 64-bit mulhs llvm-svn: 278040	2016-08-08 19:24:25 +00:00
Elliot Colp	d9e6668928	Re-add SystemZ SNaN test The floating-point bug affecting ninja-x64-msvc-RA-centos6 is fixed (r277813) so this test should now pass llvm-svn: 278034	2016-08-08 18:11:13 +00:00
Oliver Stannard	8331aaee8f	[ARM] Add support for embedded position-independent code This patch adds support for some new relocation models to the ARM backend: * Read-only position independence (ROPI): Code and read-only data is accessed PC-relative. The offsets between all code and RO data sections are known at static link time. This does not affect read-write data. * Read-write position independence (RWPI): Read-write data is accessed relative to the static base register (r9). The offsets between all writeable data sections are known at static link time. This does not affect read-only data. These two modes are independent (they specify how different objects should be addressed), so they can be used individually or together. They are otherwise the same as the "static" relocation model, and are not compatible with SysV-style PIC using a global offset table. These modes are normally used by bare-metal systems or systems with small real-time operating systems. They are designed to avoid the need for a dynamic linker, the only initialisation required is setting r9 to an appropriate value for RWPI code. I have only added support to SelectionDAG, not FastISel, because FastISel is currently disabled for bare-metal targets where these modes would be used. Differential Revision: https://reviews.llvm.org/D23195 llvm-svn: 278015	2016-08-08 15:28:31 +00:00
Silviu Baranga	fa00ba3c1a	[AArch64] PR28877: Don't assume we're running after legalization when creating vcvtfp2fxs Summary: The DAG combine transformation that was generating the aarch64_neon_vcvtfp2fxs node was assuming that all inputs were legal and wasn't accounting for the fact that the input could be a v4f64 if we're trying to do the transformation before legalization. We now bail out in this case. All illegal types besides v4f64 were already rejected. Fixes https://llvm.org/bugs/show_bug.cgi?id=28877. Reviewers: jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D23261 llvm-svn: 278002	2016-08-08 13:13:57 +00:00
Craig Topper	f44423120f	[AVX-512] Improve lowering of inserting a single element into lowest element of a 512-bit vector of zeroes by using vmovq/vmovd/vmovss/vmovsd. llvm-svn: 277965	2016-08-07 21:52:59 +00:00
Nico Weber	99ceee8a85	Revert r277905, it caused PR28894 llvm-svn: 277962	2016-08-07 20:18:04 +00:00
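
The rewrite rule quoted in commit 85c7ea86ae, INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx ), can be illustrated with a small toy model. This is only a sketch and not the DAGCombiner code: the tuple encoding of nodes, the list encoding of vectors, and the names `fold_insert_subvector` and `evaluate` are all assumptions made for illustration.

```python
# Toy model (hypothetical, not LLVM's data structures):
#   a leaf vector is a Python list of elements;
#   an insertion node is the tuple (OP, vec, sub, idx).
OP = "insert_subvector"


def is_insert(node):
    """Return True if node is an insertion node in this toy encoding."""
    return isinstance(node, tuple) and len(node) == 4 and node[0] == OP


def evaluate(node):
    """Compute the concrete vector that an expression tree produces."""
    if is_insert(node):
        _, vec, sub, idx = node
        base = evaluate(vec)
        sub_elems = evaluate(sub)
        base[idx:idx + len(sub_elems)] = sub_elems  # overwrite the slice at idx
        return base
    return list(node)  # copy the leaf so callers never mutate it


def fold_insert_subvector(node):
    """Drop an inner insertion that the outer insertion fully overwrites.

    The fold applies only when both insertions use the same index and the
    two subvectors have the same width, so SubNew replaces SubOld exactly.
    """
    if not is_insert(node):
        return node
    _, vec, sub_new, idx = node
    if is_insert(vec):
        _, inner_vec, sub_old, inner_idx = vec
        same_width = len(evaluate(sub_old)) == len(evaluate(sub_new))
        if inner_idx == idx and same_width:
            return (OP, inner_vec, sub_new, idx)
    return node


vec = [0] * 8
sub_old = [1, 2, 3, 4]
sub_new = [5, 6, 7, 8]
expr = (OP, (OP, vec, sub_old, 4), sub_new, 4)
folded = fold_insert_subvector(expr)
```

Evaluating `expr` and `folded` yields the same vector, which is the property that makes the fold safe in this model; an expression whose two insertions use different indices is returned unchanged.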

16939 Commits