llvm-project

Commit Graph

Author	SHA1	Message	Date
Adrian Prantl	29cf0c4318	Revert "Teach the IR verifier to reject conflicting debug info for function arguments." This reverts commit r295749 while investigating PR32042. It looks like this check uncovered a problem in the frontend that needs to be fixed before the check can be enabled again. llvm-svn: 296005	2017-02-23 19:13:48 +00:00
Sanjay Patel	4a4fbe162f	[DAG] add convenience function to get -1 constant; NFCI llvm-svn: 296004	2017-02-23 19:02:33 +00:00
Chad Rosier	95abfa35d6	[Reassociate] Add negated value of negative constant to the Duplicates list. In OptimizeAdd, we scan the operand list to see if there are any common factors between operands that can be factored out to reduce the number of multiplies (e.g., 'A*A+A*B*C+D' -> 'A*(A+B*C)+D'). For each operand of the operand list, we only consider unique factors (which is tracked by the Duplicate set). Now if we find a factor that is a negative constant, we add the negated value as a factor as well, because we can percolate the negate out. However, we mistakenly don't add this negated constant to the Duplicates set. Consider the expression A*2*-2 + B. Obviously, nothing to factor. For the added value A*2*-2 we over count 2 as a factor without this change, which causes the assert reported in PR30256. The problem is that this code is assuming that all the multiply operands of the add are already reassociated. This change avoids the issue by making OptimizeAdd tolerate multiplies which haven't been completely optimized; this sort of works, but we're doing wasted work: we'll end up revisiting the add later anyway. Another possible approach would be to enforce RPO iteration order more strongly. If we have RedoInsts, we process them immediately in RPO order, rather than waiting until we've finished processing the whole function. Intuitively, it seems like the natural approach: reassociation works on expression trees, so the optimization only works in one direction. That said, I'm not sure how practical that is given the current Reassociate; the "optimal" form for an expression depends on its use list (see all the uses of "user_back()"), so Reassociate is really an iterative optimization of sorts, so any changes here would probably get messy. PR30256 Differential Revision: https://reviews.llvm.org/D30228 llvm-svn: 296003	2017-02-23 18:49:03 +00:00
Dehao Chen	533bc6ea8e	Use base discriminator in sample pgo profile matching. Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching. Reviewers: dblaikie, davidxl Reviewed By: dblaikie, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30218 llvm-svn: 295999	2017-02-23 18:27:45 +00:00
Krzysztof Parzyszek	2cfc7a48de	[Hexagon] Avoid IMPLICIT_DEFs as new-value producers llvm-svn: 295997	2017-02-23 17:47:34 +00:00
Adam Nemet	b516cf3f3f	[LazyMachineBFI] Reimplement with getAnalysisIfAvailable Since LoopInfo is not available in machine passes as universally as in IR passes, using the same approach for OptimizationRemarkEmitter as we did for IR will run LoopInfo and DominatorTree unnecessarily. (LoopInfo is not used lazily by ORE.) To fix this, I am modifying the approach I took in D29836. LazyMachineBFI now uses its client passes including MachineBFI itself that are available or otherwise compute them on the fly. So for example GreedyRegAlloc, since it's already using MBFI, will reuse that instance. On the other hand, AsmPrinter in Justin's patch will generate DT, LI and finally BFI on the fly. (I am of course wondering now if the simplicity of this approach is even preferable in IR. I will do some experiments.) Testing is provided by an updated version of D29837 which requires Justin's patch to bring ORE to the AsmPrinter. Differential Revision: https://reviews.llvm.org/D30128 llvm-svn: 295996	2017-02-23 17:30:01 +00:00
Filipe Cabecinhas	33dd486f1d	[AddressSanitizer] Add PS4 offset llvm-svn: 295994	2017-02-23 17:10:28 +00:00
Sanjay Patel	68e4cb3c86	[InstCombine] use loop instead of recursion to peek through FPExt; NFCI llvm-svn: 295992	2017-02-23 16:39:51 +00:00
Sanjay Patel	adf2ab16e4	[InstCombine] use 'match' to reduce code; NFCI llvm-svn: 295991	2017-02-23 16:26:03 +00:00
Jan Vesely	70293a045b	AMDGPU/SI: Fix trunc i16 pattern Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990	2017-02-23 16:12:21 +00:00
Simon Pilgrim	4c0ea9d438	Strip trailing whitespace. llvm-svn: 295989	2017-02-23 16:07:04 +00:00
Krzysztof Parzyszek	af5ff65d67	[Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE llvm-svn: 295981	2017-02-23 15:02:09 +00:00
Diana Picus	a8cb0cd8f2	[ARM] GlobalISel: Lower call returns Introduce a common ValueHandler for call returns and formal arguments, and inherit two different versions for handling the differences (at the moment the only difference is the way physical registers are marked as used). llvm-svn: 295973	2017-02-23 14:18:41 +00:00
Alexey Bataev	f77d1656af	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added `3` only once, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295972	2017-02-23 13:37:09 +00:00
Diana Picus	a606713c33	[ARM] GlobalISel: Lower call parameters in regs Add support for lowering calls with parameters that can fit into regs. Use the same ValueHandler that we used for function returns, but rename it to match its new, extended purpose. llvm-svn: 295971	2017-02-23 13:25:43 +00:00
Ayman Musa	4b2c968c43	[X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel. (Quick fix to buildbot failure after rL295940 commit). llvm-svn: 295970	2017-02-23 13:15:44 +00:00
Simon Dardis	d410fc8f28	[mips][ias] Further relax operands of certain assembly instructions This patch adjusts the most relaxed predicate of immediate operands to accept immediate forms such as ~(0xf0000000|0x000f00000). Previously these forms would be accepted by GAS and rejected by IAS. This partially resolves PR/30383. Thanks to Sean Bruno for reporting the issue! Reviewers: slthakur, seanbruno Differential Revision: https://reviews.llvm.org/D29218 llvm-svn: 295965	2017-02-23 12:40:58 +00:00
Kristof Beyls	5ac6adbb6d	Fix assertion failure in ARMConstantIslandPass. The ARMConstantIslandPass didn't have support for handling accesses to constant island objects through ARM::t2LDRBpci instructions. This adds support for that. This fixes PR31997. llvm-svn: 295964	2017-02-23 12:24:55 +00:00
Simon Pilgrim	858d8e672d	Fix signed/unsigned comparison warning on MSVC llvm-svn: 295962	2017-02-23 12:00:34 +00:00
Alexey Bataev	14b370c1bf	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84. llvm-svn: 295957	2017-02-23 11:09:35 +00:00
Alexey Bataev	7ae653285d	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added `3` only once, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295956	2017-02-23 10:57:15 +00:00
Alexey Bataev	7337212e83	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d. llvm-svn: 295951	2017-02-23 09:59:29 +00:00
Alexey Bataev	68f2402c61	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added `3` only once, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295949	2017-02-23 09:40:38 +00:00
Ayman Musa	524dbdaa2b	[X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding. (Quick fix to buildbot fail). llvm-svn: 295946	2017-02-23 08:13:36 +00:00
Ayman Musa	6e670cf44f	[X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions. AVX versions of the converts work on f32/f64 types, while AVX512 version work on vectors. Differential Revision: https://reviews.llvm.org/D29988 llvm-svn: 295940	2017-02-23 07:24:21 +00:00
Matt Arsenault	f0a88dbaab	LoadStoreVectorizer: Split even sized illegal chains properly Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933	2017-02-23 03:58:53 +00:00
Craig Topper	185ced8b2b	[X86][IR] In AutoUpgrade, check explicitly for xop.vpcmov and xop.vpcmov.256 instead of anything starting with xop.vpcmov There were some older intrinsics that only existed for less than a month in 2012 that still exist in some out of tree test files that start with this string, but aren't able to be handled by the current upgrade code and fire an assert. Now we'll go back to treating them as not intrinsics at all and just passing them through to output. Fixes PR32041, sort of. llvm-svn: 295930	2017-02-23 03:22:14 +00:00
Matt Arsenault	d4bca1e9ef	AMDGPU: Replace disabled exp inputs with undef llvm-svn: 295914	2017-02-23 00:44:03 +00:00
Matt Arsenault	a9e16e6597	AMDGPU: Add another BFE pattern This is the pattern that falls out of the instruction's definition if offset == 0. llvm-svn: 295912	2017-02-23 00:23:43 +00:00
Matt Arsenault	79a45db7f5	AMDGPU: Use clamp with f64 llvm-svn: 295908	2017-02-22 23:53:37 +00:00
Michael Kuperstein	6181c62b95	Revert r295868 because it breaks a different SLP lit test. llvm-svn: 295906	2017-02-22 23:35:13 +00:00
Matt Arsenault	d5c6515b68	AMDGPU: Fold FP clamp as modifier bit The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. llvm-svn: 295905	2017-02-22 23:27:53 +00:00
Wei Ding	f2cce02eb2	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904	2017-02-22 23:22:19 +00:00
Justin Bogner	d519a92a27	[libFuzzer] Update traces hooks test after r293741 This test now passes on darwin. llvm-svn: 295902	2017-02-22 23:12:36 +00:00
Justin Bogner	59c8420018	[libFuzzer] Mark a test that infinite loops as unsupported We need to investigate this, but for now it just causes too much headache when trying to run these tests. llvm-svn: 295900	2017-02-22 23:05:17 +00:00
Matt Arsenault	f5262256a1	AMDGPU: Add replacement bfe intrinsics llvm-svn: 295899	2017-02-22 23:04:58 +00:00
Sanjay Patel	4805ce0b17	[InstCombine] don't try SimplifyDemandedInstructionBits from add/sub because it's slow and unlikely to succeed Notably, no regression tests change when we remove these calls, and these are expensive calls. The motivation comes from the general acknowledgement that the compiler is getting slower: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109188.html http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html And specifically the test case attached to PR32037: https://bugs.llvm.org//show_bug.cgi?id=32037 Profiling the middle-end (opt) part of the compile: $ ./opt -O2 row_common.bc -o /dev/null ...visitAdd and visitSub are near the top of the instcombine list, and the calls to SimplifyDemandedInstructionBits() are high within each of those. Those calls account for 1%+ of the opt time in either debug or release profiles. And that's the rough win I see from this patch when testing opt built release from r295864 on an iMac with Haswell 4GHz (model 4790K). It seems unlikely that we'd be able to eliminate add/sub or change their operands given that add/sub normally affect all bits, and the PR32037 example shows no IR difference after this change using -O2. Also worth noting - the code comment in visitAdd: // This handles stuff like (X & 254)+1 -> (X&254)|1 ...isn't true. That transform is handled later with a call to haveNoCommonBitsSet(). Differential Revision: https://reviews.llvm.org/D30270 llvm-svn: 295898	2017-02-22 23:01:12 +00:00
Eugene Zelenko	db56e5a89a	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 295893	2017-02-22 22:32:51 +00:00
Krzysztof Parzyszek	ab57c2bad3	[Hexagon] Implement @llvm.readcyclecounter() llvm-svn: 295892	2017-02-22 22:28:47 +00:00
Matt Arsenault	7b6c5d28f5	AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891	2017-02-22 22:23:32 +00:00
Daniel Berlin	fccbda967a	PredicateInfo: Support switch statements Summary: Depends on D29606 and D29682 Makes us pass GVN's edge.ll (we also will pass a few other testcases they just need cleaning up). Thoughts on the Predicate* hierarchy of classes especially welcome :) (it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something). Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29747 llvm-svn: 295889	2017-02-22 22:20:58 +00:00
Daniel Berlin	17e8d0eae2	Move updating functions to MemorySSAUpdater. Add updater to passes that now need it. Move around code in MemorySSA to expose needed functions. Summary: Mostly cleanup Reviewers: george.burgess.iv Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D30221 llvm-svn: 295887	2017-02-22 22:19:55 +00:00
Wei Mi	74d5a90fa6	[LSR] Canonicalize formula and put recursive Reg related with current loop in ScaledReg. After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs. Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg. But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg. Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant parts RegA and RegB are not grouped together and cannot be promoted outside of loop. With this patch, it will ensure lsr.iv to be generated later in the expr: RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop. Differential Revision: https://reviews.llvm.org/D26781 llvm-svn: 295884	2017-02-22 21:47:08 +00:00
Krzysztof Parzyszek	3596a81c69	[RDF] Support for partial structural aliases in RegisterAggr llvm-svn: 295883	2017-02-22 21:42:15 +00:00
Zachary Turner	842972b740	[Support] Re-add the special OSX flags on mmap. The problem appears to be that these flags can only be used when mapping a file for read-only, not for readwrite. So we do that here. llvm-svn: 295880	2017-02-22 21:24:06 +00:00
Krzysztof Parzyszek	65971d97b0	[Hexagon] Add intrinsics for masked vector stores Patch by Harsha Jagasia. llvm-svn: 295879	2017-02-22 21:23:09 +00:00
Matt Arsenault	93e65ea733	AMDGPU: Don't look at chain users when adjusting writemask Fixes not adjusting using new intrinsics with chains. llvm-svn: 295878	2017-02-22 21:16:41 +00:00
Matt Arsenault	707780b420	AMDGPU: Always allocate emergency stack slot at offset 0 This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877	2017-02-22 21:05:25 +00:00
Matt Arsenault	61ec6a03ca	AMDGPU: Change exp with compr bit printing llvm-svn: 295873	2017-02-22 20:37:12 +00:00
Wei Ding	6ade56e0a0	Revert "AMDGPU : Update TrapCode based on Trap Handler ABI." This reverts commit r295867. llvm-svn: 295871	2017-02-22 20:29:22 +00:00
Alexey Bataev	b551a81c28	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added `3` only once, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295868	2017-02-22 20:06:40 +00:00
Wei Ding	4991d3570f	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867	2017-02-22 20:05:06 +00:00
Rafael Espindola	f133ccbd8d	Move llvm_unreachable out of switch. This should make gcc happy and still produce a clang warning if we add another value to the enum. llvm-svn: 295865	2017-02-22 19:42:14 +00:00
Geoff Berry	6bb79157dd	[AArch64] Extend AArch64RedundantCopyElimination to do simple copy propagation. Summary: Extend AArch64RedundantCopyElimination to catch cases where the register that is known to be zero is COPY'd in the predecessor block. Before this change, this pass would catch cases like: CBZW %W0, <BB#1> BB#1: %W0 = COPY %WZR // removed After this change, cases like the one below are also caught: %W0 = COPY %W1 CBZW %W1, <BB#1> BB#1: %W0 = COPY %WZR // removed This change results in a 4% increase in static copies removed by this pass when compiling the llvm test-suite. It also fixes regressions caused by doing post-RA copy propagation (a separate change to be put up for review shortly). Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D30113 llvm-svn: 295863	2017-02-22 19:10:45 +00:00
Davide Italiano	e122d6885a	[ModuleSummaryAnalysis] Don't crash when referencing unnamed globals. Instead, just be conservative as these are infrequent enough. Thanks to Peter Collingbourne for the discussion about this on IRC. llvm-svn: 295861	2017-02-22 18:53:38 +00:00
Dan Gohman	7ea5adfff4	[WebAssembly] Implement the wasm binary container header. Also, update the version number to 0x1, which is what engines are now expecting. llvm-svn: 295860	2017-02-22 18:50:20 +00:00
Karl-Johan Karlsson	6eaed7aceb	[LoopVectorize] Added address space check when analysing interleaved accesses Prevent memory objects of different address spaces from being part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 This reverts r295042 (re-applies r295038) with an additional fix for the buildbot problem. llvm-svn: 295858	2017-02-22 18:37:36 +00:00
Dan Gohman	38b42b4a95	[WebAssembly] Define a table of function signatures for runtime library calls. LLVM CodeGen emits references to external symbols that are never declared in LLVM IR level, so they have no declared signature. However, WebAssembly requires all functions be declared with signatures. This patch adds a table for providing signatures for known runtime libcalls that will be used in subsequent patches to emit declarations for such functions. llvm-svn: 295857	2017-02-22 18:34:16 +00:00
Krzysztof Parzyszek	ace1b89060	[RDF] Skip undef uses when calculating kill flags llvm-svn: 295856	2017-02-22 18:29:16 +00:00
Krzysztof Parzyszek	ba36b92bef	[RDF] Only access block live-ins when tracking liveness llvm-svn: 295855	2017-02-22 18:27:36 +00:00
Michal Gorny	5ddd2a5bda	[Support] Provide linux/magic.h fallback for older kernels The function for distinguishing local and remote files added in r295768 unconditionally uses linux/magic.h header to provide necessary filesystem magic numbers. However, in kernel headers predating 2.6.18 the magic numbers are spread throughout multiple include files. Furthermore, LLVM did not require kernel headers being installed so far. To increase the portability across different versions of Linux kernel and different Linux systems, add CMake header checks for linux/magic.h and -- if it is missing -- the linux/nfs_fs.h and linux/smb.h headers which contained the numbers previously. Furthermore, since the numbers are static and the feature does not seem critical enough to make LLVM require kernel headers at all, add fallback constants for the case when none of the necessary headers is available. Differential Revision: https://reviews.llvm.org/D30261 llvm-svn: 295854	2017-02-22 18:09:15 +00:00
Dehao Chen	920677a997	Fix an obvious bug in SampleProfileReaderGCC. Summary: The CallTargetProfile should be added to FProfile to be consistent with other profile readers. Reviewers: dnovillo, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30233 llvm-svn: 295852	2017-02-22 17:27:21 +00:00
Dan Gohman	a63e8eb138	[WebAssembly] Configure codegen to legalize f16 values. llvm-svn: 295850	2017-02-22 16:28:00 +00:00
Bill Seurer	8e48f416ad	[DAGCombiner] revert r295336 r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849	2017-02-22 16:27:33 +00:00
Simon Pilgrim	13cdd57964	[X86][SSE] getTargetConstantBitsFromNode - insert constant bits directly into masks. Minor optimization, don't create temporary mask APInts that are just going to be OR'd into the accumulate masks - insert directly instead. llvm-svn: 295848	2017-02-22 15:38:13 +00:00
Simon Pilgrim	3a895c4873	[X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI. llvm-svn: 295845	2017-02-22 15:04:55 +00:00
Simon Pilgrim	3b97067ae8	Fix -Wunused-but-set-variable warning by removing unused 'aggregateIsPacked' checking llvm-svn: 295830	2017-02-22 13:37:31 +00:00
Benjamin Kramer	5a7e0f8357	[GlobalISel] Fix compiler warnings and make assert assert something. llvm-svn: 295827	2017-02-22 12:59:47 +00:00
Alexey Bataev	2e0031b371	[SLP] Remove unused initial value from the variable, NFC. llvm-svn: 295826	2017-02-22 12:57:58 +00:00
Igor Breger	f7359d893a	[X86][GlobalISel] Initial implementation , select G_ADD gpr, gpr Summary: Initial implementation for X86InstructionSelector. Handle selection COPY and G_ADD/G_SUB gpr, gpr . Reviewers: qcolombet, rovka, zvi, ab Reviewed By: rovka Subscribers: mgorny, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29816 llvm-svn: 295824	2017-02-22 12:25:09 +00:00
Roger Ferrer Ibanez	56db97d4de	[ARM] Fix constant islands pass. The pass tries to fix a spill of LR that turns out to be unnecessary. So it removes the tPOP but forgets to remove tPUSH. This causes the stack to be misaligned upon returning from the function. Thus, remove the tPUSH as well in this case. Differential Revision: https://reviews.llvm.org/D30207 llvm-svn: 295816	2017-02-22 09:06:21 +00:00
Ayman Musa	ceea56c705	[X86] Fix memory operands definition for some instructions. Change integer memory operands to FP memory operands to some FP instructions. Differential Revision: https://reviews.llvm.org/D30201 llvm-svn: 295813	2017-02-22 08:06:29 +00:00
Justin Bogner	8281c81413	OptDiag: Add const to some interfaces that don't modify anything. NFC This needed a const_cast for the dominator tree recalculation in OptimizationRemarkEmitter, but we do that all over the place already and it's safe. llvm-svn: 295812	2017-02-22 07:38:17 +00:00
Javed Absar	b672722810	[ARM] Classification Improvements to ARM Sched-Models. NFCI. This patch adds missing sched classes for Thumb2 instructions. This has been missing so far, and as a consequence, machine scheduler models for individual sub-targets have tended to be larger than they needed to be. These patches should help write schedulers better and faster in the future for ARM sub-targets. Reviewer: Diana Picus Differential Revision: https://reviews.llvm.org/D29953 llvm-svn: 295811	2017-02-22 07:22:57 +00:00
Craig Topper	56d4022997	[AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION. I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others. Differential revision: https://reviews.llvm.org/D30186 llvm-svn: 295810	2017-02-22 06:54:18 +00:00
Sanjoy Das	5cd6c5cacf	[ValueTracking] Make poison propagation more aggressive Summary: Motivation: fix PR31181 without regression (the actual fix is still in progress). However, the actual content of PR31181 is not relevant here. This change makes poison propagation more aggressive in the following cases: 1. poison * Val == poison, for any Val. In particular, this changes existing intentional and documented behavior in these two cases: a. Val is 0 b. Val is 2^k * N 2. poison << Val == poison, for any Val 3. getelementptr is poison if any input is poison I think all of these are justified (and are axiomatically true in the new poison / undef model): 1a: we need poison * 0 to be poison to allow transforms like these: A * (B + C) ==> A * B + A * C If poison * 0 were 0 then the above transform could not be allowed since e.g. we could have A = poison, B = 1, C = -1, making the LHS poison * (1 + -1) = poison * 0 = 0 and the RHS poison * 1 + poison * -1 = poison + poison = poison 1b: we need e.g. poison * 4 to be poison since we want to allow A * 4 ==> A + A + A + A If poison * 4 were a value with all of its bits poison except the last four; then we'd not be able to do this transform since then if A were poison the LHS would only be "partially" poison while the RHS would be "full" poison. 2: Same reasoning as (1b), we'd like to have the following kinds of transforms be legal: A << 1 ==> A + A Reviewers: majnemer, efriedma Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D30185 llvm-svn: 295809	2017-02-22 06:52:32 +00:00
Sean Silva	9011aca5f4	Use const-ref in range-loop for to avoid copying pairs of std::string No reason to create temporaries. Differential Revision: https://reviews.llvm.org/D29871 Patch by sergio.martins! llvm-svn: 295807	2017-02-22 06:34:04 +00:00
Dan Gohman	18eafb6c68	[WebAssembly] Add skeleton MC support for the Wasm container format This just adds the basic skeleton for supporting a new object file format. All of the actual encoding will be implemented in followup patches. Differential Revision: https://reviews.llvm.org/D26722 llvm-svn: 295803	2017-02-22 01:23:18 +00:00
Rui Ueyama	e67e162654	Fix -Wcovered-switch-default. llvm-svn: 295799	2017-02-22 01:01:45 +00:00
Matt Arsenault	1f17c66890	AMDGPU: Add cvt.pkrtz intrinsic Convert llvm.SI.packf16 test uses llvm-svn: 295797	2017-02-22 00:27:34 +00:00
Michael Kuperstein	c2af82b4b7	[LoopUnroll] Enable PGO-based loop peeling by default. This enables peeling of loops with low dynamic iteration count by default, when profile information is available. Differential Revision: https://reviews.llvm.org/D27734 llvm-svn: 295796	2017-02-22 00:27:34 +00:00
Matt Arsenault	9417505f7d	AMDGPU: Remove llvm.AMDGPU.clamp intrinsic llvm-svn: 295789	2017-02-21 23:46:04 +00:00
Matt Arsenault	2fdf2a1a18	AMDGPU: Redefine clamp node as clamp 0.0-1.0 Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788	2017-02-21 23:35:48 +00:00
Artem Belevich	29bbdc1c32	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm: * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets. * VectorizePTXValueVTs() uses that data to create vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors. * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan. Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784	2017-02-21 22:56:05 +00:00
Matt Arsenault	7d6b71db4f	AMDGPU: Formatting fixes llvm-svn: 295783	2017-02-21 22:50:41 +00:00
Matt Arsenault	f0a4823b91	DAG: Check if extract_vector_elt is legal or custom Avoids test regressions in future AMDGPU commits when more vector types are custom lowered. llvm-svn: 295782	2017-02-21 22:47:27 +00:00
Evandro Menezes	a8d3301ee1	[AArch64, X86] Add statistics for the MacroFusion pass llvm-svn: 295777	2017-02-21 22:16:13 +00:00
Evandro Menezes	b9b7f4b8d3	[AArch64, X86] Guard against both instrs being wild cards If both instrs are wild cards, the result can be a crash. llvm-svn: 295776	2017-02-21 22:16:11 +00:00
Eugene Zelenko	49e2fc4f5f	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 295773	2017-02-21 22:07:52 +00:00
Zachary Turner	e1ca5a294c	Try to fix the buildbot on OSX. Since I'm only seeing failures on OSX, and it's saying permission denied, I'm suspecting this is due to the addition of the MAP_RESILIENT_CODESIGN and/or MAP_RESILIENT_MEDIA flags. Speculatively trying to remove those to get the bots working. llvm-svn: 295770	2017-02-21 21:31:28 +00:00
Zachary Turner	6bc2dac132	Try to fix Android build. llvm-svn: 295769	2017-02-21 21:13:10 +00:00
Zachary Turner	392ed9d342	[Support] Add a function to check if a file resides locally. Differential Revision: https://reviews.llvm.org/D30010 llvm-svn: 295768	2017-02-21 20:55:47 +00:00
Xin Tong	ccee0e0c05	Make default value for disable-licm-promotion in licm explicit. llvm-svn: 295767	2017-02-21 20:53:48 +00:00
Rafael Espindola	23a76be5ad	Don't modify archive members unless really needed. For whatever reason ld64 requires that member headers (not the members themselves) should be aligned. The only way to do that is to edit the previous member so that it ends at an aligned boundary. Since modifying data put in an archive is an undesirable property, llvm-ar should only do it when it is absolutely necessary. llvm-svn: 295765	2017-02-21 20:40:54 +00:00
Evgeniy Stepanov	1fd19c6e5d	Fix PR31896. Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset). llvm-svn: 295762	2017-02-21 20:17:34 +00:00
Zachary Turner	43313b3e89	Try to fix line endings. llvm-svn: 295759	2017-02-21 19:52:57 +00:00
Sanjay Patel	cb731f1538	[InstCombine] canonicalize non-obvious forms of integer min/max This is part of trying to clean up our handling of min/max patterns in IR. By converting these to canonical form, we're more likely to recognize them because there are various places in InstCombine that don't use matchSelectPattern or m_SMax and friends. The backend fixups referenced in the now deleted TODO comment were added with: https://reviews.llvm.org/rL291392 https://reviews.llvm.org/rL289738 If there's any codegen fallout from this change, we should be able to address it in DAGCombiner or target-specific lowering. llvm-svn: 295758	2017-02-21 19:33:53 +00:00
Zachary Turner	3788818730	Remove svn:eol-style property from 2 files. There are still over 3400 files remaining with this property set, but there are tens of thousands more with the property not set. Until we decide what to do on a global scale, this at least unblocks me temporarily. llvm-svn: 295756	2017-02-21 19:29:56 +00:00
Matt Arsenault	c2a44e4c3c	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic llvm-svn: 295754	2017-02-21 19:27:33 +00:00
Matt Arsenault	e0bf7d02f0	AMDGPU: Don't use stack space for SGPR->VGPR spills Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753	2017-02-21 19:12:08 +00:00
Xin Tong	ebfe01c121	[LoopSimplify] Simplify how we compute UniqueExit Summary: Simplify how we compute UniqueExit. Reuse ExitBlockSet. Reviewers: sanjoy, efriedma, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30182 llvm-svn: 295751	2017-02-21 19:10:58 +00:00
Adrian Prantl	11b2d7dad8	Teach the IR verifier to reject conflicting debug info for function arguments. Conflicting debug info for function arguments causes hard-to-debug assertions in the DWARF backend, so the Verifier should reject it. For performance reasons this only checks function arguments from non-inlined debug intrinsics for now. rdar://problem/30520286 llvm-svn: 295749	2017-02-21 19:03:15 +00:00
Geoff Berry	5d534b6a11	[CodeGenPrepare] Sink and duplicate more 'and' instructions. Summary: Rework the code that was sinking/duplicating (icmp and, 0) sequences into blocks where they were being used by conditional branches to form more tbz instructions on AArch64. The new code is more general in that it just looks for 'and's that have all icmp 0's as users, with a target hook used to select which subset of 'and' instructions to consider. This change also enables 'and' sinking for X86, where it is more widely beneficial than on AArch64. The 'and' sinking/duplicating code is moved into the optimizeInst phase of CodeGenPrepare, where it can take advantage of the fact the OptimizeCmpExpression has already sunk/duplicated any icmps into the blocks where they are used. One minor complication from this change is that optimizeLoadExt needed to be updated to always mark 'and's it has determined should be in the same block as their feeding load in the InsertedInsts set to avoid an infinite loop of hoisting and sinking the same 'and'. This change fixes a regression on X86 in the tsan runtime caused by moving GVNHoist to a later place in the optimization pipeline (see PR31382). Reviewers: t.p.northover, qcolombet, MatzeB Subscribers: aemerson, mcrosier, sebpop, llvm-commits Differential Revision: https://reviews.llvm.org/D28813 llvm-svn: 295746	2017-02-21 18:53:14 +00:00
Simon Pilgrim	8eb515d8c4	[X86] EltsFromConsecutiveLoads SDLoc argument should be const&. There appears never to have been a time that the reference was updated. llvm-svn: 295739	2017-02-21 17:42:28 +00:00
Vassil Vassilev	59e5a64435	Do not leak OpenedHandles. Reviewed by Vedant Kumar (D30178) llvm-svn: 295737	2017-02-21 17:30:43 +00:00
Simon Pilgrim	791955819c	[X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets. As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ. llvm-svn: 295733	2017-02-21 16:41:44 +00:00
John Brawn	a6e95e1652	[ARM] Correct SP/PC handling in t2MOVr PC isn't allowed in the source operand of t2MOVr, so change the register class to one without PC. SP handling is slightly trickier and changes depending on if we're in ARMv8, so do that in checkTargetMatchPredicate. Differential Revision: https://reviews.llvm.org/D30199 llvm-svn: 295732	2017-02-21 16:41:29 +00:00
Simon Pilgrim	3546156122	[X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL. This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns. llvm-svn: 295723	2017-02-21 15:09:00 +00:00
Anna Thomas	ec36f3b79a	[InstCombine] Do not exercise nested max/min pattern on abs Summary: This is a fix for an assertion failure in `getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern. We should not be invoking the simplification rule for ABS(MIN(~x,y)) or ABS(MAX(~x,y)) combinations. Added a test case which would cause an assertion failure without the patch. Reviewers: sanjoy, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30051 llvm-svn: 295719	2017-02-21 14:40:28 +00:00
Igor Breger	812f319794	[AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index. Differential Revision: https://reviews.llvm.org/D30189 llvm-svn: 295718	2017-02-21 14:01:25 +00:00
Diana Picus	613b65696a	[ARM] GlobalISel: Lower calls to void() functions For now, we hardcode a BLX instruction, and generate an ADJCALLSTACKDOWN/UP pair with amount 0. llvm-svn: 295716	2017-02-21 11:33:59 +00:00
Evgeny Stupachenko	9909872e30	The patch introduces a new way of narrowing complex (>UINT16 variants) solutions. The new method is introduced under the "-lsr-exp-narrow" option (currently set to true). Summary: The method is based on the mathematical expectation of the number of registers and should generally be closer to the optimal solution. Please see details in the comments on the "LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function (in lib/Transforms/Scalar/LoopStrengthReduce.cpp). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D29862 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 295704	2017-02-21 07:34:40 +00:00
Craig Topper	d88389aa7e	[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697	2017-02-21 06:39:13 +00:00
Craig Topper	16d9730b86	[X86] Fix formatting. NFC llvm-svn: 295695	2017-02-21 06:27:13 +00:00
Craig Topper	d9fe664868	[AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns. llvm-svn: 295693	2017-02-21 04:26:10 +00:00
Craig Topper	d890db6952	[AVX-512] Fix the ExeDomain for vcmpss/vcmpsd. llvm-svn: 295691	2017-02-21 04:26:04 +00:00
Sanjoy Das	7b0b408973	[ValueTracking] clang-format a section I'm about to touch; NFC (Whitespace only change) llvm-svn: 295690	2017-02-21 02:42:42 +00:00
Matthias Braun	9ab403942b	ScheduleDAG: Cleanup; NFC - Fix doxygen comments (do not repeat documented name, remove definition comment if there is already one at the declaration, add \p, ...) - Add some const modifiers - Use range based for llvm-svn: 295688	2017-02-21 01:27:33 +00:00
Matthias Braun	05e5fd6ba2	SubtargetFeature: Cleanup; NFC - Fix doxygen comments - Remove duplicated comments - Remove section comments (which became wrong over time) - Use more `const` and references but less `auto` llvm-svn: 295687	2017-02-21 01:27:29 +00:00
Sanjoy Das	90208720e3	Add a wrapper around copy_if in STLExtras; NFC I will add one more use for this in a later change. llvm-svn: 295685	2017-02-21 00:38:44 +00:00
Taewook Oh	4cf5c1087c	[BranchFolding] Update debug location along with the update of branch instruction. Summary: Currently, BranchFolder drops DebugLoc for branch instructions in some places. For example, for the test code attached, the branch instruction of 'entry' block has a DILocation of ``` !12 = !DILocation(line: 6, column: 3, scope: !11) ```, but this information is gone when the block is lowered because BranchFolder misses it. This patch is a fix for this issue. Reviewers: qcolombet, aprantl, craig.topper, MatzeB Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29902 llvm-svn: 295684	2017-02-21 00:12:38 +00:00
Sanjoy Das	85cd132068	[IndVars] Add an assert We've already checked that the loop is in simplify form before, but a little paranoia never hurt anyone. llvm-svn: 295680	2017-02-20 23:37:11 +00:00
Davide Italiano	a9de0109b3	[IR/Verifier] List the CU we weren't able to find in `llvm.dbg.cu`. llvm-svn: 295678	2017-02-20 22:51:42 +00:00
Daniel Berlin	78cbd28102	MemorySSA: Add support for renaming uses in the updater. Summary: This lets one add aliasing stores to the updater. (i'm next going to move the creation/etc functions to the updater) Reviewers: george.burgess.iv Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D30154 llvm-svn: 295677	2017-02-20 22:26:03 +00:00
Craig Topper	2012dda9a0	[AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0. llvm-svn: 295673	2017-02-20 17:44:09 +00:00
Simon Pilgrim	2967ed1c7e	[X86] Tidyup combineExtractVectorElt. NFCI. Pull out repeated code for extraction index operand and source vector value type. Use isNullConstant helper to check for zero extraction index. llvm-svn: 295670	2017-02-20 16:09:45 +00:00
Diana Picus	1c33c9f0b0	[ARM] GlobalISel: Don't select atomic loads There used to be a check in the IRTranslator that prevented us from having to deal with atomic loads/stores. That check has been removed in r294993 and the AArch64 backend was updated accordingly. This commit does the same thing for the ARM backend. In general, in the ARM backend we introduce fences during the atomic expand pass, so we don't have to worry about atomics, except for the 32-bit ARMv8 target, which handles atomics more like AArch64. Since we don't want to worry about that yet, just bail out of instruction selection if we find any atomic loads. llvm-svn: 295662	2017-02-20 14:45:58 +00:00
Igor Breger	fda32d266a	[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector. It's more profitable to go through memory (1 cycle throughput) than to use a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index. The IACA tool was used to get a performance estimate (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer). For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. Remove the VINSERT node; we don't need it any more. Differential Revision: https://reviews.llvm.org/D29690 llvm-svn: 295660	2017-02-20 14:16:29 +00:00
Simon Pilgrim	5910ebe720	[X86][AVX512] Add support for ASHR v2i64/v4i64 without VLX Use v8i64 ASHR instructions if we don't have VLX. Differential Revision: https://reviews.llvm.org/D28537 llvm-svn: 295656	2017-02-20 12:16:38 +00:00
Simon Pilgrim	c0dc9a4913	Strip trailing whitespace. llvm-svn: 295653	2017-02-20 11:56:43 +00:00
Simon Pilgrim	50b958c07a	[SelectionDAG] Add scalarization support for ISD::*_EXTEND_VECTOR_INREG opcodes. Thanks to Mikael Holmén for the initial test case llvm-svn: 295652	2017-02-20 11:55:58 +00:00
Sjoerd Meijer	e22a79e898	AArch64AsmParser: tablegen the isBranchTarget helper functions Use tablegen to autogenerate isBranchTarget helper functions. This is a cleanup that removes almost identical functions that differ only in a few constants. Differential Revision: https://reviews.llvm.org/D30160 llvm-svn: 295649	2017-02-20 10:57:54 +00:00
Ayman Musa	51ffeab8c8	[X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value. Add WIG value to all of AVX instructions which ignore the W-bit in their encoding, instead of giving them the default value of 0. This patch is needed for a follow up work on EVEX2VEX pass (replacing EVEX encoded instructions with their corresponding VEX version when possible). Differential Revision: https://reviews.llvm.org/D29876 llvm-svn: 295643	2017-02-20 08:27:54 +00:00
Alexey Bataev	f96465b9b8	[SLP] nullptr'ize initial value in `findBuildAggregate()`, NFC. Initial value of V is set to nullptr, as it is not used. llvm-svn: 295642	2017-02-20 08:04:11 +00:00
Alexey Bataev	2f6b124e01	[SLP] Rework `findBuildAggregate()` from recursive form to iterative, NFC. Reviewers: mkuper Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30103 llvm-svn: 295641	2017-02-20 07:49:39 +00:00
Craig Topper	c6c68f5958	[AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0. llvm-svn: 295640	2017-02-20 07:00:40 +00:00
Craig Topper	a5fa2e40f9	[AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns. llvm-svn: 295638	2017-02-20 07:00:34 +00:00
Lang Hames	67de5d24a9	[Orc] Rename ObjectLinkingLayer -> RTDyldObjectLinkingLayer. The current ObjectLinkingLayer (now RTDyldObjectLinkingLayer) links objects in-process using MCJIT's RuntimeDyld class. In the near future I hope to add new object linking layers (e.g. a remote linking layer that links objects in the JIT target process, rather than the client), so I'm renaming this class to be more descriptive. llvm-svn: 295636	2017-02-20 05:45:14 +00:00
Craig Topper	5b4e36aafa	[AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2. llvm-svn: 295634	2017-02-20 02:47:42 +00:00
Craig Topper	c184b671d9	[X86] Use memory form of shift right by 1 when the rotl immediate is one less than the operation size. An earlier commit already did this for the register form. llvm-svn: 295626	2017-02-20 00:37:23 +00:00
Craig Topper	63801df251	[AVX-512] Remove AddedComplexity from masked operations. The size of the patterns already increases their priority. llvm-svn: 295619	2017-02-19 21:44:35 +00:00
Simon Pilgrim	14a7eee0b4	[X86] Use peekThroughOneUseBitcasts helper. NFCI. llvm-svn: 295618	2017-02-19 21:40:51 +00:00
Davide Italiano	16b476ffcc	[X86] Prefer static_cast<> to C-style cast. NFCI. llvm-svn: 295617	2017-02-19 21:35:41 +00:00
Craig Topper	489057715e	[AVX-512] Disable peephole optimizations on the VPTERNLOG commute test. Add new patterns to enable isel to fold the loads on its own. llvm-svn: 295616	2017-02-19 21:32:15 +00:00
Davide Italiano	1aef59eb44	[AArch64] Prefer static_cast<> to C-style cast. NFCI. llvm-svn: 295615	2017-02-19 21:31:14 +00:00
Simon Pilgrim	d590de2998	[X86][SSE] Use getTargetConstantBitsFromNode to find zeroable shuffle elements. Replaces existing approach that could only search BUILD_VECTOR nodes. Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element - this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements. llvm-svn: 295613	2017-02-19 19:40:31 +00:00
Craig Topper	4e794c71a6	[AVX-512] Add patterns to recognize masked vpternlog when the passthrough operand is not operand 0. This uses a SDNodeXForm to swizzle the appropriate immediate bits to allow this to be matched. llvm-svn: 295612	2017-02-19 19:36:58 +00:00
Simon Pilgrim	4271186f9c	[X86][SSE] Enable initial support for domain crossing at high shuffle combine depths. As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths. We currently set the required depth at 4 or more combined shuffles, this is probably too high for most targets but is a good starting point and already helps avoid a number of costly variable shuffles. llvm-svn: 295608	2017-02-19 17:19:38 +00:00
Artyom Skrobov	be31754094	Remove redundant call to GluedNodes.back() [NFC] llvm-svn: 295607	2017-02-19 16:56:18 +00:00
Simon Pilgrim	6d07d514de	[X86][SSE] Generalize INSERTPS/SHUFPS/SHUFPD combines across domains. Relax the INSERTPS/SHUFPS/SHUFPD combines to support integer inputs if permitted. llvm-svn: 295606	2017-02-19 15:15:40 +00:00

100087 Commits