llvm-project

Commit Graph

Author	SHA1	Message	Date
Artem Belevich	29bbdc1c32	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. Original code only used vector loads/stores for explicit vector arguments. It could also do more loads/stores than necessary (e.g v5f32 would touch 8 f32 values). Aggregate types were loaded one element at a time, even the vectors contained within. This change attempts to generalize (and simplify) parameter space loads/stores so that vector loads/stores can be used more broadly. Functionality of the patch has been verified by compiling thrust test suite and manually checking the differences between PTX generated by llvm with and without the patch. General algorithm: * ComputePTXValueVTs() flattens input/output argument into a flat list of scalars to load/store and returns their types and offsets. * VectorizePTXValueVTs() uses that data to create vectorization plan which returns an array of flags marking boundaries of vectorized load/stores. Scalars are represented as 1-element vectors. * Code that generates loads/stores implements a simple state machine that constructs a vector according to the plan. Differential Revision: https://reviews.llvm.org/D30011 llvm-svn: 295784	2017-02-21 22:56:05 +00:00
Matt Arsenault	7d6b71db4f	AMDGPU: Formatting fixes llvm-svn: 295783	2017-02-21 22:50:41 +00:00
Matt Arsenault	f0a4823b91	DAG: Check if extract_vector_elt is legal or custom Avoids test regressions in future AMDGPU commits when more vector types are custom lowered. llvm-svn: 295782	2017-02-21 22:47:27 +00:00
Evandro Menezes	a8d3301ee1	[AArch64, X86] Add statistics for the MacroFusion pass llvm-svn: 295777	2017-02-21 22:16:13 +00:00
Evandro Menezes	b9b7f4b8d3	[AArch64, X86] Guard against both instrs being wild cards If both instrs are wild cards, the result can be a crash. llvm-svn: 295776	2017-02-21 22:16:11 +00:00
Eugene Zelenko	49e2fc4f5f	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 295773	2017-02-21 22:07:52 +00:00
Zachary Turner	e1ca5a294c	Try to fix the buildbot on OSX. Since I'm only seeing failures on OSX, and it's saying permission denied, I'm suspecting this is due to the addition of the MAP_RESILIENT_CODESIGN and/or MAP_RESILIENT_MEDIA flags. Speculatively trying to remove those to get the bots working. llvm-svn: 295770	2017-02-21 21:31:28 +00:00
Zachary Turner	6bc2dac132	Try to fix Android build. llvm-svn: 295769	2017-02-21 21:13:10 +00:00
Zachary Turner	392ed9d342	[Support] Add a function to check if a file resides locally. Differential Revision: https://reviews.llvm.org/D30010 llvm-svn: 295768	2017-02-21 20:55:47 +00:00
Xin Tong	ccee0e0c05	Make default value for disable-licm-promotion in licm explicit. llvm-svn: 295767	2017-02-21 20:53:48 +00:00
Rafael Espindola	23a76be5ad	Don't modify archive members unless really needed. For whatever reason ld64 requires that member headers (not the member themselves) should be aligned. The only way to do that is to edit the previous member so that it ends at an aligned boundary. Since modifying data put in an archive is an undesirable property, llvm-ar should only do it when it is absolutely necessary. llvm-svn: 295765	2017-02-21 20:40:54 +00:00
Evgeniy Stepanov	1fd19c6e5d	Fix PR31896. Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset). llvm-svn: 295762	2017-02-21 20:17:34 +00:00
Zachary Turner	43313b3e89	Try to fix line endings. llvm-svn: 295759	2017-02-21 19:52:57 +00:00
Sanjay Patel	cb731f1538	[InstCombine] canonicalize non-obivous forms of integer min/max This is part of trying to clean up our handling of min/max patterns in IR. By converting these to canonical form, we're more likely to recognize them because there are various places in InstCombine that don't use matchSelectPattern or m_SMax and friends. The backend fixups referenced in the now deleted TODO comment were added with: https://reviews.llvm.org/rL291392 https://reviews.llvm.org/rL289738 If there's any codegen fallout from this change, we should be able to address it in DAGCombiner or target-specific lowering. llvm-svn: 295758	2017-02-21 19:33:53 +00:00
Zachary Turner	3788818730	Remove svn:eol-style property from 2 files. There are still over 3400 files remaining with this property set, but there are tens of thousands more with the property not set. Until we decide what to do on a global scale, this at least unblocks me temporarily. llvm-svn: 295756	2017-02-21 19:29:56 +00:00
Matt Arsenault	c2a44e4c3c	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic llvm-svn: 295754	2017-02-21 19:27:33 +00:00
Matt Arsenault	e0bf7d02f0	AMDGPU: Don't use stack space for SGPR->VGPR spills Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753	2017-02-21 19:12:08 +00:00
Xin Tong	ebfe01c121	[LoopSimplify] Simplify how we compute UniqueExit Summary: Simplify how we compute UniqueExit. Reuse ExitBlockSet. Reviewers: sanjoy, efriedma, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30182 llvm-svn: 295751	2017-02-21 19:10:58 +00:00
Adrian Prantl	11b2d7dad8	Teach the IR verifier to reject conflicting debug info for function arguments. Conflicting debug info for function arguments causes hard-to-debug assertions in the DWARF backend, so the Verifier should reject it. For performance reasons this only checks function arguments from non-inlined debug intrinsics for now. rdar://problem/30520286 llvm-svn: 295749	2017-02-21 19:03:15 +00:00
Geoff Berry	5d534b6a11	[CodeGenPrepare] Sink and duplicate more 'and' instructions. Summary: Rework the code that was sinking/duplicating (icmp and, 0) sequences into blocks where they were being used by conditional branches to form more tbz instructions on AArch64. The new code is more general in that it just looks for 'and's that have all icmp 0's as users, with a target hook used to select which subset of 'and' instructions to consider. This change also enables 'and' sinking for X86, where it is more widely beneficial than on AArch64. The 'and' sinking/duplicating code is moved into the optimizeInst phase of CodeGenPrepare, where it can take advantage of the fact the OptimizeCmpExpression has already sunk/duplicated any icmps into the blocks where they are used. One minor complication from this change is that optimizeLoadExt needed to be updated to always mark 'and's it has determined should be in the same block as their feeding load in the InsertedInsts set to avoid an infinite loop of hoisting and sinking the same 'and'. This change fixes a regression on X86 in the tsan runtime caused by moving GVNHoist to a later place in the optimization pipeline (see PR31382). Reviewers: t.p.northover, qcolombet, MatzeB Subscribers: aemerson, mcrosier, sebpop, llvm-commits Differential Revision: https://reviews.llvm.org/D28813 llvm-svn: 295746	2017-02-21 18:53:14 +00:00
Simon Pilgrim	8eb515d8c4	[X86] EltsFromConsecutiveLoads SDLoc argument should be const&. There appears never to have been a time that the reference was updated. llvm-svn: 295739	2017-02-21 17:42:28 +00:00
Vassil Vassilev	59e5a64435	Do not leak OpenedHandles. Reviewed by Vedant Kumar (D30178) llvm-svn: 295737	2017-02-21 17:30:43 +00:00
Simon Pilgrim	791955819c	[X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets. As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ. llvm-svn: 295733	2017-02-21 16:41:44 +00:00
John Brawn	a6e95e1652	[ARM] Correct SP/PC handling in t2MOVr PC isn't allowed in the source operand of t2MOVr, so change the register class to one without PC. SP handling is slightly trickier and changes depending on if we're in ARMv8, so do that in checkTargetMatchPredicate. Differential Revision: https://reviews.llvm.org/D30199 llvm-svn: 295732	2017-02-21 16:41:29 +00:00
Simon Pilgrim	3546156122	[X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL. This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns. llvm-svn: 295723	2017-02-21 15:09:00 +00:00
Anna Thomas	ec36f3b79a	[InstCombine] Do not exercise nested max/min pattern on abs Summary: This is a fix for assertion failure in `getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern. We should not be invoking the simplification rule for ABS(MIN(~ x,y))) or ABS(MAX(~x,y)) combinations. Added a test case which would cause an assertion failure without the patch. Reviewers: sanjoy, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30051 llvm-svn: 295719	2017-02-21 14:40:28 +00:00
Igor Breger	812f319794	[AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index. Differential Revision: https://reviews.llvm.org/D30189 llvm-svn: 295718	2017-02-21 14:01:25 +00:00
Diana Picus	613b65696a	[ARM] GlobalISel: Lower calls to void() functions For now, we hardcode a BLX instruction, and generate an ADJCALLSTACKDOWN/UP pair with amount 0. llvm-svn: 295716	2017-02-21 11:33:59 +00:00
Evgeny Stupachenko	9909872e30	The patch introduces new way of narrowing complex (>UINT16 variants) solutions. The new method introduced under "-lsr-exp-narrow" option (currenlty set to true). Summary: The method is based on registers number mathematical expectation and should be generally closer to optimal solution. Please see details in comments to "LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function (in lib/Transforms/Scalar/LoopStrengthReduce.cpp). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D29862 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 295704	2017-02-21 07:34:40 +00:00
Craig Topper	d88389aa7e	[X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697	2017-02-21 06:39:13 +00:00
Craig Topper	16d9730b86	[X86] Fix formatting. NFC llvm-svn: 295695	2017-02-21 06:27:13 +00:00
Craig Topper	d9fe664868	[AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns. llvm-svn: 295693	2017-02-21 04:26:10 +00:00
Craig Topper	d890db6952	[AVX-512] Fix the ExeDomain for vcmpss/vcmpsd. llvm-svn: 295691	2017-02-21 04:26:04 +00:00
Sanjoy Das	7b0b408973	[ValueTracking] clang-format a section I'm about to touch; NFC (Whitespace only change) llvm-svn: 295690	2017-02-21 02:42:42 +00:00
Matthias Braun	9ab403942b	ScheduleDAG: Cleanup; NFC - Fix doxygen comments (do not repeat documented name, remove definition comment if there is already one at the declaration, add \p, ...) - Add some const modifiers - Use range based for llvm-svn: 295688	2017-02-21 01:27:33 +00:00
Matthias Braun	05e5fd6ba2	SubtargetFeature: Cleanup; NFC - Fix doxygen comments - Remove duplicated comments - Remove section comments (which became wrong over time) - Use more `const` and references but less `auto` llvm-svn: 295687	2017-02-21 01:27:29 +00:00
Sanjoy Das	90208720e3	Add a wrapper around copy_if in STLExtras; NFC I will add one more use for this in a later change. llvm-svn: 295685	2017-02-21 00:38:44 +00:00
Taewook Oh	4cf5c1087c	[BranchFolding] Update debug location along with the update of branch instruction. Summary: Currently, BranchFolder drops DebugLoc for branch instructions in some places. For example, for the test code attached, the branch instruction of 'entry' block has a DILocation of ``` !12 = !DILocation(line: 6, column: 3, scope: !11) ``` , but this information is gone when then block is lowered because BranchFolder misses it. This patch is a fix for this issue. Reviewers: qcolombet, aprantl, craig.topper, MatzeB Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29902 llvm-svn: 295684	2017-02-21 00:12:38 +00:00
Sanjoy Das	85cd132068	[IndVars] Add an assert We've already checked that the loop is in simplify form before, but a little paranoia never hurt anyone. llvm-svn: 295680	2017-02-20 23:37:11 +00:00
Davide Italiano	a9de0109b3	[IR/Verifier] List the CU we weren't able to find in `llvm.dbg.cu`. llvm-svn: 295678	2017-02-20 22:51:42 +00:00
Daniel Berlin	78cbd28102	MemorySSA: Add support for renaming uses in the updater. Summary: This lets one add aliasing stores to the updater. (i'm next going to move the creation/etc functions to the updater) Reviewers: george.burgess.iv Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D30154 llvm-svn: 295677	2017-02-20 22:26:03 +00:00
Craig Topper	2012dda9a0	[AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0. llvm-svn: 295673	2017-02-20 17:44:09 +00:00
Simon Pilgrim	2967ed1c7e	[X86] Tidyup combineExtractVectorElt. NFCI. Pull out repeated code for extraction index operand and source vector value type. Use isNullConstant helper to check for zero extraction index. llvm-svn: 295670	2017-02-20 16:09:45 +00:00
Diana Picus	1c33c9f0b0	[ARM] GlobalISel: Don't select atomic loads There used to be a check in the IRTranslator that prevented us from having to deal with atomic loads/stores. That check has been removed in r294993 and the AArch64 backend was updated accordingly. This commit does the same thing for the ARM backend. In general, in the ARM backend we introduce fences during the atomic expand pass, so we don't have to worry about atomics, except for the 32-bit ARMv8 target, which handles atomics more like AArch64. Since we don't want to worry about that yet, just bail out of instruction selection if we find any atomic loads. llvm-svn: 295662	2017-02-20 14:45:58 +00:00
Igor Breger	fda32d266a	[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector. Its more profitable to go through memory (1 cycles throughput) than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index. IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer) For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. Removing the VINSERT node, we don't need it any more. Differential Revision: https://reviews.llvm.org/D29690 llvm-svn: 295660	2017-02-20 14:16:29 +00:00
Simon Pilgrim	5910ebe720	[X86][AVX512] Add support for ASHR v2i64/v4i64 support without VLX Use v8i64 ASHR instructions if we don't have VLX. Differential Revision: https://reviews.llvm.org/D28537 llvm-svn: 295656	2017-02-20 12:16:38 +00:00
Simon Pilgrim	c0dc9a4913	Strip trailing whitespace. llvm-svn: 295653	2017-02-20 11:56:43 +00:00
Simon Pilgrim	50b958c07a	[SelectionDAG] Add scalarization support for ISD::*_EXTEND_VECTOR_INREG opcodes. Thanks to Mikael Holmén for the initial test case llvm-svn: 295652	2017-02-20 11:55:58 +00:00
Sjoerd Meijer	e22a79e898	AArch64AsmParser: tablegen the isBranchTarget helper functions Use tablegen to autogenerate isBranchtarget helper functions. This is a cleanup that removes almost identical functions that differ only in a few constants. Differential Revision: https://reviews.llvm.org/D30160 llvm-svn: 295649	2017-02-20 10:57:54 +00:00
Ayman Musa	51ffeab8c8	[X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value. Add WIG value to all of AVX instructions which ignore the W-bit in their encoding, instead of giving them the default value of 0. This patch is needed for a follow up work on EVEX2VEX pass (replacing EVEX encoded instructions with their corresponding VEX version when possible). Differential Revision: https://reviews.llvm.org/D29876 llvm-svn: 295643	2017-02-20 08:27:54 +00:00

1 2 3 4 5 ...

99904 Commits