llvm-project

Commit Graph

Author	SHA1	Message	Date
Juergen Ributzka	c110c0b99a	Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ. Note: This version fixed an issue with the TBZ/TBNZ instructions that were generated in FastISel. The issue was that the 64bit version of TBZ (TBZX) automagically sets the upper bit of the immediate field that is used to specify the bit we want to test. To test for any of the lower 32bits we have to first extract the subregister and use the 32bit version of the TBZ instruction (TBZW). Original commit message: Teach selectBranch to fold bit test and branch into a single instruction (TBZ or TBNZ). llvm-svn: 218693	2014-09-30 19:59:35 +00:00
Matt Arsenault	9706978077	R600/SI: Fix printing of clamp and omod No tests for omod since nothing uses it yet, but this should get rid of the remaining annoying trailing zeros after some instructions. llvm-svn: 218692	2014-09-30 19:49:48 +00:00
Matt Arsenault	272c50a1fe	R600/SI: Update VOP3b to not include obsolete operands abs / neg are now part of the srcN_modifiers operands llvm-svn: 218691	2014-09-30 19:49:43 +00:00
Bradley Smith	7a77075530	Extend C disassembler API to allow specifying target features llvm-svn: 218682	2014-09-30 16:31:40 +00:00
Reed Kotler	3ebdcc9ea7	Add numeric extend, truncate to mips fast-isel Summary: Add numeric extend, truncate to mips fast-isel Reactivates D4827 Test Plan: fpext.ll loadstoreconv.ll Reviewers: dsanders Subscribers: mcrosier Differential Revision: http://reviews.llvm.org/D5251 llvm-svn: 218681	2014-09-30 16:30:13 +00:00
Tom Coxon	2c13e71728	[AArch64] Remove unnecessary whitespace. (Test commit) llvm-svn: 218680	2014-09-30 16:23:16 +00:00
Andrea Di Biagio	c7c524129b	[DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle. Currently, the DAG Combiner only tries to convert type-legal build_vector nodes into shuffles. This patch simply moves the logic that checks if a build_vector has a legal value type up before we even start analyzing the operands. This allows to early exit immediately from method 'visitBUILD_VECTOR' if the node type is known to be illegal. No functional change intended. llvm-svn: 218677	2014-09-30 15:30:22 +00:00
Alex Lorenz	597eaf2a43	Revert r218673 'llvm-cov: add test for report's function & file association.' Test causes buildbot failures. llvm-svn: 218676	2014-09-30 14:48:12 +00:00
Alex Lorenz	a891e6d44a	llvm-cov: add test for report's function & file association. This commit adds a test which checks that the functions defined in header files will get associated with the header files rather than the source files in the reports. Differential Revision: http://reviews.llvm.org/D5489 llvm-svn: 218673	2014-09-30 12:52:31 +00:00
Alex Lorenz	cb1702d45a	llvm-cov: Use the number of executed functions for the function coverage metric. This commit fixes llvm-cov's function coverage metric by using the number of executed functions instead of the number of fully covered functions. Differential Revision: http://reviews.llvm.org/D5196 llvm-svn: 218672	2014-09-30 12:45:13 +00:00
Lorenzo Martignoni	40d3deeb7d	Introduce support for custom wrappers for vararg functions. Differential Revision: http://reviews.llvm.org/D5412 llvm-svn: 218671	2014-09-30 12:33:16 +00:00
Robert Khasanov	28a7df0b5f	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> llvm-svn: 218670	2014-09-30 12:15:52 +00:00
Robert Khasanov	5aa4445bde	[AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ} Fixed lowering of these intrinsics when the mask is v2i1 or v4i1. Now cmp intrinsics lower in the following way: (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask))) -> (i8 (bitcast (v8i1 (insert_subvector undef, (v2i1 (and (PCMPEQM %a, %b), (extract_subvector (v8i1 (bitcast %mask)), 0))), 0)))) llvm-svn: 218669	2014-09-30 11:41:54 +00:00
Robert Khasanov	b25e562d14	[AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW. Added new operand type for intrinsics (IIT_V64) llvm-svn: 218668	2014-09-30 11:32:22 +00:00
Robert Khasanov	a27c8e0fd9	[AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ. Added CMP_MASK intrinsic type llvm-svn: 218667	2014-09-30 11:19:50 +00:00
Job Noorman	a9372a2755	Make sure aggregates are properly aligned on MSP430. llvm-svn: 218665	2014-09-30 11:15:44 +00:00
Chad Rosier	aab5d7bd33	[IndVarSimplify] Widen loop unsigned compares. This patch extends r217953 to handle unsigned comparison. Phabricator revision: http://reviews.llvm.org/D5526 llvm-svn: 218659	2014-09-30 03:17:42 +00:00
Chandler Carruth	aaf8e03d92	[x86] Revert r218588, r218589, and r218600. These patches were pursuing a flawed direction and causing miscompiles. Read on for details. Fundamentally, the premise of this patch series was to map VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because we are going to have to lower to VSELECT nodes for some blends to trigger the instruction selection patterns of variable blend instructions. This doesn't actually work out so well. In order to match performance with the existing VECTOR_SHUFFLE lowering code, we would need to re-slice the blend in order to fit it into either the integer or floating point blends available on the ISA. When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources) this works well because the X86 backend ensures that these types of operands to VSELECT get sign extended into '-1' and '0' for true and false, allowing us to re-slice the bits in whatever granularity without changing semantics. However, if the VSELECT condition comes from some other source, for example code lowering vector comparisons, it will likely only have the required bit set -- the high bit. We can't blindly slice up this style of VSELECT. Reid found some code using Halide that triggers this and I'm hopeful to eventually get a test case, but I don't need it to understand why this is A Bad Idea. There is another aspect that makes this approach flawed. When in VECTOR_SHUFFLE form, we have very distilled information that represents the constant blend mask. Converting back to a VSELECT form actually can lose this information, and so I think now that it is better to treat this as VECTOR_SHUFFLE until the very last moment and only use VSELECT nodes for instruction selection purposes. My plan is to: 1) Clean up and formalize the target pre-legalization DAG combine that converts a VSELECT with a constant condition operand into a VECTOR_SHUFFLE. 2) Remove any fancy lowering from VSELECT during legalization relying entirely on the DAG combine to catch cases where we can match to an immediate-controlled blend instruction. One additional step that I'm not planning on but would be interested in others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV which encodes a fully legalized VSELECT node. Then it would be easy to write isel patterns only in terms of this to ensure VECTOR_SHUFFLE legalization only ever forms the fully legalized construct and we can't cycle between it and VSELECT combining. llvm-svn: 218658	2014-09-30 02:52:28 +00:00
Chandler Carruth	964747adcf	[x86] Add some vector-register broadcast operations to the 256-bit v4 tests which were missing them. llvm-svn: 218657	2014-09-30 02:32:36 +00:00
Matt Arsenault	1c4571e0fd	R600: Fix broken check lines, missing scalar case. llvm-svn: 218655	2014-09-30 01:05:29 +00:00
Matt Arsenault	06a711dce5	Fix missing C++ mode comment llvm-svn: 218654	2014-09-30 01:05:27 +00:00
Juergen Ributzka	6ac12439d0	[FastISel][AArch64] Fold sign-/zero-extends into the load instruction. The sign-/zero-extension of the loaded value can be performed by the memory instruction for free. If the result of the load has only one use and the use is a sign-/zero-extend, then we emit the proper load instruction. The extend is only a register copy and will be optimized away later on. Other instructions that consume the sign-/zero-extended value are also made aware of this fact, so they don't fold the extend too. This fixes rdar://problem/18495928. llvm-svn: 218653	2014-09-30 00:49:58 +00:00
Juergen Ributzka	0616d9d41a	[FastISel][AArch64] Factor out scale factor calculation. NFC. Factor out the code that determines the implicit scale factor of memory operations for a given value type. llvm-svn: 218652	2014-09-30 00:49:54 +00:00
Nick Kledzik	5ffacc1655	[llvm-objdump] switch some uses of format() to format_hex() and left_justify() llvm-svn: 218649	2014-09-30 00:19:58 +00:00
Eric Christopher	a2db922c0e	Simplify conditional. llvm-svn: 218643	2014-09-29 23:31:13 +00:00
Adam Nemet	6bddb8c3a5	[AVX512] Use X86VectorVTInfo in the masking helper classes and the FMAs No functionality change. Makes the code more compact (see the FMA part). This needs a new type attribute MemOpFrag in X86VectorVTInfo. For now I only defined this in the simple cases. See the comment before the attribute. Diff of X86.td.expanded before and after is empty except for the appearance of the new attribute. llvm-svn: 218637	2014-09-29 22:54:41 +00:00
Hans Wennborg	f26bfc1671	WinCOFFObjectWriter: optimize the string table for common suffixes This is a follow-up from r207670 which did the same for ELF. Differential Revision: http://reviews.llvm.org/D5530 llvm-svn: 218636	2014-09-29 22:43:20 +00:00
Eric Christopher	6a0551e43a	Add soft-float to the key for the subtarget lookup in the TargetMachine map, this makes sure that we can compile the same code for two different ABIs (hard and soft float) in the same module. Update one testcase accordingly (and fix some confusing naming) and add a new testcase as well with the ordering swapped which would highlight the problem. llvm-svn: 218632	2014-09-29 21:57:54 +00:00
Eric Christopher	9b270d4dc9	Fix spelling and reflow comments. llvm-svn: 218631	2014-09-29 21:57:52 +00:00
Dave Estes	5f9daea101	[AArch64] Refines the Cortex-A57 Machine Model Primarily refines all of the instructions with accurate latency and micro-op information. Refinements largely focus on the NEON instructions. Additionally, a few advanced features are modeled, including forwarding for MAC instructions and hazards for floating point SQRT and DIV. Lastly, the issue-width is reduced to three so that the scheduler will better accommodate the narrower decode and dispatch width. llvm-svn: 218627	2014-09-29 21:27:36 +00:00
David Blaikie	ce3f573ae8	Unit test r218187, changing RTDyldMemoryManager::getSymbolAddress's behavior to favor mangled lookup over unmangled lookup. The contract of this function seems problematic (fallback in either direction seems like it could produce bugs in one client or another), but here are some tests for its current behavior, at least. See the commit/review thread of r218187 for more discussion. llvm-svn: 218626	2014-09-29 21:25:13 +00:00
Aaron Ballman	be8ce197aa	Fixing the build for compilers which do not yet have support for constexpr functions, NFC. llvm-svn: 218622	2014-09-29 20:27:01 +00:00
Jordan Rose	59e4e1b5fe	Add getValueOr to llvm::Optional<T>. This takes a single argument convertible to T, and - if the Optional has a value, returns the existing value, - otherwise, constructs a T from the argument and returns that. Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS. llvm-svn: 218618	2014-09-29 18:56:08 +00:00
Jordan Rose	40424cd6ab	Add "typedef T value_type;" to llvm::Optional<T>. Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS. llvm-svn: 218617	2014-09-29 18:56:05 +00:00
Matt Arsenault	9f617a0cb2	Fixing missing C++ mode comment llvm-svn: 218612	2014-09-29 15:55:18 +00:00
Matt Arsenault	1fd0c62821	Fix include order llvm-svn: 218611	2014-09-29 15:53:15 +00:00
Matt Arsenault	9783e00d1e	R600/SI: Fix hardcoded values for modifiers. Move enums to SIDefines.h llvm-svn: 218610	2014-09-29 15:50:26 +00:00
Matt Arsenault	3d4233fe48	R600/SI: Also fix fsub + fadd a, a to mad combines llvm-svn: 218609	2014-09-29 14:59:38 +00:00
Matt Arsenault	02cb0ff7db	R600/SI: Fix using mad with multiplies by 2 These turn into fadds, so combine them into the target mad node. fadd (fadd (a, a), b) -> mad 2.0, a, b llvm-svn: 218608	2014-09-29 14:59:34 +00:00
Chad Rosier	70d54ac848	[AArch64] Improve cost model to handle sdiv by a pow-of-two. This patch improves the target-specific cost model to better handle signed division by a power of two. The immediate result is that this enables the SLP vectorizer to do a better job. http://reviews.llvm.org/D5469 PR20714 llvm-svn: 218607	2014-09-29 13:59:31 +00:00
Frederic Riss	312a02e193	Store TypeUnits in a SmallVector<DWARFUnitSection> instead of a single DWARFUnitSection. There will be multiple TypeUnits in an unlinked object that will be extracted from different sections. Now that we have DWARFUnitSection that is supposed to represent an input section, we need a DWARFUnitSection<TypeUnit> per input .debug_types section. Once this is done, the interface is homogeneous and we can move the Section parsing code into DWARFUnitSection. This is a respin of r218513 that got reverted because it broke some builders. This new version features an explicit move constructor for the DWARFUnitSection class to work around compilers unable to generate correct C++11 default constructors. Reviewers: samsonov, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5482 llvm-svn: 218606	2014-09-29 13:56:39 +00:00
Kevin Qin	fc02e3c363	Use a loop to simplify the runtime unrolling prologue. Runtime unrolling creates a prologue to execute the extra iterations that are left over when the trip count can't be divided evenly by the unroll factor. It generates an if-then-else sequence to jump into a factor -1 times unrolled loop body, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: if (extraiters == loopfactor) jump L1 if (extraiters == loopfactor-1) jump L2 ... L1: LoopBody; L2: LoopBody; ... if tripcount < loopfactor jump End Loop: ... End: It means if the unroll factor is 4, the loop body will be 7 times unrolled, 3 are in loop prologue, and 4 are in the loop. This commit uses a loop to execute the extra iterations in the prologue, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: else jump Prol Prol: LoopBody; extraiters -= 1 // Omitted if unroll factor is 2. if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2. if (tripcount < loopfactor) jump End Loop: ... End: Then when the unroll factor is 4, the loop body will be copied only 5 times, 1 in the prologue loop, 4 in the original loop. And if the unroll factor is 2, no new loop is created, just as in the original solution. llvm-svn: 218604	2014-09-29 11:15:00 +00:00
Oliver Stannard	a4eba5ad70	[Thumb2] ldrexd and strexd are not defined on v7M The Thumb2 ldrexd and strexd instructions are not defined for M-class architectures. llvm-svn: 218603	2014-09-29 10:57:29 +00:00
Chandler Carruth	6cbf43167b	[x86] Make the new vector shuffle lowering lower blends as VSELECT nodes, and rely exclusively on its logic. This removes a ton of duplication from the blend lowering and centralizes it in one place. One downside is that it requires a bunch of hacks to make this work with the current legalization framework. We have to manually speculate one aspect of legalizing VSELECT nodes to get everything to work nicely because the existing legalization framework isn't actually bottom-up. The other grossness is that we somewhat duplicate the analysis of constant blends. I'm on the fence here. If reviewers think this would look better with VSELECT when it has constant operands dumping over to VECTOR_SHUFFLE, we could go that way. But it would be a substantial change because currently all of the actual blend instructions are matched via patterns in the TD files based around VSELECT nodes (despite them not being perfect fits for that). Suggestions welcome, but at least this removes the rampant duplication in the backend. llvm-svn: 218600	2014-09-29 09:57:07 +00:00
Jyoti Allur	b76b57fefd	Remove dead code from DIBuilder llvm-svn: 218593	2014-09-29 06:32:54 +00:00
Chandler Carruth	b1cc7a8542	[x86] Delete a bunch of really bad and totally unnecessary code in the X86 target-specific DAG combining that tried to convert VSELECT nodes into VECTOR_SHUFFLE nodes that it "knew" would lower into immediate-controlled blend nodes. Turns out, we have perfectly good lowering of all these VSELECT nodes, and indeed that lowering already knows how to handle lowering through BLENDI to immediate-controlled blend nodes. The code just wasn't getting used much because this thing forced the world to go through the vector shuffle lowering. Yuck. This also exposes that I was too aggressive in avoiding domain crossing in r218588 with that lowering -- when the other option is to expand into two 128-bit vectors, it is worth domain crossing. Restore that behavior now that we have nice tests covering it. The test updates here fall into two camps. One is where previously we ended up with an unsigned encoding of the blend operand and now we get a signed encoding. In most of those places there were elaborate comments explaining exactly what these operands really mean. Rather than that, just switch these tests to use the nicely decoded comments that make it obvious that the final shuffle matches. The other updates are just removing pointless domain crossing by blending integers with PBLENDW rather than BLENDPS. llvm-svn: 218589	2014-09-29 02:01:20 +00:00
Chandler Carruth	d639c7a829	[x86] Refactor all of the VSELECT-as-blend lowering code to avoid domain crossing and generally work more like the blend emission code in the new vector shuffle lowering. My goal is to have the new vector shuffle lowering just produce VSELECT nodes that are either matched here to BLENDI or are legal and matched in the .td files to specific blend instructions. That seems much cleaner as there are other ways to produce a VSELECT anyways. =] No observable functionality changed yet, mostly because this code appears to be near-dead. The behavior of this lowering routine did change though. This code being mostly dead and untestable will change with my next commit which will also point some new tests at it. llvm-svn: 218588	2014-09-29 01:32:54 +00:00
Chandler Carruth	2f9e56e527	[x86] Improve naming and comments for VSELECT lowering. No functionality changed. llvm-svn: 218586	2014-09-29 00:51:58 +00:00
Chandler Carruth	c7129276cd	[x86] Add the dispatch skeleton to the new vector shuffle lowering for AVX-512. There is no interesting logic yet. Everything ends up eventually delegating to the generic code to split the vector and shuffle the halves. Interestingly, that logic does a significantly better job of lowering all of these types than the generic vector expansion code does. Mostly, it lets most of the cases fall back to nice AVX2 code rather than all the way back to SSE code paths. Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next up will be to incrementally add direct support for the basic instruction set to each type (adding tests first). llvm-svn: 218585	2014-09-29 00:37:27 +00:00
Chandler Carruth	32a3ebda14	[x86] Make the split-and-lower routine fully generic by relaxing the assertion, making the name generic, and improving the documentation. Step 1 in adding very primitive support for AVX-512. No functionality changed yet. llvm-svn: 218584	2014-09-29 00:21:49 +00:00
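
Note on fc02e3c363 (runtime unrolling prologue): the control flow described in that commit message can be modelled outside the compiler. The following is a minimal Python sketch, not LLVM code. The names run_unrolled and body_copies are hypothetical and exist only for this illustration. It assumes the main loop runs the body `factor` times per trip and that the prologue loop handles the `tripcount % factor` leftover iterations one at a time.

```python
def run_unrolled(tripcount, factor, body):
    """Call body(i) for i in 0..tripcount-1 using a looped prologue."""
    i = 0
    extraiters = tripcount % factor
    # Prol: one copy of the body, repeated for the leftover iterations.
    while extraiters != 0:
        body(i)
        i += 1
        extraiters -= 1
    # Loop: the main loop; each trip stands in for `factor` body copies.
    # It is skipped entirely when tripcount < factor.
    while i < tripcount:
        for _ in range(factor):
            body(i)
            i += 1


def body_copies(factor, looped_prologue):
    """Number of loop-body copies emitted, per the commit message."""
    if looped_prologue:
        return 1 + factor          # one copy in the prologue loop
    return (factor - 1) + factor   # if-then-else chain of factor-1 copies


seen = []
run_unrolled(10, 4, seen.append)
print(seen == list(range(10)))
print(body_copies(4, False), body_copies(4, True))
```

With an unroll factor of 4 the copy counts are 7 for the if-then-else prologue and 5 for the looped prologue, matching the figures in the commit message. With a factor of 2 both formulas give 3, which is consistent with the message's remark that no new loop is created in that case.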


108168 Commits
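
Note on 02cb0ff7db (mad with multiplies by 2): the rewrite fadd (fadd (a, a), b) -> mad 2.0, a, b relies on a + a and 2.0 * a producing the same value. A small Python sketch of that identity follows. The function names are hypothetical, and mad is modelled here as a plain multiply followed by an add, which is an assumption made for illustration and not a description of the hardware instruction.

```python
def fadd_chain(a, b):
    # The original pattern: fadd (fadd (a, a), b)
    return (a + a) + b


def mad(x, y, z):
    # Hypothetical stand-in for the target mad node: x * y + z.
    return x * y + z


def combined(a, b):
    # The pattern after the combine: mad 2.0, a, b
    return mad(2.0, a, b)


samples = [(1.5, 0.25), (0.1, 0.2), (-3.0, 7.0), (1e-300, 1.0)]
print(all(fadd_chain(a, b) == combined(a, b) for a, b in samples))
```

Doubling a binary floating-point value is exact unless it overflows, so both forms feed the same intermediate value into the final add.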