llvm-project

Commit Graph

Author	SHA1	Message	Date
David Majnemer	59939acd26	InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1 The following implements the optimization for sequences of the form: icmp eq/ne (shl Const2, A), Const1 Such sequences can be transformed to: icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2)) This handles only the equality operators for now. Other operators need to be handled. Patch by Ankur Garg! llvm-svn: 220162	2014-10-19 08:23:08 +00:00
Chandler Carruth	a801dd5799	Fix a long-standing miscompile in the load analysis that was uncovered by my refactoring of this code. The method isSafeToLoadUnconditionally assumes that the load will proceed with the preferred type alignment. Given that, it has to ensure that the alloca or global is at least that aligned. It has always done this historically when a datalayout is present, but has never checked it when the datalayout is absent. When I refactored the code in r220156, I exposed this path when datalayout was present and that turned the latent bug into a patent bug. This fixes the issue by just removing the special case which allows folding things without datalayout. This isn't worth the complexity of trying to tease apart when it is or isn't safe without actually knowing the preferred alignment. llvm-svn: 220161	2014-10-19 08:17:50 +00:00
Chandler Carruth	8a99373812	Switch how the datalayout availability test is handled in this code to make much more sense and in theory be more correct. If you trace the code alllll the way back to when it was first introduced, the comments make it slightly more clear what was going on here. At that time, the only way Base != V was if DL (then TD) was non-null. As a consequence, if DL was null, that meant we were loading directly from the alloca or global found above the test. After refactoring, this has become at least terribly subtle and potentially incorrect. There are many forms of pointer manipulation that can be traversed without DataLayout, and some of them would in fact change the size of object being loaded vs. allocated. Rather than this subtlety, I've hoisted the actual 'return true' bits into the code which actually found an alloca or global and based them on the loaded pointer being that alloca or global. This is both more clear and safer. I've also added comments about exactly why this set of predicates is used. I've also corrected a misleading comment about globals -- if overridden they may not just have a different size, they may be null and completely unsafe to load from! Hopefully this confuses the next reader a bit less. I don't have any test cases or anything, the patch is motivated strictly to improve the readability of the code. llvm-svn: 220156	2014-10-19 00:42:16 +00:00
Bob Wilson	1e1f13862e	Use triple predicate functions instead of checking values directly. NFC. llvm-svn: 220155	2014-10-19 00:39:30 +00:00
Chandler Carruth	38e98d5782	Rename 'TD' to 'DL' in this function as the argument is now a DataLayout argument. llvm-svn: 220151	2014-10-18 23:47:22 +00:00
Chandler Carruth	1f27f03849	Fix the other comment to use modern doxygen style and be a bit more direct. Notably, comment on the fact that the loaded type is significant in that it determines how wide of an access must be safe. llvm-svn: 220150	2014-10-18 23:46:17 +00:00
Chandler Carruth	be49df3d2c	More formatting cleanup brought to you by clang-format. llvm-svn: 220149	2014-10-18 23:41:25 +00:00
Chandler Carruth	b56052f44d	Clean up doxygen syntax and reword comments to flow better, have a brief section, and not have unfinished sentence fragments. llvm-svn: 220147	2014-10-18 23:31:55 +00:00
Chandler Carruth	d67244df4e	Clean up the formatting and trailing whitespace of a routine before editing it. llvm-svn: 220146	2014-10-18 23:19:03 +00:00
Lang Hames	ad0962aec5	[PBQP] Replace the interference-constraints algorithm with a faster version loosely based on linear scan. On x86-64 this is good for a ~2% drop in compile time on the nightly test suite. llvm-svn: 220143	2014-10-18 17:26:07 +00:00
Chandler Carruth	be9dccd64d	Preserve AA metadata when combining (cast (load (...))) -> (load (cast (...))). llvm-svn: 220141	2014-10-18 11:00:12 +00:00
Chandler Carruth	2f75fcfef3	[InstCombine] Do an about-face on how LLVM canonicalizes (cast (load ...)) and (load (cast ...)): canonicalize toward the former. Historically, we've tried to load using the type of the pointer, and tried to match that type as closely as possible removing as many pointer casts as we could and trading them for bitcasts of the loaded value. This is deeply and fundamentally wrong. Repeat after me: memory does not have a type! This was a hard lesson for me to learn working on SROA. There is only one thing that should actually drive the type used for a pointer, and that is the type which we need to use to load from that pointer. Matching up pointer types to the loaded value types is very useful because it minimizes the physical size of the IR required for no-op casts. Similarly, the only thing that should drive the type used for a loaded value is how that value is used! Again, this minimizes casts. And in fact, the only thing motivating types in any part of LLVM's IR are the types used by the operations in the IR. We should match them as closely as possible. I've ended up removing some tests here as they were testing bugs or behavior that is no longer present. Mostly though, this is just cleanup to let the tests continue to function as intended. The only fallout I've found so far from this change was SROA and I have fixed it to not be impeded by the different type of load. If you find more places where this change causes optimizations not to fire, those too are likely bugs where we are assuming that the type of pointers is "significant" for optimization purposes. llvm-svn: 220138	2014-10-18 06:36:22 +00:00
Nick Kledzik	3b2aa057e6	[llvm-objdump] Fix mach-o binding decompression error llvm-svn: 220119	2014-10-18 01:21:02 +00:00
Chandler Carruth	2dc9682e59	[SROA] Change how SROA does vector-based promotion of allocas to handle cases where the alloca type, the load types, and the store types used all disagree. Previously, the only way that vector-based promotion occurred was if the alloca type was a vector type. This was one of the very few remaining uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns out it was a bad idea. The alloca type can change very easily based on the mixture of types loaded and stored to that alloca. We shouldn't be relying on it as a signal for very much. Instead, the source of truth should be loads and stores. We should canonicalize the loads and stores as much as possible and then rely on them exclusively in SROA. When looking at loads and stores, we may find many different candidate vector types. This change will let SROA try all of them to find a vector type which is a viable way to promote the entire alloca to a vector register. With this change, it becomes possible to do better canonicalization and optimization of loads and stores without breaking SROA in random ways, and that should allow fixing a core source of performance loss in hot numerical loops such as those in Eigen. llvm-svn: 220116	2014-10-18 00:44:02 +00:00
Aaron Watry	8114437a8f	R600/SI: Add global atomicrmw xchg v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220110	2014-10-17 23:33:03 +00:00
Aaron Watry	d672ee2a47	R600/SI: Add global atomicrmw xor v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220109	2014-10-17 23:33:01 +00:00
Aaron Watry	8a911e6926	R600/SI: Add global atomicrmw or v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220108	2014-10-17 23:32:59 +00:00
Aaron Watry	58c9992f15	R600/SI: Add global atomicrmw min/umin v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220107	2014-10-17 23:32:57 +00:00
Aaron Watry	29f295d7a5	R600/SI: Add global atomicrmw max/umax v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220106	2014-10-17 23:32:56 +00:00
Aaron Watry	621278034c	R600/SI: Add global atomicrmw and v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220105	2014-10-17 23:32:54 +00:00
Aaron Watry	328f1bae8e	R600/SI: Add global atomicrmw sub v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220104	2014-10-17 23:32:52 +00:00
Evgeniy Stepanov	e08633e900	[msan] Fix handling of byval arguments with large alignment. MSan param-tls slots are 8-byte aligned. This change clips alignment of memcpy into param-tls to 8. llvm-svn: 220101	2014-10-17 23:29:44 +00:00
Pete Cooper	230332f4fe	Check for dynamic allocas when selecting lifetime intrinsics. TL;DR: Indexing maps with [] creates missing entries. The long version: When selecting lifetime intrinsics, we index the static alloca map with the AllocaInst we find for that lifetime. Trouble is, we don't first check to see if this is a dynamic alloca. On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime. 0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots. PEI would later trigger an assert because it expects the stack protector to not be dead. This fix ensures that we only get frame indices for static allocas, i.e., those in the map. Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken. rdar://problem/18672951 llvm-svn: 220099	2014-10-17 22:59:33 +00:00
Rafael Espindola	7da1ea83a9	Revert "TRE: make TRE a bit more aggressive" This reverts commit r219899. This also updates byval-tail-call.ll to make it clear what was breaking. Adding r219899 again will cause the load/store to disappear. llvm-svn: 220093	2014-10-17 21:25:48 +00:00
Bill Schmidt	ba637db298	[PowerPC] Change assert to better form llvm-svn: 220092	2014-10-17 21:19:59 +00:00
Matt Arsenault	a708358e93	R600/SI: Remove redundant setting of instruction bits These are all set on the instruction base classes. llvm-svn: 220091	2014-10-17 21:13:11 +00:00
Bill Schmidt	a087d74250	[PowerPC] Change liveness testing in VSX FMA mutation pass With VSX enabled, LLVM crashes when compiling test/CodeGen/PowerPC/fma.ll. I traced this to the liveness test that's revised in this patch. The interval test is designed to only work for virtual registers, but in this case the AddendSrcReg is physical. Since there is already a walk of the MIs between the AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg in that loop. At Hal Finkel's request, I converted the liveness test to an assert restricted to virtual registers. I've changed the fma.ll test to have VSX and non-VSX variants so we can test both kinds of multiply-adds. llvm-svn: 220090	2014-10-17 21:02:44 +00:00
Matt Arsenault	933c38df40	Fix typo llvm-svn: 220068	2014-10-17 18:02:31 +00:00
Matt Arsenault	e184482bf8	R600/SI: Also check for FPImm literal constants llvm-svn: 220067	2014-10-17 18:00:50 +00:00
Matt Arsenault	d282ada508	R600/SI: Allow commuting with source modifiers llvm-svn: 220066	2014-10-17 18:00:48 +00:00
Matt Arsenault	8943d24949	R600/SI: Simplify code with hasModifiersSet llvm-svn: 220065	2014-10-17 18:00:45 +00:00
Matt Arsenault	ace5b76739	R600/SI: Fix general commuting breaking src mods The generic code trying to use findCommutedOpIndices won't understand that it needs to swap the modifier operands also, so it should fail if they are set. llvm-svn: 220064	2014-10-17 18:00:43 +00:00
Matt Arsenault	ffc5d5bbf0	R600/SI: Cleanup code with ChangeToFPImmediate llvm-svn: 220063	2014-10-17 18:00:41 +00:00
Matt Arsenault	6d3cd544bb	R600/SI: Allow commuting fp immediates llvm-svn: 220062	2014-10-17 18:00:39 +00:00
Matt Arsenault	aa5ccfb566	R600/SI: Use early return instead of checking condition twice Any commutable instruction will have at least src1. llvm-svn: 220061	2014-10-17 18:00:37 +00:00
Matt Arsenault	328b1193b5	R600/SI: Use complex pattern for MUBUF load patterns. This eliminates a use of the SI_ADDR64_RSRC pseudo llvm-svn: 220057	2014-10-17 17:43:00 +00:00
Matt Arsenault	83a535ff6b	R600/SI: Remove SI_BUFFER_RSRC pseudo Just use REG_SEQUENCE directly, so there are fewer instructions to need to deal with later. llvm-svn: 220056	2014-10-17 17:42:56 +00:00
Juergen Ributzka	ad2363f9ee	[Stackmaps] Enable invoking the patchpoint intrinsic. Patch by Kevin Modzelewski Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits, reames Differential Revision: http://reviews.llvm.org/D5634 llvm-svn: 220055	2014-10-17 17:39:00 +00:00
Andrea Di Biagio	c48cb86f05	[X86] Fix missed selection of non-temporal store of zero vector. When the input to a store instruction was a zero vector, the backend always selected a normal vector store regardless of the non-temporal hint. This is fixed by this patch. This fixes PR19370. llvm-svn: 220054	2014-10-17 17:27:06 +00:00
James Molloy	f497d5511d	[AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering. We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same. While we're at it, avoid writing to std::vector::end()... Bug found with random testing and a lot of coffee. llvm-svn: 220051	2014-10-17 17:06:31 +00:00
Bill Schmidt	2d1128acb2	[PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64 types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This patch adds that support. As with lxvd2x/stxvd2x, this involves straightforward overriding of the patterns normally recognized for lvx/stvx, with preference given to the VSX patterns when VSX is enabled. In addition, the logic for permitting misaligned memory accesses is modified so that v4f32 and v4i32 are treated the same as v2f64 and v2i64 when VSX is enabled. Finally, the DAG generation for unaligned loads is changed to just use a normal LOAD (which will become lxvw4x) on P8 and later hardware, where unaligned loads are preferred over lvsl/lvx/lvx/vperm. A number of tests now generate the VSX loads/stores instead of lvx/stvx, so this patch adds VSX variants to those tests. I've also added <4 x float> tests to the vsx.ll test case, and created a vsx-p8.ll test case to be used for testing code generation for the P8Vector feature. For now, that simply tests the unaligned load/store behavior. This has been tested along with a temporary patch to enable the VSX and P8Vector features, with no new regressions encountered with or without the temporary patch applied. llvm-svn: 220047	2014-10-17 15:13:38 +00:00
Jan Vesely	54468a5a58	Mips: Only set divrem i64 to custom on 64bit Reviewed-by: Daniel Sanders <daniel.sanders@imgtec.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220046	2014-10-17 14:45:28 +00:00
Jan Vesely	af62cf4db0	SelectionDAG: Add sext_inreg optimizations v2: use dyn_cast fixup comments v3: use cast Reviewed-by: Matt Arsenault <arsenm2@gmail.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220044	2014-10-17 14:45:25 +00:00
Vasileios Kalintiris	238692beb9	[mips] Add support for COP1's Branch-On-Cond-Likely instructions Summary: Depends on D5782 Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5802 llvm-svn: 220042	2014-10-17 14:08:28 +00:00
Vasileios Kalintiris	6d1e64896d	[mips] Add support for COP0's Branch-On-Cond-Likely instructions Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5782 llvm-svn: 220036	2014-10-17 12:38:35 +00:00
Hal Finkel	dd38c0b876	[DSE] Remove no-data-layout-only type-based overlap checking DSE's overlap checking contained special logic, used only when no DataLayout was available, which inferred a complete overwrite when the pointee types were equal. This logic seems fine for regular loads/stores, but does not work for memcpy and friends. Instead of fixing this, I'm just removing it. Philosophically, transformations should not contain enhanced behavior used only when data layout is lacking (data layout should be strictly additive), and maintaining these rarely-tested code paths seems not worthwhile at this stage. Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case (slightly reduced from that provided by Aliaksei) replaces the original contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few other tests have been updated to have a data layout. llvm-svn: 220035	2014-10-17 11:56:00 +00:00
Rafael Espindola	b66130209b	Add back commits r219835 and a fixed version of r219829. The only difference from r219829 is using getOrCreateSectionSymbol(*ELFSec) instead of GetOrCreateSymbol(ELFSec->getSectionName()) in ELFObjectWriter which causes us to use the correct section symbol even if we have multiple sections with the same name. Original messages: r219829: Correctly handle references to section symbols. When processing assembly like .long .text we were creating a new undefined symbol .text. GAS on the other hand would handle that as a reference to the .text section. This patch implements that by creating the section symbols earlier so that they are visible during asm parsing. The patch also updates llvm-readobj to print the symbol number in the relocation dump so that the test can differentiate between two sections with the same name. r219835: Allow forward references to section symbols. llvm-svn: 220021	2014-10-17 01:48:58 +00:00
Akira Hatanaka	0d0c78180d	ARM: Fix a bug which was causing convergence failure in constant-island pass. The bug is in ARMConstantIslands::createNewWater where the upper bound of the new water split point is computed: // This could point off the end of the block if we've already got constant // pool entries following this block; only the last one is in the water list. // Back past any possible branches (allow for a conditional and a maximally // long unconditional). if (BaseInsertOffset + 8 >= UserBBI.postOffset()) { BaseInsertOffset = UserBBI.postOffset() - UPad - 8; DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset)); } The split point is supposed to be somewhere between the machine instruction that loads from the constant pool entry and the end of the basic block, before branch instructions. The code above is fine if the basic block is large enough and there are a sufficient number of instructions following the machine instruction. However, if the machine instruction is near the end of the basic block, BaseInsertOffset can point to the machine instruction or another instruction that precedes it, and this can lead to convergence failure. This commit fixes this bug by ensuring BaseInsertOffset is larger than the offset of the instruction following the constant-loading instruction. rdar://problem/18581150 llvm-svn: 220015	2014-10-17 01:31:47 +00:00
Rafael Espindola	4544a4062c	Revert commit r219835 and r219829. Revert "Correctly handle references to section symbols." Revert "Allow forward references to section symbols." Rui found a regression I am debugging. llvm-svn: 220010	2014-10-17 01:06:02 +00:00
Peter Zotov	aff492c6fd	[LLVM-C] Add LLVMInstructionClone. llvm-svn: 220007	2014-10-17 01:02:34 +00:00
Matt Arsenault	bfaab76f6b	R600/SI: Simplify debug printing llvm-svn: 219999	2014-10-17 00:36:20 +00:00
Matt Arsenault	661a031af6	R600/SI: Remove another VALU pattern llvm-svn: 219988	2014-10-16 23:33:37 +00:00
Peter Collingbourne	e186319319	Introduce LLVMParseCommandLineOptions C API function. llvm-svn: 219975	2014-10-16 22:47:52 +00:00
Juergen Ributzka	fd4633e1a5	Reduce code duplication between patchpoint and non-patchpoint lowering. NFC. This is in preparation for another patch that makes patchpoints invokable. Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5657 llvm-svn: 219967	2014-10-16 21:26:35 +00:00
Chandler Carruth	8393406f05	[SROA] Switch the common variable name for the 'AllocaSlices' class to 'AS'. Using 'S' as this was a terrible idea. Arguably, 'AS' is not much better, but it at least follows the idea of using initialisms and removes active confusion about the AllocaSlices variable and a Slice variable. llvm-svn: 219963	2014-10-16 21:11:55 +00:00
Chandler Carruth	61747042c1	[SROA] More range-based cleanups to SROA, these brought to you by clang-modernize. I did have to clean up the variable types and whitespace a bit because the use of auto made the code much less readable here. llvm-svn: 219962	2014-10-16 21:05:14 +00:00
Chandler Carruth	57d4cae202	[SROA] Switch a couple of overly complex iterator accessors to just be ArrayRef accessors. I think this even came up in review that this was over-engineered, and indeed it was. Time to un-build it. llvm-svn: 219958	2014-10-16 20:42:08 +00:00
Robin Morisset	e2de06bef6	Erase fence insertion from SelectionDAGBuilder.cpp (NFC) Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that monotonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons: - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand. - The current code in SelectionDAGBuilder does not use any target-hooks, it does the same transformation for every backend that requires it - As a result it is plain unsound, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331). - Because it produces IR-level fences, it cannot be made sound! This is noted in the C++11 standard (section 29.3, page 1140): ``` Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics. ``` It can also be seen by the following example (called IRIW in the literature): ``` atomic<int> x = y = 0; int r1, r2, r3, r4; Thread 0: x.store(1); Thread 1: y.store(1); Thread 2: r1 = x.load(); r2 = y.load(); Thread 3: r3 = y.load(); r4 = x.load(); ``` r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it. This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that): - it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk). - it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change) - it finally erases this logic from SelectionDAGBuilder as it is dead-code. Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more aggressively on these platforms, the long comment should make it clear why he should first override emitLeading/TrailingFence. Test Plan: make check-all, no functional change Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5474 llvm-svn: 219957	2014-10-16 20:34:57 +00:00
Matt Arsenault	70c82173f3	R600/SI: Remove unnecessary VALU patterns These haven't been necessary since allowing selecting SALU instructions in non-entry blocks was enabled. llvm-svn: 219956	2014-10-16 20:31:50 +00:00
Chandler Carruth	c659df9389	[SROA] Start more deeply moving SROA to use ranges rather than just iterators. There are a ton of places where it essentially wants ranges rather than just iterators. This is just the first step that adds the core slice range typedefs and uses them in a couple of places. I still have to explicitly construct them because they've not been punched throughout the entire set of code. More range-based cleanups incoming. llvm-svn: 219955	2014-10-16 20:24:07 +00:00
Matt Arsenault	a3fe7c62d1	R600: Fix nonsensical implementation of computeKnownBits for BFE This was resulting in invalid simplifications of sdiv llvm-svn: 219953	2014-10-16 20:07:40 +00:00
Rafael Espindola	11aaaeebe0	Delete -std-compile-opts. These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951	2014-10-16 20:00:02 +00:00
Bjorn Steinbrink	d20816fde9	Allow call-slot optzn for destinations with a suitable dereferenceable attribute Summary: Currently, call slot optimization requires that if the destination is an argument, the argument has the sret attribute. This is to ensure that the memory access won't trap. In addition to sret, we can also allow the optimization to happen for arguments that have the new dereferenceable attribute, which gives the same guarantee. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5832 llvm-svn: 219950	2014-10-16 19:43:08 +00:00
Sanjay Patel	c699a6117b	fold: sqrt(x * x * y) -> fabs(x) * sqrt(y) If a square root call has an FP multiplication argument that can be reassociated, then we can hoist a repeated factor out of the square root call and into a fabs(). In the simplest case, this: y = sqrt(x * x); becomes this: y = fabs(x); This patch relies on an earlier optimization in instcombine or reassociate to put the multiplication tree into a canonical form, so we don't have to search over every permutation of the multiplication tree. Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to use function-level attributes to do this optimization. This needs to be fixed for both the intrinsics and in the backend. Differential Revision: http://reviews.llvm.org/D5787 llvm-svn: 219944	2014-10-16 18:48:17 +00:00
Juergen Ributzka	03a0611061	[AArch64] Fix miscompile of sdiv-by-power-of-2. When the constant divisor was larger than 32bits, then the optimized code generated for the AArch64 backend would emit the wrong code, because the shift was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would lose the upper 32bits. This fixes rdar://problem/18678801. llvm-svn: 219934	2014-10-16 16:41:15 +00:00
Vasileios Kalintiris	167c372118	[mips] Account for endianness when expanding BuildPairF64/ExtractElementF64 nodes. Summary: In order to support big endian targets for the BuildPairF64 nodes we just need to swap the low/high pair registers. Additionally, for the ExtractElementF64 nodes we have to calculate the correct stack offset with respect to the node's register/operand that we want to extract. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5753 llvm-svn: 219931	2014-10-16 15:41:51 +00:00
Vasileios Kalintiris	711028f718	[mips] Marked the DI/EI instruction aliases as MIPS32r2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5751 llvm-svn: 219927	2014-10-16 15:23:52 +00:00
Vasileios Kalintiris	f445a56b61	Test commit access: remove extra new line at the end of file llvm-svn: 219925	2014-10-16 14:37:00 +00:00
Akira Hatanaka	5c221ef98f	Reapply r219832 - InstCombine: Narrow switch instructions using known bits. The code committed in r219832 asserted when it attempted to shrink a switch statement whose type was larger than 64-bit. llvm-svn: 219902	2014-10-16 06:00:46 +00:00
Saleem Abdulrasool	7f52921976	TRE: make TRE a bit more aggressive Make tail recursion elimination a bit more aggressive. This allows us to get tail recursion on functions that are just branches to a different function. The fact that the function takes a byval argument does not restrict it from being optimised into just a tail call. llvm-svn: 219899	2014-10-16 03:27:30 +00:00
Akira Hatanaka	40c2cf4afc	Revert r219832. llvm-svn: 219884	2014-10-16 01:17:02 +00:00
Hal Finkel	2400c96cc3	[LVI] Add some additional comments about caching and context instructions Philip Reames and I had a long conversation about this, mostly because it is not obvious why the current logic is correct. Hopefully, these comments will prevent such confusion in the future. llvm-svn: 219882	2014-10-16 00:40:05 +00:00
Matt Arsenault	f1b34cf6b6	R600: Remove dead function llvm-svn: 219879	2014-10-16 00:08:09 +00:00
Sanjoy Das	360b1ed5f2	Revert "r219834 - Teach ScalarEvolution to sharpen range information" This change breaks the asan buildbots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468 llvm-svn: 219878	2014-10-15 23:46:04 +00:00
Hal Finkel	68dc3c7ab2	Preserve non-byval pointer alignment attributes using @llvm.assume when inlining For pointer-typed function arguments, enhanced alignment can be asserted using the 'align' attribute. When inlining, if this enhanced alignment information is not otherwise available, preserve it using @llvm.assume-based alignment assumptions. llvm-svn: 219876	2014-10-15 23:44:41 +00:00
Hal Finkel	6f814db8d7	Add CreateAlignmentAssumption to IRBuilder Clang CodeGen had a utility function for creating pointer alignment assumptions using the @llvm.assume intrinsic. This functionality will also be needed by the inliner (to preserve function-argument alignment attributes when inlining), so this moves the utility function into IRBuilder where it can be used both by Clang CodeGen and also other LLVM-level code. llvm-svn: 219875	2014-10-15 23:44:22 +00:00
Adam Nemet	4285c1f8cc	[AVX512] Add DQ subvector inserts In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4 respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative VTs). Since DQ has native support for these instructions, I peeled off the non-"Alt" part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ. Fixes <rdar://problem/18426089> llvm-svn: 219874	2014-10-15 23:42:17 +00:00
Adam Nemet	449b3f0931	[AVX512] Two new attributes in X86VectorVTInfo for subvector insert The new attributes are NumElts and the CD8TupleForm. This prepares the code to enable x8 and x2 inserts. NFC, no change in X86.td.expanded except for the new attributes. llvm-svn: 219871	2014-10-15 23:42:09 +00:00
Adam Nemet	b1c3ef4b60	[AVX512] Rename arg from Opcode32/64 to Opcode128/256 in vinsert_for_size It's the W bit that selects between 32 or 64 elt type and not the opcode. The opcode selects between the width of the insert (128 or 256). llvm-svn: 219870	2014-10-15 23:42:04 +00:00
Matt Arsenault	20893b3611	R600: Remove unnecessary part of computeKnownBitsForTargetNode Zero-width BFEs are combined away already, so there's no point in handling them. llvm-svn: 219868	2014-10-15 23:37:49 +00:00
Matt Arsenault	6de7af4242	Move variable down to use llvm-svn: 219867	2014-10-15 23:37:42 +00:00
Alexander Potapenko	6909b5b567	Add MachOObjectFile::getUuid() This CL introduces MachOObjectFile::getUuid(). This function returns an ArrayRef to the object file's UUID, or an empty ArrayRef if the object file doesn't contain an LC_UUID load command. The new function is gonna be used by llvm-symbolizer. llvm-svn: 219866	2014-10-15 23:35:45 +00:00
Chris Bieneman	5c4e9551c9	Fixing the build failure due to compiler warnings and unnecessary disambiguation. llvm-svn: 219861	2014-10-15 23:11:35 +00:00
Chris Bieneman	732e0aa9fb	Defining a new API for debug options that doesn't rely on static global cl::opts. Summary: This is based on the discussions from the LLVMDev thread: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075886.html Reviewers: chandlerc Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5389 llvm-svn: 219854	2014-10-15 21:54:35 +00:00
Tom Stellard	c8d7920ad9	R600/SI: Fix bug where immediates were being used in DS addr operands The SelectDS1Addr1Offset complex pattern always tries to store constant lds pointers in the offset operand and store a zero value in the addr operand. Since the addr operand does not accept immediates, the zero value needs to first be copied to a register. This newly created zero value will not go through normal instruction selection, so we need to manually insert a V_MOV_B32_e32 in the complex pattern. This bug was hidden by the fact that if there was another zero value in the DAG that had not been selected yet, then the CSE done by the DAG would use the unselected node for the addr operand rather than the one that was just created. This would lead to the zero value being selected and the DAG automatically inserting a V_MOV_B32_e32 instruction. llvm-svn: 219848	2014-10-15 21:08:59 +00:00
Eric Christopher	2181fb2ff3	Avoid caching the MachineFunction, we don't use it outside of runOnMachineFunction. llvm-svn: 219847	2014-10-15 21:06:25 +00:00
Sid Manning	a002296427	Wrong attribute. LLVM_ATTRIBUTE_UNUSED not LLVM_ATTRIBUTE_USED This original fix for the build break was correct. LLVM_ATTRIBUTE_USED removes the warning message because it keeps the function in the object file. LLVM_ATTRIBUTE_UNUSED indicates that it may or may not be used depending on build settings. llvm-svn: 219846	2014-10-15 20:41:17 +00:00
Duncan P. N. Exon Smith	8d5aeb2698	IR: Move NumOperands from User to Value, NFC Store `User::NumOperands` (and `MDNode::NumOperands`) in `Value`. On 64-bit host architectures, this reduces `sizeof(User)` and all subclasses by 8, and has no effect on `sizeof(Value)` (or, incidentally, on `sizeof(MDNode)`). On 32-bit host architectures, this increases `sizeof(Value)` by 4. However, it has no effect on `sizeof(User)` and `sizeof(MDNode)`, so the only concrete subclasses of `Value` that actually see the increase are `BasicBlock`, `Argument`, `InlineAsm`, and `MDString`. Moreover, I'll be shocked and confused if this causes a tangible memory regression. This has no functionality change (other than memory footprint). llvm-svn: 219845	2014-10-15 20:39:05 +00:00
Duncan P. N. Exon Smith	fcece4d216	IR: Cleanup comments for Value, User, and MDNode A follow-up commit will modify the memory-layout of `Value`, `User`, and `MDNode`. First fix the comments to be doxygen-friendly (and to follow the coding standards). - Use "\brief" instead of "repeatedName -". - Add a brief intro where it was missing. - Remove duplicated comments from source files (and a couple of noisy/trivial comments altogether). llvm-svn: 219844	2014-10-15 20:28:31 +00:00
Sid Manning	2ceaeb6baf	Wrong attribute. LLVM_ATTRIBUTE_USED not LLVM_ATTRIBUTE_UNUSED llvm-svn: 219837	2014-10-15 19:32:52 +00:00
Rafael Espindola	78206c3576	Allow forward references to section symbols. llvm-svn: 219835	2014-10-15 19:30:18 +00:00
Sanjoy Das	90c2f1455a	Teach ScalarEvolution to sharpen range information. If x is known to have the range [a, b) in a loop predicated by (icmp ne x, a), its range can be sharpened to [a + 1, b). Get ScalarEvolution and hence IndVars to exploit this fact. This change triggers an optimization to widen-loop-comp.ll, so it had to be edited to get it to pass. phabricator: http://reviews.llvm.org/D5639 llvm-svn: 219834	2014-10-15 19:25:28 +00:00
Sid Manning	74cd020fca	Add LLVM_ATTRIBUTE_UNUSED to function currently just used in an assert Fixes break when -Wunused-function is used. llvm-svn: 219833	2014-10-15 19:24:14 +00:00
Akira Hatanaka	5bb9346a45	InstCombine: Narrow switch instructions using known bits. Truncate the operands of a switch instruction to a narrower type if the upper bits are known to be all ones or zeros. rdar://problem/17720004 llvm-svn: 219832	2014-10-15 19:05:50 +00:00
Juergen Ributzka	f82c987a5c	Reapply "[FastISel][AArch64] Add custom lowering for GEPs." This is mostly a copy of the existing FastISel GEP code, but we have to duplicate it for AArch64, because otherwise we would bail out even for simple cases. This is because the standard fastEmit functions don't cover MUL at all and ADD is lowered very inefficiently. The original commit had a bug in the add emit logic, which has been fixed. llvm-svn: 219831	2014-10-15 18:58:07 +00:00
Juergen Ributzka	6780f0f7a0	[FastISel][AArch64] Factor out add with immediate emission into a helper function. NFC. Simplify add with immediate emission by factoring it out into a helper function. llvm-svn: 219830	2014-10-15 18:58:02 +00:00
Rafael Espindola	a74b5e6823	Correctly handle references to section symbols. When processing assembly like .long .text we were creating a new undefined symbol .text. GAS on the other hand would handle that as a reference to the .text section. This patch implements that by creating the section symbols earlier so that they are visible during asm parsing. The patch also updates llvm-readobj to print the symbol number in the relocation dump so that the test can differentiate between two sections with the same name. llvm-svn: 219829	2014-10-15 18:55:30 +00:00
Sid Manning	12cd21aacd	Enable the instruction printer in HexagonMCTargetDesc This adds the MCInstPrinter to the LLVMHexagonDesc library and removes the dependency LLVMHexagonAsmPrinter had on LLVMHexagonDesc. This is a prerequisite needed by the disassembler. Phabricator Revision: http://reviews.llvm.org/D5734 llvm-svn: 219826	2014-10-15 18:27:40 +00:00
Matt Arsenault	1a74aff846	R600/SI: Also try to use 0 base for misaligned 8-byte DS loads. llvm-svn: 219823	2014-10-15 18:06:43 +00:00
Matt Arsenault	7b68fdf3c0	R600: Fix miscompiles when BFE has multiple uses SimplifyDemandedBits would break the other uses of the operand. llvm-svn: 219819	2014-10-15 17:58:34 +00:00
Sanjay Patel	c00017d1f6	correct const-ness with auto and dyn_cast 1. Use const with autos. 2. Don't bother with explicit const in cast ops because they do it automagically. Thanks, David B. / Aaron B. / Reid K. llvm-svn: 219817	2014-10-15 17:45:13 +00:00
Hal Finkel	3b7fc86677	[SLPVectorize] Basic ephemeral-value awareness The SLP vectorizer should not vectorize ephemeral values. These are used to express information to the optimizer, and vectorizing them does not lead to faster code (because the ephemeral values are dropped prior to code generation, vectorized or not), and obscures the information the instructions are attempting to communicate (the logic that interprets the arguments to @llvm.assume generically does not understand vectorized conditions). Also, uses by ephemeral values are free (because they, and the necessary extractelement instructions, will be dropped prior to code generation). llvm-svn: 219816	2014-10-15 17:35:01 +00:00
Hal Finkel	8683d2b0d2	Treat the WorkSet used to find ephemeral values as double-ended We need to make sure that we visit all operands of an instruction before moving deeper in the operand graph. We had been pushing operands onto the back of the work set, and popping them off the back as well, meaning that we might visit an instruction before visiting all of its uses that sit in between it and the call to @llvm.assume. To provide an explicit example, given the following: %q0 = extractelement <4 x float> %rd, i32 0 %q1 = extractelement <4 x float> %rd, i32 1 %q2 = extractelement <4 x float> %rd, i32 2 %q3 = extractelement <4 x float> %rd, i32 3 %q4 = fadd float %q0, %q1 %q5 = fadd float %q2, %q3 %q6 = fadd float %q4, %q5 %qi = fcmp olt float %q6, %q5 call void @llvm.assume(i1 %qi) %q5 is used by both %qi and %q6. When we visit %qi, it will be marked as ephemeral, and we'll queue %q6 and %q5. %q6 will be marked as ephemeral and we'll queue %q4 and %q5. Under the old system, we'd then visit %q4, which would become ephemeral, %q1 and then %q0, which would become ephemeral as well, and now we have a problem. We'd visit %rd, but it would not be marked as ephemeral because we've not yet visited %q2 and %q3 (because we've not yet visited %q5). This will be covered by a test case in a follow-up commit that enables ephemeral-value awareness in the SLP vectorizer. llvm-svn: 219815	2014-10-15 17:34:48 +00:00
Derek Schuff	05fb735f3a	[MC] Make bundle alignment mode setting idempotent and support nested bundles Summary: Currently an error is thrown if bundle alignment mode is set more than once per module (either via the API or the .bundle_align_mode directive). This change allows setting it multiple times as long as the alignment doesn't change. Also nested bundle_lock groups are currently not allowed. This change allows them, with the effect that the group stays open until all nests are exited, and if any of the bundle_lock directives has the align_to_end flag, the group becomes align_to_end. These changes make the bundle alignment simpler to use in the compiler, and also better match the corresponding support in GNU as. Reviewers: jvoung, eliben Differential Revision: http://reviews.llvm.org/D5801 llvm-svn: 219811	2014-10-15 17:10:04 +00:00
Duncan P. N. Exon Smith	7f637a9b48	DI: Make comments "brief"-er, NFC Follow-up to r219801. Post-commit review pointed out that all comments require a `\brief` description [1], so I converted many and recrafted a few to be briefer or to include a brief intro. (If I'm going to clean them up, I should do it right!) [1]: http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments llvm-svn: 219808	2014-10-15 17:01:28 +00:00
Sanjay Patel	473e7fdb08	Use 'auto' for easier reading; no functional change intended. llvm-svn: 219804	2014-10-15 16:21:37 +00:00
Duncan P. N. Exon Smith	d79c4fd595	DI: Cleanup comments, NFC A number of comment cleanups: - Remove duplicated function and class names from comments. - Remove duplicated comments from source file (some of which were out-of-sync). - Move any unduplicated comments from source file to header. - Remove some noisy comments entirely (e.g., a comment for `DIDescriptor::print()` saying "print descriptor" just gets in the way of reading the code). llvm-svn: 219801	2014-10-15 16:15:15 +00:00
Rafael Espindola	7b61ddfa6e	Simplify handling of --noexecstack by using getNonexecutableStackSection. llvm-svn: 219799	2014-10-15 16:12:52 +00:00
Duncan P. N. Exon Smith	3bfffde27a	DI: Use a `DenseMap` instead of named metadata, NFC Remove a strange round-trip through named metadata to assign preserved local variables to their subprograms. llvm-svn: 219798	2014-10-15 16:11:41 +00:00
Rafael Espindola	ad33dd2914	Move getNonexecutableStackSection up to the base ELF class. The .note.GNU-stack section is not SystemZ/X86 specific. llvm-svn: 219796	2014-10-15 15:44:16 +00:00
Matt Arsenault	f179420c57	R600: Use existing variable llvm-svn: 219778	2014-10-15 05:07:00 +00:00
Matt Arsenault	7acfddf17c	R600: Remove outdated comment llvm-svn: 219777	2014-10-15 05:06:57 +00:00
Juergen Ributzka	42379d4cf7	Revert "[FastISel][AArch64] Add custom lowering for GEPs." This breaks our internal build bots. Reverting it to get the bots green again. llvm-svn: 219776	2014-10-15 04:55:48 +00:00
Jingyue Wu	2954280f6a	[MachineSink] Use the real post dominator tree Summary: Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo. The old heuristics caused instructions to sink unnecessarily, and might create register pressure. This is the second try of the fix. The first one (D4814) caused a performance regression due to failing to sink instructions out of loops (PR21115). This patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower one regardless of whether the target block post-dominates the source. Thanks Alexey Volkov for reporting PR21115! Test Plan: Added a NVPTX codegen test to verify that our change prevents the backend from over-sinking. It also shows the unnecessary register pressure caused by over-sinking. Added an X86 test to verify we can sink instructions out of loops regardless of the dominance relationship. This test is reduced from Alexey's test in PR21115. Updated an affected test in X86. Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime performance. Results are attached separately in the review thread. Reviewers: Jiangning, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5633 llvm-svn: 219773	2014-10-15 03:27:43 +00:00
Tim Northover	e9ff4c29b9	ARM: drop check for triple that's no longer used. Early attempts to support AAPCS bare metal MachO targets based the decision on the CPU being compiled for. This was not a particularly great idea and we've got a better option now, but this check remained. No functional change for any target we care about. llvm-svn: 219767	2014-10-15 01:05:01 +00:00
Eric Christopher	7396cc9978	Remove unused variable. llvm-svn: 219750	2014-10-15 00:09:07 +00:00
Eric Christopher	611f0488ff	No need to cache this unused variable. Patch by Ehsan Akhgari. llvm-svn: 219749	2014-10-14 23:58:51 +00:00
Gerolf Hoflehner	5d26d40fc5	[AArch64] Wrong CC access in CSINC-conditional branch sequence This is a follow up to commit r219742. It removes the CCInMI variable and accesses the CC in CSINC directly. In the case of a conditional branch accessing the CC with CCInMI was wrong. llvm-svn: 219748	2014-10-14 23:55:00 +00:00
Gerolf Hoflehner	a4c96d02a2	[AArch64] Optimize CSINC-branch sequence Peephole optimization that generates a single conditional branch for csinc-branch sequences like in the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. Also the condition code may not be modified between the csinc and the original branch. Examples: 1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44 to b.<invCC> 2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44 to b.<CC> rdar://problem/18506500 llvm-svn: 219742	2014-10-14 23:07:53 +00:00
Hal Finkel	1a600faba0	[LoopVectorize] Ignore @llvm.assume for cost estimates and legality A few minor changes to prevent @llvm.assume from interfering with loop vectorization. First, treat @llvm.assume like the lifetime intrinsics, which are scalarized (but don't otherwise interfere with the legality checking). Second, ignore the cost of ephemeral instructions in the loop (these will go away anyway during CodeGen). Alignment assumptions and other uses of @llvm.assume can often end up inside of loops that should be vectorized (this is not uncommon for assumptions generated by __attribute__((align_value(n))), for example). llvm-svn: 219741	2014-10-14 22:59:49 +00:00
Simon Pilgrim	a798e9ffdf	[X86][SSE] pslldq/psrldq shuffle mask decodes Patch to provide shuffle decodes and asm comments for the sse pslldq/psrldq SSE2/AVX2 byte shift instructions. Differential Revision: http://reviews.llvm.org/D5598 llvm-svn: 219738	2014-10-14 22:31:34 +00:00
Tim Northover	cf6ce0c8f7	ARM: remove ARM/Thumb distinction for preferred alignment. Thumb1 has legitimate reasons for preferring 32-bit alignment of types i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be a multiple of 4. However, this is a trade-off between code size and RAM usage; the DataLayout string is not the best place to represent it even if desired. So this patch removes the extra Thumb requirements, hopefully making ARM and Thumb completely compatible in this respect. llvm-svn: 219734	2014-10-14 22:12:17 +00:00
Tim Northover	9a4c043d67	ARM: allow misaligned local variables in Thumb1 mode. There's no hard requirement on LLVM to align local variable to 32-bits, so the Thumb1 frame handling needs to be able to deal with variables that are only naturally aligned without falling over. llvm-svn: 219733	2014-10-14 22:12:14 +00:00
Juergen Ributzka	4dfd590eaa	[FastISel][AArch64] Add custom lowering for GEPs. This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail out even for simple cases, because the standard fastEmit functions don't cover MUL and ADD is lowered inefficiently. llvm-svn: 219726	2014-10-14 21:41:23 +00:00
Hans Wennborg	f6aafeee60	[x86 asm] allow fwait alias in both At&t and Intel modes (PR21208) Differential Revision: http://reviews.llvm.org/D5741 llvm-svn: 219725	2014-10-14 21:41:17 +00:00
Tim Northover	aa09ac6e83	ARM: set preferred aggregate alignment to 32 universally. Before, ARM and Thumb mode code had different preferred alignments, which could lead to some rather unexpected results. There's justification for reducing it from the default 64-bits (wasted space), but I don't think there is for going below 32-bits. There's no actual ABI change here, just to reassure people. llvm-svn: 219719	2014-10-14 20:57:26 +00:00
Hal Finkel	db5f86a9bf	[CFL-AA] CFL-AA should not assert on a va_arg instruction The CFL-AA implementation was missing a visit* routine for va_arg instructions, causing it to assert when run on a function that had one. For now, handle these in a conservative way. Fixes PR20954. llvm-svn: 219718	2014-10-14 20:51:26 +00:00
Sanjay Patel	0ca42bb5a8	Optimize away fabs() calls when input is squared (known positive). Eliminate library calls and intrinsic calls to fabs when the input is a squared value. Note that no unsafe-math / fast-math assumptions are needed for this optimization. Differential Revision: http://reviews.llvm.org/D5777 llvm-svn: 219717	2014-10-14 20:43:11 +00:00
Juergen Ributzka	cd11a2806b	[FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved. Sign-/zero-extend folding depended on the load and the integer extend to be both selected by FastISel. This cannot always be guaranteed and SelectionDAG might interfere. This commit adds additional checks to load and integer extend lowering to catch this. Related to rdar://problem/18495928. llvm-svn: 219716	2014-10-14 20:36:02 +00:00
David Majnemer	dad2103801	InstCombine: Don't miscompile X % ((Pow2 << A) >>u B) We assumed that A must be greater than B because the right hand side of a remainder operator must be nonzero. However, it is possible for A to be less than B if Pow2 is a power of two greater than 1. Take for example: i32 %A = 0 i32 %B = 31 i32 Pow2 = 2147483648 ((Pow2 << 0) >>u 31) is non-zero but A is less than B. This fixes PR21274. llvm-svn: 219713	2014-10-14 20:28:40 +00:00
Jan Vesely	e5121f3c10	Reapply "R600: Add new intrinsic to read work dimensions" This effectively reverts revert 219707. After fixing the test to work with new function name format and renamed intrinsic. Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219710	2014-10-14 20:05:26 +00:00
Hal Finkel	171c2ec008	Revert "r216914 - Revert: [APFloat] Fixed a bug in method 'fusedMultiplyAdd'" Reapply r216913, a fix for PR20832 by Andrea Di Biagio. The commit was reverted because of buildbot failures, and credit goes to Ulrich Weigand for isolating the underlying issue (which can be confirmed by Valgrind, which does helpfully light up like the fourth of July). Uli explained the problem with the original patch as: It seems the problem is calling multiplySignificand with an addend of category fcZero; that is not expected by this routine. Note that for fcZero, the significand parts are simply uninitialized, but the code in (or rather, called from) multiplySignificand will unconditionally access them -- in effect using uninitialized contents. This version avoids using a category == fcZero addend within multiplySignificand, which avoids this problem (the Valgrind output is also now clean). Original commit message: [APFloat] Fixed a bug in method 'fusedMultiplyAdd'. When folding a fused multiply-add builtin call, make sure that we propagate the correct result in the case where the addend is zero, and the two other operands are finite non-zero. Example: define double @test() { %1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0) ret double %1 } Before this patch, the instruction simplifier wrongly folded the builtin call in function @test to constant 'double 7.0'. With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and propagates the expected result (i.e. 56.0). Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence of NaN/Inf operands. This fixes PR20832. llvm-svn: 219708	2014-10-14 19:23:07 +00:00
Rafael Espindola	db3f0a24ec	Revert "R600: Add new intrinsic to read work dimensions" This reverts commit r219705. CodeGen/R600/work-item-intrinsics.ll was failing on linux. llvm-svn: 219707	2014-10-14 18:58:04 +00:00
Rafael Espindola	76936ebc49	Remove unused member variable. Fixes pr20904. llvm-svn: 219706	2014-10-14 18:53:16 +00:00
Jan Vesely	86187d231a	R600: Add new intrinsic to read work dimensions v2: Add SI lowering Add test v3: Place work dimensions after the kernel arguments. v4: Calculate offset while lowering arguments v5: rebase v6: change prefix to AMDGPU Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219705	2014-10-14 18:52:07 +00:00
Jan Vesely	df19696374	R600: FMA is VecALU only instruction Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219704	2014-10-14 18:52:04 +00:00
Reed Kotler	d4ea29e6b6	Finish getting Mips fast-isel to match up with AArch64 fast-isel Summary: In order to facilitate use of common code, checking by reviewers of other fast-isel ports, and hopefully to eventually move most of Mips and other fast-isel ports into target independent code, I've tried to get the two implementations to line up. There is no functional code change. Just methods moved in the file to be in the same order as in AArch64. Test Plan: No functional change. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, aemerson, rfuhler Differential Revision: http://reviews.llvm.org/D5692 llvm-svn: 219703	2014-10-14 18:27:58 +00:00
David Blaikie	3dfe4788ae	DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself. Let me tell you a tale... Originally committed in r211723 after discovering a nasty case of weird scoping due to inlining, this was reverted in r211724 after it fired in ASan/compiler-rt. (minor diversion where I accidentally committed/reverted again in r211871/r211873) After further testing and fixing bugs in ArgumentPromotion (r211872) and Inlining (r212065) it was recommitted in r212085. Reverted in r212089 after the sanitizer buildbots still showed problems. Fixed another bug in ArgumentPromotion (r212128) found by this assertion. Recommitted in r212205, reverted in r212226 after it crashed some more on sanitizer buildbots. Fix clang some more in r212761. Recommitted in r212776, reverted in r212793. ASan failures. Recommitted in r213391, reverted in r213432, trying to reproduce flakey ASan build failure. Fixed bugs in r213805 (ArgPromo + DebugInfo), r213952 (LiveDebugVariables strips dbg_value intrinsics in functions not described by debug info). Recommitted in r214761, reverted in r214999, flakey failure on Windows buildbot. Fixed DeadArgElimination + DebugInfo bug in r219210. Recommitted in r219215, reverted in r219512, failure on ObjC++ atomic properties in the test-suite on Darwin. Fixed ObjC++ atomic properties issue in Clang in r219690. [This commit is provided 'as is' with no hope that this is the last time I commit this change either expressed or implied] llvm-svn: 219702	2014-10-14 18:22:52 +00:00
Rafael Espindola	c57477912a	Remove method that is identical to the base class one. llvm-svn: 219700	2014-10-14 17:38:38 +00:00
Matt Arsenault	e775f5fe76	R600/SI: Use DS offsets for constant addresses Use 0 as the base address for a constant address, so if we have a constant address we can save moves and form read2/write2s. llvm-svn: 219698	2014-10-14 17:21:19 +00:00
David Blaikie	24026502d5	Revert "Fix stuff... again." Accidental commit. This reverts commit r219693. llvm-svn: 219695	2014-10-14 17:13:09 +00:00
David Blaikie	e75f963c61	Revert some parts of r196288 that were confusing and untested. If we figure out why they should be here, let's add some testing of some kind so we can better demonstrate why it's needed. llvm-svn: 219694	2014-10-14 17:12:02 +00:00
David Blaikie	27549023b0	Fix stuff... again. llvm-svn: 219693	2014-10-14 17:11:59 +00:00
Hal Finkel	a3f23e3725	[LVI] Check for @llvm.assume dominating the edge branch When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value constraints, we should check for intrinsics that dominate the edge's branch, not just any potential context instructions. An assumption that dominates the edge's branch represents a truth on that edge. This is specifically useful, for example, if multiple predecessors assume a pointer to be nonnull, allowing us to simplify a later null comparison. The test case, and an initial patch, were provided by Philip Reames. Thanks! llvm-svn: 219688	2014-10-14 16:04:49 +00:00
NAKAMURA Takumi	256d37ad31	Revert r219638, (r219640 and r219676), "Removing the static destructor from ManagedStatic.cpp by controlling the allocation and de-allocation of the mutex." It caused hang-up on msc17 builder, probably deadlock. llvm-svn: 219687	2014-10-14 15:58:16 +00:00
Robert Khasanov	1a77f6664e	[AVX512] Extended avx512_binop_rm to DQ/VL subsets. Added encoding tests. llvm-svn: 219686	2014-10-14 15:13:56 +00:00
Robert Khasanov	545d1b7726	[AVX512] Extended avx512_binop_rm to BW/VL subsets. Added encoding tests. llvm-svn: 219685	2014-10-14 14:36:19 +00:00
Bradley Smith	698e08f4cf	[AArch64] Fix crash with empty/pseudo-only blocks in A53 erratum (835769) workaround llvm-svn: 219684	2014-10-14 14:02:41 +00:00
Eric Christopher	7c558cf4d6	Grab the subtarget info off of the MachineFunction rather than indirecting through the TargetMachine. llvm-svn: 219674	2014-10-14 08:44:19 +00:00
Eric Christopher	da84e33791	Use the triple to figure out if this is a darwin target, not the subtarget. llvm-svn: 219673	2014-10-14 08:25:26 +00:00
Eric Christopher	307c2cb26f	Remove unnecessary TargetMachine.h includes. llvm-svn: 219672	2014-10-14 07:22:08 +00:00
Eric Christopher	6062180203	Grab the subtarget and subtarget dependent variables off of MachineFunction rather than TargetMachine. llvm-svn: 219671	2014-10-14 07:22:00 +00:00
Eric Christopher	b66367a891	Grab the subtarget and subtarget dependent variables off of MachineFunction rather than TargetMachine. llvm-svn: 219670	2014-10-14 07:17:23 +00:00
Eric Christopher	92b4bcbbee	Instead of the TargetMachine cache the MachineFunction and TargetRegisterInfo in the peephole optimizer. This makes it easier to grab subtarget dependent variables off of the MachineFunction rather than the TargetMachine. llvm-svn: 219669	2014-10-14 07:17:20 +00:00
Eric Christopher	eb9e87f6e3	Access subtarget specific variables off of the MachineFunction's cached subtarget and not the TargetMachine. llvm-svn: 219668	2014-10-14 07:00:33 +00:00
Eric Christopher	99556d77ef	Access the subtarget off of the MachineFunction via the DAG scheduler or via the SelectionDAG if available. Otherwise grab the subtarget off of the MachineFunction by going up the parent chain. llvm-svn: 219666	2014-10-14 06:56:25 +00:00
Hao Liu	3cb826ca10	[AArch64]Select wide immediate offset into [Base+XReg] addressing mode e.g Currently we'll generate following instructions if the immediate is too wide: MOV X0, WideImmediate ADD X1, BaseReg, X0 LDR X2, [X1, 0] Using [Base+XReg] addressing mode can save one ADD as following: MOV X0, WideImmediate LDR X2, [BaseReg, X0] Differential Revision: http://reviews.llvm.org/D5477 llvm-svn: 219665	2014-10-14 06:50:36 +00:00
Eric Christopher	b65c7b919c	Remove the use and member variable of the TargetMachine from MachineLICM as we can get the same data off of the MachineFunction. llvm-svn: 219663	2014-10-14 06:26:57 +00:00
Eric Christopher	20c98938bb	Have MachineInstrBundle use the MachineFunction for subtarget access rather than the TargetMachine. llvm-svn: 219662	2014-10-14 06:26:55 +00:00
Eric Christopher	d3fa440d08	Access the subtarget off of the MachineFunction rather than through the TargetMachine. llvm-svn: 219661	2014-10-14 06:26:53 +00:00
Marcello Maggioni	5bbe3df63f	Switch to select optimization for two-case switches This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block and a test to check that this condition is handled. llvm-svn: 219656	2014-10-14 01:58:26 +00:00
Eric Christopher	4c67d5a1e3	Include map into the A15SDOptimizer rather than pick it up transitively from the DFAPacketizer via TargetInstrInfo.h. llvm-svn: 219652	2014-10-14 01:13:51 +00:00
Eric Christopher	2a321f74f0	Remove the TargetMachine from DFAPacketizer since it was only being used to grab subtarget specific things that we can grab from the MachineFunction anyhow. llvm-svn: 219650	2014-10-14 01:03:16 +00:00
Sanjay Patel	17045f7fac	fix formatting; NFC llvm-svn: 219645	2014-10-14 00:33:23 +00:00
Chandler Carruth	7b8297a61e	Add some optional passes around the vectorizer to both better prepare the IR going into it and to clean up the IR produced by the vectorizers. Note that these are off by default right now while folks collect data on whether the performance tradeoff is reasonable. In a build of the 'opt' binary, I see about 2% compile time regression due to this change on average. This is in my mind essentially the worst expected case: very little of the opt binary is going to benefit from these extra passes. I've seen several benchmarks improve in performance by small amounts due to running these passes, and there are certain (rare) cases where these passes make a huge difference by either enabling the vectorizer at all or by hoisting runtime checks out of the outer loop. My primary motivation is to prevent people from seeing runtime check overhead in benchmarks where the existing passes and optimizers would be able to eliminate that. I've chosen the sequence of passes based on the kinds of things that seem likely to be relevant for the code at each stage: rotating loops for the vectorizer, finding correlated values, loop invariants, and unswitching opportunities from any runtime checks, and cleaning up commonalities exposed by the SLP vectorizer. I'll be pinging existing threads where some of these issues have come up and will start new threads to get folks to benchmark and collect data on whether this is the right tradeoff or we should do something else. llvm-svn: 219644	2014-10-14 00:31:29 +00:00
Peter Collingbourne	ba689eeb38	Introduce LLVMWriteBitcodeToMemoryBuffer C API function. llvm-svn: 219643	2014-10-14 00:30:59 +00:00
David Majnemer	db0773089f	InstCombine: Fix miscompile in X % -Y -> X % Y transform We assumed that negation operations of the form (0 - %Z) resulted in a negative number. This isn't true if %Z was originally negative. Substituting the negative number into the remainder operation may result in undefined behavior because the dividend might be INT_MIN. This fixes PR21256. llvm-svn: 219639	2014-10-13 22:37:51 +00:00
Chris Bieneman	b75d8f300c	Removing the static destructor from ManagedStatic.cpp by controlling the allocation and de-allocation of the mutex. This patch adds a new llvm_call_once function which is used by the ManagedStatic implementation to safely initialize a global to avoid static construction and destruction. llvm-svn: 219638	2014-10-13 22:37:25 +00:00
Eric Christopher	1c5fce0ebb	Migrate another set of getSubtargetImpl away. llvm-svn: 219636	2014-10-13 21:57:44 +00:00
David Majnemer	a252138942	InstCombine: Don't miscompile (x lshr C1) udiv C2 We have a transform that changes: (x lshr C1) udiv C2 into: x udiv (C2 << C1) However, it is unsafe to do so if C2 << C1 discards any of C2's bits. This fixes PR21255. llvm-svn: 219634	2014-10-13 21:48:30 +00:00
Reed Kotler	a562b46db7	Make first of several changes to bring up to AArch64 fast-isel style Summary: Make Mips fast-isel track the form of AArch64 where practical. This makes it easier for people to review the code, to borrow similar code, and to see how to eventually move a lot of this target code for fast-isels into target independent code. These are just cosmetic changes. Should be no functional difference. Test Plan: make check test-suite for 4 flavors mips32 r1/r2 , -O0/-O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: aemerson, llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5595 llvm-svn: 219633	2014-10-13 21:46:41 +00:00
Adrian Prantl	049d21caea	Add an assertion about the integrity of the iterator. Broken parent scope pointers in inlined DIVariables can cause ensureAbstractVariableIsCreated to insert new abstract scopes, thus invalidating the iterator in this loop and leading to hard-to-debug crashes. Useful when manually reducing IR for testcases. llvm-svn: 219628	2014-10-13 20:44:58 +00:00
Adrian Prantl	13c58820f8	constify the getters in SDNodeDbgValue. llvm-svn: 219627	2014-10-13 20:43:47 +00:00
Chad Rosier	df82a33d42	Refactor debug statement and remove dead argument. NFC. llvm-svn: 219626	2014-10-13 19:46:39 +00:00
Filipe Cabecinhas	9d7bd78ffa	Fix a broadcast related regression on the vector shuffle lowering. Summary: Test by Robert Lougher! Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5745 llvm-svn: 219617	2014-10-13 16:16:16 +00:00
Matt Arsenault	3f3a2751e0	R600/SI: Minor cleanup of function llvm-svn: 219616	2014-10-13 15:47:59 +00:00
Yuri Gorshenin	ab1b88ab59	[asan-asm-instrumentation] Follow-up fixes to r219602: asserts are moved into function. llvm-svn: 219610	2014-10-13 11:44:06 +00:00
Renato Golin	16ea8ba3bc	Adds support for the Cortex-A17 to the ARM backend Patch by Matthew Wahab. llvm-svn: 219606	2014-10-13 10:22:19 +00:00
Bradley Smith	f2a801d8ac	[AArch64] Add workaround for Cortex-A53 erratum (835769) Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation. The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions and the simplest way to do this is to insert a NOP. This patch implements such work-around in the backend, enabled via the option -aarch64-fix-cortex-a53-835769. The work-around code generation is not enabled by default. llvm-svn: 219603	2014-10-13 10:12:35 +00:00
Yuri Gorshenin	46853b55fa	[asan-asm-instrumentation] Fixed memory references which include %rsp as a base or an index register. Summary: [asan-asm-instrumentation] Fixed memory references which include %rsp as a base or an index register. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5599 llvm-svn: 219602	2014-10-13 09:37:47 +00:00
NAKAMURA Takumi	59fe0d4e56	Unix/Signals.inc: Build findModulesAndOffsets() conditionally on (defined(HAVE_BACKTRACE) && defined(ENABLE_BACKTRACES)). [-Wunused-function] llvm-svn: 219596	2014-10-13 04:32:43 +00:00
NAKAMURA Takumi	75a0240056	Revert r219584, "[X86] Memory folding for commutative instructions." It broke i686 selfhosting. llvm-svn: 219595	2014-10-13 04:17:34 +00:00
Richard Smith	dc69ce32ef	[modules] Stop excluding Support/Debug.h from the Support module. This header has been modular since r206822, and excluding it was leading to workarounds such as the one in r219592, which this change removes. llvm-svn: 219593	2014-10-13 00:41:03 +00:00
Benjamin Kramer	24165219b1	[Modules] Add some missing includes to make files compile stand-alone. llvm-svn: 219592	2014-10-12 22:49:26 +00:00
Benjamin Kramer	7000ca3f55	Modernize old-style static asserts. NFC. llvm-svn: 219588	2014-10-12 17:56:40 +00:00
Joerg Sonnenberger	5ca10d0edb	Revert r219223, it creates invalid PHI nodes. llvm-svn: 219587	2014-10-12 17:16:04 +00:00
Benjamin Kramer	240b85eec5	InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 <u C-1) llvm-svn: 219585	2014-10-12 14:02:34 +00:00
Simon Pilgrim	77ac26d279	[X86] Memory folding for commutative instructions. This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too. Differential Revision: http://reviews.llvm.org/D5701 llvm-svn: 219584	2014-10-12 10:52:55 +00:00
David Majnemer	27adb1240f	InstCombine: Simplify commonIDivTransforms A helper routine, MultiplyOverflows, was a less efficient reimplementation of APInt's smul_ov and umul_ov. While we are here, clean up the code so it's more uniform. No functionality change intended. llvm-svn: 219583	2014-10-12 08:34:24 +00:00
Simon Pilgrim	3c1e1e9498	Test commit access (email fix) Indentation tidyup. llvm-svn: 219577	2014-10-11 20:28:56 +00:00
Benjamin Kramer	603c2c79ed	AssumptionTracker: Don't create temporary CallbackVHs. Those are expensive to create in cold cache scenarios. NFC. llvm-svn: 219575	2014-10-11 19:13:01 +00:00
Benjamin Kramer	dd13643b97	MC: Shrink MCSymbolRefExpr by only storing the bits we need. 32 -> 16 bytes on x86_64. NFC. llvm-svn: 219574	2014-10-11 17:57:27 +00:00
Benjamin Kramer	3e67db92bc	MC: Bit pack MCSymbolData. On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member variables private and clean up uses to go through the existing accessors. NFC. llvm-svn: 219573	2014-10-11 15:07:21 +00:00
Simon Pilgrim	d89591e0a1	Test commit access Fix comment typo + spelling. llvm-svn: 219572	2014-10-11 14:23:36 +00:00
David Majnemer	fe7fccff11	InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X Consider the case where X is 2. (2 <<s 31)/s-2147483648 is zero but we would fold to X. Note that this is valid when we are in the unsigned domain because we require NUW: 2 <<u 31 results in poison. This fixes PR21245. llvm-svn: 219568	2014-10-11 10:20:04 +00:00
David Majnemer	cb9d596655	InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow consider: C1 = INT_MIN C2 = -1 C1 * C2 overflows without a doubt but consider the following: %x = i32 INT_MIN This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1. N. B. Move the unsigned version of this transform to InstSimplify, it doesn't create any new instructions. This fixes PR21243. llvm-svn: 219567	2014-10-11 10:20:01 +00:00
David Majnemer	3cac85e071	InstCombine: mul to shl shouldn't preserve nsw consider: mul i32 nsw %x, -2147483648 this instruction will not result in poison if %x is 1 however, if we transform this into: shl i32 nsw %x, 31 then we will be generating poison because we just shifted into the sign bit. This fixes PR21242. llvm-svn: 219566	2014-10-11 10:19:52 +00:00
Chandler Carruth	bff0ae772c	[SCEV] Fix one more caller blindly passing the latch to SCEV's getSmallConstantTripCount even when it isn't the exiting block. I missed this in my first audit, very sorry. This was found in LNT and elsewhere. I don't have a test case, but it was completely obvious from inspection that this was the problem. I'll see if I can reduce a test case, but I'm not really hopeful, and the value seems quite low. llvm-svn: 219562	2014-10-11 05:28:30 +00:00
Chandler Carruth	bc97a4f46c	Guard the definition of the stack tracing function with the same macros that guard its usage. Without this, we can get unused function warnings when backtraces are disabled. llvm-svn: 219558	2014-10-11 01:04:40 +00:00
Reed Kotler	62de6b96b5	Add basic conditional branches in mips fast-isel Summary: Implement the most basic form of conditional branches in Mips fast-isel. Test Plan: br1.ll run 4 flavors of test-suite. mips32 r1/r2 and at -O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5583 llvm-svn: 219556	2014-10-11 00:55:18 +00:00
Chandler Carruth	6666c27e99	[SCEV] Add some asserts to the recently improved trip count computation routines and fix all of the bugs they expose. I hit a test case that crashed even without these asserts due to passing a non-exiting latch to the ExitingBlock parameter of the trip count computation machinery. However, when I add the nice asserts, it turns out we have plenty of coverage of these bugs, they just didn't manifest in crashers. The core problem seems to stem from an assumption that the latch is the exiting block. While this is often true, and somewhat the "normal" way to think about loops, it isn't necessarily true. The correct way to call the trip count routines in a generic fashion (that is, without a particular exit in mind) is to just use the loop's single exiting block if it has one. The trip count can't be computed generically unless it does. This works great for the loop vectorizer. The loop unroller actually wants to select the latch when it has to choose between multiple exits because for unrolling it is the latch trips that matter. But if this is the desire, it needs to explicitly guard for non-exiting latches and check for the generic trip count in that case. I've added the asserts, and added convenience APIs for querying the trip count generically that check for a single exit block. I've kept the APIs consistent between computing trip count and trip multiples. Thanks to Mark for the help debugging and tracking down the right fix here! llvm-svn: 219550	2014-10-11 00:12:11 +00:00
Lang Hames	3d4340f8c8	[MCJIT] Replace memcpy with readBytesUnaligned in RuntimeDyldMachOI386. This should fix the failures of the MachO_i386_DynNoPIC_relocations.s test case on MIPS hosts. llvm-svn: 219543	2014-10-10 23:07:09 +00:00
Sanjay Patel	ad8b666624	Return undef on FP <-> Int conversions that overflow (PR21330). The LLVM Lang Ref states for signed/unsigned int to float conversions: "If the value cannot fit in the floating point value, the results are undefined." And for FP to signed/unsigned int: "If the value cannot fit in ty2, the results are undefined." This matches the C definitions. The existing behavior pins to infinity or a max int value, but that may just lead to more confusion as seen in: http://llvm.org/bugs/show_bug.cgi?id=21130 Returning undef will hopefully lead to a less silent failure. Differential Revision: http://reviews.llvm.org/D5603 llvm-svn: 219542	2014-10-10 23:00:21 +00:00
Alexey Samsonov	96983b89b0	Follow-up to r219534 to make symbolization more robust. 1) Explicitly provide important arguments to llvm-symbolizer, not relying on defaults. 2) Be more defensive about symbolizer output. This might fix weird failures on ninja-x64-msvc-RA-centos6 buildbot. llvm-svn: 219541	2014-10-10 22:58:26 +00:00
Matt Arsenault	61cc9083d0	R600/SI: Change how DS offsets are printed Match SC by using offset/offset0/offset1 and printing in decimal. llvm-svn: 219537	2014-10-10 22:16:07 +00:00
Matt Arsenault	fe0a2e677b	R600/SI: Match read2/write2 stride 64 versions llvm-svn: 219536	2014-10-10 22:12:32 +00:00
Alexey Samsonov	8a584bb3d7	Re-land r219354: Use llvm-symbolizer to symbolize LLVM/Clang crash dumps. In fact, symbolization is now expected to work only on Linux and FreeBSD/NetBSD, where we have dl_iterate_phdr and can learn the main executable name without argv0 (it will be possible on BSD systems after http://reviews.llvm.org/D5693 lands). #ifdef-out the code for all other Unix systems. Reviewed in http://reviews.llvm.org/D5610 llvm-svn: 219534	2014-10-10 22:06:59 +00:00
Matt Arsenault	410332860d	R600/SI: Add load / store machine optimizer pass. Currently this only functions to match simple cases where ds_read2_* / ds_write2_* instructions can be used. In the future it might match some of the other weird load patterns, such as direct to LDS loads. Currently enabled only with a subtarget feature to enable easier testing. llvm-svn: 219533	2014-10-10 22:01:59 +00:00
Sanjoy Das	1f05c51e5e	This patch teaches ScalarEvolution to pick and use !range metadata. It also makes it more aggressive in querying range information by adding a call to isKnownPredicateWithRanges to isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond. phabricator: http://reviews.llvm.org/D5638 Reviewed by: atrick, hfinkel llvm-svn: 219532	2014-10-10 21:22:34 +00:00
Chandler Carruth	38811ccb97	[mips] Actually mark that the default case is unreachable as this switch is over a subset of condition codes. This fixes the -Werror build which warns about use of uninitialized variables in the default case. llvm-svn: 219531	2014-10-10 21:07:03 +00:00
Reed Kotler	1f64ecab79	Implement floating point compare for mips fast-isel Summary: Expand SelectCmp to handle floating point compare Test Plan: fpcmpa.ll run 4 flavors of test-suite, mips32 r1/r2 O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D5567 llvm-svn: 219530	2014-10-10 20:46:28 +00:00
David Blaikie	325c5757aa	Revert "DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself." This invariant is violated (& the assertions fire) on some Objective C++ in the test-suite. Reverting while I investigate. This reverts commit r219215. llvm-svn: 219523	2014-10-10 18:46:21 +00:00
Matt Arsenault	a39da09eb6	R600/SI: Disable copying of SCC llvm-svn: 219519	2014-10-10 17:44:47 +00:00
Reed Kotler	497311ab99	implement integer compare in mips fast-isel Summary: implement SelectCmp (integer compare ) in mips fast-isel Test Plan: icmpa.ll also ran 4 test-suite flavors mips32 r1/r2 O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler, mcrosier Differential Revision: http://reviews.llvm.org/D5566 llvm-svn: 219518	2014-10-10 17:39:51 +00:00
Mark Heffernan	2beab5f0b4	This patch de-pessimizes the calculation of loop trip counts in ScalarEvolution in the presence of multiple exits. Previously all loops exits had to have identical counts for a loop trip count to be considered computable. This pessimization was implemented by calling getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock) inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME in the comments of that function). The pessimization was added to fix a corner case involving undefined behavior (pr/16130). This patch more precisely handles the undefined behavior case allowing the pessimization to be removed. ControlsExit replaces IsSubExpr to more precisely track the case where undefined behavior is expected to occur. Because undefined behavior is tracked more precisely we can remove MustExit from ExitLimit. MustExit was used to track the case where the limit was computed potentially assuming undefined behavior even if undefined behavior didn't necessarily occur. llvm-svn: 219517	2014-10-10 17:39:11 +00:00
Fariborz Jahanian	2132fbe21e	Add a couple of missing 'override' keywords. NFC. llvm-svn: 219516	2014-10-10 17:34:30 +00:00
Bill Schmidt	dcce023549	[PowerPC] Reduce names from Power8Vector to P8Vector Per Hal Finkel's review, improving typability of some variable names. llvm-svn: 219514	2014-10-10 17:21:15 +00:00
Hal Finkel	7a87f8a670	[MiSched] Fix a logic error in tryPressure() Fixes a logic error in the MachineScheduler found by Steve Montgomery (and confirmed by Andy). This has gone unfixed for months because the fix has been found to introduce some small performance regressions. However, Andy has recommended that, at this point, we fix this to avoid further dependence on the incorrect behavior (and then follow-up separately on any regressions), and I agree. Fixes PR18883. llvm-svn: 219512	2014-10-10 17:06:20 +00:00
Reed Kotler	12f9488e33	Implement floating point to integer conversion in mips fast-isel Summary: Add the ability to convert 64 or 32 bit floating point values to integer in mips fast-isel Test Plan: fpintconv.ll ran 4 flavors of test-suite with no errors, mips32 r1/r2 O0/O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, rfuhler, mcrosier Differential Revision: http://reviews.llvm.org/D5562 llvm-svn: 219511	2014-10-10 17:00:46 +00:00
David Blaikie	7d6f29d1ee	Simplify a few uses of DwarfDebug::SPMap llvm-svn: 219510	2014-10-10 16:59:52 +00:00
Timur Iskhodzhanov	2cf8a1ded8	Reorder functions in WinCodeViewLineTables.cpp [NFC] This helps read the comments and understand the code in a natural order llvm-svn: 219508	2014-10-10 16:05:32 +00:00
Frederic Riss	b3c9912a45	[dwarfdump] Prettyprint DW_AT_APPLE_property_attribute bitfield values. This change depends on the ApplePropertyString helper that I sent separately. Not sure how you want this tested: as a tool test by adding a binary to dump, or as an llvm test starting from an IR file? Reviewers: dblaikie, samsonov Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5689 llvm-svn: 219507	2014-10-10 15:51:10 +00:00
Frederic Riss	d4de180e19	[dwarfdump] Resolve also variable specifications/abstract_origins. DW_AT_specification and DW_AT_abstract_origin resolving was only performed on subroutine DIEs because it used the getSubroutineName method. Introduce a more generic getName() and use it to dump the reference attributes. Testcases have been updated to check the printed names instead of the offsets except when the name could be ambiguous. Reviewers: dblaikie, samsonov Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5625 llvm-svn: 219506	2014-10-10 15:51:02 +00:00
Benjamin Kramer	2c99e413ba	Reduce double set lookups. NFC. llvm-svn: 219505	2014-10-10 15:32:50 +00:00
Bill Schmidt	cfc4a54a48	[PowerPC] Add feature for Power8 vector extensions The current VSX feature for PowerPC specifies availability of the VSX instructions added with the 2.06 architecture version. With 2.07, the architecture adds new instructions to both the Category:Vector and Category:VSX instruction sets. Additionally, unaligned vector storage operations have improved performance. This patch adds a feature to provide access to the new instructions and performance capabilities of Power8. For compatibility with GCC, the feature is controlled via a new -mpower8-vector switch, and the feature causes the __POWER8_VECTOR__ builtin define to be generated by the preprocessor. There is a companion patch for cfe being committed at the same time. llvm-svn: 219501	2014-10-10 15:09:28 +00:00
Zoran Jovanovic	98bd58ca33	[mips][microMIPS] Implement ADDIUSP instruction Differential Revision: http://reviews.llvm.org/D5084 llvm-svn: 219500	2014-10-10 14:37:30 +00:00
Zoran Jovanovic	95e14e711d	[mips][microMIPS] Implement JR16 instruction Differential Revision: http://reviews.llvm.org/D5062 llvm-svn: 219498	2014-10-10 14:02:44 +00:00
Zoran Jovanovic	b26f889afa	[mips][microMIPS] Implement ADDIUS5 instruction Differential Revision: http://reviews.llvm.org/D5049 llvm-svn: 219495	2014-10-10 13:45:34 +00:00
Zoran Jovanovic	b39a174f11	[mips][microMIPS] Implement JRC instruction Differential Revision: http://reviews.llvm.org/D5045 llvm-svn: 219494	2014-10-10 13:31:18 +00:00
Zoran Jovanovic	6097bad3f8	[mips][microMIPS] Implement JALRS16 instruction Differential Revision: http://reviews.llvm.org/D5027 llvm-svn: 219493	2014-10-10 13:22:28 +00:00
Timur Iskhodzhanov	7edfc5948b	Fix a small typo, NFC llvm-svn: 219492	2014-10-10 12:52:58 +00:00
Benjamin Kramer	f9a2975417	APInt: Unfold return expressions so RVO can work. Saves a couple of expensive deep copies. NFC. llvm-svn: 219487	2014-10-10 10:18:12 +00:00
Chandler Carruth	82cc9641f7	Don't use an unqualified 'abs' function call with a builtin type. This is dangerous for numerous reasons. The primary risk here is with floating point or double types where if the wrong header files are included in a strange order this can implicitly convert to integers and then call the C abs function on the integers. There is a secondary risk that even impacts integers where if the namespace the code is written in ever defines an abs overload for types within that namespace the global abs will be hidden. The correct form is to call std::abs or write 'using std::abs' for builtin types (and only the latter is correct in any generic context). I've also added the requisite header to be a bit more explicit here. llvm-svn: 219484	2014-10-10 08:27:19 +00:00
David Blaikie	4191cbce8c	Sink the per-CU part of DwarfDebug::finishSubprogramDefinitions into DwarfCompileUnit. llvm-svn: 219477	2014-10-10 06:39:29 +00:00
David Blaikie	58410f241e	Sink most of DwarfDebug::constructAbstractSubprogramScopeDIE down into DwarfCompileUnit. llvm-svn: 219476	2014-10-10 06:39:26 +00:00
Chandler Carruth	d9edd1e2ab	[ADT] Add the scalbn function for APFloat. llvm-svn: 219473	2014-10-10 04:54:30 +00:00
Hal Finkel	49dadc0bc3	[LVI] Revert the remainder of "r218231 - Add two thresholds lvi-overdefined-BB-threshold and lvi-overdefined-threshold" Some of r218231 was reverted with the code that used it in r218971, but not all of it. This removes the rest (which is now dead). llvm-svn: 219469	2014-10-10 03:56:24 +00:00
David Blaikie	9ab48849ad	Avoid unnecessary map lookup/insertion. llvm-svn: 219466	2014-10-10 03:09:38 +00:00
Arnold Schwaighofer	d7d010eb2a	SimplifyCFG: Don't convert phis into selects if we could remove undef behavior instead We used to transform this: define void @test6(i1 %cond, i8* %ptr) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br label %bb2 bb2: %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ] store i8 2, i8* %ptr.2, align 8 ret void } into this: define void @test6(i1 %cond, i8* %ptr) { %ptr.2 = select i1 %cond, i8* null, i8* %ptr store i8 2, i8* %ptr.2, align 8 ret void } because the simplifycfg transformation into selects would happen to happen before the simplifycfg transformation that removes unreachable control flow (We have 'unreachable control flow' due to the store to null which is undefined behavior). The existing transformation that removes unreachable control flow in simplifycfg is: /// If BB has an incoming value that will always trigger undefined behavior /// (eg. null pointer dereference), remove the branch leading here. static bool removeUndefIntroducingPredecessor(BasicBlock *BB) Now we generate: define void @test6(i1 %cond, i8* %ptr) { store i8 2, i8* %ptr.2, align 8 ret void } I did not see any impact on the test-suite + externals. rdar://18596215 llvm-svn: 219462	2014-10-10 01:27:02 +00:00
Sanjay Patel	3d497cd778	Improve sqrt estimate algorithm (fast-math) This patch changes the fast-math implementation for calculating sqrt(x) from: y = 1 / (1 / sqrt(x)) to: y = x * (1 / sqrt(x)) This has 2 benefits: less code / faster code and one less estimate instruction that may lose precision. The only target that will be affected (until http://reviews.llvm.org/D5658 is approved) is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf or vector sqrtf and 4 less flops for a double-precision sqrt. We also eliminate a constant load and extra register usage. Differential Revision: http://reviews.llvm.org/D5682 llvm-svn: 219445	2014-10-09 21:26:35 +00:00
Sanjay Patel	6d28da10e5	delete function names from comments llvm-svn: 219444	2014-10-09 21:24:46 +00:00
Sanjay Patel	352fb46d4f	delete function name from comment llvm-svn: 219443	2014-10-09 21:23:39 +00:00
Frederic Riss	b5e53eefb7	Add ApplePropertyString dump helper to Dwarf.{h\|cpp}. Reviewers: dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5688 llvm-svn: 219442	2014-10-09 20:43:04 +00:00
Samuel Antao	1194b8fd40	Fix bug in GPR to FPR moves in PPC64LE. The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem. llvm-svn: 219441	2014-10-09 20:42:56 +00:00
David Blaikie	73cc705a37	Remove unused parameter llvm-svn: 219440	2014-10-09 20:36:27 +00:00
David Blaikie	78b65b6f2c	Sink DwarfDebug::createAndAddScopeChildren down into DwarfCompileUnit. llvm-svn: 219437	2014-10-09 20:26:15 +00:00
David Blaikie	1d072348cf	Sink DwarfDebug::constructSubprogramScopeDIE down into DwarfCompileUnit llvm-svn: 219436	2014-10-09 20:21:36 +00:00
Chad Rosier	bd64d46188	[Reassociate] Don't canonicalize X - undef to X + (-undef). Phabricator Revision: http://reviews.llvm.org/D5674 PR21205 llvm-svn: 219434	2014-10-09 20:06:29 +00:00
Benjamin Kramer	2c3778dc51	Remove a compiler bug workaround from 2007. The affected versions of gcc are long gone. NFC. llvm-svn: 219433	2014-10-09 19:50:39 +00:00
Hal Finkel	cbbd3df836	Revert "[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information."" This reverts commit r219135 -- still causing miscompiles in SPEC it seems... llvm-svn: 219432	2014-10-09 19:48:12 +00:00

73802 Commits