llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	ca3124f74b	[InstCombine] clean up visitAshr(); NFCI llvm-svn: 292036	2017-01-14 23:13:50 +00:00
Davide Italiano	6d28500ff9	[NewGVN] Fix a warning from GCC. Patch by Gonsolo. Differential Revision: https://reviews.llvm.org/D28731 llvm-svn: 292031	2017-01-14 20:44:08 +00:00
Davide Italiano	ed67f1978e	[NewGVN] clang-format this file after recent changes. llvm-svn: 292026	2017-01-14 20:15:04 +00:00
Davide Italiano	7cf29dcca5	[NewGVN] Try to be consistent wit the style used in this file. NFCI. llvm-svn: 292025	2017-01-14 20:13:18 +00:00
Davide Italiano	76de68eaf9	[TargetLowering] Simplfiy a bit. NFCI. llvm-svn: 292024	2017-01-14 20:09:29 +00:00
Simon Pilgrim	d419b73a42	[CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed llvm-svn: 292023	2017-01-14 19:24:23 +00:00
Simon Pilgrim	8e5ecf8ad1	[X86][XOP] Added support for VPMADCSWD 'extend+hadd' IFMA patterns VPMADCSWD act as VPADDD( VPMADDWD( x, y ), z ) - multiply+extend+hadd and add to v4i32 accumulator llvm-svn: 292021	2017-01-14 18:52:13 +00:00
Simon Pilgrim	b290805e94	[X86][XOP] Added support for VPMACSDQH/VPMACSDQL 'extension' IFMA patterns VPMACSDQH/VPMACSDQL act as VPADDQ( VPMULDQ( x, y ), z ) - multiply+extending either the odd/even 4i32 input elements and adding to v2i64 accumulator llvm-svn: 292020	2017-01-14 18:08:54 +00:00
Simon Pilgrim	a1631749f8	[X86][XOP] Added support for VPMACSWW/VPMACSDD 'lossy' IFMA patterns VPMACSWW/VPMACSDD act as add( mul( x, y ), z ) - ignoring any upper bits from both the multiply and add stages llvm-svn: 292019	2017-01-14 17:13:52 +00:00
Craig Topper	63e2cd6caa	[AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when its beneficial. Isel now selects masked move instructions for vselect instead of blendm. But sometimes it beneficial to register allocation to remove the tied register constraint by using blendm instructions. This also picks up cases where the masked move was created due to a masked load intrinsic. Differential Revision: https://reviews.llvm.org/D28454 llvm-svn: 292005	2017-01-14 07:50:52 +00:00
Craig Topper	09b7e0f01d	[AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible. We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX available. Or if its not, but register allocation has selected a non-extended register we will use VEX VXORPS. And if its an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD. This makes it possible for the register allocator to have all 32 registers available to work with. llvm-svn: 292004	2017-01-14 07:29:24 +00:00
Marcello Maggioni	0616b5ff5c	Removing potentially error-prone fallthrough. NFC This fallthrough if other cases are added between fabs and default could cause fabs to fall to the next case resulting in a bug. Better getting rid of it immediately just to be sure. llvm-svn: 292003	2017-01-14 07:28:47 +00:00
Craig Topper	9cc685a56e	[X86] Simplify the code that calculates a scaled blend mask. We don't need a second loop. llvm-svn: 291996	2017-01-14 04:29:15 +00:00
Craig Topper	9850210d03	[AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. Also add 32-bit mode command lines to the test case that exercises this just to make sure we sanely handle the 64-bit immediate there. This fixes a undefined sanitizer failure from r291888. llvm-svn: 291994	2017-01-14 04:19:35 +00:00
Eugene Zelenko	5fa43960f3	[Transforms/Utils] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 291983	2017-01-14 00:32:38 +00:00
Easwaran Raman	a7bdb8a513	Compute summary before calling extractProfTotalWeight extractProfTotalWeight checks if the profile type is sample profile, but before that we have to ensure that summary is available. Also expanded the unittest to test the case where there is no summar Differential Revision: https://reviews.llvm.org/D28708 llvm-svn: 291982	2017-01-14 00:32:37 +00:00
Daniel Berlin	b66164ca34	NewGVN: Kill unneeded DFSDomMap, cleanup a few comments. llvm-svn: 291981	2017-01-14 00:24:23 +00:00
Justin Bogner	1a314dac4b	GlobalISel: Abort in ResetMachineFunctionPass if fallback isn't enabled When GlobalISel is configured to abort rather than fallback the only thing that resetting the machine function does is make things harder to debug. If we ever get to this point in the abort configuration it indicates that we've already hit a bug, so this changes the behaviour to abort instead. llvm-svn: 291977	2017-01-13 23:46:11 +00:00
Sanjay Patel	40f401776b	[InstCombine] optimize unsigned icmp of increment Allows LLVM to optimize sequences like the following: %add = add nuw i32 %x, 1 %cmp = icmp ugt i32 %add, %y Into: %cmp = icmp uge i32 %x, %y Previously, only signed comparisons were being handled. Decrements could also be handled, but 'sub nuw %x, 1' is currently canonicalized to 'add %x, -1' in InstCombineAddSub, losing the nuw flag. Removing that canonicalization seems like it might have far-reaching ramifications so I kept this simple for now. Patch by Matti Niemenmaa! Differential Revision: https://reviews.llvm.org/D24700 llvm-svn: 291975	2017-01-13 23:25:46 +00:00
Tim Northover	2b57987827	[GlobalISel] track predecessor mapping during switch lowering. Correctly populating Machine PHIs relies on knowing exactly how the IR level CFG was lowered to MachineIR. This needs to be tracked by any translation phases that meddle (currently only SwitchInst handling). llvm-svn: 291973	2017-01-13 23:11:37 +00:00
Sanjay Patel	2d4b456427	[InstCombine] use m_APInt to allow lshr folds for vectors with splat constants llvm-svn: 291972	2017-01-13 23:04:10 +00:00
Daniel Berlin	c0431fd02d	NewGVN: Move leaders around properly to ensure we have a canonical dominating leader. Fixes PR 31613. Summary: This is a testcase where phi node cycling happens, and because we do not order the leaders by domination or anything similar, the leader keeps changing. Using std::set for the members is too expensive, and we actually don't need them sorted all the time, only at leader changes. We could keep both a set and a vector, and keep them mostly sorted and resort as necessary, or use a set and a fibheap, but all of this seems premature. After running some statistics, we are able to avoid the vast majority of sorting by keeping a "next leader" field. Most congruence classes only have leader changes once or twice during GVN. Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28594 llvm-svn: 291968	2017-01-13 22:40:01 +00:00
Greg Clayton	c109bbea57	Add a variant of DWARFDie::find() and DWARFDie::findRecursively() that takes a llvm::ArrayRef<dwarf::Attribute>. This allows us efficiently look for more than one attribute, something that is quite common in DWARF consumption. Differential Revision: https://reviews.llvm.org/D28704 llvm-svn: 291967	2017-01-13 22:32:12 +00:00
David Majnemer	bba17390c7	[LoopStrengthReduce] Don't bother rewriting PHIs in catchswitch blocks The catchswitch instruction cannot be split, don't bother trying to rewrite it. This fixes PR31627. llvm-svn: 291966	2017-01-13 22:24:27 +00:00
David Majnemer	e0ebdf4bc7	[CodeGen] Simplify getRecipEstimateForFunc It used two attribute lookups when only one was needed. llvm-svn: 291965	2017-01-13 22:24:25 +00:00
Greg Clayton	97d22187d0	Cleanup how DWARFDie attributes are accessed and decoded. Removed all DWARFDie::getAttributeValueAs() calls. Renamed: Optional<DWARFFormValue> DWARFDie::getAttributeValue(dwarf::Attribute); To: Optional<DWARFFormValue> DWARFDie::find(dwarf::Attribute); Added: Optional<DWARFFormValue> DWARFDie::findRecursively(dwarf::Attribute); All decoding of Optional<DWARFFormValue> values are now done using the dwarf::to() functions from DWARFFormValue.h: Old code: auto DeclLine = DWARFDie.getAttributeValueAsSignedConstant(DW_AT_decl_line).getValueOr(0); New code: auto DeclLine = toUnsigned(DWARFDie.find(DW_AT_decl_line), 0); This composition helps us since we can now easily do: auto DeclLine = toUnsigned(DWARFDie.findRecursively(DW_AT_decl_line), 0); This allows us to easily find attribute values in the current DIE only (the first new code above) or in any DW_AT_abstract_origin or DW_AT_specification Dies using the line above. Note that the code line length is shorter and more concise. Differential Revision: https://reviews.llvm.org/D28581 llvm-svn: 291959	2017-01-13 21:08:18 +00:00
David L. Jones	41cecba8e9	"Use" lambda captures which are otherwise only used in asserts. NFC Summary: The LLVM coding standards recommend "using" values that are only needed by asserts: http://llvm.org/docs/CodingStandards.html#assert-liberally Without this change, LLVM cannot bootstrap with -Werror as the second stage fails with this new warning: https://reviews.llvm.org/rL291905 See also the previous fixes: https://reviews.llvm.org/rL291916 https://reviews.llvm.org/rL291939 https://reviews.llvm.org/rL291940 https://reviews.llvm.org/rL291941 Reviewers: rsmith Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28695 llvm-svn: 291957	2017-01-13 21:02:41 +00:00
Artem Belevich	64dc9be7b4	[NVPTX] Added support for half-precision floating point. Only scalar half-precision operations are supported at the moment. - Adds general support for 'half' type in NVPTX. - fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math). - Type conversions to/from fp16 are supported on all GPU variants. - On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage. Differential Revision: https://reviews.llvm.org/D28540 llvm-svn: 291956	2017-01-13 20:56:17 +00:00
Konstantin Zhuravlyov	7d88275577	[AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64) Differential Revision: https://reviews.llvm.org/D28496 llvm-svn: 291954	2017-01-13 19:49:25 +00:00
James Y Knight	99a2ce2af2	Check for register clobbers when merging a vreg live range with a reserved physreg in RegisterCoalescer. Previously, we only checked for clobbers when merging into a READ of the physreg, but not when merging from a WRITE to the physreg. Differential Revision: https://reviews.llvm.org/D28527 llvm-svn: 291942	2017-01-13 19:08:36 +00:00
Sanjay Patel	acd24c7b6a	[InstCombine] use 'match' and other clean-up; NFCI llvm-svn: 291937	2017-01-13 18:52:10 +00:00
Artem Belevich	d109f46573	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed. Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32 instruction even when unsafe FP math was not allowed. Clang-generated IR is not affected by this as it uses precise sin/cos from CUDA's libdevice when unsafe math is disabled. Differential Revision: https://reviews.llvm.org/D28619 llvm-svn: 291936	2017-01-13 18:48:13 +00:00
Sanjay Patel	b22f6c5f26	[InstCombine] use m_APInt to allow shl folds for vectors with splat constants llvm-svn: 291934	2017-01-13 18:39:09 +00:00
Michael Liao	468fb745e8	[SCEV] Limit recursion depth of constant evolving. - For a loop body with VERY complicated exit condition evaluation, constant evolving may run out of stack on platforms such as Windows. Need to limit the recursion depth. Differential Revision: https://reviews.llvm.org/D28629 llvm-svn: 291927	2017-01-13 18:28:30 +00:00
Sanjay Patel	cf08203105	[InstCombine] use Op0/Op1 local variables more consistently with shifts; NFC llvm-svn: 291923	2017-01-13 18:08:25 +00:00
Malcolm Parsons	17d266bc96	Remove unused lambda captures. NFC llvm-svn: 291916	2017-01-13 17:12:16 +00:00
Sanjay Patel	5178363687	[InstCombine] if the condition of a select may be known via assumes, eliminate the select This is a limited solution for PR31512: https://llvm.org/bugs/show_bug.cgi?id=31512 The motivation is that we will need to increase usage of llvm.assume and/or metadata to solve PR28430: https://llvm.org/bugs/show_bug.cgi?id=28430 ...and this kind of simplification is needed to take advantage of that extra information. The 'not' test case would be handled by: https://reviews.llvm.org/D28485 Differential Revision: https://reviews.llvm.org/D28337 llvm-svn: 291915	2017-01-13 17:02:42 +00:00
Ivan Krasin	1ed7896c1b	Revert r291903 and r291898. Reason: they break check-lld on the bots. Summary: Revert [ARM] Fix ubig32_t read in ARMAttributeParser Now using support functions to read data instead of trying to perform casts. =========================================================== Revert [ARM] Enable objdump to construct triple for ARM Now that The ARMAttributeParser has been moved into the library, it has been modified so that it can parse the attributes without printing them and stores them in a map. ELFObjectFile now queries the attributes to fill out the architecture details of a provided triple for 'arm' and 'thumb' targets. llvm-objdump uses this new functionality. Subscribers: llvm-commits, samparker, aemerson, mgorny Differential Revision: https://reviews.llvm.org/D28683 llvm-svn: 291911	2017-01-13 16:45:15 +00:00
Saleem Abdulrasool	6ef45916c6	ARM: match GCC's behaviour for builtins GCC changes the CC between the user-code and the builtins based on the value of `-target` rather than `-mfloat-abi`. When a HF target is used, the VFP variant of the AAPCS CC is used. Otherwise, the AAPCS variant is used. In all cases, the AEABI functions use the AAPCS CC. Adjust the calling convention based on the target. Resolves PR30543! llvm-svn: 291909	2017-01-13 16:25:33 +00:00
Benjamin Kramer	061f4a5fe6	Apply clang-tidy's performance-unnecessary-value-param to LLVM. With some minor manual fixes for using function_ref instead of std::function. No functional change intended. llvm-svn: 291904	2017-01-13 14:39:03 +00:00
Sam Parker	2178370e53	[ARM] Fix ubig32_t read in ARMAttributeParser Now using support functions to read data instead of trying to perform casts. Differential Revision: https://reviews.llvm.org/D28669 llvm-svn: 291903	2017-01-13 14:36:09 +00:00
Daniel Sanders	d6a1831ea7	[globalisel][aarch64] Make getCopyMapping() take register banks ID's rather than IsGPR booleans Summary: This allows the function to handle architectures with more than two register banks. Depends on D27978 Reviewers: ab, t.p.northover, rovka, qcolombet Subscribers: aditya_nandakumar, kristof.beyls, aemerson, rengolin, vkalintiris, dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D27339 llvm-svn: 291902	2017-01-13 14:16:33 +00:00
Simon Pilgrim	7f2a6d5e8c	[X86][AVX512] Add support for variable ASHR v2i64/v4i64 support without VLX Use v8i64 variable ASHR instructions if we don't have VLX. This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll. Differential Revision: https://reviews.llvm.org/D28604 llvm-svn: 291901	2017-01-13 13:16:19 +00:00
Daniel Sanders	21ac840fca	[aarch64][globalisel] Move getValueMapping/getCopyMapping to AArch64GenRegisterBankInfo. NFC. Summary: We did lose a little specificity in the assertion messages for the PartialMappingIdx enumerators in this change but this was necessary to avoid unnecessary use of 'public:' and we haven't lost anything that can't be discovered easily in lldb. Once this is tablegen-erated we could also safely remove the assertions. Depends on D27976 Reviewers: t.p.northover, ab, rovka, qcolombet Subscribers: aditya_nandakumar, aemerson, rengolin, vkalintiris, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D27978 llvm-svn: 291900	2017-01-13 11:50:34 +00:00
Daniel Sanders	f81cf47e65	[aarch64][globalisel] Refactor getRegBankBaseIdxOffset() to remove the power-of-2 assumption. NFC Summary: We don't exploit it yet though Depends on D27976 Reviewers: t.p.northover, ab, rovka, qcolombet Subscribers: aditya_nandakumar, aemerson, rengolin, vkalintiris, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D27977 llvm-svn: 291899	2017-01-13 11:23:37 +00:00
Sam Parker	770ceb69ba	[ARM] Enable objdump to construct triple for ARM Now that The ARMAttributeParser has been moved into the library, it has been modified so that it can parse the attributes without printing them and stores them in a map. ELFObjectFile now queries the attributes to fill out the architecture details of a provided triple for 'arm' and 'thumb' targets. llvm-objdump uses this new functionality. Differential Revision: https://reviews.llvm.org/D28281 llvm-svn: 291898	2017-01-13 11:04:21 +00:00
Daniel Sanders	438a1ecc2c	[aarch64][globalisel] Move data into <Target>GenRegisterBankInfo. NFC. Summary: Depends on D27809 Reviewers: t.p.northover, rovka, qcolombet, ab Subscribers: aditya_nandakumar, aemerson, rengolin, vkalintiris, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D27976 llvm-svn: 291897	2017-01-13 10:53:57 +00:00
Sam Parker	34315eec58	[ARM] Moved ARMAttributeParser to Support Moved ARMAttributeParser out of llvm-readobj and into the support library. Differential Revision: https://reviews.llvm.org/D28227 llvm-svn: 291896	2017-01-13 10:50:01 +00:00
Diana Picus	a2c59149e1	[ARM] CodeGen: Replace AddDefaultT1CC and AddNoT1CC. NFC For AddDefaultT1CC, we add a new helper t1CondCodeOp, which creates the appropriate register operand. For AddNoT1CC, we use the existing condCodeOp helper - we only had two uses of AddNoT1CC, so at this point it's probably not worth having yet another helper just for them. Differential Revision: https://reviews.llvm.org/D28603 llvm-svn: 291894	2017-01-13 10:37:37 +00:00
Diana Picus	8a73f5562f	[ARM] CodeGen: Remove AddDefaultCC. NFC. Replace all uses of AddDefaultCC with add(condCodeOp()). The transformation has been done automatically with a custom tool based on Clang AST Matchers + RefactoringTool. Differential Revision: https://reviews.llvm.org/D28557 llvm-svn: 291893	2017-01-13 10:18:01 +00:00
Diana Picus	116bbab4e4	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891	2017-01-13 09:58:52 +00:00
Diana Picus	4f8c3e1882	[ARM] CodeGen: Remove AddDefaultPred. NFC. Replace all uses of AddDefaultPred with MachineInstrBuilder::add(predOps()). This makes the code building MachineInstrs more readable, because it allows us to write code like: MIB.addSomeOperand(blah) .add(predOps()) .addAnotherOperand(blahblah) instead of AddDefaultPred(MIB.addSomeOperand(blah)) .addAnotherOperand(blahblah) This commit also adds the predOps helper in the ARM backend, as well as the add method taking a variable number of operands to the MachineInstrBuilder. The transformation has been done mostly automatically with a custom tool based on Clang AST Matchers + RefactoringTool. Differential Revision: https://reviews.llvm.org/D28555 llvm-svn: 291890	2017-01-13 09:37:56 +00:00
Michael Zuckerman	558a4d8419	[X86][AVX512] Adding missing shuffle lowering to blend mask instructions Some shuffles can be lowered to blend mask instruction (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ) . In this patch, I added new pattern match for this case. Reviewers: 1. craig.topper 2. guyblank 3. RKSimon 4. igorb Differential Revision: https://reviews.llvm.org/D28483 llvm-svn: 291888	2017-01-13 09:06:00 +00:00
Tobias Grosser	190d4e5fa2	RegionPass: Set isExecuted flag correctly This was forgotten in r291882. Without this fix, the Polly build bots are broken. llvm-svn: 291887	2017-01-13 09:00:17 +00:00
Craig Topper	1ec84c2a18	[AVX-512] Remove unmasked BLENDM instructions from the wrong load folding table. The unmasked versions read memory from operand 2, but were in the operand 3 table. These aren't the most interesting set of blendm instructions as the unmasked version isn't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch. llvm-svn: 291885	2017-01-13 07:28:56 +00:00
Craig Topper	46b6ecf41e	[X86] Move some entries in the load folding tables to move appropriate grouping. NFC llvm-svn: 291884	2017-01-13 07:28:53 +00:00
Craig Topper	eec4890346	[IR] Don't call assertModuleIsMaterialized in release builds Summary: To fix a release vs debug build linking error, r259695 made the body of assertModuleIsMaterialized empty if Value.cpp gets compiled in a release build. This way any code compiled as a debug build can still link against a release version of the function. This patch takes this a step farther and removes all calls to it from Value.h in any code that includes it in a relase build. This shrinks the opt binary on my macbook build by 17240 bytes. Reviewers: rafael Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28191 llvm-svn: 291883	2017-01-13 06:26:18 +00:00
Serge Pavlov	d409411ef1	Track validity of pass results Running tests with expensive checks enabled exhibits some problems with verification of pass results. First, the pass verification may require results of analysis that are not available. For instance, verification of loop info requires results of dominator tree analysis. A pass may be marked as conserving loop info but does not need to be dependent on DominatorTreePass. When a pass manager tries to verify that loop info is valid, it needs dominator tree, but corresponding analysis may be already destroyed as no user of it remained. Another case is a pass that is skipped. For instance, entities with linkage available_externally do not need code generation and such passes are skipped for them. In this case result verification must also be skipped. To solve these problems this change introduces a special flag to the Pass structure to mark passes that have valid results. If this flag is reset, verifications dependent on the pass result are skipped. Differential Revision: https://reviews.llvm.org/D27190 llvm-svn: 291882	2017-01-13 06:09:54 +00:00
Easwaran Raman	b035f914e4	ProfileSummaryInfo improvements. * Add is{Hot\|Cold}CallSite methods * Fix a bug in isHotBB where it was looking for MD_prof on a return instruction * Use MD_prof data only if sample profiling was used to collect profiles. * Add an unit test to ProfileSummaryInfo Differential Revision: https://reviews.llvm.org/D28584 llvm-svn: 291878	2017-01-13 01:34:00 +00:00
Eugene Zelenko	8187c192c6	[PowerPC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 291872	2017-01-13 00:58:58 +00:00
Greg Clayton	0e62ee7d60	Add the ability to iterate across all attributes in a DIE. Differential Revision: https://reviews.llvm.org/D28386 llvm-svn: 291861	2017-01-13 00:13:42 +00:00
Evgeniy Stepanov	f01c70fec0	[asan] Don't overalign global metadata. Other than on COFF with incremental linking, global metadata should not need any extra alignment. Differential Revision: https://reviews.llvm.org/D28628 llvm-svn: 291859	2017-01-12 23:26:20 +00:00
Evgeniy Stepanov	5d31d08a21	[asan] Refactor instrumentation of globals. llvm-svn: 291858	2017-01-12 23:03:03 +00:00
Teresa Johnson	83aaf358fd	[ThinLTO] Import static functions from the same module as caller Summary: We can sometimes end up with multiple copies of a local function that have the same GUID in the index. This happens when there are local functions with the same name that are in different source files with the same name (but in different directories), and they were compiled in their own directory so had the same path at compile time. In this case make sure we import the copy in the caller's module. While it isn't a correctness problem (the renamed reference which is based on the module IR hash will be unique since the module must have had an externally visible function that was imported), importing the wrong copy will result in lost performance opportunity since it won't be referenced and inlined. Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28440 llvm-svn: 291841	2017-01-12 22:04:45 +00:00
Chris Bieneman	07088c1060	[ObjectYAML] Pull yaml2dwarf out of yaml2obj for reuse This patch pulls the yaml2dwarf code out of yaml2obj into a new set of DWARF emitter functions in the DWARFYAML namespace. This will enable the YAML->DWARF code to be used inside DWARF tests by populating the DWARFYAML structs and calling the Emitter functions. llvm-svn: 291828	2017-01-12 21:35:21 +00:00
Robert Lougher	b0124c1eb8	[DebugInfo] Remove redundant check in SimplifyCFG; NFC. llvm-svn: 291813	2017-01-12 21:11:09 +00:00
Eli Friedman	b5c3a0d1c3	[SCEV] Simplify SolveLinEquationWithOverflow a bit. Cleanup in preparation for generalizing it. llvm-svn: 291808	2017-01-12 20:21:00 +00:00
Nikolai Bozhenov	f02ac0eeb2	[X86] Replace AND+IMM64 with SRL/SHL Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result is unused and the mask has only higher/lower bits set. For example, with this patch LLVM emits shrq $41, %rdi je instead of movabsq $0xFFFFFE0000000000, %rcx testq %rcx, %rdi je This reduces number of instructions, code size and register pressure. The transformation is applied only for cases where the mask cannot be encoded as an immediate value within TESTQ instruction. Differential Revision: https://reviews.llvm.org/D28198 llvm-svn: 291806	2017-01-12 19:54:27 +00:00
Nikolai Bozhenov	6bdf92cec7	[X86] Tune bypassing of slow division for Intel CPUs 64-bit integer division in Intel CPUs is extremely slow, much slower than 32-bit division. On the other hand, 8-bit and 16-bit divisions aren't any faster. The only important exception is Atom where DIV8 is fastest. Because of that, the patch 1) Enables bypassing of 64-bit division for Atom, Silvermont and all big cores. 2) Modifies 64-bit bypassing to use 32-bit division instead of 16-bit one. This doesn't make the shorter division slower but increases chances of taking it. Moreover, it's much more likely to prove at compile-time that a value fits 32 bits and doesn't require a run-time check (e.g. zext i32 to i64). Differential Revision: https://reviews.llvm.org/D28196 llvm-svn: 291800	2017-01-12 19:34:15 +00:00
Matt Arsenault	45337df08f	AMDGPU: Skip fneg/select combine if it can fold into other llvm-svn: 291792	2017-01-12 18:58:15 +00:00
Matt Arsenault	31c039ef2e	AMDGPU: Fold free fneg into sin llvm-svn: 291790	2017-01-12 18:48:09 +00:00
Saleem Abdulrasool	555e5980a5	ARM: slightly more table driven libcall setup Switch some additional library call setup to be table driven. This makes it more immediately obvious what the library call looks like. This is important for ARM since the calling conventions for the builtins change based on the target/libcall name. NFC llvm-svn: 291789	2017-01-12 18:46:11 +00:00
Robert Lougher	6717a6fe54	[DebugInfo] DILocation variable declaration should be const; NFC. llvm-svn: 291787	2017-01-12 18:33:49 +00:00
Hans Wennborg	84da661509	Avoid std::errc::protocol_* to appease mingw Like r291636 and r285261. llvm-svn: 291786	2017-01-12 18:33:14 +00:00
Robert Lougher	f5df7a18dd	[DebugInfo] Add const to DILocation variable declaration; NFC. llvm-svn: 291785	2017-01-12 18:29:28 +00:00
Matt Arsenault	a8c325e2f5	AMDGPU: Fold fneg into fmul_legacy llvm-svn: 291784	2017-01-12 18:26:30 +00:00
Matt Arsenault	ff7e5aadf5	AMDGPU: Fold fneg into rcp llvm-svn: 291779	2017-01-12 17:46:35 +00:00
Matt Arsenault	4242d48c36	AMDGPU: Fold fneg into fp_round llvm-svn: 291778	2017-01-12 17:46:33 +00:00
Matt Arsenault	98d2bf1024	AMDGPU: Fold fneg into fp_extend llvm-svn: 291777	2017-01-12 17:46:28 +00:00
Daniel Sanders	b7391dd3b4	[globalisel] Move as much RegisterBank initialization to the constructor as possible Summary: The register bank is now entirely initialized in the constructor. However, we still have the hardcoded number of register classes which will be dealt with in the TableGen patch (D27338) since we do not have access to this information to resolve this at this stage. The number of register classes is known to the TRI and to TableGen but the RegisterBank constructor is too early for the former and too late for the latter. This will be fixed when the data is tablegen-erated. Reviewers: t.p.northover, ab, rovka, qcolombet Subscribers: aditya_nandakumar, kristof.beyls, vkalintiris, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D27809 llvm-svn: 291770	2017-01-12 16:11:23 +00:00
Amjad Aboud	9607571861	[DebugInfo] Added DI macro creation API to DIBuilder. Differential Revision: https://reviews.llvm.org/D16077 llvm-svn: 291769	2017-01-12 15:49:46 +00:00
Daniel Sanders	ae03595bfb	[globalisel] Initialize RegisterBanks with static data. Summary: Refactor the RegisterBank initialization to use static data. This requires GlobalISel implementations to rewrite calls to createRegisterBank() and addRegBankCoverage() into a call to setRegBankData(). Out of tree targets can use diff 4 of D27807 (https://reviews.llvm.org/D27807?id=84117) to have addRegBankCoverage() dump the register classes and other data that needs to be provided to setRegBankData(). This is the method that was used to generate the static data in this patch. Tablegen-eration of this static data will follow after some refactoring. Reviewers: t.p.northover, ab, rovka, qcolombet Subscribers: aditya_nandakumar, kristof.beyls, vkalintiris, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D27807 Differential Revision: https://reviews.llvm.org/D27808 llvm-svn: 291768	2017-01-12 15:32:10 +00:00
Piotr Padlewski	9530883e8c	[Devirtualization] MemDep returns non-local !invariant.group dependencies Summary: Memory Dependence Analysis was limited to return only local dependencies for invariant.group handling. Now it returns NonLocal when it finds it and then by asking getNonLocalPointerDependency we get found dep. Thanks to this we are able to devirtualize loops! void indirect(A &a, int n) { for (int i = 0 ; i < n; i++) a.foo(); } void test(int n) { A a; indirect(a); } After inlining a.foo() will be changed to direct call, even if foo and A::A() is external (but only if vtable definition is be available). Reviewers: nlewycky, dberlin, chandlerc, rsmith Subscribers: mehdi_amini, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D28137 llvm-svn: 291762	2017-01-12 11:33:58 +00:00
Matt Arsenault	f003198b28	AMDGPU: Fix sub_oneuse being marked commutative llvm-svn: 291748	2017-01-12 07:17:28 +00:00
Craig Topper	24c3a2395f	[AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support. llvm-svn: 291747	2017-01-12 06:49:12 +00:00
Craig Topper	69ab67b279	[AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq. llvm-svn: 291746	2017-01-12 06:49:08 +00:00
Elad Cohen	c5ba925ef2	[X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask r289653 added a case where `vselect <cond> <vector1> <all-zeros>` is transformed to: `vselect xor(cond, DAG.getConstant(1, DL, CondVT) <all-zeros> <vector1>` This was not aimed to catch cases where Cond is not a vXi1 mask but it does. Moreover, when Cond type is VxiN (N > 1) then xor(cond, DAG.getConstant(1, DL, CondVT) != NOT(cond). This patch changes the above to xor with allones, and avoids entering the case for non-mask Conds. llvm-svn: 291745	2017-01-12 06:49:03 +00:00
Matt Arsenault	63f953795e	AMDGPU: Fold fneg into fma or fmad Patch mostly by Fiona Glaser llvm-svn: 291733	2017-01-12 00:32:16 +00:00
Matt Arsenault	4103a81d6d	AMDGPU: Fold fneg into fmul Patch mostly by Fiona Glaser llvm-svn: 291732	2017-01-12 00:23:20 +00:00
Matt Arsenault	2529fba989	AMDGPU: Fold fneg into fadd Patch mostly by Fiona Glaser llvm-svn: 291731	2017-01-12 00:09:34 +00:00
Matt Arsenault	2a04ff97ad	AMDGPU: Pull fneg/fabs out of a select Allows better source modifier usage. llvm-svn: 291729	2017-01-11 23:57:38 +00:00
Davide Italiano	eac05f6b88	[NewGVN] Fixup store count for the `initial` congruency class. It was always zero. When we move a store from `initial` to its own congruency class, we end up with a negative store count, which is obviously wrong. Also, while here, change StoreCount to be signed so that the assertions actually fire. Ack'ed by Daniel Berlin. llvm-svn: 291725	2017-01-11 23:41:24 +00:00
Zachary Turner	629cb7d8cc	[CodeView] Finish decoupling TypeDatabase from TypeDumper. Previously the type dumper itself was passed around to a lot of different places and manipulated in ways that were more appropriate on the type database. For example, the entire TypeDumper was passed into the symbol dumper, when all the symbol dumper wanted to do was lookup the name of a TypeIndex so it could print it. That's what the TypeDatabase is for -- mapping type indices to names. Another example is how if the user runs llvm-pdbdump with the option to dump symbols but not types, we still have to visit all types so that we can print minimal information about the type of a symbol, but just without dumping full symbol records. The way we did this before is by hacking it up so that we run everything through the type dumper with a null printer, so that the output goes to /dev/null. But really, we don't need to dump anything, all we want to do is build the type database. Since TypeDatabaseVisitor now exists independently of TypeDumper, we can do this. We just build a custom visitor callback pipeline that includes a database visitor but not a dumper. All the hackery around printers etc goes away. After this patch, we could probably even delete the entire CVTypeDumper class since really all it is at this point is a thin wrapper that hides the details of how to build a useful visitation pipeline. It's not a priority though, so CVTypeDumper remains for now. After this patch we will be able to easily plug in a different style of type dumper by only implementing the proper visitation methods to dump one-line output and then sticking it on the pipeline. Differential Revision: https://reviews.llvm.org/D28524 llvm-svn: 291724	2017-01-11 23:24:22 +00:00
Peter Collingbourne	1b5f1cfdb4	X86: Remove dead code. NFC. llvm-svn: 291721	2017-01-11 23:00:28 +00:00
Matt Arsenault	24a1273ae1	AMDGPU: Fix shrinking of addc/subb. To shrink to VOP2 the input carry must also be VCC. llvm-svn: 291720	2017-01-11 22:58:12 +00:00
Matt Arsenault	682eb4396a	AMDGPU: Fix sext_inreg for i1 in i16 This produces worse code when i16 is legal, mostly due to combines getting confused by conversions inserted for uniform 16-bit operations. llvm-svn: 291717	2017-01-11 22:35:22 +00:00
Matt Arsenault	28bd4cbeaf	AMDGPU: Fix breaking VOP3 v_add_i32s This was shrinking the instruction even though the carry output register was a virtual register, not known VCC. llvm-svn: 291716	2017-01-11 22:35:17 +00:00
Kuba Mracek	503162b4a1	[asan] Set alignment of __asan_global_* globals to sizeof(GlobalStruct) When using profiling and ASan together (-fprofile-instr-generate -fcoverage-mapping -fsanitize=address), at least on Darwin, the section of globals that ASan emits (__asan_globals) is misaligned and starts at an odd offset. This really doesn't have anything to do with profiling, but it triggers the issue because profiling emits a string section, which can have arbitrary size. This patch changes the alignment to sizeof(GlobalStruct). Differential Revision: https://reviews.llvm.org/D28573 llvm-svn: 291715	2017-01-11 22:26:10 +00:00
Davide Italiano	0dc68bfa87	Revert "[NewGVN] Strengthen a couple of assertions." It's breaking some bots. Will investigate and recommit. llvm-svn: 291712	2017-01-11 22:00:29 +00:00
Matt Arsenault	69e3001b84	AMDGPU: Fix folding immediates into mac src2 Whether it is legal or not needs to check for the instruction it will be replaced with. llvm-svn: 291711	2017-01-11 22:00:02 +00:00
Davide Italiano	ff69405213	[NewGVN] Parenthesise assertion condition (-Wparenthesis). Format an assertion message while I'm here. llvm-svn: 291710	2017-01-11 21:58:42 +00:00
Davide Italiano	6e919df2f5	[NewGVN] Strengthen a couple of assertions. StoreCount >= 0 on `unsigned` is always true, otherwise. llvm-svn: 291709	2017-01-11 21:49:00 +00:00
Peter Collingbourne	7636532c1b	LowerTypeTests: Represent the memory region size with the constant size-1. This means that we can use a shorter instruction sequence in the case where the size is a power of two and on the boundary between two representations. Differential Revision: https://reviews.llvm.org/D28421 llvm-svn: 291706	2017-01-11 21:32:10 +00:00
Eli Friedman	bd6dedaa7f	[SCEV] Make howFarToZero max backedge-taken count check for precondition. Refines max backedge-taken count if a loop like "for (int i = 0; i != n; ++i) { /* body */ }" is rotated. Differential Revision: https://reviews.llvm.org/D28536 llvm-svn: 291704	2017-01-11 21:07:15 +00:00
Eli Friedman	8396265655	[SCEV] Make howFarToZero use a simpler formula for max backedge-taken count. This is both easier to understand, and produces a tighter bound in certain cases. Differential Revision: https://reviews.llvm.org/D28393 llvm-svn: 291701	2017-01-11 20:55:48 +00:00
Peter Collingbourne	6bca5a0d82	Re-apply r291205, "LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase.", with a fix for an off-by-one error. llvm-svn: 291699	2017-01-11 20:28:46 +00:00
Daniel Berlin	f6eba4be2c	NewGVN: Fix PR31594, by tracking the store count of congruence classes, and updating checking to allow for equivalence through reachability. (Sadly, the checking here is not perfect, and can't be made perfect, so we'll have to disable it after we are satisfied with correctness. Right now it is just "very unlikely" to happen.) llvm-svn: 291698	2017-01-11 20:22:36 +00:00
Daniel Berlin	3a1bd0216a	NewGVN: Refactor performCongruenceFinding and split out congruence class moving llvm-svn: 291697	2017-01-11 20:22:05 +00:00
Rong Xu	20f5df1d70	Resubmit "[PGO] Turn off comdat renaming in IR PGO by default" This patch resubmits the changes in r291588. llvm-svn: 291696	2017-01-11 20:19:41 +00:00
Kyle Butt	efe56fed12	Revert "CodeGen: Allow small copyable blocks to "break" the CFG." This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded. This needs a simple probability check because there are some cases where it is not profitable. llvm-svn: 291695	2017-01-11 19:55:19 +00:00
Eli Friedman	3a03742c37	[ARM] More aggressive matching for vpadd and vpaddl. The new matchers work after legalization to make them simpler, and to avoid blocking other optimizations. Differential Revision: https://reviews.llvm.org/D27779 llvm-svn: 291693	2017-01-11 19:33:38 +00:00
Michael Kuperstein	f69e64662b	[SLP] Remove bogus assert. The removed assert seems bogus - it's perfectly legal for the roots of the vectorized subtrees to be equal even if the original scalar values aren't, if the original scalars happen to be equivalent. This fixes PR31599. Differential Revision: https://reviews.llvm.org/D28539 llvm-svn: 291692	2017-01-11 19:23:57 +00:00
Davide Italiano	3b5edf4fa7	[lib/Object] Unbreak build with -Werror (unused variable). NFCI. llvm-svn: 291691	2017-01-11 19:05:27 +00:00
Greg Clayton	d1efea89c9	Remove all variants of DWARFDie::getAttributeValueAs...() that had parameters that specified default values. Now we only support returning Optional<> values and have changed all clients over to use Optional::getValueOr(). Differential Revision: https://reviews.llvm.org/D28569 llvm-svn: 291686	2017-01-11 17:43:37 +00:00
Tim Northover	778d0816ab	GlobalISel: only print debug info with -debug. NFC. Turns out DEBUG(...) has uses even inside NDEBUG checks. llvm-svn: 291685	2017-01-11 17:33:37 +00:00
Ivan Krasin	42e6b4fd98	Revert rL291205 because it breaks Chrome tests under CFI. Summary: Revert LowerTypeTests: Split the pass in two: a resolution phase and a lowering phase. This change separates how type identifiers are resolved from how intrinsic calls are lowered. All information required to lower an intrinsic call is stored in a new TypeIdLowering data structure. The idea is that this data structure can either be initialized using the module itself during regular LTO, or using the module summary in ThinLTO backends. Original URL: https://reviews.llvm.org/D28341 Reviewers: pcc Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D28532 llvm-svn: 291684	2017-01-11 16:54:04 +00:00
Simon Pilgrim	0c1faf432b	Remove trailing whitespace. NFCI. llvm-svn: 291680	2017-01-11 16:38:20 +00:00
Piotr Padlewski	e41beed403	[MemDep] NFC variable name change llvm-svn: 291679	2017-01-11 16:23:54 +00:00
George Rimar	4bf308317d	[lib/Object] - Introduce Decompressor class. Decompressor intention is to reduce duplication of code. Currently LLD has own implementation of decompressor for compressed debug sections. This class helps to avoid it and share the code. LLD patch for reusing it is D28106 Differential revision: https://reviews.llvm.org/D28105 llvm-svn: 291675	2017-01-11 15:26:41 +00:00
Jonas Paulsson	c282975604	[SystemZ] Improve isFoldableMemAccessOffset(). A store of an extracted element or a load which gets inserted into a vector, will be combined into a vector load/store element instruction. Therefore, isFoldableMemAccessOffset(), which is called by LSR, should return false in these cases. Reviewer: Ulrich Weigand llvm-svn: 291673	2017-01-11 14:40:39 +00:00
Hal Finkel	8a9a783f2c	Make processing @llvm.assume more efficient - Add affected values to the assumption cache Here's my second try at making @llvm.assume processing more efficient. My previous attempt, which leveraged operand bundles, r289755, didn't end up working: it did make assume processing more efficient but eliminating the assumption cache made ephemeral value computation too expensive. This is a more-targeted change. We'll keep the assumption cache, but extend it to keep a map of affected values (i.e. values about which an assumption might provide some information) to the corresponding assumption intrinsics. This allows ValueTracking and LVI to find assumptions relevant to the value being queried without scanning all assumptions in the function. The fact that ValueTracking started doing O(number of assumptions in the function) work, for every known-bits query, has become prohibitively expensive in some cases. As discussed during the review, this is a pragmatic fix that, longer term, will likely be replaced by a more-principled solution (perhaps based on an extended SSA form). Differential Revision: https://reviews.llvm.org/D28459 llvm-svn: 291671	2017-01-11 13:24:24 +00:00
Elena Demikhovsky	9d0e7c33d3	X86 CodeGen: Optimized pattern for truncate with unsigned saturation. DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512. And VPACKUS* instructions on SEE* targets. Differential Revision: https://reviews.llvm.org/D28216 llvm-svn: 291670	2017-01-11 12:59:32 +00:00
Sam Kolton	9772eb3907	[AMDGPU] Assembler: SDWA/DPP should not accept scalar registers and immediate operands Reviewers: artem.tamazov, nhaustov, vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28157 llvm-svn: 291668	2017-01-11 11:46:30 +00:00
Simon Pilgrim	5a81fefad3	[X86][AVX512BW] Vectorize v64i8 vector shifts Differential Revision: https://reviews.llvm.org/D28447 llvm-svn: 291665	2017-01-11 10:36:51 +00:00
Chandler Carruth	3bab7e1a79	[PM] Separate the LoopAnalysisManager from the LoopPassManager and move the latter to the Transforms library. While the loop PM uses an analysis to form the IR units, the current plan is to have the PM itself establish and enforce both loop simplified form and LCSSA. This would be a layering violation in the analysis library. Fundamentally, the idea behind the loop PM is to transform loops in addition to running passes over them, so it really seemed like the most natural place to sink this was into the transforms library. We can't just move everything because we also have loop analyses that rely on a subset of the invariants. So this patch splits the the loop infrastructure into the analysis management that has to be part of the analysis library, and the transform-aware pass manager. This also required splitting the loop analyses' printer passes out to the transforms library, which makes sense to me as running these will transform the code into LCSSA in theory. I haven't split the unittest though because testing one component without the other seems nearly intractable. Differential Revision: https://reviews.llvm.org/D28452 llvm-svn: 291662	2017-01-11 09:43:56 +00:00
Elad Cohen	0c2601073e	[X86] Fix PR30926 - Add patterns for (v)cvtsi2s{s,d} and (v)cvtsd2s{s,d} The code emiited by Clang's intrinsics for (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd is lowered to a code sequence that includes redundant (v)movss/(v)movsd instructions. This patch adds patterns for optimizing these sequences. Differential revision: https://reviews.llvm.org/D28455 llvm-svn: 291660	2017-01-11 09:11:48 +00:00
Mohammed Agabaria	2c96c43388	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Dean Michael Berris	d6c18657bb	[XRay] Define the library for XRay trace logs Summary: In this change we move the definition of the log reading routines from the tools directory in LLVM to {include/llvm,lib}/XRay. We improve the documentation a little bit for the publicly accessible headers, and adjust the top-matter. This also leads to some refactoring and cleanup in the tooling code. In particular, we do the following: - Rename the class from LogReader to Trace, as it better represents the logical set of records as opposed to a log. - Use file type detection instead of asking the user to say what format the input file is. This allows us to keep the interface simple and encapsulate the logic of loading the data appropriately. In future changes we increase the API surface and write dedicated unit tests for the XRay library. Depends on D24376. Reviewers: dblaikie, echristo Subscribers: mehdi_amini, mgorny, llvm-commits, varno Differential Revision: https://reviews.llvm.org/D28345 llvm-svn: 291652	2017-01-11 06:39:09 +00:00
Chandler Carruth	410eaeb064	[PM] Rewrite the loop pass manager to use a worklist and augmented run arguments much like the CGSCC pass manager. This is a major redesign following the pattern establish for the CGSCC layer to support updates to the set of loops during the traversal of the loop nest and to support invalidation of analyses. An additional significant burden in the loop PM is that so many passes require access to a large number of function analyses. Manually ensuring these are cached, available, and preserved has been a long-standing burden in LLVM even with the help of the automatic scheduling in the old pass manager. And it made the new pass manager extremely unweildy. With this design, we can package the common analyses up while in a function pass and make them immediately available to all the loop passes. While in some cases this is unnecessary, I think the simplicity afforded is worth it. This does not (yet) address loop simplified form or LCSSA form, but those are the next things on my radar and I have a clear plan for them. While the patch is very large, most of it is either mechanically updating loop passes to the new API or the new testing for the loop PM. The code for it is reasonably compact. I have not yet updated all of the loop passes to correctly leverage the update mechanisms demonstrated in the unittests. I'll do that in follow-up patches along with improved FileCheck tests for those passes that ensure things work in more realistic scenarios. In many cases, there isn't much we can do with these until the loop simplified form and LCSSA form are in place. Differential Revision: https://reviews.llvm.org/D28292 llvm-svn: 291651	2017-01-11 06:23:21 +00:00
Craig Topper	7af39837a9	Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent." Some test appears to be hanging on the build bots. llvm-svn: 291650	2017-01-11 04:59:25 +00:00
Adam Nemet	e2aaf3a35e	[LICM] Report failing to hoist conditionally-executed loads These are interesting again because the user may not be aware that this is a common reason preventing LICM. A const is removed from an instruction pointer declaration in order to pass it to ORE. Differential Revision: https://reviews.llvm.org/D27940 llvm-svn: 291649	2017-01-11 04:39:49 +00:00
Adam Nemet	81941b3195	[LICM] Report failing to hoist a load with an invariant address These are interesting because lack of precision in alias information could be standing in the way of this optimization. An example is the case in the test suite that I showed in the DevMeeting talk: http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/MultiSource/Benchmarks/FreeBench/distray/CMakeFiles/distray.dir/html/_org_test-suite_MultiSource_Benchmarks_FreeBench_distray_distray.c.html#L236 canSinkOrHoistInst is also used from LoopSink, which does not use opt-remarks so we need to take ORE as an optional argument. Differential Revision: https://reviews.llvm.org/D27939 llvm-svn: 291648	2017-01-11 04:39:45 +00:00
Adam Nemet	358433ce1b	[LICM] Report successful hoist/sink/promotion Differential Revision: https://reviews.llvm.org/D27938 llvm-svn: 291646	2017-01-11 04:39:35 +00:00
Craig Topper	577d258569	[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent. llvm-svn: 291645	2017-01-11 04:02:23 +00:00
Matt Arsenault	e482403e1c	DAGCombiner: Add hasOneUse checks to fadd/fma combine Even with aggressive fusion enabled, this requires duplicating the fmul, or increases an fadd to another fma which is not an improvement. llvm-svn: 291642	2017-01-11 02:02:12 +00:00
Eugene Zelenko	c4ad1ce068	[Target] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 291641	2017-01-11 01:45:03 +00:00
Hans Wennborg	6573976f57	Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) This was reverted because it would miscompile code where the cmp had multiple uses. That was due to a deficiency in the existing code, which was fixed in r291630 (see the PR for details). This re-commit includes an extra test for the kind of code that got miscompiled: @test_sub_1_setcc_jcc. llvm-svn: 291640	2017-01-11 01:36:57 +00:00
Matt Arsenault	8260666d11	InstSimplify: Refactor function to use more switches llvm-svn: 291634	2017-01-11 00:57:54 +00:00
Hans Wennborg	12de693747	[X86] Dont run combineSetCCAtomicArith() when the cmp has multiple uses We would miscompile the following: void g(int); int f(volatile long long *p) { bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0; g(b ? 12 : 34); return b ? 56 : 78; } into pushq %rax lock incq (%rdi) movl $12, %eax movl $34, %edi cmovlel %eax, %edi callq g(int) testq %rax, %rax <---- Bad. movl $56, %ecx movl $78, %eax cmovsl %ecx, %eax popq %rcx retq because the code failed to take into account that the cmp has multiple uses, replaced one of them, and left the other one comparing garbage. llvm-svn: 291630	2017-01-11 00:49:54 +00:00
Quentin Colombet	0b63b31b2e	[RegBankSelect] Improve the output of the debug messages. Add more information about mapping cost and chosen solution. llvm-svn: 291629	2017-01-11 00:48:41 +00:00
Zachary Turner	a9054ddd9c	[CodeView/PDB] Rename a bunch of files. We were starting to get some name clashes between llvm-pdbdump and the common CodeView framework, so I took this opportunity to rename a bunch of files to more accurately describe their usage. This also helps in llvm-pdbdump to distinguish between different files and whether they are used for pretty dump mode or raw dump mode. llvm-svn: 291627	2017-01-11 00:35:43 +00:00
Zachary Turner	c640b76db5	[CodeView] Add TypeDatabase class. This creates a centralized class in which to store type records. It stores types as an array of entries, which matches the notion of a type stream being a topologically sorted DAG. Logic to build up such a database was already being used in CVTypeDumper, so CVTypeDumper is now updated to to read from a TypeDatabase which is filled out by an earlier visitor in the pipeline. Differential Revision: https://reviews.llvm.org/D28486 llvm-svn: 291626	2017-01-11 00:35:08 +00:00
Matt Arsenault	1e0edbf03c	InstSimplify: Eliminate fabs on known positive llvm-svn: 291624	2017-01-11 00:33:24 +00:00
Jan Vesely	0d6cb1caaf	AMDGPU/EG,CM: Add fp16 conversion instructions Differential Revision: https://reviews.llvm.org/D28164 llvm-svn: 291622	2017-01-11 00:12:39 +00:00
Rong Xu	acd6360251	Revert "[PGO] Turn off comdat renaming in IR PGO by default" This patch reverts r291588: [PGO] Turn off comdat renaming in IR PGO by default, as we are seeing some hash mismatches in our internal tests. llvm-svn: 291621	2017-01-10 23:54:31 +00:00
Sanjay Patel	db0938fd9a	[InstCombine] add a wrapper for a common pair of transforms; NFCI Some of the callers are artificially limiting this transform to integer types; this should make it easier to incrementally remove that restriction. llvm-svn: 291620	2017-01-10 23:49:07 +00:00
Florian Hahn	4f9d6d56c0	[loop-unroll] Properly populate LoopInfo for loops cloned in LoopUnrollRuntime. Summary: This fixes Transforms/LoopUnroll/runtime-loop3.ll which failed with EXTENSIVE_DEBUG, because the cloned basic blocks were not added to the correct sub-loops in LoopUnrollRuntime.cpp. Reviewers: dexonsmith, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28482 llvm-svn: 291619	2017-01-10 23:43:35 +00:00
Justin Lebar	7d81813d76	[TM] Restore default TargetOptions in TargetMachine::resetTargetOptions. Summary: Previously if you had * a function with the fast-math-enabled attr, followed by * a function without the fast-math attr, the second function would inherit the first function's fast-math-ness. This means that mixing fast-math and non-fast-math functions in a module was completely broken unless you explicitly annotated every non-fast-math function with "unsafe-fp-math"="false". This appears to have been broken since r176986 (March 2013), when the resetTargetOptions function was introduced. This patch tests the correct behavior as best we can. I don't think I can test FPDenormalMode and NoTrappingFPMath, because they aren't used in any backends during function lowering. Surprisingly, I also can't find any uses at all of LessPreciseFPMAD affecting generated code. The NVPTX/fast-math.ll test changes are an expected result of fixing this bug. When FMA is disabled, we emit add as "add.rn.f32", which prevents fma combining. Before this patch, fast-math was enabled in all functions following the one which explicitly enabled it on itself, so we were emitting plain "add.f32" where we should have generated "add.rn.f32". Reviewers: mkuper Subscribers: hfinkel, majnemer, jholewinski, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28507 llvm-svn: 291618	2017-01-10 23:43:04 +00:00
Evandro Menezes	330e1b8945	[AArch64] Consider all vector types for FeatureSlowMisaligned128Store The original code considered only v2i64 as slow for this feature. This patch consider all 128-bit long vector types as slow candidates. In internal tests, extending this feature to all 128-bit vector types resulted in an overall improvement of 1% on Exynos M1. Differential revision: https://reviews.llvm.org/D27998 llvm-svn: 291616	2017-01-10 23:42:21 +00:00
Matt Arsenault	51818c14b3	AMDGPU: Constant fold when immediate is materialized In future commits these patterns will appear after moveToVALU changes. llvm-svn: 291615	2017-01-10 23:32:04 +00:00

1 2 3 4 5 ...

98548 Commits