llvm-project

Commit Graph

Author	SHA1	Message	Date
NAKAMURA Takumi	70ad98aca4	Reformat. llvm-svn: 248261	2015-09-22 11:13:55 +00:00
NAKAMURA Takumi	bf9cc7f30b	Fix utf8 chars. llvm-svn: 248259	2015-09-22 11:10:08 +00:00
Chad Rosier	03a47305ec	[Machine Combiner] Refactor machine reassociation code to be target-independent. No functional change intended. Patch by Haicheng Wu <haicheng@codeaurora.org>! http://reviews.llvm.org/D12887 PR24522 llvm-svn: 248164	2015-09-21 15:09:11 +00:00
Eric Christopher	a4e5d3cf8e	constify the Function parameter to the TTI creation callback and propagate to all callers/users/etc. llvm-svn: 247864	2015-09-16 23:38:13 +00:00
Sanjay Patel	a260701bbb	propagate fast-math-flags on DAG nodes After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes. This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings. Differential Revision: http://reviews.llvm.org/D12095 llvm-svn: 247815	2015-09-16 16:31:21 +00:00
Daniel Sanders	50f17235dd	Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC. Eric has replied and has demanded the patch be reverted. llvm-svn: 247702	2015-09-15 16:17:27 +00:00
Daniel Sanders	153010c52d	Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC. Summary: This is the first patch in the series to migrate Triple's (which are ambiguous) to TargetTuple's (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ API's have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Thanks go to Pavel Labath for fixing LLDB for me. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247692	2015-09-15 14:08:28 +00:00
Daniel Sanders	c40de48041	Revert r247684 - Replace Triple with a new TargetTuple ... LLDB needs to be updated in the same commit. llvm-svn: 247686	2015-09-15 13:46:21 +00:00
Daniel Sanders	18d4b0dab7	Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC. Summary: This is the first patch in the series to migrate Triple's (which are ambiguous) to TargetTuple's (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ API's have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247683	2015-09-15 13:17:40 +00:00
Daniel Sanders	c8cd6e95d2	Fix namespace indentation and missing blank lines before 'public:' in *MCAsmInfo.h. NFC. This is to reduce noise in a following commit. Also fixes a couple missing spaces before the reference operator. llvm-svn: 247679	2015-09-15 12:27:06 +00:00
NAKAMURA Takumi	8061e8645f	PPCFrameLowering::emitEpilogue(): Avoid manipulating MBBI on iterator end. It caused crash in MachineInstr::hasPropertyInBundle() since r247237. llvm-svn: 247395	2015-09-11 08:20:56 +00:00
Cong Hou	c536bd9e73	Pass BranchProbability/BlockMass by value instead of const& as they are small. NFC. llvm-svn: 247357	2015-09-10 23:10:42 +00:00
Kit Barton	d3b904d440	Enable the shrink wrapping optimization for PPC64. The changes in this patch are as follows: 1. Modify the emitPrologue and emitEpilogue methods to work properly when the prologue and epilogue blocks are not the first/last blocks in the function 2. Fix a bug in PPCEarlyReturn optimization caused by an empty entry block in the function 3. Override the runShrinkWrap PredicateFtor (defined in TargetMachine) to check whether shrink wrapping should run: Shrink wrapping will run on PPC64 (Little Endian and Big Endian) unless -enable-shrink-wrap=false is specified on command line A new test case, ppc-shrink-wrapping.ll was created based on the existing shrink wrapping tests for x86, arm, and arm64. Phabricator review: http://reviews.llvm.org/D11817 llvm-svn: 247237	2015-09-10 01:55:44 +00:00
Chandler Carruth	7b560d40bd	[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible with the new pass manager, and no longer relying on analysis groups. This builds essentially a ground-up new AA infrastructure stack for LLVM. The core ideas are the same that are used throughout the new pass manager: type erased polymorphism and direct composition. The design is as follows: - FunctionAAResults is a type-erasing alias analysis results aggregation interface to walk a single query across a range of results from different alias analyses. Currently this is function-specific as we always assume that aliasing queries are within a function. - AAResultBase is a CRTP utility providing stub implementations of various parts of the alias analysis result concept, notably in several cases in terms of other more general parts of the interface. This can be used to implement only a narrow part of the interface rather than the entire interface. This isn't really ideal, this logic should be hoisted into FunctionAAResults as currently it will cause a significant amount of redundant work, but it faithfully models the behavior of the prior infrastructure. - All the alias analysis passes are ported to be wrapper passes for the legacy PM and new-style analysis passes for the new PM with a shared result object. In some cases (most notably CFL), this is an extremely naive approach that we should revisit when we can specialize for the new pass manager. - BasicAA has been restructured to reflect that it is much more fundamentally a function analysis because it uses dominator trees and loop info that need to be constructed for each function. All of the references to getting alias analysis results have been updated to use the new aggregation interface. All the preservation and other pass management code has been updated accordingly. The way the FunctionAAResultsWrapperPass works is to detect the available alias analyses when run, and add them to the results object. This means that we should be able to continue to respect when various passes are added to the pipeline, for example adding CFL or adding TBAA passes should just cause their results to be available and to get folded into this. The exception to this rule is BasicAA which really needs to be a function pass due to using dominator trees and loop info. As a consequence, the FunctionAAResultsWrapperPass directly depends on BasicAA and always includes it in the aggregation. This has significant implications for preserving analyses. Generally, most passes shouldn't bother preserving FunctionAAResultsWrapperPass because rebuilding the results just updates the set of known AA passes. The exception to this rule are LoopPass instances which need to preserve all the function analyses that the loop pass manager will end up needing. This means preserving both BasicAAWrapperPass and the aggregating FunctionAAResultsWrapperPass. Now, when preserving an alias analysis, you do so by directly preserving that analysis. This is only necessary for non-immutable-pass-provided alias analyses though, and there are only three of interest: BasicAA, GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is preserved when needed because it (like DominatorTree and LoopInfo) is marked as a CFG-only pass. I've expanded GlobalsAA into the preserved set everywhere we previously were preserving all of AliasAnalysis, and I've added SCEVAA in the intersection of that with where we preserve SCEV itself. One significant challenge to all of this is that the CGSCC passes were actually using the alias analysis implementations by taking advantage of a pretty amazing set of loop holes in the old pass manager's analysis management code which allowed analysis groups to slide through in many cases. Moving away from analysis groups makes this problem much more obvious. To fix it, I've leveraged the flexibility the design of the new PM components provides to just directly construct the relevant alias analyses for the relevant functions in the IPO passes that need them. This is a bit hacky, but should go away with the new pass manager, and is already in many ways cleaner than the prior state. Another significant challenge is that various facilities of the old alias analysis infrastructure just don't fit any more. The most significant of these is the alias analysis 'counter' pass. That pass relied on the ability to snoop on AA queries at different points in the analysis group chain. Instead, I'm planning to build printing functionality directly into the aggregation layer. I've not included that in this patch merely to keep it smaller. Note that all of this needs a nearly complete rewrite of the AA documentation. I'm planning to do that, but I'd like to make sure the new design settles, and to flesh out a bit more of what it looks like in the new pass manager first. Differential Revision: http://reviews.llvm.org/D12080 llvm-svn: 247167	2015-09-09 17:55:00 +00:00
Eric Christopher	71f6e2f568	Fix the PPC CTR Loop pass to look for calls to the intrinsics that read CTR and count them as reading the CTR. llvm-svn: 247083	2015-09-08 22:14:58 +00:00
Hal Finkel	ccf9259c00	[PowerPC] Don't commute trivial rlwimi instructions To commute a trivial rlwimi instructions (meaning one with a full mask and zero shift), we'd need to ability to form an all-zero mask (instead of an all-one mask) using rlwimi. We can't represent this, however, and we'll miscompile code if we try. The code quality problem that this highlights (that SDAG simplification can lead to us generating an ISD::OR node with a constant zero LHS) will be fixed as a follow-up. Fixes PR24719. llvm-svn: 246937	2015-09-06 04:17:30 +00:00
Hal Finkel	b1518d6c24	[PowerPC] Fix and(or(x, c1), c2) -> rlwimi generation PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from an input pattern that looks like this: and(or(x, c1), c2) but the associated logic does not work if there are bits that are 1 in c1 but 0 in c2 (these are normally canonicalized away, but that can't happen if the 'or' has other users. Make sure we abort the transformation if such bits are discovered. Fixes PR24704. llvm-svn: 246900	2015-09-05 00:02:59 +00:00
Hal Finkel	4a7be23976	[PowerPC] Enable interleaved-access vectorization This adds a basic cost model for interleaved-access vectorization (and a better default for shuffles), and enables interleaved-access vectorization by default. The relevant difference from the default cost model for interleaved-access vectorization, is that on PPC, the shuffles that end up being used are much cheaper than modeling the process with insert/extract pairs (which are quite expensive, especially on older cores). llvm-svn: 246824	2015-09-04 00:10:41 +00:00
Hal Finkel	75afa2b6b6	[PowerPC] Always use aggressive interleaving on the A2 On the A2, with an eye toward QPX unaligned-load merging, we should always use aggressive interleaving. It is generally superior to only using concatenation unrolling. llvm-svn: 246819	2015-09-03 23:23:00 +00:00
Hal Finkel	e6702ca0e2	[PowerPC] Try harder to find a base+offset when looking for consecutive accesses When forming permutation-based unaligned vector loads, we need to know whether it is valid to read ahead of the requested address by a full vector length. Doing so is more efficient (and allows for more CSE with later loads), but could trigger a page fault if invalid. To determine validity, we look for other loads in the same block that access the relevant address range. The relevant point here is that we need to do this as part of the process of forming permutation-based vector loads, and this happens quite early in the SDAG pipeline - specifically before many of the address calculations are fully canonicalized. As a result, we need to try harder to recognize base+offset address computations, because they still might appear as chain of adds (base+offset+offset, for example). To account for this, we'll look through chains of adds, accumulating the constant offsets. llvm-svn: 246813	2015-09-03 22:37:44 +00:00
Hal Finkel	f11bc761d8	[PowerPC] Include the permutation cost for unaligned vector loads Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX types), even when accounting for the combining that takes place for multiple consecutive such loads, there is at least one load instructions and one permutation for each load. Make sure the cost reported reflects the cost of the permutes as well. llvm-svn: 246807	2015-09-03 21:23:18 +00:00
Hal Finkel	99d95328d6	[PowerPC] Compute the MMO offset for an unaligned load with signed arithmetic If you compute the MMO offset using unsigned arithmetic, you end up with a large positive offset instead of a small negative one. In theory, this could cause bad instruction-scheduling decisions later. I noticed this by inspection from the debug output, and using that for the regression test is the best I can do right now. llvm-svn: 246805	2015-09-03 21:12:15 +00:00
Hal Finkel	79dbf5b562	[PowerPC] Cleanup cost model for unaligned vector loads/stores I'm adding a regression test to better cover code generation for unaligned vector loads and stores, but there's no functional change to the code generation here. There is an improvement to the cost model for unaligned vector loads and stores, mostly for QPX (for which we were not previously accounting for the permutation-based loads), and the cost model implementation is cleaner. llvm-svn: 246712	2015-09-02 21:03:28 +00:00
Hal Finkel	77c8b7ffd3	[PowerPC] Don't always consider P8Altivec-only masks in LowerVECTOR_SHUFFLE LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the TableGen-generated matching code, and it does this by testing the same predicates used by the TableGen files. Unfortunately, when we added new P8Altivec-only predicates, we started universally testing them in LowerVECTOR_SHUFFLE, and if then matched when targeting a system prior to a P8, we'd end up with a selection failure. llvm-svn: 246675	2015-09-02 16:52:37 +00:00
Reid Kleckner	e00faf8ce1	[EH] Handle non-Function personalities like unknown personalities Also delete and simplify a lot of MachineModuleInfo code that used to be needed to handle personalities on landingpads. Now that the personality is on the LLVM Function, we no longer need to track it this way on MMI. Certainly it should not live on LandingPadInfo. llvm-svn: 246478	2015-08-31 20:02:16 +00:00
Hal Finkel	a2cdbce661	[PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands There were really two problems here. The first was that we had the truth tables for signed i1 comparisons backward. I imagine these are not very common, but if you have: setcc i1 x, y, LT this has the '0 1' and the '1 0' results flipped compared to: setcc i1 x, y, ULT because, in the signed case, '1 0' is really '-1 0', and the answer is not the same as in the unsigned case. The second problem was that we did not have patterns (at all) for the unsigned comparisons select_cc nodes for i1 comparison operands. This was the specific cause of PR24552. These had to be added (and a missing Altivec promotion added as well) to make sure these function for all types. I've added a bunch more test cases for these patterns, and there are a few FIXMEs in the test case regarding code-quality. Fixes PR24552. llvm-svn: 246400	2015-08-30 22:12:50 +00:00
Hal Finkel	982e8d48f8	[MIR Serialization] static -> static const in getSerializable*MachineOperandTargetFlags Make the arrays 'static const' instead of just 'static'. Post-commit review comment from Roman Divacky on IRC. NFC. llvm-svn: 246376	2015-08-30 08:07:29 +00:00
Hal Finkel	2d55698ed7	[PowerPC/MIR Serialization] Target flags serialization support Add support for MIR serialization of PowerPC-specific operand target flags (based on the generic infrastructure added in r244185 and r245383). I won't even pretend that this is good test coverage, but this includes the regression test associated with r246372. Adding an MIR test for that fix is far superior to adding an IR-level test because particular instruction-scheduling decisions are necessary in order to expose the bug, and using an MIR test we can start the pipeline post-scheduling. llvm-svn: 246373	2015-08-30 07:50:35 +00:00
Hal Finkel	d2fd9becf4	[PowerPC] Don't assume ADDISdtprelHA's source is r3 Even through ADDISdtprelHA generally has r3 as its source register, it is possible for the instruction scheduler to move things around such that some other register is the source. We need to print the actual source register, not always r3. Fixes PR24394. The test case will come in a follow-up commit because it depends on MIR target-flags parsing. llvm-svn: 246372	2015-08-30 07:44:05 +00:00
Hal Finkel	7ffe55ae9d	[PowerPC] Remove unnecessary braces in PPCVSXFMAMutate Address Eric's post-commit review of r245741. NFC. llvm-svn: 246121	2015-08-26 23:41:53 +00:00
Matthias Braun	ccfc9c8d6d	FastISel: Use finishCondBranch() for ARM,Mips,PowerPC FastISel Note that after this change branch probabilities are preserved now. llvm-svn: 245998	2015-08-26 01:55:47 +00:00
Hal Finkel	0f2ddcb83f	[PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends We might end up with a trivial copy as the addend, and if so, we should ignore the corresponding FMA instruction. The trivial copy can be coalesced away later, so there's nothing to do here. We should not, however, assert. Fixes PR24544. llvm-svn: 245907	2015-08-24 23:48:28 +00:00
Bill Schmidt	32fd189de2	[PPC64LE] Fix PR24546 - Swap optimization and debug values This patch fixes PR24546, which demonstrates a segfault during the VSX swap removal pass. The problem is that debug value instructions were not excluded from the list of instructions to be analyzed for webs of related computation. I've added the test case from the PR as a crash test in test/CodeGen/PowerPC. llvm-svn: 245862	2015-08-24 19:27:27 +00:00
Hal Finkel	ff9639d6b7	[PowerPC] PPCVSXFMAMutate should not segfault on undef input registers When PPCVSXFMAMutate would look at the input addend register, it would get its input value number. This would fail, however, if the register was undef, causing a segfault. Don't segfault (just skip such FMA instructions). Fixes the test case from PR24542 (although that may have been over-reduced). llvm-svn: 245741	2015-08-21 21:34:24 +00:00
Hal Finkel	9fdce9adee	[PowerPC] Fix value type on XVCMPEQDP for v2f64 comparisons XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs to be v2i64 (as that's the corresponding SETCC type). Fixes PR24225. llvm-svn: 245535	2015-08-20 03:02:02 +00:00
Hal Finkel	be78c25acb	[PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128 This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128 operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)), but shouldn't (it should only apply to f32/f64 types). The result was a crash. llvm-svn: 245530	2015-08-20 01:18:20 +00:00
Nemanja Ivanovic	5f1cea4141	Temporary fix for the self-host failures introduced by rL244921. This revision has introduced an issue that only affects bootstrapped compiler when it is printing the ASM. I am working on resolving the issue, but in the meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64 and the associated testing until I can get this fixed. llvm-svn: 245481	2015-08-19 19:04:47 +00:00
Chandler Carruth	2f1fd1658f	[PM] Port ScalarEvolution to the new pass manager. This change makes ScalarEvolution a stand-alone object and just produces one from a pass as needed. Making this work well requires making the object movable, using references instead of overwritten pointers in a number of places, and other refactorings. I've also wired it up to the new pass manager and added a RUN line to a test to exercise it under the new pass manager. This includes basic printing support much like with other analyses. But there is a big and somewhat scary change here. Prior to this patch ScalarEvolution was never actually invalidated!!! Re-running the pass just re-wired up the various other analyses and didn't remove any of the existing entries in the SCEV caches or clear out anything at all. This might seem OK as everything in SCEV that can uses ValueHandles to track updates to the values that serve as SCEV keys. However, this still means that as we ran SCEV over each function in the module, we kept accumulating more and more SCEVs into the cache. At the end, we would have a SCEV cache with every value that we ever needed a SCEV for in the entire module!!! Yowzers. The releaseMemory routine would dump all of this, but that isn't realy called during normal runs of the pipeline as far as I can see. To make matters worse, there is actually a key that we don't update with value handles -- there is a map keyed off of Loops. Because LoopInfo does* release its memory from run to run, it is entirely possible to run SCEV over one function, then over another function, and then lookup a Loop* from the second function but find an entry inserted for the first function! Ouch. To make matters still worse, there are plenty of updates that don't trip a value handle. It seems incredibly unlikely that today GVN or another pass that invalidates SCEV can update values in just such a way that a subsequent run of SCEV will incorrectly find lookups in a cache, but it is theoretically possible and would be a nightmare to debug. With this refactoring, I've fixed all this by actually destroying and recreating the ScalarEvolution object from run to run. Technically, this could increase the amount of malloc traffic we see, but then again it is also technically correct. ;] I don't actually think we're suffering from tons of malloc traffic from SCEV because if we were, the fact that we never clear the memory would seem more likely to have come up as an actual problem before now. So, I've made the simple fix here. If in fact there are serious issues with too much allocation and deallocation, I can work on a clever fix that preserves the allocations (while clearing the data) between each run, but I'd prefer to do that kind of optimization with a test case / benchmark that shows why we need such cleverness (and that can test that we actually make it faster). It's possible that this will make some things faster by making the SCEV caches have higher locality (due to being significantly smaller) so until there is a clear benchmark, I think the simple change is best. Differential Revision: http://reviews.llvm.org/D12063 llvm-svn: 245193	2015-08-17 02:08:17 +00:00
Saleem Abdulrasool	3e190cb098	PowerPC: remove dead initialization (NFC) Identified by the clang static analyzer. No functional change intended. llvm-svn: 245022	2015-08-14 03:48:35 +00:00
Nemanja Ivanovic	1c39ca6501	Scalar to vector conversions using direct moves This patch corresponds to review: http://reviews.llvm.org/D11471 It improves the code generated for converting a scalar to a vector value. With direct moves from GPRs to VSRs, we no longer require expensive stack operations for this. Subsequent patches will handle the reverse case and more general operations between vectors and their scalar elements. llvm-svn: 244921	2015-08-13 17:40:44 +00:00
Alex Lorenz	e40c8a2b26	PseudoSourceValue: Replace global manager with a manager in a machine function. This commit removes the global manager variable which is responsible for storing and allocating pseudo source values and instead it introduces a new manager class named 'PseudoSourceValueManager'. Machine functions now own an instance of the pseudo source value manager class. This commit also modifies the 'get...' methods in the 'MachinePointerInfo' class to construct pseudo source values using the instance of the pseudo source value manager object from the machine function. This commit updates calls to the 'get...' methods from the 'MachinePointerInfo' class in a lot of different files because those calls now need to pass in a reference to a machine function to those methods. This change will make it easier to serialize pseudo source values as it will enable me to transform the mips specific MipsCallEntry PseudoSourceValue subclass into two target independent subclasses. Reviewers: Akira Hatanaka llvm-svn: 244693	2015-08-11 23:09:45 +00:00
Cameron Esfahani	f97999dc46	Explicitly clear the MI operand list when getInstruction() is called. Call MI.clear() within MCD::OPC_Decode case and inside of translateInstruction() for the X86 target. Remove now unnecessary MI.clear() from ARMDisassembler. Summary: Explicitly clear the MI operand list when getInstruction() is called. Reviewers: hfinkel, t.p.northover, hvarga, kparzysz, jyknight, qcolombet, uweigand Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11665 llvm-svn: 244557	2015-08-11 01:15:07 +00:00
Benjamin Kramer	df005cbe19	Fix some comment typos. llvm-svn: 244402	2015-08-08 18:27:36 +00:00
Pete Cooper	ebcd748927	Convert a bunch of loops to foreach. NFC. After r244074, we now have a successors() method to iterate over all the successors of a TerminatorInst. This commit changes a bunch of eligible loops to use it. llvm-svn: 244260	2015-08-06 20:22:46 +00:00
Chandler Carruth	93205eb966	[TTI] Make the cost APIs in TargetTransformInfo consistently use 'int' rather than 'unsigned' for their costs. For something like costs in particular there is a natural "negative" value, that of savings or saved cost. As a consequence, there is a lot of code that subtracts or creates negative values based on cost, all of which is prone to awkwardness or bugs when dealing with an unsigned type. Similarly, we never want these values to wrap, as that would cause Very Bad code generation (likely percieved as an infinite loop as we try to emit over 2^32 instructions or some such insanity). All around 'int' seems a much better fit for these basic metrics. I've added asserts to ensure that at least the TTI interface never returns negative numbers here. If we ever have a use case for negative numbers, we can remove this, but this way a bug where someone used '-1' to produce a 'very large' cost will be caught by the assert. This passes all tests, and is also UBSan clean. No functional change intended. Differential Revision: http://reviews.llvm.org/D11741 llvm-svn: 244080	2015-08-05 18:08:10 +00:00
Bill Schmidt	42ddd71120	[PPC] Fix PR24216: Don't generate splat for misaligned shuffle mask Given certain shuffle-vector masks, LLVM emits splat instructions which splat the wrong bytes from the source register. The issue is that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp does not ensure that the splat pattern found is requesting bytes that are aligned on an EltSize boundary. This patch detects this situation as not a valid splat mask, resulting in a permute being generated instead of a splat. Patch and test case by Tyler Kenney, cleaned up a bit by me. This is a simple bug fix that would be good to incorporate into 3.7. llvm-svn: 243519	2015-07-29 14:31:57 +00:00
Sanjay Patel	1dd15598cf	fix TLI's combineRepeatedFPDivisors interface to return the minimum user threshold This fix was suggested as part of D11345 and is part of fixing PR24141. With this change, we can avoid walking the uses of a divisor node if the target doesn't want the combineRepeatedFPDivisors transform in the first place. There is no NFC-intended other than that. Differential Revision: http://reviews.llvm.org/D11531 llvm-svn: 243498	2015-07-28 23:05:48 +00:00
Chih-Hung Hsieh	1e859582d6	Implement target independent TLS compatible with glibc's emutls.c. The 'common' section TLS is not implemented. Current C/C++ TLS variables are not placed in common section. DWARF debug info to get the address of TLS variables is not generated yet. clang and driver changes in http://reviews.llvm.org/D10524 Added -femulated-tls flag to select the emulated TLS model, which will be used for old targets like Android that do not support ELF TLS models. Added TargetLowering::LowerToTLSEmulatedModel as a target-independent function to convert a SDNode of TLS variable address to a function call to __emutls_get_address. Added into lib/Target//ISelLowering.cpp to call LowerToTLSEmulatedModel for TLSModel::Emulated. Although all targets supporting ELF TLS models are enhanced, emulated TLS model has been tested only for Android ELF targets. Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for emulated TLS variables. Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls. TODO: Add proper DIE for emulated TLS variables. Added new unit tests with emulated TLS. Differential Revision: http://reviews.llvm.org/D10522 llvm-svn: 243438	2015-07-28 16:24:05 +00:00
Colin LeMahieu	fe2c8b8015	[llvm-mc] Pushing plumbing through for --fatal-warnings flag. llvm-svn: 243334	2015-07-27 21:56:53 +00:00
Pete Cooper	2e20147403	Revert "Add const to some Type* parameters which didn't need to be mutable. NFC." This reverts commit r243146. Feedback from Craig Topper and David Blaikie was that we don't put const on Type as it has no mutable state. llvm-svn: 243282	2015-07-27 17:15:24 +00:00
Eric Christopher	f0024d14f1	Fix PPCMaterializeInt to check the size of the integer based on the extension property we're requesting - zero or sign extended. This fixes cases where we want to return a zero extended 32-bit -1 and not be sign extended for the entire register. Also updated the already out of date comment with the current behavior. llvm-svn: 243192	2015-07-25 00:48:08 +00:00
Eric Christopher	03df7ac8a9	PPCMaterializeInt should only take a ConstantInt so represent this in the prototype and fix up all uses. llvm-svn: 243191	2015-07-25 00:48:06 +00:00
Pete Cooper	098f7c1fcb	Add const to some Type* parameters which didn't need to be mutable. NFC. We were only getting the size of the type which doesn't need to modify the type. llvm-svn: 243146	2015-07-24 19:19:26 +00:00
Pete Cooper	0debbdc872	Use foreach loops for StructType::elements(). NFC. We had a few places where we did for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) { but those could instead do for (auto *EltTy : STy->elements()) { llvm-svn: 243136	2015-07-24 18:55:49 +00:00
Bill Schmidt	2be8054b49	[PPC64LE] More vector swap optimization TLC This makes one substantive change and a few stylistic changes to the VSX swap optimization pass. The substantive change is to permit LXSDX and LXSSPX instructions to participate in swap optimization computations. The previous change to insert a swap following a SUBREG_TO_REG widening operation makes this almost trivial. I experimented with also permitting STXSDX and STXSSPX instructions. This can be done using similar techniques: we could insert a swap prior to a narrowing COPY operation, and then permit these stores to participate. I prototyped this, but discovered that the pattern of a narrowing COPY followed by an STXSDX does not occur in any of our test-suite code. So instead, I added commentary indicating that this could be done. Other TLC: - I changed SH_COPYSCALAR to SH_COPYWIDEN to more clearly indicate the direction of the copy. - I factored the insertion of swap instructions into a separate function. Finally, I added a new test case to check that the scalar-to-vector loads are working properly with swap optimization. llvm-svn: 242838	2015-07-21 21:40:17 +00:00
JF Bastien	e4d22d59d1	Targets: commonize some stack realignment code This patch does the following: * Fix FIXME on `needsStackRealignment`: it is now shared between multiple targets, implemented in `TargetRegisterInfo`, and isn't `virtual` anymore. This will break out-of-tree targets, silently if they used `virtual` and with a build error if they used `override`. * Factor out `canRealignStack` as a `virtual` function on `TargetRegisterInfo`, by default only looks for the `no-realign-stack` function attribute. Multiple targets duplicated the same `needsStackRealignment` code: - Aarch64. - ARM. - Mips almost: had extra `DEBUG` diagnostic, which the default implementation now has. - PowerPC. - WebAssembly. - x86 almost: has an extra `-force-align-stack` option, which the default implementation now has. The default implementation of `needsStackRealignment` used to just return `false`. My current patch changes the behavior by simply using the above shared behavior. This affects: - AMDGPU - BPF - CppBackend - MSP430 - NVPTX - Sparc - SystemZ - XCore - Out-of-tree targets This is a breaking change! `make check` passes. The only implementation of the `virtual` function (besides the slight different in x86) was Hexagon (which did `MF.getFrameInfo()->getMaxAlignment() > 8`), and potentially some out-of-tree targets. Hexagon now uses the default implementation. `needsStackRealignment` was being overwritten in `<Target>GenRegisterInfo.inc`, to return `false` as the default also did. That was odd and is now gone. Reviewers: sunfish Subscribers: aemerson, llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D11160 llvm-svn: 242727	2015-07-20 22:51:32 +00:00
Bill Schmidt	54cced54a6	[PowerPC] v4i32 is a VSRCRegClass I was looking at some vector code generation and kept seeing unnecessary vector copies into the Altivec half of the VSX registers. I discovered that we overlooked v4i32 when adding the register classes for VSX; we only added v4f32 and v2f64. This means that anything that canonicalizes into v4i32 (which is a LOT of stuff) ends up being forced into VRRC on its way to VSRC. The fix is one line. The rest of the patch is fixing up some test cases whose code generation has changed as a result. This seems like it would be a good candidate for backport to 3.7. llvm-svn: 242442	2015-07-16 21:14:07 +00:00
Mehdi Amini	bd7287ebe5	Move most user of TargetMachine::getDataLayout to the Module one Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. This patch is quite boring overall, except for some uglyness in ASMPrinter which has a getDataLayout function but has some clients that use it without a Module (llmv-dsymutil, llvm-dwarfdump), so some methods are taking a DataLayout as parameter. Reviewers: echristo Subscribers: yaron.keren, rafael, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11090 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 242386	2015-07-16 06:11:10 +00:00
Bill Schmidt	1e77bb12b4	[PPC64LE] Fix vec_sld semantics for little endian The vec_sld interface provides access to the vsldoi instruction. Unlike most of the vec_* interfaces, we do not attempt to change the generated code for vec_sld based on the endian mode. It is too difficult to correctly infer the desired semantics because of different element types, and the corrected instruction sequence is expensive, involving loading a permute control vector and performing a generalized permute. For GCC, this was implemented as "Don't touch the vec_sld" implementation. When it came time for the LLVM implementation, I did the same thing. However, this was hasty and incorrect. In LLVM's version of altivec.h, vec_sld was previously defined in terms of the vec_perm interface. Because vec_perm semantics are adjusted for little endian, this means that leaving vec_sld untouched causes it to generate something different for LE than for BE. Not good. This back-end patch accompanies the changes to altivec.h that change vec_sld's behavior for little endian. Those changes mean that we see slightly different code in the back end when trying to recognize a VSLDOI instruction in isVSLDOIShuffleMask. In particular, a ShuffleKind of 1 (where the two inputs are identical) must now be treated the same way as a ShuffleKind of 2 (little endian with different inputs) when little endian mode is in force. This is because ShuffleKind of 1 is defined using big-endian numbering. This has a ripple effect on LowerBUILD_VECTOR, where we create our own internal VSLDOI instructions. Because these are a ShuffleKind of 1, they will now have their shift amounts subtracted from 16 when recognizing the shuffle mask. To avoid problems we have to subtract them from 16 again before creating the VSLDOI instructions. There are a couple of other uses of BuildVSLDOI, but these do not need to be modified because the shift amount is 8, which is unchanged when subtracted from 16. llvm-svn: 242296	2015-07-15 15:45:30 +00:00
Benjamin Kramer	c11fd3e775	[PPC] Disassemble little endian ppc instructions in the right byte order PR24122. The test is simply a byte swapped version of ppc64-encoding.txt. llvm-svn: 242288	2015-07-15 12:56:19 +00:00
Hal Finkel	5d36b230b5	[PowerPC] Use the MachineCombiner to reassociate fadd/fmul This is a direct port of the code from the X86 backend (r239486/r240361), which uses the MachineCombiner to reassociate (floating-point) adds/muls to increase ILP, to the PowerPC backend. The rationale is the same. There is a lot of copy-and-paste here between the X86 code and the PowerPC code, and we should extract at least some of this into CodeGen somewhere. However, I don't want to do that until this code is enhanced to handle FMAs as well. After that, we'll be in a better position to extract the common parts. llvm-svn: 242279	2015-07-15 08:23:05 +00:00
Hal Finkel	673b493e98	[PowerPC] Extend physical register live range in PPCVSXFMAMutate If the source of the copy that defines the addend is a physical register, then its existing live range may not extend to the FMA being mutated. Make sure we extend the live range of the register to meet the FMA because it will become its operand in this case. I don't have an independent test case, but it will be exposed by change to be committed shortly enabling the use of the machine combiner to do fadd/fmul reassociation, and will be covered by one of the associated regression tests. llvm-svn: 242278	2015-07-15 08:23:03 +00:00
Hal Finkel	4012024fea	[PowerPC] Support symbolic targets in patchpoints Follow-up r235483, with the corresponding support in PPC. We use a regular call for symbolic targets (because they're much cheaper than indirect calls). llvm-svn: 242239	2015-07-14 22:53:11 +00:00
Hal Finkel	9bbad03b98	[PowerPC] Use the ABI indirect-call protocol for patchpoints We used to take the address specified as the direct target of the patchpoint and did no TOC-pointer handling. This, however, as not all that useful, because MCJIT tends to create a lot of modules, and they have their own TOC sections. Thus, to call from the generated code to other generated code, you really need to switch TOC pointers. Make this work as expected, and under ELFv1, tread the address as the function descriptor address so that the correct TOC pointer can be loaded. llvm-svn: 242217	2015-07-14 22:26:06 +00:00
Pete Cooper	65c69407c8	Add allnodes() iterator range to SelectionDAG. NFC. SelectionDAG already had begin/end methods for iterating over all the nodes, but didn't define an iterator_range for us in foreach loops. This adds such a method and uses it in some of the eligible places throughout the backends. llvm-svn: 242212	2015-07-14 22:10:54 +00:00
Hal Finkel	8acae5276e	[PowerPC] Fix the PPCInstrInfo::getInstrLatency implementation PowerPC uses itineraries to describe processor pipelines (and dispatch-group restrictions for P7/P8 cores). Unfortunately, the target-independent implementation of TII.getInstrLatency calls ItinData->getStageLatency, and that looks for the largest cycle count in the pipeline for any given instruction. This, however, yields the wrong answer for the PPC itineraries, because we don't encode the full pipeline. Because the functional units are fully pipelined, we only model the initial stages (there are no relevant hazards in the later stages to model), and so the technique employed by getStageLatency does not really work. Instead, we should take the maximum output operand latency, and that's what PPCInstrInfo::getInstrLatency now does. This caused some test-case churn, including two unfortunate side effects. First, the new arrangement of copies we get from function parameters now sometimes blocks VSX FMA mutation (a FIXME has been added to the code and the test cases), and we have one significant test-suite regression: SingleSource/Benchmarks/BenchmarkGame/spectral-norm 56.4185% +/- 18.9398% In this benchmark we have a loop with a vectorized FP divide, and it with the new scheduling both divides end up in the same dispatch group (which in this case seems to cause a problem, although why is not exactly clear). The grouping structure is hard to predict from the bottom of the loop, and there may not be much we can do to fix this. Very few other test-suite performance effects were really significant, but almost all weakly favor this change. However, in light of the issues highlighted above, I've left the old behavior available via a command-line flag. llvm-svn: 242188	2015-07-14 20:02:02 +00:00
Matthias Braun	9912bb817c	MachineRegisterInfo: Remove UsedPhysReg infrastructure We have a detailed def/use lists for every physical register in MachineRegisterInfo anyway, so there is little use in maintaining an additional bitset of which ones are used. Removing it frees us from extra book keeping. This simplifies VirtRegMap. Differential Revision: http://reviews.llvm.org/D10911 llvm-svn: 242173	2015-07-14 17:52:07 +00:00
Nemanja Ivanovic	984a3613b3	Add missing builtins to the PPC back end for ABI compliance (vol. 4) This patch corresponds to review: http://reviews.llvm.org/D11183 Back end portion of the fourth round of additions to altivec.h. llvm-svn: 242167	2015-07-14 17:25:20 +00:00
Matthias Braun	0256486532	PrologEpilogInserter: Rewrite API to determine callee save regsiters. This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan(): - Rename the function to determineCalleeSaves() - Pass a bitset of callee saved registers by reference, thus avoiding the function-global PhysRegUsed bitset in MachineRegisterInfo. - Without PhysRegUsed the implementation is fine tuned to not save physcial registers which are only read but never modified. Related to rdar://21539507 Differential Revision: http://reviews.llvm.org/D10909 llvm-svn: 242165	2015-07-14 17:17:13 +00:00
Bill Schmidt	15deb803b4	[PPC64LE] More improvements to VSX swap optimization This patch allows VSX swap optimization to succeed more frequently. Specifically, it is concerned with common code sequences that occur when copying a scalar floating-point value to a vector register. This patch currently handles cases where the floating-point value is already in a register, but does not yet handle loads (such as via an LXSDX scalar floating-point VSX load). That will be dealt with later. A typical case is when a scalar value comes in as a floating-point parameter. The value is copied into a virtual VSFRC register, and then a sequence of SUBREG_TO_REG and/or COPY operations will convert it to a full vector register of the class required by the context. If this vector register is then used as part of a lane-permuted computation, the original scalar value will be in the wrong lane. We can fix this by adding a swap operation following any widening SUBREG_TO_REG operation. Additional COPY operations may be needed around the swap operation in order to keep register assignment happy, but these are pro forma operations that will be removed by coalescing. If a scalar value is otherwise directly referenced in a computation (such as by one of the many XS* vector-scalar operations), we currently disable swap optimization. These operations are lane-sensitive by definition. A MentionsPartialVR flag is added for use in each swap table entry that mentions a scalar floating-point register without having special handling defined. A common idiom for PPC64LE is to convert a double-precision scalar to a vector by performing a splat operation. This ensures that the value can be referenced as V[0], as it would be for big endian, whereas just converting the scalar to a vector with a SUBREG_TO_REG operation leaves this value only in V[1]. A doubleword splat operation is one form of an XXPERMDI instruction, which takes one doubleword from a first operand and another doubleword from a second operand, with a two-bit selector operand indicating which doublewords are chosen. In the general case, an XXPERMDI can be permitted in a lane-swapped region provided that it is properly transformed to select the corresponding swapped values. This transformation is to reverse the order of the two input operands, and to reverse and complement the bits of the selector operand (derivation left as an exercise to the reader ;). A new test case that exercises the scalar-to-vector and generalized XXPERMDI transformations is added as CodeGen/PowerPC/swaps-le-5.ll. The patch also requires a change to CodeGen/PowerPC/swaps-le-3.ll to use CHECK-DAG instead of CHECK for two independent instructions that now appear in reverse order. There are two small unrelated changes that are added with this patch. First, the XXSLDWI instruction was incorrectly omitted from the list of lane-sensitive instructions; this is now fixed. Second, I observed that the same webs were being rejected over and over again for different reasons. Since it's sufficient to reject a web only once, I added a check for this to speed up the compilation time slightly. llvm-svn: 242081	2015-07-13 22:58:19 +00:00
Hal Finkel	cbf08925ef	[PowerPC] Make use of the TargetRecip system r238842 added the TargetRecip system for controlling use of reciprocal estimates for sqrt and division using a set of parameters that can be set by the frontend. Clang now supports a sophisticated -mrecip option, and this will allow that option to effectively control the relevant code-generation functionality of the PPC backend. llvm-svn: 241985	2015-07-12 02:33:57 +00:00
Hal Finkel	965cea5670	[PowerPC] Support the nest parameter attribute This adds support for the 'nest' attribute, which allows the static chain register to be set for functions calls under non-Darwin PPC/PPC64 targets. r11 is the chain register (which the PPC64 ELF ABI calls the "environment pointer"). For indirect calls under PPC64 ELFv1, this would normally be loaded from the function descriptor, but providing an explicit 'nest' parameter will override that process and use the value provided. This allows __builtin_call_with_static_chain to work as expected on PowerPC. llvm-svn: 241984	2015-07-12 00:37:44 +00:00
Duncan P. N. Exon Smith	754e21f244	MC: Remove MCSubtargetInfo() default constructor Force all creators of `MCSubtargetInfo` to immediately initialize it, merging the default constructor and the initializer into an initializing constructor. Besides cleaning up the code a little, this makes it clear that the initializer is never called again later. Out-of-tree backends need a trivial change: instead of calling: auto *X = new MCSubtargetInfo(); InitXYZMCSubtargetInfo(X, ...); return X; they should call: return createXYZMCSubtargetInfoImpl(...); There's no real functionality change here. llvm-svn: 241957	2015-07-10 22:43:42 +00:00
JF Bastien	b73a2ed20e	Target RegisterInfo: devirtualize TargetFrameLowering Summary: The target frame lowering's concrete type is always known in RegisterInfo, yet it's only sometimes devirtualized through a static_cast. This change adds an auto-generated static function <Target>GenRegisterInfo::getFrameLowering(const MachineFunction &MF) which does this devirtualization, and uses this function in all targets which can. This change was suggested by sunfish in D11070 for WebAssembly, I figure that I may as well improve the other targets while I'm here. Subscribers: sunfish, ted, llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D11093 llvm-svn: 241921	2015-07-10 18:13:17 +00:00
Nemanja Ivanovic	d9e4b4ff36	NFC. Added a blank line for consistency. llvm-svn: 241913	2015-07-10 14:25:17 +00:00
Nemanja Ivanovic	5655fb320c	Add missing builtins to the PPC back end for ABI compliance (vol. 3) This patch corresponds to review: http://reviews.llvm.org/D10973 Back end portion of the third round of additions to altivec.h. llvm-svn: 241900	2015-07-10 12:38:08 +00:00
Pat Gavlin	a717f255b6	Allow {e,r}bp as the target of {read,write}_register. This patch allows the read_register and write_register intrinsics to read/write the RBP/EBP registers on X86 iff the targeted register is the frame pointer for the containing function. Differential Revision: http://reviews.llvm.org/D10977 llvm-svn: 241827	2015-07-09 17:40:29 +00:00
Mehdi Amini	eaabc51e78	Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user A documentation for this function would be nice by the way. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241807	2015-07-09 15:12:23 +00:00
Mehdi Amini	157e5a6d10	Remove getDataLayout() from TargetSelectionDAGInfo (had no users) Summary: Remove empty subclass in the process. This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren, ted Differential Revision: http://reviews.llvm.org/D11045 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241780	2015-07-09 02:10:08 +00:00
Mehdi Amini	a749f2ad47	Remove getDataLayout() from TargetLowering Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: yaron.keren, rafael, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11042 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241779	2015-07-09 02:09:52 +00:00
Mehdi Amini	0cdec1e2ab	Make isLegalAddressingMode() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11040 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241778	2015-07-09 02:09:40 +00:00
Mehdi Amini	5c183d5239	Make getByValTypeAlignment() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: yaron.keren, rafael, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11038 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241777	2015-07-09 02:09:28 +00:00
Mehdi Amini	9639d650bb	Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11037 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241776	2015-07-09 02:09:20 +00:00
Mehdi Amini	44ede33a69	Make TargetLowering::getPointerTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775	2015-07-09 02:09:04 +00:00
Mehdi Amini	5010ebf181	Make TargetTransformInfo keeping a reference to the Module DataLayout DataLayout is no longer optional. It was initialized with or without a DataLayout, and the DataLayout when supplied could have been the one from the TargetMachine. Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11021 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241774	2015-07-09 02:08:42 +00:00
Mehdi Amini	56228dabfa	Redirect DataLayout from TargetMachine to Module in ComputeValueVTs() Summary: Avoid using the TargetMachine owned DataLayout and use the Module owned one instead. This requires passing the DataLayout up the stack to ComputeValueVTs(). This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11019 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241773	2015-07-09 01:57:34 +00:00
Daniel Sanders	f423f5627c	Change the last few internal StringRef triples into Triple objects. Summary: This concludes the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. At this point, the StringRef-form of GNU Triples should only be used in the public API (including IR serialization) and a couple objects that directly interact with the API (most notably the Module class). The next step is to replace these Triple objects with the TargetTuple object that will represent our authoratative/unambiguous internal equivalent to GNU Triples. Reviewers: rengolin Subscribers: llvm-commits, jholewinski, ted, rengolin Differential Revision: http://reviews.llvm.org/D10962 llvm-svn: 241472	2015-07-06 16:56:07 +00:00
Daniel Sanders	fbdab437f0	Where Triple has a suitable predicate, use it rather than the enum values. NFC. Reviewers: mcrosier Subscribers: llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10960 llvm-svn: 241469	2015-07-06 16:33:18 +00:00
Peter Collingbourne	6a9d1774d0	IR: Do not consider available_externally linkage to be linker-weak. From the linker's perspective, an available_externally global is equivalent to an external declaration (per isDeclarationForLinker()), so it is incorrect to consider it to be a weak definition. Also clean up some logic in the dead argument elimination pass and clarify its comments to better explain how its behavior depends on linkage, introduce GlobalValue::isStrongDefinitionForLinker() and start using it throughout the optimizers and backend. Differential Revision: http://reviews.llvm.org/D10941 llvm-svn: 241413	2015-07-05 20:52:35 +00:00
Benjamin Kramer	9bfb627a0e	[TargetLowering] StringRefize asm constraint getters. There is some functional change here because it changes target code from atoi(3) to StringRef::getAsInteger which has error checking. For valid constraints there should be no difference. llvm-svn: 241411	2015-07-05 19:29:18 +00:00
Nemanja Ivanovic	d358b8f80d	Add missing builtins to the PPC back end for ABI compliance (vol. 2) This patch corresponds to review: http://reviews.llvm.org/D10874 Back end portion of the second round of additions to altivec.h. llvm-svn: 241398	2015-07-05 06:03:51 +00:00
Bill Schmidt	a1c30053e7	[PPC64LE] Remove implicit-subreg restriction from VSX swap removal In r241285, I removed the SUBREG_TO_REG restriction from VSX swap removal, determining that this was overly conservative. We have another form of the same restriction in that we check for the presence of implicit subregs in vector operations. As with SUBREG_TO_REG for partial register conversions, an implicit subreg is safe in and of itself, provided no other operation makes a lane-sensitive assumption about the result. This patch removes that restriction, by removing the HasImplicitSubreg flag and all code that relies on it. I've added a test case that fails to optimize before this patch is applied, and optimizes properly with the patch. Test based on a report from Anton Blanchard. llvm-svn: 241290	2015-07-02 19:01:22 +00:00
Bill Schmidt	7c691fee1c	[PPC64LE] Teach swap optimization about the doubleword splat idiom With a previous patch, the VSX swap optimization is able to recognize the doubleword load-splat idiom that can be implemented using lxvdsx. However, that does not cover a doubleword splat where the source is a register. We can implement this using xxspltd (a special form of xxpermdi). This patch teaches the swap optimization pass about this idiom. As a prerequisite, it also permits swap optimization to succeed for all forms of SUBREG_TO_REG. Previously we were conservative and only allowed SUBREG_TO_REG when it copied a full register. However, on reflection any form of SUBREG_TO_REG is safe in and of itself, so long as an unsafe operation is not performed on its result. In particular, a widening SUBREG_TO_REG often occurs as an input to a doubleword splat idiom, particularly in auto-vectorized code. The doubleword splat idiom is an XXPERMDI operation where both source registers are identical, and the selection mask is either 0 (splat the first element) or 3 (splat the second element). To determine whether the registers are identical, we use the existing mechanism for looking through "copy-like" operations. That mechanism has a side effect of marking the XXPERMDI operation as using a physical register, which would invalidate its presence in a swap-optimized region. This is correct for the form of XXPERMDI that performs a swap and hence would be removed, but is not what we want for a doubleword-splat variety of XXPERMDI. Therefore we reset the physical-register flag on the XXPERMDI when it represents a splat. A simple test case is added to verify that we generate the splat and that we also remove the xxswapd instructions that would otherwise be associated with the load and store of another operand. llvm-svn: 241285	2015-07-02 17:03:06 +00:00
Bill Schmidt	ae94f11d55	[PPC64LE] Enable missing lxvdsx optimization, and related swap optimization When adding little-endian vector support for PowerPC last year, I inadvertently disabled an optimization that recognizes a load-splat idiom and generates the lxvdsx instruction. This patch moves the offending logic so lxvdsx is once again generated. This pattern is frequently generated by the vectorizer for scalar loads of an effective constant. Previously the lxvdsx instruction was wrongly listed as lane-sensitive for the VSX swap optimization (since both doublewords are identical, swaps are safe). This patch fixes this as well, so that vectorized code using lxvdsx can now have swaps removed from the computation. There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll that checks for the missing optimization. However, vsx.ll was only being tested for POWER7 with big-endian code generation. I've added a little-endian RUN statement and expected LE code generation for all the tests in vsx.ll to give us a bit better VSX coverage, including what's needed for this patch. llvm-svn: 241183	2015-07-01 19:40:07 +00:00
Nemanja Ivanovic	7df26c9b6f	Modified a comment about the reason for the patch (removed commented code). llvm-svn: 241110	2015-06-30 20:01:16 +00:00
Nemanja Ivanovic	9c8d4cf272	Fixes a bug with __builtin_vsx_lxvdw4x on Little Endian systems llvm-svn: 241108	2015-06-30 19:45:45 +00:00
Ranjeet Singh	86ecbb7b54	Reverting r241058 because it's causing buildbot failures. llvm-svn: 241061	2015-06-30 12:32:53 +00:00
Ranjeet Singh	5b119091a1	There are a few places where subtarget features are still represented by uint64_t, this patch replaces these usages with the FeatureBitset (std::bitset) type. Differential Revision: http://reviews.llvm.org/D10542 llvm-svn: 241058	2015-06-30 11:30:42 +00:00
Nemanja Ivanovic	f502a428e6	Add missing builtins to the PPC back end for ABI compliance (vol. 1) This patch corresponds to review: http://reviews.llvm.org/D10638 This is the back end portion of patch http://reviews.llvm.org/D10637 It just adds the code gen and intrinsic functions necessary to support that patch to the back end. llvm-svn: 240820	2015-06-26 19:26:53 +00:00
NAKAMURA Takumi	520b45df84	PPCISelLowering.cpp: Appease PR23956. [-Wdocumentation] llvm-svn: 240727	2015-06-25 23:38:44 +00:00

1 2 3 4 5 ...

4599 Commits