llvm-project

Commit Graph

Author	SHA1	Message	Date
Benjamin Kramer	c6cc58e703	Remove unnecessary copying or replace it with moves in a bunch of places. NFC. llvm-svn: 219061	2014-10-04 16:55:56 +00:00
Duncan P. N. Exon Smith	176b691d32	Revert "Revert "DI: Fold constant arguments into a single MDString"" This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010	2014-10-03 20:01:09 +00:00
Benjamin Kramer	e12a6bac32	Eliminate some deep std::vector copies. NFC. llvm-svn: 218999	2014-10-03 18:33:16 +00:00
Duncan P. N. Exon Smith	786cd049fc	Revert "DI: Fold constant arguments into a single MDString" This reverts commit r218914 while I investigate some bots. llvm-svn: 218918	2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith	571f97bd90	DI: Fold constant arguments into a single MDString This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 218914	2014-10-02 21:56:57 +00:00
Sanjay Patel	12d1ce5408	Optimize square root squared (PR21126). When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X. This can happen in the real world when calculating x ** 3/2. This occurs in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c. Differential Revision: http://reviews.llvm.org/D5584 llvm-svn: 218906	2014-10-02 21:10:54 +00:00
Sanjay Patel	b41d46118a	Use the local variable that other clauses around here are already using. llvm-svn: 218876	2014-10-02 15:20:45 +00:00
Zinovy Nis	ccc3e3733b	[BUG][INDVAR] Fix for PR21014: wrong SCEV operands commuting for non-commutative instructions My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]' thus inverting index expression. This patch fixes it. Thanks to Jörg Sonnenberger for pointing. Differential Revision: http://reviews.llvm.org/D5576 llvm-svn: 218867	2014-10-02 13:01:15 +00:00
Duncan P. N. Exon Smith	611afb229c	DIBuilder: Encapsulate DIExpression's element type `DIExpression`'s elements are 64-bit integers that are stored as `ConstantInt`. The accessors already encapsulate the storage. This commit updates the `DIBuilder` API to also encapsulate that. llvm-svn: 218797	2014-10-01 20:26:08 +00:00
Adrian Prantl	87b7eb9d0f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Adrian Prantl	b458dc2eee	Revert r218778 while investigating buldbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	25a7174e7a	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Tom Stellard	0a4e9a3b25	C API: Add LLVMCloneModule() llvm-svn: 218775	2014-10-01 17:14:57 +00:00
Evgeniy Stepanov	815f2869ad	Revert r218721, r218735. Failing bootstrap on Linux (arm, x86). http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470 http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518 llvm-svn: 218752	2014-10-01 10:07:28 +00:00
Gerolf Hoflehner	19fc3dafc8	[InstCombine] Fix for assert build failures caused by r218721 The icmp-select-icmp optimization made the implicit assumption that the select-icmp instructions are in the same block and asserted on it. The fix explicitly checks for that condition and conservatively suppresses the optimization when it is violated. llvm-svn: 218735	2014-10-01 03:24:39 +00:00
Gerolf Hoflehner	08cc4b950c	[InstCombine] Optimize icmp-select-icmp In special cases select instructions can be eliminated by replacing them with a cheaper bitwise operation even when the select result is used outside its home block. The instances implemented are patterns like %x=icmp.eq %y=select %x,%r, null %z=icmp.eq\|neq %y, null br %z,true, false ==> %x=icmp.ne %y=icmp.eq %r,null %z=or %x,%y br %z,true,false The optimization is integrated into the instruction combiner and performed only when all uses of the select result can be replaced by the select operand proper. For this dominator information is used and dominance is now a required analysis pass in the combiner. The optimization itself is iterative. The critical step is to replace the select result with the non-constant select operand. So the select becomes local and the combiner iteratively works out simpler code pattern and eventually eliminates the select. rdar://17853760 llvm-svn: 218721	2014-10-01 00:13:22 +00:00
Jingyue Wu	fc0296704c	[SimplifyCFG] threshold for folding branches with common destination Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold. The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example: __global__ void foo(int a, int b, int c, int d, int e, int n, const int input, int output) { int sum = 0; for (int i = 0; i < n; ++i) sum += (((i ^ a) > b) && (((i \| c ) ^ d) > e)) ? 0 : input[i]; *output = sum; } The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i \| c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i \| c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like: sum += (((i ^ a) > b) & (((i \| c ) ^ d) > e)) ? 0 : input[i]; Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%. Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%. We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks! Test Plan: Added one test case to check the threshold is in effect Reviewers: nadav, eliben, meheff, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D5529 llvm-svn: 218711	2014-09-30 22:23:38 +00:00
Lorenzo Martignoni	40d3deeb7d	Introduce support for custom wrappers for vararg functions. Differential Revision: http://reviews.llvm.org/D5412 llvm-svn: 218671	2014-09-30 12:33:16 +00:00
Chad Rosier	aab5d7bd33	[IndVarSimplify] Widen loop unsigned compares. This patch extends r217953 to handle unsigned comparison. Phabricator revision: http://reviews.llvm.org/D5526 llvm-svn: 218659	2014-09-30 03:17:42 +00:00
Kevin Qin	fc02e3c363	Use a loop to simplify the runtime unrolling prologue. Runtime unrolling will create a prologue to execute the extra iterations which is can't divided by the unroll factor. It generates an if-then-else sequence to jump into a factor -1 times unrolled loop body, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: if (extraiters == loopfactor) jump L1 if (extraiters == loopfactor-1) jump L2 ... L1: LoopBody; L2: LoopBody; ... if tripcount < loopfactor jump End Loop: ... End: It means if the unroll factor is 4, the loop body will be 7 times unrolled, 3 are in loop prologue, and 4 are in the loop. This commit is to use a loop to execute the extra iterations in prologue, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: else jump Prol Prol: LoopBody; extraiters -= 1 // Omitted if unroll factor is 2. if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2. if (tripcount < loopfactor) jump End Loop: ... End: Then when unroll factor is 4, the loop body will be copied by only 5 times, 1 in the prologue loop, 4 in the original loop. And if the unroll factor is 2, new loop won't be created, just as the original solution. llvm-svn: 218604	2014-09-29 11:15:00 +00:00
Chad Rosier	7b974b73ae	[IndVar] Don't widen loop compare unless IV user is sign extended. PR21030 llvm-svn: 218539	2014-09-26 20:05:35 +00:00
Kostya Serebryany	34ddf8725c	[asan] don't instrument module CTORs that may be run before asan.module_ctor. This fixes asan running together -coverage llvm-svn: 218421	2014-09-24 22:41:55 +00:00
David Peixotto	0d4d5e64ec	Fix assertion in LICM doFinalization() The doFinalization method checks that the LoopToAliasSetMap is empty. LICM populates that map as it runs through the loop nest, deleting the entries for child loops as it goes. However, if a child loop is deleted by another pass (e.g. unrolling) then the loop will never be deleted from the map because LICM walks the loop nest to find entries it can delete. The fix is to delete the loop from the map and free the alias set when the loop is deleted from the loop nest. Differential Revision: http://reviews.llvm.org/D5305 llvm-svn: 218387	2014-09-24 16:48:31 +00:00
Michael Liao	d120916ca7	Allow BB duplication threshold to be adjusted through JumpThreading's ctor - BB duplication may not be desired on targets where there is no or small branch penalty and code duplication needs restrict control. llvm-svn: 218375	2014-09-24 04:59:06 +00:00
Reid Kleckner	78927e884b	GlobalOpt: Preserve comdats of unoptimized initializers Rather than slurping in and splatting out the whole ctor list, preserve the existing array entries without trying to understand them. Only remove the entries that we know we can optimize away. This way we don't need to wire through priority and comdats or anything else we might add. Fixes a linker issue where the .init_array or .ctors entry would point to discarded initialization code if the comdat group from the TU with the faulty global_ctors entry was dropped. llvm-svn: 218337	2014-09-23 22:33:01 +00:00
Lenny Maiorani	9eefc81219	Using a deque to manage the stack of nodes is faster here. Vector is slow due to many reallocations as the size regularly changes in unpredictable ways. See the investigation provided on the mailing list for more information: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120116/135228.html llvm-svn: 218182	2014-09-20 13:29:20 +00:00
Eric Christopher	d85ffb1fc0	Add a new pass FunctionTargetTransformInfo. This pass serves as a shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004	2014-09-18 00:34:14 +00:00
David Blaikie	dba94ec3c7	Reapply fix in r217988 (reverted in r217989) and remove the alternative fix committed in r217987. This type isn't owned polymorphically (as demonstrated by making the dtor protected and everything still compiling) so just address the warning by protecting the base dtor and making the derived class final. llvm-svn: 217990	2014-09-17 22:27:36 +00:00
David Blaikie	d8978ec085	Revert "Fix -Wnon-virtual-dtor warning introduced in r217982." An alternative fix was already committed. This reverts commit r217988. llvm-svn: 217989	2014-09-17 22:17:59 +00:00
David Blaikie	20dd05ccfd	Fix -Wnon-virtual-dtor warning introduced in r217982. llvm-svn: 217988	2014-09-17 22:15:40 +00:00
Chris Bieneman	cf93cbb7a4	Fixing a build error. llvm-svn: 217983	2014-09-17 21:06:59 +00:00
Chris Bieneman	ad070d0588	Refactoring SimplifyLibCalls to remove static initializers and generally cleaning up the code. Summary: This eliminates ~200 lines of code mostly file scoped struct definitions that were unnecessary. Reviewers: chandlerc, resistor Reviewed By: resistor Subscribers: morisset, resistor, llvm-commits Differential Revision: http://reviews.llvm.org/D5364 llvm-svn: 217982	2014-09-17 20:55:46 +00:00
Chad Rosier	307b50b0f6	[IndVarSimplify] Partially revert r217953 to see if this fixes the bots. Specifically, disable widening of unsigned compare instructions. llvm-svn: 217962	2014-09-17 16:35:09 +00:00
Chad Rosier	bb99f40530	[IndVarSimplify] Widen loop compare instructions. This improves other optimizations such as LSR. A sext may be added to the compare's other operand, but this can often be hoisted outside of the loop. llvm-svn: 217953	2014-09-17 14:10:33 +00:00
Andrea Di Biagio	5b92b4971a	[InstCombine] Fix wrong folding of constant comparison involving ahsr and negative quantities (PR20945). Example: define i1 @foo(i32 %a) { %shr = ashr i32 -9, %a %cmp = icmp ne i32 %shr, -5 ret i1 %cmp } Before this fix, the instruction combiner wrongly thought that %shr could have never been equal to -5. Therefore, %cmp was always folded to 'true'. However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore, in this example, it is not valid to fold %cmp to 'true'. The problem was only affecting the case where the comparison was between negative quantities where one of the quantities was obtained from arithmetic shift of a negative constant. This patch fixes the problem with the wrong folding (fixes PR20945). With this patch, the 'icmp' from the example is now simplified to a comparison between %a and 1. This still allows us to get rid of the arithmetic shift (%shr). llvm-svn: 217950	2014-09-17 11:32:31 +00:00
Jingyue Wu	b67140b812	Remove dead code in SimplifyCFG Summary: UsedByBranch is always true according to how BonusInst is defined. Test Plan: Passes check-all, and also verified if (BonusInst && !UsedByBranch) { ... } is never entered during check-all. Reviewers: resistor, nadav, jingyue Reviewed By: jingyue Subscribers: llvm-commits, eliben, meheff Differential Revision: http://reviews.llvm.org/D5324 llvm-svn: 217824	2014-09-15 20:48:13 +00:00
Nick Lewycky	9e6d184803	Add control of function merging to the PMBuilder. llvm-svn: 217731	2014-09-13 21:46:00 +00:00
Benjamin Kramer	0bd147da17	Simplify code. No functionality change. llvm-svn: 217726	2014-09-13 12:38:49 +00:00
Juergen Ributzka	14ae60407d	[C API] Make the 'lower switch' pass available via the C API. llvm-svn: 217630	2014-09-11 21:32:32 +00:00
Hal Finkel	f83e1f7f66	[AlignmentFromAssumptions] Don't crash just because the target is 32-bit We used to crash processing any relevant @llvm.assume on a 32-bit target (because we'd ask SE to subtract expressions of differing types). I've copied our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get some meaningful test coverage here. llvm-svn: 217574	2014-09-11 08:40:17 +00:00
Rafael Espindola	c435adcde0	Add doInitialization/doFinalization to DataLayoutPass. With this a DataLayoutPass can be reused for multiple modules. Once we have doInitialization/doFinalization, it doesn't seem necessary to pass a Module to the constructor. Overall this change seems in line with the idea of making DataLayout a required part of Module. With it the only way of having a DataLayout used is to add it to the Module. llvm-svn: 217548	2014-09-10 21:27:43 +00:00
Hal Finkel	71b7084112	[AlignmentFromAssumptions] Don't divide by zero for unknown starting alignment The routine that determines an alignment given some SCEV returns zero if the answer is unknown. In a case where we could determine the increment of an AddRec but not the starting alignment, we would compute the integer modulus by zero (which is illegal and traps). Prevent this by returning early if either the start or increment alignment is unknown (zero). llvm-svn: 217544	2014-09-10 21:05:52 +00:00
Gerolf Hoflehner	008e5cdcba	[PassManager] Adding Hidden attribute to EnableMLSM option llvm-svn: 217539	2014-09-10 20:24:03 +00:00
Gerolf Hoflehner	24815d9b8f	[MergedLoadStoreMotion] Move pass enabling option to PassManagerBuilder llvm-svn: 217538	2014-09-10 19:55:29 +00:00
Sanjay Patel	b653de1ada	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Gerolf Hoflehner	e4f6684d1b	Removed misleading comment. llvm-svn: 217527	2014-09-10 17:54:50 +00:00
Stepan Dyatkovskiy	fe134cdfa7	MergeFunctions: FunctionPtr has been renamed to FunctionNode. It's supposed to store additional pass information for current function here. That was the reason for name change. llvm-svn: 217483	2014-09-10 10:08:25 +00:00
NAKAMURA Takumi	1ab0cf0e28	SampleProfile.cpp: Prune a stray \param added in r217437. [-Wdocumentation] llvm-svn: 217465	2014-09-09 22:44:30 +00:00
NAKAMURA Takumi	bb4fac9050	ScalarOpts/LLVMBuild.txt: Prune unused dependency to IPA. llvm-svn: 217448	2014-09-09 15:00:38 +00:00
NAKAMURA Takumi	37ffecf06b	ScalarOpts/LLVMBuild.txt: Reorder. llvm-svn: 217447	2014-09-09 15:00:26 +00:00

1 2 3 4 5 ...

11906 Commits