llvm-project

Commit Graph

Author	SHA1	Message	Date
Matthias Braun	36768c684f	LiveRangeEdit: Check for completely empy subranges after removing ValNos. Completely empty subranges are not allowed and must be removed when subreg liveness is enabled. llvm-svn: 224804	2014-12-24 02:11:46 +00:00
Matthias Braun	f603c88d13	LiveIntervalAnalysis: Fix performance bug that I introduced in r224663. Without a reference the code did not remember when moving the iterators of the subranges/registerunit ranges forward and instead would scan from the beginning again at the next position. llvm-svn: 224803	2014-12-24 02:11:43 +00:00
Adrian Prantl	3026a54aa2	Debug Info: In symmetry to DW_TAG_pointer_type, do not emit the byte size of a DW_TAG_ptr_to_member_type. This restores the behavior from before r224780-r224781. llvm-svn: 224799	2014-12-24 01:17:51 +00:00
Mehdi Amini	d38920891e	Always assert in DAGCombine and not only when -debug is enabled Right now in DAG Combine check the validity of the returned type only when -debug is given on the command line. However usually the test cases in the validation does not use -debug. An Assert build should always check this. llvm-svn: 224779	2014-12-23 18:59:02 +00:00
Michael Kuperstein	f4536ea6e8	[DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements This partially fixes PR21943. For AVX, we go from: vmovq (%rsi), %xmm0 vmovq (%rdi), %xmm1 vpermilps $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3] vinsertps $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3] vinsertps $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3] vpermilps $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3] vinsertps $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0] To the expected: vmovq (%rdi), %xmm0 vmovhpd (%rsi), %xmm0, %xmm0 retq Fixing this for AVX2 is still open. Differential Revision: http://reviews.llvm.org/D6749 llvm-svn: 224759	2014-12-23 08:59:45 +00:00
Reid Kleckner	ce0093344f	Make musttail more robust for vector types on x86 Previously I tried to plug musttail into the existing vararg lowering code. That turned out to be a mistake, because non-vararg calls use significantly different register lowering, even on x86. For example, AVX vectors are usually passed in registers to normal functions and memory to vararg functions. Now musttail uses a completely separate lowering. Hopefully this can be used as the basis for non-x86 perfect forwarding. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6156 llvm-svn: 224745	2014-12-22 23:58:37 +00:00
Quentin Colombet	84f89ccd45	[CodeGenPrepare] Handle properly the promotion of operands when this does not generate instructions. Fixes PR21978. Related to <rdar://problem/18310086> llvm-svn: 224717	2014-12-22 18:11:52 +00:00
Rafael Espindola	366e5c1bf1	The leak detector is dead, long live asan and valgrind. In resent times asan and valgrind have found way more memory management bugs in llvm than the special purpose leak detector. llvm-svn: 224703	2014-12-22 13:00:36 +00:00
Saleem Abdulrasool	90c224a143	CodeGen: minor style tweaks to SSP Clean up some style related things in the StackProtector CodeGen. NFC. llvm-svn: 224693	2014-12-21 21:52:38 +00:00
Matt Arsenault	22b4c256e1	Enable (sext x) == C --> x == (trunc C) combine Extend the existing code which handles this for zext. This makes this more useful for targets with ZeroOrNegativeOne BooleanContent and obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne) since the constant will now be shrunk to i1. llvm-svn: 224691	2014-12-21 16:48:42 +00:00
Saleem Abdulrasool	57b5fe57e3	CodeGen: constify and use range loop for SSP Use range-based for loop and constify the iterators. NFC. llvm-svn: 224683	2014-12-20 21:37:51 +00:00
Matthias Braun	714c494ca1	LiveIntervalAnalysis: No kill flags for partially undefined uses. We must not add kill flags when reading a vreg with some undefined subregisters, if subreg liveness tracking is enabled. This is because the register allocator may reuse these undefined subregisters for other values which are not killed. llvm-svn: 224664	2014-12-20 01:54:50 +00:00
Matthias Braun	7f8dece1d7	LiveIntervalAnalysis: cleanup addKills(), NFC - Use more const modifiers - Use references for things that can't be nullptr - Improve some variable names llvm-svn: 224663	2014-12-20 01:54:48 +00:00
Reid Kleckner	f2acbbaf22	EH: Sink computation of local PadMap variable into function that uses it No functionality change. llvm-svn: 224635	2014-12-19 22:30:08 +00:00
Reid Kleckner	93acac6cfc	Add the ExceptionHandling::MSVC enumeration It is intended to be used for a family of personality functions that have similar IR preparation requirements. Typically when interoperating with MSVC personality functions, bits of functionality need to be outlined from the main function into helper functions. There is also usually more than one landing pad per invoke, which does not match the LLVM IR landingpad representation. None of this is implemented yet. This change just adds a new enum that is active for *-windows-msvc and delegates to the EH removal preparation pass. No functionality change for other targets. llvm-svn: 224625	2014-12-19 22:19:48 +00:00
Sanjay Patel	0428a5786e	merge consecutive stores of extracted vector elements Add a path to DAGCombiner::MergeConsecutiveStores() to combine multiple scalar stores when the store operands are extracted vector elements. This is a partial fix for PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). For the new test case, codegen improves from: vmovss %xmm0, (%rdi) vextractps $1, %xmm0, 4(%rdi) vextractps $2, %xmm0, 8(%rdi) vextractps $3, %xmm0, 12(%rdi) vextractf128 $1, %ymm0, %xmm0 vmovss %xmm0, 16(%rdi) vextractps $1, %xmm0, 20(%rdi) vextractps $2, %xmm0, 24(%rdi) vextractps $3, %xmm0, 28(%rdi) vzeroupper retq To: vmovups %ymm0, (%rdi) vzeroupper retq Patch reviewed by Nadav Rotem. Differential Revision: http://reviews.llvm.org/D6698 llvm-svn: 224611	2014-12-19 20:23:41 +00:00
Matthias Braun	aeb50b3805	RegisterCoalescer: rewrite eliminateUndefCopy(). This also fixes problems with undef copies of subregisters. I can't attach a testcase for that as none of the targets in trunk has subregister liveness tracking enabled. llvm-svn: 224560	2014-12-19 01:39:46 +00:00
Adrian Prantl	cf44e7870b	Explain why LLVM is emitting a DW_AT_containing_type inside of a class. llvm-svn: 224555	2014-12-19 00:01:20 +00:00
Matthias Braun	15abf3743c	LiveIntervalAnalysis: Cleanup computeDeadValues - This also fixes a bug introduced in r223880 where values were not correctly marked as Dead anymore. - Cleanup computeDeadValues(): split up SubRange code variant, simplify arguments. llvm-svn: 224538	2014-12-18 19:58:52 +00:00
Eric Christopher	661f2d1ca1	Add a new string member to the TargetOptions struct for the name of the abi we should be using. For targets that don't use the option there's no change, otherwise this allows external users to set the ABI via string and avoid some of the -backend-option pain in clang. Use this option to move the ABI for the ARM port from the Subtarget to the TargetMachine and update the testcases accordingly since it's no longer valid to set via -mattr. llvm-svn: 224492	2014-12-18 02:20:58 +00:00
Matthias Braun	0a410f6243	RegisterCoalescer: Fix stripCopies() picking up main range instead of subregister range This fixes a problem where stripCopies() would switch to values in the main liverange when it crossed a copy instruction. However when joining subranges we need to stay in the respective subregister ranges. llvm-svn: 224461	2014-12-17 21:25:20 +00:00
Matthias Braun	8142efa8ea	ExecutionDepsFix: Correctly handle wide registers. The ExecutionDepsFix previously mapped each register to 1 or zero registers of the register class it was called with and therefore simulating liveness for. This was problematic for cases involving wider registers like Q0 on ARM where ExecutionDepsFix gets invoked for the Dxx registers. In these cases the wide register would get mapped to the last matching D register, while it should have been all matching D registers. This commit changes the AliasMap to use a SmallVector to map registers to potentially multiple destination regclass registers. This is required to avoid regressions with subregister liveness tracking enabled. llvm-svn: 224447	2014-12-17 19:13:47 +00:00
Michael Kuperstein	047b1a0400	[DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle. This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors. This fixes PR15872 and provides a partial fix for PR21711. Differential Revision: http://reviews.llvm.org/D6678 llvm-svn: 224429	2014-12-17 12:32:17 +00:00
Toma Tabacu	a23f13c3b0	[mips] Set GCC-compatible MIPS asssembler options before inline asm blocks. Summary: When generating MIPS assembly, LLVM always overrides the default assembler options by emitting the '.set noreorder', '.set nomacro' and '.set noat' directives, while GCC uses the default options if an assembly-level function contains inline assembly code. This becomes a problem when the code generated by LLVM is interleaved with inline assembly which assumes GCC-like assembler options (from Linux, for example). This patch fixes these conflicts by setting the appropriate assembler options at the beginning of an inline asm block and popping them at the end. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6637 llvm-svn: 224425	2014-12-17 10:56:16 +00:00
Matthias Braun	f4a72cd06e	RegisterCoalescer: Sprinkle some const modifiers. llvm-svn: 224409	2014-12-17 02:18:13 +00:00
Quentin Colombet	fc2201e922	[CodeGenPrepare] Reapply r224351 with a fix for the assertion failure: The type promotion helper does not support vector type, so when make such it does not kick in in such cases. Original commit message: [CodeGenPrepare] Move sign/zero extensions near loads using type promotion. This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible. Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet. Context Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern. Motivating Example Let us consider an example: define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) { %ld = load i8* %addr1 %zextld = zext i8 %ld to i32 %ld2 = load i32* %addr2 %add = add nsw i32 %ld2, %zextld %sextadd = sext i32 %add to i64 %zexta = zext i8 %a to i32 %addza = add nsw i32 %zexta, %zextld %sextaddza = sext i32 %addza to i64 %addb = add nsw i32 %b, %zextld %sextaddb = sext i32 %addb to i64 call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb) ret void } As it is, this IR generates the following assembly on x86_64: [...] movzbl (%rdi), %eax # zero-extended load movl (%rsi), %es # plain load addl %eax, %esi # 32-bit add movslq %esi, %rdi # sign extend the result of add movzbl %dl, %edx # zero extend the first argument addl %eax, %edx # 32-bit add movslq %edx, %rsi # sign extend the result of add addl %eax, %ecx # 32-bit add movslq %ecx, %rdx # sign extend the result of add [...] The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA. Now, by promoting the additions to form more extended loads we would generate: [...] movzbl (%rdi), %eax # zero-extended load movslq (%rsi), %rdi # sign-extended load addq %rax, %rdi # 64-bit add movzbl %dl, %esi # zero extend the first argument addq %rax, %rsi # 64-bit add movslq %ecx, %rdx # sign extend the second argument addq %rax, %rdx # 64-bit add [...] The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA. This kind of sequences happen a lot on code using 32-bit indexes on 64-bit architectures. Note: The throughput numbers are similar on Sandy Bridge and Haswell. Proposed Solution To avoid the penalty of all these sign/zero extensions, we merge them in the loads at the beginning of the chain of computation by promoting all the chain of computation on the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation: ext(promotableInst1(...(promotableInstN(load)))) => promotedInst1(...(promotedInstN(ext(load)))) The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch. Performance Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10. Tested Optimization Levels: O3/Os Tests: llvm-testsuite + externals. Results: - No regression beside noise. - Improvements: CINT2006/473.astar: ~2% Benchmarks/PAQ8p: ~2% Misc/perlin: ~3% The results are consistent for both O3 and Os. <rdar://problem/18310086> llvm-svn: 224402	2014-12-17 01:36:17 +00:00
David Blaikie	8b979f01c6	PR21875: codegen for non-type template parameters of nullptr_t type llvm-svn: 224399	2014-12-17 00:43:22 +00:00
Reid Kleckner	04b69f89aa	Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion." This reverts commit r224351. It causes assertion failures when building ICU. llvm-svn: 224397	2014-12-17 00:29:23 +00:00
Hans Wennborg	224cb82a39	SelectionDAG switch lowering: use 'unsigned' to count destination popularity SwitchInst::getNumCases() returns unsinged, so using uint64_t to count cases seems unnecessary. Also fix a missing CHECK in the test case. llvm-svn: 224393	2014-12-16 23:41:59 +00:00
Sanjay Patel	7129c10cae	merge consecutive loads that are offset from a base address SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads when the first load was offset from a base address. This patch recognizes that pattern and subtracts the offset before comparing the second load to see if it is consecutive. The codegen change in the new test case improves from: vmovsd 32(%rdi), %xmm0 vmovsd 48(%rdi), %xmm1 vmovhpd 56(%rdi), %xmm1, %xmm1 vmovhpd 40(%rdi), %xmm0, %xmm0 vinsertf128 $1, %xmm1, %ymm0, %ymm0 To: vmovups 32(%rdi), %ymm0 An existing test case is also improved from: vmovsd (%rdi), %xmm0 vmovsd 16(%rdi), %xmm1 vmovsd 24(%rdi), %xmm2 vunpcklpd %xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0] vmovhpd 8(%rdi), %xmm1, %xmm3 To: vmovsd (%rdi), %xmm0 vmovsd 16(%rdi), %xmm1 vmovhpd 24(%rdi), %xmm0, %xmm0 vmovhpd 8(%rdi), %xmm1, %xmm1 This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ). Differential Revision: http://reviews.llvm.org/D6642 llvm-svn: 224379	2014-12-16 21:57:18 +00:00
Matt Arsenault	dd3b77d64c	Move lowerConstant to AsmPrinter This was a static function before, and NVPTX duplicated it because it wasn't exposed. llvm-svn: 224354	2014-12-16 19:16:14 +00:00
Quentin Colombet	d5e57b731f	[CodeGenPrepare] Move sign/zero extensions near loads using type promotion. This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible. Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet. Context Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern. Motivating Example Let us consider an example: define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) { %ld = load i8* %addr1 %zextld = zext i8 %ld to i32 %ld2 = load i32* %addr2 %add = add nsw i32 %ld2, %zextld %sextadd = sext i32 %add to i64 %zexta = zext i8 %a to i32 %addza = add nsw i32 %zexta, %zextld %sextaddza = sext i32 %addza to i64 %addb = add nsw i32 %b, %zextld %sextaddb = sext i32 %addb to i64 call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb) ret void } As it is, this IR generates the following assembly on x86_64: [...] movzbl (%rdi), %eax # zero-extended load movl (%rsi), %es # plain load addl %eax, %esi # 32-bit add movslq %esi, %rdi # sign extend the result of add movzbl %dl, %edx # zero extend the first argument addl %eax, %edx # 32-bit add movslq %edx, %rsi # sign extend the result of add addl %eax, %ecx # 32-bit add movslq %ecx, %rdx # sign extend the result of add [...] The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA. Now, by promoting the additions to form more extended loads we would generate: [...] movzbl (%rdi), %eax # zero-extended load movslq (%rsi), %rdi # sign-extended load addq %rax, %rdi # 64-bit add movzbl %dl, %esi # zero extend the first argument addq %rax, %rsi # 64-bit add movslq %ecx, %rdx # sign extend the second argument addq %rax, %rdx # 64-bit add [...] The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA. This kind of sequences happen a lot on code using 32-bit indexes on 64-bit architectures. Note: The throughput numbers are similar on Sandy Bridge and Haswell. Proposed Solution To avoid the penalty of all these sign/zero extensions, we merge them in the loads at the beginning of the chain of computation by promoting all the chain of computation on the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation: ext(promotableInst1(...(promotableInstN(load)))) => promotedInst1(...(promotedInstN(ext(load)))) The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch. Performance Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10. Tested Optimization Levels: O3/Os Tests: llvm-testsuite + externals. Results: - No regression beside noise. - Improvements: CINT2006/473.astar: ~2% Benchmarks/PAQ8p: ~2% Misc/perlin: ~3% The results are consistent for both O3 and Os. <rdar://problem/18310086> llvm-svn: 224351	2014-12-16 19:09:03 +00:00
Aaron Ballman	0d6a010c13	Fixing -Wsign-compare warnings; NFC. llvm-svn: 224337	2014-12-16 14:04:11 +00:00
Matthias Braun	1aed6ffa35	LiveRangeCalc: Rewrite subrange calculation This changes subrange calculation to calculate subranges sequentially instead of in parallel. The code is easier to understand that way and addresses the code review issues raised about LiveOutData being hard to understand/needing more comments by removing them :) llvm-svn: 224313	2014-12-16 04:03:38 +00:00
Adrian Prantl	b9fa945d51	ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions. Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0. llvm-svn: 224294	2014-12-16 00:20:49 +00:00
Matthias Braun	c3a72c2e5f	Revert "LiveRangeCalc: Rewrite subrange calculation" Revert until I find out why non-subreg enabled targets break. This reverts commit 6097277eefb9c5fb35a7f493c783ee1fd1b9d6a7. llvm-svn: 224278	2014-12-15 21:36:35 +00:00
Matthias Braun	0352201e15	LiveRangeCalc: Rewrite subrange calculation This changes subrange calculation to calculate subranges sequentially instead of in parallel. The code is easier to understand that way and addresses the code review issues raised about LiveOutData being hard to understand/needing more comments by removing them :) llvm-svn: 224272	2014-12-15 21:16:21 +00:00
Matthias Braun	42fab34ffb	LiveRangeCalc: use more range based for loops; NFC llvm-svn: 224263	2014-12-15 19:40:46 +00:00
Michael Ilseman	addddc441f	Silence more static analyzer warnings. Add in definedness checks for shift operators, null checks when pointers are assumed by the code to be non-null, and explicit unreachables. llvm-svn: 224255	2014-12-15 18:48:43 +00:00
Akira Hatanaka	7ba78302b5	Rename argument strings of codegen passes to avoid collisions with command line options. This commit changes the command line arguments (PassInfo::PassArgument) of two passes, MachineFunctionPrinter and MachineScheduler, to avoid collisions with command line options that have the same argument strings. This bug manifests when the PassList construct (defined in opt.cpp) is used in a tool that links with codegen passes. To reproduce the bug, paste the following lines into llc.cpp and run llc. #include "llvm/IR/LegacyPassNameParser.h" static llvm:🆑:list<const llvm::PassInfo*, bool, llvm::PassNameParser> PassList(llvm:🆑:desc("Optimizations available:")); rdar://problem/19212448 llvm-svn: 224186	2014-12-13 04:52:04 +00:00
Andrea Di Biagio	d65fd9facd	Reapply "[MachineScheduler] Fix for PR21807: minor code difference building with/without -g." This reapplies r224118 with a fix for test 'misched-code-difference-with-debug.ll'. That test was failing on some buildbots because it was x86 specific but it was missing a target triple. Added an explicit triple to test misched-code-difference-with-debug.ll. llvm-svn: 224126	2014-12-12 15:09:58 +00:00
Andrea Di Biagio	5634a54efc	Revert: [MachineScheduler] Fix for PR21807: minor code difference building with/without -g. Test 'misched-code-difference-with-debug.ll' was failing on some buildbots. llvm-svn: 224121	2014-12-12 13:34:03 +00:00
Andrea Di Biagio	01236e3eca	[MachineScheduler] Fix for PR21807: minor code difference building with/without -g. This patch fixes the issue reported as PR21807. There was a minor difference in the generated code depending on the -g flag. The cause was that with -g the machine scheduler used a different scheduling strategy. This decision was based on the number of instructions in a schedule region and included debug instructions in that count. This patch fixes the issue in MISched and provides a test. Patch by Russell Gallop! llvm-svn: 224118	2014-12-12 12:41:22 +00:00
Ekaterina Romanova	90ff20d8f5	A fix for PR21176. DW_OP_const <const> doesn't describe a constant value, but a value at a constant address. The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value. Added DW_OP_stack_value to the stack. Marked incorrect-variable-debugloc1.ll to xfail for PowerPC64, while the the failure (PR21881) is being investigated. llvm-svn: 224098	2014-12-12 05:11:47 +00:00
Philip Reames	60de8b29f7	Comment and minor code cleanup for GCStrategy (NFC) Updating comments to reflect the current state of the world after my recent changes to ownership structure and generally better describe what a GCStrategy is and how it works. llvm-svn: 224086	2014-12-12 00:49:03 +00:00
Matt Arsenault	810cb62962	Add target hook for whether it is profitable to reduce load widths Add an option to disable optimization to shrink truncated larger type loads to smaller type loads. On SI this prevents using scalar load instructions in some cases, since there are no scalar extloads. llvm-svn: 224084	2014-12-12 00:00:24 +00:00
Duncan P. N. Exon Smith	d6f8e4b03c	CodeGen: Stop using LeakDetector for MachineInstr Since `MachineInstr` is required to have a trivial destructor, it cannot remove itself from `LeakDetection`. Remove the calls. As it happens, this requirement is because `MachineFunction` allocates all `MachineInstr`s in a custom allocator; when the `MachineFunction` is destroyed they're dropped of the edge. There's no benefit to detecting leaks. llvm-svn: 224061	2014-12-11 21:51:37 +00:00
Matthias Braun	7e37a5f523	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. This is the 2nd attempt at this after realizing that PassManager::add() may actually delete the pass. llvm-svn: 224059	2014-12-11 21:26:47 +00:00
Rafael Espindola	01c73610d0	This reverts commit r224043 and r224042. check-llvm was failing. llvm-svn: 224045	2014-12-11 20:03:57 +00:00
Matthias Braun	a7c82a9f1d	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. llvm-svn: 224042	2014-12-11 19:42:05 +00:00
Matthias Braun	a4e932db16	[CodeGen] Let MachineVerifierPass own its banner string llvm-svn: 224041	2014-12-11 19:41:51 +00:00
Patrik Hagglund	cb06a36c9a	Bugfix in InlineSpiller::traceSiblingValue(). Properly determine whether or not a phi was added by splitting. Check against the current VNInfo of OrigLI instead of against the OrigVNI argument. Patch provided by Jonas Paulsson. Reviewed by Quentin Colombet. llvm-svn: 224009	2014-12-11 10:40:17 +00:00
Ekaterina Romanova	75fd123967	Reverting commit 223981, because the test that I added (incorrect-variable-debugloc1.ll) failed for llvm-ppc64. The test is failing for llvm-ppc64 because for this platform the location list is not being generated at all (most likely because of the bug in PPC code optimization or generation). I will file a bug agains PPC compiler, but meanwhile, until PPC bug is fixed, I will have to revert my change. llvm-svn: 224000	2014-12-11 06:22:35 +00:00
Philip Reames	1e30897497	GCStrategy should not own GCFunctionInfo This change moves the ownership and access of GCFunctionInfo (the object which describes the safepoints associated with a safepoint under GCRoot) to GCModuleInfo. Previously, this was owned by GCStrategy which was in turned owned by GCModuleInfo. This made GCStrategy module specific which is 'surprising' given it's name and other purposes. There's a few more changes needed, but we're getting towards the point we can reuse GCStrategy for gc.statepoint as well. p.s. The style of this code ends up being a mess. I was trying to move code around without otherwise changing much. Once I get the ownership structure rearranged, I will go through and fixup spacing, naming, comments etc. Differential Revision: http://reviews.llvm.org/D6587 llvm-svn: 223994	2014-12-11 01:47:23 +00:00
Matthias Braun	09afa1ea74	LiveInterval: Use range based for loops for subregister ranges. llvm-svn: 223991	2014-12-11 00:59:06 +00:00
Ekaterina Romanova	ceeaba7932	A fix for PR21176. DW_OP_const <const> doesn't describe a constant value, but a value at a constant address. The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value. Added DW_OP_stack_value to the stack. -This line, and those below, will be ignored-- M lib/CodeGen/AsmPrinter/DwarfDebug.cpp A test/DebugInfo/incorrect-variable-debugloc1.ll llvm-svn: 223981	2014-12-10 23:19:56 +00:00
Matthias Braun	96761959d4	LiveInterval: Use more range based for loops for value numbers and segments. llvm-svn: 223978	2014-12-10 23:07:54 +00:00
Aaron Ballman	e5a2a0c9a8	Silencing a -Wsequence-point warning, and the resulting undefined behavior. NFC. llvm-svn: 223926	2014-12-10 14:14:54 +00:00
Matthias Braun	96d7732b08	MachineVerifier: Allow physreg use if just a subreg is defined. We can't mark partially undefined registers, so we have to allow reading a register in the machine verifier if just parts of a register are defined. llvm-svn: 223896	2014-12-10 01:13:13 +00:00
Matthias Braun	21554d9b30	MachineVerifier: Allow LiveInterval segments to end at a partial write. In the subregister liveness tracking case we do not create implicit reads on partial register writes anymore, still we need to produce a new SSA value for partial writes so the live segment has to end. llvm-svn: 223895	2014-12-10 01:13:11 +00:00
Matthias Braun	279f83645c	VirtRegMap: Improve block live-in info if subregister liveness is available. llvm-svn: 223894	2014-12-10 01:13:08 +00:00
Matthias Braun	d70caaf5a5	VirtRegMap: No implicit defs/uses for super registers with subreg liveness tracking. Adding the implicit defs/uses to the superregisters is semantically questionable but was not dangerous before as the register allocator never assigned the same register to two overlapping LiveIntervals even when the actually live subregisters do not overlap. With subregister liveness tracking enabled this does actually happen and leads to subsequent bugs if we don't stop adding the superregister defs/uses. llvm-svn: 223892	2014-12-10 01:13:04 +00:00
Matthias Braun	587e27415d	LiveRegMatrix: Respect subregister liveness when allocating registers. llvm-svn: 223891	2014-12-10 01:13:01 +00:00
Matthias Braun	a0f0c1f013	LiveIntervalUnion: Allow specification of liverange when unifying/extracting. This allows it to add subregister ranges into the union. llvm-svn: 223890	2014-12-10 01:12:59 +00:00
Matthias Braun	14f764c872	RegisterCoalescer: Preserve subregister liveranges. llvm-svn: 223888	2014-12-10 01:12:52 +00:00
Matthias Braun	2079aa9140	LiveInterval: Add removeEmptySubRanges(). llvm-svn: 223887	2014-12-10 01:12:40 +00:00
Matthias Braun	8970d847c4	LiveIntervalAnalysis: Add subregister aware variants pruneValue(). llvm-svn: 223886	2014-12-10 01:12:36 +00:00
Matthias Braun	e3d3b88cb9	Add a flag to enable/disable subregister liveness. llvm-svn: 223884	2014-12-10 01:12:30 +00:00
Matthias Braun	e5f861b781	LiveIntervalAnalysis: Adapt repairIntervalsInRange() to subregister liveness. llvm-svn: 223883	2014-12-10 01:12:26 +00:00
Matthias Braun	fe896c703c	LiveRangeEdit: Adapt eliminateDeadDef() to subregister liveness. llvm-svn: 223882	2014-12-10 01:12:23 +00:00
Matthias Braun	7044d69e87	LiveIntervalAnalysis: Adapt handleMove() to subregister ranges. llvm-svn: 223881	2014-12-10 01:12:20 +00:00
Matthias Braun	20e1f38a41	LiveIntervalAnalysis: Update SubRanges in shrinkToUses(). llvm-svn: 223880	2014-12-10 01:12:18 +00:00
Matthias Braun	2f66232bde	LiveIntervalAnalysis: Compute subregister ranges. llvm-svn: 223878	2014-12-10 01:12:12 +00:00
Matthias Braun	3f1d8fdd33	LiveInterval: Add support to track liveness of subregisters. This code adds the required data structures. Algorithms to compute it follow. llvm-svn: 223877	2014-12-10 01:12:10 +00:00
Matthias Braun	e62c207092	LiveInterval: Add a 'covers' operation to LiveRange. llvm-svn: 223876	2014-12-10 01:12:06 +00:00
Philip Reames	de226055ca	Remove the Module pointer from GCStrategy and GCMetadataPrinter In the current implementation, GCStrategy is a part of the ownership structure for the gc metadata which describes a Module. It also contains a reference to the module in question. As a result, GCStrategy instances are essentially Module specific. I plan to transition away from this design. Instead, a GCStrategy will be owned by the LLVMContext. It will be a lightweight policy object which contains no information about the Modules or Functions involved, but can be easily reached given a Function. The first step in this transition is to remove the direct Module reference from GCStrategy. This also requires removing the single user of this reference, the GCMetadataPrinter hierarchy. In theory, this will allow the lifetime of the printers to be scoped to the LLVMContext as well, but in practice, I'm not actually changing that. (Yet?) An alternate design would have been to move the direct Module reference into the GCMetadataPrinter and change the keying of the owning maps to explicitly key off both GCStrategy and Module. I'm open to doing it that way instead, but didn't see much value in preserving the per Module association for GCMetadataPrinters. The next change in this sequence will be to start unwinding the intertwined ownership between GCStrategy, GCModuleInfo, and GCFunctionInfo. Differential Revision: http://reviews.llvm.org/D6566 llvm-svn: 223859	2014-12-09 23:57:54 +00:00
Duncan P. N. Exon Smith	5bf8fef580	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata ` (instead of `Value `). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
Juergen Ributzka	8bda738221	[CGP] Rewrite pattern match for splitBranchCondition to work with Values instead. Rewrite the pattern match code to work also with Values instead with Instructions only. Also remove the no longer need matcher (m_Instruction). llvm-svn: 223797	2014-12-09 17:50:10 +00:00
Juergen Ributzka	194350a936	Revert "Move function to obtain branch weights into the BranchInst class. NFC." This reverts commit r223784 and copies the 'ExtractBranchMetadata' to CodeGenPrepare. llvm-svn: 223795	2014-12-09 17:32:12 +00:00
Juergen Ributzka	c1bbcbbd32	[CodeGenPrepare] Split branch conditions into multiple conditional branches. This optimization transforms code like: bb1: %0 = icmp ne i32 %a, 0 %1 = icmp ne i32 %b, 0 %or.cond = or i1 %0, %1 br i1 %or.cond, label %TrueBB, label %FalseBB into a multiple branch instructions like: bb1: %0 = icmp ne i32 %a, 0 br i1 %0, label %TrueBB, label %bb2 bb2: %1 = icmp ne i32 %b, 0 br i1 %1, label %TrueBB, label %FalseBB This optimization is already performed by SelectionDAG, but not by FastISel. FastISel cannot perform this optimization, because it cannot generate new MachineBasicBlocks. Performing this optimization at CodeGenPrepare time makes it available to both - SelectionDAG and FastISel - and the implementation in SelectiuonDAG could be removed. There are currenty a few differences in codegen for X86 and PPC, so this commmit only enables it for FastISel. Reviewed by Jim Grosbach This fixes rdar://problem/19034919. llvm-svn: 223786	2014-12-09 16:36:13 +00:00
Owen Anderson	558012a3fc	Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64. llvm-svn: 223760	2014-12-09 06:50:39 +00:00
Hal Finkel	c8cf2b88bc	Handle early-clobber registers in the aggressive anti-dep breaker The aggressive anti-dep breaker, used by the PowerPC backend during post-RA scheduling (but is available to all targets), did not handle early-clobber MI operands (at all). When constructing the list of available registers for the replacement of some def operand, check the using instructions, and remove registers assigned to early-clobbered defs from the set. Fixes PR21452. llvm-svn: 223727	2014-12-09 01:00:59 +00:00
Tom Stellard	3e01d47d98	MISched: Fix moving stores across barriers This fixes an issue with ScheduleDAGInstrs::buildSchedGraph where stores without an underlying object would not be added as a predecessor to the current BarrierChain. llvm-svn: 223717	2014-12-08 23:36:48 +00:00
Justin Bogner	61ba2e3996	InstrProf: An intrinsic and lowering for instrumentation based profiling Introduce the ``llvm.instrprof_increment`` intrinsic and the ``-instrprof`` pass. These provide the infrastructure for writing counters for profiling, as in clang's ``-fprofile-instr-generate``. The implementation of the instrprof pass is ported directly out of the CodeGenPGO classes in clang, and with the followup in clang that rips that code out to use these new intrinsics this ends up being NFC. Doing the instrumentation this way opens some doors in terms of improving the counter performance. For example, this will make it simple to experiment with alternate lowering strategies, and allows us to try handling profiling specially in some optimizations if we want to. Finally, this drastically simplifies the frontend and puts all of the lowering logic in one place. llvm-svn: 223672	2014-12-08 18:02:35 +00:00
Hans Wennborg	08de833c1c	SelectionDAG switch lowering: Replace unreachable default with most popular case. This can significantly reduce the size of the switch, allowing for more efficient lowering. I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-neglible binary size increase. It might be worth looking into some more. SimplifyCFG currently does this transformation, but I'm working towards changing that so we can optimize harder based on unreachable defaults. Differential Revision: http://reviews.llvm.org/D6510 llvm-svn: 223566	2014-12-06 01:28:50 +00:00
Eric Christopher	d1fb7e4590	These two calls were grabbing the same register info. Unify them. llvm-svn: 223502	2014-12-05 19:23:55 +00:00
Ahmed Bougacha	55e3c2d9cf	[CodeGenPrepare] Use variables for reused values. NFC. llvm-svn: 223491	2014-12-05 18:04:40 +00:00
Hal Finkel	66d7791176	Revert "r223440 - Consider subregs when calling MI::registerDefIsDead for phys deps" Reverting this because, while it fixes the problem in the reduced test case, it does not fix the problem in the full test case from the bug report. llvm-svn: 223442	2014-12-05 02:07:35 +00:00
Hal Finkel	d013d99fe0	Consider subregs when calling MI::registerDefIsDead for phys deps The scheduling dependency graph is built bottom-up within each scheduling region, and ScheduleDAGInstrs::addPhysRegDeps is called to add output/anti dependencies, based on physical registers, to the SUs for instructions based on those that come before them. In the test case, we start before post-RA scheduling with a block that looks like this: ... INLINEASM <... andc $0,$0,$2 stdcx. $0,0,$3 bne- 1b > [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>, $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>, $3:[reguse:G8RC], %X3, $4:[mem], %X3, $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>> ... %X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def> ... %R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill> where it is relevant that %CC is an alias to %CR0, and that %CR0GT is a subregister of %CR0. However, for post-RA scheduling, no dependency was added to prevent the INLINEASM from being scheduled in between the ANDIo8 and the ISEL (which communicate via the %CR0GT register). In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand, we'd iterate over all of its aliases (which include %CC itself and also %CR0), and look for previously-encountered defs of those registers. We'd find the ANDIo8, but decide not to add a dependency between the INLINEASM and the ANDIo8 because both the INLINEASM's def of %CC is dead, and also the ANDIo8 def of %CR0 is dead. This ignores, however, that ANDIo8 has a non-dead def of %CR0GT, a subregister of %CR0, and thus a dependency still must exist. To fix this problem, when calling registerDefIsDead on the SU with the def, we also check all subregisters for possible non-dead defs, and add the dependency if any are found. Fixes PR21742. llvm-svn: 223440	2014-12-05 01:57:22 +00:00
Adrian Prantl	ab255fcd09	Cleanup: Calls to getDwarfRegNum() may actually fail, if there is no DWARF register number mapping, or if the register was a virtual register that was never materialized. Previously, we would just emit a bogus location, after this patch we don't emit a location at all by doing an early exit. After my bugfix in r223401 today, this doesn't actually happen on any target that I tested this with, but it's still preferable to make the possibility of a failure explicit. llvm-svn: 223428	2014-12-05 01:02:46 +00:00
Adrian Prantl	da7e03f1bf	Simplify implementation and testcase of r223401 based on feedback from dblaikie. llvm-svn: 223405	2014-12-04 22:58:41 +00:00
Adrian Prantl	a3ae0b3b5b	Debug info: If the RegisterCoalescer::reMaterializeTrivialDef() is eliminating all uses of a vreg, update any DBG_VALUE describing that vreg to point to the rematerialized register instead. llvm-svn: 223401	2014-12-04 22:29:04 +00:00
Patrik Hagglund	d06de4b954	Use DomTree in MachineSink to sink over diamonds. According to a previous FIXME comment we now not only look at MBB successors, but also handle code sinking past them: x = computation if () {} else {} use x The instruction could be sunk over the whole diamond for the if/then/else (or loop, etc), allowing it to be sunk into other blocks after that. Modified test added in r204522, due to one spill less present. Minor fixes in comments. Patch provided by Jonas Paulsson. Reviewed by Hal Finkel. llvm-svn: 223350	2014-12-04 10:36:42 +00:00
Simon Pilgrim	be24ab367b	[InstCombine] Minor optimization for bswap with binary ops Added instcombine optimizations for BSWAP with AND/OR/XOR ops: OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) ) OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) ) Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well: fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y)) Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone. Differential Revision: http://reviews.llvm.org/D6407 llvm-svn: 223349	2014-12-04 09:44:01 +00:00
Elena Demikhovsky	f1de34b84d	Masked Load / Store Intrinsics - the CodeGen part. I'm recommiting the codegen part of the patch. The vectorizer part will be send to review again. Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 223348	2014-12-04 09:40:44 +00:00
Matt Arsenault	4e27343eec	Allow target to specify prefix for labels Use the MCAsmInfo instead of the DataLayout, and allow specifying a custom prefix for labels specifically. HSAIL requires that labels begin with @, but global symbols with &. llvm-svn: 223323	2014-12-04 00:06:57 +00:00
Quentin Colombet	079aba733a	[RegAllocFast] Handle implicit definitions conservatively. Prior to this commit, physical registers defined implicitly were considered free right after their definition, i.e.. like dead definitions. Therefore, their uses had to immediately follow their definitions, otherwise the related register may be reused to allocate a virtual register. This commit fixes this assumption by keeping implicit definitions alive until they are actually used. The downside is that if the implicit definition was dead (and not marked at such), we block an otherwise available register. This is however conservatively correct and makes the fast register allocator much more robust in particular regarding the scheduling of the instructions. Fixes PR21700. llvm-svn: 223317	2014-12-03 23:38:08 +00:00
Peter Collingbourne	51d2de7b9e	Prologue support Patch by Ben Gamari! This redefines the `prefix` attribute introduced previously and introduces a `prologue` attribute. There are a two primary usecases that these attributes aim to serve, 1. Function prologue sigils 2. Function hot-patching: Enable the user to insert `nop` operations at the beginning of the function which can later be safely replaced with a call to some instrumentation facility 3. Runtime metadata: Allow a compiler to insert data for use by the runtime during execution. GHC is one example of a compiler that needs this functionality for its tables-next-to-code functionality. Previously `prefix` served cases (1) and (2) quite well by allowing the user to introduce arbitrary data at the entrypoint but before the function body. Case (3), however, was poorly handled by this approach as it required that prefix data was valid executable code. Here we redefine the notion of prefix data to instead be data which occurs immediately before the function entrypoint (i.e. the symbol address). Since prefix data now occurs before the function entrypoint, there is no need for the data to be valid code. The previous notion of prefix data now goes under the name "prologue data" to emphasize its duality with the function epilogue. The intention here is to handle cases (1) and (2) with prologue data and case (3) with prefix data. References ---------- This idea arose out of discussions[1] with Reid Kleckner in response to a proposal to introduce the notion of symbol offsets to enable handling of case (3). [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html Test Plan: testsuite Differential Revision: http://reviews.llvm.org/D6454 llvm-svn: 223189	2014-12-03 02:08:38 +00:00
Hal Finkel	bbdee93638	[PowerPC] Implement readcyclecounter for PPC32 We've long supported readcyclecounter on PPC64, but it is easier there (the read of the 64-bit time-base register can be accomplished via a single instruction). This now provides an implementation for PPC32 as well. On PPC32, the time-base register is still 64 bits, but can only be read 32 bits at a time via two separate SPRs. The ISA manual explains how to do this properly (it involves re-reading the upper bits and looping if the counter has wrapped while being read). This requires PPC to implement a custom integer splitting legalization for the READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then gets turned into a pseudo-instruction, which is then expanded to the necessary sequence (which has three SPR reads, the comparison and the branch). Thanks to Paul Hargrove for pointing out to me that this was still unimplemented. llvm-svn: 223161	2014-12-02 22:01:00 +00:00
Philip Reames	72fbe7a6f0	Restructure some assertion checking based on post commit feedback by Aaron and Tom. llvm-svn: 223150	2014-12-02 21:01:48 +00:00
Philip Reames	f814a511da	Appease a build bot complaining about an unused variable that's used in an assertion. llvm-svn: 223142	2014-12-02 19:28:57 +00:00
Philip Reames	1a1bdb22bf	[Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them. With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now. I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it. During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases. In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principal, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure. Reviewed by: atrick, ributzka llvm-svn: 223137	2014-12-02 18:50:36 +00:00
Ahmed Bougacha	54b7d334c7	[MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction. Go through implicit defs of CSMI and MI, and clear the kill flags on their uses in all the instructions between CSMI and MI. We might have made some of the kill flags redundant, consider: subs ... %NZCV<imp-def> <- CSMI csinc ... %NZCV<imp-use,kill> <- this kill flag isn't valid anymore subs ... %NZCV<imp-def> <- MI, to be eliminated csinc ... %NZCV<imp-use,kill> Since we eliminated MI, and reused a register imp-def'd by CSMI (here %NZCV), that register, if it was killed before MI, should have that kill flag removed, because it's lifetime was extended. Also, add an exhaustive testcase for the motivating example. Reviewed by: Juergen Ributzka <juergen@apple.com> llvm-svn: 223133	2014-12-02 18:09:51 +00:00
Philip Reames	0365f1a376	[Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend This is the second patch in a small series. This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints. It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code. I will be submitting the SelectionDAG parts within the next 24-48 hours. Since those pieces are by far the most complicated, I wanted to minimize the size of that patch. That patch will include the tests which exercise the functionality in this patch. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683. The STATEPOINT psuedo node is generated after all gc values are explicitly spilled to stack slots. The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes. The STATEPOINT is modeled as modifing all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT. (Doing so would break relocation semantics for collectors which wish to relocate roots.) The implementation of STATEPOINT is closely modeled on PATCHPOINT. Eventually, much of the code in this patch will be removed. The long term plan is to merge the functionality provided by statepoints and patchpoints. Merging their implementations in the backend is likely to be a good starting point. Reviewed by: atrick, ributzka llvm-svn: 223085	2014-12-01 22:52:56 +00:00
Ahmed Bougacha	fb6eeb74c5	[MachineVerifier] Accept a MBB with a single landing pad successor. The MachineVerifier used to check that there was always exactly one unconditional branch to a non-landingpad (normal) successor. If that normal successor to an invoke BB is unreachable, it seems reasonable to only have one successor, the landing pad. On targets other than AArch64 (and on AArch64 with a different testcase), the branch folder turns the branch to the landing pad into a fallthrough. The MachineVerifier, which relies on AnalyzeBranch, is unable to check the condition, and doesn't complain. However, it does in this specific testcase, where the branch to the landing pad remained. Make the MachineVerifier accept it. llvm-svn: 223059	2014-12-01 18:43:53 +00:00
Hans Wennborg	5bef5b522b	Revert r223049, r223050 and r223051 while investigating test failures. I didn't foresee affecting the Clang test suite :/ llvm-svn: 223054	2014-12-01 17:36:43 +00:00
Hans Wennborg	1571336fb2	SelectionDAG switch lowering: Replace unreachable default with most popular case. This can significantly reduce the size of the switch, allowing for more efficient lowering. I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-neglible binary size increase. It might be worth looking into some more. llvm-svn: 223049	2014-12-01 17:08:32 +00:00
Akira Hatanaka	b9991a2656	[stack protector] Set edge weights for newly created basic blocks. This commit fixes a bug in stack protector pass where edge weights were not set when new basic blocks were added to lists of successor basic blocks. Differential Revision: http://reviews.llvm.org/D5766 llvm-svn: 222987	2014-12-01 04:27:03 +00:00
Hans Wennborg	6dfb041ffc	Switch lowering: reformat some for loops etc. NFC llvm-svn: 222962	2014-11-29 21:24:12 +00:00
Hans Wennborg	6c42d1a5de	Switch lowering: Fix broken 'Figure out which block is next' code This doesn't seem to have worked in a long time, but other optimizations would clean it up. llvm-svn: 222961	2014-11-29 21:17:05 +00:00
Simon Pilgrim	2bfd9129f4	Target triple OS detection tidyup. NFC Use Triple::isOS*() helpers where possible. llvm-svn: 222960	2014-11-29 19:18:21 +00:00
Duncan P. N. Exon Smith	9bc81fbe92	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936	2014-11-28 21:29:14 +00:00
Elena Demikhovsky	6de72e5b0c	Converted back to Unix format (after my last commit 222632) llvm-svn: 222636	2014-11-23 15:21:53 +00:00
Elena Demikhovsky	9e5089a938	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
Manman Ren	f0a582bada	Debug Info: revert r222195, r222210 and r222239. This is no longer needed after David's fix at r222377 + r222485. rdar://18958417 llvm-svn: 222563	2014-11-21 19:55:23 +00:00
Manman Ren	c98ec0e70a	[Objective-C] Support a new special module flag that will be put into the objc_imageinfo struct. rdar://17954668 llvm-svn: 222558	2014-11-21 19:24:55 +00:00
Sanjay Patel	eb4a4d5aeb	Don't repeat class/function/variable names in comments. NFC. llvm-svn: 222555	2014-11-21 18:58:38 +00:00
Sanjay Patel	b06441aded	Less space; NFC llvm-svn: 222546	2014-11-21 18:05:59 +00:00
Andrea Di Biagio	0225b5bf6f	[DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero. Before this patch, the DAGCombiner only tried to convert build_vector dag nodes into shuffles if all operands were either extract_vector_elt or undef. This patch improves that logic and teaches the DAGCombiner how to deal with build_vector dag nodes where one or more operands are zero. A build_vector dag node with some zero operands is turned into a shuffle only if the resulting shuffle mask is legal for the target. llvm-svn: 222536	2014-11-21 14:32:06 +00:00
Andrea Di Biagio	26e8f4d166	[DAG] Refactor the shuffle combining logic in DAGCombiner. NFC. This patch simplifies the logic that combines a pair of shuffle nodes into a single shuffle if there is a legal mask. Also added comments to better describe the algorithm. No functional change intended. llvm-svn: 222522	2014-11-21 11:33:07 +00:00
Hao Liu	44e5d7a131	DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to do such combine. Reviewed in http://reviews.llvm.org/D6334 llvm-svn: 222510	2014-11-21 06:39:58 +00:00
Matthias Braun	745ea43687	RegisterCoalescer: Improve debug messages - Show "Considering..." message after flipping so you actually see the final destination vreg as destination. - Add a message on final join, so you can grep for "Success" messages to obtain a list of which register got merged with which. llvm-svn: 222382	2014-11-19 19:46:17 +00:00
Matthias Braun	d2f4c77800	Add a print and verify pass after the RegisterCoalescer llvm-svn: 222381	2014-11-19 19:46:15 +00:00
Matthias Braun	47760d9667	MachineVerifier: Report register for bad liveranges llvm-svn: 222380	2014-11-19 19:46:13 +00:00
Matthias Braun	9f87d75060	Introduce register dump helper llvm-svn: 222379	2014-11-19 19:46:11 +00:00
Simon Pilgrim	3ac3b251a9	[X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2 This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain. I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr. Differential Revision: http://reviews.llvm.org/D5699 llvm-svn: 222340	2014-11-19 10:06:49 +00:00
David Blaikie	70573dcd9f	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334	2014-11-19 07:49:26 +00:00
David Blaikie	5106ce7897	Remove StringMap::GetOrCreateValue in favor of StringMap::insert Having two ways to do this doesn't seem terribly helpful and consistently using the insert version (which we already has) seems like it'll make the code easier to understand to anyone working with standard data structures. (I also updated many references to the Entry's key and value to use first() and second instead of getKey{Data,Length,} and get/setValue - for similar consistency) Also removes the GetOrCreateValue functions so there's less surface area to StringMap to fix/improve/change/accommodate move semantics, etc. llvm-svn: 222319	2014-11-19 05:49:42 +00:00
Owen Anderson	b5a259935c	Fix an incorrect chain operand when expanding INSERT_VECTOR operations through the stack. Patch by Daniil Troshkov! llvm-svn: 222254	2014-11-18 20:50:19 +00:00
Frederic Riss	fdccfc1e19	Allow DwarfCompileUnit::constructImportedEntityDIE to instanciate a GlobalVariable DIE. Usually global variables are in a retain list and instanciated before any call to constructImportedEntityDIE is made. This isn't true for forward declarations though. The testcase for this change is generated by a clang patched to emit such forward declarations (patch at http://reviews.llvm.org/D6173 which will land soon). The updated testcase tests more than just global variables, it now tests every type of 'using' clause we support. llvm-svn: 222217	2014-11-18 02:46:11 +00:00
Manman Ren	554865da5b	Debug Info: In DIBuilder, the context field of a global variable is updated to use DIScopeRef. A paired commit at clang will follow to show cases where we will use an identifer for the context of a global variable. rdar://18958417 llvm-svn: 222195	2014-11-18 00:29:08 +00:00
Oliver Stannard	d29db9b949	Fix optimisations of SELECT_CC which assumed result is boolean Some optimisations in DAGCombiner cause miscompilations for targets that use TargetLowering::UndefinedBooleanContent, because they assume that the results of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and XORed. These optimisations are only valid for targets that use ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent. This is a follow-up to D6210/r221693. llvm-svn: 222123	2014-11-17 10:49:31 +00:00
Craig Topper	f98c606479	Add missing semicolon from r222118. llvm-svn: 222119	2014-11-17 05:58:26 +00:00
Craig Topper	cf0444ba2a	Move register class name strings to a single array in MCRegisterInfo to reduce static table size and number of relocation entries. Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table. llvm-svn: 222118	2014-11-17 05:50:14 +00:00
Craig Topper	6438fc3d05	Replace a couple asserts with static_asserts. llvm-svn: 222114	2014-11-17 00:26:50 +00:00
Craig Topper	7f416c8acb	Convert some EVTs to MVTs where only a SimpleValueType is needed. llvm-svn: 222109	2014-11-16 21:17:18 +00:00
Andrea Di Biagio	e13a0b81f4	[DAG] Improved target independent vector shuffle folding logic. This patch teaches the DAGCombiner how to combine shuffles according to rules: shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2) llvm-svn: 222090	2014-11-15 22:56:25 +00:00
Reid Kleckner	c2291f3905	Rename EH related stuff to be more precise Summary: The current "WinEH" exception handling type is more about Itanium-style LSDA tables layered on top of the Windows native unwind info format instead of .eh_frame tables or EHABI unwind info. Use the name "ItaniumWinEH" to better reflect the hybrid nature of the design. Also rename isExceptionHandlingDWARF to usesItaniumLSDAForExceptions, since the LSDA is part of the Itanium C++ ABI document, and not the DWARF standard. Reviewers: echristo Subscribers: llvm-commits, compnerd Differential Revision: http://reviews.llvm.org/D6279 llvm-svn: 222062	2014-11-14 23:31:07 +00:00
Reid Kleckner	283bc2ed28	Allow the use of functions as typeinfo in landingpad clauses This is one step towards supporting SEH filter functions in LLVM. llvm-svn: 221954	2014-11-14 00:35:50 +00:00
Reid Kleckner	971c3ea67b	Use nullptr instead of NULL for variadic sentinels Windows defines NULL to 0, which when used as an argument to a variadic function, is not a null pointer constant. As a result, Clang's -Wsentinel fires on this code. Using '0' would be wrong on most 64-bit platforms, but both MSVC and Clang make it work on Windows. Sidestep the issue with nullptr. llvm-svn: 221940	2014-11-13 22:55:19 +00:00
Aditya Nandakumar	3053155652	We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed. llvm-svn: 221926	2014-11-13 21:29:21 +00:00
Aditya Nandakumar	a27193297f	This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively llvm-svn: 221878	2014-11-13 09:26:31 +00:00
Frederic Riss	0f7abef2cf	Add an assert and a test that verify r221709's fix. llvm-svn: 221854	2014-11-13 03:20:23 +00:00
Quentin Colombet	f5485bb008	[CodeGenPrepare] Handle zero extensions in the TypePromotionHelper. Prior to this patch the TypePromotionHelper was promoting only sign extensions. Supporting zero extensions changes: - How constants are extended. - How sign extensions, zero extensions, and truncate are composed together. - How the type of the extended operation is recorded. Now we need to know the kind of the extension as well as its type. Each change is fairly small, unlike the diff. Most of the diff are comments/variable renaming to say "extension" instead of "sign extension". The performance improvements on the test suite are within the noise. Related to <rdar://problem/18310086>. llvm-svn: 221851	2014-11-13 01:44:51 +00:00
Frederic Riss	3a6b354b3e	Fix emission of Dwarf accelerator table when there are multiple CUs. The DIE offset in the accel tables is an offset relative to the start of the debug_info section, but we were encoding the offset to the start of the containing CU. llvm-svn: 221837	2014-11-12 23:48:14 +00:00
Ahmed Bougacha	026600d967	[CodeGenPrepare] Replace other uses of EVT::getEVT with TL::getValueType. r221820 fixed a problem (PR21548) where an iPTR was used in TLI legality checks, which isn't valid and resulted in a failed assertion. The solution was to lower pointer types into the correct target's VT, by using TL::getValueType instead of EVT::getEVT. This commit changes 3 other uses of EVT::getEVT, but without any tests: - One of these non-lowered EVTs is passed to allowsMisalignedMemoryAccesses, which goes into target's TL implementation and doesn't cause any problem (yet.) - Two others are passed to TLI.isOperationLegalOrCustom: - one only looks at extensions, so doesn't concern pointers. - one only looks at binary operators, so also isn't a problem. The latter might some day be exposed to pointers and cause the same assert as the original PR, because there's a comment hinting at also supporting cast ops. For consistency, update all of them and be done with it. llvm-svn: 221827	2014-11-12 23:05:03 +00:00
Ahmed Bougacha	0788d49a40	[CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead. Fixes PR21548. Related to PR20474. llvm-svn: 221820	2014-11-12 22:16:55 +00:00
Timur Iskhodzhanov	0e76a16200	Temporary fix for PR21528 - use mangled C++ function names in COFF debug info to un-break ASan on Windows llvm-svn: 221813	2014-11-12 20:21:20 +00:00
Timur Iskhodzhanov	a11b32b7e5	[COFF] Make it clearer that the symbols subsection holds function display name rather than just name llvm-svn: 221812	2014-11-12 20:10:09 +00:00
Duncan P. N. Exon Smith	de36e8040f	Revert "IR: MDNode => Value" Instead, we're going to separate metadata from the Value hierarchy. See PR21532. This reverts commit r221375. This reverts commit r221373. This reverts commit r221359. This reverts commit r221167. This reverts commit r221027. This reverts commit r221024. This reverts commit r221023. This reverts commit r220995. This reverts commit r220994. llvm-svn: 221711	2014-11-11 21:30:22 +00:00

1 2 3 4 5 ...

17679 Commits