llvm-project

Commit Graph

Author	SHA1	Message	Date
Duncan P. N. Exon Smith	10be9a8868	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206707, reapplying r206704. The preceding commit to CalcSpillWeights should have sorted out the failing buildbots. <rdar://problem/14292693> llvm-svn: 206766	2014-04-21 17:57:07 +00:00
Duncan P. N. Exon Smith	e63327e967	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206704, as expected. llvm-svn: 206707	2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith	875ddfac75	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite. I've done a careful audit, added some asserts, and fixed a couple of bugs (unfortunately, they were in unlikely code paths). There's a small chance that this will appease the failing bots [1][2]. (If so, great!) If not, I have a follow-up commit ready that will temporarily add -debug-only=block-freq to the two failing tests, allowing me to compare the code path between what the failing bots and what my machines (and the rest of the bots) are doing. Once I've triggered those builds, I'll revert both commits so the bots go green again. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 <rdar://problem/14292693> llvm-svn: 206704	2014-04-19 22:34:26 +00:00
Duncan P. N. Exon Smith	76b813619a	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206666, as planned. Still stumped on why the bots are failing. Sanitizer bots haven't turned anything up. If anyone can help me debug either of the failures (referenced in r206666) I'll owe them a beer. (In the meantime, I'll be auditing my patch for undefined behaviour.) llvm-svn: 206677	2014-04-19 00:42:46 +00:00
Duncan P. N. Exon Smith	b3caf3646f	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206628, reapplying r206622 (and r206626). Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce on Darwin, and Chandler can't reproduce on Linux. Asan and valgrind don't tell us anything, but we're hoping the msan bot will catch it. So, I'm applying this again to get more feedback from the bots. I'll leave it in long enough to trigger builds in at least the sanitizer buildbots (it was failing for reasons unrelated to my commit last time it was in), and hopefully a few others.... and then I expect to revert a third time. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 llvm-svn: 206666	2014-04-18 22:30:03 +00:00
Duncan P. N. Exon Smith	0842ff36a6	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2 ) This reverts commit r206622 and the MSVC fixup in r206626. Apparently the remotely failing tests are still failing, despite my attempt to fix the nondeterminism in r206621. llvm-svn: 206628	2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith	f8361d127a	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206556, effectively reapplying commit r206548 and its fixups in r206549 and r206550. In an intervening commit I've added target triples to the tests that were failing remotely [1] (but passing locally). I'm hoping the mystery is solved? I'll revert this again if the tests are still failing remotely. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206622	2014-04-18 17:22:25 +00:00
Duncan P. N. Exon Smith	e576167df8	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commits r206548, r206549 and r206549. There are some unit tests failing that aren't failing locally [1], so reverting until I have time to investigate. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206556	2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith	12e68e1733	blockfreq: Rewrite BlockFrequencyInfoImpl Rewrite the shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo entirely. The old implementation had a fundamental flaw: precision losses from nested loops (or very wide branches) compounded past loop exits (and convergence points). The @nested_loops testcase at the end of test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This function has three nested loops, with branch weights in the loop headers of 1:4000 (exit:continue). The old analysis gives non-sensical results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': ---- Block Freqs ---- entry = 1.0 for.cond1.preheader = 1.00103 for.cond4.preheader = 5.5222 for.body6 = 18095.19995 for.inc8 = 4.52264 for.inc11 = 0.00109 for.end13 = 0.0 The new analysis gives correct results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': block-frequency-info: nested_loops - entry: float = 1.0, int = 8 - for.cond1.preheader: float = 4001.0, int = 32007 - for.cond4.preheader: float = 16008001.0, int = 128064007 - for.body6: float = 64048012001.0, int = 512384096007 - for.inc8: float = 16008001.0, int = 128064007 - for.inc11: float = 4001.0, int = 32007 - for.end13: float = 1.0, int = 8 Most importantly, the frequency leaving each loop matches the frequency entering it. The new algorithm leverages BlockMass and PositiveFloat to maintain precision, separates "probability mass distribution" from "loop scaling", and uses dithering to eliminate probability mass loss. I have unit tests for these types out of tree, but it was decided in the review to make the classes private to BlockFrequencyInfoImpl, and try to shrink them (or remove them entirely) in follow-up commits. The new algorithm should generally have a complexity advantage over the old. The previous algorithm was quadratic in the worst case. The new algorithm is still worst-case quadratic in the presence of irreducible control flow, but it's linear without it. The key difference between the old algorithm and the new is that control flow within a loop is evaluated separately from control flow outside, limiting propagation of precision problems and allowing loop scale to be calculated independently of mass distribution. Loops are visited bottom-up, their loop scales are calculated, and they are replaced by pseudo-nodes. Mass is then distributed through the function, which is now a DAG. Finally, loops are revisited top-down to multiply through the loop scales and the masses distributed to pseudo nodes. There are some remaining flaws. - Irreducible control flow isn't modelled correctly. LoopInfo and MachineLoopInfo ignore irreducible edges, so this algorithm will fail to scale accordingly. There's a note in the class documentation about how to get closer. See also the comments in test/Analysis/BlockFrequencyInfo/irreducible.ll. - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. - The "bias" calculation proposed on llvmdev is not incorporated here. This will be added in a follow-up commit, once comments from this review have been handled. llvm-svn: 206548	2014-04-18 01:57:45 +00:00
Richard Osborne	da16ff47cd	[XCore] Don't create invalid MKMSK instructions inside loadImmediate(). Summary: Previously loadImmediate() would produce MKMSK instructions with invalid immediate values such as mkmsk r0, 9. Fix this by checking the mask size is valid. Reviewers: robertlytton Reviewed By: robertlytton CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3289 llvm-svn: 206163	2014-04-14 12:30:35 +00:00
Richard Osborne	47155af5eb	[XCore] Add support for the "m" inline asm constraint. Summary: This provides support for CP and DP relative global accesses in inline asm. Reviewers: robertlytton Reviewed By: robertlytton Differential Revision: http://llvm-reviews.chandlerc.com/D2943 llvm-svn: 203129	2014-03-06 16:37:48 +00:00
Richard Osborne	1b5fc39710	[XCore] Fix call of absolute address. Previously for: tail call void inttoptr (i64 65536 to void ()*)() nounwind We would emit: bl 65536 The immediate operand of the bl instruction is a relative offset so it is wrong to use the absolute address here. llvm-svn: 202860	2014-03-04 16:50:30 +00:00
Richard Osborne	521bdf211d	[XCore] Support functions returning more than 4 words. If a function returns a large struct by value return the first 4 words in registers and the rest on the stack in a location reserved by the caller. This is needed to support the xC language which supports functions returning an arbitrary number of return values. This is r202397 reapplied with a fix to avoid an uninitialized read of a member. llvm-svn: 202414	2014-02-27 17:47:54 +00:00
Richard Osborne	527aa5052d	Revert r202396, r202397. These are causing test failures, revert for now. llvm-svn: 202398	2014-02-27 14:24:13 +00:00
Richard Osborne	e82bf0988e	[XCore] Support functions returning more than 4 words. Summary: If a function returns a large struct by value return the first 4 words in registers and the rest on the stack in a location reserved by the caller. This is needed to support the xC language which supports functions returning an arbitrary number of return values. Reviewers: robertlytton Reviewed By: robertlytton CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2889 llvm-svn: 202397	2014-02-27 14:00:40 +00:00
Richard Osborne	a283d24ad9	[XCore] Target optimized library function __memcpy_4() Summary: If the src, dst and size of a memcpy are known to be 4 byte aligned we can call __memcpy_4() instead of memcpy(). Reviewers: robertlytton Reviewed By: robertlytton CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2871 llvm-svn: 202395	2014-02-27 13:39:07 +00:00
Richard Osborne	d6e85018c5	[XCore] Add dag combines for instructions that ignore some input bits. These instructions ignore the high bits of one of their input operands - try and use this to simplify the code. llvm-svn: 202394	2014-02-27 13:20:11 +00:00
Richard Osborne	2d3a2bee41	[XCore] Provide information about known zero bits of resource instructions. llvm-svn: 202393	2014-02-27 13:20:06 +00:00
Andrew Trick	9f240f742b	Use regnum regex in an XCore test case. llvm-svn: 202315	2014-02-26 23:22:49 +00:00
Andrew Trick	2560d11e72	Very temporarily XFAILing a test. Will be fixed shortly. llvm-svn: 202310	2014-02-26 22:39:59 +00:00
Richard Osborne	50e3b7f759	[XCore] Add intrinsic for CLRPT (clear port time) instruction. llvm-svn: 202172	2014-02-25 17:31:15 +00:00
Richard Osborne	92fdd3491a	[XCore] Add intrinsic for EDU (event disable unconditional) instruction. llvm-svn: 202171	2014-02-25 17:31:06 +00:00
Richard Osborne	8b7466e886	[XCore] Prefer to word align functions. The behaviour of the XCore's instruction buffer means that the performance of the same code sequence can differ depending on whether it starts at a 4 byte aligned address or not. Since we don't model the instruction buffer in the backend we have no way of knowing for sure if it is beneficial to word align a specific function. However, in the absence of precise modelling, it is better on balance to word align functions because: * It makes a fetch-nop while executing the prologue slightly less likely. * If we don't word align functions then a small perturbation in one function can have a dramatic knock on effect. If the size of the function changes it might change the alignment and therefore the performance of all the functions that happen to follow it in the binary. This butterfly effect makes it harder to reason about and measure the performance of code. llvm-svn: 202163	2014-02-25 16:37:15 +00:00
Robert Lytton	346e808ec6	XCore target: Handle common linkage llvm-svn: 201563	2014-02-18 11:21:59 +00:00
Robert Lytton	af6c256c34	XCore target: Fix llvm.eh.return and EH info register handling llvm-svn: 201561	2014-02-18 11:21:48 +00:00
Robert Lytton	70b5ba49c3	XCore target: fix const section handling Xcore target ABI requires const data that is externally visible to be handled differently if it has C-language linkage rather than C++ language linkage. Clang now emits ".cp.rodata" section information. All other externally visible constant data will be placed in the DP section. llvm-svn: 201144	2014-02-11 10:36:26 +00:00
Robert Lytton	9b6bb461b1	XCore target: Lower ATOMIC_LOAD & ATOMIC_STORE llvm-svn: 201143	2014-02-11 10:36:18 +00:00
Benjamin Kramer	c10563d14e	Fix broken CHECK lines. llvm-svn: 199016	2014-01-11 21:06:00 +00:00
Robert Lytton	9523aa41fb	XCore Target: correct callee save register spilling when callsUnwindInit is true. llvm-svn: 198616	2014-01-06 14:21:12 +00:00
Robert Lytton	c8c4aa667b	XCore target: Lower EH_RETURN llvm-svn: 198615	2014-01-06 14:21:07 +00:00
Robert Lytton	5da175214b	XCore target: Lower FRAME_TO_ARGS_OFFSET This requires a knowledge of the stack size which is not known until the frame is complete, hence the need for the XCoreFTAOElim pass which lowers the XCoreISD::FRAME_TO_ARGS_OFFSET instrution into its final form. llvm-svn: 198614	2014-01-06 14:21:00 +00:00
Robert Lytton	dec798751a	XCore target: Lower RETURNADDR Only handles a depth of zero (the same as FRAMEADDR) llvm-svn: 198613	2014-01-06 14:20:53 +00:00
Robert Lytton	cbb588a264	XCore target: Optimise entsp / retsp selection llvm-svn: 198612	2014-01-06 14:20:47 +00:00
Robert Lytton	bc4d976152	XCore target: fix handling of unsized global arrays in large code model llvm-svn: 198609	2014-01-06 14:20:32 +00:00
Robert Lytton	7fbca3cce0	XCore target: Make handling of large frames not dependent upon an FP. eliminateFrameIndex() has been reworked to handle both small & large frames with either a FP or SP. An additional Slot is required for Scavenging spills when not using FP for large frames. Reworked the handling of Register Scavenging. Whether we are using an FP or not, whether it is a large frame or not, and whether we are using a large code model or not are now independent. llvm-svn: 196091	2013-12-02 11:05:28 +00:00
Robert Lytton	d3ffa66c6c	XCore target: fix large code model 'select' indirect address handling. llvm-svn: 196088	2013-12-02 10:18:37 +00:00
Robert Lytton	ff38d37c77	XCore target: Add large code model When using large code model: Global objects larger than 'CodeModelLargeSize' bytes are placed in sections named with a trailing ".large" The folded global address of such objects are lowered into the const pool. During inspection it was noted that LowerConstantPool() was using a default offset of zero. A fix was made, but due to only offsets of zero being generated, testing only verifies the change is not detrimental. Correct the flags emitted for explicitly specified sections. We assume the size of the object queried by getSectionForConstant() is never greater than CodeModelLargeSize. To handle greater than CodeModelLargeSize, changes to AsmPrinter would be required. llvm-svn: 196087	2013-12-02 10:18:31 +00:00
Robert Lytton	7f4d1c9d11	XCore target: extend tests in preparation llvm-svn: 196086	2013-12-02 10:18:24 +00:00
Robert Lytton	0abd2c96b5	XCore target: Fix eliminateFrameIndex() to handle large frames Large frame offsets are loaded from the ConstantPool. Where possible, offsets are encoded using the smaller MKMSK instruction. Large frame offsets can only be used when there is a frame-pointer. llvm-svn: 196085	2013-12-02 10:18:19 +00:00
Robert Lytton	a9f984fb76	XCore target: Enable frames larger than 65535 to be lowered llvm-svn: 196084	2013-12-02 10:18:14 +00:00
Rafael Espindola	4929301af4	Error if we see an alias to a declaration. In ELF and COFF an alias is just another offset in a section. There is no way to represent an alias to something in another file. In MachO, the spec has the N_INDR type which should allow for exactly that, but is not currently implemented. Given that it is specified but not implemented, we error in codegen to avoid miscompiling but don't reject aliases to declarations in the verifier to leave the option open of implementing it. In the past we have used alias to declarations as a way of implementing weakref, which is why it exists in some old tests which this patch updates. llvm-svn: 194705	2013-11-14 13:58:06 +00:00
Robert Lytton	a83c0482dd	XCore target: implement exception handling llvm-svn: 194564	2013-11-13 10:19:31 +00:00
Robert Lytton	494591b87f	XCore target: fix bug in aligning 'byval i8*' on the stack llvm-svn: 194466	2013-11-12 10:11:35 +00:00
Robert Lytton	f7f0c5e326	XCore target test for hidden declaration llvm-svn: 194465	2013-11-12 10:11:30 +00:00
Robert Lytton	61d9149c73	Add XCore support for ATOMIC_FENCE. ATOMIC_FENCE is lowered to a compiler barrier which is codegen only. There is no need to emit an instructions since the XCore provides sequential consistency. Original patch by Richard Osborne llvm-svn: 194464	2013-11-12 10:11:26 +00:00
Robert Lytton	ed835b6fd4	XCore target: return error for unsupported alignment llvm-svn: 194463	2013-11-12 10:11:05 +00:00
Robert Lytton	4eec834c2b	XCore target fix bug in emitArrayBound() causing segmentation fault llvm-svn: 192434	2013-10-11 10:27:13 +00:00
Robert Lytton	117deb9695	XCore target does not emit '.hidden' or '.protected' attributes llvm-svn: 192433	2013-10-11 10:27:00 +00:00
Robert Lytton	9715375c25	XCore target: fix bug in XCoreLowerThreadLocal.cpp When a ConstantExpr which uses a thread local is part of a PHI node instruction, the insruction that replaces the ConstantExpr must be inserted in the predecessor block, in front of the terminator instruction. If the predecessor block has multiple successors, the edge is first split. llvm-svn: 192432	2013-10-11 10:26:48 +00:00
Robert Lytton	3139db02c8	XCore target: add XCoreTargetLowering::isZExtFree() llvm-svn: 192431	2013-10-11 10:26:29 +00:00

1 2 3 4

164 Commits