llvm-project

Commit Graph

Author	SHA1	Message	Date
Charlie Turner	8b2caa458f	Emit the build attribute Tag_conformance. Claim conformance to version 2.09 of the ARM ABI. This build attribute must be emitted first amongst the build attributes when written to an object file. This is to simplify conformance detection by consumers. Change-Id: If9eddcfc416bc9ad6e5cc8cdcb05d0031af7657e llvm-svn: 225166	2015-01-05 13:12:17 +00:00
Karthik Bhat	8ec742c2f9	Select lower sub,abs pattern to sabd on AArch64 This patch lowers patterns such as- sub v0.4s, v0.4s, v1.4s abs v0.4s, v0.4s to sabd v0.4s, v0.4s, v1.4s on AArch64. Review: http://reviews.llvm.org/D6781 llvm-svn: 225165	2015-01-05 13:11:07 +00:00
Michael Kuperstein	6ae456b0d7	Fix broken test from r225159. llvm-svn: 225164	2015-01-05 12:34:01 +00:00
Chandler Carruth	539dc4b9d5	[PM] Don't run the machinery of invalidating all the analysis passes when all are being preserved. We want to short-circuit this for a couple of reasons. One, I don't really want passes to grow a dependency on actually receiving their invalidate call when they've been preserved. I'm thinking about removing this entirely. But more importantly, preserving everything is likely to be the common case in a lot of scenarios, and it would be really good to bypass all of the invalidation and preservation machinery there. Avoiding calling N opaque functions to try to invalidate things that are by definition still valid seems important. =] This wasn't really inspired by much other than seeing the spam in the logging for analyses, but it seems better to get it checked in rather than forgetting about it. llvm-svn: 225163	2015-01-05 12:32:11 +00:00
Chandler Carruth	e5e8fb3bf6	[PM] Add names and debug logging for analysis passes to the new pass manager. This starts to allow us to test analyses more easily, but it's really only the beginning. Some of the code here is still untestable without manual changes to create analysis passes, but I wanted to factor it into chunks as small as possible. Next up in order to be able to test things are, in no particular order: - No-op analysis passes so we don't have to use real ones to exercise the pass manager itself. - Automatic way of generating dummy passes that require an analysis be run, including a variant that calls a 'print' method on a pass to make it even easier to print out the results of an analysis. - Dummy passes that invalidate all analyses for their IR unit so we can test invalidation and re-runs. - Automatic way to print each analysis pass as it is re-run. - Automatic but optional verification of analysis passes everywhere possible. I'm not claiming I'll get to all of these immediately, but that's what is in the pipeline at some stage. I'm fleshing out exactly what I need and what to prioritize by working on converting analyses and then trying to test the conversion. =] llvm-svn: 225162	2015-01-05 12:21:44 +00:00
Jiangning Liu	40c1b35292	Fixed a bug in memory dependence checking module of loop vectorization. The following loop should not be vectorized with current algorithm. {code} // loop body ... = a[i] (1) ... = a[i+1] (2) ....... a[i+1] = .... (3) a[i] = ... (4) {code} The algorithm tries to collect memory access candidates from AliasSetTracker, and then check memory dependences one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists. This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed. For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized. The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile-time visibly. llvm-svn: 225159	2015-01-05 10:08:58 +00:00
Hal Finkel	9bb61de1be	[PowerPC] Enable speculation of cttz/ctlz PPC has an instruction for ctlz with defined zero behavior, and our lowering of cttz (provided by DAGCombine) is also efficient and branchless, so speculating these makes sense. llvm-svn: 225150	2015-01-05 05:24:42 +00:00
Chandler Carruth	73b0164fe5	[SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an assert out of the new pre-splitting in SROA. This fix makes the code do what was originally intended -- when we have a store of a load both dealing in the same alloca, we force them to both be pre-split with identical offsets. This is really quite hard to do because we can keep discovering problems as we go along. We have to track every load over the current alloca which for any reason becomes invalid for pre-splitting, and go back to remove all stores of those loads. I've included a couple of test cases derived from PR22093 that cover the different ways this can happen. While that PR only really triggered the first of these two, it's the same fundamental issue. The other challenge here is documented in a FIXME now. We end up being quite a bit more aggressive for pre-splitting when loads and stores don't refer to the same alloca. This aggressiveness comes at the cost of introducing potentially redundant loads. It isn't clear that this is the right balance. It might be considerably better to require that we only do pre-splitting when we can presplit every load and store involved in the entire operation. That would give more consistent if conservative results. Unfortunately, it requires a non-trivial change to the actual pre-splitting operation in order to correctly handle cases where we end up pre-splitting stores out-of-order. And it isn't 100% clear that this is the right direction, although I'm starting to suspect that it is. llvm-svn: 225149	2015-01-05 04:17:53 +00:00
Hal Finkel	2f61879ff4	[PowerPC] Materialize i64 constants using rotation with masking r225135 added the ability to materialize i64 constants using rotations in order to reduce the instruction count. Sometimes we can use a rotation only with some extra masking, so that we take advantage of the fact that generating a bunch of extra higher-order 1 bits is easy using li/lis. llvm-svn: 225147	2015-01-05 03:41:38 +00:00
Chandler Carruth	9c31db4f94	[PM] Wire up support for explicitly running the verifier pass. The required functionality has been there for some time, but I never managed to actually wire it into the command line registry of passes. Let's do that. llvm-svn: 225144	2015-01-05 00:08:53 +00:00
Simon Pilgrim	b65a6ee831	[X86][SSE] Added vector packing test for pr12412 llvm-svn: 225138	2015-01-04 19:08:03 +00:00
Simon Pilgrim	a1540c11ec	[X86][SSE] Added vector integer truncation tests - based off pr15524 llvm-svn: 225137	2015-01-04 17:52:00 +00:00
Hal Finkel	241ba79f95	[PowerPC] Materialize i64 constants using rotation Materializing full 64-bit constants on PPC64 can be expensive, requiring up to 5 instructions depending on the locations of the non-zero bits. Sometimes materializing a rotated constant, and then applying the inverse rotation, requires fewer instructions than the direct method. If so, do that instead. In r225132, I added support for forming constants using bit inversion. In effect, this reverts that commit and replaces it with rotation support. The bit inversion is useful for turning constants that are mostly ones into ones that are mostly zeros (thus enabling a more-efficient shift-based materialization), but the same effect can be obtained by using negative constants and a rotate, and that is at least as efficient, if not more. llvm-svn: 225135	2015-01-04 15:43:55 +00:00
Hal Finkel	ca6375fb75	[PowerPC] Materialize i64 constants using bit inversion Materializing full 64-bit constants on PPC64 can be expensive, requiring up to 5 instructions depending on the locations of the non-zero bits. Sometimes materializing the bit-reversed constant, and then flipping the bits, requires fewer instructions than the direct method. If so, do that instead. llvm-svn: 225132	2015-01-04 12:35:03 +00:00
David Majnemer	087dc8b831	InstCombine: match can find ConstantExprs, don't assume we have a Value We assumed the output of a match was a Value, this would cause us to assert because we would fail a cast<>. Instead, use a helper in the Operator family to hide the distinction between Value and Constant. This fixes PR22087. llvm-svn: 225127	2015-01-04 07:36:02 +00:00
David Majnemer	6ee8d17bc6	ValueTracking: ComputeNumSignBits should tolerate misshapen phi nodes PHI nodes can have zero operands in the middle of a transform. It is expected that utilities in Analysis don't freak out when this happens. Note that it is considered invalid to allow these misshapen phi nodes to make it to another pass. This fixes PR22086. llvm-svn: 225126	2015-01-04 07:06:53 +00:00
Saleem Abdulrasool	ddd926441e	llvm-readobj: add support to dump COFF export tables This enhances llvm-readobj to print out the COFF export table, similar to the -coff-import option. This is useful for testing in lld. llvm-svn: 225120	2015-01-03 21:35:09 +00:00
Saleem Abdulrasool	67f729933f	ARM: permit tail calls to weak externals on COFF Weak externals are resolved statically, so we can actually generate the tail call on PE/COFF targets without breaking the requirements. It is questionable whether we want to propagate the current behaviour for MachO as the requirements are part of the ARM ELF specifications, and it seems that prior to the SVN r215890, we would have tail'ed the call. For now, be conservative and only permit it on PE/COFF where the call will always be fully resolved. llvm-svn: 225119	2015-01-03 21:35:00 +00:00
Hal Finkel	5772566ed6	[PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference The existing code provided for specifying a global loop alignment preference. However, the preferred loop alignment might depend on the loop itself. For recent POWER cores, loops between 5 and 8 instructions should have 32-byte alignment (while the others are better with 16-byte alignment) so that the entire loop will fit in one i-cache line. To support this, getPrefLoopAlignment has been made virtual, and can be provided with an optional MachineLoop* so the target can inspect the loop before answering the query. The default behavior, as before, is to return the value set with setPrefLoopAlignment. MachineBlockPlacement now queries the target for each loop instead of only once per function. There should be no functional change for other targets. llvm-svn: 225117	2015-01-03 17:58:24 +00:00
Hal Finkel	d73bfba7eb	[PowerPC] Use 16-byte alignment for modern cores for functions/loops Most modern PowerPC cores prefer that functions and loops start on 16-byte-aligned boundaries (*), so instruct block placement, etc. to make this happen. The branch selector has also been adjusted to account for the extra nops that might now be inserted before loop headers. (*) Some cores actually prefer other alignments for small loops, but that will be addressed in a follow-up commit. llvm-svn: 225115	2015-01-03 14:58:25 +00:00
Hal Finkel	4edc66b8de	[PowerPC] Add support for the CMPB instruction Newer POWER cores, and the A2, support the cmpb instruction. This instruction compares its operands, treating each of the 8 bytes in the GPRs separately, returning a 'mask' result of 0 (for false) or -1 (for true) in each byte. Code generation support is added, in the form of a PPCISelDAGToDAG DAG-preprocessing routine, that recognizes patterns close to what the instruction computes (either exactly, or related by a constant masking operation), and generates the cmpb instruction (along with any necessary constant masking operation). This can be expanded if use cases arise. llvm-svn: 225106	2015-01-03 01:16:37 +00:00
Kostya Serebryany	d421db05bb	[asan] simplify the tracing code, make it use the same guard variables as coverage llvm-svn: 225103	2015-01-03 00:54:43 +00:00
Craig Topper	ae8e1b3831	[X86] Disassembler support for move to/from %rax with a 32-bit memory offset if REX.W and AdSize prefixes are both present. llvm-svn: 225099	2015-01-03 00:00:20 +00:00
David Majnemer	c8a576b5c0	InstCombine: Detect when llvm.umul.with.overflow always overflows We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero and LHSKnownOne * RHSKnownOne overflow. llvm-svn: 225077	2015-01-02 07:29:47 +00:00
Craig Topper	055845f5cb	[X86] Make the instructions that use AdSize16/32/64 co-exist together without using mode predicates. This is necessary to allow the disassembler to be able to handle AdSize32 instructions in 64-bit mode when address size prefix is used. Eventually we should probably also support 'addr32' and 'addr16' in the assembler to override the address size on some of these instructions. But for now we'll just use special operand types that will lookup the current mode size to select the right instruction. llvm-svn: 225075	2015-01-02 07:02:25 +00:00
Chandler Carruth	24ac830d7c	[SROA] Teach SROA to be more aggressive in splitting now that we have a pre-splitting pass over loads and stores. Historically, splitting could cause enough problems that I hamstrung the entire process with a requirement that splittable integer loads and stores must cover the entire alloca. All smaller loads and stores were unsplittable to prevent chaos from ensuing. With the new pre-splitting logic that does load/store pair splitting I introduced in r225061, we can now very nicely handle arbitrarily splittable loads and stores. In order to fully benefit from these smarts, we need to mark all of the integer loads and stores as splittable. However, we don't actually want to rewrite partitions with all integer loads and stores marked as splittable. This will fail to extract scalar integers from aggregates, which is kind of the point of SROA. =] In order to resolve this, what we really want to do is only do pre-splitting on the alloca slices with integer loads and stores fully splittable. This allows us to uncover all non-integer uses of the alloca that would benefit from a split in an integer load or store (and where introducing the split is safe because it is just memory transfer from a load to a store). Once done, we make all the non-whole-alloca integer loads and stores unsplittable just as they have historically been, repartition and rewrite. The result is that when there are integer loads and stores anywhere within an alloca (such as from a memcpy of a sub-object of a larger object), we can split them up if there are non-integer components to the aggregate hiding beneath. I've added the challenging test cases to demonstrate how this is able to promote to scalars even a case where we have even partially overlapping loads and stores. This restores the single-store behavior for small arrays of i8s which is really nice. I've restored both the little endian testing and big endian testing for these exactly as they were prior to r225061. 
It also forced me to be more aggressive in an alignment test to actually defeat SROA. =] Without the added volatiles there, we actually split up the weird i16 loads and produce nice double allocas with better alignment. This also uncovered a number of bugs where we failed to handle splittable load and store slices which didn't have a beginning offset of zero. Those fixes are included, and without them the existing test cases explode in glorious fireworks. =] I've kept support for leaving whole-alloca integer loads and stores as splittable even for the purpose of rewriting, but I think that's likely no longer needed. With the new pre-splitting, we might be able to remove all the splitting support for loads and stores from the rewriter. Not doing that in this patch to try to isolate any performance regressions that causes in an easy to find and revert chunk. llvm-svn: 225074	2015-01-02 03:55:54 +00:00
Chandler Carruth	e65ae89327	[SROA] Add a test case for r225068 / PR22080. llvm-svn: 225070	2015-01-02 00:34:29 +00:00
Chandler Carruth	0715cba02d	[SROA] Teach SROA how to much more intelligently handle split loads and stores. When there are accesses to an entire alloca with an integer load or store as well as accesses to small pieces of the alloca, SROA splits up the large integer accesses. In order to do that, it uses bit math to merge the small accesses into large integers. While this is effective, it produces insane IR that can cause significant problems in the rest of the optimizer: - It can cause load and store mismatches with GVN on the non-alloca side where we end up loading an i64 (or some such) rather than loading specific elements that are stored. - We can't always get rid of the integer bit math, which is why we can't always fix the loads and stores to work well with GVN. - This is especially bad when we have operations that mix poorly with integer bit math such as floating point operations. - It will block things like the vectorizer which might be able to handle the scalar stores that underlie the aggregate. At the same time, we can't just directly split up these loads and stores in all cases. If there is actual integer arithmetic involved on the values, then using integer bit math is actually the perfect lowering because we can often combine it heavily with the surrounding math. The solution this patch provides is to find places where SROA is partitioning aggregates into small elements, and look for splittable loads and stores that it can split all the way to some other adjacent load and store. These are uniformly the cases where failing to split the loads and stores hurts the optimizer that I have seen, and I've looked extensively at the code produced both from more and less aggressive approaches to this problem. However, it is quite tricky to actually do this in SROA. We may have loads and stores to the same alloca, or other complex patterns that are hard to handle. This complexity leads to the somewhat subtle algorithm implemented here. 
We have to do this entire process as a separate pass over the partitioning of the alloca, and split up all of the loads prior to splitting the stores so that we can handle safely the cases of overlapping, including partially overlapping, loads and stores to the same alloca. We also have to reconstitute the post-split slice configuration so we can avoid iterating again over all the alloca uses (the slow part of SROA). But we also have to ensure that when we split up loads and stores to other allocas, we do re-iterate over them in SROA to adapt to the more refined partitioning now required. With this, I actually think we can fix a long-standing TODO in SROA where I avoided splitting as many loads and stores as probably should be splittable. This limitation historically mitigated the fallout of all the bad things mentioned above. Now that we have more intelligent handling, I plan to remove the FIXME and more aggressively mark integer loads and stores as splittable. I'll do that in a follow-up patch to help with bisecting any fallout. The net result of this change should be more fine-grained and accurate scalars being formed out of aggregates. At the very least, Clang now generates perfect code for this high-level test case using std::complex<float>: #include <complex> void g1(std::complex<float> &x, float a, float b) { x += std::complex<float>(a, b); } void g2(std::complex<float> &x, float a, float b) { x -= std::complex<float>(a, b); } void foo(const std::complex<float> &x, float a, float b, std::complex<float> &x1, std::complex<float> &x2) { std::complex<float> l1 = x; g1(l1, a, b); std::complex<float> l2 = x; g2(l2, a, b); x1 = l1; x2 = l2; } This code isn't just hypothetical either. It was reduced out of the hot inner loops of essentially every part of the Eigen math library when using std::complex<float>. 
Those loops would consistently and pervasively hop between the floating point unit and the integer unit due to bit math extraction and insertion of floating point values that were "stored" in a 64-bit integer register around the loop backedge. So far, this change has passed a bootstrap and I have done some other testing and so far, no issues. That doesn't mean there won't be though, so I'll be prepared to help with any fallout. If you see performance swings in particular, please let me know. I'm very curious what all the impact of this change will be. Stay tuned for the follow-up to also split more integer loads and stores. llvm-svn: 225061	2015-01-01 11:54:38 +00:00
Hal Finkel	c58ce4132a	[PowerPC] Improve instruction selection bit-permuting operations (64-bit) This is the second installment of improvements to instruction selection for "bit permutation" instruction sequences. r224318 added logic for instruction selection for 32-bit bit permutation sequences, and this adds lowering for 64-bit sequences. The 64-bit sequences are more complicated than the 32-bit ones because: a) the 64-bit versions of the 32-bit rotate-and-mask instructions work by replicating the lower 32-bits of the value-to-be-rotated into the upper 32 bits -- and integrating this into the cost modeling for the various bit group operations is non-trivial b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask instructions cannot, in one instruction, specify the mask starting index, the mask ending index, and the rotation factor. Also, forming arbitrary 64-bit constants is more complicated than in 32-bit mode because the number of instructions necessary is value dependent. Plus, support for 'late masking' was added: it is sometimes more efficient to treat the overall value as if it had no mandatory zero bits when planning the bit-group insertions, and then mask them in at the very end. Unfortunately, as the structure of the bit groups is different in the two cases, the more feasible implementation technique was to generate both instruction sequences, and then pick the shorter one. And finally, we now generate reasonable code for i64 bswap: rldicl 5, 3, 16, 0 rldicl 4, 3, 8, 0 rldicl 6, 3, 24, 0 rldimi 4, 5, 8, 48 rldicl 5, 3, 32, 0 rldimi 4, 6, 16, 40 rldicl 6, 3, 48, 0 rldimi 4, 5, 24, 32 rldicl 5, 3, 56, 0 rldimi 4, 6, 40, 16 rldimi 4, 5, 48, 8 rldimi 4, 3, 56, 0 vs. what we used to produce: li 4, 255 rldicl 5, 3, 24, 40 rldicl 6, 3, 40, 24 rldicl 7, 3, 56, 8 sldi 8, 3, 8 sldi 10, 3, 24 sldi 12, 3, 40 rldicl 0, 3, 8, 56 sldi 9, 4, 32 sldi 11, 4, 40 sldi 4, 4, 48 andi. 5, 5, 65280 andis. 6, 6, 255 andis. 
7, 7, 65280 sldi 3, 3, 56 and 8, 8, 9 and 4, 12, 4 and 9, 10, 11 or 6, 7, 6 or 5, 5, 0 or 3, 3, 4 or 7, 9, 8 or 4, 6, 5 or 3, 3, 7 or 3, 3, 4 which is 12 instructions, instead of 25, and seems optimal (at least in terms of code size). llvm-svn: 225056	2015-01-01 02:53:29 +00:00
Sanjay Patel	e68f71574f	InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X Some day the backend may handle instruction-level fast math flags and make this transform unnecessary, but it's still better practice to use the canonical representation of fneg when possible (use a -0.0). This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ). See also http://reviews.llvm.org/D6723. Differential Revision: http://reviews.llvm.org/D6731 llvm-svn: 225050	2014-12-31 22:14:05 +00:00
Rafael Espindola	54b435ec3c	Add r224985 back with a fix. The issue was that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. Original message: Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF is to use relocations with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 225048	2014-12-31 17:19:34 +00:00
Colin LeMahieu	5691eb5ee7	Reverting 225045 and 225043 and XFAIL multiline.ll on hexagon llvm-svn: 225047	2014-12-31 17:14:35 +00:00
Rafael Espindola	35e42db3ed	Add a test for the recent compiler-rt build failure. llvm-svn: 225046	2014-12-31 16:58:05 +00:00
Rafael Espindola	d4da9040de	Revert "Remove doesSectionRequireSymbols." This reverts commit r224985. I am investigating why it made an Apple bot unhappy. llvm-svn: 225044	2014-12-31 16:06:48 +00:00
Craig Topper	a7a8c4c09e	[X86] Update disassembler tests for absolute move instructions to check the encodings. This provides testing for r225036. 64-bit mode is still broken. llvm-svn: 225037	2014-12-31 07:24:23 +00:00
David Majnemer	f89dc3edc9	InstCombine: try to transform A-B < 0 into A < B We are allowed to move the 'B' to the right hand side if we can prove there is no signed overflow and if the comparison itself is signed. llvm-svn: 225034	2014-12-31 04:21:41 +00:00
Alexey Samsonov	553185ee4b	Revert "merge consecutive stores of extracted vector elements" This reverts commit r224611. This change causes crashes in X86 DAG->DAG Instruction Selection. llvm-svn: 225031	2014-12-31 00:40:28 +00:00
Colin LeMahieu	bc405294f0	[Hexagon] Adding accumulating add/sub, doubleword logic-not variants, doubleword bitfield extract, word parity, accumulating multiplies with saturation. llvm-svn: 225024	2014-12-31 00:08:34 +00:00
David Blaikie	3927b97151	Fix a test case to not depend on asm comment syntax, so as to be portable Too many different comment characters - instead of trying to account for them all, disable the comments and just check for end-of-line. llvm-svn: 225020	2014-12-30 23:33:55 +00:00
David Blaikie	7a3f48292c	Generalize even further, for ARM comment syntax (@) llvm-svn: 225019	2014-12-30 23:23:58 +00:00
Colin LeMahieu	8971e055ae	[Hexagon] Adding double-logic on predicate instructions. llvm-svn: 225018	2014-12-30 23:22:39 +00:00
David Blaikie	56605aff6b	Generalize test case to handle different asm syntax (# or // comments) llvm-svn: 225017	2014-12-30 23:21:57 +00:00
Colin LeMahieu	65f3e12ed1	[Hexagon] Adding newvalue compare and jumps. llvm-svn: 225015	2014-12-30 23:04:21 +00:00
David Blaikie	aeaa5bf55e	DebugInfo: Omit is_stmt from line table entries on the same line. GCC does this for non-zero discriminators and since GCC doesn't produce column info, that was the only place it comes up there. For LLVM, since we can emit discriminators and/or column info, it makes more sense to invert the condition and just test for changes in line number. This should resolve at least some of the GDB 7.5 test suite failures created by recent Clang changes that increase the location fidelity (which, since Clang includes column info on Linux by default, created a bunch of cases that confused GDB). In theory we could do this better/differently by grouping actual source statements together in a similar manner to the way lexical scopes are handled but given that GDB isn't really in a position to consume that (& users are probably somewhat used to different lines being different 'statements') this seems the safest and cheapest change. (I'm concerned that doing this 'right' would bloat the debugloc data even further - something Duncan's working hard to address) llvm-svn: 225011	2014-12-30 22:47:13 +00:00
Colin LeMahieu	0cba5f1b43	[Hexagon] Adding postincrement register newvalue stores. llvm-svn: 225010	2014-12-30 22:34:08 +00:00
Colin LeMahieu	9014890819	[Hexagon] Removing old newvalue store variants. Adding postincrement immediate newvalue stores. llvm-svn: 225009	2014-12-30 22:28:31 +00:00
Zoran Jovanovic	10646918d1	[mips][microMIPS] Relocate with symbol for micromips symbols Differential Revision: http://reviews.llvm.org/D6796 llvm-svn: 225008	2014-12-30 22:04:16 +00:00
Colin LeMahieu	820d5cb608	[Hexagon] Adding indexed store new-value variants. llvm-svn: 225007	2014-12-30 22:00:26 +00:00
Colin LeMahieu	2bad4a7177	[Hexagon] Adding indexed store of immediates. llvm-svn: 225006	2014-12-30 21:01:38 +00:00
Colin LeMahieu	94a498bf0e	[Hexagon] Adding indexed stores. llvm-svn: 225005	2014-12-30 20:42:23 +00:00
Peter Collingbourne	7ef497b1f5	x86_64: Fix calls to __morestack under the large code model. Under the large code model, we cannot assume that __morestack lives within 2^31 bytes of the call site, so we cannot use pc-relative addressing. We cannot perform the call via a temporary register, as the rax register may be used to store the static chain, and all other suitable registers may be either callee-save or used for parameter passing. We cannot use the stack at this point either because __morestack manipulates the stack directly. To avoid these issues, perform an indirect call via a read-only memory location containing the address. This solution is not perfect, as it assumes that the .rodata section is laid out within 2^31 bytes of each function body, but this seems to be sufficient for JIT. Differential Revision: http://reviews.llvm.org/D6787 llvm-svn: 225003	2014-12-30 20:05:19 +00:00
Kostya Serebryany	aa185bfc4b	[asan] change _sanitizer_cov_module_init to accept int* instead of int** llvm-svn: 224999	2014-12-30 19:29:28 +00:00
Michael Kuperstein	c43b063358	[COFF] Don't try to add quotes to already quoted linker directives If a linker directive is already quoted, don't try to quote it again, otherwise it creates a mess. This pops up in places like: #pragma comment(linker,"\"/foo bar'\"") Differential Revision: http://reviews.llvm.org/D6792 llvm-svn: 224998	2014-12-30 19:23:48 +00:00
Colin LeMahieu	9161d47476	[Hexagon] Adding reg-reg indexed load forms. llvm-svn: 224997	2014-12-30 18:58:47 +00:00
Colin LeMahieu	377ac65340	[Hexagon] Adding compare byte/halfword reg-reg/reg-imm forms. Adding compare to general register reg-imm form. llvm-svn: 224991	2014-12-30 17:39:24 +00:00
Colin LeMahieu	d7a56fd9ff	[Hexagon] Updating constant extender def, adding alu-not instructions, compare to general register, and inverted compares. llvm-svn: 224989	2014-12-30 15:44:17 +00:00
Rafael Espindola	b22d5aa49a	Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF is to use relocations with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 224985	2014-12-30 13:13:27 +00:00
Rafael Espindola	2f471303c1	Simplify test a bit. It looks like the original intent was to check which symbols were created. With macho-dump the sections were being checked just to match which symbol was in which section. llvm-objdump prints the section a symbol is in. llvm-svn: 224980	2014-12-30 05:09:17 +00:00
Peter Zotov	ecb5c4f237	[OCaml] Fix bitrot in tests. llvm-svn: 224979	2014-12-30 03:24:14 +00:00
Peter Zotov	b45d5bd955	[lit] Make config.llvm_lib_dir available on cmake, too. The OCaml tests require config.llvm_lib_dir to determine the OCaml package search path. llvm-svn: 224978	2014-12-30 03:24:11 +00:00
Craig Topper	aa1c51ee01	Testcases for r224939. llvm-svn: 224976	2014-12-30 02:35:56 +00:00
Rafael Espindola	81aef1bd52	Convert test to llvm-readobj. NFC. llvm-svn: 224973	2014-12-30 01:34:06 +00:00
Philip Reames	5909ebcd6c	Semantic tests for memory invalidation at statepoints These are simply a collection of tests intended to show that information about the contents of gc references in the heap is lost at a statepoint. I've tried to write them so that they don't disallow correct transformations, while still being fairly easy to understand. p.s. Ideas for additional tests are welcome. Differential Revision: http://reviews.llvm.org/D6491 llvm-svn: 224971	2014-12-29 23:55:33 +00:00
Philip Reames	9db26ffc9a	Carry facts about nullness and undef across GC relocation This change implements four basic optimizations: If a relocated value isn't used, it doesn't need to be relocated. If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.) If the value being relocated is undef, the relocation is meaningless. If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.) I outlined other planned work in comments. Differential Revision: http://reviews.llvm.org/D6600 llvm-svn: 224968	2014-12-29 23:27:30 +00:00
Philip Reames	b35f46ce06	Refine the notion of MayThrow in LICM to include a header specific version In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up. This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loop's header block - will not be lifted by subsequent LICM runs. define void @nothrow_header(i64 %x, i64 %y, i1 %cond) { ; CHECK-LABEL: nothrow_header ; CHECK-LABEL: entry ; CHECK: %div = udiv i64 %x, %y ; CHECK-LABEL: loop ; CHECK: call void @use(i64 %div) entry: br label %loop loop: ; preds = %entry, %for.inc %div = udiv i64 %x, %y br i1 %cond, label %loop-if, label %exit loop-if: call void @use(i64 %div) br label %loop exit: ret void } The current patch really only helps with non-memory instructions (i.e. divs, etc.) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load. The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invariant.load metadata or tbaa 'is constant memory' metadata. Differential Revision: http://reviews.llvm.org/D6725 llvm-svn: 224965	2014-12-29 23:00:57 +00:00
Philip Reames	5ad26c353c	Loading from null is valid outside of addrspace 0 This patch fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen. This transform is perfectly legal in address space 0, but is not necessarily legal in other address spaces. We really should introduce a hook to control this property on a per target per address space basis. We may be losing valuable optimizations in some address spaces by being too conservative. Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me. llvm-svn: 224961	2014-12-29 22:46:21 +00:00
Rafael Espindola	d351a18ebe	Convert test to llvm-readobj. NFC. llvm-svn: 224959	2014-12-29 22:14:35 +00:00
Colin LeMahieu	651b72095b	[Hexagon] Adding allocframe, post-increment circular immediate stores, post-increment circular register stores, and bit reversed post-increment stores. llvm-svn: 224957	2014-12-29 21:33:45 +00:00
Colin LeMahieu	bda31b42a0	[Hexagon] Adding post-increment register form stores and register-immediate form stores with tests. llvm-svn: 224952	2014-12-29 20:44:51 +00:00
Colin LeMahieu	9a3cd3f58c	[Hexagon] Replacing the remaining postincrement stores with versions that have encoding bits. llvm-svn: 224951	2014-12-29 20:00:43 +00:00
Rafael Espindola	517b47232b	Convert test to FileCheck. NFC. llvm-svn: 224950	2014-12-29 19:50:32 +00:00
Colin LeMahieu	3d34afb32d	[Hexagon] Renaming old multiclass for removal. Adding post-increment store classes and instruction defs. llvm-svn: 224949	2014-12-29 19:42:14 +00:00
Rafael Espindola	44eae72c40	Add segmented stack support for DragonFlyBSD. Patch by Michael Neumann. llvm-svn: 224936	2014-12-29 15:47:28 +00:00
NAKAMURA Takumi	4099328596	llvm/test/CodeGen/X86/fast-isel-call-bool.ll: Add explicit -mtriple=x86_64-unknown to satisfy x64. llvm-svn: 224907	2014-12-28 23:37:11 +00:00
Keno Fischer	fd22c6693b	[X86][ISel] Fix a regression I introduced in r224884 The else case ResultReg was not checked for validity. To my surprise, this case was not hit in any of the existing test cases. This includes a new test case that tests this path. Also drop the `target triple` declaration from the original test as suggested by H.J. Lu, because apparently with it the test won't be run on Linux. llvm-svn: 224901	2014-12-28 15:20:57 +00:00
Michael Kuperstein	683c3cde43	[X86] Add missing memory variants to AVX false dependency breaking Adds missing memory instruction variants to AVX false dependency breaking handling. (SSE was handled in r224246) Differential Revision: http://reviews.llvm.org/D6780 llvm-svn: 224900	2014-12-28 13:15:05 +00:00
Andrea Di Biagio	22ee3f63b9	[CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz. If the control flow is modelling an if-statement where the only instruction in the 'then' basic block (excluding the terminator) is a call to cttz/ctlz, CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control flow graph. Example: \code entry: %cmp = icmp eq i64 %val, 0 br i1 %cmp, label %end.bb, label %then.bb then.bb: %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) br label %end.bb end.bb: %cond = phi i64 [ %c, %then.bb ], [ 64, %entry] \code In this example, basic block %then.bb is taken if value %val is not zero. Also, the phi node in %end.bb would propagate the size-of in bits of %val only if %val is equal to zero. With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb into basic block %entry only if cttz is cheap to speculate for the target. Added two new hooks in TargetLowering.h to let targets customize the behavior (i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'. By default, both methods return 'false'. On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI. Differential Revision: http://reviews.llvm.org/D6728 llvm-svn: 224899	2014-12-28 11:07:35 +00:00
Elena Demikhovsky	87700a734d	Scalarizer for masked load and store intrinsics. Masked vector intrinsics are a part of common LLVM IR, but they are really supported on AVX2 and AVX-512 targets. I added code that translates masked intrinsics for all other targets. The masked vector intrinsic is converted to a chain of scalar operations inside conditional basic blocks. http://reviews.llvm.org/D6436 llvm-svn: 224897	2014-12-28 08:54:45 +00:00
David Majnemer	d0bcef2040	PowerPC: CTR shouldn't fire if a TLS call is in the loop Determining the address of a TLS variable results in a function call in certain TLS models. This means that a simple ICmpInst might actually result in invalidating the CTR register. In such cases, do not attempt to rely on the CTR register for loop optimization purposes. This fixes PR22034. Differential Revision: http://reviews.llvm.org/D6786 llvm-svn: 224890	2014-12-27 19:45:38 +00:00
Keno Fischer	8438b08663	[FastIsel][X86] Fix invalid register replacement for bool args Summary: Consider the following IR: %3 = load i8* undef %4 = trunc i8 %3 to i1 %5 = call %jl_value_t.0* @foo(..., i1 %4, ...) ret %jl_value_t.0* %5 Bools (that are the result of direct truncs) are lowered as whatever the argument to the trunc was and a "and 1", causing the part of the MBB responsible for this argument to look something like this: %vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7 Later, when the load is lowered, it will insert %vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14 but remember to (at the end of isel) replace vreg7 by vreg15. Now for the bug. In fast isel lowering, we mistakenly mark vreg8 as the result of the load instead of the trunc. This adds a fixup to have vreg8 replaced by whatever the result of the load is as well, so we end up with %vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15 which is an SSA violation and causes problems later down the road. This fixes PR21557. Test Plan: The test case from PR21557 is added to the test suite. Reviewers: ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6245 llvm-svn: 224884	2014-12-27 13:10:15 +00:00
Rafael Espindola	47beb8ace0	Convert test to llvm-readobj. NFC. llvm-svn: 224872	2014-12-26 22:47:39 +00:00
Colin LeMahieu	8233fb002d	[Hexagon] Adding auto-incrementing loads with and without byte reversal. llvm-svn: 224871	2014-12-26 21:09:25 +00:00
Colin LeMahieu	0a721cd4e1	[Hexagon] Adding locked loads. llvm-svn: 224870	2014-12-26 20:42:27 +00:00
Colin LeMahieu	ff370ed90e	[Hexagon] Adding deallocframe and circular addressing loads. llvm-svn: 224869	2014-12-26 20:30:58 +00:00
Colin LeMahieu	c83cbbf6a1	[Hexagon] Adding remaining post-increment instruction variants. Removing unused classes. llvm-svn: 224868	2014-12-26 19:31:46 +00:00
Colin LeMahieu	fe9612e09d	[Hexagon] Adding post-increment unsigned byte loads. llvm-svn: 224867	2014-12-26 19:12:11 +00:00
Colin LeMahieu	96976a10a3	[Hexagon] Adding post-increment signed byte loads with tests. llvm-svn: 224866	2014-12-26 18:57:13 +00:00
Rafael Espindola	1a7ef86add	Use llvm-readobj. NFC. llvm-svn: 224864	2014-12-26 18:22:05 +00:00
Craig Topper	c4b12166f2	[X86] Add the debug registers DR8-DR15 so we can assemble and disassemble references to them. llvm-svn: 224862	2014-12-26 18:20:05 +00:00
Craig Topper	d5b39237a1	[X86] Don't fail disassembly if REX.R/REX.B is used on an MMX register. Similar fix to not fail to disassemble CR9-CR15 references. llvm-svn: 224861	2014-12-26 18:19:44 +00:00
Timur Iskhodzhanov	b6fa52f274	Band-aid fix for PR22032: don't emit DWARF debug info if AddressSanitizer is enabled on Windows llvm-svn: 224860	2014-12-26 17:00:51 +00:00
Rafael Espindola	5d94634c13	No need to run llvm-as. NFC. llvm-svn: 224859	2014-12-26 16:42:47 +00:00
David Majnemer	b1296ec0fd	InstCombine: Infer nuw for multiplies A multiply cannot unsigned wrap if there are bitwidth, or more, leading zero bits between the two operands. llvm-svn: 224849	2014-12-26 09:50:35 +00:00
David Majnemer	54c2ca2539	InstCombine: Infer nsw for multiplies We already utilize this logic for reducing overflow intrinsics, it makes sense to reuse it for normal multiplies as well. llvm-svn: 224847	2014-12-26 09:10:14 +00:00
Craig Topper	ee9eef2fd8	Teach disassembler to handle illegal immediates on (v)cmpps/pd/ss/sd instructions. Instead of rejecting we'll just generate the _alt forms that don't try to alter the mnemonic. While I'm here, merge some common code in the Instruction printers for the condition code replacement and fix the mask on SSE to be 3-bits instead of 4. llvm-svn: 224846	2014-12-26 06:36:28 +00:00
Hal Finkel	0c505b08a5	[PowerPC] [FastISel] i1 constants must be zero extended When materializing constant i1 values, they must be zero extended. We represent i1 values as [0, 1], not [0, -1], in i32 registers. As it turns out, this code path was dead for i1 values prior to r216006 (which is why this did not manifest in miscompiles until recently). Fixes -O0 self-hosting on PPC64/Linux. llvm-svn: 224842	2014-12-25 23:08:25 +00:00
Elena Demikhovsky	fb81b93e17	Masked Load/Store - Changed the order of parameters in intrinsics. No functional changes. The documentation is coming. llvm-svn: 224829	2014-12-25 07:49:20 +00:00
David Majnemer	2913eca4e2	CodeGen: Error on redefinitions instead of asserting It's possible to have a prior definition of a symbol in module asm. Raise an error instead of crashing. llvm-svn: 224828	2014-12-24 23:06:55 +00:00
David Majnemer	8e92dfee20	CodeGen: Allow aliases to be overridden by variables llvm-svn: 224827	2014-12-24 22:44:29 +00:00
David Majnemer	58cb80c940	MC: Label definitions are permitted after .set directives .set directives may be overridden by other .set directives as well as label definitions. This fixes PR22019. llvm-svn: 224811	2014-12-24 10:27:50 +00:00
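
The r224849 entry above ("InstCombine: Infer nuw for multiplies") states that a multiply cannot unsigned wrap if the two operands have bitwidth, or more, leading zero bits between them. The following is a minimal Python sketch of that arithmetic on concrete values only. It is not the InstCombine implementation, which presumably reasons about known bits rather than concrete operands; the helper names `leading_zeros` and `mul_cannot_unsigned_wrap` are hypothetical and chosen for illustration.

```python
def leading_zeros(value: int, bitwidth: int) -> int:
    """Count leading zero bits of an unsigned value of the given width."""
    if bitwidth <= 0:
        raise ValueError("bitwidth must be positive")
    if not 0 <= value < (1 << bitwidth):
        raise ValueError("value does not fit in the given bitwidth")
    # int.bit_length() is the number of bits needed to represent the value
    # (0 for the value 0), so the remainder of the width is leading zeros.
    return bitwidth - value.bit_length()


def mul_cannot_unsigned_wrap(a: int, b: int, bitwidth: int) -> bool:
    """Sufficient (not necessary) test that a * b fits in bitwidth bits.

    If a has za leading zeros and b has zb leading zeros, then
    a < 2**(bitwidth - za) and b < 2**(bitwidth - zb), so
    a * b < 2**(2 * bitwidth - za - zb), which is at most 2**bitwidth
    whenever za + zb >= bitwidth.
    """
    return leading_zeros(a, bitwidth) + leading_zeros(b, bitwidth) >= bitwidth
```

For 8-bit operands, `mul_cannot_unsigned_wrap(15, 15, 8)` is true (4 + 4 leading zeros), while `mul_cannot_unsigned_wrap(16, 16, 8)` is false (3 + 3). The test is conservative: 16 * 15 = 240 fits in 8 bits, but the operands have only 3 + 4 = 7 leading zeros, so the sketch does not claim the no-wrap property for it.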

27813 Commits