llvm-project

Commit Graph

Author	SHA1	Message	Date
Michael Kuperstein	9fe42604aa	[X86] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize There are some cases where the mul sequence is smaller, but for the most part, using a div is preferable. This does not apply to vectors, since x86 doesn't have vector idiv, and a vector mul/shifts sequence ought to be smaller than a scalarized division. Differential Revision: http://reviews.llvm.org/D12082 llvm-svn: 245431	2015-08-19 11:21:43 +00:00
Chandler Carruth	9a0051cd59	[SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions. The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist where we null-out removed entries and only add new entries to the worklist. My measurements of codegen time shows slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself I suspect). This change results in subtle, frustrating churn in the particular order in which DAG combines are applied which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist to directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better. A major change required to make this work is to significantly harden the way in which the DAG combiner handle nodes which become dead (zero-uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner. I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor. The consensus in review is to fix any regressions that show up after the fact here. Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit. Differential Revision: http://reviews.llvm.org/D4616 llvm-svn: 213727	2014-07-23 07:08:53 +00:00
Tim Northover	ee20caaf82	TableGen: use PrintMethods to print more aliases llvm-svn: 208607	2014-05-12 18:04:06 +00:00
Stephen Lin	f799e3f944	Convert CodeGen//.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion. This was done with the following sed invocation to catch label lines demarking function boundaries: sed -i '' "s/^;$ $$[A-Z0-9_]$:$ $test$[A-Za-z0-9_-]$:$ $$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen//*.ll which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct. llvm-svn: 186258	2013-07-13 20:38:47 +00:00
Owen Anderson	de89ecf1fc	Reapply r174343, with a fix for a scary DAG combine bug where it failed to differentiate between the alignment of the base point of a load, and the overall alignment of the load. This caused infinite loops in DAG combine with the original application of this patch. ORIGINAL COMMIT LOG: When the target-independent DAGCombiner inferred a higher alignment for a load, it would replace the load with one with the higher alignment. However, it did not place the new load in the worklist, which prevented later DAG combines in the same phase (for example, target-specific combines) from ever seeing it. This patch corrects that oversight, and updates some tests whose output changed due to slightly different DAGCombine outputs. llvm-svn: 174431	2013-02-05 19:24:39 +00:00
NAKAMURA Takumi	3753b28cd2	Revert r174343, "When the target-independent DAGCombiner inferred a higher alignment for a load," It caused hangups in compiling clang/lib/Parse/ParseDecl.cpp and clang/lib/Driver/Tools.cpp in stage2 on some hosts. llvm-svn: 174374	2013-02-05 14:44:16 +00:00
Owen Anderson	a47fdbb032	When the target-independent DAGCombiner inferred a higher alignment for a load, it would replace the load with one with the higher alignment. However, it did not place the new load in the worklist, which prevented later DAG combines in the same phase (for example, target-specific combines) from ever seeing it. This patch corrects that oversight, and updates some tests whose output changed due to slightly different DAGCombine outputs. llvm-svn: 174343	2013-02-05 06:25:30 +00:00
Benjamin Kramer	3aab6a86a2	PR13326: Fix a subtle edge case in the udiv -> magic multiply generator. This caused 6 of 65k possible 8 bit udivs to be wrong. llvm-svn: 160058	2012-07-11 18:31:59 +00:00
Andrew Trick	8523b16ff5	Instruction scheduling itinerary for Intel Atom. Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT. Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches. Adds a test to verify that the scheduler is working. Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP. Patch by Preston Gurd! llvm-svn: 149558	2012-02-01 23:20:51 +00:00
Jakob Stoklund Olesen	4931bbc671	Be more aggressive about following hints. RAGreedy::tryAssign will now evict interference from the preferred register even when another register is free. To support this, add the EvictionCost struct that counts how many hints are broken by an eviction. We don't want to break one hint just to satisfy another. Rename canEvict to shouldEvict, and add the first bit of eviction policy that doesn't depend on spill weights: Always make room in the preferred register as long as the evictees can be split and aren't already assigned to their preferred register. Also make the CSR avoidance more accurate. When looking for a cheaper register it is OK to use a new volatile register. Only CSR aliases that have never been used before should be avoided. llvm-svn: 134735	2011-07-08 20:46:18 +00:00
Benjamin Kramer	cfcea12fe2	BuildUDIV: If the divisor is even we can simplify the fixup of the multiplied value by introducing an early shift. This allows us to compile "unsigned foo(unsigned x) { return x/28; }" into shrl $2, %edi imulq $613566757, %rdi, %rax shrq $32, %rax ret instead of movl %edi, %eax imulq $613566757, %rax, %rcx shrq $32, %rcx subl %ecx, %eax shrl %eax addl %ecx, %eax shrl $4, %eax on x86_64 llvm-svn: 127829	2011-03-17 20:39:14 +00:00
Cameron Zwarich	8731d0cc83	The signed version of our "magic number" computation for the integer approximation of a constant had a minor typo introduced when copying it from the book, which caused it to favor negative approximations over positive approximations in many cases. Positive approximations require fewer operations beyond the multiplication. In the case of division by 3, we still generate code that is a single instruction larger than GCC's code. llvm-svn: 126097	2011-02-21 00:22:02 +00:00
Benjamin Kramer	946e1522b6	Teach DAGCombine to fold fold (sra (trunc (sr x, c1)), c2) -> (trunc (sra x, c1+c2) when c1 equals the amount of bits that are truncated off. This happens all the time when a smul is promoted to a larger type. On x86-64 we now compile "int test(int x) { return x/10; }" into movslq %edi, %rax imulq $1717986919, %rax, %rax movq %rax, %rcx shrq $63, %rcx sarq $34, %rax <- used to be "shrq $32, %rax; sarl $2, %eax" addl %ecx, %eax This fires 96 times in gcc.c on x86-64. llvm-svn: 124559	2011-01-30 16:38:43 +00:00
Dale Johannesen	a94e36bbee	Reapply 122353-122355 with fixes. 122354 was wrong; the shift type was needed one place, the shift count type another. The transform in 123555 had the same problem. llvm-svn: 122366	2010-12-21 21:55:50 +00:00
Dale Johannesen	87c47499c6	Revert 122353-122355 for the moment, they broke stuff. llvm-svn: 122360	2010-12-21 21:22:27 +00:00
Dale Johannesen	fa5dc82fda	Get the type of a shift from the shift, not from its shift count operand. These should be the same but apparently are not always, and this is cleaner anyway. This improves the code in an existing test. llvm-svn: 122354	2010-12-21 20:06:19 +00:00
Chris Lattner	15090e1eb0	take care of some todos, transforming [us]mul_lohi into a wider mul if the wider mul is legal. llvm-svn: 121848	2010-12-15 06:04:19 +00:00
Chris Lattner	c3301e970c	merge two tests llvm-svn: 121847	2010-12-15 05:58:59 +00:00
Chris Lattner	8e21a02c19	rename test llvm-svn: 121697	2010-12-13 08:39:40 +00:00

19 Commits