Ultimately, we'll need to leave something behind to indicate which
alloca will hold the exception, but we can figure that out when it comes
time to emit the __CxxFrameHandler3 catch handler table.
llvm-svn: 231164
Selection conditions may be vectors or scalars. Make sure InstCombine
doesn't indiscriminately assume that a select which is value dependent
on another select have identical select condition types.
This fixes PR22773.
llvm-svn: 231156
From:
int M, total;
void foo() {
int i;
for (i = 0; i < M; i++) {
total = total + i / 2;
}
}
This is the kernel loop:
.LBB0_2: # %for.body
=>This Inner Loop Header: Depth=1
movl %edx, %esi
movl %ecx, %edx
shrl $31, %edx
addl %ecx, %edx
sarl %edx
addl %esi, %edx
incl %ecx
cmpl %eax, %ecx
jl .LBB0_2
--------------------------
The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi".
The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body
Predecessors according to CFG: BB#1 BB#2
%vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
%vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
%vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
%vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
%vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
%vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
%vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
%vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
%vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
JL_4 <BB#2>, %EFLAGS<imp-use,kill>
Now TwoAddressInstructionPass will choose vreg9 to be tied with vreg4. However, it doesn't see that there is copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to choose vreg2 to be tied with vreg4 instead of vreg9. This code pattern commonly appears when there is reduction operation in a loop.
So check for a reversed copy chain and if we encounter one then we can commute the add instruction so we can avoid a copy.
Patch by Wei Mi.
http://reviews.llvm.org/D7806
llvm-svn: 231148
Summary:
This does not conceptually belongs here. Instead provide a shortcut
getModule() that provides access to the DataLayout.
Reviewers: chandlerc, echristo
Reviewed By: echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8027
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231147
The assertion was just checking a class invariant that's pretty easy to
verify by inspection (no mutating operations, and the two non-copy ctors
already ensure the state is maintained) so remove the explicit copy ctor
in favor of the default, thus allowing the use of the default copy
assignment operator without hitting the C++11 deprecation here.
llvm-svn: 231143
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.
This reverts commit r231135.
llvm-svn: 231136
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
llvm-svn: 231135
This type could be made copyable (= default a protected copy ctor in the
base class, and preferably make the derived class final to avoid risks
of providing a slicing copy operation to further derived classes) but it
seemed easier to avoid that complexity for a dump function that I assume
(by symmetry with ResourcePriorityQueue's dump, which was actively
buggy) not often used.
llvm-svn: 231133
Previously we had only Linux using DTPOFF for these; all X86 ELF
targets should. Fixes a side issue mentioned in PR21077.
Differential Revision: http://reviews.llvm.org/D8011
llvm-svn: 231130
frame register before checking if there is a DWARF register number for it.
Thanks to H.J. Lu for diagnosing this and providing the testcase!
llvm-svn: 231121
The cause of the issue is the interaction of two factors:
1) When generating a DW_TAG_imported_declaration DIE which imports another
imported declaration, the code in AsmPrinter/DwarfCompileUnit.cpp
asserts that the second imported declaration must already have a DIE.
2) There is a non-determinism in the order in which imported declarations
within the same scope are processed.
Because of the non-determinism (2), it is possible that an imported
declaration is processed before another one it depends on, breaking the
assumption in (1).
The source of the non-determinism is that the imported declaration
DIDescriptors are sorted by scope in DwarfDebug::beginModule(); however that
sort is not a stable_sort, therefore the order of the declarations within
the same scope is not preserved. The attached patch changes the std::sort to
a std::stable_sort and it fixes the problem.
Test omitted due to it being non-deterministic and depending on the
implementation of std::sort.
llvm-svn: 231100
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464. I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned. Let me know if I'm wrong :).
The code changes are fairly mechanical:
- Bumped the "Debug Info Version".
- `DIBuilder` now creates the appropriate subclass of `MDNode`.
- Subclasses of DIDescriptor now expect to hold their "MD"
counterparts (e.g., `DIBasicType` expects `MDBasicType`).
- Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
for printing comments.
- Big update to LangRef to describe the nodes in the new hierarchy.
Feel free to make it better.
Testcase changes are enormous. There's an accompanying clang commit on
its way.
If you have out-of-tree debug info testcases, I just broke your build.
- `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to
update all the IR testcases.
- Unfortunately I failed to find way to script the updates to CHECK
lines, so I updated all of these by hand. This was fairly painful,
since the old CHECKs are difficult to reason about. That's one of
the benefits of the new hierarchy.
This work isn't quite finished, BTW. The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro). Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I
also expect to make a few schema changes now that it's easier to reason
about everything.
llvm-svn: 231082
This prevents the behavior observed in llvm.org/PR22369. I am not sure
whether I am reading the code correctly, but the early exit based on
isLiveOutPastPHIs() seems to make the wrong assumption that
RegisterCoalescer won't be able to coalesce those copies later.
This change hides the new behavior behind -no-phi-elim-live-out-early-exit
as it currently breaks four tests:
* Assertion in:
CodeGen/Hexagon/hwloop-cleanup.ll
* Worse code in:
CodeGen/X86/coalescer-commute4.ll
CodeGen/X86/phys_subreg_coalesce-2.ll
CodeGen/X86/zlib-longest-match.ll
The root cause here seems to be that the heuristic that determines
the visitation order in RegisterCoalescer gets less lucky.
llvm-svn: 231064
This lets us avoid a few copies that are otherwise hard to get rid of.
The way this is done is, the custom-inserter looks at the following
instruction for another CMOV, and replaces both at the same time.
A previous version used a new CMOV2 opcode, but the custom inserter
is expected to be able to return a different basic block anyway, which
means it's OK - though far from ideal - to alter that block's contents.
Explicitly document that, in case it ever makes a difference.
Alternatives welcome!
Follow-up to r231045.
rdar://19767934
Closes http://reviews.llvm.org/D8019
llvm-svn: 231046
Fold and/or of setcc's to double CMOV:
(CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV (CMOV F, T, cc1), T, cc2)
(CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV (CMOV T, F, !cc1), F, !cc2)
When we can't use the CMOV instruction, it might increase branch
mispredicts. When we can, or when there is no mispredict, this
improves throughput and reduces register pressure.
These can't be catched by generic combines, because the pattern can
appear when legalizing some instructions (such as fcmp une).
rdar://19767934
http://reviews.llvm.org/D7634
llvm-svn: 231045
By loading from indexed offsets into a byte array and applying a mask, a
program can test bits from the bit set with a relatively short instruction
sequence. For example, suppose we have 15 bit sets to lay out:
A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
L (4 bits), M (3 bits), N (2 bits), O (1 bit)
These bits can be laid out in a 16-byte array like this:
Byte Offset
0123456789ABCDEF
Bit
7 HHHHHHHHHIIIIIII
6 GGGGGGGGGGJJJJJJ
5 FFFFFFFFFFFKKKKK
4 EEEEEEEEEEEELLLL
3 DDDDDDDDDDDDDMMM
2 CCCCCCCCCCCCCCNN
1 BBBBBBBBBBBBBBBO
0 AAAAAAAAAAAAAAAA
For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
in 1-2 machine instructions on x86, or 4-6 instructions on ARM.
This uses the LPT multiprocessor scheduling algorithm to lay out the bits
efficiently.
Saves ~450KB of instructions in a recent build of Chromium.
Differential Revision: http://reviews.llvm.org/D7954
llvm-svn: 231043
There's really no reason to have them have entries in the symbol table
anymore. Old versions of ld64 had some bugs in this area but those have
been fixed long ago.
llvm-svn: 231041
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.
Ought to be NFC, but it does slightly alter the output format of the
textual assembly.
This reapplies 230930 without the assertion in DebugLocEntry::finalize()
because not all Machine registers can be lowered into DWARF register
numbers and floating point constants cannot be expressed.
llvm-svn: 231023
This re-lands change r230921. r230921 was reverted because it broke a
clang test; a checkin fixing the clang test will be commited shortly.
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533. SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow. There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.
Revert "IndVarSimplify: Allow LFTR to fire more often"
This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).
Revert "IndVarSimplify: Don't let LFTR compare against a poison value"
This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).
Reviewers: majnemer, atrick, spatel
Differential Revision: http://reviews.llvm.org/D7979
llvm-svn: 231018
Add the enum "LLVMLinkerMode" back for backwards-compatibility and add the
linker mode parameter back to the "LLVMLinkModules" function. The paramter is
ignored and has no effect.
Patch provided by: Filip Pizlo
Reviewed by: Rafael and Sean
llvm-svn: 230988
When reading a yaml::SequenceTraits object, YAMLIO does not report an
error if the yaml item is not a sequence. Instead, YAMLIO reads an
empty sequence. For example:
---
seq:
foo: 1
bar: 2
...
If `seq` is a SequenceTraits object, then reading the above yaml will
yield `seq` as an empty sequence.
Fix this to report an error for the above mapping ("not a sequence")
Patch by William Fisher. Thanks!
llvm-svn: 230976
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.
Ought to be NFC, but it does slightly alter the output format of the
textual assembly.
This reapplies 230930 with a relaxed assertion in DebugLocEntry::finalize()
that allows for empty DWARF expressions for constant FP values.
llvm-svn: 230975
Summary:
When the RHS of a conditional move node is zero, we can utilize the $zero
register by inverting the conditional move instruction and by swapping the
order of its True/False operands.
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D7945
llvm-svn: 230956
Previously this would result in assertion failures or simply crashes
at various points in the optimizer when trying to create types of zero
bit width.
llvm-svn: 230936
A short list of some of the improvements:
1) Now supports -all command line argument, which implies many
other command line arguments to simplify usage.
2) Now supports -no-compiler-generated command line argument to
exclude compiler generated types.
3) Prints base class list.
4) -class-definitions implies -types.
5) Proper display of bitfields.
6) Can now distinguish between struct/class/interface/union.
And a few other minor tweaks.
llvm-svn: 230933
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.
Ought to be NFC, but it does slightly alter the output format of the
textual assembly.
llvm-svn: 230930
Summary:
As far as I can tell, the real bug causing the issue was fixed in
r230533. SCEVExpander should mark an increment operation as nuw or nsw
only if it can *prove* that the operation does not overflow. There
shouldn't be any situation where we have to do something different
because of no-wrap flags generated by SCEVExpander.
Revert "IndVarSimplify: Allow LFTR to fire more often"
This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213).
Revert "IndVarSimplify: Don't let LFTR compare against a poison value"
This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102).
Reviewers: majnemer, atrick, spatel
Differential Revision: http://reviews.llvm.org/D7979
llvm-svn: 230921
With initializer lists there is a really neat idiomatic way to write
this, 'ArrayRef.equals({1, 2, 3, 4, 5})'. Remove the equal method which
always had a hard limit on the number of arguments. I considered
rewriting it with variadic templates but that's not really a good fit
for a function with homogeneous arguments.
'ArrayRef == {1, 2, 3, 4, 5}' would've been even more awesome, but C++11
doesn't allow init lists with binary operators.
llvm-svn: 230907
Such edges are zero matrix, and they bring no additional info to the
allocation problem, apart from contributing to nodes' degree. Removing
those edges is expected to improve allocation time.
Tune the spill cost comparison, as this gives better average performances
now that the nodes' degrees has changed.
llvm-svn: 230904
We were missing a check for the following fold in DAGCombiner:
// fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2))
If 'x' is also a constant, then we shouldn't do anything. Otherwise, we could end up swapping the operands back and forth forever.
This should fix:
http://llvm.org/bugs/show_bug.cgi?id=22698
Differential Revision: http://reviews.llvm.org/D7917
llvm-svn: 230884
Start using `TempMDNode` in `DIDescriptor::replaceAllUsesWith()`
(effectively `std::unique_ptr<MDNode, MDNode::deleteTemporary>`).
Besides making ownership more explicit, this prepares for when
`DIDescriptor` refers to nodes that are *not* `MDTuple`. The old logic
for "replacing" a node with itself used `MDNode::get()` to return a new
(uniqued) `MDTuple`, while the new logic just defers to
`MDNode::replaceWithUniqued()` (which also typically saves an allocation
and RAUW traffic by mutating the temporary in place).
llvm-svn: 230879
While gaining practical experience hand-updating CHECK lines (for moving
the new debug info hierarchy into place), I learnt a few things about
CHECK-ability of the specialized node assembly output.
- The first part of a `CHECK:` is to identify the "right" node (this
is especially true if you intend to use the new `CHECK-SAME`
feature, since the first CHECK needs to identify the node correctly
before you can split the line).
- If there's a `tag:`, it should go first.
- If there's a `name:`, it should go next (followed by the
`linkageName:`, if any).
- If there's a `scope:`, it should follow after that.
- When a node type supports multiple DW_TAGs, but one is implied by
its name and is overwhelmingly more common, the `tag:` field is
terribly uninteresting unless it's different.
- `MDBasicType` is almost always `DW_TAG_base_type`.
- `MDTemplateValueParameter` is almost always
`DW_TAG_template_value_parameter`.
- Printing `name: ""` doesn't improve CHECK-ability, and there are far
more nodes than I realized that are commonly nameless.
- There are a few other fields that similarly aren't very interesting
when they're empty.
This commit updates the `AsmWriter` as suggested above (and makes
necessary changes in `LLParser` for round-tripping).
llvm-svn: 230877
Properly escape string fields in metadata. I've added a spot-check with
direct coverage for `MDFile::getFilename()`, but we'll get more coverage
once the hierarchy is moved into place (since this comes up in various
checked-in testcases).
I've replicated the `if` logic using the `ShouldSkipEmpty` flag
(although a follow-up commit is going to change how often this flag is
specified); no NFCI other than escaping the string fields.
llvm-svn: 230875
Extract logic for escaping a string field in the new debug info
hierarchy from `GenericDebugNode`. A follow-up commit will use it far
more widely (hence the dead code for `ShouldSkipEmpty`).
llvm-svn: 230873
Previously it was impossible to distinguish between "There is
no PDB implementation for this platform" and "I tried to load
the PDB, but couldn't find the file", making it hard to figure
out if you built llvm-pdbdump incorrectly or if you just mistyped
a file name.
This patch adds proper error handling so that we can know exactly
what went wrong.
llvm-svn: 230868
Level 1 should abort for all instructions but call/terminators/args.
Instead it was aborting only if the level was > 2
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230861
When using SetConsoleTextAttribute() to set the foreground or
background color, if you don't explicitly set both colors, then
a default value of black will be chosen for whichever you don't
specify a value for.
This is annoying when you have a non default console background
color, for example, and you try to set the foreground color.
This patch gets the existing fg/bg color and when you set one
attribute, sets the opposite attribute to its existing color
prior to comitting the update.
Reviewed by: Aaron Ballman
Differential Revision: http://reviews.llvm.org/D7967
llvm-svn: 230859
Leaving empty blocks around just opens up a can of bugs like PR22704. Deleting
them early also slightly simplifies code.
Thanks to Sanjay for the IR test case.
llvm-svn: 230856
All of the cases were just appending from random access iterators to a
vector. Using insert/append can grow the vector to the perfect size
directly and moves the growing out of the loop. No intended functionalty
change.
llvm-svn: 230845
It turns out the naming of inserted phis and selects is sensative to the order in which two sets are iterated. We need to nail this down to avoid non-deterministic output and possible test failures.
The modified test is the one I first noticed something odd in. The change is making it more strict to report the error. With the test change, but without the code change, the test fails roughly 1 in 5. With the code change, I've run ~30 runs without error.
Long term, the right fix here is to adjust the naming scheme. I'm checking in this hack to avoid any possible non-determinism in the tests over the weekend. HJust because I only noticed one case doesn't mean it's actually the only case. I hope to get to the right change Monday.
std->llvm data structure changes bugfix change #3
llvm-svn: 230835
Inserting into a DenseMap you're iterating over is not well defined. This is unfortunate since this is well defined on a std::map.
"cleanup per llvm code style standards" bug #2
llvm-svn: 230827
These tests cover the 'base object' identification and rewritting portion of RewriteStatepointsForGC. These aren't completely exhaustive, but they've proven to be reasonable effective over time at finding regressions.
In the process of porting these tests over, I found my first "cleanup per llvm code style standards" bug. We were relying on the order of iteration when testing the base pointers found for a derived pointer. When we switched from std::set to DenseSet, this stopped being a safe assumption. I'm suspecting I'm going to find more of those. In particular, I'm now really wondering about the main iteration loop for this algorithm. I need to go take a closer look at the assumptions there.
I'm not really happy with the fact these are testing what is essentially debug output (i.e. enabled via command line flags). Suggestions for how to structure this better are very welcome.
llvm-svn: 230818
Straightforward patch to emit an alignment directive when emitting a
TOC entry. The test case was generated from the test in PR22711 that
demonstrated a misaligned .toc section. The object code is run
through llvm-readobj to verify that the correct alignment has been
applied to the .toc section.
Thanks to Ulrich Weigand for running down where the fix was needed.
llvm-svn: 230801
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)
import fileinput
import sys
import re
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
for line in sys.stdin:
sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
llvm-svn: 230794
Summary:
Until now, we did this (among other things) based on whether or not the
target was Windows. This is clearly wrong, not just for Win64 ABI functions
on non-Windows, but for System V ABI functions on Windows, too. In this
change, we make this decision based on the ABI the calling convention
specifies instead.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7953
llvm-svn: 230793
When using Altivec, we can use vector loads and stores for aligned memcpy and
friends. Starting with the P7 and VXS, we have reasonable unaligned vector
stores. Starting with the P8, we have fast unaligned loads too.
For QPX, we use vector loads are stores, but only for aligned memory accesses.
llvm-svn: 230788
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.
This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.
* This doesn't modify gep operators, only instructions (operators will be
handled separately)
* Textual IR changes only. Bitcode (including upgrade) and changing the
in-memory representation will be in separate changes.
* geps of vectors are transformed as:
getelementptr <4 x float*> %x, ...
->getelementptr float, <4 x float*> %x, ...
Then, once the opaque pointer type is introduced, this will ultimately look
like:
getelementptr float, <4 x ptr> %x
with the unambiguous interpretation that it is a vector of pointers to float.
* address spaces remain on the pointer, not the type:
getelementptr float addrspace(1)* %x
->getelementptr float, float addrspace(1)* %x
Then, eventually:
getelementptr float, ptr addrspace(1) %x
Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.
update.py:
import fileinput
import sys
import re
ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
def conv(match, line):
if not match:
return line
line = match.groups()[0]
if len(match.groups()[5]) == 0:
line += match.groups()[2]
line += match.groups()[3]
line += ", "
line += match.groups()[1]
line += "\n"
return line
for line in sys.stdin:
if line.find("getelementptr ") == line.find("getelementptr inbounds"):
if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
line = conv(re.match(ibrep, line), line)
elif line.find("getelementptr ") != line.find("getelementptr ("):
line = conv(re.match(normrep, line), line)
sys.stdout.write(line)
apply.sh:
for name in "$@"
do
python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
rm -f "$name.tmp"
done
The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh
After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).
The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7636
llvm-svn: 230786
This work is currently being rethought along different lines and
if this work is needed it can be resurrected out of svn. Remove it
for now as no current work in ongoing on it and it's unused. Verified
with the authors before removal.
llvm-svn: 230780
Summary:
Currently fast-isel-abort will only abort for regular instructions,
and just warn for function calls, terminators, function arguments.
There is already fast-isel-abort-args but nothing for calls and
terminators.
This change turns the fast-isel-abort options into an integer option,
so that multiple levels of strictness can be defined.
This will help no being surprised when the "abort" option indeed does
not abort, and enables the possibility to write test that verifies
that no intrinsics are forgotten by fast-isel.
Reviewers: resistor, echristo
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D7941
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230775
This removes a bit of duplicated code and more importantly, remembers the
labels so that they don't need to be looked up by name.
This in turn allows for any name to be used and avoids a crash if the name
we wanted was already taken.
llvm-svn: 230772
vectors. This lets us fix the rest of the v16 lowering problems when
pshufb is clearly better.
We might still be able to improve some of the lowerings by enabling the
other combine-based rewriting to fire for non-128-bit vectors, but this
at least should remove any regressions from using the fancy v16i16
lowering strategy.
llvm-svn: 230753
repeated 128-bit lane shuffles of wider vector types and use it to lower
256-bit v16i16 vector shuffles where applicable.
This should let us perfectly lowering the pattern of pshuflw and pshufhw
even for AVX2 256-bit patterns.
I've not added AVX-512 support, but it should be trivial for someone
working on that to wire up.
Note that currently this generates bad, long shuffle chains because we
don't combine 256-bit target shuffles. The subsequent patches will fix
that.
llvm-svn: 230751
Summary:
We identify the cases where the operand to an ADDE node is a constant
zero. In such cases, we can avoid generating an extra ADDu instruction
disguised as an identity move alias (ie. addu $r, $r, 0 --> move $r, $r).
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7906
llvm-svn: 230742
Currently, the ASan executables built with -O0 are unnecessarily slow.
The main reason is that ASan instrumentation pass inserts redundant
checks around promotable allocas. These allocas do not get instrumented
under -O1 because they get converted to virtual registered by mem2reg.
With this patch, ASan instrumentation pass will only instrument non
promotable allocas, giving us a speedup of 39% on a collection of
benchmarks with -O0. (There is no measurable speedup at -O1.)
llvm-svn: 230724
that is iterating over it
Inserting elements into a `DenseMap` invalidated iterators pointing
into the `DenseMap` instance.
Differential Revision: http://reviews.llvm.org/D7924
llvm-svn: 230719
Summary:
This change causes us to actually save non-volatile registers in a Win64
ABI function that calls a System V ABI function, and vice-versa.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7919
llvm-svn: 230714
uses of TM->getSubtargetImpl and propagate to all calls.
This could be a debugging regression in places where we had a
TargetMachine and/or MachineFunction but don't have it as part
of the MachineInstr. Fixing this would require passing a
MachineFunction/Function down through the print operator, but
none of the existing uses in tree seem to do this.
llvm-svn: 230710
Function pointers were not correctly handled by the dumper, and
they would print as "* name". They now print as
"int (__cdecl *name)(int arg1, int arg2)" as they should.
Also, doubles were being printed as floats. This fixes that bug
as well, and adds tests for all builtin types. as well as a test
for function pointers.
llvm-svn: 230703
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.
llvm-svn: 230699
blend as legal.
We made the same mistake in two different places. Whenever we are custom
lowering a v32i8 blend we need to check whether we are custom lowering
it only for constant conditions that can be shuffled, or whether we
actually have AVX2 and full dynamic blending support on bytes. Both are
fixed, with comments added to make it clear what is going on and a new
test case.
llvm-svn: 230695
dynamic blends.
This makes it much more clear what is going on. The case we're handling
is that of dynamic conditions, and we're bailing when the nature of the
vector types and subtarget preclude lowering the dynamic condition
vselect as an actual blend.
No functionality changed here, but this will make a subsequent bug-fix
to this code much more clear.
llvm-svn: 230690
Creating BinaryCoverageReader is a strange and complicated dance where
the constructor sets error codes that member functions will later
read, and the object is in an invalid state if readHeader isn't
immediately called after construction.
Instead, make the constructor private and add a static create method
to do the construction properly. This also has the benefit of removing
readHeader completely and simplifying the interface of the object.
llvm-svn: 230676
It is not sound to mark the increment operation as `nuw` or `nsw`
based on a proof off of the add recurrence if the increment operation
we emit happens to be a `sub` instruction.
I could not come up with a test case for this -- the cases where
SCEVExpander decides to emit a `sub` instruction is quite small, and I
cannot think of a way I'd be able to get SCEV to prove that the
increment does not overflow in those cases.
Differential Revision: http://reviews.llvm.org/D7899
llvm-svn: 230673
On 32bits x86 Darwin, the register mappings for the eh_frane and
debug_frame sections are different. Thus the same CFI instructions
should result in different registers in the object file. The
problem isn't target specific though, but it requires that the
mappings for EH register numbers be different from the standard
Dwarf one.
The patch looks a bit clumsy. LLVM uses the EH mapping as
canonical for everything frame related. Thus we need to do a
double conversion EH -> LLVM -> Non-EH, when emitting the
debug_frame section.
Fixes PR22363.
Differential Revision: http://reviews.llvm.org/D7593
llvm-svn: 230670
InstCombine has long had logic to convert aligned Altivec load/store intrinsics
into regular loads and stores. This mirrors that functionality for QPX vector
load/store intrinsics.
llvm-svn: 230660
have the debugger step through each one individually. Turn off the
combine for adjacent stores at -O0 so we get this behavior.
Possibly, DAGCombine shouldn't run at all at -O0, but that's for
another day; see PR22346.
Differential Revision: http://reviews.llvm.org/D7181
llvm-svn: 230659
There was a problem when passing structures as variable arguments.
The structures smaller than 64 bit were not left justified on MIPS64
big endian. This is now fixed by shifting the value to make it left-
justified when appropriate.
This fixes the bug http://llvm.org/bugs/show_bug.cgi?id=21608
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D7881
llvm-svn: 230657
In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the
features bits are not computed. This patch lets the asm printer
emit ".cpu cortex-a9" directive for krait and the hwdiv feature is
enabled through ".arch_extension". In short, krait is treated
as "cortex-a9" with hwdiv. We can not emit ".krait" as CPU since
it is not supported bu GNU GAS yet
llvm-svn: 230651
This patch is in response to r223147 where the avaiable features are
computed based on ".cpu" directive. This will work clean for the standard
variants like cortex-a9. For custom variants which rely on standard cpu names
for assembly, the additional features of a CPU should be propagated. This can be
done via ".arch_extension" as long as the assembler supports it. The
implementation for krait along with unit test will be submitted in next patch.
llvm-svn: 230650
accesses are via different types
Noticed this while generalizing the code for loop distribution.
I confirmed with Arnold that this was indeed a bug and managed to create
a testcase.
llvm-svn: 230647
The latency for the WriteMULm class was set to 4, which is actually lower than the latency for WriteMULr (5).
A better estimate would be 4 added to WriteMULr, that is, 9.
llvm-svn: 230634
formulaic into the top v8i16 lowering routine.
This makes the generalized lowering a completely general and single path
lowering which will allow generalizing it in turn for multiple 128-bit
lanes.
llvm-svn: 230623
Also remove the somewhat misleading initializers from
VectorizationFactor and VectorizationInterleave. They will get
initialized with the default ctor since no cl::init is provided.
llvm-svn: 230608
Use the IRBuilder helpers for gc.statepoint and gc.result, instead of
coding the construction by hand. Note that the gc.statepoint IRBuilder
handles only CallInst, not InvokeInst; retain that part of hand-coding.
Differential Revision: http://reviews.llvm.org/D7518
llvm-svn: 230591
Explanation: This function is in TargetLowering because it uses
RegClassForVT which would need to be moved to TargetRegisterInfo
and would necessitate moving isTypeLegal over as well - a massive
change that would just require TargetLowering having a TargetRegisterInfo
class member that it would use.
llvm-svn: 230585
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.
llvm-svn: 230583
This symbol exists only to pull in the required pieces of the runtime,
so nothing ever needs to refer to it. Making it hidden avoids the
potential for issues with duplicate symbols when linking profiled
libraries together.
llvm-svn: 230566
Remove a newline from `AssemblyWriter::printMDNodeBody()`, and add one
to `AssemblyWriter::writeMDNode()`. NFCI for assembly output.
However, this drops an inconsistent newline from `Metadata::print()`
when `this` is an `MDNode`. Now the newline added by `Metadata::dump()`
won't look so verbose.
llvm-svn: 230565
This is a follow-on to r227491 which tightens the check for propagating FP
values. If a non-constant value happens to be a zero, we would hit the same
bug as before.
Bug noted and patch suggested by Eli Friedman.
llvm-svn: 230564
the .h file. It's used in only one place (other than recursively)
and there's no need to include it everywhere.
Saves almost 900k from total llvm object file size.
llvm-svn: 230561
Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.
Test Plan: make check
Reviewers: jvoung, chandlerc
Subscribers: llvm-commits
llvm-svn: 230560
It turns out we have a macro to ensure that debuggers can access
`dump()` methods. Use it. Hopefully this will prevent me (and others)
from committing crimes like in r223802 (search for /10000/, or just see
the fix in r224407).
llvm-svn: 230555
LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node,
represent loads from the TOC, which is invariant. As a result, these loads can
be hoisted out of loops, etc. In order to do this, we need to generate
GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory
intrinsic node type. Once this is done, the MMO transfer is automatically
handled for TableGen-driven instruction selection, and for nodes generated
directly in PPCISelDAGToDAG, we need to transfer the MMOs manually.
Also, we were not transferring MMOs associated with pre-increment loads, so do
that too.
Lastly, this fixes an exposed bug where R30 was not added as a defined operand of
UpdateGBR.
This problem was highlighted by an example (used to generate the test case)
posted to llvmdev by Francois Pichet.
llvm-svn: 230553
This is the first commit in a small series aiming at making
debug_frame dump more useful (right now it prints a list of
opeartions without their operands).
llvm-svn: 230547
The Win64 epilogue structure is very restrictive, it permits a very
small number of opcodes and none of them are 'mov'.
This means that given:
mov %rbp, %rsp
pop %rbp
The mov isn't the epilogue, only the pop is. This is problematic unless
a frame pointer is present in which case we are free to do whatever we'd
like in the "body" of the function. If a frame pointer is present,
unwinding will undo the prologue operations in reverse order regardless
of the fact that we are at an instruction which is reseting the stack
pointer.
llvm-svn: 230543
This change aligns globals to the next highest power of 2 bytes, up to a
maximum of 128. This makes it more likely that we will be able to compress
bit sets with a greater alignment. In many more cases, we can now take
advantage of a new optimization also introduced in this patch that removes
bit set checks if the bit set is all ones.
The 128 byte maximum was found to provide the best tradeoff between instruction
overhead and data overhead in a recent build of Chromium. It allows us to
remove ~2.4MB of instructions at the cost of ~250KB of data.
Differential Revision: http://reviews.llvm.org/D7873
llvm-svn: 230540
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.
Differential Revision: http://reviews.llvm.org/D7778
llvm-svn: 230533
We had somehow accumulated a few target-specific SDAG nodes dealing with PPC64
TOC access that were referenced only in TableGen patterns. The associated
(pseudo-)instructions are used, but are being generated directly. NFC.
llvm-svn: 230518
With a diabolically crafted test case, we could recurse
through this code and return true instead of false.
The larger engineering crime is the use of magic numbers.
Added FIXME comments for those.
llvm-svn: 230515
Reapply r230248.
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.
llvm-svn: 230499
MMX_MOVD64rm zero-extends i32 load results into i64 registers.
The peephole optimizer will try to fold it in other MMX foldable
instructions, the wrong thing to do, since there's no MMX memory
instruction that loads from i32 and does implict zero extension.
Remove 'canFoldAsLoad' from MOVD64rm in order to prevent such folding.
The current MMX tests already test this, but since there are no MMX
instructions in the foldable tables yet, this did not trigger. This
commit prepares the addition of those instructions.
llvm-svn: 230498
Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-base LDR,
STR, and ADD only allow offsets that are a multiple of 4. Make some changes
to better make use of these instructions:
* Use word loads for anyext byte and halfword loads from the stack.
* Enforce 4-byte alignment on objects accessed in this way, to ensure that
the offset is valid.
* Do the same for objects whose frame index is used, in order to avoid having
to use more than one ADD to generate the frame index.
* Correct how many bits of offset we think AddrModeT1_s has.
Patch by John Brawn.
llvm-svn: 230496
Gather and scatter instructions additionally write to one of the source operands - mask register.
In this case Gather has 2 destination values - the loaded value and the mask.
Till now we did not support code gen pattern for gather - the instruction was generated from
intrinsic only and machine node was hardcoded.
When we introduce the masked_gather node, we need to select instruction automatically,
in the standard way.
I added a flag "hasTwoExplicitDefs" that allows to handle 2 destination operands.
(Some code in the X86InstrFragmentsSIMD.td is commented out, just to split one big
patch in many small patches)
llvm-svn: 230471
Summary:
This change fixes the FIXME that you recently added when you committed
(a modified version of) my patch. When `InstCombine` combines a load and
store of an pointer to those of an equivalently-sized integer, it currently
drops any `!nonnull` metadata that might be present. This change replaces
`!nonnull` metadata with `!range !{ 1, -1 }` metadata instead.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7621
llvm-svn: 230462
Like r230414, add bitcode support including backwards compatibility, for
an explicit type parameter to GEP.
At the suggestion of Duncan I tried coalescing the two older bitcodes into a
single new bitcode, though I did hit a wrinkle: I couldn't figure out how to
create an explicit abbreviation for a record with a variable number of
arguments (the indicies to the gep). This means the discriminator between
inbounds and non-inbounds gep is a full variable-length field I believe? Is my
understanding correct? Is there a way to create such an abbreviation? Should I
just use two bitcodes as before?
Reviewers: dexonsmith
Differential Revision: http://reviews.llvm.org/D7736
llvm-svn: 230415
Summary:
I've taken my best guess at this, but I've cargo culted in places & so
explanations/corrections would be great.
This seems to pass all the tests (check-all, covering clang and llvm) so I
believe that pretty well exercises both the backwards compatibility and common
(same version) compatibility given the number of checked in bitcode files we
already have. Is that a reasonable approach to testing here? Would some more
explicit tests be desired?
1) is this the right way to do back-compat in this case (looking at the number
of entries in the bitcode record to disambiguate between the old schema and
the new?)
2) I don't quite understand the logarithm logic to choose the encoding type of
the type parameter in the abbreviation description, but I found another
instruction doing the same thing & it seems to work. Is that the right
approach?
Reviewers: dexonsmith
Differential Revision: http://reviews.llvm.org/D7655
llvm-svn: 230414
This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).
I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).
The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.
llvm-svn: 230413
This patch unifies the comdat and non-comdat code paths. By doing this
it add missing features to the comdat side and removes the fixed
section assumptions from the non-comdat side.
In ELF there is no one true section for "4 byte mergeable" constants.
We are better off computing the required properties of the section
and asking the context for it.
llvm-svn: 230411
The builder is based on a layout algorithm that tries to keep members of
small bit sets together. The new layout compresses Chromium's bit sets to
around 15% of their original size.
Differential Revision: http://reviews.llvm.org/D7796
llvm-svn: 230394
There is no need to open-code the alignment calculation, we have a
handy RoundUpToAlignment function which "Does The Right Thing (TM)".
llvm-svn: 230392
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date: Mon Feb 23 23:04:28 2015 +0000
Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.
and
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date: Sun Feb 22 18:17:28 2015 +0000
[DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.
This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.
This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.
Differential Revision: http://reviews.llvm.org/D7816
as the root cause of PR22678 which is causing an assertion inside the DAG combiner.
I'll follow up to the main thread as well.
llvm-svn: 230358
The reason why these large shift sizes happen is because OpaqueConstants
currently inhibit alot of DAG combining, but that has to be addressed in
another commit (like the proposal in D6946).
Differential Revision: http://reviews.llvm.org/D6940
llvm-svn: 230355
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.
Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.
llvm-svn: 230348
This reverts commit r230062.
Debian stable (wheezy) ships still with cmake 2.8.9.
The commit broke my LLVM/Polly buildbot, to my knowledge our only Linux+cmake
buildbot.
llvm-svn: 230343
For almost all node types, if the target requested custom lowering, and
LowerOperation returned its input, we'd treat the original node as legal. This
did not work, however, for many loads and stores, because they follow
slightly different code paths, and we did not account for the possibility of
LowerOperation returning its input at those call sites.
I think that we now handle this consistently everywhere. At the call sites in
LegalizeDAG, we used to assert in this case, so there's no functional change
for any existing code there. For the call sites in LegalizeVectorOps, this
really only affects whether or not we set Changed = true, but I think makes the
semantics clearer.
No test case here, but it will be covered by an upcoming PowerPC commit adding
QPX support.
llvm-svn: 230332
Summary: Separated some instruction and pseudo-instruction definitions from InstAlias definitions, added banner for pseudo-instructions and removed a redundant whitespace from a pseudo-instruction definition. No functional change.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7552
llvm-svn: 230327
When AddressSanitizer only a single dynamic alloca and no static allocas, due to an early exit from FunctionStackPoisoner::poisonStack we forget to unpoison the dynamic alloca. This patch fixes that.
Reviewed at http://reviews.llvm.org/D7810
llvm-svn: 230316
Summary: Begin to add various address modes; including alloca.
Test Plan: Make sure there are no regressions in test-suite at O0/02 in mips32r1/r2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: echristo, rfuhler, llvm-commits
Differential Revision: http://reviews.llvm.org/D6426
llvm-svn: 230300
This is a follow up to r230233 to fix something that I noticed by
inspection. The AddrModeT2_i8s4 addressing mode does not support
negative offsets. I spent a good chunk of the day trying to come up with
a testcase for this but was not successful. This addressing mode is used
to spill and restore GPRPair registers in Thumb2 code and that does not
happen often. We also make very limited used of negative offsets when
lowering frame indexes. I am going ahead with the change anyway, because
I am pretty confident that it is correct. I also added a missing assertion
to check that the low bits of the scaled offset are zero.
llvm-svn: 230297
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal. {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.
NOTE: I had accidentally committed an unrelated change with the commit
message of this change in r230275 (r230275 was reverted in r230279).
This is the correct change for this commit message.
Differential Revision: http://reviews.llvm.org/D7808
llvm-svn: 230291
When debugging LTO issues with ld64, we use -save-temps to save the merged
optimized bitcode file, then invoke ld64 again on the single bitcode file to
speed up debugging code generation passes and ld64 stuff after code generation.
llvm linking a single bitcode file via lto_codegen_add_module will generate a
different bitcode file from the single input. With the newly-added
lto_codegen_set_module, we can make sure the destination module is the same as
the input.
lto_codegen_set_module will transfer the ownship of the module to code
generator.
rdar://19024554
llvm-svn: 230290
This case is interesting because ScalarEvolutionExpander lowers min(a,
b) as ~max(~a,~b). I think the profitability heuristics can be made
more clever/aggressive, but this is a start.
Differential Revision: http://reviews.llvm.org/D7821
llvm-svn: 230285
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.
NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279. This commit
message is the correct one.
Differential Revision: http://reviews.llvm.org/D7778
llvm-svn: 230280
230275 got committed with an incorrect commit message due to a mixup
on my side. Will re-land in a few moments with the correct commit
message.
llvm-svn: 230279
This patch teaches the backend how to expand a double-half conversion into
a double-float conversion immediately followed by a float-half conversion.
We do this only under fast-math, and if float-half conversions are legal
for the target.
Added test CodeGen/X86/fastmath-float-half-conversion.ll
Differential Revision: http://reviews.llvm.org/D7832
llvm-svn: 230276
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal. {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.
Differential Revision: http://reviews.llvm.org/D7808
llvm-svn: 230275