The atomic tests assume the two-operand forms, so I've restricted them to z10.
Running and-01.ll, or-01.ll and xor-01.ll for z196 as well as z10 shows why
using convertToThreeAddress() is better than exposing the three-operand forms
first and then converting back to two operands where possible (which is what
I'd originally tried). Using the three-operand form first stops us from
taking advantage of NG, OG and XG for spills.
llvm-svn: 186683
This first step just adds definitions for SLLK, SRLK and SRAK.
The next patch will actually make use of them during codegen.
insn-bad.s tests that some form of error is reported when using these
instructions on z10. More work is needed to get the "instruction requires:
distinct-ops" that we'd ideally like, so I've stubbed that part out for now.
I'll come back and make it mandatory once the necessary changes are in.
llvm-svn: 186680
Somehow forgot to git rm these two files. I believe I left the remaining
invalid* tests intentionally, though whether my reasons were sound is a
different matter.
llvm-svn: 186663
The tests were checking for barriers which the ARM ARM says they must execute
as a full system DMB/DSB, rather than that they're UNDEFINED and LLVM does in
fact represent them.
The tests happened to be passing because they were using a non-versioned ARM
triple which didn't have *any* DMB/DSB instructions.
llvm-svn: 186662
This allows "llvm-mc -disassemble" to accept two new features:
+ Using comma as a byte separator
+ Grouping bytes with '[' and ']' pairs.
The behaviour outside a [...] group is unchanged. But within the group once
llvm-mc encounters a true error, it stops rather than trying to resynchronise
the stream at the next byte. This is more useful for disassembly tests, where
we have an almost-instruction in mind and don't care what the misaligned
interpretation would be. Particularly if it means llvm-mc won't actually see
the next intended almost-instruction.
As a side effect, this means llvm-mc can disassemble its own -show-encoding
output if copy-pasted.
llvm-svn: 186661
implementation of the SROA algorithm. We were using the term 'partition'
in many places that no longer ever represented an actual partition, but
rather just an arbitrary slice of an alloca.
No functionality change intended here. Mostly just renaming of types,
functions, variables, and rewording of comments. Several comments were
rewritten to make a lot more sense in the new structure of things.
The stats are still weird and not reflective of how this really works.
I'll fix those up in a separate patch as it is a touch more semantic of
a change...
llvm-svn: 186659
SROA.
The crux of the issue is that now we track uses of a partition of the
alloca in two places: the iterators over the partitioning uses and the
previously collected split uses vector. We weren't accounting for the
fact that the split uses might invalidate integer widening in ways other
than due to their width (in this case due to being volatile).
Further reduced testcase added to the tests.
llvm-svn: 186655
1> Use DebugInfoFinder to find debug info MDNodes.
2> Add disable-debug-info-verifier to disable verifying debug info.
3> Disable verifying for testing cases that fail (will update the testing cases
later on).
4> MDNodes generated by clang can have empty filename for TAG_inheritance and
TAG_friend, so DIType::Verify is modified accordingly.
Note that DebugInfoFinder does not list all debug info MDNode.
For example, clang can generate:
metadata !{i32 786468}, which will fail to verify.
This MDNode is used by debug info but not included in DebugInfoFinder.
This MDNode is generated as a temporary node in DIBuilder::createFunction
Value *TElts[] = { GetTagConstant(VMContext, DW_TAG_base_type) };
MDNode::getTemporary(VMContext, TElts)
llvm-svn: 186634
All changes were made by the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
grep -q "^; *RUN: *llc.*debug" $NAME && continue
grep -q "^; *RUN:.*llvm-objdump" $NAME && continue
grep -q "^; *RUN: *opt.*" $NAME && continue
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\([A-Za-z0-9_-]*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC[:]* *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
done
This script catches a superset of the cases caught by the script associated with commit r186280. It initially found some false positives due to unusual constructs in a minority of tests; all such cases were disambiguated first in commit r186621.
llvm-svn: 186624
Summary:
Dump optional data directory entries in the PE/COFF header, so that
we can test the output of LLD linker. This patch updates the test binary
file, but the source of the binary is the same. I just re-linked the file.
I don't know how the previous file was linked, but the previous file did
not have any data directory entries for some reason.
Reviewers: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1148
llvm-svn: 186623
* assert that the return value is one of the documented values on msdn.
* on FILE_TYPE_UNKNOWN, check GetLastError.
Unfortunately I can't think of a way to get a FILE_TYPE_UNKNOWN on a test.
llvm-svn: 186595
The plan is to use it for clang and lld.
Major behavior changes:
- We can now parse UTF-16 files that have a byte order mark.
- PR16209: Don't drop backslashes on the floor if they don't escape
anything.
The actual parsing loop was based on code from Clang's driver.cpp,
although it's been rewritten to track its state with control flow rather
than state variables.
Reviewers: hans
Differential Revision: http://llvm-reviews.chandlerc.com/D1170
llvm-svn: 186587
The original code only folded SRA into ROTATE ... SELECTED BITS
if there was no outer shift. This patch splits out that check
and generalises it slightly. The extra cases aren't really that
interesting, but this is paving the way for RNSBG support.
llvm-svn: 186571
In hindsight, using "RISBG" for something that can be any type of
R.SBG instruction was a bit confusing, so this renames it to RxSBG.
That might not be the best choice either, since there is an instruction
called RXSBG, but hopefully the lower-case letter stands out enough.
While there I fixed a couple of GNUisms that had crept in --
sorry about that!
llvm-svn: 186569
end of a vector. This was found with ASan. I've had one other report of
a crasher, but thus far been unable to reproduce the crash. It may well
be fixed with this version, and if not I'd like to get more information
from the build bots about what is happening.
See r186316 for the full commit log for the new implementation of the
SROA algorithm.
llvm-svn: 186565
Support for dynamic stack alignments in the PPC backend has been unfinished, in
part because it depends on dynamic stack realignment (which I only just
recently implemented fully). Now we can also support dynamic allocas with
higher than the default target stack alignment (16 bytes).
In order to round-up the requested size to the maximum requested alignment, we
need an additional register to hold the rounded-up size. We're already using one
scavenged register to hold the previous stack-pointer value (which needs to be
stored with the signal-safe stdux update), and so when we have dynamic allocas
and a large alignment, we allocate two emergency spill slots for the scavenger.
llvm-svn: 186562
We don't want cast and dyn_cast to work on temporaries. They don't extend
lifetime like a direct bind to a reference would, so they can introduce
hard to find bugs.
I added tests to make sure we don't regress this. Thanks to Eli Friedman for
noticing this and for his suggestions on how to test it.
llvm-svn: 186559
First, this changes the base-pointer implementation to remove an unnecessary
complication (and one that is incompatible with how builtin SjLj is
implemented): instead of using r31 as the base pointer when it is not needed as
a frame pointer, now the base pointer will always be r30 when needed.
Second, we introduce another pseudo register, BP, which is used just like the FP
pseudo register to refer to the base register before we know for certain what
register it will be.
Third, we now save BP into the jmp_buf, and restore r30 from that slot in
longjmp. If the function that called setjmp did not use a base pointer, then
r30 will be overwritten by the setjmp-calling-function's restore code. FP
restoration (which is restored into r31) works the same way.
llvm-svn: 186545
it clear what we want to do. Unfortunately the conversion to
pointer operator fires now instead and chasing down all of the
conversions and making them explicit and handled is a large task
so add a FIXME with it.
llvm-svn: 186543
There were a couple of different loops that were not handling
'.' correctly in APFloat::convertFromHexadecimalString; these mistakes
could lead to assertion failures and incorrect rounding for overlong
hex float literals.
Fixes PR16643.
llvm-svn: 186539
This should fix the windows bots. It looks like the failing tests are of the
form
prog1 > file
prog2 file
and prog2 fails trying to read the file. The best fix would probably be to close
stdout/stderr in prog1, but it was not the intention of 186511 to change this,
so just restore the old behavior for now.
llvm-svn: 186530
This has some advantages:
* Lets us use native, utf16 windows functions.
* Easy to produce good errors on windows about trying to use a
directory when we want a file.
* Simplifies the unix version a bit.
llvm-svn: 186511
Duncan pointed out a mistake in my fix in r186425 when only one of the allocas
being compared had the target-default alignment. This is essentially his
suggested solution. Thanks!
llvm-svn: 186510
The issue is that CMAKE_BUILD_TYPE=RelWithDebInfo LLVM_ENABLE_ASSERTIONS=ON was
not building with assertions enabled. (I was unable to find what in the LLVM
source tree was adding -DNDEBUG to the build line in this case, so decided that
it must be cmake itself that was adding it - this may depend on the cmake
version). The fix treats any mode that is not Debug as being the same as
Release for this purpose (previously it was being assumed that cmake would only
add -DNDEBUG for Release and not for RelWithDebInfo or MinSizeRel). If other
versions of cmake don't add -DNDEBUG for RelWithDebInfo then that's OK: with
this change you just get a useless but harmless -UNDEBUG or -DNDEBUG.
llvm-svn: 186499
My patch 'r183551 - ARM FastISel integer sext/zext improvements' was incorrect when emitting ARM register-immediate ASR, LSL, LSR instructions: they are pseudo-instructions in ARMInstrInfo.td and I should have used MOVsi instead.
This is not an issue when code is generated through a .s file, but is an issue when generated straight to a .o (-filetype=obj).
llvm-svn: 186489
Because the builtin longjmp implementation uses a CTR-based indirect jump, when
the control flow arrives at the builtin setjmp call, the CTR register has
necessarily been clobbered. Correspondingly, this adds CTR to the list of
implicit definitions of the builtin setjmp pseudo instruction.
We don't need to add CTR to the implicit definitions of builtin longjmp
because, even though it does clobber the CTR register, the control flow cannot
return to inside the loop unless there is also a builtin setjmp call.
llvm-svn: 186488
Rename's documentation says "Files are renamed as if by POSIX rename()". and it
is used for atomically updating output files from a temporary. Having rename
fallback to a non atomic copy has the potential to hide bugs, like using
a temporary file in /tmp instead of a unique name next to the final destination.
llvm-svn: 186483
This builds on some frame-lowering code that has existed since 2005 (r24224)
but was disabled in 2008 (r48188) because it needed base pointer support to
function correctly. This implementation follows the strategy suggested by Dale
Johannesen in r48188 where the following comment was added:
This does not currently work, because the delta between old and new stack
pointers is added to offsets that reference incoming parameters after the
prolog is generated, and the code that does that doesn't handle a variable
delta. You don't want to do that anyway; a better approach is to reserve
another register that retains to the incoming stack pointer, and reference
parameters relative to that.
And now we do exactly that. If we don't need a frame pointer, then we use r31
as a base pointer. If we do need a frame pointer, then we use r30 as a base
pointer. The base pointer retains the value of the stack pointer before it was
decremented in the prologue. We then use the base pointer to resolve all
negative frame indicies. The basic scheme follows that for base pointers in the
X86 backend.
We use a base pointer when we need to dynamically realign the incoming stack
pointer. This currently applies only to static objects (dynamic allocas with
large alignments, and base-pointer support in SjLj lowering will come in future
commits).
llvm-svn: 186478
This check does not always work because not all of the GEPs use a constant offset, but it happens often enough to reduce the number of times we use SCEV.
llvm-svn: 186465
block. Blocks that have an indirect branch terminator, even if it's not the
last terminator, should still be treated as unanalyzable.
<rdar://problem/14437274>
Reducing a useful regression test case is proving difficult - I hope to have
one soon.
llvm-svn: 186461
This adds an instruction alias to make the assembler recognize the alternate literal form: pli [PC, #+/-<imm>]
See A8.8.129 in the ARM ARM (DDI 0406C.b).
Fixes <rdar://problem/14403733>.
llvm-svn: 186459
These floats all represented block frequencies anyway, so just use the
BlockFrequency class directly.
Some floating point computations remain in tryLocalSplit(). They are
estimating spill weights which are still floats.
llvm-svn: 186435
Original commit message:
Remove floating point computations from SpillPlacement.cpp.
Patch by Benjamin Kramer!
Use the BlockFrequency class instead of floats in the Hopfield network
computations. This rescales the node Bias field from a [-2;2] float
range to two block frequencies BiasN and BiasP pulling in opposite
directions. This construct has a more predictable behavior when block
frequencies saturate.
The per-node scaling factors are no longer necessary, assuming the block
frequencies around a bundle are consistent.
This patch can cause the register allocator to make different spilling
decisions. The differences should be small.
llvm-svn: 186434
Use PMIN/PMAX for UGE/ULE vector comparions to reduce the number of required
instructions. This trick also works for UGT/ULT, but there is no advantage in
doing so. It wouldn't reduce the number of instructions and it would actually
reduce performance.
Reviewer: Ben
radar:5972691
llvm-svn: 186432
This is to support parsing UTF16 response files in LLVM/lib/Option for
lld and clang.
Reviewers: hans
Differential Revision: http://llvm-reviews.chandlerc.com/D1138
llvm-svn: 186426
For safety, the inliner cannot decrease the allignment on an alloca when
merging it with another.
I've included two variants of the test case for this: one with DataLayout
available, and one without. When DataLayout is not available, if only one of
the allocas uses the default alignment (getAlignment() == 0), then they cannot
be safely merged.
llvm-svn: 186425
When truncating to a format with fewer mantissa bits, APFloat::convert
will perform a right shift of the mantissa by the difference of the
precision of the two formats. Usually, this will result in just the
mantissa bits needed for the target format.
One special situation is if the input number is denormal. In this case,
the right shift may discard significant bits. This is usually not a
problem, since truncating a denormal usually results in zero (underflow)
after normalization anyway, since the result format's exponent range is
usually smaller than the target format's.
However, there is one case where the latter property does not hold:
when truncating from ppc_fp128 to double. In particular, truncating
a ppc_fp128 whose first double of the pair is denormal should result
in just that first double, not zero. The current code however
performs an excessive right shift, resulting in lost result bits.
This is then caught in the APFloat::normalize call performed by
APFloat::convert and causes an assertion failure.
This patch checks for the scenario of truncating a denormal, and
attempts to (possibly partially) replace the initial mantissa
right shift by decrementing the exponent, if doing so will still
result in a valid *target format* exponent.
Index: test/CodeGen/PowerPC/pr16573.ll
===================================================================
--- test/CodeGen/PowerPC/pr16573.ll (revision 0)
+++ test/CodeGen/PowerPC/pr16573.ll (revision 0)
@@ -0,0 +1,11 @@
+; RUN: llc < %s | FileCheck %s
+
+target triple = "powerpc64-unknown-linux-gnu"
+
+define double @test() {
+ %1 = fptrunc ppc_fp128 0xM818F2887B9295809800000000032D000 to double
+ ret double %1
+}
+
+; CHECK: .quad -9111018957755033591
+
Index: lib/Support/APFloat.cpp
===================================================================
--- lib/Support/APFloat.cpp (revision 185817)
+++ lib/Support/APFloat.cpp (working copy)
@@ -1956,6 +1956,23 @@
X86SpecialNan = true;
}
+ // If this is a truncation of a denormal number, and the target semantics
+ // has larger exponent range than the source semantics (this can happen
+ // when truncating from PowerPC double-double to double format), the
+ // right shift could lose result mantissa bits. Adjust exponent instead
+ // of performing excessive shift.
+ if (shift < 0 && isFiniteNonZero()) {
+ int exponentChange = significandMSB() + 1 - fromSemantics.precision;
+ if (exponent + exponentChange < toSemantics.minExponent)
+ exponentChange = toSemantics.minExponent - exponent;
+ if (exponentChange < shift)
+ exponentChange = shift;
+ if (exponentChange < 0) {
+ shift -= exponentChange;
+ exponent += exponentChange;
+ }
+ }
+
// If this is a truncation, perform the shift before we narrow the storage.
if (shift < 0 && (isFiniteNonZero() || category==fcNaN))
lostFraction = shiftRight(significandParts(), oldPartCount, -shift);
llvm-svn: 186409
We'd forgotten to provide string representations for the special ARMISD atomic
nodes; this adds them in. No effect on CodeGen, just makes the output of
"-view-whatever-dags" slightly more readable.
llvm-svn: 186406
Another patch in the series to make more use of R.SBG. This one extends
r186072 and r186073 to handle cases where the AND is inside the shift.
llvm-svn: 186399
This patch enables calls to __aeabi_idivmod when in EABI mode,
by using the remainder value returned on registers (R1),
enabled by the ARM triple "none-eabi". Note that Darwin and
GNUEABI triples will continue lowering on GNU style, that is,
using the stack for the remainder.
Still need to add SREM/UREM support fix for 64-bit lowering.
llvm-svn: 186390
This is a micro optimization. Instead of going char*->StringRef->Twine->char*,
go char*->Twine->char* and avoid having to copy the filename on the stack.
llvm-svn: 186380
llvm-ar is the only user of toWin32Time() (via setLastModificationAndAccessTime), and r186298 can be reverted.
It had been buggy since the initial commit.
FIXME: Could we rename {from|to}Win32Time as {from|to}Win32FILETIME in TimeValue?
llvm-svn: 186374
We can have a FrameSetup in one basic block and the matching FrameDestroy
in a different basic block when we have struct byval. In that case, SPAdj
is not zero at beginning of the basic block.
Modify PEI to correctly set SPAdj at beginning of each basic block using
DFS traversal. We used to assume SPAdj is 0 at beginning of each basic block.
PEI had an assert SPAdjCount || SPAdj == 0.
If we have a Destroy <n> followed by a Setup <m>, PEI will assert failure.
We can add an extra condition to make sure the pairs are matched:
The pairs start with a FrameSetup.
But since we are doing a much better job in the verifier, this patch removes
the check in PEI.
PR16393
llvm-svn: 186364
This change mirrors the changes that were made to the X86 and ARM targets to
support subtarget feature changing. As indicated in r182899, the mechanism is
still undergoing revision, and so as with the X86 and ARM targets, there is no
test case yet (there is no effective functionality change).
llvm-svn: 186357
1> on every path through the CFG, a FrameSetup <n> is always followed by a
FrameDestroy <n> and a FrameDestroy is always followed by a FrameSetup.
2> stack adjustments are identical on all CFG edges to a merge point.
3> frame is destroyed at end of a return block.
PR16393
llvm-svn: 186350
PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the
common subclass of the true and false inputs, and then selecting either the
32-bit or the 64-bit isel variant based on the result of calling
PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC)
(where RC is the common subclass). Unfortunately, this is not quite right: if
we have something like this:
%vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76;
G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6
then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and
G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8
pseudo-register). As a result, we also need to check the common subclass
against GPRC_NOR0 and G8RC_NOX0 explicitly.
This had not been a problem for clients of insertSelect that called
canInsertSelect first (because it had a compensating mistake), but insertSelect
is also used by the PPC pseudo-instruction expander, and this error was causing
a problem in that context.
This problem was found by csmith.
llvm-svn: 186343
This is consistent with the ELF object writer.
Add some COFF tests that relocate against an alias.
Reviewers: espindola
Differential Revision: http://llvm-reviews.chandlerc.com/D1079
llvm-svn: 186341
There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks
which, in part, says:
// Note that these invariants may not hold momentarily when processing a node:
// the node being processed may be put in a map before being marked Processed.
Unfortunately, this assert would be valid only if the above-mentioned invariant
held unconditionally. This was causing llc to assert when, in fact,
everything was fine.
Thanks to Richard Sandiford for investigating this issue!
Fixes PR16562.
llvm-svn: 186338
a bot.
This reverts the commit which introduced a new implementation of the
fancy SROA pass designed to reduce its overhead. I'll skip the huge
commit log here, refer to r186316 if you're looking for how this all
works and why it works that way.
llvm-svn: 186332
No functionality change.
This is preparing to move response file parsing into lib/Option so it
can be shared between clang and lld. This change isn't just a
micro-optimization. Clang's driver uses a std::set<std::string> to
unique arguments while parsing response files, so this matches that.
llvm-svn: 186319
Joerg Sonnenberger tells me one can open a directory in freebsd. I will try
to centralize our calls to open so that we can handle O_BINARY in one place,
and will then handle this there too.
llvm-svn: 186317
different core implementation strategy.
Previously, SROA would build a relatively elaborate partitioning of an
alloca, associate uses with each partition, and then rewrite the uses of
each partition in an attempt to break apart the alloca into chunks that
could be promoted. This was very wasteful in terms of memory and compile
time because regardless of how complex the alloca or how much we're able
to do in breaking it up, all of the datastructure work to analyze the
partitioning was done up front.
The new implementation attempts to form partitions of the alloca lazily
and on the fly, rewriting the uses that make up that partition as it
goes. This has a few significant effects:
1) Much simpler data structures are used throughout.
2) No more double walk of the recursive use graph of the alloca, only
walk it once.
3) No more complex algorithms for associating a particular use with
a particular partition.
4) PHI and Select speculation is simplified and happens lazily.
5) More precise information is available about a specific use of the
alloca, removing the need for some side datastructures.
Ultimately, I think this is a much better implementation. It removes
about 300 lines of code, but arguably removes more like 500 considering
that some code grew in the process of being factored apart and cleaned
up for this all to work.
I've re-used as much of the old implementation as possible, which
includes the lion's share of code in the form of the rewriting logic.
The interesting new logic centers around how the uses of a partition are
sorted, and split into actual partitions.
Each instruction using a pointer derived from the alloca gets
a 'Partition' entry. This name is totally wrong, but I'll do a rename in
a follow-up commit as there is already enough churn here. The entry
describes the offset range accessed and the nature of the access. Once
we have all of these entries we sort them in a very specific way:
increasing order of begin offset, followed by whether they are
splittable uses (memcpy, etc), followed by the end offset or whatever.
Sorting by splittability is important as it simplifies the collection of
uses into a partition.
Once we have these uses sorted, we walk from the beginning to the end
building up a range of uses that form a partition of the alloca.
Overlapping unsplittable uses are merged into a single partition while
splittable uses are broken apart and carried from one partition to the
next. A partition is also introduced to bridge splittable uses between
the unsplittable regions when necessary.
I've looked at the performance PRs fairly closely. PR15471 no longer
will even load (the module is invalid). Not sure what is up there.
PR15412 improves by between 5% and 10%, however it is nearly impossible
to know what is holding it up as SROA (the entire pass) takes less time
than reading the IR for that test case. The analysis takes the same time
as running mem2reg on the final allocas. I suspect (without much
evidence) that the new implementation will scale much better however,
and it is just the small nature of the test cases that makes the changes
small and noisy. Either way, it is still simpler and cleaner I think.
llvm-svn: 186316
is executed within the same second as the inputs for the test are
checked out from the source tree, it will fail to update due to being
below the resolution of the 'mtime' test used.
Now, this may seem improbably to you... ok, maybe *really* improbable,
but consider a system which does distributed execution of tests by
shipping their inputs to another machine and runs them. That might cause
the mtime to be quite recent during the test run. ;]
Instead, create two files directly in the test (allowing all platforms
to see the problem) and add either a use of the 'touch' command that
forces one mtime to some time quite a bit in the past, or it sleeps for
just over a second to be outside of the precision window.
llvm-svn: 186282
This update was done with the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
fi
done
llvm-svn: 186280
The great thing about the SCEVAddRec No-Wrap flag (unlike nsw/nuw) is
that is can be preserved while normalizing (reassociating and
factoring).
The bad thing is that is can't be tranfered back to IR, which is one
of the reasons I don't like the concept of SCEVExpander.
Sorry, I can't think of a direct way to test this, which is why these
were FIXMEs for so long. I just think it's a good time to finally
clean it up.
llvm-svn: 186273
This conversion was done with the following bash script:
find test/Transforms -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)define\([^@]*\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3define\4@$FUNC(/g" $TEMP
done
mv $TEMP $NAME
fi
done
llvm-svn: 186269
This update was done with the following bash script:
find test/Transforms -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
done
mv $TEMP $NAME
fi
done
llvm-svn: 186268
This was done with the following sed invocation to catch label lines demarking function boundaries:
sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.
llvm-svn: 186258
If an outside loop user of the reduction value uses the header phi node we
cannot just reduce the vectorized phi value in the vector code epilog because
we would loose VF-1 reductions.
lp:
p = phi (0, lv)
lv = lv + 1
...
brcond , lp, outside
outside:
usr = add 0, p
(Say the loop iterates two times, the value of p coming out of the loop is one).
We cannot just transform this to:
vlp:
p = phi (<0,0>, lv)
lv = lv + <1,1>
..
brcond , lp, outside
outside:
p_reduced = p[0] + [1];
usr = add 0, p_reduced
(Because the original loop iterated two times the vectorized loop would iterate
one time, but p_reduced ends up being zero instead of one).
We would have to execute VF-1 iterations in the scalar remainder loop in such
cases. For now, just disable vectorization.
PR16522
llvm-svn: 186256
It is failing with
YAMLTest.cpp:38: instantiated from here
YAMLTraits.h:226: error: 'llvm::yaml::MappingTraits<<unnamed>::BinaryHolder>::mapping' is not a valid template argument for type 'void (*)(llvm::yaml::IO&, <unnamed>::BinaryHolder&)' because function 'static void llvm::yaml::MappingTraits<<unnamed>::BinaryHolder>::mapping(llvm::yaml::IO&, <unnamed>::BinaryHolder&)' has not external linkage
llvm-svn: 186245
In general, one should always complete CFG modifications first, update
CFG-based analyses, like Dominatores and LoopInfo, then generate
instruction sequences.
LoopVectorizer was creating a new loop, calling SCEVExpander to
generate checks, then updating LoopInfo. I just changed the order.
llvm-svn: 186241
ARM paired GPR COPY was being lowered to two MOVr without CC. This
patch puts the CC back.
My test is a reduction of the case where I encountered the issue,
64-bit atomics use paired GPRs.
The issue only occurs with selectionDAG, FastISel doesn't encounter it
so I didn't bother calling it.
llvm-svn: 186226
This fixes two bugs is lib/Object that the use in llvm-ar found:
* In OS X created archives, the name can be padded with nulls. Strip them.
* In the constructor, remember the first non special member and use that in
begin_children. This makes sure we skip all special members, not just the
first one.
The change to llvm-ar itself consist of
* Using lib/Object for reading archives instead of ArchiveReader.cpp.
* Writing the modified archive directly, instead of creating an in memory
representation.
The old Archive library was way more general than what is needed, as can
be seen by the diffstat of this patch.
Having llvm-ar using lib/Object now opens the way for creating regular symbol
tables for both native objects and bitcode files so that we can use those
archives for LTO.
llvm-svn: 186197
Fixes a 35% degradation compared to unvectorized code in
MiBench/automotive-susan and an equally serious regression on a private
image processing benchmark.
radar://14351991
llvm-svn: 186188
Address calculation for gather/scather in vectorized code can incur a
significant cost making vectorization unbeneficial. Add infrastructure to add
cost.
Tests and cost model for targets will be in follow-up commits.
radar://14351991
llvm-svn: 186187