Rename's documentation says "Files are renamed as if by POSIX rename()". and it
is used for atomically updating output files from a temporary. Having rename
fallback to a non atomic copy has the potential to hide bugs, like using
a temporary file in /tmp instead of a unique name next to the final destination.
llvm-svn: 186483
This builds on some frame-lowering code that has existed since 2005 (r24224)
but was disabled in 2008 (r48188) because it needed base pointer support to
function correctly. This implementation follows the strategy suggested by Dale
Johannesen in r48188 where the following comment was added:
This does not currently work, because the delta between old and new stack
pointers is added to offsets that reference incoming parameters after the
prolog is generated, and the code that does that doesn't handle a variable
delta. You don't want to do that anyway; a better approach is to reserve
another register that retains to the incoming stack pointer, and reference
parameters relative to that.
And now we do exactly that. If we don't need a frame pointer, then we use r31
as a base pointer. If we do need a frame pointer, then we use r30 as a base
pointer. The base pointer retains the value of the stack pointer before it was
decremented in the prologue. We then use the base pointer to resolve all
negative frame indicies. The basic scheme follows that for base pointers in the
X86 backend.
We use a base pointer when we need to dynamically realign the incoming stack
pointer. This currently applies only to static objects (dynamic allocas with
large alignments, and base-pointer support in SjLj lowering will come in future
commits).
llvm-svn: 186478
This check does not always work because not all of the GEPs use a constant offset, but it happens often enough to reduce the number of times we use SCEV.
llvm-svn: 186465
block. Blocks that have an indirect branch terminator, even if it's not the
last terminator, should still be treated as unanalyzable.
<rdar://problem/14437274>
Reducing a useful regression test case is proving difficult - I hope to have
one soon.
llvm-svn: 186461
This adds an instruction alias to make the assembler recognize the alternate literal form: pli [PC, #+/-<imm>]
See A8.8.129 in the ARM ARM (DDI 0406C.b).
Fixes <rdar://problem/14403733>.
llvm-svn: 186459
These floats all represented block frequencies anyway, so just use the
BlockFrequency class directly.
Some floating point computations remain in tryLocalSplit(). They are
estimating spill weights which are still floats.
llvm-svn: 186435
Original commit message:
Remove floating point computations from SpillPlacement.cpp.
Patch by Benjamin Kramer!
Use the BlockFrequency class instead of floats in the Hopfield network
computations. This rescales the node Bias field from a [-2;2] float
range to two block frequencies BiasN and BiasP pulling in opposite
directions. This construct has a more predictable behavior when block
frequencies saturate.
The per-node scaling factors are no longer necessary, assuming the block
frequencies around a bundle are consistent.
This patch can cause the register allocator to make different spilling
decisions. The differences should be small.
llvm-svn: 186434
Use PMIN/PMAX for UGE/ULE vector comparions to reduce the number of required
instructions. This trick also works for UGT/ULT, but there is no advantage in
doing so. It wouldn't reduce the number of instructions and it would actually
reduce performance.
Reviewer: Ben
radar:5972691
llvm-svn: 186432
This is to support parsing UTF16 response files in LLVM/lib/Option for
lld and clang.
Reviewers: hans
Differential Revision: http://llvm-reviews.chandlerc.com/D1138
llvm-svn: 186426
For safety, the inliner cannot decrease the allignment on an alloca when
merging it with another.
I've included two variants of the test case for this: one with DataLayout
available, and one without. When DataLayout is not available, if only one of
the allocas uses the default alignment (getAlignment() == 0), then they cannot
be safely merged.
llvm-svn: 186425
When truncating to a format with fewer mantissa bits, APFloat::convert
will perform a right shift of the mantissa by the difference of the
precision of the two formats. Usually, this will result in just the
mantissa bits needed for the target format.
One special situation is if the input number is denormal. In this case,
the right shift may discard significant bits. This is usually not a
problem, since truncating a denormal usually results in zero (underflow)
after normalization anyway, since the result format's exponent range is
usually smaller than the target format's.
However, there is one case where the latter property does not hold:
when truncating from ppc_fp128 to double. In particular, truncating
a ppc_fp128 whose first double of the pair is denormal should result
in just that first double, not zero. The current code however
performs an excessive right shift, resulting in lost result bits.
This is then caught in the APFloat::normalize call performed by
APFloat::convert and causes an assertion failure.
This patch checks for the scenario of truncating a denormal, and
attempts to (possibly partially) replace the initial mantissa
right shift by decrementing the exponent, if doing so will still
result in a valid *target format* exponent.
Index: test/CodeGen/PowerPC/pr16573.ll
===================================================================
--- test/CodeGen/PowerPC/pr16573.ll (revision 0)
+++ test/CodeGen/PowerPC/pr16573.ll (revision 0)
@@ -0,0 +1,11 @@
+; RUN: llc < %s | FileCheck %s
+
+target triple = "powerpc64-unknown-linux-gnu"
+
+define double @test() {
+ %1 = fptrunc ppc_fp128 0xM818F2887B9295809800000000032D000 to double
+ ret double %1
+}
+
+; CHECK: .quad -9111018957755033591
+
Index: lib/Support/APFloat.cpp
===================================================================
--- lib/Support/APFloat.cpp (revision 185817)
+++ lib/Support/APFloat.cpp (working copy)
@@ -1956,6 +1956,23 @@
X86SpecialNan = true;
}
+ // If this is a truncation of a denormal number, and the target semantics
+ // has larger exponent range than the source semantics (this can happen
+ // when truncating from PowerPC double-double to double format), the
+ // right shift could lose result mantissa bits. Adjust exponent instead
+ // of performing excessive shift.
+ if (shift < 0 && isFiniteNonZero()) {
+ int exponentChange = significandMSB() + 1 - fromSemantics.precision;
+ if (exponent + exponentChange < toSemantics.minExponent)
+ exponentChange = toSemantics.minExponent - exponent;
+ if (exponentChange < shift)
+ exponentChange = shift;
+ if (exponentChange < 0) {
+ shift -= exponentChange;
+ exponent += exponentChange;
+ }
+ }
+
// If this is a truncation, perform the shift before we narrow the storage.
if (shift < 0 && (isFiniteNonZero() || category==fcNaN))
lostFraction = shiftRight(significandParts(), oldPartCount, -shift);
llvm-svn: 186409
We'd forgotten to provide string representations for the special ARMISD atomic
nodes; this adds them in. No effect on CodeGen, just makes the output of
"-view-whatever-dags" slightly more readable.
llvm-svn: 186406
Another patch in the series to make more use of R.SBG. This one extends
r186072 and r186073 to handle cases where the AND is inside the shift.
llvm-svn: 186399
This patch enables calls to __aeabi_idivmod when in EABI mode,
by using the remainder value returned on registers (R1),
enabled by the ARM triple "none-eabi". Note that Darwin and
GNUEABI triples will continue lowering on GNU style, that is,
using the stack for the remainder.
Still need to add SREM/UREM support fix for 64-bit lowering.
llvm-svn: 186390
This is a micro optimization. Instead of going char*->StringRef->Twine->char*,
go char*->Twine->char* and avoid having to copy the filename on the stack.
llvm-svn: 186380
llvm-ar is the only user of toWin32Time() (via setLastModificationAndAccessTime), and r186298 can be reverted.
It had been buggy since the initial commit.
FIXME: Could we rename {from|to}Win32Time as {from|to}Win32FILETIME in TimeValue?
llvm-svn: 186374
We can have a FrameSetup in one basic block and the matching FrameDestroy
in a different basic block when we have struct byval. In that case, SPAdj
is not zero at beginning of the basic block.
Modify PEI to correctly set SPAdj at beginning of each basic block using
DFS traversal. We used to assume SPAdj is 0 at beginning of each basic block.
PEI had an assert SPAdjCount || SPAdj == 0.
If we have a Destroy <n> followed by a Setup <m>, PEI will assert failure.
We can add an extra condition to make sure the pairs are matched:
The pairs start with a FrameSetup.
But since we are doing a much better job in the verifier, this patch removes
the check in PEI.
PR16393
llvm-svn: 186364
This change mirrors the changes that were made to the X86 and ARM targets to
support subtarget feature changing. As indicated in r182899, the mechanism is
still undergoing revision, and so as with the X86 and ARM targets, there is no
test case yet (there is no effective functionality change).
llvm-svn: 186357
1> on every path through the CFG, a FrameSetup <n> is always followed by a
FrameDestroy <n> and a FrameDestroy is always followed by a FrameSetup.
2> stack adjustments are identical on all CFG edges to a merge point.
3> frame is destroyed at end of a return block.
PR16393
llvm-svn: 186350
PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the
common subclass of the true and false inputs, and then selecting either the
32-bit or the 64-bit isel variant based on the result of calling
PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC)
(where RC is the common subclass). Unfortunately, this is not quite right: if
we have something like this:
%vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76;
G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6
then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and
G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8
pseudo-register). As a result, we also need to check the common subclass
against GPRC_NOR0 and G8RC_NOX0 explicitly.
This had not been a problem for clients of insertSelect that called
canInsertSelect first (because it had a compensating mistake), but insertSelect
is also used by the PPC pseudo-instruction expander, and this error was causing
a problem in that context.
This problem was found by csmith.
llvm-svn: 186343
This is consistent with the ELF object writer.
Add some COFF tests that relocate against an alias.
Reviewers: espindola
Differential Revision: http://llvm-reviews.chandlerc.com/D1079
llvm-svn: 186341
There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks
which, in part, says:
// Note that these invariants may not hold momentarily when processing a node:
// the node being processed may be put in a map before being marked Processed.
Unfortunately, this assert would be valid only if the above-mentioned invariant
held unconditionally. This was causing llc to assert when, in fact,
everything was fine.
Thanks to Richard Sandiford for investigating this issue!
Fixes PR16562.
llvm-svn: 186338
a bot.
This reverts the commit which introduced a new implementation of the
fancy SROA pass designed to reduce its overhead. I'll skip the huge
commit log here, refer to r186316 if you're looking for how this all
works and why it works that way.
llvm-svn: 186332
No functionality change.
This is preparing to move response file parsing into lib/Option so it
can be shared between clang and lld. This change isn't just a
micro-optimization. Clang's driver uses a std::set<std::string> to
unique arguments while parsing response files, so this matches that.
llvm-svn: 186319
Joerg Sonnenberger tells me one can open a directory in freebsd. I will try
to centralize our calls to open so that we can handle O_BINARY in one place,
and will then handle this there too.
llvm-svn: 186317
different core implementation strategy.
Previously, SROA would build a relatively elaborate partitioning of an
alloca, associate uses with each partition, and then rewrite the uses of
each partition in an attempt to break apart the alloca into chunks that
could be promoted. This was very wasteful in terms of memory and compile
time because regardless of how complex the alloca or how much we're able
to do in breaking it up, all of the datastructure work to analyze the
partitioning was done up front.
The new implementation attempts to form partitions of the alloca lazily
and on the fly, rewriting the uses that make up that partition as it
goes. This has a few significant effects:
1) Much simpler data structures are used throughout.
2) No more double walk of the recursive use graph of the alloca, only
walk it once.
3) No more complex algorithms for associating a particular use with
a particular partition.
4) PHI and Select speculation is simplified and happens lazily.
5) More precise information is available about a specific use of the
alloca, removing the need for some side datastructures.
Ultimately, I think this is a much better implementation. It removes
about 300 lines of code, but arguably removes more like 500 considering
that some code grew in the process of being factored apart and cleaned
up for this all to work.
I've re-used as much of the old implementation as possible, which
includes the lion's share of code in the form of the rewriting logic.
The interesting new logic centers around how the uses of a partition are
sorted, and split into actual partitions.
Each instruction using a pointer derived from the alloca gets
a 'Partition' entry. This name is totally wrong, but I'll do a rename in
a follow-up commit as there is already enough churn here. The entry
describes the offset range accessed and the nature of the access. Once
we have all of these entries we sort them in a very specific way:
increasing order of begin offset, followed by whether they are
splittable uses (memcpy, etc), followed by the end offset or whatever.
Sorting by splittability is important as it simplifies the collection of
uses into a partition.
Once we have these uses sorted, we walk from the beginning to the end
building up a range of uses that form a partition of the alloca.
Overlapping unsplittable uses are merged into a single partition while
splittable uses are broken apart and carried from one partition to the
next. A partition is also introduced to bridge splittable uses between
the unsplittable regions when necessary.
I've looked at the performance PRs fairly closely. PR15471 no longer
will even load (the module is invalid). Not sure what is up there.
PR15412 improves by between 5% and 10%, however it is nearly impossible
to know what is holding it up as SROA (the entire pass) takes less time
than reading the IR for that test case. The analysis takes the same time
as running mem2reg on the final allocas. I suspect (without much
evidence) that the new implementation will scale much better however,
and it is just the small nature of the test cases that makes the changes
small and noisy. Either way, it is still simpler and cleaner I think.
llvm-svn: 186316
is executed within the same second as the inputs for the test are
checked out from the source tree, it will fail to update due to being
below the resolution of the 'mtime' test used.
Now, this may seem improbably to you... ok, maybe *really* improbable,
but consider a system which does distributed execution of tests by
shipping their inputs to another machine and runs them. That might cause
the mtime to be quite recent during the test run. ;]
Instead, create two files directly in the test (allowing all platforms
to see the problem) and add either a use of the 'touch' command that
forces one mtime to some time quite a bit in the past, or it sleeps for
just over a second to be outside of the precision window.
llvm-svn: 186282
This update was done with the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
fi
done
llvm-svn: 186280
The great thing about the SCEVAddRec No-Wrap flag (unlike nsw/nuw) is
that is can be preserved while normalizing (reassociating and
factoring).
The bad thing is that is can't be tranfered back to IR, which is one
of the reasons I don't like the concept of SCEVExpander.
Sorry, I can't think of a direct way to test this, which is why these
were FIXMEs for so long. I just think it's a good time to finally
clean it up.
llvm-svn: 186273
This conversion was done with the following bash script:
find test/Transforms -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)define\([^@]*\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3define\4@$FUNC(/g" $TEMP
done
mv $TEMP $NAME
fi
done
llvm-svn: 186269
This update was done with the following bash script:
find test/Transforms -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
done
mv $TEMP $NAME
fi
done
llvm-svn: 186268
This was done with the following sed invocation to catch label lines demarking function boundaries:
sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.
llvm-svn: 186258
If an outside loop user of the reduction value uses the header phi node we
cannot just reduce the vectorized phi value in the vector code epilog because
we would loose VF-1 reductions.
lp:
p = phi (0, lv)
lv = lv + 1
...
brcond , lp, outside
outside:
usr = add 0, p
(Say the loop iterates two times, the value of p coming out of the loop is one).
We cannot just transform this to:
vlp:
p = phi (<0,0>, lv)
lv = lv + <1,1>
..
brcond , lp, outside
outside:
p_reduced = p[0] + [1];
usr = add 0, p_reduced
(Because the original loop iterated two times the vectorized loop would iterate
one time, but p_reduced ends up being zero instead of one).
We would have to execute VF-1 iterations in the scalar remainder loop in such
cases. For now, just disable vectorization.
PR16522
llvm-svn: 186256
It is failing with
YAMLTest.cpp:38: instantiated from here
YAMLTraits.h:226: error: 'llvm::yaml::MappingTraits<<unnamed>::BinaryHolder>::mapping' is not a valid template argument for type 'void (*)(llvm::yaml::IO&, <unnamed>::BinaryHolder&)' because function 'static void llvm::yaml::MappingTraits<<unnamed>::BinaryHolder>::mapping(llvm::yaml::IO&, <unnamed>::BinaryHolder&)' has not external linkage
llvm-svn: 186245
In general, one should always complete CFG modifications first, update
CFG-based analyses, like Dominatores and LoopInfo, then generate
instruction sequences.
LoopVectorizer was creating a new loop, calling SCEVExpander to
generate checks, then updating LoopInfo. I just changed the order.
llvm-svn: 186241
ARM paired GPR COPY was being lowered to two MOVr without CC. This
patch puts the CC back.
My test is a reduction of the case where I encountered the issue,
64-bit atomics use paired GPRs.
The issue only occurs with selectionDAG, FastISel doesn't encounter it
so I didn't bother calling it.
llvm-svn: 186226
This fixes two bugs is lib/Object that the use in llvm-ar found:
* In OS X created archives, the name can be padded with nulls. Strip them.
* In the constructor, remember the first non special member and use that in
begin_children. This makes sure we skip all special members, not just the
first one.
The change to llvm-ar itself consist of
* Using lib/Object for reading archives instead of ArchiveReader.cpp.
* Writing the modified archive directly, instead of creating an in memory
representation.
The old Archive library was way more general than what is needed, as can
be seen by the diffstat of this patch.
Having llvm-ar using lib/Object now opens the way for creating regular symbol
tables for both native objects and bitcode files so that we can use those
archives for LTO.
llvm-svn: 186197
Fixes a 35% degradation compared to unvectorized code in
MiBench/automotive-susan and an equally serious regression on a private
image processing benchmark.
radar://14351991
llvm-svn: 186188
Address calculation for gather/scather in vectorized code can incur a
significant cost making vectorization unbeneficial. Add infrastructure to add
cost.
Tests and cost model for targets will be in follow-up commits.
radar://14351991
llvm-svn: 186187
This is a generic block implementation that works on more than machine blocks.
The C++ mode addition is a bonus due to the extra space provided.
llvm-svn: 186175
In particular:
movsbw %al, %ax --> cbtw
movswl %ax, %eax --> cwtl
movslq %eax, %rax --> cltq
According to Intel's manual those have the same performance characteristics but
come with a smaller encoding.
llvm-svn: 186174
CHECK-LABEL is meant to be used in place on CHECK on lines containing identifiers or other unique labels (they need not actually be labels in the source or output language, though.) This is used to break up the input stream into separate blocks delineated by CHECK-LABEL lines, each of which is checked independently. This greatly improves the accuracy of errors and fix-it hints in many cases, and allows for FileCheck to recover from errors in one block by continuing to subsequent blocks.
Some tests will be converted to use this new directive in forthcoming patches.
llvm-svn: 186162
against a constant."
This reverts commit r186107. It didn't handle wrapping arithmetic in the
loop correctly and thus caused the following C program to count from
0 to UINT64_MAX instead of from 0 to 255 as intended:
#include <stdio.h>
int main() {
unsigned char first = 0, last = 255;
do { printf("%d\n", first); } while (first++ != last);
}
Full test case and instructions to reproduce with just the -indvars pass
sent to the original review thread rather than to r186107's commit.
llvm-svn: 186152
Normal (sext (setcc ...)) sequences are optimised into
(select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND.
However, this is deliberately not done for vectors, and after
vector type legalization we have (sext_inreg (setcc ...)) instead.
I wondered about trying to extend DAGCombiner to handle this case too,
but it seemed to be a loss on some other targets I tried, even those for
which SETCC isn't "legal" and SELECT_CC is.
llvm-svn: 186149
GPR and FPR constraints like "{r2}" and "{f2}" weren't handled correctly
because the name-to-regno mapping depends on the value type and
(because of that) the internal names in RegStrings are not the
same as the AsmName.
CC constraints like "{cc}" didn't work either because there was no
associated register class.
llvm-svn: 186148
If the source of these instructions is spilled we should load the destination.
If the destination is spilled we should store the source.
llvm-svn: 186147
Summary:
This patch adds explicit calling convention types for the Win64 and
System V/x86-64 ABIs. This allows code to override the default, and use
the Win64 convention on a target that wants to use SysV (and
vice-versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU
attributes.
Reviewers:
CC:
llvm-svn: 186144
Before we could vectorize PHINodes scanning successors was a good way of finding candidates. Now we can vectorize the phinodes which is simpler.
llvm-svn: 186139
We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for
v8i16 (which occurs in the test case) or v16i8. The same was true for
V_SETALLONES (so I added the associated patterns for those as well).
Another bug found by llvm-stress.
llvm-svn: 186108
Patch by Michele Scandale!
Adds a special handling of the case where, during the loop exit
condition rewriting, the exit value is a constant of bitwidth lower
than the type of the induction variable: instead of introducing a
trunc operation in order to match correctly the operand types, it
allows to convert the constant value to an equivalent constant,
depending on the initial value of the induction variable and the trip
count, in order have an equivalent comparison between the induction
variable and the new constant.
llvm-svn: 186107
This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI
with an out-of-range operand. Most uses of the isRunOfOnes function are guarded
by a condition that the value is not zero. This was not true in two places, and
in both places a zero input would result in an out-of-rage MB value (= 32).
To fix this, isRunOfOnes returns false on a zero input (and I've remove one
now-redundant guard).
llvm-svn: 186101
We can vectorize them because in the case where we wrap in the address space the
unvectorized code would have had to access a pointer value of zero which is
undefined behavior in address space zero according to the LLVM IR semantics.
(Thank you Duncan, for pointing this out to me).
Fixes PR16592.
llvm-svn: 186088
* All systems we support have some form of long name support.
* The options has different names and semantics in different implementations
('f' on gnu, 'T' on OS X), which makes it unlikely it is normally used on
build systems.
* It was completely untested.
llvm-svn: 186078
- Coallocate entires for AttributeSetImpls and Nodes after the class itself.
- Remove mutable iterators from immutable classes.
- Remove unused context field from AttributeImpl.
- Derive Enum/Align/String attribute implementations from AttributeImpl instead
of having a whole new inheritance tree for them.
- Derive AlignAttributeImpl from EnumAttributeImpl.
llvm-svn: 186075
RISBG can handle some ANDs for which no AND IMMEDIATE exists.
It also acts as a three-operand AND for some cases where an
AND IMMEDIATE could be used instead.
It might be worth adding a pass to replace RISBG with AND IMMEDIATE
in cases where the register operands end up being the same and where
AND IMMEDIATE is smaller.
llvm-svn: 186072
RISBG has three 8-bit operands (I3, I4 and I5). I'd originally
restricted all three to 6 bits, since that's the only range we intended
to use at the time. However, the top bit of I4 acts as a "zero" flag for
RISBG, while the top bit of I3 acts as a "test" flag for RNSBG & co.
This patch therefore allows them to have the full 8-bit range.
I've left the fifth operand as a 6-bit value for now since the
upper 2 bits have no defined meaning.
llvm-svn: 186070
predecessors of the two blocks it is attempting to merge supply the
same incoming values to any phi in the successor block. This change
allows merging in the case where there is one or more incoming values
that are undef. The undef values are rewritten to match the non-undef
value that flows from the other edge. Patch by Mark Lacey.
llvm-svn: 186069
MF is normally initialized in AsmPrinter::SetupMachineFunction, but if the file
contains only globals (no functions), then we need this to be initialized
because, when encountering an error, lowerConstant() references it.
This should fix the non-deterministic failures of
test/CodeGen/X86/nonconst-static-iv.ll, etc.
llvm-svn: 186068
When computing currently-live registers, the register scavenger excludes undef
uses. As a result, undef uses are ignored when computing the restore points of
registers spilled into the emergency slots. While the register scavenger
normally excludes from consideration, when scavenging, registers used by the
current instruction, we need to not exclude undef uses. Otherwise, we might end
up requiring more emergency spill slots than we have (in the case where the
undef use *is* the currently-spilled register).
Another bug found by llvm-stress.
llvm-svn: 186067
Without the changes introduced into this patch, if TRE saw any allocas at all,
TRE would not perform TRE *or* mark callsites with the tail marker.
Because TRE runs after mem2reg, this inadequacy is not a death sentence. But
given a callsite A without escaping alloca argument, A may not be able to have
the tail marker placed on it due to a separate callsite B having a write-back
parameter passed in via an argument with the nocapture attribute.
Assume that B is the only other callsite besides A and B only has nocapture
escaping alloca arguments (*NOTE* B may have other arguments that are not passed
allocas). In this case not marking A with the tail marker is unnecessarily
conservative since:
1. By assumption A has no escaping alloca arguments itself so it can not
access the caller's stack via its arguments.
2. Since all of B's escaping alloca arguments are passed as parameters with
the nocapture attribute, we know that B does not stash said escaping
allocas in a manner that outlives B itself and thus could be accessed
indirectly by A.
With the changes introduced by this patch:
1. If we see any escaping allocas passed as a capturing argument, we do
nothing and bail early.
2. If we do not see any escaping allocas passed as captured arguments but we
do see escaping allocas passed as nocapture arguments:
i. We do not perform TRE to avoid PR962 since the code generator produces
significantly worse code for the dynamic allocas that would be created
by the TRE algorithm.
ii. If we do not return twice, mark call sites without escaping allocas
with the tail marker. *NOTE* This excludes functions with escaping
nocapture allocas.
3. If we do not see any escaping allocas at all (whether captured or not):
i. If we do not have usage of setjmp, mark all callsites with the tail
marker.
ii. If there are no dynamic/variable sized allocas in the function,
attempt to perform TRE on all callsites in the function.
Based off of a patch by Nick Lewycky.
rdar://14324281.
llvm-svn: 186057
I had thought that these tests could be target-neutral, but in practice this is
not the case (on some targets, like Hexagon and Darwin), they trigger an assert
(a different assert than the one that r186044 fixes).
llvm-svn: 186051
For some reason, the Hexagon backend does not reject these invalid static
initializer expressions, but instead crashes in AsmPrinter::EmitGlobalConstant.
llvm-svn: 186045
A non-constant-foldable static initializer expression containing insertvalue or
extractvalue had been causing an assert:
Constants.cpp:1971: Assertion `FC && "ExtractValue constant expr couldn't be
folded!"' failed.
Now we report a more-sensible "Unsupported expression in static initializer"
error instead.
Fixes PR15417.
llvm-svn: 186044
It is not reliable to depend on the output of llvm_unreachable. The original
change will have proper tests when llvm-ar moves to lib/Object (soon).
llvm-svn: 186043
There is no lib/Archive anymore and some archive tests were in test/Archive and
others in test/Object. Since archive is just one of the formats supported by
lib/Object, test/Object is probably the best location.
llvm-svn: 186038
Patch from Игорь Пашев (I do hope we support utf-8 commit messages; I
also hope he'll forgive me for transliterating it as Igor Pashev in
case things go horribly wrong).
llvm-svn: 186034
The status function is already using a syscall that returns the file size.
Remember it and implement file_size as a simple wrapper.
No functionally change, but clients that already use status now can avoid
calling file_size.
llvm-svn: 186016