Currently this only functions to match simple cases
where ds_read2_* / ds_write2_* instructions can be used.
In the future it might match some of the other weird
load patterns, such as direct to LDS loads.
Currently enabled only with a subtarget feature to enable
easier testing.
llvm-svn: 219533
It also makes it more aggressive in querying range information by
adding a call to isKnownPredicateWithRanges to
isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond.
phabricator: http://reviews.llvm.org/D5638
Reviewed by: atrick, hfinkel
llvm-svn: 219532
I was quiet surprised to find this feature being used. Fortunately the uses
I found look fairly simple. In fact, they are just a very verbose version
of the regular ar commands.
Start implementing it then by parsing the script and setting the command
variables as if we had a regular command line.
This patch adds just enough support to create an empty archive and do a bit
of error checking. In followup patches I will implement at least addmod
and addlib.
From the description in the manual, even the more general case should not
be too hard to implement if needed. The features that don't map 1:1 to
the simple command line are
* Reading from multiple archives.
* Creating multiple archives.
llvm-svn: 219521
ScalarEvolution in the presence of multiple exits. Previously all
loops exits had to have identical counts for a loop trip count to be
considered computable. This pessimization was implemented by calling
getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock)
inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME
in the comments of that function). The pessimization was added to fix
a corner case involving undefined behavior (pr/16130). This patch more
precisely handles the undefined behavior case allowing the pessimization
to be removed.
ControlsExit replaces IsSubExpr to more precisely track the case where
undefined behavior is expected to occur. Because undefined behavior is
tracked more precisely we can remove MustExit from ExitLimit. MustExit
was used to track the case where the limit was computed potentially
assuming undefined behavior even if undefined behavior didn't necessarily
occur.
llvm-svn: 219517
Fixes a logic error in the MachineScheduler found by Steve Montgomery (and
confirmed by Andy). This has gone unfixed for months because the fix has been
found to introduce some small performance regressions. However, Andy has
recommended that, at this point, we fix this to avoid further dependence on the
incorrect behavior (and then follow-up separately on any regressions), and I
agree.
Fixes PR18883.
llvm-svn: 219512
Summary: Add the ability to convert 64 or 32 bit floating point values to integer in mips fast-isel
Test Plan:
fpintconv.ll
ran 4 flavors of test-suite with no errors, misp32 r1/r2 O0/O2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits, rfuhler, mcrosier
Differential Revision: http://reviews.llvm.org/D5562
llvm-svn: 219511
This change depends on the ApplePropertyString helper that I sent spearately.
Not sure how you want this tested: as a tool test by adding a binary to dump, or as an llvm test starting from an IR file?
Reviewers: dblaikie, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5689
llvm-svn: 219507
DW_AT_specification and DW_AT_abstract_origin resolving was only performed
on subroutine DIEs because it used the getSubroutineName method. Introduce
a more generic getName() and use it to dump the reference attributes.
Testcases have been updated to check the printed names instead of the offsets
except when the name could be ambiguous.
Reviewers: dblaikie, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5625
llvm-svn: 219506
instead
We used to transform this:
define void @test6(i1 %cond, i8* %ptr) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br label %bb2
bb2:
%ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
store i8 2, i8* %ptr.2, align 8
ret void
}
into this:
define void @test6(i1 %cond, i8* %ptr) {
%ptr.2 = select i1 %cond, i8* null, i8* %ptr
store i8 2, i8* %ptr.2, align 8
ret void
}
because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).
The existing transformation that removes unreachable control flow in simplifycfg
is:
/// If BB has an incoming value that will always trigger undefined behavior
/// (eg. null pointer dereference), remove the branch leading here.
static bool removeUndefIntroducingPredecessor(BasicBlock *BB)
Now we generate:
define void @test6(i1 %cond, i8* %ptr) {
store i8 2, i8* %ptr.2, align 8
ret void
}
I did not see any impact on the test-suite + externals.
rdar://18596215
llvm-svn: 219462
Long section names are represented as a slash followed by a numeric
ASCII string. This number is an offset into a string table.
Print the appropriate entry in the string table instead of the less
enlightening /4.
N.B. yaml2obj already does the right thing, this test exercises both
sides of the (de-)serialization.
llvm-svn: 219458
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))
This has 2 benefits: less code / faster code and one less estimate instruction
that may lose precision.
The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 less flops for a single-precision sqrtf
or vector sqrtf and 4 less flops for a double-precision sqrt.
We also eliminate a constant load and extra register usage.
Differential Revision: http://reviews.llvm.org/D5682
llvm-svn: 219445
The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem.
llvm-svn: 219441
LLVM assumes INSERT_SUBREG will always have register operands, so
we need to legalize non-register operands, like FrameIndexes, to
avoid random assertion failures.
llvm-svn: 219420
The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x
incorrectly use the XForm_1 instruction format, rather than the
XX1Form instruction format. This is likely a pasto when creating
these instructions, which were based on lvx and so forth. This patch
uses the correct format.
The existing reformatting test (test/MC/PowerPC/vsx.s) missed this
because the two formats differ only in that XX1Form has an extension
to the target register field in bit 31. The tests for these
instructions used a target register of 7, so the default of 0 in bit
31 for XForm_1 didn't expose a problem. For register numbers 32-63
this would be noticeable. I've changed the test to use higher
register numbers to verify my change is effective.
llvm-svn: 219416
This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we
wrongly computed the distance between the highest bits set of two negative
values.
This fixes PR21222.
Differential Revision: http://reviews.llvm.org/D5700
llvm-svn: 219406
This adds the Pat<>'s for the intrinsics. These are necessary because we
don't lower these intrinsics to SDNodes but match them directly. See the
rational in the previous commit.
llvm-svn: 219362
These derive from the new asm-only masking definitions.
Unfortunately I wasn't able to find a ISel pattern that we could legally
generate for the masking variants. The problem is that since the destination
is v4* we would need VK4 register classes and v4i1 value types to express the
masking. These are however not legal types/classes in AVX512f but only in VL,
so things get complicated pretty quickly. We can revisit this question later
if we have a more pressing need to express something like this.
So the ISel patterns are empty for the masking instructions and the next patch
will add Pat<>s instead to match the intrinsics calls with instructions.
llvm-svn: 219361
Summary:
I had forgotten to check for NotSlowIncDec in the patterns that can generate
inc/dec for the above pattern (added in D4796).
This currently applies to Atom Silvermont, KNL and SKX.
Test Plan: New checks on atomic_mi.ll
Reviewers: jfb, nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5677
llvm-svn: 219336
A function with discardable linkage cannot be discarded if its a member
of a COMDAT group without considering all the other COMDAT members as
well. This sort of thing is already handled by GlobalOpt/GlobalDCE.
This fixes PR21206.
llvm-svn: 219335
The icmp-select-icmp optimization targets select-icmp.eq
only. This is now ensured by testing the branch predicate
explictly. This commit also includes the test case for pr21199.
llvm-svn: 219282
COFF normally doesn't allow us to describe the alignment of COMMON
symbols.
It turns out that most linkers use the symbol size as a hint as to how
aligned the symbol should be.
However the BFD folks have added a .drectve command, which we
now support as of r219229, that allows us to specify the alignment
precisely. With this in mind, stop rounding sizes up.
llvm-svn: 219281
Summary:
Fix pr21099
The pseudocode of what we were doing (spread through two functions) was:
if (operand.doesNotFitIn32Bits())
Opc.initializeWithFoo();
if (operand < 0)
operand = -operand;
if (operand.doesFitIn8Bits())
Opc.initializeWithBar();
else if (operand.doesFitIn32Bits())
Opc.initializeWithBlah();
doStuff(Opc);
So for operand == INT32_MIN, Opc was never initialized because the operand changes
from fitting in 32 bits to not fitting, causing the various bugs/error messages
noted by pr21099.
This patch adds an extra test at the beginning for this case, and an
llvm_unreachable to have better error message if the operand ends up
not fitting in 32-bits at the end.
Test Plan: new test + make check
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5655
llvm-svn: 219257
This is somewhat the inverse of how similar bugs in DAE and ArgPromo
manifested and were addressed. In those passes, individual call sites
were visited explicitly, and then the old function was deleted. This
left the debug info with a null llvm::Function* that needed to be
updated to point to the new function.
In the case of DFSan, it RAUWs the old function with the wrapper, which
includes debug info. So now the debug info refers to the wrapper, which
doesn't actually have any instructions with debug info in it, so it is
ignored entirely - resulting in a DW_TAG_subprogram with no high/low pc,
etc. Instead, fix up the debug info to refer to the original function
after the RAUW messed it up.
Reviewed/discussed with Peter Collingbourne on the llvm-dev mailing
list.
llvm-svn: 219249
`LoopUnrollPass` says that it preserves `LoopInfo` -- make it so. In
particular, tell `LoopInfo` about copies of inner loops when unrolling
the outer loop.
Conservatively, also tell `ScalarEvolution` to forget about the original
versions of these loops, since their inputs may have changed.
Fixes PR20987.
llvm-svn: 219241