Summary: It is the same as LA, except that it can also load 64-bit addresses and it only works on 64-bit MIPS architectures.
Reviewers: tomatabacu, seanbruno, vkalintiris
Subscribers: brooks, seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D9524
llvm-svn: 245208
These only get generated if the target supports them. If one of the variants is not legal and the other is, and it is safe to do so, the other variant will be emitted.
For example on AArch32 (V8), we have scalar fminnm but not fmin.
Fix up a couple of tests while we're here - one now produces better code, and the other was just plain wrong to start with.
llvm-svn: 245196
PR24469 resulted because DeleteDeadInstruction in handleNonLocalStoreDeletion was
deleting the next basic block iterator. Fixed the same by resetting the basic block iterator
post call to DeleteDeadInstruction.
llvm-svn: 245195
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
If we can ignore NaNs, fmin/fmax libcalls can become compare and select
(this is what we turn std::min / std::max into).
This IR should then be optimized in the backend to whatever is best for
any given target. Eg, x86 can use minss/maxss instructions.
This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314
Differential Revision: http://reviews.llvm.org/D11866
llvm-svn: 245187
Bitwise arithmetic can obscure a simple sign-test. If replacing the
mask with a truncate is preferable if the type is legal because it
permits us to rephrase the comparison more explicitly.
llvm-svn: 245171
We can set additional bits in a mask given that we know the other
operand of an AND already has some bits set to zero. This can be more
efficient if doing so allows us to use an instruction which implicitly
sign extends the immediate.
This fixes PR24085.
Differential Revision: http://reviews.llvm.org/D11289
llvm-svn: 245169
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.
(zext (truncate x)) -> (zext (and(x, m))
Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.
This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.
Differential Revision: http://reviews.llvm.org/D11764
llvm-svn: 245160
Some personality routines require funclet exit points to be clearly
marked, this is done by producing a token at the funclet pad and
consuming it at the corresponding ret instruction. CleanupReturnInst
already had a spot for this operand but CatchReturnInst did not.
Other personality routines don't need to use this which is why it has
been made optional.
llvm-svn: 245149
This patch makes the Merge Functions pass faster by calculating and comparing
a hash value which captures the essential structure of a function before
performing a full function comparison.
The hash is calculated by hashing the function signature, then walking the basic
blocks of the function in the same order as the main comparison function. The
opcode of each instruction is hashed in sequence, which means that different
functions according to the existing total order cannot have the same hash, as
the comparison requires the opcodes of the two functions to be the same order.
The hash function is a static member of the FunctionComparator class because it
is tightly coupled to the exact comparison function used. For example, functions
which are equivalent modulo a single variant callsite might be merged by a more
aggressive MergeFunctions, and the hash function would need to be insensitive to
these differences in order to exploit this.
The hashing function uses a utility class which accumulates the values into an
internal state using a standard bit-mixing function. Note that this is a different interface
than a regular hashing routine, because the values to be hashed are scattered
amongst the properties of a llvm::Function, not linear in memory. This scheme is
fast because only one word of state needs to be kept, and the mixing function is
a few instructions.
The main runOnModule function first computes the hash of each function, and only
further processes functions which do not have a unique function hash. The hash
is also used to order the sorted function set. If the hashes differ, their
values are used to order the functions, otherwise the full comparison is done.
Both of these are helpful in speeding up MergeFunctions. Together they result in
speedups of 9% for mysqld (a mostly C application with little redundancy), 46%
for libxul in Firefox, and 117% for Chromium. (These are all LTO builds.) In all
three cases, the new speed of MergeFunctions is about half that of the module
verifier, making it relatively inexpensive even for large LTO builds with
hundreds of thousands of functions. The same functions are merged, so this
change is free performance.
Author: jrkoenig
Reviewers: nlewycky, dschuff, jfb
Subscribers: llvm-commits, aemerson
Differential revision: http://reviews.llvm.org/D11923
llvm-svn: 245140
This seems to only work some of the time. In some situations,
this seems to use a nonsensical type and isn't actually aware of the
memory being accessed. e.g. if branch condition is an icmp of a pointer,
it checks the addressing mode of i1.
llvm-svn: 245137
Summary:
http://reviews.llvm.org/D11212 made Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs for add instructions. This patch expands that to sub, mul and shl instructions.
This change makes LSR able to generate pointer induction variables for loops like these, where the index is 32 bit and the pointer is 64 bit:
for (int i = 0; i < numIterations; ++i)
sum += ptr[i - offset];
for (int i = 0; i < numIterations; ++i)
sum += ptr[i * stride];
for (int i = 0; i < numIterations; ++i)
sum += ptr[3 * (i << 7)];
Reviewers: atrick, sanjoy
Subscribers: sanjoy, majnemer, hfinkel, llvm-commits, meheff, jingyue, eliben
Differential Revision: http://reviews.llvm.org/D11860
llvm-svn: 245118
Although targeting CoreCLR is similar to targeting MSVC, there are
certain important differences that the backend must be aware of
(e.g. differences in stack probes, EH, and library calls).
Differential Revision: http://reviews.llvm.org/D11012
llvm-svn: 245115
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.
llvm-svn: 245107
This patch makes the Darwin ARM backend take advantage of TargetParser. It
also teaches TargetParser about ARMV7K for the first time. This makes target
triple parsing more consistent across llvm.
Differential Revision: http://reviews.llvm.org/D11996
llvm-svn: 245081
This patch fixes the x86 implementation of allowsMisalignedMemoryAccess() to correctly
return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load
merging optimization is changed to use the TLI hook. This exposes a shortcoming in the
current logic and results in the regression test update. Changing other direct users of
the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch.
Without the fix in allowsMisalignedMemoryAccesses(), we will infinite loop when targeting
SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads
while PerformLOADCombine() splits them back into 16-byte loads.
Differential Revision: http://reviews.llvm.org/D10662
llvm-svn: 245075
Summary: Similar to the change we applied to ASan. The same test case works.
Reviewers: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11961
llvm-svn: 245067
This reverts commit r245047.
It was failing on the darwin bots. The problem was that when running
./bin/llc -march=msp430
llc gets to
if (TheTriple.getTriple().empty())
TheTriple.setTriple(sys::getDefaultTargetTriple());
Which means that we go with an arch of msp430 but a triple of
x86_64-apple-darwin14.4.0 which fails badly.
That code has to be updated to select a triple based on the value of
march, but that is not a trivial fix.
llvm-svn: 245062
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.
It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.
llvm-svn: 245047
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.
I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.
llvm-svn: 245035
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.
There are several applications for such a type but my immediate
motivation stems from WinEH. Our personality routine enforces a
single-entry - single-exit regime for cleanups. After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together. We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.
Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup. This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.
What is the burden to the optimizer? Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate, optimizations have to watch out for
such instructions anyway. There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.
Differential Revision: http://reviews.llvm.org/D11861
llvm-svn: 245029
Summary:
This patch implements my promised optimization to reunites certain sexts from
operands after we extract the constant offset. See the header comment of
reuniteExts for its motivation.
One key building block that enables this optimization is Bjarke's poison value
analysis (D11212). That helps to prove "a +nsw b" can't overflow.
Reviewers: broune
Subscribers: jholewinski, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12016
llvm-svn: 245003
This commit modifies the way the machine basic blocks are serialized - now the
machine basic blocks are serialized using a custom syntax instead of relying on
YAML primitives. Instead of using YAML mappings to represent the individual
machine basic blocks in a machine function's body, the new syntax uses a single
YAML block scalar which contains all of the machine basic blocks and
instructions for that function.
This is an example of a function's body that uses the old syntax:
body:
- id: 0
name: entry
instructions:
- '%eax = MOV32r0 implicit-def %eflags'
- 'RETQ %eax'
...
The same body is now written like this:
body: |
bb.0.entry:
%eax = MOV32r0 implicit-def %eflags
RETQ %eax
...
This syntax change is motivated by the fact that the bundled machine
instructions didn't map that well to the old syntax which was using a single
YAML sequence to store all of the machine instructions in a block. The bundled
machine instructions internally use flags like BundledPred and BundledSucc to
determine the bundles, and serializing them as MI flags using the old syntax
would have had a negative impact on the readability and the ease of editing
for MIR files. The new syntax allows me to serialize the bundled machine
instructions using a block construct without relying on the internal flags,
for example:
BUNDLE implicit-def dead %itstate, implicit-def %s1 ... {
t2IT 1, 24, implicit-def %itstate
%s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate
}
This commit also converts the MIR testcases to the new syntax. I developed
a script that can convert from the old syntax to the new one. I will post the
script on the llvm-commits mailing list in the thread for this commit.
llvm-svn: 244982
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fallback to the long-form matcher if the
short-form matcher failed, losing the error information on the way.
Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.
llvm-svn: 244955
If <src> is non-zero we can safely set the flag to true, and this
results in less code generated for, e.g. ffs(x) + 1 on FreeBSD.
Thanks to majnemer for suggesting the fix and reviewing.
Code generated before the patch was applied:
0: 0f bc c7 bsf %edi,%eax
3: b9 20 00 00 00 mov $0x20,%ecx
8: 0f 45 c8 cmovne %eax,%ecx
b: 83 c1 02 add $0x2,%ecx
e: b8 01 00 00 00 mov $0x1,%eax
13: 85 ff test %edi,%edi
15: 0f 45 c1 cmovne %ecx,%eax
18: c3 retq
Code generated after the patch was applied:
0: 0f bc cf bsf %edi,%ecx
3: 83 c1 02 add $0x2,%ecx
6: 85 ff test %edi,%edi
8: b8 01 00 00 00 mov $0x1,%eax
d: 0f 45 c1 cmovne %ecx,%eax
10: c3 retq
It seems we can still use cmove and save another 'test' instruction, but
that can be tackled separately.
Differential Revision: http://reviews.llvm.org/D11989
llvm-svn: 244947