handle more allocas with loads past the end of the alloca.
I suspect there are some related crashers with slightly different
patterns, but I'll fix those and add test cases as I find them.
Thanks to David Majnemer for the excellent test case reduction here.
Made this super simple to debug and fix.
llvm-svn: 246289
This patch includes a fix for a llvm-readobj test. With this patch,
the tool does no longer print out COFF headers for the short import
file, but that's probably desirable because the header for the short
import file is dummy.
llvm-svn: 246283
COFF short import files are special kind of files that contains only
DLL-exported symbol names. That's different from object files because
it has no data except symbol names.
This change implements a SymbolicFile interface for the short import
files so that symbol names can be accessed through that interface.
llvm-ar is now able to read the file and create symbol table entries
for short import files.
llvm-svn: 246276
Summary:
Change the coloring algorithm in WinEHPrepare to visit a funclet's exits
in its parents' contexts and so properly classify the continuations of
nested funclets.
Also change the placement of cloned blocks to be deterministic and to
maintain the relative order of each funclet's blocks.
Add a lit test showing various patterns that require cloning, the last
several of which don't have CHECKs yet because they require cloning
entire funclets which is NYI.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12353
llvm-svn: 246245
After hitting @llvm.assume(X) we can:
- propagate equality that X == true
- if X is icmp/fcmp (with eq operation), and one of operand
is constant we can change all variables with constants in the same BasicBlock
http://reviews.llvm.org/D11918
llvm-svn: 246243
Prior to this patch, we hadn't been marking StratifiedSets with the
appropriate StratifiedAttrs when handling the result of no-args call
instructions. This caused us to report NoAlias when handed, for
example, an escaped alloca and a result from an opaque function. Now we
properly mark the return value of said functions.
Thanks again to Chandler, Richard, and Nick for pinging me about this.
Differential review: http://reviews.llvm.org/D12408
llvm-svn: 246240
more than 2 instructions.
I introduced this regression a while back and did not noticed it because I
somehow forgot to push the initial test cases for the pass!
Fix that as well!
llvm-svn: 246239
llvm::splitCodeGen is a function that implements the core of parallel LTO
code generation. It uses llvm::SplitModule to split the module into linkable
partitions and spawning one code generation thread per partition. The function
produces multiple object files which can be linked in the usual way.
This has been threaded through to LTOCodeGenerator (and llvm-lto for testing
purposes). Separate patches will add parallel LTO support to the gold plugin
and lld.
Differential Revision: http://reviews.llvm.org/D12260
llvm-svn: 246236
We can now run 32-bit programs with empty catch bodies. The next step
is to change PEI so that we get funclet prologues and epilogues.
llvm-svn: 246235
Any call which is side effect free is trivially OK to speculate. We
already had similar logic in EarlyCSE and GVN but we were missing it
from isSafeToSpeculativelyExecute.
This fixes PR24601.
llvm-svn: 246232
Fixes PR24602: r245689 introduced an unguarded use of
SelectionDAG::FoldConstantArithmetic, which returns 0 when it fails
because of opaque (hoisted) constants.
llvm-svn: 246217
Constant propagation for single precision math functions (such as
tanf) is already working, but was not enabled. This patch enables
these for many single-precision functions, and adds respective test
cases.
Newly handled functions: acosf asinf atanf atan2f ceilf coshf expf
exp2f fabsf floorf fmodf logf log10f powf sinhf tanf tanhf
llvm-svn: 246194
This patch changes the analysis diagnostics produced when loops with
floating-point recurrences or memory operations are identified. The new messages
say "cannot prove it is safe to reorder * operations; allow reordering by
specifying #pragma clang loop vectorize(enable)". Depending on the type of
diagnostic the message will include additional options such as ffast-math or
__restrict__.
This patch also allows the vectorize(enable) pragma to override the low pointer
memory check threshold. When the hint is given a higher threshold is used.
See the clang patch for the options produced for each diagnostic.
llvm-svn: 246187
Constant propagation for single precision math functions (such as
tanf) is already working, but was not enabled. This patch enables
these for many single-precision functions, and adds respective test
cases.
Newly handled functions: acosf asinf atanf atan2f ceilf coshf expf
exp2f fabsf floorf fmodf logf log10f powf sinhf tanf tanhf
llvm-svn: 246186
Constant propagation for single precision math functions (such as
tanf) is already working, but was not enabled. This patch enables
these for many single-precision functions, and adds respective test
cases.
Newly handled functions: acosf asinf atanf atan2f ceilf coshf expf
exp2f fabsf floorf fmodf logf log10f powf sinhf tanf tanhf
llvm-svn: 246158
Unlike scalar operations, we can perform vector operations on element types that
are smaller than the native integer types. We type-promote scalar operations if
they are smaller than a native type (e.g., i8 arithmetic is promoted to i32
arithmetic on Arm targets). This patch detects and removes type-promotions
within the reduction detection framework, enabling the vectorization of small
size reductions.
In the legality phase, we look through the ANDs and extensions that InstCombine
creates during promotion, keeping track of the smaller type. In the
profitability phase, we use the smaller type and ignore the ANDs and extensions
in the cost model. Finally, in the code generation phase, we truncate the result
of the reduction to allow InstCombine to rewrite the entire expression in the
smaller type.
This fixes PR21369.
http://reviews.llvm.org/D12202
Patch by Matt Simpson <mssimpso@codeaurora.org>!
llvm-svn: 246149
Globals in address spaces other than one may have 0 as a valid address,
so we should not assume that they can be null.
Reviewed by Philip Reames.
llvm-svn: 246137
A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-store forwarding across a release fence.
We do need to make sure that stores before the fence can't be eliminated even if there's another store to the same location after the fence. In theory, we could reorder the second store above the fence and *then* eliminate the former, but we can't do this if the stores are on opposite sides of the fence.
Note: While more aggressive then what's there, this patch is still implementing a really conservative ordering. In particular, I'm not trying to exploit undefined behavior via races, or the fact that the LangRef says only 'atomic' accesses are ordered w.r.t. fences.
Differential Revision: http://reviews.llvm.org/D11434
llvm-svn: 246134
When computing base pointers, we introduce new instructions to propagate the base of existing instructions which might not be bases. However, the algorithm doesn't make any effort to recognize when the new instruction to be inserted is the same as an existing one already in the IR. Since this is happening immediately before rewriting, we don't really have a chance to fix it after the pass runs without teaching loop passes about statepoints.
I'm really not thrilled with this patch. I've rewritten it 4 different ways now, but this is the best I've come up with. The case where the new instruction is just the original base defining value could be merged into the existing algorithm with some complexity. The problem is that we might have something like an extractelement from a phi of two vectors. It may be trivially obvious that the base of the 0th element is an existing instruction, but I can't see how to make the algorithm itself figure that out. Thus, I resort to the call to SimplifyInstruction instead.
Note that we can only adjust the instructions we've inserted ourselves. The live sets are still being tracked in side structures at this point in the code. We can't easily muck with instructions which might be in them. Long term, I'm really thinking we need to materialize the live pointer sets explicitly in the IR somehow rather than using side structures to track them.
Differential Revision: http://reviews.llvm.org/D12004
llvm-svn: 246133
This patch ensures that every analysis diagnostic produced by the vectorizer
will be printed if the loop has a vectorization hint on it. The condition has
also been improved to prevent printing when a disabling hint is specified.
llvm-svn: 246132
This is a one-line-change patch that moves the update to UnhandledWeights to the correct position: it should be updated for all clusters instead of just range clusters.
Differential Revision: http://reviews.llvm.org/D12391
llvm-svn: 246129
As Sanjoy pointed out over in http://reviews.llvm.org/D11819, a switch on an icmp should always be able to become a branch instruction. This patch generalizes that notion slightly to prove that the default case of a switch is unreachable if the cases completely cover all possible bit patterns in the condition. Once that's done, the switch to branch conversion kicks in just fine.
Note: Duplicate case values are disallowed by the LangRef and verifier.
Differential Revision: http://reviews.llvm.org/D11995
llvm-svn: 246125
Summary:
Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support.
Reviewers: jholewinski, meheff, jingyue
Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D12377
llvm-svn: 246107
Currently, when lowering switch statement and a new basic block is built for jump table / bit test header, the edge to this new block is not assigned with a correct weight. This patch collects the edge weight from all its successors and assign this sum of weights to the edge (and also the other fall-through edge). Test cases are adjusted accordingly.
Differential Revision: http://reviews.llvm.org/D12166#fae6eca7
llvm-svn: 246104
Things of note:
- Other linkage types aren't handled yet. We'll figure it out with dynamic linking.
- Special LLVM globals are either ignored, or error out for now.
- TLS isn't supported yet (WebAssembly will have threads later).
- There currently isn't a syntax for alignment, I left it in a comment so it's easy to hook up.
- Undef is convereted to whatever the type's appropriate null value is.
- assert versus report_fatal_error: follow what other AsmPrinters do, and assert only on what should have been caught elsewhere.
llvm-svn: 246092
This was causing problems when some functions use a GuardReg and some
don't as can happen when mixing SelectionDAG and FastISel generated
functions.
llvm-svn: 246075
If you're going to realign %sp to get object alignment properly (which
the code does), and stack offsets and alignments are calculated going
down from %fp (which they are), then the total stack size had better
be a multiple of the alignment. LLVM did indeed ensure that.
And then, after aligning, the sparc frame code added 96 (for sparcv8)
to the frame size, making any requested alignment of 64-bytes or
higher *guaranteed* to be misaligned. The test case added with r245668
even tests this exact scenario, and asserted the incorrect behavior,
which I somehow failed to notice. D'oh.
This change fixes the frame lowering code to align the stack size
*after* adding the spill area, instead.
Differential Revision: http://reviews.llvm.org/D12349
llvm-svn: 246042
This is a fix for disassembling unusual instruction sequences in 64-bit
mode w.r.t the CALL rel16 instruction. It might be desirable to move the
check somewhere else, but it essentially mimics the special case
handling with JCXZ in 16-bit mode.
The current behavior accepts the opcode size prefix and causes the
call's immediate to stop disassembling after 2 bytes. When debugging
sequences of instructions with this pattern, the disassembler output
becomes extremely unreliable and essentially useless (if you jump midway
into what lldb thinks is a unified instruction, you'll lose %rip). So we
ignore the prefix and consume all 4 bytes when disassembling a 64-bit
mode binary.
Note: in Vol. 2A 3-99 the Intel spec states that CALL rel16 is N.S. N.S.
is defined as:
Indicates an instruction syntax that requires an address override
prefix in 64-bit mode and is not supported. Using an address
override prefix in 64-bit mode may result in model-specific
execution behavior. (Vol. 2A 3-7)
Since 0x66 is an operand override prefix we should be OK (although we
may want to warn about 0x67 prefixes to 0xe8). On the CPUs I tested
with, they all ignore the 0x66 prefix in 64-bit mode.
Patch by Matthew Barney!
Differential Revision: http://reviews.llvm.org/D9573
llvm-svn: 246038
This was only added to preserve the old ScalarRepl's use of SSAUpdater
which was originally to avoid use of dominance frontiers. Now, we only
need a domtree, and we'll need a domtree right after this pass as well
and so it makes perfect sense to always and only use the dom-tree
powered mem2reg. This was flag-flipper earlier and has stuck reasonably
so I wanted to gut the now-dead code out of SROA before we waste more
time with it. Among other things, this will make passmanager porting
easier.
llvm-svn: 246028