PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:
and(or(x, c1), c2)
but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users. Make sure we abort the transformation if such bits are
discovered.
Fixes PR24704.
llvm-svn: 246900
When forming permutation-based unaligned vector loads, we need to know whether
it is valid to read ahead of the requested address by a full vector length.
Doing so is more efficient (and allows for more CSE with later loads), but
could trigger a page fault if invalid. To determine validity, we look for other
loads in the same block that access the relevant address range.
The relevant point here is that we need to do this as part of the process of
forming permutation-based vector loads, and this happens quite early in the
SDAG pipeline - specifically before many of the address calculations are fully
canonicalized. As a result, we need to try harder to recognize base+offset
address computations, because they still might appear as chain of adds
(base+offset+offset, for example). To account for this, we'll look through
chains of adds, accumulating the constant offsets.
llvm-svn: 246813
If you compute the MMO offset using unsigned arithmetic, you end up with a
large positive offset instead of a small negative one. In theory, this could
cause bad instruction-scheduling decisions later.
I noticed this by inspection from the debug output, and using that for the
regression test is the best I can do right now.
llvm-svn: 246805
Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time
we have a merged access candidate. Without this patch, we were generating unaligned
16-byte (SSE) memops for x86 targets where those accesses are slow.
This change was mentioned in:
http://reviews.llvm.org/D10662 and
http://reviews.llvm.org/D10905
and will help solve PR21711.
Differential Revision: http://reviews.llvm.org/D12573
llvm-svn: 246771
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
llvm-svn: 246769
Summary:
This intrinsic can be used to extract a pointer to the exception caught by
a given catchpad. Its argument has token type and must be a `catchpad`.
Also clarify ExtendingLLVM documentation regarding overloaded intrinsics.
Reviewers: majnemer, andrew.w.kaylor, sanjoy, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12533
llvm-svn: 246752
We used to accept (and even test, and generate) 16-byte alignment
for 32-byte nontemporal stores, but they require 32-byte alignment,
per SDM. Found by inspection.
Instead of hardcoding 16 in the patfrag, check for natural alignment.
Also fix the autoupgrade and the various tests.
Also, use explicit -mattr instead of -mcpu: I stared at the output
several minutes wondering why I get 2x movntps for the unaligned
case (which is the ideal output, but needs some work: see FIXME),
until I remembered corei7-avx implies +slow-unaligned-mem-32.
llvm-svn: 246733
I'm adding a regression test to better cover code generation for unaligned
vector loads and stores, but there's no functional change to the code
generation here. There is an improvement to the cost model for unaligned vector
loads and stores, mostly for QPX (for which we were not previously accounting
for the permutation-based loads), and the cost model implementation is cleaner.
llvm-svn: 246712
This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch.
Differential Revision: http://reviews.llvm.org/D12342
llvm-svn: 246692
This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch.
Differential Revision: http://reviews.llvm.org/D12343
llvm-svn: 246691
LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the
TableGen-generated matching code, and it does this by testing the same
predicates used by the TableGen files. Unfortunately, when we added new
P8Altivec-only predicates, we started universally testing them in
LowerVECTOR_SHUFFLE, and if then matched when targeting a system prior to a P8,
we'd end up with a selection failure.
llvm-svn: 246675
This is a continuation of the fix from:
http://reviews.llvm.org/D10662
and discussion in:
http://reviews.llvm.org/D12154
Here, we distinguish slow unaligned SSE (128-bit) accesses from slow unaligned
scalar (64-bit and under) accesses. Other lowering (eg, getOptimalMemOpType)
assumes that unaligned scalar accesses are always ok, so this changes
allowsMisalignedMemoryAccesses() to match that behavior.
Differential Revision: http://reviews.llvm.org/D12543
llvm-svn: 246658
Vector 'getelementptr' with scalar base is an opportunity for gather/scatter intrinsic to generate a better sequence.
While looking for uniform base, we want to use the scalar base pointer of GEP, if exists.
Differential Revision: http://reviews.llvm.org/D11121
llvm-svn: 246622
The code introduced in r244314 assumed that EXTRACT_VECTOR_ELT only
takes constant indices, but it does accept variables.
Bail out for those: we can't use them, as the shuffles we want to
reconstruct do require constant masks.
llvm-svn: 246594
This matches the ARM behavior. In both cases, the register is part
of the optional Performance Monitors extension, so, add the feature,
and enable it for the A-class processors we support.
Differential Revision: http://reviews.llvm.org/D12425
llvm-svn: 246555
Summary:
Interleaved access lowering removes a memory operation and a
sequence of vector shuffles and replaces it with a series of
memory operations. This should be always beneficial.
This pass in only enabled on ARM/AArch64.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12145
llvm-svn: 246540
Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors.
For example, given a switch statement with cases 1,2,3,5,10,11,20, and every edge from switch to each successor has weight 10. If there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10+10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution.
There are some exceptions:
For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it.
For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it.
When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it.
In other cases, the default weight is evenly distributed to successors.
Differential Revision: http://reviews.llvm.org/D12418
llvm-svn: 246522
SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.
The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.
The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:
(setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
(which is possible for some combinations of (op1, op2))
If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.
To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).
Fixes PR24636.
llvm-svn: 246507
Summary: This handles all load/store operations that WebAssembly defines, and handles those necessary for C++ such as i1. I left a FIXME for outstanding features which aren't required for now.
Reviewers: sunfish
Subscribers: jfb, llvm-commits, dschuff
llvm-svn: 246500
Also delete and simplify a lot of MachineModuleInfo code that used to be
needed to handle personalities on landingpads. Now that the personality
is on the LLVM Function, we no longer need to track it this way on MMI.
Certainly it should not live on LandingPadInfo.
llvm-svn: 246478
The ISelLowering code turned insertion turned the element for the
lowest lane of a BUILD_VECTOR into an INSERT_SUBREG, this prohibited
the patterns for SCALAR_TO_VECTOR(Load) to match later. Restrict this
to cases without a load argument.
Reported in rdar://22223823
Differential Revision: http://reviews.llvm.org/D12467
llvm-svn: 246462
AggressiveAntiDepBreaker was doing some EarlyClobber checking, but was not
checking that the register being potentially renamed was defined by an
early-clobber def where there was also a use, in that instruction, of the
register being considered as the target of the rename. Fixes PR24014.
llvm-svn: 246423
There were really two problems here. The first was that we had the truth tables
for signed i1 comparisons backward. I imagine these are not very common, but if
you have:
setcc i1 x, y, LT
this has the '0 1' and the '1 0' results flipped compared to:
setcc i1 x, y, ULT
because, in the signed case, '1 0' is really '-1 0', and the answer is not the
same as in the unsigned case.
The second problem was that we did not have patterns (at all) for the unsigned
comparisons select_cc nodes for i1 comparison operands. This was the specific
cause of PR24552. These had to be added (and a missing Altivec promotion added
as well) to make sure these function for all types. I've added a bunch more
test cases for these patterns, and there are a few FIXMEs in the test case
regarding code-quality.
Fixes PR24552.
llvm-svn: 246400
Add support for MIR serialization of PowerPC-specific operand target flags
(based on the generic infrastructure added in r244185 and r245383).
I won't even pretend that this is good test coverage, but this includes the
regression test associated with r246372. Adding an MIR test for that fix is far
superior to adding an IR-level test because particular instruction-scheduling
decisions are necessary in order to expose the bug, and using an MIR test we
can start the pipeline post-scheduling.
llvm-svn: 246373
This is especially visible in softfp mode, for example in the implementation of libm fabs/fneg functions. If we have:
%1 = vmovdrr r0, r1
%2 = fabs %1
then move the fabs before the vmovdrr:
%1 = and r1, #0x7FFFFFFF
%2 = vmovdrr r0, r1
This is never a lose, and could be a serious win because the vmovdrr may be followed by a vmovrrd, which would enable us to remove the conversion into FPRs completely.
We already do this for f32, but not for f64. Tests are added for both.
llvm-svn: 246360
The VOP3 encoding of these allows any SGPR pair for the i1
output, but this was forced before to always use vcc.
This doesn't yet try to use this, but does add the operand
to the definitions so the main change is adding vcc to the
output of the VOP2 encoding.
llvm-svn: 246358
Without a memory operand, mayLoad or mayStore instructions
are treated as hasUnorderedMemRef, which results in much worse
scheduling.
We really should have a verifier check that any
non-side effecting mayLoad or mayStore has a memory operand.
There are a few instructions (interp and images) which I'm
not sure what / where to add these.
llvm-svn: 246356
I'm working on adding !dbg attachments to functions (PR23367), which
we'll use to determine the canonical subprogram for a function (instead
of the `subprograms:` array in the compile units). This updates a few
old tests in preparation.
Transforms/Mem2Reg/ConvertDebugInfo2.ll had an old-style grep+count
based test that would start to fail because I've added an extra line
with `!dbg`. Instead, explicitly `CHECK` for what I think the test
actually cares about.
All three testcases have subprograms with a valid `function:` reference
-- which means my upgrade script will add a `!dbg` attachment -- but
that aren't referenced from any compile unit. I suspect these testcases
were handreduced over-zealously (or have bitrotted?). Add a reference
from the compile unit so that upcoming Verifier checks won't fail here.
llvm-svn: 246351
As a follow-up to r246098, require `DISubprogram` definitions
(`isDefinition: true`) to be 'distinct'. Specifically, add an assembler
check, a verifier check, and bitcode upgrading logic to combat testcase
bitrot after the `DIBuilder` change.
While working on the testcases, I realized that
test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its
purpose was to check for a corner case in PR22792 where two subprogram
definitions match exactly and share the same metadata node. The new
verifier check, requiring that subprogram definitions are 'distinct',
precludes that possibility.
I updated almost all the IR with the following script:
git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
grep -v test/Bitcode |
xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'
Likely some variant of would work for out-of-tree testcases.
llvm-svn: 246327
When combiner AA is enabled, look at stores on the same chain.
Non-aliasing stores are moved to the same chain so the existing
code fails because it expects to find an adajcent store on a consecutive
chain.
Because of how DAGCombiner tries these store combines,
MergeConsecutiveStores doesn't see the correct set of stores on the chain
when it visits the other stores. Each store individually has its chain
fixed before trying to merge consecutive stores, and then tries to merge
stores from that point before the other stores have been processed to
have their chains fixed. To fix this, attempt to use FindBetterChain
on any possibly neighboring stores in visitSTORE.
Suppose you have 4 32-bit stores that should be merged into 1 vector
store. One store would be visited first, fixing the chain. What happens is
because not all of the store chains have yet been fixed, 2 of the stores
are merged. The other 2 stores later have their chains fixed,
but because the other stores were already merged, they have different
memory types and merging the two different sized stores is not
supported and would be more difficult to handle.
llvm-svn: 246307
Summary:
Change the coloring algorithm in WinEHPrepare to visit a funclet's exits
in its parents' contexts and so properly classify the continuations of
nested funclets.
Also change the placement of cloned blocks to be deterministic and to
maintain the relative order of each funclet's blocks.
Add a lit test showing various patterns that require cloning, the last
several of which don't have CHECKs yet because they require cloning
entire funclets which is NYI.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12353
llvm-svn: 246245
more than 2 instructions.
I introduced this regression a while back and did not noticed it because I
somehow forgot to push the initial test cases for the pass!
Fix that as well!
llvm-svn: 246239
We can now run 32-bit programs with empty catch bodies. The next step
is to change PEI so that we get funclet prologues and epilogues.
llvm-svn: 246235
Fixes PR24602: r245689 introduced an unguarded use of
SelectionDAG::FoldConstantArithmetic, which returns 0 when it fails
because of opaque (hoisted) constants.
llvm-svn: 246217
This is a one-line-change patch that moves the update to UnhandledWeights to the correct position: it should be updated for all clusters instead of just range clusters.
Differential Revision: http://reviews.llvm.org/D12391
llvm-svn: 246129
Summary:
Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support.
Reviewers: jholewinski, meheff, jingyue
Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D12377
llvm-svn: 246107
Currently, when lowering switch statement and a new basic block is built for jump table / bit test header, the edge to this new block is not assigned with a correct weight. This patch collects the edge weight from all its successors and assign this sum of weights to the edge (and also the other fall-through edge). Test cases are adjusted accordingly.
Differential Revision: http://reviews.llvm.org/D12166#fae6eca7
llvm-svn: 246104
Things of note:
- Other linkage types aren't handled yet. We'll figure it out with dynamic linking.
- Special LLVM globals are either ignored, or error out for now.
- TLS isn't supported yet (WebAssembly will have threads later).
- There currently isn't a syntax for alignment, I left it in a comment so it's easy to hook up.
- Undef is convereted to whatever the type's appropriate null value is.
- assert versus report_fatal_error: follow what other AsmPrinters do, and assert only on what should have been caught elsewhere.
llvm-svn: 246092
This was causing problems when some functions use a GuardReg and some
don't as can happen when mixing SelectionDAG and FastISel generated
functions.
llvm-svn: 246075
If you're going to realign %sp to get object alignment properly (which
the code does), and stack offsets and alignments are calculated going
down from %fp (which they are), then the total stack size had better
be a multiple of the alignment. LLVM did indeed ensure that.
And then, after aligning, the sparc frame code added 96 (for sparcv8)
to the frame size, making any requested alignment of 64-bytes or
higher *guaranteed* to be misaligned. The test case added with r245668
even tests this exact scenario, and asserted the incorrect behavior,
which I somehow failed to notice. D'oh.
This change fixes the frame lowering code to align the stack size
*after* adding the spill area, instead.
Differential Revision: http://reviews.llvm.org/D12349
llvm-svn: 246042
Summary:
This change makes the variable argument intrinsics, `llvm.va_start` and
`llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows
inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch
I have to add support for GCC's `__builtin_ms_va_list` constructs.
Reviewers: nadav, asl, eugenis
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1622
llvm-svn: 245990
When lowering switch statement, if bit tests are used then LLVM will always generates a jump to the default statement in the last bit test. However, this is not necessary when all cases in bit tests cover a contiguous range. This is because when generating the bit tests header MBB, there is a range check that guarantees cases in bit tests won't go outside of [low, high], where low and high are minimum and maximum case values in the bit tests. This patch checks if this is the case and then doesn't emit jump to default statement and hence saves a bit test and a branch.
Differential Revision: http://reviews.llvm.org/D12249
llvm-svn: 245976
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.
This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.
A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx
This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449
Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.
Differential Revision: http://reviews.llvm.org/D12288
llvm-svn: 245950
This fixes two issues in x86 fptoui lowering.
1) Makes conversions from f80 go through the right path on AVX-512.
2) Implements an inline sequence for fptoui i64 instead of a library
call. This improves performance by 6X on SSE3+ and 3X otherwise.
Incidentally, it also removes the use of ftol2 for fptoui, which was
wrong to begin with, as ftol2 converts to a signed i64, producing
wrong results for values >= 2^63.
Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D11316
llvm-svn: 245924
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.
llvm-svn: 245907
Summary: I forgot to squash git commits before doing an svn dcommit of D12219. Reverting, and re-submitting.
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D12298
llvm-svn: 245886
This patch fixes PR24546, which demonstrates a segfault during the VSX
swap removal pass. The problem is that debug value instructions were
not excluded from the list of instructions to be analyzed for webs of
related computation. I've added the test case from the PR as a crash
test in test/CodeGen/PowerPC.
llvm-svn: 245862
The FP16_TO_FP node only uses the bottom 16 bits of its input, so the
following pattern can be optimised by removing the AND:
(FP16_TO_FP (AND op, 0xffff)) -> (FP16_TO_FP op)
This is a common pattern for ARM targets when functions have __fp16
arguments, as they are passed as floats (so that they get passed in the
correct registers), but then bitcast and truncated to ignore the top 16
bits.
llvm-svn: 245832
Summary:
WinEHPrepare is going to require that cleanuppad and catchpad produce values
of token type which are consumed by any cleanupret or catchret exiting the
pad. This change updates the signatures of those operators to require/enforce
that the type produced by the pads is token type and that the rets have an
appropriate argument.
The catchpad argument of a `CatchReturnInst` must be a `CatchPadInst` (and
similarly for `CleanupReturnInst`/`CleanupPadInst`). To accommodate that
restriction, this change adds a notion of an operator constraint to both
LLParser and BitcodeReader, allowing appropriate sentinels to be constructed
for forward references and appropriate error messages to be emitted for
illegal inputs.
Also add a verifier rule (noted in LangRef) that a catchpad with a catchpad
predecessor must have no other predecessors; this ensures that WinEHPrepare
will see the expected linear relationship between sibling catches on the
same try.
Lastly, remove some superfluous/vestigial casts from instruction operand
setters operating on BasicBlocks.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12108
llvm-svn: 245797
Summary:
__shared__ variable may now emit undef value as initializer, do not
throw error on that.
Test Plan: test/CodeGen/NVPTX/global-addrspace.ll
Patch by Xuetian Weng
Reviewers: jholewinski, tra, jingyue
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D12242
llvm-svn: 245785
We can wait on either VM, EXP or LGKM.
The waits are independent.
Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.
Here's an example of subtle perf reduction this patch solves:
This is without the patch:
buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen
The s_waitcnt vmcnt(1) is useless.
The reason it is added is because the last
buffer_load_format_xyzw needs s[44:47], which was issued
by the first s_load_dwordx4. It waits for all VM
before that call to have finished.
Internally after every instruction, 3 counters (for VM, EXP and LGTM)
are updated after every instruction. For example buffer_load_format_xyzw
will
increase the VM counter, and s_load_dwordx4 the LGKM one.
Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.
Because of that, the s[44:47] counter includes that to use the register
you need to wait for the previous buffer_load_format_xyzw.
Instead this patch stores only the counters that matter for the
register,
and puts zero for the other ones, since we don't need any wait for them.
Patch by: Axel Davy
Differential Revision: http://reviews.llvm.org/D11883
llvm-svn: 245755
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).
Fixes the test case from PR24542 (although that may have been over-reduced).
llvm-svn: 245741
See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software
Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.
llvm-svn: 245733
This will confirm that the patch in D12154 is actually NFC.
It will also confirm that the proposed changes for the AMD chips
are behaving as expected.
llvm-svn: 245704
This is intended to improve code generation for GEPs, as the index value is
shifted by the element size and in GEPs of multi-dimensional arrays the index
of higher dimensions is multiplied by the lower dimension size.
Differential Revision: http://reviews.llvm.org/D12197
llvm-svn: 245689
Note: I do not implement a base pointer, so it's still impossible to
have dynamic realignment AND dynamic alloca in the same function.
This also moves the code for determining the frame index reference
into getFrameIndexReference, where it belongs, instead of inline in
eliminateFrameIndex.
[Begin long-winded screed]
Now, stack realignment for Sparc is actually a silly thing to support,
because the Sparc ABI has no need for it -- unlike the situation on
x86, the stack is ALWAYS aligned to the required alignment for the CPU
instructions: 8 bytes on sparcv8, and 16 bytes on sparcv9.
However, LLVM unfortunately implements user-specified overalignment
using stack realignment support, so for now, I'm going to go along
with that tradition. GCC instead treats objects which have alignment
specification greater than the maximum CPU-required alignment for the
target as a separate block of stack memory, with their own virtual
base pointer (which gets aligned). Doing it that way avoids needing to
implement per-target support for stack realignment, except for the
targets which *actually* have an ABI-specified stack alignment which
is too small for the CPU's requirements.
Further unfortunately in LLVM, the default canRealignStack for all
targets effectively returns true, despite that implementing that is
something a target needs to do specifically. So, the previous behavior
on Sparc was to silently ignore the user's specified stack
alignment. Ugh.
Yet MORE unfortunate, if a target actually does return false from
canRealignStack, that also causes the user-specified alignment to be
*silently ignored*, rather than emitting an error.
(I started looking into fixing that last, but it broke a bunch of
tests, because LLVM actually *depends* on having it silently ignored:
some architectures (e.g. non-linux i386) have smaller stack alignment
than spilled-register alignment. But, the fact that a register needs
spilling is not known until within the register allocator. And by that
point, the decision to not reserve the frame pointer has been frozen
in place. And without a frame pointer, stack realignment is not
possible. So, canRealignStack() returns false, and
needsStackRealignment() then returns false, assuming everyone can just
go on their merry way assuming the alignment requirements were
probably just suggestions after-all. Sigh...)
Differential Revision: http://reviews.llvm.org/D12208
llvm-svn: 245668
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.
This fixes http://llvm.org/PR24459
llvm-svn: 245641
Create CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.
llvm-svn: 245640
Fixes PR23464: one way to use the broadcast intrinsics is:
_mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));
We don't currently fold this, but now that we use native IR for
the intrinsics (r245605), we can look through one bitcast to find
the broadcast scalar.
Differential Revision: http://reviews.llvm.org/D10557
llvm-svn: 245613
Since r245605, the clang headers don't use these anymore.
r245165 updated some of the tests already; update the others, add
an autoupgrade, remove the intrinsics, and cleanup the definitions.
Differential Revision: http://reviews.llvm.org/D10555
llvm-svn: 245606
XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs
to be v2i64 (as that's the corresponding SETCC type).
Fixes PR24225.
llvm-svn: 245535
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.
llvm-svn: 245530
This commit modifies the serialization syntax so that the global IR values in
machine memory operands use the global value '@<name>' syntax instead of the
current '%ir.<name>' syntax.
The unnamed global IR values are handled by this commit as well, as the
existing global value parsing method can parse the unnamed globals already.
llvm-svn: 245527
The global IR values in machine memory operands should use the global value
'@<name>' syntax instead of the current '%ir.<name>' syntax.
However, the global value call entry pseudo source values use the global value
syntax already. Therefore, the syntax for the call entry pseudo source values
has to be changed so that the global values and call entry global value PSVs
can be parsed without ambiguities.
llvm-svn: 245526
We still need to add constant folding of vector comparisons to fold the tests for targets that don't support the respective min/max nodes
I needed to update 2011-12-06-AVXVectorExtractCombine to load a vector instead of using a constant vector to prevent it folding
Differential Revision: http://reviews.llvm.org/D12118
llvm-svn: 245503
We are already falling back to SelectionDAG when encountering an shift with UB.
This adds the same checks for shifts with UB that get folded into arithmetic or
logical operations.
This fixes rdar://problem/22345295.
llvm-svn: 245499
We don't do a great job with >= 0 comparisons against zero when the
result is used as an i8.
Given something like:
void f(long long LL, bool *B) {
*B = LL >= 0;
}
We used to generate:
shrq $63, %rdi
xorb $1, %dil
movb %dil, (%rsi)
Now we generate:
testq %rdi, %rdi
setns (%rsi)
Differential Revision: http://reviews.llvm.org/D12136
llvm-svn: 245498
Previously WebAssembly's datalayout string had -v128:8:128. This had been an
attempt to declare a certain level of support for unaligned SIMD accesses.
However, clang makes its own determinations for SIMD alignment that are
independent of the datalayout string, so this wasn't actually meaningful.
llvm-svn: 245494
Check to see if this is a CONCAT_VECTORS of a bunch of EXTRACT_SUBVECTOR operations. If so, and if the EXTRACT_SUBVECTOR vector inputs come from at most two distinct vectors the same size as the result, attempt to turn this into a legal shuffle.
Differential Revision: http://reviews.llvm.org/D12125
llvm-svn: 245490
This commit serializes the machine instruction's register operand ties.
The ties are printed out only when the instructon has register ties that are
different from the ties that are specified in the instruction's description.
llvm-svn: 245482
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. I am working on resolving the issue, but in the
meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64
and the associated testing until I can get this fixed.
llvm-svn: 245481
The defined registers are already serialized - they are represented by placing
them before the '=' in a machine instruction. However, certain instructions like
INLINEASM can have defined register operands after the '=', so this commit
introduces the 'def' register flag for such operands.
llvm-svn: 245480
Reintroduce r245442. Remove an overly conservative assertion introduced
in r245442. We could replace the assertion to use `shareSameRegisterFile`
instead, but in that point in `insertPHI` we already lost the original
Def subreg to check against. So drop the assertion completely.
Original commit message:
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
psllq %mm1, %mm0
movd %mm0, %r9
jmp C
B:
por %mm1, %mm0
movd %mm0, %r9
jmp C
C:
movd %r9, %mm0
pshufw $238, %mm0, %mm0
Becomes:
A:
psllq %mm1, %mm0
jmp C
B:
por %mm1, %mm0
jmp C
C:
pshufw $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 245479
This patch updates the X86 lowering so that the Exception Pointer and Selector
are 64-bit wide only if Subtarget.isTarget64BitLP64.
Patch by João Porto
Reviewers: dschuff, rnk
Differential Revision: http://reviews.llvm.org/D12111
llvm-svn: 245454
Reapply r243486.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
psllq %mm1, %mm0
movd %mm0, %r9
jmp C
B:
por %mm1, %mm0
movd %mm0, %r9
jmp C
C:
movd %r9, %mm0
pshufw $238, %mm0, %mm0
Becomes:
A:
psllq %mm1, %mm0
jmp C
B:
por %mm1, %mm0
jmp C
C:
pshufw $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 245442
Summary:
The mid-end was generating vector smin/smax/umin/umax nodes, but
we were using vbsl to generatate the code. This adds the vmin/vmax
patterns and a test to check that we are now generating vmin/vmax
instructions.
Reviewers: rengolin, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D12105
llvm-svn: 245439
There are some cases where the mul sequence is smaller, but for the most part,
using a div is preferable. This does not apply to vectors, since x86 doesn't
have vector idiv, and a vector mul/shifts sequence ought to be smaller than a
scalarized division.
Differential Revision: http://reviews.llvm.org/D12082
llvm-svn: 245431
This commit adds support for bit mask target flag serialization to the MIR
printer and the MIR parser. It also adds support for the machine operand's
target flag serialization to the AArch64 target.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 245383
The current code normalizes select(C0, x, select(C1, x, y)) towards
select(C0|C1, x, y) if the targets prefers that form. This patch adds an
additional rule that if the select(C1, x, y) part already exists in the
function then we want to normalize into the other direction because the
effects of reusing the existing value are bigger than transforming into
the target preferred form.
This addresses regressions following r238793, see also:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150727/290272.html
Differential Revision: http://reviews.llvm.org/D11616
llvm-svn: 245350
State numbers are calculated by performing a walk from the innermost
funclet to the outermost funclet. Rudimentary support for the new EH
constructs has been added to the assembly printer, just enough to test
the new machinery.
Differential Revision: http://reviews.llvm.org/D12098
llvm-svn: 245331
It is possible to be in a situation where more than one funclet token is
a valid SSA value. If we see a terminator which exits a funclet which
doesn't use the funclet's token, replace it with unreachable.
Differential Revision: http://reviews.llvm.org/D12074
llvm-svn: 245238
Summary:
When demoting an SSA value that has a use on a phi and one of the phi's
predecessors terminates with catchret, the edge needs to be split and the
load inserted in the new block, else we'll still have a cross-funclet SSA
value.
Add a test for this, and for the similar case where a def to be spilled is
on and invoke and a critical edge, which was already implemented but
missing a test.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12065
llvm-svn: 245218
These only get generated if the target supports them. If one of the variants is not legal and the other is, and it is safe to do so, the other variant will be emitted.
For example on AArch32 (V8), we have scalar fminnm but not fmin.
Fix up a couple of tests while we're here - one now produces better code, and the other was just plain wrong to start with.
llvm-svn: 245196
We can set additional bits in a mask given that we know the other
operand of an AND already has some bits set to zero. This can be more
efficient if doing so allows us to use an instruction which implicitly
sign extends the immediate.
This fixes PR24085.
Differential Revision: http://reviews.llvm.org/D11289
llvm-svn: 245169
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.
(zext (truncate x)) -> (zext (and(x, m))
Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.
This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.
Differential Revision: http://reviews.llvm.org/D11764
llvm-svn: 245160
Some personality routines require funclet exit points to be clearly
marked, this is done by producing a token at the funclet pad and
consuming it at the corresponding ret instruction. CleanupReturnInst
already had a spot for this operand but CatchReturnInst did not.
Other personality routines don't need to use this which is why it has
been made optional.
llvm-svn: 245149
Although targeting CoreCLR is similar to targeting MSVC, there are
certain important differences that the backend must be aware of
(e.g. differences in stack probes, EH, and library calls).
Differential Revision: http://reviews.llvm.org/D11012
llvm-svn: 245115
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.
llvm-svn: 245107
This patch makes the Darwin ARM backend take advantage of TargetParser. It
also teaches TargetParser about ARMV7K for the first time. This makes target
triple parsing more consistent across llvm.
Differential Revision: http://reviews.llvm.org/D11996
llvm-svn: 245081
This patch fixes the x86 implementation of allowsMisalignedMemoryAccess() to correctly
return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load
merging optimization is changed to use the TLI hook. This exposes a shortcoming in the
current logic and results in the regression test update. Changing other direct users of
the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch.
Without the fix in allowsMisalignedMemoryAccesses(), we will infinite loop when targeting
SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads
while PerformLOADCombine() splits them back into 16-byte loads.
Differential Revision: http://reviews.llvm.org/D10662
llvm-svn: 245075
This reverts commit r245047.
It was failing on the darwin bots. The problem was that when running
./bin/llc -march=msp430
llc gets to
if (TheTriple.getTriple().empty())
TheTriple.setTriple(sys::getDefaultTargetTriple());
Which means that we go with an arch of msp430 but a triple of
x86_64-apple-darwin14.4.0 which fails badly.
That code has to be updated to select a triple based on the value of
march, but that is not a trivial fix.
llvm-svn: 245062
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.
It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.
llvm-svn: 245047
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.
I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.
llvm-svn: 245035
This commit modifies the way the machine basic blocks are serialized - now the
machine basic blocks are serialized using a custom syntax instead of relying on
YAML primitives. Instead of using YAML mappings to represent the individual
machine basic blocks in a machine function's body, the new syntax uses a single
YAML block scalar which contains all of the machine basic blocks and
instructions for that function.
This is an example of a function's body that uses the old syntax:
body:
- id: 0
name: entry
instructions:
- '%eax = MOV32r0 implicit-def %eflags'
- 'RETQ %eax'
...
The same body is now written like this:
body: |
bb.0.entry:
%eax = MOV32r0 implicit-def %eflags
RETQ %eax
...
This syntax change is motivated by the fact that the bundled machine
instructions didn't map that well to the old syntax which was using a single
YAML sequence to store all of the machine instructions in a block. The bundled
machine instructions internally use flags like BundledPred and BundledSucc to
determine the bundles, and serializing them as MI flags using the old syntax
would have had a negative impact on the readability and the ease of editing
for MIR files. The new syntax allows me to serialize the bundled machine
instructions using a block construct without relying on the internal flags,
for example:
BUNDLE implicit-def dead %itstate, implicit-def %s1 ... {
t2IT 1, 24, implicit-def %itstate
%s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate
}
This commit also converts the MIR testcases to the new syntax. I developed
a script that can convert from the old syntax to the new one. I will post the
script on the llvm-commits mailing list in the thread for this commit.
llvm-svn: 244982
This patch corresponds to review:
http://reviews.llvm.org/D11471
It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.
llvm-svn: 244921
They rely on global fast-math options, but soon ISel will rely only on fast-math flags on the instructions themselves. Rip the fast checks out into their own file so we can mark their instructions as fast.
llvm-svn: 244914
These tests relied on -enable-no-nans-fp-math, whereas soon they'll take their no-nans hint
from the FCMP instruction itself, so split the no-nans stuff out into its own test.
Also do a slight rejig of instruction order. The old FMIN/MAX backend matching had to deal with looking through casts, which it never did particularly well. Now, instcombine will recognize such patterns and canonicalize the cast outside the select. So modify the test inputs to assume that instcombine has already run.
llvm-svn: 244913
Summary:
Update the demotion logic in WinEHPrepare to avoid creating new cleanups by
walking predecessors as necessary to insert stores for EH-pad PHIs.
Also avoid creating stores for EH-pad PHIs that have no uses.
The store/load placement is still pretty naive. Likely future improvements
(at least for optimized compiles) include:
- Share loads for related uses as possible
- Coalesce non-interfering use/def-related PHIs
- Store at definition point rather than each PHI pred for non-interfering
lifetimes.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11955
llvm-svn: 244894
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
{ <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.
This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.
Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.
However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.
This patch fixes the crash by implementing CanLowerReturn. As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.
llvm-svn: 244889
Other than PC-relative loads/store the patterns that match the various
load/store addressing modes have the same complexity, so the order that they
are matched is the order that they appear in the .td file.
Rearrange the instruction definitions in ARMInstrThumb.td, and make use of
AddedComplexity for PC-relative loads, so that the instruction matching order
is the order that results in the simplest selection logic. This also makes
register-offset load/store be selected when it should, as previously it was
only selected for too-large immediate offsets.
Differential Revision: http://reviews.llvm.org/D11800
llvm-svn: 244882
Now that we can properly promote mismatched FCOPYSIGNs (r244858), we
can mark the FP_ROUND on the result as truncating, to expose folding.
FCOPYSIGN doesn't change anything but the sign bit, so
(fp_round (fcopysign (fpext a), b))
is equivalent to (modulo the sign bit):
(fp_round (fpext a))
which is a no-op.
llvm-svn: 244862
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.
Follow-up to r243924, r243926, and r244858.
llvm-svn: 244860
We don't care about its type, and there's even a combine that'll fold
away the FP_EXTEND if we let it run. However, until it does, we'll have
something broken like:
(f32 (fp_extend (f64 v)))
Scalar f16 follow-up to r243924.
llvm-svn: 244858
Summary:
D11924 implemented part of the floating-point comparisons, this patch implements the rest:
* Tell ISelLowering that all booleans are either 0 or 1.
* Expand the eq/ne/lt/le/gt/ge floating-point comparisons to the canonical ones (similar to what Mips32r6InstrInfo.td does).
* Add tests for ord/uno.
* Add tests for ueq/one/ult/ule/ugt/uge.
* Fix existing comparison tests to remove the (res & 1) code, which setBooleanContents stops from generating.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11970
llvm-svn: 244779
r242520 was reverted in r244313 as the expected behaviour of the alias
attribute in C is that the alias has the same size as the aliasee. However
we can re-introduce adding the size on the alias when the aliasee does not,
from a source code or object perspective, exist as a discrete entity. This
happens when the aliasee is not a symbol, or when that symbol is private.
Differential Revision: http://reviews.llvm.org/D11943
llvm-svn: 244752
On Mach-O emitting aliases for the variables that make up a MergedGlobals
variable can cause problems when linking with dead stripping enabled so don't
do that, except for external variables where we must emit an alias.
llvm-svn: 244748
This abstracts away the test for "when can we fold across a MachineInstruction"
into the the MI interface, and changes call-frame optimization use the same test
the peephole optimizer users.
Differential Revision: http://reviews.llvm.org/D11945
llvm-svn: 244729
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).
InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).
I also moved all the relevant combine tests into InstCombine/blend_x86.ll
Differential Revision: http://reviews.llvm.org/D11934
llvm-svn: 244723
For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor
fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason
is that many index computations are carried out in 64-bits but never actually exceed the
capacity of a 32-bit word.
llvm-svn: 244684
Some of the FP comparisons (ueq, one, ult, ule, ugt, uge) are currently broken, I'll fix them in a follow-up.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11924
llvm-svn: 244665
Summary:
For example:
s6 = s0*s5;
s2 = s6*s6 + s6;
...
s4 = s6*s3;
We notice that it is possible for s2 is folded to fma (s0, s5, fmul (s6 s6)).
This only happens when Aggressive is true, otherwise hasOneUse() check
already prevents from folding the multiplication with more uses.
Test Plan: test/CodeGen/NVPTX/fma-assoc.ll
Patch by Xuetian Weng
Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm
Subscribers: arsenm, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D11855
llvm-svn: 244649
Other objects can never reference the MergedGlobals symbol so external linkage
is never needed. Using private instead of internal linkage means the object is
more similar to what it looks like when global merging is not enabled, with
the only difference being that the merged variables are addressed indirectly
relative to the start of the section they are in.
Also add aliases for merged variables with internal linkage, as this also makes
the object be more like what it is when they are not merged.
Differential Revision: http://reviews.llvm.org/D11942
llvm-svn: 244615
First step in preventing immediates that occur more than once within a single
basic block from being pulled into their users, in order to prevent unnecessary
large instruction encoding .Currently enabled only when optimizing for size.
Patch by: zia.ansari@intel.com
Differential Revision: http://reviews.llvm.org/D11363
llvm-svn: 244601
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmannum and match that instead. Minimal functional change:
- Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistant.
- f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!
llvm-svn: 244595
When optimizing for size, replace "addl $4, %esp" and "addl $8, %esp"
following a call by one or two pops, respectively. We don't try to do it in
general, but only when the stack adjustment immediately follows a call - which
is the most common case.
That allows taking a short-cut when trying to find a free register to pop into,
instead of a full-blown liveness check. If the adjustment immediately follows a
call, then every register the call clobbers but doesn't define should be dead at
that point, and can be used.
Differential Revision: http://reviews.llvm.org/D11749
llvm-svn: 244578
The condition for clearing the folding candidate list was clamped together
with the "uninteresting instruction" condition. This is too conservative,
e.g. we don't need to clear the list when encountering an IMPLICIT_DEF.
Differential Revision: http://reviews.llvm.org/D11591
llvm-svn: 244577
Summary: I somehow forgot to add these when I added the basic floating-point opcodes. Also remove ceil/floor/trunc/nearestint for now, and add them only when properly tested.
Subscribers: llvm-commits, sunfish, jfb
Differential Revision: http://reviews.llvm.org/D11927
llvm-svn: 244562
Summary: convertToHexString doesn't represent them correctly at this point in time. This is a follow-up to sunfish's suggestion in D11914.
Subscribers: llvm-commits, sunfish, jfb
Differential Revision: http://reviews.llvm.org/D11925
llvm-svn: 244551
This commit serializes the UsedPhysRegMask register mask from the machine
register information class. The mask is serialized as an inverted
'calleeSavedRegisters' mask to keep the output minimal.
This commit also allows the MIR parser to infer this mask from the register
mask operands if the machine function doesn't specify it.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 244548
Summary:
For now output using C99's hexadecimal floating-point representation.
This patch also cleans up how machine operands are printed: instead of special-casing per type of machine instruction, the code now handles operands generically.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11914
llvm-svn: 244520
The PATCHPOINT instructions have a single optional defined register operand,
but the machine verifier can't verify the optional defined register operands.
This commit makes sure that the machine verifier won't report an error when a
PATCHPOINT instruction doesn't have its optional defined register operand.
This change will allow us to enable the machine verifier for the code
generation tests for the patchpoint intrinsics.
Reviewers: Juergen Ributzka
llvm-svn: 244513
frame setup instruction.
This commit ensures that the stack map lowering code in FastISel adds an
appropriate number of immediate operands to the frame setup instruction.
The previous code added just one immediate operand, which was fine for a target
like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit
operands. This caused the machine verifier to report an error when the old code
added just one.
Reviewers: Juergen Ributzka
Differential Revision: http://reviews.llvm.org/D11853
llvm-svn: 244508
NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.
As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.
I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:
| Time per call (ms) | Runtime (ms) | Benchmark |
| 0.000012514 | 6257 | sete.i386 |
| 0.000012810 | 6405 | sete.i386-fast |
| 0.000010456 | 5228 | sete.x86-64 |
| 0.000010496 | 5248 | sete.x86-64-fast |
| 0.000012906 | 6453 | lahf-sahf.i386 |
| 0.000013236 | 6618 | lahf-sahf.i386-fast |
| 0.000010580 | 5290 | lahf-sahf.x86-64 |
| 0.000010304 | 5152 | lahf-sahf.x86-64-fast |
| 0.000028056 | 14028 | pushf-popf.i386 |
| 0.000027160 | 13580 | pushf-popf.i386-fast |
| 0.000023810 | 11905 | pushf-popf.x86-64 |
| 0.000026468 | 13234 | pushf-popf.x86-64-fast |
Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose.
Reviewers: rnk, jvoung, t.p.northover
Subscribers: llvm-commits
Differential revision: http://reviews.llvm.org/D6629
llvm-svn: 244503
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.
Differential Revision: http://reviews.llvm.org/D11886
llvm-svn: 244495
The LDD/STD instructions can load/store a 64bit quantity from/to
memory to/from a consecutive even/odd pair of (32-bit) registers. They
are part of SparcV8, and also present in SparcV9. (Although deprecated
there, as you can store 64bits in one register).
As recommended on llvmdev in the thread "How to enable use of 64bit
load/store for 32bit architecture" from Apr 2015, I've modeled the
64-bit load/store operations as working on a v2i32 type, rather than
making i64 a legal type, but with few legal operations. The latter
does not (currently) work, as there is much code in llvm which assumes
that if i64 is legal, operations like "add" will actually work on it.
The same assumption does not hold for v2i32 -- for vector types, it is
workable to support only load/store, and expand everything else.
This patch:
- Adds a new register class, IntPair, for even/odd pairs of registers.
- Modifies the list of reserved registers, the stack spilling code,
and register copying code to support the IntPair register class.
- Adds support in AsmParser. (note that in asm text, you write the
name of the first register of the pair only. So the parser has to
morph the single register into the equivalent paired register).
- Adds the new instructions themselves (LDD/STD/LDDA/STDA).
- Hooks up the instructions and registers as a vector type v2i32. Adds
custom legalizer to transform i64 load/stores into v2i32 load/stores
and bitcasts, so that the new instructions can actually be
generated, and marks all operations other than load/store on v2i32
as needing to be expanded.
- Copies the unfortunate SelectInlineAsm hack from ARMISelDAGToDAG.
This hack undoes the transformation of i64 operands into two
arbitrarily-allocated separate i32 registers in
SelectionDAGBuilder. and instead passes them in a single
IntPair. (Arbitrarily allocated registers are not useful, asm code
expects to be receiving a pair, which can be passed to ldd/std.)
Also adds a bunch of test cases covering all the bugs I've added along
the way.
Differential Revision: http://reviews.llvm.org/D8713
llvm-svn: 244484
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.
llvm-svn: 244481
PR24139 contains an analysis of poor register allocation. One of the findings
was that when calculating the spill weight, a rematerializable interval once
split is no longer rematerializable. This is because the isRematerializable
check in CalcSpillWeights.cpp does not follow the copies introduced by live
range splitting (after splitting, the live interval register definition is a
copy which is not rematerializable).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D11686
llvm-svn: 244439
The pass adds new kernel arguments for image attributes, and
resolves calls to dummy attribute and resource id getter functions.
Patch by: Zoltan Gilian
llvm-svn: 244372
Summary:
Port the ReconstructShuffle function from AArch64 to ARM
to handle mismatched incoming types in the BUILD_VECTOR
node.
This fixes an outstanding FIXME in the ReconstructShuffle
code.
Reviewers: t.p.northover, rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D11720
llvm-svn: 244314
Summary: WebAssembly's tablegen instructions have the names WebAssembly expects, but by LLVM convention they're uppercase and suffixed with their type after an underscore. Leave the C++ code that way, but print outt he names WebAssembly expects (lowercase, no type). We could teach tablegen to do this later, maybe by using `!cast<string>(node)` in the .td files.
Reviewers: sunfish
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D11776
llvm-svn: 244305
The block address machine operands can reference IR blocks in other functions.
This commit fixes a bug where the references to unnamed IR blocks in other
functions weren't serialized correctly.
llvm-svn: 244299
When we are not emitting the condition for the branch, because the condition is
in another BB or SDAG did the selection for us, then we have to mask the flag in
the register with AND.
This is required when the condition comes from a truncate, because SDAG only
truncates down to a legal size of i32.
This fixes rdar://problem/22161062.
llvm-svn: 244291
This reverts commit r243198 and 243304.
Turns out this wasn't the correct fix for this problem. It works only within
FastISel, but fails when the truncate is selected by SDAG.
llvm-svn: 244287
Summary: This allows us to consolidate several of the TableGen patterns.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11602
llvm-svn: 244253
points.
There is an infinite loop that can occur in Shrink Wrapping while searching
for the Save/Restore points.
Part of this search checks whether the save/restore points are located in
different loop nests and if so, uses the (post) dominator trees to find the
immediate (post) dominator blocks. However, if the current block does not have
any immediate (post) dominators then this search will result in an infinite
loop. This can occur in code containing an infinite loop.
The modification checks whether the immediate (post) dominator is different from
the current save/restore block. If it is not, then the search terminates and the
current location is not considered as a valid save/restore point for shrink wrapping.
Phabricator: http://reviews.llvm.org/D11607
llvm-svn: 244247
This change improves EmitLoweredSelect() so that multiple contiguous CMOV pseudo
instructions with the same (or exactly opposite) conditions get lowered using a single
new basic-block. This eliminates unnecessary extra basic-blocks (and CFG merge points)
when contiguous CMOVs are being lowered.
Patch by: kevin.b.smith@intel.com
Differential Revision: http://reviews.llvm.org/D11428
llvm-svn: 244202
This commit implements the initial serialization of the machine operand target
flags. It extends the 'TargetInstrInfo' class to add two new methods that help
to provide text based serialization for the target flags.
This commit can serialize only the X86 target flags, and the target flags for
the other targets will be serialized in the follow-up commits.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 244185
More specifically, make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables.
Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads.
llvm-svn: 244166
This commit serializes the offset for the following operands: target index,
global address, external symbol, constant pool index, and block address.
llvm-svn: 244157
Summary: PR24191 finds that the expected memory-register operations aren't generated when relaxed { load ; modify ; store } is used. This is similar to PR17281 which was addressed in D4796, but only for memory-immediate operations (and for memory orderings up to acquire and release). This patch also handles some floating-point operations.
Reviewers: reames, kcc, dvyukov, nadav, morisset, chandlerc, t.p.northover, pete
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11382
llvm-svn: 244128
return StringSwitch<int>(Flags)
.Case("g", 0x1)
.Case("nzcvq", 0x2)
.Case("nzcvqg", 0x3)
.Default(-1);
...
// The _g and _nzcvqg versions are only valid if the DSP extension is
// available.
if (!Subtarget->hasThumb2DSP() && (Mask & 0x2))
return -1;
ARMARM confirms that the comment is right, and the code was wrong.
llvm-svn: 244029
In r242277, I updated the MachineCombiner to work with itineraries, but I
missed a call that is scheduling-model-only (the opcode-only form of
computeInstrLatency). Using the form that takes an MI* allows this to work with
itineraries (and should be NFC for subtargets with scheduling models).
llvm-svn: 244020
In the commentary for D11660, I wasn't sure if it was alright to create new
integer machine instructions without also creating the implicit EFLAGS operand.
From what I can see, the implicit operand is always created by the MachineInstrBuilder
based on the instruction type, so we don't have to do that explicitly. However, in
reviewing the debug output, I noticed that the operand was not marked as 'dead'.
The machine combiner should do that to preserve future optimization opportunities
that may be checking for that dead EFLAGS operand themselves.
Differential Revision: http://reviews.llvm.org/D11696
llvm-svn: 243990
It introduced two regressions on 64-bit big-endian targets running under N32
(MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4, and
MultiSource/Applications/kimwitu++/kc) The issue is that on 64-bit targets
comparisons such as BEQ compare the whole GPR64 but incorrectly tell the
instruction selector that they operate on GPR32's. This leads to the
elimination of i32->i64 extensions that are actually required by
comparisons to work correctly.
There's currently a patch under review that fixes this problem.
llvm-svn: 243984
This adds the software division routines for the Windows RTABI. These are not
expected to be used often though as most modern Windows ARM capable targets
support hardware division. In the case that the target CPU doesnt support
hardware division, this will be the fallback.
llvm-svn: 243952
There's a bunch of code in LowerFCOPYSIGN that does smart lowering, and
is actually already vector-aware; let's use it instead of scalarizing!
The only interesting change is that for v2f32, we previously always used
use v4i32 as the integer vector type.
Use v2i32 instead, and mark FCOPYSIGN as Custom.
llvm-svn: 243926
We used to legalize it like it's any other binary operations. It's not,
because it accepts mismatched operand types. Because of that, we used
to hit various asserts and miscompiles.
Specialize vector legalizations to, in the worst case, unroll, or, when
possible, to just legalize the operand that needs legalization.
Scalarization isn't covered, because I can't think of a target where
some but not all of the 1-element vector types are to be scalarized.
llvm-svn: 243924
Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s.
The backend is liable to start relying on that (if it hasn't already),
so make uniquable `DICompileUnit`s illegal and automatically upgrade old
bitcode. This is a nice cleanup, since we can remove an unnecessary
`DenseSet` (and the associated uniquing info) from `LLVMContextImpl`.
Almost all the testcases were updated with this script:
git grep -e '= !DICompileUnit' -l -- test |
grep -v test/Bitcode |
xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'
I imagine something similar should work for out-of-tree testcases.
llvm-svn: 243885
This is necessary for WatchOS support, where the compact unwind format assumes
this kind of layout. For now we only want this on Swift-like CPUs though, where
it's been the Xcode behaviour for ages. Also, since it can expand the prologue
we don't want it at -Oz.
llvm-svn: 243884
Enabling merging of extern globals appears to be generally either beneficial or
harmless. On some benchmarks suites (on Cortex-M4F, Cortex-A9, and Cortex-A57)
it gives improvements in the 1-5% range, but in the rest the overall effect is
zero.
Differential Revision: http://reviews.llvm.org/D10966
llvm-svn: 243874
In http://reviews.llvm.org/rL215382, IT forming was made more conservative under
the belief that a flag-setting instruction was unpredictable inside an IT block on ARMv6M.
But actually, ARMv6M doesn't even support IT blocks so that's impossible. In the ARMARM for
v7M, v7AR and v8AR it states that the semantics of such an instruction changes inside an
IT block - it doesn't set the flags. So actually it is fine to use one inside an IT block
as long as the flags register is dead afterwards.
This gives significant performance improvements in a variety of MPEG based workloads.
Differential revision: http://reviews.llvm.org/D11680
llvm-svn: 243869
Summary: This currently sets the shift amount RHS to the same type as the LHS, and assumes that the LHS is a simple type. This isn't currently the case e.g. with weird integers sizes, but will eventually be true and will assert if not. That's what you get for having an experimental backend: break it and you get to keep both pieces. Most backends either set the RHS to MVT::i32 or MVT::i64, but WebAssembly is a virtual ISA and tries to have regular-looking binary operations where both operands are the same type (even if a 64-bit RHS shifter is slightly silly, hey it's free!).
Subscribers: llvm-commits, sunfish, jfb
Differential Revision: http://reviews.llvm.org/D11715
llvm-svn: 243860
The XformToShuffleWithZero method currently checks AND masks at the per-lane level for all-one and all-zero constants and attempts to convert them to legal shuffle clear masks.
This patch generalises XformToShuffleWithZero, splitting and checking the sub-lanes of the constants down to the byte level to see if any legal shuffle clear masks are possible. This allows a lot of masks (often from legalization or truncation) to be folded into existing shuffle patterns and removes a lot of constant mask loading.
There are a few examples of poor shuffle lowering that are exposed by this patch that will be cleaned up in future patches (e.g. merging shuffles that are separated by bitcasts, x86 legalized v8i8 zero extension uses PMOVZX+AND+AND instead of AND+PMOVZX, etc.)
Differential Revision: http://reviews.llvm.org/D11518
llvm-svn: 243831
Summary: Also test 64-bit integers, except shifts for now which are broken because isel dislikes the 32-bit truncate that precedes them.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11699
llvm-svn: 243822
This commit fixes a bug in the class 'SIInstrInfo' where the implicit register
machine operands were added to a machine instruction in an incorrect order -
the implicit uses were added before the implicit defs.
I found this bug while working on moving the implicit register operand
verification code from the MIR parser to the machine verifier.
This commit also makes the method 'addImplicitDefUseOperands' in the machine
instruction class public so that it can be reused in the 'SIInstrInfo' class.
Reviewers: Matt Arsenault
Differential Revision: http://reviews.llvm.org/D11689
llvm-svn: 243799
Summary:
For example, in
struct S {
int *x;
int *y;
};
__global__ void foo(S s) {
int *b = s.y;
// use b
}
"b" is guaranteed to point to global. NVPTX should emit ld.global/st.global for
accessing "b".
Reviewers: jholewinski
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11505
llvm-svn: 243790
Summary:
Use -1 as numoperands for the return SDTypeProfile, denoting that return is variadic. Note that the patterns in InstrControl.td still need to match the inputs, so this ins't an "anything goes" variadic on ret!
The next step will be to handle other local types (not just int32).
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11692
llvm-svn: 243783
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.
Most of the testcase updates were generated by the following sed script:
find test/ -name "*.ll" -o -name "*.mir" |
xargs grep -l 'DILocalVariable' |
xargs sed -i '' \
-e 's/tag: DW_TAG_arg_variable, //' \
-e 's/tag: DW_TAG_auto_variable, //'
There were only a handful of tests in `test/Assembly` that I needed to
update by hand.
(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`. I've added a FIXME to that effect.)
llvm-svn: 243774
Summary:
This prints assembly for int32 integer operations defined in WebAssemblyInstrInteger.td only, with major caveats:
- The operation names are currently incorrect.
- Other integer and floating-point types will be added later.
- The printer isn't factored out to handle recursive AST code yet, since it can't even handle control flow anyways.
- The assembly format isn't full s-expressions yet either, this will be added later.
- This currently disables PrologEpilogCodeInserter as well as MachineCopyPropagation becasue they don't like virtual registers, which WebAssembly likes quite a bit. This will be fixed by factoring out NVPTX's change (currently a fork of PrologEpilogCodeInserter).
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11671
llvm-svn: 243763
Add i16, i32, i64 imul machine instructions to the list of reassociation
candidates.
A new bit of logic is needed to handle integer instructions: they have an
implicit EFLAGS operand, so we have to make sure it's dead in order to do
any reassociation with integer ops.
Differential Revision: http://reviews.llvm.org/D11660
llvm-svn: 243756
Summary:
Favor the extended reg patterns over the shifted reg patterns that match
only the operand shift and not the full sign/zero extend and shift.
Reviewers: jmolloy, t.p.northover
Subscribers: mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D11569
llvm-svn: 243753
For a modulo (reminder) operation,
clang -target armv7-none-linux-gnueabi generates "__modsi3"
clang -target armv7-none-eabi generates "__aeabi_idivmod"
clang -target armv7-linux-androideabi generates "__modsi3"
Android bionic libc doesn't provide a __modsi3, instead it provides a
"__aeabi_idivmod". This patch fixes the LLVM ARMISelLowering to generate
the correct call when ever there is a modulo operation.
Differential Revision: http://reviews.llvm.org/D11661
llvm-svn: 243717
Fixing MinSize attribute handling was discussed in D11363.
This is a prerequisite patch to doing that.
The handling of OptSize when lowering mem* functions was broken
on Darwin because it wants to ignore -Os for these cases, but the
existing logic also made it ignore -Oz (MinSize).
The Linux change demonstrates a widespread problem. The backend
doesn't usually recognize the MinSize attribute by itself; it
assumes that if the MinSize attribute exists, then the OptSize
attribute must also exist.
Fixing this more generally will be a follow-on patch or two.
Differential Revision: http://reviews.llvm.org/D11568
llvm-svn: 243693
Summary:
This hidden option would disable code generation through FastISel by
default. It was removed from the available options and from the
Fast-ISel tests that required it in order to run the tests.
Reviewers: dsanders
Subscribers: qcolombet, llvm-commits
Differential Revision: http://reviews.llvm.org/D11610
llvm-svn: 243638
Summary:
Previously, we would sign-extend non-boolean negative constants and
zero-extend otherwise. This was problematic for PHI instructions with
negative values that had a type with bitwidth less than that of the
register used for materialization.
More specifically, ComputePHILiveOutRegInfo() assumes the constants
present in a PHI node are zero extended in their container and
afterwards deduces the known bits.
For example, previously we would materialize an i16 -4 with the
following instruction:
addiu $r, $zero, -4
The register would end-up with the 32-bit 2's complement representation
of -4. However, ComputePHILiveOutRegInfo() would generate a constant
with the upper 16-bits set to zero. The SelectionDAG builder would use
that information to generate an AssertZero node that would remove any
subsequent trunc & zero_extend nodes.
In theory, we should modify ComputePHILiveOutRegInfo() to consult
target-specific hooks about the way they prefer to materialize the
given constants. However, git-blame reports that this specific code
has not been touched since 2011 and it seems to be working well for every
target so far.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11592
llvm-svn: 243636
Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h.
llvm-svn: 243585
This patch improves the 32-bit target i64 constant matching to detect the shuffle vector splats that are introduced by i64 vector shift vectorization (D8416).
Differential Revision: http://reviews.llvm.org/D11327
llvm-svn: 243577
It's potentially more efficient on Cyclone, and from the optimization guides &
schedulers looks like it has no effect on Cortex-A53 or A57. In general you'd
expect a MOV to be about the most efficient instruction with its semantics,
even though the official "UXTW" alias is really a UBFX.
llvm-svn: 243576
This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed.
Differential Revision: http://reviews.llvm.org/D11439
llvm-svn: 243569
Given certain shuffle-vector masks, LLVM emits splat instructions
which splat the wrong bytes from the source register. The issue is
that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp
does not ensure that the splat pattern found is requesting bytes that
are aligned on an EltSize boundary. This patch detects this situation
as not a valid splat mask, resulting in a permute being generated
instead of a splat.
Patch and test case by Tyler Kenney, cleaned up a bit by me.
This is a simple bug fix that would be good to incorporate into 3.7.
llvm-svn: 243519
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -aarch64-strict-align to decide whether strict alignment should be
forced.
rdar://problem/21529937
llvm-svn: 243516
PR24141: https://llvm.org/bugs/show_bug.cgi?id=24141
contains a test case where we have duplicate entries in a node's uses() list.
After r241826, we use CombineTo() to delete dead nodes when combining the uses into
reciprocal multiplies, but this fails if we encounter the just-deleted node again in
the list.
The solution in this patch is to not add duplicate entries to the list of users that
we will subsequently iterate over. For the test case, this avoids triggering the
combine divisors logic entirely because there really is only one user of the divisor.
Differential Revision: http://reviews.llvm.org/D11345
llvm-svn: 243500
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -arm-strict-align to decide whether strict alignment should be
forced. Also, remove the logic that was checking the OS and architecture
as clang is now responsible for setting strict-align based on the command
line options specified and the target architecute and OS.
rdar://problem/21529937
http://reviews.llvm.org/D11470
llvm-svn: 243493
Reapply 243271 with more fixes; although we are not handling multiple
sources with coalescable copies, we were not properly skipping this
case.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
psllq %mm1, %mm0
movd %mm0, %r9
jmp C
B:
por %mm1, %mm0
movd %mm0, %r9
jmp C
C:
movd %r9, %mm0
pshufw $238, %mm0, %mm0
Becomes:
A:
psllq %mm1, %mm0
jmp C
B:
por %mm1, %mm0
jmp C
C:
pshufw $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 243486
Summary:
Currently, we support only the MIPS O32 ABI calling convention for call
lowering. With this change we avoid using the O32 calling convetion for
lowering calls marked as using the fast calling convention.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11515
llvm-svn: 243485
Summary:
Generate correct code for the select instruction by zero-extending
it's boolean/condition operand to GPR-width. This is necessary because
the conditional-move instructions operate on the whole register.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11506
llvm-svn: 243469
If the pointer is the store's value operand, this would produce
a broken module. Make sure the use is actually for the pointer operand.
llvm-svn: 243462
GR64 <-> VR64 copies).
This commit adds a MIR test case for the commit r242191, which was committed
without one. This test case verifies that the ExpandPostRA pass expands the
GR64 <-> VR64 copies into the appropriate MMX_MOV instructions.
llvm-svn: 243457
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.
clang and driver changes in http://reviews.llvm.org/D10524
Added -femulated-tls flag to select the emulated TLS model,
which will be used for old targets like Android that do not
support ELF TLS models.
Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.
Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.
TODO: Add proper DIE for emulated TLS variables.
Added new unit tests with emulated TLS.
Differential Revision: http://reviews.llvm.org/D10522
llvm-svn: 243438
Summary:
Add patterns for doing floating point round with various rounding modes
followed by conversion to int as a single FCVT* instruction.
Reviewers: t.p.northover, jmolloy
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D11424
llvm-svn: 243422
This path add the aarch64 lowering of __builtin_thread_pointer. It uses
the already implemented AArch64ISD::THREAD_POINTER used in TLS generation.
llvm-svn: 243412
VPAND is a lot faster than VPSHUFB and VPBLENDVB - this patch ensures we attempt to lower to a basic bitmask before lowering to the slower byte shuffle/blend instructions.
Split off from D11518.
Differential Revision: http://reviews.llvm.org/D11541
llvm-svn: 243395
This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy
when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826https://llvm.org/bugs/show_bug.cgi?id=22371#c25
The quick fix is to simply remove the scalar FP logical instructions from the load folding table
in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs,
fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP
logical instructions (because that's all x86 gives us anyway). That lets us do the load folding
legally.
Differential Revision: http://reviews.llvm.org/D11477
llvm-svn: 243361
Summary: WebAssemblySubtarget.cpp expects a default 'generic' CPU to exist, and this seems to be prevalent with other targets. It makes sense to have something between MVP and bleeding-edge, even though for now it's the same as MVP. This removes a warning that's currently generated.
Subscribers: jfb, llvm-commits, sunfish
Differential Revision: http://reviews.llvm.org/D11546
llvm-svn: 243345
This commit serializes the references from the machine basic blocks to the
unnamed basic blocks.
This commit adds a new attribute to the machine basic block's YAML mapping
called 'ir-block'. This attribute contains the actual reference to the
basic block.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 243340
be reserved.
The decision to reserve x18 is going to be made solely by the front-end,
so it isn't necessary to check if the OS is Darwin in the backend.
llvm-svn: 243308
Author: Dave Airlie <airlied@redhat.com>
In order to implement indirect sampler loads, we don't
want to match on a VGPR load but an SGPR one for constants,
as we cannot feed VGPRs to the sampler only SGPRs.
this should be applicable for llvm 3.7 as well.
llvm-svn: 243294
This commit zeroes out the virtual register references in the machine
function's liveins in the class 'MachineRegisterInfo' when the virtual
register definitions are cleared.
Reviewers: Matthias Braun
llvm-svn: 243290
Reapply r242295 with fixes in the implementation.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
psllq %mm1, %mm0
movd %mm0, %r9
jmp C
B:
por %mm1, %mm0
movd %mm0, %r9
jmp C
C:
movd %r9, %mm0
pshufw $238, %mm0, %mm0
Becomes:
A:
psllq %mm1, %mm0
jmp C
B:
por %mm1, %mm0
jmp C
C:
pshufw $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 243271
When truncating to non-legal types (such as i16, i8 and i1) always use an AND
instruction to mask out the upper bits. This was only done when the source type
was an i64, but not when the source type was an i32.
This commit fixes this and adds the missing i32 truncate tests.
This fixes rdar://problem/21990703.
llvm-svn: 243198
extension property we're requesting - zero or sign extended.
This fixes cases where we want to return a zero extended 32-bit -1
and not be sign extended for the entire register. Also updated the
already out of date comment with the current behavior.
llvm-svn: 243192
whether register x18 should be reserved.
This change is needed because we cannot use a backend option to set
cl::opt "aarch64-reserve-x18" when doing LTO.
Out-of-tree projects currently using cl::opt option "-aarch64-reserve-x18"
to reserve x18 should make changes to add subtarget feature "reserve-x18"
to the IR.
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D11463
llvm-svn: 243186
Add a verifier check that `DILocalVariable`s of tag
`DW_TAG_arg_variable` always have a non-zero 'arg:' field, and those of
tag `DW_TAG_auto_variable` always have a zero 'arg:' field. These are
the only configurations that are properly understood by the backend.
(Also, fix the bad examples in LangRef and test/Assembler, and fix the
bug in Kaleidoscope Ch8.)
A large number of testcases seem to have bitrotted their way forward
from some ancient version of the debug info hierarchy that didn't have
`arg:` parameters. If you have out-of-tree testcases that start failing
in the verifier and you don't care enough to get the `arg:` right, you
may have some luck just calling:
sed -e 's/, arg: 0/, arg: 1/'
or some such, but I hand-updated the ones in tree.
llvm-svn: 243183
This commit serializes the callee saved information from the class
'MachineFrameInfo'. This commit extends the YAML mappings for the fixed and
the ordinary stack objects and adds an optional 'callee-saved-register'
attribute. This attribute is used to serialize the callee save information.
llvm-svn: 243173
This commit serializes the virtual register allocations hints of type 0.
These hints specify the preferred physical registers for allocations.
llvm-svn: 243156
This commit adds the liveins and successors properties to machine basic blocks
in some of the MIR tests to ensure that the tests will pass when the MIR parser
will run the machine verifier after initializing a machine function.
llvm-svn: 243124
This commit moves and transforms the generic test
'CodeGen/MIR/successor-basic-blocks.mir' into an X86 specific test
'CodeGen/MIR/X86/successor-basic-blocks.mir'. This change is required in order
to enable the machine verifier for the MIR parser, as the machine verifier
verifies that the machine basic blocks contain instructions that actually
determine the machine basic block successors.
llvm-svn: 243123
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expects the shuffle mask and the vector operands to be
of the same length.
This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.
This patch fixes this by accommodating for both cases and adds regression tests.
Differential Revision: http://reviews.llvm.org/D11407
llvm-svn: 243103
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.
Differential Revision: http://reviews.llvm.org/D11408
llvm-svn: 243100
The test in PR24199 ( https://llvm.org/bugs/show_bug.cgi?id=24199 ) crashes because machine
trace metrics was not ignoring dbg_value instructions when calculating data dependencies.
The machine-combiner pass asks machine trace metrics to calculate an instruction trace,
does some reassociations, and calls MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
along with MachineTraceMetrics::invalidate(). The dbg_value instructions have their operands
invalidated, but the instructions are not expected to be deleted.
On a subsequent loop iteration of the machine-combiner pass, machine trace metrics would be
called again and die while accessing the invalid debug instructions.
Differential Revision: http://reviews.llvm.org/D11423
llvm-svn: 243057
Adds pushes to the folding tables.
This also required a fix to the TD definition, since the memory forms of
the push instructions did not have the right mayLoad/mayStore flags.
Differential Revision: http://reviews.llvm.org/D11340
llvm-svn: 243010
The DAG Node "SCALAR_TO_VECTOR" may be created if the type of the scalar element is legal.
Added a check for the scalar type before creating this node.
Added a test that fails with assertion on the current version.
Differential Revision: http://reviews.llvm.org/D11413
llvm-svn: 242994
This commit broke the build. Numerous build bots broken, and it was
blocking my progress so reverting.
It should be trivial to reproduce -- enable the BPF backend and it
should fail when running llvm-tblgen.
llvm-svn: 242992