When AA is being used, non-aliasing stores are canonicalized to use the same
chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of
this by looking only as users of a store's chain operand. However, user
iteration is not result-number specific, we need to check that the use is as a
chain operand, and not via some other operand. It is certainly possible to have
another potentially-aliasing store, which shares the first's base pointer, and
uses the first's chain's node via some other operand.
Failure to catch this situation caused, at least in the included test case, an
assert later because the relative sequence-number ordering caused later
replacement to create a cycle in the DAG.
llvm-svn: 248698
Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.
Reviewers: andrew.w.kaylor, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13152
llvm-svn: 248677
Trying to use the version with the explicit output operand
would complain because of the missing WriteSALU. I'm not sure
why it doesn't complain about this with the implicit VCC def.
llvm-svn: 248646
BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change:
Pros:
1. It uses only a half space of the current one.
2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer.
Cons:
1. Constructing a probability using arbitrary numerator and denominator needs additional calculations.
2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio.
One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern.
Differential revision: http://reviews.llvm.org/D12603
llvm-svn: 248633
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).
The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.
This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.
The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.
Differential Revision: http://reviews.llvm.org/D12635
llvm-svn: 248622
The algorithm would not modify the live-in list of blocks below the save
block point which is correct unless it happens to be a restore point at
the same time.
Also fixes the benign issue of live-in registers being added twice in
some cases.
The testcase is based on a test submitted by Kit Barton.
Differential Revision: http://reviews.llvm.org/D13176
llvm-svn: 248620
If a virtual register is copied and another copy was already
seen, replace with the previous copy. This only handles the
simplest cases for now.
This pattern shows up from various operand restrictions
AMDGPU has which require inserting copies depending
on the register class of the operands.
llvm-svn: 248611
This fixes a select error when the i64 source was also
bitcasted to v2i32 in the original source.
Instead of awkwardly trying to select the modified source value and
the store, replace before isel begins.
Uses a worklist to avoid possible problems from mutating the DAG,
although it seems to work OK without it.
llvm-svn: 248589
These were all using the default 32-bit VALU write class,
but the i64/f64 compares are half rate.
I'm not sure this is really correct, because they are still using
the write to VALU write class, even though they really write
to the SALU.
llvm-svn: 248582
We now emit the compiler generated divide by zero check that was needed for the
MSVC routines. We construct a psuedo-instruction for the DBZ check as the
operation requires splitting up the BB. For the 64-bit operations, we need to
custom expand the node as we need to insert the DBZ check and then emit the
libcall to the appropriate name. Because this is target specific, it seemed
better to reproduce the expansion operation from the target-agnostic type
legalization rather than sink this there to avoid the duplication. The division
library calls now match MSVC semantically.
llvm-svn: 248561
Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H
llvm-svn: 248536
These are necessary for implementing mem_fence for
OpenCL 2.0.
The VI assembler tests are disabled since it seems to be
using the wrong encoding or opcode.
llvm-svn: 248532
Fixes the overflow case of llvm.*absdiff intrinsic also updats the tests and LangRef.rst accordingly.
Differential Revision: http://reviews.llvm.org/D11678
llvm-svn: 248483
Allow a target to do something other than search for copies
that will avoid cross register bank copies.
Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.
I'm not entirely satisified with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid kinds
of subregister copies on some targets.
I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.
The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.
llvm-svn: 248478
If the stores are storing values from loads which partially
alias the stores, we could end up placing the merged loads
and stores on the same chain which has the potential to break.
Each store may have a different chain dependency on only some
of the original loads. Create a new TokenFactor to capture all
of the required dependencies of the stores rather than assuming
all stores can use the same chain.
The testcase is a situation where this happens, although
it does not have an observable change from this. The DAG nodes
just happened to not be reordered before despite this missing
chain dependency.
This is based on an off-list report for an out of tree target
which regressed due to r246307 and I haven't managed to find a case
where the nodes do end up reordered with an in tree target.
llvm-svn: 248468
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.
This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.
llvm-svn: 248467
This time, the issue is that we weren't accounting for the possibility that
aligned DPRs could have been stored after the final "push" in a prologue. When
that happened we effectively moved a "sub sp, #N" from below the aligned stores
to above them, and everything went to pot.
To make it worse, I'd actually committed something testing that we produced
wrong code, so the test update is tiny.
llvm-svn: 248437
This patch changes the order of GEPs generated by Splitting GEPs
pass, specially when one of the GEPs has constant and the base is
loop invariant, then we will generate the GEP with constant first
when beneficial, to expose more cases for LICM.
If originally Splitting GEP generate the following:
do.body.i:
%idxprom.i = sext i32 %shr.i to i64
%2 = bitcast %typeD* %s to i8*
%3 = shl i64 %idxprom.i, 2
%uglygep = getelementptr i8, i8* %2, i64 %3
%uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
...
Now it genereates:
do.body.i:
%idxprom.i = sext i32 %shr.i to i64
%2 = bitcast %typeD* %s to i8*
%3 = shl i64 %idxprom.i, 2
%uglygep = getelementptr i8, i8* %2, i64 1032
%uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
...
For no-loop cases, the original way of generating GEPs seems to
expose more CSE cases, so we don't change the logic for no-loop
cases, and only limit our change to the specific case we are
interested in.
llvm-svn: 248420
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.
LLVM counterpart to D12835
Differential Revision: http://reviews.llvm.org/D13002
llvm-svn: 248368
We may have subregister defs which are unused but not discovered and
cleaned up prior to liveness analysis. This creates multiple connected
components in the resulting live range which are forbidden in the
MachineVerifier because they would unnecesarily constrain the register
allocator. Rewrite those dead definitions to define a newly created
virtual register.
Differential Revision: http://reviews.llvm.org/D13035
llvm-svn: 248335
ARM counterpart to r248291:
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
llvm-svn: 248294
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
llvm-svn: 248291
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice.
Differential Revision: http://reviews.llvm.org/D12969
llvm-svn: 248266
This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane.
The motivation was to reduce the number of high cost pshufb calls, but it also improves the SSE2 case as well.
Differential Revision: http://reviews.llvm.org/D12561
llvm-svn: 248250
This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents.
This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t))))
Differential Revision: http://reviews.llvm.org/D13003
llvm-svn: 248210
The vext pseudo-instruction takes the number of elements that need to be
extracted, not the number of bytes. Hence, use the number of elements
directly instead of scaling them with a factor.
Reviewers: Silviu Baranga, James Molloy
(not reflected in the differential revision)
Differential Revision: http://reviews.llvm.org/D12974
llvm-svn: 248208
The ISD::FPOW and ISD::FSINCOS opcodes default to Legal, but there
is no legal instruction for those on SystemZ. This could cause
LLVM internal errors. Fixed by setting the operation action to
Expand for those opcodes.
Also added test cases for all other LLVM IR intrinsics that should
generate a library call. (Those already work correctly since the
default operation action is fine.)
llvm-svn: 248180
If storing multiple FP constants, some subset of the stores
would be replaced with integers due to visit order, so
MergeConsecutiveStores would only partially merge
these.
llvm-svn: 248169
Now that we have fast vector CTPOP implementations we can use this to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1))
Additionally, for AVX512CD that provides lzcnt instructions we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x))
Differential Revision: http://reviews.llvm.org/D12663
llvm-svn: 248091
In if-conversion, there is a utility function MergeBlocks() that is used to merge blocks. However, when new edges are built in this function the edge weight is either not provided or not updated properly, leading to a modified CFG with incorrect edge weights. This patch corrects this issue.
Differential Revision: http://reviews.llvm.org/D12513
llvm-svn: 248030
In ARMBaseInstrInfo::isProfitableToIfCvt(), there is a simple cost model in which the number of cycles is scaled by a probability to estimate the cost. However, when the number of cycles is small (which is usually the case), there is a precision issue after the computation. To avoid this issue, this patch scales those cycles by 1024 (chosen to make the multiplication a litter faster) before they are scaled by the probability. Other variables are also scaled up for the final comparison.
Differential Revision: http://reviews.llvm.org/D12742
llvm-svn: 248018
Summary:
For bitfield insert OR matching, check both operands for larger pattern
first before checking for smaller pattern.
Add pattern for unsigned bitfield insert-in-zero done with SHL+AND.
Resolves PR21631.
Reviewers: jmolloy, t.p.northover
Subscribers: aemerson, rengolin, llvm-commits, mcrosier
Differential Revision: http://reviews.llvm.org/D12908
llvm-svn: 248006
- Strenghten the logic to be sure we hoist the restore point out of the current
loop. (The fixes a bug with infinite loop, added as part of the patch.)
- Walk over the exit blocks of the current loop to conver to the desired restore
point in one iteration of the update loop.
llvm-svn: 247958
Windows EH funclets need to be contiguous. The FuncletLayout pass will
ensure that the funclets are together and begin with a funclet entry MBB.
Differential Revision: http://reviews.llvm.org/D12943
llvm-svn: 247937
This makes catchret look more like a branch, and less like a weird use
of BlockAddress. It also lets us get away from
llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label
arithmetic.
llvm-svn: 247936
getLandingPadSuccessor assumes that each invoke can have at most one EH
pad successor, but WinEH invokes can have more than one. Two out of
three callers of getLandingPadSuccessor don't use the returned
landingpad, so we can make them use this simple predicate instead.
Eventually we'll have to circle back and fix SplitKit.cpp so that
register allocation works. Baby steps.
llvm-svn: 247904
AVX-512 does not provide an instruction that shuffles mask register. So I do the following way:
mask-2-simd , shuffle simd , simd-2-mask
Differential Revision: http://reviews.llvm.org/D12727
llvm-svn: 247876
The MSVC doesn't really support exception specifications so let's just
turn these into cleanuppads. Later, we might use terminatepad to more
efficiently encode the "noexcept"-ness of a function body.
llvm-svn: 247848
Clang now passes the adjectives as an argument to catchpad.
Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.
llvm-svn: 247844
We are experimenting with a new approach to saving and restoring SSA
values used across funclets: let the register allocator do the dirty
work for us.
However, this means that we need to be able to clone commoned blocks
without relying on demotion.
llvm-svn: 247835
Otherwise we'd try to emit the thunk that passes the LSDA to
__CxxFrameHandler3. We don't emit the LSDA if there were no landingpads,
so we'd end up with an assembler error when trying to write the COFF
object.
llvm-svn: 247820
This pass implements a simple algorithm for conversion from CFG to
wasm's structured control flow. It doesn't yet handle multiple-entry
loops; that will be added in a future patch.
It also adds initial support for switch statements.
Differential Revision: http://reviews.llvm.org/D12735
llvm-svn: 247818
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing,
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests:
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.
This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.
This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.
Differential Revision: http://reviews.llvm.org/D12095
llvm-svn: 247815
When trying emit a stack adjustments using pops, frame lowering selects an
arbitrary free GPR. It should always select one from an appropriate class...
This fixes PR24649.
Patch by: amjad.aboud@intel.com
Differential Revision: http://reviews.llvm.org/D12609
llvm-svn: 247785
When building LLVM as a (potentially dynamic) library that can be linked against
by multiple compilers, the default triple is not really meaningful.
We allow to explicitely set it to an empty string when configuring LLVM.
In this case, said "target independent" tests in the test suite that are using
the default triple are disabled by matching the newly available feature
"default_triple".
Reviewers: probinson, echristo
Differential Revision: http://reviews.llvm.org/D12660
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247775
Under certain circumstances, tryBuildVectorShuffle would attempt to
create a BUILD_VECTOR node with an invalid combination of types.
This happened when one of the components of the original BUILD_VECTOR
was itself a TRUNCATE node. That TRUNCATE was stripped off during
intermediate processing to simplify code, but when adding the node
back to the result vector, we still need it to get the type right.
llvm-svn: 247694
Turning (op x (mul y k)) into (op x (lsl (mul y k>>n) n)) is beneficial when
we can do the lsl as a shifted operand and the resulting multiply constant is
simpler to generate.
Do this by doing the transformation when trying to select a shifted operand,
as that ensures that it actually turns out better (the alternative would be to
do it in PreprocessISelDAG, but we don't know for sure there if extracting the
shift would allow a shifted operand to be used).
Differential Revision: http://reviews.llvm.org/D12196
llvm-svn: 247569
KNL does not have VXORPS, VORPS for 512-bit values.
I use integer VPXOR, VPOR that actually do the same.
X86ISD::FXOR/FOR are generated as a result of FSUB combining.
Differential Revision: http://reviews.llvm.org/D12753
llvm-svn: 247523
The changes in:
test/CodeGen/X86/machine-cp.ll
are just due to scheduling differences after some logic instructions were reassociated.
llvm-svn: 247516
realignment should be forced.
With this commit, we can now force stack realignment when doing LTO and
do so on a per-function basis. Also, add a new cl::opt option
"stackrealign" to CommandFlags.h which is used to force stack
realignment via llc's command line.
Out-of-tree projects currently using -force-align-stack to force stack
realignment should make changes to attach the attribute to the functions
in the IR.
Differential Revision: http://reviews.llvm.org/D11814
llvm-svn: 247450
We used different conditions to determine if we should emit startproc vs
endproc. Use the same condition to ensure that they will always be
paired.
This fixes PR24374.
llvm-svn: 247435
The rest of the EH pads are fine, since they have at most one label and
take fewer operands for the personality.
Old catchpad vs. new:
%5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)] to label %__except.ret.10 unwind label %catchendblock.9
-----
%5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)]
to label %__except.ret.10 unwind label %catchendblock.9
llvm-svn: 247433
The Win32 EH runtime caller does not preserve EBP, even though it does
preserve the CSRs (EBX, ESI, EDI) for us. The result was that each
finally funclet call would leave the frame pointer off by 12 bytes.
llvm-svn: 247348
The (mostly-deprecated) SelectionDAG-based ILPListDAGScheduler scheduler
was making poor scheduling decisions, causing high register pressure and
extraneous register spills.
Switching to the newer machine scheduler generates better code -- even
without there being a machine model defined for SPARC yet.
(Actually committing the test changes too, this time, unlike r247315)
llvm-svn: 247343
Summary:
The coloring code in WinEHPrepare queues cleanuprets' successors with the
correct color (the parent one) when it sees their cleanuppad, and so later
when iterating successors knows to skip processing cleanuprets since
they've already been queued. This latter check was incorrectly under an
'else' condition and so inadvertently was not kicking in for single-block
cleanups. This change sinks the check out of the 'else' to fix the bug.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12751
llvm-svn: 247299
order.
The implicit register verifier in the MIR parser should only check if the
instruction's default implicit operands are present in the instruction. It
should not check the order in which they occur.
llvm-svn: 247283