Summary:
For bitfield insert OR matching, check both operands for larger pattern
first before checking for smaller pattern.
Add pattern for unsigned bitfield insert-in-zero done with SHL+AND.
Resolves PR21631.
Reviewers: jmolloy, t.p.northover
Subscribers: aemerson, rengolin, llvm-commits, mcrosier
Differential Revision: http://reviews.llvm.org/D12908
llvm-svn: 248006
- Strenghten the logic to be sure we hoist the restore point out of the current
loop. (The fixes a bug with infinite loop, added as part of the patch.)
- Walk over the exit blocks of the current loop to conver to the desired restore
point in one iteration of the update loop.
llvm-svn: 247958
Summary:
The BUILD_VECTOR node will truncate its operators to match the
type. We need to take this into account when constant folding -
we need to perform a truncation before constant folding the elements.
This is because the upper bits can change the result, depending on
the operation type (for example this is the case for min/max).
This change also adds a regression test.
Reviewers: jmolloy
Subscribers: jmolloy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12697
llvm-svn: 247265
First, we need to teach isFrameOffsetLegal about STNP.
It already knew about the STP/LDP variants, but those were probably
never exercised, because it's only the load/store optimizer that
generates STP/LDP, and the only user of the method is frame lowering,
which runs earlier.
The STP/LDP cases were wrong: they didn't take into account the fact
that they return two results, not one, so the immediate offset will be
the 4th operand, not the 3rd.
Follow-up to r247234.
llvm-svn: 247236
We could go through the load/store optimizer and match STNP where
we would have matched a nontemporal-annotated STP, but that's not
reliable enough, as an opportunistic optimization.
Insetad, we can guarantee emitting STNP, by matching them at ISel.
Since there are no single-input nontemporal stores, we have to
resort to some high-bits-extracting trickery to generate an STNP
from a plain store.
Also, we need to support another, LDP/STP-specific addressing mode,
base + signed scaled 7-bit immediate offset.
For now, only match the base. Let's make it smart separately.
Part of PR24086.
llvm-svn: 247231
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
llvm-svn: 246769
This patch uses the metadata defined in D12341 to avoid creating an unpredictable branch.
Differential Revision: http://reviews.llvm.org/D12342
llvm-svn: 246692
This matches the ARM behavior. In both cases, the register is part
of the optional Performance Monitors extension, so, add the feature,
and enable it for the A-class processors we support.
Differential Revision: http://reviews.llvm.org/D12425
llvm-svn: 246555
Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors.
For example, given a switch statement with cases 1,2,3,5,10,11,20, and every edge from switch to each successor has weight 10. If there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10+10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution.
There are some exceptions:
For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it.
For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it.
When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it.
In other cases, the default weight is evenly distributed to successors.
Differential Revision: http://reviews.llvm.org/D12418
llvm-svn: 246522
The ISelLowering code turned insertion turned the element for the
lowest lane of a BUILD_VECTOR into an INSERT_SUBREG, this prohibited
the patterns for SCALAR_TO_VECTOR(Load) to match later. Restrict this
to cases without a load argument.
Reported in rdar://22223823
Differential Revision: http://reviews.llvm.org/D12467
llvm-svn: 246462
As a follow-up to r246098, require `DISubprogram` definitions
(`isDefinition: true`) to be 'distinct'. Specifically, add an assembler
check, a verifier check, and bitcode upgrading logic to combat testcase
bitrot after the `DIBuilder` change.
While working on the testcases, I realized that
test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its
purpose was to check for a corner case in PR22792 where two subprogram
definitions match exactly and share the same metadata node. The new
verifier check, requiring that subprogram definitions are 'distinct',
precludes that possibility.
I updated almost all the IR with the following script:
git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
grep -v test/Bitcode |
xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'
Likely some variant of would work for out-of-tree testcases.
llvm-svn: 246327
more than 2 instructions.
I introduced this regression a while back and did not noticed it because I
somehow forgot to push the initial test cases for the pass!
Fix that as well!
llvm-svn: 246239
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.
This fixes http://llvm.org/PR24459
llvm-svn: 245641
Create CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.
llvm-svn: 245640
We are already falling back to SelectionDAG when encountering an shift with UB.
This adds the same checks for shifts with UB that get folded into arithmetic or
logical operations.
This fixes rdar://problem/22345295.
llvm-svn: 245499
The current code normalizes select(C0, x, select(C1, x, y)) towards
select(C0|C1, x, y) if the targets prefers that form. This patch adds an
additional rule that if the select(C1, x, y) part already exists in the
function then we want to normalize into the other direction because the
effects of reusing the existing value are bigger than transforming into
the target preferred form.
This addresses regressions following r238793, see also:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150727/290272.html
Differential Revision: http://reviews.llvm.org/D11616
llvm-svn: 245350
These only get generated if the target supports them. If one of the variants is not legal and the other is, and it is safe to do so, the other variant will be emitted.
For example on AArch32 (V8), we have scalar fminnm but not fmin.
Fix up a couple of tests while we're here - one now produces better code, and the other was just plain wrong to start with.
llvm-svn: 245196
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.
(zext (truncate x)) -> (zext (and(x, m))
Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.
This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.
Differential Revision: http://reviews.llvm.org/D11764
llvm-svn: 245160
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.
llvm-svn: 245107
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.
I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.
llvm-svn: 245035
These tests relied on -enable-no-nans-fp-math, whereas soon they'll take their no-nans hint
from the FCMP instruction itself, so split the no-nans stuff out into its own test.
Also do a slight rejig of instruction order. The old FMIN/MAX backend matching had to deal with looking through casts, which it never did particularly well. Now, instcombine will recognize such patterns and canonicalize the cast outside the select. So modify the test inputs to assume that instcombine has already run.
llvm-svn: 244913
Now that we can properly promote mismatched FCOPYSIGNs (r244858), we
can mark the FP_ROUND on the result as truncating, to expose folding.
FCOPYSIGN doesn't change anything but the sign bit, so
(fp_round (fcopysign (fpext a), b))
is equivalent to (modulo the sign bit):
(fp_round (fpext a))
which is a no-op.
llvm-svn: 244862
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.
Follow-up to r243924, r243926, and r244858.
llvm-svn: 244860
We don't care about its type, and there's even a combine that'll fold
away the FP_EXTEND if we let it run. However, until it does, we'll have
something broken like:
(f32 (fp_extend (f64 v)))
Scalar f16 follow-up to r243924.
llvm-svn: 244858
r242520 was reverted in r244313 as the expected behaviour of the alias
attribute in C is that the alias has the same size as the aliasee. However
we can re-introduce adding the size on the alias when the aliasee does not,
from a source code or object perspective, exist as a discrete entity. This
happens when the aliasee is not a symbol, or when that symbol is private.
Differential Revision: http://reviews.llvm.org/D11943
llvm-svn: 244752
On Mach-O emitting aliases for the variables that make up a MergedGlobals
variable can cause problems when linking with dead stripping enabled so don't
do that, except for external variables where we must emit an alias.
llvm-svn: 244748
Other objects can never reference the MergedGlobals symbol so external linkage
is never needed. Using private instead of internal linkage means the object is
more similar to what it looks like when global merging is not enabled, with
the only difference being that the merged variables are addressed indirectly
relative to the start of the section they are in.
Also add aliases for merged variables with internal linkage, as this also makes
the object be more like what it is when they are not merged.
Differential Revision: http://reviews.llvm.org/D11942
llvm-svn: 244615
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmannum and match that instead. Minimal functional change:
- Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistant.
- f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!
llvm-svn: 244595
frame setup instruction.
This commit ensures that the stack map lowering code in FastISel adds an
appropriate number of immediate operands to the frame setup instruction.
The previous code added just one immediate operand, which was fine for a target
like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit
operands. This caused the machine verifier to report an error when the old code
added just one.
Reviewers: Juergen Ributzka
Differential Revision: http://reviews.llvm.org/D11853
llvm-svn: 244508
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.
llvm-svn: 244481
When we are not emitting the condition for the branch, because the condition is
in another BB or SDAG did the selection for us, then we have to mask the flag in
the register with AND.
This is required when the condition comes from a truncate, because SDAG only
truncates down to a legal size of i32.
This fixes rdar://problem/22161062.
llvm-svn: 244291
This reverts commit r243198 and 243304.
Turns out this wasn't the correct fix for this problem. It works only within
FastISel, but fails when the truncate is selected by SDAG.
llvm-svn: 244287