This commit makes LLVM not estimate branch probabilities when doing a
single bit bitmask tests.
The code that originally made me discover this is:
if ((a & 0x1) == 0x1) {
..
}
In this case we don't actually have any branch probability information
and should not assume to have any. LLVM transforms this into:
%and = and i32 %a, 1
%tobool = icmp eq i32 %and, 0
So, in this case, the result of a bitwise and is compared against 0,
but nevertheless, we should not assume to have probability
information.
CodeGen/ARM/2013-10-11-select-stalls.ll started failing because the
changed probabilities changed the results of
ARMBaseInstrInfo::isProfitableToIfCvt() and led to an Ifcvt of the
diamond in the test. AFAICT, the test was never meant to test this and
thus changing the test input slightly to not change the probabilities
seems like the best way to preserve the meaning of the test.
llvm-svn: 234979
if ((a & 0x1) == 0x1) {
..
}
In this case we don't actually have any branch probability information and
should not assume to have any. LLVM transforms this into:
%and = and i32 %a, 1
%tobool = icmp eq i32 %and, 0
So, in this case, the result of a bitwise and is compared against 0,
but nevertheless, we should not assume to have probability
information.
llvm-svn: 234898
Combine something like:
(v8i8 concat_vectors (v2i8 bitcast (i16)) x4)
into:
(v8i8 (bitcast (v4i16 BUILD_VECTOR (i16) x4)))
If any of the scalars are floating point, use that throughout.
Differential Revision: http://reviews.llvm.org/D8948
llvm-svn: 234809
Currently, there's a single flag, checked by the pass itself.
It can't force-enable the pass (and is on by default), because it
might not even have been created, as that's the targets decision.
Instead, have separate explicit flags, so that the decision is
consistently made in the target.
Keep the flag as a last-resort "force-disable GlobalMerge" for now,
for backwards compatibility.
llvm-svn: 234666
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.
Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.
f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.
While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.
Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755
llvm-svn: 234550
We already do:
concat_vectors(scalar, undef) -> scalar_to_vector(scalar)
When the scalar is legal.
When it's not, but is a truncated legal scalar, we can also do:
concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
Which is equivalent, since the upper lanes are undef anyway.
While there, teach the combine to look at more than 2 operands.
Differential Revision: http://reviews.llvm.org/D8883
llvm-svn: 234530
The integer extend optimization tries to fold the extend into the load
instruction. This requires us to identify if the extend has already been
emitted or not and act accordingly on it.
The check that was originally performed for this was not sufficient. Besides
checking the ValueMap for a mapped register we also need to check if the
virtual register has already an associated machine instruction that defines it.
This fixes rdar://problem/20470788.
llvm-svn: 234529
restrictions when choosing a type for small-memcpy inlining in
SelectionDAGBuilder.
This ensures that the loads and stores output for the memcpy won't be further
expanded during legalization, which would cause the total number of instructions
for the memcpy to exceed (often significantly) the inlining thresholds.
<rdar://problem/17829180>
llvm-svn: 234462
The bug manifests when there are two loads and two stores chained as follows in
a DAG,
(ld v3f32) -> (st f32) -> (ld v3f32) -> (st f32)
and the stores' values are extracted from the preceding vector loads.
MergeConsecutiveStores would replace the first store in the chain with the
merged vector store, which would create a cycle between the merged store node
and the last load node that appears in the chain.
This commits fixes the bug by replacing the last store in the chain instead.
rdar://problem/20275084
Differential Revision: http://reviews.llvm.org/D8849
llvm-svn: 234430
This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.
Differential Revision: http://reviews.llvm.org/D8516
llvm-svn: 234004
extended loads.
Implement the related target lowering hook so that the optimization has a better
estimation of the cost of an extension.
rdar://problem/19267165
llvm-svn: 233753
it more liberally.
SplitVecOp_TRUNCATE has logic for recursively splitting oversize vectors
that need more than one round of splitting to become legal. There are many
other ISD nodes that could benefit from this logic, so factor it out and
use it for FP_TO_UINT,FP_TO_SINT,SINT_TO_FP,UINT_TO_FP and FTRUNC.
llvm-svn: 233681
When we expand the RET_ReallyLR pseudo instruction we also need to transfer the
implicit operands.
The return register is an implicit operand and without it the liveness
calculation generates an incorrect live-out set for the patchpoint.
This fixes rdar://problem/19068476.
llvm-svn: 233635
method.
This enables the instprinter to print a different system register name based on
the feature bits of the per-function subtarget.
Differential Revision: http://reviews.llvm.org/D8668
llvm-svn: 233412
Tailcalls are only OK with forwarded sret pointers. With explicit sret,
one approximation is to check that the pointer isn't an Instruction, as
in that case it might point into some local memory (alloca). That's not
OK with tailcalls.
Explicit sret counterpart to r233409.
Differential Revison: http://reviews.llvm.org/D8510
llvm-svn: 233410
Tailcalls are only OK with forwarded sret pointers. With sret demotion,
they're not, as we'd have a pointer into a soon-to-be-dead stack frame.
Differential Revison: http://reviews.llvm.org/D8510
llvm-svn: 233409
"Fix the MachineScheduler's logic for updating ready times for in-order.
Now the scheduler updates a node's ready time as soon as it is
scheduled, before releasing dependent nodes."
This fix was only made in one variant of the ScheduleDAGMI driver.
Francois de Ferriere reported the issue in the other bit of code where
it was also needed.
I never got around to coming up with a test case, but it's an
obvious fix that shouldn't be delayed any longer.
I'll try to refactor this code a little better.
I did verify performance on a wide variety of targets and saw no
negative impact with this fix.
llvm-svn: 233366
The pass used to be enabled by default with CodeGenOpt::Less (-O1).
This is too aggressive, considering the pass indiscriminately merges
all globals together.
Currently, performance doesn't always improve, and, on code that uses
few globals (e.g., the odd file- or function- static), more often than
not is degraded by the optimization. Lengthy discussion can be found
on llvmdev (AArch64-focused; ARM has similar problems):
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-February/082800.html
Also, it makes tooling and debuggers less useful when dealing with
globals and data sections.
GlobalMerge needs to better identify those cases that benefit, and this
will be done separately. In the meantime, move the pass to run with
-O3 rather than -O1, on both ARM and AArch64.
llvm-svn: 233024
Summary: Building FP16 constant vectors caused the FP16 data to be bitcast to i64. This patch creates a BITCAST node with the correct value, and adds a test to verify correct handling.
Reviewers: mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, jmolloy, ab, srhines, llvm-commits, rengolin, aemerson
Differential Revision: http://reviews.llvm.org/D8369
llvm-svn: 232562
Optimize concat_vectors of truncated vectors, where the intermediate
type is illegal, to avoid said illegality, e.g.,
(v4i16 (concat_vectors (v2i16 (truncate (v2i64))),
(v2i16 (truncate (v2i64)))))
->
(v4i16 (truncate (v4i32 (concat_vectors (v2i32 (truncate (v2i64))),
(v2i32 (truncate (v2i64)))))))
This isn't really target-specific, and, as such, would best go in the
DAGCombiner. However, ISD::TRUNCATE legality isn't keyed on both input
and result type, so we might generate worse code when we don't know
better. On AArch64 we know it's fine for v2i64->v4i16 and v4i32->v8i8.
rdar://20022387
llvm-svn: 232459
As part of PR22777, fix testcases that fail the debug info verifier.
The changes fall into the following categories:
- Empty `filename:` fields in `MDFile`s. Compile units and some types
require non-empty filenames. A number of testcases have empty
filenames, probably due to hand-reduction of testcases.
- Not-quite empty arrays: `!{i32 0}`. This used to be equivalent in
the debug info schema to `!{}`. They cause problems for
`!MDSubroutineType`'s `types:` array, since it requires all operands
to be valid types. (Note that `!{null}` is the correct type array
for functions that take no arguments and return `void`.)
- Significantly bitrotted testcases. Nodes got left behind a few
upgrades ago because of missing or invalid tags.
llvm-svn: 232415
Similar to gep (r230786) and load (r230794) changes.
Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.
(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)
import fileinput
import sys
import re
rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)
def conv(match):
line = match.group(1)
line += match.group(4)
line += ", "
line += match.group(2)
return line
line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
sys.stdout.write(line[off:match.start()])
sys.stdout.write(conv(match))
off = match.end()
sys.stdout.write(line[off:])
llvm-svn: 232184
%Q5_Q6<def> = COPY %Q2_Q3
%D5<def> =
%D3<def> =
%D3<def> = COPY %D6 // Incorrectly removed in MachineCopyPropagation
Using of %D3 results in incorrect result ...
Reviewed in http://reviews.llvm.org/D8242
llvm-svn: 232142
This adds new node types for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
(v4i32 (uaddv ...))
is the same as
(v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
(v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
(i32 (int_aarch64_neon_uaddv ...)), ssub)
In a combine, we transform all such across-vector-lanes intrinsics to:
(i32 (extract_vector_elt (uaddv ...), 0))
This has one big advantage: by making the extract_element explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs. Consider:
uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
return vmulq_n_u32(a, vaddvq_u32(b));
}
We now generate:
addv.4s s1, v1
mul.4s v0, v0, v1[0]
instead of the previous:
addv.4s s1, v1
fmov w8, s1
dup.4s v1, w8
mul.4s v0, v1, v0
rdar://20044838
llvm-svn: 231840
Teach the load store optimizer how to sign extend a result of a load pair when
it helps creating more pairs.
The rational is that loads are more expensive than sign extensions, so if we
gather some in one instruction this is better!
<rdar://problem/20072968>
llvm-svn: 231527
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers
ld.bfd and ld.gold currently only support a subset of the whole range of AArch64
ELF TLS relocations. Furthermore, they assume that some of the code sequences to
access thread-local variables are produced in a very specific sequence.
When the sequence is not as the linker expects, it can silently mis-relaxe/mis-optimize
the instructions.
Even if that wouldn't be the case, it's good to produce the exact sequence,
as that ensures that linkers can perform optimizing relaxations.
This patch:
* implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang
would grow an -mtls-size option to allow support for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold
linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation)
is added to enable generation of local dynamic code sequence, but is off by default.
* makes sure that the exact expected code sequence for local dynamic and general dynamic
accesses is produced, by making use of a new pseudo instruction. The patch also removes
two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo
SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ).
llvm-svn: 231227
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464. I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned. Let me know if I'm wrong :).
The code changes are fairly mechanical:
- Bumped the "Debug Info Version".
- `DIBuilder` now creates the appropriate subclass of `MDNode`.
- Subclasses of DIDescriptor now expect to hold their "MD"
counterparts (e.g., `DIBasicType` expects `MDBasicType`).
- Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
for printing comments.
- Big update to LangRef to describe the nodes in the new hierarchy.
Feel free to make it better.
Testcase changes are enormous. There's an accompanying clang commit on
its way.
If you have out-of-tree debug info testcases, I just broke your build.
- `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to
update all the IR testcases.
- Unfortunately I failed to find way to script the updates to CHECK
lines, so I updated all of these by hand. This was fairly painful,
since the old CHECKs are difficult to reason about. That's one of
the benefits of the new hierarchy.
This work isn't quite finished, BTW. The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro). Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I
also expect to make a few schema changes now that it's easier to reason
about everything.
llvm-svn: 231082
Essentially the same as the GEP change in r230786.
A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)
import fileinput
import sys
import re
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")
for line in sys.stdin:
sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7649
llvm-svn: 230794
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.
This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.
* This doesn't modify gep operators, only instructions (operators will be
handled separately)
* Textual IR changes only. Bitcode (including upgrade) and changing the
in-memory representation will be in separate changes.
* geps of vectors are transformed as:
getelementptr <4 x float*> %x, ...
->getelementptr float, <4 x float*> %x, ...
Then, once the opaque pointer type is introduced, this will ultimately look
like:
getelementptr float, <4 x ptr> %x
with the unambiguous interpretation that it is a vector of pointers to float.
* address spaces remain on the pointer, not the type:
getelementptr float addrspace(1)* %x
->getelementptr float, float addrspace(1)* %x
Then, eventually:
getelementptr float, ptr addrspace(1) %x
Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.
update.py:
import fileinput
import sys
import re
ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile( r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
def conv(match, line):
if not match:
return line
line = match.groups()[0]
if len(match.groups()[5]) == 0:
line += match.groups()[2]
line += match.groups()[3]
line += ", "
line += match.groups()[1]
line += "\n"
return line
for line in sys.stdin:
if line.find("getelementptr ") == line.find("getelementptr inbounds"):
if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
line = conv(re.match(ibrep, line), line)
elif line.find("getelementptr ") != line.find("getelementptr ("):
line = conv(re.match(normrep, line), line)
sys.stdout.write(line)
apply.sh:
for name in "$@"
do
python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
rm -f "$name.tmp"
done
The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh
After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).
The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.
Reviewers: rafael, dexonsmith, grosser
Differential Revision: http://reviews.llvm.org/D7636
llvm-svn: 230786
Summary:
Currently fast-isel-abort will only abort for regular instructions,
and just warn for function calls, terminators, function arguments.
There is already fast-isel-abort-args but nothing for calls and
terminators.
This change turns the fast-isel-abort options into an integer option,
so that multiple levels of strictness can be defined.
This will help no being surprised when the "abort" option indeed does
not abort, and enables the possibility to write test that verifies
that no intrinsics are forgotten by fast-isel.
Reviewers: resistor, echristo
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D7941
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230775
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.
Differential Revision: http://reviews.llvm.org/D7778
llvm-svn: 230533
The reason why these large shift sizes happen is because OpaqueConstants
currently inhibit alot of DAG combining, but that has to be addressed in
another commit (like the proposal in D6946).
Differential Revision: http://reviews.llvm.org/D6940
llvm-svn: 230355
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.
NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279. This commit
message is the correct one.
Differential Revision: http://reviews.llvm.org/D7778
llvm-svn: 230280
230275 got committed with an incorrect commit message due to a mixup
on my side. Will re-land in a few moments with the correct commit
message.
llvm-svn: 230279
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal. {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.
Differential Revision: http://reviews.llvm.org/D7808
llvm-svn: 230275
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.
Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.
Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering
llvm-svn: 229413
directly into blends of the splats.
These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.
Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.
llvm-svn: 229308
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.
Differential Revision: http://reviews.llvm.org/D7571
llvm-svn: 228911
analysis.
We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.
Test changes:
* combine-comparisons-by-cse.ll: Removed unneeded branch check.
* 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
* coalesce-subregs.ll: Superfluous block check.
* 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
* PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
* select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.
llvm-svn: 228826
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.
Fortunately this is easy enough, just extend or truncate the naturally
compared result.
I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.
Should fix PR21645.
llvm-svn: 228518
Avoid the creation of select instructions which can result in different
scheduling of the selects.
I also added a bunch of additional store volatiles. Those avoid A
CodeGen problem (bug?) where normalizes and denomarlizing the control
moves all shift instructions into the first block where ISel can't match
them together with the cmps.
llvm-svn: 228362
In case CSE reuses a previoulsy unused register the dead-def flag has to
be cleared on the def operand, as exposed by the arm64-cse.ll test.
This fixes PR22439 and the corresponding rdar://19694987
Differential Revision: http://reviews.llvm.org/D7395
llvm-svn: 228178
Currently, Cortex-A72 is modelled as an Cortex-A57 except the fp
load balancing pass isn't enabled for Cortex-A72 as it's not
profitable to have it enabled for this core.
Patch by Ranjeet Singh.
llvm-svn: 228140
This avoids a partial false dependency on the previous content of
the upper lanes of the destination vector register.
Differential Revision: http://reviews.llvm.org/D7307
llvm-svn: 227820
Some of those didn't even have run lines: they were removed
inadvertently during the Great Merge of 2014.
They used to check for DUPs, but now we go through W-regs?
Filed PR22418 for that potential regression.
For now, just make the tests explicit, so we now where we stand.
llvm-svn: 227635
This patch adds the missing LD[U]RSW variants to the load store optimizer, so
that we generate LDPSW when possible.
<rdar://problem/19583480>
llvm-svn: 226978
AAPCS64 says that it's up to the platform to specify whether x18 is
reserved, and a first step on that way is to add a flag controlling
it.
From: Andrew Turner <andrew@fubar.geek.nz>
llvm-svn: 226664
Original patch by Luke Iannini. Minor improvements and test added by
Erik de Castro Lopo.
Differential Revision: http://reviews.llvm.org/D6877
From: Erik de Castro Lopo <erikd@mega-nerd.com>
llvm-svn: 226473
This commit moves `MDLocation`, finishing off PR21433. There's an
accompanying clang commit for frontend testcases. I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.
This changes the schema for `DebugLoc` and `DILocation` from:
!{i32 3, i32 7, !7, !8}
to:
!MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)
Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.
llvm-svn: 226048
Even thouh gcc produces simialr instructions as Owen pointed out the two patterns aren’t equivalent in the case
where the original subtraction could have caused an overflow.
Reverting the same.
llvm-svn: 225341
We used to generate code similar to:
umov.b w8, v0[2]
strb w8, [x0, x1]
because the STR*ro* patterns were preferred to ST1*.
Instead, we can avoid going through GPRs, and generate:
add x8, x0, x1
st1.b { v0 }[2], [x8]
This patch increases the ST1* AddedComplexity to achieve that.
rdar://16372710
Differential Revision: http://reviews.llvm.org/D6202
llvm-svn: 225183
For 0-lane stores, we used to generate code similar to:
fmov w8, s0
str w8, [x0, x1, lsl #2]
instead of:
str s0, [x0, x1, lsl #2]
To correct that: for store lane 0 patterns, directly match to STR <subreg>0.
Byte-sized instructions don't have the special case for a 0 index,
because FPR8s are defined to have untyped content.
rdar://16372710
Differential Revision: http://reviews.llvm.org/D6772
llvm-svn: 225181
This patch lowers patterns such as-
sub v0.4s, v0.4s, v1.4s
abs v0.4s, v0.4s
to
sabd v0.4s, v0.4s, v1.4s
on AArch64.
Review: http://reviews.llvm.org/D6781
llvm-svn: 225165
Now that `Metadata` is typeless, reflect that in the assembly. These
are the matching assembly changes for the metadata/value split in
r223802.
- Only use the `metadata` type when referencing metadata from a call
intrinsic -- i.e., only when it's used as a `Value`.
- Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
when referencing it from call intrinsics.
So, assembly like this:
define @foo(i32 %v) {
call void @llvm.foo(metadata !{i32 %v}, metadata !0)
call void @llvm.foo(metadata !{i32 7}, metadata !0)
call void @llvm.foo(metadata !1, metadata !0)
call void @llvm.foo(metadata !3, metadata !0)
call void @llvm.foo(metadata !{metadata !3}, metadata !0)
ret void, !bar !2
}
!0 = metadata !{metadata !2}
!1 = metadata !{i32* @global}
!2 = metadata !{metadata !3}
!3 = metadata !{}
turns into this:
define @foo(i32 %v) {
call void @llvm.foo(metadata i32 %v, metadata !0)
call void @llvm.foo(metadata i32 7, metadata !0)
call void @llvm.foo(metadata i32* @global, metadata !0)
call void @llvm.foo(metadata !3, metadata !0)
call void @llvm.foo(metadata !{!3}, metadata !0)
ret void, !bar !2
}
!0 = !{!2}
!1 = !{i32* @global}
!2 = !{!3}
!3 = !{}
I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines). I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.
This is part of PR21532.
llvm-svn: 224257
In the large code model we have to first get the address of the GOT entry, load
the address of the constant, and then load the constant itself.
To avoid these loads and the GOT entry alltogether this commit changes the way
how FP constants are materialized in the large code model. The constats are now
materialized in a GPR and then bitconverted/moved into the FPR.
Reviewed by Tim Northover
Fixes rdar://problem/16572564.
llvm-svn: 223941
The load/store value type is currently not available when lowering the memcpy
intrinsic. Add the missing nullptr check to support this in 'computeAddress'.
Fixes rdar://problem/19178947.
llvm-svn: 223818
This optimization transforms code like:
bb1:
%0 = icmp ne i32 %a, 0
%1 = icmp ne i32 %b, 0
%or.cond = or i1 %0, %1
br i1 %or.cond, label %TrueBB, label %FalseBB
into a multiple branch instructions like:
bb1:
%0 = icmp ne i32 %a, 0
br i1 %0, label %TrueBB, label %bb2
bb2:
%1 = icmp ne i32 %b, 0
br i1 %1, label %TrueBB, label %FalseBB
This optimization is already performed by SelectionDAG, but not by FastISel.
FastISel cannot perform this optimization, because it cannot generate new
MachineBasicBlocks.
Performing this optimization at CodeGenPrepare time makes it available to both -
SelectionDAG and FastISel - and the implementation in SelectiuonDAG could be
removed. There are currenty a few differences in codegen for X86 and PPC, so
this commmit only enables it for FastISel.
Reviewed by Jim Grosbach
This fixes rdar://problem/19034919.
llvm-svn: 223786
A global variable without an explicit alignment specified should be assumed to
be ABI-aligned according to its type, like on other platforms. This allows us
to use better memory operations when accessing it.
rdar://18533701
llvm-svn: 223180
This frequently leads to cases like:
ldr xD, [xN, :lo12:var]
add xA, xN, :lo12:var
ldr xD, [xA, #8]
where the ADD would have been needed anyway, and the two distinct addressing
modes can prevent the formation of an ldp. Because of how we handle ADRP
(aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also
results in duplicated ADRP instructions (one on its own to cover the ldr, and
one combined with the add).
llvm-svn: 223172
Reduce the number of nops emitted for stackmap shadows on AArch64 by counting
non-stackmap instructions up to the next branch target towards the requested
shadow.
<rdar://problem/14959522>
llvm-svn: 223156
Go through implicit defs of CSMI and MI, and clear the kill flags on
their uses in all the instructions between CSMI and MI.
We might have made some of the kill flags redundant, consider:
subs ... %NZCV<imp-def> <- CSMI
csinc ... %NZCV<imp-use,kill> <- this kill flag isn't valid anymore
subs ... %NZCV<imp-def> <- MI, to be eliminated
csinc ... %NZCV<imp-use,kill>
Since we eliminated MI, and reused a register imp-def'd by CSMI
(here %NZCV), that register, if it was killed before MI, should have
that kill flag removed, because it's lifetime was extended.
Also, add an exhaustive testcase for the motivating example.
Reviewed by: Juergen Ributzka <juergen@apple.com>
llvm-svn: 223133
The blocking code originated in ARM, which is more aggressive about casting
types to a canonical representative before doing anything else, so I missed out
most vector HFAs and broke the ABI. This should fix it.
llvm-svn: 223126
r208210 introduced an optimization that improves the vector select
codegen by doing the setcc on vectors directly.
This is a problem they the setcc operands are i1s, because the
optimization would create vectors of i1, which aren't legal.
Part of PR21549.
Differential Revision: http://reviews.llvm.org/D6308
llvm-svn: 223075
r213378 improved f16 bitcasts, so that they go directly through subregs,
instead of through the stack. That code now causes an assertion failure
for bitcasts from other 16-bits types (most importantly v2i8).
Correct that by doing the custom lowering for i16 bitcasts only when the
input is an f16.
Part of PR21549.
Differential Revision: http://reviews.llvm.org/D6307
llvm-svn: 223074
The MachineVerifier used to check that there was always exactly one
unconditional branch to a non-landingpad (normal) successor.
If that normal successor to an invoke BB is unreachable, it seems
reasonable to only have one successor, the landing pad.
On targets other than AArch64 (and on AArch64 with a different testcase),
the branch folder turns the branch to the landing pad into a fallthrough.
The MachineVerifier, which relies on AnalyzeBranch, is unable to check
the condition, and doesn't complain. However, it does in this specific
testcase, where the branch to the landing pad remained.
Make the MachineVerifier accept it.
llvm-svn: 223059
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.
This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.
llvm-svn: 222903
This mostly entails adding relocations, however there are a couple of
changes to existing relocations:
1. R_AARCH64_NONE is defined to be zero rather than 256
R_AARCH64_NONE has been defined to be zero for a long time elsewhere
e.g. binutils and glibc since the submission of the AArch64 port in
2012 so this is required for compatibility.
2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21
I don't think there is any way for relocation names to leak out of LLVM
so this should not break anything.
Tested with check-all with no regressions.
llvm-svn: 222821
The pattern matching failed to recognize all instances of "-1", because when
comparing against "-1" we didn't use an APInt of the same bitwidth.
This commit fixes this and also adds inverse versions of the conditon to catch
more cases.
llvm-svn: 222722
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)
A hook is added to allow the target to control whether it needs to do such combine.
Reviewed in http://reviews.llvm.org/D6334
llvm-svn: 222510
SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE
and LICM. By enabling these passes we can have better address calculations and generate a better addressing
mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57.
Reviewed in http://reviews.llvm.org/D5864.
llvm-svn: 222331
shift-right for booleans (i1).
Arithmetic shift-right immediate with sign-/zero-extensions also works for
boolean values. Update the assert and the test cases to reflect that fact.
llvm-svn: 222272
shift-right for booleans (i1).
Logical shift-right immediate with sign-/zero-extensions also works for boolean
values. Update the assert and the test cases to reflect that fact.
llvm-svn: 222270
Shifts also perform sign-/zero-extends to larger types, which requires us to emit
an integer extend instead of a simple COPY.
Related to PR21594.
llvm-svn: 222257
This change emits a COPY for a shift-immediate with a "zero" shift value.
This fixes PR21594 where we emitted a shift instruction with an incorrect
immediate operand.
llvm-svn: 222247
The generic FastISel code would bail, because it can't emit a sign-extend for
AArch64. This copies the code over and uses AArch64 specific emit functions.
This is not ideal and 'computeAddress' should handles this, so it can fold the
address computation into the memory operation.
I plan to clean up 'computeAddress' anyways, so I will add that in a future
commit.
Related to rdar://problem/18962471.
llvm-svn: 221923
This folds the compare emission into the select emission when possible, so we
can directly use the flags and don't have to emit a separate compare.
Related to rdar://problem/18960150.
llvm-svn: 221847
In the case we optimize an integer extend away and replace it directly with the
source register, we also have to clear all kill flags at all its uses.
This is necessary, because the orignal IR instruction might be trivially dead,
but we replaced it with a nop at MI level.
llvm-svn: 221628
Reversing a CB* instruction used to drop the flags on the condition. On the
included testcase, this lead to a read from an undefined vreg.
Using addOperand keeps the flags, here <undef>.
Differential Revision: http://reviews.llvm.org/D6159
llvm-svn: 221507
While fixing up the register classes in the machine combiner in a previous
commit I missed one.
This fixes the last one and adds a test case.
llvm-svn: 221308
Registers are not all equal. Some are not allocatable (infinite cost),
some have to be preserved but can be used, and some others are just free
to use.
Ensure there is a cost hierarchy reflecting this fact, so that the
allocator will favor scratch registers over callee-saved registers.
llvm-svn: 221293
Some literals in the AArch64 backend had 15 'f's rather than 16, causing
comparisons with a constant 0xffffffffffffffff to be miscompiled.
llvm-svn: 221157
r212242 introduced a legalizer hook, originally to let AArch64 widen
v1i{32,16,8} rather than scalarize, because the legalizer expected, when
scalarizing the result of a conversion operation, to already have
scalarized the operands. On AArch64, v1i64 is legal, so that commit
ensured operations such as v1i32 = trunc v1i64 wouldn't assert.
It did that by choosing to widen v1 types whenever possible. However,
v1i1 types, for which there's no legal widened type, would still trigger
the assert.
This commit fixes that, by only scalarizing a trunc's result when the
operand has already been scalarized, and introducing an extract_elt
otherwise.
This is similar to r205625.
Fixes PR20777.
llvm-svn: 220937
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.
Tests included.
rdar://18816719
llvm-svn: 220933
Benchmarks have shown that it's harmless to the performance there, and having a
unified set of passes between the two cores where possible helps big.LITTLE
deployment.
Patch by Z. Zheng.
llvm-svn: 220744
This is a minor change to use the immediate version when the operand is a null
value. This should get rid of an unnecessary 'mov' instruction in debug
builds and align the code more with the one generated by SelectionDAG.
This fixes rdar://problem/18785125.
llvm-svn: 220713
The pattern matching for a 'ConstantInt' value was too restrictive. Checking for
a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction.
This fixes rdar://problem/18784732.
llvm-svn: 220709
This fixes a bug where the input register was not defined for the 'tbz/tbnz'
instruction. This happened, because we folded the 'and' instruction from a
different basic block.
This fixes rdar://problem/18784013.
llvm-svn: 220704
At higher optimization levels the LLVM IR may contain more complex patterns for
loads/stores from/to frame indices. The 'computeAddress' function wasn't able to
handle this and triggered an assertion.
This fix extends the possible addressing modes for frame indices.
This fixes rdar://problem/18783298.
llvm-svn: 220700
This fixes a miscompilation in the AArch64 fast-isel which was
triggered when a branch is based on an icmp with condition eq or ne,
and type i1, i8 or i16. The cbz instruction compares the whole 32-bit
register, so values with the bottom 1, 8 or 16 bits clear would cause
the wrong branch to be taken.
llvm-svn: 220553
This adds support for legalization of instructions of the form:
[fp_conv] <1 x i1> %op to <1 x double>
where fp_conv is one of fpto[us]i, [us]itofp. This used to assert
because they were simply missing from the vector operand scalarizer.
A similar problem arose in r190830, with trunc instead.
Fixes PR20778.
Differential Revision: http://reviews.llvm.org/D5810
llvm-svn: 220533
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.
llvm-svn: 220321
Every target we support has support for assembly that looks like
a = b - c
.long a
What is special about MachO is that the above combination suppresses the
production of a relocation.
With this change we avoid producing the intermediary labels when they don't
add any value.
llvm-svn: 220256
We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.
While we're at it, avoid writing to std::vector::end()...
Bug found with random testing and a lot of coffee.
llvm-svn: 220051
When the constant divisor was larger than 32bits, then the optimized code
generated for the AArch64 backend would emit the wrong code, because the shift
was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would
loose the upper 32bits.
This fixes rdar://problem/18678801.
llvm-svn: 219934
This is mostly a copy of the existing FastISel GEP code, but we have to
duplicate it for AArch64, because otherwise we would bail out even for simple
cases. This is because the standard fastEmit functions don't cover MUL at all
and ADD is lowered very inefficientily.
The original commit had a bug in the add emit logic, which has been fixed.
llvm-svn: 219831
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.
Examples:
1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
to b.<invCC>
2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
to b.<CC>
rdar://problem/18506500
llvm-svn: 219742
This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail
out even for simple cases, because the standard fastEmit functions don't cover
MUL and ADD is lowered inefficientily.
llvm-svn: 219726
Sign-/zero-extend folding depended on the load and the integer extend to be
both selected by FastISel. This cannot always be garantueed and SelectionDAG
might interfer. This commit adds additonal checks to load and integer extend
lowering to catch this.
Related to rdar://problem/18495928.
llvm-svn: 219716
e.g Currently we'll generate following instructions if the immediate is too wide:
MOV X0, WideImmediate
ADD X1, BaseReg, X0
LDR X2, [X1, 0]
Using [Base+XReg] addressing mode can save one ADD as following:
MOV X0, WideImmediate
LDR X2, [BaseReg, X0]
Differential Revision: http://reviews.llvm.org/D5477
llvm-svn: 219665
Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is
possible for a 64-bit multiply-accumulate instruction in AArch64 state to
generate an incorrect result. The details are quite complex and hard to
determine statically, since branches in the code may exist in some
circumstances, but all cases end with a memory (load, store, or prefetch)
instruction followed immediately by the multiply-accumulate operation.
The safest work-around for this issue is to make the compiler avoid emitting
multiply-accumulate instructions immediately after memory instructions and the
simplest way to do this is to insert a NOP.
This patch implements such work-around in the backend, enabled via the option
-aarch64-fix-cortex-a53-835769.
The work-around code generation is not enabled by default.
llvm-svn: 219603
The code already folds sign-/zero-extends, but only if they are arguments to
mul and shift instructions. This extends the code to also fold them when they
are direct inputs.
llvm-svn: 219187
Tiny enhancement to the address computation code to also fold sub instructions
if the rhs is constant and can be folded into the offset.
llvm-svn: 219186
This commit fixes an issue with sign-/zero-extending loads that was discovered
by Richard Barton.
We use now the correct load instructions for sign-extending loads to 64bit. Also
updated and added more unit tests.
llvm-svn: 219185
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash. The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).
Original commit message follows.
--
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
llvm-svn: 219010
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
llvm-svn: 218914
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
llvm-svn: 218778
Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.
llvm-svn: 218751
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test for any of the lower 32bits we have to first
extract the subregister and use the 32bit version of the TBZ instruction (TBZW).
Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).
llvm-svn: 218693
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.
Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.
This fixes rdar://problem/18495928.
llvm-svn: 218653
If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.
This problem is found in spec2006-197.parser.
For example,
stur w10, [x11, #-4]
stur w10, [x11, #-4]
Then one of the two stur instructions can be removed.
Patch by David Xu!
llvm-svn: 218569
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.
This should fix a bug found by Chad.
llvm-svn: 218275
When looking through sign/zero-extensions the code would always assume there is
such an extension instruction and use the wrong operand for the address.
There was also a minor issue in the handling of 'AND' instructions. I
accidentially used a 'cast' instead of a 'dyn_cast'.
llvm-svn: 218161
With this optimization, we will not always insert zext for values crossing
basic blocks, but insert sext if the users of a value crossing basic block
has preference of sign predicate.
llvm-svn: 218101
When folding the intrinsic flag into the branch or select we also have to
consider the fact if the intrinsic got simplified, because it changes the
flag we have to check for.
llvm-svn: 218034
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, then we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.
llvm-svn: 218031
The 'AND' instruction could be used to mask out the lower 32 bits of a register.
If this is done inside an address computation we might be able to fold the
instruction into the memory instruction itself.
and x1, x1, #0xffffffff ---> ldrb x0, [x0, w1, uxtw]
ldrb x0, [x0, x1]
llvm-svn: 218030
This takes advanatage of the CBZ and CBNZ instruction to further optimize the
common null check pattern into a single instruction.
This is related to rdar://problem/18358882.
llvm-svn: 217972
This adds the last two missing floating-point condition codes (FCMP_UEQ and
FCMP_ONE) also to the branch selection. In these two cases an additonal branch
instruction is required.
This also adds unit tests to checks all the different condition codes.
This is related o rdar://problem/18358882.
llvm-svn: 217966
This adds the missing test case for the previous commit:
Allow handling of vectors during return lowering for little endian machines.
Sorry for the noise.
llvm-svn: 217847
This lowers frem to a runtime libcall inside fast-isel.
The test case also checks the CallLoweringInfo bug that was exposed by this
change.
This fixes rdar://problem/18342783.
llvm-svn: 217833
Do
(shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
This is already done for multiplies, but since multiplies
by powers of two are turned into shifts, we also need
to handle it here.
This might want checks for isLegalAddImmediate to avoid
transforming an add of a legal immediate with one that isn't.
llvm-svn: 217610
David Blaikie's commits r217563 & r217564, which added shared_ptr to the
CostPool have fixed some memory leak issues exposed by the PBQP with
coalescing constraints.
The sanitizer bot was failing because of those leaks. Now that the leaks
are gone, we can reenable the aarch64/pbqp test.
llvm-svn: 217580
This adds target specific support for using the PBQP register allocator on the
AArch64, for the A57 cpu.
By default, the PBQP allocator is not used, unless explicitely required
on the command line with "-aarch64-pbqp".
llvm-svn: 217504
using static relocation model and small code model.
Summary: currently we generate GOT based relocations for weak symbol
references regardless of the underlying relocation model. This should
be change so that in static relocation model we use a constant pool
load instead.
Patch from: Keith Walker
Reviewers: Renato Golin, Tim Northover
llvm-svn: 217503
Patched by Sergey Dmitrouk.
This pass tries to make consecutive compares of values use same operands to
allow CSE pass to remove duplicated instructions. For this it analyzes
branches and adjusts comparisons with immediate values by converting:
GE -> GT
GT -> GE
LT -> LE
LE -> LT
and adjusting immediate values appropriately. It basically corrects two
immediate values towards each other to make them equal.
llvm-svn: 217220
Follow up to r217138, extending the logic to other NEON-immediate instructions.
As before, the instruction already performs the correct operation and we're
just using a different type for convenience, so we want a true nop-cast.
Patch by Asiri Rathnayake.
llvm-svn: 217159