Move the hazard scheduling pass to after the long branch pass, as the
long branch pass can create forbiddden slot hazards. Rather than complicating
the implementation of the long branch pass to handle forbidden slot hazards,
just reorder the passes.
llvm-svn: 318657
The VSX versions have the advantage of a full 64-register target whereas the FP
ones have the advantage of lower latency and higher throughput. So what we’re
after is using the faster instructions in low register pressure situations and
using the larger register file in high register pressure situations.
The heuristic chooses between the following 7 pairs of instructions.
PPC::LXSSPX vs PPC::LFSX
PPC::LXSDX vs PPC::LFDX
PPC::STXSSPX vs PPC::STFSX
PPC::STXSDX vs PPC::STFDX
PPC::LXSIWAX vs PPC::LFIWAX
PPC::LXSIWZX vs PPC::LFIWZX
PPC::STXSIWX vs PPC::STFIWX
Differential Revision: https://reviews.llvm.org/D38486
llvm-svn: 318651
Summary:
This patch fixes an issue so that the right alias is printed when the instruction has tied operands. It checks the number of operands in the resulting instruction as opposed to the alias, and then skips over tied operands that should not be printed in the alias.
This allows to generate the preferred assembly syntax for the AArch64 'ins' instruction, which should always be displayed as 'mov' according to the ARM Architecture Reference Manual. Several unit tests have changed as a result, but only to reflect the preferred disassembly. Some other InstAlias patterns (movk/bic/orr) needed a slight adjustment to stop them becoming the default and breaking other unit tests.
Please note that the patch is mostly the same as https://reviews.llvm.org/D29219 which was reverted because of an issue found when running TableGen with the Address Sanitizer. That issue has been addressed in this iteration of the patch.
Reviewers: rengolin, stoklund, huntergr, SjoerdMeijer, rovka
Reviewed By: rengolin, SjoerdMeijer
Subscribers: fhahn, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D40030
llvm-svn: 318650
This patch adds a new abstraction layer to VPlan and leverages it to model the planned
instructions that manipulate masks (AND, OR, NOT), introduced during predication.
The new VPValue and VPUser classes model how data flows into, through and out
of a VPlan, forming the vertices of a planned Def-Use graph. The new
VPInstruction class is a generic single-instruction Recipe that models a
planned instruction along with its opcode, operands and users. See
VectorizationPlan.rst for more details.
Differential Revision: https://reviews.llvm.org/D38676
llvm-svn: 318645
Add instruction selector test for RSBri, which is derived from
AsI1_rbin_irs, and make sure it doesn't get mistaken for SUBri, which is
derived from the very similar AsI1_bin_irs pattern.
llvm-svn: 318643
Remove some of the instruction selector tests for binary operators (and,
or, xor). These are all derived from the same kind of TableGen pattern,
AsI1_bin_irs, so there's no point in testing all of them.
llvm-svn: 318642
In rL316552, we ban intersection of unsigned latch range with signed range check and vice
versa, unless the entire range check iteration space is known positive. It was a correct
functional fix that saved us from dealing with ambiguous values, but it also appeared
to be a very restrictive limitation. In particular, in the following case:
loop:
%iv = phi i32 [ 0, %preheader ], [ %iv.next, %latch]
%iv.offset = add i32 %iv, 10
%rc = icmp slt i32 %iv.offset, %len
br i1 %rc, label %latch, label %deopt
latch:
%iv.next = add i32 %iv, 11
%cond = icmp i32 ult %iv.next, 100
br it %cond, label %loop, label %exit
Here, the unsigned iteration range is `[0, 100)`, and the safe range for range
check is `[-10, %len - 10)`. For unsigned iteration spaces, we use unsigned
min/max functions for range intersection. Given this, we wanted to avoid dealing
with `-10` because it is interpreted as a very big unsigned value. Semantically, range
check's safe range goes through unsigned border, so in fact it is two disjoint
ranges in IV's iteration space. Intersection of such ranges is not trivial, so we prohibited
this case saying that we are not allowed to intersect such ranges.
What semantics of this safe range actually means is that we can start from `-10` and go
up increasing the `%iv` by one until we reach `%len - 10` (for simplicity let's assume that
`%len - 10` is a reasonably big positive value).
In particular, this safe iteration space includes `0, 1, 2, ..., %len - 11`. So if we were able to return
safe iteration space `[0, %len - 10)`, we could safely intersect it with IV's iteration space. All
values in this range are non-negative, so using signed/unsigned min/max for them is unambiguous.
In this patch, we alter the algorithm of safe range calculation so that it returnes a subset of the
original safe space which is represented by one continuous range that does not go through wrap.
In order to reach this, we use modified SCEV substraction function. It can be imagined as a function
that substracts by `1` (or `-1`) as long as the further substraction does not cause a wrap in IV iteration
space. This allows us to perform IRCE in many situations when we deal with IV space and range check
of different types (in terms of signed/unsigned).
We apply this approach for both matching and not matching types of IV iteration space and the
range check. One implication of this is that now IRCE became smarter in detection of empty safe
ranges. For example, in this case:
loop:
%iv = phi i32 [ %begin, %preheader ], [ %iv.next, %latch]
%iv.offset = sub i32 %iv, 10
%rc = icmp ult i32 %iv.offset, %len
br i1 %rc, label %latch, label %deopt
latch:
%iv.next = add i32 %iv, 11
%cond = icmp i32 ult %iv.next, 100
br it %cond, label %loop, label %exit
If `%len` was less than 10 but SCEV failed to trivially prove that `%begin - 10 >u %len- 10`,
we could end up executing entire loop in safe preloop while the main loop was still generated,
but never executed. Now, cutting the ranges so that if both `begin - 10` and `%len - 10` overflow,
we have a trivially empty range of `[0, 0)`. This in some cases prevents us from meaningless optimization.
Differential Revision: https://reviews.llvm.org/D39954
llvm-svn: 318639
We must collect all AddModes even if they are the same.
This is due to Original value is different but we need all original
values collected as they are used as anchors in common phi finding.
Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40166
llvm-svn: 318638
As the first test shows, we could transform an llvm intrinsic which never sets errno
into a libcall which could set errno (even though it's marked readnone?), so that's
not ideal.
It's possible that we can also transform a libcall which could set errno to an
intrinsic given the fast-math-flags constraint, but that's deferred to determine
exactly which set of FMF are needed.
Differential Revision: https://reviews.llvm.org/D40150
llvm-svn: 318628
The 'ord' and 'uno' predicates have a logic operation for NAN built into their definitions:
FCMP_ORD = 7, ///< 0 1 1 1 True if ordered (no nans)
FCMP_UNO = 8, ///< 1 0 0 0 True if unordered: isnan(X) | isnan(Y)
So we can simplify patterns like this:
(fcmp ord (known NNAN), X) && (fcmp ord X, Y) --> fcmp ord X, Y
(fcmp uno (known NNAN), X) || (fcmp uno X, Y) --> fcmp uno X, Y
It might be better to split this into (X uno 0) | (Y uno 0) as a canonicalization, but that
would be another patch.
Differential Revision: https://reviews.llvm.org/D40130
llvm-svn: 318627
kernel verifier is becoming smarter and soon will support
direct and indirect function calls.
Remove obsolete error from BPF backend.
Make call to use PCRel_4 fixup.
'bpf to bpf' calls are distinguished from 'bpf to kernel' calls
by insn->src_reg == BPF_PSEUDO_CALL == 1 which is used as relocation
indicator similar to ld_imm64->src_reg == BPF_PSEUDO_MAP_FD == 1
The actual 'call' instruction remains the same for both
'bpf to kernel' and 'bpf to bpf' calls.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 318614
PR34553 has gone, adding tests to ensure it doesn't come back.
vselect_packss_v16i64 still has some awful codegen on AVX512 targets....
llvm-svn: 318599
Summary:
With this patch I tried to reduce the complexity of the code sightly, by
removing some indirection. Please let me know what you think.
Reviewers: junbuml, mcrosier, davidxl
Reviewed By: junbuml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40037
llvm-svn: 318593
This makes sure that functions that only clobber xmm registers
(on win64) also get the right cfi directives, if dwarf exceptions
are enabled.
Differential Revision: https://reviews.llvm.org/D40191
llvm-svn: 318591
We used to detect loads feeding fp instructions, but we were
failing to take into account cases where this happens through copies.
For instance, loads can fed copies coming from the ABI lowering
of floating point arguments/results.
llvm-svn: 318589
We used to detect that stores were fed by fp instructions, but we were
failing to take into account cases where this happens through copies.
For instance, stores can be fed by copies coming from the ABI lowering
of floating point arguments.
llvm-svn: 318588
Instead of asserting that the type sizes are exactly equal, we check
that the new size is big enough to contain the original type.
We have to relax this constrain because, right now, we sometimes
specify that things that are smaller than a storage type are legal
instead of widening everything to the size of a storage type.
E.g., we say that G_AND s16 is legal and we map that on GPR32.
This is something we may revisit in the future (either by changing
the legalization process or keeping track separately of the storage
size and the size of the type), but let us reflect the reality of
the situation for now.
llvm-svn: 318587
Revert the following commits:
r318369 [asan] Fallback to non-ifunc dynamic shadow on android<22.
r318235 [asan] Prevent rematerialization of &__asan_shadow.
r317948 [sanitizer] Remove unnecessary attribute hidden.
r317943 [asan] Use dynamic shadow on 32-bit Android.
MemoryRangeIsAvailable() reads /proc/$PID/maps into an mmap-ed buffer
that may overlap with the address range that we plan to use for the
dynamic shadow mapping. This is causing random startup crashes.
llvm-svn: 318575
Differential Revision: https://reviews.llvm.org/D39737
This is the second attempt to commit this. The test was broken on Linux in the first attempt.
llvm-svn: 318560
This is mostly moving VMEM clause breaking into
the hazard recognizer. Also move another hazard
currently handled in the waitcnt pass.
Also stops breaking clauses unless xnack is enabled.
llvm-svn: 318557
Summary: This change fix PR35342 by replacing only the current use with undef in unreachable blocks.
Reviewers: efriedma, mcrosier, igor-laevsky
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40184
llvm-svn: 318551
making it no longer even remotely simple.
The pass will now be more of a "full loop unswitching" pass rather than
anything substantively simpler than any other approach. I plan to rename
it accordingly once the dust settles.
The key ideas of the new loop unswitcher are carried over for
non-trivial unswitching:
1) Fully unswitch a branch or switch instruction from inside of a loop to
outside of it.
2) Update the CFG and IR. This avoids needing to "remember" the
unswitched branches as well as avoiding excessively cloning and
reliance on complex parts of simplify-cfg to cleanup the cfg.
3) Update the analyses (where we can) rather than just blowing them away
or relying on something else updating them.
Sadly, #3 is somewhat compromised here as the dominator tree updates
were too complex for me to want to reason about. I will need to make
another attempt to do this now that we have a nice dynamic update API
for dominators. However, we do adhere to #3 w.r.t. LoopInfo.
This approach also adds an important principls specific to non-trivial
unswitching: not *all* of the loop will be duplicated when unswitching.
This fact allows us to compute the cost in terms of how much *duplicate*
code is inserted rather than just on raw size. Unswitching conditions
which essentialy partition loops will work regardless of the total loop
size.
Some remaining issues that I will be addressing in subsequent commits:
- Handling unstructured control flow.
- Unswitching 'switch' cases instead of just branches.
- Moving to the dynamic update API for dominators.
Some high-level, interesting limitationsV that folks might want to push
on as follow-ups but that I don't have any immediate plans around:
- We could be much more clever about not cloning things that will be
deleted. In fact, we should be able to delete *nothing* and do
a minimal number of clones.
- There are many more interesting selection criteria for which branch to
unswitch that we might want to look at. One that I'm interested in
particularly are a set of conditions which all exit the loop and which
can be merged into a single unswitched test of them.
Differential revision: https://reviews.llvm.org/D34200
llvm-svn: 318549
The assertion was introduced in r317853 but there are cases when a call
isn't handled either as direct or indirect. In this case we add a
reference graph edge but not a call graph edge.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, inglorion, eraman, hiraditya, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D40056
llvm-svn: 318540
Fix test as it is assuming that the cache pruning is always being
performed by default. Explicitly set prune interval to 0s to ensure
pruning is always performed.
llvm-svn: 318520
Enabling and using dwarf exceptions seems like an easier path
to take, than to make the COFF/ARM backend output EHABI directives.
Previously, no EH model was enabled at all on this target.
There's no point in setting UseIntegratedAssembler to false since
GNU binutils doesn't support Windows on ARM, and since we don't
need to support external assembler, we don't need to use register
numbers in cfi directives.
Differential Revision: https://reviews.llvm.org/D39532
llvm-svn: 318510
The logic of replacing of a couple `RANGE_CHECK_LOWER + RANGE_CHECK_UPPER`
into `RANGE_CHECK_BOTH` in fact duplicates the logic of range intersection which
happens when we calculate safe iteration space. Effectively, the result of intersection of
these ranges doesn't differ from the range of merged range check.
We chose to remove duplicating logic in favor of code simplicity.
Differential Revision: https://reviews.llvm.org/D39589
llvm-svn: 318508
We might have instructions such as ext(copy(trunc)), and while cleaning
up legalization artifacts, we can also dce the copies that are in
between legalization artifacts.
llvm-svn: 318501
Add additional RUN clauses to test for -asan-mapping-scale=5 in
selective tests, with special CHECK statements where needed.
Differential Revision: https://reviews.llvm.org/D39775
llvm-svn: 318493
iterator to walk the list which keeps changing inside the loop. When the
UseList contains several uses with the same user, we end processing the same
user more than once, which leads to an assert.
With this fix, unique users are saved and processed later to avoid
processing duplicates.
Differential Revision: https://reviews.llvm.org/D39864
llvm-svn: 318477
't' constraint normally only accepts f32 operands, but for VCVT the
operands can be i32. LLVM is overly restrictive and rejects asm like:
float foo() {
float result;
__asm__ __volatile__(
"vcvt.f32.s32 %[result], %[arg1]\n"
: [result]"=t"(result)
: [arg1]"t"(0x01020304) );
return result;
}
Relax the value type for 't' constraint to either f32 or i32.
Differential Revision: https://reviews.llvm.org/D40137
llvm-svn: 318472
Only do this pre-legalize in case we're using the sign extend to legalize for KNL.
This recovers all of the tests that changed when I stopped SelectionDAGBuilder from deleting sign extends.
There's more work that could be done here particularly to fix the i8->i64 test case that experienced split.
llvm-svn: 318468
Previously SelectionDAGBuilder would remove this sign extend leading to a failure during isel.
The codegen here isn't very nice as we ended up triggering a split.
llvm-svn: 318467
The sign extend might be from an i16 or i8 type and was inserted by InstCombine to match the pointer width. X86 gather legalization isn't currently detecting this to reinsert a sign extend to make things legal.
It's a bit weird for the SelectionDAGBuilder to do this kind of optimization in the first place. With this removed we can at least lean on InstCombine somewhat to ensure the index is i32 or i64.
I'll work on trying to recover some of the test cases by removing sign extends in the backend when its safe to do so with an understanding of the current legalizer capabilities.
This should fix PR30690.
llvm-svn: 318466
The wider element type will normally cause legalize to try to split and scalarize the gather/scatter, but we can't handle that. Instead, truncate the index early so the gather/scatter node is insulated from the legalization.
This really shouldn't happen in practice since InstCombine will normalize index types to the same size as pointers.
llvm-svn: 318452
This patch changes all i32 constant in store instruction to i64 with truncation, to increase the chance that the referenced constant can be shared with other i64 constant.
Differential Revision: https://reviews.llvm.org/D39352
llvm-svn: 318436
Summary:
This change introduces a `DynamicSymbols` field to the ELF specific YAML
supported by `yaml2obj` and `obj2yaml`. This grouping of symbols provides a way
to represent ELF dynamic symbols. The `DynamicSymbols` structure is identical to
the existing `Symbols`.
Reviewers: compnerd, jakehehrlich, silvas
Reviewed By: silvas
Subscribers: silvas, jakehehrlich, llvm-commits
Differential Revision: https://reviews.llvm.org/D39582
llvm-svn: 318433
Summary:
This change fixes a bug where `obj2yaml` can in some cases produce YAML that
causes `yaml2obj` to error.
The ELF YAML document structure has a `Sections` mapping, which contains three
mappings, all of which are optional: `Local`, `Global`, and `Weak.` Any one of
these can be missing, but if all three are missing, then `yaml2obj` errors. This
change allows YAML input for cases like this one.
I have tested this with check-llvm and check-lld, and all tests passed.
This change is the result of test failures while working on D39582, which
introduces a `DynamicSymbols` mapping, which will be empty at times.
Reviewers: compnerd, jakehehrlich, silvas, kledzik, mehdi_amini, pcc
Reviewed By: compnerd
Subscribers: silvas, llvm-commits
Differential Revision: https://reviews.llvm.org/D39908
llvm-svn: 318428
llvm.invariant.group.barrier may accept pointers to arbitrary address space.
This patch let it accept pointers to i8 in any address space and returns
pointer to i8 in the same address space.
Differential Revision: https://reviews.llvm.org/D39973
llvm-svn: 318413
// trunc (binop X, C) --> binop (trunc X, C')
// trunc (binop (ext X), Y) --> binop X, (trunc Y)
I'm grouping sub with the other binops because that makes the code simpler
and the transforms are valid:
https://rise4fun.com/Alive/UeF
...so even though we don't expect a sub with constant Op1 or any of the
other opcodes with constant Op0 due to canonicalization rules, we might as
well handle those situations if non-canonical code somehow reaches this
point (it should just make instcombine more efficient in reaching its
end goal).
This should solve the problem that later manifests in the vectorizers in
PR35295:
https://bugs.llvm.org/show_bug.cgi?id=35295
llvm-svn: 318404
Fix a couple places where the minimum alignment/size should be a
function of the shadow granularity:
- alignment of AllGlobals
- the minimum left redzone size on the stack
Added a test to verify that the metadata_array is properly aligned
for shadow scale of 5, to be enabled when we add build support
for testing shadow scale of 5.
Differential Revision: https://reviews.llvm.org/D39470
llvm-svn: 318395
SelectionDAGBuilder::visitAlloca assumes alloca address space is 0, which is
incorrect for triple amdgcn---amdgiz and causes isel failure.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D40095
llvm-svn: 318392
Change the calculation for the desired ValueType for non-sign
extending loads, as in those cases we don't care about the
higher bits. This creates a smaller ExtVT and allows for such
combinations as:
(srl (zextload i16, [addr]), 8) -> (zextload i8, [addr + 1])
Differential Revision: https://reviews.llvm.org/D40034
llvm-svn: 318390
This patch contains more accurate cost of interelaved load\store of stride 2 for the types int64\double on AVX2.
Reviewers: delena, RKSimon, craig.topper, dorit
Reviewed By: dorit
Differential Revision: https://reviews.llvm.org/D40008
llvm-svn: 318385
When expanding exit conditions for pre- and postloops, we may end up expanding a
recurrency from the loop to in its loop's preheader. This produces incorrect IR.
This patch ensures that IRCE uses SCEVExpander correctly and only expands code which
is safe to expand in this particular location.
Differentian Revision: https://reviews.llvm.org/D39234
llvm-svn: 318381
The type legalizer will try to scalarize these operations if it sees them, but there is no handling for scalarizing them. This leads to a fatal error. With this change they will now be scalarized by the mem intrinsic scalarizing pass before SelectionDAG.
llvm-svn: 318380
processDbgDeclares assumes pointer size is the same for different addr spaces.
It uses pointer size for addr space 0 for all pointers, which causes assertion
in stripAndAccumulateInBoundsConstantOffsets for amdgcn---amdgiz since
pointer in addr space 5 has different size than in addr space 0.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D40085
llvm-svn: 318370
Add hook in BPF backend so that llvm-objdump can print out
the jmp target with label names, e.g.,
...
if r1 != 2 goto 6 <LBB0_2>
...
goto 7 <LBB0_4>
...
LBB0_2:
...
LBB0_4:
...
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 318358
Summary:
This patch adds a LLVM_ENABLE_GISEL_COV which, like LLVM_ENABLE_DAGISEL_COV,
causes TableGen to instrument the generated table to collect rule coverage
information. However, LLVM_ENABLE_GISEL_COV goes a bit further than
LLVM_ENABLE_DAGISEL_COV. The information is written to files
(${CMAKE_BINARY_DIR}/gisel-coverage-* by default). These files can then be
concatenated into ${LLVM_GISEL_COV_PREFIX}-all after which TableGen will
read this information and use it to emit warnings about untested rules.
This technique could also be used by SelectionDAG and can be further
extended to detect hot rules and give them priority over colder rules.
Usage:
* Enable LLVM_ENABLE_GISEL_COV in CMake
* Build the compiler and run some tests
* cat gisel-coverage-[0-9]* > gisel-coverage-all
* Delete lib/Target/*/*GenGlobalISel.inc*
* Build the compiler
Known issues:
* ${LLVM_GISEL_COV_PREFIX}-all must be generated as a manual
step due to a lack of a portable 'cat' command. It should be the
concatenation of all ${LLVM_GISEL_COV_PREFIX}-[0-9]* files.
* There's no mechanism to discard coverage information when the ruleset
changes
Depends on D39742
Reviewers: ab, qcolombet, t.p.northover, aditya_nandakumar, rovka
Reviewed By: rovka
Subscribers: vsk, arsenm, nhaehnle, mgorny, kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D39747
llvm-svn: 318356
Use VOP3 add/addc like usual.
This has some tradeoffs. Inline immediates fold
a little better, but other constants are worse off.
SIShrinkInstructions could be made smarter to handle
these cases.
This allows us to avoid selecting scalar adds where we
need to track the carry in scc and replace its users.
This makes it easier to use the carryless VALU adds.
llvm-svn: 318340
The original -O binary implementation just copied segment data from the
object and dumped it into a file. This doesn't take into account any
operations performed on objects such as section removal. GNU objcopy has
some specific behavior that we'd also like to respect. For instance
using -O binary and -j <some_section> will dump <some_section> to a
file. This change implements GNU objcopy style -O binary to as close of
an approximation as I can determine.
Differential Revision: https://reviews.llvm.org/D39713
llvm-svn: 318324
Note that one-use and shouldChangeType() are checked ahead of the switch.
Without the narrowing folds, we can produce inferior vector code as shown in PR35299:
https://bugs.llvm.org/show_bug.cgi?id=35299
llvm-svn: 318323
Implements TargetLowering callback 'mayBeEmittedAsTailCall' that enables
CodeGenPrepare to duplicate returns when they might enable a tail-call.
Differential Revision: https://reviews.llvm.org/D39777
llvm-svn: 318321
InstCombine salvages debug info for every instruction it erases from its
worklist, but it wasn't doing it during its initial DCE when populating
its worklist. This fixes that.
This should help improve availability of 'this' in optimized debug info
when casts are necessary.
llvm-svn: 318320
Some CPUs are already overriding these sign extension instructions but we should be able to use the WriteALU schedule class by default.
Differential Revision: https://reviews.llvm.org/D39899
llvm-svn: 318308
Summary:
Added more remarks to SLP pass, in particular "missed" optimization remarks.
Also proposed several tests for new functionality.
Patch by Vladimir Miloserdov!
For reference you may look at: https://reviews.llvm.org/rL302811
Reviewers: anemet, fhahn
Reviewed By: anemet
Subscribers: javed.absar, lattner, petecoup, yakush, llvm-commits
Differential Revision: https://reviews.llvm.org/D38367
llvm-svn: 318307
This is a recommit of r316869 which was speculatively reverted with r317444 and
subsequently shown to not be the cause of PR35210. That crash should be fixed
after r318237.
Original commit message:
The old PM sets the options of what used to be known as "latesimplifycfg" on the
instantiation after the vectorizers have run, so that's what we'redoing here.
FWIW, there's a later SimplifyCFGPass instantiation in both PMs where we do not
set the "late" options. I'm not sure if that's intentional or not.
Differential Revision: https://reviews.llvm.org/D39407
llvm-svn: 318299
Summary:
Prevent an issue where a diagnostic is reported multiple times by bailing out with a ParseFail if an invalid SVE register element qualifier/suffix is specified, for example:
<stdin>:10:18: error: invalid sve vector kind qualifier
add z20.h, z2.h, z31.x
^
<stdin>:10:18: error: invalid sve vector kind qualifier
add z20.h, z2.h, z31.x
...
<stdin>:10:18: error: invalid sve vector kind qualifier
add z20.h, z2.h, z31.x
^
Reviewers: fhahn, rengolin
Reviewed By: rengolin
Subscribers: aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D39894
llvm-svn: 318297
APInt is now used instead of uint64_t in function genConstMult() allowing
multiplication optimizations with constants of arbitrary length.
Patch by Milos Stojanovic.
Differential Revision: https://reviews.llvm.org/D38130
llvm-svn: 318296
In constructAbstractSubprogramScopeDIE there can be a potential mismatch
between `this` and the CU of ContextDIE when a scope is shared between
two DISubprograms belonging to a different CU. In that case, `this` is
the CU that was specified in the IR, but the CU of ContextDIE is that of
the first subprogram that was emitted. This patch fixes the mismatch by
looking up the CU of ContextDIE, and switching to use that.
This fixes PR35212 (https://bugs.llvm.org/show_bug.cgi?id=35212)
Patch by Philip Craig!
Differential revision: https://reviews.llvm.org/D39981
llvm-svn: 318289
Summary:
This fixes PR35241.
When using byval, the data is effectively copied as part of the call
anyway, so the pointer returned by the alloca will not be leaked to the
callee and thus there is no reason to issue a warning.
Reviewers: rnk
Reviewed By: rnk
Subscribers: Ka-Ka, llvm-commits
Differential Revision: https://reviews.llvm.org/D40009
llvm-svn: 318279
Summary:
This patch optimizes a binop sandwiched between 2 selects with the same condition. Since we know its only used by the select we can propagate the appropriate input value from the earlier select.
As I'm writing this I realize I may need to avoid doing this for division in case the select was protecting a divide by zero?
Reviewers: spatel, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39999
llvm-svn: 318267
It crashes building sqlite; see reply on the llvm-commits thread.
> [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops.
>
> Patch tries to improve vectorization of the following code:
>
> void add1(int * __restrict dst, const int * __restrict src) {
> *dst++ = *src++;
> *dst++ = *src++ + 1;
> *dst++ = *src++ + 2;
> *dst++ = *src++ + 3;
> }
> Allows to vectorize even if the very first operation is not a binary add, but just a load.
>
> Fixed issues related to previous commit.
>
> Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
>
> Reviewed By: ABataev, RKSimon
>
> Subscribers: llvm-commits, RKSimon
>
> Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 318239
Simplifying a loop latch changes the IR and we need to make sure the pass manager knows to invalidate analysis passes if that happened.
PR35210 discovered a case where we failed to invalidate the post dominator tree after this simplification because we no changes other than simplifying the loop latch.
Fixes PR35210.
Differential Revision: https://reviews.llvm.org/D40035
llvm-svn: 318237
Summary:
In the mode when ASan shadow base is computed as the address of an
external global (__asan_shadow, currently on android/arm32 only),
regalloc prefers to rematerialize this value to save register spills.
Even in -Os. On arm32 it is rather expensive (2 loads + 1 constant
pool entry).
This changes adds an inline asm in the function prologue to suppress
this behavior. It reduces AsanTest binary size by 7%.
Reviewers: pcc, vitalybuka
Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40048
llvm-svn: 318235
The test was doing -stop-after=isel, but that pass is actually the
AMDGPUDAGToDAGISel pass, which might not be built when targeting x86_64.
This changes the test to -stop-after=expand-isel-pseudos instead.
Follow-up to r318202.
llvm-svn: 318220
Allows users to view GraphResult objects in a DOT directed-graph format. This feature can be turned on through the --print-graphs flag.
Also enabled pretty-printing of instructions in output. Together these features make analysis of unprotected CF instructions much easier by providing a visual control flow graph.
Reviewers: pcc
Subscribers: llvm-commits, kcc, vlad.tsyrklevich
Differential Revision: https://reviews.llvm.org/D39819
llvm-svn: 318211
artifacts along with DCE
Legalization Artifacts are all those insts that are there to make the
type system happy. Currently, the target needs to say all combinations
of extends and truncs are legal and there's no way of verifying that
post legalization, we only have *truly* legal instructions. This patch
changes roughly the legalization algorithm to process all illegal insts
at one go, and then process all truncs/extends that were added to
satisfy the type constraints separately trying to combine trivial cases
until they converge. This has the added benefit that, the target
legalizerinfo can only say which truncs and extends are okay and the
artifact combiner would combine away other exts and truncs.
Updated legalization algorithm to roughly the following pseudo code.
WorkList Insts, Artifacts;
collect_all_insts_and_artifacts(Insts, Artifacts);
do {
for (Inst in Insts)
legalizeInstrStep(Inst, Insts, Artifacts);
for (Artifact in Artifacts)
tryCombineArtifact(Artifact, Insts, Artifacts);
} while(!Insts.empty());
Also, wrote a simple wrapper equivalent to SetVector, except for
erasing, it avoids moving all elements over by one and instead just
nulls them out.
llvm-svn: 318210
This adjusts the tests to hopfully pacify the
llvm-clang-x86_64-expensive-checks-win buildbot.
Unlike many other instructions, these instructions have aliases which
take coprocessor registers, gpr register, accumulator (and dsp accumulator)
registers, floating point registers, floating point control registers and
coprocessor 2 data and control operands.
For the moment, these aliases are treated as pseudo instructions which are
expanded into the underlying instruction. As a result, disassembling these
instructions shows the underlying instruction and not the alias.
Reviewers: slthakur, atanasyan
Differential Revision: https://reviews.llvm.org/D35253
llvm-svn: 318207
Summary:
Instcombine (and probably other passes) sometimes want to change the
type of an alloca. To do this, they generally create a new alloca with
the desired type, create a bitcast to make the new pointer type match
the old pointer type, replace all uses with the cast, and then simplify
the casts. We already knew how to salvage dbg.value instructions when
removing casts, but we can extend it to cover dbg.addr and dbg.declare.
Fixes a debug info quality issue uncovered in Chromium in
http://crbug.com/784609
Reviewers: aprantl, vsk
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40042
llvm-svn: 318203
This patch peels off the top case in switch statement into a branch if the
probability exceeds a threshold. This will help the branch prediction and
avoids the extra compares when lowering into chain of branches.
Differential Revision: http://reviews.llvm.org/D39262
llvm-svn: 318202
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.
This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)
LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.
Differential Revision: https://reviews.llvm.org/D39287
llvm-svn: 318195
Patch tries to improve vectorization of the following code:
void add1(int * __restrict dst, const int * __restrict src) {
*dst++ = *src++;
*dst++ = *src++ + 1;
*dst++ = *src++ + 2;
*dst++ = *src++ + 3;
}
Allows to vectorize even if the very first operation is not a binary add, but just a load.
Fixed issues related to previous commit.
Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev
Reviewed By: ABataev, RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D28907
llvm-svn: 318193
In rare cases, common code will attempt to select an OR of two
constants. This confuses the logic in splitLargeImmediate,
causing an internal error during isel. Fixed by simply leaving
this case to common code to handle.
This fixes PR34859.
llvm-svn: 318187
They don't actually change nay behaviour, as llvm-strings currently
checks the whole object without looking at individual sections anyway.
This allows using llvm-strings in a context that explicitly passes
the -a option.
Differential Revision: https://reviews.llvm.org/D40020
llvm-svn: 318185
Summary:
Bypass of slow divs based on operand values is currently disabled for
-Os. Do the same when profile summary is available and the working set
size of the application is huge. This is similar to how loop peeling is
guarded by hasHugeWorkingSetSize. In the div bypass case, the generated
extra code (and the extra branch) tendss to outweigh the benefits of the
bypass. This results in noticeable performance improvement on an
internal application.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39992
llvm-svn: 318179
Before using the 32-bit RISBMux set of instructions we need to
verify that the input bits are actually within range of the 32-bit
instruction. This fixer PR35289.
llvm-svn: 318177
This change adds a new flag not present in GNU objcopy that we call
--strip-non-alloc.
Differential Revision: https://reviews.llvm.org/D39926
llvm-svn: 318168
TargetLowering::LowerCallTo assumes that sret value type corresponds to a
pointer in default address space, which is incorrect, since sret value type
should correspond to a pointer in alloca address space, which may not
be the default address space. This causes assertion for amdgcn target
in amdgiz environment.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39996
llvm-svn: 318167
We haven't been supporting anything but ELF64LE since the start. Luckily
this was always accounted for and the change is pretty trivial. B35281
requests this change for ELF32LE. This change adds support for ELF32LE,
ELF64BE, and ELF32BE with all supported features that already existed
for ELF64LE.
Differential Revision: https://reviews.llvm.org/D39977
llvm-svn: 318166
This patch is part of D38676.
The patch introduces two new Recipes to handle instructions whose vectorization
involves masking. These Recipes take VPlan-level masks in D38676, but still rely
on ILV's existing createEdgeMask(), createBlockInMask() in this patch.
VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence
of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(),
which now handles only loop-header phi nodes.
VPWidenMemoryInstructionRecipe handles load/store which are to be widened
(but are not part of an Interleave Group). In this patch it simply calls
ILV::vectorizeMemoryInstruction on execute().
Differential Revision: https://reviews.llvm.org/D39068
llvm-svn: 318149
Because the block-splitting code is multi-purpose, we have to meddle with the
branches when using it to fixup a conditional branch destination. We got the
code right, but forgot to update the CFG so the verifier complained when
expensive checks were on.
Probably harmless since constant-islands comes so late, but best to fix it
anyway.
llvm-svn: 318148
Get rid of the handwritten instruction selector code for handling
G_CONSTANT. This code wasn't checking all the preconditions correctly
anyway, so it's better to leave it to TableGen, which can handle at
least some cases correctly (e.g. MOVi, MOVi16, folding into binary
operations). Also add tests to cover those cases.
llvm-svn: 318146
When we emit a tail call for Armv8-M, but then discover that the caller needs to
save/restore `LR`, we convert the tail call to an ordinary one, since restoring
`LR` takes extra instructions, which may negate the benefits of the tail
call. If the callee, however, takes stack arguments, this conversion is
incorrect, since nothing has been done to pass the stack arguments.
Thus the patch reverts https://reviews.llvm.org/rL294000
Also, we improve the instruction sequence for popping `LR` in the case when we
couldn't immediately find a scratch low register, but we can use as a temporary
one of the callee-saved low registers and restore `LR` before popping other
callee-saves.
Differential Revision: https://reviews.llvm.org/D39599
llvm-svn: 318143
This test was originally added when an old bug was fixed that caused
broken iterator code to break basic block placement.
The issue has an extremely low chance of every being a problem again.
This specific test is very flaky and fails often due to upstream
changes.
I have removed this test because it negates more value than it returns.
llvm-svn: 318134
If the register from the copy from exec was spilled,
the copy before the spill was deleted leaving a spill
of undefined register verifier error and miscompiling.
Check for other use instructions of the copy register.
llvm-svn: 318132
Registers it and everything, updates all the references, etc.
Next patch will add support to Clang's `-fexperimental-new-pass-manager`
path to actually enable BoundsChecking correctly.
Differential Revision: https://reviews.llvm.org/D39084
llvm-svn: 318128
For now at least. We clearly need some kind of comdat or
linkonce_odr support for wasm but currently COMDAT is not
supported.
Disable COMDAT support in the same way we do the Mach-O. This
also causes clang not to generated COMDATs.
Differential Revision: https://reviews.llvm.org/D39873
llvm-svn: 318123
Many projects use this option. There are two ways to use it. You can
either a) Just use --strip-debug and keep the old file with debug
content or b) you can use --strip-debug, --only-keep-debug, and
--add-gnu-debuglink all in conjunction to create two separate files, the
stripped file and the debug file. --only-keep-debug is more complicated
than --strip-debug because it keeps the section headers without keeping
section contents. That's not really supported by llvm-objcopy at the
moment but I plan on adding it. So this change just supports a) and
options to support b) will come soon.
Differential Revision: https://reviews.llvm.org/D39919
llvm-svn: 318094
This change adds a slightly less extreme form of stripping. It should
remove any section that starts with ".debug" and should remove any
symbol table or relocations. In general this strips out most of the
stuff you don't need to execute but leaves a number of things around.
This behavior has been designed to be compatible with GNU strip/objcopy
--strip-all so that anywhere you currently use --strip-all you should be
able to use llvm-objcopy as a drop in replacement.
Differential Revision: https://reviews.llvm.org/D39769
llvm-svn: 318092
Summary:
This fixes PR35221.
Use pseudo-instructions to let MachineCSE hoist global address computation.
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39871
llvm-svn: 318081
If the base of our gather corresponds to something contained in X86ISD::Wrapper we should be able to fold it into the address.
This patch refactors some of the address matching to more fully use the X86ISelAddressMode struct and the getAddressOperands helper. A new helper function matchVectorAddress is added to call matchWrapper or fall back to matchAddressBase.
We should also be able to support constant offsets from a wrapper, but I'll look into that in a future patch. We may even be able to completely reuse matchAddress here, but I wanted to start simple and work up to it.
Differential Revision: https://reviews.llvm.org/D39927
llvm-svn: 318057
Summary:
If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.
Will fix PR 34538
Reviewers: sanjoy, hfinkel, junryoungju
Reviewed By: sanjoy, junryoungju
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38494
llvm-svn: 318050
Make one of the legalizer tests a bit more robust by making sure all
values we're interested in are used (either in a store or a return) and
by using loads instead of constants for obtaining values on fewer than
32 bits. This should make the test less fragile to changes in the
legalize combiner, since those loads are legal (as opposed to the
constants, which were being widened and thus produced opportunities for
the legalize combiner).
llvm-svn: 318047
Remove builtins from llvm and add AutoUpgrade support.
Also add fast-isel tests for the TEST and TESTN instructions.
Differential Revision: https://reviews.llvm.org/D38736
llvm-svn: 318036
When generating table jump code for switch statements, place the jump
table label as the first operand in the various addition instructions
in order to enable addressing mode selectors to better match index
computation and possibly fold them into the addressing mode of the
table entry load instruction.
Differential revision: https://reviews.llvm.org/D39752
llvm-svn: 318033
CodeGenPrepare sinks address computations from one basic block to another
and attempts to reuse address computations that have already been sunk. If
the same address computation appears twice with the first instance as an
operand of a load whose result is an operand to a simplifable select,
CodeGenPrepare simplifies the select and recursively erases the now dead
instructions. CodeGenPrepare then attempts to use the erased address
computation for the second load.
Fix this by erasing the cached address value if it has zero uses before
looking for the address value in the sunken address map.
This partially resolves PR35209.
Thanks to Alexander Richardson for reporting the issue!
Reviewers: john.brawn
Differential Revision: https://reviews.llvm.org/D39841
llvm-svn: 318032
Summary:
This patch extends the partial inliner to support inlining parts of
vararg functions, if the vararg handling is done in the outlined part.
It adds a `ForwardVarArgsTo` argument to InlineFunction. If it is
non-null, all varargs passed to the inlined function will be added to
all calls to `ForwardVarArgsTo`.
The partial inliner takes care to only pass `ForwardVarArgsTo` if the
varargs handing is done in the outlined function. It checks that vastart
is not part of the function to be inlined.
`test/Transforms/CodeExtractor/PartialInlineNoInline.ll` (already part
of the repo) checks we do not do partial inlining if vastart is used in
a basic block that will be inlined.
Reviewers: davide, davidxl, grosser
Reviewed By: davide, davidxl, grosser
Subscribers: gyiu, grosser, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D39607
llvm-svn: 318028
This patch, together with a matching clang patch (https://reviews.llvm.org/D38672), implements the lowering of X86 shuffle i/f intrinsics to IR.
Differential Revision: https://reviews.llvm.org/D38671
Change-Id: I1e7d359a74743e995ec356237a85214ce55d3661
llvm-svn: 318026
Updated the scheduling information of the SKX subtarget in the file X86SchedSkylakeServer.td under lib/Target/X86 to:
1. add regular opcodes in addition to the suffixed "_Int" opcodes
2. add the (V)MAXCPD/MAXCPS/MAXCSD/MAXCSS/MINCPD/MINCPS/MINCSD/MINCSS
instructions that are equivalent to their counterparts without the 'C' as they are part of a hack to
make floating point min/max commutable under fast math.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39833
Change-Id: Ie13702a5ce1b1a08af91ca637a52b6962881e7d6
llvm-svn: 318024
This was using a custom function that didn't handle the
addressing modes properly for private. Use
isLegalAddressingMode to avoid duplicating this.
Additionally, skip the combine if there is only one use
since the standard combine will handle it.
llvm-svn: 318013
The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4-bits of the immediate are 0.
This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space.
We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used.
I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches.
llvm-svn: 318008
Summary:
This patch adds an early out to visitICmpInst if we are looking at a compare as part of an integer absolute value idiom. Similar is already done for min/max.
In the particular case I observed in a benchmark we had an absolute value of a load from an indexed global. We simplified the compare using foldCmpLoadFromIndexedGlobal into a magic bit vector, a shift, and an and. But the load result was still used for the select and the negate part of the absolute valute idiom. So we overcomplicated the code and lost the ability to recognize it as an absolute value.
I've chosen a simpler case for the test here.
Reviewers: spatel, davide, majnemer
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39766
llvm-svn: 317994
This is consistent with out normal implementation of scalar instructions.
While there disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size which is also our standard practice.
llvm-svn: 317977
Allow a pattern rewriter to be installed in CodeGenDAGPatterns and use it to
correct situations where SelectionDAG and GlobalISel disagree on
representation. For example, it would rewrite:
(sextload:i32 $ptr)<<unindexedload>><<sextload>><<sextloadi16>
to:
(sext:i32 (load:i16 $ptr)<<unindexedload>>)
I'd have preferred to replace the fragments and have the expansion happen
naturally as part of PatFrag expansion but the type inferencing system can't
cope with loads of types narrower than those mentioned in register classes.
This is because the SDTCisInt's on the sext constrain both the result and
operand to the 'legal' integer types (where legal is defined as 'a register
class can contain the type') which immediately rules the narrower types out.
Several targets (those with only one legal integer type) would then go on to
crash on the SDTCisOpSmallerThanOp<> when it removes all the possible types
for the result of the extend.
Also, improve isObviouslySafeToFold() slightly to automatically return true for
neighbouring instructions. There can't be any re-ordering problems if
re-ordering isn't happenning. We'll need to improve it further to handle
sign/zero-extending loads when the extend and load aren't immediate neighbours
though.
llvm-svn: 317971
Currently we can only get a uniform base from a simple GEP with 2 operands. This causes us to miss address folding opportunities for simple global array accesses as the test case shows.
This patch adds support for larger GEPs if the other indices are 0 since those don't require any additional computations to be inserted.
We may also want to handle constant splats of zero here, but I'm leaving that for future work when I have a real world example.
Differential Revision: https://reviews.llvm.org/D39911
llvm-svn: 317947
Summary:
The following kernel change has moved ET_DYN base to 0x4000000 on arm32:
https://marc.info/?l=linux-kernel&m=149825162606848&w=2
Switch to dynamic shadow base to avoid such conflicts in the future.
Reserve shadow memory in an ifunc resolver, but don't use it in the instrumentation
until PR35221 is fixed. This will eventually let use save one load per function.
Reviewers: kcc
Subscribers: aemerson, srhines, kubamracek, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D39393
llvm-svn: 317943
Also change some default cases into llvm_unreachable in
WindowsResourceCOFFWriter, to make it easier to find if they
are triggerd from within e.g. lld, which supported ARM64 earlier
than llvm-cvtres did.
Differential Revision: https://reviews.llvm.org/D39892
llvm-svn: 317942
Summary:
This adds logic to CVP to remove some overflow checks. It uses LVI to remove
operations with at least one constant. Specifically, this can remove many
overflow intrinsics immediately following an overflow check in the source code,
such as:
if (x < INT_MAX)
... x + 1 ...
Patch by Joel Galenson!
Reviewers: sanjoy, regehr
Reviewed By: sanjoy
Subscribers: fhahn, pirama, srhines, llvm-commits
Differential Revision: https://reviews.llvm.org/D39483
llvm-svn: 317911
Change the test format for SVE assembler/disassembler tests to be less verbose and have both tests in the same file.
The tests check the following:
* All instructions are assembled correctly into the right encoding.
* All instructions are disassembled correctly (into the preferred assembly format)
* Without -mattr=+sve the instructions are not assembled.
* Without -mattr=+sve the instructions are not disassembled.
This patch also adds several negative tests for SVE add/sub.
Patch by Sander De Smalen.
Reviewed by: rengolin, fhahn
Differential Revision: https://reviews.llvm.org/D39792
llvm-svn: 317894
Summary:
The associated debug value is updated when the virtual source register
of a copy is completely eliminated and replaced with a rematerialize
value in the defed register of the copy. As the debug value now is
associated with another register it also need to be moved, otherwise
the debug value isn't valid.
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: MatzeB, llvm-commits, qcolombet
Differential Revision: https://reviews.llvm.org/D38024
llvm-svn: 317880
* The method getRegAllocationHints() is now of bool type instead of void. If
true is returned, regalloc (AllocationOrder) will *only* try to allocate the
hints, as opposed to merely trying them before non-hinted registers.
* TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with
an increase in number of LOCRs.
In this case, it is desired to force the hints even though there is a slight
increase in spilling, because if a non-hinted register would be allocated,
the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR
(Load On Condition) SystemZ instruction must have both operands in either the
low or high part of the 64 bit register.
Reviewers: Quentin Colombet and Ulrich Weigand
https://reviews.llvm.org/D36795
llvm-svn: 317879
r600 uses dummy pointer info for lowering load/store. Since dummy pointer info
assumes address space 0, this causes isel failure when temporary load/store SDNodes
are generated for amdgiz environment.
Since the offest is not constant, FixedStack pseudo source value cannot be used
to create the pointer info. This patch creates pointer info using llvm undef value.
At least this provides correct address space so that isel can be done correctly.
Differential Revision: https://reviews.llvm.org/D39698
llvm-svn: 317862
The pointer info for pseudo source for r600 is not correct when
alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39670
llvm-svn: 317861
This change doesn't fix the root cause of the miscompile PR34966 as the root
cause is in the linker ld64. This change makes call graph more complete
allowing to have better module imports/exports.
rdar://problem/35344706
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D39356
llvm-svn: 317853
Summary:
This wrapper checks if there is at least one non-zero weight before
setting the metadata.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39872
llvm-svn: 317845
When the Constant Hoisting pass moves expensive constants into a
common block, it would assign a debug location equal to the last use
of that constant. While this is certainly intuitive, it places the
constant in an out-of-order location, according to the debug location
information. This produces out-of-order stepping when debugging
programs affected by this pass.
This patch creates in-order stepping behavior by merging the debug
locations for hoisted constants, and the new insertion point.
Patch by Matthew Voss!
Differential Revision: https://reviews.llvm.org/D38088
llvm-svn: 317827
Summary:
The analysis of the store sequence goes in straight order - from the
first store to the last. Bu the best opportunity for vectorization will
happen if we're going to use reverse order - from last store to the
first. It may be best because usually users have some initialization
part + further processing and this first initialization may confuse
SLP vectorizer.
Reviewers: RKSimon, hfinkel, mkuper, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39606
llvm-svn: 317821
The toxic stew of created values named 'tmp' and tests that already have
values named 'tmp' and CHECK lines looking for values named 'tmp' causes
bad things to happen in our test line auto-generation scripts because it
wants to use 'TMP' as a prefix for unnamed values. Use less 'tmp' to
avoid that.
llvm-svn: 317818
The reassociate pass generates named values such as "%tmp2" which trips up the script's regex's
because the script uses a 'TMP' prefix for unnamed values (%2).
llvm-svn: 317810
We don't really need any special handling of "offsettable"
memory addresses, but since some existing code uses inline
asm statements with the "o" constraint, add support for this
constraint for compatibility purposes.
llvm-svn: 317807
Correct the definition of 'j' as being unavailable for microMIPS32R6 and
provide the 'b' assembly idiom for codegen purposes for microMIPS32r3.
Provide the necessary 'br' pattern for microMIPS32R6 as it now longer
incorrectly uses the 'j' instruction.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D39741
llvm-svn: 317801
Summary:
This change allows yaml input to control the order of implicitly added sections
(`.symtab`, `.strtab`, `.shstrtab`). The order is controlled by adding a
placeholder section of the given name to the Sections field.
This change is to support changes in D39582, where it is desirable to control
the location of the `.dynsym` section.
This reapplied version fixes:
1. use of a function call within an assert
2. failing lld test which has an unnamed section
3. incorrect section count when given an unnamed section
Additionally, one more test to cover the unnamed section failure.
Reviewers: compnerd, jakehehrlich
Reviewed By: jakehehrlich
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39749
llvm-svn: 317789
We must patch all existing incoming values of Phi node,
otherwise it is possible that we can see poison
where program does not expect to see it.
This is the similar what GVN does.
The added test test/Transforms/GVN/PRE/pre-jt-add.ll shows an
example of wrong optimization done by jump threading due to
GVN PRE did not patch existing incoming value.
Reviewers: mkazantsev, wmi, dberlin, davide
Reviewed By: dberlin
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D39637
llvm-svn: 317768
We've worked around bugs in the frontend by ignoring the count from
wrapped segments when a line has at least one region entry segment.
Those frontend bugs are now fixed, so it's time to regenerate the
checked-in covmapping files and remove the workaround.
llvm-svn: 317761
r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes.
llvm-svn: 317748
Adds the blacklist behaviour to llvm-cfi-verify. Now will calculate which lines caused expected failures in the blacklist and reports the number of affected indirect CF instructions for each blacklist entry.
Also moved DWARF checking after instruction analysis to improve performance significantly - unrolling the inlining stack is expensive.
Reviewers: vlad.tsyrklevich
Subscribers: aprantl, pcc, kcc, llvm-commits
Differential Revision: https://reviews.llvm.org/D39750
llvm-svn: 317743