Kill flags need to be updated correctly when moving stores up/down to
form store pair instructions.
Those invalid flags have been ignored before but as of r290014 they are
recognized when using -mllvm -verify-machineinstrs.
Also simplifies test/CodeGen/AArch64/ldst-opt-dbg-limit.mir, renames it
to ldst-opt.mir test and adds a new tests for this change.
Differential Revision: https://reviews.llvm.org/D28875
llvm-svn: 292625
This patch fixes debug information for __thread variable on Mips
using .dtprelword and .dtpreldword directives.
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D28770
llvm-svn: 292624
The recommit fixes a bug related with live interval update after the partial
redundent copy is moved.
The original patch is to solve the performance problem described in PR27827.
Register coalescing sometimes cannot remove a copy because of interference.
But if we can find a reverse copy in one of the predecessor block of the copy,
the copy is partially redundent and we may remove the copy partially by moving
it to the predecessor block without the reverse copy.
Differential Revision: https://reviews.llvm.org/D28585
llvm-svn: 292621
This is the second attemp to recommit r292526.
The original summary:
Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give target-specific more accurate analysis. TTI is
already used for the cost of many other instructions.
llvm-svn: 292616
We also want to optimise tests like this: return a*b == 0. The MULS
instruction is flag setting, so we don't need the CMP instruction but can
instead branch on the result of the MULS. The generated instructions sequence
for this example was: MULS, MOVS, MOVS, CMP. The MOVS instruction load the
boolean values resulting from the select instruction, but these MOVS
instructions are flag setting and were thus preventing this optimisation. Now
we first reorder and move the MULS to before the CMP and generate sequence
MOVS, MOVS, MULS, CMP so that the optimisation could trigger. Reordering of the
MULS and MOVS is safe to do because the subsequent MOVS instructions just set
the CPSR register and don't use it, i.e. the CPSR is dead.
Differential Revision: https://reviews.llvm.org/D27990
llvm-svn: 292608
Simplify a packss/packus truncation based on the elements of the mask that are actually demanded.
Differential Revision: https://reviews.llvm.org/D28777
llvm-svn: 292591
Like several other loop passes (the vectorizer, etc) this pass doesn't
really fit the model of a loop pass. The critical distinction is that it
isn't intended to be pipelined together with other loop passes. I plan
to add some documentation to the loop pass manager to make this more
clear on that side.
LoopSink is also different because it doesn't really need a lot of the
infrastructure of our loop passes. For example, if there aren't loop
invariant instructions causing a preheader to exist, there is no need to
form a preheader. It also doesn't need LCSSA because this pass is
only involved in sinking invariant instructions from a preheader into
the loop, not reasoning about live-outs.
This allows some nice simplifications to the pass in the new PM where we
can directly walk the loops once without restructuring them.
Differential Revision: https://reviews.llvm.org/D28921
llvm-svn: 292589
Part of the assert has been left active for further debugging.
The other part has been turned into a stat for tracking for the
moment.
llvm-svn: 292583
By default c++filt demangles functions, though you can optionally pass
`-t` to have it decode types as well, behaving nearly identical to
`__cxa_demangle`. Add support for this mode.
llvm-svn: 292576
This recommits r292526 which is reverted in r292529 after fixing the test case.
The original summary:
Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give target-specific more accurate analysis. TTI is
already used for the cost of many other instructions.
llvm-svn: 292570
Inline spiller can decide to move a spill as early as possible in the basic block.
It will skip phis and label, but we also need to make sure it skips instructions
in the basic block prologue which restore exec mask.
Added isPositionLike callback in TargetInstrInfo to detect instructions which
shall be skipped in addition to common phis, labels etc.
Differential Revision: https://reviews.llvm.org/D27997
llvm-svn: 292554
Big functions with large vreg # are quite unwieldy to update.
Change it to have one function per test (it does increase boilerplate,
but makes the core hopefully more readable and maintanable).
llvm-svn: 292552
Big functions with large vreg # are quite unwieldy to update. This test
also relied on legal s8 operations which we're considering removing.
Change it to have one function per test (it does increase boilerplate,
but makes the core hopefully more readable and maintanable), and use
100% legal operations throughout.
llvm-svn: 292551
Summary:
Fence instructions are currently marked as `ModRef` for all memory locations.
We can improve this for constant memory locations (such as constant globals),
since fence instructions cannot modify these locations.
This helps us to forward constant loads across fences (added test case in GVN).
There were no changes in behaviour for similar test cases in early-cse and licm.
Reviewers: dberlin, sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28914
llvm-svn: 292546
Use CHECK-NEXT to verify that a test breaks whenever unexpected passes,
analyses, or invalidations show up in default pipelines. The test case
is constructed so that we don't expect to invalidate anything, and needs
to be kept that way.
The test is slightly less strict than we'd like because of differences
in type pretty-printing.
(Right now it does show some invalidations - all of those are intentional
and temporary.)
Differential Revision: https://reviews.llvm.org/D28887
llvm-svn: 292536
This can prove that:
extern int f;
int g() {
int x = 0;
for (int i = 0; i < 365; ++i) {
x /= f;
}
return x;
}
always returns zero. Thanks to Sanjoy for confirming this
transformation actually made sense (bugs are mine).
llvm-svn: 292531
Use CHECK-NEXT to verify that a test breaks whenever unexpected passes,
analyses, or invalidations show up in default pipelines. The test case
is constructed so that we don't expect to invalidate anything, and needs
to be kept that way.
(Right now it does show some invalidations - all of those are intentional
and temporary.)
Differential Revision: https://reviews.llvm.org/D28887
llvm-svn: 292530
This patch improves the knownbits logic for unsigned integer min/max opcodes.
For UMIN we know that the result will have the maximum of the inputs' known leading zero bits in the result, similarly for UMAX the maximum of the inputs' leading one bits.
This is particularly useful for simplifying clamping patterns,. e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set.
Differential Revision: https://reviews.llvm.org/D28853
llvm-svn: 292528
Currently, a GEP is considered free only if its indices are all constant.
TTI::getGEPCost() can give target-specific more accurate analysis. TTI is
already used for the cost of many other instructions.
Differential Revision: https://reviews.llvm.org/D28693
llvm-svn: 292526
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
llvm-svn: 292516
Add a SMLoc to MCExpr. Most code does not generate or consume the SMLoc (yet).
Patch by Sanne Wouda <sanne.wouda@arm.com>!
Differential Revision: https://reviews.llvm.org/D28861
llvm-svn: 292515
If F is a Thumb function symbol, and G = F + const, and G is a
function symbol, then G is Thumb. Because what else could it be?
Differential Revision: https://reviews.llvm.org/D28878
llvm-svn: 292514
Summary:
In case of non-alloca pointers, we check for whether it is a pointer
from malloc-like calls and it is not captured. In such case, we can
promote the pointer, as the caller will have no way to access this pointer
even if there is unwinding in middle of the loop.
Reviewers: hfinkel, sanjoy, reames, eli.friedman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28834
llvm-svn: 292510
It describes a region of arbitrary data included in a Mach-O file.
Its initial use is to record extra data in MH_CORE files.
rdar://30001545
rdar://30001731
llvm-svn: 292500
As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further
llvm-svn: 292493
As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further
llvm-svn: 292487
Summary:
The SDNodeOrder is saved in the IROrder field in the SDNode, and this
field may affects scheduling. Thus, letting dbg.value/declare increase
the order numbers may in turn affect scheduling.
Because of this change we also need to update the code deciding when
dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues.
Dbg values now have the same order as the SDNode they are connected to,
not the following orders.
Test cases provided by Florian Hahn.
Reviewers: bogner, aprantl, sunfish, atrick
Reviewed By: atrick
Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D25318
llvm-svn: 292485
If the subvector comes from a load, we convert to SUBV_BROADCAST and use a broadcast instruction. But if there is no load we keep the inserts. I think we should create the SUBV_BROADCAST even without the load and let isel use the fallback patterns that are used if the load can't be folded. This will use the SHUFF32X4 or similar instruction for the 128-bit into 512-bit case and a single insert for 128 into 256 or 256 into 512.
This should be fixed so subvector broadcast intrinsics can be replaced with native IR since some of those currently lower directly to SHUFF32X4.
llvm-svn: 292475
Summary:
Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND.
We already do this for scalar i1 operations so I just extended it to vectors of i1.
Reviewers: zvi, delena
Reviewed By: delena
Subscribers: guyblank, llvm-commits
Differential Revision: https://reviews.llvm.org/D28888
llvm-svn: 292474
For -(x + y) -> (-x) + (-y), if x == -y, this would
change the result from -0.0 to 0.0. Since the fma/fmad
combine is an extension of this problem it also
applies there.
fmul should be fine, and I don't think any of the unary
operators or conversions should be a problem either.
llvm-svn: 292473
c++filt does not attempt to demangle symbols which do not match its
expected format. This means that the symbol must start with _Z or ___Z
(block invocation function extension). Any other symbols are returned
as is. Note that this is different from the behaviour of __cxa_demangle
which will demangle fragments.
llvm-svn: 292467
LV no longer "requires" LCSSA and LoopSimplify, and instead forms
them internally as required. So, there's nothing preventing it from
being enabled.
llvm-svn: 292464
Type identifiers are exported by:
- Adding coarse-grained information about how to test the type
identifier to the summary.
- Creating symbols in the object file (aliases and absolute symbols)
containing fine-grained information about the type identifier.
Differential Revision: https://reviews.llvm.org/D28424
llvm-svn: 292462
This changes the vectorizer to explicitly use the loopsimplify and lcssa utils,
instead of "requiring" the transformations as if they were analyses.
This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA
for all loops, but rather only for the loops we expect to modify.
Differential Revision: https://reviews.llvm.org/D28868
llvm-svn: 292456
There's no neg.f16 instruction, so negation has to
be done via subtraction from zero.
Differential Revision: https://reviews.llvm.org/D28876
llvm-svn: 292452
To avoid regressions, make ScalarEvolution::createSCEV a bit more
clever.
Also get rid of some useless code in ScalarEvolution::howFarToZero
which was hiding this bug.
No new testcase because it's impossible to actually expose this bug:
we don't have any in-tree users of getUDivExactExpr besides the two
functions I just mentioned, and they both dodged the problem. I'll
try to add some interesting users in a followup.
Differential Revision: https://reviews.llvm.org/D28587
llvm-svn: 292449
r291670 doesn't crash on the original testcase from PR31589,
but it crashes on a slightly more complex one.
PR31589 has the new reproducer.
llvm-svn: 292444
We currently check whether a reduction has a single outside user. We don't
really need to require that - we just need to make sure a single value is
used externally. The number of external users of that value shouldn't actually
matter.
Differential Revision: https://reviews.llvm.org/D28830
llvm-svn: 292424
ARM seems to prefer that long literals be formed from their little end in
order to promote the fusion of the instrs pairs MOV/MOVK and MOVK/MOVK on
Cortex A57 and others (v. "Cortex A57 Software Optimisation Guide", section
4.14).
Differential revision: https://reviews.llvm.org/D28697
llvm-svn: 292422
Summary:
Without this, we're stressing the RAUW of unique nodes,
which is a costly operation. This is intended to limit
the number of RAUW, and is very effective on the total
link-time of opt with ThinLTO, before:
real 4m4.587s user 15m3.401s sys 0m23.616s
after:
real 3m25.261s user 12m22.132s sys 0m24.152s
Reviewers: tejohnson, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28751
llvm-svn: 292420
Limit register coalescer by not allowing it to artificially increase
size of registers beyond dword. Such super-registers are in fact
register sequences and not distinct HW registers.
With more super-regs we would need to allocate adjacent registers
and constraint regalloc more than needed. Moreover, our super
registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2,
VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates registers
allocation even more, resulting in excessive spilling.
Differential Revision: https://reviews.llvm.org/D28782
llvm-svn: 292413
Non-prevailing weak/linkonce odr symbols will be dropped by ThinLTO to
available_externally when possible. If they had an initializer in the
global_ctors list, a comdat group was being created. This code
already had logic to skip available_externally defs, but now the
EliminateAvailableExternally pass will drop these symbols to
declarations earlier. Change the check to skip all declarations for
linker (which includes available_externally along with declarations).
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28737
llvm-svn: 292408
An ELFObjectFile can now create SubtargetFeatures from the available
ARM build attributes, in a similar manner to MIPS. I've moved the
MIPS code into its own function and the ARM handler also has a
separate function.
Differential Revision: https://reviews.llvm.org/D28291
llvm-svn: 292403
A 64-bit relocation does not exist in 32-bit ARMELF. Report an error
instead of crashing.
PR23870
Patch by Sanne Wouda (sanwou01).
Differential Revision: https://reviews.llvm.org/D28851
llvm-svn: 292373
Enable an ELFObjectFile to read the its arm build attributes to
produce a target triple with a specific ARM architecture.
llvm-objdump now uses this functionality to automatically produce
a more accurate target.
Differential Revision: https://reviews.llvm.org/D28769
llvm-svn: 292366
This patch improves the mul instruction combine function (combineMul)
by adding new layer of logic.
In this patch, we are adding the ability to fold (mul x, -((1 << c) -1))
or (mul x, -((1 << c) +1)) into (neg(X << c) -x) or (neg((x << c) + x) respective.
Differential Revision: https://reviews.llvm.org/D28232
llvm-svn: 292358
This reverts commit r292210, as it broke the Thumb buldbot with:
clang-5.0: error: the clang compiler does not support '-fxray-instrument
on thumbv7-unknown-linux-gnueabihf'.
llvm-svn: 292357
During post-RA pseudo expansion, an 'undef' flag of the source operand should
be propagated by emitGRX32Move().
Review: Ulrich Weigand
llvm-svn: 292353
This patch fixes bugzilla 31576 (https://llvm.org/bugs/show_bug.cgi?id=31576).
"data32" instruction prefix was not defined in the llvm.
An exception had to be added to the X86 tablegen and AsmPrinter because both "data16" and "data32" are encoded to 0x66 (but in different modes).
Differential Revision: https://reviews.llvm.org/D28468
llvm-svn: 292352
claims to test.
LoopSimplify was unifying the multiple exits in this test case, making
it never even test the multiple exit handling of LoopDeletion. Doh.
Now it works (thanks to a great idea from mkuper) and will fail if we
ever change something to make it stop working.
llvm-svn: 292331
You can now define the register class of a virtual register on the
operand itself avoiding the need to use a "registers:" block.
Example: "%0:gr64 = COPY %rax"
Differential Revision: https://reviews.llvm.org/D22398
llvm-svn: 292321
Summary:
This change also lets us use max.{s,u}16. There's a vague warning in a
test about this maybe being less efficient, but I could not come up with
a case where the resulting SASS (sm_35 or sm_60) was different with or
without max.{s,u}16. It's true that nvcc seems to emit only
max.{s,u}32, but even ptxas 7.0 seems to have no problem generating
efficient SASS from max.{s,u}16 (the casts up to i32 and back down to
i16 seem to be implicit and nops, happening via register aliasing).
In the absence of evidence, better to have fewer special cases, emit
more straightforward code, etc. In particular, if a new GPU has 16-bit
min/max instructions, we want to be able to use them.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28732
llvm-svn: 292304
Summary: Previously we lowered it literally, to shifts and xors.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28722
llvm-svn: 292303
Summary:
Avoid an unnecessary conversion operation when using the result of
ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction
we run returns an i32.
(Previously if we used the value as an i32, we'd do an unnecessary
zext+trunc.)
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28721
llvm-svn: 292302
Summary:
* Disable "ctlz speculation", which inserts a branch on every ctlz(x) which
has defined behavior on x == 0 to check whether x is, in fact zero.
* Add DAG patterns that avoid re-truncating or re-expanding the result
of the 16- and 64-bit ctz instructions.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D28719
llvm-svn: 292299
Summary: Partial unrolling should have separate threshold with full unrolling.
Reviewers: efriedma, mzolotukhin
Reviewed By: efriedma, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28831
llvm-svn: 292293
The patch is to solve the performance problem described in PR27827.
Register coalescing sometimes cannot remove a copy because of interference.
But if we can find a reverse copy in one of the predecessor block of the copy,
the copy is partially redundent and we may remove the copy partially by moving
it to the predecessor block without the reverse copy.
Differential Revision: https://reviews.llvm.org/D28585
llvm-svn: 292292
Some platforms (notably iOS) use a different calling convention for unnamed vs
named parameters in varargs functions, so we need to keep track of this
information when translating calls.
Since not many platforms are involved, the guts of the special handling is in
the ValueHandler class (with a generic implementation that should work for most
targets).
llvm-svn: 292283
Correctly populating Machine PHIs relies on knowing exactly how the IR level
CFG was lowered to MachineIR. This needs to be tracked by any translation
phases that meddle (currently only SwitchInst handling).
This reapplies r291973 which was reverted because of testing failures. Fixes:
+ Don't return an ArrayRef to a local temporary.
+ Incorporate Kristof's suggested comment improvements.
llvm-svn: 292278
Summary: Add a test case for LICM when promoting locals that may be read after the throw within the loop.
Reviewers: eli.friedman, hfinkel, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28822
llvm-svn: 292261
If a memory instruction will be vectorized, but it's pointer operand is
non-consecutive-like, the instruction is a gather or scatter operation. Its
pointer operand will be non-uniform. This should fix PR31671.
Reference: https://llvm.org/bugs/show_bug.cgi?id=31671
Differential Revision: https://reviews.llvm.org/D28819
llvm-svn: 292254
Even with the fix from r291630, this still causes problems. I get
widespread assertion failures in the Swift runtime's WeakRefCount::increment()
function. I sent a reduced testcase in reply to the commit.
llvm-svn: 292242
runnig LCSSA over them prior to running the loop pipeline.
This also teaches the loop PM to verify that LCSSA form is preserved
throughout the pipeline's run across the loop nest.
Most of the test updates just leverage this new functionality. One has to be
relaxed with the new PM as IVUsers is less powerful when it sees LCSSA input.
Differential Revision: https://reviews.llvm.org/D28743
llvm-svn: 292241
Also, add the corresponding match to the AssumptionCache's 'Affected Values' list.
Differential Revision: https://reviews.llvm.org/D28485
llvm-svn: 292239
Summary:
Tests under tools/llvm-objdump should not use inputs from Object. Copied the
required inputs and aligned the new tests to be more consistent with the existing
tests in this respect.
Reviewers: ioeric
Reviewed By: ioeric
Subscribers: davide, djasper, cfe-commits
Differential Revision: https://reviews.llvm.org/D28799
llvm-svn: 292222
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
llvm-svn: 292210
Emit error when BPF backend sees a call to a global function or to an external symbol.
The kernel verifier only allows calls to predefined helpers from bpf.h
which are defined in 'enum bpf_func_id'. Such calls in assembler must
look like 'call [1-9]+' where number matches bpf_func_id.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 292204
SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.
Use TLI instead, removing another heap of redundant code.
This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to two tests:
- calling a non-variadic function via a variadic prototype isn't OK;
it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter; the SDAG code accepts any integer
type, which meant using i32 on x86_64 worked.
I don't think it's worth supporting either of these (IMO) broken
testcases. Instead, fix them to be more correct.
llvm-svn: 292189
Add missing fabs(fpext) optimzation that worked with the call,
and also fixes it creating a second fpext when there were multiple
uses.
llvm-svn: 292172
Here we define the `graph` subcommand which generates a graph from the function
call information and uses it to present the call information graphically with
additional annotations.
Reviewers: dblaikie, dberris
Differential Revision: https://reviews.llvm.org/D27243
llvm-svn: 292156
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.
llvm-svn: 292154