Summary:
Also disables leak checking on lto tests, due to many leaks reported
in the system's ld64.
Reviewers: kcc, pcc, bogner, kubamracek
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D37781
llvm-svn: 314535
The type of a SCEVConstant may not match the corresponding LLVM Value.
In this case, we skip the constant folding for now.
TODO: Replace ConstantInt Zero by ConstantPointerNull
llvm-svn: 314531
The test attempts to use -1 as carry-in for v_addc_*.
Before writing r314522, I did actually test this on real hardware,
and found that it doesn't work. So r314522 is correct in restricting
the carry-in operand: just remove those tests to make things pass
again.
llvm-svn: 314530
Summary:
Demanglers such as libiberty know how to strip suffixes of the form
\.[a-zA-Z]+\.\d+, but our current promoted value suffixes are
.llvm.${modulehash}, where the module hash is in hex. Change the
module hash to decimal to allow demanglers to handle this.
Reviewers: danielcdh
Subscribers: llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D38405
llvm-svn: 314527
This implement the insertion operator for DWARF address ranges so they
are consistently printed as [LowPC, HighPC).
While a dump method might have felt more consistent, it is used
exclusively for printing error messages in the verifier and never used
for actual dumping. Hence this approach is more intuitive and creates
less clutter at the call sites.
Differential revision: https://reviews.llvm.org/D38395
llvm-svn: 314523
The hardware will only forward EXEC_LO; the high 32 bits will be zero.
Additionally, inline constants do not work. At least,
v_addc_u32_e64 v0, vcc, v0, v1, -1
which could conceivably be used to combine (v0 + v1 + 1) into a single
instruction, acts as if all carry-in bits are zero.
The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine
s_mov_b64 s[0:1], exec
v_cndmask_b32_e64 v0, v1, v2, s[0:1]
into
v_mov_b32 v0, v3
but it's not particularly high priority.
Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*
llvm-svn: 314522
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng
Reviewed By: hfinkel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38085
llvm-svn: 314517
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.
If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).
This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D37899https://bugs.llvm.org/show_bug.cgi?id=34610
llvm-svn: 314516
New instructions are added to AArch32 and AArch64 to aid
floating-point multiplication and addition of complex numbers, where
the complex numbers are packed in a vector register as a pair of
elements. The Imaginary part of the number is placed in the more
significant element, and the Real part of the number is placed in the
less significant element.
This patch adds assembler for the ARM target.
Differential Revision: https://reviews.llvm.org/D36789
llvm-svn: 314511
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Recommitting r314497. This version does not contain test which fails
when compiler is not build in debug mode.
Differential Revision: https://reviews.llvm.org/D37328
llvm-svn: 314507
Add missing license information to MicroMipsInstrFPU.td and
fix most of the formatting errors present. Others will be
addressed in a follow up commits.
llvm-svn: 314505
Added additional tests for vector multiplications with multipliers that are:
* powers of 2 displaced by 1,
* product of a power of 2 displaced by one with another power of 2.
Patch by @pacxx (Michael Haidl)
Differential Revision: https://reviews.llvm.org/D38350
llvm-svn: 314504
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37752
llvm-svn: 314502
Summary:
Added support for scratch (including spilling) for OS type amdpal:
generates code to set up the scratch descriptor if it is needed.
With amdpal, the scratch resource descriptor is loaded from offset 0 of
the global information table. The low 32 bits of the address of the
global information table is passed in s0.
Added amdgpu-git-ptr-high function attribute to hard-wire the high 32
bits of the address of the global information table. If the function
attribute is not specified, or is 0xffffffff, then the backend generates
code to use the high 32 bits of pc.
The documentation for the AMDPAL ABI will be added in a later commit.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye
Differential Revision: https://reviews.llvm.org/D37483
llvm-svn: 314501
Summary:
This operating system type represents the AMDGPU PAL runtime, and will
be required by the AMDGPU backend in order to generate correct code for
this runtime.
Currently it generates the same code as not specifying an OS at all.
That will change in future commits.
Patch from Tim Corringham.
Subscribers: arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D37380
llvm-svn: 314500
This patch introduces 3 helper functions: error(), warn() and note() to
make printing during verification more consistent. When supported, the
respective prefixes are printed in color using the same color scheme as
clang.
Differential revision: https://reviews.llvm.org/D38368
llvm-svn: 314498
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Differential Revision: https://reviews.llvm.org/D37328
llvm-svn: 314497
Allow the proper recognition of Enum values and global variables inside ms inline-asm memory / immediate expressions, as they require some additional overhead and treated incorrect if doesn't early recognized.
supersedes D33278, D35774
Differential Revision: https://reviews.llvm.org/D37412
llvm-svn: 314493
This is slightly less verbose for the common case of a single build directory
and more intuitive when using this API directly from the interpreter.
llvm-svn: 314491
Previous patch fixed one of LLVM buildbots (lld-x86_64-win7).
However, some others have already been failing because of make_unique
compilation error (llvm-clang-x86_64-expensive-checks-win).
llvm-svn: 314480
This allows llvm-rc to parse user-defined resources (ref:
msdn.microsoft.com/en-us/library/windows/desktop/aa381054.aspx).
These statements either import files, or put the specified raw data in
the resulting resource file.
Thanks to Nico Weber for his original work in this area.
Differential Revision: https://reviews.llvm.org/D37033
llvm-svn: 314478
This allows the ints to be written as integer expressions evaluating to
unsigned 16-bit/32-bit integers.
All the expressions may use the following operators: + - & | ~, and
parentheses. Minus token - can be also unary. There is no precedence of
the operators other than the unary operators binding stronger than their
binary counterparts.
Differential Revision: https://reviews.llvm.org/D37022
llvm-svn: 314477
This commit yanks out the repeated sections of code in pruneCandidates into
two lambdas: ShouldSkipCandidate and Prune. This simplifies the logic in
pruneCandidates significantly, and reduces the chance of introducing bugs by
folding all of the shared logic into one place.
llvm-svn: 314475
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.
I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.
Reviewers: RKSimon, zvi, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38273
llvm-svn: 314474
LR is an untypical callee saved register in that it is restored into a
different register (PC) and thus does not live-out of the return block.
This case requires the `Restored` flag in CalleeSavedInfo to be cleared.
This fixes a number of cases where this wasn't handled correctly yet.
llvm-svn: 314471
This extends the set of llvm-rc parser's available resources by
another one, VERSIONINFO.
Ref: msdn.microsoft.com/en-us/library/windows/desktop/aa381058.aspx
Thanks to Nico Weber for his original work in this area.
Differential Revision: https://reviews.llvm.org/D37021
llvm-svn: 314468
The expensive-checks build bot found a problem with the r314428 commit:
if CC is live after a ATOMIC_CMP_SWAPW instruction, it needs to be
marked as live-in to the block after the loop the pseudo gets expanded
to. This actually fixes a code-gen bug as well, since if the CC isn't
live, the CR and JLH are merged to a CRJLH which doesn't actually set
the condition code any more.
llvm-svn: 314465
If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.
Differential Revision: https://reviews.llvm.org/D38375
llvm-svn: 314457
/code/llvm-project/llvm/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp:260:38: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
[this](decltype(ObjLayer)::ObjHandleT,
llvm-svn: 314454
In setupEntryBlockAndCallSites in CodeGen/SjLjEHPrepare.cpp,
we fetch and store the actual frame pointer, but on return via
the longjmp intrinsic, it always was restored into the r7 variable.
On windows, the frame pointer should be restored into r11 instead of r7.
On Darwin (where sjlj exception handling is used by default), the frame
pointer is always r7, both in arm and thumb mode, and likewise, on
windows, the frame pointer always is r11.
On linux however, if sjlj exception handling is enabled (which it isn't
by default), libcxxabi and the user code can be built in differing modes
using different registers as frame pointer. Therefore, when restoring
registers on a platform where we don't always use the same register
depending on code mode, restore both r7 and r11.
Differential Revision: https://reviews.llvm.org/D38253
llvm-svn: 314451
We aren't do any in register extends here so we should be able to just the target independent nodes directly and allow them to be lowered as necessary.
llvm-svn: 314447
This patch implements the dwarfdump option --find=<name>. This option
looks for a DIE in the accelerator tables and dumps it if found. This
initial patch only adds support for .apple_names to keep the review
small, adding the other sections and pubnames support should be
trivial though.
Differential Revision: https://reviews.llvm.org/D38282
llvm-svn: 314439
JumpThreading now preserves dominance and lazy value information across the
entire pass. The pass manager is also informed of this preservation with
the goal of DT and LVI being recalculated fewer times overall during
compilation.
This change prepares JumpThreading for enhanced opportunities; particularly
those across loop boundaries.
Patch by: Brian Rzycki <b.rzycki@samsung.com>,
Sebastian Pop <s.pop@samsung.com>
Differential revision: https://reviews.llvm.org/D37528
llvm-svn: 314435
Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38305
llvm-svn: 314432
Summary:
Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash.
DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. Similar is true for i16 though without the bug.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38276
llvm-svn: 314430
Previously we were using one of the subvector indices twice. The included test case causes an assert without this change.
Thanks to Simon Pilgrim for catching this.
llvm-svn: 314429
The SystemZ compare-and-swap instructions already provide the "success"
indication via a condition-code value, so the default expansion of those
operations generates an unnecessary extra comparsion.
llvm-svn: 314428
This patch adds a check to the DWARF verifier to detect CUs without a
unit DIE.
Differential revision: https://reviews.llvm.org/D38363
llvm-svn: 314426
This patch disables codegen support for branch likely instructions to
address a potential bug. These branches were unselectable as
they had the same patterns as the normal branches but came after them
when ISel was concerned.
The branch likely instructions were marked as having no delay
slots when they have annulling delay slots. The delay slot filler
does not currently handle annulling delay slot branches, so this
would lead to wrong codegen if these branches were generated.
Reviewers: atanasyan, nitesh.jain
Differential Revision: https://reviews.llvm.org/D38169
llvm-svn: 314421
Summary:
A DBG_VALUE that is referring to a physical register is
valid up until the next def of the register, or the end
of the basic block that it belongs to.
LiveDebugVariables is computing live intervals (slot index
ranges) for DBG_VALUE instructions, before regalloc, in order
to be able to re-insert DBG_VALUE instructions again after
regalloc. When the DBG_VALUE is mapping a variable to a
physical register we do not need to compute the range. We
should simply re-insert the DBG_VALUE at the start position.
The problem that was found, resulting in this patch, was a
situation when the DBG_VALUE was the last real use of the
physical register. The computeIntervals/extendDef methods
extended the range to cover the whole basic block, even though
the physical register very well could be allocated to some
virtual register inside the basic block. So the extended
range could not be trusted.
This patch is a preparation for https://reviews.llvm.org/D38229,
where the goal is to insert DBG_VALUE after each new definition
of a variable, even if the virtual registers that the variable
was connected to has been coalesced into using the same physical
register (e.g. due to two address instructions). For more info
see https://bugs.llvm.org/show_bug.cgi?id=34545
Reviewers: aprantl, rnk, echristo
Reviewed By: aprantl
Subscribers: Ka-Ka, llvm-commits
Differential Revision: https://reviews.llvm.org/D38140
llvm-svn: 314414
The second argument for Allocator::Deallocate is the number of elements,
not the size of a single element. In asan mode specifying a large number
of elements poisoned random memory regions, leading to crashes
everywhere.
llvm-svn: 314413
Summary:
This allows sharing the lattice value code between LVI and SCCP (D36656).
It also adds a `satisfiesPredicate` function, used by D36656.
Reviewers: davide, sanjoy, efriedma
Reviewed By: sanjoy
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D37591
llvm-svn: 314411
MS allows the following size directives: float/double and long as synonymous to dword/qword and dword, respectively.
Differential Revision: https://reviews.llvm.org/D37190
llvm-svn: 314410
Before this change using any of the -name*= command line options with an output
directory would result in a single file (functions.txt/functions.html)
containing the coverage for those specific functions. Now you get the same
directory structure as when not using any -name*= options.
Differential Revision: https://reviews.llvm.org/D38280
llvm-svn: 314396
It's currently quite difficult to test passes like branch relaxation, which
requires branches with large displacement to be generated. The .space assembler
directive makes it easy to create arbitrarily large basic blocks, but
getInlineAsmLength is not able to parse it and so the size of the block is not
correctly estimated. Other backends (AArch64, AMDGPU) introduce options just
for testing that artificially restrict the ranges of branch instructions (e.g.
aarch64-tbz-offset-bits). Although parsing a single form of the .space
directive feels inelegant, it does allow a more direct testing approach.
This patch adapts the .space parsing code from
Mips16InstrInfo::getInlineAsmLength and removes it now the extra functionality
is provided by the base implementation. I want to move this functionality to
the generic getInlineAsmLength as 1) I need the same for RISC-V, and 2) I feel
other backends will benefit from more direct testing of large branch
displacements.
Differential Revision: https://reviews.llvm.org/D37798
llvm-svn: 314393
This is a follow-on of D37211.
D37211 eliminates a compare instruction if two conditional branches can be made based on the one compare instruction, e.g.
if (a == 0) { ... }
else if (a < 0) { ... }
This patch extends this optimization to support partially redundant cases, which often happen in while loops.
For example, one compare instruction is moved from the loop body into the preheader by this optimization in the following example.
do {
if (a == 0) dummy1();
a = func(a);
} while (a > 0);
Differential Revision: https://reviews.llvm.org/D38236
llvm-svn: 314390
%lo(), %hi(), and %pcrel_hi() are supported and test cases have been added to
ensure the appropriate fixups and relocations are generated. I've added an
instruction format field which is used in RISCVMCCodeEmitter to, for
instance, tell whether it should emit a lo12_i fixup or a lo12_s fixup
(RISC-V has two 12-bit immediate encodings depending on the instruction
type).
Differential Revision: https://reviews.llvm.org/D23568
llvm-svn: 314389
Summary:
If we have a non-allocated register, we allow us to try recoloring of an
already allocated and "Done" register, even if they are of the same
register class, if the non-allocated register has at least one tied def
and the allocated one has none.
It should be easier to recolor the non-tied register than the tied one, so
it might be an improvement even if they use the same regclasses.
Reviewers: qcolombet
Reviewed By: qcolombet
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D38309
llvm-svn: 314388
Without this, we could end up trying to get the Nth (0-indexed) element
from a subvector of size N.
Differential Revision: https://reviews.llvm.org/D37880
llvm-svn: 314380
This patch adds new insn, "reg = be16/be32/be64 reg",
for bswap to little endian for big-endian target (bpfeb).
It also adds new insn for negation "reg = -reg".
Currently, for source code, e.g.,
b = -a
LLVM still prefers to generate:
b = 0 - a
But "reg = -reg" format can be used in assembly code.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 314376
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
llvm-svn: 314375
concept.
Add a unit-test to make sure we don't backslide, and tweak the MockBaseLayer
utility to make it easier to test this kind of thing in the future.
llvm-svn: 314374
If we do not initialize Prefix here, Prefix.data() returns a nullptr.
Later, it is passed to memcpy. memcpy's behavior is undefined if src (or
dst) is a nullptr even if a given size is 0. That's why this code
triggered UBsan.
llvm-svn: 314368
Summary:
This avoids C++ UB if the GEP is weird and the calculation overflows
int64_t, and it's also observable in the cost model's results.
Such GEPs are almost surely not valid pointers, but LLVM nonetheless
generates them sometimes.
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38337
llvm-svn: 314362
This patch produces a crash and hexagon_vector_loop_carried_reuse_constant.ll test fails on Windows (llvm-clang-x86_64-expensive-checks-win build bot).
llvm-svn: 314361
This reverts r314017 and similar code added in later commits. It seems to not work for pointer compares and is causing a bot failure for the last several days.
llvm-svn: 314360
The tar format originally supported up to 99 byte filename. The two
extensions are proposed later: Ustar or PAX.
In the UStar extension, a pathanme is split at a '/' and its "prefix"
and "suffix" are stored in different locations in the tar header. Since
"prefix" can be up to 155 byte, it can represent up to 254 byte
filename (but exact limit depends on the location of '/' character in
a pathname.)
Our TarWriter first attempt to use UStar extension and then fallback to
PAX extension.
But there's a bug in UStar header creation. "Suffix" part must be a NUL-
terminated string, but we didn't handle it correctly. As a result, if
your filename just 100 characters long, the last character was droppped.
This patch fixes the issue.
Differential Revision: https://reviews.llvm.org/D38149
llvm-svn: 314349
Summary:
*In-source builds* of LLVM, in which a user invokes `cmake` from within the
LLVM source directory, or invokes `cmake -B/path/to/source/dir/of/llvm`,
are explicitly checked for and disallowed by LLVM's `CMakeLists.txt`.
*In-tree builds*, on the other hand, refer to when the source directories
of projects such as Clang are nested within the `llvm/tools` source
directory. These are not disallowed, and are in fact a common way of
building LLVM and Clang.
Revise the comment to match the logic underneath it: it checks for an
"in-source build", not an "in-tree build".
Reviewers: beanz
Reviewed By: beanz
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D38317
llvm-svn: 314348
FileOutputBuffer::create() attempts to remove a target file if the file
is a regular one, which results in an unexpected result in a failure
scenario.
If something goes wrong and the user of FileOutputBuffer decides to not
call commit(), it leaves nothing. An existing file is removed, and no
new file is created.
What we should do is to atomically replace an existing file with a new
file using rename(), so that it wouldn't remove an existing file without
creating a new one.
Differential Revision: https://reviews.llvm.org/D38283
llvm-svn: 314345
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.
This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.
This also improves several comments on the outliner's cost model.
https://reviews.llvm.org/D36721
llvm-svn: 314341
Summary:
According to https://gcc.gnu.org/wiki/SplitStacks, the linker expects a zero-sized .note.GNU-split-stack section if split-stack is used (and also .note.GNU-no-split-stack section if it also contains non-split-stack functions), so it can handle the cases where a split-stack function calls non-split-stack function.
This change adds the sections if needed.
Fixes PR #34670.
Reviewers: thanm, rnk, luqmana
Reviewed By: rnk
Subscribers: llvm-commits
Patch by Cherry Zhang <cherryyz@google.com>
Differential Revision: https://reviews.llvm.org/D38051
llvm-svn: 314335
We already have zeroable bits in an APInt. We might as well use that instead of checking for an all zero BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D37950
llvm-svn: 314332
In some cases the result psadbw is smaller than the type of the add that started the match. Currently in these cases we are using a smaller add and inserting the result.
If we instead combine the psadbw with zeros and use the full size add we can take advantage of implicit zeroing we get if we emit a narrower move before the add.
In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely.
Differential Revision: https://reviews.llvm.org/D37453
llvm-svn: 314331
reductions.
If both operands of the newly created SelectInst are Undefs the
resulting operation is also Undef, not SelectInst. It may cause crashes
when trying to propagate IR flags because function expects exactly
SelectInst instruction, nothing else.
llvm-svn: 314323
These changes faciliate positive behavior for arithmetic based select
expressions that match its translation criteria, keeping code size gated to
neutral or improved scenarios.
Patch by Michael Berg <michael_c_berg@apple.com>!
Differential Revision: https://reviews.llvm.org/D38263
llvm-svn: 314320
Summary:
Found when testing stage-2 build with D38101.
```
In file included from /build/llvm/lib/Support/Path.cpp:1045:
/build/llvm/lib/Support/Unix/Path.inc:648:14: error: comparison 'uint64_t' (aka 'unsigned long') > 18446744073709551615 is always false [-Werror,-Wtautological-constant-compare]
if (length > std::numeric_limits<size_t>::max()) {
~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
`size_t` is `uint64_t` here, apparently, thus any `uint64_t` value
always fits into `size_t`.
Initial patch was to use some preprocessor logic to
not check if the size is known to fit at compile time.
But Zachary Turner suggested using this approach.
Reviewers: Bigcheese, rafael, zturner, mehdi_amini
Reviewed by (via email): zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38132
llvm-svn: 314312
Before this change using any of the -name*= command line options with an output
directory would result in a single file (functions.txt/functions.html)
containing the coverage for those specific functions. Now you get the same
directory structure as when not using any -name*= options.
Differential Revision: https://reviews.llvm.org/D38280
llvm-svn: 314310
This was intended to be no-functional-change, but it's not - there's a test diff.
So I thought I should stop here and post it as-is to see if this looks like what was expected
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
Notes:
1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried
through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'.
The parameter isn't passed down, so we pick up the default value from the function signature
after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the
'SimplifyCFG' calls.
2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops.
This would theoretically allow us to differentiate the transforms controlled by those params
independently.
3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too.
I just stopped here to minimize the diffs.
4. Similarly, I stopped short of messing with the pass manager layer. I have another question that
could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG
set to true no matter where in the pipeline it's creating SimplifyCFG passes?
// Create an early function pass manager to cleanup the output of the
// frontend.
EarlyFPM.addPass(SimplifyCFGPass());
-->
/// \brief Construct a pass with the default thresholds
/// and switch optimizations.
SimplifyCFGPass::SimplifyCFGPass()
: BonusInstThreshold(UserBonusInstThreshold),
LateSimplifyCFG(true) {} <-- switches get converted to lookup tables and loops may not be in canonical form
If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG'
setting via recursion was masking this bug.
Differential Revision: https://reviews.llvm.org/D38138
llvm-svn: 314308
InlineCost can understand Select IR now. This patch finds free Select IRs and
continue the propagation of SimplifiedValues, ConstantOffsetPtrs, and
SROAArgValues.
Differential Revision: https://reviews.llvm.org/D37198
llvm-svn: 314307
NFC.
Updated 8 regression tests to use -mattr instead of -mcpu flag as follows:
-mcpu=knl --> -mattr=+avx512f
-mcpu=skx --> -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq
The updates are as part of the preparation of a large commit to add all instruction scheduling for the SKX target.
Reviewers: delena, zvi, RKSimon
Differential Revision: https://reviews.llvm.org/D38222
Change-Id: I2381c9b5bb75ecacfca017243c22d054f6eddd14
llvm-svn: 314306
This patch makes analyzeBranch eliminate unconditional branch to the next instruction.
After basic blocks are re-organized by optimizers, such as machine block placement, a BB may end with an unconditional branch to the next (fallthrough) BB. This patch removes such redundant branch instruction.
Differential Revision: https://reviews.llvm.org/D37730
llvm-svn: 314297
As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts
llvm-svn: 314293
I implemented isTruncateFree in rL313533, this patch fixes the logic
to match my comment, as the previous logic was too general. Now the
only truncates that are free are i64 -> i32.
Differential Revision: https://reviews.llvm.org/D38234
llvm-svn: 314280
This is necessary, but not sufficient, for having working SJLJ exception
handling on x86_64.
Differential Revision: https://reviews.llvm.org/D38254
llvm-svn: 314277
The callsite value is already stored indexed from 0 in
the _Unwind_Context struct. When accessed via the functions
_Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1,
but those functions handle the offseting. When reading directly
from the struct here, we shouldn't subtract 1.
This matches the code generated by the ARM target, where SJLJ
exception handling is used by default on iOS.
This makes clang-built object files for 32 bit x86 mingw work when
linked with libgcc/libstdc++.
Differential Revision: https://reviews.llvm.org/D38251
llvm-svn: 314276
This matches the types of the struct members defined in
lib/CodeGen/SjLjEHPrepare.cpp, and the definition of this struct in libgcc.
Differential Revision: https://reviews.llvm.org/D38248
llvm-svn: 314275
Summary:
A new FDR metadata record will support logging a function call argument;
appending multiple metadata records will represent a sequence of arguments
meaning that "holes" are not representable by the buffer format. Each
call argument is currently a 64-bit value (useful for "this" pointers and
synchronization objects).
If present, we put this argument to the function call "entry" record it
belongs to, and alter its type to notify the user of its presence.
Reviewers: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32840
llvm-svn: 314269
This patch tries to transform cases like:
for (unsigned i = 0; i < N; i += 2) {
bool c0 = (i & 0x1) == 0;
bool c1 = ((i + 1) & 0x1) == 1;
}
To
for (unsigned i = 0; i < N; i += 2) {
bool c0 = true;
bool c1 = true;
}
This commit also update test/Transforms/IndVarSimplify/replace-srem-by-urem.ll to prevent constant folding.
Differential Revision: https://reviews.llvm.org/D38272
llvm-svn: 314266
This change adds support for dynamic relocations (allocated
SHT_REL/SHT_RELA sections with a dynamic symbol table as their link).
I had to reland this because of a I wasn't initilizing some pointers.
llvm-svn: 314263
Summary:
Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow . This gives us a 32 bit multiply instead of an
emulated 64 bit multiply in the generated PTX assembly.
Reviewers: jlebar
Subscribers: jholewinski, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38265
llvm-svn: 314253
Summary:
In rare cases, loads that don't get prefetched that were marked as
strided loads could cause a crash if they occurred in a loop with other
colliding loads.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38261
llvm-svn: 314252
Summary:
This addresses a correctness bug for LD[1234]*_POST opcodes that have
the prefetcher fix applied to them: the base register was not being
written back from the temp after being incremented, so it would appear
to never be incremented.
Also, fix some opcode tag computations based on some updated HW details
to get better tag avoidance and thus better prefetcher performance.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38256
llvm-svn: 314251
This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix.
llvm-svn: 314248
Previously these were being included as both imports and
exports, with the import being satisfied by the export
(or some strong symbol) at runtime. However proved
unnecessary and actually complicated linking as it meant
there was not a 1-to-1 mapping between a wasm function
/global index and a linker symbol.
Differential Revision: https://reviews.llvm.org/D38246
llvm-svn: 314245
In the past while, I've committed a number of patches in the PowerPC back end
aimed at eliminating comparison instructions. However, this causes some failures
in proprietary source and these issues are not observed in SPEC or any open
source packages I've been able to run.
As a result, I'm pulling the entire series and will refactor it to:
- Have a single entry point for easy control
- Have fine-grained control over which patterns we transform
A side-effect of this is that test cases for these patches (and modified by
them) are XFAIL-ed. This is a temporary measure as it is counter-productive
to remove/modify these test cases and then have to modify them again when
the refactored patch is recommitted.
The failure will be investigated in parallel to the refactoring effort and
the recommit will either have a fix for it or will leave this transformation
off by default until the problem is resolved.
llvm-svn: 314244
This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) .
This patch is part two of two patches and it covers the store (interlevaed) side.
The patch goal is to optimize the following sequence:
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7
into
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7
Reviewers:
zvi
guyblank
dorit
Ayal
Differential Revision: https://reviews.llvm.org/D37117
Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
llvm-svn: 314234
If this transformation succeeds, we're going to remove our dependency on the shift by rewriting the and. So it doesn't matter how many uses the shift has.
This distributes the one use check to other transforms in foldICmpAndConstConst that do need it.
Differential Revision: https://reviews.llvm.org/D38206
llvm-svn: 314233
It is useful for the symbol to contain the index of the
function of global it represents in the function/global
index space.
For imports we also store the import index so that the
linker can find, for example, the signature of the
corresponding function, which is defined by the import
In the long run we need to decide whether this API
surface should be closer to binary (where imported
functions are seperate) or the wasm spec (where the
function index space is unified).
Differential Revision: https://reviews.llvm.org/D38189
llvm-svn: 314230
When dsymutil generates the companion file, its strips all unnecessary
sections by omitting their body and setting the offset in their
corresponding load command to zero.
One such section is the .eh_frame section, as it contains runtime
information rather than debug information and is part of the __TEXT
segment. When reading this section, we would just read the number of
bytes specified in the load command, starting from offset 0 (i.e. the
beginning of the file).
Rather than trying to parse this obviously invalid section, dwarfdump
now skips this.
Differential revision: https://reviews.llvm.org/D38135
llvm-svn: 314208
The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations.
Differential Revision: https://reviews.llvm.org/D37949
llvm-svn: 314207
This is a 2nd attempt at:
https://reviews.llvm.org/rL310055
...which was reverted at rL310123 because of PR34074:
https://bugs.llvm.org/show_bug.cgi?id=34074
In this version, we break out of the inner loop after we successfully merge and kill a pair of stores. In the
earlier rev, we were continuing instead, which meant we could process the invalid info from a now dead store.
Original commit message (authored by Filipe Cabecinhas):
This fixes PR31777.
If both stores' values are ConstantInt, we merge the two stores
(shifting the smaller store appropriately) and replace the earlier (and
larger) store with an updated constant.
In the future we should also support vectors of integers. And maybe
float/double if we can.
Differential Revision: https://reviews.llvm.org/D30703
llvm-svn: 314206
This patch adds logic to follow a symbol's aliases when the symbol name
cannot be found in the current object file. It checks the main binary
for the symbol's address and queries the current object for its aliases
(symbols with the same address) before printing out a warning.
Differential revision: https://reviews.llvm.org/D38230
llvm-svn: 314198
Removing X86 broadcast(f/i)32x2 intrinsics from llvm.
Adding autoUpgrade support.
Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll.
Differential Revision: https://reviews.llvm.org/D38220
llvm-svn: 314195
Usually the frontend communicates the size of wchar_t via metadata and
we can optimize wcslen (and possibly other calls in the future). In
cases without the wchar_size metadata we would previously try to guess
the correct size based on the target triple; however this is fragile to
keep up to date and may miss users manually changing the size via flags.
Better be safe and stop guessing and optimizing if the frontend didn't
communicate the size.
Differential Revision: https://reviews.llvm.org/D38106
llvm-svn: 314185
This was an oversight in the original backend data layout.
The AVR architecture does not have the concept of unaligned loads - all
loads/stores from all addresses are aligned to one byte.
Discovered in avr-rust issue #64https://github.com/avr-rust/rust/issues/64
Patch By Gergo Erdi.
llvm-svn: 314179
llvm-cov's report mode does not print any output when -show-functions is
specified and no source files are specified. This can be surprising, so
the tool should at least print out an error message when this happens.
rdar://problem/34636859
llvm-svn: 314175
Summary:
Sanitizer blacklist entries currently apply to all sanitizers--there
is no way to specify that an entry should only apply to a specific
sanitizer. This is important for Control Flow Integrity since there are
several different CFI modes that can be enabled at once. For maximum
security, CFI blacklist entries should be scoped to only the specific
CFI mode(s) that entry applies to.
Adding section headers to SpecialCaseLists allows users to specify more
information about list entries, like sanitizer names or other metadata,
like so:
[section1]
fun:*fun1*
[section2|section3]
fun:*fun23*
The section headers are regular expressions. For backwards compatbility,
blacklist entries entered before a section header are put into the '[*]'
section so that blacklists without sections retain the same behavior.
SpecialCaseList has been modified to also accept a section name when
matching against the blacklist. It has also been modified so the
follow-up change to clang can define a derived class that allows
matching sections by SectionMask instead of by string.
Reviewers: pcc, kcc, eugenis, vsk
Reviewed By: eugenis, vsk
Subscribers: vitalybuka, llvm-commits
Differential Revision: https://reviews.llvm.org/D37924
llvm-svn: 314170
It leads to some improvements, but also a regression for the simple
case, so it's not clearly a good idea.
test/CodeGen/ARM/vcvt.ll now has test coverage to show the difference.
Ultimately, the right solution is probably to custom-lower fp-to-int
conversions, to something like ARMISD::VCVT_F32_S32 plus a bitcast.
It's hard to do the right thing when the implicit bitcast isn't visible
to DAG transforms.
llvm-svn: 314169
R12 is used for the SwiftError parameter. It is no longer a CSR as it
is used for transfer the SwiftError, and the caller must preserve it if
they need to.
llvm-svn: 314165
All this optimization cares about is knowing how many low bits of LHS is known to be zero and whether that means that the result is 0 or greater than the RHS constant. It doesn't matter where the zeros in the low bits came from. So we don't need to specifically look for an AND. Instead we can use known bits.
Differential Revision: https://reviews.llvm.org/D38195
llvm-svn: 314153
As far as I know SUBREG_TO_REG is stating that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0.
I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seems to be some perturbance of register allocation.
Differential Revision: https://reviews.llvm.org/D38001
llvm-svn: 314152
This change refactors some of the code to allow for some code
deduplication in later diffs as well as just to make adding a new
section type more self contained to the class itself. The idea for this
was first mentioned by James in D 37915 and will be used in that change
as recommended.
This change follows changes for dynamic sections but precedes support
for dynamic relocations.
Differential Revision: https://reviews.llvm.org/D38008
llvm-svn: 314148
The 1st attempt at this:
https://reviews.llvm.org/rL314117
was reverted at:
https://reviews.llvm.org/rL314118
because of bot fails for clang tests that were checking optimized IR. That should be fixed with:
https://reviews.llvm.org/rL314144
...so try again.
Original commit message:
The transform to convert an extract-of-a-select-of-vectors was added at:
https://reviews.llvm.org/rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
llvm-svn: 314147
This teach simplifyDemandedBits to handle constant splat vector shifts.
This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount.
I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here.
I had to add new patterns to ARM because the zext/sext the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change.
Differential Revision: https://reviews.llvm.org/D37665
llvm-svn: 314139
Add two callbacks to MachineEvaluator, so that specific implementations
can specify more details about register classes:
- composeWithSubRegIndex(RC,Idx), to provide the register class for a
register from RC used in conjunction with a subregister index Idx.
- getPhysRegBitWidth(Reg), to provide the size in bits of the given
physical register.
llvm-svn: 314136
This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer.
This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends.
This shrinks the isel table from ~590k to ~531k. This is a roughly 10% reduction in size.
Differential Revision: https://reviews.llvm.org/D38217
llvm-svn: 314133
Since now SCEV can handle 'urem', an 'urem' is a better canonical form than an 'srem' because it has well-defined behavior
This is a follow up of D34598
Differential Revision: https://reviews.llvm.org/D38072
llvm-svn: 314125
The transform to convert an extract-of-a-select-of-vectors was added at:
rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
llvm-svn: 314117
Summary:
This code iterates the 'Orders' vector in parallel with the DbgValue
list, emitting all DBG_VALUEs that occurred between the last IR order
insertion point and the next insertion point. This assumes the
SDDbgValue list is sorted in IR order, which it usually is. However, it
is not sorted when a node with a debug value is replaced with another
one. When this happens, TransferDbgValues is called, and the new value
is added to the end of the list.
The problem can be solved by stably sorting the list by IR order.
Reviewers: aprantl, Ka-Ka
Reviewed By: aprantl
Subscribers: MatzeB, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D38197
llvm-svn: 314114
This patch expands the support of lowerInterleavedStore to 8x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2.
In overall, this patch is a specific fix for the pattern (Strid=4 VF=8) and we plan to include more patterns in the future.
The patch goal is to optimize the following sequence:
At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 holding
each 8 chars:
c0, c1, , c7
m0, m1, , m7
y0, y1, , y7
k0, k1, ., k7
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers
DavidKreitzer
Farhana
zvi
igorb
guyblank
RKSimon
Ayal
Differential Revision: https://reviews.llvm.org/D36058
Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
llvm-svn: 314109
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.
llvm-svn: 314106
Summary:
Right now there are two functions with the same name, one does the work
and the other one returns true if expansion is needed. Rename
TargetTransformInfo::expandMemCmp to make it more consistent with other
members of TargetTransformInfo.
Remove the unused Instruction* parameter.
Differential Revision: https://reviews.llvm.org/D38165
llvm-svn: 314096
This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.
llvm-svn: 314083
This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type.
Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases.
Differential Revision: https://reviews.llvm.org/D35320
llvm-svn: 314076
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.
llvm-svn: 314073
We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.
llvm-svn: 314072
This fixes the avr-rust issue (#75) with floating-point comparisons generating broken code.
By default, LLVM assumes these comparisons return 32-bit values, but ours are 8-bit.
Patch By Thomas Backman.
llvm-svn: 314070
The code wasn't yelling at the user when there's a reference
from a DIGlobalVariableExpression. Thanks to Adrian for the
reduced testcase. Fixes PR34672.
llvm-svn: 314069
This is a follow-up from D38181 (r314023). We have to put 64-bit
constants into a register using a separate instruction, so we
should try harder to avoid that.
From what I see, we're not likely to encounter this pattern in the
DAG because the upstream setcc combines from this don't (usually?)
produce this pattern. If we fix that, then this will become more
relevant. Since the cost of handling this case is just loosening
the predicate of the existing fold, we might as well do it now.
llvm-svn: 314064
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314062
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314060
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314055
Fixed suboptimal encoding of instruction memory operand when assembler is used to select 32 bit fixup rather than 8 bit immediate for encoding memory offset value.
Differential Revision: https://reviews.llvm.org/D38117
llvm-svn: 314044
This patch just adds the missing information to the P9 scheduling model to allow
the model to be marked as complete.
The model has been verified against P9 documentation. The model was verified
with utils/schedcover.py.
Differential Revision: https://reviews.llvm.org/D35695
llvm-svn: 314026
The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81.
There are also better improvements based on known-bits that allow us to eliminate the mask entirely.
As noted, this could be extended. There are potentially other wins from always shifting first, but doing
that reveals a tangle of problems in other pattern matching. We do this transform generically in
instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this
in the backend.
Differential Revision: https://reviews.llvm.org/D38181
llvm-svn: 314023
The result of the isSignBitCheck isn't used anywhere else and this allows us to share the m_APInt call in the likely case that it isn't a sign bit check.
llvm-svn: 314018
Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.
Reviewers: dberris, echristo
Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D38102
llvm-svn: 314005
Also add operator<< for use with raw_ostream to InfoByHwMode and its
derived classes.
Recommitting r313989 with the fix for unresolved references: explicitly
define the operator<< in namespace llvm.
llvm-svn: 314004
Usually an intrinsic is a simple target instruction, it should have a small latency. A real function call has much larger latency. So handle the intrinsic call in function getInstructionLatency().
Differential Revision: https://reviews.llvm.org/D38104
llvm-svn: 314003
If the two instructions being compared for equivalence have corresponding operands
that are integer constants, then check their values to determine equivalence.
Patch by Suyog Sarda!
llvm-svn: 313993
Summary:
A SCEV such as:
{%v2,+,((-1 * (trunc i64 (-1 * %v1) to i32)) + (-1 * (trunc i64 %v1 to i32)))}<%loop>
can be folded into, simply, {%v2,+,0}. However, the current code in ::getAddExpr()
will not try to apply the simplification m*trunc(x)+n*trunc(y) -> trunc(trunc(m)*x+trunc(n)*y)
because it only keys off having a non-multiplied trunc as the first term in the simplification.
This patch generalizes this code to try to do a more generic fold of these trunc
expressions.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37888
llvm-svn: 313988
Avoid unnecessary std::string creations during TypeSetByHwMode::writeToStream.
Found during investigations into PR28222
Differential Revision: https://reviews.llvm.org/D38174
llvm-svn: 313983
Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64].
One example of where it is useful is:
before (20 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ax
mov $0xffff,%cx
cmovne %ax,%cx
movzwl %cx,%eax
retq
after (18 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ecx
mov $0xffff,%eax
cmovne %ecx,%eax
retq
Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36711
llvm-svn: 313982
We've found a serious issue with the current implementation of loop predication.
The current implementation relies on SCEV and this turned out to be problematic.
To fix the problem we had to rework the pass substantially. We have had the
reworked implementation in our downstream tree for a while. This is the initial
patch of the series of changes to upstream the new implementation.
For now the transformation is limited to the following case:
* The loop has a single latch with either ult or slt icmp condition.
* The step of the IV used in the latch condition is 1.
* The IV of the latch condition is the same as the post increment IV of the guard condition.
* The guard condition is ult.
See the review or the LoopPredication.cpp header for the details about the
problem and the new implementation.
Reviewed By: sanjoy, mkazantsev
Differential Revision: https://reviews.llvm.org/D37569
llvm-svn: 313981
This patch re-commits the patch that was pulled out due to a
problem it caused, but with a fix for the problem. The fix
was reviewed separately by Eric Christopher and Hal Finkel.
Differential Revision: https://reviews.llvm.org/D38054
llvm-svn: 313978
For the following function:
double fn1(double d0, double d1, double d2) {
double a = -d0 - d1 * d2;
return a;
}
on ARM, LLVM generates code along the lines of
vneg.f64 d0, d0
vmls.f64 d0, d1, d2
i.e., a negate and a multiply-subtract.
The attached patch adds instruction selection patterns to allow it to generate the single instruction
vnmla.f64 d0, d1, d2
(multiply-add with negation) instead, like GCC does.
Committed on behalf of @gergo- (Gergö Barany)
Differential Revision: https://reviews.llvm.org/D35911
llvm-svn: 313972
Summary: Previously we would dereference Symtab without checking for null.
Reviewers: davide, atanasyan, rafael
Reviewed By: davide, atanasyan
Differential Revision: https://reviews.llvm.org/D38080
llvm-svn: 313970
This patch adds the -o and --out-file options for compatibility with
Darwin's dwarfdump.
Differential revision: https://reviews.llvm.org/D38125
llvm-svn: 313969
This patch adds instruction patterns for operations in BPF_ALU. After this,
assembler could recognize some 32-bit ALU statement. For example, those listed
int the unit test file.
Separate MOV patterns are unnecessary as MOV is ALU operation that could reuse
ALU encoding infrastructure, this patch removed those redundant patterns.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 313961
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 313960
Arithmetic and jump instructions, load and store instructions are sharing
the same 8-bit code field encoding,
A better instruction pattern implemention could be the following inheritance
relationships, and each layer only encoding those fields which start to
diverse from that layer. This avoids some redundant code.
InstBPF -> TYPE_ALU_JMP -> ALU/JMP
InstBPF -> TYPE_LD_ST -> Load/Store
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 313959
Currently, eBPF backend is using some constant directly in instruction patterns,
This patch replace them with mnemonics and removed some unnecessary temparary
variables.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 313958
The previous SwiftCC support for AAPCS64 was partially correct. It
setup swiftself parameters in the proper register but failed to setup
swifterror in the correct register. This would break compilation of
swift code for non-Darwin AAPCS64 conforming environments.
llvm-svn: 313956
There were two issues, one Python 3 specific related to Unicode,
and another which is that the tool substitution for lld no longer
rejected matches where a / preceded the tool name.
llvm-svn: 313928
This patch adds a pass that removes the computation of provably redundant
expressions that have been computed earlier in a previous iteration. It
relies on the use of PHIs to identify loop carried dependences.
This is scalar replacement for vector types.
llvm-svn: 313925
in the second slice of a Mach-O universal file.
The code in llvm-objdump in in DisassembleMachO() was getting the default
CPU then incorrectly setting into the global variable used for the -mcpu option
if that was not set. This caused a second call to DisassembleMachO() to use
the wrong default CPU when disassembling the next slice in a Mach-O universal
file. And would result in bad disassembly and an error message about an
recognized processor for the target:
% llvm-objdump -d -m -arch all fat.macho-armv7s-arm64
fat.macho-armv7s-arm64 (architecture armv7s):
(__TEXT,__text) section
armv7:
0: 60 47 bx r12
fat.macho-armv7s-arm64 (architecture arm64):
'cortex-a7' is not a recognized processor for this target (ignoring processor)
'cortex-a7' is not a recognized processor for this target (ignoring processor)
(__TEXT,__text) section
___multc3:
0: .long 0x1e620810
rdar://34439149
llvm-svn: 313921
debuginfo-tests has need to reuse a lot of common configuration
from clang and lld, and in general it seems like all of the
projects which are tightly coupled (e.g. lld, clang, llvm, lldb,
etc) can benefit from knowing about one other. For example,
lldb needs to know various things about how to run clang in its
test suite. Since there's a lot of common substitutions and
operations that need to be shared among projects, sinking this
up into LLVM makes sense.
In addition, this patch introduces a function add_tool_substitution
which handles all the dirty intricacies of matching tool names
which was previously copied around the various config files. This
is now a simple straightforward interface which is hard to mess
up.
Differential Revision: https://reviews.llvm.org/D37944
llvm-svn: 313919
This has gone back and forth, but it seems this is necessary
after all. realpath is not sufficient because if you have a
file named 'C:\foo.txt', then both realpath('c:\foo.txt') and
realpath(C:\foo.txt') return the string that was passed to them
exactly as is, meaning the case of the drive-letter won't match.
The problem before was not that we were normalizing the case of
items going into the config map, but rather that we were
normalizing the case of something we needed to print. The value
that is used to key on the config map should never be printed.
llvm-svn: 313918
Summary:
Avoid using XZR/WZR directly as operands to split stores of zero
vectors. Doing so can lead to the XZR/WZR being used by an instruction
that doesn't allow it (e.g. add).
Fixes bug 34674.
Reviewers: t.p.northover, efriedma, MatzeB
Subscribers: aemerson, rengolin, javed.absar, mcrosier, eraman, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38146
llvm-svn: 313916
The previous version of dumper implemented UTF-16 byte swap incorrectly
on big-endian machines. This now gets fixed.
Thanks to Bill Seurer for testing the patch locally.
Differential Review: https://reviews.llvm.org/D38150
llvm-svn: 313912
This patch adds dumping of line table instructions as well as the final
state at each specified pc value in verbose mode. This is essentially
the same as the default in Darwin's dwarfdump. Dumping the actual line
table opcodes can be particularly useful for something like debugging a
bad `.debug_line` section.
Differential revision: https://reviews.llvm.org/D37971
llvm-svn: 313910
At least for the 64-bit and less case, we should be able to determine if we even have a mask without counting any bits. This also removes the need to explicitly check for 0 active bits, isMask will return false for 0.
llvm-svn: 313908
The fix is to avoid invalidating our insertion point in
replaceDbgDeclare:
Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore);
+ if (DII == InsertBefore)
+ InsertBefore = &*std::next(InsertBefore->getIterator());
DII->eraseFromParent();
I had to write a unit tests for this instead of a lit test because the
use list order matters in order to trigger the bug.
The reduced C test case for this was:
void useit(int*);
static inline void inlineme() {
int x[2];
useit(x);
}
void f() {
inlineme();
inlineme();
}
llvm-svn: 313905
Summary:
SelectionDAGISel::LowerArguments is associating arguments
with frame indices (FuncInfo->setArgumentFrameIndex). That
information is later on used by EmitFuncArgumentDbgValue to
create DBG_VALUE instructions that denotes that a variable
can be found on the stack.
I discovered that for our (big endian) out-of-tree target
the association created by SelectionDAGISel::LowerArguments
sometimes is wrong. I've seen this happen when a 64-bit value
is passed on the stack. The argument will occupy two stack
slots (frame index X, and frame index X+1). The fault is
that a call to setArgumentFrameIndex is associating the
64-bit argument with frame index X+1. The effect is that the
debug information (DBG_VALUE) will point at the least significant
part of the arguement on the stack. When printing the
argument in a debugger I will get the wrong value.
I managed to create a test case for PowerPC that seems to
show the same kind of problem.
The bugfix will look at the datalayout, taking endianness into
account when examining a BUILD_PAIR node, assuming that the
least significant part is in the first operand of the BUILD_PAIR.
For big endian targets we should use the frame index from
the second operand, as the most significant part will be stored
at the lower address (using the highest frame index).
Reviewers: bogner, rnk, hfinkel, sdardis, aprantl
Reviewed By: aprantl
Subscribers: nemanjai, aprantl, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37740
llvm-svn: 313901
This makes all paths lowercase on Windows, which seemed like a
good idea at the time, but it means that tests can't properly
use FileCheck to match expected path names.
llvm-svn: 313889
Config map is not exposed through the command line, so testing this
is somewhat tricky. But basically we need a test that if a custom
driver builds a config map and passes it to main, it gets respected.
A config map allows config files in the source tree to be mapped
to alternate config files in the build tree. This particular test
works by having two config files in separate directories, and
setting up a config map to have that redirects A/lit.site.cfg
to B/altconfig. Then, we print a message in A/lit.site.cfg
and B/altconfig and check that we do see the output from B
but don't see the output from A. Additionally we test that
the test suite specified by A's config map is properly discovered.
Differential Revision: https://reviews.llvm.org/D38105
llvm-svn: 313887
This patch updates register allocation to enable spilling gprs to
volatile vector registers rather than the stack. It can be enabled
for Power9 with option -ppc-enable-gpr-to-vsr-spills.
Differential Revision: https://reviews.llvm.org/D34815
llvm-svn: 313886
This is a bit ugly because we can't put Optional into a union. Hide all
of that behind a set of accessors and make accesses safer using asserts.
llvm-svn: 313884
In case of using a "nested" relocation expressions like this
`%hi(%neg(%gp_rel()))`, N32 ABI requires generation of three consecutive
relocations. That differs from the N64 ABI case where all relocations
are packed into the single relocation record.
llvm-svn: 313879
Now we pass the 'Is64_' flag to the MCELFObjectTargetWriter ctor iif
when we make deal with N64 ABI. So it is redundant to pass additional
'IsN64' flag.
llvm-svn: 313878
More conversions to load-and-test can be made with this patch by adding a
forward search in optimizeCompareZero().
Review: Ulrich Weigand
https://reviews.llvm.org/D38076
llvm-svn: 313877
.. as well as the two subsequent changes r313826 and r313875.
This leads to segfaults in combination with ASAN. Will forward repro
instructions to the original author (rnk).
llvm-svn: 313876
Summary:
There already was code that tried to remove the dbg.declare, but that code
was placed after we had called
I->replaceAllUsesWith(UndefValue::get(I->getType()));
on the alloca, so when we searched for the relevant dbg.declare, we
couldn't find it.
Now we do the search before we call RAUW so there is a chance to find it.
An existing testcase needed update due to this. Two dbg.declare with undef
were removed and then suddenly one of the two CHECKS failed.
Before this patch we got
call void @llvm.dbg.declare(metadata i24* undef, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
call void @llvm.dbg.declare(metadata %struct.prog_src_register* undef, metadata !14, metadata !DIExpression()), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
and with it we get
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !15
call void @llvm.dbg.value(metadata i32 0, metadata !14, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 24)), !dbg !15
However, the CHECKs in the testcase checked things in a silly order, so
they only passed since they found things in the first dbg.declare. Now
we changed the order of the checks and the test passes.
Reviewers: rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37900
llvm-svn: 313875
The N32 ABI uses RELA relocation format, do not use 3-in-1 relocation's
encoding, and uses ELFCLASS32. This change passes the `IsN32` flag
to the `MCAsmBackend` to distinguish usage of N32 ABI.
We still do not handle some cases like providing the `-target-abi=o32`
command line option with the `mips64` target triple. That's why
elf_header.s contains some "FIXME" strings. This case will be fixed in
a separate patch.
Differential revision: https://reviews.llvm.org/D37960
llvm-svn: 313873
This patch prevents dsymutil from resolving a reference to a NULL DIE
when a bogus reference happens to be coincidentally referencing a NULL
DIE. Now this is detected as an invalid reference and a warning is
printed.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=33873
Differential revision: https://reviews.llvm.org/D38078
llvm-svn: 313872
This patch contains fix for reverted commit
rL312318 which was causing failure due to use
of unchecked dyn_cast to CIInit.
Patch by: Nikola Prica.
llvm-svn: 313870
Revert the patch causing the functional failures.
The patch owner is notified with test cases which fail.
Test case has been provided to Maxim offline.
llvm-svn: 313857
Summary:
This appears to break some bots, when getToolsPath fails to find some or
all of the tools (for example, an incomplete GnuWin32 installation).
Reviewers: zturner, modocache
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38115
llvm-svn: 313854
Passing "-dump" to llvm-cov will now print more detailed information
about function hash and counter mismatches. This should make it easier
to debug *.profdata files which contain incorrect records, and to debug
other scenarios where coverage goes missing due to mismatch issues.
llvm-svn: 313853
We can have a v_mac with an immediate src0.
We can still fold if it's an inline immediate,
otherwise it already uses the constant bus.
llvm-svn: 313852
Many editors and Python-related diagnostics tools such as
debuggers break or fail in mysterious ways when python files
don't end in .py. This is especially true on Windows, but
still exists on other platforms. I don't want to be too heavy
handed in changing everything across the board, but I do want
to at least *allow* lit configs to have .py extensions. This
patch makes the discovery process first look for a config file
with a .py extension, and if one is not found, then looks for
a config file using the old method. So for existing users, there
should be no functional change.
Differential Revision: https://reviews.llvm.org/D37838
llvm-svn: 313849
Summary:
This manifested itself in lld since it meant that weak
symbols were not appearing in archive symbol tables.
Subscribers: jfb, dschuff, jgravelle-google, aheejin
Differential Revision: https://reviews.llvm.org/D38111
llvm-svn: 313838
Summary:
The documentation refers to a boolean that controls whether response files are
handled, but this is incorrect. Since r165535, response files are always
enabled.
Reviewers: compnerd, rafael
Reviewed By: compnerd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38095
llvm-svn: 313830
I noticed this inefficiency while investigating PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
This fix will likely push another bug (we don't maintain state of 'LateSimplifyCFG')
into hiding, but I'll try to clean that up with a follow-up patch anyway.
llvm-svn: 313829
Summary:
This implements the design discussed on llvm-dev for better tracking of
variables that live in memory through optimizations:
http://lists.llvm.org/pipermail/llvm-dev/2017-September/117222.html
This is tracked as PR34136
llvm.dbg.addr is intended to be produced and used in almost precisely
the same way as llvm.dbg.declare is today, with the exception that it is
control-dependent. That means that dbg.addr should always have a
position in the instruction stream, and it will allow passes that
optimize memory operations on local variables to insert llvm.dbg.value
calls to reflect deleted stores. See SourceLevelDebugging.rst for more
details.
The main drawback to generating DBG_VALUE machine instrs is that they
usually cause LLVM to emit a location list for DW_AT_location. The next
step will be to teach DwarfDebug.cpp how to recognize more DBG_VALUE
ranges as not needing a location list, and possibly start setting
DW_AT_start_offset for variables whose lifetimes begin mid-scope.
Reviewers: aprantl, dblaikie, probinson
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37768
llvm-svn: 313825
This reverts commit SVN r313668. The original test case attempted to
write a pointer value into 16-bits, although the value may exceed the
range representable in 16-bits. Ensure that the symbol is located in
the address space such that its absolute address is representable in
16-bits. This should fix the assertion failure that was seen on the
Windows hosts.
llvm-svn: 313822
We already did (X & C2) > C1 --> (X & C2) != 0, if any bit set in (X & C2) will produce a result greater than C1. But there is an equivalent inverse condition with <= C1 (which will be canonicalized to < C1+1)
Differential Revision: https://reviews.llvm.org/D38065
llvm-svn: 313819
The Swift CC is identical to Win64 CC with the exception of swift error
being passed in r12 which is a CSR. However, since this calling
convention is only used in swift -> swift code, it does not impact
interoperability and can be treated entirely as Win64 CC. We would
previously incorrectly lower the frame setup as we did not treat the
frame as conforming to Win64 specifications.
llvm-svn: 313813
These write to the low and high half of the destination
register and leave the other 16-bits unchanged. This is true
for most 16-bit instructions on gfx9, but we don't use that
now.
llvm-svn: 313812
Summary: Resubmission of D37937. Fixed i386 target building (conversion from std::size_t& to uint64_t& failed). Fixed documentation warning failure about docs/CFIVerify.rst not being in the tree.
Reviewers: vlad.tsyrklevich
Reviewed By: vlad.tsyrklevich
Patch by Mitch Phillips
Subscribers: sbc100, mgorny, pcc, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D38089
llvm-svn: 313809
Also add some tests that should be able to use v_mad_mixhi_f16,
but do not yet. This is trickier because we don't really model
the partial update of the register done by 16-bit instructions.
llvm-svn: 313806
Summary: Resubmission of D37937. Fixed i386 target building (conversion from std::size_t& to uint64_t& failed). Fixed documentation warning failure about docs/CFIVerify.rst not being in the tree.
Reviewers: vlad.tsyrklevich
Reviewed By: vlad.tsyrklevich
Patch by Mitch Phillips
Subscribers: mgorny, pcc, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D38089
llvm-svn: 313798
Add adds support for naming data segments. This is useful
useful linkers so that they can merge similar sections.
Differential Revision: https://reviews.llvm.org/D37886
llvm-svn: 313795
Add support for passing SwiftError through a register on the Windows x64
calling convention. This allows the use of swifterror attributes on
parameters which is used by the swift front end for the `Error`
parameter. This partially enables building the swift standard library
for Windows x86_64.
llvm-svn: 313791
This enables readobj to output Windows resource files (.res). This way,
we'll be able to test .res outputs without comparing them byte-by-byte
with "magic binary files" generated by MS toolchain.
Differential Revision: https://reviews.llvm.org/D38058
llvm-svn: 313790
This patch renames K_MIPS64 to K_GNU64 as part of a change to add
support for writing archives with 64-bit indexes in the symbol table.
llvm-svn: 313787
After r313775, it's easier to maintain a parallel BitVector of spilled
locations indexed by location number.
I wasn't able to build a good reduced test case for this iteration of
the bug, but I added a more direct assertion that spilled values must
use frame index locations. If this bug reappears, it won't only fire on
the NEON vector code that we detected it on, but on medium-sized
integer-only programs as well.
llvm-svn: 313786
This changes some STL data types to corresponding LLVM
data types that have better performance characteristics.
Differential Revision: https://reviews.llvm.org/D37957
llvm-svn: 313783
This broke the buildbots, e.g.
http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask'
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Subscribers: mzolotukhin
>
> Reviewed By: ayal
>
> Differential Revision: https://reviews.llvm.org/D36130
>
> Review comments updated accordingly
>
> Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
>
> Added a TODO for sortLoadAccesses API
>
> Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
>
> Modified the TODO for sortLoadAccesses API
>
> Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
>
> Review comment update for using OpdNum to insert the mask in respective location
>
> Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
>
> Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
>
> Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313781
This patch implements the Darwin dwarfdump option --recurse-depth=<N>,
which limits the recursion depth when selectively printing DIEs at an
offset.
Differential Revision: https://reviews.llvm.org/D38064
llvm-svn: 313778
Summary:
The new code should be linear in the number of DBG_VALUEs, while the old
code was quadratic. NFC intended.
This is also hopefully a more direct expression of the problem, which is
to:
1. Rewrite all virtual register operands to stack slots or physical
registers
2. Uniquely number those machine operands, assigning them location
numbers
3. Rewrite all uses of the old location numbers in the interval map to
use the new location numbers
In r313400, I attempted to track which locations were spilled in a
parallel bitvector indexed by location number. My code was broken
because these location numbers are not stable during rewriting.
Reviewers: aprantl, hans
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D38068
llvm-svn: 313775
In these cases, two selects have constant selectable operands for
both the true and false components and have the same conditional
expression.
We then create two arithmetic operations of the same type and feed a
final select operation using the result of the true arithmetic for the true
operand and the result of the false arithmetic for the false operand and reuse
the original conditionl expression.
The arithmetic operations are naturally folded as a consequence, leaving
only the newly formed select to replace the old arithmetic operation.
Patch by: Michael Berg <michael_c_berg@apple.com>
Differential Revision: https://reviews.llvm.org/D37019
llvm-svn: 313774
I did not upload two binaries that I reference in tests.
This change adds support for sections involved in dynamic loading such
as SHT_DYNAMIC, SHT_DYNSYM, and allocated string tables.
The two added binaries used for tests can be downloaded here and here
Differential Revision: https://reviews.llvm.org/D36560
llvm-svn: 313772
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask'
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Subscribers: mzolotukhin
Reviewed By: ayal
Differential Revision: https://reviews.llvm.org/D36130
Review comments updated accordingly
Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0
Added a TODO for sortLoadAccesses API
Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58
Modified the TODO for sortLoadAccesses API
Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565
Review comment update for using OpdNum to insert the mask in respective location
Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce
Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase
Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b
llvm-svn: 313771
This adds an LLVM_ENABLE_IR_PGO option to enable building llvm and its
tools with IR PGO instrumentation.
Usage: -DLLVM_BUILD_INSTRUMENTED=On -DLLVM_ENABLE_IR_PGO=On (both
options must be enabled)
Differential Revision: https://reviews.llvm.org/D38066
llvm-svn: 313770
I overzealously landed this before I was sure that another change
wouldn't break the build that this change depends on.
This change adds support for sections involved in dynamic loading such
as SHT_DYNAMIC, SHT_DYNSYM, and allocated string tables.
The two added binaries used for tests can be downloaded here and here
Differential Revision: https://reviews.llvm.org/D36560
llvm-svn: 313767
Summary:
The fix for dead stripping analysis in the case of SamplePGO indirect
calls to local functions (r313151) introduced the possibility of an
infinite loop.
Make sure we check for the value being already live after we update it
for SamplePGO indirect call handling.
Reviewers: danielcdh
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D38086
llvm-svn: 313766
Bug pointed out by EricWF. This would construct a path where
items would be added in the wrong order, potentially leading
to using the wrong tools for testing.
llvm-svn: 313765
Despite a strong CMake warning that this is an unsupported
libcxx build configuration, some bots still rely on being
able to check out lit and libcxx independently with no
LLVM sources, and then run lit against libcxx.
A previous patch broke that workflow, so this is making it work
again. Unfortunately, it breaks generation of the llvm-lit
script for libcxx, but we will just have to live with that until
a solution is found that allows libcxx to make more use of
llvm build pieces. libcxx can still run tests by using the
ninja check target, or by running lit.py directly against the
build tree or source tree.
Differential Revision: https://reviews.llvm.org/D38057
llvm-svn: 313763
Remove unneeded attributes from test/DebugInfo/Generic/imported-name-inlined.ll because it was causing failures on pure MIPS builds.
Patch by Miloš Stojanović!
Differential Revision: https://reviews.llvm.org/D38079
llvm-svn: 313762
This version of the patch fixes an off-by-one error causing PR34596. We
do not need to use std::next(BlockIter) when calling updateDepths, as
BlockIter already points to the next element.
Original commit message:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and it's relevant successors/predecessors.
> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619
llvm-svn: 313751
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
Commit after rebase for patch D36130
Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab
llvm-svn: 313736
If these checks fail we end up not selecting an instruction at all. So we are already relying on the immediate being checked upstream of isel. So doing the check in isel is just bloat to the isel table. Interestingly, we didn't check on the AVX512 version of the instructions anyway.
llvm-svn: 313724
Also starts selecting global loads for constant address
in some cases. Some end up selecting to mubuf still, which
requires investigation.
We still get sub-optimal regalloc and extra waitcnts inserted
due to not really tracking the liveness of the separate register
halves.
llvm-svn: 313716
Summary:
With this change:
- Methods in LoopBase trip an assert if the receiver has been invalidated
- LoopBase::clear frees up the memory held the LoopBase instance
This change also shuffles things around as necessary to work with this stricter invariant.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38055
llvm-svn: 313708
Many svn-based buildbots seem to be getting stuck continually
in tree conflicts due to the output of pyc files. I'm disabling
these as a temporary measure in an attempt to get everything
stable again.
I'll try to remove this code once I understand the problem
better.
llvm-svn: 313698
This re-applies commit r313685, this time with the proper updates to
the test cases.
Original commit message:
Unreachable blocks in the machine instr representation are these
weird empty blocks with no successors.
The MIR printer used to not print empty lists of successors. However,
the MIR parser now treats non-printed list of successors as "please
guess it for me". As a result, the parser tries to guess the list of
successors and given the block is empty, just assumes it falls through
the next block (if any).
For instance, the following test case used to fail the verifier.
The MIR printer would print
entry
/ \
true (def) false (no list of successors)
|
split.true (use)
The MIR parser would understand this:
entry
/ \
true (def) false
| / <-- invalid edge
split.true (use)
Because of the invalid edge, we get the "def does not
dominate all uses" error.
The fix consists in printing empty successor lists, so that the parser
knows what to do for unreachable blocks.
rdar://problem/34022159
llvm-svn: 313696
Summary:
See comment for why I think this is a good idea.
This change also:
- Removes an SCEV test case. The SCEV test was not testing anything useful (most of it was `#if 0` ed out) and it would need to be updated to deal with a private ~Loop::Loop.
- Updates the loop pass manager test case to deal with a private ~Loop::Loop.
- Renames markAsRemoved to markAsErased to contrast with removeLoop, via the usual remove vs. erase idiom we already have for instructions and basic blocks.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37996
llvm-svn: 313695
Add adds support for naming data segments. This is useful
useful linkers so that they can merge similar sections.
Differential Revision: https://reviews.llvm.org/D37886
llvm-svn: 313692
In the lambda we are now returning the remark by value so we need to preserve
its type in the insertion operator. This requires making the insertion
operator generic.
I've also converted a few cases to use the new API. It seems to work pretty
well. See the LoopUnroller for a slightly more interesting case.
llvm-svn: 313691
Summary: Introduces the llvm-cfi-verify tool to llvm. Includes the design document (docs/CFIVerify.rst). Current implementation of the tool is simply a disassembler that identifies and prints the indirect control flow instructions.
Reviewers: vlad.tsyrklevich
Reviewed By: vlad.tsyrklevich
Patch by Mitch Phillips
Subscribers: llvm-commits, kcc, pcc, mgorny
Differential Revision: https://reviews.llvm.org/D37937
llvm-svn: 313688
Unreachable blocks in the machine instr representation are these
weird empty blocks with no successors.
The MIR printer used to not print empty lists of successors. However,
the MIR parser now treats non-printed list of successors as "please
guess it for me". As a result, the parser tries to guess the list of
successors and given the block is empty, just assumes it falls through
the next block (if any).
For instance, the following test case used to fail the verifier.
The MIR printer would print
entry
/ \
true (def) false (no list of successors)
|
split.true (use)
The MIR parser would understand this:
entry
/ \
true (def) false
| / <-- invalid edge
split.true (use)
Because of the invalid edge, we get the "def does not
dominate all uses" error.
The fix consists in printing empty successor lists, so that the parser
knows what to do for unreachable blocks.
rdar://problem/34022159
llvm-svn: 313685
I didn't initialize a pointer to be nullptr that I needed to.
This change adds support for nested and even overlapping segments. This means
that PT_PHDR, PT_GNU_RELRO, PT_TLS, and PT_DYNAMIC can be supported properly.
Differential Revision: https://reviews.llvm.org/D36558
llvm-svn: 313682
The ARM docs suggest in examples that the flags can have either case, and there
are applications in the wild that (libopencm3, for example) that expect to be
able to use the uppercase spelling.
https://reviews.llvm.org/D37953
llvm-svn: 313680
This reverts r313640, originally r313400, one more time for essentially
the same issue. My BitVector of spilled location numbers isn't working
because we coalesce identical DBG_VALUE locations as we rewrite them,
invalidating the location numbers used to index the BitVector.
llvm-svn: 313679