Currently the lowering for i16 image coordinates asserts on gfx10. I'm
somewhat confused by this though. The feature is missing from the
gfx10 feature lists, but the a16 bit appears to be present in the
manual for MIMG instructions.
This is another part of a problem noted in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024
The AVX2 code may use awkward 256-bit shuffles vs. the AVX code that gets split
into the expected 128-bit unpack instructions. We have to be selective in
matching the types where we try to do this though. Otherwise, we can end up
with more instructions (in the case of v8x32/v4x64).
Differential Revision: https://reviews.llvm.org/D72575
There was a change to trap1 instruction between v62 and v65. This
feature will allow the assembler/disassembler to handle different
variants depending on the CPU version.
Most X87 compare instructions write to the X87 status word, while the SSE (U)COMI compares write to rFLAGS. These are often handled very differently on CPUs (e.g. rFLAGS outputs typically involve a fpu2gpr transfer), and we shouldn't be grouping all these instructions behind a single class - so this patch splits off the SSE compares into a new WriteFComX class (and currently keeps the same behaviours). If there's a need to distinguish between X87 instructions more closely we can investigate that in the future, but as we don't handle any of the X87 side effects at the moment its unlikely to have any notable effect.
Introduce a method to walk through use-def chains to decide whether
it's possible to remove a given instruction and its users. These
instructions are then stored in a set until the end of the transform
when they're erased. This is now used to perform checks on the
iteration count (LoopDec chain), element count (VCTP chain) and the
possibly redundant iteration count.
As well as being able to remove chains of instructions, we know also
check that the sub feeding the vctp is producing the expected value.
Differential Revision: https://reviews.llvm.org/D71837
Blindly following unique-successors chain appeared to be a bad idea.
In a degenerate case when block jumps to itself that goes into endless loop.
Discovered this problem when playing with additional changes,
managed to reproduce it on existing LoopPredication code.
Fix by checking a "visited" set while iterating through unique successors.
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72908
Add DemandedElts handling to ISD::ANY_EXTEND and add missing ISD::ANY_EXTEND_VECTOR_INREG handling. Despite the lack of test changes this code IS being used - its just that the ANY_EXTEND ops are legalized later on (typically to ZERO_EXTEND equivalents) so we typically manage to combine later on.
GCC will accept any case for assembler directives.
For example ".abort" and ".ABORT" (even ".aBoRt")
are equivalent.
https://sourceware.org/binutils/docs/as/Pseudo-Ops.html#Pseudo-Ops
"The names are case insensitive for most targets,
and usually written in lower case."
Change llvm-mc to accept any case for generic directives
or aliases of those directives.
This for Bugzilla #39527.
Differential Revision: https://reviews.llvm.org/D72686
Summary:
Several SVE intrinsics with immediate arguments (including those
added by D70253 & D70437) do not use the ImmArg property.
This patch adds ImmArg<Op> where required and changes
the appropriate patterns which match the immediates.
Reviewers: efriedma, sdesmalen, andwar, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72612
This include file was created in October and has a "using namespace llvm". This seems to get exposed to other include files and finally onto cpp files. While this somewhat okay for llvm itself, its bad for other projects that use llvm as a library and includes a header file that picks this up. This was found by ISPC which has some class names at gloal scope with the same names as LLVM.
It looks like RISCV accidentally became dependent on this. I fixed it by reordering some includes in the RISCV code, but maybe we want to change the TableGenEmitter to put "namespace llvm {" in the generated file instead? But we probably want to do the simplest thing first so we can merge it to 10.0.
Differential Revision: https://reviews.llvm.org/D72895
This is (more?) usable by GDB pretty printers and seems nicer to write.
There's one tricky caveat that in C++14 (LLVM's codebase today) the
static constexpr member declaration is not a definition - so odr use of
this constant requires an out of line definition, which won't be
provided (that'd make all these trait classes more annoyidng/expensive
to maintain). But the use of this constant in the library implementation
is/should always be in a non-odr context - only two unit tests needed to
be touched to cope with this/avoid odr using these constants.
Based on/expanded from D72590 by Christian Sigg.
Given the following situation:
x = G_FCONSTANT (something that can't be materialized)
G_STORE x, some_addr
We know that x must be materialized as at least a single mov. However, at the
time of selection, the G_STORE will have been regbankselected to a FPR store.
So, as a result, you'll get an unnecessary fmov into the G_STORE.
Storing a constant value in a GPR and a constant value in a FPR are the same.
So, whenever you see a G_FCONSTANT that feeds into only G_STORES, so might as
well make it a G_CONSTANT.
This adds a target-specific combine which changes G_FCONSTANTs feeding into
G_STOREs into G_CONSTANTs.
Differential Revision: https://reviews.llvm.org/D72814
This case can be handled as a regular selection pattern, so move it
out of the weird post-isel folding code which doesn't have an exactly
equivalent place in GlobalISel.
I think it doesn't make much sense to do this optimization here
though, and it would be more useful in instcombine. There's not really
any new information that will be gained during lowering since these
inputs were known from the beginning.
This change has 2 components:
Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded. That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptr.
WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run. Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase
The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.
Differential Revision: https://reviews.llvm.org/D71681
We lifted this code from InstCombine for general usage in:
rL369842
...but it's not safe as-is. There are no existing users that can
trigger this bug, but I discovered it via crashing several
regression tests when trying to use it for select folding in
InstSimplify.
ICmp requires (vector) integer types, so give up on anything that's
not integer or FP (pointers and ?) then bitcast the constants
before trying the match. That matches the definition of "equal or
undef" that I was looking for. If someone wants an FP-aware version
of equality (deal with NaN, -0.0), that could be a different mode
or different function.
Differential Revision: https://reviews.llvm.org/D72784
Introduce parsing, add a few instances of parameter use into GVN-PRE tests.
Reviewers: skatkov, asbirlea
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72752
This was assuming the narrow target was the source type. Respect the
requested type when these don't match by using intermediate
merges. This avoids producing very wide, illegal shift expansions.
The algorithm here only works if the sint_to_fp doesn't do any
rounding. Otherwise it can round before the offset fixup is
applied. Add an assert to protect this.
To avoid breaking the one test in tree that tested this code
with a set of types that fail the assert, I've enabled i32->f32
to use the i64->f32 algorithm. This only occurs when f64 isn't
a legal type. If f64 is legal then we do i32->f64->f32 instead.
Differential Revision: https://reviews.llvm.org/D72794
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
This reverts commit 3f3017e because there's a failure on peel-loop-nests.ll
with LLVM_ENABLE_EXPENSIVE_CHECKS on.
Differential Revision: https://reviews.llvm.org/D70304
Summary:
The `llc` tool currently defaults to Static relocation model and generates non-relocatable code for 32-bit Power.
This is not desirable on AIX where we always generate Position Independent Code (PIC). This patch makes PIC the default relocation model for AIX.
Reviewers: daltenty, hubert.reinterpretcast, DiggerLin, Xiangling_L, sfertile
Reviewed By: hubert.reinterpretcast
Subscribers: mgorny, wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72479
There are a few global (cl::opt) controls that enable optional
behavior in GVN. Introduce GVNOptions that provide corresponding
per-pass instance controls.
That will allow to use GVN multiple times in pipeline each time
with different settings.
Reviewers: asbirlea, rnk, reames, skatkov, fhahn
Reviewed By: fhahn
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72732
Summary:
The old pass manager separated speed optimization and size optimization
levels into two unsigned values. Coallescing both in an enum in the new
pass manager may lead to unintentional casts and comparisons.
In particular, taking a look at how the loop unroll passes were constructed
previously, the Os/Oz are now (==new pass manager) treated just like O3,
likely unintentionally.
This change disallows raw comparisons between optimization levels, to
avoid such unintended effects. As an effect, the O{s|z} behavior changes
for loop unrolling and loop unroll and jam, matching O2 rather than O3.
The change also parameterizes the threshold values used for loop
unrolling, primarily to aid testing.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: zzheng, ychen, mehdi_amini, hiraditya, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72547
Recommitting e93e0d413f after reverting due to test failures, which
will hopefully now be fixed. Original commit message:
After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).
Differential Revision: https://reviews.llvm.org/D72131
Summary:
This commits is a rework of the patch in
https://reviews.llvm.org/D67572.
The rework was requested to prevent out-of-tree performance regression
when vectorizing out-of-tree IR intrinsics. The vectorization of such
intrinsics is enquired via the static function `isTLIScalarize`. For
detail see the discussion in https://reviews.llvm.org/D67572.
Reviewers: uabelho, fhahn, sdesmalen
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72734
Testing compiler-rt, a new assertion failure occurs when building
the GwpAsanTestObjects object. I'm uploading a reproducer to D70597.
This reverts commit 75188b01e9.
If there are DBG_VALUEs between phi and label (after phi and before label),
DBG_VALUE will block PHI lowering after the LABEL. Moving all DBG_VALUEs
after Labels in the function ScheduleDAGSDNodes::EmitSchedule to avoid
impacting PHI lowering.
before:
PHI
DBG_VALUE
LABEL
after: (move DBG_VALUE after label)
PHI
LABEL
DBG_VALUE
then: (phi lowering after label)
LABEL
COPY
DBG_VALUE
Fixes the issue: https://bugs.llvm.org/show_bug.cgi?id=43859
Differential Revision: https://reviews.llvm.org/D70597
The assume intrinsic is intentionally marked as may reading/writing
memory, to avoid passes moving them around. When flattening the CFG
for predicated blocks, we have to drop the assume calls, as they
are control-flow dependent.
There are some cases where we can do better (when control flow is
preserved), but that is follow-up work.
Fixes PR43620.
Reviewers: hsaito, rengolin, dcaballe, Ayal
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D68814
Summary:
This change implements the expansion in two parts:
- Add a utility function emitAMDGPUPrintfCall() in LLVM.
- Invoke the above function from Clang CodeGen, when processing a HIP
program for the AMDGPU target.
The printf expansion has undefined behaviour if the format string is
not a compile-time constant. As a sufficient condition, the HIP
ToolChain now emits -Werror=format-nonliteral.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D71365
Summary: Support for i64 arguments (in register), return values and constants along with tests.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D72776
This was moved in October 2018, but we don't appear to be using
this for vectors on any in tree target.
Moving it back simplifies D72794 so we can share the code for i32->f32.
float-point exception.
This patch also modify some mayRaiseFPException flag which set in D68854.
Differential Revision: https://reviews.llvm.org/D72750
This will provide a more consistent view to codegen for these
attributes. The current system is somewhat awkward, and the fields in
TargetOptions are reset based on the command line flag if the
attribute isn't set. By forcing these attributes with the flag, there
can never be an inconsistency in the behavior if code directly
inspects the attribute on the function without considering the command
line flags.
The code is trying to copy the i64 value to an xmm register to
use a 64-bit store so that the 64-bit fild can benefit from
store forwarding.
But this trick only works if f64 is going to be stored in an
XMM register. If we only have SSE1 then only float is in xmm
register. So this trick just causes 2 stores i32 stores, an f64
load into the x87, an f64 from x87, and a 64-bit fild. So we end
up with an extra stack temporary and still didn't get store forwarding.
We might be able to use v2f32 here instead, but I didn't check. I
just wanted the code to make sense.
Found by inspection as I continue to stare too hard at our
int_to_fp conversions.
Summary:
This patch could be treated as a rebase of D33960. It also fixes PR35547.
A fix for `llvm/test/Other/close-stderr.ll` is proposed in D68164. Seems
the consensus is that the test is passing by chance and I'm not
sure how important it is for us. So it is removed like in D33960 for now.
The rest of the test fixes are just adding `--crash` flag to `not` tool.
** The reason it fixes PR35547 is
`exit` does cleanup including calling class destructor whereas `abort`
does not do any cleanup. In multithreading environment such as ThinLTO or JIT,
threads may share states which mostly are ManagedStatic<>. If faulting thread
tearing down a class when another thread is using it, there are chances of
memory corruption. This is bad 1. It will stop error reporting like pretty
stack printer; 2. The memory corruption is distracting and nondeterministic in
terms of error message, and corruption type (depending one the timing, it
could be double free, heap free after use, etc.).
Reviewers: rnk, chandlerc, zturner, sepavloff, MaskRay, espindola
Reviewed By: rnk, MaskRay
Subscribers: wuzish, jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, arichardson, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, lenary, s.egerton, pzheng, cfe-commits, MaskRay, filcab, davide, MatzeB, mehdi_amini, hiraditya, steven_wu, dexonsmith, rupprecht, seiya, llvm-commits
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D67847
When tail duplication estimates a size of tail it uses instruction
count. Account for a number of instrictions in a bundle too.
Differential Revision: https://reviews.llvm.org/D72783
After extracting, fix up debug info in both the old and new functions by
1) Pointing line locations and debug intrinsics to the new subprogram
scope, and
2) Deleting intrinsics which point to values outside of the new
function.
Depends on https://reviews.llvm.org/D72795.
Testing: check-llvm, check-clang, a build of LNT in the `-Os -g` config
with "-mllvm -hot-cold-split=1" set, and end-to-end debugging of a toy
program which undergoes splitting to verify that lldb can find
variables, single step, etc. in extracted code.
rdar://45507940
Differential Revision: https://reviews.llvm.org/D72801
This does produce slightly different code. Now a unique IMPLICIT_DEF
is emitted for each of the implicit_def operands, rather than reusing
the same one.
I'm mildly worried about potentially reordering exp/exp_done with
IntrWriteMem on the intrinsic.
Requires hacking out the illegal type on SI, so manually select that
case during lowering.
This now develops the same problem G_ZEXT/G_ANYEXT have where the
requested type is assumed to be the source type. This will be fixed
separately by creating intermediate merges.
Summary:
In July 21 2010 `llvm::NamedMDNode` was refactored such that it would no
longer subclass `llvm::Value`:
https://github.com/llvm/llvm-project/commit/2637cc1a38d7336ea30caf
As part of this change, a map type from metadata names to their named
metadata, `llvm::MDSymbolTable`, was deleted. In its place, the type
of member `llvm::Module::NamedMDSymTab` was changed, from
`llvm::MDSymbolTable` to `void *`. The underlying memory allocations
for this pointer were changed to `new StringMap<NamedMDNode *>()`.
However, as far as I can tell, there's no need for obscuring the
underlying type being pointed to by the `void *`, and no need for
static casts from `void *` to `StringMap`. In fact, I don't think
there's a need for explicit calls to `new` and `delete` at all.
This commit changes `NamedMDSymTab` from a pointer to a reference, which
automatically couples its lifetime with the lifetime of its owning
`llvm::Module` instance, thus removing the explicit calls to `new` and
`delete` in the `llvm::Module` constructor and destructor. It also
changes the type from `void *` to a newly defined `NamedMDSymTabType`,
and removes the static casts.
Test Plan:
An ASAN-enabled build and run of `check-all` succeeds with this change
(aside from some tests that always fail for me in ASAN for some reason,
such as `check-clang` `SemaTemplate/stack-exhaustion.cpp`).
Reviewers: aprantl, dblaikie, chandlerc, pcc, echristo
Reviewed By: dblaikie
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72812
It appears to be rather useful when analyzing Loops with multiple
deoptimizing exits, perhaps merged ones.
For now it is used in LoopPredication, will be adding more uses
in other loop passes.
Reviewers: asbirlea, fhahn, skatkov, spatel, reames
Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72754
Summary:
InlineResult is used both in APIs assessing whether a call site is
inlinable (e.g. llvm::isInlineViable) as well as in the function
inlining utility (llvm::InlineFunction). It means slightly different
things (can/should inlining happen, vs did it happen), and the
implicit casting may introduce ambiguity (casting from 'false' in
InlineFunction will default a message about hight costs,
which is incorrect here).
The change renames the type to a more generic name, and disables
implicit constructors.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72744
Summary: Duplicate code in widenWithVariantLoadUseCodegen is removed and also use assert to check unknown extension type as it should be filtered out by the pre condition check before calling this function.
Reviewers: az, sanjoy, sebpop, efriedma, javed.absar, sanjoy.google
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits, amehsan
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72652
Factor out the logic needed to update debug locations contained within
MD_loop metadata.
This refactor is preparation for a future change that also needs to
rewrite MD_loop metadata.
rdar://45507940
This reverts D53469, which changed llvm's DWARF emission to emit
DW_AT_call_return_pc as a function-local offset. Such an encoding is not
compatible with post-link block re-ordering tools and isn't standards-
compliant.
In addition to reverting back to the original DW_AT_call_return_pc
encoding, teach lldb how to fix up DW_AT_call_return_pc when the address
comes from an object file pointed-to by a debug map. While doing this I
noticed that lldb's support for tail calls that cross a DSO/object file
boundary wasn't covered, so I added tests for that. This latter case
exercises the newly added return PC fixup.
The dsymutil changes in this patch were originally included in D49887:
the associated test should be sufficient to test DW_AT_call_return_pc
encoding purely on the llvm side.
Differential Revision: https://reviews.llvm.org/D72489
Based on Don Hinton's patch in https://reviews.llvm.org/D72406. This feature
was accidentally left out of e9e26c01cd, and
would have pessimized concurrent compilation in the default case.
Thanks for spotting this Don!
This patch imports constant variables even when they can't be internalized
(which results in promotion). This offers some extra constant folding
opportunities.
Differential revision: https://reviews.llvm.org/D70404
Summary:
Current peeling implementation bails out in case of loop nests.
The patch introduces a field in TargetTransformInfo structure that
certain targets can use to relax the constraints if it's
profitable (disabled by default).
Also additional option is added to enable peeling manually for
experimenting and testing purposes.
Reviewers: fhahn, lebedev.ri, xbolva00
Reviewed By: xbolva00
Subscribers: xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D70304
As discussed in the motivating PR44509:
https://bugs.llvm.org/show_bug.cgi?id=44509
...we can end up with worse code using fast-math than without.
This is because the reassociate pass greedily transforms fsub
into fneg/fadd and apparently (based on the regression tests
seen here) expects instcombine to clean that up if it wasn't
profitable. But we were missing this fold:
(X - Y) - Z --> X - (Y + Z)
There's another, more specific case that I think we should
handle as shown in the "fake" fneg test (but missed with a real
fneg), but that's another patch. That may be tricky to get
right without conflicting with existing transforms for fneg.
Differential Revision: https://reviews.llvm.org/D72521
This patch makes the target triple available via the LLJIT interface, and moves
the IRTransformLayer from LLLazyJIT down into LLJIT. Together these changes make
it easier to use the lazyReexports utility with LLJIT, and to apply IR
transforms to code as it is compiled in LLJIT (rather than requiring transforms
to be applied manually before code is added). An code example is added in
llvm/examples/LLJITExamples/LLJITWithLazyReexports
A bug in the existing implementation meant that lazyReexports would not work if
the aliased name differed from the alias's name, i.e. all lazy reexports had to
be of the form (lib1, name) -> (lib2, name). This patch fixes the issue by
capturing the alias's name in the NotifyResolved callback. To simplify this
capture, and the LazyCallThroughManager code in general, the NotifyResolved
callback is updated to use llvm::unique_function rather than a custom class.
No test case yet: This can only be tested at runtime, and the only in-tree
client (lli) always uses aliases with matching names. I will add a new LLJIT
example shortly that will directly test the lazyReexports API and the
non-trivial alias use case.
Summary:
This patch implements `formatv()` formatting for `dwarf::LineNumberOps`
and makes use of it for the `llvm-dwarfdump --debug-line` dump.
Previously, unknown line number standard opcodes would lead to undefined
behaviour. The code would attempt to format the data pointer of an empty
`StringRef` (a null pointer) using `%s`. According to the description
for `format()`, use of that interface carries the "risk of `printf`".
Passing a null pointer in place of an array to a C library function
results in undefined behaviour.
Reviewers: jhenderson, daltenty, stevewan
Reviewed By: jhenderson
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72369
Bitcast only really applies between scalars and vectors. Implement as
an unmerge and remerge. The test needs to tolerate failure since one
of the unmerges currently fails to legalize.
The 16 bank LDS case is complicated due to using multiple
instructions. If I attempt to write a pattern for it, the generated
selector incorrectly places the copy to m0 after the first
instruction, so that needs to be separately addressed.
Also fix not gluing the copy to m0 to the second operation in the
second half of the 16 bank lowering.
These intrinsics and the corresponding ISD nodes were recently added. PPC has
instructions that do this for vectors. Legalize them and add patterns to emit
the satuarting instructions.
Differential revision: https://reviews.llvm.org/D71940
These intrinsics expand to a variable number of instructions so just like in
ISelLowering.cpp we use custom code to deal with them.
Committing Tim's original patch.
Differential Revision: https://reviews.llvm.org/D65656
----
Breaks EXPENSIVE_CHECKS builds.
if users don't specific -mattr, the default target-feature come
from IR attribute.
Reviewers: lenary, asb
Reviewed By: lenary, asb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70837
Note: this is a reland with a trivial 2 lines fix in ELFState<ELFT>::writeSectionContent.
It adds a check similar to ones we already have for other sections to fix the case revealed
by bots, like http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/60744.
The encoded sequence of Elf*_Relr entries in a SHT_RELR section looks
like [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
i.e. start with an address, followed by any number of bitmaps. The address
entry encodes 1 relocation. The subsequent bitmap entries encode up to 63(31)
relocations each, at subsequent offsets following the last address entry.
More information is here:
https://github.com/llvm-mirror/llvm/blob/master/lib/Object/ELF.cpp#L272
This patch adds a support for these sections.
Differential revision: https://reviews.llvm.org/D71872
The encoded sequence of Elf*_Relr entries in a SHT_RELR section looks
like [ AAAAAAAA BBBBBBB1 BBBBBBB1 ... AAAAAAAA BBBBBB1 ... ]
i.e. start with an address, followed by any number of bitmaps. The address
entry encodes 1 relocation. The subsequent bitmap entries encode up to 63(31)
relocations each, at subsequent offsets following the last address entry.
More information is here:
https://github.com/llvm-mirror/llvm/blob/master/lib/Object/ELF.cpp#L272
This patch adds a support for these sections.
Differential revision: https://reviews.llvm.org/D71872
The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a sideeffect and the skip branch is
really necessary.
This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.
Differential revision: https://reviews.llvm.org/D68092
Summary:
This patch implements minimal VE code generation for empty function bodies (no args, no value return).
Contents
* empty function code generation test.
* Minimal function prologue & epilogue emission
* Instruction formats and instruction definitions as far as required for the empty function prologue & epilogue.
* I64 register class definitions.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D72598
We were performing an emulated i32->f64 in the SSE registers, then
storing that value to memory and doing a extload into the X87
domain.
After this patch we'll now just store the i32 to memory along
with an i32 0. Then do a 64-bit FILD to f80 completely in the X87
unit. This matches what we do without SSE.
Summary:
This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point.
One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument).
The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty).
Currently, AAValueConstantRange is created in `getAssumedConstant` method when `AAValueSimplify` returns `nullptr`(worst state).
Supported
- BinaryOperator(add, sub, ...)
- CmpInst(icmp eq, ...)
- !range metadata
`AAValueConstantRange` is not intended to extend to polyhedral range value analysis.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: phosek, davezarzycki, baziotis, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71620
All the callers of this function will be ScheduleDAGMI from the
MachineScheduler. This allows us to use the extra info available in
ScheduleDAGMI without resorting to awkward casts.
Summary:
As currently written, -target powerpcspe will enable SPE regardless of
disabling the feature later on in the command line. Instead, change
this to just set a default CPU to 'e500' instead of a generic CPU.
As part of this, add FeatureSPE to the e500 definition.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D72673
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
This flag was originally part of D70157, but was removed as we carved away pieces of the review. Since we have the nop support checked in, and it appears mature(*), I think it's time to add the master flag. For now, it will default to nop padding, but once the prefix padding support lands, we'll update the defaults.
(*) I can now confirm that downstream testing of the changes which have landed to date - nop padding and compiler support for suppressions - is passing all of the functional testing we've thrown at it. There might still be something lurking, but we've gotten enough coverage to be confident of the basic approach.
Note that the new flag can be used either when assembling an .s file, or when using the integrated assembler directly from the compiler. The later will use all of the suppression mechanism and should always generate correct code. We don't yet have assembly syntax for the suppressions, so passing this directly to the assembler w/a raw .s file may result in broken code. Use at your own risk.
Also note that this isn't the wiring for the clang option. I think the most recent review for that is D72227, but I've lost track, so that might be off.
Differential Revision: https://reviews.llvm.org/D72738
Pass small FP values in GPRs or stack memory according the the normal
convention. This is what gcc -mno-sse does on Win64.
I adjusted the conditions under which we emit an error to check if the
argument or return value would be passed in an XMM register when SSE is
disabled. This has a side effect of no longer emitting an error for FP
arguments marked 'inreg' when targetting x86 with SSE disabled. Our
calling convention logic was already assigning it to FP0/FP1, and then
we emitted this error. That seems unnecessary, we can ignore 'inreg' and
compile it without SSE.
Reviewers: jyknight, aemerson
Differential Revision: https://reviews.llvm.org/D70465
This allows us to generate better code for selecting the fixup
to load.
Previously when the sign was set we had to load offset 0. And
when it was clear we had to load offset 4. This required a testl,
setns, zero extend, and finally a mul by 4. By switching the offsets
we can just shift the sign bit into the lsb and multiply it by 4.
Summary:
- `dead-mi-elimination` assumes MIR in the SSA form and cannot be
arranged after phi elimination or DeSSA. It's enhanced to handle the
dead register definition by skipping use check on it. Once a register
def is `dead`, all its uses, if any, should be `undef`.
- Re-arrange the DIE in RA phase for AMDGPU by placing it directly after
`detect-dead-lanes`.
- Many relevant tests are refined due to different register assignment.
Reviewers: rampitec, qcolombet, sunfish
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72709
These intrinsics expand to a variable number of instructions so just like in
ISelLowering.cpp we use custom code to deal with them.
Committing Tim's original patch.
Differential Revision: https://reviews.llvm.org/D65656
This code is untested in tree because the "APFloat::semanticsPrecision(sem) >= SrcVT.getSizeInBits() - 1" check is false for most combinations for int and fp types except maybe i32 and f64. For that you would need i32 to be an illegal type, but f64 to be legal and have custom handling for legalizing the split sint_to_fp. The precision check itself was added in 2010 to fix a double rounding issue in the algorithm that would occur if the sint_to_fp was not able to do the conversion without rounding.
Differential Revision: https://reviews.llvm.org/D72728
When multiple guard intrinsics are merged into one, currently the
result of eraseInstFromFunction() is returned -- however, this
should only be done if the current instruction is being removed.
In this case we're removing a different instruction and should
instead report that the current one has been modified by returning it.
For this test case, this reduces the number of instcombine iterations
from 5 to 2 (the minimum possible).
Differential Revision: https://reviews.llvm.org/D72558
This ports the MergeFunctions pass to the NewPM. This was rather
straightforward, as no analyses are used.
Additionally MergeFunctions needs to be conditionally enabled in
the PassBuilder, but I left that part out of this patch.
Differential Revision: https://reviews.llvm.org/D72537
This fixes the issue encountered in D71164. Instead of using a
range-based for, manually iterate over the users and advance the
iterator beforehand, so we do not skip any users due to iterator
invalidation.
Differential Revision: https://reviews.llvm.org/D72657
Summary:
Mem op clustering adds a weak edge in the DAG between two loads or
stores that should be clustered, but the direction of this edge is
pretty arbitrary (it depends on the sort order of MemOpInfo, which
represents the operands of a load or store). This often means that two
loads or stores will get reordered even if they would naturally have
been scheduled together anyway, which leads to test case churn and goes
against the scheduler's "do no harm" philosophy.
The fix makes sure that the direction of the edge always matches the
original code order of the instructions.
Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72706
Enabling shrink wrapping requires ensuring the insertion point of the
epilogue is correct for MBBs without a terminator, in which case the
instruction to adjust the stack pointer is the last instruction in the
block.
Differential Revision: https://reviews.llvm.org/D62190
Summary:
An assert added to the index-based WPD was trying to verify that we only
have multiple vtables for a given guid when they are all non-external
linkage. This is too conservative because we may have multiple external
vtable with the same guid when they are in comdat. Remove the assert,
as we don't have comdat information in the index, the linker should
issue an error in this case.
See discussion on D71040 for more information.
Reviewers: evgeny777, aganea
Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72648
By directly emitting the constants as a constant pool load we seem to avoid the build_vector/extract_subvector combines that resulted in the duplicate loads we had before.
Differential Revision: https://reviews.llvm.org/D72307
SUMMARY:
In this patch we put the global variable in a Csect which's SectionKind is "ReadOnlyWithRel" into Data Section.
Reviewers: hubert.reinterpretcast,jasonliu,Xiangling_L
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D72461
Summary:
If aligment on `LoadInst` isn't specified, load is assumed to be ABI-aligned.
And said aligment may be different for different types.
So if we change load type, but don't pay extra attention to the aligment
(i.e. keep it unspecified), we may either overpromise (if the default aligment
of the new type is higher), or underpromise (if the default aligment
of the new type is smaller).
Thus, if no alignment is specified, we need to manually preserve the implied ABI alignment.
This addresses https://bugs.llvm.org/show_bug.cgi?id=44543 by making combineLoadToNewType preserve ABI alignment of the load.
Reviewers: spatel, lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72710
There's only one user of this API currently, and it seems
impossible that it would compare values with different types.
But that's not true in general, so we need to make sure the
types are the same.
As denoted by the FIXME comments, we will also crash on FP
values. That's what brought me here, but we can make that a
follow-up patch.
Fix a missing and broken test: 2 VPT blocks predicated on the same VCMP
instruction that can be folded. The problem was that for each VPT block, we
record the predicate statements with a list, but the same instruction was added
twice. Thus, we were running in an assert trying to remove the same instruction
twice. To avoid this the instructions are now recorded with a set.
Differential Revision: https://reviews.llvm.org/D72699
Summary:
On Windows, when a function does not have an unwind table (for example, EH
filtering funclets), we don't correctly pair FP and LR to form the frame record
in all circumstances.
Fix this by invalidating a pair when the second register is FP when compiling
for Windows, even when CFI is not needed.
Fixes PR44271 introduced by D65653.
Reviewers: efriedma, sdesmalen, rovka, rengolin, t.p.northover, thegameg, greened
Reviewed By: rengolin
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71754
This reverts commit a03d7b0f24.
As discussed in D68298, this causes a compile-time regression, in case
the DTs requested are not used elsewhere in GlobalOpt. We should only
get the DTs if they are available here, but this seems not possible with
the legacy pass manager from a module pass.
For memcpy/memset/memmove etc., replace ExternalSymbolSDNode with a
MCSymbolSDNode, which have a prefix dot before function name as entry
point symbol.
Differential Revision: https://reviews.llvm.org/D70718
We were upgrading it to faddp, but a version taking two type parameters instead
of one. This then got upgraded a second time to the version with just one
parameter, but occasionally (for reasons I don't understand) this unusual
two-stage process corrupted a use-list, leading to a crash when the two faddp
declarations didn't match.
Code in getRoot made the assumption that every node in PendingLoads
must always itself have a dependency on the current DAG root node.
After the changes in 04a8696, it turns out that this assumption no
longer holds true, causing wrong codegen in some cases (e.g. stores
after constrained FP intrinsics might get deleted).
To fix this, we now need to make sure that the TokenFactor created
by getRoot always includes the previous root, if there is no implicit
dependency already present.
The original getControlRoot code already has exactly this check,
so this patch simply reuses that code now for getRoot as well.
This fixes the regression.
NFC if no constrained FP intrinsic is present.
Summary:
This cleans up a lot of ugly `foreach` bodges that I've been using to
work around the lack of those two language features. Now they both
exist, I can make then all into something more legible!
In particular, in the common pattern in `ARMInstrMVE.td` where a
multiclass defines an `Instruction` instance plus one or more `Pat` that
select it, I've used a `defvar` to wrap `!cast<Instruction>(NAME)` so
that the patterns themselves become a little more legible.
Replacing a `foreach` with a `defvar` removes a level of block
structure, so several pieces of code have their indentation changed by
this patch. Best viewed with whitespace ignored.
NFC: the output of `llvm-tblgen -print-records` on the two affected
Tablegen sources is exactly identical before and after this change, so
there should be no effect at all on any of the other generated files.
Reviewers: MarkMurrayARM, miyuki
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72690
We have a whitelist of instructions that we allow when tail
predicating, since these are trivial ones that we've deemed need no
special handling. Now change ARMLowOverheadLoops to allow the
non-trivial instructions if they're contained within a valid VPT
block. Since a valid block is one that is predicated upon the VCTP so
we know that these non-trivial instructions will still behave as
expected once the implicit predication is used instead.
This also fixes a previous test failure.
Differential Revision: https://reviews.llvm.org/D72509
As mentioned by @nikic on rGef5debac4302, we can merge the guaranteed bottom zero bits from the shifted value, and then, if a min shift amount is known, zero out the bottom bits as well.
Use the already provided helper function to get the operand type so
that we can detect whether the vpr is being used as a predicate or
not. Also use existing helpers to get the predicate indices when we
converting the vpt blocks. This enables us to support both types of
vpr predicate operand.
Differential Revision: https://reviews.llvm.org/D72504
Summary:
This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).
Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma, andreadb
Reviewed By: efriedma
Subscribers: gbedwell, john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70680
As mentioned by @nikic on rGef5debac4302 (although that was just about SHL), we can merge the guaranteed top zero bits from the shifted value, and then, if a min shift amount is known, zero out the top bits as well.
SHL tests / handling will be added in a follow up patch.
Due to the current way that we collect predicated instructions, we
can't easily handle vpsel in tail predicated loops. There are a
couple of issues:
1) It will use the VPR as a predicate operand, but doesn't have to be
instead a VPT block, which means we can assert while building up
the VPT block because we don't find another VPST to being a new
one.
2) VPSEL still requires a VPR operand even after tail predicating,
which means we can't remove it unless there is another
instruction, such as vcmp, that can provide the VPR def.
The first issue should be a relatively simple fix in the logic of the
LowOverheadLoops pass, whereas the second will require us to
represent the 'implicit' tail predication with an explicit value.
Differential Revision: https://reviews.llvm.org/D72629
Enables the masked gather pass to create a masked
gather loading from a base and vector of offsets.
This also enables v8i16 and v16i8 gather loads.
Differential Revision: https://reviews.llvm.org/D72330
Summary:
This allows you to make some of the defs in a multiclass or `foreach`
conditional on an expression computed from the parameters or iteration
variables.
It was already possible to simulate an if statement using a `foreach`
with a dummy iteration variable and a list constructed using `!if` so
that it had length 0 or 1 depending on the condition, e.g.
foreach unusedIterationVar = !if(condition, [1], []<int>) in { ... }
But this syntax is nicer to read, and also more convenient because it
allows an else clause.
To avoid upheaval in the implementation, I've implemented `if` as pure
syntactic sugar on the `foreach` implementation: internally, `ParseIf`
actually does construct exactly the kind of foreach shown above (and
another reversed one for the else clause if present).
Reviewers: nhaehnle, hfinkel
Reviewed By: hfinkel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71474
Summary:
This allows you to define a global or local variable to an arbitrary
value, and refer to it in subsequent definitions.
The main use I anticipate for this is if you have to compute some
difficult function of the parameters of a multiclass, and then use it
many times. For example:
multiclass Foo<int i, string s> {
defvar op = !cast<BaseClass>("whatnot_" # s # "_" # i);
def myRecord {
dag a = (op this, (op that, the other), (op x, y, z));
int b = op.subfield;
}
def myOtherRecord<"template params including", op>;
}
There are a couple of ways to do this already, but they're not really
satisfactory. You can replace `defvar x = y` with a loop over a
singleton list, `foreach x = [y] in { ... }` - but that's unintuitive
to someone who hasn't seen that workaround idiom before, and requires
an extra pair of braces that you often didn't really want. Or you can
define a nested pair of multiclasses, with the inner one taking `x` as
a template parameter, and the outer one instantiating it just once
with the desired value of `x` computed from its other parameters - but
that makes it awkward to sequentially compute each value based on the
previous ones. I think `defvar` makes things considerably easier.
You can also use `defvar` at the top level, where it inserts globals
into the same map used by `defset`. That allows you to define global
constants without having to make a dummy record for them to live in:
defvar MAX_BUFSIZE = 512;
// previously:
// def Dummy { int MAX_BUFSIZE = 512; }
// and then refer to Dummy.MAX_BUFSIZE everywhere
Reviewers: nhaehnle, hfinkel
Reviewed By: hfinkel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71407
This change allows to model the height of the instruction
within a bundle for latency adjustment purposes.
Differential Revision: https://reviews.llvm.org/D72669
We do not have InstrItinerary so generic getInstLatency() was always
defaulting to return 1 cycle. We need to use TargetSchedModel instead
to compute an instruction's latency.
Differential Revision: https://reviews.llvm.org/D72655
We're planning to remove the shufflemask operand from ShuffleVectorInst
(D72467); fix GlobalISel so it doesn't depend on that Constant.
The change to prelegalizercombiner-shuffle-vector.mir happens because
the input contains a literal "-1" in the mask (so the parser/verifier
weren't really handling it properly). We now treat it as equivalent to
"undef" in all contexts.
Differential Revision: https://reviews.llvm.org/D72663
Summary: This fixes a crash in internal builds under SamplePGO.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72653
Summary:
A recent fix in D69452 fixed index based WPD in the presence of
available_externally vtables. It added a cast of the vtable def
summary to a GlobalVarSummary. However, in some cases one def may be an
alias, in which case we need to get the base object before casting,
otherwise we will crash.
Reviewers: evgeny777, steven_wu, aganea
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71040
Summary:
This is the next portion of patches for dsymutil.
Create DwarfEmitter interface to generate all debug info tables.
Put DwarfEmitter into DwarfLinker library and make tools/dsymutil/DwarfStreamer
to be child of DwarfEmitter.
It passes check-all testing. MD5 checksum for clang .dSYM bundle matches
for the dsymutil with/without that patch.
Reviewers: JDevlieghere, friss, dblaikie, aprantl
Reviewed By: JDevlieghere
Subscribers: merge_guards_bot, hiraditya, thegameg, probinson, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D72476
The lto::Config object saved on the global LTO object should not be
updated by any of the LTO backends. Otherwise we could run into
interference between threads utilizing it. Motivated by some proposed
changes that would have caused it to get modified in the ThinLTO
backends.
Summary:
be15dfa88f broke GlobalISel's usage of getSetCCInverse() which currently
appears to be limited to our out-of-tree backend. GlobalISel doesn't use
EVT's and isn't able to derive them from the information it has as it
doesn't distinguish between integer and floating point types (that
distinction is made by operations rather than values). Bring back the
bool version of getSetCCInverse() in a way that doesn't break the intent
of be15dfa88f but also allows GlobalISel to continue using it.
Reviewers: spatel, bogner, arichardson
Reviewed By: arichardson
Subscribers: rovka, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72309
readPrefixes() assumes insn->bytes is non-empty. The code path is not
exercised in llvm-mc because llvm-mc does not feed empty input to
MCDisassembler::getInstruction().
This bug is uncovered by a5994c789a.
An empty string did not crash before because the deleted regionReader()
allowed UINT64_C(-1) as insn->readerCursor.
Bytes.size() <= Address -> R->Base
0 <= UINT64_C(-1) - UINT32_C(-1)
This patch makes it so that cases where multiple instructions that differ only
in their FrameIndex MachineOperand values no longer collide. For instance:
%1:_(p0) = G_FRAME_INDEX %stack.0
%2:_(p0) = G_FRAME_INDEX %stack.1
Prior to this patch these instructions would collide together.
Differential Revision: https://reviews.llvm.org/D71583
The current users of the waterfall loop utility functions do not make
use of the restored original insert point. The insertion is either
done, or they set the insert point somewhere else. A future change
will want to insert instructions after the waterfall loop, but
figuring out the point after the loop is more difficult than ensuring
the insert point is there after the loop.
The branch target needs to be changed depending on whether there is an
unconditional branch or not.
Loops also need to be similarly fixed, but compiling a simple testcase
end to end requires another set of patches that aren't upstream yet.
Some target like arm/riscv with soft-float will have compiling crash when using -fno-unsafe-math-optimization option.
This patch will add the missing strict FP support to SoftenFloatRes_XINT_TO_FP.
Differential Revision: https://reviews.llvm.org/D72277
Reasonable assumptions can be made when a parsed address length does not
match the expected length, so there's no need for this to be fatal.
Reviewed by: ikudrin
Differential Revision: https://reviews.llvm.org/D72154
Summary:
This patch makes it easy to try out different preinlining thresholds
with a command-line switch just like -preinline-threshold for the
legacy pass manager.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72618
Summary: These seem to be the machine operand types currently needed by the
RISC-V target.
Reviewers: asb, lenary
Reviewed By: lenary
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72275
Summary:
The Pointer Authentication Extension (PAC) was added in Armv8.3-A. Some
instructions are implemented in the HINT space to allow compiling code
common to CPUs regardless of whether they feature PAC or not, and still
benefit from PAC protection in the PAC-enabled CPUs.
The 8.3-specific mnemonics were currently enabled in any architecture, and
LLVM was emitting them in assembly files when PAC code generation was
enabled. This was ok for compilations where both LLVM codegen and the
integrated assembler were used. However, the LLVM codegen was not
compatible with other assemblers (e.g. GAS). Given the fact that the
approach from these assemblers (i.e. to disallow Armv8.3-A mnemonics if
compiling for Armv8.2-A or lower) is entirely reasonable, this patch makes
LLVM to emit HINT when building for Armv8.2-A and below, instead of
PACIASP, AUTIASP and friends. Then, LLVM assembly should be compatible
with other assemblers.
Reviewers: samparker, chill, LukeCheeseman
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71658
The R_(MICRO)MIPS_JALR optimization only works when used against functions.
Using the relocation against a data symbol (e.g. function pointer) will
cause some linkers that don't ignore the hint in this case (e.g. LLD prior
to commit 5bab291b7b) to generate a relative branch to the data symbol
which crashes at run time. Before this patch, LLVM was erroneously emitting
these relocations against local-dynamic TLS function pointers and global
function pointers with internal visibility.
Reviewers: atanasyan, jrtc27, vstefanovic
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D72571
When compiling position-independent executables, we now use
DW_EH_PE_pcrel | DW_EH_PE_sdata4. However, the MIPS ABI does not define a
64-bit PC-relative ELF relocation so we cannot use sdata8 for the large
code model case. When using the large code model, we fall back to the
previous behaviour of generating absolute relocations.
With this change clang-generated .o files can be linked by LLD without
having to pass -Wl,-z,notext (which creates text relocations).
This is simpler than the approach used by ld.bfd, which rewrites the
.eh_frame section to convert absolute relocations into relative references.
I saw in D13104 that apparently ld.bfd did not accept pc-relative relocations
for MIPS ouput at some point. However, I also checked that recent ld.bfd
can process the clang-generated .o files so this no longer seems true.
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D72228
We need to ensure that fpexcept.strict nodes are not optimized away even if
the result is unused. To do that, we need to chain them into the block's
terminator nodes, like already done for PendingExcepts.
This patch adds two new lists of pending chains, PendingConstrainedFP and
PendingConstrainedFPStrict to hold constrained FP intrinsic nodes without
and with fpexcept.strict markers. This allows not only to solve the above
problem, but also to relax chains a bit further by no longer flushing all
FP nodes before a store or other memory access. (They are still flushed
before nodes with other side effects.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D72341
As detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value.
This patch adds support to SelectionDAG::ComputeKnownBits to use KnownBits::countMinTrailingZeros and countMinLeadingZeros to set the minimum guaranteed leading/trailing known zero bits.
Differential Revision: https://reviews.llvm.org/D72573
which is the default TLS model for non-PIC objects. This allows large/
many thread local variables or a compact/fast code in an executable.
Specification is same as that of GCC. For example, the code model
option precedes the TLS size option.
TLS access models other than local-exec are not changed. It means
supoort of the large code model is only in the local exec TLS model.
Patch By KAWASHIMA Takahiro (kawashima-fj <t-kawashima@fujitsu.com>)
Reviewers: dmgreen, mstorsjo, t.p.northover, peter.smith, ostannard
Reviewd By: peter.smith
Committed by: peter.smith
Differential Revision: https://reviews.llvm.org/D71688
Summary:
It is useful to keep statistics on how many instructions we have
compressed, so we can see if future changes are increasing or decreasing this
number.
Reviewers: asb, luismarques
Reviewed By: asb, luismarques
Subscribers: xbolva00, sameer.abuasal, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67495
Summary:
This patch will provide support for auto return type for the C++ member
functions. Before this return type of the member function is deduced and
stored in the DIE.
This patch includes llvm side implementation of this feature.
Patch by: Awanish Pandey <Awanish.Pandey@amd.com>
Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D70524
If addrecexpr has nuw flag, the value should never be less than its
start value and start value does not required to be SCEVConstant.
Reviewed By: nikic, sanjoy
Differential Revision: https://reviews.llvm.org/D71690
Summary:
AMO memory operands use a custom parser in order to accept both (reg)
and 0(reg). However, the validation predicate used for these operands
was only checking that they were registers, and not the register class,
so non-GPRs (such as FPRs) were also accepted. Thus, fix this by making
the predicate check that they are GPRs.
Reviewers: asb, lenary
Reviewed By: asb, lenary
Subscribers: hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72471
For a target symbol defined in the same section, currently we don't emit
a relocation if VariantKind is VK_None (with few exceptions like RISC-V
relaxation), while GNU as emits one. This causes program behavior
differences with and without -ffunction-sections, and can break intended
symbol interposition in a -shared link.
```
.globl foo
foo:
call foo # no relocation. On other targets, may be written as b foo, etc
call bar # a relocation if bar is in another section (e.g. -ffunction-sections)
call foo@plt # a relocation
```
Unify these cases by always emitting a relocation. If we ever want to
optimize `call foo` in -shared links, we should emit a STB_LOCAL alias
and call via the alias.
ARM/thumb2-beq-fixup.s: we now emit a relocation to global_thumb_fn as GNU as does.
X86/Inputs/align-branch-64-2.s: we now emit R_X86_64_PLT32 to foo as GNU does.
ELF/relax.s: rewrite the test as target-in-same-section.s .
We omitted relocations to `global` and now emit R_X86_64_PLT32.
Note, GNU as does not emit a relocation for `jmp global` (maybe its own
bug). Our new behavior is compatible except `jmp global`.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D72197
.section name, "flags"G, @type, GroupName[, linkage]
As of binutils 2.33, linkage cannot be 'unique'. For integrated
assembler, we use both 'o' flag and 'unique' linkage to support
--gc-sections and COMDAT with lld.
https://sourceware.org/ml/binutils/2019-11/msg00266.html
Only perform this if we are shuffling lower and upper lane elements across the lanes (otherwise splitting to lower xmm shuffles would be better).
This is a regression if we shuffle build_vectors due to getVectorShuffle canonicalizing 'blend of splat' build vectors, for now I've set this not to shuffle build_vector nodes at all to avoid this.
Current implementation of BaseMemOpsClusterMutation is a little bit
obscure. This patch directly uses a map from store chain ID to set of
memory instrs to make it simpler, so that future improvements are easier
to read, update and review.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D72070
This causes the STRICT_FSETCC/STRICT_FSETCCS nodes to lowered
early while lowering SELECT, but the output chain doesn't get
connected. Then we visit the node again when it is its turn
because we haven't replaced the use of the chain result. In the
case of the fp128 libcall lowering, after D72341 this will cause
the libcall to be emitted twice.
Custom legalization can produce MERGE_VALUES to return multiple
results. We can expand them immediately instead of leaving them
around for DAG combine to clean up.
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.
If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
This fixes an off-by-one error in the argc value computed by runAsMain, and
switches lli back to using the input bitcode (rather than the string "lli") as
the effective program name.
Thanks to Stefan Graenitz for spotting the bug.
This patch allows for handling a failure inside a CrashRecoveryContext in the same way as the global exception/signal handler. A failure will have the same side-effect, such as cleanup of temporarty file, printing callstack, calling relevant signal handlers, and finally returning an exception code. This is an optional feature, disabled by default.
This is a support patch for D69825.
Differential Revision: https://reviews.llvm.org/D70568
llvm-objdump -d on clang is decreased from 7.8s to 7.4s.
The improvement is likely due to the elimination of logger setup and
dbgprintf(), which has a large overhead.
Some of the simplest handlers just call TLI and if that fails,
they fall back to unrolling. For those just inline the TLI call
and share the unrolling call with the default case of Expand.
For ExpandFSUB and ExpandBITREVERSE so that its obvious they
don't return results sometimes and want to defer to LegalizeDAG.
The primary motivation of this change is to bring the code more closely in sync behavior wise with the assembler's version of nop emission. I'd like to eventually factor them into one, but that's hard to do when one has features the other doesn't.
The longest encodeable nop on x86 is 15 bytes, but many processors - for instance all intel chips - can't decode the 15 byte form efficiently. On those processors, it's better to use either a 10 byte or 11 byte sequence depending.
Add initial support for lowering v4f64 shuffles to SHUFPD(VPERM2F128(V1, V2), VPERM2F128(V1, V2)), eventually this could be used for v8f32 (and maybe v8f64/v16f32) but I'm being conservative for the initial implementation as only v4f64 can always succeed.
This currently is only called from lowerShuffleAsLanePermuteAndShuffle so only gets used for unary shuffles, and we limit this to cases where we use upper elements as otherwise concating 2 xmm shuffles is probably the better case.
Helps with poor shuffles mentioned in D66004.
Fix https://bugs.llvm.org/show_bug.cgi?id=44419 by preserving the
nuw on sub of geps. We only do this if the offset has a multiplication
as the final operation, as we can't be sure the operations is nuw
in the other cases without more thorough analysis.
Differential Revision: https://reviews.llvm.org/D72048
We use the stack for X87 fp_round and for moving from SSE f32/f64 to
X87 f64/f80. Or from X87 f64/f80 to SSE f32/f64.
Note for the SSE<->X87 conversions the conversion always happens in the
X87 domain. The load/store ops in the X87 instructions are able
to signal exceptions.
Summary:
- SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions.
While this is necessary for the special handling of moving results out of WWM live ranges,
it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every
WQM operation. This change uses a COPY psuedo in these cases, which allows the register
allocator to coalesce the moves away.
Reviewers: tpr, dstuttard, foad, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71386
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.
This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.
Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn
Reviewed By: efriedma
Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72536
Summary:
The goal is to simplify experimentation on the cost model. Today,
CallAnalyzer decides 2 things: legality, and benefit. The refactoring
keeps legality assessment in CallAnalyzer, and factors benefit
evaluation out, as an extension.
Reviewers: davidxl, eraman
Reviewed By: davidxl
Subscribers: kamleshbhalui, fedor.sergeev, hiraditya, baloghadamsoftware, haicheng, a.sidorin, Szelethus, donat.nagy, dkrupp, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71733
This reverts the AMDGPU DAG mutation implemented in D72487 and gives
a more general way of adjusting BUNDLE operand latency.
It also replaces FixBundleLatencyMutation with adjustSchedDependency
callback in the AMDGPU, fixing not only successor latencies but
predecessors' as well.
Differential Revision: https://reviews.llvm.org/D72535
Add a predicate to MCInstDesc that allows tools to determine whether an
instruction authenticates a pointer. This can be used by diagnostic
tools to hint at pointer authentication failures.
Differential Revision: https://reviews.llvm.org/D70329
rdar://55089604
ONE is currently softened to OGT | OLT. But the libcalls for OGT and OLT libcalls will trigger an exception for QNAN. At least for X86 with libgcc. UEQ on the other hand uses UO | OEQ. The UO and OEQ libcalls will not trigger an exception for QNAN.
This patch changes ONE to use the inverse of the UEQ lowering. So we now produce O & UNE. Technically the existing behavior was correct for a signalling ONE, but since I don't know how to generate one of those from clang that seemed like something we can deal with later as we would need to fix other predicates as well. Also removing spurious exceptions seemed better than missing an exception.
There are also problems with quiet OGT/OLT/OLE/OGE, but those are harder to fix.
Differential Revision: https://reviews.llvm.org/D72477
This system wasn't very well designed for multi-result nodes. As
a consequence they weren't consistently registered in the
LegalizedNodes map leading to nodes being revisited for different
results.
I've removed the "Result" variable from the main LegalizeOp method
and used a SDNode* instead. The result number from the incoming
Op SDValue is only used for deciding which result to return to the
caller. When LegalizeOp is called it should always register a
legalized result for all of its results. Future calls for any other
result should be pulled for the LegalizedNodes map.
Legal nodes will now register all of their results in the map
instead of just the one we were called for.
The Expand and Promote handling to use a vector of results similar
to LegalizeDAG. Each of the new results is then re-legalized and
logged in the LegalizedNodes map for all of the Results for the
node being legalized. None of the handles register their own
results now. And none call ReplaceAllUsesOfValueWith now.
Custom handling now always passes result number 0 to LowerOperation.
This matches what LegalizeDAG does. Since the introduction of
STRICT nodes, I've encountered several issues with X86's custom
handling being called with an SDValue pointing at the chain and
our custom handlers using that to get a VT instead of result 0.
This should prevent us from having any more of those issues. On
return we will update the LegalizedNodes map for all results so
we shouldn't call the custom handler again for each result number.
I want to push SDNode* further into the Expand and Promote
handlers, but I've left that for a follow to keep this patch size
down. I've created a dummy SDValue(Node, 0) to keep the handlers
working.
Differential Revision: https://reviews.llvm.org/D72224
The Linux kernel uses -fpatchable-function-entry to implement DYNAMIC_FTRACE_WITH_REGS
for arm64 and parisc. GCC 8 implemented
-fpatchable-function-entry, which can be seen as a generalized form of
-mnop-mcount. The N,M form (function entry points before the Mth NOP) is
currently only used by parisc.
This patch adds N,0 support to AArch64 codegen. N is represented as the
function attribute "patchable-function-entry". We will use a different
function attribute for M, if we decide to implement it.
The patch reuses the existing patchable-function pass, and
TargetOpcode::PATCHABLE_FUNCTION_ENTER which is currently used by XRay.
When the integrated assembler is used, __patchable_function_entries will
be created for each text section with the SHF_LINK_ORDER flag to prevent
--gc-sections (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197) and
COMDAT (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195) issues.
Retrospectively, __patchable_function_entries should use a PC-relative
relocation type to avoid the SHF_WRITE flag and dynamic relocations.
"patchable-function-entry"'s interaction with Branch Target
Identification is still unclear (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 for GCC discussions).
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D72215
Summary:
This patch pushes the AIX vararg unimplemented error diagnostic later
and allows vararg calls so long as all the arguments can be passed in register.
This patch extends the AIX calling convention implementation to initialize
GPR(s) for vararg float arguments. On AIX, both GPR(s) and FPR are allocated
for floating point arguments. The GPR(s) are only initialized for vararg calls,
otherwise the callee is expected to retrieve the float argument in the FPR.
f64 in AIX PPC32 requires special handling in order to allocated and
initialize 2 GPRs. This is performed with bitcast, SRL, truncation to
initialize one GPR for the MSW and bitcast, truncations to initialize
the other GPR for the LSW.
A future patch will follow to add support for arguments passed on the stack.
Patch provided by: cebowleratibm
Reviewers: sfertile, ZarkoCA, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D71013
We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the moment, but we should consistently handle lane index calculations for multiple inputs in both the AVX1 and AVX2 paths.
Minor (almost NFC) tidyup as I'm hoping to use lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
Previously extern function is added as BTF_KIND_VAR. This does not work
well with existing BTF infrastructure as function expected to use
BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO.
This patch added extern function to BTF_KIND_FUNC. The two bits 0:1
of btf_type.info are used to indicate what kind of function it is:
0: static
1: global
2: extern
Differential Revision: https://reviews.llvm.org/D71638
Summary:
Avoid using the `nocf_check` attribute with Control Flow Guard. Instead, use a
new `"guard_nocf"` function attribute to indicate that checks should not be
added on indirect calls within that function. Add support for
`__declspec(guard(nocf))` following the same syntax as MSVC.
Reviewers: rnk, dmajor, pcc, hans, aaron.ballman
Reviewed By: aaron.ballman
Subscribers: aaron.ballman, tomrittervg, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72167