AMDGPU has some buffer intrinsics which theoretically could use
this. Some of the generated tables include the 3- and 4-element vector
versions of these rounded to 64 bits, which is ambiguous. Add these types
so the table can disambiguate them.
The assertion change is for the path odd-sized vectors now take for R600.
v3i16 is widened to v4i16, which then needs to be promoted to v4i32.
llvm-svn: 369038
For indirect call sites having a small set of possible callees,
!callees metadata can be used to indicate what those callees are.
This patch updates the call graph and lazy call graph analyses so
that they consider this metadata when encountering call sites. For
the call graph, it adds a new external call graph node to the graph
for each unique !callees metadata node. A call graph edge connects
an indirect call site with the external node associated with the
!callees metadata that is attached to it. And there is an edge from
this external node to each of the callees indicated by the metadata.
Similarly, for the lazy call graph, the patch adds Ref edges from a
caller to the possible callees indicated by the metadata.
The primary purpose of the patch is to facilitate iterating over the
functions in a module such that all of the callees indicated by a
given !callees metadata node will be visited prior to the functions
containing call sites annotated by that node. This property is
required by optimizations performing a bottom-up traversal of the
SCC DAG. For example, the inliner can be made to inline through an
indirect call. If the call site is annotated with !callees metadata,
this patch ensures that the inliner will have visited all of the
callees prior to the caller, allowing it to reliably compute the
cost of inlining one or more of the potential callees.
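To make the scheme concrete, here is a minimal sketch (not taken from the
patch itself) of how a pass can enumerate the possible callees of an indirect
call site from its !callees metadata, using only the existing metadata APIs:
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"

using namespace llvm;

// Collect the functions listed in the call site's !callees metadata, if any.
static SmallVector<Function *, 4> possibleCallees(const CallBase &CB) {
  SmallVector<Function *, 4> Callees;
  if (MDNode *MD = CB.getMetadata(LLVMContext::MD_callees))
    for (const MDOperand &Op : MD->operands())
      if (Function *F = mdconst::dyn_extract_or_null<Function>(Op))
        Callees.push_back(F);
  return Callees;
}
```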
Original patch by @mssimpso. I've made some small changes to get it
to apply, build, and pass tests on the top of tree, as well as
some minor tweaks to formatting and functionality.
Subscribers: mehdi_amini, hiraditya, llvm-commits, mssimpso
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D39339
llvm-svn: 369025
This should have the same semantics. We now use std::shared_mutex on MSVC
with C++17; std::shared_timed_mutex is less efficient than our custom
implementation on Windows, and std::shared_mutex should be faster.
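A minimal sketch of the kind of conditional alias this implies (the alias and
macro check here are illustrative, not the actual names in the LLVM source):
```
#include <shared_mutex>

// Pick the preferred reader/writer mutex for this toolchain. On MSVC and in
// C++17 mode std::shared_mutex is available and should be faster; otherwise
// fall back to the C++14 std::shared_timed_mutex.
#if defined(_MSC_VER) || __cplusplus >= 201703L
using PreferredSharedMutex = std::shared_mutex;
#else
using PreferredSharedMutex = std::shared_timed_mutex;
#endif
```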
llvm-svn: 369018
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
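For illustration, the mechanical change is simply the following (the Widget
type is a made-up example):
```
#include <memory>

struct Widget {
  explicit Widget(int Id) : Id(Id) {}
  int Id;
};

// Before: required "llvm/ADT/STLExtras.h" and llvm::make_unique<Widget>(42).
// After: plain C++14 is sufficient.
std::unique_ptr<Widget> makeWidget() { return std::make_unique<Widget>(42); }
```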
llvm-svn: 369013
This patch teaches the RCU how to peek 'next' RCUTokens. A new method has been
added to the RetireControlUnit class with the goal of minimizing the complexity
of follow-up patches that will enable macro-fusion support in mca.
This patch also adds method Instruction::getNumMicroOpcodes() to simplify common
interactions with the instruction descriptor (a pattern quite common in some
pipeline stages).
Added the ability to override the default set of consumed scheduler resources
(this -again- is to simplify future patches that add support for macro-op fusion).
No functional change intended.
llvm-svn: 369010
This patch slightly changes the API in the attempt to simplify resource buffer
queries. It is done in preparation for a patch that will enable support for
macro fusion.
llvm-svn: 368994
Some uses of getArgumentAliasingToReturnedPointer and
isIntrinsicReturningPointerAliasingArgumentWithoutCapturing require the
calls/intrinsics to preserve the nullness of the argument.
For alias analysis, the nullness property does not really come into
play.
This patch explicitly sets it to true. In D61669, the alias analysis
uses will be switched to not require preserving nullness.
Reviewers: nlopes, efriedma, hfinkel, sanjoy, aqjune, jdoerfert
Reviewed By: jdoerfert
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64150
llvm-svn: 368993
This patch adds a ptrmask intrinsic which allows masking out bits of a
pointer that must be zero when accessing it, because of ABI alignment
requirements or a restriction of the meaningful bits of a pointer
through the data layout.
This avoids doing a ptrtoint/inttoptr round trip in some cases (e.g. tagged
pointers) and allows us to not lose information about the underlying
object.
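As a rough illustration (not the patch itself), this is what a frontend
currently has to express with an integer round trip for, e.g., tagged
pointers; llvm.ptrmask performs the same masking directly on the pointer
value, so the connection to the underlying object is kept. The 4-bit tag mask
below is an arbitrary example:
```
#include <cstdint>

// Today: ptrtoint + mask + inttoptr, which hides the underlying object.
template <typename T> T *stripTagBits(T *Tagged) {
  std::uintptr_t Bits = reinterpret_cast<std::uintptr_t>(Tagged); // ptrtoint
  Bits &= ~static_cast<std::uintptr_t>(0xF);                      // clear tag bits
  return reinterpret_cast<T *>(Bits);                             // inttoptr
}
```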
Reviewers: nlopes, efriedma, hfinkel, sanjoy, jdoerfert, aqjune
Reviewed by: sanjoy, jdoerfert
Differential Revision: https://reviews.llvm.org/D59065
llvm-svn: 368986
assume_safety implies that loads under "if's" can be safely executed
speculatively (unguarded, unmasked). However, this assumption holds only for the
original user "if's", not for those introduced by the compiler, such as the
fold-tail "if" that guards us from loading beyond the original loop trip-count.
Currently the combination of fold-tail and assume-safety pragmas results in
ignoring the fold-tail predicate that guards the loads, generating unmasked
loads. This patch fixes this behavior.
Differential Revision: https://reviews.llvm.org/D66106
Reviewers: Ayal, hsaito, fhahn
llvm-svn: 368973
Summary: This exposes `CallInst`'s tail call kind via new `LLVMGetTailCallKind` and `LLVMSetTailCallKind` functions. The motivation for this is to be able to see `musttail` for languages that require mandatory tail calls for correctness. Today only the weaker `LLVMSetTail` is exposed and there is no way to set `GuaranteedTailCallOpt` via the C API.
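A hedged usage sketch of the new C API entry points; the exact enumerator
names below are an assumption on my part (mirroring CallInst::TailCallKind),
so check llvm-c/Core.h for the real spelling:
```
#include "llvm-c/Core.h"

// Force a call instruction to carry the musttail marker via the C API.
// LLVMTailCallKindMustTail is assumed to exist; the two functions are the
// ones added by this patch.
static void markMustTail(LLVMValueRef Call) {
  if (LLVMGetTailCallKind(Call) != LLVMTailCallKindMustTail)
    LLVMSetTailCallKind(Call, LLVMTailCallKindMustTail);
}
```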
Reviewers: CodaFi, jyknight, deadalnix, rnk
Reviewed By: CodaFi
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66061
llvm-svn: 368945
Summary:
Instead of constantly keeping track of the nonnull status with the
dereferenceable information we can simply query the nonnull attribute
whenever we need the information (debug + manifest).
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66113
llvm-svn: 368924
Summary:
As one of the first attributes, and one of the complex ones,
AAReturnedValues was not using liveness but we filtered the result after
the fact. This change adds liveness usage during the creation. The
algorithm is also improved and shorter.
The new algorithm will collect returned values over time using the
generic facilities that work with liveness already, e.g.,
genericValueTraversal which does not look at dead PHI node predecessors.
A test to show how this leads to better results is included.
Note: Unresolved calls and resolved calls are now tracked explicitly.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66120
llvm-svn: 368922
Summary:
If the associated context instruction is assumed dead we do not need to
update or manifest the state.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66116
llvm-svn: 368921
Summary:
The next attempt to clean up the Attributor interface before we grow it
further.
Before, we used a combination of two values (associated + anchor) and an
argument number (or -1) to determine a location. This was very fragile.
The new system uses exclusively IR positions and we restrict the
generation of IR positions to special constructor methods that verify
internal constraints we have. This will catch misuse early.
The auto-conversion, e.g., in getAAFor, is now performed through the
SubsumingPositionIterator. This iterator takes an IR position and allows
visiting all IR positions that "subsume" the given one, e.g., function
attributes "subsume" argument attributes of that function. For a
detailed breakdown see the class comment of SubsumingPositionIterator.
This patch also introduces IRPosition::getAttrs() to extract IR
attributes at a certain position. The method knows how to look up
different positions that are equivalent, e.g., the argument position for
call site arguments. We also introduce three new position kinds such
that we have all IR positions where attributes can be placed and one for
"floating" values.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65977
llvm-svn: 368919
I'm planning on handling intrinsics that will benefit from checking
the address space enums. Don't bother moving the address collection
for now, since those won't need the enums.
llvm-svn: 368895
This reverts commit r368849, because it breaks some bots (e.g.
llvm-clang-x86_64-win-fast).
It turns out this is not as NFC as we had hoped, because operator== will
consider two std::error_codes to be distinct even though they both hold
"success" values if they have different categories.
llvm-svn: 368854
Summary:
The main motivation for this is unit tests, which contain a large macro
for pretty-printing std::error_code, and this macro is duplicated in
every file that needs to do this. However, the functionality may be
useful elsewhere too.
In this patch I have reimplemented the existing ASSERT_NO_ERROR macros
to reuse the new functionality, but I have kept the macro (as a
one-liner) as it is slightly more readable than ASSERT_EQ(...,
std::error_code()).
Reviewers: sammccall, ilya-biryukov
Subscribers: zturner, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65643
llvm-svn: 368849
MCP currently uses changeDebugValuesDefReg / collectDebugValues to find
debug users of a register, however those functions assume that all
DBG_VALUEs immediately follow the specified instruction, which isn't
necessarily true. This is going to become very often untrue when we turn
off CodeGenPrepare::placeDbgValues.
Instead of calling changeDebugValuesDefReg on an instruction to change its
debug users, in this patch we instead collect DBG_VALUEs of copies as we
iterate over insns, and update the debug users of copies that are made
dead. This isn't a non-functional change, because MCP will now update
DBG_VALUEs that aren't immediately after a copy, but refer to the same
register. I've hijacked the regression test for PR38773 to test for this
new behaviour, an entirely new test seemed overkill.
Differential Revision: https://reviews.llvm.org/D56265
llvm-svn: 368835
Changes: no changes. A fix for the clang code will be landed right on top.
Original commit message:
SectionRef::getName() returns std::error_code now.
Returning Expected<> instead has multiple benefits.
For example, it forces the user to check the error returned.
Also, Expected<> may carry a valuable string error message,
which is more useful than a bare error code.
(Object\invalid.test was updated to show the new messages printed.)
This patch switches all users to the Expected<> version.
Note: in a few places the returned error was ignored before my changes.
In such places I left it ignored. My intention was to convert the interface
used, not to improve and/or fix the existing users in this patch.
(Though I think it is a good idea for follow-ups to revisit such places
and either remove the consumeError calls or comment on each of them to clarify
why it is OK to have them.)
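For reference, a typical call site after the conversion looks roughly like
this (assuming the payload is a StringRef, as with the old out-parameter; the
helper name is illustrative):
```
#include "llvm/Object/ObjectFile.h"
#include "llvm/Support/Error.h"

using namespace llvm;
using namespace llvm::object;

// The caller must now consume the Expected<> explicitly instead of being
// able to silently drop a std::error_code.
static StringRef nameOrEmpty(const SectionRef &Sec) {
  Expected<StringRef> NameOrErr = Sec.getName();
  if (!NameOrErr) {
    consumeError(NameOrErr.takeError()); // or propagate/report the Error
    return StringRef();
  }
  return *NameOrErr;
}
```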
Differential revision: https://reviews.llvm.org/D66089
llvm-svn: 368826
SectionRef::getName() returns std::error_code now.
Returning Expected<> instead has multiple benefits.
For example, it forces the user to check the error returned.
Also, Expected<> may carry a valuable string error message,
which is more useful than a bare error code.
(Object\invalid.test was updated to show the new messages printed.)
This patch switches all users to the Expected<> version.
Note: in a few places the returned error was ignored before my changes.
In such places I left it ignored. My intention was to convert the interface
used, not to improve and/or fix the existing users in this patch.
(Though I think it is a good idea for follow-ups to revisit such places
and either remove the consumeError calls or comment on each of them to clarify
why it is OK to have them.)
Differential revision: https://reviews.llvm.org/D66089
llvm-svn: 368812
A quick contrast of this ABI with the currently-implemented ABI:
- Allocation is implicitly managed by the lowering passes, which is fine
for frontends that are fine with assuming that allocation cannot fail.
This assumption is necessary to implement dynamic allocas anyway.
- The lowering attempts to fit the coroutine frame into an opaque,
statically-sized buffer before falling back on allocation; the same
buffer must be provided to every resume point. A buffer must be at
least pointer-sized.
- The resume and destroy functions have been combined; the continuation
function takes a parameter indicating whether it has succeeded.
- Conversely, every suspend point begins its own continuation function.
- The continuation function pointer is directly returned to the caller
instead of being stored in the frame. The continuation can therefore
directly destroy the frame when exiting the coroutine instead of having
to leave it in a defunct state.
- Other values can be returned directly to the caller instead of going
through a promise allocation. The frontend provides a "prototype"
function declaration from which the type, calling convention, and
attributes of the continuation functions are taken.
- On the caller side, the frontend can generate natural IR that directly
uses the continuation functions as long as it prevents IPO with the
coroutine until lowering has happened. In combination with the point
above, the frontend is almost totally in charge of the ABI of the
coroutine.
- Unique-yield coroutines are given some special treatment.
llvm-svn: 368788
Summary: Fix "llvm-profdata show" so it can work with compact binary format profile. The change is to mark all functions "used" so SampleProfileReaderCompactBinary::read will read in all profiles available for dumping. The function names will be MD5 hash for compact binary format.
Reviewers: wmi, davidxl, danielcdh
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65162
llvm-svn: 368731
Summary:
This is a tweak to r368311 and r368646 which auto upgrades the calls to
objc runtime functions to objc runtime intrinsics, in order to make sure
that the auto upgrader does not trigger with up-to-date bitcode.
It is possible for up-to-date bitcode to contain direct calls to objc
runtime functions that were not inserted by the compiler as part of ARC,
and those should not be upgraded. Now the auto upgrader only triggers
when the old style of ARC marker is used, so it is guaranteed that it
won't trigger on up-to-date bitcode.
This also means it won't do this upgrade for bitcode from llvm-8 and
llvm-9, which preserves the behavior of those releases. Ideally they
should be upgraded as well but it is more important to make sure
AutoUpgrader will not trigger on up-to-date bitcode.
Reviewers: ahatanak, rjmccall, dexonsmith, pete
Reviewed By: dexonsmith
Subscribers: hiraditya, jkorous, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66153
llvm-svn: 368730
Addresses post-commit comments on https://reviews.llvm.org/D64825. Use
assert instead of llvm_unreachable to check if invalid csect types are being
generated. Use report_fatal_error on unimplemented XCOFF features.
Differential Revision: https://reviews.llvm.org/D64825
llvm-svn: 368720
An incorrect verification error revealed that the list of type tags was
incomplete. This patch adds the missing types by adding a tag kind to
the Dwarf.def file, which is used by the `isType` function.
A test was added for the original verification error.
Differential revision: https://reviews.llvm.org/D65914
llvm-svn: 368718
This patch replaces the JITDylib::DefinitionGenerator typedef with a class of
the same name, and adds support for attaching a sequence of DefinitionGenerator
objects to a JITDylib.
This patch also adds a new definition generator,
StaticLibraryDefinitionGenerator, that can be used to add symbols from a static
library to a JITDylib. An object from the static library will be added (via
a supplied ObjectLayer reference) whenever a symbol from that object is
referenced.
To enable testing, lli is updated to add support for the --extra-archive option
when running in -jit-kind=orc-lazy mode.
llvm-svn: 368707
Currently shufflemasks get emitted as any other constant, and you end
up with a bunch of virtual registers of G_CONSTANT with a
G_BUILD_VECTOR. The AArch64 selector then asserts on anything that
doesn't fit this pattern. This isn't an ideal representation; tracking
the mask directly should avoid legalization and leave fewer opportunities
for a representational error.
Rather than invent a new shuffle mask operand type, similar to what
ShuffleVectorSDNode does, just track the original IR Constant mask
operand. I don't completely like the idea of adding another link to
the IR, but MIR is already quite dependent on IR constants,
and this will allow sharing the shuffle mask utility functions with
the IR.
llvm-svn: 368704
Summary:
This implements an optimization described in Hacker's Delight 10-17:
when `C` is constant, the result of `X % C == 0` can be computed
more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
One huge caveat: this signed case is only valid for positive divisors.
While we can freely negate negative divisors, we can't negate `INT_MIN`,
so for now, if `INT_MIN` is encountered, we bail out.
As a follow-up, it should be possible to handle that more gracefully
via extra `and`+`setcc`+`select`.
This passes llvm's test-suite, and from cursory(!) cross-examination
the folds (the assembly) match those of GCC, and manual checking via alive
did not reveal any issues (other than the `INT_MIN` case).
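For intuition, here is a hedged, stand-alone illustration of the underlying
trick for the simpler unsigned, odd-divisor case (the patch itself handles the
signed case and scaling by powers of two); the helper names are mine:
```
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd C modulo 2^32, via Newton's iteration.
static uint32_t inverseMod2_32(uint32_t C) {
  uint32_t Inv = C; // correct to 3 bits; each step doubles the precision
  for (int I = 0; I < 5; ++I)
    Inv *= 2 - C * Inv;
  return Inv;
}

// X % C == 0  <=>  X * inverse(C) (mod 2^32) <= floor(UINT32_MAX / C).
// When C is a compile-time constant, both the inverse and the bound fold
// away, so no division is executed at run time.
static bool isDivisible(uint32_t X, uint32_t C) {
  return X * inverseMod2_32(C) <= UINT32_MAX / C;
}

int main() {
  for (uint32_t X = 0; X < 10000; ++X)
    assert(isDivisible(X, 7) == (X % 7 == 0));
}
```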
Reviewers: RKSimon, spatel, hermord, craig.topper, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: xbolva00, thakis, javed.absar, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65366
llvm-svn: 368702
Currently we can't keep any state in the selector object that we get from
subtarget. As a result we have to plumb through all our variables through
multiple functions. This change makes it non-const and adds a virtual init()
method to allow further state to be captured for each target.
AArch64 makes use of this in this patch to cache a call to hasFnAttribute()
which is expensive to call, and is used on each selection of G_BRCOND.
Differential Revision: https://reviews.llvm.org/D65984
llvm-svn: 368652
Summary:
This was mostly an experiment to assess the feasibility of completely
eliminating a problematic implicit conversion case in D61321 in advance of
landing that* but it also happens to align with the goal of propagating the
use of Register/MCRegister instead of unsigned so I believe it makes sense
to commit it.
The overall process for eliminating the implicit conversions from
Register/MCRegister -> unsigned was to:
1. Add an explicit conversion to support genuinely required conversions to
unsigned. For example, using them as an index for IndexedMap. Sadly it's
not possible to have an explicit and implicit conversion to the same
type and only deprecate the implicit one so I called the explicit
conversion get().
2. Temporarily annotate the implicit conversion to unsigned with
LLVM_ATTRIBUTE_DEPRECATED to make them visible
3. Eliminate implicit conversions by propagating Register/MCRegister/
explicit-conversions appropriately
4. Remove the deprecation added in 2.
* My conclusion is that it isn't feasible as there's too much code to
update in one go.
Depends on D65678
Reviewers: arsenm
Subscribers: MatzeB, wdng, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65685
llvm-svn: 368643
- There was a simple typo in TextStub code that prevented version 3 files from being read.
- Included a version 3 unit test to handle the differences in the format.
- Also fixed a typo in the comments in Error.h.
https://reviews.llvm.org/D66041
This patch is from Cyndy Ishida <cyndy_ishida@apple.com>.
llvm-svn: 368630
Summary:
This commit fixed a race condition from multi-threaded thinLTO backends that causes non-deterministic memory corruption for a data structure used only by AutoFDO with compact binary profile.
GUIDToFuncNameMap, a static data member of type DenseMap in FunctionSamples, is used as a per-module mapping from function name MD5 to name string when the input AutoFDO profile is in compact binary format. However, with ThinLTO we can have parallel backends modifying and accessing the class static map concurrently. The fix is to make GUIDToFuncNameMap a member of SampleProfileLoader instead of file-static data.
Reviewers: wmi, davidxl, danielcdh
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65848
llvm-svn: 368596
Convert SymbolNameSize and SectionNameSize into just `NameSize`. The length of
a name embedded in a symbol table entry or section header table entry is 8
for Sections, Symbols and Files. No need to have a distinct constant for each
one. Also removes the Size argument to 'generateStringRef' as the size is
always 'XCOFF::NameSize'.
llvm-svn: 368584
Support -march=tigerlake for x86.
Compared with Icelake Client, it includes 4 new features:
avx512vp2intersect, movdiri, movdir64b, and shstk.
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D65840
llvm-svn: 368543
Summary:
Hoisting/sinking an instruction out of a loop isn't always beneficial. Hoisting an instruction from a cold block inside a loop body out of the loop could hurt performance. This change makes LICM profile aware: it now checks block frequency to make sure hoisting/sinking only moves instructions to colder blocks.
Test Plan:
ninja check
Reviewers: asbirlea, sanjoy, reames, nikic, hfinkel, vsk
Reviewed By: asbirlea
Subscribers: fhahn, vsk, davidxl, xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65060
llvm-svn: 368526
This is an extension of a transform that tries to produce positive floating-point
constants to improve canonicalization (and hopefully lead to more reassociation
and CSE).
The original patches were:
D4904
D5363 (rL221721)
But as the test diffs show, these were limited to basic patterns by walking from
an instruction to its single user rather than recursively moving up the def-use
sequence. No fast-math is required here because we're only rearranging implicit
FP negations in intermediate ops.
A motivating bug is:
https://bugs.llvm.org/show_bug.cgi?id=32939
Differential Revision: https://reviews.llvm.org/D65954
llvm-svn: 368512
Introducing non-global control for default block-scan-limit in MemDep analysis.
Useful when there are many compilations per initialized LLVM instance (e.g. JIT).
Reviewed By: asbirlea
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65806
llvm-svn: 368502
The default behavior of Clang's indirect function call checker will replace
the address of each CFI-checked function in the output file's symbol table
with the address of a jump table entry which will pass CFI checks. We refer
to this as making the jump table `canonical`. This property allows code that
was not compiled with ``-fsanitize=cfi-icall`` to take a CFI-valid address
of a function, but it comes with a couple of caveats that are especially
relevant for users of cross-DSO CFI:
- There is a performance and code size overhead associated with each
exported function, because each such function must have an associated
jump table entry, which must be emitted even in the common case where the
function is never address-taken anywhere in the program, and must be used
even for direct calls between DSOs, in addition to the PLT overhead.
- There is no good way to take a CFI-valid address of a function written in
assembly or a language not supported by Clang. The reason is that the code
generator would need to insert a jump table in order to form a CFI-valid
address for assembly functions, but there is no way in general for the
code generator to determine the language of the function. This may be
possible with LTO in the intra-DSO case, but in the cross-DSO case the only
information available is the function declaration. One possible solution
is to add a C wrapper for each assembly function, but these wrappers can
present a significant maintenance burden for heavy users of assembly in
addition to adding runtime overhead.
For these reasons, we provide the option of making the jump table non-canonical
with the flag ``-fno-sanitize-cfi-canonical-jump-tables``. When the jump
table is made non-canonical, symbol table entries point directly to the
function body. Any instances of a function's address being taken in C will
be replaced with a jump table address.
This scheme does have its own caveats, however. It does end up breaking
function address equality more aggressively than the default behavior,
especially in cross-DSO mode which normally preserves function address
equality entirely.
Furthermore, it is occasionally necessary for code not compiled with
``-fsanitize=cfi-icall`` to take a function address that is valid
for CFI. For example, this is necessary when a function's address
is taken by assembly code and then called by CFI-checking C code. The
``__attribute__((cfi_jump_table_canonical))`` attribute may be used to make
the jump table entry of a specific function canonical so that the external
code will end up taking an address for the function that will pass CFI checks.
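For example (a hedged sketch using the attribute spelling given above;
double-check the Clang documentation for the exact name), a function whose
address is taken from assembly can opt its jump table entry back to canonical:
```
// Compiled with -fsanitize=cfi-icall -fno-sanitize-cfi-canonical-jump-tables.
// The attribute makes this one function's symbol point at its jump table
// entry again, so an address taken in assembly remains CFI-valid.
extern "C" __attribute__((cfi_jump_table_canonical)) void timer_tick() {
  // ... handler body referenced from assembly and from CFI-checked C++ ...
}
```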
Fixes PR41972.
Differential Revision: https://reviews.llvm.org/D65629
llvm-svn: 368495
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
%1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
%2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
%3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 23
All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.
To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
%1 = G_CONSTANT 16
%2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
%2 = G_SEXT_INREG %0, 16
or:
%1 = G_CONSTANT 16
%2:_(s32) = G_SHL %0, %1
%3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.
Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61289
llvm-svn: 368487
Summary:
This patch keeps track of MCSymbols created for blocks that were
referenced in inline asm. It prevents creating a new symbol which
doesn't refer to the block.
Inline asm may have a reference to a label. The asm parser however
doesn't recognize it as a label and tries to create a new symbol. The
result being that instead of the original symbol (e.g. ".Ltmp0") the
parser replaces it in the inline asm with the new one (e.g. ".Ltmp00")
without updating it in the symbol table. So the machine basic block
retains the "old" symbol (".Ltmp0"), but the inline asm uses the new one
(".Ltmp00").
Reviewers: nickdesaulniers, craig.topper
Subscribers: nathanchance, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65304
llvm-svn: 368477
For type values that do not have proper names, print a reasonable representation
in llvm-nm, llvm-readobj and llvm-readelf, matching GNU tools.
Fixes PR41713.
Differential Revision: https://reviews.llvm.org/D65537
llvm-svn: 368451
Summary: Implement a new analysis to estimate the number of cache lines
required by a loop nest.
The analysis is largely based on the following paper:
Compiler Optimizations for Improving Data Locality
By: Steve Carr, Katherine S. McKinley, Chau-Wen Tseng
http://www.cs.utexas.edu/users/mckinley/papers/asplos-1994.pdf
The analysis considers temporal reuse (accesses to the same memory
location) and spatial reuse (accesses to memory locations within a cache
line). For simplicity the analysis considers memory accesses in the
innermost loop in a loop nest, and thus determines the number of cache
lines used when the loop L in loop nest LN is placed in the innermost
position.
The result of the analysis can be used to drive several transformations.
As an example, loop interchange could use it to determine which loops in a
perfect loop nest should be interchanged to maximize cache reuse.
Similarly, loop distribution could be enhanced to take into
consideration cache reuse between arrays when distributing a loop to
eliminate vectorization inhibiting dependencies.
The general approach taken to estimate the number of cache lines used by
the memory references in the inner loop of a loop nest is:
1. Partition memory references that exhibit temporal or spatial reuse into
   reference groups.
2. For each loop L in the loop nest LN:
   a. Compute the cost of each reference group.
   b. Compute the 'cache cost' of the loop nest by summing up the
      reference group costs.
For further details of the algorithm please refer to the paper.
Authored By: etiotto
Reviewers: hfinkel, Meinersbur, jdoerfert, kbarton, bmahjour, anemet,
fhahn
Reviewed By: Meinersbur
Subscribers: reames, nemanjai, MaskRay, wuzish, Hahnfeld, xusx595,
venkataramanan.kumar.llvm, greened, dmgreen, steleman, fhahn, xblvaOO,
Whitney, mgorny, hiraditya, mgrang, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63459
llvm-svn: 368439
MSVC (19.16) wants to see the definition of Instruction in
`std::pair<unsigned, const Instruction &> SourceRef` to decide
if it is assignable.
Patch by Orivej Desh.
Differential Revision: https://reviews.llvm.org/D65844
llvm-svn: 368436
Flag -show-encoding enables the printing of instruction encodings as part of
the instruction info view.
Example (with flags -mtriple=x86_64-- -mcpu=btver2):
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[7]: Encoding Size
[1]    [2]    [3]    [4]    [5]    [6]    [7]    Encodings:       Instructions:
 1      2     1.00                         4     c5 f0 59 d0      vmulps  %xmm0, %xmm1, %xmm2
 1      4     1.00                         4     c5 eb 7c da      vhaddps %xmm2, %xmm2, %xmm3
 1      4     1.00                         4     c5 e3 7c e3      vhaddps %xmm3, %xmm3, %xmm4
In this example, column Encoding Size is the size in bytes of the instruction
encoding. Column Encodings reports the actual instruction encodings as byte
sequences in hex (objdump style).
The computation of encodings is done by a utility class named mca::CodeEmitter.
In future, I plan to expose the CodeEmitter to the instruction builder, so that
information about instruction encoding sizes can be used by the simulator. That
would be a first step towards simulating the throughput from the decoders in the
hardware frontend.
Differential Revision: https://reviews.llvm.org/D65948
llvm-svn: 368432
I've now needed to add an extra parameter to this call twice recently. Not only
is the signature getting extremely unwieldy, but just updating all of the
callsites and implementations is a pain. Putting the parameters in a struct
sidesteps both issues.
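As a generic illustration of the refactor (all names here are made up, not
the actual LLVM signature):
```
// Before: each new knob meant touching every caller.
// bool lowerThing(unsigned TypeIdx, bool AllowWidening, bool AllowNarrowing,
//                 bool EmitRemarks);

// After: callers construct a parameter struct and only set what they need.
struct LowerThingParams {
  unsigned TypeIdx = 0;
  bool AllowWidening = false;
  bool AllowNarrowing = false;
  bool EmitRemarks = false;
};

static bool lowerThing(const LowerThingParams &P) {
  // ... use P.TypeIdx, P.AllowWidening, ... exactly as the old parameters ...
  return P.AllowWidening || P.AllowNarrowing;
}

static bool example() {
  LowerThingParams P;
  P.TypeIdx = 1;
  P.AllowWidening = true;
  return lowerThing(P); // adding a new field later leaves this call site alone
}
```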
llvm-svn: 368408
This isn't the most robust error handling API, but does allow clients to
opt-in to getting Errors they can handle. I suspect the long-term
solution would be to move away from the lazy unit parsing and have an
explicit step that parses the unit and then allows access to the other
APIs that require a parsed unit.
llvm-dwarfdump could be expanded to use this (or newer/better API) to
demonstrate the benefit of it - but for now lld will use this in a
follow-up cl which ensures lld can exit non-zero on errors like this (&
provide more descriptive diagnostics including which object file the
error came from).
(error access to later errors when parsing nested DIEs would be good too
- but, again, exposing that without it being a hassle for every consumer
may be tricky)
llvm-svn: 368377
GlobalAlias and GlobalIFunc ought to be treated the same by the IR
linker, so we can generalize the code to be in terms of their common
base class GlobalIndirectSymbol.
Differential Revision: https://reviews.llvm.org/D55046
llvm-svn: 368357
The ARC middle-end passes stopped optimizing or transforming bitcode
that has been compiled with old compilers after we started emitting
calls to ARC runtime functions as intrinsic calls instead of normal
function calls in the front-end and made changes to teach the ARC
middle-end passes about those intrinsics (see r349534). This patch
converts calls to ARC runtime functions that are not intrinsic functions
to intrinsic function calls if the bitcode has the arm64
retainAutoreleasedReturnValue marker. Checking for the presence of the
marker is necessary to make sure we aren't changing ARC function calls
that were originally MRR message sends (see r349952).
rdar://problem/53280660
Differential Revision: https://reviews.llvm.org/D65902
llvm-svn: 368311
Summary:
This patch enable assembly output of local commons for AIX using .lcomm
directives. Adds an EmitXCOFFLocalCommonSymbol to MCStreamer so we can emit the
AIX version of .lcomm assembly directives which include a csect name. Handle the
case of BSS locals in PPCAIXAsmPrinter by using EmitXCOFFLocalCommonSymbol. Adds
a test for generating .lcomm on AIX Targets.
Reviewers: cebowleratibm, hubert.reinterpretcast, Xiangling_L, jasonliu, sfertile
Reviewed By: sfertile
Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64825
llvm-svn: 368306
This adds pre- and post- increment and decrements for MVE loads and stores. It
uses the builtin pre and post load/store detection, unlike Neon. Loads are
selected with the code in tryT2IndexedLoad, stores are selected with tablegen
patterns. The immediates have a +/-7bit range, multiplied by the size of the
element.
Differential Revision: https://reviews.llvm.org/D63840
llvm-svn: 368305
For some targets the LICM pass can result in sub-optimal code in some
cases where it would be better not to run the pass, but it isn't
always possible to suppress the transformations heuristically.
Where the front-end has insight into such cases it is beneficial
to attach loop metadata to disable the pass - this change adds the
llvm.licm.disable metadata to enable that.
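A hedged sketch of how a transform can test for the new loop metadata using
only the generic loop-ID APIs (the helper name is mine):
```
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Metadata.h"

using namespace llvm;

// Return true if the loop carries "llvm.licm.disable" in its llvm.loop
// metadata, i.e. the front-end asked for LICM to be skipped on this loop.
static bool hasLICMDisableMD(const Loop &L) {
  MDNode *LoopID = L.getLoopID();
  if (!LoopID)
    return false;
  // Operand 0 is the self-reference; the option nodes follow.
  for (unsigned I = 1, E = LoopID->getNumOperands(); I != E; ++I) {
    auto *Option = dyn_cast<MDNode>(LoopID->getOperand(I));
    if (!Option || Option->getNumOperands() == 0)
      continue;
    auto *Name = dyn_cast<MDString>(Option->getOperand(0));
    if (Name && Name->getString() == "llvm.licm.disable")
      return true;
  }
  return false;
}
```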
Differential Revision: https://reviews.llvm.org/D64557
llvm-svn: 368296
In some cases a symbol might have section index == SHN_XINDEX.
This is an escape value indicating that the actual section header index
is too large to fit in the containing field.
Then the SHT_SYMTAB_SHNDX section is used. It contains the 32-bit values
that store the section indexes.
ELF gABI says that there can be multiple SHT_SYMTAB_SHNDX sections,
i.e. for example one for .symtab and one for .dynsym
(1) https://groups.google.com/forum/#!topic/generic-abi/-XJAV5d8PRg
(2) DT_SYMTAB_SHNDX: http://www.sco.com/developers/gabi/latest/ch5.dynamic.html
In this patch I am only supporting a single SHT_SYMTAB_SHNDX section associated
with a .symtab. This is a more or less common case, used by a few tests I saw in LLVM.
I decided not to create the SHT_SYMTAB_SHNDX section as "implicit",
but to implement it as a kind of regular section for now,
i.e. tools do not recreate this section or its content, like they do for
symbol table sections, for example. That should allow writing all kinds of
possibly broken test cases for our needs and keep the output closer to what was requested.
Differential revision: https://reviews.llvm.org/D65446
llvm-svn: 368272
Currently, we have a code duplication in llvm-readobj which was introduced in D63266.
The duplication was introduced to allow llvm-readobj to dump the partially
broken object. Methods in ELFFile<ELFT> perform a strict validation of the inputs,
which is itself good, but not ideal for dumper tools, which might want to dump the
information even if some pieces are broken/unexpected.
This patch introduces a warning handler which can be passed to ELFFile<ELFT> methods
and can allow skipping the non-critical errors when needed/possible.
For demonstration, I removed the duplication from llvm-readobj and implemented a warning using
the new custom warning handler. It also deduplicates the strings printed, making the output less verbose.
Differential revision: https://reviews.llvm.org/D65515
llvm-svn: 368260
Summary:
The ever growing switch required Attribute::AttrKind values but they
might not be available for all abstract attributes we deduce. With the
new method we track statistics at the abstract attribute level. The
provided macros simplify the usage and make the messages uniform.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65732
llvm-svn: 368227
Summary:
The wrapper reduces boilerplate code and also provide a nice way to
determine the state type used by an abstract attributes statically via
AAType::StateType.
This was already discussed as part of the review of D65711.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65786
llvm-svn: 368224
Summary:
So far, whenever one wanted to look at returned values, one had to deal
with the AAReturnedValues and potentially with the AAIsDead attribute.
In the same spirit as other checkForAllXXX methods, we add this
functionality now to the Attributor. By adopting the use sites we got
better results when return instructions were dead.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65733
llvm-svn: 368222
Without this patch computeConstantDifference returns None for cases like
these:
computeConstantDifference(%x, %x)
computeConstantDifference({%x,+,16}, {%x,+,16})
Differential Revision: https://reviews.llvm.org/D65474
llvm-svn: 368193
Some of these names were abbreviated, some were not, some pluralised,
some not. This made the API difficult to use - since it's an exact 1:1
mapping to the DWARF sections - use those names (changing underscore
separation for camel casing).
llvm-svn: 368189
- Remove support for non-recursive mutexes. This was unused.
- The std::recursive_mutex is now created/destroyed unconditionally.
Locking is still only done if threading is enabled.
- Alias SmartScopedLock to std::lock_guard.
This should make no semantic difference on the existing APIs.
llvm-svn: 368158
This was initially committed in r368059 but got reverted in r368084
because there was faulty logic in how the shift-amount type mismatch
was being handled (it simply wasn't).
I've added an explicit bailout before we call SimplifyAddInst() - I don't think
it's designed in general to handle differently-typed values, even though
the actual problem only comes from ConstantExpr's.
I have also changed the common type deduction to not just blindly
look past zext, but to do so in a way that the types match in the end.
Differential Revision: https://reviews.llvm.org/D65380
llvm-svn: 368141
When e_shstrndx is equal to SHN_XINDEX,
the index of the section string table section should
be taken from the sh_link field of the section
header at index 0.
If sh_link is broken, e.g. contains an index that is
larger than the number of sections, then an error is reported.
This error message was untested before.
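A hedged sketch of the lookup rule in terms of the raw ELF fields (types
simplified; this is not the actual tool code):
```
#include "llvm/BinaryFormat/ELF.h"
#include <cstdint>
#include <vector>

// Simplified stand-in for a parsed section header.
struct SectionHeader { uint32_t sh_link; };

// Resolve the index of the section name string table: if e_shstrndx holds the
// escape value SHN_XINDEX, the real index is stored in sh_link of header 0.
static uint32_t shStrTabIndex(uint16_t EShStrNdx,
                              const std::vector<SectionHeader> &Headers) {
  if (EShStrNdx != llvm::ELF::SHN_XINDEX)
    return EShStrNdx;
  uint32_t Index = Headers.at(0).sh_link;
  // A value >= Headers.size() is the broken case described above and must be
  // reported as an error by the tool.
  return Index;
}
```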
Differential revision: https://reviews.llvm.org/D65391
llvm-svn: 368139
Globals are instrumented by adding a pointer tag to their symbol values
and emitting metadata into a special section that allows the runtime to tag
their memory when the library is loaded.
Due to order of initialization issues explained in more detail in the comments,
shadow initialization cannot happen during regular global initialization.
Instead, the location of the global section is marked using an ELF note,
and we require libc support for calling a function provided by the HWASAN
runtime when libraries are loaded and unloaded.
Based on ideas discussed with @evgeny777 in D56672.
Differential Revision: https://reviews.llvm.org/D65770
llvm-svn: 368102
This reverts r368059 (git commit 0f95710976)
This caused Clang to assert while self-hosting and compiling
SystemZInstrInfo.cpp. Reduction is running.
llvm-svn: 368084
Commit r368064 was necessary after r367953 (D65712) broke the module
build. That happened, apparently, because the template class IRAttribute
defined in the header had a virtual method defined in the corresponding
source file (IRAttribute::manifest). To unbreak the situation this patch
introduces a helper function IRAttributeManifest::manifestAttrs which
is used to implement IRAttribute::manifest in the header. The definition
of the helper function is still in the source file.
Patch by jdoerfert (Johannes Doerfert)
Differential Revision: https://reviews.llvm.org/D65821
llvm-svn: 368076
https://reviews.llvm.org/D65698
This adds a KnownBits analysis pass for GISel. This was done as a
pass (compared to static functions) so that we can add other features
such as caching queries (within a pass and across passes) in the future.
This patch only adds the basic pass boilerplate, and implements a lazy
non-caching known-bits implementation (ported from SelectionDAG). I've
also hooked up the AArch64PreLegalizerCombiner pass to use this - there
should be no compile time regression as the analysis is lazy.
llvm-svn: 368065
Summary:
This is tested by D61289 but has been pulled into a separate patch at
a reviewers request.
Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm, rovka
Reviewed By: arsenm
Subscribers: javed.absar, hiraditya, wdng, kristof.beyls, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61321
llvm-svn: 368063
Summary:
When fiddling with sched profiles, especially creating new ones, it's amazingly easy
to end up with a malformed .td that crashes tablegen without an explanation of the bug.
This changes the most common assertion I have encountered to dump enough information
to be able to fix the .td file.
Split of from D63628
Reviewers: RKSimon, craig.topper, nhaehnle
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65790
llvm-svn: 368060
Summary:
Currently `reassociateShiftAmtsOfTwoSameDirectionShifts()` only handles
two shifts one after another. If the shifts are `shl`, we still can
easily perform the fold, with no extra legality checks:
https://rise4fun.com/Alive/OQbM
If we have right-shift however, we won't be able to make it
any simpler than it already is.
After this the only thing missing here is constant-folding (`NewShAmt >= bitwidth(X)`):
* If it's a logical shift, then constant-fold to `0` (not `undef`)
* If it's an `ashr`, then a splat of the original sign bit
https://rise4fun.com/Alive/E1K
https://rise4fun.com/Alive/i0V
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65380
llvm-svn: 368059
This updates all libraries and tools in LLVM Core to use 64-bit offsets
which directly or indirectly come to DataExtractor.
Differential Revision: https://reviews.llvm.org/D65638
llvm-svn: 368014
Using 64-bit offsets is required to fully implement 64-bit DWARF.
As these classes are used in many different libraries they should
temporarily support both 32- and 64-bit offsets.
Differential Revision: https://reviews.llvm.org/D64006
llvm-svn: 368013
This patch changes the DAG legalizer to respect the operation actions
set by the target for strict floating-point operations. (Currently, the
legalizer will usually fall back to mutate to the non-strict action
(which is assumed to be legal), and only skip mutation if the strict
operation is marked legal.)
With this patch, whenever a strict operation is marked as Legal or
Custom, it is passed to the target as usual. Only if it is marked as
Expand will the legalizer attempt to mutate to the non-strict operation.
Note that this will now fail if the non-strict operation is itself
marked as Custom -- the target will have to provide a Custom definition
for the strict operation then as well.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D65226
llvm-svn: 368012
Summary:
Before this patch MGATHER/MSCATTER is capable of representing all
common addressing modes, but only when illegal types are used.
This patch adds an IndexType property so more representations
are available when using legal types only.
Original modes:
vector of bases
base + vector of signed scaled offsets
New modes:
base + vector of signed unscaled offsets
base + vector of unsigned scaled offsets
base + vector of unsigned unscaled offsets
The current behaviour of addressing modes for gather/scatter remains
unchanged.
Patch by Paul Walker.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D65636
llvm-svn: 368008
Added two more conversions to satisfy MSVC and moved the declaration of
MCPhysReg to MCRegister.h to enable that.
This reverts r367932 (git commit eac86ec25f)
llvm-svn: 367965
Summary:
Similar to `Attributor::checkForAllCallSites`, we now provide such
functionality for instructions of a certain opcode through
`Attributor::checkForAllInstructions` and the convenient wrapper
`Attributor::checkForAllCallLikeInstructions`. This cleans up code,
avoids duplication, and simplifies the usage of liveness information.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65731
llvm-svn: 367961
Summary:
Certain properties, e.g., an AttrKind, are not shared among all abstract
attributes. This patch extracts the functionality into a helper struct.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65712
llvm-svn: 367953
To remove boilerplate, mostly passing through values to the
AbstractAttribute base class, we extract the state into an IRPosition
helper. There is no functional change intended, but the IRPosition struct
will provide more functionality down the line.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65711
llvm-svn: 367952
Summary:
The new scheme is similar to the pass manager and dyn_cast scheme where
we identify classes by the address of a static member. This is better
than the old scheme in which we had to "invent" new Attributor enums if
there was no corresponding one.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65710
llvm-svn: 367951
Summary:
Instead of storing the reference to the InformationCache we now pass it
whenever it might be needed.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65709
llvm-svn: 367950
Summary:
When the WebAssembly backend encounters a return type that doesn't
fit within i32, SelectionDAG performs sret demotion, adding an
additional argument to the start of the function that contains
a pointer to an sret buffer to use instead. However, this conflicts
with the emscripten sjlj lowering pass. There we translate calls like:
```
call {i32, i32} @foo()
```
into (in pseudo-llvm)
```
%addr = @foo
call {i32, i32} @__invoke_{i32,i32}(%addr)
```
i.e. we perform an indirect call through an extra function.
However, the sret transform now transforms this into
the equivalent of
```
%addr = @foo
%sret = alloca {i32, i32}
call {i32, i32} @__invoke_{i32,i32}(%sret, %addr)
```
(while simultaneously translating the implementation of @foo as well).
Unfortunately, this doesn't work out. The __invoke_ ABI expects
the function address to be the first argument, causing crashes.
There are several possible ways to fix this:
1. Implementing the sret rewrite at the IR level as well and performing
it as part of lowering to __invoke
2. Fixing the wasm backend to recognize that __invoke has a special ABI
3. A change to the binaryen/emscripten ABI to recognize this situation
This revision implements the middle option, teaching the backend to
treat __invoke_ functions specially in sret lowering. This is achieved
by
1) Introducing a new CallingConv ID for invoke functions
2) When this CallingConv ID is seen in the backend and the first argument
is marked as sret (a function pointer would never be marked as sret),
swapping the first two arguments.
Reviewed By: tlively, aheejin
Differential Revision: https://reviews.llvm.org/D65463
llvm-svn: 367935
When we remove instructions, cached references could still be live. This
patch avoids removing invoke instructions that are replaced by calls and
instead keeps them around but in a dead block.
llvm-svn: 367933
MSVC finds ambiguity where clang doesn't, and it looks like it's not going to be an easy fix.
Reverting while I figure out how to fix it.
This reverts r367916 (git commit aa15ec3c23)
This reverts r367920 (git commit 5d14efe279)
llvm-svn: 367932
Any addresses that we pass to llvm-symbolizer are going to be untagged,
while any HWASAN instrumented globals are going to be tagged in the
symbol table. Therefore we need to untag the addresses before using them.
Differential Revision: https://reviews.llvm.org/D65769
llvm-svn: 367926
FastISel has already done this since the initial arm64 port was upstreamed, so
it seems there are no issues with doing this at -O0 for very small memcpys.
Gives a 0.2% geomean code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D65758
llvm-svn: 367919
Summary:
This has no functional effect but makes it more obvious which parts of the
compiler do not use Register/MCRegister when you mark the implicit conversion
deprecated.
Implicit conversions for comparisons accounted for ~20% (~3k of ~13k) of
the implicit conversions when I first measured it. I haven't maintained
those numbers as other patches have landed though so it may be out of date.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65678
llvm-svn: 367916