This is partial recommit of r325224, reverted in 325227. The relevant
part of original comment is below.
Analysis of fails in the case of out of memory errors can be tricky on
Windows. Such error emerges at the point where memory allocation function
fails, but manifests itself when null pointer is used. These two points
may be distant from each other. Besides, next runs may not exhibit
allocation error.
Usual programming practice does not require checking result of 'operator
new' because it throws 'std::bad_alloc' in the case of allocation error.
However, LLVM is usually built with exceptions turned off, so 'new' can
return null pointer. This change installs custom new handler, which causes
fatal error in the case of out of memory. The handler is installed
automatically prior to call to 'main' during construction of a static
object defined in 'lib/Support/ErrorHandling.cpp'. If the application does
not use this file, the handler may be installed manually by a call to
'llvm::install_out_of_memory_new_handler', declared in
'include/llvm/Support/ErrorHandling.h".
Differential Revision: https://reviews.llvm.org/D43010
llvm-svn: 325426
Sadly, r324359 caused at least PR36312. There is a patch out for review
but it seems to be taking a bit and we've already had these crashers in
tree for too long. We're hitting this PR in real code now and are
blocked on shipping new compilers as a consequence so I'm reverting us
back to green.
Sorry for the churn due to the stacked changes that I had to revert. =/
llvm-svn: 325420
Summary:
Gold plugin does not add pass to ThinLTO modules without useful symbols.
In this case ThinLTO can't create corresponding index file and some features, like CFI,
cannot be processes by backed correctly without index.
Given that we don't need the backed output we can request it to avoid
processing the module. This is implemented by this patch using new
"SkipModuleByDistributedBackend" flag.
Reviewers: pcc, tejohnson
Subscribers: mehdi_amini, inglorion, eraman, cfe-commits
Differential Revision: https://reviews.llvm.org/D42995
llvm-svn: 325411
...and delete the equivalent local functiona from InstCombine.
These might be useful to other InstCombine files or other passes
and makes FP queries more similar to integer constant queries.
llvm-svn: 325398
This was originally reported as a bug with the symptom being "cvdump
crashes when printing an LLD-linked PDB that has an S_FILESTATIC record
in it". After some additional investigation, I determined that this was
a symptom of a larger problem, and in fact the real problem was in the
way we emitted the global PDB string table. As evidence of this, you can
take any lld-generated PDB, run cvdump -stringtable on it, and it would
return no results.
My hypothesis was that cvdump could not *find* the string table to begin
with. Normally it would do this by looking in the "named stream map",
finding the string /names, and using its value as the stream index. If
this lookup fails, then cvdump would fail to load the string table.
To test this hypothesis, I looked at the name stream map generated by a
link.exe PDB, and I emitted exactly those bytes into an LLD-generated
PDB. Suddenly, cvdump could read our string table!
This code has always been hacky and we knew there was something we
didn't understand. After all, there were some comments to the effect of
"we have to emit strings in a specific order, otherwise things don't
work". The key to fixing this was finally understanding this.
The way it works is that it makes use of a generic serializable hash map
that maps integers to other integers. In this case, the "key" is the
offset into a buffer, and the value is the stream number. If you index
into the buffer at the offset specified by a given key, you find the
name. The underlying cause of all these problems is that we were using
the identity function for the hash. i.e. if a string's offset in the
buffer was 12, the hash value was 12. Instead, we need to hash the
string *at that offset*. There is an additional catch, in that we have
to compute the hash as a uint32 and then truncate it to uint16.
Making this work is a little bit annoying, because we use the same hash
table in other places as well, and normally just using the identity
function for the hash function is actually what's desired. I'm not
totally happy with the template goo I came up with, but it works in any
case.
The reason we never found this bug through our own testing is because we
were building a /parallel/ hash table (in the form of an
llvm::StringMap<>) and doing all of our lookups and "real" hash table
work against that. I deleted all of that code and now everything goes
through the real hash table. Then, to test it, I added a unit test which
adds 7 strings and queries the associated values. I test every possible
insertion order permutation of these 7 strings, to verify that it really
does work as expected.
Differential Revision: https://reviews.llvm.org/D43326
llvm-svn: 325386
Summary:
The LazyValueInfo pass caches a copy of the DominatorTree when available.
Whenever there are pending DominatorTree updates within JumpThreading's
DeferredDominance object we cannot use the cached DT for LVI analysis.
This commit adds the new methods enableDT() and disableDT() to LVI.
JumpThreading also sets the appropriate usage model before calling LVI
analysis methods.
Fixes https://bugs.llvm.org/show_bug.cgi?id=36133
Reviewers: sebpop, dberlin, kuhar
Reviewed by: sebpop, kuhar
Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov
Differential Revision: https://reviews.llvm.org/D42717
llvm-svn: 325356
There is a latent Windows kernel bug, the exact trigger
conditions are not well understood, which can cause a file
to be correctly written, but unable to be correctly read.
The workaround appears to be simply calling FlushFileBuffers.
Differential Revision: https://reviews.llvm.org/D42925
llvm-svn: 325274
This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation.
Further features can be moved/added in future patches.
Differential Revision: https://reviews.llvm.org/D42896
llvm-svn: 325232
Analysis of fails in the case of out of memory errors can be tricky on
Windows. Such error emerges at the point where memory allocation function
fails, but manifests itself when null pointer is used. These two points
may be distant from each other. Besides, next runs may not exhibit
allocation error.
Usual programming practice does not require checking result of 'operator
new' because it throws 'std::bad_alloc' in the case of allocation error.
However, LLVM is usually built with exceptions turned off, so 'new' can
return null pointer. This change installs custom new handler, which causes
fatal error in the case of out of memory. The handler is installed
automatically prior to call to 'main' during construction of a static
object defined in 'lib/Support/ErrorHandling.cpp'. If the application does
not use this file, the handler may be installed manually by a call to
'llvm::install_out_of_memory_new_handler', declared in
'include/llvm/Support/ErrorHandling.h".
There are calls to C allocation functions, malloc, calloc and realloc.
They are used for interoperability with C code, when allocated object has
variable size and when it is necessary to avoid call of constructors. In
many calls the result is not checked against null pointer. To simplify
checks, new functions are defined in the namespace 'llvm' with the
same names as these C function. These functions produce fatal error if
allocation fails. User should use 'llvm::malloc' instead of 'std::malloc'
in order to use the safe variant. This change replaces 'std::malloc'
in the cases when the result of allocation function is not checked against
null pointer.
Finally, there are plain C code, that uses malloc and similar functions. If
the result is not checked, assert statements are added.
Differential Revision: https://reviews.llvm.org/D43010
llvm-svn: 325224
Summary:
TypeID summaries are used by CFI and need to be serialized by ThinLTO
indexing for later use by LTO Backend.
Reviewers: tejohnson, pcc
Subscribers: mehdi_amini, inglorion, eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42611
llvm-svn: 325182
Summary:
This patch adds templated functions to MachineIRBuilder for some opcodes
and adds pattern matcher support for G_AND and G_OR.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D43309
llvm-svn: 325162
So that macros defined in inline assembly blocks are available to the
whole file.
This provides a consistent behavior with other assembly directives,
since equations for example are already preserved between inline
assembly blocks.
PR: 36110
Patch by Roger!
llvm-svn: 325139
Summary:
This patch implements a variant of the DJB hash function which folds the
input according to the algorithm in the Dwarf 5 specification (Section
6.1.1.4.5), which in turn references the Unicode Standard (Section 5.18,
"Case Mappings").
To achieve this, I have added a llvm::sys::unicode::foldCharSimple
function, which performs this mapping. The implementation of this
function was generated from the CaseMatching.txt file from the Unicode
spec using a python script (which is also included in this patch). The
script tries to optimize the function by coalescing adjecant mappings
with the same shift and stride (terms I made up). Theoretically, it
could be made a bit smarter and merge adjecant blocks that were
interrupted by only one or two characters with exceptional mapping, but
this would save only a couple of branches, while it would greatly
complicate the implementation, so I deemed it was not worth it.
Since we assume that the vast majority of the input characters will be
US-ASCII, the folding hash function has a fast-path for handling these,
and only whips out the full decode+fold+encode logic if we encounter a
character outside of this range. It might be possible to implement the
folding directly on utf8 sequences, but this would also bring a lot of
complexity for the few cases where we will actually need to process
non-ascii characters.
Reviewers: JDevlieghere, aprantl, probinson, dblaikie
Subscribers: mgorny, hintonda, echristo, clayborg, vleschuk, llvm-commits
Differential Revision: https://reviews.llvm.org/D42740
llvm-svn: 325107
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.
Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html
I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.
Differential Revision: https://reviews.llvm.org/D42123
llvm-svn: 325102
* Document most API's
* Delete a useless function call
* Fix a discrepancy between the single and multi-opcode variants of
getActionDefinitions().
The multi-opcode variant now requires that more than one opcode is requested.
Previously it acted much like the single-opcode form but unnecessarily
enforced the requirements of the multi-opcode form.
llvm-svn: 325067
Seeing some inlining missing in internal uses of symbolizer. I'll work
on a reproduction, tests, improvements & recommit as soon as possible.
(Chandler would like it to be known that this improvement did make
check-llvm 4x faster... - so there's certainly some fairly good
motivation to push on fixing/figuring this out & getting it back in)
This reverts commit r321345.
llvm-svn: 324981
Summary:
This patch makes postdominators always recalculate the tree when an update causes to change the tree roots.
As @dmgreen noticed in [[ https://reviews.llvm.org/D41298 | D41298 ]], the previous implementation was not conservative enough and it was possible to end up with a PostDomTree that was different than a freshly computed one.
The patch also compares postdominators with a freshly computed tree at the end of full verification to make sure we don't hit similar issues in the future.
This should (ideally) be also backported to 6.0 before the release, although I don't have any reports of this causing an observable error. It should be safe to do it even if it's late in the release, as the change only makes the current behavior more conservative.
Reviewers: dmgreen, dberlin, davide, brzycki, grosser
Reviewed By: brzycki, grosser
Subscribers: llvm-commits, dmgreen
Differential Revision: https://reviews.llvm.org/D43140
llvm-svn: 324962
It caused assertion failure
Assertion failed: (!DD.IsLambda && !MergeDD.IsLambda && "faked up lambda definition?"), function MergeDefinitionData, file /Users/buildslave/jenkins/workspace/clang-stage1-configure-RA/llvm/tools/clang/lib/Serialization/ASTReaderDecl.cpp, line 1675.
on the second stage build bots.
llvm-svn: 324932
Rather than encode the absence of a checksum with a Kind variant, instead put
both the kind and value in a struct and wrap it in an Optional.
Differential Revision: http://reviews.llvm.org/D43043
llvm-svn: 324928
Armv8.1-A added an atomic load-clear instruction (which performs bitwise
and with the complement of it's operand), but not a load-and
instruction. Our current code-generation for atomic load-and always
inserts an MVN instruction to invert its argument, even if it could be
folded into a constant or another instruction.
This adds lowering early in selection DAG to convert a load-and
operation into an xor with -1 and a load-clear, allowing the normal DAG
optimisations to work on it.
To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't
see any easy way to do this with an AArch64-specific ISD node, because
the code-generation for atomic operations assumes the SDNodes are of
type AtomicSDNode.
I've left the old tablegen patterns in because they are still needed for
global isel.
Differential revision: https://reviews.llvm.org/D42478
llvm-svn: 324908
Summary:
For better vectorization result we should take into consideration the
cost of the user insertelement instructions when we try to
vectorize sequences that build the whole vector. I.e. if we have the
following scalar code:
```
<Scalar code>
insertelement <ScalarCode>, ...
```
we should consider the cost of the last `insertelement ` instructions as
the cost of the scalar code.
Reviewers: RKSimon, spatel, hfinkel, mkuper
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42657
llvm-svn: 324893
The current implementation of `getPostIncExpr` invokes `getAddExpr` for two recurrencies
and expects that it always returns it a recurrency. But this is not guaranteed to happen if we
have reached max recursion depth or refused to make SCEV simplification for other reasons.
This patch changes its implementation so that now it always returns SCEVAddRec without
relying on `getAddExpr`.
Differential Revision: https://reviews.llvm.org/D42953
llvm-svn: 324866
Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to construct find SCC's in ThinLTO analysis passes.
llvm-svn: 324854
SelectionDAG::getBoolConstant was recently introduced. At the time I didn't know getConstTrueVal existed, but I think getBoolConstant is better as it will use the source VT to make sure it can properly detect floating point if it is configured differently.
llvm-svn: 324832
Summary:
This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit and in the IR.
This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares.
Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and a emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it.
This should make any issues with gpr<->mask sequences the same between integer and fp. Meaning we only have to fix them once.
Reviewers: spatel, delena, RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43137
llvm-svn: 324827
This adds a wasm-import-module function attribute and a .import_module
assembler directive, for specifying module import names for WebAssembly.
Currently these may only be used for function symbols; global variables
may be considered in the future.
WebAssembly has a two-level namespace scheme for symbols, and it's
normally the linker's job to assign the module name, which is the
first-level name. The attributes here allow users to specify their
own module names explicitly, which is useful for tools generating
bindings to modules defined in other languages.
This feature is not fully usable yet. It will evolve along with the
ongoing symbol table and lld changes.
Differential Revision: https://reviews.llvm.org/D42520
llvm-svn: 324778
Summary:
Since the lifetime markers are metadata instructions, they should probably be treated as such by the isMetaInstruction predicate.
There was no issue that provoked this change, I just ran across it while investigating another issue.
Reviewers: aprantl, MatzeB
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43111
llvm-svn: 324771
Peviously we were reporting undefined symbol as being defined
by the IMPORT sections.
This change reports undefined symbols in the same that other
formats do, and also removes the need to store the section
with each symbol (since it can be derived from the symbol
type).
Differential Revision: https://reviews.llvm.org/D43101
llvm-svn: 324770
Extend salvageDebugInfo to preserve the debug info from a dead 'or'
with a constant.
Patch by Ismail Badawi!
Differential Revision: https://reviews.llvm.org/D43129
llvm-svn: 324764
Before it was declared at a struct, which differs from
DominatorTree. Make it a class so both can be declared
the same way without hitting the warning about mismatched
struct vs. class declarations.
llvm-svn: 324760
Rely on the assembler to finalize the layout of the DWARF/Itanium
exception-handling LSDA. Rather than calculate the exact size of each
thing in the LSDA, use assembler directives:
To emit the offset to the TTBase label:
.uleb128 .Lttbase0-.Lttbaseref0
.Lttbaseref0:
To emit the size of the call site table:
.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
... call site table entries ...
.Lcst_end0:
To align the type info table:
... action table ...
.balign 4
.long _ZTIi
.long _ZTIl
.Lttbase0:
Using assembler directives simplifies the compiler and allows switching
the encoding of offsets in the call site table from udata4 to uleb128 for
a large code size savings. (This commit does not change the encoding.)
The combination of the uleb128 followed by a balign creates an unfortunate
dependency cycle that the assembler must sometimes resolve either by
padding an LEB or by inserting zero padding before the type table. See
PR35809 or GNU as bug 4029.
Patch by Ryan Prichard!
llvm-svn: 324749
Summary:
The class contained arrays of two structures (DataArray and HashData).
These structures were in 1:1 correspondence, and one of them contained
pointers to the other (and *both* contained a "Name" field). By merging
these two structures into one, we can save a bit of space without
negatively impacting much of anything.
Reviewers: JDevlieghere, aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43073
llvm-svn: 324724
Handles were returned by addModule and used as keys for removeModule,
findSymbolIn, and emitAndFinalize. Their job is now subsumed by VModuleKeys,
which simplify resource management by providing a consistent handle across all
layers.
llvm-svn: 324700
Including a blank file is confusing and makes it look like something
went wrong. Rather than requiring people know why this is blank, let's
just make it explicitly #undef the macro that it would define if it
weren't empty.
llvm-svn: 324659
Many in SimplifySetCC and FoldSetCC try to create true or false constants. Some of them query getBooleanContents to figure out whether to use all ones or just 1 for true. But many places do not check and just use 1 without ensuring the VT has an i1 scalar type. Note sure if those places only trigger before type legalization so they only see an i1
type?
To cleanup the inconsistency and reduce some duplicated code, this patch adds a getBoolConstant method to SelectionDAG that takes are of querying getBooleanContents and doing the right thing.
Differential Revision: https://reviews.llvm.org/D43037
llvm-svn: 324634
This is a support change for a CFE change (https://reviews.llvm.org/D42978)
that allows march and -target-cpu to list the valid targets in a note. The changes
are limited to the ARM/AArch64, since this is the only target that gets the CPU
list from LLVM.
llvm-svn: 324623
Summary:
Right now using a ProcResource automatically counts as usage of all
super ProcResGroups. All this is done during codegen, so there is no
way for schedulers to get this information at runtime.
This adds the information of which individual ProcRes units are
contained in a ProcResGroup in MCProcResourceDesc.
Reviewers: gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43023
llvm-svn: 324582
The issue is that clang was first creating a extern_weak hidden GV and
then changing the linkage to external.
Once we know it is not extern_weak we know it must be dso_local.
This patch refactors the code that sets the implicit dso_local to a
helper private function that is used every time we change the linkage
or visibility.
I will commit a patch to clang in a minute.
llvm-svn: 324551
This patch is the LLVM part of fixing the issues, described in
https://bugs.llvm.org/show_bug.cgi?id=36168
* The representation of enumerator values in the debug info metadata now
contains a boolean flag isUnsigned, which determines how the bits of
the value are interpreted.
* The DW_TAG_enumeration type DIE now always (for DWARF version >= 3)
includes a DW_AT_type attribute, which refers to the underlying
integer type, as suggested in DWARFv4 (5.7 Enumeration Type Entries).
* The debug info metadata for enumeration type contains (in flags)
indication whether this is a C++11 "fixed enum".
* For C++11 enumeration with a fixed underlying type, the DIE also
includes the DW_AT_enum_class attribute (for DWARF version >= 4).
* Encoding of enumerator constants uses DW_FORM_sdata for signed values
and DW_FORM_udata for unsigned values, as suggested by DWARFv4 (7.5.4
Attribute Encodings).
The changes should be backwards compatible:
* the isUnsigned attribute is optional and defaults to false.
* if the underlying type for the enumeration is not available, the
enumerator values are considered signed.
* the FixedEnum flag defaults to clear.
* the bitcode format for DIEnumerator stores the unsigned flag bit #1 of
the first record element, so the format does not change and the zero
previously stored there is consistent with the false default for
IsUnsigned.
Differential Revision: https://reviews.llvm.org/D42734
llvm-svn: 324489
The failures happened because of assert which was overconfident about
SCEV's proving capabilities and is generally not valid.
Differential Revision: https://reviews.llvm.org/D42835
llvm-svn: 324473
Sometimes `isLoopEntryGuardedByCond` cannot prove predicate `a > b` directly.
But it is a common situation when `a >= b` is known from ranges and `a != b` is
known from a dominating condition. Thia patch teaches SCEV to sum these facts
together and prove strict comparison via non-strict one.
Differential Revision: https://reviews.llvm.org/D42835
llvm-svn: 324453
Summary:
A recent fix to drop dead symbols (r323633) did not work for ThinLTO
distributed backends because we lose the WithGlobalValueDeadStripping
set on the index during the thin link. This patch adds a new flags
record to the bitcode format for the index, and serializes this flag
for the combined index (it would always be 0 for the per-module index
generated by the compile step, so no need to serialize the new flags
record there until/unless we add another flag that applies to the
per-module indexes).
Generally this flag should always be set for the distributed backends,
which are necessarily performed after the thin link. However, if we were
to simply set this flag on the index applied to the distributed backends
(invoked via clang), we would lose the ability to disable dead stripping
via -compute-dead=false for debugging purposes.
Reviewers: grimar, pcc
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D42799
llvm-svn: 324444
Summary:
Some of the commands tries to get the register without checking
if the specified operands is a register and causing crash. All commands
should check the type of the operand first and reject if the type is
not expected.
Reviewers: dsanders, qcolombet
Reviewed By: qcolombet
Subscribers: qcolombet, rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42984
llvm-svn: 324442
n Rust, an enum that carries data in the variants is, essentially, a
discriminated union. Furthermore, the Rust compiler will perform
space optimizations on such enums in some situations. Previously,
DWARF for these constructs was emitted using a hack (a magic field
name); but this approach stopped working when more space optimizations
were added in https://github.com/rust-lang/rust/pull/45225.
This patch changes LLVM to allow discriminated unions to be
represented in DWARF. It adds createDiscriminatedUnionType and
createDiscriminatedMemberType to DIBuilder and then arranges for this
to be emitted using DWARF's DW_TAG_variant_part and DW_TAG_variant.
Note that DWARF requires that a discriminated union be represented as
a structure with a variant part. However, as Rust only needs to emit
pure discriminated unions, this is what I chose to expose on
DIBuilder.
Patch by Tom Tromey!
Differential Revision: https://reviews.llvm.org/D42082
llvm-svn: 324426
In particular this patch switches RTDyldObjectLinkingLayer to use
orc::SymbolResolver and threads the requried changse (ExecutionSession
references and VModuleKeys) through the existing layer APIs.
The purpose of the new resolver interface is to improve query performance and
better support parallelism, both in JIT'd code and within the compiler itself.
The most visibile change is switch of the <Layer>::addModule signatures from:
Expected<Handle> addModule(std::shared_ptr<ModuleType> Mod,
std::shared_ptr<JITSymbolResolver> Resolver)
to:
Expected<Handle> addModule(VModuleKey K, std::shared_ptr<ModuleType> Mod);
Typical usage of addModule will now look like:
auto K = ES.allocateVModuleKey();
Resolvers[K] = createSymbolResolver(...);
Layer.addModule(K, std::move(Mod));
See the BuildingAJIT tutorial code for example usage.
llvm-svn: 324405
Generalize existing constant matching to work with non-uniform constant vectors as well.
Differential Revision: https://reviews.llvm.org/D42818
llvm-svn: 324369
Instruction Selection
Cleanup cycle/validity checks in ISel (IsLegalToFold,
HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full
search for cycles / dependencies pruning the search when topological
property of NodeId allows.
As part of this propogate the NodeId-based cutoffs to narrow
hasPreprocessorHelper searches.
Reviewers: craig.topper, bogner
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D41293
llvm-svn: 324359
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
The major visible difference here is that in line-table dumps,
directory and file names are wrapped in double-quotes; previously,
directory names got single quotes and file names were not quoted at
all.
The improvement in this patch is that when a DWARF v5 line table
header has indirect strings, in a verbose dump these will all have
their section[offset] printed as well as the name itself. This
matches the format used for dumping strings in the .debug_info
section.
Differential Revision: https://reviews.llvm.org/D42802
llvm-svn: 324270
Summary:
This complements the fixes in r323633 and r324075 which drop the
definitions of dead functions and variables, respectively.
Fixes PR36208.
Reviewers: grimar, rafael
Subscribers: mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D42856
llvm-svn: 324242
The type-shrinking logic in reduction detection, although narrow in scope, is
also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies
the approach to rely on the demanded bits and value tracking analyses, if
available. We currently perform type-shrinking separately for reductions and
other instructions in the loop. Long-term, we should probably think about
computing minimal bit widths in a more complete way for the loops we want to
vectorize.
PR35734
Differential Revision: https://reviews.llvm.org/D42309
llvm-svn: 324195
Clang already stopped using these a couple months ago.
The test cases aren't great as there is nothing forcing the operations to stay in k-registers so some of them moved back to scalar ops due to the bitcasts being moved around.
llvm-svn: 324177
This resolver conforms to the LegacyJITSymbolResolver interface, and will be
replaced with a null-returning resolver conforming to the newer
orc::SymbolResolver interface in the near future. This patch renames the class
to avoid a clash.
llvm-svn: 324175
This change adds support to llvm-dwarfdump for dumping DWARF5
.debug_rnglists sections in regular ELF files.
It is not complete, in that several DW_RLE_* encodings are currently
not supported, but does dump the headert and the basic ranges for
DW_RLE_start_length and DW_RLE_start_end encodings.
Obvious next steps are to add verbose dumping that dumps the raw
encodings, rather than the interpreted contents, to add -verify support
of the section (e.g. to show that the correct number of offsets are
specified), add dumping of .debug_rnglists.dwo, and to add support for
other encodings.
Reviewed by: dblaikie, JDevlieghere
Differential Revision: https://reviews.llvm.org/D42481
llvm-svn: 324077
llvm-objdump could get C feature by ELF::EF_RISCV_RVC e_flag,
so then we don't have to add -mattr=+c on the command line.
Differential Revision: https://reviews.llvm.org/D42629
llvm-svn: 324058
This is a bit faster in theory, in practice it's cold code that's only
active in !NDEBUG, so it probably doesn't make a difference. This is one
of the last users of our homegrown Atomic.h.
llvm-svn: 323999
Summary:
D42698 adds child_edge_{begin|end} and children_edges to GraphTraits
which are used here. The reason for this change is to make it easy to
use count propagation on ModulesummaryIndex. As it stands,
CallGraphTraits is in Analysis while ModuleSummaryIndex is in IR.
Reviewers: davidxl, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42703
llvm-svn: 323994
Summary:
This change is mostly adding comments to GraphTraits describing
interfaces to iterate over children edges of a node. These will
have to be implemented by specializations of GraphTraits. The
non-comment change is the addition of children_edges template
function that returns an iterator range.
The motivation for this is to use it in synthetic count propagation
algorithm and remove the CallGraphTraits class that provide similar
interfaces.
Reviewers: dberlin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42698
llvm-svn: 323990
Summary:
This change expands the amount of registers stashed by the entry and
`__xray_CustomEvent` trampolines.
We've found that since the `__xray_CustomEvent` trampoline calls can show up in
situations where the scratch registers are being used, and since we don't
typically want to affect the code-gen around the disabled
`__xray_customevent(...)` intrinsic calls, that we need to save and restore the
state of even the scratch registers in the handling of these custom events.
Reviewers: pcc, pelikan, dblaikie, eizan, kpw, echristo, chandlerc
Reviewed By: echristo
Subscribers: chandlerc, echristo, hiraditya, davide, dblaikie, llvm-commits
Differential Revision: https://reviews.llvm.org/D40894
llvm-svn: 323940
For now, we are not using wasm globals, except for modeling of
the stack points.
Alos, factor out common struct WasmGlobalType, which matches the
name for that tuple in the Wasm spec and rename methods
to "isBindingGlobal", "isTypeGlobal" to avoid ambiguity.
Patch by Nicholas Wilson!
Differential Revision: https://reviews.llvm.org/D42750
llvm-svn: 323901
Start using the new LegalizerInfo API introduced in r323681.
Keep the old API for opcodes that need Lowering in some circumstances
(G_FNEG and G_UREM/G_SREM).
llvm-svn: 323876
When selecting a split candidate for region splitting, the register allocator tries to predict which candidate will have the cheapest spill cost.
Global splitting may cause the creation of local intervals, and they might spill.
This patch makes RA take into account the spill cost of local split intervals in use blocks (we already take into account the spill cost in through blocks).
A flag ("-condsider-local-interval-cost") controls weather we do this advanced cost calculation (it's on by default for X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D41585
Change-Id: Icccb8ad2dbf13124f5d97a18c67d95aa6be0d14d
llvm-svn: 323870
In Thumb 1, with the new ADDCARRY / SUBCARRY the scheduler may need to do
copies CPSR ↔ GPR but not all Thumb1 targets implement them.
The schedule can attempt, before attempting a copy, to clone the instructions
but it does not currently do that for nodes with input glue. In this patch we
introduce a target-hook to let the hook decide if a glued machinenode is still
eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS .
As a follow-up of this change we should actually implement the copies for the
Thumb1 targets that do implement them and restrict the hook to the targets that
can't really do such copy as these clones are not ideal.
This change fixes PR35836.
Differential Revision: https://reviews.llvm.org/D42051
llvm-svn: 323857
Sometimes users do not specify data layout in LLVM assembly and let llc set the
data layout by target triple after loading the LLVM assembly.
Currently the parser checks alloca address space no matter whether the LLVM
assembly contains data layout definition, which causes false alarm since the
default data layout does not contain the correct alloca address space.
The parser also calls verifier to check debug info and updating invalid debug
info. Currently there is no way to let the verifier to check debug info only.
If the verifier finds non-debug-info issues the parser will fail.
For llc, the fix is to remove the check of alloca addr space in the parser and
disable updating debug info, and defer the updating of debug info and
verification to be after setting data layout of the IR by target.
For other llvm tools, since they do not override data layout by target but
instead can override data layout by a command line option, an argument for
overriding data layout is added to the parser. In cases where data layout
overriding is necessary for the parser, the data layout can be provided by
command line.
Differential Revision: https://reviews.llvm.org/D41832
llvm-svn: 323826
Summary: ThinLTO may skip object for other reasons, e.g. if there is no summary.
Reviewers: pcc, eugenis
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D42514
llvm-svn: 323818
Without the patch !if() is only evaluated if it's used directly.
If it's passed through more than one level of class inheritance,
we end up with a reference to an anonymous record with unresolved
references to the original arguments !if may have used.
The root cause of the problem is that TernOpInit::isComplete()
was always returning false and that prevented use of the folded
value of !if() as an initializer for the record at the next level
of inheritance.
Differential Revision: https://reviews.llvm.org/D42695
llvm-svn: 323807
Based on a profile, a couple of hot spots were identified in the
main type merging loop. The code was simplified, a few loops
were re-arranged, and some outlined functions were inlined. This
speeds up type merging by a decent amount, shaving around 3-4 seconds
off of a 40 second link in my test case.
Differential Revision: https://reviews.llvm.org/D42559
llvm-svn: 323790
Introduce an extension to support passing linker options to the linker.
These would be ignored by older linkers, but newer linkers which support
this feature would be able to process the linker.
Emit a special discarded section `.linker-option`. The content of this
section is a pair of strings (key, value). The key is a type identifier for
the parameter. This allows for an argument free parameter that will be
processed by the linker with the value being the parameter. As an example,
`lib` identifies a library to be linked against, traditionally the `-l`
argument for Unix-based linkers with the parameter being the library name.
Thanks to James Henderson, Cary Coutant, Rafael Espinolda, Sean Silva
for the valuable discussion on the design of this feature.
llvm-svn: 323783
candidates with coldcc attribute.
This recommits r322721 reverted due to sanitizer memory leak build bot failures.
Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 323778
r323476 added support for DW_FORM_line_strp, and incorrectly made that
depend on having a DWARFUnit available. We shouldn't be tracking
.debug_line_str in DWARFUnit after all. After this patch, I can do an
NFC follow up and undo a bunch of the "plumbing" part of r323476.
Differential Revision: https://reviews.llvm.org/D42609
llvm-svn: 323691
Prior to committing r323681, we decided to change pick() to identity() since
it wasn't clear from the name what pick() did. However, identity() isn't a very
good name either since it implies that no changes are made. For some reason,
naming it changeTo() didn't occur to me until just after the commit. This
should resolve the lack of clarity that pick() had while still implying that
it changes the MIR.
llvm-svn: 323689
Summary:
As discussed in D42244, we have difficulty describing the legality of some
operations. We're not able to specify relationships between types.
For example, declaring the following
setAction({..., 0, s32}, Legal)
setAction({..., 0, s64}, Legal)
setAction({..., 1, s32}, Legal)
setAction({..., 1, s64}, Legal)
currently declares these type combinations as legal:
{s32, s32}
{s64, s32}
{s32, s64}
{s64, s64}
but we currently have no means to say that, for example, {s64, s32} is
not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/
G_UNMERGE_VALUES have relationships between the types that are currently
described incorrectly.
Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics
differently to atomics. The necessary information is in the MMO but we have no
way to use this in the legalizer. Similarly, there is currently no way for the
register type and the memory type to differ so there is no way to cleanly
represent extending-load/truncating-store in a way that can't be broken by
optimizers (resulting in illegal MIR).
It's also difficult to control the legalization strategy. We've added support
for legalizing non-power of 2 types but there's still some hardcoded assumptions
about the strategy. The main one I've noticed is that type0 is always legalized
before type1 which is not a good strategy for `type0 = G_EXTRACT type1, ...` if
you need to widen the container. It will converge on the same result eventually
but it will take a much longer route when legalizing type0 than if you legalize
type1 first.
Lastly, the definition of legality and the legalization strategy is kept
separate which is not ideal. It's helpful to be able to look at a one piece of
code and see both what is legal and the method the legalizer will use to make
illegal MIR more legal.
This patch adds a layer onto the LegalizerInfo (to be removed when all targets
have been migrated) which resolves all these issues.
Here are the rules for shift and division:
for (unsigned BinOp : {G_LSHR, G_ASHR, G_SDIV, G_UDIV})
getActionDefinitions(BinOp)
.legalFor({s32, s64}) // If type0 is s32/s64 then it's Legal
.clampScalar(0, s32, s64) // If type0 is <s32 then WidenScalar to s32
// If type0 is >s64 then NarrowScalar to s64
.widenScalarToPow2(0) // Round type0 scalars up to powers of 2
.unsupported(); // Otherwise, it's unsupported
This describes everything needed to both define legality and describe how to
make illegal things legal.
Here's an example of a complex rule:
getActionDefinitions(G_INSERT)
.unsupportedIf([=](const LegalityQuery &Query) {
// If type0 is smaller than type1 then it's unsupported
return Query.Types[0].getSizeInBits() <= Query.Types[1].getSizeInBits();
})
.legalIf([=](const LegalityQuery &Query) {
// If type0 is s32/s64/p0 and type1 is a power of 2 other than 2 or 4 then it's legal
// We don't need to worry about large type1's because unsupportedIf caught that.
const LLT &Ty0 = Query.Types[0];
const LLT &Ty1 = Query.Types[1];
if (Ty0 != s32 && Ty0 != s64 && Ty0 != p0)
return false;
return isPowerOf2_32(Ty1.getSizeInBits()) &&
(Ty1.getSizeInBits() == 1 || Ty1.getSizeInBits() >= 8);
})
.clampScalar(0, s32, s64)
.widenScalarToPow2(0)
.maxScalarIf(typeInSet(0, {s32}), 1, s16) // If type0 is s32 and type1 is bigger than s16 then NarrowScalar type1 to s16
.maxScalarIf(typeInSet(0, {s64}), 1, s32) // If type0 is s64 and type1 is bigger than s32 then NarrowScalar type1 to s32
.widenScalarToPow2(1) // Round type1 scalars up to powers of 2
.unsupported();
This uses a lambda to say that G_INSERT is unsupported when type0 is bigger than
type1 (in practice, this would be a default rule for G_INSERT). It also uses one
to describe the legal cases. This particular predicate is equivalent to:
.legalFor({{s32, s1}, {s32, s8}, {s32, s16}, {s64, s1}, {s64, s8}, {s64, s16}, {s64, s32}})
In terms of performance, I saw a slight (~6%) performance improvement when
AArch64 was around 30% ported but it's pretty much break even right now.
I'm going to take a look at constexpr as a means to reduce the initialization
cost.
Future work:
* Make it possible for opcodes to share rulesets. There's no need for
G_LSHR/G_ASHR/G_SDIV/G_UDIV to have separate rule and ruleset objects. There's
no technical barrier to this, it just hasn't been done yet.
* Replace the type-index numbers with an enum to get .clampScalar(Type0, s32, s64)
* Better names for things like .maxScalarIf() (clampMaxScalar?) and the vector rules.
* Improve initialization cost using constexpr
Possible future work:
* It's possible to make these rulesets change the MIR directly instead of
returning a description of how to change the MIR. This should remove a little
overhead caused by parsing the description and routing to the right code, but
the real motivation is that it removes the need for LegalizeAction::Custom.
With Custom removed, there's no longer a requirement that Custom legalization
change the opcode to something that's considered legal.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner
Reviewed By: bogner
Subscribers: hintonda, bogner, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42251
llvm-svn: 323681
Summary:
Fix a few places that were modifying code after register
allocation to set the renamable bit correctly to avoid failing the
validation added in D42449.
llvm-svn: 323675
Summary:
The improvements to the LegalizerInfo discussed in D42244 require that
LegalizerInfo::LegalizeAction be available for use in other classes. As such,
it needs to be moved out of LegalizerInfo. This has been done separately to the
next patch to minimize the noise in that patch.
llvm-svn: 323669
Microsoft Visual Studio rejects the static constexpr static list of
atoms even though it's valid C++. This provides a workaround to unbreak
the bots.
llvm-svn: 323667
MSVC complains that the constexpr "expression did not evaluate to a
constant". Trying to make it happy by adding a `const` specifier as
suggested in https://stackoverflow.com/questions/37574343.
llvm-svn: 323659
This patch adds support for generating accelerator tables in dsymutil.
This feature was already present in our internal repository but not yet
upstreamed because it requires changes to the Apple accelerator table
implementation.
Differential revision: https://reviews.llvm.org/D42501
llvm-svn: 323655
This patch renames DwarfAccelTable.{h,cpp} to AccelTable.{h,cpp} and
moves the header to the include dir so it is accessible by the
dsymutil implementation.
Differential revision: https://reviews.llvm.org/D42529
llvm-svn: 323654
The test was failing because of an incorrect sizeof check in the name
index parsing code. This code was meant to check that we have enough
input to parse the fixed-size part of the dwarf header, which it did by
comparing the input to sizeof(Header). Originally struct Header only
contained the fixed-size part, but during review, we've moved additional
members into it, which rendered the sizeof check invalid.
I resolve this by moving the fixed-size part to a separate struct and
updating the sizeof-expression to use that.
llvm-svn: 323648
Summary:
This modifies the dwarfdump output to align it with the new .debug_names
dump. It also renames two header fields to match similar fields in the
dwarf5 header.
A couple of tests needed to be updated to match new output. The changes
were fairly straight-forward, although not really automatable.
Reviewers: JDevlieghere, aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42415
llvm-svn: 323641
Summary:
This commit renames DWARFAcceleratorTable to AppleAcceleratorTable to free up
the first name as an interface for the different accelerator tables.
Then I add a DWARFDebugNames class for the dwarf5 table.
Presently, the only common functionality of the two classes is the dump()
method, because this is the only method that was necessary to implement
dwarfdump -debug-names; and because the rest of the
AppleAcceleratorTable interface does not directly transfer to the dwarf5
tables (the main reason for that is that the present interface assumes
the tables are homogeneous, but the dwarf5 tables can have different
keys associated with each entry).
I expect to make the common interface richer as I add more functionality
to the new class (and invent a way to represent it in generic way).
In terms of sharing the implementation, I found the format of the two
tables sufficiently different to frustrate any attempts to have common
parsing or dumping code, so presently the implementations share just low
level code for formatting dwarf constants.
Reviewers: vleschuk, JDevlieghere, clayborg, aprantl, probinson, echristo, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42297
llvm-svn: 323638
This patch moves the DJB hash to support. This is consistent with other
hashing algorithms living there. The hash is used by the DWARF
accelerator tables. We're doing this now because the hashing function is
needed by dsymutil and we don't want to link against libBinaryFormat.
Differential revision: https://reviews.llvm.org/D42594
llvm-svn: 323616
Summary:
This change is step two in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use
getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 323597
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic
Reviewed by: b-sumner
Differential Revision: https://reviews.llvm.org/D42383
llvm-svn: 323516
This form is like DW_FORM_strp, but points to .debug_line_str instead
of .debug_str as the string section. It's intended to be used from
the line-table header, and allows string-pooling of directory and
filenames across compilation units.
Differential Revision: https://reviews.llvm.org/D42553
llvm-svn: 323476
Summary:
The intent of this is to allow the code to be used with ThinLTO. In
Thinlink phase, a traditional Callgraph can not be computed even though
all the necessary information (nodes and edges of a call graph) is
available. This is due to the fact that CallGraph class is closely tied
to the IR. This patch first extends GraphTraits to add a CallGraphTraits
graph. This is then used to implement a version of counts propagation
on a generic callgraph.
Reviewers: davidxl
Subscribers: mehdi_amini, tejohnson, llvm-commits
Differential Revision: https://reviews.llvm.org/D42311
llvm-svn: 323475
It was reverted after buildbot regressions.
Original commit message:
This allows relative block frequency of call edges to be passed
to the thinlink stage where it will be used to compute synthetic
entry counts of functions.
llvm-svn: 323460
This brings it in line with std::optional. My recent changes to
make Optional of trivial types trivially copyable introduced
diverging behavior depending on the type, which is bad. Now all
types have the same moving behavior.
llvm-svn: 323445
computeDeadSymbols accessed isLive() which was not public
before. It does not make much sence to keep isLive() private
because flags are available via flags() public member anyways.
llvm-svn: 323415
This patch extends the atom types used by the Apple accelerator tables
with two dsymutil extensions:
- DW_ATOM_type_type_flags
- DW_ATOM_qual_name_hash
llvm-svn: 323414
first argument.
This makes lookupFlags more consistent with lookup (which takes the query as the
first argument) and composes better in practice, since lookups are usually
linearly chained: Each lookupFlags can populate the result map based on the
symbols not found in the previous lookup. (If the maps were returned rather than
passed by reference there would have to be a merge step at the end).
llvm-svn: 323398
https://reviews.llvm.org/D41373
The various components are
GICombinerHelper contains transformations that are common to all
targets. Targets can pick and choose which transformations (at
function/opcode granularity) each pass uses via configuring a
GICombinerInfo.
GICombiner contains some common code and it does the traversal,
driving of combines, worklist management and iterating until
convergence.
GICombinerInfo is an interface with a virtual method called combine.
The combiner info will allow targets to pick and choose (or
implement their own specific combines). CombineInfos can make
use of available combines in GICombineHelper to configure the
transformations for a particular pass. Currently this approach allows
cherry picking transformations from helpers (at function/opcode
granularity) and also allows early returning on specific
transformations. Targets also get to prioritize whether target specific
combines run before/after the opt-in generic combines. Ideally we would
like this part to be configured by both C++ and Tablegen. The
CombinerInfo also has a field which indicates how to deal with
IllegalOps (ie - should we allow to create them/or legalize them?).
A CombinerPass would configure a CombinerInfo, create the GICombiner
with the Info, and call
GICombiner::combineMachineInstrs(MachineFunction&).
This organization is very similar to the GISelLegalizer.
llvm-svn: 323392
functions/methods that return JITSymbols.
lookupFlagsWithLegacyFn takes a SymbolNameSet and a legacy lookup function and
returns a LookupFlagsResult. It uses the legacy lookup function to search for
each symbol. If found, getFlags is called on the symbol and the flags added to
the SymbolFlags map. If not found, the symbol is added to the SymbolsNotFound
set.
lookupWithLegacyFn takes an AsynchronousSymbolQuery, a SymbolNameSet and a
legacy lookup function. Each symbol in the SymbolNameSet is searched for via the
legacy lookup function. If it is found, its getAddress function is called
(triggering materialization if it has not happened already) and the resulting
mapping stored in the query. If it is not found the symbol is added to the
unresolved symbols set which is returned at the end of the function. If an
error occurs during legacy lookup or materialization it is passed to the
query via setFailed and the function returns immediately.
llvm-svn: 323388
This patch adds a LambdaSymbolResolver convenience utility that can create an
orc::SymbolResolver from a pair of function objects that supply the behavior for
the lookupFlags and lookup methods.
This class plays the same role for orc::SymbolResolver as the legacy
LambdaResolver class plays for LegacyJITSymbolResolver, and will replace the
latter class once all ORC APIs are migrated to orc::SymbolResolver.
This patch also adds some documentation for the orc::SymbolResolver class as
this was left out of the original commit.
llvm-svn: 323375
Summary:
This allows relative block frequency of call edges to be passed to the
thinlink stage where it will be used to compute synthetic entry counts
of functions.
Reviewers: tejohnson, pcc
Subscribers: mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D42212
llvm-svn: 323349
Summary:
`getAction(const InstrAspect &) const` breaks encapsulation by exposing
the smaller components that are used to decide how to legalize an
instruction.
This is a problem because we need to change the implementation of
LegalizerInfo so that it's able to describe particular type combinations
rather than just cartesian products of types.
For example, declaring the following
setAction({..., 0, s32}, Legal)
setAction({..., 0, s64}, Legal)
setAction({..., 1, s32}, Legal)
setAction({..., 1, s64}, Legal)
currently declares these type combinations as legal:
{s32, s32}
{s64, s32}
{s32, s64}
{s64, s64}
but we currently have no means to say that, for example, {s64, s32} is
not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/
G_UNMERGE_VALUES has relationships between the types that are currently
described incorrectly.
Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics
differently to atomics. The necessary information is in the MMO but we have no
way to use this in the legalizer. Similarly, there is currently no way for the
register type and the memory type to differ so there is no way to cleanly
represent extending-load/truncating-store in a way that can't be broken by
optimizers (resulting in illegal MIR).
This patch introduces LegalityQuery which provides all the information
needed by the legalizer to make a decision on whether something is legal
and how to legalize it.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner
Reviewed By: bogner
Subscribers: bogner, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D42244
llvm-svn: 323342
This allows us to specify the magic number for the DJB hash function.
This feature is needed by dsymutil to emit Apple types accelerator
table.
llvm-svn: 323341
We're getting bug reports:
https://bugs.llvm.org/show_bug.cgi?id=35807https://bugs.llvm.org/show_bug.cgi?id=35840https://bugs.llvm.org/show_bug.cgi?id=36045
...where we blow up the stack in value tracking because other passes are sending
in selects that have an operand that is itself the select.
We don't currently have a reliable way to avoid analyzing dead code that may take
non-standard forms, so bail out when things go too far.
This mimics the recursion depth limitations in other parts of value tracking.
Unfortunately, this pushes the underlying problems for other passes (jump-threading,
simplifycfg, correlated-propagation) into hiding. If someone wants to uncover those
again, the first draft of this patch on Phab would do that (it would assert rather
than bail out).
Differential Revision: https://reviews.llvm.org/D42442
llvm-svn: 323331
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.
For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.
It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.
Differential Revision: https://reviews.llvm.org/D38313
llvm-svn: 323321
Summary:
This patch extends the DISubrange 'count' field to take either a
(signed) constant integer value or a reference to a DILocalVariable
or DIGlobalVariable.
This is patch [1/3] in a series to extend LLVM's DISubrange Metadata
node to support debugging of C99 variable length arrays and vectors with
runtime length like the Scalable Vector Extension for AArch64. It is
also a first step towards representing more complex cases like arrays
in Fortran.
Reviewers: echristo, pcc, aprantl, dexonsmith, clayborg, kristof.beyls, dblaikie
Reviewed By: aprantl
Subscribers: rnk, probinson, fhahn, aemerson, rengolin, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41695
llvm-svn: 323313
Summary:
Currently, there are 2 ways to verify a DomTree:
* `DT.verify()` -- runs full tree verification and checks all the properties and gives a reason why the tree is incorrect. This is run by when EXPENSIVE_CHECKS are enabled or when `-verify-dom-info` flag is set.
* `DT.verifyDominatorTree()` -- constructs a fresh tree and compares it against the old one. This does not check any other tree properties (DFS number, levels), nor ensures that the construction algorithm is correct. Used by some passes inside assertions.
This patch introduces DomTree verification levels, that try to close the gape between the two ways of checking trees by introducing 3 verification levels:
- Full -- checks all properties, but can be slow (O(N^3)). Used when manually requested (e.g. `assert(DT.verify())`) or when `-verify-dom-info` is set.
- Basic -- checks all properties except the sibling property, and compares the current tree with a freshly constructed one instead. This should catch almost all errors, but does not guarantee that the construction algorithm is correct. Used when EXPENSIVE checks are enabled.
- Fast -- checks only basic properties (reachablility, dfs numbers, levels, roots), and compares with a fresh tree. This is meant to replace the legacy `DT.verifyDominatorTree()` and in my tests doesn't cause any noticeable performance impact even in the most pessimistic examples.
When used to verify dom tree wrapper pass analysis on sqlite3, the 3 new levels make `opt -O3` take the following amount of time on my machine:
- no verification: 8.3s
- `DT.verify(VerificationLevel::Fast)`: 10.1s
- `DT.verify(VerificationLevel::Basic)`: 44.8s
- `DT.verify(VerificationLevel::Full)`: 1m 46.2s
(and the previous `DT.verifyDominatorTree()` is within the noise of the Fast level)
This patch makes `DT.verifyDominatorTree()` pick between the 3 verification levels depending on EXPENSIVE_CHECKS and `-verify-dom-info`.
Reviewers: dberlin, brzycki, davide, grosser, dmgreen
Reviewed By: dberlin, brzycki
Subscribers: MatzeB, llvm-commits
Differential Revision: https://reviews.llvm.org/D42337
llvm-svn: 323298
Summary:
Currently, there is no way to extract a basic block from a function easily. This patch
extends llvm-extract to extract the specified basic block(s).
Reviewers: loladiro, rafael, bogner
Reviewed By: bogner
Subscribers: hintonda, mgorny, qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D41638
llvm-svn: 323266
The grow_memory and current_memory instructions are expected to be
officially renamed to mem.grow and mem.size. Introduce new intrinsics
with the new names. These new names aren't yet official, so for now,
use them at your own risk.
Also, take this opportunity to add arguments for the currently unused
immediate field in those instructions.
llvm-svn: 323222
Summary:
This patch is adding remark messages to the LoopVersioning LICM pass,
which will be useful for optimization remark emitter (ORE) infrastructure.
Patch by: Deepak Porwal
Reviewers: anemet, ashutosh.nema, eastig
Subscribers: eastig, vivekvpandya, fhahn, llvm-commits
llvm-svn: 323183
This applies to most pipelines except the LTO and ThinLTO backend
actions - it is for use at the beginning of the overall pipeline.
This extension point will be used to add the GCOV pass when enabled in
Clang.
llvm-svn: 323166
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.
When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.
This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
llvm-svn: 323155
Use 'unsigned' for these bitfields so they actually pack together.
Previously it used three words for these bits instead of one.
Add some static_asserts to prevent this from being undone.
llvm-svn: 323135
This frees up the first name to be used as an base class for the
apple table and the dwarf5 .debug_names accel table. The rename was
split off from D42297 (adding of debug_names support), which is still
under review.
llvm-svn: 323113
Summary:
Discovered when clangd loads YAML symbols, some symbol documentations
start with indicators (e.g. "-"), but YAML prints them as plain scalars
(no quotes), which make the YAML parser fail to parse.
For these kind of strings, we need quotes.
Reviewers: sammccall
Reviewed By: sammccall
Subscribers: ilya-biryukov, ioeric, llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D42362
llvm-svn: 323097
1. ReachingDefsAnalysis - Allows to identify for each instruction what is the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creatws a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.
This also included the following changes to ExcecutionDepsFix original logic:
1. BreakFalseDeps and ReachingDefsAnalysis logic no longer restricted by a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.
Additional changes in affected files:
1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps also was added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.
Additional refactoring changes will follow.
This commit is (almost) NFC.
The only functional change is that now BreakFalseDeps will break dependency for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.
This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended at refactoring the existent code.
Additional relevant reviews:
https://reviews.llvm.org/D40331https://reviews.llvm.org/D40332https://reviews.llvm.org/D40333https://reviews.llvm.org/D40334
Differential Revision: https://reviews.llvm.org/D40330
Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
Summary:
This adds a definition of the .debug_names section and the new constants
(DW_IDX_???) which are used in it.
Reviewers: JDevlieghere, aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42296
llvm-svn: 323084
orc::SymbolResolver to JITSymbolResolver adapter.
The new orc::SymbolResolver interface uses asynchronous queries for better
performance. (Asynchronous queries with bulk lookup minimize RPC/IPC overhead,
support parallel incoming queries, and expose more available work for
distribution). Existing ORC layers will soon be updated to use the
orc::SymbolResolver API rather than the legacy llvm::JITSymbolResolver API.
Because RuntimeDyld still uses JITSymbolResolver, this patch also includes an
adapter that wraps an orc::SymbolResolver with a JITSymbolResolver API.
llvm-svn: 323073
lookupFlags returns a SymbolFlagsMap for the requested symbols, along with a
set containing the SymbolStringPtr for any symbol not found in the VSO.
The JITSymbolFlags for each symbol will have been stripped of its transient
JIT-state flags (i.e. NotMaterialized, Materializing).
Calling lookupFlags does not trigger symbol materialization.
llvm-svn: 323060
By using a union for Constant* and ConstantRange we can shave off ptr
size bytes off lattice elements. On 64 bit systems, it brings down the
size to 40 bytes from 48 bytes.
Initialization of Range happens on-demand using placement new, if the
state changes to constantrange from non-constantrange. Similarly, the
Range object is destroyed if the state changes from constantrange to
non-constantrange.
Reviewers: reames, anna, davide
Reviewed By: reames, davide
Differential Revision: https://reviews.llvm.org/D41903
llvm-svn: 323049
This (together with the corresponding LLD commit, that contains the
testcase updates) fixes PR35733.
Differential Revision: https://reviews.llvm.org/D41631
llvm-svn: 323035
These fix some odd cfg cases where batch-updating the post
dom tree fails. Usually around infinite loops and roots
ending up being different.
Differential Revision: https://reviews.llvm.org/D42247
llvm-svn: 323034
`llvm.used` contains a list of pointers to named values which the
compiler, assembler, and linker are required to treat as if there is a
reference that they cannot see. Ensure that the symbols are preserved
by adding an explicit `-include` reference to the linker command.
llvm-svn: 323017
ExternalSymbolMap now stores the string key (rather than using a StringRef),
as the object file backing the key may be removed at any time.
llvm-svn: 323001
Summary:
This patch attempts to fix the DomTree incremental insertion bug found here [[ https://bugs.llvm.org/show_bug.cgi?id=35969 | PR35969 ]] .
When performing an insertion into a piece of unreachable CFG, we may find the same not at different levels. When this happens, the node can turn out to be affected when we find it starting from a node with a lower level in the tree. The level at which we start visitation affects if we consider a node affected or not.
This patch tracks the lowest level at which each node was visited during insertion and allows it to be visited multiple times, if it can cause it to be considered affected.
Reviewers: brzycki, davide, dberlin, grosser
Reviewed By: brzycki
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42231
llvm-svn: 322993
Previously, the DIBuilder didn't expose functionality to set its compile unit
in any other way than calling createCompileUnit. This meant that the outliner,
which creates new functions, had to create a new compile unit for its debug
info.
This commit adds an optional parameter in the DIBuilder's constructor which
lets you set its CU at construction.
It also changes the MachineOutliner so that it keeps track of the DISubprograms
for each outlined sequence. If debugging information is requested, then it
uses one of the outlined sequence's DISubprograms to grab a CU. It then uses
that CU to construct the DISubprogram for the new outlined function.
The test has also been updated to reflect this change.
See https://reviews.llvm.org/D42254 for more information. Also see the e-mail
discussion on D42254 in llvm-commits for more context.
llvm-svn: 322992
This avoids playing games with pseudo pass IDs and avoids using an
unreliable MRI::isSSA() check to determine whether register allocation
has happened.
Note that this renames:
- MachineLICMID -> EarlyMachineLICM
- PostRAMachineLICMID -> MachineLICMID
to be consistent with the EarlyTailDuplicate/TailDuplicate naming.
llvm-svn: 322927
Split TailDuplicatePass into EarlyTailDuplicate and TailDuplicate. This
avoids playing games with fake pass IDs and using MRI::isSSA() to
determine pre-/post-RA state.
llvm-svn: 322926
Re-commit of r322200: The testcase shouldn't hit machineverifiers
anymore with r322917 in place.
Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.
This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
pointer.
- Note that `useFPForScavengingIndex()` would previously return false
when a base pointer was available leading to the emergency spillslot
getting allocated late (that's the whole effect of this callback).
Which made no sense to me so I took this case out: Even though the
emergency spillslot is technically not referenced by FP in this case
we still want it allocated early.
Differential Revision: https://reviews.llvm.org/D40876
llvm-svn: 322919
Bulk queries reduce IPC/RPC overhead for cross-process JITing and expose
opportunities for parallel compilation.
The two new query methods are lookupFlags, which finds the flags for each of a
set of symbols; and lookup, which finds the address and flags for each of a
set of symbols. (See doxygen comments for more details.)
The existing JITSymbolResolver class is renamed LegacyJITSymbolResolver, and
modified to extend the new JITSymbolResolver class using the following scheme:
- lookupFlags is implemented by calling findSymbolInLogicalDylib for each of the
symbols, then returning the result of calling getFlags() on each of these
symbols. (Importantly: lookupFlags does NOT call getAddress on the returned
symbols, so lookupFlags will never trigger materialization, and lookupFlags will
never call findSymbol, so only symbols that are part of the logical dylib will
return results.)
- lookup is implemented by calling findSymbolInLogicalDylib for each symbol and
falling back to findSymbol if findSymbolInLogicalDylib returns a null result.
Assuming a symbol is found its getAddress method is called to materialize it and
the result (if getAddress succeeds) is stored in the result map, or the error
(if getAddress fails) is returned immediately from lookup. If any symbol is not
found then lookup returns immediately with an error.
This change will break any out-of-tree derivatives of JITSymbolResolver. This
can be fixed by updating those classes to derive from LegacyJITSymbolResolver
instead.
llvm-svn: 322913
This adds a new instrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32-bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. But really it zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg.
Differential Revision: https://reviews.llvm.org/D42205
llvm-svn: 322910
There's some abstraction overhead in the underlying
mechanisms that were being used, and it was leading to an
abundance of small but not-free copies being made. This
showed up on a profile. Eliminating this and going back to
a low-level byte-based implementation speeds up lld with
/DEBUG between 10 and 15%.
Differential Revision: https://reviews.llvm.org/D42148
llvm-svn: 322871
r322086 removed the trailing information describing reg classes for each
register.
This patch adds printing reg classes next to every register when
individual operands/instructions/basic blocks are printed. In the case
of dumping MIR or printing a full function, by default don't print it.
Differential Revision: https://reviews.llvm.org/D42239
llvm-svn: 322867
While the memmove workaround fixed it for GCC 6.3. GCC 4.8 and GCC 7.1
are still broken. I have no clue what's going on, just blacklist GCC for
now.
Needless to say this code is ubsan, asan and msan-clean.
llvm-svn: 322862
I've seen random crashes with GCC 4.8, GCC 6.3 and GCC 7.3, triggered by
my Optional change. All of them affect a different set of targets. This
change fixes the instance of the problem I'm seeing on my local machine,
let's hope it's good enough for the other instances too.
llvm-svn: 322859
This makes uses of Optional more transparent to the compiler (and
clang-tidy) and generates slightly smaller code.
This is a re-land of r317019, which had issues with GCC 4.8 back then.
Those issues don't reproduce anymore, but I'll watch the buildbots
closely in case anything goes wrong.
llvm-svn: 322838
This is similar to r322317, but for visibility. It is not as neat
because we have to special case extern_weak.
The idea is the same as the previous change, make the transition to
explicit dso_local easier for the frontends. With this they only have
to add dso_local to symbols where we need some external information to
decide if it is dso_local (like it being part of an ELF executable).
llvm-svn: 322806
Right now, it is not possible to run MachineCSE in the middle of the
GlobalISel pipeline. Being able to run generic optimizations between the
core passes of GlobalISel was one of the goals of the new ISel framework.
This is the first attempt to do it.
The problem is that MachineCSE pass assumes all register operands have a
register class, which, in GlobalISel context, won't be true until after the
InstructionSelect pass. The reason for this behaviour is that before
replacing one virtual register with another, MachineCSE pass (and most of
the other optimization machine passes) must check if the virtual registers'
constraints have a (sufficiently large) intersection, and constrain the
resulting register appropriately if such intersection exists.
GlobalISel extends the representation of such constraints from just a
register class to a triple (low-level type, register bank, register
class).
This commit adds MachineRegisterInfo::constrainRegAttrs method that extends
MachineRegisterInfo::constrainRegClass to such a triple.
The idea is that going forward we should use:
- RegisterBankInfo::constrainGenericRegister within GlobalISel's
InstructionSelect pass
- MachineRegisterInfo::constrainRegClass within SelectionDAG ISel
- MachineRegisterInfo::constrainRegAttrs everywhere else regardless
the target and instruction selector it uses.
Patch by Roman Tereshin. Thanks!
llvm-svn: 322805
Summary:
This patch adds a new target option in order to control GlobalISel.
This will allow the users to enable/disable GlobalISel prior to the
backend by calling `TargetMachine::setGlobalISel(bool Enable)`.
No test case as there is already a test to check GlobalISel
command line options.
See: CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll.
Reviewers: qcolombet, aemerson, ab, dsanders
Reviewed By: qcolombet
Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42137
llvm-svn: 322773
Summary:
The class wraps a uint64_t and an enum to represent the type of profile
count (real and synthetic) with some helper methods.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41883
llvm-svn: 322771
Get rid of DEBUG_FUNCTION_NAME symbols. When we actually debug
data, maybe we'll want somewhere to put it... but having a symbol
that just stores the name of another symbol seems odd.
It means you have multiple Symbols with the same name, one
containing the actual function and another containing the name!
Store the names in a vector on the WasmObjectFile when reading
them in. Also stash them on the WasmFunctions themselves.
The names are //not// "symbol names" or aliases or anything,
they're just the name that a debugger should show against the
function body itself. NB. The WasmObjectFile stores them so that
they can be exported in the YAML losslessly, and hence the tests
can be precise.
Enforce that the CODE section has been read in before reading
the "names" section. Requires minor adjustment to some tests.
Patch by Nicholas Wilson!
Differential Revision: https://reviews.llvm.org/D42075
llvm-svn: 322741
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
We seem to be (logically) returning ArchExtKinds here in all cases, so
the return type should reflect that.
The static_cast is necessary because `A.ID` is actually an `unsigned`,
presumably since we use `decltype(A)` to represent extended attributes
for both ARM and AArch64, which use distinct `ArchExtKinds`.
We can't trivially make the same change for ARM, because one of the
values it returns is the bitwise-or of two `ARM::ArchExtKind`s.
llvm-svn: 322613
Summary:
- Fix a bug in PrettyBuiltinDumper that returns "void" as the name for
an unspecified builtin type. Since the unspecified param of a variadic
function is considered a builtin of unspecified type in PDBs, we set
"..." for its name.
- Provide a method to determine if a PDBSymbolFunc is variadic in
PrettyFunctionDumper since PDBSymbolFunc::getArgument() doesn't return the
last unspecified-type param.
- Add a pretty-func-dumper.test to test pretty dumping of variadic
functions.
Reviewers: zturner, llvm-commits
Reviewed By: zturner
Differential Revision: https://reviews.llvm.org/D41801
llvm-svn: 322608
Summary:
This patch adds CustomRenderer which renders the matched
operands to the specified instruction.
Targets can enable the matching of SDNodeXForm by adding
a definition that inherits from GICustomOperandRenderer and
GISDNodeXFormEquiv as follows.
def gi_imm8 : GICustomOperandRenderer<"renderImm8”>,
GISDNodeXFormEquiv<imm8_xform>;
Custom renderer functions should be of the form:
void render(MachineInstrBuilder &MIB, const MachineInstr &I);
Reviewers: dsanders, ab, rovka
Reviewed By: dsanders
Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet
Differential Revision: https://reviews.llvm.org/D42012
llvm-svn: 322582
Summary: Not sure this needs a review or not. Erring on the safe side.
Reviewers: dblaikie
Differential Revision: https://reviews.llvm.org/D41666
llvm-svn: 322538
The ELF specification says that all ELF data structures are aligned to
their natural alignments both in memory and file. That means when we
access mmap'ed ELF files, we could assume that all data structures are
aligned properly.
However, in reality, we assume that the data structures are aligned only
to two bytes because .a files only guarantee that their member files are
aligned to two bytes in archive files. So the data access is already
unaligned.
This patch relaxes the alignment requirement even more, so that we
accept unaligned access to all ELF data structures.
This patch in particular makes lld bug-compatible with icc. Intel C
compiler doesn't seem to care about data alignment and generates unaligned
relocation sections (https://bugs.llvm.org/show_bug.cgi?id=35854).
I also saw another instance of compatibility issues with our internal tool
which creates unaligned section headers.
Because GNU linkers are not picky about alignment, looks like it is
not uncommon that ELF-generating tools create unaligned files.
There is a performance penalty with this patch on host machines on which
unaligned access is expensive. x86 and AArch64 are fine. ARMv6 is a
problem, but I don't think using ARMv6 machines as hosts is common, so I
believe it's not a real problem.
Differential Revision: https://reviews.llvm.org/D41978
llvm-svn: 322407
Summary:
In preparation for https://reviews.llvm.org/D41675 this NFC changes this
prototype of MemIntrinsicInst::setAlignment() to accept an unsigned instead
of a Constant.
llvm-svn: 322403
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perform the
preversation was minimally altered and simply marked as
preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements such as threading across loop headers.
Reviewers: dberlin, kuhar, sebpop
Reviewed By: kuhar, sebpop
Subscribers: mgorny, dmgreen, kuba, rnk, rsmith, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40146
llvm-svn: 322401
We can probably take this a step further since the only
user of the isUsed flag is AsmParser it should probably
be doing this explicitly. For now this is a step in the
right direction though.
Differential Revision: https://reviews.llvm.org/D41971
llvm-svn: 322386
It was never fully disallowed. We were rejecting it in the asm parser,
but not in the verifier.
Currently TargetMachine::shouldAssumeDSOLocal returns true for hidden
ifuncs. I considered changing it and moving the check from the asm
parser to the verifier.
The reason for deciding to allow it instead is that all linkers handle
a direct reference just fine. They use the plt address as the address
of the function. In fact doing that means that clang doesn't have the
same bug as gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83782.
This patch then removes the check from the asm parser and updates the
bitcode reader and writer.
llvm-svn: 322378
ExecutionSession will represent a running JIT program.
VModuleKey is a unique key assigned to each module added as part of
an ExecutionSession. The Layer concept will be updated in future to
require a VModuleKey when a module is added.
llvm-svn: 322336
The PeepholeOptimizer would fail for vregs without a definition. If this
was caused by an undef operand abort to keep the code simple (so we
don't need to add logic everywhere to replicate the undef flag).
Differential Revision: https://reviews.llvm.org/D40763
llvm-svn: 322319
While updating clang tests for having clang set dso_local I noticed
that:
- There are *a lot* of tests to update.
- Many of the updates are redundant.
They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.
llvm-svn: 322317