Should fix some more bot failures from r291108.
This should have been a DenseSet, since GUID is not a pointer type.
It caused some bots to fail, but for some reason I wasnt't getting a
build failure.
llvm-svn: 291115
Should fix bot failures due to r291108 which happened due to a
change required in ModuleSummaryIndexYAML.h which was just added in
r291069.
llvm-svn: 291111
Summary:
This adds a new summary flag NotEligibleToImport that subsumes
several existing flags (NoRename, HasInlineAsmMaybeReferencingInternal
and IsNotViableToInline). It also subsumes the checking of references
on the summary that was being done during the thin link by
eligibleForImport() for each candidate. It is much more efficient to
do that checking once during the per-module summary build and record
it in the summary.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28169
llvm-svn: 291108
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.
Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.
Differential Revision: https://reviews.llvm.org/D27518
llvm-svn: 291106
To make this work, pointers from the MachineBasicBlock to the LLVM-IR-level
basic blocks need to be initialized, as the AsmPrinter uses this link to be
able to print out labels for the basic blocks that are address-taken.
Most of the changes in this commit are about adapting existing tests to include
the basic block name that is now printed out in the MIR format, now that the
name becomes available as the link to the LLVM-IR basic block is initialized.
The relevant test change for the functionality added in this patch are the
added "(address-taken)" strings in
test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll.
Differential Revision: https://reviews.llvm.org/D28123
llvm-svn: 291105
This commit does this using a trivial chain of conditional branches. In the
future, we probably want to reuse the optimized switch lowering used in
SelectionDAG.
Differential Revision: https://reviews.llvm.org/D28176
llvm-svn: 291099
Summary:
Intel's i5-6300U CPU is reporting to have a model id of 78 (4e).
The Host detection assumes that to be Skylake Xeon (with AVX512 support),
instead of a normal Skylake machine.
Patch by: Valentin Churavy
Reviewers: nalimilan, craig.topper
Subscribers: hfinkel, tkelman, craig.topper, nalimilan, llvm-commits
Differential Revision: https://reviews.llvm.org/D28221
llvm-svn: 291084
Set up basic YAML I/O support for module summaries, plumb the summary into
the pass and add a few command line flags to test YAML I/O support. Bitcode
support to come separately, as will the code in LowerTypeTests that actually
uses the summary. Also add a couple of tests that pass by virtue of the pass
doing nothing with the summary (which happens to be the correct thing to do
for those tests).
Differential Revision: https://reviews.llvm.org/D28041
llvm-svn: 291069
a cxxabi.h in the include search paths.
This comes up when libc++ is installed with some other abi library. At
some points in time in history we have had CMake hackery to try and get
a cxxabi.h installed that would work, but there are lots of examples
lacking this. Also, the just-built tree with libc++ seems to not quite
get this right.
To let folks make progress, we can easily work around this by detecting
that the header is missing and disabling the relevant parts of gtest.
This should fix the last remainging build bot failures. While these
failures are typically indicative of a questionable install, I don't
think gtest should be the thing that surfaces those issues and I don't
want folks blocked on this.
llvm-svn: 291063
Summary:
This covers most of PassManager.h, up to the introduction of inner/outer
analysis proxies.
If there's a theme to these changes, it's simplifying the language. For
example:
* PreservedAnalyses is a "set of analyses", not an "abstract set".
"Abstract" doesn't have any particular meaning here.
* "Build types for the concept types" becomes "define the concept types".
* Instead of "data structures optimized for pointer-like types using
the alignment-provided low bits", say "data structures that use the
low bits of pointers."
* "Clear the map pointing into the results list" becomes
"Delete the map entries that point into the results list."
This patch also fixes a few places where we referred to "function" and
"module" pass/analysis managers, instead of the more abstract "IRUnitT"
PM/AMs we have now.
Subscribers: mehdi_amini
Differential Revision: https://reviews.llvm.org/D27367
llvm-svn: 291040
performing partial redundancy elimination (PRE). Not doing so can cause jumpy line
tables and confusing (though correct) source attributions.
Differential Revision: https://reviews.llvm.org/D27857
llvm-svn: 291037
I somehow wrote this fix and then lost it prior to commit. Really sorry
about the noise. This should fix some issues with hacking add_definition
to do things with warning flags.
llvm-svn: 291033
This required re-working the streaming support and lit's support for
'--gtest_list_tests' but otherwise seems to be a clean upgrade.
Differential Revision: https://reviews.llvm.org/D28154
llvm-svn: 291029
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.
Two main current limitations are:
1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.
These two limitations can be lifted in a subsequent improvement if
needed.
With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).
Reviewers: pcc, tejohnson, dexonsmith
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28113
llvm-svn: 291027
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).
This recommits 291006, reverted in r291007.
llvm-svn: 291016
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:
fp-to-uint16: 65534.0 -> 0xfffe
With the legalization:
fp-to-sint32: 65534.0 -> 0x0000fffe
Without this patch, legalization wrongly emits a signed extend assertion,
which is consumed by later icmp instruction, and cause miscompile.
Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.
This patch reverts r279223 behavior and adds more tests and
documentations.
In PR29041's context, James Molloy mentioned that:
We don't need to mask because conversion from float->uint8_t is
undefined if the integer part of the float value is not representable in
uint8_t. Therefore we can assume this doesn't happen!
which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
the advantage of this behavior so that we can save unnecessary mask
instructions.
Reviewers: jmolloy, nadav, echristo, kbarton
Subscribers: mehdi_amini, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28284
llvm-svn: 291015
Summary:
Instead of matching:
(a + i) + 1 -> (a + i, undef, 1)
Now it matches:
(a + i) + 1 -> (a, i, 1)
Reviewers: rengolin
Differential Revision: http://reviews.llvm.org/D26367
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
If this is a problem for anyone (shared_ptr is two pointers in size,
whereas IntrusiveRefCntPtr is 1 - and the ref count control block that
make_shared adds is probably larger than the one int in RefCountedBase)
I'd prefer to address this by adding a lower-overhead version of
shared_ptr (possibly refactoring IntrusiveRefCntPtr into such a thing)
to avoid the intrusiveness - this allows memory ownership to remain
orthogonal to types and at least to me, seems to make code easier to
understand (since no implicit ownership acquisition can happen).
llvm-svn: 291006
The intrusive nature of the reference counting is not required/used
here, so simplify the ownership model to make the code easier to
understand.
llvm-svn: 291005
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:
1. When determining tail-call eligibility. If we make a tail call (i.e.
directly branch to a function) then there is no place for the linker to add
a TOC restoration.
2. When determining when we need to add a nop instruction after a call.
Likewise, if there is a possibility that the linker might need to add a
TOC restoration after a call, then we need to put a nop after the call
(the bl instruction).
First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.
Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).
There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.
Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.
Differential Revision: https://reviews.llvm.org/D27231
llvm-svn: 291003
This roughly matches the semantics of std::enable_shared_from_this - that it
does not dictate the ownership model of all users, but constrains those users
taking advantage of the intrusive nature to do so only when there's a guarantee
that that's the ownership model being used for the object being passed.
Reviewers: jlebar
Differential Revision: https://reviews.llvm.org/D28245
llvm-svn: 290987
This test case has been reduced from test/Analysis/RegionInfo/mix_1.ll and
provides us with a minimal example of a test case which caused problems while
working on an improved version of the RegionInfo analysis. We upstream this
test case, as it certainly can be helpful in future debugging and optimization
tests.
Test case reduced by Pratik Bhatu <cs12b1010@iith.ac.in>
llvm-svn: 290974
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer). The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).
Original commit message:
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Original review: https://reviews.llvm.org/D27590
llvm-svn: 290973
These tests are missing a target triple and the -m elf_x86_64 gold option,
which makes them fail on non-x86 targets.
Differential revision: https://reviews.llvm.org/D28285
Reviewers: tejohnson
llvm-svn: 290965
Summary:
Change llvm-link to use the FunctionImporter handling, instead of
manually invoking the Linker. We still need to load the module
in llvm-link to do the desired testing for invalid import requests
(weak functions), and to get the GUID (in case the function is local).
Also change the drop-debug-info test to use llvm-link so that importing
is forced (in order to test debug info handling) and independent of
import logic changes.
Reviewers: mehdi_amini
Subscribers: mgorny, llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D28277
llvm-svn: 290964
This CPU type was not previously recognized by LLVM which led to emitting
poor (and sometimes incorrect) code in some JIT workloads on such a machine.
llvm-svn: 290961
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
Reviewers: aprantl, MatzeB, mkuper
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290955
This just removes the usage of llvm::reverse and llvm::seq. That makes
it harder to handle the empty case correctly and so I've also added
a test there.
This is just a shot in the dark at what might be behind the buildbot
failures. I can't reproduce any issues locally including with ASan...
I feel like I'm missing something...
llvm-svn: 290954
This is both convenient and more efficient as we can skip any
intermediate reallocation of the vector.
This usage pattern came up in a subsequent patch on the pass manager,
but it seems generically useful so I factored it out and added unittests
here.
llvm-svn: 290952
Summary:
The InlineSpiller was accessing the DominatorTreeBase directly
through the public data member DT in the MachineDominatorTree.
This is not a good idea as the "cached" information in
SplitCriticalEdges is not applied before the access.
The DominatorTreeBase must be accessed through the member
function getBase() in MachineDominatorTree.
The fault was introduced in r266162.
I think the public data member DT in the MachineDominatorTree
should have been made private in the original code (r215576)
that introduced the concept of lazily updating the
MachineDominatorTree information from
MachineBasicBlock::SplitCriticalEdge().
Patch by Karl-Johan Karlsson <karl-johan.karlsson@ericsson.com>
Reviewers: wmi, qcolombet
Subscribers: llvm-commits, bjope, uabelho
Differential Revision: https://reviews.llvm.org/D27983
llvm-svn: 290950
Replacing the memory operand in the intrinsic versions of the comis/ucomis instrucions from f128mem to ssmem/sdmem accordingly.
Differential Revision: https://reviews.llvm.org/D28138
llvm-svn: 290948
In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.
This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold).
Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.
I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.
Differential Revision: https://reviews.llvm.org/D28219
llvm-svn: 290947
0-7: Address
8-11: Line
12-13: Column
14-15: File
16-19: Isa
20-23: Discriminator
24+: bit fields
The packing is fine until the "Isa" field, which is an 8-bit int that occupies 4 bytes. We can instead move Discriminator into the 16-19 slot, and pack Isa into the 20-23 range along with the bit fields:
0-7: Address
8-11: Line
12-13: Column
14-15: File
16-19: Discriminator
20-23: Isa + bit fields
This layout is only 24 bytes. This 25% reduction in size may seem small but a large binary can have line tables with thousands of rows stored in a vector.
Patch by Simon Que!
Differential Revision: https://reviews.llvm.org/D27961
llvm-svn: 290931
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))
This is only possible if C2 is negative and C2 is greater than or equal to negative C1.
llvm-svn: 290927
This test was testing that we could correctly find the parent of a DIE, but it was actually just testing the special case where a DIE's depth was 1. This corrects that error by adding an extra level into the the DWARF to ensure that we correctly get the parent by looking for the parent with a depth that is 1 less than the current depth.
Differential Revision: https://reviews.llvm.org/D28261
llvm-svn: 290918
As per post-commit review for r289993 (D27775), we can only safely
import a type as a decl if it has an Identifier, as the Name alone
is not enough to be unique across modules.
llvm-svn: 290915
I'm not sure what determines the minor version, but it appears
that it's possible for a fully updated, release version of
VS2015 with Update 3 can go (at least) as low as 19.00.24213.1.
Updating the compiler version check to account for this so we
don't generate superfluous warnings.
llvm-svn: 290914
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.
So just in case we decide this is the right way, we might as well
have a cleaner implementation.
llvm-svn: 290912
Use getReturnedArgOperand() instead of rolling our own. Note that it's
equivalent because there can only be one 'returned' operand.
The existing code was also incorrect: there already was awkward logic to
ignore callee/EH blocks, but operands can now also be operand bundles,
in which case we'll look for non-existent parameter attributes.
Unfortunately, this isn't observable in-tree, as it only crashes when
exercising the regular call lowering logic with operand bundles.
Still, this is a nice small cleanup anyway.
llvm-svn: 290905
Provide a distinct contents for semBogus and semPPCDoubleDouble in order
to prevent compilers from collapsing them to a single memory address,
while we heavily rely on every semantic having distinct address.
This happens if insecure optimization collapsing identical values is
enabled. As a result, APFloats of semBogus are indistinguishable from
semPPCDoubleDouble -- and whenever the move constructor is used, the old
value beings being incorrectly recognized as a semPPCDoubleDouble.
Since the values in semPPCDoubleDouble are not used anywhere,
we can easily solve this issue via altering the value of one of the
fields and therefore ensuring that the collapse can not occur.
Differential Revision: https://reviews.llvm.org/D28112
llvm-svn: 290896
Summary:
No need to have this per-architecture. While there, unify 32-bit ARM's
behaviour with what changed elsewhere and start function names lowercase
as per the coding standards. Individual entry emission code goes to the
entry's own class.
Fully tested on amd64, cross-builds on both ARMs and PowerPC.
Reviewers: dberris
Subscribers: aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D28209
llvm-svn: 290858
Summary:
Regardless how the loop body weight is distributed, we should preserve
total loop body weight. i.e. we should have same weight reaching the body of the loop
or its duplicates in peeled and unpeeled case.
Reviewers: mkuper, davidxl, anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28179
llvm-svn: 290833
Apparently my suggestion of using ternary doesn't really work
as clang complains about incompatible types on LHS and RHS. Some
GCC versions happen to accept the code but clang behaviour is
correct here.
llvm-svn: 290822
Add an explicit LLVM_ENABLE_DIA_SDK option to control building support
for DIA SDK-based debugging. Control its value to match whether DIA SDK
support was found and expose it in LLVMConfig (alike LLVM_ENABLE_ZLIB).
Its value is needed for LLDB to determine whether to run tests requiring
DIA support. Currently it is obtained from llvm/Config/config.h;
however, this file is not available for standalone builds. Following
this change, LLDB will be modified to use the value from LLVMConfig.
Differential Revision: https://reviews.llvm.org/D26255
llvm-svn: 290818
GNU as rejects input where .cfi_sections is used after .cfi_startproc,
if the new section differs from the old. Adjust our output to always
emit .cfi_sections before the first .cfi_startproc to minimize necessary
code.
Differential Revision: https://reviews.llvm.org/D28011
llvm-svn: 290817
Summary:
This avoids the very fragile code for null expressions. We could also use a denseset that tracks which things have null expressions instead, but that seems pretty fragile and premature optimization.
This resolves a number of infinite loop cases, test reductions coming.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28193
llvm-svn: 290816
Summary: Previously, we tried to fix up the equivalences during symbolic evaluation. This does not work. Now, we change the equivalences during congruence finding, where it belongs. We also initialize the equivalence table to give a maximal answer.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28192
llvm-svn: 290815
X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.
In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426).
* Shiffle-broadcast cost will be changed in Simon's upcoming patch.
Differential Revision: https://reviews.llvm.org/D28118
llvm-svn: 290810
Summary:
`PromotedFloats` needs to be checked in
`DAGTypeLegalizer::PerformExpensiveChecks`. This patch fixes a few type
legalization failures with expansive checks for ARM fp16 tests.
Reviewers: baldrick, bogner, arsenm
Subscribers: arsenm, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D28187
llvm-svn: 290796
I'm not sure if this was intentional, but today
isGuaranteedToTransferExecutionToSuccessor returns true for readonly and
argmemonly calls that may throw. This commit changes the function to
not implicitly infer nounwind this way.
Even if we eventually specify readonly calls as not throwing,
isGuaranteedToTransferExecutionToSuccessor is not the best place to
infer that. We should instead teach FunctionAttrs or some other such
pass to tag readonly functions / calls as nounwind instead.
llvm-svn: 290794
I don't think this hole is currently exposed, but I crashed regression tests for
jump-threading and loop-vectorize after I added calls to isKnownNonNullAt() in
InstSimplify as part of trying to solve PR28430:
https://llvm.org/bugs/show_bug.cgi?id=28430
That's because they call into value tracking with a context instruction, but no
other parts of the query structure filled in.
For more background, see the discussion in:
https://reviews.llvm.org/D27855
llvm-svn: 290786