Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets.
Cost model updates to follow.
llvm-svn: 291451
Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will have already have lowered with vpsravw).
Cost model updates to follow.
llvm-svn: 291445
Summary:
Originally
i64 = umax t8, Constant:i64<4>
was expanded into
i32,i32 = umax Constant:i32<0>, Constant:i32<0>
i32,i32 = umax t7, Constant:i32<4>
Now instead the two produced umax:es return i32 instead of i32, i32.
Thanks to Jan Vesely for help with the test case.
Patch by mikael.holmen at ericsson.com
Reviewers: bogner, jvesely, tstellarAMD, arsenm
Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D28135
llvm-svn: 291441
This patch doesn't create thunk for branch operation when following conditions are met:
- Architecture is AArch64
- Relocation target is in the same object file
- Relocation target is close enough to be encoded in immediate offset
In such case we branch directly to the target instead of branching to thunk
Differential revision: https://reviews.llvm.org/D28108
llvm-svn: 291431
invalid.
This fixes use-after-free bugs that will arise with any interesting use
of SCEV.
I've added a dedicated test that works diligently to trigger these kinds
of bugs in the new pass manager and also checks for them explicitly as
well as triggering ASan failures when things go squirly.
llvm-svn: 291426
Summary:
By using stripPointerCasts we can get to the root
value and then walk down the bitcast graph
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28181
llvm-svn: 291405
I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR.
The debug output shows nodes like this:
t4: i32 = xor t2, Constant:i32<-1>
t21: i8 = setcc t4, Constant:i32<0>, setlt:ch
t14: i32 = select t21, t4, Constant:i32<-1>
And because the select is holding onto the t4 (xor) node while EmitTest creates a new
x86-specific xor node, the lowering results in:
t4: i32 = xor t2, Constant:i32<-1>
t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1>
t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1
Differential Revision: https://reviews.llvm.org/D28374
llvm-svn: 291392
The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation).
Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.
Added missing AVX2/AVX512BW costs as well.
llvm-svn: 291391
We should probably teach the two address instruction pass to turn masked moves into BLENDM when its beneficial to the register allocator.
llvm-svn: 291371
Summary:
The issue happens with:
%0 = ....., !tbaa !0
%1 = ....., !tbaa !1
With !0 that references !1.
In this case when loading !0 we generates a temporary for the
operand !1. We now flush it immediately and trigger the load of
!1 before moving on. If we don't we get the temporary when
attaching to %1. This is usually not an issue except that we
eagerly try to update TBAA MDNodes, which is obviously not possible
if we only have a temporary.
Differential Revision: https://reviews.llvm.org/D28423
llvm-svn: 291362
Don't print a line multiple times, each for different inlining contexts, if
nothing happened in any context. This prevents situations like this:
[[
> main:
65 | if ((i * ni + j) % 20 == 0) fprintf
> print_array:
65 | if ((i * ni + j) % 20 == 0) fprintf
]]
which could happen if different optimizations were missed in different inlining
contexts.
llvm-svn: 291361
fabs(x * x) is not generally safe to assume x is positive if x is a NaN.
This is also less general than it could be, so this will be replaced
with a transformation on the intrinsic.
llvm-svn: 291359
Gracefully leave code that performs function-pointer bitcasts implying
non-trivial pointer conversions alone, rather than aborting, since it's
just undefined behavior.
llvm-svn: 291326
WebAssembly requires caller and callee signatures to match exactly. In LLVM,
there are a variety of circumstances where signatures may be mismatched in
practice, and one can bitcast a function address to another type to call it
as that type. This patch adds a pass which replaces bitcasted function
addresses with wrappers to replace the bitcasts.
This doesn't catch everything, but it does match many common cases.
llvm-svn: 291315
Summary: LLVM's non-standard notion of phi nodes means we can't both try to substitute for undef in phi nodes *and* use phi nodes as leaders all the time. This changes NewGVN to use the same semantics as SimplifyPHINode to decide which phi nodes are equivalent.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28312
llvm-svn: 291308
Summary:
r285871 introduced an assert that was overly aggressive in the case
of a same-named local in different same-named files (in different
directories), where the source name and therefore the GUID ended up
the same because the files were compiled in their own directory without
any leading path. Change the handling in the promotion logic to get
the summary for the version in that module.
This also exposed an issue where we are not always importing the
right copy, which is a performance not correctness issue (because
the renaming is based on the module hash which must be different,
see the bug report for details). I will fix that as a follow-on.
Fixes PR31561.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28411
llvm-svn: 291304
Modify the --system-libs option in llvm-config to print system libs only
when using static linking. The system libraries are irrelevant when
linking to a shared library since the library has appropriate library
dependencies embedded.
Modify the --system-libs test appropriately to force static linking, and
disable it if static libs are not available (i.e. BUILD_SHARED_LIBS is
enabled).
Differential Revision: https://reviews.llvm.org/D27805
llvm-svn: 291285
Canonicalize all CMake booleans to 0/1 before passing them to lit, to
ensure that the Python side handles all of them consistently
and correctly. 0/1 is a safe choice of values that trigger the same
boolean interpretation in CMake, Python and C++.
Furthermore, using them without quotes improves the chance Python will
explicitly fail when an incorrect value (such as ON/OFF, TRUE/FALSE,
YES/NO) is accidentally passed, rather than silently misinterpreting
the value.
This replaces a lot of different logics spread around lit site files,
attempting to partially reproduce the boolean logic used in CMake
and usually silently failing when an uncommon value was used instead.
In fact, some of them were never working correctly since different
values were assigned in CMake and checked in Python.
The alternative solution could be to create a common parser for CMake
booleans in lit and use it consistently throughout the site files.
However, it does not seem like the best idea to create redundant
implementation of the same logic and have to follow upstream if it ever
is extended to handle more values.
Differential Revision: https://reviews.llvm.org/D28294
llvm-svn: 291284
Remove config.test_examples from lit.site.cfg and the relevant
ENABLE_EXAMPLES definition from CMake. It is not used anywhere.
Differential Revision: https://reviews.llvm.org/D28283
llvm-svn: 291283
This is fixing a bug where Loop Vectorization is widening a load but
with a lower alignment. Hoisting the load without propagating the alignment
will allow inst-combine to later deduce a higher alignment that what the pointer
actually is.
Differential Revision: https://reviews.llvm.org/D28408
llvm-svn: 291281
Re-apply r288561: This time with a fix where the ADDs that are part of a
3 instruction LOH would not invalidate the "LastAdrp" state. This fixes
http://llvm.org/PR31361
Previously this pass was using up to 5% compile time in some cases which
is a bit much for what it is doing. The pass featured a full blown
data-flow analysis which in the default configuration was restricted to a
single block.
This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.
Differential Revision: https://reviews.llvm.org/D27329
This reverts commit 9e6cedb0a4f14364d6511597a9160305e7d34493.
llvm-svn: 291266
order to avoid jumpy line tables. Calls are left alone because they may be inlined.
Differential Revision: https://reviews.llvm.org/D28390
llvm-svn: 291258
This change separates how type identifiers are resolved from how intrinsic
calls are lowered. All information required to lower an intrinsic call
is stored in a new TypeIdLowering data structure. The idea is that this
data structure can either be initialized using the module itself during
regular LTO, or using the module summary in ThinLTO backends.
Differential Revision: https://reviews.llvm.org/D28341
llvm-svn: 291205
We used the logBase2 of the high instead of the ceilLogBase2 resulting
in the wrong result for certain values. For example, it resulted in an
i1 AssertZExt when the exclusive portion of the range was 3.
llvm-svn: 291196
Summary:
Using the linker-supplied list of "preserved" symbols, we can compute
the list of "dead" symbols, i.e. the one that are not reachable from
a "preserved" symbol transitively on the reference graph.
Right now we are using this information to mark these functions as
non-eligible for import.
The impact is two folds:
- Reduction of compile time: we don't import these functions anywhere
or import the function these symbols are calling.
- The limited number of import/export leads to better internalization.
Patch originally by Mehdi Amini.
Reviewers: mehdi_amini, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23488
llvm-svn: 291177
Promotion is always legal when a store within the loop is guaranteed to execute.
However, this is not a necessary condition - for promotion to be memory model
semantics-preserving, it is enough to have a store that dominates every exit
block. This is because if the store dominates every exit block, the fact the
exit block was executed implies the original store was executed as well.
Differential Revision: https://reviews.llvm.org/D28147
llvm-svn: 291171
Add an assert that checks whether liveins are up to date before they are
used.
- Do not print liveins into .mir files anymore in situations where they
are out of date anyway.
- The assert in the RegisterScavenger is superseded by the new one in
livein_begin().
- Skip parts of the liveness updating logic in IfConversion.cpp when
liveness isn't tracked anymore (just enough to avoid hitting the new
assert()).
Differential Revision: https://reviews.llvm.org/D27562
llvm-svn: 291169
Summary:
For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register.
The lower 64-bits of the register are evaluated to determine the shift amount.
This patch improves the construction of the vector containing the shift amount.
Reviewers: craig.topper, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28353
llvm-svn: 291120
Summary:
This adds a new summary flag NotEligibleToImport that subsumes
several existing flags (NoRename, HasInlineAsmMaybeReferencingInternal
and IsNotViableToInline). It also subsumes the checking of references
on the summary that was being done during the thin link by
eligibleForImport() for each candidate. It is much more efficient to
do that checking once during the per-module summary build and record
it in the summary.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28169
llvm-svn: 291108
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.
Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.
Differential Revision: https://reviews.llvm.org/D27518
llvm-svn: 291106
To make this work, pointers from the MachineBasicBlock to the LLVM-IR-level
basic blocks need to be initialized, as the AsmPrinter uses this link to be
able to print out labels for the basic blocks that are address-taken.
Most of the changes in this commit are about adapting existing tests to include
the basic block name that is now printed out in the MIR format, now that the
name becomes available as the link to the LLVM-IR basic block is initialized.
The relevant test change for the functionality added in this patch are the
added "(address-taken)" strings in
test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll.
Differential Revision: https://reviews.llvm.org/D28123
llvm-svn: 291105
This commit does this using a trivial chain of conditional branches. In the
future, we probably want to reuse the optimized switch lowering used in
SelectionDAG.
Differential Revision: https://reviews.llvm.org/D28176
llvm-svn: 291099
Set up basic YAML I/O support for module summaries, plumb the summary into
the pass and add a few command line flags to test YAML I/O support. Bitcode
support to come separately, as will the code in LowerTypeTests that actually
uses the summary. Also add a couple of tests that pass by virtue of the pass
doing nothing with the summary (which happens to be the correct thing to do
for those tests).
Differential Revision: https://reviews.llvm.org/D28041
llvm-svn: 291069
performing partial redundancy elimination (PRE). Not doing so can cause jumpy line
tables and confusing (though correct) source attributions.
Differential Revision: https://reviews.llvm.org/D27857
llvm-svn: 291037
Summary:
This is a relatively simple scheme: we use the index emitted in the
bitcode to avoid loading all the global metadata. Instead we load
the index with their position in the bitcode so that we can load each
of them individually. Materializing the global metadata block in this
condition only triggers loading the named metadata, and the ones
referenced from there (transitively). When materializing a function,
metadata from the global block are loaded lazily as they are
referenced.
Two main current limitations are:
1) Global values other than functions are not materialized on demand,
so we need to eagerly load METADATA_GLOBAL_DECL_ATTACHMENT records
(and their transitive dependencies).
2) When we load a single metadata, we don't recurse on the operands,
instead we use a placeholder or a temporary metadata. Unfortunately
tepmorary nodes are very expensive. This is why we don't have it
always enabled and only for importing.
These two limitations can be lifted in a subsequent improvement if
needed.
With this change, the total link time of opt with ThinLTO and Debug
Info enabled is going down from 282s to 224s (~20%).
Reviewers: pcc, tejohnson, dexonsmith
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28113
llvm-svn: 291027
Summary:
When promoting fp-to-uint16 to fp-to-sint32, the result is actually zero
extended. For example, given double 65534.0, without legalization:
fp-to-uint16: 65534.0 -> 0xfffe
With the legalization:
fp-to-sint32: 65534.0 -> 0x0000fffe
Without this patch, legalization wrongly emits a signed extend assertion,
which is consumed by later icmp instruction, and cause miscompile.
Note that the floating point value must be in [0, 65535), otherwise the
behavior is undefined.
This patch reverts r279223 behavior and adds more tests and
documentations.
In PR29041's context, James Molloy mentioned that:
We don't need to mask because conversion from float->uint8_t is
undefined if the integer part of the float value is not representable in
uint8_t. Therefore we can assume this doesn't happen!
which is totally true and good, because fptoui is documented clearly to
have undefined behavior when overflow/underflow happens. We should take
the advantage of this behavior so that we can save unnecessary mask
instructions.
Reviewers: jmolloy, nadav, echristo, kbarton
Subscribers: mehdi_amini, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28284
llvm-svn: 291015
Summary:
Instead of matching:
(a + i) + 1 -> (a + i, undef, 1)
Now it matches:
(a + i) + 1 -> (a, i, 1)
Reviewers: rengolin
Differential Revision: http://reviews.llvm.org/D26367
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 291012
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:
1. When determining tail-call eligibility. If we make a tail call (i.e.
directly branch to a function) then there is no place for the linker to add
a TOC restoration.
2. When determining when we need to add a nop instruction after a call.
Likewise, if there is a possibility that the linker might need to add a
TOC restoration after a call, then we need to put a nop after the call
(the bl instruction).
First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.
Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).
There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.
Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.
Differential Revision: https://reviews.llvm.org/D27231
llvm-svn: 291003
This test case has been reduced from test/Analysis/RegionInfo/mix_1.ll and
provides us with a minimal example of a test case which caused problems while
working on an improved version of the RegionInfo analysis. We upstream this
test case, as it certainly can be helpful in future debugging and optimization
tests.
Test case reduced by Pratik Bhatu <cs12b1010@iith.ac.in>
llvm-svn: 290974
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer). The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).
Original commit message:
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Original review: https://reviews.llvm.org/D27590
llvm-svn: 290973
These tests are missing a target triple and the -m elf_x86_64 gold option,
which makes them fail on non-x86 targets.
Differential revision: https://reviews.llvm.org/D28285
Reviewers: tejohnson
llvm-svn: 290965
Summary:
Change llvm-link to use the FunctionImporter handling, instead of
manually invoking the Linker. We still need to load the module
in llvm-link to do the desired testing for invalid import requests
(weak functions), and to get the GUID (in case the function is local).
Also change the drop-debug-info test to use llvm-link so that importing
is forced (in order to test debug info handling) and independent of
import logic changes.
Reviewers: mehdi_amini
Subscribers: mgorny, llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D28277
llvm-svn: 290964
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
Reviewers: aprantl, MatzeB, mkuper
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290955
In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.
This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold).
Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.
I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.
Differential Revision: https://reviews.llvm.org/D28219
llvm-svn: 290947
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))
This is only possible if C2 is negative and C2 is greater than or equal to negative C1.
llvm-svn: 290927
Summary:
Regardless how the loop body weight is distributed, we should preserve
total loop body weight. i.e. we should have same weight reaching the body of the loop
or its duplicates in peeled and unpeeled case.
Reviewers: mkuper, davidxl, anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28179
llvm-svn: 290833
Add an explicit LLVM_ENABLE_DIA_SDK option to control building support
for DIA SDK-based debugging. Control its value to match whether DIA SDK
support was found and expose it in LLVMConfig (alike LLVM_ENABLE_ZLIB).
Its value is needed for LLDB to determine whether to run tests requiring
DIA support. Currently it is obtained from llvm/Config/config.h;
however, this file is not available for standalone builds. Following
this change, LLDB will be modified to use the value from LLVMConfig.
Differential Revision: https://reviews.llvm.org/D26255
llvm-svn: 290818
GNU as rejects input where .cfi_sections is used after .cfi_startproc,
if the new section differs from the old. Adjust our output to always
emit .cfi_sections before the first .cfi_startproc to minimize necessary
code.
Differential Revision: https://reviews.llvm.org/D28011
llvm-svn: 290817
X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.
In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426).
* Shiffle-broadcast cost will be changed in Simon's upcoming patch.
Differential Revision: https://reviews.llvm.org/D28118
llvm-svn: 290810
Summary:
gep 0, 0 is equivalent to bitcast. LLVM canonicalizes it
to getelementptr because it make SROA can then handle it.
Simple case like
void g(A &a) {
z(a);
if (glob)
a.foo();
}
void testG() {
A a;
g(a);
}
was not devirtualized with -fstrict-vtable-pointers because luck of
handling for gep 0 in Memory Dependence Analysis
Reviewers: dberlin, nlewycky, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28126
llvm-svn: 290763
I remove one extra line, but because annoyingly llvm-lit does not
clean the output directory before running the test, it didn't fail
locally (the file was present from a previous run).
llvm-svn: 290740
This is similar to the allocfn case - if an alloca is not captured, then it's
necessarily thread-local.
Differential Revision: https://reviews.llvm.org/D28170
llvm-svn: 290738
Summary:
The current loop complete unroll algorithm checks if unrolling complete will reduce the runtime by a certain percentage. If yes, it will apply a fixed boosting factor to the threshold (by discounting cost). The problem for this approach is that the threshold abruptly. This patch makes the boosting factor a function of runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.
The patch also simplified the code by reducing one parameter in UP.
The patch only affects code-gen of two speccpu2006 benchmark:
445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26989
llvm-svn: 290737
Some incoming changes in ThinLTO will break this test.
Instead of relying on the heuristic to import, we
force the importing to happen with llvm-link.
llvm-svn: 290736
This reverts commit r290694. It broke sanitizer tests on Win64. I'll
probably bring this back, but the jump tables will just live in .text
like they do for MSVC.
llvm-svn: 290714
This fixes the issue exposed in PR31393, where we weren't trying
sufficiently hard to diagnose bad TBAA metadata.
This does reduce the variety in the error messages we print out, but I
think the tradeoff of verifying more, simply and quickly overrules the
need for more helpful error messags here.
llvm-svn: 290713
Among other stuff, this allows to use predefined .option.machine_version_major
/minor/stepping symbols in the directive.
Relevant test expanded at once (also file renamed for clarity).
Differential Revision: https://reviews.llvm.org/D28140
llvm-svn: 290710
This change adds a new intrinsic which is intended to provide memcpy functionality
with additional atomicity guarantees. Please refer to the review thread
or language reference for further details.
Differential Revision: https://reviews.llvm.org/D27133
llvm-svn: 290708
The accidentally had trivially dead code. Also needed to adjust the rounding mode to not CUR_DIRECTION so the intrinsics don't get converted to native operations before going through SimplifyDemandedVectorElts.
llvm-svn: 290702
Summary:
We were already using 32-bit jump table entries, but this was a
consequence of the default PIC model on Win64, and not an intentional
design decision. This patch ensures that we always use 32-bit label
difference jump table entries on Win64 regardless of the PIC model. This
is a good idea because it saves executable size and object file size.
Moving the jump tables to .rdata cleans up the disassembled object code
and reduces the available ROP targets, but it requires adding one more
RIP-relative lea to the code. COFF doesn't have relocations to express
the difference between two arbitrary symbols, so we can't use the jump
table label in the label difference like we do elsewhere.
Fixes PR31488
Reviewers: majnemer, compnerd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28141
llvm-svn: 290694
This index record the position for each metadata record in
the bitcode, so that the reader will be able to lazy-load
on demand each individual record.
We also make sure that every abbrev is emitted upfront so
that the block can be skipped while reading.
I don't plan to commit this before having the reader
counterpart, but I figured this can be reviewed mostly
independently.
Recommit r290684 (was reverted in r290686 because a test
was broken) after adding a threshold to avoid emitting
the index when unnecessary (little amount of metadata).
This optimization "hides" a limitation of the ability
to backpatch in the bitstream: we can only backpatch
safely when the position has been flushed. So if we emit
an index for one metadata, it is possible that (part of)
the offset placeholder hasn't been flushed and the backpatch
will fail.
Differential Revision: https://reviews.llvm.org/D28083
llvm-svn: 290690
Summary:
This index record the position for each metadata record in
the bitcode, so that the reader will be able to lazy-load
on demand each individual record.
We also make sure that every abbrev is emitted upfront so
that the block can be skipped while reading.
I don't plan to commit this before having the reader
counterpart, but I figured this can be reviewed mostly
independently.
Reviewers: pcc, tejohnson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28083
llvm-svn: 290684
Jump table emission can switch to .rdata before
WinException::endFunction gets called. Just remember the appropriate
text section we started in and reset back to it when we end the
function. We were already switching sections back from .xdata anyway.
Fixes the first problem in PR31488, so that now COFF switch tables can
live in .rdata if we want them to.
llvm-svn: 290678
This is an orthogonal and separated layer instead of being embedded
inside the pass manager. While it adds a small amount of complexity, it
is fairly minimal and the composability and control seems worth the
cost.
The logic for this ends up being nicely isolated and targeted. It should
be easy to experiment with different iteration strategies wrapped around
the CGSCC bottom-up walk using this kind of facility.
The mechanism used to track devirtualization is the simplest one I came
up with. I think it handles most of the cases the existing iteration
machinery handles, but I haven't done a *very* in depth analysis. It
does however match the basic intended semantics, and we can tweak or
tune its exact behavior incrementally as necessary. One thing that we
may want to revisit is freshly building the value handle set on each
iteration. While I don't think this will be a significant cost (it is
strictly fewer value handles but more churn of value handes than the old
call graph), it is conceivable that we'll want a somewhat more clever
tracking mechanism. My hope is to layer that on as a follow up patch
with data supporting any implementation complexity it adds.
This code also provides for a basic count heuristic: if the number of
indirect calls decreases and the number of direct calls increases for
a given function in the SCC, we assume devirtualization is responsible.
This matches the heuristics currently used in the legacy pass manager.
Differential Revision: https://reviews.llvm.org/D23114
llvm-svn: 290665
analyses when we're about to break apart an SCC.
We can't wait until after breaking apart the SCC to invalidate things:
1) Which SCC do we then invalidate? All of them?
2) Even if we invalidate all of them, a newly created SCC may not have
a proxy that will convey the invalidation to functions!
Previously we only invalidated one of the SCCs and too late. This led to
stale analyses remaining in the cache. And because the caching strategy
actually works, they would get used and chaos would ensue.
Doing invalidation early is somewhat pessimizing though if we *know*
that the SCC structure won't change. So it turns out that the design to
make the mutation API force the caller to know the *kind* of mutation in
advance was indeed 100% correct and we didn't do enough of it. So this
change also splits two cases of switching a call edge to a ref edge into
two separate APIs so that callers can clearly test for this and take the
easy path without invalidating when appropriate. This is particularly
important in this case as we expect most inlines to be between functions
in separate SCCs and so the common case is that we don't have to so
aggressively invalidate analyses.
The LCG API change in turn needed some basic cleanups and better testing
in its unittest. No interesting functionality changed there other than
more coverage of the returned sequence of SCCs.
While this seems like an obvious improvement over the current state, I'd
like to revisit the core concept of invalidating within the CG-update
layer at all. I'm wondering if we would be better served forcing the
callers to handle the invalidation beforehand in the cases that they
can handle it. An interesting example is when we want to teach the
inliner to *update and preserve* analyses. But we can cross that bridge
when we get there.
With this patch, the new pass manager an build all of the LLVM test
suite at -O3 and everything passes. =D I haven't bootstrapped yet and
I'm sure there are still plenty of bugs, but this gives a nice baseline
so I'm going to increasingly focus on fleshing out the missing
functionality, especially the bits that are just turned off right now in
order to let us establish this baseline.
llvm-svn: 290664
There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.
Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky
Differential Revision: https://reviews.llvm.org/D27901
llvm-svn: 290663
when they are call edges at the leaf but may (transitively) be reached
via ref edges.
It turns out there is a simple rule: insert everything as a ref edge
which is a safe conservative default. Then we let the existing update
logic handle promoting some of those to call edges.
Note that it would be fairly cheap to make these call edges right away
if that is desirable by testing whether there is some existing call path
from the source to the target. It just seemed like slightly more
complexity in this code path that isn't strictly necessary. If anyone
feels strongly about handling this differently I'm happy to change it.
llvm-svn: 290649
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.
This fixes PR31286.
Differential Revision: https://reviews.llvm.org/D27992
llvm-svn: 290641
Replace the use of grep with FileCheck. Tidy up some of the tests. A
few of the tests have been left as weak as previously, though some have
been made more stringent.
llvm-svn: 290616
BasicAA in r290603.
I've kept the basic testing in the new PM test file as that also covers
the AAManager invalidation logic. If/when there is a good place for
broader AA testing it could move there.
This test is somewhat unsatisfying as I can't get it to fail even with
ASan outside of explicit checks of the invalidation. Apparently we don't
yet have any test coverage of the BasicAA code paths using either the
domtree or loopinfo -- I made both of them always be null and check-llvm
passed.
llvm-svn: 290612
The effect of the bug was that we would incorrectly create summaries
for global and weak values defined in module asm (since we were
essentially testing for bit 1 which is SF_Undefined, and the
RecordStreamer ignores local undefined references). This would have
resulted in conservatively disabling importing of anything referencing
globals and weaks defined in module asm. Added these cases to the test
which now fails without this bug fix.
Fixes PR31459.
llvm-svn: 290610
The feature allows for conditional assembly, filling the entries
of .amd_kernel_code_t etc.
Symbols are defined with value 0 at the beginning of each kernel scope.
After each register usage, the respective symbol is set to:
value = max( value, ( register index + 1 ) )
Thus, at the end of scope the value represents a count of used registers.
Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the
next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also
dummy scope that lies from the beginning of source file til the
first .amdgpu_hsa_kernel.
Test added.
Differential Revision: https://reviews.llvm.org/D27859
llvm-svn: 290608
This requires custom handling because BasicAA caches handles to other
analyses and so it needs to trigger indirect invalidation.
This fixes one of the common crashes when using the new PM in real
pipelines. I've also tweaked a regression test to check that we are at
least handling the most immediate case.
I'm going to work at re-structuring this test some to both scale better
(rather than all being in one file) and check more invalidation paths in
a follow-up commit, but I wanted to get the basic bug fix in place.
llvm-svn: 290603
not really wired into the loop pass manager in a way that will let us
productively use these passes yet.
This lets the new PM get farther in basic testing which is useful for
establishing a good baseline of "doesn't explode". There are still
plenty of crashers in basic testing though, this just gets rid of some
noise that is well understood and not representing a specific or narrow
bug.
llvm-svn: 290601
inter-analysis dependencies) to use the new invalidation infrastructure.
This teaches it to invalidate itself when any of the peer function
AA results that it uses become invalid. We do this by just tracking the
originating IDs. I've kept it in a somewhat clunky API since some users
of AAResults are outside the new PM right now. We can clean this API up
if/when those users go away.
Secondly, it uses the registration on the outer analysis manager proxy
to trigger deferred invalidation when a module analysis result becomes
invalid.
I've included test cases that specifically try to trigger use-after-free
in both of these cases and they would crash or hang pretty horribly for
me even without ASan. Now they work nicely.
The `InvalidateAnalysis` utility pass required some tweaking to be
useful in this context and it still is pretty garbage. I'd like to
switch it back to the previous implementation and teach the explicit
invalidate method on the AnalysisManager to take care of correctly
triggering indirect invalidation, but I wanted to go ahead and send this
out so folks could see how all of this stuff works together in practice.
And, you know, that it does actually work. =]
Differential Revision: https://reviews.llvm.org/D27205
llvm-svn: 290595
most of the inliner test cases.
The inliner involves a bunch of interesting code and tends to be where
most of the issues I've seen experimenting with the new PM lie. All of
these test cases pass, but I'd like to keep some more thorough coverage
here so doing a fairly blanket enabling.
There are a handful of interesting tests I've not enabled yet because
they're focused on the always inliner, or on functionality that doesn't
(yet) exist in the inliner.
llvm-svn: 290592
skipping indirectly recursive inline chains.
To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.
This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.
llvm-svn: 290590
Nothing really interesting here, but I had to improve the test to use
variables rather than hard coding value names as we happen to end up
with different value names in the new PM.
llvm-svn: 290589
We currently ignore the `allocsize` attribute on functions calls with
the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We
shouldn't care about `nobuiltin` here: `allocsize` is explicitly added
by the user, not inferred based on a function's symbol.
llvm-svn: 290588
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
This builds on r290554 which added supported for 128 and 256-bit.
llvm-svn: 290582
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded.
llvm-svn: 290573
This mostly involved converting from grep to FileCheck and tidying up
the IR used.
In one case (invoke_test-3.ll) the test had become completely pointless
as we use 'resume' rather than 'unwind' now, and even then it did not
occur at the end of the line.
llvm-svn: 290570
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.
llvm-svn: 290566
These particular sequences will be generated after a future change to teach InstCombine to turn masked scalar arithmetic intrinsics into native IR.
llvm-svn: 290563
inside of `InlineFunction`. Prior to this, call instructions are
specifically being rewritten and replaced within the inlined region,
invalidating some of the call sites.
Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.
I've also added a short circuit to the scan for call sites based on what
other code is doing.
With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.
I'll make my way through most of the other inliner test cases
just to get some easy coverage next.
llvm-svn: 290562