Summary:
Make this interface reusable similarly to std::call_once and std::once_flag interface.
This makes porting LLDB to NetBSD easier as there was in the original approach a portable way to specify a non-static once_flag. With this change translating std::once_flag to llvm::once_flag is mechanical.
Sponsored by <The NetBSD Foundation>
Reviewers: mehdi_amini, labath, joerg
Reviewed By: mehdi_amini
Subscribers: emaste, clayborg
Differential Revision: https://reviews.llvm.org/D29566
llvm-svn: 294143
Similar was already done for several other shuffles in this function.
The test changes are because the old code used explicity zeroing for elements that could have been undef.
While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us.
llvm-svn: 294130
Summary:
Without this change, the getVR() call would hit an assert since it was
being passed a physical register.
Update the AArch64/ldst-opt.ll test with a case that triggers this
behavior by adding a run with strict-align, which causes an unaligned
STR XZR instruction to be split into byte stores, creating an
EXTRACT_SUBREG of XZR that triggers the original problem.
Reviewers: bogner, qcolombet, MatzeB, atrick
Subscribers: aemerson, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D29495
llvm-svn: 294129
Summary: This avoid the need to duplicate all pattern and actually end up exposing some opportunity to optimize existing pattern that did not exists in both directions on an existing test case.
Reviewers: mkuper, spatel, bkramer, RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29541
llvm-svn: 294125
Summary:
This patch tablegen-erates the ARM register bank information so that the
static tables added in D27807 no longer need to be maintained.
Depends on D27338
Reviewers: t.p.northover, rovka, ab, qcolombet, aditya_nandakumar
Reviewed By: rovka
Subscribers: aemerson, rengolin, mgorny, dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D28567
llvm-svn: 294124
It is sufficient to skip emission of these arguments as we have nothing
to actually pass through the function call.
The AVR-GCC reference has nothing to say about zero-sized arguments,
presumably because C/C++ doesn't support them. This means we don't have
to worry about ABI differences.
llvm-svn: 294119
This was originally introduced in r278321 to work around correctness
problems in the ExecutionDepsFix pass; Probably also to keep the
performance benefits of breaking the false dependencies which of course
also affect undef operands.
ExecutionDepsFix has been improved here recently (see for example
r278321) so we should not need this exception any longer.
Differential Revision: https://reviews.llvm.org/D29525
llvm-svn: 294087
Move a check for blocks that are not candidates for tail duplication up before
the logging. Reduces logging noise. No non-logging changes intended.
llvm-svn: 294086
Anything that needs to be passed to AnalyzeBranch unfortunately can't be const,
or more would be const. Added const_iterator to BlockChain to allow
BlockChain to be const when we don't expect to change it.
llvm-svn: 294085
An assert occurs when calling SlotIndexes::getInstructionIndex with
a DBG_VALUE instruction because the function expects an instruction
with a slot index. However, there is no slot index for a DBG_VALUE
instruction.
Differential Revision: https://reviews.llvm.org/D29048
llvm-svn: 294070
This patch is based on the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109631.html
Folding to i1 should always be desirable because that's better for value tracking
and we have special folds for i1 types.
I checked for other users of shouldChangeType() where this might have an effect,
but we already handle the i1 case differently than other types in all of those cases.
Side note: the default datalayout includes i1, so it seems we only find this gap in
shouldChangeType + phi folding for the case when there is (1) an explicit datalayout
without i1, (2) casting to i1 from a legal type, and (3) a phi with exactly 2 incoming
casted operands (as Björn mentioned).
Differential Revision: https://reviews.llvm.org/D29336
llvm-svn: 294066
Summary: As per title. I ran into that limitation of the API doing some other work, so I though that'd be a nice addition.
Reviewers: jroelofs, compnerd, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29503
llvm-svn: 294063
The code comments didn't match the code logic, and we didn't actually distinguish the fake unary (not/neg/fneg)
operators from arguments. Adding another level to the weighting scheme provides more structure and can help
simplify the pattern matching in InstCombine and other places.
I fixed regressions that would have shown up from this change in:
rL290067
rL290127
But that doesn't mean there are no pattern-matching logic holes left; some combines may just be missing regression tests.
Should fix:
https://llvm.org/bugs/show_bug.cgi?id=28296
Differential Revision: https://reviews.llvm.org/D27933
llvm-svn: 294049
This has quite positive performance impact according to measurements.
Before previous fixes to limit the optimization that was too high
and blowed compile time and scratch usage, but now this is gone and
we can bump the threshold.
Differential Revision: https://reviews.llvm.org/D29505
llvm-svn: 294032
This re-applies commit r292189, reverted in r292191.
SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.
Use TLI instead, removing another heap of redundant code.
This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to a few tests:
- calling a non-variadic function via a variadic prototype isn't OK;
it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter; the SDAG code accepts any integer
type, which meant using i32 on x86_64 worked.
- a handful of SystemZ tests check the SDAG support for lax prototype
checking: Ulrich agrees on removing them.
I don't think it's worth supporting any of these (IMO) invalid
testcases. Instead, fix them to be more meaningful.
llvm-svn: 294028
This generalizes memory access sorting to use differences between SCEVs,
instead of relying on constant offsets. That allows us to properly do
SLP vectorization of non-sequentially ordered loads within loops bodies.
Differential Revision: https://reviews.llvm.org/D29425
llvm-svn: 294027
Currently these flags are always the inverse of each other, so there is
no need to keep them separate.
Differential Revision: https://reviews.llvm.org/D29471
llvm-svn: 294016
The importer was previously using ModuleLinker in a sort of "IRMover mode". Use
IRMover directly instead in order to remove a level of indirection.
I will remove all importing support from ModuleLinker in a separate
change.
Differential Revision: https://reviews.llvm.org/D29468
llvm-svn: 294014
The .end <symbol> directive for MIPS marks the end of a symbol and sets the
symbol's size. Previously, the corresponding emitDirective handler asserted
that a function's size could be evaluated to an absolute value at that point
in time.
This cannot be done with when directives like .align have been encountered,
instead set the function's size to the corresponding symbolic expression and
let ELFObjectWriter resolve the expression to an absolute value. This avoids
a redundant call to evaluateAsAbsolute.
llvm-svn: 294012
ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.
NodeToMatch can be modified during matching, but code does not handle
this situation.
Differential Revision: https://reviews.llvm.org/D29292
llvm-svn: 294003
Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.
Reviewers: rengolin, jmolloy, rovka, olista01
Reviewed By: olista01
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D29020
llvm-svn: 294000
Currently LLVM supports vectorization of horizontal reduction
instructions with initial value set to 0. Patch supports vectorization
of reduction with non-zero initial values. Also it supports a
vectorization of instructions with some extra arguments, like:
float f(float x[], int a, int b) {
float p = a % b;
p += x[0] + 3;
for (int i = 1; i < 32; i++)
p += x[i];
return p;
}
Patch allows vectorization of this kind of horizontal reductions.
Differential Revision: https://reviews.llvm.org/D28961
llvm-svn: 293994
This reverts commit r293970.
After more discussion, this belongs to the linker side and
there is no added value to do it at this level.
llvm-svn: 293993
Exit loop analysis early if suitable private access found.
Do not account for GEPs which are invariant to loop induction variable.
Do not account for Allocas which are too big to fit into register file anyway.
Add option for tuning: -amdgpu-unroll-threshold-private.
Differential Revision: https://reviews.llvm.org/D29473
llvm-svn: 293991
On Windows, the symbols "___stop___sancov_guards" and "___start___sancov_guards"
are not defined automatically. So, we need to take a different approach.
We define 3 sections:
Section ".SCOV$A" will only hold a variable ___start___sancov_guard.
Section ".SCOV$M" will hold the main data.
Section ".SCOV$Z" will only hold a variable ___stop___sancov_guards.
When linking, they will be merged sorted by the characters after the $, so we
can use the pointers of the variables ___[start|stop]___sancov_guard to know the
actual range of addresses of that section.
In this diff, I updated instrumentation to include all the guard arrays in
section ".SCOV$M".
Differential Revision: https://reviews.llvm.org/D28434
llvm-svn: 293987
While looking to add support for placing singular types (types that will
only be emitted in one place (such as attached to a strong vtable or
explicit template instantiation definition)) not in type units (since
type units have overhead) I stumbled across that change causing an
increase in pubtypes.
Turns out we were missing some types from type units if they were only
referenced from other type units and not from the debug_info section.
This fixes that, following GCC's line of describing the offset of such
entities as the CU die (since there's no compile unit-relative offset
that would describe such an entity - they aren't in the CU). Also like
GCC, this change prefers to describe the type stub within the CU rather
than the "just use the CU offset" fallback where possible. This may give
the DWARF consumer some opportunity to find the extra info in the type
stub - though I'm not sure GDB does anything with this currently.
The size of the pubnames/pubtypes sections now match exactly with or
without type units enabled.
This nearly triples (+189%) the pubtypes section for a clang self-host
and grows pubnames by 0.07% (without compression). For a total of 8%
increase in debug info sections of the objects of a Split DWARF build
when using type units.
llvm-svn: 293971
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.
This is a recommit of r293912 after fixing build failures,
and a recommit of r293918 after fixing LLD tests.
Differential Revision: https://reviews.llvm.org/D28978
llvm-svn: 293970
Summary: Some compilers, including MSVC and Clang, allow linker options to be specified in source files. In the legacy LTO API, there is a getLinkerOpts() method that returns linker options for the bitcode module being processed. This change adds that method to the new API, so that the COFF linker can get the right linker options when using the new LTO API.
Reviewers: pcc, ruiu, mehdi_amini, tejohnson
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D29207
llvm-svn: 293950
On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI.
llvm-svn: 293944
1. Added comments for options
2. Added missing option cl::desc field
3. Uniified function filter option for graph viewing.
Now PGO count/raw-counts share the same
filter option: -view-bfi-func-name=.
llvm-svn: 293938
On ELF every section can have a corresponding section symbol. When in
an assembly file we have
.quad .text
the '.text' refers to that symbol.
The way we used to handle them is to leave .text an undefined symbol
until the very end when the object writer would map them to the
actual section symbol.
The problem with that is that anything before the end would see an
undefined symbol. This could result in bad diagnostics
(test/MC/AArch64/label-arithmetic-diags-elf.s), or incorrect results
when using the asm streamer (est/MC/Mips/expansion-jal-sym-pic.s).
Fixing this will also allow using the section symbol earlier for
setting sh_link of SHF_METADATA sections.
This patch includes a few hacks to avoid changing our behaviour when
handling conflicts between section symbols and other symbols. I
reported pr31850 to track that.
llvm-svn: 293936
This is the second in the series of patches to enable adding
of machine sched-models for ARM processors easier and compact.
This patch focuses on integer instructions and adds missing
sched definitions.
Reviewers: rovka, rengolin
Differential Revision: https://reviews.llvm.org/D29127
llvm-svn: 293935
In r283838, we added the capability of splitting unspillable register.
When doing so we had to make sure the split live-ranges were also
unspillable and we did that by marking the related live-ranges in the
delegate method that is called when a new vreg is created.
However, by accessing the live-range there, we also triggered their lazy
computation (LiveIntervalAnalysis::getInterval) which is not what we
want in general. Indeed, later code in LiveRangeEdit is going to build
the live-ranges this lazy computation may mess up that computation
resulting in assertion failures. Namely, the createEmptyIntervalFrom
method expect that the live-range is going to be empty, not computed.
Thanks to Mikael Holmén <mikael.holmen@ericsson.com> for noticing and
reporting the problem.
llvm-svn: 293934
Use SetUnhandledExceptionFilter instead of AddVectoredExceptionHandler.
According to the documentation on Structured Exception Handling, this is the
order for the Exception Dispatching:
+ If the process is being debugged, the system notifies the debugger.
+ The Vectored Exception Handler is called.
+ The system attempts to locate a frame-based exception handler by searching the
stack frames of the thread in which the exception occurred.
+ If no frame-based handler can be found, the UnhandledExceptionFilter filter is
called.
+ Default handling based on the exception type.
So, similar to what we do for asan, we should use SetUnhandledExceptionFilter
instead of AddVectoredExceptionHandler, so user's code that is being fuzzed can
execute frame-based exception handlers before we catch them . We want to catch
unhandled exceptions, not all the exceptions.
Differential Revision: https://reviews.llvm.org/D29462
llvm-svn: 293920
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.
This is a recommit of r293912 after fixing build failures.
Differential Revision: https://reviews.llvm.org/D28978
llvm-svn: 293918
When a symbol is not exported outside of the
DSO, it is can be hidden. Usually we try to internalize
as much as possible, but it is not always possible, for
instance a symbol can be referenced outside of the LTO
unit, or there can be cross-module reference in ThinLTO.
Differential Revision: https://reviews.llvm.org/D28978
llvm-svn: 293912
Summary: While scanning predecessors to find an available loaded value, if the predecessor has a single predecessor, we can continue scanning through the single predecessor.
Reviewers: mcrosier, rengolin, reames, davidxl, haicheng
Reviewed By: rengolin
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D29200
llvm-svn: 293896
Recommiting after fixing X86 inc/dec chain bug.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg
nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
some in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293893
Merging Load-add-store pattern into a increment op previously dropped
the load's chain from the instructions dependence if the store is
chained to a TokenFactor.
llvm-svn: 293892
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.
llvm-svn: 293889
Allow unknown types in TLI.getValueType, otherwise we get asserts for certain
types that we do not support yet (instead of returning that we don't support
them and falling through the normal error path).
llvm-svn: 293888
Summary:
We can hoist out loads that are dominated by invariant.start, to the preheader.
We conservatively assume the load is variant, if we see a corresponding
use of invariant.start (it could be an invariant.end or an escaping
call).
Reviewers: mkuper, sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29331
llvm-svn: 293887
This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain).
So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc.
Differential Revision: https://reviews.llvm.org/D28810
llvm-svn: 293880
This moves creation of SUBV_BROADCAST and merging of adjacent loads that are being inserted together.
This is a step towards removing legalizing of INSERT_SUBVECTOR except for vXi1 cases.
llvm-svn: 293872
These linkages mean that the ultimately prevailing symbol will have the same
semantics as any non-prevailing copy of the symbol, so we are free to ignore
the linker's resolution.
Differential Revision: https://reviews.llvm.org/D29367
llvm-svn: 293865
The operand types were defined to fit the fp16_to_fp node, which
has the half as an integer type. v_cvt_f32_f16 does support
source modifiers, so change this to have an FP type and modifiers.
For targets without legal f16, this requires recognizing the
bit operations and trying to produce them.
llvm-svn: 293857
Committing after fixing suggested changes and tested release/debug builds on
x86_64-linux and arm/aarch64 builds.
Differential revision: https://reviews.llvm.org/D29042
llvm-svn: 293850
LTO. Replace it with a related assertion, ensuring that abstract
variables appear only in abstract scopes.
Part of PR31437.
Differential Revision: http://reviews.llvm.org/D29430
llvm-svn: 293841
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers has to be adjusted for bigger workgroups.
For example a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems still limited by the same number, but 4 subrgoups 64 workitems
each can afford 4 times more LDS to get the same occupancy.
In addition change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.
Differential Revision: https://reviews.llvm.org/D29423
llvm-svn: 293837
Add 2 features: posix and windows.
Sometimes we want some specific tests only for posix and we use:
REQUIRES: posix
Sometimes we want some specific tests only for windows and we use:
REQUIRES: windows
Differential Revision: https://reviews.llvm.org/D29418
llvm-svn: 293827
Commands should expand the wildcards on Windows, the cmd prompt doesn't.
Because of that sancov was not finding the needed file.
To deal with this, we use ls and xargs from gnu win utils.
Differential Revision: https://reviews.llvm.org/D29374
llvm-svn: 293825
Previously, mergeTypeStreams returns only true or false, so it was
impossible to know the reason if it failed. This patch changes the
function signature so that it returns an Error object.
Differential Revision: https://reviews.llvm.org/D29362
llvm-svn: 293820
Although this is 'no-functional-change-intended', I'm adding tests
for shl-shl and lshr-lshr pairs because there is no existing test
coverage for those folds.
It seems like we should be able to remove some code from foldShiftedShift()
at this point because we're handling those patterns on the general path.
llvm-svn: 293814
Summary: No need to try to ease BB from LoopHeaders as we already know that BB is not in LoopHeaders.
Reviewers: hsung, majnemer, mcrosier, haicheng, rengolin
Reviewed By: rengolin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29232
llvm-svn: 293802
This tries to address what Hal defined (in the post-commit review of
r293727) a long-standing problem with noinline, where we end up
de facto inlining trivial functions e.g.
__attribute__((noinline)) int patatino(void) { return 5; }
because of return value propagation.
llvm-svn: 293799
The GAS assembler supports the ".set bopt" directive but according
to the sources it doesn't do anything. It's supposed to optimize
branches by filling the delay slot of a branch with it's target.
This patch teaches the MIPS asm parser to accept both and warn in
the case of 'bopt' that the bopt directive is unsupported.
This resolves PR/31841.
Thanks to Sean Bruno for reporting the issue!
llvm-svn: 293798
This introduces the `analyze` subcommand. For now there is only
one option, to analyze hash collisions in the type streams. In
the future, however, we could add many more things here, such
as performing size analyses, compacting, and statistics about
the type of records etc.
llvm-svn: 293795
When disassembling a DSO, for calls to functions from the PLT, llvm-objdump only
prints the offset from the PLT, like: <.plt+0x30>.
While objdump and dumpbin print the function name, like:
<__sanitizer_cov_trace_pc_guard@plt>
When analyzing the coverage in libFuzzer we dissasemble and look for the calls
to __sanitizer_cov_trace_pc_guard.
So, this fails when using llvm-objdump on a DSO.
Differential Revision: https://reviews.llvm.org/D29372
llvm-svn: 293791
This patch moves some helper functions related to interleaved access
vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like
to use these functions in a follow-on patch that improves interleaved load and
store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was
already duplicated there and has been removed.
Differential Revision: https://reviews.llvm.org/D29398
llvm-svn: 293788
Summary:
If there are two adjacent guards with different conditions, we can
remove one of them and include its condition into the condition of
another one. This patch allows InstCombine to merge them by the
following pattern:
guard(a); guard(b) -> guard(a & b).
Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29378
llvm-svn: 293778
These were simply preserving the flags of the original operation,
which was too conservative in most cases and incorrect for mul.
nsw/nuw may be needed for some combines to cleanup messes when
intermediate sext_inregs are introduced later.
Tested valid combinations with alive.
llvm-svn: 293776
Summary:
This change allows a re-order of two intructions if their uses
are overlapped.
Patch by Serguei Katkov!
Reviewers: reames, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29120
llvm-svn: 293775
A program may contain llvm.assume info that disagrees with other analysis.
This may be caused by UB in the program, so we must not crash because of that.
As noted in the code comments:
https://llvm.org/bugs/show_bug.cgi?id=31809
...we can do better, but this at least avoids the assert/crash in the bug report.
Differential Revision: https://reviews.llvm.org/D29395
llvm-svn: 293773
DebugInfoDWARFTests is the only user so far which initializes the
MCObjectStreamer without initializing the ASMParser. The MIPS backend
relies on the ASMParser to initialize the MipsABIInfo object and to
update the target streamer with it. This should turn the mips buildbots
green.
Reviewers: atanasyan, zoran.jovanovic
Differential Revision: https://reviews.llvm.org/D28025
llvm-svn: 293772
The the following instructions:
- LD/LWZ (expanded from sjLj pseudo-instructions)
- LXVL/LXVLL vector loads
- STXVL/STXVLL vector stores
all require G8RC_NO0X class registers for RA.
Differential Revision: https://reviews.llvm.org/D29289
Committed for Lei Huang
llvm-svn: 293769
Summary:
This way, the type legalization machinery will take care of registering
the result of this node properly.
This patches fixes all failing fp16 test cases with expensive checks.
(CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll
CodeGen/X86/soft-fp.ll)
Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel
Reviewed By: hfinkel
Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D28195
llvm-svn: 293765
Add both cores to the target parser and TableGen. Test that eabi
attributes are set correctly for both cores. Additionally, test the
absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively.
Committed on behalf of Sanne Wouda.
Reviewers : rengolin, olista01.
Differential Revision: https://reviews.llvm.org/D29073
llvm-svn: 293761
Summary:
I have a similar patch up for review already (D29173). If you prefer I
can squash them both together.
Also I think there more potential for code sharing between
LoopUnroll.cpp and LoopUnrollRuntime.cpp. Do you think patches for
that would be worthwhile?
Reviewers: mkuper, mzolotukhin
Reviewed By: mkuper, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29311
llvm-svn: 293758
For SSE we use fp because of the smaller encoding, but that doesn't apply to AVX. So just do the natural thing so we don't have to explain why we aren't. We can't do this for 256-bit loads/stores since integer loads and stores aren't available in AVX1 so we need fallback patterns since the integer types are legal.
This doesn't affect any tests because execution domain fixing freely converts the instructions anyway. Honestly, we could probably rely on it for the SSE size optimization too.
llvm-svn: 293743
This feature enables the fusion of such operations on Cortex A57, as
recommended in its Software Optimisation Guide, sections 4.14 and 4.15.
Differential revision: https://reviews.llvm.org/D28698
llvm-svn: 293739
This feature enables the fusion of such operations on Cortex A57, as
recommended in its Software Optimisation Guide, section 4.13, and on Exynos
M1.
Differential revision: https://reviews.llvm.org/D28491
llvm-svn: 293738
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.
In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.
Differential revision: https://reviews.llvm.org/D28489
llvm-svn: 293737
Summary:
isSuitableMemoryOp method is repsonsible for verification
that instruction is a candidate to use in implicit null check.
Additionally it checks that base register is not re-defined before.
In case base has been re-defined it just returns false and lookup
is continued while any suitable instruction will not succeed this check
as well. This results in redundant further operations.
So when we found that base register has been re-defined we just
stop.
Patch by Serguei Katkov!
Reviewers: reames, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29119
llvm-svn: 293736
SplitEditor::defFromParent() can create a register copy.
If register is a tuple of other registers and not all lanes are used
a copy will be done on a full tuple regardless. Later register unit
for an unused lane will be considered free and another overlapping
register tuple can be assigned to a different value even though first
register is live at that point. That is because interference only look at
liveness info, while full register copy clobbers all lanes, even unused.
This patch fixes copy to only cover used lanes.
Differential Revision: https://reviews.llvm.org/D29105
llvm-svn: 293728
Summary:
This change implements the instrumentation map loading library which can
understand both YAML-defined instrumentation maps, and ELF 64-bit object
files that have the XRay instrumentation map section. We break it out
into a library on its own to allow for other applications to deal with
the XRay instrumentation map defined in XRay-instrumented binaries.
This type provides both raw access to the logical representation of the
instrumentation map entries as well as higher level functions for
converting a function ID into a function address.
At this point we only support ELF64 binaries and YAML-defined XRay
instrumentation maps. Future changes should extend this to support
32-bit ELF binaries, as well as other binary formats (like MachO).
As part of this change we also migrate all uses of the extraction logic
that used to be defined in tools/llvm-xray/ to use this new type and
interface for loading from files. We also remove the flag from the
`llvm-xray` tool that required users to specify the type of the
instrumentation map file being provided to instead make the library
auto-detect the file type.
Reviewers: dblaikie
Subscribers: mgorny, varno, llvm-commits
Differential Revision: https://reviews.llvm.org/D29319
llvm-svn: 293721
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well, subject to some simple frequency calculations.
Differential Revision: https://reviews.llvm.org/D28583
llvm-svn: 293716
x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as
desired.
Verified that the particular nvptx approximate instructions here do in
fact return 0 for x = 0.
llvm-svn: 293713
Technically, this is actually changes the expression and the original
assert was "wrong", but since the conjunction is with true, it doesn't
matter in this case.
llvm-svn: 293709
Fix a bug where we would construct shufflevector instructions addressing
invalid elements.
Differential Revision: https://reviews.llvm.org/D29313
llvm-svn: 293673
Well, sort of. But the lower-level code that invoke used to be using completely
botched the handling of varargs functions, which hopefully won't be possible if
they're using the same code.
llvm-svn: 293670
@ABS8 can be applied to symbols which appear as immediate operands to
instructions that have a 8-bit immediate form for that operand. It causes
the assembler to use the 8-bit form and an 8-bit relocation (e.g. R_386_8
or R_X86_64_8) for the symbol.
Differential Revision: https://reviews.llvm.org/D28688
llvm-svn: 293667
Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation.
Reviewers: xur, davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29040
llvm-svn: 293657
In clang, the grammar for mangling for these names are "<special-name> ::= TW <object name>" for wrapper variables or "<special-name> ::= TH <object name>" for initialization variables.
Initial change was made in libccxxabi r293638
llvm-svn: 293643
Summary:
For some reason instructions are being inserted in the wrong order with some
builds. I'm not sure why this is happening.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D29325
llvm-svn: 293639
Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.
The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.
Fixes Bug 31626.
Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD
Subscribers: jholewinski, nemanjai, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D28675
llvm-svn: 293635
The Requires class overrides the target requirements of an instruction,
rather than adding to them, so all ARM instructions need to include the
IsARM predicate when they have overwitten requirements.
This caused the swp and swpb instructions to be allowed in thumb mode
assembly, and the ARM encoding of CDP to be selected in codegen (which
is different for conditional instructions).
Differential Revision: https://reviews.llvm.org/D29283
llvm-svn: 293634
transformToIndexedCompare
If they don't have the same type, the size of the constant
index would need to be adjusted (and this wouldn't be always
possible).
Alternatively we could try the analysis with the initial
RHS value, which would guarantee that the two sides have
the same type. However it is unlikely that in practice this
would pass our transformation requirements.
Fixes PR31808 (https://llvm.org/bugs/show_bug.cgi?id=31808).
llvm-svn: 293629
Also add the ability to recognise PINSR(Vex, 0, Idx).
Targets shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat.
The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch.
llvm-svn: 293627
Just adds the vmr (Vector Move Register) mnemonic for the VOR instruction in
the PPC back end.
Committing on behalf of brunoalr (Bruno Rosa).
Differential Revision: https://reviews.llvm.org/D29133
llvm-svn: 293626
Summary:
rL293124 added the necessary infrastructure to properly add the cloned
top level loop to LoopInfo, which means we do not have to do it manually
in CloneLoopBlocks.
@mkuper sorry for not pointing this out during my review of D29156, I just
realized that today.
Reviewers: mzolotukhin, chandlerc, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits, mkuper
Differential Revision: https://reviews.llvm.org/D29173
llvm-svn: 293615
Summary:
This lets us lower to sqrt.approx and rsqrt.approx under more
circumstances.
* Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32,
when fast-math is enabled. Previously, we only would emit it for
calls to @llvm.nvvm.sqrt.f. (With this patch we no longer emit
sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to
simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.)
* Now we emit the ftz version of rsqrt.approx when ftz is enabled.
Previously, we only emitted rsqrt.approx when ftz was disabled.
Reviewers: hfinkel
Subscribers: llvm-commits, tra, jholewinski
Differential Revision: https://reviews.llvm.org/D28508
llvm-svn: 293605
I think this is safe as long as no inputs are known to ever
be nans.
Also add an intrinsic for fmed3 to be able to handle all safe
math cases.
llvm-svn: 293598
For now just port some of the existing NVPTX tests
and from an old HSAIL optimization pass which
approximately did the same thing.
Don't enable the pass yet until more testing is done.
llvm-svn: 293580
Make SolveLinEquationWithOverflow take the start as a SCEV, so we can
solve more cases. With that implemented, get rid of the special case
for powers of two.
The additional functionality probably isn't particularly useful,
but it might help a little for certain cases involving pointer
arithmetic.
Differential Revision: https://reviews.llvm.org/D28884
llvm-svn: 293576
Summary:
In revision rL278321, ExecutionDepsFix learned how to pick a better
register for undef register reads, e.g. for instructions such as
`vcvtsi2sdq`. While this revision improved performance on a good number
of our benchmarks, it unfortunately also caused significant regressions
(up to 3x) on others. This regression turned out to be caused by loops
such as:
PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
^ |
+----------------------------------+
In the previous version of the clearance calculation, we would visit
the blocks in order, remembering for each whether there were any
incoming backedges from blocks that we hadn't processed yet and if
so queuing up the block to be re-processed. However, for loop structures
such as the above, this is clearly insufficient, since the block B
does not have any unknown backedges, so we do not see the false
dependency from the previous interation's Def of xmm registers in B.
To fix this, we need to consider all blocks that are part of the loop
and reprocess them one the correct clearance values are known. As
an optimization, we also want to avoid reprocessing any later blocks
that are not part of the loop.
In summary, the iteration order is as follows:
Before: PH A B C D A'
Corrected (Naive): PH A B C D A' B' C' D'
Corrected (w/ optimization): PH A B C A' B' C' D
To facilitate this optimization we introduce two new counters for each
basic block. The first counts how many of it's predecssors have
completed primary processing. The second counts how many of its
predecessors have completed all processing (we will call such a block
*done*. Now, the criteria to reprocess a block is as follows:
- All Predecessors have completed primary processing
- For x the number of predecessors that have completed primary
processing *at the time of primary processing of this block*,
the number of predecessors that are done has reached x.
The intuition behind this criterion is as follows:
We need to perform primary processing on all predecessors in order to
find out any direct defs in those predecessors. When predecessors are
done, we also know that we have information about indirect defs (e.g.
in block B though that were inherited through B->C->A->B). However,
we can't wait for all predecessors to be done, since that would
cause cyclic dependencies. However, it is guaranteed that all those
predecessors that are prior to us in reverse postorder will be done
before us. Since we iterate of the basic blocks in reverse postorder,
the number x above, is precisely the count of the number of predecessors
prior to us in reverse postorder.
Reviewers: myatsina
Differential Revision: https://reviews.llvm.org/D28759
llvm-svn: 293571
Create a WasmDumper subclass of ObjDumper to support Webassembly binary
files.
Patch by Sam Clegg
Differential Revision: https://reviews.llvm.org/D27355
llvm-svn: 293569
- Move DEBUG_TYPE below includes
- Change unknown address space constant to be consistent with other
passes
- Grammar fixes in debug output
llvm-svn: 293567
combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs but everything before it now supports any number of shuffle inputs.
This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns.
llvm-svn: 293560