In expansion of FCOPYSIGN, the shift node is missing when the two
operands of FCOPYSIGN are of the same size. We should always generate
the shift node (if the required shift amount is not zero) to put the sign
bit into the right position, regardless of the size of the underlying
types.
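For illustration only (not the SelectionDAG code), here is a minimal C++ sketch of the bit-level copysign expansion, assuming same-sized IEEE-754 doubles; in the DAG, the source sign bit may additionally have to be shifted into the destination's sign position when the bit positions differ.
```
#include <cstdint>
#include <cstring>

// Bit-level copysign: keep the magnitude bits of X and take the sign bit of Y.
// In the DAG expansion, if the sign bit of the source is not already in the
// destination's sign position, a shift is required to move it there first.
double copySignBits(double X, double Y) {
  uint64_t XBits, YBits;
  std::memcpy(&XBits, &X, sizeof(double));
  std::memcpy(&YBits, &Y, sizeof(double));
  const uint64_t SignMask = UINT64_C(1) << 63;
  uint64_t Result = (XBits & ~SignMask) | (YBits & SignMask);
  double Out;
  std::memcpy(&Out, &Result, sizeof(double));
  return Out;
}
```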
Differential Revision: https://reviews.llvm.org/D49973
llvm-svn: 338665
Originally, this was part of a larger refactoring I'd planned, but had to abandon it. I figured the minor improvement in readability was worthwhile.
llvm-svn: 338663
Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements
feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector
loses precision. This patch removes the code that adds these nodes
to true f64 operands. It also adds patterns required to ensure
the code is still vectorized rather than converting individual
elements and inserting into a vector.
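A hedged, self-contained illustration (not from the patch) of the precision loss: rounding an f64 value to f32 before the integer conversion can change the result.
```
#include <cstdio>

// 2147483647.0 is exactly representable as a double but not as a float;
// inserting a round to float before the conversion changes the integer result.
int main() {
  double D = 2147483647.0;
  long long Direct = (long long)D;          // 2147483647
  long long ViaFloat = (long long)(float)D; // 2147483648 after rounding to float
  std::printf("direct=%lld via-float=%lld\n", Direct, ViaFloat);
  return 0;
}
```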
Fixes https://bugs.llvm.org/show_bug.cgi?id=38342
Differential Revision: https://reviews.llvm.org/D50121
llvm-svn: 338658
The AArch64 ELF ABI does not define a static relocation type for a TLS offset
within a module, which makes it impossible for the compiler to generate valid
DW_AT_location content for thread-local variables. Currently LLVM generates an
invalid R_AARCH64_ABS64 relocation in the DW_AT_location field for a TLS
variable. That causes trouble for the linker because a thread-local variable
does not have an absolute address at link time. AArch64 GCC solves the problem
by not generating DW_AT_location for thread-local variables. We should do the
same in LLVM.
Differential Revision: https://reviews.llvm.org/D43860
llvm-svn: 338655
Summary:
Mark a variable as maybe-unused to prevent a -Wunused-but-set-variable
warning in optimized builds where asserts are removed. Test/first commit
to check setup and understand the patch submission process.
Reviewers: srhines, pirama, dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49756
llvm-svn: 338654
(Previously reverted in r338442)
I'm told that the breakage came from us using an x86 triple on configs
that didn't have x86 enabled. This is remedied by moving the
debugcounter test to an x86 directory (where there's also an
opt-bisect-isel.ll test for similar reasons).
I can't repro the reverse-iteration failure mentioned in the revert with
this patch, so I assume that a misconfiguration on my end is what caused
that.
Original commit message:
Add DebugCounters to DivRemPairs
For people who don't use DebugCounters, NFCI.
Patch by Zhizhou Yang!
Differential Revision: https://reviews.llvm.org/D50033
llvm-svn: 338653
The callable flag indicates whether a symbol is callable: if present, the
symbol is callable; if absent, the symbol may or may not be callable (the
client must determine this from context, for example by examining the program
representation that will provide the symbol definition).
This flag will be used in the near future to enable creation of lazy compilation
stubs based on SymbolFlagsMap instances only (without having to provide
additional information to determine which symbols need stubs).
llvm-svn: 338649
This is patch 4 of 4 NFC refactorings to handle type units and compile
units more consistently and with less concern about the object-file
section that they came from.
Patch 4 combines separate DWARFUnitVectors for compile and type units
into a single DWARFUnitVector that contains both. For now the
implementation distinguishes compile units from type units by putting
all compile units at the front of the vector, reflecting the DWARF v4
distinction between .debug_info and .debug_types sections. A future
patch will change this to allow the free mixing of unit kinds, as is
specified by DWARF v5.
Differential Revision: https://reviews.llvm.org/D49744
llvm-svn: 338633
This is patch 3 of 4 NFC refactorings to handle type units and compile
units more consistently and with less concern about the object-file
section that they came from.
Patch 3 simply renames DWARFUnitSection to DWARFUnitVector, as the
object-file section of a unit is nearly irrelevant now.
Differential Revision: https://reviews.llvm.org/D49743
llvm-svn: 338632
This is patch 2 of 4 NFC refactorings to handle type units and compile
units more consistently and with less concern about the object-file
section that they came from.
Patch 2 takes the existing std::deque<DWARFUnitSection> for type units
and makes it a simple DWARFUnitSection, simplifying the handling of
type units and making it more consistent with compile units.
Differential Revision: https://reviews.llvm.org/D49742
llvm-svn: 338629
This is patch 1 of 4 NFC refactorings to handle type units and compile
units more consistently and with less concern about the object-file
section that they came from.
Patch 1 replaces the templated DWARFUnitSection with a non-templated
version. That is, instead of being a SmallVector of pointers to a
specific unit kind, it is now a SmallVector of pointers to the base
class for both type and compile units. Virtual methods are magic.
Differential Revision: https://reviews.llvm.org/D49741
llvm-svn: 338628
Mutate the node type during selection when it
doesn't matter. This avoids an intermediate bitcast
node on targets with legal i16/f16.
Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16,
which I assume are OK.
llvm-svn: 338619
Summary:
Added an option that allows emitting only '.loc' and '.file' debug
directives, but disables emission of the DWARF sections. Required for the
NVPTX target to support profiling. It requires '.loc' and '.file'
directives, but does not require any DWARF sections for the profiler.
Reviewers: probinson, echristo, dblaikie
Subscribers: aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46021
llvm-svn: 338616
It seems like perhaps because cstdio isn't directly included, the
compiler is accidentally picking up wprintf from somewhere else
and trying to call that. Hopefully this fixes it.
llvm-svn: 338614
We now emit a move of -1 before the cmov and do the addition after the cmov just like the case with an extra addition.
This may be slightly worse for code size, but is more consistent with other compilers. And we might be able to hoist the mov -1 outside of loops.
llvm-svn: 338613
This is useful for understanding how our demangler processes
back references and for investigating issues related to
back references. But it's a feature only useful for debugging
the demangling process itself, so I'm marking it hidden.
llvm-svn: 338609
After we detected the presence of a template via ?$ we would proceed by
only demangling a simple unqualified name. This means we would fail on
templated operators (and perhaps other yet-to-be-determined things).
This was discovered while doing some refactoring to store richer
semantic information about the demangled types to pave the way for
overhauling the way we handle backreferences. (Specifically, we need to
defer recording or resolving back-references until a symbol has been
completely demangled, because we need to use information that only
occurs later in the mangled string to decide whether a back-reference
should be recorded.)
Differential Revision: https://reviews.llvm.org/D50145
llvm-svn: 338608
Summary:
D25878, which added support for !absolute_symbol for normal X86 ISel,
did not add support for materializing references to absolute symbols for
X86 FastISel. This causes build failures because FastISel generates
PC-relative relocations for absolute symbols. Fall back to normal ISel
for references to !absolute_symbol GVs. Fix for PR38200.
Reviewers: pcc, craig.topper
Reviewed By: pcc
Subscribers: hiraditya, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D50116
llvm-svn: 338599
I've put CMPXCHG8B/CMPXCHG16B in the same file; even though technically they are under separate CPUID bits, all targets seem to support both (or neither).
llvm-svn: 338595
The bug is visible in the constant-folded x86 tests. We can't use the
negated shift amount when the type is not power-of-2:
https://rise4fun.com/Alive/US1r
...so in that case, use the regular lowering that includes a select
to guard against a shift-by-bitwidth. This path is improved by only
calculating the modulo shift amount once now.
Also, improve the rotate (with power-of-2 size) lowering to use
a negate rather than subtract from bitwidth. This improves the
codegen whether we have a rotate instruction or not (although
we can still see that we're not matching to a legal rotate in
all cases).
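As a plain C++ sketch of the power-of-2 case described above (illustrative only, not the DAG lowering): masking both shift amounts with bitwidth-1 lets a negate replace the subtract-from-bitwidth, and no shift-by-bitwidth can occur.
```
#include <cstdint>

// Rotate-left of a 32-bit value. Because 32 is a power of 2, the right-shift
// amount can be computed as a masked negate instead of (32 - Amt), and the
// masks guarantee neither shift amount ever reaches the bit width.
uint32_t rotl32(uint32_t X, uint32_t Amt) {
  const uint32_t Mask = 31;            // bitwidth - 1, power-of-2 widths only
  uint32_t Left = Amt & Mask;
  uint32_t Right = (0u - Amt) & Mask;  // negate, then mask
  return (X << Left) | (X >> Right);
}
```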
llvm-svn: 338592
There is nothing x86-specific about this code, so it'd be nice to make this available for other targets to use in the future (and get it out of X86ISelLowering!).
Differential Revision: https://reviews.llvm.org/D50083
llvm-svn: 338586
This patch just extracts code into a separate function to remove some
duplication between the old and new pass manager pipelines. Due to the
different CGSCC iterators used, not all code duplication was eliminated.
llvm-svn: 338585
Summary:
Add support for --rename-section flags from gnu objcopy.
Not all flags appear to have an effect for ELF objects, but accepting them allows easier drop-in replacement. Other unrecognized flags are rejected.
This was only tested by comparing the flags printed by "readelf -e <.o>" against the output of GNU vs llvm objcopy; it hasn't been validated beyond that.
Reviewers: jakehehrlich, alexshap
Subscribers: llvm-commits, paulsemel, alexshap
Differential Revision: https://reviews.llvm.org/D49870
llvm-svn: 338582
Clang support for the Armv8.2-A FP16 vector intrinsics was committed in
rC328277, but this was never followed up, i.e. the LLVM part is missing.
I've raised PR38404, and this is the first step to address it: this adds
tests for the Armv8.2-A FP16 vector intrinsics, and thus shows which
intrinsics already work and which need further work.
Differential Revision: https://reviews.llvm.org/D50142
llvm-svn: 338568
The functions `lookForDIEsToKeep` and `keepDIEAndDependencies` can have
some very deep recursion. This tackles part of this problem by removing
the recursion from `lookForDIEsToKeep` by turning it into a worklist.
The difficulty in doing so is the computation of incompleteness, which
depends on the incompleteness of its children. To compute this, we
insert "continuation markers" into the worklist. This informs the work
loop to (re)compute the incompleteness property of the DIE associated
with it (i.e. the parent of the previously processed DIE).
This patch should generate byte-identical output. Unfortunately it also
has some impact on performance, regressing by about 4% when processing
clang on my machine.
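A generic sketch of the continuation-marker idea with hypothetical types (not the dsymutil code): the parent is re-queued behind its children so its incompleteness can be folded in once all of them have been processed.
```
#include <vector>

struct Node { std::vector<Node *> Children; bool Incomplete = false; };

enum class Kind { Visit, Continuation };
struct WorkItem { Node *N; Kind K; };

void markIncomplete(Node *Root) {
  std::vector<WorkItem> Worklist{{Root, Kind::Visit}};
  while (!Worklist.empty()) {
    WorkItem Item = Worklist.back();
    Worklist.pop_back();
    if (Item.K == Kind::Visit) {
      // Revisit this node after its children so their results can be folded in.
      Worklist.push_back({Item.N, Kind::Continuation});
      for (Node *C : Item.N->Children)
        Worklist.push_back({C, Kind::Visit});
    } else {
      // All children have been processed; recompute the parent's property.
      for (Node *C : Item.N->Children)
        Item.N->Incomplete |= C->Incomplete;
    }
  }
}
```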
Differential revision: https://reviews.llvm.org/D48899
llvm-svn: 338536
Getting the DWARF types section is only implemented for ELF object
files. We already disabled emitting debug types in clang (r337717), but
now we also report a fatal error (rather than crashing) when trying to
obtain this section in MC. Additionally, we ignore the generate-debug-types
flag for unsupported target triples.
See PR38190 for more information.
Differential revision: https://reviews.llvm.org/D50057
llvm-svn: 338527
Summary:
Add an _L to _LZ image intrinsic table mapping to TableGen.
In ISelLowering, check whether an image intrinsic has an lod argument equal
to zero; if so, remove the lod and change the opcode to the equivalent mapped _LZ variant.
Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71
Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49483
llvm-svn: 338523
The DAG combiner logic to simplify AND masks in shift counts is invalid.
While it is true that the SystemZ shift instructions ignore all but the
low 6 bits of the shift count, it is still invalid to simplify the AND
masks while the DAG still uses the standard shift operators (which are
*not* defined to match the SystemZ instruction behavior).
Instead, this patch performs equivalent operations during instruction
selection. For completely removing the AND, this now happens via
additional DAG match patterns implemented by a multi-alternative
PatFrags. For simplifying a 32-bit AND to a 16-bit AND, the existing DAG
patterns were already mostly OK, they just needed an output XForm to
actually truncate the immediate value.
Unfortunately, the latter change also exposed a bug in TableGen: it
seems XForms are currently only handled correctly for direct operands of
the outermost operation node. This patch also fixes that bug by simply
recursing through the whole pattern. This should be NFC for all other
targets.
Differential Revision: https://reviews.llvm.org/D50096
llvm-svn: 338521
The DWARFDie is a lightweight utility wrapper that stores a pointer to a
compile unit and a debug info entry. Currently, its iterator (used for
walking over its children) stores a DWARFDie and returns a const
reference when dereferencing it.
When the iterator is modified (by incrementing or decrementing it), this
reference becomes invalid. This was happening when calling reverse on
it, because the std::reverse_iterator is keeping a temporary copy of the
iterator (see
https://en.cppreference.com/w/cpp/iterator/reverse_iterator for a good
illustration).
The relevant code in libcxx:
reference operator*() const {_Iter __tmp = current; return *--__tmp;}
When dereferencing the reverse iterator, we decrement and return a
reference to a DWARFDie stored in the stack frame of this function,
resulting in UB at runtime.
This patch specifies the std::reverse_iterator for DWARFDie to do the
right thing.
Differential revision: https://reviews.llvm.org/D49679
llvm-svn: 338506
Summary:
This patch improves the Inliner to provide causes/reasons for negative inline decisions.
1. It adds a new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining result as a boolean are changed to return InlineResult, which carries the cause for a negative decision (see the sketch below).
3. Changed remark printing and debug output messages to provide the additional messages and the related inline cost.
4. Adjusted tests for the changed printing.
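A minimal, hedged sketch of what such an InlineResult-style type could look like (names and details are illustrative, not copied from the patch):
```
#include <cstdio>

// It still converts to bool like the old return value, but a negative
// decision can carry a human-readable cause.
struct InlineResultSketch {
  bool IsSuccess;
  const char *Message;  // set only when IsSuccess is false

  static InlineResultSketch success() { return {true, nullptr}; }
  static InlineResultSketch failure(const char *Why) { return {false, Why}; }
  explicit operator bool() const { return IsSuccess; }
};

InlineResultSketch shouldInline(bool CalleeIsRecursive) {
  if (CalleeIsRecursive)
    return InlineResultSketch::failure("recursive call");
  return InlineResultSketch::success();
}

int main() {
  InlineResultSketch IR = shouldInline(true);
  if (!IR)
    std::printf("not inlining: %s\n", IR.Message);
  return 0;
}
```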
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338494
It's not strictly required by the transform of the cmov and the add, but it makes sure we restrict it to the cases we know we want to match.
While there, canonicalize the operand order of the cmov to simplify the matching and emitting code.
llvm-svn: 338492
This revision implements support for generating DWARFv5 .debug_addr section.
The implementation is pretty straightforward: we just check the DWARF version
and emit the section header if needed.
Reviewers: aprantl, dblaikie, probinson
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D50005
llvm-svn: 338487
This patch intends to enable jump threading when a method whose return type is std::pair<int, bool> or std::pair<bool, int> is inlined.
For example, jump threading does not happen for the if statement in func.
std::pair<int, bool> callee(int v) {
  int a = dummy(v);
  if (a) return std::make_pair(dummy(v), true);
  else return std::make_pair(v, v < 0);
}
int func(int v) {
  std::pair<int, bool> rc = callee(v);
  if (rc.second) {
    // do something
  }
}
SROA, executed before inlining, replaces std::pair by i64 without splitting it in both callee and func, since at this point no access to the individual fields is visible to SROA.
After inlining, jump threading fails to identify that the incoming value is a constant because of additional instructions (like or, and, trunc).
This series of patches adds patterns to InstructionSimplify to fold extraction of the members of a std::pair. To help jump threading, we actually need to optimize a code sequence spanning multiple BBs.
These patches do not handle phis by themselves, but the additional patterns help the NewGVN pass, which calls InstSimplify to check for opportunities to simplify instructions over phis and applies the phi-of-ops optimization, resulting in successful jump threading.
SimplifyDemandedBits in InstCombine can do more general optimizations, but this patch aims to provide opportunities for other optimizers by supporting a simple but common case in InstSimplify.
This first patch in the series handles code sequences that merge two values using shl and or and then extract one value using lshr.
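A hedged C++ illustration of that first pattern (not code from the patch): two values merged with a shift and an or, with one of them extracted again by a logical shift right; the extraction folds back to the original value.
```
#include <cstdint>

// Merge two 32-bit values into one 64-bit value (shl + or), then extract the
// high half again (lshr). InstSimplify can fold extractHi(merge(Hi, Lo)) to Hi.
uint64_t merge(uint32_t Hi, uint32_t Lo) {
  return ((uint64_t)Hi << 32) | (uint64_t)Lo;
}

uint32_t extractHi(uint64_t Packed) {
  return (uint32_t)(Packed >> 32);
}
```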
Differential Revision: https://reviews.llvm.org/D48828
llvm-svn: 338485
EFLAGS copy lowering.
If you have a branch of LLVM, you may want to cherrypick this. It is
extremely unlikely to hit this case empirically, but it will likely
manifest as an "impossible" branch being taken somewhere, and will be
... very hard to debug.
Hitting this requires complex conditions living across complex control
flow combined with some interesting memory (non-stack) initialized with
the results of a comparison. Also, because you have to arrange for an
EFLAGS copy to be in *just* the right place, almost anything you do to
the code will hide the bug. I was unable to reduce anything remotely
resembling a "good" test case from the place where I hit it, and so
instead I have constructed synthetic MIR testing that directly exercises
the bug in question (as well as the good behavior for completeness).
The issue is that we would mistakenly assume any SETcc with a valid
condition and an initial operand that was a register (a virtual
register, at that) to be a register-*defining* SETcc...
It isn't though....
This would in turn cause us to test some other bizarre register,
typically the base pointer of some memory. Now, testing this register
and using that to branch on doesn't make any sense. It even fails the
machine verifier (if you are running it) due to the wrong register
class. But it will make it through LLVM, assemble, and it *looks*
fine... But wow do you get a very unusual and surprising branch taken in
your actual code.
The fix is to actually check what kind of SETcc instruction we're
dealing with. Because there are a bunch of them, I just test the
may-store bit in the instruction. I've also added an assert for sanity
that ensures we are, in fact, *defining* the register operand. =D
llvm-svn: 338481
we aren't incorrectly generating any of it when doing SLH.
There was a bug that only occurred with SLH that very much looked like it
could be caused by bad unwind info, and so this was a prime suspect.
Turns out that everything is fine, but this way we'll *see* if we end
up, for example, putting things we shouldn't inside the prolog.
llvm-svn: 338480
It is necessary to generate fixups in .debug_line when relaxation is
enabled, because the address delta may change after relaxation.
DWARF records the mapping between lines and addresses in the
.debug_line section, encoding the information using special opcodes,
standard opcodes and extended opcodes in the Line Number Program. I use
DW_LNS_fixed_advance_pc to encode a fixed-length address delta and
DW_LNE_set_address to encode an absolute address, which makes it
possible to generate fixups in the .debug_line section.
Differential Revision: https://reviews.llvm.org/D46850
llvm-svn: 338477
Previously we were just visiting the blocks in the function in IR order, which
is rather arbitrary. Therefore we wouldn't always visit defs before uses, but
the translation code relies on this assumption in some places.
The only codegen change seen in tests is the elision of a redundant copy.
Fixes PR38396
llvm-svn: 338476
Call shouldOutlineFromFunctionByDefault, isFunctionSafeToOutlineFrom,
getOutliningType, and getMachineOutlinerMBBFlags using the correct
TargetInstrInfo. And don't create a MachineFunction for a function
declaration.
The call to getOutliningCandidateInfo is still a little weird, but at
least the weirdness is explicitly called out.
Differential Revision: https://reviews.llvm.org/D49880
llvm-svn: 338465
Disable ARMCodeGenPrepare by default again. It is causing verifier
failures in V8 that look like:
Duplicate integer as switch case
switch i32 %trunc, label %if.end13 [
i32 0, label %cleanup36
i32 0, label %if.then8
], !dbg !4981
i32 0
fatal error: error in backend: Broken function found, compilation aborted!
I will continue reducing the test case and send it along.
llvm-svn: 338452
Summary:
See binutils-gdb/bfd/elf.c: GNU objcopy also strips .stab* (STABS),
.line* (DWARF 1) and .gnu.linkonce.wi.* (linkonce section for .debug_info), but
I'm not sure we need to be compatible with that.
Reviewers: dblaikie, alexshap, jakehehrlich, jhenderson
Reviewed By: alexshap, jakehehrlich
Subscribers: aprantl, JDevlieghere, jakehehrlich, llvm-commits
Differential Revision: https://reviews.llvm.org/D50100
llvm-svn: 338443
Correct the address space for the inserted argument
stack slot.
AMDGPU seems to not do anything with this information,
so I don't think this was breaking anything.
llvm-svn: 338428
When lowering calling conventions, prefer to decompose vectors
into the constituent register types. This avoids artificial constraints
to satisfy a wide super-register.
This improves code quality because now optimizations don't need to
deal with the super-register constraint. For example the immediate
folding code doesn't deal with 4 component reg_sequences, so by
breaking the register down earlier the existing immediate folding
code is able to work.
This also avoids the need for the shader input processing code
to manually split vector types.
llvm-svn: 338416
Don't declare them as X86SchedWritePair when the folded class will never be used.
Note: MOVBE (load/store endian conversion) instructions tend to have a very different behaviour to BSWAP.
llvm-svn: 338412
As was done for vector rotations, we can efficiently use ISD::MULHU for vXi8/vXi16 ISD::SRL lowering.
Shift-by-zero cases are still problematic (mainly on v32i8 due to extra AND/ANDN/OR or VPBLENDVB blend masks but v8i16/v16i16 aren't great either if PBLENDW fails) so I've limited this first patch to known non-zero cases if we can't easily use PBLENDW.
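A hedged scalar sketch of the trick for a single i16 lane (the patch applies it per element to vXi8/vXi16 vectors):
```
#include <cstdint>

// A logical shift right by a known non-zero amount C (0 < C < 16) equals the
// high 16 bits of an unsigned multiply by 2^(16 - C), i.e. a MULHU.
uint16_t srlViaMulhu(uint16_t X, unsigned C) {
  uint32_t Product = (uint32_t)X * (uint32_t)(1u << (16 - C));
  return (uint16_t)(Product >> 16);
}
```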
Differential Revision: https://reviews.llvm.org/D49562
llvm-svn: 338407
This patch does the same thing as r338153 for COFF.
Note that this patch affects only the order of log messages.
The output file is already deterministic.
Differential Revision: https://reviews.llvm.org/D50023
llvm-svn: 338406
These aren't exhaustive, but cover some instructions that are only available in 32-bit mode (where would we be without good BCD math performance?).
llvm-svn: 338404
Summary:
Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16-bits rather than wrapping around. And one operand is treated as signed and the other as unsigned.
A C example that triggers this pattern
```
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

static const int N = 128;
int8_t A[2*N];
uint8_t B[2*N];
int16_t C[N];

void foo() {
  for (int i = 0; i != N; ++i)
    C[i] = MIN(MAX((int16_t)A[2*i]*(int16_t)B[2*i] + (int16_t)A[2*i+1]*(int16_t)B[2*i+1], -32768), 32767);
}
```
Reviewers: RKSimon, spatel, zvi
Reviewed By: RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49829
llvm-svn: 338402
This commit fixes two issues with the liveness information after the
call:
1) The code always spills RCX and RDX if InProlog == true, which results
in a use of an undefined phys reg.
2) FinalReg, JoinReg, RoundedReg, SizeReg are not added as live-ins to
the basic blocks that use them; therefore they are seen as undefined.
https://llvm.org/PR38376
Differential Revision: https://reviews.llvm.org/D50020
llvm-svn: 338400
Builder clang-cmake-armv8-full failed because the assembly 'comment'
notation is not '#' on that target. So, I use CHECK-SAME to avoid
checking the comment notation on the same line in the test case.
llvm-svn: 338398
Summary:
When DFS numbers are not yet calculated for a dominator tree, we have to walk up the tree to determine whether one node dominates another.
This patch makes the slow walks shorter by only walking until the level of the node we check against is reached. This is because a node cannot possibly dominate something higher in its tree.
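A hedged sketch of the shortened walk with a simplified node type (not the actual DomTree code):
```
// Each node records its immediate dominator and its depth (level) in the tree.
struct DomNodeSketch {
  const DomNodeSketch *IDom;
  unsigned Level;
};

// Does A dominate B? Walk B's ancestors upward, but stop once the level is no
// deeper than A's: a node cannot be dominated by something deeper in the tree.
bool dominatesSlow(const DomNodeSketch *A, const DomNodeSketch *B) {
  while (B && B->Level > A->Level)
    B = B->IDom;
  return B == A;
}
```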
When running opt with -O3, the patch results in:
* 25% fewer loop iterations for `opt` (fullLTO)
* 30% fewer loop iterations for sqlite
Reviewers: brzycki, asbirlea, chandlerc, NutshellySima, grosser
Reviewed By: NutshellySima
Subscribers: mehdi_amini, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49955
llvm-svn: 338396
Work around a bug where the InstCombine pass was asserting on the IR added in the
lit test, where we have a bitcast instruction after a GEP from an addrspace cast.
The second bitcast in the test was getting combined into
`bitcast <16 x i32>* %0 to <16 x i32> addrspace(3)*`, which looks like it should
be an addrspacecast instruction instead. Otherwise, if control flow is allowed
to continue as it is now, we create a GEP instruction
`<badref> = getelementptr inbounds <16 x i32>, <16 x i32>* %0, i32 0`. However,
because the type of this instruction doesn't match the address space, we hit an
assert when replacing the bitcast with that GEP.
```
void llvm::Value::doRAUW(llvm::Value*, bool): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
```
Differential Revision: https://reviews.llvm.org/D50058
llvm-svn: 338395
Summary:
When inserting LCSSA Phi nodes in the exit block,
make sure to preserve the original instruction's DL.
Reviewers: vsk
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D50009
llvm-svn: 338391
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
   DW_TAG_label
     DW_AT_name
     DW_AT_decl_file
     DW_AT_decl_line
     DW_AT_low_pc
2. Labels in an inlined function:
   DW_TAG_label
     DW_AT_abstract_origin
     DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit are used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
llvm-svn: 338390
Summary:
This patch improves the Inliner to provide causes/reasons for negative inline decisions.
1. It adds a new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining result as a boolean are changed to return InlineResult, which carries the cause for a negative decision.
3. Changed remark printing and debug output messages to provide the additional messages and the related inline cost.
4. Adjusted tests for the changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338387
A detailed description of the tool has been recently added by Matt to
CommandGuide/llvm-mca.rst. File README.txt is now redundant and can be removed;
all the relevant user-guide information has been improved and then moved to
llvm-mca.rst.
In the future, we should add another .rst for the "llvm-mca developer manual" to
provide information about:
- llvm-mca internals.
- How to add custom stages to the simulated pipeline.
- How to provide extra processor info in the scheduling model to improve the
analysis performed by llvm-mca.
llvm-svn: 338386
This is being done in order to make GVN able to better optimize certain inputs.
MemDep doesn't use PhiValues directly, but does need to notify it when things
get invalidated.
Differential Revision: https://reviews.llvm.org/D48489
llvm-svn: 338384
Summary:
If the ExtractElement instructions can be optimized out during the
vectorization and we need to reshuffle the parent vector, this
ShuffleInstruction may be inserted in the wrong place, causing the compiler
to produce incorrect code.
Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49928
llvm-svn: 338380
We could choose a free 0 for this, but this
matches the behavior for fmul undef, 1.0. Also,
the NaN is more useful for folding uses,
although if it's not eliminated it is more expensive
in terms of code size.
llvm-svn: 338376
The LLD implementation of Tag_ABI_VFP_args needs to check the rarely seen
values of 3 (toolchain specific) and 4 (compatible with both Base and VFP).
Add the missing enumeration values so that LLD can refer to them without
having to use the raw numbers.
Differential Revision: https://reviews.llvm.org/D50049
llvm-svn: 338373
This patch teaches llvm-mca how to identify dependency breaking instructions on
btver2.
An example of dependency breaking instructions is the zero-idiom XOR (example:
`XOR %eax, %eax`), which always generates zero regardless of the actual value of
the input register operands.
Dependency breaking instructions don't have to wait on their input register
operands before executing. This is because the computation is not dependent on
the inputs.
Not all dependency breaking idioms are also zero-latency instructions. For
example, `CMPEQ %xmm1, %xmm1` is independent of
the value of XMM1, and it generates a vector of all-ones.
That instruction is not eliminated at the register renaming stage, and its opcode is
issued to a pipeline for execution. So, the latency is not zero.
This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis
interface. That method takes as input an instruction (i.e. MCInst) and a
MCSubtargetInfo.
The default implementation of isDependencyBreaking() conservatively returns
false for all instructions. Targets may override the default behavior for
specific CPUs, and return a value which better matches the subtarget behavior.
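A hedged, stand-alone sketch of the hook (the signature is inferred from the description above, not copied from the patch):
```
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCSubtargetInfo.h"

// A stand-alone class mirroring the described hook: the base implementation
// conservatively reports that nothing breaks dependencies, and a CPU-specific
// subclass (e.g. for btver2) can override it for known idioms such as
// XOR %reg, %reg.
class DependencyBreakingAnalysisSketch {
public:
  virtual ~DependencyBreakingAnalysisSketch() = default;
  virtual bool isDependencyBreaking(const llvm::MCSubtargetInfo &STI,
                                    const llvm::MCInst &Inst) const {
    return false; // conservative default
  }
};
```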
In the future, we should teach TableGen how to automatically generate the body of
isDependencyBreaking() from scheduling predicate definitions. This would allow us
to expose the knowledge about dependency breaking instructions to the machine
schedulers (and, potentially, other codegen passes).
Differential Revision: https://reviews.llvm.org/D49310
llvm-svn: 338372
The ELF for the Arm architecture document defines, for EF_ARM_EABI_VER5 and
above, the flags EF_ARM_ABI_FLOAT_HARD and EF_ARM_ABI_FLOAT_SOFT. These
have been defined to be compatible with the existing EF_ARM_VFP_FLOAT and
EF_ARM_SOFT_FLOAT used by gcc for EF_ARM_EABI_UNKNOWN.
This patch adds the flags in addition to the existing ones so that any code
depending on the old names will still work.
Differential Revision: https://reviews.llvm.org/D49992
llvm-svn: 338370
Since z13, the max group size will be 2 if any μop has more than 3 register
sources.
This has been ignored so far in the SystemZHazardRecognizer, but is now
handled by recognizing those instructions and adjusting the tracking of
decoding and the cost heuristic for grouping.
Review: Ulrich Weigand
https://reviews.llvm.org/D49847
llvm-svn: 338368
This fold was written in an odd way and tried to avoid
an endless loop by bailing out on all constants instead
of the supposedly problematic case of -1. But (X & -1)
should always be simplified before we reach here, so I'm
not sure how that is a problem.
There were no tests for the commuted patterns, so I added
those at rL338364.
llvm-svn: 338367
isFNEG was duplicating much of what was done by getTargetConstantBitsFromNode in its own calls to getTargetConstantFromNode.
Noticed while reviewing D48467.
llvm-svn: 338358
Contrary to ELF, we don't add any markers that distinguish data generated
with .short/.long from normal instructions, so the .inst directive only
adds compatibility with assembly that uses it.
Differential Revision: https://reviews.llvm.org/D49936
llvm-svn: 338356
Contrary to ELF, we don't add any markers that distinguish data generated
with .long from normal instructions, so the .inst directive only adds
compatibility with assembly that uses it.
Differential Revision: https://reviews.llvm.org/D49935
llvm-svn: 338355
The patch introduces loop analysis (VPLoopInfo/VPLoop) for VPBlockBases.
This analysis will be necessary to perform some H-CFG transformations and
detect and introduce regions representing a loop in the H-CFG.
Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D48816
llvm-svn: 338346
In one place we checked X86Subtarget.slowLEA() to decide if the pass should run. But to decide what the pass should do, we only checked isSLM. This resulted in Goldmont going down the Bonnell path.
llvm-svn: 338342
This patch fixes demangling of template aliases as template-template
arguments, and also fixes function pointers and references as
non-type template parameters. All of these can be properly
demangled now, so I've ported over the test
clang/test/CodeGenCXX/ms-template-callbacks.cpp. All of these
tests pass.
llvm-svn: 338340
Also refactors some existing code to materialize addresses for the large code
model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR.
This implements PR36390.
Differential Revision: https://reviews.llvm.org/D49903
llvm-svn: 338337
The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.
llvm-svn: 338329
This patch adds support for demangling r-value references, new
operators such as the ""_foo operator, lambdas, alias types,
nullptr_t, and various other C++11'isms.
There is 1 failing test remaining in this file, which appears to
be related to back-referencing. This type of problem has the
potential to get ugly so I'd rather fix it in a separate patch.
Differential Revision: https://reviews.llvm.org/D50013
llvm-svn: 338324
Summary:
This patch mostly copies the existing Instruction Flow and stage descriptions
from the mca README. I made a few text tweaks, but no semantic changes,
and made reference to the "default pipeline." I also removed the internals
references (e.g., reference to class names and header files). I did leave the
LSUnit name around, but only as an abbreviated word for the load-store unit.
Reviewers: andreadb, courbet, RKSimon, gbedwell, filcab
Reviewed By: andreadb
Subscribers: tschuett, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D49692
llvm-svn: 338319
This is exchanging a sub-of-1 with add-of-minus-1:
https://rise4fun.com/Alive/plKAH
This is another step towards improving select-of-constants codegen (see D48970).
x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral.
I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but
I think canonicalizing to 'add' is more likely to produce further transforms because we have more
folds for 'add'.
Differential Revision: https://reviews.llvm.org/D49924
llvm-svn: 338317
The patch introduces dominator analysis for VPBlockBases and extends
VPlan's GraphTraits specialization with the required interfaces. Dominator
analysis will be necessary to perform some H-CFG transformations and
to introduce VPLoopInfo (LoopInfo analysis on top of the VPlan representation).
Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D48815
llvm-svn: 338310
Also, make SerializationTraits for pairs forward the actual pair
template type arguments to the underlying serializer. This allows, for example,
std::pair<StringRef, bool> to be passed as an argument to an RPC call expecting
a std::pair<std::string, bool>, since there is an underlying serializer from
StringRef to std::string that can be used.
llvm-svn: 338305
Thinking about it more it might be possible for the later nodes to be folded in getNode in such a way that the other created nodes are left dead. This can cause use counts to be incorrect on nodes that aren't dead.
So it's probably safer to leave this alone.
llvm-svn: 338298
Summary:
Normally, inlining does not happen if the caller does not have the
"null-pointer-is-valid"="true" attribute but the callee has it.
However, alwaysinline may force the callee to be inlined.
In this case, if the callee has the "null-pointer-is-valid"="true"
attribute, copy the attribute to the caller.
Reviewers: efriedma, a.elovikov, lebedev.ri, jyknight
Reviewed By: efriedma
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D50000
llvm-svn: 338292
This teaches the outliner to save LR to a register rather than the stack when
possible. This allows us to avoid bumping the stack in outlined functions in
some cases. By doing this, in a later patch, we can teach the outliner to do
something like this:
f1:
...
bl OUTLINED_FUNCTION
...
f2:
...
move LR's contents to a register
bl OUTLINED_FUNCTION
move the register's contents back
instead of falling back to saving LR in both cases.
llvm-svn: 338278
Previously, I thought this was a Windows failure. Then I realized it failed on
every bot that used the verifier. This makes it use the verifier always, and
adds that pass to the pipeline checks so that it's consistent across all bots.
llvm-svn: 338272
Summary:
Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op.
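A hedged C++ illustration of the kind of input this targets (not taken from the patch):
```
#include <cstdint>

// InstCombine turned the rotate's left shift into a multiply by a constant,
// so the DAG sees (mul X, 32) | (lshr X, 27). Extracting the shl from the
// mul (32 == 1 << 5) lets the whole expression match a rotate-left by 5.
uint32_t rotlAfterInstCombine(uint32_t X) {
  return (X * 32u) | (X >> 27);
}
```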
Patch by: sameconrad (Sam Conrad)
Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar
Reviewed By: lebedev.ri
Subscribers: efriedma, kparzysz, llvm-commits
Differential Revision: https://reviews.llvm.org/D47681
llvm-svn: 338270
This reapplies commit r338206 reverted by r338214 since the bug that
r338206 uncovered has been fixed in r338268.
Add support for inline assembly with matching input operand that do not
naturally go in the register class it is constrained to (eg. double in a
32-bit GPR). Note that regular input is already handled by existing
code.
llvm-svn: 338269
It seems like the pass pipeline on Windows is slightly different than on Linux
and macOS. As a result, the arm64-opt-remarks-lazy-bfi test has been failing.
This switches a CHECK-NEXT to a CHECK-DAG to try and get this running properly
again.
It'd be nice to switch it back to a CHECK-NEXT if possible, but the CHECK-NEXT
lines following the line we care about (the optimization remark emitter)
do a pretty good job of enforcing the ordering we want.
Hopefully this works, since I don't have a Windows machine. ;)
Example failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11295
llvm-svn: 338267
This patch enables instructions that are destructive on their
destination and first source operand to be prefixed with a
MOVPRFX instruction.
This patch also adds a variety of tests:
- positive tests for all instructions and forms that accept a
movprfx for either or both predicated and unpredicated forms.
- negative tests for all instructions and forms that do not accept
an unpredicated or predicated movprfx.
- negative tests for the diagnostics that get emitted when a MOVPRFX
instruction is used incorrectly.
This is patch [2/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593
Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D49593
llvm-svn: 338261
This patch adds predicated and unpredicated MOVPRFX instructions, which
can be prepended to SVE instructions that are destructive on their first
source operand, to make them a constructive operation, e.g.
add z1.s, p0/m, z1.s, z2.s <=> z1 = z1 + z2
can be made constructive:
movprfx z0, z1
add z0.s, p0/m, z0.s, z2.s <=> z0 = z1 + z2
The predicated MOVPRFX instruction can additionally be used to zero
inactive elements, e.g.
movprfx z0.s, p0/z, z1.s
add z0.s, p0/m, z0.s, z2.s
Not all instructions can be prefixed with the MOVPRFX instruction
which is why this patch also adds a mechanism to validate prefixed
instructions. The exact rules for when a MOVPRFX applies are detailed in
the SVE supplement of the Architectural Reference Manual.
This is patch [1/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593
Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D49592
llvm-svn: 338258
This makes it easier for someone to copy-paste this line, change the path, and run the command.
Differential Revision: https://reviews.llvm.org/D49201
llvm-svn: 338254
The combination of r338240 and r338242 causes the opt pass pipeline tests to
fail because of how r338242 makes BasicAA be invalidated more often. Adjust the
tests to reflect this.
llvm-svn: 338250
By using PhiValuesAnalysis we can get all the values reachable from a phi, so
we can be more precise instead of giving up when a phi has phi operands. We
can't make BasicAA directly use PhiValuesAnalysis though, as the user of
BasicAA may modify the function in ways that PhiValuesAnalysis can't cope with.
For this optional usage to work correctly, BasicAAWrapperPass now needs to not be
marked as CFG-only (i.e. it is now invalidated even when the CFG is preserved), due
to how the legacy pass manager handles dependent passes being invalidated,
namely the depending pass still has a pointer to the now-dead dependent pass.
Differential Revision: https://reviews.llvm.org/D44564
llvm-svn: 338242
The machine verifier asserts with:
Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542.
It calls analyzeBranch which tries to call getMBB if the opcode is
JMP_1, but in this case we do:
JMP_1 @OUTLINED_FUNCTION
I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used
with brtarget8.
Differential Revision: https://reviews.llvm.org/D49299
llvm-svn: 338237
Summary:
These instructions interact with hardware blocks outside the shader core,
and they can have "scalar" side effects even when EXEC = 0. We don't
want these scalar side effects to occur when all lanes want to skip
these instructions, so always add the execz skip branch instruction
for basic blocks that contain them.
Also ensure that we skip scalar stores / atomics, though we don't
code-gen those yet.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48431
Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44
llvm-svn: 338235