Unpredicated copy of optionally-shifted immediate to SVE vector,
along with MOV-aliases.
This patch contains parsing and printing support for
cpy_imm8_opt_lsl_(i8|i16|i32|i64). This operand allows a signed value in
the range -128 to +127. For element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512.
For element-width of 8 bits a range of -128 to 255 is accepted, since a copy
of a byte can be considered either signed/unsigned.
Note: This patch renames tryParseAddSubImm() -> tryParseImmWithOptionalShift()
and moves the behaviour of trying to shift a plain immediate by an allowed
shift-value to its addImmWithOptionalShiftOperands() method, so that the
parsing itself is generic and allows immediates from multiple shifted operands.
This is done because an immediate can be divisible by both shifted operands.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D47309
llvm-svn: 333263
Previously, this pass would look at the (static) set returned by
getCallPreservedMask() and add those back as preserved in the case when
isSafeForNoCSROpt() returns false.
A problem is that a target may have to save some registers even when NoCSROpt
takes place. For instance, on SystemZ, the return register is needed upon
return from a function.
Furthermore, getCallPreservedMask() only includes the registers that the
target actually wishes to emit save/restore instructions for. This means that
subregs and (fully saved) superregs are missing.
This patch instead takes the (dynamic) set returned by target for the
function from determineCalleeSaves() and then adds sub/super regs to build
the set to be used when building the RegMask for the function.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D46315
llvm-svn: 333261
Summary:
Lower control flow did not correctly handle the case that a loop break
in if/else was on a condition that was not guaranteed to be masked by
exec. The first test kernel shows an example of this going wrong; after
exiting the loop, exec is all ones, even if it was not before the loop.
The fix is for lowering of if-break and else-break to insert an
S_AND_B64 to mask the break condition with exec. This commit also
includes the optimization of not inserting that S_AND_B64 if it is
obviously not needed because the break condition is the result of a
V_CMP in the same basic block.
V2: Addressed some review comments.
V3: Test fixes.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D44046
Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c
llvm-svn: 333258
Re-add the feature flag for invpcid, which was removed in r294561.
Add an intrinsic, which always uses a 32 bit integer as first argument,
while the instruction actually uses a 64 bit register in 64 bit mode
for the INVPCID_TYPE argument.
Reviewers: craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47141
llvm-svn: 333255
The plan had always been to move towards using this rather than so much
in-pass simplification within the loop pipeline, but we never got around
to it.... until only a couple months after it was removed due to disuse.
=/
This commit is just a pure revert of the removal. I will add tests and
do some basic cleanup in follow-up commits. Then I'll wire it into the
loop pass pipeline.
Differential Revision: https://reviews.llvm.org/D47353
llvm-svn: 333250
Setting the "Debug Info Version" module flag makes it possible to pipe
synthetic debug info into llc, which is useful for testing backends.
llvm-svn: 333237
When a GEP with all zero indices is converted to bitcast, its DI wasn't
copied over to the newly created instruction. This patch fixes that bug.
Patch by Kareem Ergawy!
Differential Revision: https://reviews.llvm.org/D47347
llvm-svn: 333235
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. Need to check alive to make sure there are no corner cases.
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333226
Reassociation of math ops in some contexts (especially vector contexts)
has generally only been happening when the 'fast' FMF was set. This
enables reassoication when only the finer grained controls 'reassoc' and
'nsz' are set.
Differential Revision: https://reviews.llvm.org/D47335
llvm-svn: 333221
The existing code has three different ways to try to lower a 64-bit
immediate to the sequence ORR+MOVK. The result is messy: it misses
some possible sequences, and the order of the checks means we sometimes
emit two MOVKs when we only need one.
Instead, just use a simple loop to try all possible two-instruction
ORR+MOVK sequences.
Differential Revision: https://reviews.llvm.org/D47176
llvm-svn: 333218
Previously update_mca_test_checks worked entirely at "block" level where
a block is some sequence of lines delimited by at least one empty line.
This generally worked well, but could sometimes lead to excessive
repetition of check lines for various prefixes if some block was almost
identical between prefixes, but not quite (for example, due to a
different dispatch width in the otherwise identical summary views).
This new analyis attempts to split blocks further in the case where the
following conditions are met:
a) There is some prefix common to every RUN line (typically 'ALL').
b) The first line of the block is common to the output with every prefix.
c) The block has the same number of lines for the output with every prefix.
Also, regenerated all llvm-mca test files with the following command:
update_mca_test_checks.py "../test/tools/llvm-mca/*/*.s" "../test/tools/llvm-mca/*/*/*.s"
The new analysis showed a "multiple lines not disambiguated by prefixes" warning
for test "AArch64/Exynos/scheduler-queue-usage.s" so I've also added some
explicit prefixes to each of the RUN lines in that test.
Differential Revision: https://reviews.llvm.org/D47321
llvm-svn: 333204
Summary:
In LICM, CFG could be changed in splitPredecessorsOfLoopExit(), which update
only DT and LoopInfo. Therefore, we should preserve only DT and LoopInfo specifically,
instead of all analyses that depend on the CFG (setPreservesCFG()).
This change should fix PR37323.
Reviewers: uabelho, davide, dberlin, Ka-Ka
Reviewed By: dberlin
Subscribers: mzolotukhin, bjope, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46775
llvm-svn: 333198
The ARM/ARM64 AESE and AESD instructions have a builtin XOR as the first step in
the instruction. Therefore, if the AES key is zero and the AES data was
previously XORed, it can be combined into a single instruction.
Differential Revision: https://reviews.llvm.org/D47239
Patch by Michael Brase!
llvm-svn: 333193
This commit adds a color category so tools can document this option and
enables it for dwarfdump and dsymuttil.
rdar://problem/40498996
llvm-svn: 333176
To do this:
1. Add fixup_riscv_relax fixup types which eventually will
transfer to R_RISCV_RELAX relocation types.
2. Insert R_RISCV_RELAX relocation types to auipc function call
expression when linker relaxation enabled.
Differential Revision: https://reviews.llvm.org/D44886
llvm-svn: 333158
Summary:
If NaryReassociate succeed it will, when replacing the old instruction
with the new instruction, also recursively delete trivially
dead instructions from the old instruction. However, if the input to the
NaryReassociate pass contain dead code it is not save to recursively
delete trivially deadinstructions as it might lead to deleting the newly
created instruction.
This patch will fix the problem by using WeakVH to detect this
rare case, when the newly created instruction is dead, and it will then
restart the basic block iteration from the beginning.
This fixes pr37539
Reviewers: tra, meheff, grosser, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47139
llvm-svn: 333155
The match pattern in the definition of LXSDX is xoaddr, so the Pseudo
instruction XFLOADf64 never gets selected. XFLOADf64 expands to LXSDX/LFDX post
RA based on the register pressure. To avoid ambiguity, we need to remove the
select pattern for LXSDX, same as what was done for LXSD. STXSDX also have
the same issue.
Patch by Qing Shan Zhang (steven.zhang).
Differential Revision: https://reviews.llvm.org/D47178
llvm-svn: 333150
This patch continues a series of patches started by r332907 (reapplied
as r332917).
In this commit we move register bank checks back from epilogue of
every rule matcher to a position locally close to the rest of the
checks for a particular (nested) instruction.
This increases the number of common conditions within 2nd level
groups.
This is expected to decrease time GlobalISel spends in its
InstructionSelect pass by about 2% for an -O0 build as measured on
sqlite3-amalgamation (http://sqlite.org/download.html) targeting
AArch64 (cross-compile on x86).
Reviewers: qcolombet, dsanders, bogner, aemerson, javed.absar
Reviewed By: qcolombet
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44700
llvm-svn: 333144
This patch continues a series of patches started by r332907 (reapplied
as r332917).
In this commit we greedily stuff 2nd level GroupMatcher's common
conditions with as many predicates as possible. This is purely
post-processing and it doesn't change which rules are put into the
groups in the first place: that decision is made by looking at the
first common predicate only.
The compile time improvements are minor and well within error margin,
however, it's highly improbable that this transformation could
pessimize performance, thus I'm still committing it for potential
gains for targets not implementing GlobalISel yet and out of tree
targets.
Reviewers: qcolombet, dsanders, bogner, aemerson, javed.absar
Reviewed By: qcolombet
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44700
llvm-svn: 333139
When a bitcast is being sunk in -codegenprepare pass, its DI wasn't
copied over to the newly created instruction. This patch fixes that
bug.
Patch by Kareem Ergawy!
Differential Revision: https://reviews.llvm.org/D47282
llvm-svn: 333133
Summary:
Set CostPerUse higher for registers that are not used in the compressed
instruction set. This will influence the greedy register allocator to reduce
the use of registers that can't be encoded in 16 bit instructions. This
affects register allocation even when compressed instruction isn't targeted,
we see no major negative codegen impact.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang
Differential Revision: https://reviews.llvm.org/D47039
llvm-svn: 333132
This is a small follow-up to the revisions r333117 and r331663.
1. Avoid the name conflicts of the generated variables for prefixes.
2. Apply clang-format -i -style=llvm to llvm-objcopy.cpp once again.
3. Add a test for the flag with double dash.
Test plan: make check-all
llvm-svn: 333120
Implemente patterns to extract [Un]signed Word vector element and convert to
quad-precision.
Differential Revision: https://reviews.llvm.org/D46536
llvm-svn: 333115
This patch continues a series of patches started by r332907 (reapplied
as r332917)
In this commit we sort type checks towards the beginning of every rule
within the MatchTable as they fail often and it's best to fail early.
This is expected to decrease time GlobalISel spends in its
InstructionSelect pass by roughly 7% for an -O0 build as measured on
sqlite3-amalgamation (http://sqlite.org/download.html) targeting
AArch64. The amalgamation is a large single-file C-source that makes
compiler backend performance improvements to stand out from frontend.
It's also a part of CTMark.
Reviewers: qcolombet, dsanders, bogner, aemerson, javed.absar
Reviewed By: qcolombet
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44700
llvm-svn: 333114
Implemente patterns to extract [Un]signed DWord vector element and convert to
quad-precision.
Differential Revision: https://reviews.llvm.org/D46333
llvm-svn: 333112
Summary:
StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order.
The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops.
To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited
before moving on to outer loop.
However, we found a problem for a SubRegion which is a loop itself:
--> BB1 --> BB2 --> BB3 -->
In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead
BB2 to be placed in the wrong order.
In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth
to guard the sorting.
Reviewers:
arsenm, jlebar
Differential Revision:
https://reviews.llvm.org/D46912
llvm-svn: 333111
Summary:
Finally fixes [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]].
Now that the backend is all done, we can finally fold it!
The canonical unfolded masked merge pattern is
```(x & m) | (y & ~m)```
There is a second, equivalent variant:
```(x | ~m) & (y | m)```
Only one of them (the or-of-and's i think) is canonical.
And if the mask is not a constant, we should fold it to:
```((x ^ y) & M) ^ y```
https://rise4fun.com/Alive/ndQw
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: nicholas, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D46814
llvm-svn: 333106
This patch implements the "block reciprocal throughput" computation in the
SummaryView.
The block reciprocal throughput is computed as the MAX of:
- NumMicroOps / DispatchWidth
- Resource Cycles / #Units (for every resource consumed).
The block throughput is bounded from above by the hardware dispatch throughput.
That is because the DispatchWidth is an upper bound on how many opcodes can be part
of a single dispatch group.
The block throughput is also limited by the amount of hardware parallelism. The
number of available resource units affects how the resource pressure is
distributed, and also how many blocks can be delivered every cycle.
llvm-svn: 333095
For RISC-V it is desirable to have relaxation happen in the linker once
addresses are known, and as such the size between two instructions/byte
sequences in a section could change.
For most assembler expressions, this is fine, as the absolute address results
in the expression being converted to a fixup, and finally relocations.
However, for expressions such as .quad .L2-.L1, the assembler folds this down
to a constant once fragments are laid out, under the assumption that the
difference can no longer change, although in the case of linker relaxation the
differences can change at link time, so the constant is incorrect. One place
where this commonly appears is in debug information, where the size of a
function expression is in a form similar to the above.
This patch extends the assembler to allow an AsmBackend to declare that it
does not want the assembler to fold down this expression, and instead generate
a pair of relocations that allow the linker to carry out the calculation. In
this case, the expression is not folded, but when it comes to emitting a
fixup, the generic FK_Data_* fixups are converted into a pair, one for the
addition half, one for the subtraction, and this is passed to the relocation
generating methods as usual. I have named these FK_Data_Add_* and
FK_Data_Sub_* to indicate which half these are for.
For RISC-V, which supports this via e.g. the R_RISCV_ADD64, R_RISCV_SUB64 pair
of relocations, these are also set to always emit relocations relative to
local symbols rather than section offsets. This is to deal with the fact that
if relocations were calculated on e.g. .text+8 and .text+4, the result 12
would be stored rather than 4 as both addends are added in the linker.
Differential Revision: https://reviews.llvm.org/D45181
Patch by Simon Cook.
llvm-svn: 333079
Loop unswitching makes substantial changes to a loop that can also affect cached
SCEV info in its outer loops as well, but it only cares to invalidate SCEV cache for the
innermost loop in case of full unswitching and does not invalidate anything at all in
case of trivial unswitching. As result, we may end up with incorrect data in cache.
Differential Revision: https://reviews.llvm.org/D46045
Reviewed By: mzolotukhin
llvm-svn: 333072
Summary:
Patch for capture tracking broke
bootstrap of clang with -fstict-vtable-pointers
which resulted in debbugging nightmare. It was fixed
https://reviews.llvm.org/D46900 but as it turned
out, there were other parts like inliner (computing of
noalias metadata) that I found after bootstraping with enabled
assertions.
Reviewers: hfinkel, rsmith, chandlerc, amharc, kuhar
Subscribers: JDevlieghere, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47088
llvm-svn: 333070
SafepointIRVerifier crashed while traversing blocks without a DomTreeNode.
This could happen with a custom pipeline or when some optional passes were skipped by OptBisect.
SafepointIRVerifier is fixed to traverse basic blocks that are reachable from entry. Test are added.
Patch Author: Yevgeny Rouban!
Reviewers: anna, reames, dneilson, DaniilSuchkov, skatkov
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47011
llvm-svn: 333063
This patch continues a series of patches started by r332907 (reapplied
as r332917)
In this commit we start grouping rules with common first condition on
the second level of the table.
This is expected to decrease time GlobalISel spends in its
InstructionSelect pass by roughly 13% for an -O0 build as measured on
sqlite3-amalgamation (http://sqlite.org/download.html) targeting
AArch64.
Reviewers: qcolombet, dsanders, bogner, aemerson, javed.absar
Reviewed By: qcolombet
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44700
llvm-svn: 333053
Also, produce the canonical IR abs (s<0) to be more efficient.
This is the libcall equivalent of the clang builtin change from:
rL333038
Pasting from that commit message:
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."
That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.
llvm-svn: 333042
Summary: Previous patch does not care if a value is changed between calloc and strlen. This needs to be removed from InstCombine and maybe moved to DSE later after some rework.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47218
llvm-svn: 333022
The integer operation convertion for some reason only happens
if the source is a bitcast from an integer, which happens to
always be the situation when the result is loaded. Add
an additional pattern for when the source operation is really
an FP operation.
llvm-svn: 333019
This patch continues a series of patches started by r332907 (reapplied
as r332917)
In this commit we introduce a new matching opcode GIM_SwitchOpcode
that implements a jump table over opcodes and start emitting them for
root instructions.
This is expected to decrease time GlobalISel spends in its
InstructionSelect pass by roughly 20% for an -O0 build as measured on
sqlite3-amalgamation (http://sqlite.org/download.html) targeting
AArch64.
To some degree, we assume here that the opcodes form a dense set,
which is true at the moment for all upstream targets given the
limitations of our rule importing mechanism.
It might not be true for out of tree targets, specifically due to
pseudo's. If so, we might noticeably increase the size of the
MatchTable with this patch due to padding zeros. This will be
addressed later.
Reviewers: qcolombet, dsanders, bogner, aemerson, javed.absar
Reviewed By: qcolombet
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44700
llvm-svn: 333017
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.
In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.
To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.
Differential Revision: https://reviews.llvm.org/D47173
llvm-svn: 333015
If one runs llvm-objcopy --strip-all --keep-symbol foo
and the symbol table indeed contains the symbol "foo"
then it should not be removed.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D47052
llvm-svn: 333008
This patch fixes two bugs:
* test1: Previously assume(a >= 5) concluded that a == 5. That's only
valid for assume(a == 5)...
* test2: If operands were swapped, additional users were added to the
wrong cmp operand. This resulted in an "unsettled iteration"
assertion failure.
Patch by Nikita Popov
Differential Revision: https://reviews.llvm.org/D46974
llvm-svn: 333007
In DWARF v5, the DWO ID is in the (split/skeleton) CU header, not an
attribute on the CU DIE.
This changes the size of those headers, so use the parsed size whenever
we have one, for simplicitly.
Differential Revision: https://reviews.llvm.org/D47158
llvm-svn: 333004
Some ISA's such as microMIPS32(R6) have instructions which are near identical
for code generation purposes, e.g. xor and xor16. These instructions take the
same value types for operands and return values, have the same
instruction predicates and map to the same ISD opcode. (These instructions do
differ by register classes.)
In such cases, the FastISel generator rejects the instruction definition.
This patch borrows the 'FastIselShouldIgnore' bit from rL129692 and enables
applying it to an instruction definition.
Reviewers: mcrosier
Differential Revision: https://reviews.llvm.org/D46953
llvm-svn: 332983
MipsLongBranchPass and MipsHazardSchedule passes are joined to one pass
because of mutual conflict. When MipsHazardSchedule inserts 'nop's, it
potentially breaks some jumps, so they have to be expanded to long
branches. When some branch is expanded to long branch, it potentially
creates a hazard situation, which should be fixed by adding nops.
New pass is called MipsBranchExpansion, it combines these two passes,
and runs them alternately until one of them reports no changes were made.
Differential Revision: https://reviews.llvm.org/D46641
llvm-svn: 332977
This enables us to detect more fast path sdiv cases under cost analysis.
This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.
Found while working on D46276
Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.
Differential Revision: https://reviews.llvm.org/D46637
llvm-svn: 332969
Summary:
When lowerswitch merge several cases into a new default block it's not
updating the PHI nodes accordingly. The code that update the PHI nodes
for the default edge only update the first entry and do not remove the
remaining ones, to make sure the number of entries match the number of
predecessors.
This is easily fixed by replacing the code that update the PHI node with
the already existing utility function for updating PHI nodes.
Reviewers: hans, reames, arsenm
Reviewed By: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D47055
llvm-svn: 332960
Summary:
In LoopVersioning::addPHINodes we need to iterate over all
users for a value "Inst", and if the user is outside of the
VersionedLoop we should replace the use of "Inst" by using
the value "PN" instead.
Replacing the use of "Inst" for a user of "Inst" also means
that Inst->users() is modified. So it is not safe to do the
replace while iterating over Inst->users() as we used to do.
This patch splits the task into two steps. First we iterate
over Inst->users() to find all users that should be updated.
Those users are saved into a local data structure on the stack.
And then, in the second step, we do the actual updates. This
time iterating over the local data structure.
Reviewers: mzolotukhin, anemet
Reviewed By: mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47134
llvm-svn: 332958
We can eliminate old value if bound_ctrl = 1 and row_mask = bank_mask = 0xf.
This is alternative implementation working with the intrinsic in InstCombine.
Original review for past-ISel optimization: D46570.
Differential Revision: https://reviews.llvm.org/D46596
llvm-svn: 332956
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.
Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.
llvm-svn: 332953
For both argument and return types, promote illegal types like i24 to i32,
and if a type can't be easily promoted, clear out the signature before
bailing out, so avoid leaving it in a partially complete state.
Fixes PR37546.
llvm-svn: 332947
This patch continues a series of patches that decrease time spent by
GlobalISel in its InstructionSelect pass by roughly 60% for -O0 builds
for large inputs as measured on sqlite3-amalgamation
(http://sqlite.org/download.html) targeting AArch64.
This commit specifically removes number of operands checks that are
redundant if the instruction's opcode already guarantees that number
of operands (or more), and also avoids any kind of checks on a def
operand of a nested instruction as everything about it was already
checked at its use.
The expected performance implication is about 3% off InstructionSelect
comparing to the baseline (before the series of patches)
This patch also contains a bit of NFC changes required for further
patches in the series.
Every commit planned shares the same Phabricator Review.
Reviewers: qcolombet, dsanders, bogner, aemerson, javed.absar
Reviewed By: qcolombet
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44700
llvm-svn: 332945
This is the FP sibling of D43141 with the corresponding IR change in rL327212.
We can't propagate undef here because if a variable operand is a NaN, these
binops must propagate NaN. Neither global nor node-level fast-math makes a
difference. If we have 'nnan', I think later folds can turn the NaN into undef.
The tests in X86/fp-undef.ll are meant to be the definitive verification for
these folds - everything reduces identically now.
The other test changes are collateral damage. They may need to be altered to
preserve their intent.
Differential Revision: https://reviews.llvm.org/D47026
llvm-svn: 332920
Apparently the compile time problem was caused by the fact that not
all compilers / STL implementations can automatically convert
std::unique_ptr<Derived> to std::unique_ptr<Base>. Fixed (hopefully)
by making sure it's std::unique_ptr<Derived>&& (rvalue ref) to
std::unique_ptr<Base> conversion instead.
llvm-svn: 332917
This patch starts a series of patches that decrease time spent by
GlobalISel in its InstructionSelect pass by roughly 60% for -O0 builds
for large inputs as measured on sqlite3-amalgamation
(http://sqlite.org/download.html) targeting AArch64.
The performance improvements are achieved solely by reducing the
number of matching GIM_* opcodes executed by the MatchTable's
interpreter during the selection by approx. a factor of 30, which also
brings contribution of this particular part of the selection process
to the overall runtime of InstructionSelect pass down from approx.
60-70% to 5-7%, thus making further improvements in this particular
direction not very profitable.
The improvements described above are expected for any target that
doesn't have many complex patterns. The targets that do should
strictly benefit from the changes, but by how much exactly is hard to
estimate beforehand. It's also likely that such target WILL benefit
from further improvements to MatchTable, most likely the ones that
bring it closer to a perfect decision tree.
This commit specifically is rather large mostly NFC commit that does
necessary preparation work and refactoring, there will be a following
series of small patches introducing a specific optimization each
shortly after.
This commit specifically is expected to cause a small compile time
regression (around 2.5% of InstructionSelect pass time), which should
be fixed by the next commit of the series.
Every commit planned shares the same Phabricator Review.
Reviewers: qcolombet, dsanders, bogner, aemerson, javed.absar
Reviewed By: qcolombet
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44700
llvm-svn: 332907
Summary:
As pointed out in D46528, we errneously transform cases like `xor X, -1`,
even though we use said function.
It's because the `-1` is actually a bitcast there.
So i think we can just look through it in the function.
Differential Revision: https://reviews.llvm.org/D47156
llvm-svn: 332905
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].
[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated.
Differential Revision: https://reviews.llvm.org/D46528
llvm-svn: 332904
SimplifyDemandedBits can remove bits from the masks for the shift amounts we need to see to detect rotates.
This patch uses zeroes from computeKnownBits to fill in some of these mask bits to make the match work.
As currently written this calls computeKnownBits even when the mask hasn't been simplified because it made the code simpler. If we're worried about compile time performance we can improve this.
I know we're talking about making a rotate intrinsic, but hopefully we can go ahead and do this change and just make sure the rotate intrinsic also handles it.
Differential Revision: https://reviews.llvm.org/D47116
llvm-svn: 332895
This removes 6 intrinsics since we no longer need separate mask and maskz intrinsics.
Differential Revision: https://reviews.llvm.org/D47124
llvm-svn: 332890
In all cases, we're pulling the cast above the select.
That's not a good canonicalization if we're creating
a select that then mismatches the operand size of its
condition.
llvm-svn: 332883
Summary:
The correct spelling is "DWARF", the debugging format, not "DWARG".
The typo is in a (not executed by lit) comment in a test file, so
fixing it does not result in any functional change.
Test Plan: check-llvm, just in case
llvm-svn: 332878
Rather than relying on the user to do the address calculating in
DW_AT_location we should just dump the absolute address.
rdar://problem/38513870
Differential revision: https://reviews.llvm.org/D47152
llvm-svn: 332873
Change matchSelectPattern to return X and -X for ABS/NABS in a well defined order. Adjust EarlyCSE to account for this. Ensure the SPF result is some kind of min/max and not abs/nabs in one place in InstCombine that made me nervous.
Prevously we returned the two operands of the compare part of the abs pattern. The RHS is always going to be a 0i, 1 or -1 constant. This isn't a very meaningful thing to return for any one. There's also some freedom in the abs pattern as to what happens when the value is equal to 0. This freedom led to early cse failing to match when different constants were used in otherwise equivalent operations. By returning the input and its negation in a defined order we can ensure an exact match. This also makes sure both patterns use the exact same subtract instruction for the negation. I believe CSE should evebntually make this happen and properly merge the nsw/nuw flags. But I'm not familiar with CSE and what order it does things in so it seemed like it might be good to really enforce that they were the same.
Differential Revision: https://reviews.llvm.org/D47037
llvm-svn: 332865
r332654 was reverted due to an unused function warning in
release build. This commit includes the same code with the
warning silenced.
Differential Revision: https://reviews.llvm.org/D44338
llvm-svn: 332860
Summary:
This patch fixes PR37526 by simplifying the newly generated LoadInst
instructions. If the pointer address is a bitcast from the pointer to
the NewType, we can just remove this extra bitcast instead of creating
the new one. This fixes the PR37526 + may speed up the whole compilation
process.
Reviewers: spatel, RKSimon, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47144
llvm-svn: 332855
Summary: Add LLVMDIBuilderCreateObjCIVar, LLVMDIBuilderCreateObjCProperty, and LLVMDIBuilderCreateInheritance to allow declaring metadata for Objective-C class hierarchies and their associated properties and instance variables.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: harlanhaskins, llvm-commits
Differential Revision: https://reviews.llvm.org/D47123
llvm-svn: 332850
Chances are we'll be asked again after type legalization, but before that point
it's better to claim misaligned accesses aren't allowed than to assert.
llvm-svn: 332840
We were previously using a DT in CVP through SimplifyQuery, but not requiring it in
the new pass manager. Hence it would crash if DT was not already available. This now
gets DT directly and plumbs it through to where it is used (instead of using it
through SQ).
llvm-svn: 332836
MipsLongBranchPass and MipsHazardSchedule passes are joined to one pass
because of mutual conflict. When MipsHazardSchedule inserts 'nop's, it
potentially breaks some jumps, so they have to be expanded to long
branches. When some branch is expanded to long branch, it potentially
creates a hazard situation, which should be fixed by adding nops.
New pass is called MipsBranchExpansion, it combines these two passes,
and runs them alternately until one of them reports no changes were made.
Differential Revision: https://reviews.llvm.org/D46641
llvm-svn: 332834
As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations as the upper 32bits of the multiply will contain the 'wrapped' bits of the rotation.
v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering.
Differential Revision: https://reviews.llvm.org/D46954
llvm-svn: 332832
Summary: Add wrappers for a module's alias iterators and a getter and setter for the aliasee value.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits, harlanhaskins
Differential Revision: https://reviews.llvm.org/D46808
llvm-svn: 332826
Previously the compiler was using the microMIPSR3 variants, incorrectly.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D46948
llvm-svn: 332820
We already do this for min/max (see the blob above the diff),
so we should do the same for abs/nabs.
A sign-bit check (<s 0) is used as a predicate for other IR
transforms and it's likely the best for codegen.
This might solve the motivating cases for D47037 and D47041,
but I think those patches still make sense. We can't guarantee
this canonicalization if the icmp has more than one use.
Differential Revision: https://reviews.llvm.org/D47076
llvm-svn: 332819
In the patch rL329547, we have lifted the over-restrictive limitation on collected range
checks, allowing to work with range checks with the end of their range not being
provably non-negative. However it appeared that the non-negativity of this value was
assumed in the utility function `ClampedSubtract`. In particular, its reasoning is based
on the fact that `0 <= SINT_MAX - X`, which is not true if `X` is negative.
The function `ClampedSubtract` is only called twice, once with `X = 0` (which is OK)
and the second time with `X = IRC.getEnd()`, where we may now see the problem if
the end is actually a negative value. In this case, we may sometimes miscompile.
This patch is the conservative fix of the miscompile problem. Rather than rejecting
non-provably non-negative `getEnd()` values, we will check it for non-negativity in
runtime. For this, we use function `smax(smin(X, 0), -1) + 1` that is equal to `1` if `X`
is non-negative and is equal to 0 if `X` is negative. If we multiply `Begin, End` of safe
iteration space by this function calculated for `X = IRC.getEnd()`, we will get the original
`[Begin, End)` if `IRC.getEnd()` was non-negative (and, thus, `ClampedSubtract` worked
correctly) and the empty range `[0, 0)` in case if ` IRC.getEnd()` was negative.
So we in fact prohibit execution of the main loop if at least one of range checks was
made against a negative value (and we figured it out in runtime). It is still better than
what we have before (non-negativity had to be proved in compile time) and prevents
us from miscompile, however it is sometiles too restrictive for unsigned range checks
against a negative value (which in fact can be eliminated).
Once we re-implement `ClampedSubtract` in a way that it handles negative `X` correctly,
this limitation can be lifted, too.
Differential Revision: https://reviews.llvm.org/D46860
Reviewed By: samparker
llvm-svn: 332809
The evaluator goes through BB and creates global vars as temporary values to evaluate
results of LLVM instructions. It creates undef for alloca, however it assumes alloca
in addr space 0. If the next instruction is addrspace cast to 0, then we get an invalid
cast instruction.
This patch let the temp global var have an address space matching alloca addr space,
so that the valuation can be done.
Differential Revision: https://reviews.llvm.org/D47081
llvm-svn: 332794
Summary:
invariant.group.launder should not stop propagation
of nonnull and dereferenceable, because e.g. we would not be
able to hoist loads speculatively.
Reviewers: rsmith, amharc, kuhar, xbolva00, hfinkel
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D46972
llvm-svn: 332788
Summary:
This feature is not needed, but it might be usefull in the future
to use metadata to mark what which function should support it
(and strip it when not).
Reviewers: rsmith, sanjoy, amharc, kuhar
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45419
llvm-svn: 332787
Summary:
Memdep had funny bug related to invariant.groups - because it did not
invalidated cache, in some very rare cases it was possible to show memory
dependence of the instruction that was deleted, but because other
instruction took it's place it resulted in call to vtable!
Thanks @amharc for repro!.
Reviewers: dberlin, kuhar, amharc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45320
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 332781
Eliminate loads from the dispatch packet when they will have
a known value.
Also pattern match the code used by the library to handle partial
workgroup dispatches, which isn't necessary if reqd_work_group_size
is used.
llvm-svn: 332771
Summary:
Floating point division by zero or even undef does not have undefined
behavior and may occur due to optimizations.
Fixes https://bugs.llvm.org/show_bug.cgi?id=37523.
Reviewers: kcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47085
llvm-svn: 332761
The code that generates post-increments for Hexagon considered
integer values only. This patch adds support to generate them for
floating point values, f32 and f64.
Differential Revision: https://reviews.llvm.org/D47036
llvm-svn: 332748
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737)
The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)
llvm-svn: 332745
The intrinsic legalization for masked truncate uses ISD::TRUNCATE which can be constant folded by getNode. This prevents getVectorMaskingNode from seeing the ISD::TRUNCATE special case where it should emit X86ISD::SELECT instead of ISD::VSELECT. This causes a vselect with a v16i1 or v8i1 condition to be emitted during vector legalization. but vector legalization doesn't revisit nodes it creates. DAG combine will then promote this condition to match the result type. Then op legalization will try to legalize it, but the custom lowering hook returned SDValue(). But op legalization doesn't have an Expand for VSELECT because it expects vector legalization to have taken care of it. So the operation sticks around and fails in isel.
This patch adds a custom legalization hook to morph it to a vXi8 vselect instead.
This also simplifies the normal vXi16 vselect handling because vector legalization was normally expanding to AND/ANDN/OR and DAG combine was turning that into VBLENDVB. So we can skip a step by doing it directly.
Fixes PR37499
Differential Revision: https://reviews.llvm.org/D47025
llvm-svn: 332743
This is a revert of the changes from https://reviews.llvm.org/D46265;
the new test introduced (test/CodeGen/X86/PR37310.mir) causes buildbot
failures.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47061
llvm-svn: 332742
Avoid requirement that number of values must be known at assembler
time.
Fixes PR33586.
Reviewers: rnk, peter.smith, echristo, jyknight
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D46703
llvm-svn: 332741
This patch adds a remark which tells the user when a pass changes the number of
IR instructions in a module.
It can be enabled by using -Rpass-analysis=size-info.
The point of this is to make it easier to collect statistics on how passes
modify programs in terms of code size. This is similar in concept to timing
reports, but using a remark-based interface makes it easy to diff changes over
multiple compilations of the same program.
By adding functionality like this, we can see
* Which passes impact code size the most
* How passes impact code size at different optimization levels
* Which pass might have contributed the most to an overall code size
regression
The patch lives in the legacy pass manager, but since it's simply emitting
remarks, it shouldn't be too difficult to adapt the functionality to the new
pass manager as well. This can also be adapted to handle MachineInstr counts in
code gen passes.
https://reviews.llvm.org/D38768
llvm-svn: 332739
Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc.
Fixes BtVer2/SLM which have different behaviours for GPR stores.
llvm-svn: 332718
Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc.
Fixes BtVer2/SLM which have different behaviours for GPR stores.
llvm-svn: 332714
Summary:
Avoid assert/crash during liveness calculation in situations where the
incoming machine function has statically unreachable BBs.
Fixes PR37130.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46265
llvm-svn: 332707
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.
Comes with a clang patch (D46881).
Patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D46882
llvm-svn: 332705
Summary:
Fix a case where FoldBranchToCommonDest() would bail out from doing CSE
when encountering a debug intrinsic. Handle that by skipping past the
debug intrinsics.
Also, as a minor refactoring, rename checkCSEInPredecessor() to
tryCSEWithPredecessor() to make it a bit more clear that the function
may remove instructions.
Reviewers: fhahn, craig.topper, dblaikie, xbolva00
Reviewed By: fhahn, xbolva00
Subscribers: vsk, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D46635
llvm-svn: 332698
For RISCV branch instructions, we need to preserve relocation types when linker
relaxation enabled, so then linker could modify offset when the branch offsets
changed.
We preserve relocation types by define shouldForceRelocation.
IsResolved return by evaluateFixup will always false when shouldForceRelocation
return true. It will make RISCV MC Branch Relaxation always relax 16-bit
branches to 32-bit form, even if the symbol actually could be resolved.
To avoid 16-bit branches always relax to 32-bit form when linker relaxation
enabled, we add a new parameter WasForced to indicate that the symbol actually
couldn't be resolved and not forced by shouldForceRelocation return true.
RISCVAsmBackend::fixupNeedsRelaxationAdvanced could relax branches with
unresolved symbols by (!IsResolved && !WasForced).
RISCV MC Branch Relaxation is needed because RISCV could perform 32-bit
to 16-bit transformation in MC layer.
Differential Revision: https://reviews.llvm.org/D46350
llvm-svn: 332696
CanProveNotTakenFirstIteration utility does not handle the case when
condition of the branch is a constant. Add its handling.
Reviewers: reames, anna, mkazantsev
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46996
llvm-svn: 332695
The Darwin build bot failed with:
```
llc -mcpu=skylake-avx512 -mtriple=x86_64-unknown-linux-gnu domain-reassignment-test.ll -o - | llvm-mc
--
Exit Code: 134
Command Output (stderr):
--
Assertion failed: (MAI->hasSingleParameterDotFile()), function EmitFileDirective, file lib/MC/MCAsmStreamer.cpp, line 1087.
```
Looks like this is because the `llvm-mc` command was missing a triple
directive and defaulting to MachO. Add the triple option.
llvm-svn: 332694