When we are scheduling the load and addi, if all other heuristic didn't take effect,
we will try to schedule the addi before the load, to hide the latency, and avoid the
true dependency added by RA. And this only take effects for Power9.
Differential Revision: https://reviews.llvm.org/D61930
llvm-svn: 361600
Summary:
Appears identical to powerpc64{,le}.
Regenerate test that is being affected by upcoming patch.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: nemanjai, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62339
llvm-svn: 361543
Summary:
For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2].
The following results are expected:
ELFv1 when using:
-target powerpc64-unknown-freebsd12.0
-target powerpc64-unknown-freebsd12.0 -mabi=elfv1
-target powerpc64-unknown-freebsd12.0-elfv1
ELFv2 when using:
-target powerpc64-unknown-freebsd12.0 -mabi=elfv2
-target powerpc64-unknown-freebsd12.0-elfv2
[1] https://wiki.freebsd.org/powerpc/llvm-elfv2
[2] https://clang.llvm.org/docs/CrossCompilation.html
Patch by Alfredo Dal'Ava Júnior!
Differential Revision: https://reviews.llvm.org/D61950
llvm-svn: 361355
With a fix for PR41917: The predecessor list was changing under our feet.
- for (BasicBlock *Pred : predecessors(EntryBlock_)) {
+ while (!pred_empty(EntryBlock_)) {
+ BasicBlock* const Pred = *pred_begin(EntryBlock_);
llvm-svn: 361009
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lround/llround generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
llvm-svn: 360889
Trace through multiple COPYs when looking for a physreg source. Add
hinting for vregs that will be copied into physregs (we only hinted
for vregs getting copied to a physreg previously). Give hinted a
register a bonus when deciding which value to spill. This is part of
my rewrite regallocfast series. In fact this one doesn't even have an
effect unless you also flip the allocation to happen from back to
front of a basic block. Nonetheless it helps to split this up to ease
review of D52010
Patch by Matthias Braun
llvm-svn: 360887
Summary:
The emitError path allows the program to continue, unlike report_fatal_error.
This is friendlier to use cases where LLVM is embedded in a larger program,
because the caller may be able to deal with the error somewhat gracefully.
Change the number of requested NOP bytes in the AArch64 and PowerPC
test cases to avoid triggering an unrelated assertion. The compilation
still fails, as verified by the test.
Change-Id: Iafb9ca341002a597b82e59ddc7a1f13c78758e3d
Reviewers: arsenm, MatzeB
Subscribers: qcolombet, nemanjai, wdng, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61489
llvm-svn: 360786
Instead of patching the original blocks, we now generate new blocks and
delete the old blocks. This results in simpler code with a less twisted
control flow (see the change in `entry-block-shuffled.ll`).
This will make https://reviews.llvm.org/D60318 simpler by making it more
obvious where control flow created and deleted.
Reviewers: gchatelet
Subscribers: hiraditya, llvm-commits, spatel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61736
llvm-svn: 360771
The 3-field form was introduced by D3499 in 2014 and the legacy 2-field
form was planned to be removed in LLVM 4.0
For the textual format, this patch migrates the existing 2-field form to
use the 3-field form and deletes the compatibility code.
test/Verifier/global-ctors-2.ll checks we have a friendly error message.
For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the
2-field form (add i8* null as the third field).
Reviewed By: rnk, dexonsmith
Differential Revision: https://reviews.llvm.org/D61547
llvm-svn: 360742
For known CRBit spills, CRSET/CRUNSET, it is more efficient to load and spill
the known value instead of extracting the bit.
eg. This sequence is currently used to spill a CRUNSET:
crclr 4*cr5+lt
mfocrf r3,4
rlwinm r3,r3,20,0,0
stw r3,132(r1)
This patch custom lower it to:
li r3,0
stw r3,132(r1)
Differential Revision: https://reviews.llvm.org/D61754
llvm-svn: 360677
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.
Link: https://github.com/ClangBuiltLinux/linux/issues/469
Reviewers: echristo, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61560
llvm-svn: 360604
build-vector-tests.ll is a huge testcase, it is hard to maintain: eg:
any fundamental changes might need to update hundreds of lines. We should
leverage the script to maintain it.
This patch simply run utils/update_llc_test_checks.py on it. There
should be no missing test points.
llvm-svn: 360175
The single-constant algorithm produces infinities on a lot of denormal values.
The precision of the two-constant algorithm is actually sufficient across the
range of denormals. We will switch to that algorithm for now to avoid the
infinities on denormals. In the future, we will re-evaluate the algorithm to
find the optimal one for PowerPC.
Differential revision: https://reviews.llvm.org/D60037
llvm-svn: 360144
A condition for exiting the legalization of v4i32 conversion to v2f64 through
extract/convert/build erroneously checks for the extract having type i32.
This is not adequate as smaller extracts are actually legalized to i32 as well.
Furthermore, an early exit is missing which means that we only check that
both extracts are from the same vector if that check fails.
As a result, both cases in the included test case fail - the first gets a
select error and the second generates incorrect code.
The culprit commit is r274535.
llvm-svn: 360043
This is a slightly reduced version of the test from D61384.
Adding this as a preliminary step, so I can update D61149 with
the proposed fix.
llvm-svn: 359709
-t is --symbols in llvm-readobj but --section-details (unimplemented) in readelf.
The confusing option should not be used since we aim for improving
compatibility.
Keep just one llvm-readobj -t use case in test/tools/llvm-readobj/symbols.test
llvm-svn: 359661
Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry),
if adde is not legal for the target. Even it's at type-legalize phase.
Because adde is special and will not be legalized at operation-legalize phase later.
This fixes: PR40922
https://bugs.llvm.org/show_bug.cgi?id=40922
Differential Revision: https://reviews.llvm.org//D60854
llvm-svn: 359532
Summary:
This patch adds support for __builtin_dcbf for PPC.
__builtin_dcbf copies the contents of a modified block from the data cache
to main memory and flushes the copy from the data cache.
Differential revision: https://reviews.llvm.org/D59843
llvm-svn: 359517
Change the PPCISelLowering.cpp function that decides to avoid update form in
favor of partial vector loads to know about newer load types and to not be
confused by the chain operand.
Differential Revision: https://reviews.llvm.org/D60102
llvm-svn: 359504
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.
Refactors a few subclasses to support the target independent %a, %c, and
%n.
The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.
It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.
Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449
Reviewers: echristo, void
Reviewed By: void
Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60887
llvm-svn: 359337
Using initial-exec TLS variables is a reasonable performance
optimisation for system libraries. Use the correct PIC mechanism to get
hold of the GOT to avoid text relocations.
Differential Revision: https://reviews.llvm.org/D61026
llvm-svn: 359146
Summary:
This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177
When the two operands for BUILD_VECTOR are same, we will get assert error.
llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&):
Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) &&
"The loads cannot be both consecutive and reverse consecutive."' failed.
This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We
should use `getScalarType().getStoreSize();` to get the ElemSize instread of
`getScalarSizeInBits() / 8`.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60811
llvm-svn: 358644
Summary:
None of these derived classes do anything that the base class cannot.
If we remove these case statements, then the base class can handle them
just fine.
Reviewers: peter.smith, echristo
Reviewed By: echristo
Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60803
llvm-svn: 358603
Summary:
Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60248
llvm-svn: 358271
Summary:
Some llc debug options need pass-name as the parameters.
But if we use the pass-name ppc-early-ret, we will get below error:
llc test.ll -stop-after ppc-early-ret
LLVM ERROR: "ppc-early-ret" pass is not registered.
Below pass-names have the pass is not registered error:
ppc-ctr-loops
ppc-ctr-loops-verify
ppc-loop-preinc-prep
ppc-toc-reg-deps
ppc-vsx-copy
ppc-early-ret
ppc-vsx-fma-mutate
ppc-vsx-swaps
ppc-reduce-cr-ops
ppc-qpx-load-splat
ppc-branch-coalescing
ppc-branch-select
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60248
llvm-svn: 358256
There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes
they are constant. If there is constant operand, it takes extra li to
materialize the operand, and one more extra register too. So it's not
profitable to use maddld to optimize mul-add pattern.
Differential Revision: https://reviews.llvm.org/D60181
llvm-svn: 358253
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.
I will try to follow this up with some better tests.
llvm-svn: 358113
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.
This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.
Also add a missing truncation on X86, found by testing of this patch.
Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa
Reviewers: bogner, craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59535
llvm-svn: 357745
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.
Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight
Reviewed By: jyknight
Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58070
llvm-svn: 357283
Summary:
PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode.
double __builtin_setrnd(int mode);
The effective values for mode are:
0 - round to nearest
1 - round to zero
2 - round to +infinity
3 - round to -infinity
Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2).
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D59405
llvm-svn: 357241
A shift and add/sub sequence combination is faster in place of a multiply by constant.
Because the cycle or latency of multiply is not huge, we only consider such following
worthy patterns.
```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```
And the cycles or latency is subtarget-dependent so that we need consider the
subtarget to determine to do or not do such transformation.
Also data type is considered for different cycles or latency to do multiply.
Differential Revision: https://reviews.llvm.org/D58950
llvm-svn: 357233
The UseVSXReg flag can be safely removed and the code cleaned up.
Patch By: Yi-Hong Liu
Differential Revision: https://reviews.llvm.org/D58685
llvm-svn: 357028
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations or not fully pruning unused result values. This can
result in nodes that are never added to the worklist and therefore can
not be pruned.
Add a node inserter as the current node deleter to make sure such
nodes have the chance of being pruned.
Many minor changes, mostly positive.
llvm-svn: 356996
The 2nd loop calculates spill costs but reports free registers as cost
0 anyway, so there is little benefit from having a separate early
loop.
Surprisingly this is not NFC, as many register are marked regDisabled
so the first loop often picks up later registers unnecessarily instead
of the first one available in the allocation order...
Patch by Matthias Braun
llvm-svn: 356499
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes we consider if
we can do the merge immediately.
Most tests changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.
CodeGen tests with non-reordering changes:
X86/aligned-variadic.ll -- memory-based add folded into stored leaq
value.
X86/constant-combiners.ll -- Optimizes out overlap between stores.
X86/pr40631_deadstore_elision -- folds constant byte store into
preceding quad word constant store.
Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet
Reviewed By: courbet
Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59260
llvm-svn: 356068
Vector imm setting instructions like XXLXORz/XXLXORspz/XXLXORdpz
Should behave like LI8.
We should set corresponding flags to allow rematerialization and other
opts in LICM, RA, Scheduling etc.
Differential Revision: https://reviews.llvm.org/D58645
llvm-svn: 355948
In file PPCBranchSelector.cpp we tend to over estimate code size due to large
alignment and inline assembly. Usually it causes larger computed branch offset,
it is not big problem. But sometimes it may also causes smaller computed branch
offset than actual branch offset. If the offset is close to the limit of
encoding, it may cause problem at run time.
Following is a simplified example.
actual estimated
address address
...
bne Far 100 10c
.p2align 4
Near: 110 110
...
Far: 8108 8108
Actual offset: 0x8108 - 0x100 = 0x8008
Computed offset: 0x8108 - 0x10c = 0x7ffc
The computed offset is at most ((1 << alignment) - 4) bytes smaller than actual
offset. So we add this number to the offset for safety.
Differential Revision: https://reviews.llvm.org/D57718
llvm-svn: 355529
Move the stdu instruction in the prologue and epilogue.
This should provide a small performance boost in functions that are able to do
this. I've kept this change rather conservative at the moment and functions
with frame pointers or base pointers will not try to move the stack pointer
update.
Differential Revision: https://reviews.llvm.org/D42590
llvm-svn: 355085
Summary:
Turns out the test was not correct, I had to adjust the test to work. I
also added CHECK-LABELs for better error messages from FileCheck while
I'm here.
Reviewers: jsji
Subscribers: nemanjai, eraman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58614
llvm-svn: 355079
Summary:
Fast selection of llvm fptoi & fptrunc instructions is not handled well about
VSX instruction support.
We'd use VSX float convert integer instruction instead of non-vsx float convert
integer instruction if the operand register class is VSSRC or VSFRC because i32
and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is
openeded.
For float trunc instruction, we do this silimar work like float convert integer
instruction to try to use VSX instruction.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D58430
llvm-svn: 354762
Added four test files to check the existing behaviour of prologue
and epilogue code generation. This patch was done as a setup for
the upcoming patch listed on Phabricator that will change how the
prologue and epilogue work.
The upcoming patch is: https://reviews.llvm.org/D42590
llvm-svn: 353994
If we're comparing some value for equality against 2 constants
and those constants have an absolute difference of just 1 bit,
then we can offset and mask off that 1 bit and reduce to a single
compare against zero:
and/or (setcc X, C0, ne), (setcc X, C1, ne/eq) -->
setcc ((add X, -C1), ~(C0 - C1)), 0, ne/eq
https://rise4fun.com/Alive/XslKj
This transform is disabled by default using a TLI hook
("convertSetCCLogicToBitwiseLogic()").
That should be overridden for AArch64, MIPS, Sparc and possibly
others based on the asm shown in:
https://bugs.llvm.org/show_bug.cgi?id=40611
llvm-svn: 353859
The PowerPC code generator currently scalarizes vector truncates that would fit in a vector register, resulting in vector extracts, scalar operations, and vector merges. This patch custom lowers a vector truncate that would fit in a register to a vector shuffle instead.
Differential Revision: https://reviews.llvm.org/D56507
llvm-svn: 353724
The sqrt case is faster and we already do this for the case where
the exponent is 0.25. This adds the 0.75 case which is also not
sensitive to signed zeros.
Patch by Whitney Tsang (Whitney)
Differential revision: https://reviews.llvm.org/D57434
llvm-svn: 353557
I noticed that we are missing this canonicalization in IR:
rL352515
...and then realized that we don't get this right in SDAG either,
so this has to be fixed first regardless of what we choose to do in IR.
The existing fold was limited to scalars and using the wrong predicate
to guard the transform. We have a boolean contents TLI query that can
be used to decide which direction to fold.
This may eventually lead back to the problems/question in:
https://bugs.llvm.org/show_bug.cgi?id=40486
...but it makes no difference to that yet.
Differential Revision: https://reviews.llvm.org/D57401
llvm-svn: 353433
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if less than AccessCapForMSSAPromotion exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.
Reviewers: sanjoy, chandlerc
Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D56625
llvm-svn: 353339
There are 2 changes visible here:
1. There's no reason to limit this transform based on number
of condition registers. That diff allows PPC to produce
slightly better (dot-instructions should be generally good)
code.
Note: someone that cares about PPC codegen might want to
look closer at that output because it seems like we could
still improve this.
2. We (probably?) should not bother trying to form uaddo (or
other overflow ops) when there's no target support for such
an op. This goes beyond checking whether the op is expanded
because both PPC and AArch64 show better codegen for standard
types regardless of whether the op is legal/custom.
llvm-svn: 353001
This fixes most references to the paths:
llvm.org/svn/
llvm.org/git/
llvm.org/viewvc/
github.com/llvm-mirror/
github.com/llvm-project/
reviews.llvm.org/diffusion/
to instead point to https://github.com/llvm/llvm-project.
This is *not* a trivial substitution, because additionally, all the
checkout instructions had to be migrated to instruct users on how to
use the monorepo layout, setting LLVM_ENABLE_PROJECTS instead of
checking out various projects into various subdirectories.
I've attempted to not change any scripts here, only documentation. The
scripts will have to be addressed separately.
Additionally, I've deleted one document which appeared to be outdated
and unneeded:
lldb/docs/building-with-debug-llvm.txt
Differential Revision: https://reviews.llvm.org/D57330
llvm-svn: 352514
If bottom of block BB has only one successor OldTop, in most cases it is profitable to move it before OldTop, except the following case:
-->OldTop<-
| . |
| . |
| . |
---Pred |
| |
BB-----
Move BB before OldTop can't reduce the number of taken branches, this patch detects this case and prevent the moving.
Differential Revision: https://reviews.llvm.org/D57067
llvm-svn: 352236
Fast selection of llvm icmp and fcmp instructions is not handled well about VSX instruction support.
We'd use VSX float comparison instruction instead of non-vsx float comparison instruction
if the operand register class is VSSRC or VSFRC because i32 and i64 are mapped to VSSRC and
VSFRC correspondingly if VSX feature is opened.
If the target does not have corresponding VSX instruction comparison for some type,
just copy VSX-related register to common float register class and use non-vsx comparison instruction.
Differential Revision: https://reviews.llvm.org/D57078
llvm-svn: 352174
This patch exploits the instructions that store a single element from a vector
to preform a (store (extract_elt)). We already have code that does this with
ISA 3.0 instructions that were added to handle i8/i16 types. However, we had
never exploited the existing ones that handle f32/f64/i32/i64 types.
Differential revision: https://reviews.llvm.org/D56175
llvm-svn: 352131
compiler identification lines in test-cases.
(Doing so only because it's then easier to search for references which
are actually important and need fixing.)
llvm-svn: 351200
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
tests are mostly updated with
// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
Bad machine code: Illegal virtual register for instruction
function: TestULE
basic block: %bb.0 entry (0x1000a39b158)
instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc
operand 1: %1:vsfrc
Fix assert about missing match between fcmp instruction and register class.
We should use vsx related cmp instruction xvcmpudp instead of fcmpu when vsx is opened.
add -verifymachineinstrs option into related test cases to enable the verify pass.
Differential Revision: https://reviews.llvm.org/D55686
llvm-svn: 350685
When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI Scheduler algorithm, we still use the in-order scheduling model
as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer
the op. So, only when all the available instructions issued, the pending instruction
could be scheduled. That is not true for our P9 hw in fact.
This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is
picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017.
With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows:
x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%
And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved.
Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810
llvm-svn: 350285
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
Differential Revision: https://reviews.llvm.org/D56156
llvm-svn: 350235
PPCPreEmitPeephole pass.
PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.
When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.
The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.
Differential Revision: https://reviews.llvm.org/D56041
llvm-svn: 350223
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will
be reserved later, but the verifier doesn't know that).
So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know
X2 is a reserved register.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D56148
llvm-svn: 350165
A recent patch has added custom legalization of vector conversions of
v2i16 -> v2f64. This just rounds it out for other types where the input vector
has an illegal (narrower) type than the result vector. Specifically, this will
handle the following conversions:
v2i8 -> v2f64
v4i8 -> v4f32
v4i16 -> v4f32
Differential revision: https://reviews.llvm.org/D54663
llvm-svn: 350155
The current CRBIT spill pseudo-op expansion creates a KILL instruction
that kills the CRBIT and defines the enclosing CR field. However, this
paints a false picture to the register allocator that all bits in the CR
field are killed so copies of other bits out of the field become dead and
removable.
This changes the expansion to preserve the KILL flag on the CRBIT as an
implicit use and to treat the CR field as an undef input.
Thanks to Hal Finkel for the review and Uli Weigand for implementation input.
Differential revision: https://reviews.llvm.org/D55996
llvm-svn: 350153
This is the last one in a series of patches to support better code generation for bitfield insert.
BitPermutationSelector already support ISD::ZERO_EXTEND but not TRUNCATE.
This patch adds support for ISD:TRUNCATE in BitPermutationSelector.
For example of this test case,
struct s64b {
int a:4;
int b:16;
int c:24;
};
void bitfieldinsert64b(struct s64b *p, unsigned char v) {
p->b = v;
}
the selection DAG loos like:
t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64
t18: i32 = and t14, Constant:i32<-1048561>
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t22: i64 = AssertZext t4, ValueType:ch:i8
t23: i32 = truncate t22
t16: i32 = shl nuw nsw t23, Constant:i32<4>
t19: i32 = or t18, t16
t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64
By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18.
So the generated sequences with and without this patch are
without this patch
rlwinm 5, 5, 0, 28, 11 # corresponding to t18
rlwimi 5, 4, 4, 20, 27
with this patch
rlwimi 5, 4, 4, 12, 27
Differential Revision: https://reviews.llvm.org/D49076
llvm-svn: 350118
If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have.
Differential Revision: https://reviews.llvm.org/D56078
llvm-svn: 350115
For atomic value operand which less than 4 bytes need to be masked.
And the related operation to calculate the newvalue can be done in 32 bit gprc.
So just use gprc for mask and value calculation.
Differential Revision: https://reviews.llvm.org/D56077
llvm-svn: 350113
Summary:
This patch is to fix the bug imported by rL341634.
In above submit , the the return type of ISD::ADDE is
14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64),
but in fact, the second return type of ISD::ADDE should be
MVT::Glue not MVT::i64.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D55977
llvm-svn: 350061
Summary:
PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC
does not have vector CR selects, PowerPC does not support scalar condition
selects on vectors.
In addition to implementing this hook, isSelectSupported() should return false
when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects
are converted into branch sequences.
Reviewed By: steven.zhang, hfinkel
Differential Revision: https://reviews.llvm.org/D55754
llvm-svn: 349727
For type v4i32/v8ii16/v16i8, do following transforms:
(vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
(vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
Differential Revision: https://reviews.llvm.org/D55812
llvm-svn: 349599
Power9 VABSDU* instructions can be exploited for some special vselect sequences.
Check in the orignal test case here, later the exploitation patch will update this
and reviewers can check the differences easily.
llvm-svn: 349446
Improve the current vec_abs support on P9, generate ISD::ABS node for vector types,
combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns,
do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity.
Differential Revision: https://reviews.llvm.org/D54783
llvm-svn: 349437
Appended options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to get the
more descriptive output. Also removed useless function attributes.
llvm-svn: 349329
With some patch adopted for Power9 vabsd* insns, some CHECKs can't get the expected results.
But it's false alarm, we should update the case more robust.
llvm-svn: 349325
Add an original test case for setb before the exploitation actually takes effect, later we can check the difference.
Differential Revision: https://reviews.llvm.org/D55696
llvm-svn: 349251
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 349016
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 348843
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.
Differential Revision: https://reviews.llvm.org/D50141
llvm-svn: 348585
Fix assert about using an undefined physical register in machine instruction verify pass.
The reason is that register flag undef is missing when doing transformation from If Conversion Pass.
```
Bad machine code: Using an undefined physical register
- function: func_65
- basic block: %bb.0 entry (0x10024740738)
- instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3
- operand 0: killed $cr5lt
LLVM ERROR: Found 1 machine code errors.
```
There are also other existing testcases with same issue. So I add -verify-machineinstrs option to open verifying.
Differential Revision: https://reviews.llvm.org/D55408
llvm-svn: 348566
If this is not a valid way to assign an SDLoc, then we get this
wrong all over SDAG.
I don't know enough about the SDAG to explain this. IIUC, theoretically,
debug info is not supposed to affect codegen. But here it has clearly
affected 3 different targets, and the x86 change is an actual improvement.
llvm-svn: 348552
The PPC test with 2 extra uses seems clearly better by avoiding this transform.
With 1 extra use, we also prevent an extra register move (although that might
be an RA problem). The general rule should be to only make a change here if
it is always profitable. The x86 diffs are all neutral.
llvm-svn: 348518
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 348109
Summary:
A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code
includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the
most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative
so for signed comparisons vs. unsigned ones, the input operands just need to be swapped.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54825
llvm-svn: 347831
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.
This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.
Differential Revision: https://reviews.llvm.org/D54906
llvm-svn: 347593
This reverts commits r347532. Forget add the option
-mtriple powerpc64-unknown-linux-gnu. So other platform is error except
for PowerPC.
llvm-svn: 347534
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 347532
We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
but that would not matter without this patch because DAGCombiner was
reversing that transform. I think we need this transform in the backend
regardless of what happens in IR to catch cases where the shift-xor
is formed late from GEP or other ops.
https://rise4fun.com/Alive/NC1
Name: shl
Pre: (-1 << C2) == C1
%shl = shl i8 %x, C2
%r = xor i8 %shl, C1
=>
%not = xor i8 %x, -1
%r = shl i8 %not, C2
Name: shr
Pre: (-1 u>> C2) == C1
%sh = lshr i8 %x, C2
%r = xor i8 %sh, C1
=>
%not = xor i8 %x, -1
%r = lshr i8 %not, C2
https://bugs.llvm.org/show_bug.cgi?id=39657
llvm-svn: 347478
We have efficient codegen on P9 for lowering bswap that involves moving
the value into a vector reg and moving it back. However, the check under
which we custom lowered it did not adequately reflect the actual requirements.
It required only that the subtarget be an implementation of ISA 3.0 since all
compliant implementations have to provide the vector instructions.
However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9
(i.e. don't emit vector code, don't have to save vector regs for context
switch). So we should require the correct features for this lowering.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39334
llvm-svn: 347376
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8.
Since there are already multiple IIC for store update, this patch also merge
IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU
IIC_LdStSTDUX to IIC_LdStSTUX
and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference.
Differential Revision: https://reviews.llvm.org/D54700
llvm-svn: 347311
This patch add a STWU testcase for scheduling check.
Currently P7/P8 which use itineraries are missing IIC_LdStStoreUpd,
We use CHECK-ITIN prefix to check P7/P8, then use default for P9 (and future).
We will fix the missing itineraries of IIC_LdStStoreUpd in following patch,
and update this testcase to show the scheduling difference only there.
Differential Revision: https://reviews.llvm.org/D54699
llvm-svn: 347310
Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.
llvm-svn: 347288
This NFC patch just adds test cases for conversions that currently
require scalarization of vectors. An updcoming patch will change
the legalization for these and it is more suitable on the review
to show the diferences in code gen rather than just the new code gen.
llvm-svn: 347090
To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding,
which is legalized at type-legalization phase. Use xxsel to match vselect if vsx is open, or use vsel.
Differential Revision: https://reviews.llvm.org/D49531
llvm-svn: 346824
On Power9, we don't have patterns to select the following intrinsics:
llvm.ppc.vsx.stxvw4x.be
llvm.ppc.vsx.stxvd2x.be
This patch adds support for these.
Differential Revision: https://reviews.llvm.org/D53581
llvm-svn: 346148
The debug-use flag must be set exactly for uses on DBG_VALUEs. This is
so obvious that it can be trivially inferred while parsing. This will
reduce noise when printing while omitting an information that has little
value to the user.
The parser will keep recognizing the flag for compatibility with old
`.mir` files.
Differential Revision: https://reviews.llvm.org/D53903
llvm-svn: 345671