When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI Scheduler algorithm, we still use the in-order scheduling model
as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer
the op. So, only when all the available instructions issued, the pending instruction
could be scheduled. That is not true for our P9 hw in fact.
This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is
picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017.
With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows:
x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%
And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved.
Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810
llvm-svn: 350285
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
Differential Revision: https://reviews.llvm.org/D56156
llvm-svn: 350235
PPCPreEmitPeephole pass.
PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.
When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.
The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.
Differential Revision: https://reviews.llvm.org/D56041
llvm-svn: 350223
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will
be reserved later, but the verifier doesn't know that).
So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know
X2 is a reserved register.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D56148
llvm-svn: 350165
A recent patch has added custom legalization of vector conversions of
v2i16 -> v2f64. This just rounds it out for other types where the input vector
has an illegal (narrower) type than the result vector. Specifically, this will
handle the following conversions:
v2i8 -> v2f64
v4i8 -> v4f32
v4i16 -> v4f32
Differential revision: https://reviews.llvm.org/D54663
llvm-svn: 350155
The current CRBIT spill pseudo-op expansion creates a KILL instruction
that kills the CRBIT and defines the enclosing CR field. However, this
paints a false picture to the register allocator that all bits in the CR
field are killed so copies of other bits out of the field become dead and
removable.
This changes the expansion to preserve the KILL flag on the CRBIT as an
implicit use and to treat the CR field as an undef input.
Thanks to Hal Finkel for the review and Uli Weigand for implementation input.
Differential revision: https://reviews.llvm.org/D55996
llvm-svn: 350153
This is the last one in a series of patches to support better code generation for bitfield insert.
BitPermutationSelector already support ISD::ZERO_EXTEND but not TRUNCATE.
This patch adds support for ISD:TRUNCATE in BitPermutationSelector.
For example of this test case,
struct s64b {
int a:4;
int b:16;
int c:24;
};
void bitfieldinsert64b(struct s64b *p, unsigned char v) {
p->b = v;
}
the selection DAG loos like:
t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64
t18: i32 = and t14, Constant:i32<-1048561>
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t22: i64 = AssertZext t4, ValueType:ch:i8
t23: i32 = truncate t22
t16: i32 = shl nuw nsw t23, Constant:i32<4>
t19: i32 = or t18, t16
t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64
By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18.
So the generated sequences with and without this patch are
without this patch
rlwinm 5, 5, 0, 28, 11 # corresponding to t18
rlwimi 5, 4, 4, 20, 27
with this patch
rlwimi 5, 4, 4, 12, 27
Differential Revision: https://reviews.llvm.org/D49076
llvm-svn: 350118
If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have.
Differential Revision: https://reviews.llvm.org/D56078
llvm-svn: 350115
For atomic value operand which less than 4 bytes need to be masked.
And the related operation to calculate the newvalue can be done in 32 bit gprc.
So just use gprc for mask and value calculation.
Differential Revision: https://reviews.llvm.org/D56077
llvm-svn: 350113
Summary:
This patch is to fix the bug imported by rL341634.
In above submit , the the return type of ISD::ADDE is
14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64),
but in fact, the second return type of ISD::ADDE should be
MVT::Glue not MVT::i64.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D55977
llvm-svn: 350061
Summary:
PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC
does not have vector CR selects, PowerPC does not support scalar condition
selects on vectors.
In addition to implementing this hook, isSelectSupported() should return false
when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects
are converted into branch sequences.
Reviewed By: steven.zhang, hfinkel
Differential Revision: https://reviews.llvm.org/D55754
llvm-svn: 349727
For type v4i32/v8ii16/v16i8, do following transforms:
(vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
(vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
(vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
Differential Revision: https://reviews.llvm.org/D55812
llvm-svn: 349599
Power9 VABSDU* instructions can be exploited for some special vselect sequences.
Check in the orignal test case here, later the exploitation patch will update this
and reviewers can check the differences easily.
llvm-svn: 349446
Improve the current vec_abs support on P9, generate ISD::ABS node for vector types,
combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns,
do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity.
Differential Revision: https://reviews.llvm.org/D54783
llvm-svn: 349437
Appended options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to get the
more descriptive output. Also removed useless function attributes.
llvm-svn: 349329
With some patch adopted for Power9 vabsd* insns, some CHECKs can't get the expected results.
But it's false alarm, we should update the case more robust.
llvm-svn: 349325
Add an original test case for setb before the exploitation actually takes effect, later we can check the difference.
Differential Revision: https://reviews.llvm.org/D55696
llvm-svn: 349251
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 349016
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.
Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D55365
llvm-svn: 348843
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.
Differential Revision: https://reviews.llvm.org/D50141
llvm-svn: 348585
Fix assert about using an undefined physical register in machine instruction verify pass.
The reason is that register flag undef is missing when doing transformation from If Conversion Pass.
```
Bad machine code: Using an undefined physical register
- function: func_65
- basic block: %bb.0 entry (0x10024740738)
- instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3
- operand 0: killed $cr5lt
LLVM ERROR: Found 1 machine code errors.
```
There are also other existing testcases with same issue. So I add -verify-machineinstrs option to open verifying.
Differential Revision: https://reviews.llvm.org/D55408
llvm-svn: 348566
If this is not a valid way to assign an SDLoc, then we get this
wrong all over SDAG.
I don't know enough about the SDAG to explain this. IIUC, theoretically,
debug info is not supposed to affect codegen. But here it has clearly
affected 3 different targets, and the x86 change is an actual improvement.
llvm-svn: 348552
The PPC test with 2 extra uses seems clearly better by avoiding this transform.
With 1 extra use, we also prevent an extra register move (although that might
be an RA problem). The general rule should be to only make a change here if
it is always profitable. The x86 diffs are all neutral.
llvm-svn: 348518
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 348109
Summary:
A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code
includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the
most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative
so for signed comparisons vs. unsigned ones, the input operands just need to be swapped.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D54825
llvm-svn: 347831
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.
This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.
Differential Revision: https://reviews.llvm.org/D54906
llvm-svn: 347593
This reverts commits r347532. Forget add the option
-mtriple powerpc64-unknown-linux-gnu. So other platform is error except
for PowerPC.
llvm-svn: 347534
Summary:
There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D54738
llvm-svn: 347532
We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
but that would not matter without this patch because DAGCombiner was
reversing that transform. I think we need this transform in the backend
regardless of what happens in IR to catch cases where the shift-xor
is formed late from GEP or other ops.
https://rise4fun.com/Alive/NC1
Name: shl
Pre: (-1 << C2) == C1
%shl = shl i8 %x, C2
%r = xor i8 %shl, C1
=>
%not = xor i8 %x, -1
%r = shl i8 %not, C2
Name: shr
Pre: (-1 u>> C2) == C1
%sh = lshr i8 %x, C2
%r = xor i8 %sh, C1
=>
%not = xor i8 %x, -1
%r = lshr i8 %not, C2
https://bugs.llvm.org/show_bug.cgi?id=39657
llvm-svn: 347478
We have efficient codegen on P9 for lowering bswap that involves moving
the value into a vector reg and moving it back. However, the check under
which we custom lowered it did not adequately reflect the actual requirements.
It required only that the subtarget be an implementation of ISA 3.0 since all
compliant implementations have to provide the vector instructions.
However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9
(i.e. don't emit vector code, don't have to save vector regs for context
switch). So we should require the correct features for this lowering.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39334
llvm-svn: 347376
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8.
Since there are already multiple IIC for store update, this patch also merge
IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU
IIC_LdStSTDUX to IIC_LdStSTUX
and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference.
Differential Revision: https://reviews.llvm.org/D54700
llvm-svn: 347311
This patch add a STWU testcase for scheduling check.
Currently P7/P8 which use itineraries are missing IIC_LdStStoreUpd,
We use CHECK-ITIN prefix to check P7/P8, then use default for P9 (and future).
We will fix the missing itineraries of IIC_LdStStoreUpd in following patch,
and update this testcase to show the scheduling difference only there.
Differential Revision: https://reviews.llvm.org/D54699
llvm-svn: 347310
Turns out that there was no check for a store that truncates down
to a single byte when combining a (store (bswap...)) into a byte-swapping
store. This patch just adds that check.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39478.
llvm-svn: 347288
This NFC patch just adds test cases for conversions that currently
require scalarization of vectors. An updcoming patch will change
the legalization for these and it is more suitable on the review
to show the diferences in code gen rather than just the new code gen.
llvm-svn: 347090