This avoids a constant pool load to create 1.
The int->float are showing converts to mask and back. We probably need to widen inputs to sint_to_fp/uint_to_fp before type legalization.
llvm-svn: 324805
Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.
llvm-svn: 323922
Summary:
For the most part its better to keep v32i1 as a mask type of a narrower width than trying to promote it to a ymm register.
I had to add some overrides to the methods that get the types for the calling convention so that we still use v32i8 for argument/return purposes.
There are still some regressions in here. I definitely saw some around shuffles. I think we probably should move vXi1 shuffle from lowering to a DAG combine where I think the extend and truncate we have to emit would be better combined.
I think we also need a DAG combine to remove trunc from (extract_vector_elt (trunc))
Overall this removes something like 13000 CHECK lines from lit tests.
Reviewers: zvi, RKSimon, delena, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42031
llvm-svn: 323201
Summary:
Added the FastVariableShuffle feature to cases that resembled processors
for which this fearure is on.
For AVX2 there are processors with and w/o this fearue enable.
For AVX512 only KNL does enable this feature so cases which only have
+avx512f were left without the FastVariableShuffle enabled.
Reviewers: RKSimon, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41851
llvm-svn: 322090
Summary:
There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type.
It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway.
This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.
I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.
There's definitely room for improvement with some follow up patches.
Reviewers: RKSimon, zvi, guyblank
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41560
llvm-svn: 321967
We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codgen in some cases and I'd like to kill off some of the load patterns.
llvm-svn: 321598
As noted in PR34686, we are relying on a PSHUFD+PSHUFLW+PSHUFHW shuffle chain for most general vXi16 unary shuffles.
This patch checks for simpler PSHUFLW+PSHUFD and PSHUFHW+PSHUFD cases beforehand, building on some existing code that just handled splat shuffles.
By doing so we also prevent premature use of PSHUFB shuffles which can be slower and require the creation/loading of constant shuffle masks.
We now have the 'fast-variable-shuffle' option for hardware that prefers combining 2 or more shuffles to VPSHUFB etc.
Differential Revision: https://reviews.llvm.org/D38318
llvm-svn: 321553
Previously we used an extend from v8i1 to v8i32/v8i64. Then extracted to the final width. But if we have VLX we should extract first. This way we don't end up with an overly large extend.
This allows us to use vcmpeq to make all ones for the sign extend when DQI isn't available. Otherwise we get a VPTERNLOG.
If we make v2i1/v4i1 legal like proposed in D41560, we could always do this and rely on the lowering of the extend to widen when necessary.
llvm-svn: 321538
Previously we used a wider element type and truncated. But its more efficient to keep the element type and drop unused elements.
If BWI isn't supported and we have a i16 or i8 type, we'll extend it to be i32 and still use a truncate.
llvm-svn: 319728
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
This reverts r317579, originally committed as r317100.
There is a design issue with marking CFI instructions duplicatable. Not
all targets support the CFIInstrInserter pass, and targets like Darwin
can't cope with duplicated prologue setup CFI instructions. The compact
unwind info emission fails.
When the following code is compiled for arm64 on Mac at -O3, the CFI
instructions end up getting tail duplicated, which causes compact unwind
info emission to fail:
int a, c, d, e, f, g, h, i, j, k, l, m;
void n(int o, int *b) {
if (g)
f = 0;
for (; f < o; f++) {
m = a;
if (l > j * k > i)
j = i = k = d;
h = b[c] - e;
}
}
We get assembly that looks like this:
; BB#1: ; %if.then
Lloh3:
adrp x9, _f@GOTPAGE
Lloh4:
ldr x9, [x9, _f@GOTPAGEOFF]
mov w8, wzr
Lloh5:
str wzr, [x9]
stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill
.cfi_def_cfa_offset 16
.cfi_offset w19, -8
.cfi_offset w20, -16
cmp w8, w0
b.lt LBB0_3
b LBB0_7
LBB0_2: ; %entry.if.end_crit_edge
Lloh6:
adrp x8, _f@GOTPAGE
Lloh7:
ldr x8, [x8, _f@GOTPAGEOFF]
Lloh8:
ldr w8, [x8]
stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill
.cfi_def_cfa_offset 16
.cfi_offset w19, -8
.cfi_offset w20, -16
cmp w8, w0
b.ge LBB0_7
LBB0_3: ; %for.body.lr.ph
Note the multiple .cfi_def* directives. Compact unwind info emission
can't handle that.
llvm-svn: 317726
Reland r317100 with minor fix regarding ComputeCommonTailLength function in
BranchFolding.cpp. Skipping top CFI instructions block needs to executed on
several more return points in ComputeCommonTailLength().
Original r317100 message:
"Correct dwarf unwind information in function epilogue for X86"
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Changed CFI instructions so that they:
- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal
Added CFIInstrInserter pass:
- analyzes each basic block to determine cfa offset and register valid at
its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
llvm-svn: 317579
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Changed CFI instructions so that they:
- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal
Added CFIInstrInserter pass:
- analyzes each basic block to determine cfa offset and register valid at
its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D35844
llvm-svn: 317100
Summary:
This suppresses the generation of .Lcfi labels in our textual assembler.
It was annoying that this generated cascading .Lcfi labels:
llc foo.ll -o - | llvm-mc | llvm-mc
After three trips through MCAsmStreamer, we'd have three labels in the
output when none are necessary. We should only bother creating the
labels and frame data when making a real object file.
This supercedes D38605, which moved the entire .seh_ implementation into
MCObjectStreamer.
This has the advantage that we do more checking when emitting textual
assembly, as a minor efficiency cost. Outputting textual assembly is not
performance critical, so this shouldn't matter.
Reviewers: majnemer, MatzeB
Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D38638
llvm-svn: 315259
This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type.
Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases.
Differential Revision: https://reviews.llvm.org/D35320
llvm-svn: 314076
This reverts commit r311038.
Several buildbots are breaking, and at least one appears to be due to
the forwarding of physical regs enabled by this change. Reverting while
I investigate further.
llvm-svn: 311062
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
llvm-svn: 311038