Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Reviewers: MatzeB, andreadb, tstellarAMD
Subscribers: arsenm, jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D14945
llvm-svn: 254085
Instead of trying to move ARGUMENT instructions back up to the top after
they've been scheduled or sunk down, use a fake physical register to
create a liveness constraint that prevents ARGUMENT instructions from
moving down in the first place. This is still not entirely ideal, however
it is more robust than letting them move and moving them back.
llvm-svn: 254084
The e500mc does not actually support the mfocrf instruction; update the
processor definitions to reflect that fact.
Patch by Tom Rix (with some test-case cleanup by me).
llvm-svn: 254064
It was wrong order of operands (from intrinsic to DAG node).
I added more strict type specification for instruction selection.
Differential Revision: http://reviews.llvm.org/D14942
llvm-svn: 254059
This caused PR25607 and also caused Chromium to crash on start-up.
(Also had to update test/CodeGen/X86/avx-splat.ll, which was committed
after shrink wrapping was enabled.)
llvm-svn: 254044
X86 needs to use its own FMA opcodes, preventing the standard FNEG(FMA) pattern table recognition method used by other platforms. This patch adds support for lowering FNEG(FMA(X,Y,Z)) into a single suitably negated FMA instruction.
Fix for PR24364
Differential Revision: http://reviews.llvm.org/D14906
llvm-svn: 254016
This patch fixes the following issues:
1. Fix the return type of X86psadbw: it should not be the same type of inputs.
For vNi8 inputs the output should be vMi64, where M = N/8.
2. Fix the return type of int_x86_avx512_psad_bw_512 accordingly.
3. Fix the definiton of PSADBW, VPSADBW, and VPSADBWY accordingly.
4. Adjust the return type when building a DAG node of X86ISD::PSADBW type.
5. Update related tests.
Differential revision: http://reviews.llvm.org/D14897
llvm-svn: 254010
We had duplicated definitions for the same hardware '[v]movq' instructions. For example with SSE:
def MOVZQI2PQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
"mov{d|q}\t{$src, $dst|$dst, $src}", // X86-64 only
[(set VR128:$dst, (v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))))],
IIC_SSE_MOVDQ>;
def MOV64toPQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src),
"mov{d|q}\t{$src, $dst|$dst, $src}",
[(set VR128:$dst, (v2i64 (scalar_to_vector GR64:$src)))],
IIC_SSE_MOVDQ>, Sched<[WriteMove]>;
As shown in the test case and PR25554:
https://llvm.org/bugs/show_bug.cgi?id=25554
This causes us to miss reusing an operand because later passes don't know these 'movq' are the same instruction.
This patch deletes one pair of these defs.
Sadly, this won't fix the original test case in the bug report. Something else is still broken.
Differential Revision: http://reviews.llvm.org/D14941
llvm-svn: 253988
The one regression in the builtin tests is in the read2 test which now
(again) has many extra copies, but this should be solved once the pass
is replaced with a DAG combine.
llvm-svn: 253974
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes.
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights.
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This the second patch above. In this patch SelectionDAG starts to use
probability-based interfaces in MBB to add successors but other MC passes are
still using weight-based interfaces. Therefore, we need to maintain correct
weight list in MBB even when probability-based interfaces are used. This is
done by updating weight list in probability-based interfaces by treating the
numerator of probabilities as weights. This change affects many test cases
that check successor weight values. I will update those test cases once this
patch looks good to you.
Differential revision: http://reviews.llvm.org/D14361
llvm-svn: 253965
This patch detects the AVG pattern in vectorized code, which is simply
c = (a + b + 1) / 2, where a, b, and c have the same type which are vectors of
either unsigned i8 or unsigned i16. In the IR, i8/i16 will be promoted to
i32 before any arithmetic operations. The following IR shows such an example:
%1 = zext <N x i8> %a to <N x i32>
%2 = zext <N x i8> %b to <N x i32>
%3 = add nuw nsw <N x i32> %1, <i32 1 x N>
%4 = add nuw nsw <N x i32> %3, %2
%5 = lshr <N x i32> %N, <i32 1 x N>
%6 = trunc <N x i32> %5 to <N x i8>
and with this patch it will be converted to a X86ISD::AVG instruction.
The pattern recognition is done when combining instructions just before type
legalization during instruction selection. We do it here because after type
legalization, it is much more difficult to do pattern recognition based
on many instructions that are doing type conversions. Therefore, for
target-specific instructions (like X86ISD::AVG), we need to take care of type
legalization by ourselves. However, as X86ISD::AVG behaves similarly to
ISD::ADD, I am wondering if there is a way to legalize operands and result
types of X86ISD::AVG together with ISD::ADD. It seems that the current design
doesn't support this idea.
Tests are added for SSE2, AVX2, and AVX512BW and both i8 and i16 types of
variant vector sizes.
Differential revision: http://reviews.llvm.org/D14761
llvm-svn: 253952
Caller saved regs differ between SysV and Win64. Use the tail call available set to scavenge from.
Refactor register info to create new helper to get at tail call GPRs. Added a new test case for windows. Fixed up a number of X64 tests since now RCX is preferred over RDX on SysV.
Differential Revision: http://reviews.llvm.org/D14878
llvm-svn: 253927
With the '=' suffix now indicating which operands are output operands, it's
no longer as important to distinguish between a call's inputs and its outputs
using operand ordering, so we can go back to printing them in the normal order.
llvm-svn: 253925
This distinguishes input operands from output operands. This is something of
a syntactic experiment to see whether the mild amount of clutter this adds is
outweighed by the extra information it conveys to the reader.
llvm-svn: 253922
The current approach to using get_local and set_local is to use them
implicitly, as register uses and defs. Introduce new copy instructions
which are themselves no-ops except for the get_local and set_local
that they imply, so that we use get_local and set_local consistently.
llvm-svn: 253905
WebAssembly is currently using labels to end scopes, so for example a
loop scope looks like this:
BB0_0:
loop BB0_1
...
BB0_1:
with BB0_0 being the label of the first block not in the loop. This
requires that the label be printed even when it's only reachable via
fallthrough. To arrange this, insert a no-op LOOP_END instruction in
such cases at the end of the loop.
llvm-svn: 253901
Always starting blocks at the top of their containing loops works, but creates
unnecessarily deep nesting because it makes all blocks in a loop overlap.
Refine the BLOCK placement algorithm to start blocks at nearest common
dominating points instead, which significantly shrinks them and reduces
overlapping.
llvm-svn: 253876
Disable custom handling of signed 32-bit and 64-bit integer divide.
Add test cases for both 32-bit and 64-bit integer overflow crashes.
llvm-svn: 253865
ISERT_SUBVECTOR for i1 vectors may be done with shifts, when we insert into the lower part, or into the upper part, on into all-zero vector.
CONCAT_VECTORS uses ISERT_SUBVECTOR.
Differential Revision: http://reviews.llvm.org/D14815
llvm-svn: 253819
Extended DFA tablegen to:
- added "-debug-only dfa-emitter" support to llvm-tblgen
- defined CVI_PIPE* resources for the V60 vector coprocessor
- allow specification of multiple required resources
- supports ANDs of ORs
- e.g. [SLOT2, SLOT3], [CVI_MPY0, CVI_MPY1] means:
(SLOT2 OR SLOT3) AND (CVI_MPY0 OR CVI_MPY1)
- added support for combo resources
- allows specifying ORs of ANDs
- e.g. [CVI_XLSHF, CVI_MPY01] means:
(CVI_XLANE AND CVI_SHIFT) OR (CVI_MPY0 AND CVI_MPY1)
- increased DFA input size from 32-bit to 64-bit
- allows for a maximum of 4 AND'ed terms of 16 resources
- supported expressions now include:
expression => term [AND term] [AND term] [AND term]
term => resource [OR resource]*
resource => one_resource | combo_resource
combo_resource => (one_resource [AND one_resource]*)
Author: Dan Palermo <dpalermo@codeaurora.org>
kparzysz: Verified AMDGPU codegen to be unchanged on all llc
tests, except those dealing with instruction encodings.
Reapply the previous patch, this time without circular dependencies.
llvm-svn: 253793
Extended DFA tablegen to:
- added "-debug-only dfa-emitter" support to llvm-tblgen
- defined CVI_PIPE* resources for the V60 vector coprocessor
- allow specification of multiple required resources
- supports ANDs of ORs
- e.g. [SLOT2, SLOT3], [CVI_MPY0, CVI_MPY1] means:
(SLOT2 OR SLOT3) AND (CVI_MPY0 OR CVI_MPY1)
- added support for combo resources
- allows specifying ORs of ANDs
- e.g. [CVI_XLSHF, CVI_MPY01] means:
(CVI_XLANE AND CVI_SHIFT) OR (CVI_MPY0 AND CVI_MPY1)
- increased DFA input size from 32-bit to 64-bit
- allows for a maximum of 4 AND'ed terms of 16 resources
- supported expressions now include:
expression => term [AND term] [AND term] [AND term]
term => resource [OR resource]*
resource => one_resource | combo_resource
combo_resource => (one_resource [AND one_resource]*)
Author: Dan Palermo <dpalermo@codeaurora.org>
kparzysz: Verified AMDGPU codegen to be unchanged on all llc
tests, except those dealing with instruction encodings.
llvm-svn: 253790
It turns out we have a number of places that just grab the first type attached to a register class for various reasons. This is fine unless for some reason that type isn't legal on the current target, such as for SSE1 which doesn't support v16i8/v8i16/v4i32/v2i64 - all of which were included before 4f32 in the class.
Given that this is such a rare situation I've just re-ordered the types and placed the float types first.
Fix for PR16133
Differential Revision: http://reviews.llvm.org/D14787
llvm-svn: 253773
This change merges adjacent zero stores into a wider single store.
For example :
strh wzr, [x0]
strh wzr, [x0, #2]
becomes
str wzr, [x0]
This will fix PR25410.
llvm-svn: 253711
incorrect, as the chosen representative of the weak symbol may not live
with the code in question. Always indirect the access through the TOC
instead.
Patch by Kyle Butt!
llvm-svn: 253708
Summary:
This follows D14577 to treat ARMv6-J as an alias for ARMv6,
instead of an architecture in its own right.
The functional change is that the default CPU when targeting ARMv6-J
changes from arm1136j-s to arm1136jf-s, which is currently used as
the default CPU for ARMv6; both are, in fact, ARMv6-J CPUs.
The J-bit (Jazelle support) is irrelevant to LLVM, and it doesn't
affect code generation, attributes, optimizations, or anything else,
apart from selecting the default CPU.
Reviewers: rengolin, logan, compnerd
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14755
llvm-svn: 253675