Commit Graph

21150 Commits

Author SHA1 Message Date
Matt Arsenault 581d13f8ae GlobalISel: Return APInt from getConstantVRegVal
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.

Also changes the full form, which returns the originally defined
vreg, and adds another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).

One possible issue with this change is that some of the existing uses
did break without conversion to getConstantVRegSExtVal, and it's possible
that some uses without adequate test coverage are now broken.
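For illustration, a minimal sketch of how a caller might migrate, assuming the post-change signatures implied by the message (the helper name here is hypothetical):

  #include "llvm/ADT/Optional.h"
  #include "llvm/CodeGen/GlobalISel/Utils.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"
  using namespace llvm;

  // Hypothetical helper: fetch a constant feeding Reg, falling back to a default.
  int64_t getImmOr(Register Reg, const MachineRegisterInfo &MRI, int64_t Default) {
    // Full form: now yields the constant as an APInt of the original width.
    if (Optional<APInt> Cst = getConstantVRegVal(Reg, MRI))
      return Cst->getSExtValue(); // explicit conversion where int64_t is still wanted
    // Alternatively, the convenience wrapper named in the message:
    //   Optional<int64_t> SCst = getConstantVRegSExtVal(Reg, MRI);
    return Default;
  }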
2020-12-22 22:23:58 -05:00
Craig Topper f47b07315a [X86] Teach assembler to accept vmsave/vmload/vmrun/invlpga/skinit with or without the fixed register operands
These instructions read their inputs from fixed registers rather
than using a modrm byte. We shouldn't require the user to list them
when parsing assembly. This matches the GNU assembler.

This patch adds InstAliases so we can accept either form. It also
changes the printing code to use the form without registers. This
will change the behavior of llvm-objdump, but should be consistent
with binutils objdump. This also matches what we already do in LLVM for
clzero and monitorx which also used fixed registers.

I need to add and improve tests before this can be committed. The
disassembler tests exist, but weren't checking the fixed registers,
so they pass before and after this change.

Fixes https://github.com/ClangBuiltLinux/linux/issues/1216

Differential Revision: https://reviews.llvm.org/D93524
2020-12-19 11:01:55 -08:00
Harald van Dijk adc55b5a5a [X86] Avoid generating invalid R_X86_64_GOTPCRELX relocations
We need to make sure not to emit R_X86_64_GOTPCRELX relocations for
instructions that use a REX prefix. If a REX prefix is present, we need to
instead use an R_X86_64_REX_GOTPCRELX relocation. The existing logic for
CALL64m, JMP64m, etc. already handles this by checking the HasREX parameter
and using it to determine which relocation type to use. Do this for all
instructions that can use relaxed relocations.
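A minimal sketch of the selection rule described above (the helper is hypothetical; the relocation enumerators are the ones from llvm/BinaryFormat/ELF.h):

  #include "llvm/BinaryFormat/ELF.h"

  // If the instruction carries a REX prefix, the relaxable GOT load must use the
  // REX variant of the relocation; otherwise the plain GOTPCRELX form is valid.
  static unsigned pickGOTPCRelReloc(bool HasREX) {
    return HasREX ? llvm::ELF::R_X86_64_REX_GOTPCRELX
                  : llvm::ELF::R_X86_64_GOTPCRELX;
  }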

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D93561
2020-12-18 23:38:38 +00:00
Simon Pilgrim 8767f3bb97 [X86][AVX] Remove X86ISD::SUBV_BROADCAST (PR38969)
Followup to D92645 - remove the remaining places where we create X86ISD::SUBV_BROADCAST, and fold splatted vector loads to X86ISD::SUBV_BROADCAST_LOAD instead.

Remove all the X86SubVBroadcast isel patterns, including all the fallbacks for cases where memory folding failed.
2020-12-18 15:49:53 +00:00
Simon Pilgrim 992fad03e2 [X86][AVX] Replace extract_subvector(broadcast(), 0) folds with generic SimplifyDemandedVectorEltsForTargetNode handling.
Simplifies a few more cases, notably shuffle demanded elts cases.
2020-12-18 11:51:10 +00:00
Simon Pilgrim 931e66bd89 [X86] Remove extract_subvector(subv_broadcast_load()) fold.
This was needed in an earlier version of D92645, but isn't now - and I've just noticed that it was potentially flawed depending on the relevant widths of the broadcasted and extracted subvectors.
2020-12-17 11:02:49 +00:00
Simon Pilgrim cdb692ee0c [X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969)
Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns.

This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts.

As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non-generic LoadSDNode nodes.

Later patches will continue to replace X86ISD::SUBV_BROADCAST usage.

Differential Revision: https://reviews.llvm.org/D92645
2020-12-17 10:25:25 +00:00
QingShan Zhang ebdd20f430 Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128
X86 and AArch64 expand these as libcalls inside the target, and PowerPC
also wants to expand them as libcalls for P8. So, propose an implementation
in the legalizer to share the logic and remove the code from X86/AArch64
to avoid duplication.

Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D91331
2020-12-17 07:59:30 +00:00
Harald van Dijk 09d0e7a7c1 [X86] Avoid %fs:(%eax) references in x32 mode
The ABI explains that %fs:(%eax) zero-extends %eax to 64 bits and adds
the TLS base address to it, but the TLS base address need not be
at the start of the TLS block; TLS references may use negative offsets.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D93158
2020-12-16 22:39:57 +00:00
Simon Pilgrim 553808d456 [X86] Rename reduction combiners to make it clearer what's happening. NFCI.
Since these are all working on reduction patterns, actually use that term in the function name to make them easier to search for.

At some point we're likely to start working with the ISD::VECREDUCE_* opcodes directly in the x86 backend, but that is still some way off.
2020-12-16 14:48:21 +00:00
Simon Pilgrim e55f7de946 [X86][SSE] combineReductionToHorizontal - don't rely on widenSubVector to handle illegal vector types.
Thanks to @asbirlea for reporting the bug.
2020-12-16 11:24:40 +00:00
Harald van Dijk 2aae2136d5 [X86] Add REX prefix for GOTTPOFF/TLSDESC relocs in x32 mode
The REX prefix is needed to allow linker relaxations: even if the
instruction we emit may not need it, the linker may change it to a
different instruction which does need it.
2020-12-15 23:07:34 +00:00
Simon Pilgrim 712117338a [X86] Explicitly use SDValue instead of auto. NFCI.
Fix static analyzer warning about not using an SDValue&
2020-12-15 17:27:25 +00:00
Simon Pilgrim b0e5aea557 [X86] Remove unnecessary SUBV_BROADCAST combines. NFCI.
Noticed while dealing with D92645 - these are now handled by getFauxShuffleMask + shuffle combining code.
2020-12-15 16:54:34 +00:00
Simon Pilgrim bd07092669 [X86] Remove trailing whitespace. NFC. 2020-12-15 10:11:38 +00:00
Simon Pilgrim 15a31389b2 [X86][AVX] LowerBUILD_VECTOR - reduce 256/512-bit build vectors with zero/undef upper elements + pad.
As discussed on D92645, we don't do a good job of recognising when we don't require the full width of a ymm/zmm build vector because the upper elements are undef/zero.

This commit allows us to make use of implicit zeroing of upper elements with AVX instructions, which we emulate in DAG with an INSERT_SUBVECTOR into the bottom of an undef/zero vector of the original type.

This exposed a limitation in getTargetConstantBitsFromNode which didn't extract bits from INSERT_SUBVECTORs of different element widths which I've included as well to prevent a couple of regressions.
2020-12-15 10:11:38 +00:00
Harald van Dijk 9eac818370 [X86] Fix variadic argument handling for x32
The X86-64 ABI defines va_list as

  typedef struct {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
  } va_list[1];

This means the size, alignment, and reg_save_area offset will depend on
whether we are in LP64 or in ILP32 mode, so this commit adds the checks.
Additionally, the VAARG_64 pseudo-instruction assumed 64-bit pointers, so
this commit adds a VAARG_X32 pseudo-instruction that behaves just like
VAARG_64, except for assuming 32-bit pointers.
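For illustration, the layout differences this implies (my arithmetic from the struct above, not taken from the patch):

  // LP64  (8-byte pointers): sizeof == 24, align == 8,
  //   offsetof(overflow_arg_area) == 8, offsetof(reg_save_area) == 16
  // ILP32 (4-byte pointers): sizeof == 16, align == 4,
  //   offsetof(overflow_arg_area) == 8, offsetof(reg_save_area) == 12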

Some of these changes were originally done by
Michael Liao <michael.hliao@gmail.com>.

Fixes https://bugs.llvm.org/show_bug.cgi?id=48428.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D93160
2020-12-14 23:47:27 +00:00
Simon Pilgrim 5f5a2547c1 [X86] LowerBUILD_VECTOR - track zero/nonzero elements with APInt masks. NFCI.
Prep work for undef/zero 'upper elements' handling as proposed in D92645.
2020-12-14 16:28:45 +00:00
Kazu Hirata 913515e465 [Target] Use llvm::is_contained (NFC) 2020-12-13 19:35:10 -08:00
Craig Topper 0261ce9e17 [X86] Add ExeDomain = SSEPackedSingle to cvtss2sd and cvtsd2ss instructions.
Prep for D92993
2020-12-13 12:35:33 -08:00
Craig Topper fa31f337a2 [X86] Add isel patterns to form VPDPWSSD from (add (vpmaddwd X, Y), Z) when AVXVNNI is enabled.
We already have these patterns for AVX512VNNI.
2020-12-13 12:02:07 -08:00
Simon Pilgrim d5c434d7dd [X86][SSE] combineX86ShufflesRecursively - add basic handling for combining shuffles of different widths (PR45974)
If a faux shuffle uses smaller shuffle inputs, try to recursively combine with those inputs directly instead of widening them immediately. Then widen all smaller inputs at the bottom of the recursion.

This will still mean we're generating nodes on the fly (PR45974) even if we don't combine to a new shuffle but it does help AVX2+ targets combine across xmm/ymm/zmm types, mainly as variable shuffles.
2020-12-13 17:18:07 +00:00
Simon Pilgrim 47321c311b [X86][SSE] combineReductionToHorizontal - add vXi8 ISD::MUL reduction handling (PR39709)
Default expansion leads to repeated extensions/truncations to/from vXi16 which shuffle combining and demanded elts can't completely unravel.

Better just to promote (any_extend) the input and perform a vXi16 reduction.

We'll be able to remove a lot of this if we ever get decent legalization support for reduction intrinsics in SelectionDAG.
2020-12-13 15:22:54 +00:00
Chris Sears 36a23b33aa X86: Correcting X86OutgoingValueHandler typo (NFC)
https://reviews.llvm.org/D92631
2020-12-12 20:28:37 -05:00
Harald van Dijk f61e5ecb91 [X86] Avoid data16 prefix for lea in x32 mode
The ABI demands a data16 prefix for lea in 64-bit LP64 mode, but not in
64-bit ILP32 mode. In both modes this prefix would ordinarily be
ignored, but the instructions may be changed by the linker to
instructions that are affected by the prefix.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D93157
2020-12-12 17:05:24 +00:00
Luo, Yuanke e52bc1d2bb [X86] Add chain in ISel for x86_tdpbssd_internal intrinsic. 2020-12-12 21:14:38 +08:00
Sanjay Patel 204bdc5322 [InstCombine][x86] fix insertion point bug in vector demanded elts fold (PR48476)
This transform was added at:
c63799fc52

From what I see, it's the first demanded elements transform that adds
a new instruction using the IRBuilder. There are similar folds in
the generic demanded bits chunk of instcombine that also use the
InsertPointGuard code pattern.

The tests here would assert/crash because the new instruction was
being added at the start of the demanded elements analysis rather
than at the instruction that is being replaced.
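A hedged sketch of the InsertPointGuard pattern being referred to (the function and its arguments are illustrative, not the actual InstCombine code):

  #include "llvm/IR/IRBuilder.h"
  using namespace llvm;

  static Value *buildReplacementAt(IRBuilderBase &Builder, Instruction *OrigInst,
                                   Value *Src, ArrayRef<int> Mask) {
    IRBuilderBase::InsertPointGuard Guard(Builder); // restores the old point on scope exit
    Builder.SetInsertPoint(OrigInst); // insert at the instruction being replaced,
                                      // not wherever the analysis happened to start
    return Builder.CreateShuffleVector(Src, Mask);
  }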
2020-12-11 17:23:35 -05:00
Luo, Yuanke f80b29878b [X86] AMX programming model.
This patch implements the AMX programming model that was discussed on llvm-dev
(http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
Thanks to Hal for the good suggestion on the RA. The fast RA is not in the patch yet.
This patch implements 7 components.

1. The C interface to the end user.
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
   type into two <128 x i32>.
4. The lowering from AMX intrinsics to AMX pseudo instructions.
5. Insert the pseudo ldtilecfg and build the def-use chain between ldtilecfg
   and the AMX instructions.
6. The register allocation for tile registers.
7. Morph AMX pseudo instructions to real AMX instructions.

Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0

Differential Revision: https://reviews.llvm.org/D87981
2020-12-10 17:01:54 +08:00
Saleem Abdulrasool ee74d1b420 X86: use a data driven configuration of Windows x86 libcalls (NFC)
Rather than creating a series of associated calls and ensuring that
everything is lined up, use a table-driven approach that ensures that
the two always stay in sync.
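A minimal sketch of the table-driven idea, assuming a row type along these lines (the struct and table here are illustrative, not the actual ones in the patch):

  #include "llvm/CodeGen/RuntimeLibcalls.h"
  #include "llvm/IR/CallingConv.h"

  // Keeping each libcall's name and calling convention in one row means the two
  // cannot drift apart the way separate setLibcallName/setLibcallCallingConv calls can.
  struct WinLibcall {
    llvm::RTLIB::Libcall Op;
    const char *Name;
    llvm::CallingConv::ID CC;
  };
  static const WinLibcall Win32Libcalls[] = {
      {llvm::RTLIB::SDIV_I64, "_alldiv", llvm::CallingConv::X86_StdCall},
      {llvm::RTLIB::UDIV_I64, "_aulldiv", llvm::CallingConv::X86_StdCall},
  };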
2020-12-09 22:49:11 +00:00
Craig Topper 5ff5cf8e05 [X86] Use APInt::isSignedIntN instead of isIntN for 64-bit ANDs in X86DAGToDAGISel::IsProfitableToFold
Pretty sure we meant to be checking signed 32-bit immediates here
rather than unsigned 32-bit. I suspect I messed this up because
in MathExtras.h we have isIntN and isUIntN, so isIntN differs in
signedness depending on whether you're using APInt or plain integers.
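A standalone illustration of the distinction (not the patched code):

  #include "llvm/ADT/APInt.h"
  using namespace llvm;

  void example() {
    // 0x80000000 fits in 32 bits as an unsigned value, but as a signed 32-bit
    // immediate it would sign-extend to 0xFFFFFFFF80000000 - a different value.
    APInt V(64, 0x80000000ULL);
    bool FitsUnsigned = V.isIntN(32);       // true
    bool FitsSigned   = V.isSignedIntN(32); // false - the check the fold actually needs
    (void)FitsUnsigned;
    (void)FitsSigned;
  }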

This fixes a case where we didn't fold a constant created
by shrinkAndImmediate. Since shrinkAndImmediate doesn't topologically
sort constants it creates, we can fail to convert the Constant
to a TargetConstant. This leads to very strange behavior later.

Fixes PR48458.
2020-12-09 13:39:07 -08:00
Simon Pilgrim 24184dbb82 [X86] Fold CONCAT(VPERMV3(X,Y,M0),VPERMV3(Z,W,M1)) -> VPERMV3(CONCAT(X,Z),CONCAT(Y,W),CONCAT(M0,M1))
Further prep work toward supporting different subvector sizes in combineX86ShufflesRecursively
2020-12-09 14:29:32 +00:00
Kerry McLaughlin 4519ff4b6f [SVE][CodeGen] Add the ExtensionType flag to MGATHER
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91084
2020-12-09 11:19:08 +00:00
Harald van Dijk 29c8ea6f1a [X86] Handle localdynamic TLS model in x32 mode
D92346 added TLS_(base_)addrX32 to handle TLS in x32 mode, but missed the
different TLS models. This diff fixes the logic for the local dynamic model
where `RAX` was used when `EAX` should be, and extends the tests to cover
all four TLS models.

Fixes https://bugs.llvm.org/show_bug.cgi?id=26472.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92737
2020-12-08 21:06:00 +00:00
Tim Northover c5978f42ec UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
2020-12-08 10:28:26 +00:00
Simon Pilgrim c86c024e10 [X86] Fix static analyzer warnings. NFCI.
Replace '|' with '||' in condition, and fix case of SignedMode variable.
2020-12-07 18:23:55 +00:00
Fangrui Song 9fe1809f8c [X86] Delete 3 unused declarations 2020-12-06 15:13:39 -08:00
Simon Pilgrim 0101fb73de [X86] Fold MOVMSK(ICMP_SGT(X,-1)) -> NOT(MOVMSK(X)))
Noticed while triaging PR37506
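A scalar model of why the fold holds, as a quick standalone check (illustration only, not LLVM code):

  #include <cassert>
  #include <cstdint>

  // movmsk collects the sign bit of each lane, and (x > -1) is true exactly when
  // the sign bit is clear, so movmsk(x > -1) == ~movmsk(x) within the lane mask.
  int main() {
    int32_t X[4] = {5, -1, 0, INT32_MIN};
    unsigned MskCmp = 0, MskX = 0;
    for (unsigned I = 0; I != 4; ++I) {
      MskCmp |= unsigned(X[I] > -1) << I;    // movmsk of the sgt result
      MskX   |= (uint32_t(X[I]) >> 31) << I; // movmsk of X (sign bits)
    }
    assert(MskCmp == (~MskX & 0xFu));
  }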
2020-12-06 17:56:41 +00:00
Layton Kifer ac522f8700 [DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1)
Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.
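A quick scalar check of the identity for both values of the i1 input (illustration only, not LLVM code):

  #include <cassert>
  #include <cstdint>

  int main() {
    for (uint64_t X : {0ULL, 1ULL}) {
      int64_t SextNot = (X ^ 1) ? -1 : 0; // sext(not i1 X)
      int64_t ZextAdd = int64_t(X) - 1;   // add(zext i1 X, -1)
      assert(SextNot == ZextAdd);
    }
  }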

Differential Revision: https://reviews.llvm.org/D91589
2020-12-06 11:52:10 -05:00
Simon Pilgrim db900995ed [CostModel][X86] getGatherScatterOpCost - use default implementation for alt costkinds
Noticed while looking at D92701 - we only really handle TCK_RecipThroughput gather/scatter costs - for now drop back to the default implementation for non-legal gathers/scatters.
2020-12-06 14:08:26 +00:00
Fangrui Song 687b83ceab [X86FastISel] Fix MO_GOTPCREL GlobalValue reference in static relocation model
This fixes the bug referenced by 5582a79876
which was exposed by 961f31d8ad.

With this change, `movq src@GOTPCREL, %rcx` => `movq src@GOTPCREL(%rip), %rcx`
2020-12-05 23:13:28 -08:00
Fangrui Song a084c0388e [TargetMachine] Don't imply dso_local on function declarations in Reloc::Static model for ELF/wasm
clang/lib/CodeGen/CodeGenModule sets dso_local on applicable function declarations,
we don't need to duplicate the work in TargetMachine::shouldAssumeDSOLocal.
(Actually the long-term goal (started by r324535) is to drop TargetMachine::shouldAssumeDSOLocal.)

By not implying dso_local, we will respect dso_local/dso_preemptable specifiers
set by the frontend. This allows the proposed -fno-direct-access-external-data
option to work with -fno-pic and prevent a canonical PLT entry (SHN_UNDEF with non-zero st_value)
when taking the address of a function symbol.

This patch should be NFC in terms of the Clang emitted assembly because the case
we don't set dso_local is a case Clang sets dso_local. However, some tests don't
set dso_local on some function declarations and expose some differences. Most
tests have been fixed to be more robust in the previous commit.
2020-12-05 14:54:37 -08:00
Fangrui Song 37f0c8df47 [X86] Emit @PLT for x86-64 and keep unadorned symbols for x86-32
This essentially reverts the x86-64 side effect of r327198.

For x86-32, @PLT (R_386_PLT32) is not suitable in -fno-pic mode so the
code forces MO_NO_FLAG (like a forced dso_local) (https://bugs.llvm.org//show_bug.cgi?id=36674#c6).

For x86-64, both `call/jmp foo` and `call/jmp foo@PLT` emit R_X86_64_PLT32
(https://sourceware.org/bugzilla/show_bug.cgi?id=22791) so there is no
difference using @PLT. Using @PLT is actually favorable because this drops
a difference with -fpie/-fpic code and makes it possible to avoid a canonical
PLT entry when taking the address of an undefined function symbol.
2020-12-05 13:17:47 -08:00
Fangrui Song db13a138bd [TargetMachine] Move X86 specific shouldAssumeDSOLocal logic to X86Subtarget::classifyGlobalFunctionReference 2020-12-05 12:32:50 -08:00
Simon Pilgrim b96a521077 [X86] LowerRotate - enable custom lowering of ROTL/ROTR vXi16 on VBMI2 targets. 2020-12-04 12:16:59 +00:00
Simon Pilgrim d073805be6 [X86] LowerRotate - VBMI2 targets can lower vXi16 rotates using funnel shifts.
Ideally we'd do this inside DAGCombine but until we can make the FSHL/FSHR opcodes legal for VBMI2 it won't help us.
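A scalar model of the rewrite: a rotate is just a funnel shift with both inputs equal, i.e. rotl(x, s) == fshl(x, x, s) (illustration only, not LLVM code):

  #include <cassert>
  #include <cstdint>

  int main() {
    uint16_t X = 0xBEEF;
    for (unsigned S = 0; S < 16; ++S) {
      uint16_t Rot = uint16_t((X << S) | (X >> ((16 - S) & 15)));
      // fshl(X, X, S): concatenate X:X into 32 bits, shift left, keep the high half.
      uint16_t Fsh = uint16_t(((uint32_t(X) << 16 | X) << S) >> 16);
      assert(Rot == Fsh);
    }
  }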
2020-12-04 11:29:23 +00:00
Simon Pilgrim df1ddc4234 [X86] Let VBMI2 non-VLX targets still use funnel shift instructions 2020-12-04 11:06:43 +00:00
Simon Pilgrim 8eedd18fcb [X86] Remove unnecessary bitcast. NFC.
The X86ISD::SUBV_BROADCAST node is already of type VT.
2020-12-04 09:44:57 +00:00
Xiang1 Zhang f2e2924463 [X86] Unbind the ebx with GOT address in regcall calling convention
No register can be allocated for an indirect call when it uses the regcall
calling convention and passes 5 or more args.
For example:
call vreg (ag1, ag2, ag3, ag4, ag5, ...) --> 5 regs (EAX, ECX, EDX, ESI, EDI)
are used to pass args and 1 reg (EBX) is used to hold the GOT pointer, so no
regs can be allocated to vreg.

The Intel386 architecture provides 8 general-purpose 32-bit registers. RA
mostly uses 6 of them (EAX, EBX, ECX, EDX, ESI, EDI). 5 of these regs can be
used to pass function arguments (EAX, ECX, EDX, ESI, EDI).
EBX is used to hold the GOT pointer when making function calls via the PLT.
ESP and EBP are usually "reserved" in register allocation.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D91020
2020-12-04 10:00:13 +08:00
Kazu Hirata a333071754 [X86] Remove DecodeVPERMVMask and DecodeVPERMV3Mask
This patch removes the variants of DecodeVPERMVMask and
DecodeVPERMV3Mask that take "const Constant *C" as they are not used
anymore.

They were introduced on Sep 8, 2015 in commit
e88038f235.

The last use of DecodeVPERMVMask(const Constant *C, ...)  was removed
on Feb 7, 2016 in commit 73fc26b44a.

The last use of DecodeVPERMV3Mask(const Constant *C, ...) was removed
on May 28, 2018 in commit dcfcfdb0d1.

Differential Revision: https://reviews.llvm.org/D91926
2020-12-03 09:12:02 -08:00
Harald van Dijk c9be4ef184 [X86] Add TLS_(base_)addrX32 for X32 mode
LLVM has TLS_(base_)addr32 for 32-bit TLS addresses in 32-bit mode, and
TLS_(base_)addr64 for 64-bit TLS addresses in 64-bit mode. x32 mode wants 32-bit
TLS addresses in 64-bit mode, which were not yet handled. This adds
TLS_(base_)addrX32 as copies of TLS_(base_)addr64, except that they use
tls32(base)addr rather than tls64(base)addr, and then restricts
TLS_(base_)addr64 to 64-bit LP64 mode, TLS_(base_)addrX32 to 64-bit ILP32 mode.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92346
2020-12-02 22:20:36 +00:00