11405 Commits

Author SHA1 Message Date
Craig Topper 274e08dd81 [X86] Reject registers that require a REX prefix in inline asm constraints in 32-bit mode
We don't currently reject r8-r15 or xmm8-31 or bpl/spl/sil/dil in 32-bit mode.

Differential Revision: https://reviews.llvm.org/D44031

llvm-svn: 326826
2018-03-06 18:56:33 +00:00
Martin Storsjo a7adc3185b [X86] Handle EAX being live when calling chkstk for x86_64
EAX can turn out to be alive here, when shrink wrapping is done
(which is allowed when using dwarf exceptions, contrary to the
normal case with WinCFI).

This fixes PR36487.

Differential Revision: https://reviews.llvm.org/D43968

llvm-svn: 326764
2018-03-06 06:00:13 +00:00
Sanjay Patel 77ae82b84a [x86] auto-generate full checks for fabs tests
Also, change the x86-64 test to optimized and remove the
unnecessary platform specification from the RUN lines.

llvm-svn: 326735
2018-03-05 19:11:20 +00:00
Craig Topper f2aae62228 [X86] Add a DAG combine to turn stores of vXi1 constants into scalar stores.
llvm-svn: 326679
2018-03-04 19:33:15 +00:00
Craig Topper 1209eb7d66 [X86] Add a 32-bit mode command line to avx512-mask-op.ll. Add tests for storing v2i1 and v4i1 constants.
llvm-svn: 326678
2018-03-04 19:33:13 +00:00
Craig Topper 4196dd12a2 [DAGCombiner] Add a peekThroughBitcast to MergeStoresOfConstantsOrVecElts to fix a crash if we are storing a bitcast of a constant.
Loading a constant into a k-register in AVX512 requires a bitcast from a scalar constant. In the test case here we have a k-register store that gets split into multiple parts on KNL. MergeConsecutiveStores sees each of these pieces as a consecutive store and looks through the bitcast to find the underlying scalar constant. But when we went to create the combined store we didn't look through the same bitcast.

llvm-svn: 326677
2018-03-04 18:51:46 +00:00
Simon Pilgrim 8197b04b9b [X86][X87] Add X87 folded integer arithmetic tests
Add tests for FIADD/FISUB/FISUBR/FIMUL/FIDIV/FIDIVR

Shows we have more FILD stack usage than necessary (arg load, spill, reload to x87)

llvm-svn: 326674
2018-03-04 15:00:19 +00:00
Craig Topper a476026f70 [X86] Combine (store (v1i1 (scalar_to_vector (i8 X)))) -> (store (i8 X)).
llvm-svn: 326670
2018-03-04 01:48:02 +00:00
Craig Topper be31585be8 [X86] Lower v1i1/v2i1/v4i1/v8i1 load/stores to i8 load/store during op legalization if AVX512DQ is not supported.
We were previously doing this with isel patterns. Moving it to op legalization gives us a chance to see the required bitcast earlier. And it lets us remove some isel patterns.

llvm-svn: 326669
2018-03-04 01:48:00 +00:00
Craig Topper dbf75c9c79 [LegalizeVectorTypes] When scalarizing the operand of a unary op like TRUNC, use a SCALAR_TO_VECTOR rather than a single element BUILD_VECTOR to convert back to a vector type.
X86 considers v1i1 a legal type under AVX512 and as such a truncate from a v1iX type to v1i1 can be turned into a scalar truncate plus a conversion to v1i1. We would much prefer a v1i1 SCALAR_TO_VECTOR over a one element BUILD_VECTOR.

During lowering we were detecting the v1i1 BUILD_VECTOR as a splat BUILD_VECTOR like we try to do for v2i1/v4i1/etc. In this case we create (select i1 splat_elt, v1i1 all-ones, v1i1 all-zeroes). That goes through some more legalization and we end up with a CMOV choosing between 0 and 1 in scalar and a scalar_to_vector.

Arguably we could detect the v1i1 BUILD_VECTOR and do this better in X86 target code. But just using a SCALAR_TO_VECTOR in legalization is much easier.

llvm-svn: 326637
2018-03-02 23:27:50 +00:00
Simon Pilgrim 8cbc1d232b [X86][BTVER2] Fix throughput of YMM bitwise instructions
These instructions are double-pumped, split into 2 128-bit ops and then passing through either FPU pipe.

Found while testing llvm-mca (D43951)

llvm-svn: 326597
2018-03-02 18:20:35 +00:00
Craig Topper 6b1419b547 [X86] Reject xmm16-31 in inline asm constraints when AVX512 is disabled
Fixes PR36532

Differential Revision: https://reviews.llvm.org/D43960

llvm-svn: 326596
2018-03-02 18:19:40 +00:00
Derek Schuff 57feeed307 [X86][x32] Save callee-save register used as base pointer for x32 ABI
For the x32 ABI, since the base pointer register (EBX) is a callee save register
it should be saved before use.

This fixes https://bugs.llvm.org/show_bug.cgi?id=36011

Differential Revision: https://reviews.llvm.org/D42358

Patch by Pratik Bhatu

llvm-svn: 326593
2018-03-02 17:46:39 +00:00
Clement Courbet c6638c813b [MergeICmps] Revert 324317 "Enable the MergeICmps Pass by default."
While working on PR36557.

llvm-svn: 326575
2018-03-02 14:34:49 +00:00
Craig Topper e7ca6f5456 [DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors.
Masking first prevents the extend from being combined with loads. It's also interfering with some vXi1 extraction code.

Differential Revision: https://reviews.llvm.org/D42679

llvm-svn: 326500
2018-03-01 22:32:25 +00:00
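To illustrate the zero_extend-of-truncate combine touched by e7ca6f5456 above, here is a minimal scalar sketch in plain C++, not LLVM code. It only demonstrates that masking before or after widening gives the same value as truncating and zero-extending. The function names and the choice of i32/i8/i64 widths are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: truncate i32 -> i8, then zero-extend i8 -> i64.
constexpr std::uint64_t zext_of_trunc(std::uint32_t x) {
    return static_cast<std::uint64_t>(static_cast<std::uint8_t>(x));
}

// Mask first, then widen: roughly (zext (and x, 0xFF)).
constexpr std::uint64_t mask_then_extend(std::uint32_t x) {
    return static_cast<std::uint64_t>(x & 0xFFu);
}

// Widen first, then mask: roughly (and (ext x), 0xFF).
constexpr std::uint64_t extend_then_mask(std::uint32_t x) {
    return static_cast<std::uint64_t>(x) & 0xFFu;
}

// Compare both forms against the reference over a spread of inputs.
inline void check_zext_trunc_forms() {
    for (std::uint32_t i = 0; i < 100000u; ++i) {
        const std::uint32_t x = i * 2654435761u;  // scatter bits across the word
        assert(mask_then_extend(x) == zext_of_trunc(x));
        assert(extend_then_mask(x) == zext_of_trunc(x));
    }
}
```

Both forms compute the same value. The commit message's concern is which form leaves the extend free to be combined with a load, and that trade-off is not modeled here.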
Simon Pilgrim 90fd0622b6 [X86][MMX] Improve handling of 64-bit MMX constants
64-bit MMX constant generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type.

This patch bitcasts the constant to a double value to allow correct loading directly to the MMX register.

I've added MMX constant asm comment support to improve testing. It's better to always print the double values as hex constants, as MMX is mainly an integer unit (and even with 3DNow! it's just floats).

Differential Revision: https://reviews.llvm.org/D43616

llvm-svn: 326497
2018-03-01 22:22:31 +00:00
Craig Topper eedfbc4ab7 [SelectionDAG] Support some SimplifySetCC cases for comparing against vector splats of constants.
This supports things like

(setcc ugt X, 0) -> (setcc ne X, 0)

I've restricted to only make changes to vectors before legalize ops because I doubt all targets have accurate condition code legality information for vectors given how little we did before.

Differential Revision: https://reviews.llvm.org/D42948

llvm-svn: 326495
2018-03-01 22:15:39 +00:00
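The rewrite quoted in eedfbc4ab7 above, (setcc ugt X, 0) -> (setcc ne X, 0), rests on a simple fact about unsigned integers: a value is greater than zero exactly when it is not zero. The sketch below checks that per element on 16-bit values. It is plain C++ with hypothetical function names, not the SimplifySetCC code, and it treats a single lane only. For a vector compared against a splat of zero the same reasoning would apply lane by lane.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned "greater than zero", the original form of the comparison.
constexpr bool ugt_zero(std::uint16_t x) { return x > 0u; }

// "Not equal to zero", the simplified form.
constexpr bool ne_zero(std::uint16_t x) { return x != 0u; }

// Exhaustively confirm the two predicates agree for every 16-bit value.
inline void check_ugt_equals_ne() {
    for (std::uint32_t i = 0; i <= 0xFFFFu; ++i) {
        const std::uint16_t x = static_cast<std::uint16_t>(i);
        assert(ugt_zero(x) == ne_zero(x));
    }
}
```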
Simon Pilgrim e57167fab6 [X86][AVX] Add v2f32 <-> v2i8/v2i16/v2i32 vector tests
llvm-svn: 326494
2018-03-01 22:05:40 +00:00
Simon Pilgrim c6f4a6a020 [X86][SSE] Regenerate float to/from i8/i16 vector tests
llvm-svn: 326488
2018-03-01 21:21:30 +00:00
Simon Pilgrim 94ea374b18 [X86][SSE] Regenerate odd sized sext/zext tests
llvm-svn: 326484
2018-03-01 21:13:26 +00:00
Than McIntosh b3d88a7466 [CodeGen] fix argument attribute in lowering statepoint/patchpoint
Summary:
Use the correct loop index variable, ArgI, to retrieve attributes.

Reviewers: thanm, sanjoy, rnk

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43832

llvm-svn: 326433
2018-03-01 13:31:57 +00:00
Craig Topper ccfa5257a6 [X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions.
This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor.

Fixes PR36553.

llvm-svn: 326393
2018-03-01 00:08:38 +00:00
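The "lib call and an xor" fallback mentioned in ccfa5257a6 above can be pictured in ordinary C++: call the library fma, then negate the result by flipping the IEEE-754 sign bit. This is only a sketch of that idea under the assumption of 32-bit IEEE-754 floats. The helper names are hypothetical and it says nothing about the instructions LLVM actually emits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Negate a float by XOR-ing its sign bit (assumes 32-bit IEEE-754 float).
inline float flip_sign_bit(float v) {
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    bits ^= 0x80000000u;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// fneg(fma(x, y, z)) expressed as a library call followed by an xor.
inline float neg_fma(float x, float y, float z) {
    return flip_sign_bit(std::fma(x, y, z));
}
```

For example, neg_fma(2.0f, 3.0f, 1.0f) evaluates 2 * 3 + 1 and then flips the sign, giving -7.0f.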
Simon Pilgrim c38756a992 [X86] Regenerate cmpxchg tests
Add 64-bit cmpxchg8b tests

llvm-svn: 326380
2018-02-28 22:57:23 +00:00
Craig Topper e31b9d1e5f [X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating.
This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns.

llvm-svn: 326375
2018-02-28 22:23:55 +00:00
Simon Pilgrim 72b86586b0 [X86][AVX512] Improve support for signed saturation truncation stores
Matches what we already manage for unsigned saturation truncation stores

Differential Revision: https://reviews.llvm.org/D43629

llvm-svn: 326372
2018-02-28 21:42:19 +00:00
Chih-Hung Hsieh 9f9e4681ac [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
a missing -emulated-tls flag would generate wrong TLS code for targets
that support only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00
Alexander Ivchenko c01f750480 [GlobalIsel][X86] Support G_INTTOPTR instruction.
Add legalization/selection for x86/x86_64 and
corresponding tests.

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D43622

llvm-svn: 326320
2018-02-28 12:11:53 +00:00
Alexander Ivchenko 46e07e3623 [GlobalIsel][X86] Support G_PTRTOINT instruction.
Add legalization/selection for x86/x86_64 and
corresponding tests.

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D43617

llvm-svn: 326311
2018-02-28 09:18:47 +00:00
Craig Topper 48d5ed265c [X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of return.
An extract_element where the result type is larger than the scalar element type is semantically an any_extend from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32 we need to make sure those zeroes are explicit in the DAG.

For these cases the best way to accomplish this is to use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1 (for i32) or v8i1 (for i8). Then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smart enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register.

llvm-svn: 326308
2018-02-28 08:14:28 +00:00
Geoff Berry a2b9011290 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Re-enable commit r323991 now that r325931 has been committed to make
MachineOperand::isRenamable() check more conservative w.r.t. code
changes and opt-in on a per-target basis.

llvm-svn: 326208
2018-02-27 16:59:10 +00:00
Simon Pilgrim ba43ec8702 [X86][AVX] combineLoopMAddPattern - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply
llvm-svn: 326189
2018-02-27 12:20:37 +00:00
Craig Topper e5d39e42b9 [X86] Add constant folding to combineMOVMSK.
There are still some shortcomings in our ability to combine binops of constants with different sizes separated by an extend. I'll try to look at that next.

llvm-svn: 326128
2018-02-26 21:17:33 +00:00
Craig Topper 5e0ceb8865 [X86] Add a custom legalization for (i16 (bitcast v16i1)) and (i32 (bitcast v32i1)) without AVX512 to prevent scalarization
Summary:
We have an early DAG combine to turn these patterns into MOVMSK, but that combine doesn't work if the vXi1 type has more elements than the widest legal vXi8 type. Type legalization will eventually split it down to v16i1 or v32i1 and then the bitcast gets legalized to a truncstore and a scalar load. The truncstore will get lowered to a series of extracts and bit math.

This patch adds a custom legalization to use a sign extend and MOVMSK instead. This prevents the eventual scalarization.

Reviewers: spatel, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D43593

llvm-svn: 326119
2018-02-26 20:32:27 +00:00
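The "sign extend and MOVMSK" approach described in 5e0ceb8865 above can be emulated in scalar C++ to see why it matches a bitcast from v16i1 to i16. Sign-extending each 1-bit lane gives a byte of 0x00 or 0xFF, and a byte movemask then collects the top bit of each byte into one bit of the result. The sketch uses std::array stand-ins for the vector types, hypothetical names throughout, and assumes lane i maps to bit i. It is not the legalization code itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Mask16 = std::array<bool, 16>;           // stands in for v16i1
using Bytes16 = std::array<std::uint8_t, 16>;  // stands in for v16i8

// Sign-extend each i1 lane to i8: true -> 0xFF, false -> 0x00.
inline Bytes16 sign_extend_lanes(const Mask16& mask) {
    Bytes16 out{};
    for (std::size_t i = 0; i < mask.size(); ++i)
        out[i] = mask[i] ? 0xFF : 0x00;
    return out;
}

// Emulate a byte movemask: bit i of the result is the top bit of byte i.
inline std::uint16_t movmsk_bytes(const Bytes16& v) {
    std::uint16_t result = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        result = static_cast<std::uint16_t>(result | ((v[i] >> 7) << i));
    return result;
}

// Build a 16-lane mask whose lane i is bit i of `bits`.
inline Mask16 mask_from_bits(std::uint16_t bits) {
    Mask16 m{};
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] = ((bits >> i) & 1u) != 0;
    return m;
}

// For every possible mask, sext + movmsk must reproduce the original bits.
inline void check_all_masks() {
    for (std::uint32_t bits = 0; bits <= 0xFFFFu; ++bits) {
        const Mask16 m = mask_from_bits(static_cast<std::uint16_t>(bits));
        assert(movmsk_bytes(sign_extend_lanes(m)) == bits);
    }
}
```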
Simon Pilgrim db0ed7d724 [X86][AVX] createPSADBW - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply
llvm-svn: 326104
2018-02-26 18:17:25 +00:00
Simon Pilgrim 2f0aab9209 [X86][AVX] Add AVX1 PSAD tests
Cleanup check-prefixes to share more AVX/AVX512 codegen checks

llvm-svn: 326097
2018-02-26 15:55:25 +00:00
Simon Pilgrim 98fcd2eb27 [X86][SSE] Regenerate PSAD tests
Fixes scary typo in a check that lost the end digit off a reg#...

llvm-svn: 326093
2018-02-26 15:21:58 +00:00
Craig Topper f340a0e8c0 [X86] Add avx1 command line to madd.ll to show splitting and concatenating 256-bit operations.
llvm-svn: 326068
2018-02-26 07:48:17 +00:00
Craig Topper 5c980eba47 [X86] Don't use getZExtValue when we have no idea how large the input elements are.
llvm-svn: 326066
2018-02-26 04:43:24 +00:00
Craig Topper 79d189f597 [X86] Remove VT.isSimple() check from detectAVGPattern.
Which types are considered 'simple' is a function of the requirements of all targets that LLVM supports. That shouldn't directly affect what types we are able to handle. The remainder of this code checks that the number of elements is a power of 2 and takes care of splitting down to a legal size.

llvm-svn: 326063
2018-02-26 02:16:31 +00:00
Simon Pilgrim 295e8b4e12 [TargetLowering] SimplifyDemandedVectorElts - pass demanded elts through ADD/SUB ops
llvm-svn: 326044
2018-02-24 20:59:14 +00:00
Simon Pilgrim c0dbdb86c3 [TargetLowering] SimplifyDemandedVectorElts - pass demanded elts through TRUNCATE ops
llvm-svn: 326043
2018-02-24 19:28:34 +00:00
Craig Topper a6f8100788 [X86] Add cvt tests to avx512vl-intrinsics-fast-isel.ll
llvm-svn: 326042
2018-02-24 18:58:08 +00:00
Craig Topper 81c0eaf4c8 [X86] Allow int_x86_sse2_cvtps2dq and int_x86_avx_cvt_ps2dq_256 to select EVEX encoded instructions.
llvm-svn: 326041
2018-02-24 18:58:07 +00:00
Simon Pilgrim a4fb569483 [X86][SSE] combineSubToSubus - support v8i64 handling from SSSE3
Our UMIN/UMAX, vector truncation and shuffle combining is good enough to efficiently handle v8i64 with the number of leading zeros that are necessary for PSUBUS.

llvm-svn: 326034
2018-02-24 14:06:39 +00:00
Simon Pilgrim 8ad91261e8 [X86][SSE] combineSubToSubus - support v8i32 handling from SSSE3 (not SSE41)
Now that UMIN etc are Legal/Custom for SSE2+, we can efficiently match SUBUS v8i32 cases from SSSE3 which can perform efficient truncation with PSHUFB.

llvm-svn: 326033
2018-02-24 13:39:13 +00:00
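The combineSubToSubus commits above rely on the link between UMIN/UMAX and unsigned saturating subtraction (PSUBUS). A minimal scalar sketch of that arithmetic follows: subtracting b from umax(a, b), or subtracting umin(a, b) from a, gives the same result as a saturating a - b. The function names are hypothetical, and whether these are exactly the forms the combine matches is an assumption. The code only verifies the identities on 8-bit values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Unsigned saturating subtract: clamps at zero instead of wrapping.
constexpr std::uint8_t usubsat(std::uint8_t a, std::uint8_t b) {
    return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}

// sub(umax(a, b), b)
constexpr std::uint8_t sub_of_umax(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(std::max(a, b) - b);
}

// sub(a, umin(a, b))
constexpr std::uint8_t sub_of_umin(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a - std::min(a, b));
}

// Exhaustively compare both forms against usubsat for all 8-bit pairs.
inline void check_subus_identities() {
    for (unsigned a = 0; a <= 0xFFu; ++a) {
        for (unsigned b = 0; b <= 0xFFu; ++b) {
            const std::uint8_t x = static_cast<std::uint8_t>(a);
            const std::uint8_t y = static_cast<std::uint8_t>(b);
            assert(sub_of_umax(x, y) == usubsat(x, y));
            assert(sub_of_umin(x, y) == usubsat(x, y));
        }
    }
}
```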
Simon Pilgrim 744f008a75 [X86][SSE] combineSubToSubus - begun generalizing to work with any type sizes with SplitBinaryOpsAndApply
llvm-svn: 326030
2018-02-24 12:44:12 +00:00
Craig Topper 7bcac492d4 [X86] Remove checks for '(scalar_to_vector (i8 (trunc GR32:)))' from scalar masked move patterns.
This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from.

llvm-svn: 325999
2018-02-24 00:15:05 +00:00
Scott Linder 16c7bdaf32 [DebugInfo] Support DWARF v5 source code embedding extension
In DWARF v5 the Line Number Program Header is extensible, allowing values with
new content types. In this extension a content type is added,
DW_LNCT_LLVM_source, which contains the embedded source code of the file.

Add new optional attribute for !DIFile IR metadata called source which contains
source text. Use this to output the source to the DWARF line table of code
objects. Analogously extend METADATA_FILE in Bitcode and .file directive in ASM
to support optional source.

Teach llvm-dwarfdump and llvm-objdump about the new values. Update the output
format of llvm-dwarfdump to make room for the new attribute on file_names
entries, and support embedded sources for the -source option in llvm-objdump.

Differential Revision: https://reviews.llvm.org/D42765

llvm-svn: 325970
2018-02-23 23:01:06 +00:00
Sriraman Tallam 609f8c013c Intrinsics calls should avoid the PLT when "RtLibUseGOT" metadata is present.
Differential Revision: https://reviews.llvm.org/D42216

llvm-svn: 325962
2018-02-23 21:32:06 +00:00
Craig Topper 61d6ddbf0a [X86] Add DAG combine to remove (and X, 1) from in front of a v1i1 scalar to vector.
These can be created by type legalization promoting the inputs to select to match scalar boolean contents.

We were trying to pattern match them away during isel, but it's better to just remove them from the DAG.

I've cleaned up some patterns to not check for this 'and' anymore. But I suspect this has also opened up opportunities for pattern removal.

llvm-svn: 325949
2018-02-23 20:13:42 +00:00