Commit Graph

16744 Commits

Author SHA1 Message Date
Craig Topper 9770107b5f [X86] Add JMP16r and JMP32r to Sandybridge scheduler model.
Fixes PR36010

llvm-svn: 327883
2018-03-19 19:00:37 +00:00
Craig Topper 5e65996fac [X86] Remove OUT32rr/OUT8rr/OUT32ri/OUT8ri from Sandybridge scheduler model.
PR35590 was already filed for this information being wrong. It's probably better to default to WriteSystem behavior instead of using something completely wrong.

llvm-svn: 327882
2018-03-19 19:00:35 +00:00
Craig Topper b4c7873f8c [X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models.
JRCXZ was already present, but not the others.

We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output.

llvm-svn: 327881
2018-03-19 19:00:32 +00:00
Craig Topper afabf36505 [X86] Correct regular expression in Zen scheduler model that was excluding JECXZ instruction.
The regex was looking for JECXZ_32 or JECXZ_64, but their is just one instruction called JECXZ. They used to exist as separate instructions, but were merged over 3 years ago.

llvm-svn: 327880
2018-03-19 19:00:29 +00:00
Craig Topper 591f44df54 [X86] Correct the SchedRW on (V)MOVAPSrr_REV and similar to match their non _REV counterparts.
llvm-svn: 327879
2018-03-19 19:00:26 +00:00
Craig Topper 836cfb3a4c [X86] Add the rest of the TEST with immediate instructions to the scheduler models to match their 8-bit counterpart.
llvm-svn: 327874
2018-03-19 17:58:41 +00:00
Craig Topper 645e531a69 [X86] Add MOV16ri*/MOV32ri*/MOV64ri* to scheduler models to match MOV8ri. Correct SchedRW and itinerary for MOV32ri64.
llvm-svn: 327872
2018-03-19 17:46:59 +00:00
Craig Topper 259eaa6e7c [X86] Remove sse41 specific code from lowering v16i8 multiply
With the SRAs removed from the SSE2 code in D44267, then there doesn't appear to be any advantage to the sse41 code. The punpcklbw instruction and pmovsx seem to have the same latency and throughput on most CPUs. And the SSE41 code requires moving the upper 64-bits into the lower 64-bit before the sign extend can be done. The unpckhbw in sse2 code can do better than that.

llvm-svn: 327869
2018-03-19 17:31:41 +00:00
Craig Topper 5ccd87233f [X86] Make the multiply and divide itineraries more consistent.
Sometimes we used the same itinerary for MEM and REG forms, but that seems inconsistent with our usual usage.

We also used the MUL8 itinerary for MULX32/64 which was also weird.

The test changes are because we were using IIC_IMUL32_RR and IIC_IMUL64_RR instead of IIC_IMUL32_REG/IIC_IMUL64_REG for the 32 and 64 bit multiplies that produce double width result.

llvm-svn: 327866
2018-03-19 16:38:33 +00:00
Simon Pilgrim 30c38c3849 [X86] Generalize schedule classes to support multiple stages
Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overriden from their defaults.

This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases.

I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific.

Differential Revision: https://reviews.llvm.org/D44612

llvm-svn: 327855
2018-03-19 14:46:07 +00:00
Sanjay Patel 05daae75ad [x86] put nops into the WriteNop class and customize for Jaguar
1. Given that we already have a classification bucket with 'nop' in the name, 
   that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'.
2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably 
   llvm-mca) how to model the resource requirements better even though a nop has no 
   dependencies.

Differential Revision: https://reviews.llvm.org/D44608

llvm-svn: 327853
2018-03-19 14:26:50 +00:00
Craig Topper e18fbab988 [X86] Merge XADD8rr regular expression with XADD16rr/XADD32rr/XADD64rr in a couple scheduler models.
llvm-svn: 327821
2018-03-19 04:21:42 +00:00
Craig Topper d10ceffa5f [X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to match ADD8i8.
Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since its the 8i8 is just the register forced to AL instead of coming from modrm.

llvm-svn: 327820
2018-03-19 04:21:40 +00:00
Craig Topper e9c99d32b3 [X6] Remove two unused InstrItinClass
llvm-svn: 327819
2018-03-19 02:07:32 +00:00
Craig Topper 793733a6c8 [X86] Use IIC_CMOV64_RR/RM on 64-bit cmov instructions.
llvm-svn: 327817
2018-03-19 00:56:12 +00:00
Craig Topper 9b60dcb29b [X86] Merge 32 and 64-bit RORX/SHLX/SARX/SHRX into single regular expressions in scheduler models.
llvm-svn: 327816
2018-03-19 00:56:11 +00:00
Craig Topper 13a1650d8a [X86] Merge 8-bit instructions into instregex with 16/32/64 instructions in the scheduler models as much as possible. NFCI
This reduces the total number of generated scheduler classes from 5404 to 5316.

llvm-svn: 327815
2018-03-19 00:56:09 +00:00
Simon Pilgrim 203876f104 [X86][Btver2] Fix crc32 schedule costs
The default is currently FAdd for some reason

llvm-svn: 327807
2018-03-18 19:54:42 +00:00
Simon Pilgrim c3db8c7cda [X86][Btver2] FADD/FHADD ymm instructions are double pumped on the JFPA functional pipe
llvm-svn: 327804
2018-03-18 18:45:57 +00:00
Simon Pilgrim 036cc82622 [X86][Btver2] Float bitwise ymm instructions are double pumped on the JFPX (JFPA/JFPM) functional pipes
llvm-svn: 327803
2018-03-18 17:10:12 +00:00
Simon Pilgrim 87d2f7463f [X86][Btver2] F16C instructions are performed on the JSTC functional pipe
llvm-svn: 327801
2018-03-18 15:59:51 +00:00
Simon Pilgrim 541992203d [X86][Btver2] Strip default latency/resource values. NFCI.
llvm-svn: 327795
2018-03-18 13:16:11 +00:00
Simon Pilgrim 40f6d6ad0b [X86][Btver2] SSE4A EXTRQ/INSERTQ instructions are performed on the JVALU0/JVALU1 functional pipes
llvm-svn: 327794
2018-03-18 13:05:09 +00:00
Simon Pilgrim e16790b133 [X86][Btver2] Modelled float bitwise instructions as being performed on the float cluster (FPA/FPM) not the integer.
llvm-svn: 327793
2018-03-18 12:37:35 +00:00
Simon Pilgrim e409f84e7e [X86][Btver2] Correctly distinguish between scheduling pipe and functional unit for JWriteResFpuPair defs
Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that an micro-op will both consume the scheduler pipe and a functional unit.

This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later.

llvm-svn: 327791
2018-03-18 12:09:17 +00:00
Simon Pilgrim f86d48b3ae [X86][Btver2] Merge equivalent VBLENDVY + VPERMILY schedule groups
Thanks to Craig Topper for noticing this.

llvm-svn: 327789
2018-03-18 10:22:35 +00:00
Craig Topper 2d451e73f9 [X86] Fix a bunch of overlapping regular expressions in the scheduler models.
llvm-svn: 327787
2018-03-18 08:38:06 +00:00
Craig Topper 86b02cf076 [X86] Fix a couple typos in the Zen scheduler model.
llvm-svn: 327786
2018-03-18 08:38:04 +00:00
Craig Topper 89dcda3e90 [X86] Remove MMX_MASKMOVQ64 and VMASKMOVDQU from scheduler models.
The information was so wildly inaccurate and incomplete its better to just remove it.

MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information.

MMX_MASKMOVQ and MASKMOVDQU were completely missing.

MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction.

Filed PR36780 to track fixing this right.

llvm-svn: 327783
2018-03-18 03:24:42 +00:00
Nirav Dave 5f0ab71b62 Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""
as it times out building test-suite on PPC.

llvm-svn: 327778
2018-03-17 19:24:54 +00:00
Nirav Dave 982d3a56ea [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying and reducing
node id invariant traversal.

llvm-svn: 327777
2018-03-17 17:42:10 +00:00
Oren Ben Simhon fdd72fd522 [X86] Added support for nocf_check attribute for indirect Branch Tracking
X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET).
IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp.
The `nocf_check` attribute has two roles in the context of X86 IBT technology:
	1. Appertains to a function - do not add ENDBR instruction at the beginning of the function.
	2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction.

This patch implements `nocf_check` context for Indirect Branch Tracking.
It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks.

Differential Revision: https://reviews.llvm.org/D41879

llvm-svn: 327767
2018-03-17 13:29:46 +00:00
Jessica Paquette b3e7dc9144 [MachineOutliner] Make KILLs invisible
At the point the outliner runs, KILLs don't impact anything, but they're still
considered unique instructions. This commit makes them invisible like
DebugValues so that they can still be outlined without impacting outlining
decisions.

llvm-svn: 327760
2018-03-16 22:53:34 +00:00
Craig Topper 25007c4f32 [X86] Pass SelectionDAG into X86ISelAddressMode::dump and on to SDNode::dump.
This prevents a crash in SelectionDAGDumper with -debug when trying to print mem operands if one of the registers in the addressing mode comes from a load.

llvm-svn: 327744
2018-03-16 21:10:07 +00:00
Craig Topper f0815e01d8 [X86] Merge ADDSUB/SUBADD detection into single methods that can detect either and indicate what they found.
Previously, we called the same functions twice with a bool flag determining whether we should look for ADDSUB or SUBADD. It would be more efficient to run the code once and detect either pattern with a flag to tell which type it found.

Differential Revision: https://reviews.llvm.org/D44540

llvm-svn: 327730
2018-03-16 18:25:59 +00:00
Craig Topper e6913ec340 [X86] Post process the DAG after isel to remove vector moves that were added to zero upper bits.
We previously avoided inserting these moves during isel in a few cases which is implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist. Especially with AVX512F without AVX512VL using 512 bit vectors to implement some 128/256 bit operations. Since isel is done bottoms up, we'd have to check the VT and opcode and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations.

So instead of doing that, this patch adds a post processing step that detects when the moves are unnecesssary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG. So then we just need to ensure the input to the move isn't one.

Differential Revision: https://reviews.llvm.org/D44289

llvm-svn: 327724
2018-03-16 17:13:42 +00:00
Simon Pilgrim 23578e7d3c [X86][Btver2] Add correct mul/imul schedule costs
Integer multiply is performed on the JMul function unit and i64 requires double pumping

llvm-svn: 327707
2018-03-16 14:01:01 +00:00
Simon Pilgrim 8d28ae6aec [X86][Btver2] Add correct lzcnt/tzcnt/popcnt schedule costs
Don't use WriteIMul defaults

llvm-svn: 327706
2018-03-16 13:43:55 +00:00
Simon Pilgrim 14e5a1b05b [X86][Btver2] Add support for multiple pipelines stages for x86 scalar schedules. NFCI.
This allows us to use JWriteResIntPair for complex schedule classes (like WriteIDiv) as well as single pipe instructions.

llvm-svn: 327686
2018-03-15 23:46:12 +00:00
Craig Topper 1b8cf49704 [SelectionDAG][ARM][X86] Teach PromoteIntRes_SETCC to do a better job picking the result type for the setcc.
Previously if getSetccResultType returned an illegal type we just fell back to using the default promoted type. This appears to have been to handle the case where for vectors getSetccResultType returns the input type, but the input type itself isn't legal and will need to be promoted. Without the legality check we would never reach a legal type.

But just picking the promoted type to be the setcc type can create strange setccs where the result type is 128 bits and the operand type is 256 bits. If for example the result type was promoted to v8i16 from v8i1, but the input type was promoted from v8i23 to v8i32. We currently handle this with custom lowering code in X86.

This legality check also caused us reject the getSetccResultType when the input type needed to be widened or split. Even though that result wouldn't have caused legalization to get stuck.

This patch tries to fix this by detecting the getSetccResultType needs to be promoted. If its input type also needs to be promoted we'll try a ask for a new setcc result type based on its eventual promoted value. Otherwise we fall back to default type to promote to.

For any other illegal values we might get back from the initial call to getSetccResultType we just keep and allow it to be re-legalized later via splitting or widening or scalarizing.

llvm-svn: 327683
2018-03-15 23:04:11 +00:00
Simon Pilgrim 3894809997 [X86][Btver2] Fix ymm div/sqrt to use fmul unit
YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent.

This matches the pipes for WriteFSqrt/WriteFDiv definitions.

llvm-svn: 327682
2018-03-15 23:00:47 +00:00
Craig Topper c3983c34cd [X86] Make sure we use FSUB instruction as the reference for operand order in isAddSubOrSubAdd when recognizing subadd
The FADD part of the addsub/subadd pattern can have its operands commuted, but when checking for fsubadd we were using the fadd as reference and commuting the fsub node.

llvm-svn: 327660
2018-03-15 20:30:54 +00:00
Simon Pilgrim 48b758e8ad [X86][Btver2] Attach AES/CLMUL instructions to a scheduler pipe
llvm-svn: 327650
2018-03-15 17:45:10 +00:00
Craig Topper 5a0251fe67 [X86] Simplify the type legality checking for (FM)ADDSUB/SUBADD matching. NFCI
Rather than enumerating all specific types, for the DAG combine we can just use TLI::isTypeLegal and an SSE3 check. For the BUILD_VECTOR version we already know the type is legal so we just need to check SSE3.

llvm-svn: 327649
2018-03-15 17:38:59 +00:00
Craig Topper 627e001fad [X86] Fix 80 column violations.
llvm-svn: 327648
2018-03-15 17:38:55 +00:00
Simon Pilgrim d30df5769e [X86][Btver2] Remove JAny resource, and map system/microcoded instructions to JALU pipes
Simplifies throughput to the issue width (1/2) instead of permitting any pipe (1/6)

llvm-svn: 327632
2018-03-15 15:12:12 +00:00
Simon Pilgrim fb7aa57bf1 [X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and Writetore scheduler classes
As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types.

I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428.

Differential Revision: https://reviews.llvm.org/D44471

llvm-svn: 327630
2018-03-15 14:45:30 +00:00
Craig Topper 26a3a80c87 [X86] Add support for matching FMSUBADD from build_vector.
llvm-svn: 327604
2018-03-15 06:14:55 +00:00
Craig Topper a5e712f402 [X86] Remove old TODO. We have coverage for this now.
Coverage was added in r320950.

llvm-svn: 327603
2018-03-15 06:14:53 +00:00
Craig Topper b9526e9fdb [X86] Use MVT in a couple places where we know the type is legal.
llvm-svn: 327602
2018-03-15 06:14:51 +00:00
Simon Pilgrim 48fbf0c69a [X86][Btver2] Add support for multiple pipelines stages for fpu schedules. NFCI.
This allows us to use JWriteResFpuPair for complex schedule classes as well as single pipe instructions.

llvm-svn: 327588
2018-03-14 23:12:09 +00:00
Simon Pilgrim dfeebdbed7 [X86][Btver2] Add ResourceCycles and NumMicroOps overrides to scalar instructions. NFCI.
Currently still use default values - this is setup for a future patch.

llvm-svn: 327582
2018-03-14 21:55:54 +00:00
Craig Topper 9c098ed819 [X86] Add back fast-isel code for handling i8 shifts.
I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds.

This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86 that shift amounts are always masked to 5-bits(except 64-bits). So if the masked value is still out bounds the result will be 0.

Fixes PR36731.

llvm-svn: 327540
2018-03-14 17:57:19 +00:00
Craig Topper b36cb20ef9 [X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps created an and mask that can be matched as a zero extend.
I had to modify the bswap recognition to allow unshrunk masks to make this work.

Fixes PR36689.

Differential Revision: https://reviews.llvm.org/D44442

llvm-svn: 327530
2018-03-14 16:55:15 +00:00
Simon Pilgrim d1c3c995c0 [X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327524
2018-03-14 15:47:08 +00:00
Alexander Ivchenko 86ef9ab28f [GlobalIsel][X86] Support for G_SDIV instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44430

llvm-svn: 327520
2018-03-14 15:41:11 +00:00
Simon Pilgrim d594942928 [X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs
Account for ymm double pumping and add proper pshufb/permutevar support

llvm-svn: 327510
2018-03-14 14:05:19 +00:00
Simon Pilgrim de995e6e37 [X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327505
2018-03-14 13:22:56 +00:00
Alexander Ivchenko 0bd4d8c901 [GlobalISel][X86] Support G_LSHR/G_ASHR/G_SHL
Support G_LSHR/G_ASHR/G_SHL. We have 3 variance for
shift instructions : shift gpr, shift imm, shift 1.
Currently GlobalIsel TableGen generate patterns for
shift imm and shift 1, but with shiftCount i8.
In G_LSHR/G_ASHR/G_SHL like LLVM-IR both arguments
has the same type, so for now only shift i8 can use
auto generated TableGen patterns.

The support of G_SHL/G_ASHR enables tryCombineSExt
from LegalizationArtifactCombiner.h to hit, which
results in different legalization for the following tests:
    LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll
    LLVM :: CodeGen/X86/GlobalISel/gep.ll
    LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir

-; X64-NEXT:    movsbl %dil, %eax
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    shll %cl, %edi
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    sarl %cl, %edi
+; X64-NEXT:    movl %edi, %eax

..which is not optimal and should be addressed later.

Rework of the patch by igorb

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44395

llvm-svn: 327499
2018-03-14 11:23:57 +00:00
Alexander Ivchenko 327de80529 [GlobalIsel][X86] Support for G_ZEXT instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44378

llvm-svn: 327482
2018-03-14 09:11:23 +00:00
Matt Arsenault 41e5ac4fa4 TargetMachine: Add address space to getPointerSize
llvm-svn: 327467
2018-03-14 00:36:23 +00:00
Craig Topper ec4881ad53 [X86] Simplify the LowerAVXCONCAT_VECTORS code a little by creating a single path for insert_subvector handling.
We now only create recursive concats if we have more than two non-zero values. This keeps our subvector broadcast DAG combine functioning.

llvm-svn: 327457
2018-03-13 22:36:07 +00:00
Craig Topper cc060e921b [X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats.
This better able to detect undef and zeros pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs.

This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best.

We probably want to merge this with the vXi1 concat code since they are very similar.

llvm-svn: 327454
2018-03-13 22:05:25 +00:00
Craig Topper 7e711a6822 [X86] Remove SplitBinaryOpsAndApply and use SplitOpsAndApply by adding curly braces around the ops.
Summary: Unless you were intentionally avoiding this syntax? I saw you mentioned makeArrayRef in your commit that added SplitOpsAndApply.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44403

llvm-svn: 327418
2018-03-13 16:23:27 +00:00
Simon Pilgrim 3d4c86d399 [X86][Btver2] Split i8/i16/i32/i64 div/idiv costs
We were assuming a mixture of 32/64 division costs.

llvm-svn: 327407
2018-03-13 15:22:24 +00:00
Simon Pilgrim 93bd7187f4 [X86][SSE41] createVariablePermute v2X64 - PCMPEQQ can test for index 0/1 and select between them.
llvm-svn: 327385
2018-03-13 12:22:58 +00:00
Simon Pilgrim 7f1b9196cb [X86][Btver2] Clean up formatting/comments in scheduler model. NFCI.
Moved 'special cases' to be closer to other system classes.

llvm-svn: 327332
2018-03-12 21:35:12 +00:00
Simon Pilgrim f0a9b25394 [X86][Btver2] FSqrt/FDiv reg-reg instructions don't use the AGU.
I love you llvm-mca.

llvm-svn: 327306
2018-03-12 18:12:46 +00:00
Simon Pilgrim a0536c17b9 [X86] Deleting README-MMX.txt now that all tasks have been completed.
MMX buildvectors were improved at rL327247 - new MMX bugs should be raised on bugzilla

llvm-svn: 327300
2018-03-12 17:29:54 +00:00
Simon Pilgrim deface9c73 [X86][Btver2] Prefix all scheduler defs. NFCI.
These are all global, so prefix with 'J' to help prevent accidental name clashes with other models.

llvm-svn: 327296
2018-03-12 17:07:08 +00:00
Craig Topper acaba3b402 [X86] Remove use of MVT class from the ShuffleDecode library.
MVT belongs to the CodeGen layer, but ShuffleDecode is used by the X86 InstPrinter which is part of the MC layer. This only worked because MVT is completely implemented in a header file with no other library dependencies.

Differential Revision: https://reviews.llvm.org/D44353

llvm-svn: 327292
2018-03-12 16:43:11 +00:00
Simon Pilgrim 6f01e654b4 [X86][Btver2] Extend JWriteResFpuPair to accept resource/uop counts. NFCI.
This allows the single resource classes (VarBlend, MPSAD, VarVecShift) to use the JWriteResFpuPair macro.

llvm-svn: 327289
2018-03-12 16:02:56 +00:00
Simon Pilgrim bc216b440f [X86][Btver2] Use JWriteResFpuPair wrapper for AES/CLMUL/HADD scheduler cases. NFCI.
These are single pipe and have the default resource/uop counts like JWriteResFpuPair so there's no need to handle them separately.

llvm-svn: 327283
2018-03-12 15:29:00 +00:00
Nico Weber 73a699e592 MC intel asm parser: Allow @ at the start of function names.
Ports parts of r193000 to the intel parser. Fixes part of PR36676.

https://reviews.llvm.org/D44359

llvm-svn: 327262
2018-03-12 12:47:27 +00:00
Simon Pilgrim 6618e2a09c [X86][SSE] createVariablePermute - PSHUFB requires SSSE3 not just SSE3
llvm-svn: 327259
2018-03-12 12:30:04 +00:00
Craig Topper 7cc1b1fc84 [X86] Don't compute known bits twice for the same SDValue in LowerMUL.
We called MaskedValueIsZero with two different masks, but underneath that calls computeKnownBits before applying the mask. This means we compute the same known bits twice due to the two calls. Instead just call computeKnownBits directly and apply the two masks ourselves.

llvm-svn: 327251
2018-03-12 05:35:02 +00:00
Simon Pilgrim d09cc9c62c [X86][MMX] Support MMX build vectors to avoid SSE usage (PR29222)
64-bit MMX vector generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type.

This patch creates a MMX vector from MMX source values, taking the lowest element from each source and constructing broadcasts/build_vectors with direct calls to the MMX PUNPCKL/PSHUFW intrinsics.

We're missing a few consecutive load combines that could be handled in a future patch if that would be useful - my main interest here is just avoiding a lot of the MMX/SSE crossover.

Differential Revision: https://reviews.llvm.org/D43618

llvm-svn: 327247
2018-03-11 19:22:13 +00:00
Simon Pilgrim 30f74c14ff [X86][AVX] createVariablePermute - scale v16i16 variable permutes to use v32i8 codegen
XOP was already doing this, and now AVX performs v32i8 variable permutes as well.

llvm-svn: 327245
2018-03-11 17:23:54 +00:00
Simon Pilgrim b306501796 [X86][AVX] createVariablePermute - widen permutes for cases where the source vector is wider than the destination type
llvm-svn: 327244
2018-03-11 17:00:46 +00:00
Simon Pilgrim 9a5d0c7540 [X86][AVX] createVariablePermute - use PSHUFB+PCMPGT+SELECT for v32i8 variable permutes
Same as the VPERMILPS/VPERMILPD approach for v8f32/v4f64 cases, rely on PSHUFB using bits[3:0] for indexing - we can ignore the sign bit (zero element) as those index vector values are considered undefined. The select between the lo/hi permute results based on the index size.

llvm-svn: 327242
2018-03-11 16:28:11 +00:00
Simon Pilgrim d2fbd87ce8 Fix for buildbots which didn't like makeArrayRef with initializer lists.
llvm-svn: 327241
2018-03-11 14:31:55 +00:00
Simon Pilgrim e60afdf9eb [X86][SSE] Generalized SplitBinaryOpsAndApply to SplitOpsAndApply to support any number of ops.
I've kept SplitBinaryOpsAndApply as a wrapper to avoid a lot of makeArrayRef code.

llvm-svn: 327240
2018-03-11 14:04:53 +00:00
Simon Pilgrim f9cc80d218 [X86][AVX] createVariablePermute - use 2xVPERMIL+PCMPGT+SELECT for v8i32/v8f32 and v4i64/v4f64 variable permutes
As VPERMILPS/VPERMILPD only selects elements based on the bits[1:0]/bit[1] then we can permute both the (repeated) lo/hi 128-bit vectors in each case and then select between these results based on whether the index was for for lo/hi.

For v4i64/v4f64 this avoids some rather nasty v4i64 multiples on the AVX2 implementation, which seems to be worse than the extra port5 pressure from the additional shuffles/blends.

llvm-svn: 327239
2018-03-11 11:52:26 +00:00
Simon Pilgrim 2565bd421e [X86][AVX512] createVariablePermute - Non-VLX targets can widen v4i64/v8f64 variable permutes to v8i64/v8f64
Permutes in the upper elements will be undefined, but they will be discarded anyway.

llvm-svn: 327238
2018-03-11 11:19:19 +00:00
Simon Pilgrim 64b899f0f3 [x86][SSE] Add widenSubVector helper. NFCI.
Helper function to insert a subvector into the bottom elements of a larger zero/undef vector with the same scalar type.

I've converted a couple of INSERT_SUBVECTOR calls to use it, there are plenty more although in some cases I was worried it might make the code more ambiguous. 

llvm-svn: 327236
2018-03-11 10:50:48 +00:00
Craig Topper d88204fe1b [X86] Add comments to the end of FMA3 instructions to make the operation clear
Summary:
There are 3 different operand orders for FMA instructions so figuring out the exact operation being performed requires a lot of thought.

This patch adds a comment to the end of the assembly line to print the exact operation.

I think I've got all the instructions in here except the ones with builtin rounding.

I didn't update all tests, but I assume we can get them as we regenerate tests in the future.

Reviewers: spatel, v_klochkov, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44345

llvm-svn: 327225
2018-03-10 21:30:46 +00:00
Simon Pilgrim de7f3f0f91 [X86][XOP] createVariablePermute - use VPERMIL2 for v8i32/v4i64 variable permutes
llvm-svn: 327222
2018-03-10 19:49:59 +00:00
Simon Pilgrim ff1248f82f [X86][XOP] createVariablePermute - use VPPERM for v16i16 variable permutes
llvm-svn: 327218
2018-03-10 18:33:29 +00:00
Simon Pilgrim d9dc114e2f [X86][SSE] createVariablePermute - create index scaling helper. NFCI.
This will help in some future changes for custom lowering.

llvm-svn: 327217
2018-03-10 18:12:35 +00:00
Simon Pilgrim 8224241f75 [X86][XOP] createVariablePermute - use VPPERM for v32i8 variable permutes
llvm-svn: 327213
2018-03-10 16:51:45 +00:00
Craig Topper b76ed82b58 [X86] Add a missing EVEX instruction to EmitAnyX86InstComments.
The equivalent SSE and VEX instruction are already there.

llvm-svn: 327205
2018-03-10 06:05:13 +00:00
Craig Topper f27016f3de [X86] Move the AC_EVEX_2_VEX AsmComments enum to X86InstrInfo.h from X86InstComments.h.
X86InstComments.h is used by tools that only have the MC layer. We shouldn't be importing a file from CodeGen into this.

X86InstrInfo.h isn't a great place, but I couldn't find a better one.

llvm-svn: 327202
2018-03-10 05:15:22 +00:00
Craig Topper 9804c67d21 [X86] Rewrite printMasking code in X86InstComments to use TSFlags to determine whether the instruction is masked.
This should have been NFC, but it looks like we were missing PUNPCKLHQDQ/PUNPCKLQDQ instructions in there.

llvm-svn: 327200
2018-03-10 03:12:00 +00:00
Nirav Dave 042678bd55 Revert: r327172 "Correct load-op-store cycle detection analysis"
r327171 "Improve Dependency analysis when doing multi-node Instruction Selection"
        r328170 "[DAG] Enforce stricter NodeId invariant during Instruction selection"

Reverting patch as NodeId invariant change is causing pathological
increases in compile time on PPC

llvm-svn: 327197
2018-03-10 02:16:15 +00:00
Nirav Dave 0fab41782d Correct load-op-store cycle detection analysis
Add missing cycle dependency checks in load-op-store fusion.

Fixes PR36274.

Reviewers: craig.topper, bogner

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43154

llvm-svn: 327172
2018-03-09 20:58:07 +00:00
Nirav Dave d668f69ee7 Improve Dependency analysis when doing multi-node Instruction Selection
Relanding after fixing NodeId Invariant.

Cleanup cycle/validity checks in ISel (IsLegalToFold,
HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full
search for cycles / dependencies pruning the search when topological
property of NodeId allows.

As part of this propogate the NodeId-based cutoffs to narrow
hasPreprocessorHelper searches.

Reviewers: craig.topper, bogner

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D41293

llvm-svn: 327171
2018-03-09 20:57:42 +00:00
Nirav Dave 071699bf82 [DAG] Enforce stricter NodeId invariant during Instruction selection
Instruction Selection makes use of the topological ordering of nodes
by node id (a node's operands have smaller node id than it) when doing
cycle detection.  During selection we may violate this property as a
selection of multiple nodes may induce a use dependence (and thus a
node id restriction) between two unrelated nodes. If a selected node
has an unselected successor this may allow us to miss a cycle in
detection an invalid selection.

This patch fixes this by marking all unselected successors of a
selected node have negated node id.  We avoid pruning on such negative
ids but still can reconstruct the original id for pruning.

In-tree targets have been updated to replace DAG-level replacements
with ISel-level ones which enforce this property.

This preemptively fixes PR36312 before triggering commit r324359 relands

Reviewers: craig.topper, bogner, jyknight

Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D43198

llvm-svn: 327170
2018-03-09 20:57:15 +00:00
Peter Collingbourne 2974856ad4 Use branch funnels for virtual calls when retpoline mitigation is enabled.
The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the
branch predictor, and as a result it can lead to a measurable loss of
performance. We can reduce the performance impact of retpolined virtual
calls by replacing them with a special construct known as a branch
funnel, which is an instruction sequence that implements virtual calls
to a set of known targets using a binary tree of direct branches. This
allows the processor to speculately execute valid implementations of the
virtual function without allowing for speculative execution of of calls
to arbitrary addresses.

This patch extends the whole-program devirtualization pass to replace
certain virtual calls with calls to branch funnels, which are
represented using a new llvm.icall.jumptable intrinsic. It also extends
the LowerTypeTests pass to recognize the new intrinsic, generate code
for the branch funnels (x86_64 only for now) and lay out virtual tables
as required for each branch funnel.

The implementation supports full LTO as well as ThinLTO, and extends the
ThinLTO summary format used for whole-program devirtualization to
support branch funnels.

For more details see RFC:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html

Differential Revision: https://reviews.llvm.org/D42453

llvm-svn: 327163
2018-03-09 19:11:44 +00:00
Simon Pilgrim 2cd489feb2 [X86][AVX] createVariablePermute - fix v2i64/v2f64 VPERMILPD index creation.
The input indices vector will put the index in bit0, but VPERMILPD actually selects off bit1 - so we need to scale accordingly.

llvm-svn: 327159
2018-03-09 18:37:56 +00:00
Simon Pilgrim 230d38b559 [X86][SSE] createVariablePermute - move source vector canonicalization to top of function. NFCI.
This is to make it easier to return early from the switch statement with custom lowering.

llvm-svn: 327157
2018-03-09 18:08:08 +00:00
Simon Pilgrim 033a4167d2 Tidyup comment that was destroyed by clang-format. NFCI.
llvm-svn: 327141
2018-03-09 15:50:09 +00:00
Simon Pilgrim 322c521ed7 [X86][SSE] createVariablePermute - move index vector canonicalization to top of function. NFCI.
This is to make it easier to return early from the switch statement with custom lowering.

llvm-svn: 327140
2018-03-09 15:48:56 +00:00
Sebastian Pop b4bd0a404f [x86][aarch64] ask the backend whether it has a vector blend instruction
The code to match and produce more x86 vector blends was enabled for all
architectures even though the transform may pessimize the code for other
architectures that do not provide a vector blend instruction.

Added an aarch64 testcase to check that a VZIP instruction is generated instead
of byte movs.

Differential Revision: https://reviews.llvm.org/D44118

llvm-svn: 327132
2018-03-09 14:29:21 +00:00
Craig Topper e7060b2040 [X86] Remove duplicate isel pattern. NFC
llvm-svn: 327104
2018-03-09 05:42:44 +00:00
Craig Topper 784f1bbf5e [X86] Remove SRAs from v16i8 multiply lowering on sse2 targets
Previously we unpacked the even bytes of each input into the high byte of 16-bit elements then did an v8i16 arithmetic shift right by 8 bits to fill the upper bits of each word with sign bits. Then we did the v8i16 multiply and then masked to zero the upper 8-bits of each result. The similar was done for all the odd bytes. The results are then packed together with packuswb

Since we are masking each multiply result element to 8-bits, and those 8-bits are determined only by the lower 8-bits of each of the inputs, we don't need to fill the upper bits with sign bits. So we can just unpack into the low byte of each element and treat the upper bits as garbage. This is what gcc also does.

Differential Revision: https://reviews.llvm.org/D44267

llvm-svn: 327093
2018-03-09 01:22:31 +00:00
Simon Pilgrim c286680032 [X86][AVX] Pull out variable permute creation from LowerBUILD_VECTORAsVariablePermute. NFCI.
This will make it easier to handle more complex cases than basic scaling or index masks.

llvm-svn: 327054
2018-03-08 20:07:06 +00:00
Craig Topper a406796f5f [X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32.
This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX.

I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that.

I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that.

llvm-svn: 326991
2018-03-08 08:02:52 +00:00
Craig Topper 7ff9779768 [X86] Fix some isel patterns that used aligned vector load instructions with unaligned predicates.
These patterns weren't checking the alignment of the load, but were using the aligned instructions. This will cause a GP fault if the data isn't aligned.

I believe these were introduced in r312450.

llvm-svn: 326967
2018-03-08 00:21:17 +00:00
Simon Pilgrim 68594ee24a [X86][SSE] LowerBUILD_VECTORAsVariablePermute - reorder permute types. NFCI.
Reorder into 128/256/512 bit vector size groupings.

NFCI commit before some new features.

llvm-svn: 326963
2018-03-07 23:56:42 +00:00
Craig Topper 80ec0c3106 [X86] Remove unused function argument. NFC
llvm-svn: 326939
2018-03-07 19:45:45 +00:00
Craig Topper c3c15dd640 [X86] Make the MUL->VPMADDWD work before op legalization on AVX1 targets. Simplify feature checks by using isTypeLegal.
The v8i32 conversion on AVX1 targets was only working after LowerMUL splits 256-bit vectors.

While I was there I've also made it so we don't have to check for AVX2 and BWI directly and instead just ask if the type is legal.

Differential Revision: https://reviews.llvm.org/D44190

llvm-svn: 326917
2018-03-07 17:53:18 +00:00
Clement Courbet 327fac4d75 [X86] Add IMUL scheduling info on sandybridge, fix it on >=haswell.
Summary:
Only IMUL16rri uses an extra P0156. IMUL32* and IMUL16rr only use
P1.
This was computed using https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc

This can easily be validated by running perf on the following code:

```
int main(int argc, char**argv) {
  int a = argc;
  int b = argc;
  int c = argc;
  int d = argc;

  for (int i = 0; i < LOOP_ITERATIONS; ++i) {
    asm volatile(
      R"(
        .rept 10000
        imull $0x2, %%edx, %%eax
        imull $0x2, %%ecx, %%ebx
        imull $0x2, %%eax, %%edx
        imull $0x2, %%ebx, %%ecx
        .endr
      )"
      : "+a"(a), "+b"(b), "+c"(c), "+d"(d)
      :
      :);
  }
  return a+b+c+d;
}
```
-> test.cc

perf stat -x, -e cycles --pfm-events=uops_executed_port:port_0:u,uops_executed_port:port_1:u,uops_executed_port:port_2:u,uops_executed_port:port_3:u,uops_executed_port:port_4:u,uops_executed_port:port_5:u,uops_executed_port:port_6:u,uops_executed_port:port_7:u test

Reviewers: craig.topper, RKSimon, gadi.haber

Subscribers: llvm-commits, gchatelet, chandlerc

Differential Revision: https://reviews.llvm.org/D43460

llvm-svn: 326877
2018-03-07 08:14:02 +00:00
Craig Topper 80d3bb3b4b [TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC
The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have.

There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run.

A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does.

llvm-svn: 326832
2018-03-06 19:44:52 +00:00
Craig Topper 274e08dd81 [X86] Reject registers that require a REX prefix in inline asm constraints in 32-bit mode
We don't currently reject r8-r15 or xmm8-32 or bpl/spl/sil/dil in 32-bit mode.

Differential Revision: https://reviews.llvm.org/D44031

llvm-svn: 326826
2018-03-06 18:56:33 +00:00
Martin Storsjo a7adc3185b [X86] Handle EAX being live when calling chkstk for x86_64
EAX can turn out to be alive here, when shrink wrapping is done
(which is allowed when using dwarf exceptions, contrary to the
normal case with WinCFI).

This fixes PR36487.

Differential Revision: https://reviews.llvm.org/D43968

llvm-svn: 326764
2018-03-06 06:00:13 +00:00
Simon Pilgrim 53ff5ae8a1 [X86] cvttpd2dq lowering has been supported for some time
Tests in vec_fp_to_int.ll 

llvm-svn: 326751
2018-03-05 23:00:39 +00:00
Craig Topper f546b2c06f [X86] Replace usages of X86Subtarget::hasFp256 with hasAVX. Remove hasFP256.
Almost none of these usages were FP specific. And we had no clear guideliness on when to use hasAVX vs hasFP256.

I might also remove hasInt256 too since its an alias for hasAVX2.

llvm-svn: 326682
2018-03-05 00:13:35 +00:00
Craig Topper f2aae62228 [X86] Add a DAG combine to turn stores of vXi1 constants into scalar stores.
llvm-svn: 326679
2018-03-04 19:33:15 +00:00
Simon Pilgrim 3a76bc29d7 [X86][MMX] Remove completed _mm_cvtsi32_si64 todo
rL322525 - mmx zero constant support
rL322553 - mmx i32 zero extended value
rL326497 - mmx i64 general constant handling

Not all constants are folded, we generate some on the GPRs (similar to SSE build vector) where appropriate

llvm-svn: 326673
2018-03-04 14:57:26 +00:00
Craig Topper 12c35e1940 [X86] Fix unused variable in release builds.
llvm-svn: 326672
2018-03-04 02:14:16 +00:00
Craig Topper a476026f70 [X86] Combine (store (v1i1 (scalar_to_vector (i8 X)))) -> (store (i8 X)).
llvm-svn: 326670
2018-03-04 01:48:02 +00:00
Craig Topper be31585be8 [X86] Lower v1i1/v2i1/v4i1/v8i1 load/stores to i8 load/store during op legalization if AVX512DQ is not supported.
We were previously doing this with isel patterns. Moving it to op legalization gives us chance to see the required bitcast earlier. And it lets us remove some isel patterns.

llvm-svn: 326669
2018-03-04 01:48:00 +00:00
Simon Pilgrim ad403a483a [X86] This bit-test TODO has been moved in PR36551
llvm-svn: 326658
2018-03-03 16:31:17 +00:00
Craig Topper d4b6601662 [X86] Remove 'else' after return. NFC
llvm-svn: 326642
2018-03-03 05:18:21 +00:00
Simon Pilgrim 8cbc1d232b [X86][BTVER2] Fix throughput of YMM bitwise instructions
These instructions are double-pumped, split into 2 128-bit ops and then passing through either FPU pipe.

Found while testing llvm-mca (D43951)

llvm-svn: 326597
2018-03-02 18:20:35 +00:00
Craig Topper 6b1419b547 [X86] Reject xmm16-31 in inline asm constraints when AVX512 is disabled
Fixes PR36532

Differential Revision: https://reviews.llvm.org/D43960

llvm-svn: 326596
2018-03-02 18:19:40 +00:00
Derek Schuff 57feeed307 [X86][x32] Save callee-save register used as base pointer for x32 ABI
For the x32 ABI, since the base pointer register (EBX) is a callee save register
it should be saved before use.

This fixes https://bugs.llvm.org/show_bug.cgi?id=36011

Differential Revision: https://reviews.llvm.org/D42358

Patch by Pratik Bhatu

llvm-svn: 326593
2018-03-02 17:46:39 +00:00
David Stenberg 3fb8c324b3 Test commit: Remove an extraneous space. NFC
Test commit access.

llvm-svn: 326573
2018-03-02 14:28:56 +00:00
Simon Pilgrim c879aa7eab [X86] Remove old UNIMPLEMENTED list
All of these are implemented and have appropriate test coverage

llvm-svn: 326553
2018-03-02 11:59:37 +00:00
Simon Pilgrim 90fd0622b6 [X86][MMX] Improve handling of 64-bit MMX constants
64-bit MMX constant generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type.

This patch bitcasts the constant to a double value to allow correct loading directly to the MMX register.

I've added MMX constant asm comment support to improve testing, it's better to always print the double values as hex constants as MMX is mainly an integer unit (and even with 3DNow! its just floats).

Differential Revision: https://reviews.llvm.org/D43616

llvm-svn: 326497
2018-03-01 22:22:31 +00:00
Craig Topper cb7881c649 [X86] Stop passing two arguments by reference. NFC
I think these used to be out parameters, but they haven't been for a while.

llvm-svn: 326417
2018-03-01 06:25:13 +00:00
Craig Topper ccfa5257a6 [X86] Make sure we don't combine (fneg (fma X, Y, Z)) to a target specific node when there are no FMA instructions.
This would cause a 'cannot select' error at isel when we should have emitted a lib call and an xor.

Fixes PR36553.

llvm-svn: 326393
2018-03-01 00:08:38 +00:00
Craig Topper e31b9d1e5f [X86] Lower extract_element from k-registers by bitcasting from v16i1 to i16 and extending/truncating.
This is equivalent to what isel was doing anyway but by canonicalizing earlier we can remove some patterns.

llvm-svn: 326375
2018-02-28 22:23:55 +00:00
Simon Pilgrim 72b86586b0 [X86][AVX512] Improve support for signed saturation truncation stores
Matches what we already manage for unsigned saturation truncation stores

Differential Revision: https://reviews.llvm.org/D43629

llvm-svn: 326372
2018-02-28 21:42:19 +00:00
Chih-Hung Hsieh 9f9e4681ac [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00
Alexander Ivchenko c01f750480 [GlobalIsel][X86] Support G_INTTOPTR instruction.
Add legalization/selection for x86/x86_64 and
corresponding tests.

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D43622

llvm-svn: 326320
2018-02-28 12:11:53 +00:00
Alexander Ivchenko 46e07e3623 [GlobalIsel][X86] Support G_PTRTOINT instruction.
Add legalization/selection for x86/x86_64 and
corresponding tests.

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D43617

llvm-svn: 326311
2018-02-28 09:18:47 +00:00
Craig Topper 48d5ed265c [X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of return.
An extract_element where the result type is larger than the scalar element type is semantically an any_extend of from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32 we need to mae sure those zeroes are explicit in the DAG.

For these cases the best way to accomplish this is use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1(for i32) or v8i1(for i8). Then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smarter enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register.

llvm-svn: 326308
2018-02-28 08:14:28 +00:00
Craig Topper ac799b05d4 [X86] Change the masked FPCLASS implementation to use AND instead of OR to combine the mask results.
While the description for the instruction does mention OR, its talking about how the individual classification test results are ORed together.

The incoming mask is used as a zeroing write mask. If the bit is 1 the classification is written to the output. The bit is 0 the output is 0. This equivalent to an AND.

Here is pseudocode from the intrinsics guide

FOR j := 0 to 1
        i := j*64
        IF k1[j]
                k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
        ELSE
                k[j] := 0
        FI
ENDFOR
k[MAX:2] := 0

llvm-svn: 326306
2018-02-28 06:19:55 +00:00
Craig Topper 688d1eb919 Revert r326225 "[X86] Move the load folding tables to a separate .inc file"
The bots don't seem to like the .inc file. I must be missing some cmake incantation.

llvm-svn: 326228
2018-02-27 19:15:40 +00:00
Craig Topper c0a1291478 [X86] Move the load folding tables to a separate .inc file
These tables add 3000 lines to X86InstrInfo.cpp. And if we ever manage to auto generate them they'll be a separate file anyway.

Differential Revision: https://reviews.llvm.org/D43806

llvm-svn: 326225
2018-02-27 18:46:11 +00:00
Simon Pilgrim ba43ec8702 [X86][AVX] combineLoopMAddPattern - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply
llvm-svn: 326189
2018-02-27 12:20:37 +00:00
Craig Topper 264707bae4 [X86] Simplify if condition. NFC
SSE2 implies SSE1 and we already covered f32 in the SSE1 check so we don't need to check f32 in the SSE2 check.

llvm-svn: 326170
2018-02-27 06:00:38 +00:00
Craig Topper fcaa0323ec [X86] Replace an impossible if condition with an assert.
llvm-svn: 326167
2018-02-27 03:50:00 +00:00
Aditya Nandakumar 599990530e [GISel]: Don't assert when constraining RegisterOperands which are uses.
Currently we assert that only non target specific opcodes can have
missing RegisterClass constraints in the MCDesc. The backend can have
instructions with register operands but don't have RegisterClass
constraints (say using unknown_class) in which case the instruction
defining the register will constrain it.
Change the assert to only fire if a def has no regclass.

https://reviews.llvm.org/D43409

llvm-svn: 326142
2018-02-26 22:56:21 +00:00
Simon Pilgrim 9929f90740 [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.

Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.

Differential Revision: https://reviews.llvm.org/D43733

llvm-svn: 326133
2018-02-26 22:10:17 +00:00
Craig Topper e5d39e42b9 [X86] Add constant folding to combineMOVMSK.
There's still some shortcoming in our ability to combine binops of constants with different sizes separated by an extend. I'll try to look at that next.

llvm-svn: 326128
2018-02-26 21:17:33 +00:00
Craig Topper 5e0ceb8865 [X86] Add a custom legalization for (i16 (bitcast v16i1)) and (i32 (bitcast v32i1)) without AVX512 to prevent scalarization
Summary:
We have an early DAG combine to turn these patterns into MOVMSK, but that combine doesn't work if the vXi1 type has more elements than the widest legal vXi8 type. Type legalization will eventually split it down to v16i1 or v32i1 and then the bitcast gets legalized to a truncstore and a scalar load. The truncstore will get lowered to a series of extracts and bit math.

This patch adds a custom legalization to use a sign extend and MOVMSK instead. This prevents the eventual scalarization.

Reviewers: spatel, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D43593

llvm-svn: 326119
2018-02-26 20:32:27 +00:00
Simon Pilgrim db0ed7d724 [X86][AVX] createPSADBW - support 256-bit cases on AVX1 via SplitBinaryOpsAndApply
llvm-svn: 326104
2018-02-26 18:17:25 +00:00
Craig Topper 5c980eba47 [X86] Don't use getZExtValue when we have no idea how large the input elements are.
llvm-svn: 326066
2018-02-26 04:43:24 +00:00