Commit Graph

Scott Linder 16c7bdaf32 [DebugInfo] Support DWARF v5 source code embedding extension
In DWARF v5 the Line Number Program Header is extensible, allowing values with
new content types. In this extension a content type is added,
DW_LNCT_LLVM_source, which contains the embedded source code of the file.

Add a new optional attribute for !DIFile IR metadata called 'source', which
contains the source text. Use this to output the source to the DWARF line
table of code objects. Analogously extend METADATA_FILE in Bitcode and the
.file directive in ASM to support an optional source.
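
For illustration, a minimal sketch of the new field on !DIFile (the
filename, directory, and source text here are made-up values):

!0 = !DIFile(filename: "t.c", directory: "/tmp",
             source: "int main() { return 0; }\0A")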

Teach llvm-dwarfdump and llvm-objdump about the new values. Update the output
format of llvm-dwarfdump to make room for the new attribute on file_names
entries, and support embedded sources for the -source option in llvm-objdump.

Differential Revision: https://reviews.llvm.org/D42765

llvm-svn: 325970
2018-02-23 23:01:06 +00:00
Sriraman Tallam 609f8c013c Intrinsic calls should avoid the PLT when "RtLibUseGOT" metadata is present.
Differential Revision: https://reviews.llvm.org/D42216

llvm-svn: 325962
2018-02-23 21:32:06 +00:00
Craig Topper 61d6ddbf0a [X86] Add DAG combine to remove (and X, 1) from in front of a v1i1 scalar to vector.
These can be created by type legalization promoting the inputs to select to match scalar boolean contents.

We were trying to pattern match them away during isel, but it's better to just remove them from the DAG.

I've cleaned up some patterns to not check for this 'and' anymore. But I suspect this has also opened up opportunities for pattern removal.

llvm-svn: 325949
2018-02-23 20:13:42 +00:00
Simon Pilgrim 425965be0f [X86][SSE] Generalize x > C-1 ? x+-C : 0 --> subus x, C combine for non-uniform constants
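
As an illustrative IR sketch of the pattern (hypothetical test function;
per-lane constants C = {42,64,...} chosen arbitrarily):

define <8 x i16> @subus(<8 x i16> %x) {
  ; x > C-1 ? x + (-C) : 0, i.e. an unsigned saturated subtract x - C
  %cmp = icmp ugt <8 x i16> %x, <i16 41, i16 63, i16 41, i16 63, i16 41, i16 63, i16 41, i16 63>
  %sub = add <8 x i16> %x, <i16 -42, i16 -64, i16 -42, i16 -64, i16 -42, i16 -64, i16 -42, i16 -64>
  %sel = select <8 x i1> %cmp, <8 x i16> %sub, <8 x i16> zeroinitializer
  ret <8 x i16> %sel  ; now lowerable to a single PSUBUSW
}
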
llvm-svn: 325944
2018-02-23 19:58:44 +00:00
Simon Pilgrim 14686059d5 [X86][SSE] Add x > C-1 ? x+-C : 0 --> subus x, C test cases for non-uniform constants
llvm-svn: 325936
2018-02-23 18:57:26 +00:00
Craig Topper 11704dcc72 [X86] Custom split v32i16/v64i8 bitcasts when AVX512F is available, but BWI is not.
The test changes you can see are related to the changes in ReplaceNodeResults. Though shuffle-vs-trunc-512.ll does have a test that exercises the code in LowerBITCAST. Looks like the test output didn't change because DAG combining is able to clean up the resulting type legalization. Adding the custom hook just makes type legalization work less hard.

Differential Revision: https://reviews.llvm.org/D43447

llvm-svn: 325933
2018-02-23 18:43:36 +00:00
Simon Pilgrim 43e8e40026 [X86] Regenerate i128 multiply tests
llvm-svn: 325919
2018-02-23 15:55:27 +00:00
Hans Wennborg 89c35fc44d Support for the mno-stack-arg-probe flag
Adds support for this flag. There is also another piece for clang
(separate review). More info:
https://bugs.llvm.org/show_bug.cgi?id=36221

By Ruslan Nikolaev!

Differential Revision: https://reviews.llvm.org/D43107

llvm-svn: 325900
2018-02-23 13:46:25 +00:00
Simon Pilgrim 17f01c394b [X86][F16C] Regenerate half conversion tests
llvm-svn: 325896
2018-02-23 13:18:13 +00:00
Amaury Sechet 893a6b89ff [DAGCombine] Ensure that (brcond (setcc ...)) is handled in a canonical manner.
Summary:
There are transformations that change setcc into other constructs, and transforms that try to reconstruct a setcc from the brcond condition. Depending on the order in which these transforms are done, the end result differs.

Most of the time, it is preferable to get a setcc as a brcond argument (this is why brcond tries to recreate the setcc in the first place), so we ensure this is done every time by also doing it at the setcc level when the only user is a brcond.

Reviewers: spatel, hfinkel, niravd, craig.topper

Subscribers: nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D41235

llvm-svn: 325892
2018-02-23 11:50:42 +00:00
Richard Smith 1a9a404fb0 Remove file missed by r325852 due to merge conflict.
llvm-svn: 325853
2018-02-23 01:57:28 +00:00
Richard Smith ade53736b0 Revert r325128 ("[X86] Reduce Store Forward Block issues in HW").
This is causing miscompiles in some situations. See the llvm-commits thread for the commit for details.

llvm-svn: 325852
2018-02-23 01:43:46 +00:00
Craig Topper 0dcc88a500 [X86] Turn setne X, signedmax into setgt signedmax, X in LowerVSETCC to avoid an invert
We won't be able to fold the constant pool load, but it's still better than materializing all-ones and XORing for the invert if we used PCMPEQ.
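
A sketch of the case in IR (hypothetical function; v4i32, signed max = 2147483647):

define <4 x i1> @ne_smax(<4 x i32> %x) {
  ; x != SMAX  is equivalent to  SMAX >s x, so this can lower to a
  ; single PCMPGTD instead of PCMPEQD plus a NOT
  %c = icmp ne <4 x i32> %x, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
  ret <4 x i1> %c
}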

This will fix another regression from D42948.

llvm-svn: 325845
2018-02-23 00:21:39 +00:00
Craig Topper d2fab30827 [X86] Turn setne X, signedmin into setgt X, signedmin in LowerVSETCC to avoid an invert
This will fix one of the regressions from D42948.

Differential Revision: https://reviews.llvm.org/D43531

llvm-svn: 325840
2018-02-22 23:46:28 +00:00
Simon Pilgrim b8237d2e2b [X86][AVX512] Add DQ+VLX scalar int<->fp tests cases for D43441
llvm-svn: 325804
2018-02-22 16:29:08 +00:00
Simon Pilgrim 55b7e01116 [X86][MMX] Generalize MMX_MOVD64rr combines to accept v4i16/v8i8 build vectors as well as v2i32
Also handle both cases where the lower 32 bits of the MMX register are undef or zero extended.

llvm-svn: 325736
2018-02-21 23:07:30 +00:00
Simon Pilgrim 664582b781 [X86][MMX] Add MMX_MOVD64rr build vector tests showing undef elements in the lower half
llvm-svn: 325729
2018-02-21 22:10:48 +00:00
Simon Pilgrim 7f078eabda [X86][MMX] Run MMX bitcast test on 32 and 64-bit targets
llvm-svn: 325707
2018-02-21 18:52:16 +00:00
Simon Pilgrim 01da1191a3 [X86][MMX] Regenerate MMX MASKMOV test
llvm-svn: 325698
2018-02-21 16:38:08 +00:00
Simon Pilgrim 3679c95ae4 [X86][MMX] Regenerate MMX arithmetic tests
llvm-svn: 325696
2018-02-21 16:37:10 +00:00
Simon Pilgrim 9c669e13c9 [X86][MMX] Regenerate MMX PSUB commutation test
llvm-svn: 325685
2018-02-21 15:07:47 +00:00
Simon Pilgrim e1f3de55ac [X86] Regenerate GPR:XMM bitcast test
llvm-svn: 325684
2018-02-21 15:05:47 +00:00
Simon Pilgrim b21518a8cc [X86][MMX] Add PR29222 test case
llvm-svn: 325675
2018-02-21 12:06:27 +00:00
Simon Pilgrim cf7640564a [X86][MMX] Add some MMX build vector tests
llvm-svn: 325674
2018-02-21 12:01:30 +00:00
Craig Topper d710adac2d [X86] Disable CLWB for Cannon Lake
Cannon Lake does not support CLWB, therefore it
does not include all features listed under SKX anymore.

Instead, enumerate all SKX features with the exception of CLWB.

Patch by Gabor Buella

Differential Revision: https://reviews.llvm.org/D43380

llvm-svn: 325654
2018-02-21 00:15:48 +00:00
Craig Topper 63dd97513b [X86] Fix copy/paste mistake in test.
The contents of the test case didn't match the name of the test case, and they were identical to the test above.

llvm-svn: 325635
2018-02-20 22:33:23 +00:00
Craig Topper 7fbea20b90 [SelectionDAG] Support known true/false SimplifySetCC cases for comparing against vector splats of constants.
This is split off from D42948 and includes just the cases that constant fold to true or false. It also includes some refactoring to keep predicate checks together.

This supports things like

(setcc uge X, 0) -> true
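
A minimal IR-level sketch of such a fold (hypothetical function):

define <4 x i1> @uge_zero(<4 x i32> %x) {
  %c = icmp uge <4 x i32> %x, zeroinitializer
  ret <4 x i1> %c   ; unsigned x >= 0 always holds: folds to all-true
}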

Differential Revision: https://reviews.llvm.org/D43489

llvm-svn: 325627
2018-02-20 21:48:14 +00:00
Simon Pilgrim 75853bf149 [X86][MMX] Regenerate MMX bitcast test
llvm-svn: 325611
2018-02-20 18:48:29 +00:00
Simon Pilgrim 2cf3769a7e [X86][3DNow] Regenerate intrinsics tests
llvm-svn: 325609
2018-02-20 18:44:21 +00:00
Craig Topper df0c22fcd3 [X86] Correct SHRUNKBLEND creation to work correctly when there are multiple uses of the condition.
SimplifyDemandedBits forces the demanded mask to all 1s if the node has multiple uses, unless the AssumeSingleUse flag is set.

So previously we were only really likely to simplify something if the condition had a single use. And on the off chance we did simplify with multiple uses, the demanded mask being used was all ones, so there was no reason to create a SHRUNKBLEND.

This patch now checks that the condition is only used by selects first, and then sets the AssumeSingleUse flag for the simplification. Then we convert the selects to SHRUNKBLEND, and finally replace the condition.

Differential Revision: https://reviews.llvm.org/D43446

llvm-svn: 325604
2018-02-20 17:58:17 +00:00
Craig Topper 35801fa5ce [SelectionDAG] Add LegalTypes flag to getShiftAmountTy. Use it to unify and simplify DAGCombiner and simplifySetCC code and fix a bug.
DAGCombiner and SimplifySetCC both use getPointerTy for shift amounts pre-legalization. DAGCombiner uses a single helper function to hide this. SimplifySetCC does it in multiple places.

This patch adds a defaulted parameter to getShiftAmountTy that can make it return getPointerTy for scalar types. Use this parameter to simplify SimplifySetCC and DAGCombiner.

Additionally, there were two places in SimplifySetCC that were creating shifts using the target's preferred shift amount pre-legalization. If the target uses a narrow type and the type is illegal, this can cause SimplifySetCC to create a shift with an amount that can't represent all possible shift values for the type. To fix this we should use the pointer type there too.

Alternatively we could make getScalarShiftAmountTy for each target return a safe value for large types as proposed in D43445. And maybe we should still do that, but fixing the SimplifySetCC code keeps other targets from tripping over this in the future.

Fixes PR36250.

Differential Revision: https://reviews.llvm.org/D43449

llvm-svn: 325602
2018-02-20 17:41:05 +00:00
Craig Topper 010ae8dcbb [X86] Promote 16-bit cmovs to 32-bits
This allows us to avoid an opsize prefix. And forcing some move immediates to i32 avoids a length changing prefix on those instructions.

This mostly replaces the existing combine we had for zext/sext+cmov of constants. I left in a case for sign extending a 32 bit cmov of constants to 64 bits.

Differential Revision: https://reviews.llvm.org/D43327

llvm-svn: 325601
2018-02-20 17:41:00 +00:00
Simon Pilgrim 61ba223704 [X86] Regenerate XOR tests
llvm-svn: 325579
2018-02-20 14:08:39 +00:00
George Rimar da4f43a4b4 [llvm-mc] - Produce R_X86_64_PLT32 for "call/jmp foo".
For instructions like call foo and jmp foo, the patch changes the
relocation produced from R_X86_64_PC32 to R_X86_64_PLT32. The
relocation can be used as a marker for 32-bit PC-relative branches.
The linker will reduce a PLT32 relocation to PC32 if the function is
defined locally.
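
A sketch of IR that produces such a branch (hypothetical symbol names):

declare void @foo()

define void @bar() {
  call void @foo()   ; assembles to "call foo"; the fixup is now
  ret void           ; emitted as R_X86_64_PLT32 rather than R_X86_64_PC32
}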

Differential revision: https://reviews.llvm.org/D43383

llvm-svn: 325569
2018-02-20 10:17:57 +00:00
Craig Topper 9256ac1a58 [X86] Add 512-bit unmasked pmulhrsw/pmulhw/pmulhuw intrinsics. Remove and auto upgrade 128/256/512 bit masked pmulhrsw/pmulhw/pmulhuw intrinsics.
The 128 and 256 bit versions were already not used by clang. This adds an equivalent unmasked 512 bit version. Then autoupgrades all sizes to use unmasked intrinsics plus select.
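
A sketch of the upgraded form (intrinsic name follows this commit's
naming; the mask/passthru plumbing is illustrative):

define <32 x i16> @masked_pmulh(<32 x i16> %a, <32 x i16> %b, i32 %mask, <32 x i16> %passthru) {
  %r = call <32 x i16> @llvm.x86.avx512.pmulh.w.512(<32 x i16> %a, <32 x i16> %b)
  %m = bitcast i32 %mask to <32 x i1>
  %s = select <32 x i1> %m, <32 x i16> %r, <32 x i16> %passthru
  ret <32 x i16> %s
}

declare <32 x i16> @llvm.x86.avx512.pmulh.w.512(<32 x i16>, <32 x i16>)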

llvm-svn: 325559
2018-02-20 07:28:14 +00:00
Craig Topper 41f64c3204 [X86] Mark XOP vpmac* and vpmadc intrinsics as being commutative so that tablegen will generate patterns with the load in operand 0.
This allows loads to be folded during isel without the peephole pass.

llvm-svn: 325548
2018-02-20 03:58:14 +00:00
Craig Topper b195ed8ce3 [X86] Use vpmovq2m/vpmovd2m for truncate to vXi1 when possible.
Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates. So this is more consistent anyway.

llvm-svn: 325534
2018-02-19 22:07:31 +00:00
Craig Topper e60f1472f1 [X86] Stop swapping the operands of AVX512 setge.
We swapped the operands and used setle, but I don't see any reason to do that. I think this is a holdover from SSE, where we swap and then invert to use pcmpgt. But with AVX512 we don't want an invert, so we won't use pcmpgt. So there's no need to swap.

llvm-svn: 325527
2018-02-19 19:23:35 +00:00
Craig Topper 9471a7c898 [X86] Reduce the number of isel pattern variations needed for VPTESTM/VPTESTNM matching.
Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node.

This removes over 24000 bytes from the isel table.

llvm-svn: 325526
2018-02-19 19:23:31 +00:00
Simon Pilgrim 70eb508605 [SelectionDAG] ComputeKnownBits - add support for SMIN+SMAX clamp patterns
If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO), then we can deduce that the number of sign bits (zeros/ones) will be at least the minimum number of sign bits of the LO and HI constants.

ComputeKnownBits equivalent of D43338.

Differential Revision: https://reviews.llvm.org/D43463

llvm-svn: 325521
2018-02-19 18:08:16 +00:00
Rafael Espindola c7e51805ff Bring back r323297.
It was reverted because it broke the grub build. The reason the grub
build broke is because grub does its own relocation processing and was
not handling R_386_PLT32. Since grub has no dynamic linker, the fix is
trivial: handle R_386_PLT32 exactly like R_386_PC32.

On the report it was noted that they are using
-fno-integrated-assembler. The upstream GAS (starting with
451875b4f976a527395e9303224c7881b65e12ed) will already be producing an
R_386_PLT32 anyway, so they have to update their code one way or the
other.

Original message:

Don't assume a null GV is local for ELF and MachO.

This is already a simplification, and should help with avoiding a plt
reference when calling an intrinsic with -fno-plt.

With this change we return false for null GVs, so the caller only
needs to check the new metadata to decide if it should use foo@plt or
*foo@got.

llvm-svn: 325514
2018-02-19 16:02:38 +00:00
Simon Pilgrim c302a581a0 [X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK down to 64-bit subvectors
Add support for chaining PACKSS/PACKUS down to 64-bit vectors by using only a single 128-bit input.

llvm-svn: 325494
2018-02-19 13:29:20 +00:00
Craig Topper 1040f236a3 [X86] Make masked pcmpeq commutable during isel so we can fold loads in other operand to the shorter encoding.
Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1.

This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt.

llvm-svn: 325456
2018-02-18 02:37:33 +00:00
Craig Topper b824050658 [X86] Add -show-mc-encoding to the avx512-vec-cmp.ll test and add test case to show that we're failing to use the shorter pcmpeq encoding when the memory argument is the first argument.
This can't be spotted without showing the encodings since they have the same mnemonic.

llvm-svn: 325455
2018-02-18 02:37:32 +00:00
Simon Pilgrim 7fae42eb27 [SelectionDAG] ComputeNumSignBits - add support for SMIN+SMAX clamp patterns
If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO), then we can deduce that the number of sign bits will be at least the minimum number of sign bits of the LO and HI constants.
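
For example (a sketch; the icmp+select pairs below may be recognized as
ISD::SMIN/SMAX at the DAG level):

define <4 x i32> @clamp(<4 x i32> %x) {
  %c1 = icmp slt <4 x i32> %x, <i32 127, i32 127, i32 127, i32 127>
  %lo = select <4 x i1> %c1, <4 x i32> %x, <4 x i32> <i32 127, i32 127, i32 127, i32 127>
  %c2 = icmp sgt <4 x i32> %lo, <i32 -128, i32 -128, i32 -128, i32 -128>
  %r  = select <4 x i1> %c2, <4 x i32> %lo, <4 x i32> <i32 -128, i32 -128, i32 -128, i32 -128>
  ret <4 x i32> %r   ; every lane is in [-128,127]: at least 25 sign bits
}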

I haven't bothered with the UMIN/UMAX equivalent as (1) we don't have any current use cases and (2) I wonder if we'd be better off immediately falling back to ComputeKnownBits for UMIN/UMAX, which already has optimization patterns useful for unsigned cases.

Differential Revision: https://reviews.llvm.org/D43338

llvm-svn: 325450
2018-02-17 22:19:50 +00:00
Simon Pilgrim 8da142bff1 [SelectionDAG] SimplifyDemandedVectorElts - add support for INSERT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D43431

llvm-svn: 325449
2018-02-17 21:49:40 +00:00
Quentin Colombet 48abac82b8 Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
This reverts commit r323991.

This commit breaks targets that don't model all the register constraints
in TableGen. So far the workaround was to set the
hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the
cases.
For instance, when mutating an instruction (like in the lowering of
COPYs) the isRenamable flag is not properly updated. The same problem
will happen when attaching a machine operand from one instruction to
another.

Geoff Berry is working on a fix in https://reviews.llvm.org/D43042.

llvm-svn: 325421
2018-02-17 03:05:33 +00:00
Chandler Carruth a1d6107b14 [DAG, X86] Revert r324797, r324491, and r324359.
Sadly, r324359 caused at least PR36312. There is a patch out for review
but it seems to be taking a bit and we've already had these crashers in
tree for too long. We're hitting this PR in real code now and are
blocked on shipping new compilers as a consequence so I'm reverting us
back to green.

Sorry for the churn due to the stacked changes that I had to revert. =/

llvm-svn: 325420
2018-02-17 02:26:25 +00:00
Craig Topper 0bcdd399e7 [X86] Turn selects with constant condition into vector shuffles during DAG combine
Summary:
Currently we convert to shuffles during lowering. This moves it to DAG combine so hopefully we can get it done before type legalization has to extend the condition.

I believe in some cases we're creating SHRUNKBLENDs that end up with constant conditions because we see the extend on the condition and think it's a dynamic select before DAG combine gets a chance to constant fold the extend. We could add combines to turn SHRUNKBLENDs with constant conditions back into vselect, but it seemed better to just send them to shuffles as early as possible so they never get a chance to become SHRUNKBLENDs. This is the reason some tests went from blends controlled by a constant pool load to just moves.

Some of the constant pool entries changed because the sign_extend introduced by type legalization turned undef elements in the select condition into 0s, while the select->shuffle used -1 in the shuffle mask. So now the shuffle lowering can do what it wants with them.

I'll remove the lowering code as a follow up. We might be able to simplify some of the pre-checks for SHRUNKBLEND as the FIXME there says.

Reviewers: spatel, RKSimon, efriedma, zvi, andreadb

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43367

llvm-svn: 325417
2018-02-17 00:30:30 +00:00
Aditya Nandakumar b63e763847 [GISel]: Make GlobalISelEmitter rule prioritization compatible with selectionDAG
This patch changes GlobalISelEmitter to rank patterns similar to how the
DAG does it (i.e., it computes a score for a pattern and adds the added
complexity to it).
This is so that the decision tree for GISelSelector remains compatible
with that of SelectionDAG.

https://reviews.llvm.org/D43270

llvm-svn: 325401
2018-02-16 22:37:15 +00:00
Craig Topper de565fc73e [X86] Only reorder srl/and on last DAG combiner run
This seems to interfere with a target independent brcond combine that looks for the (srl (and X, C1), C2) pattern to enable TEST instructions. Once we flip, that combine doesn't fire and we end up exposing it to the X86 specific BT combine which causes us to emit a BT instruction. BT has lower throughput than TEST.

We could try to make the brcond combine aware of the alternate pattern, but since the flip was just a code size reduction and not likely to enable other combines, it seemed easier to just delay it until after lowering.

Differential Revision: https://reviews.llvm.org/D43201

llvm-svn: 325371
2018-02-16 18:51:09 +00:00
Simon Pilgrim ff53a4a234 [SelectionDAG] Enable SimplifyDemandedVectorElts support for simplifying shuffle masks
Based off the DemandedElts mask and the UNDEF elements returned from the SimplifyDemandedVectorElts calls on the shuffle operands, we can attempt to simplify the shuffle mask.

I had to be very conservative here, as accepting post-legalized shuffle masks could cause problems for targets that legalize UNDEF mask elements back to in-range values (PowerPC); similarly, combining to identity shuffle masks could cause too much UNDEF information to disappear for later combines.

llvm-svn: 325354
2018-02-16 16:22:14 +00:00
Simon Pilgrim 4e2f757dc1 [X86][SSE] Allow float domain crossing if we are merging 2 or more shuffles and the root started as a float domain shuffle
llvm-svn: 325349
2018-02-16 14:57:25 +00:00
Simon Pilgrim 0ffde50f9c [SelectionDAG] Add initial SimplifyDemandedVectorElts support for simplifying VSELECT operands
This just adds a basic pass through - we can add constant selection mask handling in a future patch to fully match InstCombine.

llvm-svn: 325338
2018-02-16 12:21:08 +00:00
Craig Topper 2e4b838c06 [X86] Allow CMOVs of constants to be sign extended from i32.
Sign extending i32 constants only requires a REX prefix as does widening the CMOV. This is cheaper than the explicit sign extend op.

llvm-svn: 325318
2018-02-16 07:16:15 +00:00
Craig Topper 5d9e301042 [X86] Don't zero_extend cmov up to i64, stop at i32.
Zero extend from i32 to i64 is free. So extend from i16 to i32, and then use a free zero extend to finish.

llvm-svn: 325317
2018-02-16 06:52:43 +00:00
Craig Topper da9c122203 [X86] Add the test cases that were supposed to go with r325287.
llvm-svn: 325306
2018-02-16 00:39:05 +00:00
Craig Topper f3f35efe5c [X86] Enable BT to be used in place of TEST for single bit checks under optsize
We already do this for 64-bit when it won't fit into a 64-bit AND/TEST's immediate field. This adds an additional qualifier to do it for any single-bit constant larger than 8 bits under optsize.

Differential Revision: https://reviews.llvm.org/D43346

llvm-svn: 325290
2018-02-15 20:27:30 +00:00
Craig Topper 81631a2609 [X86] Add test cases for opportunities for using BT instead of TEST under optsize.
llvm-svn: 325277
2018-02-15 19:00:11 +00:00
Simon Pilgrim 689d8137ce [X86][SSE] Add saturated truncation tests for storing illegal v8i8 types
Tests showing missing opportunities to use PACK instructions in cases where we need to truncate to illegal types for stores

llvm-svn: 325270
2018-02-15 17:48:34 +00:00
Simon Pilgrim 17bb6f0755 [X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK to chain PACKUS vXi32-vXi8 saturated truncation
We can use PACKSS/PACKUS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKUSWB to [0,255].

llvm-svn: 325243
2018-02-15 14:37:59 +00:00
Simon Pilgrim 908f833e57 [X86][SSE] combineTruncateWithSat - use truncateVectorWithPACK to chain PACKSS vXi32-vXi8 saturated truncation
We can use PACKSS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKSSWB to [-128,127].
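
A sketch of IR exhibiting the two-stage saturation pattern (hypothetical
test function; a signed clamp followed by a truncate):

define <8 x i8> @trunc_ssat(<8 x i32> %x) {
  %a = icmp slt <8 x i32> %x, <i32 127, i32 127, i32 127, i32 127, i32 127, i32 127, i32 127, i32 127>
  %b = select <8 x i1> %a, <8 x i32> %x, <8 x i32> <i32 127, i32 127, i32 127, i32 127, i32 127, i32 127, i32 127, i32 127>
  %c = icmp sgt <8 x i32> %b, <i32 -128, i32 -128, i32 -128, i32 -128, i32 -128, i32 -128, i32 -128, i32 -128>
  %d = select <8 x i1> %c, <8 x i32> %b, <8 x i32> <i32 -128, i32 -128, i32 -128, i32 -128, i32 -128, i32 -128, i32 -128, i32 -128>
  %t = trunc <8 x i32> %d to <8 x i8>   ; PACKSSDW then PACKSSWB
  ret <8 x i8> %t
}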

PACKUS is a little trickier and will be handled in a separate patch.

llvm-svn: 325235
2018-02-15 13:33:15 +00:00
Simon Pilgrim 80663ee986 [SelectionDAG] Add initial implementation of TargetLowering::SimplifyDemandedVectorElts
This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation.

Further features can be moved/added in future patches.

Differential Revision: https://reviews.llvm.org/D42896

llvm-svn: 325232
2018-02-15 12:14:15 +00:00
Craig Topper ddde46a790 [X86] Regenerate test to show scheduling comments. NFC
These must not have been printing the last time the test was re-generated.

llvm-svn: 325198
2018-02-15 02:14:20 +00:00
Craig Topper a08f83bac3 [X86] Reverse the operand order of invlpga in at&t syntax to match gas.
llvm-svn: 325190
2018-02-14 23:53:21 +00:00
Craig Topper 675752166d [X86] Don't swap argument on BOUND instruction in at&t syntax.
The bound instruction does not have reversed operands in gas.

Fixes PR27653.

Patch by Maya Madhavan.

Differential Revision: https://reviews.llvm.org/D43243

llvm-svn: 325178
2018-02-14 21:54:58 +00:00
Simon Pilgrim 2ec8373633 [X86][SSE] truncateVectorWithPACK - Use src type instead of dst to select between PACK*SDW/PACK*SWB
Try to keep PACK*SDW/PACK*SWB as wide as possible; this helps ComputeNumSignBits as it can only peek through bitcasts to wider types. Pre-AVX2 codegen was already doing this, as it could peek through bitcasts/subvectors more easily than AVX2 could through shuffles.

This shouldn't affect existing results as calls to truncateVectorWithPACK ensure we have enough sign bits to pack to the same value, but it should make it possible to use truncateVectorWithPACK chains to perform saturation in combineTruncateWithSat with a future patch.

llvm-svn: 325149
2018-02-14 18:23:58 +00:00
Sanjay Patel 3f60b0b3f4 [x86] add baseline vector compare tests for D42948; NFC
llvm-svn: 325138
2018-02-14 16:15:15 +00:00
Alexander Ivchenko 7e5d525bd5 [SelectionDAG][X86] Fix incorrect offset generated for VMASKMOV
When creating the high MachineMemOperand for MSTORE/MLOAD, we supplied
it with the original PointerInfo even though the pointer itself had been
incremented. The patch adds the proper offset to the PointerInfo.

llvm-svn: 325135
2018-02-14 15:55:24 +00:00
Lama Saba fe1016c485 [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on Intel Core microarchitectures is when a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.

This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.

The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies.
Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
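
A minimal sketch of the hazard itself (hypothetical IR, typed-pointer
syntax of the period):

define i64 @sfb(i8* %p, i32 %v) {
  %p32 = bitcast i8* %p to i32*
  store i32 %v, i32* %p32      ; small store
  %p64 = bitcast i8* %p to i64*
  %w = load i64, i64* %p64     ; wider load overlapping the stored bytes:
  ret i64 %w                   ; forwarding fails, ~13 cycle penalty
}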

Change-Id: Ic41aa9ade6512e0478db66e07e2fde41b4fb35f9
llvm-svn: 325128
2018-02-14 14:58:53 +00:00
Simon Pilgrim 86d15bff68 [X86][SSE] Relax type legality for combineTruncateWithSat PACKSS/PACKUS truncation
While the AVX512 VTRUNCS/VTRUNCUS instructions require legal types, truncateVectorWithPACK handles cases with multiples of legal types through splitting/concatenation. So we just need to ensure that the src/dst scalar types are correct and leave truncateVectorWithPACK to handle the rest of it.

llvm-svn: 325127
2018-02-14 14:14:29 +00:00
Reid Kleckner 91e11a83fc [X86] Use EDI for retpoline when no scratch regs are left
Summary:
Instead of solving the hard problem of how to pass the callee to the indirect
jump thunk without a register, just use a CSR. At a call boundary, there's
nothing stopping us from using a CSR to hold the callee as long as we save and
restore it in the prologue.

Also, add tests for this mregparm=3 case. I wrote execution tests for
__llvm_retpoline_push, but they never got committed as lit tests, either
because I never rewrote them or because they got lost in merge conflicts.

Reviewers: chandlerc, dwmw2

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43214

llvm-svn: 325049
2018-02-13 20:47:49 +00:00
Craig Topper f73ff612ca [DAGCombiner] Add one use check to fold (not (and x, y)) -> (or (not x), (not y))
Summary:
If the and has an additional use we shouldn't invert it. That creates an additional instruction.

While there add a one use check to the transform above that looked similar.
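
A sketch of the case the one-use check now rejects (hypothetical IR):

define i32 @not_and(i32 %x, i32 %y, i32* %p) {
  %a = and i32 %x, %y
  store i32 %a, i32* %p   ; extra use: the 'and' stays live, so expanding
  %n = xor i32 %a, -1     ; the 'not' into (or (not x), (not y)) would add
  ret i32 %n              ; instructions instead of saving one
}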

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43225

llvm-svn: 325019
2018-02-13 16:25:27 +00:00
Craig Topper 036789a7e8 [X86] Add combine to shrink 64-bit ands when one input is an any_extend and the other input guarantees upper 32 bits are 0.
Summary: This gets the shift case from PR35792.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43222

llvm-svn: 325018
2018-02-13 16:25:25 +00:00
Sanjay Patel 907b58530f [DAG] fix type of undef returned by getNode()
The bug has been lying dormant but apparently was never exposed until
after rL324941, because we didn't return the correct result
for shifts with undef operands.

llvm-svn: 325010
2018-02-13 14:55:07 +00:00
Alexander Ivchenko e057325cdd [X86] Rename function main->foo in CodeGen/X86/pr35316.ll. NFC
Using "void main" might be confusing for some cases.

llvm-svn: 324997
2018-02-13 10:58:19 +00:00
Craig Topper 7873648f87 [X86] Add a test case showing blcic matching being broken by an and mask applied to the input. NFC
Playing around with other BMI/TBM instructions after PR35792 and saw this.

llvm-svn: 324987
2018-02-13 07:28:28 +00:00
Craig Topper d6d731270b [X86] Add a blsr test case with a shift from PR35792. NFC
The blsr pattern here is missed because the add is shrunk, but the and is not. This leaves an any_extend between them.

llvm-svn: 324986
2018-02-13 05:33:39 +00:00
Craig Topper df99baa4df [X86] Teach EVEX->VEX pass to turn VRNDSCALE into VROUND when bits 7:4 of the immediate are 0 and the regular EVEX->VEX checks pass.
Bits 7:4 control the scale part of the operation. If the scale is 0 the behavior is equivalent to VROUND.

Fixes PR36246

llvm-svn: 324985
2018-02-13 04:19:26 +00:00
Craig Topper 4b89cc1b96 [X86] Autogenerate complete checks. NFC
llvm-svn: 324984
2018-02-13 04:19:23 +00:00
Craig Topper ef619918f2 [X86] Remove duplicate CHECK-LABEL line the update script didn't delete when I converted the test.
llvm-svn: 324979
2018-02-13 01:36:27 +00:00
Craig Topper 52b5558ca0 [X86] Auto generate complete checks. NFC
llvm-svn: 324964
2018-02-12 23:43:10 +00:00
Adam Nemet 031a00c660 Revert "[LSR] Avoid UB overflow when examining reuse opportunities"
This reverts commit r324943.

Breaking bots, reverting for Gerolf.

llvm-svn: 324958
2018-02-12 22:42:13 +00:00
Craig Topper 8d19c6fba2 [X86] Reverse the operand order of the autoupgrade of the kunpack builtins.
The second operand needs to be in the lower bits of the concatenation. This matches llvm 5.0, gcc, and icc behavior.

Fixes PR36360.

llvm-svn: 324953
2018-02-12 22:38:34 +00:00
Sanjay Patel 0537deb5cd [x86] add select test to show there's no single right answer (PR28968); NFC
llvm-svn: 324947
2018-02-12 22:19:24 +00:00
Gerolf Hoflehner edcd564820 [LSR] Avoid UB overflow when examining reuse opportunities
llvm-svn: 324943
2018-02-12 21:49:32 +00:00
Sanjay Patel 014c000f6a [DAG] make binops with undef operands consistent with IR
This started by noticing that scalar and vector types were producing different results with div ops in PR36305:
https://bugs.llvm.org/show_bug.cgi?id=36305

...but the problem is bigger. I couldn't keep it straight without a table, so I'm attaching that as a PDF to 
the review. The x86 tests in undef-ops.ll correspond to that table.

Green means that instsimplify and the DAG agree on the result for all types.
Red means the DAG was returning undef when IR was not.
Yellow means the DAG was returning a non-undef result when IR returned undef.

This patch assumes that we're currently doing the right thing in IR.

Note: I couldn't find any problems with lowering vector constants as the code comments were warning,
but those comments were written long ago in rL36413.

Differential Revision: https://reviews.llvm.org/D43141

llvm-svn: 324941
2018-02-12 21:37:27 +00:00
Simon Pilgrim d0693a6501 [X86][MMX] Add missing scheduling class tag for EMMS/FEMMS
We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).

AMD targets can perform these a lot quicker than WriteMicrocoded, so they will need an override in the models.

llvm-svn: 324897
2018-02-12 15:52:59 +00:00
Hans Wennborg 7e19dfc45f Revert r324835 "[X86] Reduce Store Forward Block issues in HW"
It asserts building Chromium; see PR36346.

(This also reverts the follow-up r324836.)

> If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
> A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on Intel Core microarchitectures is when a small store cannot be forwarded to a large load.
> The estimated penalty for a store forward block is ~13 cycles.
>
> This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence
> of a load and a store.
>
> The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies.
> Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.

llvm-svn: 324887
2018-02-12 12:43:39 +00:00
Craig Topper 98ae8f833f [X86] Change some compare patterns to use loadi8/loadi16/loadi32/loadi64 helper fragments.
This enables CMP8mi to fold zextloadi8i1 which in all tests allows us to avoid creating a TEST8rr that peephole can't fold.

llvm-svn: 324863
2018-02-12 02:48:42 +00:00
Craig Topper 27d5b6e4a6 [X86] Autogenerate complete checks. NFC
llvm-svn: 324862
2018-02-12 02:03:36 +00:00
Craig Topper dfc322ddf4 [X86] Allow zextload/extload i1->i8 to be folded into instructions during isel
Previously we just emitted this as a MOV8rm which would likely get folded during the peephole pass anyway. This just makes it explicit earlier.

The gpr-to-mask.ll test changed because the kaddb instruction has no memory form.

llvm-svn: 324860
2018-02-12 01:33:36 +00:00
Craig Topper 3a354152dd [X86] Update some required-vector-width.ll test cases to not pass 512-bit vectors in arguments or return.
ABI for these would require 512-bit support, so we don't want to test that.

llvm-svn: 324845
2018-02-11 18:52:16 +00:00
Simon Pilgrim 0d8c4bfc2a [X86][SSE] Use SplitBinaryOpsAndApply to recognise PSUBUS patterns before they're split on AVX1
This needs to be generalised further to support AVX512BW cases but I want to add non-uniform constants first.

llvm-svn: 324844
2018-02-11 17:29:42 +00:00
Craig Topper ca5a340171 [X86] Use min/max for vector ult/ugt compares if avoids a sign flip.
Summary:
Currently we only use min/max to help with ule/uge compares because it removes an invert of the result that would otherwise be needed. But we can also use it for ult/ugt compares if it will prevent the sign bit flip needed to use pcmpgt, at the cost of requiring an invert after the compare.

I also refactored the code so that the max/min code is self contained and does its own return instead of setting up a flag to manipulate the rest of the function's behavior.

Most of the test cases look ok with this. I did notice that we added instructions when one of the operands being sign flipped is a constant vector that we were able to constant fold the flip into.

I also noticed that sometimes the SSE min/max clobbers a register that is needed after the compare. This resulted in an extra move being inserted before the min/max to preserve the register. We could try to detect this and switch from min to max and change the compare operands to use the operand that gets reused in the compare.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42935

llvm-svn: 324842
2018-02-11 17:11:40 +00:00
Sanjay Patel eb8c408e50 [TargetLowering] try to create -1 constant operand for math ops via demanded bits
This reverses instcombine's demanded bits' transform which always tries to clear bits in constants.

As noted in PR35792 and shown in the test diffs:
https://bugs.llvm.org/show_bug.cgi?id=35792
...we can do better in codegen by trying to form -1. The x86 sub test shows a missed opportunity. 

I did investigate changing instcombine's behavior, but it would be more work to change 
canonicalization in IR. Clearing bits / shrinking constants can allow killing instructions, 
so we'd have to figure out how to not regress those cases.

Differential Revision: https://reviews.llvm.org/D42986

llvm-svn: 324839
2018-02-11 14:38:23 +00:00
Simon Pilgrim 7630150222 [X86] Add PR33747 test case
llvm-svn: 324838
2018-02-11 13:12:50 +00:00
Simon Pilgrim 0be5567a89 [X86][SSE] Enable SMIN/SMAX/UMIN/UMAX custom lowering for all legal types
This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT.

Differential Revision: https://reviews.llvm.org/D43014

llvm-svn: 324837
2018-02-11 10:52:37 +00:00
Lama Saba 91e2b9d081 fix test/CodeGen/X86/fixup-sfb.ll test failure after commit https://reviews.llvm.org/rL324835
Change-Id: I2526c2f342654e85ce054237de03ae9db9ab4994
llvm-svn: 324836
2018-02-11 10:33:06 +00:00
Lama Saba c2ba6c387e [X86] Reduce Store Forward Block issues in HW
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on Intel Core microarchitectures is when a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.

This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.

The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies.
Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.

Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748
llvm-svn: 324835
2018-02-11 09:34:12 +00:00
Craig Topper 24d3b28d93 [X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required
This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if it's safe to make 512-bit vectors illegal when the preference is for 256-bit vectors.

For code that has no vectors in it originally and only gets vectors through the loop and SLP vectorizers, this allows us to generate code largely similar to our AVX2-only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation that the backend will legalize them. In order to avoid changing the vectorizer and potentially harming our AVX2 codegen, this patch tries to make the legalizer behavior similar.

This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking.

I've qualified every place I could find in X86ISelLowering.cpp and added test cases for many of them with 2 different values for the attribute to see the codegen differences.

We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it.
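
A sketch of how the attribute might look on a function (attribute name
per this commit; placement and bodies are illustrative):

; explicit 512-bit vectors in the source: keep 512-bit types legal
define void @wide(<16 x float>* %p) "required-vector-width"="512" {
  ret void
}

; no wide vectors required: legalize as if the maximum width is 256 bits
define void @narrow(float* %p) "required-vector-width"="256" {
  ret void
}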

Differential Revision: https://reviews.llvm.org/D42724

llvm-svn: 324834
2018-02-11 08:06:27 +00:00
Simon Pilgrim d229bfd20d [X86][SSE] Add SMIN/SMAX combine test
As discussed on D43014, we need the ability to flip SMIN/SMAX to (legal) UMIN/UMAX

llvm-svn: 324829
2018-02-10 23:38:50 +00:00
Craig Topper 4dccffc84a [X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp.
Summary:
This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit and in the IR.

This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares.

Previously the lowering step of isel would turn the intrinsic into an X86-specific ISD node and emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it.

This should make any issues with gpr<->mask sequences the same between integer and fp. Meaning we only have to fix them once.
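
A sketch of the new shape (the intrinsic signature and immediate
operands here are illustrative, not authoritative):

define i8 @cmp_pd_mask(<8 x double> %a, <8 x double> %b, <8 x i1> %k) {
  %m = call <8 x i1> @llvm.x86.avx512.cmp.pd.512(<8 x double> %a, <8 x double> %b, i32 13, i32 4)
  %r = and <8 x i1> %m, %k        ; masking is now an explicit 'and'
  %s = bitcast <8 x i1> %r to i8  ; the cast to scalar is now explicit
  ret i8 %s
}

declare <8 x i1> @llvm.x86.avx512.cmp.pd.512(<8 x double>, <8 x double>, i32, i32)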

Reviewers: spatel, delena, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43137

llvm-svn: 324827
2018-02-10 23:33:55 +00:00
Simon Pilgrim b781b2e24c [X86][SSE] Add UMIN/UMAX combine test
As discussed on D43014, we need the ability to flip UMIN/UMAX to (legal) SMIN/SMAX

llvm-svn: 324826
2018-02-10 22:27:35 +00:00
Craig Topper 9121eb575e [X86] Custom legalize (v2i32 (setcc (v2f32))) so that we don't end up with a (v4i1 (setcc (v4f32)))
Under VLX, getSetCCResultType returns v2i1/v4i1 for v2f32/v4f32, so default type legalization will end up changing the setcc result type back to vXi1 if it had been extended. The resulting extend gets messed up further by type legalization and is difficult to recombine back to (v4i32 (setcc (v4f32))) after legalization.

I went ahead and enabled this for SSE2 and later since it's always the result we want and this helps type legalization get there in fewer steps.

llvm-svn: 324822
2018-02-10 19:12:58 +00:00
Craig Topper 28d3a73c81 [X86] Extend inputs with elements smaller than i32 to sint_to_fp/uint_to_fp before type legalization.
This prevents extends of masks being introduced during lowering, where it becomes difficult to combine them out.

There are a few oddities in here.

We sometimes concatenate two k-registers produced by two compares, sign_extend the combined pair, then extract two halves. This worked better previously because the sign_extend wasn't created until after the fp_to_sint was split which led to a split sign_extend being created.

We probably also need to custom type legalize (v2i32 (sext v2i1)) via widening.

llvm-svn: 324820
2018-02-10 17:58:58 +00:00
Craig Topper 0fbdaa6d81 [X86] Remove some check-prefixes from avx512-cvt.ll to prepare for an upcoming patch.
The update script sometimes has trouble when there are check-prefixes representing every possible combination of feature flags. I have a patch where the update script was generating something that didn't pass lit.

This patch just removes some check-prefixes and expands out some of the checks to work around this.

llvm-svn: 324819
2018-02-10 17:58:56 +00:00
Sanjay Patel 3a7e447cd6 [x86] preserve test intent by removing undef
D43141 proposes to correct undef folding in the DAG,
and this test would not survive that change.

llvm-svn: 324817
2018-02-10 15:36:23 +00:00
Sanjay Patel 5d176da09c [x86] preserve test intent by removing undef
D43141 proposes to correct undef folding in the DAG,
and this test would not survive that change.

llvm-svn: 324816
2018-02-10 15:28:08 +00:00
Simon Pilgrim 17b00ec4f5 [X86][SSE] Regenerate old sitofp v2i32 test
llvm-svn: 324812
2018-02-10 14:45:58 +00:00
Craig Topper b8d7b1620b [X86] Custom legalize (v2i1 (fp_to_uint/fp_to_sint v2f64)) without AVX512VL.
Strangely the code was already present, just the setOperationAction wasn't being called without VLX.

llvm-svn: 324806
2018-02-10 08:39:31 +00:00
Craig Topper c3aab4bbe1 [X86] Legalize zero extends from vXi1 to vXi16/vXi32/vXi64 using a sign extend and a shift.
This avoids a constant pool load to create 1.
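
A sketch of the equivalent IR-level rewrite:

define <4 x i32> @zext_mask(<4 x i1> %m) {
  %s = sext <4 x i1> %m to <4 x i32>   ; lanes become 0 or -1
  %z = lshr <4 x i32> %s, <i32 31, i32 31, i32 31, i32 31>
  ret <4 x i32> %z                     ; 0 or 1, with no load of <1,1,1,1>
}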

The int->float are showing converts to mask and back. We probably need to widen inputs to sint_to_fp/uint_to_fp before type legalization.

llvm-svn: 324805
2018-02-10 08:06:52 +00:00
Craig Topper d34af6f636 [X86] Teach combineExtSetcc to handle ZERO_EXTEND by widening the setcc and then masking. A later DAG combine will convert to a shift.
This helps to avoid a constant pool load needed to zero extend from the mask.

llvm-svn: 324804
2018-02-10 08:06:49 +00:00
Craig Topper fa6113b3d7 [X86] Teach combineInsertSubvector how to combine some k-register insert_subvectors and extract_subvector sequences to remove extra zeroing.
llvm-svn: 324791
2018-02-10 01:00:41 +00:00
Craig Topper 99db883d55 [X86] Teach lower1BitVectorShuffle to recognize shuffles that are just filling upper elements with zero. Replace with insert_subvector.
There are still some extra kshifts in one of the modified test cases here, but hopefully that's only a DAG combine away.

llvm-svn: 324782
2018-02-09 23:32:27 +00:00
Sanjay Patel 4031ce15b8 [x86] remove duplicate undef tests; NFC
These are incomplete and were made redundant with the consolidation in:
https://reviews.llvm.org/rL324678

llvm-svn: 324754
2018-02-09 17:46:38 +00:00
Rafael Espindola c052fa0bd3 Emit smaller exception tables for non-SJLJ mode.
* Use uleb128 for code offsets in the LSDA call site table.
* Omit the TTBase offset if the type table is empty.

This change can reduce the size of the DWARF/Itanium LSDA by about half.

Patch by Ryan Prichard!

llvm-svn: 324750
2018-02-09 17:13:37 +00:00
Rafael Espindola d09b416943 Use assembler expressions to lay out the EH LSDA.
Rely on the assembler to finalize the layout of the DWARF/Itanium
exception-handling LSDA. Rather than calculate the exact size of each
thing in the LSDA, use assembler directives:

    To emit the offset to the TTBase label:

.uleb128 .Lttbase0-.Lttbaseref0
.Lttbaseref0:

    To emit the size of the call site table:

.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
... call site table entries ...
.Lcst_end0:

    To align the type info table:

... action table ...
.balign 4
.long _ZTIi
.long _ZTIl
.Lttbase0:

Using assembler directives simplifies the compiler and allows switching
the encoding of offsets in the call site table from udata4 to uleb128 for
a large code size savings. (This commit does not change the encoding.)

The combination of the uleb128 followed by a balign creates an unfortunate
dependency cycle that the assembler must sometimes resolve either by
padding an LEB or by inserting zero padding before the type table. See
PR35809 or GNU as bug 4029.

Patch by Ryan Prichard!

llvm-svn: 324749
2018-02-09 17:00:25 +00:00
Craig Topper ca5841b4e4 [X86] Simplify some code in lowerV4X128VectorShuffle and lowerV2X128VectorShuffle
Previously we extracted two subvectors and concatenated them. But the concatenate will be lowered to two insert subvectors. Then DAG combine will merge one of the inserts and one of the extracts back into the original vector. We might as well just directly use one extract and one insert.

llvm-svn: 324710
2018-02-09 05:54:36 +00:00
Craig Topper 28166a877d [X86] Teach shuffle lowering to recognize 128/256 bit insertions into a zero vector.
This regresses a couple cases in the shuffle combining test. But those cases use intrinsics that InstCombine knows how to turn into a generic shuffle earlier. This should give opportunities to fold this earlier in InstCombine or DAG combine.

llvm-svn: 324709
2018-02-09 05:54:34 +00:00
Craig Topper 090e41d0cc [X86] Add 512-bit shuffle test cases for concatenating 128/256-bits with zeros in the upper portion.
We should recognize this and just use a mov that will zero the upper bits.

llvm-svn: 324708
2018-02-09 05:54:31 +00:00
Craig Topper 79c3255fe4 [x86] Add test cases to demonstrate some dumb mask->gpr->mask transition sequences.
llvm-svn: 324693
2018-02-09 01:14:17 +00:00
Francis Visoiu Mistrih 39ec2e95ae [CodeGen] Unify the syntax of MBB successors in MIR and -debug output
Instead of:

Successors according to CFG: %bb.6(0x12492492 / 0x80000000 = 14.29%)

print:

successors: %bb.6(0x12492492); %bb.6(14.29%)
llvm-svn: 324685
2018-02-09 00:10:31 +00:00
Sanjay Patel b7e13938a9 [x86] consolidate and add tests for undef binop folds; NFC
As was already shown in the div/rem tests and noted in PR36305,
the behavior is inconsistent, but it's not limited to div/rem only.

llvm-svn: 324678
2018-02-08 23:21:44 +00:00
Alexander Ivchenko da9e81c462 [GlobalISel][X86] Fixing failures after https://reviews.llvm.org/D37775
The patch essentially makes sure that X86CallLowering adds proper
G_COPY/G_TRUNC and G_ANYEXT/G_COPY when we are doing lowering of
arguments/returns for floating point values passed in registers.

Tests are updated accordingly.

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D42287

llvm-svn: 324665
2018-02-08 22:41:47 +00:00
Alexander Ivchenko a85c4fc029 [GlobalIsel][X86] Making {G_IMPLICIT_DEF, s128} legal
The patch is a split from D42287 and is related to
fixing failures after https://reviews.llvm.org/D37775

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D42287

llvm-svn: 324664
2018-02-08 22:40:31 +00:00
Craig Topper 9e030c9e00 [X86] Improve combineCastedMaskArithmetic to fold (bitcast (vXi1 (and/or/xor X, C)))->(vXi1 (and/or/xor (bitcast X), (bitcast C)) where C is a constant build_vector.
Most vXi1 constant build vectors have to be implemented in the scalar domain anyway, so we'll probably end up with a cast there later. But by then it's too late to do the combine to get rid of it.

llvm-svn: 324662
2018-02-08 22:26:39 +00:00
Craig Topper 1b5b4ccb77 [X86] Add DAG combine to constant fold a bitcast of a vXi1 constant build_vector into a scalar integer.
llvm-svn: 324661
2018-02-08 22:26:36 +00:00
Craig Topper dccf72b583 [X86] Remove kortest intrinsics and replace with native IR.
llvm-svn: 324646
2018-02-08 20:16:06 +00:00
David Woodhouse 76eb26aa92 [X86] Support 'V' register operand modifier
This allows the register name to be printed without the leading '%'.
This can be used for emitting calls to the retpoline thunks from inline
asm.

llvm-svn: 324645
2018-02-08 20:06:05 +00:00
Simon Pilgrim 21fb6bccc4 [X86] Add common CHECK prefix to shift combine tests
llvm-svn: 324638
2018-02-08 19:23:47 +00:00
Simon Pilgrim ae6873f91c [X86] Add shift undef, %X tests
llvm-svn: 324637
2018-02-08 19:20:34 +00:00
Craig Topper 7aee1a838e [X86] Add a few new test cases for shrunkblend combine
One of them shows a missed opportunity to use SimplifyDemandedBits on the condition when it's used by multiple vselects.

The other is a case we shouldn't optimize because the condition has a non-vselect use.

llvm-svn: 324630
2018-02-08 18:34:25 +00:00
Alexander Ivchenko dd5b2396d3 [x86] Add test/CodeGen/X86/vmaskmov-offset.ll. NFC.
Needed for checking current code generation.

llvm-svn: 324601
2018-02-08 13:16:42 +00:00
Craig Topper 8d0c8c9be1 [X86] Support folding in a k-register OR when creating KORTEST from scalar compare of a bitcast from vXi1.
This should allow us to remove the kortest intrinsic from IR and use compare+bitcast+or in IR instead.

llvm-svn: 324580
2018-02-08 08:29:43 +00:00
Craig Topper 93505707b6 [X86] Allow KORTEST instruction to be used for testing if a mask is all ones
The KORTEST instruction sets the C flag if the result of ORing both operands together is all 1s. We can use this to lower (icmp eq/ne (bitcast (vXi1 X)), -1).
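
A sketch of the pattern being lowered (hypothetical function):

define i1 @all_ones(<16 x i1> %k) {
  %b = bitcast <16 x i1> %k to i16
  %c = icmp eq i16 %b, -1   ; selectable as KORTESTW %k, %k + C-flag check
  ret i1 %c
}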

Differential Revision: https://reviews.llvm.org/D42772

llvm-svn: 324577
2018-02-08 07:54:16 +00:00
Craig Topper f5465f98d2 [X86] Don't emit KTEST instructions unless only the Z flag is being used
Summary:
KTEST has weird flag behavior. The Z flag is set for all bits in the AND of the k-registers being 0, and the C flag is set for all bits being 1. All other flags are cleared.

We currently emit this instruction in EmitTEST and don't check the condition code. This can lead to strange things like using the S flag after a KTEST for a signed compare.

The domain reassignment pass can also transform TEST instructions into KTEST and is not protected against the flag usage either. For now I've disabled this part of the domain reassignment pass. I tried to comment out the checks in the mir test so that we could recover them later, but I couldn't figure out how to get that to work.

This patch moves the KTEST handling into LowerSETCC and now creates a ktest+x86setcc. I've chosen this approach because I'd like to add support for the C flag for all ones in a followup patch. To do that requires that I can rewrite the condition code going in the x86setcc to be different than the original SETCC condition code.

This fixes PR36182. I'll file a PR to fix domain reassignment once this goes in. Should this be merged to 6.0?

Reviewers: spatel, guyblank, RKSimon, zvi

Reviewed By: guyblank

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42770

llvm-svn: 324576
2018-02-08 07:45:55 +00:00
Francis Visoiu Mistrih da89d1812a [CodeGen] Print MachineBasicBlock labels using MIR syntax in -debug output
Instead of:

%bb.1: derived from LLVM BB %for.body

print:

bb.1.for.body:

Also use MIR syntax for MBB attributes like "align", "landing-pad", etc.

llvm-svn: 324563
2018-02-08 05:02:00 +00:00
Chandler Carruth 0be0cfa65b [x86] Fix nasty bug in the x86 backend that is essentially impossible to
hit from IR but creates a minefield for MI passes.

The x86 backend has fairly powerful logic to try and fold loads that
feed register operands to instructions into a memory operand on the
instruction. This is almost always a good thing, but there are specific
relocated loads that are only allowed to appear in specific
instructions. Notably, R_X86_64_GOTTPOFF is only allowed in `movq` and
`addq`. This patch blocks folding of memory operands using this
relocation unless the target is in fact `addq`.

The particular relocation indicates why we simply don't hit this under
normal circumstances. This relocation is only used for TLS, and it gets
used in very specific ways in conjunction with %fs-relative addressing.
The result is that loads using this relocation are essentially never
eligible for folding into an instruction's memory operands. Unless, of
course, you have an MI pass that inserts usage of such a load. I have
exactly such an MI pass and was greeted by truly mysterious miscompiles
where the linker replaced my instruction with a completely garbage byte
sequence. Go team.

This is the only such relocation I'm aware of in x86, but there may be
others that need to be similarly restricted.

Fixes PR36165.

Differential Revision: https://reviews.llvm.org/D42732

llvm-svn: 324546
2018-02-07 23:59:14 +00:00
Craig Topper 8baa9c77e3 [X86] When doing callee save/restore for k-registers make sure we don't use KMOVQ on non-BWI targets
If we are saving/restoring k-registers, the default behavior of getMinimalRegisterClass will find the VK64 class with a spill size of 64 bits. This will cause the KMOVQ opcode to be used for save/restore. If we don't have BWI instructions, we need to constrain the class returned to give us VK16 with a 16-bit spill size. We can do this by passing either v16i1 or v64i1 into getMinimalRegisterClass.

Also add asserts to make sure BWI is enabled anytime we use KMOVD/KMOVQ. These are what caught this bug.

Fixes PR36256

Differential Revision: https://reviews.llvm.org/D42989

llvm-svn: 324533
2018-02-07 21:41:50 +00:00
Craig Topper ce26819f9e [X86] Auto-generate complete checks. NFC
llvm-svn: 324530
2018-02-07 21:29:30 +00:00
Craig Topper d18430018d [X86] Regenerate test using update_mir_test_checks.py. NFC
llvm-svn: 324497
2018-02-07 18:32:15 +00:00
Simon Pilgrim b4e789e8f6 [X86][AVX] Add PACKSSDW/PACKUSDW support for truncation of clamped values
SSE and shorter vector sizes will have to wait until we can add support for general SMIN/SMAX matching.

llvm-svn: 324485
2018-02-07 15:48:44 +00:00
Simon Pilgrim c90d79f80a [X86] Regenerate atomic i32 tests
llvm-svn: 324479
2018-02-07 13:28:23 +00:00
Clement Courbet 10003e31f4 [MergeICmps] Re-commit rL324317 "Enable the MergeICmps Pass by default."
With fixes from rL324341.

Original commit message:

[MergeICmps] Enable the MergeICmps Pass by default.

Summary: Now that PR33325 is fixed, this should always improve the generated code.

Reviewers: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42793

llvm-svn: 324465
2018-02-07 09:58:55 +00:00
Chandler Carruth 282ae1632a [x86/retpoline] Make the external thunk names exactly match the names
that happened to end up in GCC.

This is really unfortunate, as the names don't have much rhyme or reason
to them. Originally in the discussions it seemed fine to rely on aliases
to map different names to whatever external thunk code developers wished
to use but there are practical problems with that in the kernel it turns
out. And since we're discovering these practical problems late and since
GCC has already shipped a release with one set of names, we are forced,
yet again, to blindly match what is there.

Somewhat rushing this patch out for the Linux kernel folks to test and
so we can get it patched into our releases.

Differential Revision: https://reviews.llvm.org/D42998

llvm-svn: 324449
2018-02-07 06:16:24 +00:00
Volkan Keles 5838f7c013 GlobalISel: Always check operand types when executing match table
Summary:
Some of the commands try to get the register without checking
whether the specified operand is a register, causing a crash. All commands
should check the type of the operand first and reject it if the type is
not expected.

Reviewers: dsanders, qcolombet

Reviewed By: qcolombet

Subscribers: qcolombet, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42984

llvm-svn: 324442
2018-02-07 02:44:51 +00:00
Eli Friedman cd07a3e2f9 Place undefined globals in .bss instead of .data
Following up on the discussion from
http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef
values are now placed in the .bss as well as null values. This prevents
undef global values taking up potentially huge amounts of space in the
.data section.

The following two lines now both generate equivalent .bss data:

@vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4
@vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for

This is primarily motivated by the corresponding issue in the Rust
compiler (https://github.com/rust-lang/rust/issues/41315).

Differential Revision: https://reviews.llvm.org/D41705

Patch by varkor!

llvm-svn: 324424
2018-02-06 23:22:14 +00:00
Craig Topper dfea544c84 [X86] Add test cases that exercise the BSR/BSF optimization combineCMov.
combineCmov tries to remove compares against BSR/BSF if we can prove the input to the BSR/BSF is never zero.

As far as I can tell most of the time codegenprepare despeculates ctlz/cttz and gives us a cttz_zero_undef/ctlz_zero_undef which don't use a cmov.

So the only way I found to trigger this code is to show codegenprepare an illegal type which it won't despeculate.

I think we should be turning ctlz/cttz into ctlz_zero_undef/cttz_zero_undef for these cases before we ever get to operation legalization where the cmov is created. But wanted to add these tests so we don't regress.

llvm-svn: 324409
2018-02-06 21:47:04 +00:00
Sanjay Patel 0cdc273ada [x86] add tests to show demanded bits shortcoming; NFC
llvm-svn: 324408
2018-02-06 21:43:57 +00:00