Commit Graph

12454 Commits

Author SHA1 Message Date
Craig Topper f107123a88 [X86] Type legalize v2i32 div/rem by scalarizing rather than promoting
Summary:
Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall.

This patch switches type legalization to do the scalarizing itself using i32.

It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support.

Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalaring to 4 scalar divides.

Reviewers: RKSimon, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51325

llvm-svn: 342114
2018-09-13 06:13:37 +00:00
Michael Berg 22a53cbc7f Guard FMF context by excluding some FP operators from FPMathOperator
Summary:
Some FPMathOperators succeed and the retrieve FMF context when they never have it, we should omit these cases to keep from removing FMF context.

For instance when we visit some FPMathOperator mapped Instructions which never have FMF flags and a Node was associated which does have FMF flags, that Node today will have all its flags cleared via the intersect operation.  With this change, we exclude associating Nodes that never have FPMathOperator status under FMF.

Reviewers: spatel, wristow, arsenm, hfinkel, aemerson

Reviewed By: spatel

Subscribers: llvm-commits, wdng

Differential Revision: https://reviews.llvm.org/D51145

llvm-svn: 342081
2018-09-12 21:09:59 +00:00
Craig Topper 2262613532 [X86] Remove isel patterns for ADCX instruction
There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags.

I don't think gcc will generate this instruction either.

Fixes PR38852.

Differential Revision: https://reviews.llvm.org/D51754

llvm-svn: 342059
2018-09-12 15:47:34 +00:00
Craig Topper dc32e91bc6 [X86] Teach X86SelectionDAGInfo::EmitTargetCodeForMemcpy about GNUX32
Summary:
In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match.

Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger.

Reviewers: rnk, efriedma, MatzeB, echristo

Reviewed By: efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51893

llvm-svn: 342015
2018-09-12 01:57:22 +00:00
Craig Topper 8238580aae [X86] Prefer unpckhpd over movhlps in isel for fake unary cases
In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order.

This patch removes the hack.

This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turnit into unpckhpd when its an advantage.

Differential Revision: https://reviews.llvm.org/D49499

llvm-svn: 341973
2018-09-11 17:57:27 +00:00
Craig Topper cc9efaffad [X86] Teach X86FastISel::X86SelectRet to use EAX for the sret pointer in GNUX32
GNUX32 uses 32-bit pointers despite is64BitMode being true. So we should use EAX to return the value.

Fixes ones of the failures from PR38865.

Differential Revision: https://reviews.llvm.org/D51940

llvm-svn: 341972
2018-09-11 17:57:23 +00:00
Roman Lebedev baf2628043 [DagCombine][NFC] Some more tests fo for X % C == 0 (UREM case) transform
For https://reviews.llvm.org/D50222

Patch by: hermord (Dmytro Shynkevych)!

llvm-svn: 341953
2018-09-11 15:34:26 +00:00
Craig Topper 844f035e1e [X86] In combineMOVMSK, look through int->fp bitcasts before callling SimplifyDemandedBits.
MOVMSKPS and MOVMSKPD both take FP types, but likely the operations before it are on integer types with just a int->fp bitcast between them. If the bitcast isn't used by anything else and doesn't change the element width we can look through it to simplify the integer ops.

llvm-svn: 341915
2018-09-11 08:20:02 +00:00
Craig Topper 85210311ba [X86] Add test cases inspired by PR38840.
These are test cases inspired by sequences like below for extracting the same bit from every vector element and checking for all zeros/ones.

define i1 @and256_x8(<8 x i32>) {
    %a = trunc <8 x i32> %0 to <8 x i1>
    %b = bitcast <8 x i1> %a to i8
    %d = icmp eq i8 %b, -1
    ret i1 %d
}

This is what the above looks like after InstCombine.

define i1 @and256_x8_opt(<8 x i32>) {
  %2 = and <8 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %a = icmp ne <8 x i32> %2, zeroinitializer
  %b = bitcast <8 x i1> %a to i8
  %d = icmp eq i8 %b, -1
  ret i1 %d
}

llvm-svn: 341908
2018-09-11 07:23:29 +00:00
Craig Topper 07889079fa [X89] Explicitly enable aes in aes-schedule.ll to fix failures after r341861.
llvm-svn: 341868
2018-09-10 21:49:01 +00:00
Sanjay Patel 7feb3ed78c [x86] test codegen for unsigned saturated add; NFC
All of the ISA holes are going to make this difficult,
but we can't canonicalize the IR and try to solve PR14613
until we have backend support to get this right.

https://bugs.llvm.org/show_bug.cgi?id=14613

https://rise4fun.com/Alive/Guv
https://rise4fun.com/Alive/AADG

llvm-svn: 341845
2018-09-10 17:40:15 +00:00
Craig Topper 3823516103 [X86] Custom type legalize (v2i32 (fp_to_uint v2f64))) without avx512vl by widening to v4i32 and v4f64 instead of v8i32 and v8f64. Make it aware of x86-experimental-vector-widening-legalization
We have isel patterns for v4i32/v4f64 that artificially widen to v8i32/v8f64 so just use that.

If x86-experimental-vector-widening-legalization is enabled, we don't need any custom legalization and can just return. I've modified the test RUN lines to cover this case.

llvm-svn: 341765
2018-09-09 20:36:36 +00:00
Sanjay Patel 6ebf218e4c [SelectionDAG] enhance vector demanded elements to look at a vector select condition operand
This is the DAG equivalent of D51433.
If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition.

The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit 
vectors because we don't need those anyway.
I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed 
to be running there?

Differential Revision: https://reviews.llvm.org/D51696

llvm-svn: 341762
2018-09-09 14:13:22 +00:00
Craig Topper 7af5e333e7 [X86] Create paddus/psubus from narrower vectors with i8/i16 element types.
Summary:
This patch allows vectors with a power of 2 number of elements and i8/i16 element type to select paddus/psubus instructions. ReplaceNodeResults has been updated to custom widen these operations up to 128 bits like we already do for PAVG.

Another step towards fixing PR38691

Reviewers: RKSimon, spatel

Reviewed By: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51818

llvm-svn: 341753
2018-09-08 19:32:58 +00:00
Craig Topper a2c9694bc8 [X86] Mark the ADCX and ADOX instruction as commutable.
llvm-svn: 341752
2018-09-08 18:47:56 +00:00
Craig Topper 4677110348 [X86] Add test cases for commuting ADCX/ADOX instruction to avoid copies.
This is a MIR test so we can test ADOX which we have no isel patterns for. I also plan to remove ADCX isel patterns in the near future so this will help maintain coverage.

llvm-svn: 341751
2018-09-08 18:47:54 +00:00
Craig Topper c96305970d [X86] Add commuted isel pattern for the load form of ADCX instructions.
This prevents the legacy ADC instruction from being favored over ADCX when the load is in the operand 0.

llvm-svn: 341745
2018-09-08 06:31:43 +00:00
Craig Topper 22a6f51646 [X86] Add load folding test cases for the addcarryx intrinsic.
We are currently only able to fold a load in operand 1 to ADCX. A load in operand 0 will use the legacy ADC instruction.

Ultimately I want to remove isel support for ADCX, but first I'm going to fix the shortcomings I know of so I can write proper MIR tests to maintain coverage later.

llvm-svn: 341744
2018-09-08 06:31:41 +00:00
Craig Topper 761e88d1d4 [X86] Add stack folding MIR test for ADCX/ADOX.
We currently have no way to isel ADOX and I plan to remove isel patterns for ADCX. This test will ensure we still have stack folding support for these instructions if we need them in the future.

llvm-svn: 341743
2018-09-08 05:08:18 +00:00
Reid Kleckner f803b23879 [COFF] Implement llvm.global_ctors priorities for MSVC COFF targets
Summary:
MSVC and LLD sort sections ASCII-betically, so we need to use section
names that sort between .CRT$XCA (the start) and .CRT$XCU (the default
priority).

In the general case, use .CRT$XCT12345 as the section name, and let the
linker sort the zero-padded digits.

Users with low priorities typically want to initialize as early as
possible, so use .CRT$XCA00199 for prioties less than 200. This number
is arbitrary.

Implements PR38552.

Reviewers: majnemer, mstorsjo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D51820

llvm-svn: 341727
2018-09-07 23:07:55 +00:00
Craig Topper fa535c027e [X86] Add codegen tests for narrow PADDUS/PSUBUS patterns for PR38691.
llvm-svn: 341711
2018-09-07 21:28:46 +00:00
Craig Topper 5cbce81c91 [X86] Don't create ZERO_EXTEND_INREG/SIGN_EXTEND_INREG for v1iX vectors.
The generic type legalizer will scalarize vXi1 instructions getting rid of the vector entirely. Creating wider vector instructions is just going to prevent that.

llvm-svn: 341705
2018-09-07 20:56:03 +00:00
Craig Topper 39f48fdcbc [X86] Don't create X86ISD::AVG nodes from v1iX vectors.
The type legalizer will try to scalarize this and fail.

It looks like there's some other v1iX oddities out there too since we still generated some vector instructions.

llvm-svn: 341704
2018-09-07 20:56:01 +00:00
Craig Topper 4863313b35 [X86] Modify the the rdtscp intrinsic to return values instead of taking a pointer argument
Similar to what was recently done for addcarry/subborrow and has been done for rdrand/rdseed for a while. It's better to use two results and an explicit store in IR when the store isn't part of the semantics of the instruction. This allows store->load forwarding to happen in the middle end. Or the store to be removed if its never loaded.

Differential Revision: https://reviews.llvm.org/D51803

llvm-svn: 341698
2018-09-07 19:14:15 +00:00
Craig Topper 72964ae99e [X86] Change the addcarry and subborrow intrinsics to return 2 results and remove the pointer argument.
We should represent the store directly in IR instead. This gives the middle end a chance to remove it if it can see a load from the same address.

Differential Revision: https://reviews.llvm.org/D51769

llvm-svn: 341677
2018-09-07 16:58:39 +00:00
Craig Topper 51e11788a4 [X86] Use regular expressions to make test immune to register allocation changes.
llvm-svn: 341676
2018-09-07 16:58:36 +00:00
Craig Topper 313d09af51 [X86] Teach X86DAGToDAGISel::foldLoadStoreIntoMemOperand to handle loads in operand 1 of commutable operations.
Previously we only handled loads in operand 0, but nothing guarantees the load will be operand 0 for commutable operations.

Differential Revision: https://reviews.llvm.org/D51768

llvm-svn: 341675
2018-09-07 16:27:55 +00:00
Simon Pilgrim 04d0748417 [X86][SSE] Add additional fadd/fsub(x, bitcast_fneg(y)) tests with different integer bitwidths
llvm-svn: 341657
2018-09-07 13:27:07 +00:00
Simon Pilgrim 96d6b9c2e2 [DAGCombiner] foldBitcastedFPLogic - Add basic vector support
Add support for bitcasts from float type to an integer type of the same element bitwidth.

There maybe cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet.

llvm-svn: 341652
2018-09-07 12:13:45 +00:00
Simon Pilgrim a2aef22a72 [X86][SSE] Add fadd/fsub(x, bitcast_fneg(y)) tests
Show missing vector support

llvm-svn: 341650
2018-09-07 11:24:43 +00:00
Craig Topper 30e129f256 [X86] Add more test cases for missed opportunities for using RMW form of ADC.
llvm-svn: 341630
2018-09-07 02:39:56 +00:00
Craig Topper 2c9dede9cb [X86] Add RMW ADC patterns with load in operand 1.
ADC is commutable and the load could be in either operand, but we were only checking operand 0.

Ideally we'd mark X86adc_flag as commutable and tablegen would automatically do this, but the EFLAGS register mention is preventing it.

llvm-svn: 341606
2018-09-06 23:55:36 +00:00
Craig Topper 37d68e4599 [X86] Add a test case showing failure to use the RMW form of ADC when the load is in operand 1 going into isel.
The ADC instruction is commutable, but we only have RMW isel patterns with a load on the left hand side. Nothing will canonicalize loads to the LHS on these ops. So we need two patterns.

llvm-svn: 341605
2018-09-06 23:55:34 +00:00
Sanjay Patel 9e5c163154 [x86] add tests for pow --> cbrt; NFC
llvm-svn: 341575
2018-09-06 18:42:55 +00:00
Craig Topper 5a53760f65 [X86][Assembler] Allow %eip as a register in 32-bit mode for .cfi directives.
This basically reverts a change made in r336217, but improves the text of the error message for not allowing IP-relative addressing in 32-bit mode.

Fixes PR38826.

Patch by Iain Sandoe.

llvm-svn: 341512
2018-09-06 02:03:14 +00:00
Sanjay Patel dbf52837fe [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. 
Here, we only do the transform when the target tells us that sqrt can be lowered with inline code.

This is the basic case. Some potential enhancements are in the TODO comments:

1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF.
3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).

Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with 
refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty 
of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses 
its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should 
also be covered by existing tests).

Differential Revision: https://reviews.llvm.org/D51630 

llvm-svn: 341481
2018-09-05 17:01:56 +00:00
Chandler Carruth 664aa868f5 [x86/SLH] Add a real Clang flag and LLVM IR attribute for Speculative
Load Hardening.

Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.

Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.

While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.

This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.

Differential Revision: https://reviews.llvm.org/D51157

llvm-svn: 341363
2018-09-04 12:38:00 +00:00
Chandler Carruth 219888d1b2 [x86/SLH] Teach SLH to harden against the "ret2spec" attack by
implementing the proposed mitigation technique described in the original
design document.

The idea is to check after calls that the return address used to arrive
at that location is in fact the correct address. In the event of
a mis-predicted return which reaches a *valid* return but not the
*correct* return, this will detect the mismatch much like it would
a mispredicted conditional branch.

This is the last published attack vector that I am aware of in the
Spectre v1 space which is not mitigated by SLH+retpolines. However,
don't read *too* much into that: this is an area of ongoing research
where we expect more issues to be discovered in the future, and it also
makes no attempt to mitigate Spectre v4. Still, this is an important
completeness bar for SLH.

The change here is of course delightfully simple. It was predicated on
cutting support for post-instruction symbols into LLVM which was not at
all simple. Many thanks to Hal Finkel, Reid Kleckner, and Justin Bogner
who helped me figure out how to do a bunch of the complex changes
involved there.

Differential Revision: https://reviews.llvm.org/D50837

llvm-svn: 341358
2018-09-04 10:59:10 +00:00
Chandler Carruth 8d8489f513 [x86/SLH] Teach SLH to harden indirect branches and switches without
retpolines.

This implements the core design of tracing the intended target into the
target, checking it, and using that to update the predicate state. It
takes advantage of a few interesting aspects of SLH to make it a bit
easier to implement:
- We already split critical edges with conditional branches, so we can
assume those are gone.
- We already unfolded any memory access in the indirect branch
instruction itself.

I've left hard errors in place to catch if any of these somewhat subtle
invariants get violated.

There is some code that I can factor out and share with D50837 when it
lands, but I didn't want to couple landing the two patches, so I'll do
that in a follow-up cleanup commit if alright.

Factoring out the code to handle different scenarios of materializing an
address remains frustratingly hard. In a bunch of cases you want to fold
one of the cases into an immediate operand of some other instruction,
and you also have both symbols and basic blocks being used which require
different methods on the MI builder (and different operand kinds).
Still, I'll take a stab at sharing at least some of this code in
a follow-up if I can figure out how.

Differential Revision: https://reviews.llvm.org/D51083

llvm-svn: 341356
2018-09-04 10:44:21 +00:00
Sanjay Patel 0945959869 [AArch64][x86] add tests for pow(x, 0.25); NFC
Folds for this were proposed in D49306, but we
decided the transform is better suited for the backend.

llvm-svn: 341341
2018-09-03 22:11:47 +00:00
Carlos Alberto Enciso eaf2c1f449 Test commit.
Revert change done in r341297. NFC.

Differential Revision: https://reviews.llvm.org/D51583

llvm-svn: 341302
2018-09-03 09:41:43 +00:00
Carlos Alberto Enciso f03e049234 Test commit - adding a new line.
llvm-svn: 341297
2018-09-03 08:26:37 +00:00
Roman Lebedev d7a6244475 [DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle inverted pattern
Summary:
A follow-up for D49266 / rL337166 + D49497 / rL338044.

This is still the same pattern to check for the [lack of]
signed truncation, but in this case the constants and the predicate
are negated.

https://rise4fun.com/Alive/BDV
https://rise4fun.com/Alive/n7Z

Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51532

llvm-svn: 341287
2018-09-02 13:56:22 +00:00
Craig Topper caf6672779 [X86] Add intrinsics for KTEST instructions.
These intrinsics use the same implementation as PTEST intrinsics, but use vXi1 vectors.

New clang builtins will be accompanying them shortly.

llvm-svn: 341259
2018-08-31 21:31:53 +00:00
Craig Topper b7bb9f0078 [X86] Add support for turning vXi1 shuffles into KSHIFTL/KSHIFTR.
This patch recognizes shuffles that shift elements and fill with zeros. I've copied and modified the shift matching code we use for normal vector registers to do this. I'm not sure if there's a good way to share more of this code without making the existing function more complex than it already is.

This will be used to enable kshift intrinsics in clang.

Differential Revision: https://reviews.llvm.org/D51401

llvm-svn: 341227
2018-08-31 17:17:21 +00:00
Alexander Ivchenko 9d053074a1 [GlobalISel][X86] Add the support for G_FPTRUNC
Differential Revision: https://reviews.llvm.org/D49855

llvm-svn: 341202
2018-08-31 11:26:51 +00:00
Alexander Ivchenko 9b0b492653 [GlobalISel][X86_64] Support for G_FPTOSI
Differential Revision: https://reviews.llvm.org/D49183

llvm-svn: 341200
2018-08-31 11:16:58 +00:00
Alexander Ivchenko 58a5d6fde7 [GlobalIsel][X86] Support for llvm.trap intrinsic
Differential Revision: https://reviews.llvm.org/D49180

llvm-svn: 341199
2018-08-31 11:05:13 +00:00
Alexander Ivchenko a26a364e75 [GlobalIsel][X86] Support for G_FCMP
Differential Revision: https://reviews.llvm.org/D49172

llvm-svn: 341193
2018-08-31 09:38:27 +00:00
Roman Lebedev 75c2961b76 [NFC][X86][AArch64] A few more patterns for [lack of] signed truncation check pattern.[NFC][X86][AArch64] A few more patterns for [lack of] signed truncation check pattern.
llvm-svn: 341188
2018-08-31 08:52:03 +00:00