Commit Graph

15167 Commits

Author SHA1 Message Date
Igor Breger f5035d6ee5 [GlobalISel][X86] Support vector type G_EXTRACT selection.
Summary:
Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33957

llvm-svn: 306240
2017-06-25 11:42:17 +00:00
Dorit Nuzman e0e0f1ddb0 [AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2
The cost of an interleaved access was only implemented for AVX512. For other
X86 targets an overly conservative Base cost was returned, resulting in
avoiding vectorization where it is actually profitable to vectorize.
This patch starts to add costs for AVX2 for most prominent cases of
interleaved accesses (stride 3,4 chars, for now).

Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb
workloads; There is also a known issue of 15-30% degradations on some of these
workloads, associated with an interleaved access followed by type
promotion/widening; the resulting shuffle sequence is currently inefficient and
will be improved by a series of patches that extend the X86InterleavedAccess pass
(such as D34601 and more to follow).

Note 2: The costs in this patch do not reflect port pressure penalties which can
be very dominant in the case of interleaved accesses since most of the shuffle
operations are restricted to a single port. Further tuning, that may incorporate
these considerations, will be done on top of the upcoming improved shuffle
sequences (that is, along with the abovementioned work to extend
X86InterleavedAccess pass).


Differential Revision: https://reviews.llvm.org/D34023

llvm-svn: 306238
2017-06-25 08:26:25 +00:00
Rafael Espindola f351292141 Remove redundant argument.
llvm-svn: 306189
2017-06-24 00:26:57 +00:00
Rafael Espindola 801b42de31 ARM: move some logic from processFixupValue to applyFixup.
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.

While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.

llvm-svn: 306177
2017-06-23 22:52:36 +00:00
whitequark 00ede4dcc1 [X86] Fix SP adjustment in stack probes emitted on 32-bit Windows.
Commit r306010 adjusted the condition as follows:

-  if (Is64Bit) {
+  if (!STI.isTargetWin32()) {

The intent was to preserve the behavior on all Windows platforms
but extend the behavior on 64-bit Windows platforms to every
other one. (Before r306010, emitStackProbeCall only ever executed
when emitting code for Windows triples.)

Unfortunately,
  if (Is64Bit && STI.isOSWindows())
is not the same as
  if (!STI.isTargetWin32())
because of the way isTargetWin32() is defined:

  bool isTargetWin32() const {
    return !In64BitMode && (isTargetCygMing() ||
                            isTargetKnownWindowsMSVC());
  }

In practice this broke the JIT tests on 32-bit Windows, which did not
satisfy the new condition:

    LLVM :: ExecutionEngine/MCJIT/2003-01-15-AlignmentTest.ll
    LLVM :: ExecutionEngine/MCJIT/2003-08-15-AllocaAssertion.ll
    LLVM :: ExecutionEngine/MCJIT/2003-08-23-RegisterAllocatePhysReg.ll
    LLVM :: ExecutionEngine/MCJIT/test-loadstore.ll
    LLVM :: ExecutionEngine/OrcMCJIT/2003-01-15-AlignmentTest.ll
    LLVM :: ExecutionEngine/OrcMCJIT/2003-08-15-AllocaAssertion.ll
    LLVM :: ExecutionEngine/OrcMCJIT/2003-08-23-RegisterAllocatePhysReg.ll
    LLVM :: ExecutionEngine/OrcMCJIT/test-loadstore.ll

because %esp was not updated correctly. The failures are only visible
on a MSVC 2017 Debug build, for which we do not have bots.

llvm-svn: 306142
2017-06-23 18:58:10 +00:00
Sanjay Patel 3de6bad65f [x86] fix value types for SBB transform (PR33560)
I'm not sure yet why this wouldn't fail in the simple case,
but clearly I used the wrong value type with:
https://reviews.llvm.org/rL306040

...and the bug manifests with:
https://bugs.llvm.org/show_bug.cgi?id=33560

llvm-svn: 306139
2017-06-23 18:42:15 +00:00
Simon Pilgrim 6e85e92b6c Remove trailing whitespace. NFCI.
llvm-svn: 306121
2017-06-23 16:35:32 +00:00
Rafael Espindola 58173b9720 COFF: Produce an error on invalid pcrel relocs.
X86_64 COFF only has support for 32 bit pcrel relocations. Produce an
error on all others.

Note that gnu as has extended the relocation values to support
this. It is not clear if we should support the gnu extension.

llvm-svn: 306082
2017-06-23 04:07:44 +00:00
Farhana Aleen 9bd593e0d7 Fixed a (product) build error that was due to an unused variable
Details: There was a use but it was in the assert which was not
         exercised during product build.

Reviewers: Andrew Kaylor

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32658

llvm-svn: 306073
2017-06-22 23:56:31 +00:00
Sanjay Patel 359ae44fb4 [x86] add/sub (X==0) --> sbb(cmp X, 1)
This is very similar to the transform in:
https://reviews.llvm.org/rL306040
...but in this case, we use cmp X, 1 to set the carry bit as needed.

Again, we can show that all of these are logically equivalent (although
InstCombine currently canonicalizes to a form not seen here), and if
we believe IACA, then this is the smallest/fastest code. Eg, with SNB:

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1    | 1.0       |     |           |           |     |     |    | cmp edi, 0x1
|   2    |           | 1.0 |           |           |     | 1.0 | CP | sbb eax, eax


The larger motivation is to clean up all select-of-constants combining/lowering 
because we're missing some common cases.

llvm-svn: 306072
2017-06-22 23:47:15 +00:00
Farhana Aleen 4b652a5335 Supported lowerInterleavedStore() in X86InterleavedAccess.
Reviewers: RKSimon, DavidKreitzer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32658

llvm-svn: 306068
2017-06-22 22:59:04 +00:00
Craig Topper 792fc92be2 [AVX-512] Remove and autoupgrade the masked integer compare intrinsics
Summary:
These intrinsics aren't used by clang and haven't been for a while.

There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today.

Reviewers: spatel, RKSimon, zvi, igorb

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34389

llvm-svn: 306047
2017-06-22 20:11:01 +00:00
Sanjay Patel 41a34e4111 [x86] add/sub (X==0) --> sbb(neg X)
Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480),
lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb'
codegen in 1 out of the 4 tests. This is a step towards smoothing that out.

First, show that all of these IR forms are equivalent:
http://rise4fun.com/Alive/mx

Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge
(later Intel and AMD chips are similar based on Agner's tables):

This is the "obvious" x86 codegen (what gcc appears to produce currently):

| Num Of |              Ports pressure in cycles               |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |    |
---------------------------------------------------------------------
|   1*   |           |     |           |           |     |     |    | xor eax, eax
|   1    | 1.0       |     |           |           |     |     | CP | test edi, edi
|   1    |           |     |           |           |     | 1.0 | CP | setnz al
|   1    |           | 1.0 |           |           |     |     | CP | neg eax


This is the adc version:
|   1*   |           |     |           |           |     |     |    | xor eax, eax
|   1    | 1.0       |     |           |           |     |     | CP | cmp edi, 0x1
|   2    |           | 1.0 |           |           |     | 1.0 | CP | adc eax, 0xffffffff


And this is sbb:
|   1    | 1.0       |     |           |           |     |     |    | neg edi
|   2    |           | 1.0 |           |           |     | 1.0 | CP | sbb eax, eax

If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be
clearly better than the alternatives going forward.

llvm-svn: 306040
2017-06-22 18:11:19 +00:00
Rafael Espindola 8a261c2565 Add a common error checking for some invalid expressions.
This refactors a bit of duplicated code and fixes an assertion failure
on ELF.

llvm-svn: 306035
2017-06-22 17:25:35 +00:00
whitequark cebe8241ca [X86] Add support for "probe-stack" attribute
This commit adds prologue code emission for stack probe function
calls.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D34387

llvm-svn: 306010
2017-06-22 15:42:53 +00:00
Igor Breger 1c29be7e4f [GlobalISel][X86] Support vector type G_INSERT legalization/selection.
Summary:
Support vector type G_INSERT legalization/selection.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33956

llvm-svn: 305989
2017-06-22 09:43:35 +00:00
Elena Demikhovsky 2dac0b4d58 AVX-512: Lowering Masked Gather intrinsic - fixed a bug
Masked gather for vector length 2 is lowered incorrectly for element type i32.
The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD.
The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal.

In this patch I'm fixing <2 x i32>bug and optimizing <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well.

Differential revision: https://reviews.llvm.org/D34343

llvm-svn: 305987
2017-06-22 06:47:41 +00:00
Rafael Espindola 88d9e37ec8 Use a MutableArrayRef. NFC.
llvm-svn: 305968
2017-06-21 23:06:53 +00:00
Davide Italiano 9b8e3d308f [Solaris] emit .init_array instead of .ctors on Solaris (Sparc/x86)
Patch by Fedor Sergeev.

Differential Revision:  https://reviews.llvm.org/D33868

llvm-svn: 305948
2017-06-21 20:36:32 +00:00
Sanjay Patel cec6a500a8 [x86] fix formatting; NFC
llvm-svn: 305914
2017-06-21 14:27:11 +00:00
Sanjay Patel 0656629b87 [x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)

llvm-svn: 305802
2017-06-20 15:58:30 +00:00
Simon Pilgrim 4822b5b649 [X86][SSE] Relax 0/-1 vector element insertion to work for any vector with >=16bit elements
Shuffle lowering/combining now does a good job for 256/512-bit vectors - we don't need to prevent this

llvm-svn: 305801
2017-06-20 15:19:02 +00:00
Simon Pilgrim b233c0a5d2 [X86][SSE] Dropped old INSERT_VECTOR_ELT lowering TODO
Target shuffle combining now supports the matching of INSERT_VECTOR_ELT/PINSRW/PINSRB for merging multiple insertions into shuffles/bitmasks.

llvm-svn: 305788
2017-06-20 10:33:34 +00:00
Igor Breger 0dee0f458e [GlobalISel][X86] fix compilation error ( -Werror=unused-function )
llvm-svn: 305786
2017-06-20 09:40:57 +00:00
Igor Breger 1dcd5e8dc8 [GlobalISel][X86] Get correct RegClass for given RegBank.
Summary:
In some cases RegClass depends on target feature. Hight (16-31) vector registers exist only if AVX512f available.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: t.p.northover, guyblank

Subscribers: guyblank, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33952

Conflicts:
	test/CodeGen/X86/GlobalISel/select-memop-scalar.mir

llvm-svn: 305784
2017-06-20 09:15:10 +00:00
Hans Wennborg ca69fc1cb7 Revert r304824 "Fix PR23384 (part 3 of 3)"
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.

> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>

llvm-svn: 305720
2017-06-19 17:57:15 +00:00
Igor Breger bd2dedaa38 [GlobalISel][X86] Fold FI/G_GEP into LDR/STR instruction addressing mode.
Summary: Implement some of the simplest addressing modes.It should help to test ABI.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33888

llvm-svn: 305691
2017-06-19 13:12:57 +00:00
Florian Hahn 5f746c8e27 Recommit rL305677: [CodeGen] Add generic MacroFusion pass
Use llvm::make_unique to avoid ambiguity with MSVC.

This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.

Differential Revision: https://reviews.llvm.org/D34144

llvm-svn: 305690
2017-06-19 12:53:31 +00:00
Florian Hahn e16d3106f3 Revert r305677 [CodeGen] Add generic MacroFusion pass.
This causes Windows buildbot failures do an ambiguous call.

llvm-svn: 305681
2017-06-19 11:26:15 +00:00
Florian Hahn ee1b096f8a [CodeGen] Add generic MacroFusion pass.
Summary:
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.

Reviewers: craig.topper, evandro, t.p.northover, atrick, MatzeB

Reviewed By: MatzeB

Subscribers: atrick, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34144

llvm-svn: 305677
2017-06-19 10:51:38 +00:00
Simon Pilgrim 07cfc80186 Remove trailing whitespace. NFCI.
llvm-svn: 305476
2017-06-15 16:20:27 +00:00
Simon Pilgrim 4d432b2c6b [X86][AVX2] Fix issue in lowerV8I16GeneralSingleInputVectorShuffle that was assuming v8i16 vectors
We can use this with v16i16/v32i16 as well.

Found during fuzz testing.

llvm-svn: 305472
2017-06-15 14:52:30 +00:00
Simon Pilgrim b98cb3808c Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
This is causing windows buildbot failures

llvm-svn: 305470
2017-06-15 14:39:34 +00:00
Ayman Musa 56912cda71 [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Differential Revision: https://reviews.llvm.org/D33188

llvm-svn: 305465
2017-06-15 13:02:37 +00:00
Sanjay Patel 83cb007940 [x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively()
This is a follow-up to https://reviews.llvm.org/D34174 / https://reviews.llvm.org/rL305398.

We mentioned replacing the multiplies with shifts, but the real win seems to be in
bypassing the extra ops in the common case when the RootRatio and OpRatio are one.

This gives us another 1-2% overall win for the test in PR32037:
https://bugs.llvm.org/show_bug.cgi?id=32037

llvm-svn: 305414
2017-06-14 20:37:11 +00:00
Sanjay Patel ce0b99563a [x86] replace div/rem with shift/mask for better shuffle combining perf
We know that shuffle masks are power-of-2 sizes, but there's no way (?) for LLVM to know that, 
so hack combineX86ShufflesRecursively() to be much faster by replacing div/rem with shift/mask.

This makes the motivating compile-time bug in PR32037 ( https://bugs.llvm.org/show_bug.cgi?id=32037 ) 
about 9% faster overall.

Differential Revision: https://reviews.llvm.org/D34174

llvm-svn: 305398
2017-06-14 17:00:57 +00:00
Simon Pilgrim 9ff06a0c7e Strip UTF8 BOM that got added in rL305091
Seems my recent move to VS2017 has resulted in a few text editor issues.....

llvm-svn: 305285
2017-06-13 10:17:57 +00:00
Simon Pilgrim 2b3b717768 [X86][SSE] Refactor getTargetConstantBitsFromNode to avoid large APInts (PR32037)
Much of PR32037's compile time regression is due to getTargetConstantBitsFromNode always creating large (>64bit) APInts during the bitcasting from the source data to the destination bitwidth.

This commit avoids this bitcast stage if the data is already the correct bitwidth.

llvm-svn: 305284
2017-06-13 10:13:48 +00:00
Craig Topper 8b8767662c [AVX-512] Mark masked VPCMP instructions as commutable.
llvm-svn: 305276
2017-06-13 07:13:50 +00:00
Craig Topper e1d8103d8f [AVX-512] Mark masked version of vpcmpeq as being commutable.
llvm-svn: 305275
2017-06-13 07:13:47 +00:00
Craig Topper 42d0339257 [X86] Add masked integer compare instructions to load folding tables.
llvm-svn: 305274
2017-06-13 07:13:44 +00:00
Sanjay Patel d4765a38b4 [DAG] add helper to bind memop chains; NFCI
This step is just intended to reduce code duplication rather than change any functionality.

A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper.

Differential Revision: https://reviews.llvm.org/D33649

llvm-svn: 305192
2017-06-12 14:41:48 +00:00
Simon Pilgrim b079c8b35b [X86][SSE] Change memop fragment to inherit from vec128load with local alignment controls
First possible step towards merging SSE/AVX memory folding pattern fragments.

Also allows us to remove the duplicate non-temporal load logic.

Differential Revision: https://reviews.llvm.org/D33902

llvm-svn: 305184
2017-06-12 10:01:27 +00:00
Craig Topper 69fead95c7 [AVX-512] Add VPCONFLICT and VPLZCNT to load folding tables.
llvm-svn: 305180
2017-06-12 04:57:31 +00:00
Sanjay Patel dcbfbb11d9 [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load
I was looking closer at the x86 test diffs in D33866, and the first change seems like it 
shouldn't happen in the first place. So this patch will resolve that.

Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for 
any given CPU model, so we should be able to interchange those without affecting perf. 
But as we can see in some of the diffs here, using vperm2f128 allows load folding, so 
we should take that opportunity to reduce code size and register pressure.

A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 
was introduced with AVX1, we should be selecting it in all of the same situations that we 
would with AVX2. If there's some reason that an AVX1 CPU would not want to use this 
instruction, that should be fixed up in a later pass.

Differential Revision: https://reviews.llvm.org/D33938

llvm-svn: 305171
2017-06-11 21:18:58 +00:00
Simon Pilgrim 3d37b1a277 [X86][SSE] Add support for PACKSS nodes to faux shuffle extraction
If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle

llvm-svn: 305091
2017-06-09 17:29:52 +00:00
Andrew V. Tischenko 8cb1d0931f Add scheduler classes to integer/float horizontal operations.
This patch will close PR32801.
Differential Revision: https://reviews.llvm.org/D33203

llvm-svn: 304986
2017-06-08 16:44:13 +00:00
Andrew V. Tischenko e0531025f8 This patch closes PR28513: an optimization of multiplication by different constants.
The initial patch was rejected: I fixed the issue and re-apply it.

llvm-svn: 304972
2017-06-08 10:20:13 +00:00
Sanjay Patel 6e8e7cc70e [x86] avoid flipping sign bits for vector icmp by using known bits
If we know that both operands of an unsigned integer vector comparison are non-negative, 
then it's safe to directly use a signed-compare-greater-than instruction (the only non-equality
integer vector compare predicate provided by SSE/AVX).

We're intentionally not changing the condition code to signed in order to preserve the
existing transforms that use min/max/psubus below here.

This should solve PR33276:
https://bugs.llvm.org/show_bug.cgi?id=33276

Differential Revision: https://reviews.llvm.org/D33862

llvm-svn: 304909
2017-06-07 13:46:34 +00:00
Simon Pilgrim 58f5be2771 [X86][SSE] Fix an issue with PEXTRW/PEXTRB indices during shuffle combining
We were checking that the index was in range of the destination vector type, not the (larger) source vector type

llvm-svn: 304894
2017-06-07 10:30:35 +00:00
Zachary Turner 264b5d9e88 Move Object format code to lib/BinaryFormat.
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.

Differential Revision: https://reviews.llvm.org/D33843

llvm-svn: 304864
2017-06-07 03:48:56 +00:00
Evgeny Stupachenko 3b88291581 Fix PR23384 (part 3 of 3)
Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304824
2017-06-06 20:04:16 +00:00
Anna Thomas b2a212c070 [Atomics][LoopIdiom] Recognize unordered atomic memcpy
Summary:
Expanding the loop idiom test for memcpy to also recognize
unordered atomic memcpy. The only difference for recognizing
an unordered atomic memcpy and instead of a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.

Background:  http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

Patch by Daniel Neilson!

Reviewers: reames, anna, skatkov

Reviewed By: reames, anna

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33243

llvm-svn: 304806
2017-06-06 16:45:25 +00:00
Simon Pilgrim f7113fd270 [X86][AVX1] Split 256-bit vector non-temporal FastISel loads to keep it non-temporal (PR32744)
Extension to D33728

llvm-svn: 304798
2017-06-06 14:18:39 +00:00
Chandler Carruth 6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Chandler Carruth 41ed4034dd [x86] Revert the X86FoldTablesEmitter due to more miscompiles.
In testing, we've found yet another miscompile caused by the new tables.
And this one is even less clear how to fix (we could teach it to fold
a 16-bit load instead of the 32-bit load it wants, or block folding
entirely).

Also, the approach to excluding instructions seems increasingly to not
scale well.

I have left a more detailed analysis on the review log for the original
patch (https://reviews.llvm.org/D32684) along with suggested path
forward. I will land an additional test case that I wrote which covers
the code that was miscompiling (folding into the output of `pextrw`) in
a subsequent commit to keep this a pure revert.

For each commit reverted here, I've restricted the revert to the
non-test code touching the x86 fold table emission until the last commit
where I did revert the test updates. This means the *new* test cases
added for `insertps` and `xchg` remain untouched (and continue to pass).

Reverted commits:
r304540: [X86] Don't fold into memory operands into insertps in the ...
r304347: [TableGen] Adapt more places to getValueAsString now ...
r304163: [X86] Don't fold away the memory operand of an xchg.
r304123: Don't capture a temporary std::string in a StringRef.
r304122: Resubmit "[X86] Adding new LLVM TableGen backend that ..."

Original commit was in r304088, and after a string of fixes was reverted
previously in r304121 to fix build bots, and then re-landed in r304122.

llvm-svn: 304762
2017-06-06 02:15:31 +00:00
Simon Pilgrim 807b708d13 [X86][SSE41] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Missed SSE41 non-temporal load case in previous commit

Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304722
2017-06-05 16:45:32 +00:00
Simon Pilgrim b2ef948628 [X86][AVX1] Split 256-bit vector non-temporal loads to keep it non-temporal (PR32744)
Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304718
2017-06-05 16:02:01 +00:00
Simon Pilgrim a25bf0b6b9 [X86][SSE] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304717
2017-06-05 15:43:03 +00:00
Simon Pilgrim 46dd55f1e1 [X86][SSE] Change BUILD_VECTOR interleaving ordering to improve coalescing/combine opportunities
We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type:

e.g. for v4f32:

Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0>
      : unpcklps 1, 3 ==> Y: <?, ?, 3, 1>
Step 2: unpcklps X, Y ==>    <3, 2, 1, 0>

The issue is because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc.

Instead, this patch unpacks progressively larger sequential vector elements together:

e.g. for v4f32:

Step 1: unpcklps 0, 2 ==> X: <?, ?, 1, 0>
      : unpcklps 1, 3 ==> Y: <?, ?, 3, 2>
Step 2: unpcklpd X, Y ==>    <3, 2, 1, 0>

This does mean that we are creating UNPCKL shuffle of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree.

Differential Revision: https://reviews.llvm.org/D33864

llvm-svn: 304688
2017-06-04 20:12:04 +00:00
Simon Pilgrim f93debb40c [X86][SSE] Add SCALAR_TO_VECTOR(PEXTRW/PEXTRB) support to faux shuffle combining
Generalized existing SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT) code to support AssertZext + PEXTRW/PEXTRB cases as well. 

llvm-svn: 304659
2017-06-03 11:12:57 +00:00
Sanjay Patel e737cf8500 [x86] simplify code for vector icmp pred transforms; NFCI
Organizing by transform is smaller and easier to read than a squashed switch with fall-throughs.

llvm-svn: 304611
2017-06-02 23:21:53 +00:00
Ahmed Bougacha 018a68f9e4 [X86] Correctly broadcast NaN-like integers as float on AVX.
Since r288804, we try to lower build_vectors on AVX using broadcasts of
float/double.  However, when we broadcast integer values that happen to
have a NaN float bitpattern, we lose the NaN payload, thereby changing
the integer value being broadcast.

This is caused by ConstantFP::get, to which we pass the splat i32 as
a float (by bitcasting it using bitsToFloat).  ConstantFP::get takes
a double parameter, so we end up lossily converting a single-precision
NaN to double-precision.

Instead, avoid any kinds of conversions by directly building an APFloat
from the splatted APInt.

Note that this also fixes another piece of code (broadcast of
subvectors), that currently isn't susceptible to the same problem.

Also note that we could really just use APInt and ConstantInt
throughout: the constant pool type doesn't matter much.  Still, for
consistency, use the appropriate type.

llvm-svn: 304590
2017-06-02 20:02:59 +00:00
Sanjay Patel 469014ada4 [x86] fix formatting; NFCI
llvm-svn: 304576
2017-06-02 18:14:31 +00:00
David Blaikie b6b42e018a Tidy up a bit of r304516, use SmallVector::assign rather than for loop
This might give a few better opportunities to optimize these to memcpy
rather than loops - also a few minor cleanups (StringRef-izing,
templating (to avoid std::function indirection), etc).

The SmallVector::assign(iter, iter) could be improved with the use of
SFINAE, but the (iter, iter) ctor and append(iter, iter) need it to and
don't have it - so, workaround it for now rather than bothering with the
added complexity.

(also, as noted in the added FIXME, these assign ops could potentially
be optimized better at least for non-trivially-copyable types)

llvm-svn: 304566
2017-06-02 17:24:26 +00:00
Amaury Sechet 2adb7bdbca Remove ADDC, ADDE, SUBC, SUBE and SETCCE support from the X86 backend, use the CARRY ops instead.
Summary:
As per title. This cleanup some technical debt.

Depends on D33374

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33390

llvm-svn: 304435
2017-06-01 16:33:08 +00:00
Zvi Rackover 7693733e80 [X86] Match bitcast of vxi1 to pmovmsk
Summary:
Add an early combine to match patterns such as:
  (i16 bitcast (v16i1 x))
  ->
  (i16 movmsk (v16i8 sext (v16i1 x)))

This combine needs to happen early enough before
type-legalization scalarizes the result of the setcc.

Reviewers: igorb, craig.topper, RKSimon

Subscribers: delena, llvm-commits

Differential Revision: https://reviews.llvm.org/D33311

llvm-svn: 304406
2017-06-01 11:27:57 +00:00
Amaury Sechet 251ea8a4f8 Do not legalize large setcc with setcce, introduce setcccarry and do it with usubo/setcccarry.
Summary:
This is a continuation of the work started in D29872 . Passing the carry down as a value rather than as a glue allows for further optimizations. Introducing setcccarry makes the use of addc/subc unecessary and we can start the removal process.

This patch only introduce the optimization strictly required to get the same level of optimization as was available before nothing more.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33374

llvm-svn: 304404
2017-06-01 11:14:17 +00:00
Amaury Sechet 6506a90a70 Remove ISD::SETCC match from combineX86ADD. It's done improperly and doesn't work.
llvm-svn: 304403
2017-06-01 11:13:10 +00:00
Dehao Chen 6b737ddce7 Add LiveRangeShrink pass to shrink live range within BB.
Summary: LiveRangeShrink pass moves instruction right after the definition with the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB.

Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb

Reviewed By: MatzeB, andreadb

Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32563

llvm-svn: 304371
2017-05-31 23:25:25 +00:00
Galina Kistanova c752c4bf56 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304355
2017-05-31 21:50:45 +00:00
Matthias Braun ac4beccaca X86FloatingPoint: Fix livein lists
After transforming FP to ST registers:
- Do not add the ST register to the livein lists, they are reserved so
  we do not need to track their liveness.
- Remove the FP registers from the livein lists, they don't have defs or
  uses anymore and so are not live.
- (The setKillFlags() call is moved to an earlier place as it relies on
   the FP registers still being present in the livein list.)

llvm-svn: 304342
2017-05-31 20:30:22 +00:00
Matthias Braun 43692a2245 X86FloatingPoint: Add some static assert, cleanup; NFC
llvm-svn: 304341
2017-05-31 20:30:17 +00:00
Galina Kistanova b2c0116e71 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304332
2017-05-31 19:41:33 +00:00
Matthias Braun d6a36ae282 TargetMachine: Indicate whether machine verifier passes.
This adds a callback to the LLVMTargetMachine that lets target indicate
that they do not pass the machine verifier checks in all cases yet.

This is intended to be a temporary measure while the targets are fixed
allowing us to enable the machine verifier by default with
EXPENSIVE_CHECKS enabled!

Differential Revision: https://reviews.llvm.org/D33696

llvm-svn: 304320
2017-05-31 18:41:23 +00:00
Galina Kistanova 6ad77845e2 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304312
2017-05-31 17:10:03 +00:00
Matthias Braun bcd4c68233 X86FrameLowering: No need to mark FP as live-in everywhere
The frame pointer (when used as frame pointer) is a reserved register.
We do not track liveness of reserved registers and hence do not need to
add them to the basic block livein lists.

llvm-svn: 304274
2017-05-31 02:11:10 +00:00
Matthias Braun 5e394c3d6f TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.

While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.

llvm-svn: 304247
2017-05-30 21:36:41 +00:00
Vedant Kumar 87aefe9042 Revert "This patch closes PR28513: an optimization of multiplication by different constants. It's implemented on DAG combiner level."
This reverts commit r304209.

I think this change is responsible for a tablgen failure in stage2 builds:

  http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/2171/

I reproduced the failure locally (without ThinLTO), reverted the commit, rebuilt the stage1 clang, rebuilt the stage2 llvm-tblgen tool, and found that the crash disappears when the commit is reverted. Here is the stack trace:

FAILED: lib/Target/ARM/ARMGenRegisterBank.inc.tmp
cd /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM && /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes
/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp
0  llvm-tblgen              0x0000000106fc9568 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  llvm-tblgen              0x0000000106fc9be6 SignalHandler(int) + 422
2  libsystem_platform.dylib 0x00000001076a7fba _sigtramp + 26
3  libsystem_platform.dylib 0x00007fff58deb468 _sigtramp + 1366570184
4  llvm-tblgen              0x0000000106e89cc7 llvm::CodeGenRegBank::getCompositeSubRegIndex(llvm::CodeGenSubRegIndex*, llvm::CodeGenSubRegIndex*) + 615
5  llvm-tblgen              0x0000000106e88be6 llvm::CodeGenRegister::computeSubRegs(llvm::CodeGenRegBank&) + 2182
6  llvm-tblgen              0x0000000106e8e9f0 llvm::CodeGenRegBank::CodeGenRegBank(llvm::RecordKeeper&) + 2192
7  llvm-tblgen              0x0000000106f384a1 llvm::EmitRegisterBank(llvm::RecordKeeper&, llvm::raw_ostream&) + 65
8  llvm-tblgen              0x0000000106f72c64 (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&) + 1172
9  llvm-tblgen              0x0000000106fcb15f llvm::TableGenMain(char*, bool (*)(llvm::raw_ostream&, llvm::RecordKeeper&)) + 3599
10 llvm-tblgen              0x0000000106f727a6 main + 134
11 libdyld.dylib            0x000000010733c6a5 start + 1
Stack dump:
0.      Program arguments: /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp
/bin/sh: line 1: 41986 Segmentation fault: 11  /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz
-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp

llvm-svn: 304231
2017-05-30 19:25:22 +00:00
Craig Topper f6d4dc5b4a [SelectionDAG] Set ISD::FPOWI to Expand by default
Summary:
Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".

This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.

Reviewers: spatel, RKSimon, efriedma

Reviewed By: RKSimon

Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits

Differential Revision: https://reviews.llvm.org/D33530

llvm-svn: 304215
2017-05-30 15:27:55 +00:00
Andrew V. Tischenko 8b04826663 This patch closes PR28513: an optimization of multiplication by different constants.
It's implemented on DAG combiner level.

llvm-svn: 304209
2017-05-30 13:00:44 +00:00
Zachary Turner df1832cf86 Resubmit "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables."
This was reverted due to buildbot breakages and I was not familiar
with this code to investigate it.  But while trying to get a
useful backtrace for the author, it turns out the fix was very
obvious.  Resubmitting this patch as is, and will submit the
fix in a followup so that the fix is not hidden in the larger
CL.

llvm-svn: 304122
2017-05-29 02:19:37 +00:00
Zachary Turner 5b199be769 Revert "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables."
This reverts commit 28cb1003507f287726f43c771024a1dc102c45fe as well
as all subsequent followups.  llvm-tblgen currently segfaults with
this change, and it seems it has been broken on the bots all
day with no fixes in preparation.  See, for example:

http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/

llvm-svn: 304121
2017-05-29 01:48:53 +00:00
Ayman Musa d9f1fe43a8 [X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables.
X86 backend holds huge tables in order to map between the register and memory forms of each instruction.
This TableGen Backend automatically generated all these tables with the appropriate flags for each entry.

Differential Revision: https://reviews.llvm.org/D32684

llvm-svn: 304088
2017-05-28 12:55:36 +00:00
Ayman Musa 0b4f97d5e9 [X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen backend) to X86Inst class and set its value for the relevant instructions.
Some register-register instructions can be encoded in 2 different ways, this happens when 2 register operands can be folded (separately). 
For example if we look at the MOV8rr and MOV8rr_REV, both instructions perform exactly the same operation, but are encoded differently. Here is the relevant information about these instructions from Intel's 64-ia-32-architectures-software-developer-manual:

Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
8A /r   MOV r8,r/m8  RM     Valid        Valid            Move r/m8 to r8.
88 /r   MOV r/m8,r8  MR     Valid        Valid            Move r8 to r/m8.
Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions.

In the X86 backend, we define both of these instructions, usually one has a regular name (MOV8rr) while the other has "_REV" suffix (MOV8rr_REV), must be marked with isCodeGenOnly flag and is not emitted from CodeGen.

Automatically generating the memory folding tables relies on matching encodings of instructions, but in these cases where we want to map both memory forms of the mov 8-bit (MOV8rm & MOV8mr) to MOV8rr (not to MOV8rr_REV) we have to somehow point from the MOV8rr_REV to the "regular" appropriate instruction which in this case is MOV8rr.

This field enable this "pointing" mechanism - which is used in the TableGen backend for generating memory folding tables.

Differential Revision: https://reviews.llvm.org/D32683

llvm-svn: 304087
2017-05-28 12:39:37 +00:00
Matthias Braun ac4307c41e LivePhysRegs: Rework constructor + documentation; NFC
- Take reference instead of pointer to a TRI that cannot be nullptr.
- Improve documentation comments.

llvm-svn: 304038
2017-05-26 21:51:00 +00:00
Andrew V. Tischenko fdb264e263 The fix for PR22004: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands."
llvm-svn: 303985
2017-05-26 13:23:34 +00:00
Oren Ben Simhon 7bf27f03f2 [X86] Adding vpopcntd and vpopcntq instructions
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).

Differential Revision: https://reviews.llvm.org/D33169

llvm-svn: 303858
2017-05-25 13:45:23 +00:00
Simon Pilgrim 9f46d1d479 Strip trailing whitespace. NFCI.
llvm-svn: 303736
2017-05-24 11:02:27 +00:00
Igor Breger 617be6e475 [GlobalISel][X86] G_LOAD/G_STORE vec256/512 support
Summary: mark G_LOAD/G_STORE vec256/512 legal for AVX/AVX512. Implement instruction selection.

Reviewers: zvi, guyblank

Reviewed By: zvi

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33268

llvm-svn: 303617
2017-05-23 08:23:51 +00:00
Igor Breger 014fc566e7 [GlobalISel][X86] Fix G_TRUNC instruction selection.
Updated tests with -verify-machineinstrs flag.
It fixes 3 tests failed with machine verifier enabled and listed
in PR27481

llvm-svn: 303502
2017-05-21 11:13:56 +00:00
Guy Blank 548e22a1a7 [X86][AVX512] Make i1 illegal in the CodeGen
This patch defines the i1 type as illegal in the X86 backend for AVX512.
For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended.
This should produce better scalar code for i1 types since GPRs will be used instead of mask registers.

Differential Revision: https://reviews.llvm.org/D32273

llvm-svn: 303421
2017-05-19 12:35:15 +00:00
Daniel Sanders a1b2db7919 [globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates.
Summary:
This causes them to be re-computed more often than necessary but resolves
objections that were raised post-commit on r301750.

Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls

Reviewed By: qcolombet

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32861

llvm-svn: 303418
2017-05-19 11:08:33 +00:00
Hans Wennborg b00ffd8cb7 Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB."
This also reverts follow-ups r303292 and r303298.

It broke some Chromium tests under MSan, and apparently also internal
tests at Google.

llvm-svn: 303369
2017-05-18 18:50:05 +00:00
Francis Visoiu Mistrih 8b61764cbb [LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
  `getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
  `addRequired<TargetPassConfig>` and call
  `getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

llvm-svn: 303360
2017-05-18 17:21:13 +00:00
Igor Breger 842b5b36ba [GlobalISel][X86] G_ADD/G_SUB vector legalizer/selector support.
Summary: G_ADD/G_SUB vector legalizer/selector support.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33232

llvm-svn: 303345
2017-05-18 11:10:56 +00:00
Simon Pilgrim 6bba6068be [X86][AVX512] Add 512-bit vector ctpop costs + tests
llvm-svn: 303342
2017-05-18 10:42:34 +00:00
Lama Saba 2ea271b54a [X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+:
  " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
    dispatch via port 1:
  - LEA that has all three source operands: base, index, and offset
  - LEA that uses base and index registers where the base is EBP, RBP,or R13
  - LEA that uses RIP relative addressing mode
  - LEA that uses 16-bit addressing mode "
  This patch currently handles the first 2 cases only.
 
Differential Revision: https://reviews.llvm.org/D32277

llvm-svn: 303333
2017-05-18 08:11:50 +00:00
Davide Italiano 9ae69a75ec [Target/X86] Remove unneeded return. NFCI.
llvm-svn: 303323
2017-05-18 02:36:42 +00:00
Michael Liao ab12984634 Fix PR33028
- '-verify-mahcineinstrs' starts to complain allocatable live-in physical
  registers on non-entry or non-landing-pad basic blocks.
- Refactor the XBEGIN translation to define EAX on a dedicated fallback code
  path due to XABORT. Add a pseudo instruction to define EAX explicitly to
  avoid add physical register live-in.

Differential Revision: https://reviews.llvm.org/D33168

llvm-svn: 303306
2017-05-17 21:48:00 +00:00
Simon Pilgrim 23ef26728a [X86][AVX512] Add 512-bit vector ctlz costs + tests
llvm-svn: 303300
2017-05-17 21:02:18 +00:00
Simon Pilgrim d0365967c4 [X86][AVX512] Add 512-bit vector cttz costs + tests
llvm-svn: 303293
2017-05-17 20:22:54 +00:00
Dehao Chen 02828a93e8 Only enable LiveRangeShrink for x86.
Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for archtectures with great register pressure.

Reviewers: MatzeB, qcolombet

Reviewed By: qcolombet

Subscribers: jholewinski, jyknight, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33294

llvm-svn: 303292
2017-05-17 20:18:13 +00:00
Simon Pilgrim a9a92a1a6a [X86][AVX512] Add 512-bit vector bitreverse costs + tests
llvm-svn: 303283
2017-05-17 19:20:20 +00:00
Igor Breger 28f290fab8 [GlobalISel][X86] Support add i64 in IA32.
Summary: support G_UADDE instruction selection.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D33096

llvm-svn: 303255
2017-05-17 12:48:08 +00:00
Reid Kleckner 0ad69fc89f Revert "[X86] Replace slow LEA instructions in X86"
This reverts commit r303183, it broke various buildbots and introduced
sanitizer errors.

llvm-svn: 303199
2017-05-16 19:55:03 +00:00
Lama Saba 52e892577d [X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+:
  " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
    dispatch via port 1:
  - LEA that has all three source operands: base, index, and offset
  - LEA that uses base and index registers where the base is EBP, RBP,or R13
  - LEA that uses RIP relative addressing mode
  - LEA that uses 16-bit addressing mode "
  This patch currently handles the first 2 cases only.
 
Differential Revision: https://reviews.llvm.org/D32277

llvm-svn: 303183
2017-05-16 16:01:36 +00:00
Peter Collingbourne 6f0ecca3b5 IR: Give function GlobalValue::getRealLinkageName() a less misleading name: dropLLVMManglingEscape().
This function gives the wrong answer on some non-ELF platforms in some
cases. The function that does the right thing lives in Mangler.h. To try to
discourage people from using this function, give it a different name.

Differential Revision: https://reviews.llvm.org/D33162

llvm-svn: 303134
2017-05-16 00:39:01 +00:00
Zvi Rackover e6b278bc65 [X86] Utilize SelectionDAG::getSelect(). NFC.
Replace SelectionDAG::getNode(ISD::SELECT, ...)
and SelectionDAG::getNode(ISD::VSELECT, ...)
with SelectionDAG::getSelect(...)
Saves a few lines of code and in some cases saves the need to explicitly
check the type of the desired node.

llvm-svn: 303024
2017-05-14 21:30:38 +00:00
Simon Pilgrim d0ef9d8e93 [X86][AVX1] Account for cost of extract/insert of 256-bit shifts
llvm-svn: 303023
2017-05-14 20:52:11 +00:00
Simon Pilgrim f96b4ab92d [X86][AVX2] Fix costs for v4i64 ashr by splat
llvm-svn: 303022
2017-05-14 20:25:42 +00:00
Simon Pilgrim de4467b182 [X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat
llvm-svn: 303021
2017-05-14 20:02:34 +00:00
Craig Topper ceea1a76a1 [X86] Remove unused value from IntrinsicType enum. NFC
llvm-svn: 303018
2017-05-14 19:38:06 +00:00
Simon Pilgrim d3f0d03cc5 [X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences
llvm-svn: 303017
2017-05-14 18:52:15 +00:00
Simon Pilgrim 5bef9c627e [X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask.
Tweak cost model to match what lowering actually does.

llvm-svn: 303013
2017-05-14 17:59:46 +00:00
Simon Pilgrim aa8dffb69b [X86][SSE] Account for cost of extract/insert of v32i8 vector shifts
llvm-svn: 303012
2017-05-14 17:36:07 +00:00
Simon Pilgrim 4599eaa09a [X86][XOP] Account for cost of extract/insert of 256-bit vector shifts
llvm-svn: 303010
2017-05-14 13:38:53 +00:00
Simon Pilgrim f3ee9c6997 [X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts.
llvm-svn: 303009
2017-05-14 11:46:26 +00:00
Simon Pilgrim ef46c2762a [x86, SSE] AVX1 PR28129 (256-bit all-ones rematerialization)
Further perf tests on Jaguar indicate that:

vxorps  %ymm0, %ymm0, %ymm0
vcmpps  $15, %ymm0, %ymm0, %ymm0

is consistently faster (by about 9%) than:

vpcmpeqd  %xmm0, %xmm0, %xmm0
vinsertf128  $1, %xmm0, %ymm0, %ymm0

Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well.

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D32416

llvm-svn: 302989
2017-05-13 13:42:35 +00:00
Simon Pilgrim b146e61828 Strip trailing whitespace. NFCI.
llvm-svn: 302927
2017-05-12 17:42:36 +00:00
Craig Topper 8df66c602a [KnownBits] Add bit counting methods to KnownBits struct and use them where possible
This patch adds min/max population count, leading/trailing zero/one bit counting methods.

The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.

Differential Revision: https://reviews.llvm.org/D32931

llvm-svn: 302925
2017-05-12 17:20:30 +00:00
Simon Pilgrim 7f03231cc6 Use SDValue::getOperand() helper. NFCI.
llvm-svn: 302894
2017-05-12 13:08:45 +00:00
Reid Kleckner 43bbeb4c9f Issue diagnostics when returning FP values on x86_64 without SSE1/2
Avoid using report_fatal_error, because it will ask the user to file a
bug. If the user attempts to disable SSE on x86_64 and them use floating
point, that's a bug in their code, not a bug in the compiler.

This is just a start. There are other ways to crash the backend in this
configuration, but they should be updated to follow this pattern.

Differential Revision: https://reviews.llvm.org/D27522

llvm-svn: 302835
2017-05-11 22:43:02 +00:00
Igor Breger a44fc83d9f [GlobalISel][X86] Remove hand-written G_FADD/F_SUB selection.
Now it handle by TableGen.

llvm-svn: 302793
2017-05-11 12:15:03 +00:00
Chandler Carruth 97500a9918 [x86] Fix a failure to select with AVX-512 when the type legalizer
manages to form a VSELECT with a non-i1 element type condition. Those
are technically allowed in SDAG (at least, the generic type legalization
logic will form them and I wouldn't want to try to audit everything te
preclude forming them) so we need to be able to lower them.

This isn't too hard to implement. We mark VSELECT as custom so we get
a chance in C++, add a fast path for i1 conditions to get directly
handled by the patterns, and a fallback when we need to manually force
the condition to be an i1 that uses the vptestm instruction to turn
a non-mask into a mask.

This, unsurprisingly, generates awful code. But it at least doesn't
crash. This was actually impacting open source packages built with LLVM
for AVX-512 in the wild, so quickly landing a patch that at least stops
the immediate bleeding.

I think I've found where to fix the codegen quality issue, but less
confident of that change so separating it out from the thing that
doesn't change the result of any existing test case but causes mine to
not crash.

llvm-svn: 302785
2017-05-11 10:52:16 +00:00
Simon Pilgrim a4a13a0da0 Strip trailing whitespace. NFCI.
llvm-svn: 302784
2017-05-11 10:03:05 +00:00
Igor Breger c7b5977bb1 [GlobalISel][X86] G_ICMP support.
Summary: support G_ICMP for scalar types i8/i16/i64.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits, krytarowski

Differential Revision: https://reviews.llvm.org/D32995

llvm-svn: 302774
2017-05-11 07:17:40 +00:00
Igor Breger db75455990 [X86] Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp. NFC
Summary:
Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by GloabalIsel instruction selector.
This is a pre-commit for a patch I'm working on to support G_ICMP. NFC.

Reviewers: zvi, guyblank, delena

Reviewed By: guyblank, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33038

llvm-svn: 302767
2017-05-11 06:36:37 +00:00
Igor Breger fda31e64e0 [GlobalISel][X86] G_ZEXT i1 to i32/i64 support.
Summary: Support G_ZEXT i1 to i32/i64 instruction selection.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32965

llvm-svn: 302623
2017-05-10 06:52:58 +00:00
Serge Guelton e38003f839 Suppress all uses of LLVM_END_WITH_NULL. NFC.
Use variadic templates instead of relying on <cstdarg> + sentinel.
This enforces better type checking and makes code more readable.

Differential Revision: https://reviews.llvm.org/D32541

llvm-svn: 302571
2017-05-09 19:31:13 +00:00
Craig Topper f893d49f0c [X86] Add more patterns for BZHI isel
This patch adds more patterns that a reasonable person might write that can be compiled to BZHI.

This adds support for

(~0U >> (32 - b)) & a;

and

a << (32 - b) >> (32 - b);

This was inspired by the code in APInt::clearUnusedBits.

This can pass an index of 32 to the bzhi instruction which a quick test of Haswell hardware shows will not mask any bits. Though the description text in the Intel manual says the "index is saturated to OperandSize-1". The pseudocode in the same manual indicates no bits will be zeroed for this case.

I think this is still missing cases where the subtract portion is an 8-bit operation.

Differential Revision: https://reviews.llvm.org/D32616

llvm-svn: 302549
2017-05-09 16:32:11 +00:00
Guy Blank 0c42d8c35b VX512] Only look at lower bit in constant scalar masks
for scalar masked instructions only the lower bit of the mask is relevant. so for constant masks we should either do an unmasked operation or no operation, depending on the value of the lower bit.
This patch handles cases where the lower bit is '1'.

Differential Revision: https://reviews.llvm.org/D32805

llvm-svn: 302546
2017-05-09 16:16:48 +00:00
Serge Pavlov d526b13e61 Add extra operand to CALLSEQ_START to keep frame part set up previously
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to  CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.

This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.

The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
  affects all targets that use frame pseudo instructions and touched many
  files although the changes are uniform.
- Access to frame properties are implemented using special instructions
  rather than calls getOperand(N).getImm(). For X86 and ARM such
  replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
  instruction. These involve proper instruction initialization and
  methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
  frame parts initialized inside frame instruction pair and outside it.

The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.

Differential Revision: https://reviews.llvm.org/D32394

llvm-svn: 302527
2017-05-09 13:35:13 +00:00
Simon Pilgrim ca3a63a849 [X86][SSE42] Lower v2i64/v4i64 ASHR(X, 63) as PCMPGTQ(0, X)
Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD.

Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal.

Differential Revision: https://reviews.llvm.org/D32973

llvm-svn: 302525
2017-05-09 13:14:40 +00:00
Nikolai Bozhenov b7bf386e80 [X86] Clang option -fuse-init-array has no effect when generating for MCU target
Reviewers: Eugene.Zelenko, dschuff, craig.topper

Reviewed By: craig.topper

Subscribers: ahatanak, aaboud, DavidKreitzer, llvm-commits, cfe-commits

Differential Revision: https://reviews.llvm.org/D32543
Patch by AndreiGrischenko <andrei.l.grischenko@intel.com>

llvm-svn: 302513
2017-05-09 10:14:03 +00:00
Simon Pilgrim df39b03f29 [X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks.
Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask, this patch generalises this to use computeNumSignBits instead.

This is a first step in several things we can do to improve PBLENDV support:

 * Better matching of X86ISD::ANDNP patterns.
 * Handle floating point cases.
 * Better vector and bitcast support in computeNumSignBits.
 * Recognise that PBLENDV only uses the sign bit of the mask, we should be able strip away sign splats (ASHR, PCMPGT isNeg tests etc.).

Differential Revision: https://reviews.llvm.org/D32953

llvm-svn: 302424
2017-05-08 14:16:39 +00:00
Igor Breger 810c6257f1 [GlobalISel][X86] G_GEP selection support.
Summary: [GlobalISel][X86] G_GEP selection support.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32396

llvm-svn: 302412
2017-05-08 09:40:43 +00:00
Igor Breger 605b965ae5 [GlobalISel][X86] G_MUL legalizer/selector support.
Summary:
G_MUL legalizer/selector/regbank support.
Use only Tablegen-erated instruction selection.
This patch dealing with legal operations only.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: krytarowski, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D32698

llvm-svn: 302410
2017-05-08 09:03:37 +00:00
Dean Michael Berris 9bcaed867a [XRay] Custom event logging intrinsic
This patch introduces an LLVM intrinsic and a target opcode for custom event
logging in XRay. Initially, its use case will be to allow users of XRay to log
some type of string ("poor man's printf"). The target opcode compiles to a noop
sled large enough to enable calling through to a runtime-determined relative
function call. At runtime, when X-Ray is enabled, the sled is replaced by
compiler-rt with a trampoline to the logic for creating the custom log entries.

Future patches will implement the compiler-rt parts and clang-side support for
emitting the IR corresponding to this intrinsic.

Reviewers: timshen, dberris

Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D27503

llvm-svn: 302405
2017-05-08 05:45:21 +00:00
Simon Pilgrim 2d1c6d6e8d [X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics.
Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. 

llvm-svn: 302378
2017-05-07 20:58:55 +00:00
Simon Pilgrim 33f7397cc0 [X86][AVX512] Relax assertion and just exit combine for unsupported types (PR32907)
llvm-svn: 302361
2017-05-06 20:53:52 +00:00
Simon Pilgrim fea153f341 [X86][AVX512] Move v2i64/v4i64 VPABS lowering to tablegen
Extend NoVLX targets to use the 512-bit versions

llvm-svn: 302359
2017-05-06 19:11:59 +00:00
Simon Pilgrim f15a2f4d94 [X86] Reduce code for setting operations actions by merging into loops across multiple types/ops. NFCI.
llvm-svn: 302357
2017-05-06 18:17:56 +00:00
Simon Pilgrim 781cb10104 [X86][SSE] Break register dependencies on v16i8/v8i16 BUILD_VECTOR on SSE41
rL294581 broke unnecessary register dependencies on partial v16i8/v8i16 BUILD_VECTORs, but on SSE41 we (currently) use insertion for full BUILD_VECTORs as well. By allowing full insertion to occur on SSE41 targets we can break register dependencies here as well.

llvm-svn: 302355
2017-05-06 17:30:39 +00:00
Quentin Colombet 245994d968 [RegisterBankInfo] Uniquely allocate instruction mapping.
This is a step toward having statically allocated instruciton mapping.
We are going to tablegen them eventually, so let us reflect that in
the API.

NFC.

llvm-svn: 302316
2017-05-05 22:48:22 +00:00
Simon Pilgrim 430a335b7b [X86] Use SDValue::getConstantOperandVal helper. NFCI.
llvm-svn: 302286
2017-05-05 20:53:52 +00:00
Craig Topper f0aeee01c3 [KnownBits] Add wrapper methods for setting and clear all bits in the underlying APInts in KnownBits.
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.

Differential Revision: https://reviews.llvm.org/D32637

llvm-svn: 302262
2017-05-05 17:36:09 +00:00
Simon Pilgrim ac3c4b6da4 [X86][AVX512] Improve support and testing for CTLZ of 512-bit vectors without CDI
llvm-svn: 302233
2017-05-05 13:31:52 +00:00
Simon Pilgrim e9c5d7b70b [X86] Remove duplicate operation actions. NFCI.
llvm-svn: 302230
2017-05-05 12:34:55 +00:00
Simon Pilgrim c89aa0bee5 [X86][AVX512CDI] Move v2i64/v4i64 and v4i32/v8i32 VPLZCNT lowering to tablegen
Extend NoVLX targets to use the 512-bit versions

llvm-svn: 302229
2017-05-05 12:20:34 +00:00
Simon Pilgrim 73b88d5183 Remove unused variable
llvm-svn: 302226
2017-05-05 11:55:38 +00:00
Simon Pilgrim 1d47a15d89 [X86][AVX] Add LowerIntUnary helpers to split unary vector ops in half. NFCI.
Same as LowerIntArith helpers but for unary ops instead of binary.

llvm-svn: 302222
2017-05-05 10:59:24 +00:00
Andrew Ng 807ca72e66 [X86] Remove unused code from X86 optimize LEAs. NFC.
This patch removes unused code which is no longer required because of changes
to the DIExpression::prepend function.

llvm-svn: 302219
2017-05-05 09:21:35 +00:00
Daniel Jasper 07a1771959 Initialize new member X86Operand::FrontendSize in all codepaths.
This fixes MSAN-builds after r302179.

llvm-svn: 302214
2017-05-05 07:31:40 +00:00
Simon Pilgrim 11a1637a10 Strip trailing whitespace. NFCI.
llvm-svn: 302192
2017-05-04 20:55:16 +00:00
Reid Kleckner 6d2ea6ec80 [ms-inline-asm] Use the frontend size only for ambiguous instructions
This avoids problems on code like this:
  char buf[16];
  __asm {
    movups xmm0, [buf]
    mov [buf], eax
  }

The frontend size in this case (1) is wrong, and the register makes the
instruction matching unambiguous. There are also enough bytes available
that we shouldn't complain to the user that they are potentially using
an incorrectly sized instruction to access the variable.

Supersedes D32636 and D26586 and fixes PR28266

llvm-svn: 302179
2017-05-04 18:19:52 +00:00
Igor Breger 70583606b1 [X86][AVX-512] Allow EVEX encoded instruction selection when available for mul v8i32.
Differential Revision: https://reviews.llvm.org/D32679

llvm-svn: 302127
2017-05-04 07:34:58 +00:00
Oren Ben Simhon 51de0330eb [X86] Disabling PLT in Regcall CC Functions
According to psABI, PLT stub clobbers XMM8-XMM15.
In Regcall calling convention those registers are used for passing parameters. 
Thus we need to prevent lazy binding in Regcall.

Differential Revision: https://reviews.llvm.org/D32430

llvm-svn: 302124
2017-05-04 07:22:49 +00:00
Igor Breger c6eccdd5c0 [AVX] Fix vpcmpeqq predicate.
Summary:
Fix vpcmpeqq predicate. AVX512 version of vpcmpeqq is not equivalent to AVX one.
Split from https://reviews.llvm.org/D32679

Reviewers: craig.topper, zvi, aymanmus

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32786

llvm-svn: 302119
2017-05-04 06:24:52 +00:00
Reid Kleckner 5c0bdef5aa Mark functions as not having CFI once we finalize an x86 stack frame
We'll set it back to true in emitPrologue if it gets called. It doesn't
get called for naked functions.

Fixes PR32912

llvm-svn: 302092
2017-05-03 23:13:42 +00:00
Craig Topper d938fd1397 [KnownBits] Add zext, sext, and trunc methods to KnownBits
This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible.

Differential Revision: https://reviews.llvm.org/D32784

llvm-svn: 302088
2017-05-03 22:07:25 +00:00
Reid Kleckner a0b45f4bfc [IR] Abstract away ArgNo+1 attribute indexing as much as possible
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
  to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
  sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
  take attribute list indices.  Most of these were only used from
  BuildLibCalls, and doesNotAlias was only used to test or set if the
  return value is malloc-like.

I'm happy to split the patch, but I think they are probably easier to
review when taken together.

This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
  0: func attrs
  1: retattrs
  2...: arg attrs

Reviewers: chandlerc, pete, javed.absar

Subscribers: david2050, llvm-commits

Differential Revision: https://reviews.llvm.org/D32811

llvm-svn: 302060
2017-05-03 18:17:31 +00:00
Simon Pilgrim 03ccf91d85 [X86][LWP] Add stack folding mappings and tests for LWPINS/LWPVAL instructions
llvm-svn: 302049
2017-05-03 16:46:30 +00:00
Simon Pilgrim eada39d050 Silence a 'enum and non-enum used in conditional' warning.
llvm-svn: 302048
2017-05-03 16:43:57 +00:00
Simon Pilgrim 99b925bdf3 [X86][LWP] Add llvm support for LWP instructions (reapplied).
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).

Reapplied - this time without changing line endings of existing files.

Differential Revision: https://reviews.llvm.org/D32769

llvm-svn: 302041
2017-05-03 15:51:39 +00:00
Simon Pilgrim a271c54324 Revert rL302028 due to accidental line ending changes.
llvm-svn: 302038
2017-05-03 15:42:29 +00:00
Simon Pilgrim b2e0464fde [X86][LWP] Add llvm support for LWP instructions.
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).

Differential Revision: https://reviews.llvm.org/D32769

llvm-svn: 302028
2017-05-03 15:18:34 +00:00
Guy Blank d0baa524d0 [X86][AVX512] remove unnecessary case. NFC
VFPCLASS is for vector types and not scalar, so it cannot get here.

Differential Revision: https://reviews.llvm.org/D32694

llvm-svn: 302023
2017-05-03 13:34:05 +00:00
Oren Ben Simhon dbd4bba1ec [X86] Support of no_caller_saved_registers attribute
This patch implements the LLVM part for no_caller_saved_registers attribute as appears here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5ed3cc7b66af4758f7849ed6f65f4365be8223be.
In order to implement the attribute, we use the dynamic CSR mechanism to remove returned/passed arguments from the function regmask/CSR list.

Differential Revision: https://reviews.llvm.org/D31876

llvm-svn: 302020
2017-05-03 13:07:19 +00:00
Simon Pilgrim 05cfa83843 [X86] Refactored LowerINTRINSIC_W_CHAIN to use a switch statament. NFCI.
Pre-commit as requested in D32769.

llvm-svn: 302010
2017-05-03 10:40:18 +00:00
Simon Pilgrim 24d361f7bf [X86] Tidyup subvector insert/extract helpers. NFCI.
Use getConstantOperandVal where possible.

llvm-svn: 301912
2017-05-02 11:08:15 +00:00
Simon Pilgrim 7aca5218b0 Fix typo in comment. NFCI.
llvm-svn: 301911
2017-05-02 10:43:33 +00:00
Simon Pilgrim 8d196c88a6 [X86] Reduce code for setting operations actions by merging into loops across multiple types/ops. NFCI.
llvm-svn: 301879
2017-05-01 23:09:01 +00:00
Simon Pilgrim ab1a82764f [X86][AVX] Rename LowerVectorBroadcast to lowerBuildVectorAsBroadcast. NFCI.
Since the shuffle refactor, this is only used during BUILD_VECTOR lowering.

llvm-svn: 301834
2017-05-01 20:56:35 +00:00
Tim Northover 9bb6931c25 X86: initialize a few subtarget variables.
Otherwise an indeterminate value gets read, causing a bunch of UBSan failures.

llvm-svn: 301819
2017-05-01 17:50:15 +00:00
Amara Emerson d28f0cd448 Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.
This removes BinaryWithFlagsSDNode, and flags are now all passed by value.

Differential Revision: https://reviews.llvm.org/D32527

llvm-svn: 301803
2017-05-01 15:17:51 +00:00
Igor Breger 2452ef0ea2 [GlobalISel][X86] Prioritize Tablegen-erated instruction selection. NFC
Summary:
Prioritizes Tablegen-erated instruction selection over C++ instruction selection.
Remove G_ADD/G_SUB C++ selection - implemented by Tablegen.

Reviewers: dsanders, zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32677

llvm-svn: 301792
2017-05-01 07:06:08 +00:00
Igor Breger c08a783521 [GlobalISel][X86] G_SEXT/G_ZEXT support.
Reviewers: zvi, guyblank

Reviewed By: zvi

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32591

llvm-svn: 301790
2017-05-01 06:30:16 +00:00
Igor Breger a9edb88d46 [GlobalISel][X86] G_LOAD/G_STORE pointer selection support.
Summary: [GlobalISel][X86] G_LOAD/G_STORE pointer selection support.

Reviewers: zvi, guyblank

Reviewed By: zvi, guyblank

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D32217

llvm-svn: 301788
2017-05-01 06:08:32 +00:00
Amaury Sechet 8ac81f3924 Do not legalize large add with addc/adde, introduce addcarry and do it with uaddo/addcarry
Summary: As per discution on how to get better codegen an large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allow for more flexibility.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29872

llvm-svn: 301775
2017-04-30 19:24:09 +00:00
Craig Topper 778f57b4f1 [APInt] Replace calls to setBits with more specific calls to setBitsFrom and setLowBits where possible.
llvm-svn: 301768
2017-04-30 07:44:58 +00:00
Craig Topper d503644a4a [X86] Clear KnownBits instead of reconstructing it. NFC
llvm-svn: 301767
2017-04-30 07:44:55 +00:00
Daniel Sanders e9fdba39e0 [globalisel][tablegen] Compute available feature bits correctly.
Summary:
Predicate<> now has a field to indicate how often it must be recomputed.
Currently, there are two frequencies, per-module (RecomputePerFunction==0)
and per-function (RecomputePerFunction==1). Per-function predicates are
currently recomputed more frequently than necessary since the only predicate
in this category is cheap to test. Per-module predicates are now computed in
getSubtargetImpl() while per-function predicates are computed in selectImpl().

Tablegen now manages the PredicateBitset internally. It should only be
necessary to add the required includes.

Also fixed a problem revealed by the test case where
constrainSelectedInstRegOperands() would attempt to tie operands that
BuildMI had already tied.

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: rovka

Subscribers: kristof.beyls, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32491

llvm-svn: 301750
2017-04-29 17:30:09 +00:00
Matthias Braun 744c215e29 TargetLowering: Add finalizeLowering() function; NFC
Adds a new method finalizeLowering to TargetLoweringBase. This is in
preparation for an upcoming commit.

This function is meant for target specific adjustments to
MachineFrameInfo or register reservations.

Move the freezeRegisters() and the hasCopyImplyingStackAdjustment()
handling into the new function to prove the concept. As an added bonus
GlobalISel no longer missed the hasCopyImplyingStackAdjustment()
handling with this.

Differential Revision: https://reviews.llvm.org/D32621

llvm-svn: 301679
2017-04-28 20:25:05 +00:00
Reid Kleckner 6652a52e2b Use Argument::hasAttribute and AttributeList::ReturnIndex more
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.

NFC

llvm-svn: 301666
2017-04-28 18:37:16 +00:00
Adrian Prantl 109b236850 Clean up DIExpression::prependDIExpr a little. (NFC)
llvm-svn: 301662
2017-04-28 17:51:05 +00:00
Simon Pilgrim cce5097ce4 Move variable local to where ita used. NFCI.
llvm-svn: 301646
2017-04-28 14:42:15 +00:00
Andrew Ng 03e35b6bc0 [DebugInfo][X86] Improve X86 Optimize LEAs handling of debug values.
This is a follow up to the fix in r298360 to improve the handling of debug
values when redundant LEAs are removed. The fix in r298360 effectively
discarded the debug values. This patch now attempts to preserve the debug
values by using the DWARF DW_OP_stack_value operation via prependDIExpr.

Moved functions appendOffset and prependDIExpr from Local.cpp to
DebugInfoMetadata.cpp and made them available as static member functions of
DIExpression.

Differential Revision: https://reviews.llvm.org/D31604

llvm-svn: 301630
2017-04-28 08:44:30 +00:00
Clement Courbet 5f0ab9e51d [X86][NFC] Refactor RepMovsRepeats in preparation for D32481.
Differential Revision: https://reviews.llvm.org/D32583

llvm-svn: 301628
2017-04-28 07:56:31 +00:00
Craig Topper d0af7e8ab8 [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.

This is largely a mechanical transformation from KnownZero to Known.Zero.

Differential Revision: https://reviews.llvm.org/D32569

llvm-svn: 301620
2017-04-28 05:31:46 +00:00
Craig Topper 0e03e74e95 [SelectionDAG] Use various APInt methods to reduce temporary APInt creation
This patch uses various APInt methods to reduce the number of temporary APInts. These were all found while working through converting SelectionDAG's computeKnownBits to also use the KnownBits struct recently added to the ValueTracking version.

llvm-svn: 301618
2017-04-28 04:57:59 +00:00
Craig Topper 24e71017aa [APInt] Use inplace shift methods where possible. NFCI
llvm-svn: 301612
2017-04-28 03:36:24 +00:00
Igor Breger 360d0f23ee [GlobalISel][X86] handle not symmetric G_COPY
Summary: handle not symmetric G_COPY

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32420

llvm-svn: 301523
2017-04-27 08:02:03 +00:00
Clement Courbet 7b0ec39494 [CodeGen][NFC] Rename 'Src' to 'Val'.
'Src' looks like it was borrowed from memcpy, 'Val' makes more sense for
memset and is consistent with naming within the function.

Differential Revision: https://reviews.llvm.org/D32580

llvm-svn: 301521
2017-04-27 07:22:30 +00:00
Ayman Musa 11966ab00b [X86] Add missing mayLoad/mayStore attributes to some X86 instructions (Continue)
Complete the patch committed in rL300190.

Differential Revision: https://reviews.llvm.org/D32287

llvm-svn: 301393
2017-04-26 11:34:09 +00:00
Andrew V. Tischenko c3c6723ab5 PR31007 and PR27884 will be closed: a possibility to compile constants like 0bH is now supported in MS asm.
llvm-svn: 301390
2017-04-26 09:56:59 +00:00
Ayman Musa d9fb157845 [X86][SSE2] Fix asm string for movq (Move Quadword) instruction.
Replace "mov{d|q}" with "movq".

Differential Revision: https://reviews.llvm.org/D32220

llvm-svn: 301386
2017-04-26 07:08:44 +00:00
Simon Pilgrim d68785803b [SelectionDAG] Added getBuildVector(ArrayRef<SDUse>) helper.
llvm-svn: 301322
2017-04-25 16:41:28 +00:00
Krzysztof Parzyszek c8e8e2a046 Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937

llvm-svn: 301234
2017-04-24 19:51:12 +00:00
Krzysztof Parzyszek 98ab4c64c4 Revert r301231: Accidentally committed stale files
I forgot to commit local changes before commit.

llvm-svn: 301232
2017-04-24 19:48:51 +00:00