Commit Graph

15191 Commits

Author SHA1 Message Date
Igor Breger bd2dedaa38 [GlobalISel][X86] Fold FI/G_GEP into LDR/STR instruction addressing mode.
Summary: Implement some of the simplest addressing modes.It should help to test ABI.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33888

llvm-svn: 305691
2017-06-19 13:12:57 +00:00
Florian Hahn 5f746c8e27 Recommit rL305677: [CodeGen] Add generic MacroFusion pass
Use llvm::make_unique to avoid ambiguity with MSVC.

This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.

Differential Revision: https://reviews.llvm.org/D34144

llvm-svn: 305690
2017-06-19 12:53:31 +00:00
Florian Hahn e16d3106f3 Revert r305677 [CodeGen] Add generic MacroFusion pass.
This causes Windows buildbot failures do an ambiguous call.

llvm-svn: 305681
2017-06-19 11:26:15 +00:00
Florian Hahn ee1b096f8a [CodeGen] Add generic MacroFusion pass.
Summary:
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.

Reviewers: craig.topper, evandro, t.p.northover, atrick, MatzeB

Reviewed By: MatzeB

Subscribers: atrick, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34144

llvm-svn: 305677
2017-06-19 10:51:38 +00:00
Simon Pilgrim 07cfc80186 Remove trailing whitespace. NFCI.
llvm-svn: 305476
2017-06-15 16:20:27 +00:00
Simon Pilgrim 4d432b2c6b [X86][AVX2] Fix issue in lowerV8I16GeneralSingleInputVectorShuffle that was assuming v8i16 vectors
We can use this with v16i16/v32i16 as well.

Found during fuzz testing.

llvm-svn: 305472
2017-06-15 14:52:30 +00:00
Simon Pilgrim b98cb3808c Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
This is causing windows buildbot failures

llvm-svn: 305470
2017-06-15 14:39:34 +00:00
Ayman Musa 56912cda71 [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Differential Revision: https://reviews.llvm.org/D33188

llvm-svn: 305465
2017-06-15 13:02:37 +00:00
Sanjay Patel 83cb007940 [x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively()
This is a follow-up to https://reviews.llvm.org/D34174 / https://reviews.llvm.org/rL305398.

We mentioned replacing the multiplies with shifts, but the real win seems to be in
bypassing the extra ops in the common case when the RootRatio and OpRatio are one.

This gives us another 1-2% overall win for the test in PR32037:
https://bugs.llvm.org/show_bug.cgi?id=32037

llvm-svn: 305414
2017-06-14 20:37:11 +00:00
Sanjay Patel ce0b99563a [x86] replace div/rem with shift/mask for better shuffle combining perf
We know that shuffle masks are power-of-2 sizes, but there's no way (?) for LLVM to know that, 
so hack combineX86ShufflesRecursively() to be much faster by replacing div/rem with shift/mask.

This makes the motivating compile-time bug in PR32037 ( https://bugs.llvm.org/show_bug.cgi?id=32037 ) 
about 9% faster overall.

Differential Revision: https://reviews.llvm.org/D34174

llvm-svn: 305398
2017-06-14 17:00:57 +00:00
Simon Pilgrim 9ff06a0c7e Strip UTF8 BOM that got added in rL305091
Seems my recent move to VS2017 has resulted in a few text editor issues.....

llvm-svn: 305285
2017-06-13 10:17:57 +00:00
Simon Pilgrim 2b3b717768 [X86][SSE] Refactor getTargetConstantBitsFromNode to avoid large APInts (PR32037)
Much of PR32037's compile time regression is due to getTargetConstantBitsFromNode always creating large (>64bit) APInts during the bitcasting from the source data to the destination bitwidth.

This commit avoids this bitcast stage if the data is already the correct bitwidth.

llvm-svn: 305284
2017-06-13 10:13:48 +00:00
Craig Topper 8b8767662c [AVX-512] Mark masked VPCMP instructions as commutable.
llvm-svn: 305276
2017-06-13 07:13:50 +00:00
Craig Topper e1d8103d8f [AVX-512] Mark masked version of vpcmpeq as being commutable.
llvm-svn: 305275
2017-06-13 07:13:47 +00:00
Craig Topper 42d0339257 [X86] Add masked integer compare instructions to load folding tables.
llvm-svn: 305274
2017-06-13 07:13:44 +00:00
Sanjay Patel d4765a38b4 [DAG] add helper to bind memop chains; NFCI
This step is just intended to reduce code duplication rather than change any functionality.

A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper.

Differential Revision: https://reviews.llvm.org/D33649

llvm-svn: 305192
2017-06-12 14:41:48 +00:00
Simon Pilgrim b079c8b35b [X86][SSE] Change memop fragment to inherit from vec128load with local alignment controls
First possible step towards merging SSE/AVX memory folding pattern fragments.

Also allows us to remove the duplicate non-temporal load logic.

Differential Revision: https://reviews.llvm.org/D33902

llvm-svn: 305184
2017-06-12 10:01:27 +00:00
Craig Topper 69fead95c7 [AVX-512] Add VPCONFLICT and VPLZCNT to load folding tables.
llvm-svn: 305180
2017-06-12 04:57:31 +00:00
Sanjay Patel dcbfbb11d9 [x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load
I was looking closer at the x86 test diffs in D33866, and the first change seems like it 
shouldn't happen in the first place. So this patch will resolve that.

Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for 
any given CPU model, so we should be able to interchange those without affecting perf. 
But as we can see in some of the diffs here, using vperm2f128 allows load folding, so 
we should take that opportunity to reduce code size and register pressure.

A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 
was introduced with AVX1, we should be selecting it in all of the same situations that we 
would with AVX2. If there's some reason that an AVX1 CPU would not want to use this 
instruction, that should be fixed up in a later pass.

Differential Revision: https://reviews.llvm.org/D33938

llvm-svn: 305171
2017-06-11 21:18:58 +00:00
Simon Pilgrim 3d37b1a277 [X86][SSE] Add support for PACKSS nodes to faux shuffle extraction
If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle

llvm-svn: 305091
2017-06-09 17:29:52 +00:00
Andrew V. Tischenko 8cb1d0931f Add scheduler classes to integer/float horizontal operations.
This patch will close PR32801.
Differential Revision: https://reviews.llvm.org/D33203

llvm-svn: 304986
2017-06-08 16:44:13 +00:00
Andrew V. Tischenko e0531025f8 This patch closes PR28513: an optimization of multiplication by different constants.
The initial patch was rejected: I fixed the issue and re-apply it.

llvm-svn: 304972
2017-06-08 10:20:13 +00:00
Sanjay Patel 6e8e7cc70e [x86] avoid flipping sign bits for vector icmp by using known bits
If we know that both operands of an unsigned integer vector comparison are non-negative, 
then it's safe to directly use a signed-compare-greater-than instruction (the only non-equality
integer vector compare predicate provided by SSE/AVX).

We're intentionally not changing the condition code to signed in order to preserve the
existing transforms that use min/max/psubus below here.

This should solve PR33276:
https://bugs.llvm.org/show_bug.cgi?id=33276

Differential Revision: https://reviews.llvm.org/D33862

llvm-svn: 304909
2017-06-07 13:46:34 +00:00
Simon Pilgrim 58f5be2771 [X86][SSE] Fix an issue with PEXTRW/PEXTRB indices during shuffle combining
We were checking that the index was in range of the destination vector type, not the (larger) source vector type

llvm-svn: 304894
2017-06-07 10:30:35 +00:00
Zachary Turner 264b5d9e88 Move Object format code to lib/BinaryFormat.
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.

Differential Revision: https://reviews.llvm.org/D33843

llvm-svn: 304864
2017-06-07 03:48:56 +00:00
Evgeny Stupachenko 3b88291581 Fix PR23384 (part 3 of 3)
Summary:
The patch makes instruction count the highest priority for
 LSR solution for X86 (previously registers had highest priority).

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30562

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 304824
2017-06-06 20:04:16 +00:00
Anna Thomas b2a212c070 [Atomics][LoopIdiom] Recognize unordered atomic memcpy
Summary:
Expanding the loop idiom test for memcpy to also recognize
unordered atomic memcpy. The only difference for recognizing
an unordered atomic memcpy and instead of a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.

Background:  http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

Patch by Daniel Neilson!

Reviewers: reames, anna, skatkov

Reviewed By: reames, anna

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33243

llvm-svn: 304806
2017-06-06 16:45:25 +00:00
Simon Pilgrim f7113fd270 [X86][AVX1] Split 256-bit vector non-temporal FastISel loads to keep it non-temporal (PR32744)
Extension to D33728

llvm-svn: 304798
2017-06-06 14:18:39 +00:00
Chandler Carruth 6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Chandler Carruth 41ed4034dd [x86] Revert the X86FoldTablesEmitter due to more miscompiles.
In testing, we've found yet another miscompile caused by the new tables.
And this one is even less clear how to fix (we could teach it to fold
a 16-bit load instead of the 32-bit load it wants, or block folding
entirely).

Also, the approach to excluding instructions seems increasingly to not
scale well.

I have left a more detailed analysis on the review log for the original
patch (https://reviews.llvm.org/D32684) along with suggested path
forward. I will land an additional test case that I wrote which covers
the code that was miscompiling (folding into the output of `pextrw`) in
a subsequent commit to keep this a pure revert.

For each commit reverted here, I've restricted the revert to the
non-test code touching the x86 fold table emission until the last commit
where I did revert the test updates. This means the *new* test cases
added for `insertps` and `xchg` remain untouched (and continue to pass).

Reverted commits:
r304540: [X86] Don't fold into memory operands into insertps in the ...
r304347: [TableGen] Adapt more places to getValueAsString now ...
r304163: [X86] Don't fold away the memory operand of an xchg.
r304123: Don't capture a temporary std::string in a StringRef.
r304122: Resubmit "[X86] Adding new LLVM TableGen backend that ..."

Original commit was in r304088, and after a string of fixes was reverted
previously in r304121 to fix build bots, and then re-landed in r304122.

llvm-svn: 304762
2017-06-06 02:15:31 +00:00
Simon Pilgrim 807b708d13 [X86][SSE41] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Missed SSE41 non-temporal load case in previous commit

Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304722
2017-06-05 16:45:32 +00:00
Simon Pilgrim b2ef948628 [X86][AVX1] Split 256-bit vector non-temporal loads to keep it non-temporal (PR32744)
Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304718
2017-06-05 16:02:01 +00:00
Simon Pilgrim a25bf0b6b9 [X86][SSE] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)
Differential Revision: https://reviews.llvm.org/D33728

llvm-svn: 304717
2017-06-05 15:43:03 +00:00
Simon Pilgrim 46dd55f1e1 [X86][SSE] Change BUILD_VECTOR interleaving ordering to improve coalescing/combine opportunities
We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type:

e.g. for v4f32:

Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0>
      : unpcklps 1, 3 ==> Y: <?, ?, 3, 1>
Step 2: unpcklps X, Y ==>    <3, 2, 1, 0>

The issue is because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc.

Instead, this patch unpacks progressively larger sequential vector elements together:

e.g. for v4f32:

Step 1: unpcklps 0, 2 ==> X: <?, ?, 1, 0>
      : unpcklps 1, 3 ==> Y: <?, ?, 3, 2>
Step 2: unpcklpd X, Y ==>    <3, 2, 1, 0>

This does mean that we are creating UNPCKL shuffle of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree.

Differential Revision: https://reviews.llvm.org/D33864

llvm-svn: 304688
2017-06-04 20:12:04 +00:00
Simon Pilgrim f93debb40c [X86][SSE] Add SCALAR_TO_VECTOR(PEXTRW/PEXTRB) support to faux shuffle combining
Generalized existing SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT) code to support AssertZext + PEXTRW/PEXTRB cases as well. 

llvm-svn: 304659
2017-06-03 11:12:57 +00:00
Sanjay Patel e737cf8500 [x86] simplify code for vector icmp pred transforms; NFCI
Organizing by transform is smaller and easier to read than a squashed switch with fall-throughs.

llvm-svn: 304611
2017-06-02 23:21:53 +00:00
Ahmed Bougacha 018a68f9e4 [X86] Correctly broadcast NaN-like integers as float on AVX.
Since r288804, we try to lower build_vectors on AVX using broadcasts of
float/double.  However, when we broadcast integer values that happen to
have a NaN float bitpattern, we lose the NaN payload, thereby changing
the integer value being broadcast.

This is caused by ConstantFP::get, to which we pass the splat i32 as
a float (by bitcasting it using bitsToFloat).  ConstantFP::get takes
a double parameter, so we end up lossily converting a single-precision
NaN to double-precision.

Instead, avoid any kinds of conversions by directly building an APFloat
from the splatted APInt.

Note that this also fixes another piece of code (broadcast of
subvectors), that currently isn't susceptible to the same problem.

Also note that we could really just use APInt and ConstantInt
throughout: the constant pool type doesn't matter much.  Still, for
consistency, use the appropriate type.

llvm-svn: 304590
2017-06-02 20:02:59 +00:00
Sanjay Patel 469014ada4 [x86] fix formatting; NFCI
llvm-svn: 304576
2017-06-02 18:14:31 +00:00
David Blaikie b6b42e018a Tidy up a bit of r304516, use SmallVector::assign rather than for loop
This might give a few better opportunities to optimize these to memcpy
rather than loops - also a few minor cleanups (StringRef-izing,
templating (to avoid std::function indirection), etc).

The SmallVector::assign(iter, iter) could be improved with the use of
SFINAE, but the (iter, iter) ctor and append(iter, iter) need it to and
don't have it - so, workaround it for now rather than bothering with the
added complexity.

(also, as noted in the added FIXME, these assign ops could potentially
be optimized better at least for non-trivially-copyable types)

llvm-svn: 304566
2017-06-02 17:24:26 +00:00
Amaury Sechet 2adb7bdbca Remove ADDC, ADDE, SUBC, SUBE and SETCCE support from the X86 backend, use the CARRY ops instead.
Summary:
As per title. This cleanup some technical debt.

Depends on D33374

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33390

llvm-svn: 304435
2017-06-01 16:33:08 +00:00
Zvi Rackover 7693733e80 [X86] Match bitcast of vxi1 to pmovmsk
Summary:
Add an early combine to match patterns such as:
  (i16 bitcast (v16i1 x))
  ->
  (i16 movmsk (v16i8 sext (v16i1 x)))

This combine needs to happen early enough before
type-legalization scalarizes the result of the setcc.

Reviewers: igorb, craig.topper, RKSimon

Subscribers: delena, llvm-commits

Differential Revision: https://reviews.llvm.org/D33311

llvm-svn: 304406
2017-06-01 11:27:57 +00:00
Amaury Sechet 251ea8a4f8 Do not legalize large setcc with setcce, introduce setcccarry and do it with usubo/setcccarry.
Summary:
This is a continuation of the work started in D29872 . Passing the carry down as a value rather than as a glue allows for further optimizations. Introducing setcccarry makes the use of addc/subc unecessary and we can start the removal process.

This patch only introduce the optimization strictly required to get the same level of optimization as was available before nothing more.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33374

llvm-svn: 304404
2017-06-01 11:14:17 +00:00
Amaury Sechet 6506a90a70 Remove ISD::SETCC match from combineX86ADD. It's done improperly and doesn't work.
llvm-svn: 304403
2017-06-01 11:13:10 +00:00
Dehao Chen 6b737ddce7 Add LiveRangeShrink pass to shrink live range within BB.
Summary: LiveRangeShrink pass moves instruction right after the definition with the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB.

Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb

Reviewed By: MatzeB, andreadb

Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32563

llvm-svn: 304371
2017-05-31 23:25:25 +00:00
Galina Kistanova c752c4bf56 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304355
2017-05-31 21:50:45 +00:00
Matthias Braun ac4beccaca X86FloatingPoint: Fix livein lists
After transforming FP to ST registers:
- Do not add the ST register to the livein lists, they are reserved so
  we do not need to track their liveness.
- Remove the FP registers from the livein lists, they don't have defs or
  uses anymore and so are not live.
- (The setKillFlags() call is moved to an earlier place as it relies on
   the FP registers still being present in the livein list.)

llvm-svn: 304342
2017-05-31 20:30:22 +00:00
Matthias Braun 43692a2245 X86FloatingPoint: Add some static assert, cleanup; NFC
llvm-svn: 304341
2017-05-31 20:30:17 +00:00
Galina Kistanova b2c0116e71 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304332
2017-05-31 19:41:33 +00:00
Matthias Braun d6a36ae282 TargetMachine: Indicate whether machine verifier passes.
This adds a callback to the LLVMTargetMachine that lets target indicate
that they do not pass the machine verifier checks in all cases yet.

This is intended to be a temporary measure while the targets are fixed
allowing us to enable the machine verifier by default with
EXPENSIVE_CHECKS enabled!

Differential Revision: https://reviews.llvm.org/D33696

llvm-svn: 304320
2017-05-31 18:41:23 +00:00
Galina Kistanova 6ad77845e2 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
llvm-svn: 304312
2017-05-31 17:10:03 +00:00
Matthias Braun bcd4c68233 X86FrameLowering: No need to mark FP as live-in everywhere
The frame pointer (when used as frame pointer) is a reserved register.
We do not track liveness of reserved registers and hence do not need to
add them to the basic block livein lists.

llvm-svn: 304274
2017-05-31 02:11:10 +00:00
Matthias Braun 5e394c3d6f TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.

While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.

llvm-svn: 304247
2017-05-30 21:36:41 +00:00
Vedant Kumar 87aefe9042 Revert "This patch closes PR28513: an optimization of multiplication by different constants. It's implemented on DAG combiner level."
This reverts commit r304209.

I think this change is responsible for a tablgen failure in stage2 builds:

  http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/2171/

I reproduced the failure locally (without ThinLTO), reverted the commit, rebuilt the stage1 clang, rebuilt the stage2 llvm-tblgen tool, and found that the crash disappears when the commit is reverted. Here is the stack trace:

FAILED: lib/Target/ARM/ARMGenRegisterBank.inc.tmp
cd /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM && /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes
/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp
0  llvm-tblgen              0x0000000106fc9568 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  llvm-tblgen              0x0000000106fc9be6 SignalHandler(int) + 422
2  libsystem_platform.dylib 0x00000001076a7fba _sigtramp + 26
3  libsystem_platform.dylib 0x00007fff58deb468 _sigtramp + 1366570184
4  llvm-tblgen              0x0000000106e89cc7 llvm::CodeGenRegBank::getCompositeSubRegIndex(llvm::CodeGenSubRegIndex*, llvm::CodeGenSubRegIndex*) + 615
5  llvm-tblgen              0x0000000106e88be6 llvm::CodeGenRegister::computeSubRegs(llvm::CodeGenRegBank&) + 2182
6  llvm-tblgen              0x0000000106e8e9f0 llvm::CodeGenRegBank::CodeGenRegBank(llvm::RecordKeeper&) + 2192
7  llvm-tblgen              0x0000000106f384a1 llvm::EmitRegisterBank(llvm::RecordKeeper&, llvm::raw_ostream&) + 65
8  llvm-tblgen              0x0000000106f72c64 (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&) + 1172
9  llvm-tblgen              0x0000000106fcb15f llvm::TableGenMain(char*, bool (*)(llvm::raw_ostream&, llvm::RecordKeeper&)) + 3599
10 llvm-tblgen              0x0000000106f727a6 main + 134
11 libdyld.dylib            0x000000010733c6a5 start + 1
Stack dump:
0.      Program arguments: /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp
/bin/sh: line 1: 41986 Segmentation fault: 11  /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz
-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp

llvm-svn: 304231
2017-05-30 19:25:22 +00:00
Craig Topper f6d4dc5b4a [SelectionDAG] Set ISD::FPOWI to Expand by default
Summary:
Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".

This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.

Reviewers: spatel, RKSimon, efriedma

Reviewed By: RKSimon

Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits

Differential Revision: https://reviews.llvm.org/D33530

llvm-svn: 304215
2017-05-30 15:27:55 +00:00
Andrew V. Tischenko 8b04826663 This patch closes PR28513: an optimization of multiplication by different constants.
It's implemented on DAG combiner level.

llvm-svn: 304209
2017-05-30 13:00:44 +00:00
Zachary Turner df1832cf86 Resubmit "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables."
This was reverted due to buildbot breakages and I was not familiar
with this code to investigate it.  But while trying to get a
useful backtrace for the author, it turns out the fix was very
obvious.  Resubmitting this patch as is, and will submit the
fix in a followup so that the fix is not hidden in the larger
CL.

llvm-svn: 304122
2017-05-29 02:19:37 +00:00
Zachary Turner 5b199be769 Revert "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables."
This reverts commit 28cb1003507f287726f43c771024a1dc102c45fe as well
as all subsequent followups.  llvm-tblgen currently segfaults with
this change, and it seems it has been broken on the bots all
day with no fixes in preparation.  See, for example:

http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/

llvm-svn: 304121
2017-05-29 01:48:53 +00:00
Ayman Musa d9f1fe43a8 [X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables.
X86 backend holds huge tables in order to map between the register and memory forms of each instruction.
This TableGen Backend automatically generated all these tables with the appropriate flags for each entry.

Differential Revision: https://reviews.llvm.org/D32684

llvm-svn: 304088
2017-05-28 12:55:36 +00:00
Ayman Musa 0b4f97d5e9 [X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen backend) to X86Inst class and set its value for the relevant instructions.
Some register-register instructions can be encoded in 2 different ways, this happens when 2 register operands can be folded (separately). 
For example if we look at the MOV8rr and MOV8rr_REV, both instructions perform exactly the same operation, but are encoded differently. Here is the relevant information about these instructions from Intel's 64-ia-32-architectures-software-developer-manual:

Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
8A /r   MOV r8,r/m8  RM     Valid        Valid            Move r/m8 to r8.
88 /r   MOV r/m8,r8  MR     Valid        Valid            Move r8 to r/m8.
Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions.

In the X86 backend, we define both of these instructions, usually one has a regular name (MOV8rr) while the other has "_REV" suffix (MOV8rr_REV), must be marked with isCodeGenOnly flag and is not emitted from CodeGen.

Automatically generating the memory folding tables relies on matching encodings of instructions, but in these cases where we want to map both memory forms of the mov 8-bit (MOV8rm & MOV8mr) to MOV8rr (not to MOV8rr_REV) we have to somehow point from the MOV8rr_REV to the "regular" appropriate instruction which in this case is MOV8rr.

This field enable this "pointing" mechanism - which is used in the TableGen backend for generating memory folding tables.

Differential Revision: https://reviews.llvm.org/D32683

llvm-svn: 304087
2017-05-28 12:39:37 +00:00
Matthias Braun ac4307c41e LivePhysRegs: Rework constructor + documentation; NFC
- Take reference instead of pointer to a TRI that cannot be nullptr.
- Improve documentation comments.

llvm-svn: 304038
2017-05-26 21:51:00 +00:00
Andrew V. Tischenko fdb264e263 The fix for PR22004: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands."
llvm-svn: 303985
2017-05-26 13:23:34 +00:00
Oren Ben Simhon 7bf27f03f2 [X86] Adding vpopcntd and vpopcntq instructions
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).

Differential Revision: https://reviews.llvm.org/D33169

llvm-svn: 303858
2017-05-25 13:45:23 +00:00
Simon Pilgrim 9f46d1d479 Strip trailing whitespace. NFCI.
llvm-svn: 303736
2017-05-24 11:02:27 +00:00
Igor Breger 617be6e475 [GlobalISel][X86] G_LOAD/G_STORE vec256/512 support
Summary: mark G_LOAD/G_STORE vec256/512 legal for AVX/AVX512. Implement instruction selection.

Reviewers: zvi, guyblank

Reviewed By: zvi

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33268

llvm-svn: 303617
2017-05-23 08:23:51 +00:00
Igor Breger 014fc566e7 [GlobalISel][X86] Fix G_TRUNC instruction selection.
Updated tests with -verify-machineinstrs flag.
It fixes 3 tests failed with machine verifier enabled and listed
in PR27481

llvm-svn: 303502
2017-05-21 11:13:56 +00:00
Guy Blank 548e22a1a7 [X86][AVX512] Make i1 illegal in the CodeGen
This patch defines the i1 type as illegal in the X86 backend for AVX512.
For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended.
This should produce better scalar code for i1 types since GPRs will be used instead of mask registers.

Differential Revision: https://reviews.llvm.org/D32273

llvm-svn: 303421
2017-05-19 12:35:15 +00:00
Daniel Sanders a1b2db7919 [globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates.
Summary:
This causes them to be re-computed more often than necessary but resolves
objections that were raised post-commit on r301750.

Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls

Reviewed By: qcolombet

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32861

llvm-svn: 303418
2017-05-19 11:08:33 +00:00
Hans Wennborg b00ffd8cb7 Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB."
This also reverts follow-ups r303292 and r303298.

It broke some Chromium tests under MSan, and apparently also internal
tests at Google.

llvm-svn: 303369
2017-05-18 18:50:05 +00:00
Francis Visoiu Mistrih 8b61764cbb [LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
  `getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
  `addRequired<TargetPassConfig>` and call
  `getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

llvm-svn: 303360
2017-05-18 17:21:13 +00:00
Igor Breger 842b5b36ba [GlobalISel][X86] G_ADD/G_SUB vector legalizer/selector support.
Summary: G_ADD/G_SUB vector legalizer/selector support.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33232

llvm-svn: 303345
2017-05-18 11:10:56 +00:00
Simon Pilgrim 6bba6068be [X86][AVX512] Add 512-bit vector ctpop costs + tests
llvm-svn: 303342
2017-05-18 10:42:34 +00:00
Lama Saba 2ea271b54a [X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+:
  " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
    dispatch via port 1:
  - LEA that has all three source operands: base, index, and offset
  - LEA that uses base and index registers where the base is EBP, RBP,or R13
  - LEA that uses RIP relative addressing mode
  - LEA that uses 16-bit addressing mode "
  This patch currently handles the first 2 cases only.
 
Differential Revision: https://reviews.llvm.org/D32277

llvm-svn: 303333
2017-05-18 08:11:50 +00:00
Davide Italiano 9ae69a75ec [Target/X86] Remove unneeded return. NFCI.
llvm-svn: 303323
2017-05-18 02:36:42 +00:00
Michael Liao ab12984634 Fix PR33028
- '-verify-mahcineinstrs' starts to complain allocatable live-in physical
  registers on non-entry or non-landing-pad basic blocks.
- Refactor the XBEGIN translation to define EAX on a dedicated fallback code
  path due to XABORT. Add a pseudo instruction to define EAX explicitly to
  avoid add physical register live-in.

Differential Revision: https://reviews.llvm.org/D33168

llvm-svn: 303306
2017-05-17 21:48:00 +00:00
Simon Pilgrim 23ef26728a [X86][AVX512] Add 512-bit vector ctlz costs + tests
llvm-svn: 303300
2017-05-17 21:02:18 +00:00
Simon Pilgrim d0365967c4 [X86][AVX512] Add 512-bit vector cttz costs + tests
llvm-svn: 303293
2017-05-17 20:22:54 +00:00
Dehao Chen 02828a93e8 Only enable LiveRangeShrink for x86.
Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for archtectures with great register pressure.

Reviewers: MatzeB, qcolombet

Reviewed By: qcolombet

Subscribers: jholewinski, jyknight, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33294

llvm-svn: 303292
2017-05-17 20:18:13 +00:00
Simon Pilgrim a9a92a1a6a [X86][AVX512] Add 512-bit vector bitreverse costs + tests
llvm-svn: 303283
2017-05-17 19:20:20 +00:00
Igor Breger 28f290fab8 [GlobalISel][X86] Support add i64 in IA32.
Summary: support G_UADDE instruction selection.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D33096

llvm-svn: 303255
2017-05-17 12:48:08 +00:00
Reid Kleckner 0ad69fc89f Revert "[X86] Replace slow LEA instructions in X86"
This reverts commit r303183, it broke various buildbots and introduced
sanitizer errors.

llvm-svn: 303199
2017-05-16 19:55:03 +00:00
Lama Saba 52e892577d [X86] Replace slow LEA instructions in X86
According to Intel's Optimization Reference Manual for SNB+:
  " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
    dispatch via port 1:
  - LEA that has all three source operands: base, index, and offset
  - LEA that uses base and index registers where the base is EBP, RBP,or R13
  - LEA that uses RIP relative addressing mode
  - LEA that uses 16-bit addressing mode "
  This patch currently handles the first 2 cases only.
 
Differential Revision: https://reviews.llvm.org/D32277

llvm-svn: 303183
2017-05-16 16:01:36 +00:00
Peter Collingbourne 6f0ecca3b5 IR: Give function GlobalValue::getRealLinkageName() a less misleading name: dropLLVMManglingEscape().
This function gives the wrong answer on some non-ELF platforms in some
cases. The function that does the right thing lives in Mangler.h. To try to
discourage people from using this function, give it a different name.

Differential Revision: https://reviews.llvm.org/D33162

llvm-svn: 303134
2017-05-16 00:39:01 +00:00
Zvi Rackover e6b278bc65 [X86] Utilize SelectionDAG::getSelect(). NFC.
Replace SelectionDAG::getNode(ISD::SELECT, ...)
and SelectionDAG::getNode(ISD::VSELECT, ...)
with SelectionDAG::getSelect(...)
Saves a few lines of code and in some cases saves the need to explicitly
check the type of the desired node.

llvm-svn: 303024
2017-05-14 21:30:38 +00:00
Simon Pilgrim d0ef9d8e93 [X86][AVX1] Account for cost of extract/insert of 256-bit shifts
llvm-svn: 303023
2017-05-14 20:52:11 +00:00
Simon Pilgrim f96b4ab92d [X86][AVX2] Fix costs for v4i64 ashr by splat
llvm-svn: 303022
2017-05-14 20:25:42 +00:00
Simon Pilgrim de4467b182 [X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat
llvm-svn: 303021
2017-05-14 20:02:34 +00:00
Craig Topper ceea1a76a1 [X86] Remove unused value from IntrinsicType enum. NFC
llvm-svn: 303018
2017-05-14 19:38:06 +00:00
Simon Pilgrim d3f0d03cc5 [X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences
llvm-svn: 303017
2017-05-14 18:52:15 +00:00
Simon Pilgrim 5bef9c627e [X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask.
Tweak cost model to match what lowering actually does.

llvm-svn: 303013
2017-05-14 17:59:46 +00:00
Simon Pilgrim aa8dffb69b [X86][SSE] Account for cost of extract/insert of v32i8 vector shifts
llvm-svn: 303012
2017-05-14 17:36:07 +00:00
Simon Pilgrim 4599eaa09a [X86][XOP] Account for cost of extract/insert of 256-bit vector shifts
llvm-svn: 303010
2017-05-14 13:38:53 +00:00
Simon Pilgrim f3ee9c6997 [X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts.
llvm-svn: 303009
2017-05-14 11:46:26 +00:00
Simon Pilgrim ef46c2762a [x86, SSE] AVX1 PR28129 (256-bit all-ones rematerialization)
Further perf tests on Jaguar indicate that:

vxorps  %ymm0, %ymm0, %ymm0
vcmpps  $15, %ymm0, %ymm0, %ymm0

is consistently faster (by about 9%) than:

vpcmpeqd  %xmm0, %xmm0, %xmm0
vinsertf128  $1, %xmm0, %ymm0, %ymm0

Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well.

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D32416

llvm-svn: 302989
2017-05-13 13:42:35 +00:00
Simon Pilgrim b146e61828 Strip trailing whitespace. NFCI.
llvm-svn: 302927
2017-05-12 17:42:36 +00:00
Craig Topper 8df66c602a [KnownBits] Add bit counting methods to KnownBits struct and use them where possible
This patch adds min/max population count, leading/trailing zero/one bit counting methods.

The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.

Differential Revision: https://reviews.llvm.org/D32931

llvm-svn: 302925
2017-05-12 17:20:30 +00:00
Simon Pilgrim 7f03231cc6 Use SDValue::getOperand() helper. NFCI.
llvm-svn: 302894
2017-05-12 13:08:45 +00:00
Reid Kleckner 43bbeb4c9f Issue diagnostics when returning FP values on x86_64 without SSE1/2
Avoid using report_fatal_error, because it will ask the user to file a
bug. If the user attempts to disable SSE on x86_64 and them use floating
point, that's a bug in their code, not a bug in the compiler.

This is just a start. There are other ways to crash the backend in this
configuration, but they should be updated to follow this pattern.

Differential Revision: https://reviews.llvm.org/D27522

llvm-svn: 302835
2017-05-11 22:43:02 +00:00
Igor Breger a44fc83d9f [GlobalISel][X86] Remove hand-written G_FADD/F_SUB selection.
Now it handle by TableGen.

llvm-svn: 302793
2017-05-11 12:15:03 +00:00
Chandler Carruth 97500a9918 [x86] Fix a failure to select with AVX-512 when the type legalizer
manages to form a VSELECT with a non-i1 element type condition. Those
are technically allowed in SDAG (at least, the generic type legalization
logic will form them and I wouldn't want to try to audit everything te
preclude forming them) so we need to be able to lower them.

This isn't too hard to implement. We mark VSELECT as custom so we get
a chance in C++, add a fast path for i1 conditions to get directly
handled by the patterns, and a fallback when we need to manually force
the condition to be an i1 that uses the vptestm instruction to turn
a non-mask into a mask.

This, unsurprisingly, generates awful code. But it at least doesn't
crash. This was actually impacting open source packages built with LLVM
for AVX-512 in the wild, so quickly landing a patch that at least stops
the immediate bleeding.

I think I've found where to fix the codegen quality issue, but less
confident of that change so separating it out from the thing that
doesn't change the result of any existing test case but causes mine to
not crash.

llvm-svn: 302785
2017-05-11 10:52:16 +00:00
Simon Pilgrim a4a13a0da0 Strip trailing whitespace. NFCI.
llvm-svn: 302784
2017-05-11 10:03:05 +00:00
Igor Breger c7b5977bb1 [GlobalISel][X86] G_ICMP support.
Summary: support G_ICMP for scalar types i8/i16/i64.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits, krytarowski

Differential Revision: https://reviews.llvm.org/D32995

llvm-svn: 302774
2017-05-11 07:17:40 +00:00
Igor Breger db75455990 [X86] Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp. NFC
Summary:
Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by GloabalIsel instruction selector.
This is a pre-commit for a patch I'm working on to support G_ICMP. NFC.

Reviewers: zvi, guyblank, delena

Reviewed By: guyblank, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33038

llvm-svn: 302767
2017-05-11 06:36:37 +00:00
Igor Breger fda31e64e0 [GlobalISel][X86] G_ZEXT i1 to i32/i64 support.
Summary: Support G_ZEXT i1 to i32/i64 instruction selection.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32965

llvm-svn: 302623
2017-05-10 06:52:58 +00:00
Serge Guelton e38003f839 Suppress all uses of LLVM_END_WITH_NULL. NFC.
Use variadic templates instead of relying on <cstdarg> + sentinel.
This enforces better type checking and makes code more readable.

Differential Revision: https://reviews.llvm.org/D32541

llvm-svn: 302571
2017-05-09 19:31:13 +00:00
Craig Topper f893d49f0c [X86] Add more patterns for BZHI isel
This patch adds more patterns that a reasonable person might write that can be compiled to BZHI.

This adds support for

(~0U >> (32 - b)) & a;

and

a << (32 - b) >> (32 - b);

This was inspired by the code in APInt::clearUnusedBits.

This can pass an index of 32 to the bzhi instruction which a quick test of Haswell hardware shows will not mask any bits. Though the description text in the Intel manual says the "index is saturated to OperandSize-1". The pseudocode in the same manual indicates no bits will be zeroed for this case.

I think this is still missing cases where the subtract portion is an 8-bit operation.

Differential Revision: https://reviews.llvm.org/D32616

llvm-svn: 302549
2017-05-09 16:32:11 +00:00
Guy Blank 0c42d8c35b VX512] Only look at lower bit in constant scalar masks
for scalar masked instructions only the lower bit of the mask is relevant. so for constant masks we should either do an unmasked operation or no operation, depending on the value of the lower bit.
This patch handles cases where the lower bit is '1'.

Differential Revision: https://reviews.llvm.org/D32805

llvm-svn: 302546
2017-05-09 16:16:48 +00:00
Serge Pavlov d526b13e61 Add extra operand to CALLSEQ_START to keep frame part set up previously
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to  CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.

This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.

The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
  affects all targets that use frame pseudo instructions and touched many
  files although the changes are uniform.
- Access to frame properties are implemented using special instructions
  rather than calls getOperand(N).getImm(). For X86 and ARM such
  replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
  instruction. These involve proper instruction initialization and
  methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
  frame parts initialized inside frame instruction pair and outside it.

The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.

Differential Revision: https://reviews.llvm.org/D32394

llvm-svn: 302527
2017-05-09 13:35:13 +00:00
Simon Pilgrim ca3a63a849 [X86][SSE42] Lower v2i64/v4i64 ASHR(X, 63) as PCMPGTQ(0, X)
Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD.

Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal.

Differential Revision: https://reviews.llvm.org/D32973

llvm-svn: 302525
2017-05-09 13:14:40 +00:00
Nikolai Bozhenov b7bf386e80 [X86] Clang option -fuse-init-array has no effect when generating for MCU target
Reviewers: Eugene.Zelenko, dschuff, craig.topper

Reviewed By: craig.topper

Subscribers: ahatanak, aaboud, DavidKreitzer, llvm-commits, cfe-commits

Differential Revision: https://reviews.llvm.org/D32543
Patch by AndreiGrischenko <andrei.l.grischenko@intel.com>

llvm-svn: 302513
2017-05-09 10:14:03 +00:00
Simon Pilgrim df39b03f29 [X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks.
Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask, this patch generalises this to use computeNumSignBits instead.

This is a first step in several things we can do to improve PBLENDV support:

 * Better matching of X86ISD::ANDNP patterns.
 * Handle floating point cases.
 * Better vector and bitcast support in computeNumSignBits.
 * Recognise that PBLENDV only uses the sign bit of the mask, we should be able strip away sign splats (ASHR, PCMPGT isNeg tests etc.).

Differential Revision: https://reviews.llvm.org/D32953

llvm-svn: 302424
2017-05-08 14:16:39 +00:00
Igor Breger 810c6257f1 [GlobalISel][X86] G_GEP selection support.
Summary: [GlobalISel][X86] G_GEP selection support.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32396

llvm-svn: 302412
2017-05-08 09:40:43 +00:00
Igor Breger 605b965ae5 [GlobalISel][X86] G_MUL legalizer/selector support.
Summary:
G_MUL legalizer/selector/regbank support.
Use only Tablegen-erated instruction selection.
This patch dealing with legal operations only.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: krytarowski, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D32698

llvm-svn: 302410
2017-05-08 09:03:37 +00:00
Dean Michael Berris 9bcaed867a [XRay] Custom event logging intrinsic
This patch introduces an LLVM intrinsic and a target opcode for custom event
logging in XRay. Initially, its use case will be to allow users of XRay to log
some type of string ("poor man's printf"). The target opcode compiles to a noop
sled large enough to enable calling through to a runtime-determined relative
function call. At runtime, when X-Ray is enabled, the sled is replaced by
compiler-rt with a trampoline to the logic for creating the custom log entries.

Future patches will implement the compiler-rt parts and clang-side support for
emitting the IR corresponding to this intrinsic.

Reviewers: timshen, dberris

Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D27503

llvm-svn: 302405
2017-05-08 05:45:21 +00:00
Simon Pilgrim 2d1c6d6e8d [X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics.
Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. 

llvm-svn: 302378
2017-05-07 20:58:55 +00:00
Simon Pilgrim 33f7397cc0 [X86][AVX512] Relax assertion and just exit combine for unsupported types (PR32907)
llvm-svn: 302361
2017-05-06 20:53:52 +00:00
Simon Pilgrim fea153f341 [X86][AVX512] Move v2i64/v4i64 VPABS lowering to tablegen
Extend NoVLX targets to use the 512-bit versions

llvm-svn: 302359
2017-05-06 19:11:59 +00:00
Simon Pilgrim f15a2f4d94 [X86] Reduce code for setting operations actions by merging into loops across multiple types/ops. NFCI.
llvm-svn: 302357
2017-05-06 18:17:56 +00:00
Simon Pilgrim 781cb10104 [X86][SSE] Break register dependencies on v16i8/v8i16 BUILD_VECTOR on SSE41
rL294581 broke unnecessary register dependencies on partial v16i8/v8i16 BUILD_VECTORs, but on SSE41 we (currently) use insertion for full BUILD_VECTORs as well. By allowing full insertion to occur on SSE41 targets we can break register dependencies here as well.

llvm-svn: 302355
2017-05-06 17:30:39 +00:00
Quentin Colombet 245994d968 [RegisterBankInfo] Uniquely allocate instruction mapping.
This is a step toward having statically allocated instruciton mapping.
We are going to tablegen them eventually, so let us reflect that in
the API.

NFC.

llvm-svn: 302316
2017-05-05 22:48:22 +00:00
Simon Pilgrim 430a335b7b [X86] Use SDValue::getConstantOperandVal helper. NFCI.
llvm-svn: 302286
2017-05-05 20:53:52 +00:00
Craig Topper f0aeee01c3 [KnownBits] Add wrapper methods for setting and clear all bits in the underlying APInts in KnownBits.
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.

Differential Revision: https://reviews.llvm.org/D32637

llvm-svn: 302262
2017-05-05 17:36:09 +00:00
Simon Pilgrim ac3c4b6da4 [X86][AVX512] Improve support and testing for CTLZ of 512-bit vectors without CDI
llvm-svn: 302233
2017-05-05 13:31:52 +00:00
Simon Pilgrim e9c5d7b70b [X86] Remove duplicate operation actions. NFCI.
llvm-svn: 302230
2017-05-05 12:34:55 +00:00
Simon Pilgrim c89aa0bee5 [X86][AVX512CDI] Move v2i64/v4i64 and v4i32/v8i32 VPLZCNT lowering to tablegen
Extend NoVLX targets to use the 512-bit versions

llvm-svn: 302229
2017-05-05 12:20:34 +00:00
Simon Pilgrim 73b88d5183 Remove unused variable
llvm-svn: 302226
2017-05-05 11:55:38 +00:00
Simon Pilgrim 1d47a15d89 [X86][AVX] Add LowerIntUnary helpers to split unary vector ops in half. NFCI.
Same as LowerIntArith helpers but for unary ops instead of binary.

llvm-svn: 302222
2017-05-05 10:59:24 +00:00
Andrew Ng 807ca72e66 [X86] Remove unused code from X86 optimize LEAs. NFC.
This patch removes unused code which is no longer required because of changes
to the DIExpression::prepend function.

llvm-svn: 302219
2017-05-05 09:21:35 +00:00
Daniel Jasper 07a1771959 Initialize new member X86Operand::FrontendSize in all codepaths.
This fixes MSAN-builds after r302179.

llvm-svn: 302214
2017-05-05 07:31:40 +00:00
Simon Pilgrim 11a1637a10 Strip trailing whitespace. NFCI.
llvm-svn: 302192
2017-05-04 20:55:16 +00:00
Reid Kleckner 6d2ea6ec80 [ms-inline-asm] Use the frontend size only for ambiguous instructions
This avoids problems on code like this:
  char buf[16];
  __asm {
    movups xmm0, [buf]
    mov [buf], eax
  }

The frontend size in this case (1) is wrong, and the register makes the
instruction matching unambiguous. There are also enough bytes available
that we shouldn't complain to the user that they are potentially using
an incorrectly sized instruction to access the variable.

Supersedes D32636 and D26586 and fixes PR28266

llvm-svn: 302179
2017-05-04 18:19:52 +00:00
Igor Breger 70583606b1 [X86][AVX-512] Allow EVEX encoded instruction selection when available for mul v8i32.
Differential Revision: https://reviews.llvm.org/D32679

llvm-svn: 302127
2017-05-04 07:34:58 +00:00
Oren Ben Simhon 51de0330eb [X86] Disabling PLT in Regcall CC Functions
According to psABI, PLT stub clobbers XMM8-XMM15.
In Regcall calling convention those registers are used for passing parameters. 
Thus we need to prevent lazy binding in Regcall.

Differential Revision: https://reviews.llvm.org/D32430

llvm-svn: 302124
2017-05-04 07:22:49 +00:00
Igor Breger c6eccdd5c0 [AVX] Fix vpcmpeqq predicate.
Summary:
Fix vpcmpeqq predicate. AVX512 version of vpcmpeqq is not equivalent to AVX one.
Split from https://reviews.llvm.org/D32679

Reviewers: craig.topper, zvi, aymanmus

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32786

llvm-svn: 302119
2017-05-04 06:24:52 +00:00
Reid Kleckner 5c0bdef5aa Mark functions as not having CFI once we finalize an x86 stack frame
We'll set it back to true in emitPrologue if it gets called. It doesn't
get called for naked functions.

Fixes PR32912

llvm-svn: 302092
2017-05-03 23:13:42 +00:00
Craig Topper d938fd1397 [KnownBits] Add zext, sext, and trunc methods to KnownBits
This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible.

Differential Revision: https://reviews.llvm.org/D32784

llvm-svn: 302088
2017-05-03 22:07:25 +00:00
Reid Kleckner a0b45f4bfc [IR] Abstract away ArgNo+1 attribute indexing as much as possible
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
  to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
  sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
  take attribute list indices.  Most of these were only used from
  BuildLibCalls, and doesNotAlias was only used to test or set if the
  return value is malloc-like.

I'm happy to split the patch, but I think they are probably easier to
review when taken together.

This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
  0: func attrs
  1: retattrs
  2...: arg attrs

Reviewers: chandlerc, pete, javed.absar

Subscribers: david2050, llvm-commits

Differential Revision: https://reviews.llvm.org/D32811

llvm-svn: 302060
2017-05-03 18:17:31 +00:00
Simon Pilgrim 03ccf91d85 [X86][LWP] Add stack folding mappings and tests for LWPINS/LWPVAL instructions
llvm-svn: 302049
2017-05-03 16:46:30 +00:00
Simon Pilgrim eada39d050 Silence a 'enum and non-enum used in conditional' warning.
llvm-svn: 302048
2017-05-03 16:43:57 +00:00
Simon Pilgrim 99b925bdf3 [X86][LWP] Add llvm support for LWP instructions (reapplied).
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).

Reapplied - this time without changing line endings of existing files.

Differential Revision: https://reviews.llvm.org/D32769

llvm-svn: 302041
2017-05-03 15:51:39 +00:00
Simon Pilgrim a271c54324 Revert rL302028 due to accidental line ending changes.
llvm-svn: 302038
2017-05-03 15:42:29 +00:00
Simon Pilgrim b2e0464fde [X86][LWP] Add llvm support for LWP instructions.
This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).

Differential Revision: https://reviews.llvm.org/D32769

llvm-svn: 302028
2017-05-03 15:18:34 +00:00
Guy Blank d0baa524d0 [X86][AVX512] remove unnecessary case. NFC
VFPCLASS is for vector types and not scalar, so it cannot get here.

Differential Revision: https://reviews.llvm.org/D32694

llvm-svn: 302023
2017-05-03 13:34:05 +00:00
Oren Ben Simhon dbd4bba1ec [X86] Support of no_caller_saved_registers attribute
This patch implements the LLVM part for no_caller_saved_registers attribute as appears here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5ed3cc7b66af4758f7849ed6f65f4365be8223be.
In order to implement the attribute, we use the dynamic CSR mechanism to remove returned/passed arguments from the function regmask/CSR list.

Differential Revision: https://reviews.llvm.org/D31876

llvm-svn: 302020
2017-05-03 13:07:19 +00:00
Simon Pilgrim 05cfa83843 [X86] Refactored LowerINTRINSIC_W_CHAIN to use a switch statament. NFCI.
Pre-commit as requested in D32769.

llvm-svn: 302010
2017-05-03 10:40:18 +00:00
Simon Pilgrim 24d361f7bf [X86] Tidyup subvector insert/extract helpers. NFCI.
Use getConstantOperandVal where possible.

llvm-svn: 301912
2017-05-02 11:08:15 +00:00
Simon Pilgrim 7aca5218b0 Fix typo in comment. NFCI.
llvm-svn: 301911
2017-05-02 10:43:33 +00:00
Simon Pilgrim 8d196c88a6 [X86] Reduce code for setting operations actions by merging into loops across multiple types/ops. NFCI.
llvm-svn: 301879
2017-05-01 23:09:01 +00:00
Simon Pilgrim ab1a82764f [X86][AVX] Rename LowerVectorBroadcast to lowerBuildVectorAsBroadcast. NFCI.
Since the shuffle refactor, this is only used during BUILD_VECTOR lowering.

llvm-svn: 301834
2017-05-01 20:56:35 +00:00
Tim Northover 9bb6931c25 X86: initialize a few subtarget variables.
Otherwise an indeterminate value gets read, causing a bunch of UBSan failures.

llvm-svn: 301819
2017-05-01 17:50:15 +00:00
Amara Emerson d28f0cd448 Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.
This removes BinaryWithFlagsSDNode, and flags are now all passed by value.

Differential Revision: https://reviews.llvm.org/D32527

llvm-svn: 301803
2017-05-01 15:17:51 +00:00
Igor Breger 2452ef0ea2 [GlobalISel][X86] Prioritize Tablegen-erated instruction selection. NFC
Summary:
Prioritizes Tablegen-erated instruction selection over C++ instruction selection.
Remove G_ADD/G_SUB C++ selection - implemented by Tablegen.

Reviewers: dsanders, zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32677

llvm-svn: 301792
2017-05-01 07:06:08 +00:00
Igor Breger c08a783521 [GlobalISel][X86] G_SEXT/G_ZEXT support.
Reviewers: zvi, guyblank

Reviewed By: zvi

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32591

llvm-svn: 301790
2017-05-01 06:30:16 +00:00
Igor Breger a9edb88d46 [GlobalISel][X86] G_LOAD/G_STORE pointer selection support.
Summary: [GlobalISel][X86] G_LOAD/G_STORE pointer selection support.

Reviewers: zvi, guyblank

Reviewed By: zvi, guyblank

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D32217

llvm-svn: 301788
2017-05-01 06:08:32 +00:00
Amaury Sechet 8ac81f3924 Do not legalize large add with addc/adde, introduce addcarry and do it with uaddo/addcarry
Summary: As per discution on how to get better codegen an large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allow for more flexibility.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29872

llvm-svn: 301775
2017-04-30 19:24:09 +00:00
Craig Topper 778f57b4f1 [APInt] Replace calls to setBits with more specific calls to setBitsFrom and setLowBits where possible.
llvm-svn: 301768
2017-04-30 07:44:58 +00:00
Craig Topper d503644a4a [X86] Clear KnownBits instead of reconstructing it. NFC
llvm-svn: 301767
2017-04-30 07:44:55 +00:00
Daniel Sanders e9fdba39e0 [globalisel][tablegen] Compute available feature bits correctly.
Summary:
Predicate<> now has a field to indicate how often it must be recomputed.
Currently, there are two frequencies, per-module (RecomputePerFunction==0)
and per-function (RecomputePerFunction==1). Per-function predicates are
currently recomputed more frequently than necessary since the only predicate
in this category is cheap to test. Per-module predicates are now computed in
getSubtargetImpl() while per-function predicates are computed in selectImpl().

Tablegen now manages the PredicateBitset internally. It should only be
necessary to add the required includes.

Also fixed a problem revealed by the test case where
constrainSelectedInstRegOperands() would attempt to tie operands that
BuildMI had already tied.

Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar

Reviewed By: rovka

Subscribers: kristof.beyls, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32491

llvm-svn: 301750
2017-04-29 17:30:09 +00:00
Matthias Braun 744c215e29 TargetLowering: Add finalizeLowering() function; NFC
Adds a new method finalizeLowering to TargetLoweringBase. This is in
preparation for an upcoming commit.

This function is meant for target specific adjustments to
MachineFrameInfo or register reservations.

Move the freezeRegisters() and the hasCopyImplyingStackAdjustment()
handling into the new function to prove the concept. As an added bonus
GlobalISel no longer missed the hasCopyImplyingStackAdjustment()
handling with this.

Differential Revision: https://reviews.llvm.org/D32621

llvm-svn: 301679
2017-04-28 20:25:05 +00:00
Reid Kleckner 6652a52e2b Use Argument::hasAttribute and AttributeList::ReturnIndex more
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.

NFC

llvm-svn: 301666
2017-04-28 18:37:16 +00:00
Adrian Prantl 109b236850 Clean up DIExpression::prependDIExpr a little. (NFC)
llvm-svn: 301662
2017-04-28 17:51:05 +00:00
Simon Pilgrim cce5097ce4 Move variable local to where ita used. NFCI.
llvm-svn: 301646
2017-04-28 14:42:15 +00:00
Andrew Ng 03e35b6bc0 [DebugInfo][X86] Improve X86 Optimize LEAs handling of debug values.
This is a follow up to the fix in r298360 to improve the handling of debug
values when redundant LEAs are removed. The fix in r298360 effectively
discarded the debug values. This patch now attempts to preserve the debug
values by using the DWARF DW_OP_stack_value operation via prependDIExpr.

Moved functions appendOffset and prependDIExpr from Local.cpp to
DebugInfoMetadata.cpp and made them available as static member functions of
DIExpression.

Differential Revision: https://reviews.llvm.org/D31604

llvm-svn: 301630
2017-04-28 08:44:30 +00:00
Clement Courbet 5f0ab9e51d [X86][NFC] Refactor RepMovsRepeats in preparation for D32481.
Differential Revision: https://reviews.llvm.org/D32583

llvm-svn: 301628
2017-04-28 07:56:31 +00:00
Craig Topper d0af7e8ab8 [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.

This is largely a mechanical transformation from KnownZero to Known.Zero.

Differential Revision: https://reviews.llvm.org/D32569

llvm-svn: 301620
2017-04-28 05:31:46 +00:00
Craig Topper 0e03e74e95 [SelectionDAG] Use various APInt methods to reduce temporary APInt creation
This patch uses various APInt methods to reduce the number of temporary APInts. These were all found while working through converting SelectionDAG's computeKnownBits to also use the KnownBits struct recently added to the ValueTracking version.

llvm-svn: 301618
2017-04-28 04:57:59 +00:00
Craig Topper 24e71017aa [APInt] Use inplace shift methods where possible. NFCI
llvm-svn: 301612
2017-04-28 03:36:24 +00:00
Igor Breger 360d0f23ee [GlobalISel][X86] handle not symmetric G_COPY
Summary: handle not symmetric G_COPY

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32420

llvm-svn: 301523
2017-04-27 08:02:03 +00:00
Clement Courbet 7b0ec39494 [CodeGen][NFC] Rename 'Src' to 'Val'.
'Src' looks like it was borrowed from memcpy, 'Val' makes more sense for
memset and is consistent with naming within the function.

Differential Revision: https://reviews.llvm.org/D32580

llvm-svn: 301521
2017-04-27 07:22:30 +00:00
Ayman Musa 11966ab00b [X86] Add missing mayLoad/mayStore attributes to some X86 instructions (Continue)
Complete the patch committed in rL300190.

Differential Revision: https://reviews.llvm.org/D32287

llvm-svn: 301393
2017-04-26 11:34:09 +00:00
Andrew V. Tischenko c3c6723ab5 PR31007 and PR27884 will be closed: a possibility to compile constants like 0bH is now supported in MS asm.
llvm-svn: 301390
2017-04-26 09:56:59 +00:00
Ayman Musa d9fb157845 [X86][SSE2] Fix asm string for movq (Move Quadword) instruction.
Replace "mov{d|q}" with "movq".

Differential Revision: https://reviews.llvm.org/D32220

llvm-svn: 301386
2017-04-26 07:08:44 +00:00
Simon Pilgrim d68785803b [SelectionDAG] Added getBuildVector(ArrayRef<SDUse>) helper.
llvm-svn: 301322
2017-04-25 16:41:28 +00:00
Krzysztof Parzyszek c8e8e2a046 Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937

llvm-svn: 301234
2017-04-24 19:51:12 +00:00
Krzysztof Parzyszek 98ab4c64c4 Revert r301231: Accidentally committed stale files
I forgot to commit local changes before commit.

llvm-svn: 301232
2017-04-24 19:48:51 +00:00
Krzysztof Parzyszek c0197066d7 Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937

llvm-svn: 301231
2017-04-24 19:43:45 +00:00
Krzysztof Parzyszek 44e25f37ae Move size and alignment information of regclass to TargetRegisterInfo
1. RegisterClass::getSize() is split into two functions:
   - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
   - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
2. RegisterClass::getAlignment() is replaced by:
   - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;

This will allow making those values depend on subtarget features in the
future.

Differential Revision: https://reviews.llvm.org/D31783

llvm-svn: 301221
2017-04-24 18:55:33 +00:00
Matthias Braun f9796b76e9 X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFC
Re-Commit of r300922 and r300923 with less aggressive assert (see
discussion at the end of https://reviews.llvm.org/D32205)

X86RegisterInfo::eliminateFrameIndex() and
X86FrameLowering::getFrameIndexReference() both had logic to compute the
base register. This consolidates the code.

Also use MachineInstr::isReturn instead of manually enumerating tail
call instructions (return instructions were not included in the previous
list because they never reference frame indexes).

Differential Revision: https://reviews.llvm.org/D32206

llvm-svn: 301211
2017-04-24 18:15:00 +00:00
Igor Breger 87aafa073f [GlobalISel][X86] Lower FormalArgument/Ret using G_MERGE_VALUES/G_UNMERGE_VALUES.
Summary: [GlobalISel][X86] Lower FormalArgument/Ret using G_MERGE_VALUES/G_UNMERGE_VALUES.

Reviewers: zvi, t.p.northover, guyblank

Reviewed By: t.p.northover

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D32288

llvm-svn: 301194
2017-04-24 17:05:52 +00:00
Simon Pilgrim 06d6263309 [X86][SSE] Add scheduler class support for SSE42 (PCMPGT) instructions
llvm-svn: 301142
2017-04-23 21:23:27 +00:00
Renato Golin 4abfb3d741 Revert "[APInt] Fix a few places that use APInt::getRawData to operate within the normal API."
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.

For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."

PR32754.

llvm-svn: 301111
2017-04-23 12:15:30 +00:00
Ayman Musa 137c44fe64 [X86][MPX] Add load & store instructions of bnd values to getLoadStoreRegOpcode function.
This is needed for a follow up patch that generates the memory folding tables.

Differential Revision: https://reviews.llvm.org/D32232

llvm-svn: 301109
2017-04-23 08:28:42 +00:00
Craig Topper cdd5ae6676 [APInt] Use operator<<= where possible. NFC
llvm-svn: 301104
2017-04-23 05:43:02 +00:00
Craig Topper 5f68af0806 [APInt] Use operator<<= instead of shl where possible. NFC
llvm-svn: 301103
2017-04-23 05:18:31 +00:00
Craig Topper ae9672c96d [APInt] Use ashInPlace where possible.
llvm-svn: 301101
2017-04-23 03:45:59 +00:00
Hans Wennborg 9b9a5358dd Re-commit r301040 "X86: Don't emit zero-byte functions on Windows"
In addition to the original commit, tighten the condition for when to
pad empty functions to COFF Windows.  This avoids running into problems
when targeting e.g. Win32 AMDGPU, which caused test failures when this
was committed initially.

llvm-svn: 301047
2017-04-21 21:48:41 +00:00
Hans Wennborg 04593000d8 Revert r301040 "X86: Don't emit zero-byte functions on Windows"
This broke almost all bots. Reverting while fixing.

llvm-svn: 301041
2017-04-21 21:10:37 +00:00
Hans Wennborg cb3e810714 X86: Don't emit zero-byte functions on Windows
Empty functions can lead to duplicate entries in the Guard CF Function
Table of a binary due to multiple functions sharing the same RVA,
causing the kernel to refuse to load that binary.

We had a terrific bug due to this in Chromium.

It turns out we were already doing this for Mach-O in certain
situations. This patch expands the code for that in
AsmPrinter::EmitFunctionBody() and renames
TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it
seems it was used for not just Mach-O anyway.

Differential Revision: https://reviews.llvm.org/D32330

llvm-svn: 301040
2017-04-21 20:58:12 +00:00
Matthias Braun 1a9062408f Revert "X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFC"
It seems we have on situation in a sanitizer enable bootstrap build
where the return instruction has a frame index operand that does not
point to a fixed object and fails the assert added here.

This reverts commit r300923.
This reverts commit r300922.

llvm-svn: 301024
2017-04-21 19:26:45 +00:00
Akira Hatanaka 22e839f4b2 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300932 and r300930, which was causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.

This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 301019
2017-04-21 18:53:12 +00:00
Daniel Sanders e7b0d66080 [globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule.
Summary:
The SelectionDAG importer now imports rules with Predicate's attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitset's to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for it's own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).

Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.

Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab

Reviewed By: rovka

Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D31418

llvm-svn: 300993
2017-04-21 15:59:56 +00:00
Daniel Sanders 419efdd55b Revert r300964 + r300970 - [globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule.
It's causing llvm-clang-x86_64-expensive-checks-win to fail to compile and I
haven't worked out why. Reverting to make it green while I figure it out.

llvm-svn: 300978
2017-04-21 14:09:20 +00:00
Daniel Sanders 279d03527e [globalisel][tablegen] Import SelectionDAG's rule predicates and support the equivalent in GIRule.
Summary:
The SelectionDAG importer now imports rules with Predicate's attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitset's to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for it's own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).

Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.

Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab

Reviewed By: rovka

Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D31418

llvm-svn: 300964
2017-04-21 10:27:20 +00:00
Clement Courbet 41b4333066 typo
llvm-svn: 300963
2017-04-21 09:21:05 +00:00
Clement Courbet d5f6182bec use repmovsb when optimizing forminsize
llvm-svn: 300960
2017-04-21 09:20:55 +00:00
Clement Courbet 203fc17797 Rename FastString flag.
llvm-svn: 300959
2017-04-21 09:20:50 +00:00
Clement Courbet 1ce3b82dea X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copies
when the subtarget has fast strings.

This has two advantages:
  - Speed is improved. For example, on Haswell thoughput improvements increase
    linearly with size from 256 to 512 bytes, after which they plateau:
    (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
  - Code is much smaller (no need to handle boundaries).

llvm-svn: 300957
2017-04-21 09:20:39 +00:00
Clement Courbet 8177fee513 Delete dead code
llvm-svn: 300952
2017-04-21 07:40:59 +00:00
Akira Hatanaka 78ccba6a20 Revert r300932 and r300930.
It seems that r300930 was creating an infinite loop in dag-combine when
compling the following file:

MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c

llvm-svn: 300940
2017-04-21 01:31:50 +00:00
Akira Hatanaka 19077aaee0 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 300930
2017-04-21 00:05:16 +00:00
Matthias Braun 9610a26251 X86RegisterInfo: eliminateFrameIndex: Avoid code duplication; NFC
X86RegisterInfo::eliminateFrameIndex() and
X86FrameLowering::getFrameIndexReference() both had logic to compute the
base register. This consolidates the code.

Also use MachineInstr::isReturn instead of manually enumerating tail
call instructions (return instructions were not included in the previous
list because they never reference frame indexes).

Differential Revision: https://reviews.llvm.org/D32206

llvm-svn: 300923
2017-04-20 23:34:50 +00:00
Matthias Braun 63e3e8ce72 X86RegisterInfo: eliminateFrameIndex: Force SP for AfterFPPop; NFC
AfterFPPop is used for tailcall/tailjump instructions. We shouldn't ever
have frame-pointer/base-pointer relative addressing for those. After all
the frame/base pointer should already be restored to their previous
values at the return.

Make this fact explicit in preparation for an upcoming refactoring.

Differential Revision: https://reviews.llvm.org/D32205

llvm-svn: 300922
2017-04-20 23:34:46 +00:00
Akira Hatanaka 7b06cebe73 Revert "[AArch64] Improve code generation for logical instructions taking"
This reverts r300913.

This broke bots.

llvm-svn: 300916
2017-04-20 23:03:30 +00:00
Akira Hatanaka e327f09832 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

llvm-svn: 300913
2017-04-20 22:47:56 +00:00
Benjamin Kramer 58dadd59d9 Fix use-after-frees on memory allocated in a Recycler.
This will become asan errors once the patch lands that poisons the
memory after free. The x86 change is a hack, but I don't see how to
solve this properly at the moment.

llvm-svn: 300867
2017-04-20 18:29:14 +00:00
Craig Topper bcfd2d1789 [APInt] Rename getSignBit to getSignMask
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.

Differential Revision: https://reviews.llvm.org/D32108

llvm-svn: 300856
2017-04-20 16:56:25 +00:00
Matthias Braun 372ee59766 X86FrameLowering: Fix getFrameIndexReference() for 'fixed' objects
Debug information is calculated with getFrameIndexReference() which was
missing some logic for the fixed object cases (= parameters on the stack).

rdar://24557797

Differential Revision: https://reviews.llvm.org/D32204

llvm-svn: 300781
2017-04-19 23:10:43 +00:00
Dehao Chen 58601674d2 PR32710: Disable using PMADDWD for unsigned short.
Summary: PMADDWD can only handle signed short.

Reviewers: mkuper, wmi

Reviewed By: mkuper

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32236

llvm-svn: 300737
2017-04-19 19:50:34 +00:00
Igor Breger 4fdf1e489c [GlobalIsel][X86] support G_TRUNC selection.
Summary:
[GlobalIsel][X86] support G_TRUNC selection.
Add regbank-select and legalizer tests. Currently legalization of trunc i64 on 32bit platform not supported.

Reviewers: ab, zvi, rovka

Reviewed By: zvi

Subscribers: dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D32115

llvm-svn: 300678
2017-04-19 11:34:59 +00:00
Sanjoy Das f09c1e346e Add a getPointerOperandType() helper to LoadInst and StoreInst; NFC
I will use this in a later change.

llvm-svn: 300613
2017-04-18 22:00:54 +00:00
Matt Arsenault 3138075dd4 DAG: Make mayBeEmittedAsTailCall parameter const
llvm-svn: 300603
2017-04-18 21:16:46 +00:00
Simon Pilgrim e8ad1da4e2 [X86] Use for-range loop. NFCI.
llvm-svn: 300567
2017-04-18 17:18:54 +00:00
Craig Topper fc947bcfba [APInt] Use lshrInPlace to replace lshr where possible
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.

This adds an lshrInPlace(const APInt &) version as well.

Differential Revision: https://reviews.llvm.org/D32155

llvm-svn: 300566
2017-04-18 17:14:21 +00:00
Konstantin Zhuravlyov dc77b2e960 Distinguish between code pointer size and DataLayout::getPointerSize() in DWARF info generation
llvm-svn: 300463
2017-04-17 17:41:25 +00:00
Benjamin Kramer f5f593b674 [X86] Remove special handling for 16 bit for A asm constraints.
Our 16 bit support is assembler-only + the terrible hack that is
.code16gcc. Simply using 32 bit registers does the right thing for the
latter.

Fixes PR32681.

llvm-svn: 300429
2017-04-16 20:13:08 +00:00
Dimitry Andric 909b3376ba Use correct registers for "A" inline asm constraint
Summary:
In PR32594, inline assembly using the 'A' constraint on x86_64 causes
llvm to crash with a "Cannot select" stack trace.  This is because
`X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A'
means the EAX and EDX registers.

However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86
(ia16?) it means the old AX and DX registers.

Add new register classes in `X86RegisterInfo.td` to support these cases,
and amend the logic in `getRegForInlineAsmConstraint` to cope with
different subtargets.  Also add a test case, derived from PR32594.

Reviewers: craig.topper, qcolombet, RKSimon, ab

Reviewed By: ab

Subscribers: ab, emaste, royger, llvm-commits

Differential Revision: https://reviews.llvm.org/D31902

llvm-svn: 300404
2017-04-15 22:15:01 +00:00
Reid Kleckner fb502d2f5e [IR] Make paramHasAttr to use arg indices instead of attr indices
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.

Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.

llvm-svn: 300367
2017-04-14 20:19:02 +00:00
Simon Pilgrim 5a22eaa2bf [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM)
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.

Clang companion patch: D31766.

Differential Revision: https://reviews.llvm.org/D31767

llvm-svn: 300325
2017-04-14 15:05:35 +00:00
Andrew V. Tischenko 75745d0c3e This patch closes PR#32216: Better testing of schedule model instruction latencies/throughputs.
The details are here: https://reviews.llvm.org/D30941

llvm-svn: 300311
2017-04-14 07:44:23 +00:00
Serge Pavlov 49acf9c8eb Use methods to access data stored with frame instructions
Instructions CALLSEQ_START..CALLSEQ_END and their target dependent
counterparts keep data like frame size, stack adjustment etc. These
data are accessed by getOperand using hard coded indices. It is
error prone way. This change implements the access by special methods,
which improve readability and allow changing data representation without
massive changes of index values.

Differential Revision: https://reviews.llvm.org/D31953

llvm-svn: 300196
2017-04-13 14:10:52 +00:00
Ayman Musa 62d1c71676 [X86] Added missing mayLoad/mayStore attributes to some X86 instructions.
Throughout the effort of automatically generating the X86 memory folding tables these missing information were encountered.
This is a preparation work for a future patch including the automation of these tables.

Differential Revision: https://reviews.llvm.org/D31714

llvm-svn: 300190
2017-04-13 10:03:45 +00:00
Ayman Musa c494718050 [X86] Change instructions names to keep consistency with the naming convention. NFC
Differential Revision: https://reviews.llvm.org/D31743

llvm-svn: 300184
2017-04-13 09:12:32 +00:00
Easwaran Raman 02a0e91831 Fix the bootstrap failure caused by r299986.
llvm-svn: 300069
2017-04-12 15:26:15 +00:00
Igor Breger 3b97ea39e7 [GlobalIsel][X86] support G_CONSTANT selection.
Summary: [GlobalISel][X86] support G_CONSTANT selection. Add regbank select tests.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: llvm-commits, dberris, rovka, kristof.beyls

Differential Revision: https://reviews.llvm.org/D31974

llvm-svn: 300057
2017-04-12 12:54:54 +00:00
Jonas Paulsson fccc7d66c3 [SystemZ] TargetTransformInfo cost functions implemented.
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.

Interleaved access vectorization enabled.

BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.

Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631

llvm-svn: 300052
2017-04-12 11:49:08 +00:00
Easwaran Raman ddb9ae192a [x86] Relax the check in areLoadsFromSameBasePtr
Check if the scale operand is identical (doesn't have to be 1) and
do not check the chaain operand.

Differential revision: https://reviews.llvm.org/D31833

llvm-svn: 299986
2017-04-11 21:05:02 +00:00
Davide Italiano 8455f7d623 [X86] Create the correct ADC/SBB SDNode when lowering add.
Differential Revision:  https://reviews.llvm.org/D31911

llvm-svn: 299973
2017-04-11 19:11:20 +00:00
Serge Guelton 59a2d7b909 Module::getOrInsertFunction is using C-style vararg instead of variadic templates.
From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments.
The variadic template is an obvious solution to both issues.

Differential Revision: https://reviews.llvm.org/D31070

llvm-svn: 299949
2017-04-11 15:01:18 +00:00
Diana Picus b050c7fbe0 Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299925 because it broke the buildbots. See e.g.
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/6008

llvm-svn: 299928
2017-04-11 10:07:12 +00:00
Serge Guelton 5fd75fb72e Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.

From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.

llvm-svn: 299925
2017-04-11 08:36:52 +00:00
Simon Pilgrim b6702eaec3 [X86][MMX] Add fast-isel support for MMX non-temporal writes
Differential Revision: https://reviews.llvm.org/D31754

llvm-svn: 299852
2017-04-10 16:58:07 +00:00
Dehao Chen 58fa724494 Use PMADDWD to expand reduction in a loop
Summary:
PMADDWD can help improve 8/16 bit integer mutliply-add operation performance for cases like:

for (int i = 0; i < count; i++)
  a += x[i] * y[i];

Reviewers: wmi, davidxl, hfinkel, RKSimon, zvi, mkuper

Reviewed By: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31679

llvm-svn: 299776
2017-04-07 15:41:52 +00:00
Igor Breger 2953788c36 [GlobalISel] implement narrowing for G_CONSTANT.
Summary: [GlobalISel] implement narrowing for G_CONSTANT.

Reviewers: bogner, zvi, t.p.northover

Reviewed By: t.p.northover

Subscribers: llvm-commits, dberris, rovka, kristof.beyls

Differential Revision: https://reviews.llvm.org/D31744

llvm-svn: 299772
2017-04-07 14:41:59 +00:00
Michael Kuperstein 6129887d21 [X86] Revert r299387 due to AVX legalization infinite loop.
llvm-svn: 299720
2017-04-06 22:33:25 +00:00
Mehdi Amini db11fdfda5 Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299699, the examples needs to be updated.

llvm-svn: 299702
2017-04-06 20:23:57 +00:00
Mehdi Amini 579540a8f7 Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.

From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.

Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>

Differential Revision: https://reviews.llvm.org/D31070

llvm-svn: 299699
2017-04-06 20:09:31 +00:00
Daniel Sanders 0b5293f6ae [globalisel][tablegen] Move <Target>InstructionSelector declarations to anonymous namespaces
Summary: This resolves the issue of tablegen-erated includes in the headers for non-GlobalISel builds in a simpler way than before.

Reviewers: qcolombet, ab

Reviewed By: ab

Subscribers: igorb, ab, mgorny, dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30998

llvm-svn: 299637
2017-04-06 09:49:34 +00:00
Keno Fischer 1ec5dd85a2 [X86 TTI] Implement LSV hook
Summary:
LSV wants to know the maximum size that can be loaded to a vector register.
On X86, this always matches the maximum register width. Implement this
accordingly and add a test to make sure that LSV can vectorize up to the
maximum permissible width on X86.

Reviewers: delena, arsenm

Reviewed By: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D31504

llvm-svn: 299589
2017-04-05 20:51:38 +00:00
Sanjay Patel b2f1621bb1 [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401)
This is a generic combine enabled via target hook to reduce icmp logic as discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401

It's likely that other targets will want to enable this hook for scalar transforms, 
and there are probably other patterns that can use bitwise logic to reduce comparisons.

Note that we are missing an IR canonicalization for these patterns, and we will probably
prefer the pair-of-compares form in IR (shorter, more likely to fold).

Differential Revision: https://reviews.llvm.org/D31483

llvm-svn: 299542
2017-04-05 14:09:39 +00:00
Simon Pilgrim 5fbd93b21a [X86][SSE] Renamed combine to make it clear that it only handles the vector shift by immediate opcodes. NFCI
llvm-svn: 299532
2017-04-05 10:44:42 +00:00
Alex Bradbury 866113c2ea Add MCContext argument to MCAsmBackend::applyFixup for error reporting
A number of backends (AArch64, MIPS, ARM) have been using
MCContext::reportError to report issues such as out-of-range fixup values in
their TgtAsmBackend. This is great, but because MCContext couldn't easily be
threaded through to the adjustFixupValue helper function from its usual
callsite (applyFixup), these backends ended up adding an MCContext* argument
and adding another call to applyFixup to processFixupValue. Adding an
MCContext parameter to applyFixup makes this unnecessary, and even better -
applyFixup can take a reference to MCContext rather than a potentially null
pointer.

Differential Revision: https://reviews.llvm.org/D30264

llvm-svn: 299529
2017-04-05 10:16:14 +00:00
Ahmed Bougacha ec8b1fb539 [X86] Relax assert in broadcast-of-subvector lowering.
Before r294774, there was a problem when lowering broadcasts to use
128-bit subvectors.

When we looked through a bitcast to find the broadcast input, we'd keep
using the original type, so you'd end up with things like:
  (v8f32 (broadcast
    (v4f32 (extract_subvector
      (v8i32 V),
      ...))
    ))

r294774 fixed it to always emit subvectors with the scalar type of the
original source.

It also introduced some asserts, to check that we use scalars with
the same size, and vectors with the same number of elements.

The scalar size equality is checked earlier when looking through bitcasts,
and is a useful assert.

However, the number of elements don't have to be identical: we're always
going to extract a 128-bit subvector, and we can have different size
inputs if we looked through a concat_vector to find a 256-bit source.

Relax the overzealous assert.

Replace it with a check of the original source vector being 256 or 512
bits.  If it's 128 bits, we can't extract_subvector from it.

Fixes PR32371.

llvm-svn: 299490
2017-04-05 00:14:39 +00:00
Sanjay Patel ac618383e3 [x86] remove dead select-of-constants transform; NFCI
https://reviews.llvm.org/D30537 / https://reviews.llvm.org/rL296977 added these transforms
and other related transforms to the generic DAGCombiner (with a hook that x86 sets to true),
so these patterns should not exist by the time we reach the target-specific combiner hook.

llvm-svn: 299448
2017-04-04 16:54:58 +00:00
Coby Tayree 2cb497afa4 [X86][MS-compatability]Allow named synonymous for MS-assembly operators
This patch enhances X86AsmParser's immediate expression parsing abilities, to include a named synonymous for selected binary/unary bitwise operators: {and,shl,shr,or,xor,not}, ultimately achieving better MS-compatability
MASM reference:
https://msdn.microsoft.com/en-us/library/94b6khh4.aspx

Differential Revision: D31277

llvm-svn: 299439
2017-04-04 14:43:23 +00:00
Simon Pilgrim 448222d8ba Strip trailing whitespace
llvm-svn: 299438
2017-04-04 14:40:53 +00:00
Michael Zuckerman 88fb171015 [X86][LLVM] Converting __mm{|256|512}_movm_epi{8|16|32|64} LLVMIR call into generic intrinsics.
This patch is a part one of two reviews, one for the clang and the other for LLVM. 
The patch deletes the back-end intrinsics and adds support for them in the auto upgrade.

Differential Revision: https://reviews.llvm.org/D31393

llvm-svn: 299432
2017-04-04 13:32:14 +00:00
Oren Ben Simhon 568fb197da [X86] Add 64 bit pattern matching for PSADBW
PSADBW pattern currently supports the 32 bit IR pattern and only GLT (greather than) comparison.
The patch extends the pattern to catch also 64 bit IR pattern and includes all other comparison types (not only GLT).

Differential Revision: https://reviews.llvm.org/D31577

llvm-svn: 299425
2017-04-04 10:23:18 +00:00
Simon Pilgrim af33757b5d [X86][SSE]] Lower BUILD_VECTOR with repeated elts as BUILD_VECTOR + VECTOR_SHUFFLE
It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging.

This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs an unary shuffle to duplicate the values.

There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch.

Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this.

Differential Revision: https://reviews.llvm.org/D31373

llvm-svn: 299387
2017-04-03 21:06:51 +00:00
Amjad Aboud 0389f62879 x86 interrupt calling convention: re-align stack pointer on 64-bit if an error code was pushed
The x86_64 ABI requires that the stack is 16 byte aligned on function calls. Thus, the 8-byte error code, which is pushed by the CPU for certain exceptions, leads to a misaligned stack. This results in bugs such as Bug 26413, where misaligned movaps instructions are generated.

This commit fixes the misalignment by adjusting the stack pointer in these cases. The adjustment is done at the beginning of the prologue generation by subtracting another 8 bytes from the stack pointer. These additional bytes are popped again in the function epilogue.

Fixes Bug 26413

Patch by Philipp Oppermann.

Differential Revision: https://reviews.llvm.org/D30049

llvm-svn: 299383
2017-04-03 20:28:45 +00:00
Craig Topper d33ee1b960 [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.

Differential Revision: https://reviews.llvm.org/D31565

llvm-svn: 299362
2017-04-03 16:34:59 +00:00
Simon Pilgrim 0e2f8cd875 [X86][MMX] Improve support for folding fptosi from XMM to MMX
llvm-svn: 299338
2017-04-02 17:45:41 +00:00