Commit Graph

7544 Commits

Author SHA1 Message Date
Craig Topper 707c89c00d [AVX512] Add non-temporal store patterns for v16i32/v32i16/v64i8.
llvm-svn: 268889
2016-05-08 23:43:17 +00:00
Craig Topper c41320d700 [AVX512] Add missing patterns for non-temporal stores of 128/256-bit vXi8/vXi16/vXi32 when VLX is enabled. The equivalent AVX1/2 patterns are disabled by VLX.
This caused regular stores to be emitted instead.

llvm-svn: 268886
2016-05-08 23:08:45 +00:00
Craig Topper e5ce84a33c [AVX512] Add VLX 128/256-bit SET0 operations that encode to 128/256-bit EVEX encoded VPXORD so all 32 registers can be used.
llvm-svn: 268884
2016-05-08 21:33:53 +00:00
Craig Topper 298b6d7493 [X86] Re-generate tests using update_llc_test_checks.py to prepare for a future commit. NFC
llvm-svn: 268883
2016-05-08 21:33:47 +00:00
Craig Topper 092794b82a Remove Windows line endings in some tests to prepare for a future commit. NFC
llvm-svn: 268882
2016-05-08 21:33:44 +00:00
Craig Topper d681e23336 [X86] Lower 256-bit vector all-zero constants to v8i32 even with AVX1 only. Either way a 256-bit VXORPS will be used.
llvm-svn: 268873
2016-05-08 07:10:54 +00:00
Craig Topper 3d6722910c [X86] Add patterns for 256-bit non-temporal stores when only AVX1 is supported. While there, add a predicate to the SSE2 patterns to avoid an ordering dependency.
llvm-svn: 268872
2016-05-08 07:10:50 +00:00
Craig Topper d788498411 [X86] No need to avoid selecting AVX_SET0 for 256-bit integer types when only AVX1 is supported. AVX_SET0 just expands to 256-bit VXORPS which is legal in AVX1.
llvm-svn: 268871
2016-05-08 07:10:47 +00:00
Simon Pilgrim b6f82c449a [SelectionDAG] Added bitreverse(bitreverse(v)) --> v
Added bitreverse creation testing

llvm-svn: 268865
2016-05-07 20:12:36 +00:00
Simon Pilgrim 8ef046a8ca [X86] Added BITREVERSE constant folding and identity tests
Identity tests are currently failing - this will be fixed soon

llvm-svn: 268862
2016-05-07 19:04:00 +00:00
Sanjay Patel c2751e7050 [x86, BMI] add TLI hook for 'andn' and use it to simplify comparisons
For the sake of minimalism, this patch is x86 only, but I think that at least
PPC, ARM, AArch64, and Sparc probably want to do this too.

We might want to generalize the hook and pattern recognition for a target like
PPC that has a full assortment of negated logic ops (orc, nand).

Note that http://reviews.llvm.org/D18842 will cause this transform to trigger
more often.

For reference, this relates to:
https://llvm.org/bugs/show_bug.cgi?id=27105
https://llvm.org/bugs/show_bug.cgi?id=27202
https://llvm.org/bugs/show_bug.cgi?id=27203
https://llvm.org/bugs/show_bug.cgi?id=27328

Differential Revision: http://reviews.llvm.org/D19087

llvm-svn: 268858
2016-05-07 15:03:40 +00:00
Ahmed Bougacha 04a8fc2e37 [X86] Teach X86FixupBWInsts to promote MOV8rr/MOV16rr to MOV32rr.
This re-applies r268760, reverted in r268794.
Fixes http://llvm.org/PR27670

The original imp-defs assertion was way overzealous: forward all
implicit operands, except imp-defs of the new super-reg def (r268787
for GR64, but also possible for GR16->GR32), or imp-uses of the new
super-reg use.
While there, mark the source use as Undef, and add an imp-use of the
old source reg: that should cover any case of dead super-regs.

At the stage the pass runs, flags are unlikely to matter anyway;
still, let's be as correct as possible.

Also add MIR tests for the various interesting cases.

Original commit message:
Codesize is less (16) or equal (8), and we avoid partial
dependencies.

Differential Revision: http://reviews.llvm.org/D19999

llvm-svn: 268831
2016-05-07 01:11:17 +00:00
Quentin Colombet a09f050dc1 Revert "[X86] Add a new LOW32_ADDR_ACCESS_RBP register class."
This reverts commit r268796.
I believe it breaks test/CodeGen/X86/asm-mismatched-types.ll with:
Cannot emit physreg copy instruction

llvm-svn: 268799
2016-05-06 21:21:50 +00:00
Quentin Colombet 2728074e3c [X86] Add a new LOW32_ADDR_ACCESS_RBP register class.
ABIs like NaCl uses 32-bit addresses but have 64-bit frame.
The new register class reflects those constraints when choosing a
register class for a address access.

llvm-svn: 268796
2016-05-06 21:10:53 +00:00
Nico Weber 9b32b4fbee Revert r268760, it caused PR27670.
llvm-svn: 268794
2016-05-06 21:07:02 +00:00
Eric Christopher e3f7d3df3c The associated PR for this test was PR27135 not PR27132.
llvm-svn: 268772
2016-05-06 18:23:14 +00:00
Ahmed Bougacha 258426ca7a [X86] Teach X86FixupBWInsts to promote MOV8rr/MOV16rr to MOV32rr.
Codesize is less (16) or equal (8), and we avoid partial dependencies.

Differential Revision: http://reviews.llvm.org/D19999

llvm-svn: 268760
2016-05-06 17:42:57 +00:00
Ahmed Bougacha 16547c4e31 [CodeGen] Round [SU]INT_TO_FP result when promoting from f16.
If we don't, values that aren't precisely representable in f16 could
be used as-is in a promoted f32 operation, which would produce
incorrect results.

AArch64 had the correct behavior; add a focused test.

Fixes http://llvm.org/PR26871

llvm-svn: 268700
2016-05-06 00:58:00 +00:00
Marcin Koscielnicki 0275fac2c9 [X86] Extend some Linux special cases to cover kFreeBSD.
Both Linux and kFreeBSD use glibc, so follow similiar code paths.
Add isTargetGlibc to check for this, and use it instead of isTargetLinux
in a few places.

Fixes PR22248 for kFreeBSD.

Differential Revision: http://reviews.llvm.org/D19104

llvm-svn: 268624
2016-05-05 11:35:51 +00:00
David Majnemer 911d0e3c21 [X86] Use the right type when folding xor (truncate (shift)) -> setcc
The result type of setcc is dependent on whether or not AVX512 is
present.
We had an X86-specific DAG-combine which assumed that the result type
should be i8 when it could be i1.
This meant that we would generate illegal setccs which LowerSETCC did
not like.

Instead, use an appropriate type and zero extend to i8.

Also, there were some scenarios where the fold should have fired but
didn't because we were overly cautious about the types.  This meant that
we generated:

        shrl    $31, %edi
        andl    $1, %edi
        kmovw   %edi, %k0
        kxnorw  %k0, %k0, %k1
        kshiftrw        $15, %k1, %k1
        kxorw   %k1, %k0, %k0
        kmovw   %k0, %eax

instead of:

        testl   %edi, %edi
        setns   %al

This fixes PR27638.

llvm-svn: 268609
2016-05-05 06:00:56 +00:00
Quentin Colombet 0c5bfd0514 [X86] Add a few register classes for x32 address accesses.
The new register classes allow to tell the machine verifier that it is
fine to use RIP for address accesses in x32 mode. Prior to that patch,
we would complain that we are using a GR64 in place of GR32, whereas it
is actually fine to use GR64 for x32 as long as the 32 high bits are 0s.
RIP has this property and is used for RIP-relative addressing.

This partially fixes http://llvm.org/PR27481.

llvm-svn: 268567
2016-05-04 22:45:31 +00:00
Simon Pilgrim 1f5ad702f8 [SelectionDAG] BITREVERSE vector legalization of bit operations (REAPPLIED)
Some vector bit operations are promoted instead of having custom lowering. This patch changes the isOperationLegalOrCustom tests for vector AND/OR operations to use a new TLI helper isOperationLegalOrCustomOrPromote instead, allowing the SSE implementations to stay on the simd unit.

Differential Revision: http://reviews.llvm.org/D19805

llvm-svn: 268561
2016-05-04 22:08:51 +00:00
Sanjay Patel 13d57b94bb [x86] add tests to show current codegen for obscured fneg/fabs
llvm-svn: 268533
2016-05-04 19:06:03 +00:00
Simon Pilgrim 1a14f0d25c Revert r268504
llvm-svn: 268526
2016-05-04 17:49:14 +00:00
Simon Pilgrim bc0e1d7492 [X86][SSE] Regenerate vector bswap tests
llvm-svn: 268514
2016-05-04 15:45:48 +00:00
Simon Pilgrim b97c06210b [SelectionDAG] BITREVERSE vector legalization of bit operations
Vector bit operations are typically promoted instead of having custom lowering. This patch changes the isOperationLegalOrCustom tests for vector AND/OR operations to use isOperationLegalOrPromote instead, allowing the SSE implementations to stay on the simd unit.

Differential Revision: http://reviews.llvm.org/D19805

llvm-svn: 268504
2016-05-04 15:01:13 +00:00
Elena Demikhovsky 24aba1ca38 The test files are auto-generated by update_llc_test_checks.py utility.
No functional changes.

llvm-svn: 268498
2016-05-04 14:31:18 +00:00
David Majnemer 2c5aeabedd [X86] Lower zext i1 arguments
i1 is now a legal type for X86 with AVX512.
There were some paths in X86FastISel which were not quite ready to see
an i1 value: they were not quite sure how to deal with sign/zero extends
for call arguments.
DTRT by extending to i8 for zeroext and bailing out of FastISel for
signext.

This fixes PR27591.

llvm-svn: 268470
2016-05-04 00:22:23 +00:00
Simon Pilgrim fb1766ad68 [X86][XOP] Add placeholder VPERMIL2 combining tests
llvm-svn: 268450
2016-05-03 21:55:37 +00:00
Tim Northover d2ecbccf27 X86-Darwin: start emitting data-region directives for jump-tables.
The surrounding tools can cope these days, and they were invented for a reason.

llvm-svn: 268437
2016-05-03 21:03:41 +00:00
Quentin Colombet 26dab3a485 [ImplicitNullChecks] Account for implicit-defs as well when updating the liveness.
The replaced load may have implicit-defs and those defs may be used
in the block of the original load. Make sure to update the liveness
accordingly.

This is a generalization of r267817.

llvm-svn: 268412
2016-05-03 18:09:06 +00:00
Simon Pilgrim d2752708a3 [X86][SSE] Added target shuffle combine to MOVQ
llvm-svn: 268391
2016-05-03 15:05:13 +00:00
Simon Pilgrim 32e78c3ff7 [X86][SSSE3] Missing combine opportunity to simplify to a MOVQ shuffle
llvm-svn: 268378
2016-05-03 13:12:44 +00:00
Igor Breger 58c07806ae [AVX512] Add support for commutative MAX/MIN . In general VMAX{PS,PD} and VMIN{PS,PD} instruction are not commutative . In combine pass only if UnsafeFPMath are used VMAX/VMAX are converted to commutative nodes VMAXC/VMAXC.
Differential Revision: http://reviews.llvm.org/D19860

llvm-svn: 268375
2016-05-03 11:51:45 +00:00
Igor Breger ab076c683c [AVX512] Fix lowerV4X128VectorShuffle to select correctly input operands .
Differential Revision: http://reviews.llvm.org/D19803

llvm-svn: 268368
2016-05-03 08:08:44 +00:00
Quentin Colombet 776e6de516 [MachineBlockPlacement] Let the target optimize the branches at the end.
After the layout of the basic blocks is set, the target may be able to get rid
of unconditional branches to fallthrough blocks that the generic code does not
catch. This happens any time TargetInstrInfo::AnalyzeBranch is not able to
analyze all the branches involved in the terminators sequence, while still
understanding a few of them.

In such situation, AnalyzeBranch can directly modify the branches if it has been
instructed to do so.

This patch takes advantage of that.

llvm-svn: 268328
2016-05-02 22:58:59 +00:00
Quentin Colombet 4e1d389ac5 [X86] Model FAULTING_LOAD_OP as a terminator and branch.
This operation may branch to the handler block and we do not want it
to happen anywhere within the basic block.
Moreover, by marking it "terminator and branch" the machine verifier
does not wrongly assume (because of AnalyzeBranch not knowing better)
the branch is analyzable. Indeed, the target was seeing only the
unconditional branch and not the faulting load op and thought it was
a simple unconditional block.
The machine verifier was complaining because of that and moreover,
other optimizations could have done wrong transformation!

In the process, simplify the representation of the handler block in
the faulting load op. Now, we directly reference the handler block
instead of using a label. This has the benefits of:
1. MC knows how to issue a label for a BB, so leave that to it.
2. Accessing the target BB from its label is painful, whereas it is
   direct from a MBB operand.

Note: The 2 bytes offset in implicit-null-check.ll comes from the
fact the unconditional jumps are not removed anymore, as the whole
terminator sequence is not analyzable anymore.

Will fix it in a subsequence commit.

llvm-svn: 268327
2016-05-02 22:58:54 +00:00
Simon Pilgrim 21b2c5660e [X86][AVX2] Added 128-bit wide shuffle test
Demonstrate missing 128-bit wide shuffle combine support

llvm-svn: 268290
2016-05-02 19:46:58 +00:00
David L Kreitzer 0fe4632bd7 Enable the X86 call frame optimization for the 64-bit targets that allow it.
Fixes PR27241.

Differential Revision: http://reviews.llvm.org/D19688

llvm-svn: 268227
2016-05-02 13:45:25 +00:00
Craig Topper b6da65403a [AVX512] VPACKUSWB/VPACKSSWB should not be encoded with EVEX.W=1. While there fix the execution domain for VPACKSSDW/VPACKUSDW.
llvm-svn: 268200
2016-05-01 17:38:32 +00:00
Igor Breger 110af565c7 getelementptr instruction, support index vector of EVT.
Differential Revision: http://reviews.llvm.org/D19775

llvm-svn: 268195
2016-05-01 13:29:12 +00:00
Igor Breger 131008fbcb Change AVX512 braodcastsd/ss patterns interaction with spilling . New implementation take a scalar register and generate a vector without COPY_TO_REGCLASS (turn it into a VR128 register ) .The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth.
Differential Revision: http://reviews.llvm.org/D19579

llvm-svn: 268190
2016-05-01 08:40:00 +00:00
Craig Topper e430de8be6 [AVX512] Prefer AVX512 VPACK instructions over AVX/AVX2 instructions when VLX and BWI are supported.
llvm-svn: 268189
2016-05-01 06:52:19 +00:00
Haicheng Wu 4afe0425db [MBP] Use Function::optForSize() instead of checking OptimizeForSize directly.
Fix a FIXME.  Disable loop alignment if compiled with -Oz now.

llvm-svn: 268121
2016-04-29 22:01:10 +00:00
Sriraman Tallam 7da9b445ea Differential Revision: http://reviews.llvm.org/D19733
llvm-svn: 268106
2016-04-29 21:19:16 +00:00
Matt Arsenault ab2232cf73 DAGCombiner: Reduce truncated shl width
llvm-svn: 268094
2016-04-29 19:53:16 +00:00
Mitch Bodart e60465ddf7 [X86] Enable the post-RA-scheduler for clang's default 32-bit cpu.
For compilations with no explicit cpu specified, this exhibits
nice gains on Silvermont, with neutral performance on big cores.

Differential Revision: http://reviews.llvm.org/D19138

llvm-svn: 267809
2016-04-27 22:52:35 +00:00
Quentin Colombet bf200688de [X86][FastISel] Make sure we use the right register class when we select stores.
llvm-svn: 267806
2016-04-27 22:33:42 +00:00
Quentin Colombet d6dbec4c6f [X86] Fix the lowering of TLS calls.
The callseq_end node must be glued with the TLS calls, otherwise,
the generic code will miss the uses of the returned value and will
mark it dead.
Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI,
the pseudo uses the symbol address at this point not RDI and the
lowering will do the right thing.

llvm-svn: 267797
2016-04-27 21:37:37 +00:00
Kevin B. Smith c378a99ba5 [X86]: Quit promoting 16 bit loads to 32 bit.
Differential Revision: http://reviews.llvm.org/D19592

llvm-svn: 267773
2016-04-27 19:58:03 +00:00
Nico Weber e69b9548b8 Revert r267649, it caused PR27539.
llvm-svn: 267723
2016-04-27 15:16:54 +00:00
Ahmed Bougacha 19a2ee591a [X86] Don't assume that MMX extractelts are from index 0.
It's probably the case for all 3 MMX users out there, but with
hand-crafted IR, you can trigger selection failures. Fix that.

llvm-svn: 267652
2016-04-27 01:35:29 +00:00
Ahmed Bougacha e68363a03c [X86] Re-enable MMX i32 extractelt combine.
This effectively adds back the extractelt combine removed by r262358:
the direct case can still occur (because x86_mmx is special, see
r262446), but it's the indirect case that's now superseded by the
generic combine.

llvm-svn: 267651
2016-04-27 01:35:25 +00:00
Cong Hou 6f879d9eb1 Detects the SAD pattern on X86 so that much better code will be emitted once the pattern is matched.
Differential revision: http://reviews.llvm.org/D14840

llvm-svn: 267649
2016-04-27 01:29:18 +00:00
Quentin Colombet 4ff3cfb673 [X86] Make sure it is safe to clobber EFLAGS, if need be, when choosing
the prologue.

Do not use basic blocks that have EFLAGS live-in as prologue if we need
to realign the stack. Realigning the stack uses AND instruction and this
clobbers EFLAGS.

An other alternative would have been to save and restore EFLAGS around
the stack realignment code, but this is likely inefficient.

Fixes PR27531.

llvm-svn: 267634
2016-04-26 23:44:14 +00:00
Mitch Bodart 807e13379b [X86] Replace -mcpu with -mattr in several tests
Differential Revision: http://reviews.llvm.org/D19568

llvm-svn: 267629
2016-04-26 23:36:38 +00:00
Quentin Colombet 08e79990a0 [MachineBasicBlock] Take advantage of the partially dead information.
Thanks to that information we wouldn't lie on a register being live whereas it
is not.

llvm-svn: 267622
2016-04-26 23:14:29 +00:00
Quentin Colombet 3f19245015 [MachineInstrBundle] Improvement the recognition of dead definitions.
Now, it is possible to know that partial definitions are dead definitions and
recognize that clobbered registers are also dead.

llvm-svn: 267621
2016-04-26 23:14:24 +00:00
Manman Ren 1c3f65a18c Swift Calling Convention: use %RAX for sret.
We don't need to copy the sret argument into %rax upon return.
rdar://25671494

llvm-svn: 267579
2016-04-26 18:08:06 +00:00
Sanjay Patel d66607bd8c [CodeGenPrepare] use branch weight metadata to decide if a select should be turned into a branch
This is part of solving PR27344:
https://llvm.org/bugs/show_bug.cgi?id=27344

CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this
same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the
IR further with a select in place.

For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly
predictable branch. Even the most limited HW branch predictors will be correct on this branch almost
all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all
the times the branch was predicted correctly.

As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's
MispredictPenalty. Or we could just let targets override the default by implementing the hook with that
and other target-specific options. Note that trying to statically determine mispredict rates for 
close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie, 
50/50 taken/not-taken might still be 100% predictable.

Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable()
branch weight default values are 4 and 64. A proposal to change that is in D19435.

Differential Revision: http://reviews.llvm.org/D19488

llvm-svn: 267572
2016-04-26 17:11:17 +00:00
Andrey Turetskiy b405606432 [X86] PR27502: Fix the LEA optimization pass.
Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass.

Differential Revision: http://reviews.llvm.org/D19409

llvm-svn: 267551
2016-04-26 12:18:12 +00:00
Ahmed Bougacha 5cf735a5b1 [X86] Use LivePhysRegs in X86FixupBWInsts.
Kill-flags, which computeRegisterLiveness uses, are not reliable.
LivePhysRegs is.

Differential Revision: http://reviews.llvm.org/D19472

llvm-svn: 267495
2016-04-26 00:00:48 +00:00
Sanjay Patel 43c7af6889 add tests for potential CGP transform (PR27344)
llvm-svn: 267426
2016-04-25 16:56:52 +00:00
Sanjay Patel 0fc4137065 [x86] auto-generate checks for cmov tests
llvm-svn: 267417
2016-04-25 15:26:57 +00:00
David Majnemer dd21523653 [WinEH] Update SplitAnalysis::computeLastSplitPoint to cope with multiple EH successors
We didn't have logic to correctly handle CFGs where there was more than
one EH-pad successor (these are novel with WinEH).
There were situations where a register was live in one exceptional
successor but not another but the code as written would only consider
the first exceptional successor it found.

This resulted in split points which were insufficiently early if an
invoke was present.

This fixes PR27501.

N.B.  This removes getLandingPadSuccessor.

llvm-svn: 267412
2016-04-25 14:31:32 +00:00
Michael Zuckerman 1bd66dd1c2 Fixing wrong mask size error. From __mmask8 to __mmask16.
Was reviewed over the shoulder by AsafBadouh.
Connected to review http://reviews.llvm.org/D19195.

llvm-svn: 267379
2016-04-25 05:27:51 +00:00
Craig Topper 61a14911b2 [X86] Add a complete set of tests for all operand sizes of cttz/ctlz with and without zero undef being lowered to bsf/bsr.
llvm-svn: 267373
2016-04-25 01:01:15 +00:00
Simon Pilgrim 646c2a5569 [X86][AVX] Added PR24935 test case
llvm-svn: 267362
2016-04-24 20:30:48 +00:00
Simon Pilgrim 2d0104cc47 [X86][SSE] Added SSSE3/AVX/AVX2 BITREVERSE tests
Codegen is pretty bad at the moment but could use PSHUFB quite efficiently 

llvm-svn: 267347
2016-04-24 15:45:06 +00:00
Simon Pilgrim f379a6c684 [X86][XOP] Fixed VPPERM permute op decoding (PR27472).
Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask.

llvm-svn: 267346
2016-04-24 15:05:04 +00:00
Simon Pilgrim 9f5697ef68 [X86][SSE] Improved support for decoding target shuffle masks through bitcasts
Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transfered to xmm.

This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472.

llvm-svn: 267343
2016-04-24 14:53:54 +00:00
Simon Pilgrim 7eedee938f [X86][SSE] Demonstrate issue with decoding shuffle masks that have been lowered as rematerialized constants on scalar unit
Found whilst investigating PR27472

llvm-svn: 267339
2016-04-24 13:45:30 +00:00
Craig Topper 601b6c69bc [X86] Fix patterns that turn cmove/cmovne+ctlz/cttz into lzcnt/tzcnt instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly.
llvm-svn: 267311
2016-04-24 02:01:22 +00:00
Duncan P. N. Exon Smith a59d3e5af8 DebugInfo: Remove MDString-based type references
Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around
DIType*.  It is no longer legal to refer to a DICompositeType by its
'identifier:', and DIBuilder no longer retains all types with an
'identifier:' automatically.

Aside from the bitcode upgrade, this is mainly removing logic to resolve
an MDString-based reference to an actualy DIType.  The commits leading
up to this have made the implicit type map in DICompileUnit's
'retainedTypes:' field superfluous.

This does not remove DITypeRef, DIScopeRef, DINodeRef, and
DITypeRefArray, or stop using them in DI-related metadata.  Although as
of this commit they aren't serving a useful purpose, there are patchces
under review to reuse them for CodeView support.

The tests in LLVM were updated with deref-typerefs.sh, which is attached
to the thread "[RFC] Lazy-loading of debug info metadata":

  http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html

llvm-svn: 267296
2016-04-23 21:08:00 +00:00
Simon Pilgrim 9448b112de [X86][XOP] Added VPPERM -> BLEND-WITH-ZERO Test
Currently failing due to poor blend matching, found whilst investigating PR27472

llvm-svn: 267282
2016-04-23 11:14:18 +00:00
Craig Topper 7e5fad66f3 [CodeGen] When promoting CTTZ operations to larger type, don't insert a select to detect if the input is zero to return the original size instead of the extended size. Instead just set the first bit in the zero extended part.
llvm-svn: 267280
2016-04-23 05:20:47 +00:00
Sriraman Tallam 3cb773431d Differential Revision: http://reviews.llvm.org/D19040
llvm-svn: 267229
2016-04-22 21:41:58 +00:00
Matt Arsenault 3b748d76f6 DAGCombiner: Relax alignment restriction when changing store type
If the target allows the alignment, this should be OK.

llvm-svn: 267217
2016-04-22 21:01:41 +00:00
Peter Collingbourne 265ebd7d70 CodeGen: Use PLT relocations for relative references to unnamed_addr functions.
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.

Also includes a bonus feature: addends for COFF image-relative references.

Differential Revision: http://reviews.llvm.org/D17938

llvm-svn: 267211
2016-04-22 20:40:10 +00:00
Matt Arsenault 629d12de70 DAGCombiner: Relax alignment restriction when changing load type
If the target allows the alignment, this should still be OK.

llvm-svn: 267209
2016-04-22 20:21:36 +00:00
Nirav Dave 9a878c4930 Emit code16 in assembly in 16-bit mode
Summary:
When generating assembly using -m16 we must explicitly mark it as
16-bit. Emit .code16 at beginning of file. Fixes wrong results when
using -fno-integrated-as.

Reviewers: dwmw2

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19392

llvm-svn: 267152
2016-04-22 13:36:11 +00:00
Craig Topper 59479e7208 [AVX512] Teach lowering to use vplzcntd/q to implement 128/256-bit CTTZ_ZERO_UNDEF even without VLX support. We can just extend to 512-bits and extract like we do for CTLZ.
llvm-svn: 267100
2016-04-22 03:22:38 +00:00
Matt Arsenault 8d1052f55c DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit component
If the extracted bits are restricted to the upper half or lower half,
this can be truncated.

llvm-svn: 267024
2016-04-21 18:03:06 +00:00
Craig Topper 21690db05a [AVX512] Add CTTZ support for v8i64 and v16i32 vectors.
llvm-svn: 266968
2016-04-21 07:30:06 +00:00
Craig Topper 89d7a76d88 [X86] Fix vector-tzcnt-512 test to disable CDI while enabling BWI for one of the runs. Update check patterns accordingly.
llvm-svn: 266967
2016-04-21 07:30:03 +00:00
Craig Topper 4c07b0f896 Fix test command line to explicitly disable CDI instructions for one test.
llvm-svn: 266966
2016-04-21 07:29:59 +00:00
Craig Topper 340ad0a0c9 [AVX512] Add support for lowering CTTZ v64i8 and v32i16 with BWI instructions.
llvm-svn: 266963
2016-04-21 06:39:34 +00:00
Craig Topper 3dd625ce79 [AVX512] Add support for popcount of v8i64 and v16i32 with and without BWI instructions.
Without BWI we have to split the vectors into 256-bit vectors so we can use AVX2 pshufb and then concatenate the results.

llvm-svn: 266950
2016-04-21 03:57:24 +00:00
Asaf Badouh 89406d1815 [X86] enable PIE for functions
Call locally defined function directly for PIE/fPIE

Differential Revision: http://reviews.llvm.org/D19226

llvm-svn: 266863
2016-04-20 08:32:57 +00:00
Craig Topper ddf022a337 [AVX512] Add avx512cd+vl runs to vector-tzcnt-128/256 tests to show using the vplzcntd/q instructions.
llvm-svn: 266860
2016-04-20 05:19:01 +00:00
Craig Topper 7e37746011 [AVX512] Update vector-tzcnt-512 test to show how bad v32i16 and v64i8 is with avx512bw enabled.
llvm-svn: 266859
2016-04-20 05:18:58 +00:00
Craig Topper 99e60e9f1f [AVX512] Add popcount support for v32i16 and v64i8.
llvm-svn: 266858
2016-04-20 05:18:55 +00:00
Mandeep Singh Grang 029a0567fa [LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.
Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests.

Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier

Subscribers: mcrosier, dsanders

Differential Revision: http://reviews.llvm.org/D19279

llvm-svn: 266834
2016-04-19 23:51:52 +00:00
Tim Shen e885d5e4d3 [SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARD
With this change, ideally IR pass can always generate llvm.stackguard
call to get the stack guard; but for now there are still IR form stack
guard customizations around (see getIRStackGuard()). Future SSP
customization should go through LOAD_STACK_GUARD.

There is a behavior change: stack guard values are not CSEed anymore,
since we should never reuse the value in case that it has been spilled (and
corrupted). See ssp-guard-spill.ll. This also cause the change of stack
size and codegen in X86 and AArch64 test cases.

Ideally we'd like to know if the guard created in llvm.stackprotector() gets
spilled or not. If the value is spilled, discard the value and reload
stack guard; otherwise reuse the value. This can be done by teaching
register allocator to know how to rematerialize LOAD_STACK_GUARD and
force a rematerialization (which seems hard), or check for spilling in
expandPostRAPseudo. It only makes sense when the stack guard is a global
variable, which requires more instructions to load. Anyway, this seems to go out
of the scope of the current patch.

llvm-svn: 266806
2016-04-19 19:40:37 +00:00
Simon Pilgrim 32b1c9fe7f [X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles
Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not.

Differential Revision: http://reviews.llvm.org/D19228

llvm-svn: 266728
2016-04-19 12:26:40 +00:00
Sanjoy Das c0441c29df Introduce a "patchable-function" function attribute
Summary:
The `"patchable-function"` attribute can be used by an LLVM client to
influence LLVM's code generation in ways that makes the generated code
easily patchable at runtime (for instance, to redirect control).
Right now only one patchability scheme is supported,
`"prologue-short-redirect"`, but this can be expanded in the future.

Reviewers: joker.eph, rnk, echristo, dberris

Subscribers: joker.eph, echristo, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19046

llvm-svn: 266715
2016-04-19 05:24:47 +00:00
Simon Pilgrim 9c7ea2dfaa [X86][SSE] Test case for PR2585
llvm-svn: 266669
2016-04-18 21:07:49 +00:00
Simon Pilgrim 647e9f80af [X86][AVX] Added extra memory folding tests for D19228
llvm-svn: 266662
2016-04-18 19:48:16 +00:00
Simon Pilgrim 6802410f42 [X86][AVX] Added zero+blend vs vperm2f128 optsize tests cases (PR22984)
We should be trying to use vperm2f128 instead of zero+blend (if we're only the user of zero?) when optsize is enabled.

llvm-svn: 266632
2016-04-18 17:14:04 +00:00
Simon Pilgrim ec4f40b6ee [X86][AVX] Renamed vperm2f128 test to make it quicker to review
missed one the first time round...

llvm-svn: 266623
2016-04-18 16:08:19 +00:00
Simon Pilgrim 1a87a93dff [X86][AVX] Renamed vperm2f128 tests to make it quicker to review
llvm-svn: 266621
2016-04-18 15:37:45 +00:00
Simon Pilgrim 6ccbd2bdd2 [X86][SSE] Added 16i8 -> 8i64 sext test
Shows poor codegen for AVX2

llvm-svn: 266560
2016-04-17 15:10:42 +00:00
Craig Topper 75869d5701 [AVX512] ISD::MUL v2i64/v4i64 should only be legal if DQI and VLX features are enabled.
llvm-svn: 266554
2016-04-17 07:25:39 +00:00
Simon Pilgrim b75744ceae [X86][AVX] Add shuffle combine tests for MOVDDUP/MOVSHDUP/MOVSLDUP
128, 256 and 512 bit implementations (some not yet supported by combineX86ShuffleChain)

llvm-svn: 266535
2016-04-16 20:30:59 +00:00
Simon Pilgrim fd4b9b02a3 [X86][XOP] Added VPPERM constant mask decoding and target shuffle combining support
Added additional test that peeks through bitcast to v16i8 mask

llvm-svn: 266533
2016-04-16 17:52:07 +00:00
Simon Pilgrim 4eb94a93a0 [X86][XOP] More VPPERM shuffle mask decode tests
As requested by D18441

llvm-svn: 266531
2016-04-16 16:37:21 +00:00
Wei Mi 963f2df4d2 Don't skip splitSeparateComponents in eliminateDeadDefs for HoistSpillHelper::hoistAllSpills.
Because HoistSpillHelper::hoistAllSpills is called in postOptimization, before the
patch we didn't want LiveRangeEdit::eliminateDeadDefs to call splitSeparateComponents
and generate unassigned new vregs. However, skipping splitSeparateComponents will make
verify-machineinstrs unhappy, so I remove the early return, and use
HoistSpillHelper::LRE_DidCloneVirtReg to assign physreg/stackslot for those new vregs.

In addition, some code reorganization to make class HoistSpillHelper privately inheriting
from LiveRangeEdit::Delegate possible. This is to be consistent with class RAGreedy and
class RegisterCoalescer.

Differential Revision: http://reviews.llvm.org/D19142

llvm-svn: 266489
2016-04-15 23:16:44 +00:00
Hans Wennborg 07f6d3a893 Switch lowering: don't add incoming PHI values from skipped bit test MBB's (PR27135)
After r245976, LLVM will skip the last bit test case if knows it will always be
true. However, we would still erroneously update PHI nodes with incoming values
from the MBB that would perform the final bit test, causing -verify-machineinstrs
to fail.

llvm-svn: 266479
2016-04-15 21:45:30 +00:00
Adrian Prantl 75819aedf6 [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.

Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.

Motivation
----------

Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.

We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.

Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.

http://reviews.llvm.org/D19034
<rdar://problem/25256815>

llvm-svn: 266446
2016-04-15 15:57:41 +00:00
Adam Nemet 7aab648831 Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
This reverts commit r266086.

It breaks the LTO build of gcc in SPEC2000.

llvm-svn: 266282
2016-04-14 08:47:17 +00:00
Matthias Braun 588d1cdad4 X86: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.

Differential Revision: http://reviews.llvm.org/D18902

llvm-svn: 266252
2016-04-13 21:43:21 +00:00
Sanjay Patel d6c53f8727 [x86] add tests to show potential BMI optimization
llvm-svn: 266243
2016-04-13 20:40:43 +00:00
Simon Pilgrim cc901b57d5 [X86][SSE] Regenerated vector integer absolute tests
llvm-svn: 266194
2016-04-13 12:40:22 +00:00
Simon Pilgrim b6ef8b730d Added missing autogeneration note
llvm-svn: 266185
2016-04-13 09:28:44 +00:00
Wei Mi 9a16d655c7 Recommit r265547, and r265610,r265639,r265657 on top of it, plus
two fixes with one about error verify-regalloc reported, and
another about live range update of phi after rematerialization.

r265547:
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.

analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.

To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.

Patches on top of r265547:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"

Differential Revision: http://reviews.llvm.org/D15302
Differential Revision: http://reviews.llvm.org/D18934
Differential Revision: http://reviews.llvm.org/D18935
Differential Revision: http://reviews.llvm.org/D18936

llvm-svn: 266162
2016-04-13 03:08:27 +00:00
Artur Pilipenko dbe0bc8df4 Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmittion of 263158 change.

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 266086
2016-04-12 15:58:04 +00:00
Than McIntosh b9049084a0 Test commit, NFC.
Adds a blank line.

llvm-svn: 266082
2016-04-12 15:35:05 +00:00
Simon Pilgrim edf100e4e7 [X86] Regenerated avx512 calling convention test checks
llvm-svn: 266070
2016-04-12 13:31:01 +00:00
Evgeniy Stepanov f17120a85f [safestack] Add canary to unsafe stack frames
Add StackProtector to SafeStack. This adds limited protection against
data corruption in the caller frame. Current implementation treats
all stack protector levels as -fstack-protector-all.

llvm-svn: 266004
2016-04-11 22:27:48 +00:00
Simon Pilgrim 82e54871d0 [DAGCombiner] Fold xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) anytime before LegalizeVectorOprs
xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) was only being combined at the AfterLegalizeTypes stage, this patch permits the combine to occur anytime before then as well.

The main aim with this to improve the ability to recognise bitmasks that can be converted to shuffles.

I had to modify a number of AVX512 mask tests as the basic bitcast to/from scalar pattern was being stripped out, preventing testing of the mmask bitops. By replacing the bitcasts with loads we can get almost the same result.

Differential Revision: http://reviews.llvm.org/D18944

llvm-svn: 265998
2016-04-11 21:10:33 +00:00
Manman Ren 5751814eda Swift Calling Convention: swifterror target support.
Differential Revision: http://reviews.llvm.org/D18716

llvm-svn: 265997
2016-04-11 21:08:06 +00:00
Adrian Prantl b01a4d48ac More upgrading of old- and very-old-style debug info in testcases.
llvm-svn: 265953
2016-04-11 15:53:44 +00:00
Simon Pilgrim ac0cac3b0d [X86] Added extra widening tests for and/xor/or bit operations
Add tests for bitcasting an illegal vector to/from a legal scalar

Additional tests requested for D18944

llvm-svn: 265930
2016-04-11 11:10:36 +00:00
Simon Pilgrim ff434aaa3a [X86] Added extra widening tests for and/xor/or bit operations
To make sure we're dealing with both cases of legal/illegal number of vector elements and legal/illegal vector element types

llvm-svn: 265929
2016-04-11 10:58:52 +00:00
Simon Pilgrim d839fe9cfb [X86] Regenerated sdglue test checks
llvm-svn: 265927
2016-04-11 10:22:05 +00:00
Simon Pilgrim da730769fb [X86] Added widening tests for and/xor/or bit operations
Part of additional tests requested for D18944

llvm-svn: 265925
2016-04-11 10:16:27 +00:00
Simon Pilgrim 2d463b1a0f [X86][AVX512] Add vector integer division by constant tests
Added sdiv/srem and udiv/urem tests cases for 512 bit vectors.

llvm-svn: 265903
2016-04-10 17:14:26 +00:00
Simon Pilgrim d263fdc512 [X86][AVX512BW] Add support for v64i8 multiplies
Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.

I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.

Differential Revision: http://reviews.llvm.org/D18937

llvm-svn: 265902
2016-04-10 17:02:48 +00:00
Simon Pilgrim bab9979ee9 [X86][AVX512] Regenerated mask op tests
llvm-svn: 265898
2016-04-10 14:16:03 +00:00
Charles Davis 2f65f35c27 [CodeGen] Don't assume that fixed stack objects are aligned in a stack-realigned function.
Summary:
After we make the adjustment, we can assume that for local allocas, but
not for stack parameters, the return address, or any other fixed stack
object (which has a negative offset and therefore lies prior to the
adjusted SP).

Fixes PR26662.

Reviewers: hfinkel, qcolombet, rnk

Subscribers: rnk, llvm-commits

Differential Revision: http://reviews.llvm.org/D18471

llvm-svn: 265886
2016-04-09 23:34:42 +00:00
Sanjay Patel 4abae4e0fa [x86] use BMI 'andn' for logic + compare ops
With BMI, we can use 'andn' to save an instruction when the result is only used in a compare.
This is related to one of the potential sequences to check 'isfinite' in:
https://llvm.org/bugs/show_bug.cgi?id=27164

Differential Revision: http://reviews.llvm.org/D18910

llvm-svn: 265875
2016-04-09 16:02:52 +00:00
Simon Pilgrim 1cc5712763 [X86][XOP] Support for VPPERM 2-input shuffle mask decoding
This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle.

The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.

Differential Revision: http://reviews.llvm.org/D18441

llvm-svn: 265874
2016-04-09 14:51:26 +00:00
Sanjay Patel c0a627524d [x86] show missed opportunities to use andn
llvm-svn: 265850
2016-04-08 21:26:11 +00:00
Sanjay Patel a699ecc55c [x86] regenerate checks for BMI tests
llvm-svn: 265841
2016-04-08 20:29:39 +00:00
Kevin B. Smith e0a6fc3bcc [X86] Fix PR23155 by turning on X86FixupBWInsts by default.
Differential Revision: http://reviews.llvm.org/D18866

llvm-svn: 265830
2016-04-08 18:58:29 +00:00
Hans Wennborg 5a7723c7a2 Revert r265547 "Recommit r265309 after fixed an invalid memory reference bug happened"
It caused PR27275: "ARM: Bad machine code: Using an undefined physical register"

Also reverting the following commits that were landed on top:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"

llvm-svn: 265790
2016-04-08 15:17:43 +00:00
Simon Pilgrim 872dd6c3fe [X86][SSE] Added 32-bit tests for vector lzcnt/tzcnt tests
v2i64 tests are particularly bad on 32-bit targets.

llvm-svn: 265789
2016-04-08 15:01:31 +00:00
Dmitry Polukhin 24c4f28ac9 [IFUNC] Fix ifunc-asm.ll test
It seems that llc cannot be called used in assembler tests so test that
checks asm for particular target needs to be moved to codegen.

llvm-svn: 265770
2016-04-08 06:45:19 +00:00
Amaury Sechet c53ad4f3b2 Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule
Summary:
EHPad BB are not entered the classic way and therefor do not need to be placed after their predecessors. This patch make sure EHPad BB are not chosen amongst successors to form chains, and are selected as last resort when selecting the best candidate.

EHPad are scheduled in reverse probability order in order to have them flow into each others naturally.

Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17625

llvm-svn: 265726
2016-04-07 21:29:39 +00:00
Simon Pilgrim fba9352f31 [X86][SSE] Added bitmask pattern shuffle tests
Based on OR(AND(MASK,V0),AND(~MASK,V1)) style patterns

llvm-svn: 265697
2016-04-07 17:23:55 +00:00
Kevin B. Smith 3802c4af59 [X86]: Fix for PR27251.
Differential Revision: http://reviews.llvm.org/D18850

llvm-svn: 265690
2016-04-07 16:15:34 +00:00
Simon Pilgrim d54bae6525 [X86][SSE] Add support for VZEXT constant folding
llvm-svn: 265646
2016-04-07 07:52:45 +00:00
Ahmed Bougacha 1cf67fb9cb [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
  (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
  (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
  (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
  (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)

Original Message:

We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265636
2016-04-07 02:07:10 +00:00
Ahmed Bougacha 70bde5445b [X86] Refresh and tweak EFLAGS reuse tests. NFC.
The non-1 and EQ/NE tests were misguided.

llvm-svn: 265635
2016-04-07 02:06:53 +00:00
Hans Wennborg ab16be799c Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
Third time's the charm? The previous attempt (r265345) caused ASan test
failures on X86, as broken CFI caused stack traces to not work.

This version of the patch makes sure not to merge with stack adjustments
that have CFI, and to not add merged instructions' offests to the CFI
about to be generated.

This is already covered by the lit tests; I just got the expectations
wrong previously.

llvm-svn: 265623
2016-04-07 00:05:49 +00:00
Hans Wennborg 6849f8f15f Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC."
It caused ASan 32-bit tests to hang (PR27245).

llvm-svn: 265559
2016-04-06 16:44:38 +00:00
Hans Wennborg a7e396b5ef Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)""
It seems to be causing ASan tests to crash, probably due to
miscompiling the run-time somehow.

llvm-svn: 265551
2016-04-06 16:10:20 +00:00
Wei Mi 18293bef4e Recommit r265309 after fixed an invalid memory reference bug happened
when DenseMap growed and moved memory. I verified it fixed the bootstrap
problem on x86_64-linux-gnu but I cannot verify whether it fixes
the bootstrap error on clang-ppc64be-linux. I will watch the build-bot
result closely.

Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.

analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.

To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.

Differential Revision: http://reviews.llvm.org/D15302

llvm-svn: 265547
2016-04-06 15:41:07 +00:00
Sanjoy Das 65a60670e8 Lower @llvm.experimental.deoptimize as a noreturn call
While preserving the return value for @llvm.experimental.deoptimize at
the IR level is useful during mid-level optimization, doing so at the
machine instruction level requires generating some extra code and a
return that is non-ideal.  This change has LLVM lower

```
  %val = call @llvm.experimental.deoptimize
  ret %val
```

to effectively

```
  call @__llvm_deoptimize()
  unreachable
```

instead.

llvm-svn: 265502
2016-04-06 01:33:49 +00:00
Evgeniy Stepanov dde29e2799 Faster stack-protector for Android/AArch64.
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.

llvm-svn: 265481
2016-04-05 22:41:50 +00:00
Manman Ren f8bdd88cd9 Swift Calling Convention: add swiftcc.
Differential Revision: http://reviews.llvm.org/D17863

llvm-svn: 265480
2016-04-05 22:41:47 +00:00
Ahmed Bougacha 50e6cd4a3a [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265450
2016-04-05 20:02:57 +00:00
Ahmed Bougacha b76e7253e9 [X86] Add tests for ATOMIC_LOAD_OP EFLAGS reuse. NFC.
llvm-svn: 265448
2016-04-05 20:02:44 +00:00
Sanjay Patel 0484879fe7 [x86] regenerate checks
utils/update_test_checks.py was improved with:
http://reviews.llvm.org/rL265414
to include the first line of the function (expected to be
a comment line). This ensures that nothing bad has happened
before the first actual line of checked asm. It also matches
the existing behavior of the old script.

llvm-svn: 265416
2016-04-05 17:12:19 +00:00
Chuang-Yu Cheng d3fb38cae5 Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge
Presently, CodeGenPrepare deletes all nearly empty (only phi and branch)
basic blocks. This pass can delete loop preheaders which frequently creates
critical edges. A preheader can be a convenient place to spill registers to
the stack. If the entrance to a loop body is a critical edge, then spills
may occur in the loop body rather than immediately before it. This patch
protects loop preheaders from deletion in CodeGenPrepare even if they are
nearly empty.

Since the patch alters the CFG, it affects a large number of test cases.
In most cases, the changes are merely cosmetic (basic blocks have different
names or instruction orders change slightly). I am somewhat concerned about
the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not
deleted, then the MIPS backend does not take advantage of a branch delay
slot. Consequently, I would like some close review by a MIPS expert.

The patch also partially subsumes D16893 from George Burgess IV. George
correctly notes that CodeGenPrepare does not actually preserve the dominator
tree. I think the dominator tree was usually not valid when CodeGenPrepare
ran, but I am using LoopInfo to mark preheaders, so the dominator tree is
now always valid before CodeGenPrepare.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng

http://reviews.llvm.org/D16984

llvm-svn: 265397
2016-04-05 14:06:20 +00:00
Teresa Johnson f4cf1c3eb4 Don't fold double constant to an integer if dest type not integral
Summary:
I encountered this issue when constant folding during inlining tried to
fold away a bitcast of a double to an x86_mmx, which is not an integral
type. The test case exposes the same issue with a smaller code snippet
during early CSE.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18528

llvm-svn: 265367
2016-04-04 23:50:46 +00:00
Hans Wennborg a47a692341 Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
The original commit miscompiled things on 32-bit Windows, e.g. a Clang
boostrap. It turns out that mergeSPUpdates() was a bit too generous in
what it interpreted as a stack adjustment, causing the following code:

        addl    $12, %esp
        leal    -4(%ebp), %esp

To be "optimized" into simply:

        addl    $8, %esp

This commit tightens up mergeSPUpdates() and includes a new test
(test14 in movtopush.ll) for this situation.

llvm-svn: 265345
2016-04-04 21:02:46 +00:00
Sean Silva f4517a15b8 Beef up some dllexport tests.
Adds some dllexport tests to verify that:
  - Variables in bss are exported appropriately
  - Non-dllexport symbols aliased to dllexport symbols are not exported
  - Symbols declared as dllexport but are not defined are not exported

We plan to enable dllimport/dllexport support for the PS4, and these
additional tests are for points we noticed in our internal testing.

Patch by Warren Ristow!

Differential Revision: http://reviews.llvm.org/D18682

llvm-svn: 265333
2016-04-04 19:10:55 +00:00
Wei Mi fb5252cac1 Revert r265309 and r265312 because they caused some errors I need to investigate.
llvm-svn: 265317
2016-04-04 17:45:03 +00:00
Wei Mi ffbc9c7f3b Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.

analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.

To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.

Differential Revision: http://reviews.llvm.org/D15302

llvm-svn: 265309
2016-04-04 16:42:40 +00:00
Elena Demikhovsky e99c561391 AVX-512: Truncating store for i1 vectors
Implemented truncstore for KNL and skylake-avx512.
Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.

Differential Revision: http://reviews.llvm.org/D18740

llvm-svn: 265283
2016-04-04 07:17:47 +00:00
Simon Pilgrim d74f6e22f2 [X86][SSE] Refreshed MOVMSK sign bit tests
llvm-svn: 265267
2016-04-03 18:59:42 +00:00
Elena Demikhovsky 5e426f7356 AVX-512: Load and Extended Load for i1 vectors
Implemented load+{sign|zero}_extend for i1 vectors
Fixed failures in i1 vector load.
Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX.

Differential Revision: http://reviews.llvm.org/D18737

llvm-svn: 265259
2016-04-03 08:41:12 +00:00
Simon Pilgrim b2e837d875 [X86][SSE] Added 1024-bit vector comparison tests
More examples of PR22603, poor vector splitting for AVX512F targets as well as missing uses of PACKSS/MOVMSK

llvm-svn: 265248
2016-04-02 21:33:09 +00:00
Simon Pilgrim 8fd7d852d5 [X86][AVX512] Added AVX512 comparison tests
llvm-svn: 265247
2016-04-02 21:24:42 +00:00
Simon Pilgrim a2c8da9e06 [X86][AVX] Added vector float truncation (double2float) tests
llvm-svn: 265222
2016-04-02 14:09:17 +00:00
Adrian Prantl cf0961f5ea Add missing emissionKind flags to the DICompileUnits of several old testcases.
llvm-svn: 265192
2016-04-01 22:18:43 +00:00
Simon Pilgrim 3243b21dae [X86][SSE] Regenerated vector float tests - fabs / floor(etc.) / fneg / float2double
llvm-svn: 265186
2016-04-01 21:30:48 +00:00
Simon Pilgrim 1e5bf0a256 [X86][SSE] Vector i64 load tests
llvm-svn: 265185
2016-04-01 21:06:17 +00:00
Simon Pilgrim 275b2bcb76 [X86][SSE] Regenerated comparison mask and float immediate tests
llvm-svn: 265184
2016-04-01 21:00:00 +00:00
Simon Pilgrim a372a0f295 [X86][SSE] Regenerated the vec_extract tests.
llvm-svn: 265183
2016-04-01 20:55:19 +00:00
Simon Pilgrim f739d8a2ed [X86][SSE] Regenerated the vec_insert tests.
llvm-svn: 265179
2016-04-01 19:42:23 +00:00
Simon Pilgrim 6121118663 [X86][SSE] Regenerated vec_partial tests.
llvm-svn: 265173
2016-04-01 18:30:29 +00:00
Sanjay Patel 9b5b5c82ca [x86] add an SSE2 + fast-unaligned accesses run for memset nonzero tests
Was there really no other way to splat a byte in SSE2?
    punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
    pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
    pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]

llvm-svn: 265172
2016-04-01 18:29:25 +00:00
Simon Pilgrim 858194640e [X86][SSE] Regenerated vec_logical tests.
llvm-svn: 265171
2016-04-01 18:28:23 +00:00
Simon Pilgrim 1b14082488 [X86][SSE] Regenerated vector sdiv to shifts tests
Added SSE + AVX1 tests as well as AVX2

llvm-svn: 265169
2016-04-01 18:18:40 +00:00
Sanjay Patel d3e3d48cb9 [x86] add an SSE1 run for these tests
Note however that this is identical to the existing SSE2 run.
What we really want is yet another run for an SSE2 machine that
also has fast unaligned 16-byte accesses.

llvm-svn: 265167
2016-04-01 18:11:30 +00:00
Simon Pilgrim b8283631a5 [X86][SSE] Regenerated vec_setcc tests.
llvm-svn: 265164
2016-04-01 17:55:02 +00:00
Simon Pilgrim 41e31ff6bd [X86][SSE] Regenerated the vec_set tests.
Replaced lots of dodgy greps with actual codegen

llvm-svn: 265163
2016-04-01 17:40:25 +00:00
Sanjay Patel 9f413364d5 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 -
where we noticed that an intermediate splat was being generated for memsets of
non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.

Note that the SSE1 path is not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

llvm-svn: 265161
2016-04-01 17:36:45 +00:00
Sanjay Patel a05e0ff223 [x86] avoid intermediate splat for non-zero memsets (PR27100)
Follow-up to D18566 - where we noticed that an intermediate splat was being
generated for memsets of non-zero chars.

That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.

The tests that were added in the last patch are now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
In the new tests, the splat via shuffling looks ok to me, but there might be some
room for improvement depending on uarch there.

Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.

Differential Revision: http://reviews.llvm.org/D18676

llvm-svn: 265148
2016-04-01 16:27:14 +00:00
Simon Pilgrim 7ec092d0f8 [X86][AVX512] Regenerated intrinsics tests
llvm-svn: 265135
2016-04-01 11:57:51 +00:00
Andrey Turetskiy 958eb46443 [X86] Introduce Lakemont CPU.
Add a new Intel MCU CPU Lakemont, which doesn't support X87.

Differential Revision: http://reviews.llvm.org/D18650

llvm-svn: 265128
2016-04-01 10:16:15 +00:00
Sean Silva 3de5ef96c9 Improve CHECK-NOT robustness of dllexport tests
This changes some dllexport tests, to verify that some symbols that
should not be exported are not, in a way that improves the robustness
of CHECK-SAME interaction with CHECK-NOT.

We plan to enable dllimport/dllexport support for the PS4, and these
changes are for points we noticed in our internal testing.

Patch by Warren Ristow!

llvm-svn: 265106
2016-04-01 03:54:03 +00:00
Sanjoy Das 9d41a8f269 Don't use an i64 return type with webkit_jscc
Re-enable an assertion enabled by Justin Lebar in rL265092.  rL265092
was breaking test/CodeGen/X86/deopt-intrinsic.ll because webkit_jscc
does not like non-i64 return types.  Change the test case to not do
that.

llvm-svn: 265099
2016-04-01 02:51:21 +00:00
Adrian Prantl b8089516a5 testcase gardening: update the emissionKind enum to the new syntax. (NFC)
llvm-svn: 265081
2016-04-01 00:16:49 +00:00
Adrian Prantl b939a25707 Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.

http://reviews.llvm.org/D18612
<rdar://problem/25427165>

llvm-svn: 265077
2016-03-31 23:56:58 +00:00
Sanjay Patel 61e13249d8 [x86] add memset tests to show another potential improvement
llvm-svn: 265048
2016-03-31 20:40:32 +00:00
Hans Wennborg 132cd62121 Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
I think it might have caused these build breakages:
http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio

llvm-svn: 265046
2016-03-31 20:27:30 +00:00
Simon Pilgrim aab59b7a28 [X86][SSE] Some basic tests for variable shuffles
We don't really support non-constant shuffle masks, but these tests are for cases where BUILD_VECTOR is made up from vector extracts (as well as undef/zero scalars).

llvm-svn: 265045
2016-03-31 20:26:30 +00:00
Hans Wennborg e97fb414e8 [X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)
For code such as:

  void f(int, int);
  void g() {
      f(1, 2);
  }

compiled for 32-bit X86 Linux, Clang would previously generate:

  subl    $12, %esp
  subl    $8, %esp
  pushl   $2
  pushl   $1
  calll   f
  addl    $16, %esp
  addl    $12, %esp
  retl

This patch fixes that by merging adjacent stack adjustments in
eliminateCallFramePseudoInstr().

Differential Revision: http://reviews.llvm.org/D18627

llvm-svn: 265039
2016-03-31 19:26:24 +00:00
Sanjay Patel 92d5ea5e07 [x86] use SSE/AVX ops for non-zero memsets (PR27100)
Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast
targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping
into a codegen sinkhole while trying to splat a byte into an XMM reg.

Follow-on bugs exposed by the current codegen are:
https://llvm.org/bugs/show_bug.cgi?id=27141
https://llvm.org/bugs/show_bug.cgi?id=27143

Differential Revision: http://reviews.llvm.org/D18566

llvm-svn: 265029
2016-03-31 17:30:06 +00:00
Hans Wennborg a7543ba10c More checks in win32-seh-nested-finally.ll after comment on r264966
llvm-svn: 265027
2016-03-31 16:42:10 +00:00
Davide Italiano 936a2b09f3 [DebugInfo] Subprograms should belong to a CU.
Start fixing tests accordingly. There are still
about 35 failures before we can enable this check
in the IR verifier.

llvm-svn: 264990
2016-03-31 03:40:07 +00:00
Sean Silva 24d7e2e869 Fix case confusion.
The test case was defining and using a function 'notExported()', but
the FileCheck checks were checking for the name 'not_exported'. This
changes the test to use 'notExported' across the board. Also, the test
defined a function 'not_defined()', but doesn't have any checks related
to it. For consistency, this name is changed to 'notDefined'. A later
commit will add checks for 'notDefined'.

Patch by Warren Ristow!

llvm-svn: 264984
2016-03-31 01:47:33 +00:00
Hans Wennborg be0df2b102 Add some more triples after r264966
llvm-svn: 264972
2016-03-30 23:55:22 +00:00
Hans Wennborg 6596977130 [X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325)
The size savings are significant, and from what I can tell, both ICC and GCC do this.

Differential Revision: http://reviews.llvm.org/D18573

llvm-svn: 264966
2016-03-30 23:38:01 +00:00
Simon Pilgrim c49bd2ede0 [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.

llvm-svn: 264922
2016-03-30 20:52:24 +00:00
Simon Pilgrim b87ffe8519 [X86][XOP] BITREVERSE lowering using VPPERM
XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.

llvm-svn: 264870
2016-03-30 14:14:00 +00:00
Simon Pilgrim 9490b56a89 [X86][SSE] Test the legalization of vector comparison results
We are currently doing a REALLY bad job of packing results of vector comparisons into the legalized <X x i1> result equivalents - a mixture of PACKSS/PMOVMSKB would be much better here.

llvm-svn: 264867
2016-03-30 13:55:00 +00:00