Commit Graph

4636 Commits

Author SHA1 Message Date
Benjamin Kramer 4dae598bc8 DAGCombiner: Turn divs of vector splats into vectorized multiplications.
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.

I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.

llvm-svn: 207315
2014-04-26 12:06:28 +00:00
Michael Zolotukhin 1a97a7bcbf Revert r206749 till a final decision about the intrinsics is made.
llvm-svn: 207313
2014-04-26 09:56:41 +00:00
Juergen Ributzka a6bda8bae2 [DAG] During DAG legalization keep opaque constants even after expanding.
The included test case would return the incorrect results, because the expansion
of an shift with a constant shift amount of 0 would generate undefined behavior.

This is because ExpandShiftByConstant assumes that all shifts by constants with
a value of 0 have already been optimized away. This doesn't happen for opaque
constants and usually this isn't a problem, because opaque constants won't take
this code path - they are not supposed to. In the case that the opaque constant
has to be expanded by the legalizer, the legalizer would drop the opaque flag.
In this case we hit the limitations of ExpandShiftByConstant and create incorrect
code.

This commit fixes the legalizer by not dropping the opaque flag when expanding
opaque constants and adding an assertion to ExpandShiftByConstant to catch this
not supported case in the future.

This fixes <rdar://problem/16718472>

llvm-svn: 207304
2014-04-26 02:58:04 +00:00
Quentin Colombet ea18933d97 [X86] Implement TargetLowering::getScalingFactorCost hook.
Scaling factors are not free on X86 because every "complex" addressing mode
breaks the related instruction into 2 allocations instead of 1.

<rdar://problem/16730541>

llvm-svn: 207301
2014-04-26 01:11:26 +00:00
Filipe Cabecinhas d71f110fe9 Appease the almighty buildbots.
llvm-svn: 207295
2014-04-26 00:02:37 +00:00
Filipe Cabecinhas 363b570d2a Optimization for certain shufflevector by using insertps.
Summary:
If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower
certain shufflevectors to an insertps instruction:
When most of the shufflevector result's elements come from one vector (and
keep their index), and one element comes from another vector or a memory
operand.

Added tests for insertps optimizations on shufflevector.
Added support and tests for v4i32 vector optimization.

Reviewers: nadav

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3475

llvm-svn: 207291
2014-04-25 23:51:17 +00:00
Benjamin Kramer 76f753e9a9 X86: Don't transform shifts into ands when the sign bit is tested.
Should unbreak MultiSource/Benchmarks/mediabench/g721/g721encode/encode.

llvm-svn: 207145
2014-04-24 20:51:37 +00:00
Reid Kleckner 5772b77789 Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

llvm-svn: 207143
2014-04-24 20:14:34 +00:00
Reid Kleckner 0fbb1e91e5 Fix rdtsc.ll test to match r8 on win64
llvm-svn: 207142
2014-04-24 20:14:08 +00:00
Andrea Di Biagio d1ab866868 [X86] Add support for Read Time Stamp Counter x86 builtin intrinsics.
This patch:
- Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and
   'int_x86_rdtscp') as GCCBuiltin intrinsics;
- Teaches the backend how to lower the two new builtins;
- Introduces a common function to lower READCYCLECOUNTER dag nodes
  and the two new rdtsc/rdtscp intrinsics;
- Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll'
  correctly verifies that both READCYCLECOUNTER and the two new intrinsics
  work fine for both 64bit and 32bit Subtargets.

llvm-svn: 207127
2014-04-24 17:18:27 +00:00
Benjamin Kramer f4575db2fd X86: Emit test instead of constant shift + compare if the shift result is unused.
This allows us to compile
  return (mask & 0x8 ? a : b);
into
  testb $8, %dil
  cmovnel %edx, %esi
instead of
  andl  $8, %edi
  shrl  $3, %edi
  cmovnel %edx, %esi

which we formed previously because dag combiner canonicalizes setcc of and into shift.

llvm-svn: 207088
2014-04-24 08:15:31 +00:00
Saleem Abdulrasool 11049a0fef MC: honour IMAGE_SCN_CNT_INITIALIZED_DATA
Emit the flag to indicate to the assembler that a section contains data if there
is pre-populated data present.

llvm-svn: 207028
2014-04-23 21:29:34 +00:00
Elena Demikhovsky acc5c9e83e AVX-512: store and truncstore for i1 values
llvm-svn: 206897
2014-04-22 14:13:10 +00:00
Lang Hames f6f42cac3f [X86] Don't use BZHI for short masks (>=32 bits). Thanks to Ben Kramer for the
review.

llvm-svn: 206869
2014-04-22 07:40:34 +00:00
Michael Zolotukhin f2ba994bf6 Reapply r206732. This time without optimization of branches.
llvm-svn: 206749
2014-04-21 12:01:33 +00:00
NAKAMURA Takumi 54d9f88bed llvm/test/CodeGen/X86/bmi.ll: Relax expressions for targeting win32.
llvm-svn: 206743
2014-04-21 11:01:46 +00:00
Lang Hames 5aa6ee80b6 [X86] ISEL (and X, <constant mask>) to BZHI when BMI2 is available.
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.

<rdar://problem/15480077>

llvm-svn: 206738
2014-04-21 08:18:53 +00:00
Chandler Carruth a2533a7bef Revert r206732 which is causing llc to crash on most of the build bots.
Original commit message:
  Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN,
  safe.srem.iN, safe.urem.iN (iN = i8, i61, i32, or i64).

llvm-svn: 206735
2014-04-21 07:11:15 +00:00
Michael Zolotukhin 137a84616c Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN, safe.srem.iN,
safe.urem.iN (iN = i8, i16, i32, or i64).

llvm-svn: 206732
2014-04-21 05:33:09 +00:00
Duncan P. N. Exon Smith 6611a377eb Revert "blockfreq: Temporarily turn on -debug-only=block-freq"
This reverts commit r206705, as planned.

llvm-svn: 206706
2014-04-19 22:45:44 +00:00
Duncan P. N. Exon Smith bffee5bb90 blockfreq: Temporarily turn on -debug-only=block-freq
These tests fail after my BlockFrequencyInfo rewrite on two buildbots
[1][2].  I can't reproduce it locally, so I'm temporarily turning on
-debug-only=block-freq so I can find the problem.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1860
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18477

llvm-svn: 206705
2014-04-19 22:40:56 +00:00
Yaron Keren d7ba46b287 Patch by Vadim Chugunov
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by 
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.

A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.

llvm-svn: 206684
2014-04-19 13:47:43 +00:00
Yaron Keren d0d38bf91e Expanded test for x86-pc-windows-gnu and x86_64-pc-windows-gnu environments.
llvm-svn: 206649
2014-04-18 21:10:11 +00:00
Adam Nemet ee7a3e38c9 [X86] Improve buildFromShuffleMostly for AVX
For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors,
both the BUILD_VECTOR and its operands may need to be legalized in multiple
steps.  Consider:

(v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0,) Constant<1>),
                     (extract_vector_elt %vreg0, Constant<2>),
                     (extract_vector_elt %vreg0, Constant<3>),
                     (extract_vector_elt %vreg0, Constant<4>),
                     (extract_vector_elt %vreg0, Constant<5>),
                     (extract_vector_elt %vreg0, Constant<6>),
                     (extract_vector_elt %vreg0, Constant<7>),
                     %vreg1))

a. We can't build a 256-bit vector efficiently so, we need to split it into
two 128-bit vecs and combine them with VINSERTX128.

b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) needs to be
split into a VEXTRACTX128 and a further extract_vector_elt from the
resulting 128-bit vector.

c. The extract_vector_elt from b. is lowered into a shuffle to the first
element and a movss.

Depending on the order in which we legalize the BUILD_VECTOR and its
operands[1], buildFromShuffleMostly may be faced with:

(v4f32 (BUILD_VECTOR (extract_vector_elt
                      (vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     %vreg1))

In order to figure out the underlying vector and their identity we need to see
through the shuffles.

[1] Note that the order in which operations and their operands are legalized is
only guaranteed in the first iteration of LegalizeDAG.

Fixes <rdar://problem/16296956>

llvm-svn: 206634
2014-04-18 19:44:16 +00:00
Duncan P. N. Exon Smith c812b5b33a Add some target triples for better determinism
These tests were failing on some buildbots after r206548 (reverted in
r206556), but passing locally.

They were missing target triples, so maybe that's the problem?

llvm-svn: 206621
2014-04-18 17:22:19 +00:00
Benjamin Kramer e6c821ef4c X86: Pattern match scalar loads + vcvtph2ps into just vcvtph2ps.
vcvtph2ps only reads the lower 64 bits of the address passed to the
intrinsic.

llvm-svn: 206579
2014-04-18 10:45:33 +00:00
Josh Magee adfde5fef6 [stack protector] Make the StackProtector pass respect ssp-buffer-size.
Previously, SSPBufferSize was assigned the value of the "stack-protector-buffer-size"
attribute after all uses of SSPBufferSize.  The effect was that the default
SSPBufferSize was always used during analysis.  I moved the check for the
attribute before the analysis; now --param ssp-buffer-size= works correctly again.

Differential Revision: http://reviews.llvm.org/D3349

llvm-svn: 206486
2014-04-17 19:08:36 +00:00
Andrea Di Biagio aac2eac4c2 [X86] Improve the lowering of packed shifts by constant build_vector.
This patch teaches the backend how to efficiently lower logical and
arithmetic packed shifts on both SSE and AVX/AVX2 machines.

When possible, instead of scalarizing a vector shift, the backend should try
to expand the shift into a sequence of two packed shifts by immedate count
followed by a MOVSS/MOVSD.

Example
  (v4i32 (srl A, (build_vector < X, Y, Y, Y>)))

Can be rewritten as:
  (v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>)))

[with X and Y ConstantInt]

The advantage is that the two new shifts from the example would be lowered into
X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into
four scalar shifts plus four pairs of vector insert/extract.

llvm-svn: 206316
2014-04-15 19:30:48 +00:00
Robert Lougher a9bf2463b9 Revert r191049/r191059 as it can produce wrong code (see PR17975).
It has already been reverted on the 3.4 branch in r196521.

llvm-svn: 206311
2014-04-15 18:34:24 +00:00
Quentin Colombet c396019837 [Register Coalescer] Add a test case for 206060.
<rdar://problem/16582185>

llvm-svn: 206235
2014-04-15 01:15:32 +00:00
Hal Finkel c3998306f4 Add the ability to use GEPs for address sinking in CGP
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be adsorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
  1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
  2. In cases where type-punning was used, and BasicAA was used
     to override TBAA, BasicAA may no longer do so. (this had forced us to disable
     all use of TBAA in CodeGen; something which we can now enable again)

This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.

I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).

llvm-svn: 206092
2014-04-12 00:59:48 +00:00
Quentin Colombet 4344da1c71 [RegAllocGreedy][Last Chance Recoloring] Change the name of the exhaustive search option.
fexhaustive-register-search => exhaustive-register-search
'f' is a Clang thing!

This is related to PR18747.

llvm-svn: 206075
2014-04-11 21:51:09 +00:00
Quentin Colombet 567e30bc2b [RegAllocGreedy][Last Chance Recoloring] Addition of
-fexhaustive-register-search option to allow an exhaustive search during last
chance recoloring.

This is related to PR18747

Patch by MAYUR PANDEY <mayur.p@samsung.com>. 

llvm-svn: 206072
2014-04-11 21:39:44 +00:00
Reid Kleckner 9c6582129a Move the segmented stack switch to a function attribute
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.

Patch by Luqman Aden and Alex Crichton!

llvm-svn: 205997
2014-04-10 22:58:43 +00:00
Josh Magee 79ae600818 [stack protector] Refactor and clean-up test. No functionality change.
Refactored stack-protector.ll to use new-style function attributes everywhere
and eliminated unnecessary attributes.

This cleanup is in preparation for an upcoming test change.

llvm-svn: 205996
2014-04-10 22:47:27 +00:00
Jim Grosbach 576f8cf19f X86: Tighten up test.
llc CPU autodection bites again. Speculative fix for bot failures.

llvm-svn: 205940
2014-04-10 00:27:43 +00:00
Jim Grosbach e4fef71981 Add support for load folding of avx1 logical instructions
AVX supports logical operations using an operand from memory. Unfortunately
because integer operations were not added until AVX2 the AVX1 logical
operation's types were preventing the isel from folding the loads. In a limited
number of cases the peephole optimizer would fold the loads, but most were
missed. This patch adds explicit patterns with appropriate casts in order for
these loads to be folded.

The included test cases run on reduced examples and disable the peephole
optimizer to ensure the folds are being pattern matched.

Patch by Louis Gerbarg <lgg@apple.com>

rdar://16355124

llvm-svn: 205938
2014-04-09 23:39:25 +00:00
Jim Grosbach cad4cd6c9e SelectionDAG: Don't constant fold target-specific nodes.
FoldConstantArithmetic() only knows how to deal with a few target independent
ISD opcodes. Bail early if it sees a target-specific ISD node. These node do
funny things with operand types which may break the assumptions of the code
that follows, and there's no actual folding that can be done anyway. For example,
non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a
128-bit v4i32 vector regardless of what the first operand type is and that breaks
the assumption that the operand types must match.

rdar://16530923

llvm-svn: 205937
2014-04-09 23:28:11 +00:00
Elena Demikhovsky cf0b9bafc3 AVX-512: insert element to mask vector; store i1 data
Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors;
Implemented "store" for i1 type

llvm-svn: 205850
2014-04-09 12:37:50 +00:00
Elena Demikhovsky 3dcfbdfa54 AVX-512: Added fp_to_uint and uint_to_fp patterns.
llvm-svn: 205754
2014-04-08 07:24:02 +00:00
NAKAMURA Takumi 39485e2dc3 Quick fix: Triple::isOSMSVCRT() should be false for targeting cygwin.
It affected callee's stack pop in x86. It is one of devergences between cygwin and mingw since mingw-gcc-4.6.

Added testcases to llvm/test/CodeGen/X86/win32_sret.ll for cygwin.

llvm-svn: 205688
2014-04-06 10:01:23 +00:00
Quentin Colombet 96bd2a1490 [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance
recoloring cut-offs are encountered and register allocation failed.

This is related to PR18747

Patch by MAYUR PANDEY <mayur.p@samsung.com>.

llvm-svn: 205601
2014-04-04 02:05:21 +00:00
Quentin Colombet 9c816f39ad Revert r205599, the commit was not intended to have so many changes
llvm-svn: 205600
2014-04-04 02:02:49 +00:00
Quentin Colombet 7ee4e79dec [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance
recoloring cut-offs are hit.

This is related to PR18747.

Patch by MAYUR PANDEY <mayur.p@samsung.com>

llvm-svn: 205599
2014-04-04 01:58:57 +00:00
NAKAMURA Takumi 8ff866c24e llvm/test/CodeGen/X86/peephole-multiple-folds.ll: Relax expressions to satisfy win32.
llvm-svn: 205559
2014-04-03 20:07:51 +00:00
Richard Trieu 7c6fcd2060 Fix test case.
llvm-svn: 205492
2014-04-03 00:14:18 +00:00
Lang Hames 5dc14bd54c [CodeGen] Teach the peephole optimizer to remember (and exploit) all folding
opportunities in the current basic block, rather than just the last one seen.

<rdar://problem/16478629>

llvm-svn: 205481
2014-04-02 22:59:58 +00:00
Juergen Ributzka fcd2e94ecc Add comments and test case for [DAG] Keep the opaque constant flag when performing unary constant folding operations (r204737).
llvm-svn: 205474
2014-04-02 22:21:01 +00:00
Reid Kleckner 101102711d Support segmented stacks on Win64
Identical to Win32 method except the GS segment register is used for TLS
instead of FS and pvArbitrary is at TEB offset 0x28 instead of 0x14.

llvm-svn: 205342
2014-04-01 18:34:21 +00:00
Alexey Volkov 1328b28dc6 [x86] Do not convert to cmp32 for Atom arch by Sergey Okunev
Differential Revision: http://llvm-reviews.chandlerc.com/D2824

llvm-svn: 205288
2014-04-01 08:13:07 +00:00