Commit Graph

4734 Commits

Author SHA1 Message Date
Juergen Ributzka a6bda8bae2 [DAG] During DAG legalization keep opaque constants even after expanding.
The included test case would return the incorrect results, because the expansion
of an shift with a constant shift amount of 0 would generate undefined behavior.

This is because ExpandShiftByConstant assumes that all shifts by constants with
a value of 0 have already been optimized away. This doesn't happen for opaque
constants and usually this isn't a problem, because opaque constants won't take
this code path - they are not supposed to. In the case that the opaque constant
has to be expanded by the legalizer, the legalizer would drop the opaque flag.
In this case we hit the limitations of ExpandShiftByConstant and create incorrect
code.

This commit fixes the legalizer by not dropping the opaque flag when expanding
opaque constants and adding an assertion to ExpandShiftByConstant to catch this
not supported case in the future.

This fixes <rdar://problem/16718472>

llvm-svn: 207304
2014-04-26 02:58:04 +00:00
Quentin Colombet ea18933d97 [X86] Implement TargetLowering::getScalingFactorCost hook.
Scaling factors are not free on X86 because every "complex" addressing mode
breaks the related instruction into 2 allocations instead of 1.

<rdar://problem/16730541>

llvm-svn: 207301
2014-04-26 01:11:26 +00:00
Filipe Cabecinhas d71f110fe9 Appease the almighty buildbots.
llvm-svn: 207295
2014-04-26 00:02:37 +00:00
Filipe Cabecinhas 363b570d2a Optimization for certain shufflevector by using insertps.
Summary:
If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower
certain shufflevectors to an insertps instruction:
When most of the shufflevector result's elements come from one vector (and
keep their index), and one element comes from another vector or a memory
operand.

Added tests for insertps optimizations on shufflevector.
Added support and tests for v4i32 vector optimization.

Reviewers: nadav

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3475

llvm-svn: 207291
2014-04-25 23:51:17 +00:00
Benjamin Kramer 76f753e9a9 X86: Don't transform shifts into ands when the sign bit is tested.
Should unbreak MultiSource/Benchmarks/mediabench/g721/g721encode/encode.

llvm-svn: 207145
2014-04-24 20:51:37 +00:00
Reid Kleckner 5772b77789 Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

llvm-svn: 207143
2014-04-24 20:14:34 +00:00
Reid Kleckner 0fbb1e91e5 Fix rdtsc.ll test to match r8 on win64
llvm-svn: 207142
2014-04-24 20:14:08 +00:00
Andrea Di Biagio d1ab866868 [X86] Add support for Read Time Stamp Counter x86 builtin intrinsics.
This patch:
- Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and
   'int_x86_rdtscp') as GCCBuiltin intrinsics;
- Teaches the backend how to lower the two new builtins;
- Introduces a common function to lower READCYCLECOUNTER dag nodes
  and the two new rdtsc/rdtscp intrinsics;
- Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll'
  correctly verifies that both READCYCLECOUNTER and the two new intrinsics
  work fine for both 64bit and 32bit Subtargets.

llvm-svn: 207127
2014-04-24 17:18:27 +00:00
Benjamin Kramer f4575db2fd X86: Emit test instead of constant shift + compare if the shift result is unused.
This allows us to compile
  return (mask & 0x8 ? a : b);
into
  testb $8, %dil
  cmovnel %edx, %esi
instead of
  andl  $8, %edi
  shrl  $3, %edi
  cmovnel %edx, %esi

which we formed previously because dag combiner canonicalizes setcc of and into shift.

llvm-svn: 207088
2014-04-24 08:15:31 +00:00
Saleem Abdulrasool 11049a0fef MC: honour IMAGE_SCN_CNT_INITIALIZED_DATA
Emit the flag to indicate to the assembler that a section contains data if there
is pre-populated data present.

llvm-svn: 207028
2014-04-23 21:29:34 +00:00
Elena Demikhovsky acc5c9e83e AVX-512: store and truncstore for i1 values
llvm-svn: 206897
2014-04-22 14:13:10 +00:00
Lang Hames f6f42cac3f [X86] Don't use BZHI for short masks (>=32 bits). Thanks to Ben Kramer for the
review.

llvm-svn: 206869
2014-04-22 07:40:34 +00:00
Michael Zolotukhin f2ba994bf6 Reapply r206732. This time without optimization of branches.
llvm-svn: 206749
2014-04-21 12:01:33 +00:00
NAKAMURA Takumi 54d9f88bed llvm/test/CodeGen/X86/bmi.ll: Relax expressions for targeting win32.
llvm-svn: 206743
2014-04-21 11:01:46 +00:00
Lang Hames 5aa6ee80b6 [X86] ISEL (and X, <constant mask>) to BZHI when BMI2 is available.
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.

<rdar://problem/15480077>

llvm-svn: 206738
2014-04-21 08:18:53 +00:00
Chandler Carruth a2533a7bef Revert r206732 which is causing llc to crash on most of the build bots.
Original commit message:
  Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN,
  safe.srem.iN, safe.urem.iN (iN = i8, i61, i32, or i64).

llvm-svn: 206735
2014-04-21 07:11:15 +00:00
Michael Zolotukhin 137a84616c Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN, safe.srem.iN,
safe.urem.iN (iN = i8, i16, i32, or i64).

llvm-svn: 206732
2014-04-21 05:33:09 +00:00
Duncan P. N. Exon Smith 6611a377eb Revert "blockfreq: Temporarily turn on -debug-only=block-freq"
This reverts commit r206705, as planned.

llvm-svn: 206706
2014-04-19 22:45:44 +00:00
Duncan P. N. Exon Smith bffee5bb90 blockfreq: Temporarily turn on -debug-only=block-freq
These tests fail after my BlockFrequencyInfo rewrite on two buildbots
[1][2].  I can't reproduce it locally, so I'm temporarily turning on
-debug-only=block-freq so I can find the problem.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1860
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18477

llvm-svn: 206705
2014-04-19 22:40:56 +00:00
Yaron Keren d7ba46b287 Patch by Vadim Chugunov
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by 
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.

A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.

llvm-svn: 206684
2014-04-19 13:47:43 +00:00
Yaron Keren d0d38bf91e Expanded test for x86-pc-windows-gnu and x86_64-pc-windows-gnu environments.
llvm-svn: 206649
2014-04-18 21:10:11 +00:00
Adam Nemet ee7a3e38c9 [X86] Improve buildFromShuffleMostly for AVX
For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors,
both the BUILD_VECTOR and its operands may need to be legalized in multiple
steps.  Consider:

(v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0,) Constant<1>),
                     (extract_vector_elt %vreg0, Constant<2>),
                     (extract_vector_elt %vreg0, Constant<3>),
                     (extract_vector_elt %vreg0, Constant<4>),
                     (extract_vector_elt %vreg0, Constant<5>),
                     (extract_vector_elt %vreg0, Constant<6>),
                     (extract_vector_elt %vreg0, Constant<7>),
                     %vreg1))

a. We can't build a 256-bit vector efficiently so, we need to split it into
two 128-bit vecs and combine them with VINSERTX128.

b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) needs to be
split into a VEXTRACTX128 and a further extract_vector_elt from the
resulting 128-bit vector.

c. The extract_vector_elt from b. is lowered into a shuffle to the first
element and a movss.

Depending on the order in which we legalize the BUILD_VECTOR and its
operands[1], buildFromShuffleMostly may be faced with:

(v4f32 (BUILD_VECTOR (extract_vector_elt
                      (vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     %vreg1))

In order to figure out the underlying vector and their identity we need to see
through the shuffles.

[1] Note that the order in which operations and their operands are legalized is
only guaranteed in the first iteration of LegalizeDAG.

Fixes <rdar://problem/16296956>

llvm-svn: 206634
2014-04-18 19:44:16 +00:00
Duncan P. N. Exon Smith c812b5b33a Add some target triples for better determinism
These tests were failing on some buildbots after r206548 (reverted in
r206556), but passing locally.

They were missing target triples, so maybe that's the problem?

llvm-svn: 206621
2014-04-18 17:22:19 +00:00
Benjamin Kramer e6c821ef4c X86: Pattern match scalar loads + vcvtph2ps into just vcvtph2ps.
vcvtph2ps only reads the lower 64 bits of the address passed to the
intrinsic.

llvm-svn: 206579
2014-04-18 10:45:33 +00:00
Josh Magee adfde5fef6 [stack protector] Make the StackProtector pass respect ssp-buffer-size.
Previously, SSPBufferSize was assigned the value of the "stack-protector-buffer-size"
attribute after all uses of SSPBufferSize.  The effect was that the default
SSPBufferSize was always used during analysis.  I moved the check for the
attribute before the analysis; now --param ssp-buffer-size= works correctly again.

Differential Revision: http://reviews.llvm.org/D3349

llvm-svn: 206486
2014-04-17 19:08:36 +00:00
Andrea Di Biagio aac2eac4c2 [X86] Improve the lowering of packed shifts by constant build_vector.
This patch teaches the backend how to efficiently lower logical and
arithmetic packed shifts on both SSE and AVX/AVX2 machines.

When possible, instead of scalarizing a vector shift, the backend should try
to expand the shift into a sequence of two packed shifts by immedate count
followed by a MOVSS/MOVSD.

Example
  (v4i32 (srl A, (build_vector < X, Y, Y, Y>)))

Can be rewritten as:
  (v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>)))

[with X and Y ConstantInt]

The advantage is that the two new shifts from the example would be lowered into
X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into
four scalar shifts plus four pairs of vector insert/extract.

llvm-svn: 206316
2014-04-15 19:30:48 +00:00
Robert Lougher a9bf2463b9 Revert r191049/r191059 as it can produce wrong code (see PR17975).
It has already been reverted on the 3.4 branch in r196521.

llvm-svn: 206311
2014-04-15 18:34:24 +00:00
Quentin Colombet c396019837 [Register Coalescer] Add a test case for 206060.
<rdar://problem/16582185>

llvm-svn: 206235
2014-04-15 01:15:32 +00:00
Hal Finkel c3998306f4 Add the ability to use GEPs for address sinking in CGP
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be adsorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
  1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
  2. In cases where type-punning was used, and BasicAA was used
     to override TBAA, BasicAA may no longer do so. (this had forced us to disable
     all use of TBAA in CodeGen; something which we can now enable again)

This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.

I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).

llvm-svn: 206092
2014-04-12 00:59:48 +00:00
Quentin Colombet 4344da1c71 [RegAllocGreedy][Last Chance Recoloring] Change the name of the exhaustive search option.
fexhaustive-register-search => exhaustive-register-search
'f' is a Clang thing!

This is related to PR18747.

llvm-svn: 206075
2014-04-11 21:51:09 +00:00
Quentin Colombet 567e30bc2b [RegAllocGreedy][Last Chance Recoloring] Addition of
-fexhaustive-register-search option to allow an exhaustive search during last
chance recoloring.

This is related to PR18747

Patch by MAYUR PANDEY <mayur.p@samsung.com>. 

llvm-svn: 206072
2014-04-11 21:39:44 +00:00
Reid Kleckner 9c6582129a Move the segmented stack switch to a function attribute
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.

Patch by Luqman Aden and Alex Crichton!

llvm-svn: 205997
2014-04-10 22:58:43 +00:00
Josh Magee 79ae600818 [stack protector] Refactor and clean-up test. No functionality change.
Refactored stack-protector.ll to use new-style function attributes everywhere
and eliminated unnecessary attributes.

This cleanup is in preparation for an upcoming test change.

llvm-svn: 205996
2014-04-10 22:47:27 +00:00
Jim Grosbach 576f8cf19f X86: Tighten up test.
llc CPU autodection bites again. Speculative fix for bot failures.

llvm-svn: 205940
2014-04-10 00:27:43 +00:00
Jim Grosbach e4fef71981 Add support for load folding of avx1 logical instructions
AVX supports logical operations using an operand from memory. Unfortunately
because integer operations were not added until AVX2 the AVX1 logical
operation's types were preventing the isel from folding the loads. In a limited
number of cases the peephole optimizer would fold the loads, but most were
missed. This patch adds explicit patterns with appropriate casts in order for
these loads to be folded.

The included test cases run on reduced examples and disable the peephole
optimizer to ensure the folds are being pattern matched.

Patch by Louis Gerbarg <lgg@apple.com>

rdar://16355124

llvm-svn: 205938
2014-04-09 23:39:25 +00:00
Jim Grosbach cad4cd6c9e SelectionDAG: Don't constant fold target-specific nodes.
FoldConstantArithmetic() only knows how to deal with a few target independent
ISD opcodes. Bail early if it sees a target-specific ISD node. These node do
funny things with operand types which may break the assumptions of the code
that follows, and there's no actual folding that can be done anyway. For example,
non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a
128-bit v4i32 vector regardless of what the first operand type is and that breaks
the assumption that the operand types must match.

rdar://16530923

llvm-svn: 205937
2014-04-09 23:28:11 +00:00
Elena Demikhovsky cf0b9bafc3 AVX-512: insert element to mask vector; store i1 data
Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors;
Implemented "store" for i1 type

llvm-svn: 205850
2014-04-09 12:37:50 +00:00
Elena Demikhovsky 3dcfbdfa54 AVX-512: Added fp_to_uint and uint_to_fp patterns.
llvm-svn: 205754
2014-04-08 07:24:02 +00:00
NAKAMURA Takumi 39485e2dc3 Quick fix: Triple::isOSMSVCRT() should be false for targeting cygwin.
It affected callee's stack pop in x86. It is one of devergences between cygwin and mingw since mingw-gcc-4.6.

Added testcases to llvm/test/CodeGen/X86/win32_sret.ll for cygwin.

llvm-svn: 205688
2014-04-06 10:01:23 +00:00
Quentin Colombet 96bd2a1490 [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance
recoloring cut-offs are encountered and register allocation failed.

This is related to PR18747

Patch by MAYUR PANDEY <mayur.p@samsung.com>.

llvm-svn: 205601
2014-04-04 02:05:21 +00:00
Quentin Colombet 9c816f39ad Revert r205599, the commit was not intended to have so many changes
llvm-svn: 205600
2014-04-04 02:02:49 +00:00
Quentin Colombet 7ee4e79dec [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance
recoloring cut-offs are hit.

This is related to PR18747.

Patch by MAYUR PANDEY <mayur.p@samsung.com>

llvm-svn: 205599
2014-04-04 01:58:57 +00:00
NAKAMURA Takumi 8ff866c24e llvm/test/CodeGen/X86/peephole-multiple-folds.ll: Relax expressions to satisfy win32.
llvm-svn: 205559
2014-04-03 20:07:51 +00:00
Richard Trieu 7c6fcd2060 Fix test case.
llvm-svn: 205492
2014-04-03 00:14:18 +00:00
Lang Hames 5dc14bd54c [CodeGen] Teach the peephole optimizer to remember (and exploit) all folding
opportunities in the current basic block, rather than just the last one seen.

<rdar://problem/16478629>

llvm-svn: 205481
2014-04-02 22:59:58 +00:00
Juergen Ributzka fcd2e94ecc Add comments and test case for [DAG] Keep the opaque constant flag when performing unary constant folding operations (r204737).
llvm-svn: 205474
2014-04-02 22:21:01 +00:00
Reid Kleckner 101102711d Support segmented stacks on Win64
Identical to Win32 method except the GS segment register is used for TLS
instead of FS and pvArbitrary is at TEB offset 0x28 instead of 0x14.

llvm-svn: 205342
2014-04-01 18:34:21 +00:00
Alexey Volkov 1328b28dc6 [x86] Do not convert to cmp32 for Atom arch by Sergey Okunev
Differential Revision: http://llvm-reviews.chandlerc.com/D2824

llvm-svn: 205288
2014-04-01 08:13:07 +00:00
Juergen Ributzka e117992f00 [Stackmaps] Update the stackmap format to use 64-bit relocations for the function address and properly align all entries.
This commit updates the stackmap format to version 1 to indicate the
reorganizaion of several fields. This was done in order to align stackmap
entries to their natural alignment and to minimize padding.

Fixes <rdar://problem/16005902>

llvm-svn: 205254
2014-03-31 22:14:04 +00:00
Yaron Keren c6a57ea4e9 Two updated tests for MinGW 32 and 64 exception handling code generation.
llvm-svn: 205227
2014-03-31 17:34:15 +00:00
Akira Hatanaka 9afbb8c2b1 [x86] Fix printing of register operands with q modifier.
Emit 32-bit register names instead of 64-bit register names if the target does
not have 64-bit general purpose registers.

<rdar://problem/14653996>

llvm-svn: 205067
2014-03-28 23:28:07 +00:00
David Majnemer 02f2188bb9 X86: Disable IsLegalToCallImmediateAddr for Win32
WinCOFF cannot form PC relative relocations to support absolute
MCValues.  We should reenable this once WinCOFF supports emission of
IMAGE_REL_I386_REL32 relocations.

This fixes PR19272.

llvm-svn: 205058
2014-03-28 21:40:47 +00:00
Rafael Espindola 24a669d225 Prevent alias from pointing to weak aliases.
This adds back r204781.

Original message:

Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204934
2014-03-27 15:26:56 +00:00
Elena Demikhovsky bb2f6b72d3 AVX-512: Implemented masking for integer arithmetic & logic instructions.
By Robert Khasanov rob.khasanov@gmail.com

llvm-svn: 204906
2014-03-27 09:45:08 +00:00
Ekaterina Romanova b9aea9383a This is a fix for PR# 19051. I noticed code gen differences due to code motion when running tests with and without the debug info at O2. The problem is in branch folding. A loop wanted to skip the debug info, but actually it didn't do so.
llvm-svn: 204865
2014-03-26 22:15:28 +00:00
Jim Grosbach ed2cd39b81 Fix for incorrect address sinking in the presence of potential overflows.
In some cases it is possible for CGP to attempt to reuse a base address from
another basic block. In those cases we have to be sure that all the address
math was either done at the same bit width, or that none of it overflowed
before it was extended.

Patch by Louis Gerbarg <lgg@apple.com>

rdar://16307442

llvm-svn: 204833
2014-03-26 17:27:01 +00:00
Hans Wennborg d683a22dd2 Revert "X86 memcpy lowering: use "rep movs" even when esi is used as base pointer" (r204174)
>  For functions where esi is used as base pointer, we would previously fall ba
>  from lowering memcpy with "rep movs" because that clobbers esi.
>
>  With this patch, we just store esi in another physical register, and restore
>  it afterwards. This adds a little bit of register preassure, but the more
>  efficient memcpy should be worth it.
>
>  Differential Revision: http://llvm-reviews.chandlerc.com/D2968

This didn't work. I was ending up with code like this:

  lea     edi,[esi+38h]
  mov     ecx,0Fh
  mov     edx,esi
  mov     esi,ebx
  rep movs dword ptr es:[edi],dword ptr [esi]
  lea     ecx,[esi+74h] <-- Ooops, we're now using esi before restoring it from edx.
  add     ebx,3Ch
  mov     esi,edx

I guess if we want to do this we need stronger glue or something, or doing the expansion
much later.

llvm-svn: 204829
2014-03-26 16:30:54 +00:00
Cameron McInally 4532596b8f Fix AVX512 Gather and Scatter execution domains.
llvm-svn: 204804
2014-03-26 13:50:50 +00:00
Renato Golin c0a3c1d66b Add @llvm.clear_cache builtin
Implementing the LLVM part of the call to __builtin___clear_cache
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or nothing at all
incase the caches are unified.

Updating LangRef and adding some tests for the implemented architectures.
Other archs will have to implement the method in case this builtin
has to be compiled for it, since the default behaviour is to bail
unimplemented.

A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.

llvm-svn: 204802
2014-03-26 12:52:28 +00:00
Rafael Espindola 65481d7b97 Revert "Prevent alias from pointing to weak aliases."
This reverts commit r204781.

I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.

llvm-svn: 204784
2014-03-26 06:14:40 +00:00
Rafael Espindola 3b712a84a9 Prevent alias from pointing to weak aliases.
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given

define void @my_func() {
  ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias

We produce without this patch:

        .weak   my_alias
my_alias = my_func
        .globl  my_alias2
my_alias2 = my_alias

That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a

@my_alias = alias void ()* @other_func

would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.

There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.

llvm-svn: 204781
2014-03-26 04:48:47 +00:00
Quentin Colombet 6f12ae0d5c [X86] Add broadcast instructions to the table used by ExeDepsFix pass.
Adds the different broadcast instructions to the ReplaceableInstrsAVX2 table.
That way the ExeDepsFix pass can take better decisions when AVX2 broadcasts are
across domain (int <-> float).

In particular, prior to this patch we were generating:
  vpbroadcastd  LCPI1_0(%rip), %ymm2
  vpand %ymm2, %ymm0, %ymm0
  vmaxps  %ymm1, %ymm0, %ymm0 ## <- domain change penalty

Now, we generate the following nice sequence where everything is in the float
domain:
  vbroadcastss  LCPI1_0(%rip), %ymm2
  vandps  %ymm2, %ymm0, %ymm0
  vmaxps  %ymm1, %ymm0, %ymm0

<rdar://problem/16354675>

llvm-svn: 204770
2014-03-26 00:10:22 +00:00
Adam Nemet 4beef4c90d [X86] Generate VPSHUFB for in-place v16i16 shuffles
This used to resort to splitting the 256-bit operation into two 128-bit
shuffles and then recombining the results.

Fixes <rdar://problem/16167303>

llvm-svn: 204735
2014-03-25 17:47:06 +00:00
Cameron McInally 45dc489403 Fix AVX2 Gather execution domains.
llvm-svn: 204713
2014-03-25 12:36:38 +00:00
David Majnemer 273bff4713 WinCOFF: Add support for -fdata-sections
This is a pretty straight forward translation for COFF, we just need to
stick the data in a COMDAT section marked as
IMAGE_COMDAT_SELECT_NODUPLICATES.

N.B. We must be careful to avoid sticking entities with private linkage
in COMDAT groups.  COFF is pretty hostile to the renaming of entities so
we must be careful to disallow GlobalVariables with unstable names.

llvm-svn: 204703
2014-03-25 06:14:26 +00:00
Quentin Colombet 2d5c156b96 [X86][ISelDAG] Add missing fallback patterns for avx2 broadcast instructions.
Those patterns are used when the load cannot be folded into the related broadcast
during the select phase.
This happens when the load gets additional uses that were not anticipated during
the previous lowering phases (constant vector to constant load, then constant
load reused) or when selection DAG is not able to prove that folding the load
will not create a cycle in the DAG.

<rdar://problem/16074331>

llvm-svn: 204631
2014-03-24 17:54:19 +00:00
David Majnemer 9338984f57 WinCOFF: Add support for -ffunction-sections
This is a pretty straight forward translation for COFF, we just need to
stick the function in a COMDAT section marked as
IMAGE_COMDAT_SELECT_NODUPLICATES.

llvm-svn: 204565
2014-03-23 17:47:39 +00:00
Andrea Di Biagio 5b0aacf1c7 [DAG] Fix an assertion failure caused by an invalid cast in method 'BuildVectorSDNode::isConstantSplat'
This patch renames method 'isConstantSplat' as 'getConstantSplatValue'
(mainly for consistency reasons), and rewrites its logic to ensure
that we always perform a legal 'cast<ConstantSDNode>'.

Added test shift-combine-crash.ll to verify that DAGCombiner no longer crashes with an assertion failure in the attempt to simplify a vector shift by a vector of all undef counts.

llvm-svn: 204536
2014-03-22 01:47:22 +00:00
Manman Ren c935560568 Register allocator: add condition to hoist a spill to outer loop.
We make sure a spill is not hoisted to a hotter outer loop by adding
a condition. Hoist a spill to outer loop if there are multiple dependents
(it can be beneficial if more than one dependents are hoisted) or
if DepSV (the hoisting source) is hotter than SV (the hoisting destination).

rdar://16268194

llvm-svn: 204522
2014-03-21 21:46:24 +00:00
Rafael Espindola 734f105379 Move codegen test over to MC.
llvm-svn: 204490
2014-03-21 17:55:34 +00:00
Rafael Espindola c07cc8f370 Convert test to using cfi.
An unnamed global in llvm still produces a regular symbol.

llvm-svn: 204488
2014-03-21 17:38:01 +00:00
Rafael Espindola 7618632517 Remove redundant test.
The production of the .eh symbols is done from MC now and we already have tests
for it.

llvm-svn: 204483
2014-03-21 17:26:35 +00:00
Rafael Espindola d8eb29ecfd Split out the MC part of this test.
llvm-svn: 204481
2014-03-21 17:16:11 +00:00
Juergen Ributzka f0dff49ad0 [Constant Hoisting] Make the constant materialization cost operand dependent
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.

Related to <rdar://problem/16381500>

llvm-svn: 204435
2014-03-21 06:04:45 +00:00
Rafael Espindola f1b10242c0 Convert a CodeGen test into a MC test.
llvm-svn: 204421
2014-03-21 00:55:42 +00:00
Rafael Espindola 2544330a29 Port test to cfi.
llvm-svn: 204416
2014-03-21 00:30:24 +00:00
Rafael Espindola fc72577d92 Convert another CodeGen test into a MC test.
llvm-svn: 204412
2014-03-20 23:35:00 +00:00
Rafael Espindola c889a278fa Remove unused options from test.
llvm-svn: 204401
2014-03-20 21:38:04 +00:00
Juergen Ributzka 46357931ab Revert "[Constant Hoisting] Extend coverage of the constant hoisting pass."
I will break this up into smaller pieces for review and recommit.

llvm-svn: 204393
2014-03-20 20:17:13 +00:00
Juergen Ributzka 6dab520c70 [Constant Hoisting] Extend coverage of the constant hoisting pass.
This commit extends the coverage of the constant hoisting pass, adds additonal
debug output and updates the function names according to the style guide.

Related to <rdar://problem/16381500>

llvm-svn: 204389
2014-03-20 19:55:52 +00:00
Hans Wennborg aec21ce43e X86 memcpy lowering: use "rep movs" even when esi is used as base pointer
For functions where esi is used as base pointer, we would previously fall back
from lowering memcpy with "rep movs" because that clobbers esi.

With this patch, we just store esi in another physical register, and restore
it afterwards. This adds a little bit of register preassure, but the more
efficient memcpy should be worth it.

Differential Revision: http://llvm-reviews.chandlerc.com/D2968

llvm-svn: 204174
2014-03-18 20:04:34 +00:00
Michael Zolotukhin 7ac41056c8 Fix test lsr-normalization.ll broken in r204161.
llvm-svn: 204166
2014-03-18 18:17:59 +00:00
Michael Zolotukhin ed0a7761e5 Add stride normalization to SCEV Normalize/Denormalize transformation.
llvm-svn: 204161
2014-03-18 17:34:03 +00:00
Andrea Di Biagio 28f46d9f39 [DAGCombiner] teach how to simplify xor/and/or nodes according to the following rules:
1)  (AND (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (AND (A, B), C, Mask)
 2)  (OR  (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (OR  (A, B), C, Mask)
 3)  (XOR (shuf (A, C, Mask), shuf (B, C, Mask)) -> shuf (XOR (A, B), V_0, Mask)

 4)  (AND (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, AND (A, B), Mask)
 5)  (OR  (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (C, OR  (A, B), Mask)
 6)  (XOR (shuf (C, A, Mask), shuf (C, B, Mask)) -> shuf (V_0, XOR (A, B), Mask)

llvm-svn: 204160
2014-03-18 17:12:59 +00:00
Matt Arsenault 985b9de485 Make DAGCombiner work on vector bitshifts with constant splat vectors.
llvm-svn: 204071
2014-03-17 18:58:01 +00:00
Adam Nemet 24381f1cb7 [VectorLegalizer/X86] Don't unvectorize fp_to_uint for v8f32->v8i16
Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get
promoted to fp_to_sint v8f32->v8i32.  This is a legal operation on AVX.

For that to work properly, we also need to teach the legalizer about the
specific promotion required here.  The default vector promotion uses
bitcasting to a vector type of the same total size.  We want to promote the
vector element type, effectively widening the operation and then truncating
the result.  This is analogous to the current logic of how int_to_fp is
promoted.

The change also factors out some code from the int_to_fp promotion code to
ValueType::widenIntegerVectorElementType.  This is now shared between
int_to_fp and fp_to_int.

There is no longer need for the custom lowering of fp_to_sint f32->v8i16 in
X86.  It can now go through the new target-independent fp_to_*int promotion
logic.

I also checked that no other target uses Promote for these ops yet, so there
shouldn't be any unexpected change in behavior.

Fixes <rdar://problem/16202247>

llvm-svn: 204058
2014-03-17 17:06:14 +00:00
Lang Hames 7c8189c6d3 [X86] New and improved VZeroUpperInserter optimization.
- Adds support for inserting vzerouppers before tail-calls.
  This is enabled implicitly by having MachineInstr::copyImplicitOps preserve
  regmask operands, which allows VZeroUpperInserter to see where tail-calls use
  vector registers.

- Fixes a bug that caused the previous version of this optimization to miss some
  vzeroupper insertion points in loops. (Loops-with-vector-code that followed
  loops-without-vector-code were mistakenly overlooked by the previous version).

- New algorithm never revisits instructions.

Fixes <rdar://problem/16228798>

llvm-svn: 204021
2014-03-17 01:22:54 +00:00
Adrian Prantl 5a4b90deae Re-add checks that were in this testcase before it was converted to dwarfdump.
llvm-svn: 203981
2014-03-14 23:08:21 +00:00
Rafael Espindola 2fb5bc33a3 Remove the linker_private and linker_private_weak linkages.
These linkages were introduced some time ago, but it was never very
clear what exactly their semantics were or what they should be used
for. Some investigation found these uses:

* utf-16 strings in clang.
* non-unnamed_addr strings produced by the sanitizers.

It turns out they were just working around a more fundamental problem.
For some sections a MachO linker needs a symbol in order to split the
section into atoms, and llvm had no idea that was the case. I fixed
that in r201700 and it is now safe to use the private linkage. When
the object ends up in a section that requires symbols, llvm will use a
'l' prefix instead of a 'L' prefix and things just work.

With that, these linkages were already dead, but there was a potential
future user in the objc metadata information. I am still looking at
CGObjcMac.cpp, but at this point I am convinced that linker_private
and linker_private_weak are not what they need.

The objc uses are currently split in

* Regular symbols (no '\01' prefix). LLVM already directly provides
whatever semantics they need.
* Uses of a private name (start with "\01L" or "\01l") and private
linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
agrees with clang on L being ok or not for a given section. I have two
patches in code review for this.
* Uses of private name and weak linkage.

The last case is the one that one could think would fit one of these
linkages. That is not the case. The semantics are

* the linker will merge these symbol by *name*.
* the linker will hide them in the final DSO.

Given that the merging is done by name, any of the private (or
internal) linkages would be a bad match. They allow llvm to rename the
symbols, and that is really not what we want. From the llvm point of
view, these objects should really be (linkonce|weak)(_odr)?.

For now, just keeping the "\01l" prefix is probably the best for these
symbols. If we one day want to have a more direct support in llvm,
IMHO what we should add is not a linkage, it is just a hidden_symbol
attribute. It would be applicable to multiple linkages. For example,
on weak it would produce the current behavior we have for objc
metadata. On internal, it would be equivalent to private (and we
should then remove private).

llvm-svn: 203866
2014-03-13 23:18:37 +00:00
Kevin Enderby 3de14bc77e Add -mtriple=x86_64-linux to this test case to fix the build bots.5
The original commit was r203829.

llvm-svn: 203844
2014-03-13 20:31:19 +00:00
Ekaterina Romanova 8d62008ecb Fix for http://llvm.org/bugs/show_bug.cgi?id=18590
This patch fixes the bug in peephole optimization that folds a load which defines one vreg into the one and only use of that vreg. With debug info, a DBG_VALUE that referenced the vreg considered to be a use, preventing the optimization. The fix is to ignore DBG_VALUE's during the optimization, and undef a DBG_VALUE that references a vreg that gets removed.
Patch by Trevor Smigiel!

llvm-svn: 203829
2014-03-13 18:47:12 +00:00
Manuel Jacob a7c48f99ae CodeGenPrep: sink extends of illegal types into use block.
Summary:
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.

This is an update of D2973 which was reverted because of a bug reported
as PR19084.

Reviewers: t.p.northover, chapuni

Reviewed By: t.p.northover

CC: llvm-commits, alex, chapuni

Differential Revision: http://llvm-reviews.chandlerc.com/D3021

llvm-svn: 203797
2014-03-13 13:36:25 +00:00
Elena Demikhovsky fd05667276 AVX-512: masked load/store + intrinsics for them.
llvm-svn: 203790
2014-03-13 12:05:52 +00:00
Adam Nemet d4e56073c7 [X86] Add peephole for masked rotate amount
Extend what's currently done for shift because the HW performs this masking
implicitly:

   (rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y)

I use the newly factored out multiclass that was only supporting shifts so
far.

For testing I extended my testcase for the new rotation idiom.

<rdar://problem/15295856>

llvm-svn: 203718
2014-03-12 21:20:55 +00:00
Rafael Espindola f3336bc1d5 Reject alias to undefined symbols in the verifier.
On ELF and COFF an alias is just another name for a position in the file.
There is no way to refer to a position in another file, so an alias to
undefined is meaningless.

MachO currently doesn't support aliases. The spec has a N_INDR, which when
implemented will have a different set of restrictions. Adding support for
it shouldn't be harder than any other IR extension.

For now, having the IR represent what is actually possible with current
tools makes it easier to fix the design of GlobalAlias.

llvm-svn: 203705
2014-03-12 20:15:49 +00:00
Hans Wennborg 6c37f8b985 X86: Don't generate 64-bit movd after cmpneqsd in 32-bit mode (PR19059)
This fixes the bug where we would bitcast the 64-bit floating point result
of cmpneqsd to a 64-bit integer even on 32-bit targets.

Differential Revision: http://llvm-reviews.chandlerc.com/D3009

llvm-svn: 203581
2014-03-11 15:49:24 +00:00
Tim Northover e94a518a22 IR: add a second ordering operand to cmpxhg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Jim Grosbach c94d993adf X86: Enable ISel of 16-bit MOVBE instructions.
When the MOVBE instructions are available, use them for 16-bit endian
swapping as well as for 32 and 64 bit.

The patterns were already present on the instructions, but weren't being
matched because the operation was unconditionally marked to 'Expand.'
Change that to be conditional on whether the MOVBE instructions are
available. Use 'rolw' to implement the in-register version (32 and 64
bit have the dedicated 'bswap' instruction for that).

Patch by Louis Gerbarg <lgg@apple.com>.

rdar://15479984

llvm-svn: 203524
2014-03-11 00:44:14 +00:00
Matt Arsenault 532db69984 Fix undefined behavior in vector shift tests.
These were all shifting the same amount as the bitwidth.

llvm-svn: 203519
2014-03-11 00:01:41 +00:00
NAKAMURA Takumi 1783e1e984 Revert r203230, "CodeGenPrep: sink extends of illegal types into use block."
It choked i686 stage2.

llvm-svn: 203386
2014-03-09 11:01:07 +00:00
David Majnemer c4ab61cb2f IR: Change inalloca's grammar a bit
The grammar for LLVM IR is not well specified in any document but seems
to obey the following rules:

 - Attributes which have parenthesized arguments are never preceded by
   commas.  This form of attribute is the only one which ever has
   optional arguments.  However, not all of these attributes support
   optional arguments: 'thread_local' supports an optional argument but
   'addrspace' does not.  Interestingly, 'addrspace' is documented as
   being a "qualifier".  What constitutes a qualifier?  I cannot find a
   definition.

 - Some attributes use a space between the keyword and the value.
   Examples of this form are 'align' and 'section'.  These are always
   preceded by a comma.

 - Otherwise, the attribute has no argument.  These attributes do not
   have a preceding comma.

Sometimes an attribute goes before the instruction, between the
instruction and it's type, or after it's type.  'atomicrmw' has
'volatile' between the instruction and the type while 'call' has 'tail'
preceding the instruction.

With all this in mind, it seems most consistent for 'inalloca' on an
'inalloca' instruction to occur before between the instruction and the
type.  Unlike the current formulation, there would be no preceding
comma.  The combination 'alloca inalloca' doesn't look particularly
appetizing, perhaps a better spelling of 'inalloca' is down the road.

llvm-svn: 203376
2014-03-09 06:41:58 +00:00
Adam Nemet 4203039760 Update comment from r203315 based on review
llvm-svn: 203361
2014-03-08 21:51:55 +00:00
David Blaikie 078278fe3a DebugInfo: further improvements to test following up on r203329
llvm-svn: 203337
2014-03-08 02:45:53 +00:00
David Blaikie f528f054d0 DebugInfo: Fix test fallout from r203323
Will fix this harder in a moment.

llvm-svn: 203329
2014-03-08 01:32:51 +00:00
Adam Nemet 5117f5dffc [DAGCombiner] Recognize another rotation idiom
This is the new idiom:

  x<<(y&31) | x>>((0-y)&31)

which is recognized as:

  x ROTL (y&31)

The change refines matchRotateSub.  In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.

llvm-svn: 203315
2014-03-07 23:56:28 +00:00
Arnold Schwaighofer d33e942958 ISel: Make VSELECT selection terminate in cases where the condition type has to
be split and the result type widened.

When the condition of a vselect has to be split it makes no sense widening the
vselect and thereby widening the condition. We end up in an endless loop of
widening (vselect result type) and splitting (condition mask type) doing this.
Instead, split both the condition and the vselect and widen the result.

I ran this over the test suite with i686 and mattr=+sse and saw no regressions.

Fixes PR18036.

llvm-svn: 203311
2014-03-07 23:25:55 +00:00
Tim Northover ad3d81d320 CodeGenPrep: sink extends of illegal types into use block.
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.

Patch by Manuel Jacob.

llvm-svn: 203230
2014-03-07 11:04:30 +00:00
Rafael Espindola b1f25f1b93 Replace PROLOG_LABEL with a new CFI_INSTRUCTION.
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.

The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.

The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.

I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.

The net result is that we don't create temporary labels that are never used.

llvm-svn: 203204
2014-03-07 06:08:31 +00:00
Rafael Espindola 3b30cb41a9 Remove shouldEmitUsedDirectiveFor.
Clang now uses llvm.compiler.used for these cases.

llvm-svn: 203174
2014-03-06 22:47:08 +00:00
Rafael Espindola 123256a4aa Convert test to FileCheck.
llvm-svn: 203173
2014-03-06 22:21:43 +00:00
Andrea Di Biagio 6292a140ee [X86] Teach the DAGCombiner how to fold a OR of two shufflevector nodes.
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.

The rules are:
  1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
  2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)

The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.

Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner would firstly try to fold according to 1.; If not possible
then it will try to fold according to 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.

llvm-svn: 203156
2014-03-06 20:19:52 +00:00
Rafael Espindola 8377085657 Always print the implicit .text at the start of an asm file.
Before llvm-mc would print it, but llc was assuming that it would produce
another section changing directive before one was needed. That assumption is
false with inline asm.

Fixes PR19049.

Another option would be to always create the section, but in the asm printer
avoid printing sections changes during initialization. That would work, but
* We do use the fact that llvm-mc prints it in testing. The tests can be changed
  if needed.
* A quick poll on IRC suggest that most developers prefer the implicit .text to
  be printed.

llvm-svn: 203001
2014-03-05 20:09:15 +00:00
Cameron McInally 791ae9927c Lower AVX v4i64->v4i32 truncate to one shuffle.
llvm-svn: 202996
2014-03-05 19:41:16 +00:00
Andrew Trick fbb278c541 Make stackmap machineinstrs clobber the scratch regs too.
Patchpoints already did this. Doing it for stackmaps is a convenience
for the runtime in the event that it needs to scratch register to
patch or perform a runtime call thunk.

Unlike patchpoints, we just assume the AnyRegCC calling
convention. This is the only language and target independent calling
convention specific to stackmaps so makes sense.  Although the calling
convention is not currently used to select the scratch registers.

llvm-svn: 202943
2014-03-05 07:08:16 +00:00
Hans Wennborg acb842d523 Check for dynamic allocas and inline asm that clobbers sp before building
selection dag (PR19012)

In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).

The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.

This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.

Differential Revision: http://llvm-reviews.chandlerc.com/D2954

llvm-svn: 202930
2014-03-05 02:43:26 +00:00
Elena Demikhovsky 9737e3886b AVX-512: Fixed extract_vector_elt for v8i1 vector
llvm-svn: 202624
2014-03-02 09:19:44 +00:00
Manman Ren 709c951b42 SpillPlacement: fix a bug in iterate.
Inside iterate, we scan backwards then scan forwards in a loop. When iteration
is not zero, the last node was just updated so we can skip it. But when
iteration is zero, we can't skip the last node.

For the testing case, fixing this will save a spill and move register copies
from hot path to cold path.

llvm-svn: 202557
2014-02-28 23:05:31 +00:00
Adam Nemet 6586e5d6ac Test commit
llvm-svn: 202528
2014-02-28 18:44:39 +00:00
Daniel Sanders 9f088ba322 Stop test/CodeGen/X86/v4i32load-crash.ll targeting non-X86-64 targets.
Summary:
Fixes an issue where a test attempts to use -mcpu=x86-64 on non-X86-64 targets.
This triggers an assertion in the MIPS backend since it doesn't know what ABI to
use by default for unrecognized processors.

CC: llvm-commits, rafael

Differential Revision: http://llvm-reviews.chandlerc.com/D2877

llvm-svn: 202369
2014-02-27 09:24:31 +00:00
Andrew Trick 52a00936b4 Add a limit to the heuristic that register allocates instructions in local order.
This handles pathological cases in which we see 2x increase in spill
code for large blocks (~50k instructions). I don't have a unit test
for this behavior.

Fixes rdar://16072279.

llvm-svn: 202304
2014-02-26 22:07:26 +00:00
Quentin Colombet 85c9e16291 Lower unsigned vsetcc to psubus in certain cases
The current approach to lower a vsetult is to flip the sign bit of the
operands, swap the operands and then use a (signed) pcmpgt.  psubus (unsigned
saturating subtract) can be used to emulate a vsetult more efficiently:

+    case ISD::SETULT: {
+      // If the comparison is against a constant we can turn this into a
+      // setule.  With psubus, setule does not require a swap.  This is
+      // beneficial because the constant in the register is no longer
+      // destructed as the destination so it can be hoisted out of a loop.

I also enable lowering via psubus in a few other cases where it's clearly
beneficial: setule and setuge if minu/maxu cannot be used.
    
rdar://problem/14338765

Patch by Adam Nemet <anemet@apple.com>.

llvm-svn: 202301
2014-02-26 21:39:12 +00:00
Rafael Espindola f863ee2949 Store a DataLayout in Module.
Now that DataLayout is not a pass, store one in Module.

Since the C API expects to be able to get a char* to the datalayout description,
we have to keep a std::string somewhere. This patch keeps it in Module and also
uses it to represent modules without a DataLayout.

Once DataLayout is mandatory, we should probably move the string to DataLayout
itself since it won't be necessary anymore to represent the special case of a
module without a DataLayout.

llvm-svn: 202190
2014-02-25 20:01:08 +00:00
Elena Demikhovsky 3ebfe11532 AVX-512: Fixed encoding of VPTESTMQ
llvm-svn: 201980
2014-02-23 14:28:35 +00:00
Benjamin Kramer d20d1adfb8 Make test more resilient against scheduling decisions.
Should bring the atom buildbots back to life.

llvm-svn: 201951
2014-02-22 20:14:02 +00:00
NAKAMURA Takumi 0607c15435 llvm/test/CodeGen/X86/shift-pcmp.ll: Tweak to appease FileCheck. "CHECK-LABEL" doesn't identify labels magically and CHECK-LABEL behaves free from other contexts.
For targeting pecoff, ".def foo" appears before ".short 32".

          .def    foo;
  ...
  .LCPI0_0:
          .short  32
  foo:

CHECK-LABEL seeks not from ".short 32" but from the top of the input.

llvm-svn: 201931
2014-02-22 07:27:04 +00:00
Quentin Colombet 4db08df18e [DAGCombiner] PCMP* sets its result to all ones or zeros so we can AND with the
shifted mask rather than masking and shifting separately.

The patch adds this transformation to the DAGCombiner:

  (shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C)

<rdar://problem/16054492>

Patch by Adam Nemet <anemet@apple.com>

llvm-svn: 201906
2014-02-21 23:42:41 +00:00
Elena Demikhovsky 2efed98b58 AVX-512: added a lit test for truncate operation
llvm-svn: 201763
2014-02-20 07:34:13 +00:00
Rafael Espindola daeafb4c2a Add back r201608, r201622, r201624 and r201625
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.

They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.

llvm-svn: 201700
2014-02-19 17:23:20 +00:00
Cameron McInally 7b544f0297 Fix AVX512 vector sqrt assembly strings.
llvm-svn: 201681
2014-02-19 15:16:09 +00:00
Daniel Jasper 7e198ad862 Revert r201622 and r201608.
This causes the LLVMgold plugin to segfault. More information on the
replies to r201608.

llvm-svn: 201669
2014-02-19 12:26:01 +00:00
Rafael Espindola b9ea63c551 Avoid an infinite cycle with private linkage and -f{data|function}-sections.
When outputting an object we check its section to find its name, but when
looking for the section with -ffunction-section we look for the symbol name.

Break the loop by requesting a name with the private prefix when constructing
the section name. This matches the behavior before r201608.

llvm-svn: 201622
2014-02-19 01:28:30 +00:00
Rafael Espindola 09dcc6a536 Fix PR18743.
The IR
@foo = private constant i32 42

is valid, but before this patch we would produce an invalid MachO from it. It
was invalid because it would use an L label in a section where the liker needs
the labels in order to atomize it.

One way of fixing it would be to just reject this IR in the backend, but that
would not be very front end friendly.

What this patch does is use an 'l' prefix in sections that we know the linker
requires symbols for atomizing them. This allows frontends to just use
private and not worry about which sections they go to or how the linker handles
them.

One small issue with this strategy is that now a symbol name depends on the
section, which is not available before codegen. This is not a problem in
practice. The reason is that it only happens with private linkage, which will
be ignored by the non codegen users (llvm-nm and llvm-ar).

llvm-svn: 201608
2014-02-18 22:24:57 +00:00
Tim Northover f06df5866f X86: use vpsllvd (& friends) for 16-bit shifts on Haswell
llvm-svn: 201558
2014-02-18 11:15:32 +00:00
Elena Demikhovsky 750498c77b AVX-512: implemented zext fron i1 to i16
llvm-svn: 201502
2014-02-17 07:29:33 +00:00
Elena Demikhovsky 1fad075974 AVX-512: simpyfied BUILD_VECTOR for masks; fixed cmp/test sequence
llvm-svn: 201487
2014-02-16 11:34:23 +00:00
Nico Rieck 7647178738 Fix broken CHECK lines
llvm-svn: 201479
2014-02-16 07:31:05 +00:00
Rafael Espindola 1f3de49f37 Use __literal16. It has been supported by the linker since 2005.
llvm-svn: 201365
2014-02-13 23:16:11 +00:00
Rafael Espindola 98f0cfdd90 .file is only available on ELF, use a triple instead of -march.
llvm-svn: 201337
2014-02-13 15:38:16 +00:00
Daniel Sanders 753e17629d Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.

Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
  (fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
  (should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
  (should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
  to be enabled regardless of default setting or -no-integrated-as.
  (should fix SystemZ buildbots)

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201333
2014-02-13 14:44:26 +00:00
Juergen Ributzka 2b97f9b211 [DAG] Fix the recognition of opaque constants in the SelectionDAGBuilder.
This fix checks the original LLVM IR node to identify opaque constants by
looking for the bitcast-constant pattern. Originally we looked at the generated
SDNode, but this might lead to incorrect results. The SDNode could have been
generated by an constant expression that was folded to a constant.

This fixes <rdar://problem/16050719>

llvm-svn: 201291
2014-02-13 04:19:26 +00:00
Andrea Di Biagio 386d566395 [X86] Teach the backend how to lower vector shift left into multiply rather than scalarizing it.
Instead of expanding a packed shift into a sequence of scalar shifts,
the backend now tries (when possible) to convert the vector shift into a
vector multiply.

Before this change, a shift of a MVT::v8i16 vector by a
build_vector of constants was always scalarized into a long sequence of "vector
extracts + scalar shifts + vector insert".
With this change, if there is SSE2 support, we emit a single vector multiply.

This change also affects SSE4.1, AVX, AVX2 shifts:
 - A shift of a MVT::v4i32 vector by a build_vector of non uniform constants
is now lowered when possible into a single SSE4.1 vector multiply.
 - Packed v16i16 shift left by constant build_vector are now expanded when
possible into a single AVX2 vpmullw.
This change also improves the lowering of AVX512f vector shifts.

Added test CodeGen/X86/vec_shift6.ll with some code examples that are affected
by this change.

llvm-svn: 201271
2014-02-12 23:42:28 +00:00
Daniel Sanders abe212a3b8 Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
It introduced multiple test failures in the buildbots.

llvm-svn: 201241
2014-02-12 15:39:20 +00:00
Daniel Sanders a7d504cf58 Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.

The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.

All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.

Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2686

llvm-svn: 201237
2014-02-12 14:44:54 +00:00
David Blaikie 6bd395f3f0 DebugInfo: Remove dependence on file numbering in the line table.
These tests were unnecessarily sensitive to the presence and ordering of
elements in the line table file_names list which will break on a future
change I'm working on.

llvm-svn: 201185
2014-02-11 21:46:46 +00:00
Robert Lougher 7d9084ffa1 Teach the DAGCombiner how to fold concat_vector nodes when the input is two
BUILD_VECTOR nodes, e.g.:

(concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4))
->
(BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)

This fixes an issue with AVX, where a sequence was not recognized as a 256-bit
vbroadcast due to the concat_vectors.

llvm-svn: 201158
2014-02-11 15:42:46 +00:00
Elena Demikhovsky 1f32c313f1 AVX: fixed a bug in LowerVECTOR_SHUFFLE
llvm-svn: 201140
2014-02-11 10:21:53 +00:00
Elena Demikhovsky 2aafc22ed9 AVX-512: Optimized BUILD_VECTOR pattern;
fixed encoding of VEXTRACTPS instruction.

llvm-svn: 201134
2014-02-11 07:25:59 +00:00
Quentin Colombet 64ca544d95 [CodeGenPrepare] Test case for the promotions that bypass the
profitability check due to some other checks in the addressing
mode matcher. I.e., test case for commit r201121.

<rdar://problem/16020230>

llvm-svn: 201132
2014-02-11 06:55:43 +00:00
Robert Lougher 48ee75b7e3 Test commit - added a new line to vec_shuf-insert.ll.
llvm-svn: 201083
2014-02-10 12:42:13 +00:00
Elena Demikhovsky 9f423d6f25 AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors.
llvm-svn: 201066
2014-02-10 07:02:39 +00:00