11222 Commits

Author SHA1 Message Date
Sanjay Patel 9bb601856e use SDValue methods directly instead of getNode()->* ; NFCI
llvm-svn: 227334
2015-01-28 18:01:31 +00:00
Michael Kuperstein 90e08320c9 [x32] Change the condition from bitness to LP64 for TCRETURNdi64.
TCRETURNmi64, which was mistakenly changed in r227307, will wait for another day.

llvm-svn: 227317
2015-01-28 16:11:35 +00:00
Michael Kuperstein 951995821a [X86] Reduce some 32-bit imuls into lea + shl
Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well.
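
A minimal sketch of the arithmetic behind this reduction, not the patch's actual code. The function names are hypothetical, and the lea-then-shl ordering and the 32-bit wraparound are assumptions made for illustration.

```python
def decompose_multiplier(n):
    """Return (k, c) with n == k * 2**c and k in {3, 5, 9}, or None."""
    if n <= 0:
        return None
    c = 0
    while n % 2 == 0:
        n //= 2
        c += 1
    return (n, c) if n in (3, 5, 9) else None


def mul_via_lea_shl(x, n, bits=32):
    """Multiply x by n using one lea-style step and one shift (hypothetical helper)."""
    parts = decompose_multiplier(n)
    if parts is None:
        raise ValueError("multiplier is not of the form k * 2**c with k in {3, 5, 9}")
    k, c = parts
    mask = (1 << bits) - 1
    # lea computes base + index * scale; with scale in {2, 4, 8} this yields x * k.
    lea = (x + x * (k - 1)) & mask
    # shl by c supplies the remaining factor of 2**c.
    return (lea << c) & mask
```

For example, a multiplier of 40 decomposes into k = 5 and c = 3, so the product is formed as (x + 4*x) << 3.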

Differential Revision: http://reviews.llvm.org/D7196

llvm-svn: 227308
2015-01-28 14:08:22 +00:00
Michael Kuperstein f387611ac2 [x32] Enable sibcall optimization on x32.
This includes two things:
1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness).
2) Allow LEA64_32 in MatchingStackOffset.

llvm-svn: 227307
2015-01-28 13:38:48 +00:00
Elena Demikhovsky 7b0dd39db6 AVX-512: Added FMA intrinsics with rounding mode
By Asaf Badouh and Elena Demikhovsky

Added special nodes for rounding: FMADD_RND, FMSUB_RND, etc.
This prevents nodes with rounding from being merged with other standard nodes.

llvm-svn: 227303
2015-01-28 10:21:27 +00:00
Craig Topper 7d3c6d307a [X86] Teach disassembler to handle illegal immediates on AVX512 integer compare instructions.
llvm-svn: 227302
2015-01-28 10:09:56 +00:00
Craig Topper 6772eac490 [X86] Merge printSSECC and printAVXCC. They only differed by an assertion.
llvm-svn: 227301
2015-01-28 10:09:52 +00:00
Alexey Samsonov 533948088e Revert "[x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector"
This reverts commits r226953 and r226974.

llvm-svn: 227248
2015-01-27 21:34:11 +00:00
Simon Pilgrim 0629ba1ad9 [X86][SSE] Float comparisons can sometimes be safely commuted
For ordered, unordered, equal and not-equal tests, packed float and double comparison instructions can be safely commuted without affecting the results. This patch checks the comparison mode of the (v)cmpps + (v)cmppd instructions and commutes the result if it can.
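
A rough sketch of the predicate check described above, assuming the 3-bit SSE predicate encoding of cmpps/cmppd (the wider AVX encoding is not modeled). The table and function names are hypothetical and are not taken from the patch.

```python
# SSE cmpps/cmppd predicate encodings (low 3 bits of the immediate).
SSE_CMP_PREDICATES = {
    0: "eq", 1: "lt", 2: "le", 3: "unord",
    4: "neq", 5: "nlt", 6: "nle", 7: "ord",
}

# Symmetric tests: swapping the operands cannot change the result.
COMMUTABLE = {"eq", "unord", "neq", "ord"}


def is_commutable_cmp(imm):
    """True if the comparison selected by imm is safe to commute (hypothetical helper)."""
    return SSE_CMP_PREDICATES[imm & 0x7] in COMMUTABLE
```

The less-than style predicates are excluded because swapping their operands changes which side is tested.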

Differential Revision: http://reviews.llvm.org/D7178

llvm-svn: 227145
2015-01-26 22:29:24 +00:00
Simon Pilgrim 9b7c00352d [X86][PCLMUL] Enable commutation for PCLMUL instructions
Patch to allow (v)pclmulqdq to be commuted - swaps the src registers and inverts the immediate (low/high) src mask.
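
A small model of the commutation, reading "inverts the immediate" as swapping the two quadword selector bits (bit 0 for the first source, bit 4 for the second). That reading and all names below are assumptions for illustration, not the patch's code.

```python
MASK64 = (1 << 64) - 1


def clmul64(a, b):
    """Carry-less (XOR-based) multiplication of two 64-bit values."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i
    return result


def pclmulqdq(src1, src2, imm):
    """Model of pclmulqdq on 128-bit integers: the immediate picks one qword per source."""
    a = (src1 >> 64) if imm & 0x01 else (src1 & MASK64)
    b = (src2 >> 64) if imm & 0x10 else (src2 & MASK64)
    return clmul64(a, b)


def commute_pclmul_imm(imm):
    """Swap the low/high selector bits so the sources can be exchanged."""
    src1_hi = imm & 0x01
    src2_hi = (imm >> 4) & 0x01
    return (src1_hi << 4) | src2_hi
```

Because carry-less multiplication is itself commutative, exchanging the sources and swapping the selector bits leaves the result unchanged.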

Differential Revision: http://reviews.llvm.org/D7180

llvm-svn: 227141
2015-01-26 22:00:18 +00:00
Alex Rosenberg b9fefdd215 Use a different encoding for debugtrap on PS4.
llvm-svn: 227116
2015-01-26 19:09:27 +00:00
Eric Christopher 8b7706517c Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be laid out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

llvm-svn: 227113
2015-01-26 19:03:15 +00:00
Sanjay Patel 805bc02c2b Model sqrtsd as a binary operation with one source operand tied to the destination (PR14221)
This patch fixes the following miscompile:

define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp {
  %0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind 
  %a0 = extractelement <2 x double> %0, i32 0
  %conv = fptrunc double %a0 to float
  %a1 = extractelement <2 x double> %0, i32 1
  %conv3 = fptrunc double %a1 to float
  tail call void @callee2(float %conv, float %conv3) nounwind
  ret void
}

Current codegen:

sqrtsd	%xmm0, %xmm1        ## high element of %xmm1 is undef here
xorps	%xmm0, %xmm0
cvtsd2ss	%xmm1, %xmm0
shufpd	$1, %xmm1, %xmm1
cvtsd2ss	%xmm1, %xmm1 ## operating on undef value
jmp	_callee

This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 ) 
which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).

All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ); 
this should be the final patch needed to resolve that bug.

Differential Revision: http://reviews.llvm.org/D6885

llvm-svn: 227111
2015-01-26 18:42:16 +00:00
Elena Demikhovsky 1a603b3f13 AVX-512: Changes in operations on mask registers for KNL and SKX
- Added KSHIFTB/D/Q for skx
- Added KORTESTB/D/Q for skx
- Fixed store operation for v8i1 type for KNL
- Store size of v8i1, v4i1 and v2i1 is changed to 8 bits

llvm-svn: 227043
2015-01-25 12:47:15 +00:00
Craig Topper ca8e179bc2 [X86] Give scalar VRNDSCALE instructions priority in AVX512 mode.
llvm-svn: 227039
2015-01-25 08:49:22 +00:00
Craig Topper 1d60952401 Simplify a multiclass. No functional change.
llvm-svn: 227038
2015-01-25 08:49:19 +00:00
Craig Topper e3155c96ee Remove tab characters. NFC
llvm-svn: 227036
2015-01-25 08:45:32 +00:00
Elena Demikhovsky a3232f764e Implemented cost model for masked load/store operations.
llvm-svn: 227035
2015-01-25 08:44:46 +00:00
Craig Topper 53a846764c [X86] Replace i32i8imm on SSE/AVX instructions with i32u8imm which will make the assembler bounds check them. It will also make them print as unsigned.
llvm-svn: 227032
2015-01-25 02:21:16 +00:00
Craig Topper fc946a0e6f [X86] Use u8imm in several places that used i32i8imm that don't require an i32 type.
llvm-svn: 227031
2015-01-25 02:21:13 +00:00
Craig Topper e7f6cf437c Remove tab characters. NFC.
llvm-svn: 227030
2015-01-25 02:21:11 +00:00
Bruno Cardoso Lopes ddcc2e31a7 [x86] Fix a comment
llvm-svn: 226974
2015-01-24 00:22:04 +00:00
Bruno Cardoso Lopes 56567f9135 [x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector
Handle the poor codegen for i64/x86mmx->v2i64 (%mm -> %xmm) moves. Instead of
using a stack store/load pair to do the job, use scalar_to_vector directly, which
in the MMX case can use movq2dq. This was the behavior prior to
improvements for vector legalization of extloads in r213897.

This commit fixes the regression and as a side effect also removes some
unnecessary shuffles.

In the new attached testcase, we go from:

pshufw  $-18, (%rdi), %mm0
movq    %mm0, -8(%rsp)
movq    -8(%rsp), %xmm0
pshufd  $-44, %xmm0, %xmm0
movd    %xmm0, %eax
...

To:

pshufw  $-18, (%rdi), %mm0
movq2dq %mm0, %xmm0
movd    %xmm0, %eax
...

Differential Revision: http://reviews.llvm.org/D7126
rdar://problem/19413324

llvm-svn: 226953
2015-01-23 22:44:16 +00:00
Reid Kleckner 5cc1569c54 Classify functions by EH personality type rather than using the triple
This mostly reverts commit r222062 and replaces it with a new enum. At
some point this enum will grow at least for other MSVC EH personalities.

Also beefs up the way we were sniffing the personality function.
Previously we would emit the Itanium LSDA despite using
__C_specific_handler.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6987

llvm-svn: 226920
2015-01-23 18:49:01 +00:00
Eric Christopher a1c6e0c8ce Remove some local variables in place of just querying for them
in the couple of asserts.

llvm-svn: 226917
2015-01-23 17:22:44 +00:00
Craig Topper 0271d10d35 [x86] Change u8imm operands to always print as unsigned. This makes shuffle masks and the like make way more sense.
llvm-svn: 226902
2015-01-23 08:00:59 +00:00
Craig Topper 46469aa4da [X86] Add IntrNoMem to the AVX512 conflict intrinsics.
llvm-svn: 226897
2015-01-23 06:11:45 +00:00
Simon Pilgrim 7e6d573e87 [X86][AVX] Added (V)MOVDDUP / (V)MOVSLDUP / (V)MOVSHDUP memory folding + tests.
Minor tweak: now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing.

Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP.

llvm-svn: 226873
2015-01-22 22:39:59 +00:00
Alexander Potapenko a007905e4e Mark |TLI| variables used to suppress -Wunused-variable warnings.
(These vars are only used in assertions)

llvm-svn: 226815
2015-01-22 13:03:33 +00:00
Elena Demikhovsky 150d9f3187 Fixed a bug in type legalizer for masked load/store intrinsics.
The problem occurs when after vectorization we have type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations; they will be submitted separately.

llvm-svn: 226808
2015-01-22 12:07:59 +00:00
Craig Topper e0c8e8f6a7 Revert r226798. Guess I missed the patterns.
llvm-svn: 226802
2015-01-22 09:01:20 +00:00
Craig Topper ffef4cf1e1 Use u8imm instead of i32i8imm on a couple instructions that have no patterns and thus no reason to use a larger operand size.
llvm-svn: 226798
2015-01-22 08:53:11 +00:00
Craig Topper 9b39e54001 [X86] Remove some unused multiclasses from AVX512 instruction file.
llvm-svn: 226797
2015-01-22 08:53:08 +00:00
Saleem Abdulrasool 10ed0babd3 ARM: fail less catastrophically on invalid Windows input
Windows supports a restricted set of relocations (compared to ARM ELF).  In some
cases, we may end up generating an unsupported relocation.  This can occur with
bad input to the assembler in particular (the frontend should never generate
code that cannot be compiled).  Generate an error rather than just aborting.

The change in the API is driven by the desire to provide a slightly more helpful
message for debugging purposes.

llvm-svn: 226779
2015-01-22 04:03:32 +00:00
Simon Pilgrim 5fa0fb23ca [X86][SSE] Missing SSE/AVX1 memory folding integer instructions
Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1.

The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests.

Differential Revision: http://reviews.llvm.org/D7094

llvm-svn: 226745
2015-01-21 23:43:30 +00:00
Simon Pilgrim b16b09b154 [X86][SSE] Added support for SSE3 lane duplication shuffle instructions
This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. Their main benefit is that many single source shuffles no longer need (pre-AVX) dual source instructions such as SHUFPD/SHUFPS, which cause extra moves and prevent load folds.

Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify these instructions as unary shuffles.

Also adds a missing tablegen pattern for MOVDDUP.

Differential Revision: http://reviews.llvm.org/D7042

llvm-svn: 226716
2015-01-21 22:44:35 +00:00
Simon Pilgrim 47af023ada [X86][SSE] movddup shuffle mask decodes
Patch to provide shuffle decodes and asm comments for the SSE3/AVX1 movddup double duplication instructions.

llvm-svn: 226705
2015-01-21 22:02:30 +00:00
Ahmed Bougacha 8f09e9f7c5 [X86] Declare SSE4.1/AVX2 vector extloads covered by PMOV[SZ]X legal.
Now that we can fully specify extload legality, we can declare them
legal for the PMOVSX/PMOVZX instructions.  This for instance enables
a DAGCombine to fire on code such as
  (and (<zextload-equivalent> ...), <redundant mask>)
to turn it into:
  (zextload ...)
as seen in the testcase changes.

There is one regression, in widen_load-2.ll: we're no longer able
to do store-to-load forwarding with illegal extload memory types.
This will be addressed separately.

Differential Revision: http://reviews.llvm.org/D6533

llvm-svn: 226676
2015-01-21 17:07:06 +00:00
Michael Kuperstein ada9fa1ca9 [x32] Fast ISel should use LEA64_32r instead of LEA32r to adjust addresses in x32 mode.
llvm-svn: 226661
2015-01-21 14:44:05 +00:00
Craig Topper 42b326ea12 [x86] Remove some unnecessary and slightly confusing typecasts from some patterns. I think it actually went i32->iPtr->i32 in some of these cases.
llvm-svn: 226647
2015-01-21 08:43:57 +00:00
Craig Topper 7ff6ab30a9 [X86] Convert all the i8imm used by AVX512 and MMX instructions to u8imm.
llvm-svn: 226646
2015-01-21 08:43:49 +00:00
Craig Topper 620b50cc23 [X86] Convert all the i8imm used by SSE and AVX instructions to u8imm.
This makes the assembler check their size and removes a hack from the disassembler to avoid sign extending the immediate.

llvm-svn: 226645
2015-01-21 08:15:54 +00:00
Craig Topper f38dea1cfa [x86] Add assembly parser bounds checking to the immediate value for cmpss/cmpsd/cmpps/cmppd.
llvm-svn: 226642
2015-01-21 06:07:53 +00:00
Craig Topper 9f4d485610 [x86] Add some mayLoad/hasSideEffects flags. Remove one that was already covered by a pattern.
llvm-svn: 226562
2015-01-20 12:15:30 +00:00
Simon Pilgrim 20bc37c7db [X86][AVX] Missing AVX1 memory folding float instructions
Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing).

Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense as it's trying to ensure that a scalar subtraction isn't folded as it 'would zero the top elts of the loaded vector' - this test just appears to be wrong to me.

Differential Revision: http://reviews.llvm.org/D7055

llvm-svn: 226513
2015-01-19 22:40:45 +00:00
Rafael Espindola 2658554aec Add r224985 back with fixes.
The fixes are to note that AArch64 has additional restrictions on when local
relocations can be used. In particular, ld64 requires that relocations to
cstring/cfstrings use linker visible symbols.

Original message:

In an assembly expression like

bar:
  .long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF is to use relocations with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 226503
2015-01-19 21:11:14 +00:00
Craig Topper f4bf9119a1 [x86] Change AVX512 intrinsics to take an 8-bit immediate for the comparison kind instead of a 32-bit immediate. This better aligns with the emitted instruction. It also matches SSE and AVX1 equivalents. Also add auto upgrade support.
llvm-svn: 226430
2015-01-19 06:07:27 +00:00
David Blaikie 9459832ebd std::unique_ptrify the MCStreamer argument to createAsmPrinter
llvm-svn: 226414
2015-01-18 20:29:04 +00:00
Saleem Abdulrasool c3f8ad3e83 X86: fix comment typo in AsmParser
Fix a typo.  NFC.

llvm-svn: 226313
2015-01-16 20:16:06 +00:00
Adam Nemet 3e8b22bc1b [AVX512] Add intrinsics for masked aligned FP loads and stores
Similar to the unaligned cases.

Test was generated with update_llc_test_checks.py.

Part of <rdar://problem/17688758>

llvm-svn: 226296
2015-01-16 18:50:09 +00:00
Andrea Di Biagio ae47bc6ab9 [X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0.
This patch disables target specific combine on X86ISD::INSERTPS dag nodes
if optlevel is CodeGenOpt::None.

The backend currently implements a target specific combine rule that converts
a vector load used by an INSERTPS dag node into a scalar load plus a
scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of
two instructions (i.e. a vector load plus INSERTPSrr).

However, the existing target combine rule on INSERTPS nodes only works under
the assumption that ISel will always be able to match an INSERTPSrm. This is
not true in general at -O0, since the backend only allows folding a load into
the memory operand of an instruction if the optimization level is not
CodeGenOpt::None.

In the example below:

//
__m128 test(__m128 a, __m128 *b) {
  __m128 c = _mm_insert_ps(a, *b, 1 << 6);
  return c;
}
//

Before this patch, at -O0, the backend would have canonicalized the load to 'b'
into a scalar load plus scalar_to_vector. Later on, ISel would have selected an
INSERTPSrr leaving the insertps mask in an inconsistent state:

  movss 4(%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].

With this patch, the backend avoids folding the vector load into the operand of
the INSERTPS. The new codegen at -O0 is:

  movaps (%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].

llvm-svn: 226277
2015-01-16 14:55:26 +00:00
Joerg Sonnenberger b6956e113a Support @PLT loads on 32bit x86.
llvm-svn: 226182
2015-01-15 17:59:02 +00:00
Craig Topper 9fdd078afb Hide some redundant AVX512 instructions from the asm parser, but force them to show up in the disassembler.
llvm-svn: 226155
2015-01-15 09:37:15 +00:00
Rafael Espindola 7244bb3c17 Revert "Add r224985 back with two fixes."
This reverts commit r225644 while I debug a regression.

llvm-svn: 226022
2015-01-14 19:07:23 +00:00
David Majnemer 7efc6139d9 Use the operand vector instead so inline assembly can be validated too
The buildbots got upset after r225941, this should hopefully fix things.

llvm-svn: 225954
2015-01-14 06:14:36 +00:00
Saleem Abdulrasool aa32297fb8 X86: only access operands if they are present
If there is no associated immediate (MS style inline asm), do not try to access
the operand, assume that it is valid.  This should fix the buildbots after SVN
r225941.

llvm-svn: 225950
2015-01-14 05:37:10 +00:00
JF Bastien eeea8970b4 Revert "Insert random noops to increase security against ROP attacks (llvm)"
This reverts commit:
http://reviews.llvm.org/D3392

llvm-svn: 225948
2015-01-14 05:24:33 +00:00
Saleem Abdulrasool ca24b1d638 X86: validate 'int' instruction
The int instruction takes as an operand an 8-bit immediate value.  Validate that
the input is valid rather than silently truncating the value.

llvm-svn: 225941
2015-01-14 05:10:21 +00:00
JF Bastien dcdd5ad252 Insert random noops to increase security against ROP attacks (llvm)
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.

Command line options:
  -noop-insertion // Enable noop insertion.
  -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion)
  -max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions.
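
The interaction of the two numeric options can be sketched as follows. This is a simplified model under stated assumptions, not the pass itself: instructions are plain strings, a single "nop" stands in for whatever noop encodings the pass emits, and the function and parameter names are hypothetical.

```python
import random


def insert_noops(instructions, percentage=50, max_noops=1, rng=None):
    """Prepend noops by rolling the dice max_noops times per instruction."""
    rng = rng or random.Random()
    out = []
    for inst in instructions:
        for _ in range(max_noops):
            # Each roll succeeds with the given percentage, so max_noops is an
            # upper bound on the noops inserted, not a guaranteed count.
            if rng.random() * 100 < percentage:
                out.append("nop")
        out.append(inst)
    return out
```

With the defaults (50%, one roll), roughly half of the instructions get a noop prepended; the original instruction order is always preserved.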

In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future).
  -fdiversify

This is the llvm part of the patch.
clang part: D3393

http://reviews.llvm.org/D3392
Patch by Stephen Crane (@rinon)

llvm-svn: 225908
2015-01-14 01:07:26 +00:00
Adam Nemet e5dbcb7fd0 [AVX512] Unpack support in new shuffle lowering
This now handles both 32 and 64-bit element sizes.

In this version, the tests are in vector-shuffle-512-v8.ll, canonicalized by
Chandler's update_llc_test_checks.py.

Part of <rdar://problem/17688758>

llvm-svn: 225838
2015-01-13 22:20:18 +00:00
Adam Nemet 67c8484794 [AVX512] Add pretty-printing of shuffle mask for unpacks
llvm-svn: 225837
2015-01-13 22:20:14 +00:00
Reid Kleckner 3542ace6ef Rename llvm.recoverframeallocation to llvm.framerecover
This name is less descriptive, but it sort of puts things in the
'llvm.frame...' namespace, relating it to frameallocate and
frameaddress. It also avoids using "allocate" and "allocation" together.

llvm-svn: 225752
2015-01-13 01:51:34 +00:00
Reid Kleckner e9b8931873 Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics
These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.

Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.

These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.

Reviewers: echristo, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D6493

llvm-svn: 225746
2015-01-13 00:48:10 +00:00
Simon Pilgrim d88ab87064 [X86][SSE] Minor regression fix for r225551
r225551 vector byte shuffle optimization caused an assertion failure, as fully zeroable vectors can be produced under certain circumstances. This fix drops the assert and returns a zero vector where the assert would have failed.

llvm-svn: 225718
2015-01-12 22:38:08 +00:00
Ahmed Bougacha 291833b959 [X86] Also create+widen FMIN/FMAX nodes for v2f32.
This happens in the HINT benchmark, where the SLP-vectorizer created
v2f32 fcmp/select code.  The "correct" solution would have been to
teach the vectorizer cost model that v2f32 isn't legal (because really,
it isn't), but if we can vectorize we might as well do so.

We legalize these v2f32 FMIN/FMAX nodes by widening to v4f32 later on.
v3f32 were already widened to v4f32 by the generic unroll-and-build-vector
legalization.

rdar://15763436
Differential Revision: http://reviews.llvm.org/D6557

llvm-svn: 225691
2015-01-12 20:31:30 +00:00
Rafael Espindola d9c3e308f5 Add r224985 back with two fixes.
One is that AArch64 has additional restrictions on when local relocations can
be used. We have to take those into consideration when deciding to put a L
symbol in the symbol table or not.

The other is that ld64 requires the relocations to cstring to use linker
visible symbols on AArch64.

Thanks to Michael Zolotukhin for testing this!

Remove doesSectionRequireSymbols.

In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF is to use relocations with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 225644
2015-01-12 18:13:07 +00:00
Simon Pilgrim b5869f6c7c [X86][SSE] Minor fix to VPBLENDW AVX2 commutation.
D6015 / rL221313 enabled commutation for SSE immediate blend instructions, but due to a typo the AVX2 VPBLENDW ymm instructions weren't flagged as commutative along with the others in the tables, but were still being commuted in code and tested for.

llvm-svn: 225612
2015-01-11 22:08:01 +00:00
David Majnemer 14141f941a Revert most of r225597
We can't rely on a DataLayout enlightened constant folder.

llvm-svn: 225599
2015-01-11 07:29:51 +00:00
David Majnemer 292d0c796b X86: Properly decode shuffle masks when the constant pool type is weird
It's possible for the constant pool entry for the shuffle mask to come
from a completely different operation.  This occurs when Constants have
the same bit pattern but have different types.

Make DecodePSHUFBMask tolerant of types which, after a bitcast, are
appropriately sized vector types.

This fixes PR22188.

llvm-svn: 225597
2015-01-11 05:08:57 +00:00
Saleem Abdulrasool 9cf2679d3b X86: teach X86TargetLowering about L,M,O constraints
Teach the ISelLowering for X86 about the L,M,O target specific constraints.
Although, for the moment, clang performs constraint validation and prevents
passing along inline asm which may have immediate constant constraints violated,
the backend should be able to cope with the invalid inline asm a bit better.

llvm-svn: 225596
2015-01-11 04:39:24 +00:00
Simon Pilgrim 94a4cc027a [X86][SSE] Improved (v)insertps shuffle matching
In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable.

This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs needs to be referenced.

We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function.

Differential Revision: http://reviews.llvm.org/D6879

llvm-svn: 225589
2015-01-10 19:45:33 +00:00
Simon Pilgrim ec1f2c2cab [X86][SSE] Avoid vector byte shuffles with zero by using pshufb to create zeros
pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and OR the results when the used inputs from a vector are all zeroable.
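
The zeroing behaviour this relies on can be modeled in a few lines. This is a sketch of the 128-bit pshufb semantics on Python lists, with a hypothetical function name; it does not reproduce the lowering code.

```python
def pshufb(src, control):
    """Model of 128-bit pshufb: each control byte selects a source byte or zero."""
    assert len(src) == 16 and len(control) == 16
    # A control byte with its high bit set produces zero; otherwise the low
    # four bits index into the source vector.
    return [0 if c & 0x80 else src[c & 0x0F] for c in control]
```

A single pshufb with 0x80 in the zeroable lanes therefore produces the same result as shuffling against a zero vector and ORing.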

Differential Revision: http://reviews.llvm.org/D6878

llvm-svn: 225551
2015-01-09 22:03:19 +00:00
Lang Hames 1e923ec122 Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revision
introduced.

A test case for the bug was already committed in r225385.

Patch by Rafael Espindola.

llvm-svn: 225534
2015-01-09 18:55:42 +00:00
Chandler Carruth 685b1803ab [x86] Add a flag to control the vector shuffle legality predicates that
complements the new vector shuffle lowering code path. This flag,
naturally, is *off* because we've not tested or evaluated the results of
this at all. However, the flag will make it much easier to evaluate
whether we can be this aggressive and whether there are missing vector
shuffle lowering optimizations.

llvm-svn: 225491
2015-01-09 01:24:36 +00:00
Ahmed Bougacha d716121888 [X86] Reflow comment. NFC.
llvm-svn: 225455
2015-01-08 17:49:48 +00:00
Michael Kuperstein 46f7d525c3 [X86] Don't try to generate direct calls to TLS globals
The call lowering assumes that if the callee is a global, we want to emit a direct call.
This is correct for regular globals, but not for TLS ones.

Differential Revision: http://reviews.llvm.org/D6862

llvm-svn: 225438
2015-01-08 11:50:58 +00:00
Craig Topper 7c10252943 [X86] Don't print 'dword ptr' or 'qword ptr' on the operand to some of the LEA variants in Intel syntax. The memory operand is inherently unsized.
llvm-svn: 225432
2015-01-08 07:41:30 +00:00
Ahmed Bougacha 2b6917b020 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.
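
As a rough illustration of why a second type is needed, the legality table can be thought of as keyed by both the result type and the memory type. The class below is a hypothetical model with string type names and a "Legal" default chosen for the example; it is not the SelectionDAG API.

```python
LEGAL = "Legal"
EXPAND = "Expand"


class LoadExtActions:
    """Hypothetical legality table keyed by (extension kind, result type, memory type)."""

    def __init__(self, default=LEGAL):
        self._default = default
        self._actions = {}

    def set_action(self, ext_kind, result_type, memory_type, action):
        self._actions[(ext_kind, result_type, memory_type)] = action

    def get_action(self, ext_kind, result_type, memory_type):
        key = (ext_kind, result_type, memory_type)
        return self._actions.get(key, self._default)

    def is_legal(self, ext_kind, result_type, memory_type):
        return self.get_action(ext_kind, result_type, memory_type) == LEGAL


def avx_example_table():
    """Encode the two cases from the message: same memory type, different results."""
    table = LoadExtActions()
    table.set_action("zext", "v4i32", "v4i8", LEGAL)
    table.set_action("zext", "v4i64", "v4i8", EXPAND)
    return table
```

With only the memory type as the key, the two v4i8 entries would collide and could not be given different actions.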

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Matthias Braun ada0adf396 X86: VZeroUpperInserter: shortcut should not trigger if we have any function live-ins.
llvm-svn: 225419
2015-01-08 00:33:48 +00:00
Ahmed Bougacha 67dd2d25a3 [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.
A few loops do trickier things than just iterating on an MVT subset,
so I'll leave them be for now.
Follow-up of r225387.

llvm-svn: 225392
2015-01-07 21:27:10 +00:00
Ahmed Bougacha b994d0c0c5 [X86] Fix 512->256 typo in comments. NFC.
llvm-svn: 225367
2015-01-07 19:38:50 +00:00
David Majnemer 4d77fdf311 X86: Allow the stack probe size to be configurable per function
LLVM emits stack probes on Windows targets to ensure that the stack is
correctly accessed.  However, the amount of stack allocated before
emitting such a probe is hardcoded to 4096.

It is desirable to have this be configurable so that a function might
opt-out of stack probes.  Our level of granularity is at the function
level instead of, say, the module level to permit proper generation of
code after LTO.

Patch by Andrew H!

N.B.  The inliner needs to be updated to properly consider what happens
after inlining a function with a specific stack-probe-size into another
function with a different stack-probe-size.

llvm-svn: 225360
2015-01-07 18:14:07 +00:00
Ahmed Bougacha aa2d290997 [X86] Teach FCOPYSIGN lowering to recognize constant magnitudes.
For code like:
    float foo(float x) { return copysign(1.0, x); }
We used to generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    movss  <1.000000e+00>, %xmm1
    andps  <nan>, %xmm1
    orps   %xmm0, %xmm1
Basically doing an abs(1.0f) in the two middle instructions.

We now generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    orps   <1.000000e+00,0,0,0>, %xmm0
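
The bit-level effect of the new two-instruction sequence can be sketched on single-precision values. This models only the scalar arithmetic, under the assumption that the magnitude constant's absolute value is folded ahead of time; the helper names are hypothetical.

```python
import struct

SIGN_MASK = 0x80000000  # bit pattern of -0.0f


def f32_bits(x):
    """Reinterpret a float as its 32-bit IEEE-754 pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def copysign_const(magnitude, x):
    """copysign(magnitude, x) for a constant magnitude, as one AND plus one OR."""
    mag_bits = f32_bits(abs(magnitude))   # folded once, since magnitude is constant
    sign_bit = f32_bits(x) & SIGN_MASK    # andps with the -0.0 mask
    return bits_f32(sign_bit | mag_bits)  # orps with the pre-computed magnitude
```

Since the magnitude is known up front, its sign-clearing AND disappears, which is the step the old sequence performed at run time.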

Builds on cleanups r223415, r223542.
rdar://19049548
Differential Revision: http://reviews.llvm.org/D6555

llvm-svn: 225357
2015-01-07 17:33:03 +00:00
Craig Topper 39354e1b1a [X86] Merge a switch statement inside a default case of another switch statement on the same variable. There was no additional code in the default so this should be no functional change.
llvm-svn: 225345
2015-01-07 08:10:38 +00:00
Craig Topper 8b3c47ca57 [X86] Don't mark the shift by 1 instructions as isConvertibleToThreeAddress. There is no handling for them.
llvm-svn: 225344
2015-01-07 08:10:36 +00:00
Craig Topper 23fa478709 [X86] Remove some unused TYPE enums from the disassembler.
llvm-svn: 225343
2015-01-07 07:47:52 +00:00
Lang Hames 66f755f84f Revert r224935 "Refactor duplicated code. No intended functionality change."
This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin.
Reverting to get the bots green while I track down the source of the changed
behavior.

llvm-svn: 225311
2015-01-06 23:04:36 +00:00
Craig Topper 639445494f [X86] Add OpSize32 to XBEGIN_4. Add XBEGIN_2 with OpSize16.
Requires new AsmParserOperand types that detect 16-bit and 32/64-bit mode so that we choose the right instruction based on default sizing without predicates. This is necessary since predicates mess up the disassembler table building.

llvm-svn: 225256
2015-01-06 08:59:30 +00:00
Craig Topper ddbf51f904 [X86] Make isel select the 2-byte register form of INC/DEC even in non-64-bit mode. Convert to the 1-byte form in non-64-bit mode as part of MCInst lowering.
Overall this seems simpler. It reduces duplication of patterns between both modes and it simplifies the memory folding/unfolding tables as they don't need to create fake instructions just to keep track of 64-bitness.

llvm-svn: 225252
2015-01-06 07:35:50 +00:00
David Majnemer 29c52f7449 X86: Don't make illegal GOTTPOFF relocations
"ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF
relocations target a movq or addq instruction.

Prohibit the truncation of such loads to movl or addl.

This fixes PR22083.

Differential Revision: http://reviews.llvm.org/D6839

llvm-svn: 225250
2015-01-06 07:12:52 +00:00
Craig Topper 0f2c4ac649 [X86] Remove 16-bit and 32-bit offset jump instructions from the AsmParser. We always select the 8-bit size and let the assembler backend relax to the larger size.
llvm-svn: 225243
2015-01-06 04:23:57 +00:00
Craig Topper 49758aab94 [X86] Make isel select the shorter form of jump instructions instead of the long form.
The assembler backend will relax to the long form if necessary. This removes a swap from long form to short form in the MCInstLowering code. Selecting the long form used to be required by the old JIT.

llvm-svn: 225242
2015-01-06 04:23:53 +00:00
Lang Hames 04b37c4043 Revert r225048: It broke ObjC on AArch64.
I've filed http://llvm.org/PR22100 to track this issue.

llvm-svn: 225228
2015-01-06 00:54:32 +00:00
Brad Smith e78889c669 Remove X86 .quad workaround for buggy GNU assembler on OpenBSD / Bitrig.
llvm-svn: 225227
2015-01-06 00:53:52 +00:00
Simon Pilgrim 4c55af6850 [X86][SSE] lowerVectorShuffleAsByteShift tidyup
Removed local isSequential predicate and use standard helper isSequentialOrUndefInRange instead.

llvm-svn: 225216
2015-01-05 22:08:48 +00:00
Simon Pilgrim 71b96b35e1 [X86][SSE] Fixed description for isSequentialOrUndefInRange. NFC.
llvm-svn: 225202
2015-01-05 21:09:48 +00:00
Craig Topper d3c02f177a Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.
llvm-svn: 225160
2015-01-05 10:15:49 +00:00
Craig Topper dc2fc8035d [X86] Remove the predicates from the register forms of the 2-byte inc and dec instructions. Remove the 32-bit mode only versions that existed for the disassembler. Move the patterns out of the instructions so they can still be qualified with predicates.
llvm-svn: 225157
2015-01-05 08:19:12 +00:00
Craig Topper 3dcdde2e92 [X86] Simplify code a little by just summing flags instead of conditionally incrementing. NFC
llvm-svn: 225156
2015-01-05 08:19:10 +00:00
Craig Topper 859677edef [X86] Remove unnecessary redeclaration of a variable with the same assignment as the beginning of the function. NFC.
llvm-svn: 225155
2015-01-05 08:19:07 +00:00