Commit Graph

11554 Commits

Author SHA1 Message Date
Sanjay Patel 912315811e fix typos; NFC
llvm-svn: 235896
2015-04-27 17:03:31 +00:00
Elena Demikhovsky a480ef5494 AVX-512: added calling conventions for i1 vectors.
Fixed bug: https://llvm.org/bugs/show_bug.cgi?id=20724

llvm-svn: 235889
2015-04-27 15:11:19 +00:00
Elena Demikhovsky d1084c5b3f AVX-512: Extend/Truncate operations for SKX,
SETCC for bit-vectors

llvm-svn: 235875
2015-04-27 12:57:59 +00:00
Simon Pilgrim 4f683c264a [X86][SSE] Add v16i8/v32i8 multiplication support
Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized.

The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result.

Differential Revision: http://reviews.llvm.org/D9115

llvm-svn: 235837
2015-04-27 07:55:46 +00:00
Lang Hames 9ff69c8f4d [AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr.
AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a
reference for this is crufty.

llvm-svn: 235752
2015-04-24 19:11:51 +00:00
Sanjay Patel cab567873f [x86] Add store-folded memop patterns for vcvtps2ph
Differential Revision: http://reviews.llvm.org/D7296

llvm-svn: 235517
2015-04-22 16:11:19 +00:00
Andrea Di Biagio 6cd2f42fac [X86][AVX] Fix failure due to a missing ISel pattern to select VBROADCAST nodes (PR23259).
This fixes a regression introduced at revision 218263.

On AVX, if we optimize for size, a splat build_vector of a load
is lowered into a VBROADCAST node. This is done even if the value type of the
splat build_vector node is v2i64.

Since AVX doesn't support v2f64/v2i64 broadcasts, revision 218263 added two
extra tablegen patterns to allow selecting a VMOVDDUPrm from an X86VBroadcast
where the scalar element comes from a loadi64/loadf64.

However, revision 218263 forgot to add an extra fallback pattern for the case
where we have a X86VBroadcast of a loadi64 with multiple uses.

This patch adds the missing tablegen pattern in X86InstrSSE.td.
This patch also adds an extra test to 'splat-for-size.ll' to verify that ISel
doesn't crash with a 'fatal error in the backend' due to a missing AVX pattern
to select v2i64 X86ISD::BROADCAST nodes.

llvm-svn: 235509
2015-04-22 14:53:39 +00:00
Lang Hames 65613a634a [patchpoint] Add support for symbolic patchpoint targets to SelectionDAG and the
X86 backend.

The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.

llvm-svn: 235483
2015-04-22 06:02:31 +00:00
Sanjay Patel fe1365ac50 [x86] allow 64-bit extracted vector element integer stores on a 32-bit system
With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system
even though 64-bit integers are not legal types.

So instead of producing this:

  pshufd	$229, %xmm0, %xmm1      ## xmm1 = xmm0[1,1,2,3]
  movd	%xmm0, (%eax)
  movd	%xmm1, 4(%eax)

We can do:

  movq %xmm0, (%eax)

This is a fix for the problem noted in D7296.

Differential Revision: http://reviews.llvm.org/D9134

llvm-svn: 235460
2015-04-22 00:24:30 +00:00
Matthias Braun 9e9e8b3230 X86: Match for X86ISD nodes in LowerBUILD_VECTOR instead of BUILD_VECTORCombine
There doesn't seem to be a reason to perform this target ISD node matching
in an DAGCombine, moving it to lowering fixes PR23296.

Differential Revision: http://reviews.llvm.org/D9137

llvm-svn: 235394
2015-04-21 17:21:36 +00:00
Elena Demikhovsky 0e6d6d54ce AVX-512: Added VPMOVx2M instructions for SKX,
fixed encoding of VPMOVM2x.

llvm-svn: 235385
2015-04-21 14:38:31 +00:00
Elena Demikhovsky 431b81e41f AVX-512: Added VPTESTM and VPTESTNM instructions for SKX
llvm-svn: 235383
2015-04-21 13:13:46 +00:00
Elena Demikhovsky 50b88ddb87 AVX-512: Added logical and arithmetic instructions for SKX
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 235375
2015-04-21 10:27:40 +00:00
Simon Pilgrim 398ce22b86 [X86][SSE] Provide execution domains for scalar floating point operations
This is an updated version of Chandler's patch D7402 that got accepted but never committed, and has bit-rotted a bit since.

I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests.

Differential Revision: http://reviews.llvm.org/D9095

llvm-svn: 235372
2015-04-21 08:40:22 +00:00
Matthias Braun b6b5aaad98 X86: Do not select X86 custom vector nodes if operand types don't match
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.

Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296

Differential Revision: http://reviews.llvm.org/D9120

llvm-svn: 235367
2015-04-21 01:13:41 +00:00
Andrea Di Biagio 98c367093d [X86][FastIsel] Fix assertion failure when selecting int-to-double conversion (PR23273).
This fixes a regression introduced at revision 231243.
The target-independent selection algorithm in FastISel knows how to select
a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the
tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr.

Method X86FastISel::X86SelectSIToFP was therefore working under the
wrong assumption that the target was AVX. That assumption was incorrect since
we can have a target that is neither AVX nor SSE.

So, rather than asserting for the presence of AVX, we should have had an
early exit from 'X86SelectSIToFP' if the target was not AVX.
This patch fixes the issue replacing the invalid assertion with an early exit.

Thanks to Dimitry Andric for reporting this problem and for providing a small
reproducible testcase. Added test pr23273.ll.

llvm-svn: 235295
2015-04-20 11:56:59 +00:00
Simon Pilgrim 749953eebb [X86][SSE] Fix for getScalarValueForVectorElement to detect scalar sources requiring truncation.
The fix ensures that scalar sources inserted into a vector are the correct bit size.

Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support.

llvm-svn: 235281
2015-04-19 22:16:49 +00:00
Craig Topper 43d413b698 Remove unnecessary include and probably a layering violation.
llvm-svn: 235262
2015-04-19 00:57:33 +00:00
Sanjay Patel 2161c49a4e [X86, AVX] add an exedepfix entry for vmovq == vmovlps == vmovlpd
This is the AVX extension of r235014:
http://llvm.org/viewvc/llvm-project?view=revision&revision=235014

Review:
http://reviews.llvm.org/D8691

llvm-svn: 235210
2015-04-17 17:02:37 +00:00
Rafael Espindola 7f4e07befc Move AliasedSymbol to MachObjectWriter.
It was only used by MachO.
Part of pr19627.

llvm-svn: 235185
2015-04-17 12:28:43 +00:00
Sanjay Patel c03d93baa0 [X86] add an exedepfix entry for movq == movlps == movlpd
This is a 1-line patch (with a TODO for AVX because that will affect
even more regression tests) that lets us substitute the appropriate
64-bit store for the float/double/int domains.

It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and 
0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice.

Differential Revision: http://reviews.llvm.org/D8691

llvm-svn: 235014
2015-04-15 15:47:51 +00:00
Sanjay Patel 7024b8121a [x86] Implement combineRepeatedFPDivisors
Set the transform bar at 2 divisions because the fastest current
x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle
latency (best case) relative to a 5 cycle multiplier. 
So that's the worst case for this transform (no latency win), 
but multiplies are obviously pipelined while divisions are not,
so there's still a big throughput win which we would expect to
show up in typical FP code.

These are the sequences I'm comparing:

  divss   %xmm2, %xmm0
  mulss   %xmm1, %xmm0
  divss   %xmm2, %xmm0

Becomes:

  movss   LCPI0_0(%rip), %xmm3    ## xmm3 = mem[0],zero,zero,zero
  divss   %xmm2, %xmm3
  mulss   %xmm3, %xmm0
  mulss   %xmm1, %xmm0
  mulss   %xmm3, %xmm0

[Ignore for the moment that we don't optimize the chain of 3 multiplies
into 2 independent fmuls followed by 1 dependent fmul...this is the DAG
version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that,
then the transform becomes even more profitable on all targets.]

Differential Revision: http://reviews.llvm.org/D8941

llvm-svn: 235012
2015-04-15 15:22:55 +00:00
Rafael Espindola 5560a4cfbd Use raw_pwrite_stream in the object writer/streamer.
The ELF object writer will take advantage of that in the next commit.

llvm-svn: 234950
2015-04-14 22:14:34 +00:00
Krzysztof Parzyszek a46c36b8f4 Allow memory intrinsics to be tail calls
llvm-svn: 234764
2015-04-13 17:16:45 +00:00
Alexander Kornienko f817c1cb9a Use 'override/final' instead of 'virtual' for overridden methods
The patch is generated using clang-tidy misc-use-override check.

This command was used:

  tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \
    -checks='-*,misc-use-override' -header-filter='llvm|clang' \
    -j=32 -fix -format

http://reviews.llvm.org/D8925

llvm-svn: 234679
2015-04-11 02:11:45 +00:00
Benjamin Kramer 619c4e57ba Reduce dyn_cast<> to isa<> or cast<> where possible.
No functional change intended.

llvm-svn: 234586
2015-04-10 11:24:51 +00:00
Rafael Espindola 49286e9f4a clang-format bits of code to make a followup patch easy to read.
llvm-svn: 234519
2015-04-09 18:32:58 +00:00
Rafael Espindola df7305a438 Don't repeat name in comment. NFC.
llvm-svn: 234506
2015-04-09 17:10:57 +00:00
Rafael Espindola b91455b5c0 Refactor a lot of duplicated code for stub output.
This also moves it earlier so that it they are produced before we print
an end symbol for the data section.

llvm-svn: 234315
2015-04-07 13:42:44 +00:00
Simon Pilgrim 49ba9b8e2f [X86][SSE] Use (V)PINSRB for direct byte insertion in 16i8 buildvector on SSE4.1 targets
This patch allows SSE4.1 targets to use (V)PINSRB to create 16i8 vectors by inserting i8 scalars directly into a XMM register instead of merging pairs of i8 scalars into a i16 and using the SSE2 PINSRW instruction.

This allows folding of byte loads and reduces scalar register usage as well.

Differential Revision: http://reviews.llvm.org/D8839

llvm-svn: 234193
2015-04-06 18:39:00 +00:00
Craig Topper 7ea899a15f [X86] Apply AddedComplexity consistently for similar patterns. This keeps them together in the DAGISel tables and reduces table size slightly.
llvm-svn: 234086
2015-04-04 04:22:12 +00:00
Craig Topper 3d44178733 [X86] Add a comment about the change in r234075.
llvm-svn: 234079
2015-04-04 02:31:43 +00:00
Craig Topper 9012028738 [X86] Don't use GR64 register 'and with immediate' instructions if the immediate is zero in the upper 33-bits or upper 57-bits. Use GR32 instructions instead.
Previously the patterns didn't have high enough priority and we would only use the GR32 form if the only the upper 32 or 56 bits were zero.

Fixes PR23100.

llvm-svn: 234075
2015-04-04 02:08:20 +00:00
David Majnemer 3337064a47 [WinEH] Sink UnwindHelp completely out of IR
We don't need to represent UnwindHelp in IR.  Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.

llvm-svn: 234061
2015-04-03 22:32:26 +00:00
Duncan P. N. Exon Smith 3bef6a3803 CodeGen: Assert that inlined-at locations agree
As a follow-up to r234021, assert that a debug info intrinsic variable's
`MDLocalVariable::getInlinedAt()` always matches the
`MDLocation::getInlinedAt()` of its `!dbg` attachment.

The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), but I'll let these assertions bake for a while
first.

If you have an out-of-tree backend that just broke, you're probably
attaching the wrong `DebugLoc` to a `DBG_VALUE` instruction.  The one
you want is the location that was attached to the corresponding
`@llvm.dbg.declare` or `@llvm.dbg.value` call that you started with.

llvm-svn: 234038
2015-04-03 19:20:26 +00:00
Simon Pilgrim 0184622bbc [X86] Added SSE4.2 CRC32 memory folding patterns + tests
llvm-svn: 234013
2015-04-03 14:24:40 +00:00
Simon Pilgrim 8dba5da06d [X86][3DNow] Added 3DNow! memory folding patterns + tests
llvm-svn: 234008
2015-04-03 11:50:30 +00:00
Peter Collingbourne 10d362c51b MC: For variable symbols, maintain MCSymbol::Section as a cache.
Fixes PR19582.

Previously, when an asm assignment (.set or =) was created, we would look up
the section immediately in MCSymbol::setVariableValue. This caused symbols
to receive the wrong section if the RHS of the assignment had not been seen
yet. This had a knock-on effect in the object file emitters, causing them
to emit extra symbols, or to give symbols the wrong visibility or the wrong
section. For example, in the following asm:

.data
.Llocal:

.text
leaq .Llocal1(%rip), %rdi
.Llocal1 = .Llocal2
.Llocal2 = .Llocal

the first assignment would give .Llocal1 a null section, which would never get
fixed up by the second assignment. This would cause the ELF object file emitter
to consider .Llocal1 to be an undefined symbol and give it external linkage,
even though .Llocal1 should not have been emitted at all in the object file.

Or in the following asm:

alias_to_local = Ltmp0
Ltmp0:

the Mach-O object file emitter would give the alias_to_local symbol a n_type
of N_SECT and a n_sect of 0.  This is invalid under the Mach-O specification,
which requires N_SECT symbols to receive a non-zero section number if the
symbol is defined in a section in the object file.

https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachORuntime/#//apple_ref/c/tag/nlist

After this change we do not look up the section when the assignment is created,
but instead look it up on demand and store it in Section, which is treated
as a cache if the symbol is a variable symbol.

This change also fixes a bug in MCExpr::FindAssociatedSection. Previously,
if we saw a subtraction, we would return the first referenced section, even in
cases where we should have been returning the absolute pseudo-section. Now we
always return the absolute pseudo-section for expressions that subtract two
section-derived expressions. This isn't always correct (e.g. if one of the
sections ends up being laid out at an absolute address), but it's probably
the best we can do without more context.

This allows us to remove code in two places where we appear to have been
working around this bug, in MachObjectWriter::markAbsoluteVariableSymbols
and in X86AsmPrinter::EmitStartOfAsmFile.

Re-applies r233595 (aka D8586), which was reverted in r233898.

Differential Revision: http://reviews.llvm.org/D8798

llvm-svn: 233995
2015-04-03 01:46:11 +00:00
Sanjay Patel eca590ffb3 [AVX] Improve insertion of i8 or i16 into low element of 256-bit zero vector
Without this patch, we split the 256-bit vector into halves and produced something like:
	movzwl	(%rdi), %eax
	vmovd	%eax, %xmm0
	vxorps	%xmm1, %xmm1, %xmm1
	vblendps	$15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]

Now, we eliminate the xor and blend because those zeros are free with the vmovd:
        movzwl  (%rdi), %eax
        vmovd   %eax, %xmm0

This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685

llvm-svn: 233941
2015-04-02 20:21:52 +00:00
Sanjay Patel 2bb5d695f9 [X86, AVX] adjust tablegen patterns to generate better code for scalar insertion into zero vector (PR23073)
For code like this:

define <8 x i32> @load_v8i32() {
  ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
}

We produce this AVX code:

_load_v8i32:                            ## @load_v8i32
  movl	$7, %eax
  vmovd	%eax, %xmm0
  vxorps	%ymm1, %ymm1, %ymm1
  vblendps	$1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
  retq

There are at least 2 bugs in play here:

    We're generating a blend when a move scalar does the same job using 2 less instruction bytes (see FIXMEs).
    We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd.

The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem.

[1] AddedComplexity has close to no documentation in the source. 
The best we have is this comment: "roughly corresponds to the number of nodes that are covered". 
It appears that x86 has bastardized this definition by inflating its values for some other
undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!). 

I searched my way to this page:
https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ

Differential Revision: http://reviews.llvm.org/D8794

llvm-svn: 233931
2015-04-02 17:56:17 +00:00
Elena Demikhovsky 1eeece1285 AVX-512: intrinsics for VPADD, VPMULDQ and VPSUB
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 233906
2015-04-02 10:51:40 +00:00
Peter Collingbourne 949eb3f6a7 Revert r233595, "MC: For variable symbols, maintain MCSymbol::Section as a cache."
llvm-svn: 233898
2015-04-02 07:02:51 +00:00
Benjamin Kramer 3a16a36bb6 [X86] Don't accidentally select shll $1, %eax when shrinking an immediate.
addl has higher throughput and this was needlessly picking a suboptimal
encoding causing PR23098.

I wish there was a way of doing this without further duplicating tbl-
generated patterns, but so far I haven't found one.

llvm-svn: 233832
2015-04-01 19:01:09 +00:00
David Majnemer a225a19dd0 [WinEH] Generate .xdata for catch handlers
This lets us catch exceptions in simple cases.

N.B. Things that do not work include (but are not limited to):
- Throwing from within a catch handler.
- Catching an object with a named catch parameter.
- 'CatchHigh' is fictitious, we aren't sure of its purpose.
- We aren't entirely efficient with regards to the number of EH states
  that we generate.
- IP-to-State tables are sensitive to the order of emission.

llvm-svn: 233767
2015-03-31 22:35:44 +00:00
Sanjay Patel 0b464dc4a6 typo; NFC
llvm-svn: 233761
2015-03-31 21:24:47 +00:00
Sanjay Patel 30d589536a [X86, AVX] fix zero-extending integer operand load patterns to use integer instructions
This is a follow-on to r233704 and another partial fix for PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685

llvm-svn: 233724
2015-03-31 18:43:43 +00:00
Sanjay Patel 2ae9943881 [X86, AVX] try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types
I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354)

It improves the v4i64 case although not optimally. This AVX codegen:

  vmovq {{.*#+}} xmm0 = mem[0],zero
  vxorpd %ymm1, %ymm1, %ymm1
  vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]

Becomes:

  vmovsd {{.*#+}} xmm0 = mem[0],zero

Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here:

    We're not handling v32i8 / v16i16.
    We're not getting the FP / int domains right for instruction selection.

But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64, 
I'm submitting this patch ahead of fixing the above.

Differential Revision: http://reviews.llvm.org/D8341

llvm-svn: 233704
2015-03-31 16:32:11 +00:00
Rafael Espindola dd3add6c60 Fix the operand encoding in the test instruction.
Fixes pr22995.

llvm-svn: 233686
2015-03-31 12:31:55 +00:00
Ahmed Bougacha 77b5bb7ce2 [X86] Generate MOVNT for all vector types.
We used to miss non-Q YMM integer vectors, and, non-Q/D XMM integer
vectors.
While there, change the v4i32 patterns to prefer MOVNTDQ.

llvm-svn: 233668
2015-03-31 03:16:51 +00:00
Eric Christopher f8019408dc Replace the MCSubtargetInfo parameter with a Triple when creating
an MCInstPrinter. Update all callers and use where we wanted a Triple
previously.

llvm-svn: 233648
2015-03-31 00:10:04 +00:00