Commit Graph

5747 Commits

Author SHA1 Message Date
Daniel Jasper 4d7b04384e Remove experimental options to control machine block placement.
This reverts r226034. Benchmarking with those flags has not revealed
anything interesting.

llvm-svn: 229648
2015-02-18 08:18:07 +00:00
Elena Demikhovsky 714f23bcdb AVX-512: Added support for FP instructions with embedded rounding mode.
By Asaf Badouh <asaf.badouh@intel.com>

llvm-svn: 229645
2015-02-18 07:59:20 +00:00
Craig Topper 55ac42426e [X86] Add another test case for the bug fixed in r229642. With the bug a vpsrldq was emitted instead of pslldq.
llvm-svn: 229643
2015-02-18 07:45:43 +00:00
Chandler Carruth 55553f5299 [x86] Rewrite the byte shift detection to not use boolean variables to
track state.

I didn't like this in the code review because the pattern tends to be
error prone, but I didn't see a clear way to rewrite it. Turns out that
there were bugs here, I found them when fuzz testing our shuffle
lowering for correctness on x86.

The core of the problem is that we need to consistently test all our
preconditions for the same directionality of shift and the same input
vector. Instead, formulate this as two predicates (one doesn't depend on
the input in any way), pass things like the directionality and input
vector as inputs, and loop over the alternatives.

This fixes a pattern of very rare miscompiles coming out of this code.
Turned up roughly 4 out of every 1 million v8 shuffles in my fuzz
testing. The new code is over half a million test runs with no failures
yet. I've also fuzzed every other function in the lowering code with
over 3.5 million test cases and not discovered any other miscompiles.

llvm-svn: 229642
2015-02-18 07:13:48 +00:00
Craig Topper b324e43aed [X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles.
llvm-svn: 229640
2015-02-18 06:24:44 +00:00
Andrea Di Biagio e7b58ee555 [X86][FastIsel] Teach how to select scalar integer to float/double conversions.
This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to 
float conversion, and how to select a (V)CVTSI2SDrr for an integer to double
conversion.

Added test 'fast-isel-int-float-conversion.ll'.

Differential Revision: http://reviews.llvm.org/D7698

llvm-svn: 229589
2015-02-17 23:40:58 +00:00
Rafael Espindola df19519800 Add r228939 back with a fix.
The problem in the original patch was not switching back to .text after printing
an eh table.

Original message:

On ELF, put PIC jump tables in a non executable section.

Fixes PR22558.

llvm-svn: 229586
2015-02-17 23:34:51 +00:00
Rafael Espindola 8c77768609 Add a test showing the problem in r228939.
If an EH table is printed in between the function and the jump table we would
fail to switch back to the text section to print the jump table.

llvm-svn: 229580
2015-02-17 23:21:46 +00:00
Simon Pilgrim 1d89a02abb [X86][SSE] Generalised unpckl/unpckh shuffle matching
Added commuted unpckl/unpckh shuffle matching patterns as many cases containing undefined lanes fail to commute by themselves.

Differential Revision: http://reviews.llvm.org/D7564

llvm-svn: 229571
2015-02-17 22:24:32 +00:00
Sanjay Patel 5562c554f0 use a triple instead of a cpu; less builbot sadness
llvm-svn: 229563
2015-02-17 21:59:54 +00:00
Rafael Espindola 928e9ae89c Add testcases I missed in r229541.
llvm-svn: 229542
2015-02-17 20:50:39 +00:00
Sanjay Patel 716fef68c7 make basic block label matching more flexible for less sad buildbots
llvm-svn: 229535
2015-02-17 20:29:31 +00:00
Sanjay Patel b811c1d6a5 prevent folding a scalar FP load into a packed logical FP instruction (PR22371)
Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors. 
That's what the hardware packed logical FP instructions define: 128-bit memory operands.
There are no scalar versions of these instructions...because this is x86.

Generating the wrong code (folding a scalar load into a 128-bit load) is still possible
using the peephole optimization pass and the load folding tables. We won't completely
solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other
places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl()
to make sure it isn't changing the size of the load.

Differential Revision: http://reviews.llvm.org/D7474

llvm-svn: 229531
2015-02-17 20:08:21 +00:00
Sanjay Patel ab7e86e5be Canonicalize splats as build_vectors (PR22283)
This is a follow-on patch to:
http://reviews.llvm.org/D7093

That patch canonicalized constant splats as build_vectors, 
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.

This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283

The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...

This improves an existing x86 AVX test and changes codegen in an ARM test.

Differential Revision: http://reviews.llvm.org/D7389

llvm-svn: 229511
2015-02-17 16:54:32 +00:00
Andrea Di Biagio b55666f7e3 [X86][FastISel] Add missing flag -fast-isel-abort to run lines in test fast-isel-fptrunc-fpext.ll.
Flag -fast-isel-abort is required in order to verify that X86FastISel
never fails to select FPExt (float-to-double) and FPTrunc (double-to-float).
No Functional change intended.

llvm-svn: 229489
2015-02-17 12:25:49 +00:00
Elena Demikhovsky ba84672519 AVX-512: changes in intel_ocl_bi calling conventions
- added mask types v8i1 and v16i1 to possible function parameters
- enabled passing 512-bit vectors in standard CC
- added a test for KNL intel_ocl_bi conventions

llvm-svn: 229482
2015-02-17 09:20:12 +00:00
Michael Kuperstein ff5acaf50c [X86] Combine vector anyext + and into a vector zext
Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements.
Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND.
This combine only covers a subset of the cases, but it's a start.

Differential Revision: http://reviews.llvm.org/D7666

llvm-svn: 229480
2015-02-17 08:22:51 +00:00
Chandler Carruth 55db07016e [x86] Teach the unpack lowering to try wider element unpacks.
This allows it to match still more places where previously we would have
to fall back on floating point shuffles or other more complex lowering
strategies.

I'm hoping to replace some of the hand-rolled unpack matching with this
routine is it gets more and more clever.

llvm-svn: 229463
2015-02-17 02:12:24 +00:00
Hal Finkel 7f957c17a0 Specify arch in test/CodeGen/X86/float-conv-elim.ll
This test was failing on non-x86 hosts because it specified a cpu of x86_64,
but not an architecture. x86_64 is obviously not a valid cpu on all
architectures.

llvm-svn: 229460
2015-02-17 00:11:19 +00:00
Cameron McInally c5764cbe4e [AVX512] Make 512b vector floating point rounds legal on AVX512.
llvm-svn: 229445
2015-02-16 22:15:42 +00:00
Simon Pilgrim b2c00f3286 [X86][SSE] Add SSE MOVQ instructions to SSEPackedInt domain
Patch to explicitly add the SSE MOVQ (rr,mr,rm) instructions to SSEPackedInt domain - prevents a number of costly domain switches.

Differential Revision: http://reviews.llvm.org/D7600

llvm-svn: 229439
2015-02-16 21:50:56 +00:00
Mehdi Amini 3e0023b8f6 SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too
Update SPARC tests to match.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229438
2015-02-16 21:47:58 +00:00
Craig Topper 49df44e2e2 [X86] Remove the multiply by 8 that goes into the shift constant for X86ISD::VSHLDQ and X86ISD::VSRLDQ. This simplifies the pattern matching in isel and allows these nodes to become the patterns embedded in the instruction.
llvm-svn: 229431
2015-02-16 20:52:07 +00:00
Chandler Carruth 1e57e2deb8 [x86] Add a generic unpack-targeted lowering technique. This can be used
to generically lower blends and is particularly nice because it is
available frome SSE2 onward. This removes a lot of the remaining domain
crossing blends in SSE2 code.

I'm hoping to replace some of the "interleaved" lowering hacks with
something closer to this which should be more principled. First, this
needs to learn how to detect and use other interleavings besides that of
the natural type provided. That will be a follow-up patch though.

llvm-svn: 229378
2015-02-16 12:28:18 +00:00
Chandler Carruth 50dc783d75 [x86] Switch this test to use checks generated by my update script. NFC
llvm-svn: 229377
2015-02-16 12:23:22 +00:00
Chandler Carruth c802085b3a [x86] Add initial basic support for forming blends of v16i8 vectors.
This blend instruction is ... really lame. The register usage is insane.
As a consequence this is probably only *barely* better than 2 pshufbs
followed by a por, and that mostly because it only has to read from
a single memory location.

However, this doesn't fix as much as I kind of expected, so more to go.
Pretty sure that the ordering and delegation of v16i8 is just really,
really bad.

llvm-svn: 229373
2015-02-16 10:58:23 +00:00
Chandler Carruth e8b558c336 [x86] Add some more test cases for i8 vector blends.
llvm-svn: 229372
2015-02-16 10:51:49 +00:00
Craig Topper 7e8dcef094 [X86] Add support for lowering shuffles to 256-bit PALIGNR instruction.
llvm-svn: 229359
2015-02-16 06:29:06 +00:00
Craig Topper b2b4f8a721 [X86] Remove some hard tab characters from tests.
llvm-svn: 229358
2015-02-16 06:29:02 +00:00
Chandler Carruth 87e580a659 [x86] Teach the 128-bit vector shuffle lowering routines to take
advantage of the existence of a reasonable blend instruction.

The 256-bit vector shuffle lowering has leveraged the general technique
of decomposed shuffles and blends for quite some time, but this never
made it back into the 128-bit code, and there are a large number of
patterns where this is substantially better. For example, this removes
almost all domain crossing in vector shuffles that involve some blend
and some permutation with SSE4.1 and later. See the massive reduction
in 'shufps' for integer test cases in this commit.

This isn't perfect yet for a few reasons:

1) The v8i16 shuffle lowering continues to plague me. We don't always
   form an unpack-based blend when that would be better. But the wins
   pretty drastically outstrip the losses here.
2) The v16i8 shuffle lowering is just a disaster here. I never went and
   implemented blend support here for some terrible reason. I'll do
   that next probably. I've not updated it for now.

More variations on this technique are coming as well -- we don't
shuffle-into-unpack or shuffle-into-palignr, both of which would also be
profitable.

Note that some test cases grow significantly in the number of
instructions, but I expect to actually be faster. We use
pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are
very likely to pipeline well (two ports on most modern intel chips) and
the blend is a *very* fast instruction. The domain switch penalty will
essentially always be more than a blend instruction, which is the only
increase in tree height.

llvm-svn: 229350
2015-02-16 01:52:02 +00:00
Chandler Carruth c06b7fbfc3 [x86] Clean up a few test cases with the update script. NFC
llvm-svn: 229349
2015-02-16 01:39:50 +00:00
Simon Pilgrim d4ed5df3a6 Added (still inefficient) shuffle test case for PR21138
llvm-svn: 229321
2015-02-15 18:21:39 +00:00
Simon Pilgrim 5a6375c3ba Added some test cases of missed opportunities to use unpckl/unpckh shuffles
llvm-svn: 229313
2015-02-15 15:07:45 +00:00
Simon Pilgrim 00bd79d794 [X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2
This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets.

It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions.

Differential Revision: http://reviews.llvm.org/D7596

llvm-svn: 229311
2015-02-15 13:19:52 +00:00
Chandler Carruth 6a61efdce5 [x86] Add the test case from PR22412, we now get this right even with
the new vector shuffle legality.

llvm-svn: 229310
2015-02-15 12:45:05 +00:00
Chandler Carruth bf0fb06e0d [x86] Teach the decomposed shuffle/blend lowering to use an early blend
when that will allow it to lower with a single permute instead of
multiple permutes.

It tries to detect when it will only have to do a single permute in
either case to maximize folding of loads and such.

This cuts a *lot* of the avx2 shuffle permute counts in half. =]

llvm-svn: 229309
2015-02-15 12:42:15 +00:00
Chandler Carruth 1b5285dd57 [SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats
directly into blends of the splats.

These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.

Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.

llvm-svn: 229308
2015-02-15 12:18:12 +00:00
Chandler Carruth 56e0ceda0d [x86] Stop shuffling zero vectors. =]
I was somewhat surprised this pattern really came up, but it does. It
seems better to just directly handle it than try to special case every
place where we end up forming a shuffle that devolves to a shuffle of
a zero vector.

llvm-svn: 229301
2015-02-15 10:34:52 +00:00
Chandler Carruth 62558c1d4d [x86] When splitting 256-bit vectors into 128-bit vectors, don't extract
subvectors from buildvectors. That doesn't really make any sense and it
breaks all of the down-stream matching of buildvectors to cleverly lower
shuffles.

With this, we now get the shift-based lowering of 256-bit vector
shuffles with AVX1 when we split them into 128-bit vectors. We also do
much better on the zero-extension patterns, although there remains quite
a bit of room for improvement here.

llvm-svn: 229299
2015-02-15 10:12:02 +00:00
Chandler Carruth 1c60d18aee [x86] Update some tests with the latest version of my script and llc.
This mostly adds some shuffle decode comments and cleans up indentation.

llvm-svn: 229296
2015-02-15 09:26:15 +00:00
Chandler Carruth 0ddfe0c7c5 [x86] Add a slight variation on some of the other generic shuffle
lowerings -- one which decomposes into an initial blend followed by
a permute.

Particularly on newer chips, blends are handled independently of
shuffles and so this is much less bottlenecked on the single port that
floating point shuffles are executed with on Intel.

I'll be adding this lowering to a bunch of other code paths in
subsequent commits to handle still more places where we can effectively
leverage blends when they're available in the ISA.

llvm-svn: 229292
2015-02-15 08:26:30 +00:00
Chandler Carruth 97381fd0af [x86] Add a test case for PR22390 which was a dup of PR22377 and fixed
by r229285. This is a nice different test case though, so I'd like to
have the extra testing of these kinds of patterns.

llvm-svn: 229286
2015-02-15 07:05:50 +00:00
Chandler Carruth 499d7332c5 [x86] Fix PR22377, a regression with the new vector shuffle legality
test.

This was just a matter of the DAG combine for vector shuffles being too
aggressive. This is a bit of a grey area, but I think generally if we
can re-use intermediate shuffles, we should. Certainly, given the test
cases I have available, this seems like the right call.

llvm-svn: 229285
2015-02-15 07:01:10 +00:00
Chandler Carruth fe69608839 [x86] Switch a collection of tests explicitly to the new vector shuffle
legality test (essentially, everything is legal).

I'm planning to make this the default shortly, but I'd like to fix
a collection of the bugs it exposes first, and this will let me easily
test them. It also showcases both the improvements and a few of the
regressions triggered by the change. The biggest improvements by far are
the significantly reduced shuffling and domain crossing in the combining
test case. The biggest regressions are missing some clever blending
patterns.

llvm-svn: 229284
2015-02-15 06:37:21 +00:00
Chandler Carruth 89a60770e0 [x86] Remove the now-default-on flag for the new vector shuffle lowering
strategy from a bunch of tests.

llvm-svn: 229283
2015-02-15 06:20:51 +00:00
Chandler Carruth 0613751dcb [x86] Teach my test updating script about another quirk of the printed
asm and port the mmx vector shuffle test to it.

Not thrilled with how it handles the stack manipulation logic, but I'm
much less bothered by that than I am by updating the test manually. =]
If anyone wants to teach the test checks management script about stack
adjustment patterns, that'd be cool too.

llvm-svn: 229268
2015-02-15 00:08:01 +00:00
Simon Pilgrim 31457d54f7 [X86][XOP] Enable commutation for XOP instructions
Patch to allow XOP instructions (integer comparison and integer multiply-add) to be commuted. The comparison instructions sometimes require the compare mode to be flipped but the remaining instructions can use default commutation modes.

This patch also sets the SSE domains of all the XOP instructions.

Differential Revision: http://reviews.llvm.org/D7646

llvm-svn: 229267
2015-02-14 22:40:46 +00:00
Andrea Di Biagio f54432388f [optnone] Skip pass Constant Hoisting on optnone functions.
Added test CodeGen/X86/constant-hoisting-optnone.ll to verify that
pass Constant Hoisting is not run on optnone functions.

llvm-svn: 229258
2015-02-14 15:11:48 +00:00
Simon Pilgrim 6eb925a3ed [X86] Ensure integer domain on scalar load/store stack folding tests. NFC
llvm-svn: 229257
2015-02-14 14:10:44 +00:00
Matthias Braun 33cc10724d Revert "On ELF, put PIC jump tables in a non executable section."
This reverts commit r228939.

The commit broke something in the output of exception handling tables on
darwin x86-64.

llvm-svn: 229203
2015-02-14 01:16:54 +00:00
Sanjay Patel baa6bc378f [SSE/AVX] Use multiclasses to reduce the mass of scalar math patterns; NFCI
This takes the preposterous number of patterns in this section
that were last added to in r219033 down to just plain obnoxious.

With a little more work, we might get this down to just comical.

I've added more test cases to the existing file that checks these
patterns, but it seems that some of these patterns simply don't
exist with today's shuffle lowering.

llvm-svn: 229158
2015-02-13 21:52:42 +00:00
Andrea Di Biagio b14ae8692d [CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to speculate calls to cttz/ctlz.
SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are
'cheap' for the target. Therefore, some of the logic in CodeGenPrepare
that was originally added at revision 224899 can now be removed.

This patch is basically a no functional change. It removes the duplicated
logic in CodeGenPrepare and converts all the existing target specific tests
for cttz/ctlz into SimplifyCFG tests.

Differential Revision: http://reviews.llvm.org/D7608

llvm-svn: 229105
2015-02-13 14:15:48 +00:00
Chandler Carruth d99f427e31 Revert a series of commits starting at r228886 which is triggering some
regressions for LLDB on Linux. Rafael indicated on lldb-dev that we
should just go ahead and revert these but that he wasn't at a computer.
The patches backed out are as follows:

r228980: Add support for having multiple sections with the name and ...
r228889: Invert the section relocation map.
r228888: Use the existing SymbolTableIndex intsead of doing a lookup.
r228886: Create the Section -> Rel Section map when it is first needed.

These patches look pretty nice to me, so hoping its not too hard to get
them re-instated. =D

llvm-svn: 229080
2015-02-13 07:52:39 +00:00
Craig Topper 916708f152 [X86] Add support for parsing and printing the mnemonic aliases for the XOP VPCOM instructions.
llvm-svn: 229078
2015-02-13 07:42:25 +00:00
Craig Topper d1e1bf5d78 Fix probable typo in test.
llvm-svn: 229070
2015-02-13 06:07:27 +00:00
Craig Topper 4e0700f365 [X86] Remove int_x86_sse2_psll_dq_bs and int_x86_sse2_psrl_dq_bs intrinsics. The builtins aren't used by clang.
llvm-svn: 229069
2015-02-13 06:07:24 +00:00
Rafael Espindola b6a812ebb1 Add support for having multiple sections with the same name and comdat.
Using this in combination with -ffunction-sections allows LLVM to output a .o
file with mulitple sections named .text. This saves space by avoiding long
unique names of the form .text.<C++ mangled name>.

llvm-svn: 228980
2015-02-12 23:29:51 +00:00
David Majnemer a12fcb790f X86: Don't crash if we can't decode the pshufb mask
Constant pool entries are uniqued by their contents regardless of their
type.  This means that a pshufb can have a shuffle mask which isn't a
simple array of bytes.

The code path which attempts to decode the mask didn't check for
failure, causing PR22559.

llvm-svn: 228979
2015-02-12 23:26:26 +00:00
Simon Pilgrim b4a0df9a4a Ensure integer domain on general shuffle stack folding tests
llvm-svn: 228972
2015-02-12 22:47:45 +00:00
David Blaikie 7548aeeb9f Remove typedef of a pointer type used in a gep to simplify migration of geps to a typeless-pointer future.
I'd modify my migration tool to account for this, but this is the only
instance of a typedef'd pointer type to a gep I found in the whole test
suite, so it didn't seem worthwhile.

llvm-svn: 228970
2015-02-12 22:45:25 +00:00
Rafael Espindola 203c5b9f39 On ELF, put PIC jump tables in a non executable section.
Fixes PR22558.

llvm-svn: 228939
2015-02-12 17:46:49 +00:00
Rafael Espindola 29786d4c16 Put each jump table in an independent section if the function is too.
This allows the linker to GC both, fixing pr22557.

llvm-svn: 228937
2015-02-12 17:16:46 +00:00
Michael Kuperstein f4d1aca568 [X86] Call frame optimization - allow stack-relative movs to be folded into a push
Since we track esp precisely, there's no reason not to allow this.

llvm-svn: 228924
2015-02-12 14:17:35 +00:00
Elena Demikhovsky d2cb3c8876 AVX-512: Fixed the "test" operation for i1 type
Using KORTESTW for comparison i1 value with zero was wrong since the instruction tests 16 bits.
KORTESTW may be used with KSHIFTL+KSHIFTR that clean the 15 upper bits.
I removed (X86cmp i1, 0) pattern and zero-extend i1 to i8 and then use TESTB.

There are some cases where i1 is in the mask register and the upper bits are already zeroed.
Then KORTESTW is the better solution, but it is subject for optimization.
Meanwhile, I'm fixing the correctness issue.

llvm-svn: 228916
2015-02-12 08:40:34 +00:00
Michael Kuperstein db95d04be4 [X86] A heuristic to estimate the size impact for converting stack-relative parameter movs to pushes
This gives a rough estimate of whether using pushes instead of movs is profitable, in terms of size.
We go over all calls in the MachineFunction and compute:
a) For each callsite that can not use pushes, the penalty of not having a reserved call frame.
b) For each callsite that can use pushes, the gain of actually replacing the movs with pushes (and the potential penalty of having to readjust the stack).

Differential Revision: http://reviews.llvm.org/D7561

llvm-svn: 228915
2015-02-12 08:36:35 +00:00
Ahmed Bougacha 24433a7005 [CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x).
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.

Differential Revision: http://reviews.llvm.org/D7571

llvm-svn: 228911
2015-02-12 06:15:29 +00:00
Simon Pilgrim 2a9a745328 [X86][SSE] Added dual vector truncation tests.
llvm-svn: 228857
2015-02-11 18:14:35 +00:00
Sanjay Patel afe251649b fixed to test features, not CPUs
llvm-svn: 228836
2015-02-11 15:00:41 +00:00
Sanjay Patel b53d82cbc5 fixed to test features, not CPUs
llvm-svn: 228835
2015-02-11 15:00:19 +00:00
Sanjay Patel 8b88bc91bd fixed to test features, not CPUs
llvm-svn: 228834
2015-02-11 14:58:25 +00:00
David Majnemer ca19485f08 X86: @llvm.frameaddress should defer to SelectionDAG for Win CFI
llvm-svn: 228754
2015-02-10 22:00:34 +00:00
David Majnemer 13d0b11d7b X86: Make @llvm.frameaddress work correctly with Windows unwind codes
Simply loading or storing the frame pointer is not sufficient for
Windows targets.  Instead, create a synthetic frame object that we will
lower later.  References to this synthetic object will be replaced with
the correct reference to the frame address.

llvm-svn: 228748
2015-02-10 21:22:05 +00:00
David Majnemer a7d908eb2b X86: Emit Win64 SaveXMM opcodes at the right offset in the right order
Walk the instructions marked FrameSetup and consider any stores of XMM
registers to the stack as needing a SaveXMM opcode.

This fixes PR22521.

Differential Revision: http://reviews.llvm.org/D7527

llvm-svn: 228724
2015-02-10 19:01:47 +00:00
Paul Robinson 848cf6aa3a Explicitly initialize a flag in a default constructor.
Works around a Visual C++ issue.

Patch by Douglas Yung!

llvm-svn: 228699
2015-02-10 15:30:02 +00:00
Simon Pilgrim d142ab7d08 [X86][AVX2] Missing AVX2 memory folding instructions
Added most of the missing vector folding patterns for AVX2 (as well as fixing the vpermpd and verpmq patterns)

Differential Revision: http://reviews.llvm.org/D7492

llvm-svn: 228688
2015-02-10 13:22:57 +00:00
Simon Pilgrim cd32254a35 [X86][XOP] Added XOP memory folding patterns + tests
This patch adds the complete AMD Bulldozer XOP instruction set to the memory folding pattern tables for stack folding, etc.

Note: Many of the XOP instructions have multiple table entries as it can fold loads from different sources.

Differential Revision: http://reviews.llvm.org/D7484

llvm-svn: 228685
2015-02-10 12:57:17 +00:00
Andrea Di Biagio 62622d2396 [X86][FastIsel] Avoid introducing legacy SSE instructions if the target has AVX.
This patch teaches X86FastISel how to select AVX instructions for scalar
float/double convert operations.

Before this patch, X86FastISel always selected legacy SSE instructions
for FPExt (from float to double) and FPTrunc (from double to float).

For example:
\code
  define double @foo(float %f) {
    %conv = fpext float %f to double
    ret double %conv
  }
\end code

Before (with -mattr=+avx -fast-isel) X86FastIsel selected a CVTSS2SDrr which is
legacy SSE:
  cvtss2sd %xmm0, %xmm0

With this patch, X86FastIsel selects a VCVTSS2SDrr instead:
  vcvtss2sd %xmm0, %xmm0, %xmm0

Added test fast-isel-fptrunc-fpext.ll to check both the register-register and
the register-memory float/double conversion variants.

Differential Revision: http://reviews.llvm.org/D7438

llvm-svn: 228682
2015-02-10 12:04:41 +00:00
Nick Lewycky 1cbc13a928 Remove non-test files that appear to have been accidentally committed in r228641.
llvm-svn: 228657
2015-02-10 02:39:17 +00:00
Chandler Carruth b65d61a2e8 [x86] Fix PR22524: the DAG combiner was incorrectly handling illegal
nodes when folding bitcasts of constants.

We can't fold things and then check after-the-fact whether it was legal.
Once we have formed the DAG node, arbitrary other nodes may have been
collapsed to it. There is no easy way to go back. Instead, we need to
test for the specific folding cases we're interested in and ensure those
are legal first.

This could in theory make this less powerful for bitcasting from an
integer to some vector type, but AFAICT, that can't actually happen in
the SDAG so its fine. Now, we *only* whitelist specific int->fp and
fp->int bitcasts for post-legalization folding. I've added the test case
from the PR.

(Also as a note, this does not appear to be in 3.6, no backport needed)

llvm-svn: 228656
2015-02-10 02:25:56 +00:00
David Majnemer 93c22a45be X86: Emit an ABI compliant prologue and epilogue for Win64
Win64 has specific contraints on what valid prologues and epilogues look
like.  This constraint is born from the flexibility and descriptiveness
of Win64's unwind opcodes.

Prologues previously emitted by LLVM could not be represented by the
unwind opcodes, preventing operations powered by stack unwinding to
successfully work.

Differential Revision: http://reviews.llvm.org/D7520

llvm-svn: 228641
2015-02-10 00:57:42 +00:00
Sanjay Patel 546f26acf3 fixed to test features, not CPUs
llvm-svn: 228581
2015-02-09 17:17:09 +00:00
Sanjay Patel baf0a2415c fix test attributes; this is an SSE2 test, not a Nehalem test
llvm-svn: 228546
2015-02-08 21:14:27 +00:00
Sanjay Patel e6eed52325 fix test attributes; this is an x86-64 test, not a Nehalem test
llvm-svn: 228545
2015-02-08 21:10:40 +00:00
Sanjay Patel 5cf03374a1 fix test attributes; these are SSE2 tests, not Nehalem tests
llvm-svn: 228544
2015-02-08 21:05:03 +00:00
Sanjay Patel 9be09a3617 fix test attributes; these are SSE2 tests, not Nehalem tests
llvm-svn: 228541
2015-02-08 20:50:58 +00:00
Sanjay Patel ff9dec22ba fix test attributes; these are x86-64 tests, not Nehalem tests
llvm-svn: 228536
2015-02-08 20:05:53 +00:00
Sanjay Patel 972425d221 fix test attributes; these are MMX tests, not Nehalem tests
llvm-svn: 228535
2015-02-08 20:01:12 +00:00
Sanjay Patel c871fe746b fix test attributes; these are SSE2 tests, not Nehalem tests
llvm-svn: 228534
2015-02-08 19:50:55 +00:00
Sanjay Patel 3fa03da6f8 generalize test; nothing Nehalem-specific here
llvm-svn: 228532
2015-02-08 19:38:25 +00:00
Simon Pilgrim e490385843 [X86][AVX2] AVX2 broadcast + permute memory folding tests.
llvm-svn: 228528
2015-02-08 18:33:13 +00:00
Simon Pilgrim 7440699267 [X86][AVX2] AVX2 integer stack folding tests.
This adds tests for the remaining AVX2 instructions that currently support memory folding.

llvm-svn: 228513
2015-02-07 23:28:16 +00:00
Simon Pilgrim a2618679a8 [X86][AVX] Added missing stack folding support + test for vptest ymm instruction
llvm-svn: 228509
2015-02-07 21:44:06 +00:00
Simon Pilgrim cbc3b2fdc2 [X86][SSE] Added missing stack folding tests for (v)mpsadbw instruction
llvm-svn: 228506
2015-02-07 21:20:11 +00:00
Simon Pilgrim 0238b96c06 [X86] Force fp stack folding tests to keep to specific domain.
General boolean instructions (AND, ANDN, OR, XOR) need to use a specific domain instruction (and not just the default).

llvm-svn: 228495
2015-02-07 16:14:55 +00:00
Simon Pilgrim 947ce78d49 [X86][AVX2] More AVX2 integer stack folding tests.
llvm-svn: 228494
2015-02-07 16:07:27 +00:00
David Majnemer 5614ea9aae MC: Emit COFF section flags in the "proper" order
COFF section flags are not idempotent:
  'rd' will make a read-write section because 'd' implies write
  'dr' will make a read-only section because 'r' disables write

llvm-svn: 228490
2015-02-07 08:26:40 +00:00
Simon Pilgrim 76cb85a6c7 [X86][AVX2] Begun adding AVX2 integer stack folding tests.
llvm-svn: 228462
2015-02-06 23:12:15 +00:00
Reid Kleckner 526ec29370 Don't dllexport declarations
Fixes PR22488

llvm-svn: 228411
2015-02-06 17:59:49 +00:00
Matthias Braun 5cfba2f573 X86: Test cleanup
Use FileCheck, make it more consistent and do not rely on unoptimized
or(cmp,cmp) getting combined for max to be matched.

llvm-svn: 228361
2015-02-05 23:52:12 +00:00
Ahmed Bougacha e892d13d90 [CodeGen] Add hook/combine to form vector extloads, enabled on X86.
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."

That's not entirely true, since at least SSE4.1 X86 knows how to do
those sextloads/zextloads (with PMOVS/ZX).
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).

The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.

It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up.  If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion.  At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.

This optimization is enabled unconditionally on X86.

Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).

Also note that the existing combine that forms extloads is now also
enabled on legal vectors.  This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.

Differential Revision: http://reviews.llvm.org/D6904

llvm-svn: 228325
2015-02-05 18:31:02 +00:00
Andrew Trick 7fc4583eda X86 ABI fix for return values > 24 bytes.
The return value's address must be returned in %rax.
i.e. the callee needs to copy the sret argument (%rdi)
into the return value (%rax).

This probably won't manifest as a bug when the caller is LLVM-compiled
code. But it is an ABI guarantee and tools expect it.

llvm-svn: 228321
2015-02-05 18:09:05 +00:00
Bruno Cardoso Lopes ab9ae87623 [X86][MMX] Handle i32->mmx conversion using movd
Implement a BITCAST dag combine to transform i32->mmx conversion patterns
into a X86 specific node (MMX_MOVW2D) and guarantee that moves between
i32 and x86mmx are better handled, i.e., don't use store-load to do the
conversion..

llvm-svn: 228293
2015-02-05 13:23:07 +00:00
Bruno Cardoso Lopes cc6089d2e0 [X86][MMX] Add several bitcast tests
Avoid regression in previously supported MMX code by adding different
combinations of tests which exercise MMX bitcasts. Small improvements
to these patterns should come next.

llvm-svn: 228292
2015-02-05 13:22:57 +00:00
Rafael Espindola a092f17580 Don' try to make sections in comdats SHF_MERGE.
Parts of llvm were not expecting it and we wouldn't print
the entity size of the section.

Given what comdats are used for, having SHF_MERGE sections would be
just a small improvement, so just disable it for now.

Fixes pr22463.

llvm-svn: 228196
2015-02-04 21:27:24 +00:00
Michael Kuperstein cd63c5fa73 Fixes a bug in vector load legalization that confused bits and bytes.
Differential Revision: http://reviews.llvm.org/D7400

llvm-svn: 228168
2015-02-04 18:54:01 +00:00
Chandler Carruth 4d31f58c88 [x86] Give movss and movsd execution domains in the x86 backend.
This associates movss and movsd with the packed single and packed double
execution domains (resp.). While this is largely cosmetic, as we now
don't have weird ping-pong-ing between single and double precision, it
is also useful because it avoids the domain fixing algorithm from seeing
domain breaks that don't actually exist. It will also be much more
important if we have an execution domain default other than packed
single, as that would cause us to mix movss and movsd with integer
vector code on a regular basis, a very bad mixture.

llvm-svn: 228135
2015-02-04 10:58:53 +00:00
Chandler Carruth 78c8dcd9d3 [x86] Remove a low-value test that was just checking how we cleared
a register. We have lots of tests covering this.

llvm-svn: 228133
2015-02-04 10:47:34 +00:00
Chandler Carruth bb525e336b [x86] Mechanically update a bunch of tests' check lines using the latest
version of the script.

Changes include:
- Using the VEX prefix
- Skipping more detail when we have useful shuffle comments to match
- Matching more shuffle comments that have been added to the printer
  (yay!)
- Matching the destination registers of some AVX instructions
- Stripping trailing whitespace that crept in
- Fixing indentation issues

Nothing interesting going on here. I'm just trying really hard to ensure
these changes don't show up in the diffs with actual changes to the
backend.

llvm-svn: 228132
2015-02-04 10:46:53 +00:00
Chandler Carruth 22b1525ae8 [x86] Include the destination register in the check-lines for AVX
instructions.

No actual change here.

llvm-svn: 228127
2015-02-04 09:18:27 +00:00
Chandler Carruth 18ba596609 [x86] Add some tests I missed in the prior commit to cover blends with
zero for v8i16 as well.

These exhibit the same domain badness, but also exhibit other weaknesses
in our blend lowering. More fixes to come.

llvm-svn: 228126
2015-02-04 09:15:46 +00:00
Chandler Carruth 024cf8efd7 [x86] Start to introduce bit-masking based blend lowering.
This is the simplest form of bit-math based blending which only fires
when we are blending with zero and is relatively profitable. I've only
enabled this path on very specific lowering strategies. I'm planning to
widen its applicability in subsequent patches, but so far you'll notice
that even though we get fewer shufps instructions, we *still* do the bit
math in the FP execution port. I'm looking into why this is still
happening.

llvm-svn: 228124
2015-02-04 09:06:05 +00:00
Chandler Carruth 872d80e7a4 [x86] Add tests for blends-with-zero on 4-element vectors.
llvm-svn: 228122
2015-02-04 09:05:58 +00:00
Chandler Carruth abd09a1f35 [x86] Refresh the checks of a number of tests using
update_llc_test_checks.py.

The exact format of the checks has changed over time. This includes
different indenting rules, new shuffle comments that have been added,
and more operand hiding behind regular expressions.

No functional change to the tests are expected here, but this will make
subsequent patches have a clean diff as they change shuffle lowering.

llvm-svn: 228097
2015-02-04 00:58:42 +00:00
Chandler Carruth abde67eb1c [x86] Switch to using the long '--check-prefix' form which the
update_llc_test_checks.py script uses, and refresh the checks in this
test.

No functionality changed here, just bringing this test up to work with
automated updates using the python script.

llvm-svn: 228096
2015-02-04 00:58:40 +00:00
Chandler Carruth 52332dc620 [x86] Port this test to use utils/update_llc_test_checks.py.
This will make it easy to update as I change some parts of the X86
backend, makes it more clear what instruction differences are
introduced, and I find it makes it a bit easier to read as well.

llvm-svn: 228095
2015-02-04 00:58:37 +00:00
Sanjay Patel b82b8d6b84 improved CHECK
llvm-svn: 228086
2015-02-04 00:24:06 +00:00
Simon Pilgrim 46cd4f7400 [X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2
Patch to match cases where shuffle masks can be reduced to bit shifts. Similar to byte shift shuffle matching from D5699.

Differential Revision: http://reviews.llvm.org/D6649

llvm-svn: 228047
2015-02-03 21:58:29 +00:00
Chandler Carruth 1fff318a41 [x86] Add two truly horrific test cases for the new vector shuffle
lowering. I'm prepping patches to improve these, and this will let the
delta of those patches show the improvement. =]

llvm-svn: 228044
2015-02-03 21:56:28 +00:00
Chandler Carruth 4ce669d91c [x86] Update the indent and layout of some tests in this file. NFC
This is just to remove voise from using the update_llc_test_checks
script.

llvm-svn: 228043
2015-02-03 21:56:24 +00:00
Chandler Carruth a4a77ed59e [x86] Tweak my update script to use test case function names starting
with 'stress' to indicate that the specific output isn't interesting and
relax them to only check the last instruction (a ret).

I've updated the one test case that really uses this to name the one
'stress_test' which was actually producing output we can directly check.
With this, the script doesn't introduce noise when run over the v16 test
file.

llvm-svn: 228033
2015-02-03 21:26:45 +00:00
Simon Pilgrim d9885856e6 [X86][SSE] Added general integer shuffle matching for MOVQ instruction
This patch adds general shuffle pattern matching for the MOVQ zero-extend instruction (copy lower 64bits, zero upper) for all 128-bit integer vectors, it is added as a fallback test in lowerVectorShuffleAsZeroOrAnyExtend.

llvm-svn: 228022
2015-02-03 20:09:18 +00:00
Simon Pilgrim 6544f815b3 [X86][AVX2] Enabled shuffle matching for the AVX2 zero extension (128bit -> 256bit) vpmovzx* instructions.
Differential Revision: http://reviews.llvm.org/D7251

llvm-svn: 228014
2015-02-03 19:34:09 +00:00
Rafael Espindola 8d911b4419 Fix typo in test/CodeGen/X86/sibcall.ll (pr22331).
llvm-svn: 228011
2015-02-03 19:20:26 +00:00
Sanjay Patel b7d5628784 Merge consecutive 16-byte loads into one 32-byte load (PR22329)
This patch detects consecutive vector loads using the existing 
EltsFromConsecutiveLoads() logic. This fixes:
http://llvm.org/bugs/show_bug.cgi?id=22329

This patch effectively reverts the tablegen additions of D6492 / 
http://reviews.llvm.org/rL224344 ...which in hindsight were a horrible hack.

The test cases that were added with that patch are simply modified to load
from varying offsets of a base pointer. These loads did not match the existing
tablegen patterns.

A happy side effect of doing this optimization earlier is that we can now fold
the load into a math op where possible; this is shown in some of the updated
checks in the test file.

Differential Revision: http://reviews.llvm.org/D7303

llvm-svn: 228006
2015-02-03 18:54:00 +00:00
Sanjay Patel ffd039bde1 Fix program crashes due to alignment exceptions generated for SSE memop instructions (PR22371).
r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit.
The commit log says that change did not affect anything, but that's not correct.
That change allowed SSE instructions to have unaligned mem operands folded into
math ops, and that's not allowed in the default specification for any SSE variant. 

The bug is exposed when compiling for an AVX-capable CPU that had this feature
flag but without enabling AVX codegen. Another mistake in r224330 was not adding
the feature flag to all AVX CPUs; the AMD chips were excluded.

This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ).

This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem".
Changed the existing test case for the feature bit to reflect the new name and
renamed the test file itself to better reflect the feature.
Added runs to fold-vex.ll to check for the failing codegen.

Note that the feature bit is not set by default on any CPU because it may require a
configuration register setting to enable the enhanced unaligned behavior.

llvm-svn: 227983
2015-02-03 17:13:04 +00:00
Sanjay Patel a4276fb294 Improve test to actually check for a folded load.
This test was checking for lack of a "movaps" (an aligned load)
rather than a "movups" (an unaligned load). It also included
a store which complicated the checking.

Add specific CPU runs to prevent subtarget feature flag overrides
from inhibiting this optimization.

llvm-svn: 227972
2015-02-03 15:37:18 +00:00
Bruno Cardoso Lopes 077774b820 [X86][MMX] Improve transfer from mmx to i32
Improve EXTRACT_VECTOR_ELT DAG combine to catch conversion patterns
between x86mmx and i32 with more layers of indirection.

Before:
  movq2dq %mm0, %xmm0
  movd %xmm0, %eax
After:
  movd %mm0, %eax

llvm-svn: 227969
2015-02-03 14:46:49 +00:00
Alex Rosenberg bacd479a5d Revert part of r227437 as it was unnecessary. Thanks to echristo for
pointing this out.

llvm-svn: 227897
2015-02-02 23:58:54 +00:00
Bruno Cardoso Lopes f7410e5292 [X86][MMX] Add tests for MMX extract element
LLVM ToT produces poor MMX code compared to 3.5. However, part of the previous
functionality can be achieved by using -x86-experimental-vector-widening-legalization.
Add tests to be sure we don't regress again.

llvm-svn: 227869
2015-02-02 22:00:48 +00:00
Bruno Cardoso Lopes e4716e65b3 [X86][MMX] Cleanup shuffle, bitcast and insert element tests
- Merge MMX arg passing test files
- Merge MMX bitcast, insert elt and shuffle tests

llvm-svn: 227867
2015-02-02 21:56:11 +00:00
Sanjay Patel 122a7a7098 fix typo
llvm-svn: 227815
2015-02-02 17:47:30 +00:00
Michael Kuperstein 13fbd45263 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

(Re-commit of r227728)

Differential Revision: http://reviews.llvm.org/D6789

llvm-svn: 227752
2015-02-01 16:56:04 +00:00
Michael Kuperstein e86aa9a8a4 Revert r227728 due to bad line endings.
llvm-svn: 227746
2015-02-01 16:15:07 +00:00
Michael Kuperstein bd57186c76 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

Differential Revision: http://reviews.llvm.org/D6789

llvm-svn: 227728
2015-02-01 11:44:44 +00:00
Elena Demikhovsky 534a99d878 AVX2: Added 2 more tests for gather intrinsics.
llvm-svn: 227718
2015-02-01 08:52:15 +00:00
Simon Pilgrim 9c76b47469 [X86][SSE] Shuffle mask decode support for zero extend, scalar float/double moves and integer load instructions
This patch adds shuffle mask decodes for integer zero extends (pmovzx** and movq xmm,xmm) and scalar float/double loads/moves (movss/movsd).

Also adds shuffle mask decodes for integer loads (movd/movq).

Differential Revision: http://reviews.llvm.org/D7228

llvm-svn: 227688
2015-01-31 14:09:36 +00:00
Ahmed Bougacha 2d80ea1939 [X86] Cleanup tabs in test vector-zext.ll. NFC.
Some tests have tabs, some don't.
In vector-[sz]ext.ll, space wins (well duh!).

llvm-svn: 227615
2015-01-30 21:41:28 +00:00
Reid Kleckner a580b6ec67 Win64: Put a REX_W prefix on all TAILJMP* instructions
MSDN's x64 software conventions page says that this is one of the fixed
list of legal epilogues:
https://msdn.microsoft.com/en-us/library/tawsa7cb.aspx

Presumably this is how the unwinder distinguishes epilogue jumps from
in-function control flow.

Also normalize the way we place "## TAILCALL" comments on such jumps.

llvm-svn: 227611
2015-01-30 21:03:31 +00:00
Reid Kleckner ca9b9feb2c x86: Fix large model calls to __chkstk for dynamic allocas
In the large code model, we now put __chkstk in %r11 before calling it.

Refactor the code so that we only do this once. Simplify things by using
__chkstk_ms instead of __chkstk on cygming. We already use that symbol
in the prolog emission, and it simplifies our logic.

Second half of PR18582.

llvm-svn: 227519
2015-01-29 23:58:04 +00:00
Reid Kleckner dafc2ae1ad Update comments to use unreachable instead of llvm.trap, as implemented now
win64: Call __chkstk through a register with the large code model

Fixes half of PR18582. True dynamic allocas will still have a
CALL64pcrel32 which will fail.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7267

llvm-svn: 227503
2015-01-29 22:33:00 +00:00
Robert Lougher c69cfeeafa [X86] Use single add/sub for large stack offsets
For large stack offsets the compiler generates multiple immediate mode
sub/add instructions in the prologue/epilogue.  This patch makes the
compiler place the final amount to be added/subtracted into a register,
which is then added/substracted with a single operation.

Differential Revision: http://reviews.llvm.org/D7226

llvm-svn: 227458
2015-01-29 16:18:29 +00:00
Alex Rosenberg 96e833a6a6 Make the test actually test what it's supposed to test. Add a test for the from memory variant of vcvtph2ps for 256-bit.
llvm-svn: 227446
2015-01-29 15:19:54 +00:00
Alex Rosenberg c235500295 Cleanup a few tests on sse4a machines and FileCheckize along the way.
llvm-svn: 227437
2015-01-29 13:31:32 +00:00
Rafael Espindola fad3901095 Don't create multiple mergeable sections with -fdata-sections.
ELF has support for sections that can be split into fixed size or
null terminated entities.

Since these sections can be split by the linker, it is not necessary
to split them in codegen.

This reduces the combined .o size in a llvm+clang build from
202,394,570 to 173,819,098 bytes.

The time for linking clang with gold (on a VM, on a laptop) goes
from 2.250089985 to 1.383001792 seconds.

The flip side is the size of rodata in clang goes from 10,926,785
to 10,929,345 bytes.

The increase seems to be because of http://sourceware.org/bugzilla/show_bug.cgi?id=17902.

llvm-svn: 227431
2015-01-29 12:43:28 +00:00
Reid Kleckner 1185fced3d Add a Windows EH preparation pass that zaps resumes
If the personality is not a recognized MSVC personality function, this
pass delegates to the dwarf EH preparation pass. This chaining supports
people on *-windows-itanium or *-windows-gnu targets.

Currently this recognizes some personalities used by MSVC and turns
resume instructions into traps to avoid link errors.  Even if cleanups
are not used in the source program, LLVM requires the frontend to emit a
code path that resumes unwinding after an exception.  Clang does this,
and we get unreachable resume instructions. PR20300 covers cleaning up
these unreachable calls to resume.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7216

llvm-svn: 227405
2015-01-29 00:41:44 +00:00
Michael Kuperstein 951995821a [X86] Reduce some 32-bit imuls into lea + shl
Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well.

Differential Revision: http://reviews.llvm.org/D7196

llvm-svn: 227308
2015-01-28 14:08:22 +00:00
Michael Kuperstein f387611ac2 [x32] Enable sibcall optimization on x32.
This includes two things:
1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness).
2) Allow LEA64_32 in MatchingStackOffset.

llvm-svn: 227307
2015-01-28 13:38:48 +00:00
Elena Demikhovsky 7b0dd39db6 AVX-512: Added FMA intrinsics with rounding mode
By Asaf Badouh and Elena Demikhovsky

Added special nodes for rounding: FMADD_RND, FMSUB_RND..
It will prevent merge between nodes with rounding and other standard nodes.

llvm-svn: 227303
2015-01-28 10:21:27 +00:00
Quentin Colombet 308b171318 Revert r227242 - Merge vector stores into wider vector stores (PR21711).
This commit creates infinite loop in DAG combine for in the LLVM test-suite
for aarch64 with mcpu=cylcone (just having neon may be enough to expose this).

llvm-svn: 227272
2015-01-27 23:58:01 +00:00
Alexey Samsonov 533948088e Revert "[x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector"
This reverts commits r226953 and r226974.

llvm-svn: 227248
2015-01-27 21:34:11 +00:00