22574 Commits

Author SHA1 Message Date
David Majnemer ad5c1734e2 MC: Reorganize macro MC test along dialect lines
This commit seeks to do two things:
 - Run the surfeit of tests under the Darwin dialect.  This ends up
   affecting tests which assumed that spaces could delimit arguments.
 - The GAS dialect tests should limit their surface area to things that
   could plausibly work under GAS. For example, Darwin style arguments
   have no business being in such a test.

llvm-svn: 200383
2014-01-29 09:18:43 +00:00
Kostya Serebryany 6e15ec1442 [asan] simplify a test
llvm-svn: 200378
2014-01-29 07:35:43 +00:00
Venkatraman Govindaraju 141d0e2221 [Sparc] Use %r_disp32 for pc_rel entries in FDE as well.
This makes MCAsmInfo::getExprForFDESymbol() a virtual function and overrides it in SparcMCAsmInfo.

llvm-svn: 200376
2014-01-29 06:59:20 +00:00
NAKAMURA Takumi b366f01f83 Revert r200340, "Add line table debug info to COFF files when using a win32 triple."
It was incompatible with --target=i686-win32.

llvm-svn: 200375
2014-01-29 06:05:38 +00:00
Venkatraman Govindaraju fd5c1f9497 [Sparc] Use %r_disp32 for pc_rel entries in gcc_except_table and eh_frame.
Otherwise, the assembler (gas) fails to assemble them with the error message "operation
combines symbols in different segments". This is because MC computes
pc_rel entries with subtract expression between labels from different sections.

llvm-svn: 200373
2014-01-29 04:51:35 +00:00
Chandler Carruth 66f0b16360 [LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered"
because of the inside-out run of LoopSimplify in the LoopPassManager and
the fact that LoopSimplify couldn't be "preserved" across two
independent LoopPassManagers.

Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI
node because it thought it was rewriting (via SCEV) the incoming value
to a loop invariant value. While it may well be invariant for the
current loop, it may be rewritten in terms of an enclosing loop's
values. This in and of itself is fine, as the LCSSA PHI node in the
enclosing loop for the inner loop value we're rewriting will have its
own LCSSA PHI node if used outside of the enclosing loop. With me so
far?

Well, the current loop and the enclosing loop may share an exiting
block and exit block, and when they do they also share LCSSA PHI nodes.
In this case, it's not valid to RAUW through the LCSSA PHI node.

Expected crazy test included.

llvm-svn: 200372
2014-01-29 04:40:19 +00:00
Rafael Espindola 7a578026ae We do use pipefail these days. Update the test.
llvm-svn: 200370
2014-01-29 04:08:05 +00:00
Venkatraman Govindaraju 50f32d949b [SparcV9] Use correct register class (I64RegClass) to hold the address of _GLOBAL_OFFSET_TABLE_ in sparcv9.
llvm-svn: 200368
2014-01-29 03:35:08 +00:00
Rafael Espindola 310f501ef0 Use a raw_stream to implement the mangler.
This is a bit more convenient for some callers, but more importantly, it is
easier to implement correctly. Doing this removes the patching of already
printed data that was used for fastcall, fixing a crash with private fastcall
symbols.

llvm-svn: 200367
2014-01-29 02:30:38 +00:00
Kevin Qin 92d64d2d56 [AArch64 NEON] Lower SELECT_CC with vector operand.
When the scalar compare is between floating-point values and the operands are
vectors, we custom lower SELECT_CC to use a NEON SIMD compare, which
generates fewer instructions.

llvm-svn: 200365
2014-01-29 01:57:30 +00:00
David Woodhouse 21bfc71752 [ARM] Remove superfluous inline asm mode switch test
llvm-svn: 200361
2014-01-29 00:49:28 +00:00
David Woodhouse 7db3705f9e Tests for mode switching
1. test that inlineasm works
2. test that relaxable instructions are re-encoded in the correct mode.

llvm-svn: 200351
2014-01-28 23:13:30 +00:00
Timur Iskhodzhanov 7523743075 Disable the COFF tests on non-X86 archs
llvm-svn: 200341
2014-01-28 21:47:33 +00:00
Timur Iskhodzhanov 2c659648b3 Add line table debug info to COFF files when using a win32 triple.
Reviewed at http://llvm-reviews.chandlerc.com/D2232

llvm-svn: 200340
2014-01-28 21:33:27 +00:00
Matheus Almeida 2e03f24301 [mips] Fix ELF header flags.
As opposed to GCC/GAS, the default ABI for Mips64 is n64.
The compatibility bit should be set if the o32 ABI is used when targeting Mips64.

llvm-svn: 200332
2014-01-28 19:24:11 +00:00
Gautam Chakrabarti 2c283400f9 [NVPTX] Fix emitting aggregate parameters
The code was missing the case for aggregate parameters and
hence was emitting them as .b0 type. Also fixed a couple
of comments.

llvm-svn: 200325
2014-01-28 18:35:29 +00:00
Andrea Di Biagio 2ea61f17ad [X86] Add extra rules for combining vselect dag nodes into movsd.
This improves the fix committed at revision 199683 adding the
following new target specific combine rules:

1) fold (v4i32: vselect <0,0,-1,-1>, A, B) ->
        (v4i32 (bitcast (movsd (v2i64 (bitcast A)), (v2i64 (bitcast B))) ))

2) fold (v4f32: vselect <0,0,-1,-1>, A, B) ->
        (v4f32 (bitcast (movsd (v2f64 (bitcast A)), (v2f64 (bitcast B))) ))

3) fold (v4i32: vselect <-1,-1,0,0>, A, B) ->
        (v4i32 (bitcast (movsd (v2i64 (bitcast B)), (v2i64 (bitcast A))) ))

4) fold (v4f32: vselect <-1,-1,0,0>, A, B) ->
        (v4f32 (bitcast (movsd (v2f64 (bitcast B)), (v2f64 (bitcast A))) ))
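
The lane behaviour behind these rules can be checked with a small model. This is only a sketch: `vselect` and `movsdModel` are hypothetical helpers written for illustration, not LLVM or intrinsic APIs, and the model assumes that movsd replaces the low 64 bits of its first operand with the low 64 bits of its second.

```cpp
#include <array>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Lane-wise select: a nonzero (all-ones) mask lane picks from `a`,
// a zero mask lane picks from `b`.
V4 vselect(const std::array<int32_t, 4> &mask, const V4 &a, const V4 &b) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = mask[i] ? a[i] : b[i];
  return r;
}

// Model of movsd viewed on four 32-bit lanes (an assumption of this sketch):
// the low 64 bits (lanes 0-1) come from `src`, the high 64 bits (lanes 2-3)
// are kept from `dst`.
V4 movsdModel(const V4 &dst, const V4 &src) {
  return {src[0], src[1], dst[2], dst[3]};
}
```

Under this model, a `<0,0,-1,-1>` mask gives the same lanes as `movsdModel(A, B)`, and a `<-1,-1,0,0>` mask gives the same lanes as `movsdModel(B, A)`. This matches the operand order in the rules above.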

llvm-svn: 200324
2014-01-28 18:14:21 +00:00
Rafael Espindola ab73c493ea Fix pr14893.
When simplifycfg moves an instruction, it must drop metadata it doesn't know
is still valid under the changed preconditions. In particular, it must drop
the range and tbaa metadata.

The patch implements this with a utility function to drop all metadata not
in a white list.

llvm-svn: 200322
2014-01-28 16:56:46 +00:00
Andrea Di Biagio b6d39afbda [DAGCombiner] Avoid introducing an illegal build_vector when folding a sign_extend.
Make sure that we don't introduce illegal build_vector dag nodes
when trying to fold a sign_extend of a build_vector.

This fixes a regression introduced by r200234.
Added test CodeGen/X86/fold-vector-sext-crash.ll
to verify that llc no longer crashes with an assertion failure
due to an illegal build_vector of type MVT::v4i64.

Thanks to Ilia Filippov for spotting this regression and for
providing a reproducible test case.

llvm-svn: 200313
2014-01-28 12:53:56 +00:00
Chandler Carruth b783628560 [vectorizer] Completely disable the block frequency guidance of the loop
vectorizer, placing it behind an off-by-default flag.

It turns out that block frequency isn't what we want at all, here or
elsewhere. This has been I think a nagging feeling for several of us
working with it, but Arnold has given some really nice simple examples
where the results are so comprehensively wrong that they aren't useful.

I'm planning to email the dev list with a summary of why it's not really
useful and a couple of ideas about how to better structure these types
of heuristics.

llvm-svn: 200294
2014-01-28 09:10:41 +00:00
Hal Finkel 4e703bcecd Handle spilling the PPC GPRC_NOR0 register class
GPRC_NOR0 is not a subclass of GPRC (because it also contains the ZERO pseudo
register). As a result, we also need to check for it in the spilling code.

llvm-svn: 200288
2014-01-28 05:32:58 +00:00
Michel Danzer bf1a641060 R600/SI: Add pattern for truncating i32 to i1
Fixes half a dozen piglit tests with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200283
2014-01-28 03:01:16 +00:00
Jakob Stoklund Olesen 83c677353b Fix the DWARF EH encodings for Sparc PIC code.
Also emit the stubs that were generated for references to typeinfo
symbols.

llvm-svn: 200282
2014-01-28 02:52:26 +00:00
Reid Kleckner 26af2cae05 Update optimization passes to handle inalloca arguments
Summary:
I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
sites to check for inalloca if appropriate.

I added tests for any change that would allow an optimization to fire on
inalloca.

Reviewers: nlewycky

Differential Revision: http://llvm-reviews.chandlerc.com/D2449

llvm-svn: 200281
2014-01-28 02:38:36 +00:00
Arnold Schwaighofer 18865db3c1 LoopVectorize: Support conditional stores by scalarizing
The vectorizer takes a loop like this and widens all instructions except for the
store. The stores are scalarized/unrolled and hidden behind an "if" block.

  for (i = 0; i < 128; ++i) {
    if (a[i] < 10)
      a[i] += val;
  }

  for (i = 0; i < 128; i+=2) {
    v = a[i:i+1];
    w = v + val;
    if ((extract v, 0) < 10)
      a[i] = (extract w, 0);
    if ((extract v, 1) < 10)
      a[i+1] = (extract w, 1);
  }

The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.
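
A runnable model of the two loop shapes may help. It is a hand-written sketch, not vectorizer output: the function names and the fixed width of 2 are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 128;

// The original scalar loop: a conditional read-modify-write per element.
void scalarLoop(std::array<int, N> &a, int val) {
  for (std::size_t i = 0; i < N; ++i)
    if (a[i] < 10)
      a[i] += val;
}

// Model of the transformed loop with a vector width of 2: the load, add and
// compare are done for both lanes up front ("widened"), and only the stores
// stay scalar, each guarded by the condition of its own lane.
void predicatedStoreLoop(std::array<int, N> &a, int val) {
  for (std::size_t i = 0; i < N; i += 2) {
    int v[2] = {a[i], a[i + 1]};            // wide load
    int sum[2] = {v[0] + val, v[1] + val};  // wide add
    bool mask[2] = {v[0] < 10, v[1] < 10};  // wide compare
    if (mask[0]) a[i] = sum[0];             // scalarized, predicated stores
    if (mask[1]) a[i + 1] = sum[1];
  }
}
```

Both functions leave the array in the same state. The second one computes the add for every lane whether or not its store executes, which is why later sinking of instructions into the conditional block matters.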

The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled by default for
now.

This patch also adds a change to the heuristic when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
loops until load/store ports are saturated. This heuristic uses TTI's
getMaxUnrollFactor as a measure for load/store ports.

I also added a second flag, -enable-cond-stores-vec, which enables vectorization
of conditional stores. There is no cost model for vectorizing conditional stores
yet, so this will not do much good at the moment.

rdar://15892953

Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
   Applications/siod/siod         2.18%
 Performance Improvements:
   mesa                          -4.42%
   libquantum                    -4.15%

 With a patch that slightly changes the register heuristics (by subtracting the
 induction variable on both sides of the register pressure equation, as the
 induction variable is probably not really unrolled):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2  7.73%
   Applications/siod/siod          1.97%

 Performance Improvements:
   libquantum                    -13.05% (we now also unroll quantum_toffoli)
   mesa                           -4.27%

llvm-svn: 200270
2014-01-28 01:01:53 +00:00
Manman Ren f1cb16e481 PGO branch weight: keep halving the weights until they can fit into
uint32.

When folding branches to common destination, the updated branch weights
can exceed uint32 by more than a factor of 2. We should keep halving the
weights until they can fit into uint32.
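
The halving loop can be sketched as follows. `fitWeights` is a hypothetical helper written for illustration, not the function changed by this commit. It assumes the weights are held as 64-bit values before being narrowed.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: halve all weights together, as many times as needed,
// until every weight fits into 32 bits. Halving all of them at once keeps
// the ratios between the weights approximately intact.
void fitWeights(std::vector<uint64_t> &weights) {
  const uint64_t kMax = std::numeric_limits<uint32_t>::max();
  for (;;) {
    bool allFit = true;
    for (uint64_t w : weights) {
      if (w > kMax) {
        allFit = false;
        break;
      }
    }
    if (allFit)
      return;
    for (uint64_t &w : weights)
      w /= 2;
  }
}
```

A single halving is not enough for a weight such as 2^34, which still exceeds the 32-bit range after one division. The loop repeats until the largest weight fits.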

llvm-svn: 200262
2014-01-27 23:39:03 +00:00
Mark Seaborn ba86cf51c9 ARM MC: Fix the initial DWARF CFI unwind info at the start of a function
This brings MC into line with GNU 'as' on ARM, and it brings the ARM
target into line with most other LLVM targets, which declare the
initial CFI state with addInitialFrameState().

Without this, functions generated with .cfi_startproc/endproc on ARM
will tend to cause GDB to abort with:
  gdb/dwarf2-frame.c:1132: internal-error: Unknown CFA rule.

I've also tested this by comparing the output of "readelf -w" on the
object files produced by llvm-mc and gas when given the .s file added
here.

This change is part of addressing PR18636.

Differential Revision: http://llvm-reviews.chandlerc.com/D2597

llvm-svn: 200255
2014-01-27 22:38:14 +00:00
David Peixotto b76f55f74a Fix unsupported addressing mode assertion for pld
Summary:
This commit gives an address mode to the PLD instruction. We
were getting an assertion failure in the frame lowering code
because we had code that was doing a pld of a stack allocated
address. The frame lowering was checking the address mode and
then asserting because pld had none defined.

This commit fixes pld for arm mode. There was a previous fix for
thumb mode in a separate commit. The commit for thumb mode
added a test in a separate file because it would otherwise fail
for arm. This commit moves the thumb test back into the prefetch.ll
file and adds the corresponding arm test.

Differential Revision: http://llvm-reviews.chandlerc.com/D2622

llvm-svn: 200248
2014-01-27 21:39:04 +00:00
Andrea Di Biagio f09a357765 [DAGCombiner] Teach how to fold sext/aext/zext of constant build vectors.
This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
its input operand is a build vector of constants (or UNDEFs).

The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.

Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.

llvm-svn: 200234
2014-01-27 18:45:30 +00:00
David Majnemer e035cf9ce4 MC: Add support for .cfi_startproc simple
This commit allows LLVM MC to process .cfi_startproc directives when
they are followed by an additional `simple' identifier. This signals to
elide the emission of target specific CFI instructions that would
normally occur initially.

This fixes PR16587.

Differential Revision: http://llvm-reviews.chandlerc.com/D2624

llvm-svn: 200227
2014-01-27 17:20:25 +00:00
Chandler Carruth e24f3973eb [vectorize] Initial version of respecting PGO in the vectorizer: treat
cold loops as-if they were being optimized for size.

Nothing fancy here. Simple test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.

The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:

1) Don't disable the entire pass when the target is lacking vector
   registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
   non-if-converted loops. This is trivial for the unroller but hard for
   the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
   various places that make cost tradeoffs (very likely only the
   unroller makes sense here, and then only when dealing with loops that
   are small enough for unrolling to not completely blow out the LSD).

I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.

One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.

llvm-svn: 200219
2014-01-27 13:11:50 +00:00
Benjamin Kramer 9e709bce86 ConstantHoisting: We can't insert instructions directly in front of a PHI node.
Insert before the terminating instruction of the dominating block instead.

llvm-svn: 200218
2014-01-27 13:11:43 +00:00
Chandler Carruth edfa37effa [vectorizer] Add an override for the target instruction cost and use it
to stabilize a test that really is trying to test generic behavior and
not a specific target's behavior.

llvm-svn: 200215
2014-01-27 11:41:50 +00:00
Chandler Carruth 147c23278f [vectorizer] Teach the loop vectorizer's unroller to only unroll by
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.
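
One way to restrict an unroll factor to powers of two is to round it down, as sketched here. `floorPowerOfTwo` is a hypothetical helper, and rounding down rather than up is an assumption of this sketch. The commit message does not say which direction the vectorizer picks.

```cpp
#include <cassert>

// Hypothetical helper: round an unroll factor down to the nearest power of
// two (3 -> 2, 4 -> 4, 7 -> 4). `factor` must be at least 1.
unsigned floorPowerOfTwo(unsigned factor) {
  assert(factor >= 1 && "unroll factor must be positive");
  unsigned result = 1;
  // Comparing against factor / 2 avoids overflow when doubling `result`.
  while (result <= factor / 2)
    result *= 2;
  return result;
}
```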

Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).

While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.

llvm-svn: 200213
2014-01-27 11:12:24 +00:00
Nick Lewycky 629199ccb3 Fix crasher introduced in r200203 and caught by a libc++ buildbot. Don't assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else!
llvm-svn: 200210
2014-01-27 10:47:44 +00:00
Nick Lewycky 31eaca5513 Teach SCEV to handle more cases of 'and X, CST', specifically where CST is any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits.
Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary.
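
The sketch below shows why such masks are convenient. An AND with a contiguous run of 1 bits can be written with unsigned division, remainder and multiplication by powers of two. It illustrates the arithmetic identity only and does not reproduce the SCEV code. All function names are hypothetical.

```cpp
#include <cstdint>

// True if `mask` is a single run of contiguous 1 bits, possibly with
// leading and trailing 0 bits (for example 0x0ff0).
bool isShiftedMask(uint64_t mask) {
  if (mask == 0)
    return false;
  uint64_t filled = mask | (mask - 1);  // also set all trailing zero bits
  return (filled & (filled + 1)) == 0;  // true if filled is 0...01...1
}

// Number of trailing zero bits; `mask` must be nonzero.
unsigned trailingZeros(uint64_t mask) {
  unsigned n = 0;
  while ((mask & 1) == 0) {
    mask >>= 1;
    ++n;
  }
  return n;
}

// Computes x & mask using only unsigned divide, remainder and multiply,
// for a mask accepted by isShiftedMask.
uint64_t andViaArithmetic(uint64_t x, uint64_t mask) {
  unsigned tz = trailingZeros(mask);
  uint64_t run = mask >> tz;         // 2^len - 1
  uint64_t low = uint64_t(1) << tz;  // 2^tz
  uint64_t q = x / low;              // drop the bits below the run
  // run + 1 == 2^len; it only wraps to 0 when the mask is all ones.
  uint64_t field = (run == UINT64_MAX) ? q : q % (run + 1);
  return field * low;                // move the field back into place
}
```

For example, with `mask = 0x0ff0` the result is `((x / 16) % 256) * 16`.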

Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown.

llvm-svn: 200203
2014-01-27 10:04:03 +00:00
Stepan Dyatkovskiy 55139555c4 Additional fix for 200201: due to dependence on bitwidth test was moved to X86 directory.
llvm-svn: 200202
2014-01-27 09:43:10 +00:00
Stepan Dyatkovskiy 157bb42e27 Fix for PR18102.
The issue stems from DAGCombiner::MergeConsecutiveStores, more precisely from
the sorting of the mem-ops sequence.

Consider how MergeConsecutiveStores works for the following example:

store i8 1, a[0]
store i8 2, a[1]
store i8 3, a[1]   ; a[1] again.
return   ; DAG starts here

1. The method collects all 3 stores.
2. It sorts them by distance from the base pointer (farthest with highest
index).
3. It takes the first consecutive non-overlapping stores and (if possible)
replaces them with a single store instruction.

The point is, we can't determine here which 'store' instruction
would be the second after sorting ('store 2' or 'store 3').
It happens that 'store 3' would be the second, and 'store 2' would be the third.

So after merging we have the following result:

store i16 (1 | 3 << 8), base   ; is a[0] but bit-casted to i16
store i8 2, a[1]

So actually we swapped 'store 3' and 'store 2' and got wrong contents in a[1].

Fix: in the sort routine, also take the mem-op sequence number into account.
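
A minimal sketch of such a tie-breaking comparator follows. The `StoreRecord` struct and its field names are hypothetical and are not the data structure used in DAGCombiner.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record for a candidate store.
struct StoreRecord {
  int64_t offset;     // distance from the base pointer
  unsigned sequence;  // position in the original instruction order
  int value;          // constant being stored
};

// Order by offset first and break ties by original program order, so that
// two stores to the same address keep their relative order.
void sortStores(std::vector<StoreRecord> &stores) {
  std::sort(stores.begin(), stores.end(),
            [](const StoreRecord &l, const StoreRecord &r) {
              if (l.offset != r.offset)
                return l.offset < r.offset;
              return l.sequence < r.sequence;
            });
}
```

With the offset as the only key, `std::sort` gives no guarantee about the relative order of the two stores to a[1]. That missing guarantee is the ambiguity described above.
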
llvm-svn: 200201
2014-01-27 09:18:31 +00:00
Michel Danzer 13736221e3 R600/SI: Add intrinsic for BUFFER_LOAD_DWORD* instructions
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200196
2014-01-27 07:20:51 +00:00
Michel Danzer 6064f57ae8 R600/SI: Add intrinsic for S_SENDMSG instruction
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200195
2014-01-27 07:20:44 +00:00
Rui Ueyama 06dc5e79c6 Rename IMAGE_DLL_CHARACTERISTICS_HIGH_ENTROPY_VA.
editbin.exe and link.exe both accept the /highentropyva option to set this bit, so
doing s/VIRTUAL_ADDRESS/VA/ should make sense.

llvm-svn: 200191
2014-01-27 04:22:24 +00:00
Kevin Qin 4a183d7094 [AArch64 NEON] Try to generate CONCAT_VECTOR when lowering BUILD_VECTOR or SHUFFLE_VECTOR.
Replace r199791.

llvm-svn: 200180
2014-01-27 02:53:54 +00:00
Kevin Qin 9eeedfbaa6 Revert r199791.
It's an old version which has some bugs. I'll commit the latest patch soon.

llvm-svn: 200179
2014-01-27 02:53:41 +00:00
Saleem Abdulrasool f9352a3880 MC: fix test locations/name
Placed the MC variant diagnostics in the wrong directory accidentally.  Move
them into their respective architecture specific directories.

llvm-svn: 200161
2014-01-26 22:55:02 +00:00
Saleem Abdulrasool a903661289 ARM: improve diagnostics for .word directive
If a complex expression was passed to the .word directive and the first part of
the directive failed to parse, a secondary diagnostic would be produced that
would clutter the error diagnostics.  Improve the diagnostics by consuming the
remainder of the statement.

llvm-svn: 200160
2014-01-26 22:29:50 +00:00
Saleem Abdulrasool a25e1e4ebe AsmParser: improve diagnostics for invalid variants
An emitted diagnostic for an invalid relocation variant would place the caret on
the token following the relocation variant indicator or at the end of the line
if there was no following token.  This change corrects the placement of the
caret to point to the token.

llvm-svn: 200159
2014-01-26 22:29:43 +00:00
Jakob Stoklund Olesen 6f39ce4be2 Clean up the Legal/Expand logic for SPARC popc.
llvm-svn: 200141
2014-01-26 08:12:34 +00:00
Rafael Espindola cb1953f6d9 Implement the missing bits corresponding to .mips_hack_elf_flags.
These were:
* noreorder handling on the target object streamer and asm parser.
* setting the initial flag bits based on the enabled features.
* setting the elf header flag for micromips

It is *really* depressing I am the one doing this instead of someone at
mips actually taking the time to understand the infrastructure.

llvm-svn: 200138
2014-01-26 06:57:13 +00:00
Jakob Stoklund Olesen ead3b3d7a1 Only generate the popc instruction for SPARC CPUs that implement it.
The popc instruction is defined in the SPARCv9 instruction set
architecture, but it was emulated on CPUs older than Niagara 2.

llvm-svn: 200131
2014-01-26 06:09:59 +00:00
Jakob Stoklund Olesen 39f0833f47 Fix swapped CASA operands.
Found by SingleSource/UnitTests/AtomicOps.c

llvm-svn: 200130
2014-01-26 06:09:54 +00:00