Commit Graph

29484 Commits

Author SHA1 Message Date
Hal Finkel ffb460fdf0 [PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loops
This pass had the same problem as the data-prefetching pass: it was only
checking for depth == 1 loops in practice. Fix that, add some debugging
statements, and make sure that, when we grab an AddRec, it is for the loop we
expect.

llvm-svn: 234670
2015-04-11 00:33:08 +00:00
Ahmed Bougacha b96444efd1 [CodeGen] Split -enable-global-merge into ARM and AArch64 options.
Currently, there's a single flag, checked by the pass itself.
It can't force-enable the pass (and is on by default), because it
might not even have been created, as that's the targets decision.
Instead, have separate explicit flags, so that the decision is
consistently made in the target.

Keep the flag as a last-resort "force-disable GlobalMerge" for now,
for backwards compatibility.

llvm-svn: 234666
2015-04-11 00:06:36 +00:00
Reid Kleckner 9405ef0e1f [WinEH] Recognize SEH finally block inserted by the frontend
This allows winehprepare to build sensible llvm.eh.actions calls for SEH
finally blocks.  The pattern matching in this change is brittle and
should be replaced with something more robust soon.  In the meantime,
this will let us write the code that produces __C_specific_handler xdata
tables, which we need regardless of how we decide to get finally blocks
through EH preparation.

llvm-svn: 234663
2015-04-10 23:12:29 +00:00
Philip Reames 8106a264d3 [RewriteStatepointsForGC] test case missing from 234657
llvm-svn: 234658
2015-04-10 22:58:39 +00:00
Philip Reames df1ef08c0c [RewriteStatepointsForGC] Use an actual liveness algorithm
When rewriting statepoints to make relocations explicit, we need to have a conservative but consistent notion of where a particular pointer is live at a particular site. The old code just used dominance, which is correct, but decidedly more conservative then it needed to be. This patch implements a simple dataflow algorithm that's run one per function (well, twice counting fixup after base pointer insertion). There's still lots of room to make this faster, but it's fast enough for all practical purposes today.

Differential Revision: http://reviews.llvm.org/D8674

llvm-svn: 234657
2015-04-10 22:53:14 +00:00
Philip Reames 85b36a8157 [RewriteStatepointsForGC] Preprocess the IR to remove unreachable blocks and single entry phis
Two related small changes:

    Various dominance based queries about liveness can get confused if we're talking about unreachable blocks. To avoid reasoning about such cases, just remove them before rewriting statepoints.
    Remove single entry phis (likely left behind by LCSSA) to reduce the number of live values.

Both of these are motivated by http://reviews.llvm.org/D8674 which will be submitted shortly.

Differential Revision: http://reviews.llvm.org/D8675

llvm-svn: 234651
2015-04-10 22:07:04 +00:00
Philip Reames 8531d8c491 [RewriteStatepointsForGC] Limited support for vectors of pointers
This patch adds limited support for inserting explicit relocations when there's a vector of pointers live over the statepoint. This doesn't handle the case where the vector contains a mix of base and non-base pointers; that's future work.

The current implementation just scalarizes the vector over the gc.statepoint before doing the explicit rewrite. An alternate approach would be to plumb the vector all the way though the backend lowering, but doing that appears challenging. In particular, the size of the indirect spill slot is currently assumed to be sizeof(pointer) throughout the backend.

In practice, this is enough to allow running the SLP and Loop vectorizers before RewriteStatepointsForGC.

Differential Revision: http://reviews.llvm.org/D8671

llvm-svn: 234647
2015-04-10 21:48:25 +00:00
Sanjoy Das b6c5914308 [InstCombine][CodeGenPrep] Create llvm.uadd.with.overflow in CGP.
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep.  Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.

Depends on D8888.

Reviewers: majnemer, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8889

llvm-svn: 234638
2015-04-10 21:07:09 +00:00
Reid Kleckner aa7a5a3a0f Avoid spewing binary to stdout in some filetype=obj tests
llvm-svn: 234627
2015-04-10 19:36:55 +00:00
Sanjay Patel 89e28f67cc use update_llc_test_checks.py to tighten checking
test features, not CPUs

remove unnecessary cruft

llvm-svn: 234622
2015-04-10 18:31:42 +00:00
Reid Kleckner 6e48a826e8 [WinEH] Try to make outlining invokes work a little better
WinEH currently turns invokes into calls. Long term, we will reconsider
this, but for now, make sure we remap the operands and clone the
successors of the new terminator.

llvm-svn: 234608
2015-04-10 16:26:42 +00:00
Hal Finkel a9fceb803d [PowerPC] Prefetching should also consider depth > 1 loops
Iterating over loops from the LoopInfo instance only provides top-level loops.
We need to search the whole tree of loops to find the inner ones.

llvm-svn: 234603
2015-04-10 15:05:02 +00:00
Toma Tabacu 038a330ede [mips] [IAS] Make the mips-expansions-bad.s test more readable. NFC.
Move the check lines below the code lines and change the indentation from 8
spaces to 2 spaces.

llvm-svn: 234584
2015-04-10 10:46:59 +00:00
Jingyue Wu 5da831cc31 Divergence analysis for GPU programs
Summary:
Some optimizations such as jump threading and loop unswitching can negatively
affect performance when applied to divergent branches. The divergence analysis
added in this patch conservatively estimates which branches in a GPU program
can diverge. This information can then help LLVM to run certain optimizations
selectively.

Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll

Reviewers: resistor, hfinkel, eliben, meheff, jholewinski

Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8576

llvm-svn: 234567
2015-04-10 05:03:50 +00:00
Hal Finkel 93138503ae [PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern cores
When we have an instruction for this (and, thus, don't generate a runtime
call), we need to custom type legalize this (in a trivial way, just as we do
for fp_to_sint).

Fixes PR23173.

llvm-svn: 234561
2015-04-10 03:39:00 +00:00
Ahmed Bougacha 1ffe7c7d36 [AArch64] Promote f16 operations to f32.
For the most common ones (such as fadd), we already did the promotion.
Do the same thing for all the others.

Currently, we'll just crash/assert on all these operations, as
there's no hardware or libcall support whatsoever.

f16 (half) is specified as an interchange - not arithmetic - format,
and is expected to be promoted to single-precision for arithmetic
operations.

While there, teach the legalizer about promoting some of the (mostly
floating-point) operations that we never needed before.

Differential Revision: http://reviews.llvm.org/D8648
See related discussion on the thread for: http://reviews.llvm.org/D8755

llvm-svn: 234550
2015-04-10 00:08:48 +00:00
Nemanja Ivanovic c09047916a Add LLVM support for remaining integer divide and permute instructions from ISA 2.06
This is the patch corresponding to review:
http://reviews.llvm.org/D8406

It adds some missing instructions from ISA 2.06 to the PPC back end.

llvm-svn: 234546
2015-04-09 23:54:37 +00:00
Ahmed Bougacha df43737782 [CodeGen] Combine concat_vector of trunc'd scalar to scalar_to_vector.
We already do:
  concat_vectors(scalar, undef) -> scalar_to_vector(scalar)
When the scalar is legal.
When it's not, but is a truncated legal scalar, we can also do:
  concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
Which is equivalent, since the upper lanes are undef anyway.
While there, teach the combine to look at more than 2 operands.

Differential Revision: http://reviews.llvm.org/D8883

llvm-svn: 234530
2015-04-09 20:04:47 +00:00
Juergen Ributzka bd0c7eb4dc [AArch64][FastISel] Fix integer extend optimization.
The integer extend optimization tries to fold the extend into the load
instruction. This requires us to identify if the extend has already been
emitted or not and act accordingly on it.

The check that was originally performed for this was not sufficient. Besides
checking the ValueMap for a mapped register we also need to check if the
virtual register has already an associated machine instruction that defines it.

This fixes rdar://problem/20470788.

llvm-svn: 234529
2015-04-09 20:00:46 +00:00
Rafael Espindola 1c84271694 Revert "Refactoring and enhancement to FMA combine."
This reverts commit r234513. It was failing on the bots.

llvm-svn: 234518
2015-04-09 18:29:32 +00:00
Olivier Sallenave 53703d0862 Refactoring and enhancement to FMA combine.
llvm-svn: 234513
2015-04-09 17:55:26 +00:00
Javed Absar 5c5e3c5e36 [ARM] support for Cortex-R4/R4F
Currently, llvm (backend) doesn't know cortex-r4, even though it is the
default target for armv7r. Using "--target=armv7r-arm-none-eabi" provokes
'cortex-r4' is not a recognized processor for this target' by llvm.
This patch adds support for cortex-r4 and, very closely related, r4f.

llvm-svn: 234486
2015-04-09 14:07:28 +00:00
Kristof Beyls 17cb8982f4 [AArch64] Add support for dynamic stack alignment
Differential Revision: http://reviews.llvm.org/D8876

llvm-svn: 234471
2015-04-09 08:49:47 +00:00
Lang Hames 522bf13b83 [AArch64] Remove redundant -march option. Also fix a think-o from r234462.
llvm-svn: 234467
2015-04-09 05:34:57 +00:00
Nick Lewycky d65d3d7b8b Not all triples put _ before function names. Specify a triple to make this test pass on Linux.
llvm-svn: 234466
2015-04-09 05:31:32 +00:00
Lang Hames 903338511b [AArch64] Teach AArch64TargetLowering::getOptimalMemOpType to consider alignment
restrictions when choosing a type for small-memcpy inlining in
SelectionDAGBuilder.

This ensures that the loads and stores output for the memcpy won't be further
expanded during legalization, which would cause the total number of instructions
for the memcpy to exceed (often significantly) the inlining thresholds.

<rdar://problem/17829180>

llvm-svn: 234462
2015-04-09 03:40:33 +00:00
Akira Hatanaka 4cb3d064ce Use option -march instead of -mtriple to avoid overconditionalizing the test.
This fixes r234439, which was committed to fix the test failures caused by
r234430.

llvm-svn: 234451
2015-04-08 23:02:45 +00:00
Akira Hatanaka bda7aa5938 Pass -mtriple to llc to appease buildbot.
This fixes the test case I committed in r234430.

llvm-svn: 234439
2015-04-08 21:30:48 +00:00
Andrew Kaylor 67d3c0359d [WinEH] Minor bug fixes.
Fixed insert point for allocas created for demoted values.
Clear the nested landing pad list after it has been processed.

llvm-svn: 234433
2015-04-08 20:57:22 +00:00
Akira Hatanaka c6fab80536 [DAGCombine] Fix a bug in MergeConsecutiveStores.
The bug manifests when there are two loads and two stores chained as follows in
a DAG,

(ld v3f32) -> (st f32) -> (ld v3f32) -> (st f32)

and the stores' values are extracted from the preceding vector loads.

MergeConsecutiveStores would replace the first store in the chain with the
merged vector store, which would create a cycle between the merged store node
and the last load node that appears in the chain.

This commits fixes the bug by replacing the last store in the chain instead.

rdar://problem/20275084

Differential Revision: http://reviews.llvm.org/D8849

llvm-svn: 234430
2015-04-08 20:34:53 +00:00
Adam Nemet ce48250f11 [LoopAccesses] Allow analysis to complete in the presence of uniform stores
(Re-apply r234361 with a fix and a testcase for PR23157)

Both run-time pointer checking and the dependence analysis are capable
of dealing with uniform addresses. I.e. it's really just an orthogonal
property of the loop that the analysis computes.

Run-time pointer checking will only try to reason about SCEVAddRec
pointers or else gives up. If the uniform pointer turns out the be a
SCEVAddRec in an outer loop, the run-time checks generated will be
correct (start and end bounds would be equal).

In case of the dependence analysis, we work again with SCEVs. When
compared against a loop-dependent address of the same underlying object,
the difference of the two SCEVs won't be constant. This will result in
returning an Unknown dependence for the pair.

When compared against another uniform access, the difference would be
constant and we should return the right type of dependence
(forward/backward/etc).

The changes also adds support to query this property of the loop and
modify the vectorizer to use this.

Patch by Ashutosh Nema!

llvm-svn: 234424
2015-04-08 17:48:40 +00:00
Scott Douglass 7ad7792088 [ARM] make vminnm/vmaxnm work with ?le, ?ge and no-nans-fp-math
Because -menable-no-nans causes fcmp conditions to be rewritten
without 'o' or 'u' the recognition code in needs to cope. Also
extended it to handle 'le' and 'ge.

Differential Revision: http://reviews.llvm.org/D8725

llvm-svn: 234421
2015-04-08 17:18:28 +00:00
Sanjay Patel 0f712d04f2 fixed to test features, not CPU models
llvm-svn: 234413
2015-04-08 16:51:42 +00:00
Toma Tabacu c6ce0749ad [mips] [IAS] Do not generate redundant move when expanding lw/sw with symbol.
Summary:
Even though there is no 2nd register operand in the "lw/sw $8, symbol" case, we still try to find one, 
and we end up with $0, which makes us generate an unnecessary "addu $8, $8, $0" (a.k.a. "move $8, $8").

We can avoid this by checking if the 2nd register operand is different from $0, before generating the addu.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8055

llvm-svn: 234406
2015-04-08 13:52:41 +00:00
Toma Tabacu 91fc0b3c10 [mips] [IAS] Add support for the BNEZL and BEQZL pseudo-instructions.
Summary:
They are of the form "bnezl/beqzl $rs, offset" and expand to "bnel/beql $rs, $zero, offset".

These instructions are used in Linux inline assembly.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8540

llvm-svn: 234401
2015-04-08 12:15:05 +00:00
Rafael Espindola 7230f80e3b Write the section header in the end.
One could make the argument for writing it immediately after the ELF header,
but writing it in the middle of the sections like we were doing just makes
it harder for no reason.

llvm-svn: 234400
2015-04-08 11:41:24 +00:00
Sergey Dmitrouk 3cc62b3715 [ARM][Debug Info] Restore emitting of .cfi_def_cfa_offset for functions without stack frame
Summary: Looks like new code from [[ http://reviews.llvm.org/rL222057 | rL222057 ]] doesn't account for early `return` in `ARMFrameLowering::emitPrologue`, which leads to loosing `.cfi_def_cfa_offset` directive for functions without stack frame.

Reviewers: echristo, rengolin, asl, t.p.northover

Reviewed By: t.p.northover

Subscribers: llvm-commits, rengolin, aemerson

Differential Revision: http://reviews.llvm.org/D8606

llvm-svn: 234399
2015-04-08 10:10:12 +00:00
Toma Tabacu 7567a10c47 [mips] [IAS] Remove AssemblerPredicate's from RelocPIC and RelocStatic.
Summary:
These AssemblerPredicate's are unnecessary and actually make some instructions unusable when assembling pre-MIPS32 ISAs.
For example, this was causing the IAS to reject the 'j' instruction for MIPS I-V.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8300

llvm-svn: 234398
2015-04-08 10:06:45 +00:00
Adam Nemet e09a928c80 Revert "[LoopAccesses] Allow analysis to complete in the presence of uniform stores"
This reverts commit r234361.

It caused PR23157.

llvm-svn: 234387
2015-04-08 04:16:55 +00:00
Tom Stellard d7e6f13671 R600/SI: Initial support for assembler and inline assembly
This is currently considered experimental, but most of the more
commonly used instructions should work.

So far only SI has been extensively tested, CI and VI probably work too,
but may be buggy.  The current set of tests cases do not give complete
coverage, but I think it is sufficient for an experimental assembler.

See the documentation in R600Usage for more information.

llvm-svn: 234381
2015-04-08 01:09:26 +00:00
Tom Stellard 1f3416a63d R600/SI: Don't print offset0/offset1 DS operands when they are 0
llvm-svn: 234379
2015-04-08 01:09:19 +00:00
Tim Northover 5b44f1ba19 AArch64: disallow "fmov sD, #-0.0" during assembly.
We weren't checking the sign of the floating point immediate before translating
it to "fmov sD, wzr". Similarly for D-regs.

Technically "movi vD.2s, #0x80, lsl #24" would work most of the time, but it's
not a blessed alias (and I don't think it should be since people expect writing
sD to zero out the high lanes, and there's no dD equivalent). So an error it is.

rdar://20455398

llvm-svn: 234372
2015-04-07 22:49:47 +00:00
Adam Nemet 0515c33b70 [LoopAccesses] Allow analysis to complete in the presence of uniform stores
Both run-time pointer checking and the dependence analysis are capable
of dealing with uniform addresses. I.e. it's really just an orthogonal
property of the loop that the analysis computes.

Run-time pointer checking will only try to reason about SCEVAddRec
pointers or else gives up. If the uniform pointer turns out the be a
SCEVAddRec in an outer loop, the run-time checks generated will be
correct (start and end bounds would be equal).

In case of the dependence analysis, we work again with SCEVs. When
compared against a loop-dependent address of the same underlying object,
the difference of the two SCEVs won't be constant. This will result in
returning an Unknown dependence for the pair.

When compared against another uniform access, the difference would be
constant and we should return the right type of dependence
(forward/backward/etc).

The changes also adds support to query this property of the loop and
modify the vectorizer to use this.

Patch by Ashutosh Nema!

llvm-svn: 234361
2015-04-07 21:46:16 +00:00
Reid Kleckner f1853c65d9 [WinEH] Fix xdata generation when no catch object is present
The lack of a catch object is indicated by a frame escape index of -1.

Fixes PR23137.

llvm-svn: 234346
2015-04-07 19:46:38 +00:00
Toma Tabacu f25949b588 [mips] [IAS] Allow .set assignments for already defined symbols.
Summary:
This is not possible when using the IAS for MIPS, but it is possible when using the IAS for other architectures and when using GAS for MIPS.


Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8578

llvm-svn: 234316
2015-04-07 13:59:39 +00:00
Rafael Espindola b91455b5c0 Refactor a lot of duplicated code for stub output.
This also moves it earlier so that it they are produced before we print
an end symbol for the data section.

llvm-svn: 234315
2015-04-07 13:42:44 +00:00
Toma Tabacu 3d5ce49ce5 [TableGen] Prevent invalid code generation when emitting AssemblerPredicate conditions.
Summary:
The loop which emits AssemblerPredicate conditions also links them together by emitting a '&&'.
If the 1st predicate is not an AssemblerPredicate, while the 2nd one is, nothing gets emitted for the 1st one, but we still emit the '&&' because of the 2nd predicate.
This generated code looks like "( && Cond2)" and is invalid.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D8294

llvm-svn: 234312
2015-04-07 12:10:11 +00:00
Daniel Jasper c234535e04 Add test showing that MachineLICM is calculating register pressure wrong
More details: http://llvm.org/PR23143

llvm-svn: 234309
2015-04-07 11:41:40 +00:00
Rafael Espindola d58de064b8 Use sext in fast isel.
Fast isel used to zero extends immediates to 64 bits. This normally goes
unnoticed because the value is truncated to 32 bits for output.

Two cases were it is noticed:

* We fail to use smaller encodings.
* If the original constant was smaller than i32.

In the tests using i1 constants, codegen would change to use -1, which is fine
(and matches what regular isel does) since only the lowest bit is then used.

Instead, this patch then changes the ir to use i8 constants, which looks more
like what clang produces.

llvm-svn: 234249
2015-04-06 22:29:07 +00:00
Reid Kleckner b401941f3d [WinEH] Don't sink allocas into child handlers
The uselist isn't enough to infer anything about the lifetime of such
allocas. If we want to re-add this optimization, we will need to
leverage lifetime markers to do it.

Fixes PR23122.

llvm-svn: 234196
2015-04-06 18:50:38 +00:00