Commit Graph

264 Commits

Author SHA1 Message Date
Simon Pilgrim 15c0a59463 [InstCombine][X86][SSE] Replace sign/zero extension intrinsics with native IR
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.

Differential Revision: http://reviews.llvm.org/D11503

llvm-svn: 243303
2015-07-27 18:52:15 +00:00
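As a rough illustration of why these intrinsics map onto native IR, the lane-wise effect of a pmovsx/pmovzx-style operation is ordinary integer widening of the low elements. The sketch below is a standalone scalar model, not LLVM code; the helper names `sextLow8` and `zextLow8` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Sign-extend the low 8 bytes of a 16-byte vector to 16-bit lanes
// (scalar model of a pmovsxbw-style operation).
std::array<std::int16_t, 8> sextLow8(const std::array<std::int8_t, 16> &v) {
  std::array<std::int16_t, 8> out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<std::int16_t>(v[i]); // signed widening acts as sext
  return out;
}

// Zero-extend the low 8 bytes of a 16-byte vector to 16-bit lanes
// (scalar model of a pmovzxbw-style operation).
std::array<std::uint16_t, 8> zextLow8(const std::array<std::uint8_t, 16> &v) {
  std::array<std::uint16_t, 8> out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<std::uint16_t>(v[i]); // unsigned widening acts as zext
  return out;
}
```

Only the low half of the input participates, which is why the native form can be thought of as "extract the low lanes, then extend".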
Simon Pilgrim 54fcd62c6f [InstCombine][SSE4A] Standardized references to Length/Width and Index/Start to match AMD docs. NFCI.
llvm-svn: 243226
2015-07-25 20:41:00 +00:00
Philip Reames c25df11614 Reapply 239795 - [InstCombine] Propagate non-null facts to call parameters
The original change broke clang side tests.  I will be submitting those momentarily.  This change includes post commit feedback on the original change from Pete Cooper.

Original Submission comments:
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239849
2015-06-16 20:24:25 +00:00
Philip Reames 1a6305f313 Revert 239795
I forgot to update some clang test cases.  I'll fix and resubmit tomorrow.

llvm-svn: 239800
2015-06-16 01:20:53 +00:00
Philip Reames dfc29fba60 [InstCombine] Propagate non-null facts to call parameters
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239795
2015-06-16 00:43:54 +00:00
David Majnemer 1503258157 [InstSimplify] Handle some overflow intrinsics in InstSimplify
This change does a few things:
- Move some InstCombine transforms to InstSimplify
- Run SimplifyCall from within InstCombine::visitCallInst
- Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.

llvm-svn: 237995
2015-05-22 03:56:46 +00:00
Sanjoy Das 89c5491a72 [RewriteStatepointsForGC] Fix a bug on creating gc_relocate for pointer to vector of pointers
Summary:
In RewriteStatepointsForGC pass, we create a gc_relocate intrinsic for
each relocated pointer, and the gc_relocate has the same type as the
pointer. During the creation of the gc_relocate intrinsic, LLVM needs to
mangle its type. However, LLVM does not support mangling of all possible
types. RewriteStatepointsForGC will hit an assertion failure when it
tries to create a gc_relocate for pointer to vector of pointers because
mangling for vector of pointers is not supported.

This patch changes the way RewriteStatepointsForGC pass creates
gc_relocate. For each relocated pointer, we erase the type of pointers
and create a unified gc_relocate of type i8 addrspace(1)*. Then a
bitcast is inserted to convert the gc_relocate to the correct type. In
this way, gc_relocate does not need to deal with different types of
pointers and the unsupported type mangling is no longer a problem. This
change would also ease further merge when LLVM erases types of pointers
and introduces a unified pointer type.

Some minor changes are also introduced to gc_relocate related part in
InstCombineCalls, CodeGenPrepare, and Verifier accordingly.

Patch by Chen Li!

Reviewers: reames, AndyAyers, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9592

llvm-svn: 237009
2015-05-11 18:49:34 +00:00
Hal Finkel f0d68d788b [InstCombine/PowerPC] Fix single-precision QPX load/store replacement
The QPX single-precision load/store intrinsics have implied
truncation/extension from/to the declared value type of <4 x double> to the
memory type of <4 x float>. When we can prove the alignment of the pointer
argument, and thus replace the intrinsic with a regular load or store, we need
to load or store the correct data type (<4 x float>) instead of (<4 x double>).

llvm-svn: 236973
2015-05-11 06:37:03 +00:00
Pete Cooper 2777d88745 Change typeIncompatible to return an AttrBuilder instead of new-ing an AttributeSet.
This makes use of the new API which can remove attributes from a set given a builder.

This is much faster than creating a temporary set and reduces llc time by about 0.3%, which was all spent creating temporary attribute sets on the context.

llvm-svn: 236668
2015-05-06 23:19:56 +00:00
Sanjoy Das 499d703f52 [Statepoint] Clean up Statepoint.h: accessor names.
Use getFoo() as accessors consistently and some other naming changes.

llvm-svn: 236564
2015-05-06 02:36:26 +00:00
Sanjay Patel c1d20a36fb [x86] instcombine more cases of insertps into a shufflevector
This is a follow-on to D8833 (insertps optimization when the zero mask is not used).

In this patch, we check for the case where the zmask is used, but both input vectors
to the insertps intrinsic are the same operand or the zmask overrides the destination
lane. This lets us replace the 2nd shuffle input operand with the zero vector.

Differential Revision: http://reviews.llvm.org/D9257

llvm-svn: 235810
2015-04-25 20:55:25 +00:00
Philip Reames 5461d45abf Move Value.isDereferenceablePointer to ValueTracking [NFC]
Move isDereferenceablePointer function to Analysis. This function recursively tracks dereferenceability over a chain of values like other functions in ValueTracking.

This refactoring is motivated by further changes to support dereferenceable_or_null attribute (http://reviews.llvm.org/D8650). isDereferenceablePointer will be extended to perform context-sensitive analysis and IR is not a good place to have such functionality.

Patch by: Artur Pilipenko <apilipenko@azulsystems.com>
Differential Revision: reviews.llvm.org/D9075

llvm-svn: 235611
2015-04-23 17:36:48 +00:00
Sanjay Patel c86867cd5f [X86, SSE] instcombine common cases of insertps intrinsics into shuffles
This is very similar to D8486 / r232852 (vperm2). If we treat insertps intrinsics
as shufflevectors, we can optimize them better.

I've left all but the full zero case of the zero mask variants out of this patch. 
I don't think those can be converted into a single shuffle in all cases, but I'd
be happy to be proven wrong as I was for vperm2f128.

Either way, we'd need to support whatever sequence we come up with for those cases
in the backend before converting them here.

Differential Revision: http://reviews.llvm.org/D8833

llvm-svn: 235124
2015-04-16 17:52:13 +00:00
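To make the "insertps is a shuffle" idea concrete, here is a minimal standalone sketch that decodes an insertps immediate into a 4-lane shuffle mask. It assumes the usual x86 immediate layout (bits 7:6 select the source lane, bits 5:4 the destination lane, bits 3:0 are the zero mask) and only handles the case where the zero mask is clear. The function name `insertpsToShuffleMask` is hypothetical and this is not the InstCombine implementation.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Shuffle indices 0..3 name lanes of the first operand (the destination),
// indices 4..7 name lanes of the second operand (the inserted source).
// Returns nullopt when the zero mask is in use; those cases are left out here.
std::optional<std::array<int, 4>> insertpsToShuffleMask(std::uint8_t imm) {
  const unsigned zeroMask = imm & 0xF;
  if (zeroMask != 0)
    return std::nullopt;
  const unsigned destLane = (imm >> 4) & 0x3;
  const unsigned srcLane = (imm >> 6) & 0x3;
  std::array<int, 4> mask{0, 1, 2, 3};         // start as the identity shuffle
  mask[destLane] = 4 + static_cast<int>(srcLane); // overwrite one lane
  return mask;
}
```

For example, an immediate of 0x90 copies lane 2 of the second operand into lane 1 of the first, giving the mask {0, 6, 2, 3}.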
Nick Lewycky d6f241d53b GCC complains thusly: "attributes at the beginning of statement are ignored [-Werror=attributes]". Very well then! NFC
llvm-svn: 234788
2015-04-13 20:03:08 +00:00
Nick Lewycky abe2cc17da Subtraction is not commutative. Fixes PR23212!
llvm-svn: 234780
2015-04-13 19:17:37 +00:00
Sanjoy Das b098447128 [InstCombine] Refactor out OptimizeOverflowCheck. NFCI.
Summary:
This patch adds an enum `OverflowCheckFlavor` and a function
`OptimizeOverflowCheck`.  This will allow InstCombine to optimize
overflow checks without directly introducing an intermediate call to the
`llvm.$op.with.overflow` intrinsics.

This specific change is a refactoring and does not intend to change
behavior.

Reviewers: majnemer, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8888

llvm-svn: 234388
2015-04-08 04:27:22 +00:00
Sanjay Patel e304bea010 optimize the AVX2 (integer) version of vperm2 into a shuffle
...because this is what happens when an instruction
set puts its underwear on after its pants.

This is an extension of r232852, r233100, and r233110:
http://llvm.org/viewvc/llvm-project?view=revision&revision=232852
http://llvm.org/viewvc/llvm-project?view=revision&revision=233100
http://llvm.org/viewvc/llvm-project?view=revision&revision=233110

llvm-svn: 233127
2015-03-24 22:39:29 +00:00
Sanjay Patel 43a87fdc79 [X86, AVX] instcombine vperm2 intrinsics with zero inputs into shuffles
This is the IR optimizer follow-on patch for D8563: the x86 backend patch
that converts this kind of shuffle back into a vperm2.

This is also a continuation of the transform that started in D8486. 
In that patch, Andrea suggested that we could convert vperm2 intrinsics that
use zero masks into a single shuffle. 

This is an implementation of that suggestion.

Differential Revision: http://reviews.llvm.org/D8567

llvm-svn: 233110
2015-03-24 20:36:42 +00:00
Sanjay Patel ccf5f24b7b [X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles
vperm2* intrinsics are just shuffles. 
In a few special cases, they're not even shuffles.

Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:

1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
   really angry (and so I have regrets about some patches from last week).

2. Doing mask conversion logic in header files is hard to write and 
   subsequently read.

There are a couple of TODOs in this patch to complete this optimization.

Differential Revision: http://reviews.llvm.org/D8486

llvm-svn: 232852
2015-03-20 21:47:56 +00:00
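The mask conversion mentioned above can be sketched in a few lines. This standalone model decodes a vperm2f128-style immediate into an 8-lane shuffle mask, assuming the documented x86 layout: bits 1:0 pick the 128-bit half that feeds the low result half, bits 5:4 pick the one for the high result half, and bits 3 and 7 zero the respective half. The names `decodeVPerm2Mask` and `kZeroLane` are hypothetical, and this is not the code from the patch.

```cpp
#include <array>
#include <cstdint>

// Sentinel for a lane that the immediate forces to zero (hypothetical name).
constexpr int kZeroLane = -1;

// Shuffle indices 0..7 name lanes of the first source, 8..15 the second.
std::array<int, 8> decodeVPerm2Mask(std::uint8_t imm) {
  std::array<int, 8> mask{};
  for (int half = 0; half < 2; ++half) {
    // The low nibble controls the low result half, the high nibble the high half.
    const unsigned field = (imm >> (half * 4)) & 0xF;
    const bool zero = (field & 0x8) != 0;
    // Selector 0..3 maps to the first lane of a 128-bit source half.
    const int base = static_cast<int>(field & 0x3) * 4;
    for (int i = 0; i < 4; ++i)
      mask[half * 4 + i] = zero ? kZeroLane : base + i;
  }
  return mask;
}
```

An immediate of 0x20 yields {0, 1, 2, 3, 8, 9, 10, 11}, i.e. the low half of each source concatenated, which is a plain two-input shuffle.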
David Majnemer d61a6fd8ed InstCombine: Don't fold call bitcast into args if callee is byval
This fixes a bug reported here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150309/265341.html

llvm-svn: 231948
2015-03-11 18:03:05 +00:00
Mehdi Amini a28d91d81b DataLayout is mandatory, update the API to reflect it with references.
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.

This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.

I turned as many pointers to DataLayout into references as I could; this
helped figure out all the places where a nullptr could come up.

I initially had a local version of this patch broken into over 30
independent commits, but some later commits were cleaning the API and
touching parts of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.

Test Plan:

Reviewers: echristo

Subscribers: llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
2015-03-10 02:37:25 +00:00
Hal Finkel 221f467185 [InstCombine/PowerPC] Convert aligned QPX load/store intrinsics into loads/stores
InstCombine has long had logic to convert aligned Altivec load/store intrinsics
into regular loads and stores. This mirrors that functionality for QPX vector
load/store intrinsics.

llvm-svn: 230660
2015-02-26 18:56:03 +00:00
Ramkumar Ramachandra 8fcb498a9a InstCombine: propagate deref via new addDereferenceableAttr
The "dereferenceable" attribute cannot be added via .addAttribute(),
since it also expects a size in bytes. AttrBuilder#addAttribute or
AttributeSet#addAttribute is wrapped by classes Function, InvokeInst,
and CallInst. Add corresponding wrappers to
AttrBuilder#addDereferenceableAttr.

Having done this, propagate the dereferenceable attribute via
gc.relocate, adding a test to exercise it. Note that -datalayout is
required during execution over and above -instcombine, because
InstCombine only optionally requires DataLayoutPass.

Differential Revision: http://reviews.llvm.org/D7510

llvm-svn: 229265
2015-02-14 19:37:54 +00:00
Chandler Carruth 2496910325 Revert r228556: InstCombine: propagate nonNull through assume
This commit isn't using the correct context, and is transforming calls
that are operands to loads rather than calls that are operands to an
icmp feeding into an assume. I've replied on the original review thread
with a very reduced test case and some thoughts on how to rework this.

llvm-svn: 228677
2015-02-10 08:07:32 +00:00
Ramkumar Ramachandra a021ee62ca InstCombine: propagate nonNull through assume
Make assume (load (call|invoke) != null) set nonNull return attribute
for the call and invoke. Also include tests.

Differential Revision: http://reviews.llvm.org/D7107

llvm-svn: 228556
2015-02-09 01:13:13 +00:00
Chandler Carruth a917458203 [PM] Rename InstCombine.h to InstCombineInternal.h in preparation for
creating a non-internal header file for the InstCombine pass.

I thought about calling this InstCombiner.h or in some way more clearly
associating it with the InstCombiner class that it is primarily defining,
but there are several other utility interfaces defined within this for
InstCombine. If, in the course of refactoring, those end up moving
elsewhere or going away, it might make more sense to make this the
combiner's header alone.

Naturally, this is a bikeshed to a certain degree, so feel free to lobby
for a different shade of paint if this name just doesn't suit you.

llvm-svn: 226783
2015-01-22 05:25:13 +00:00
David Majnemer 4c0a6e918a InstCombine: Don't strip bitcasts off of callsites marked 'thunk'
The return type of a thunk is meaningless, we just want the arguments
and return value to be forwarded.

llvm-svn: 226708
2015-01-21 22:32:04 +00:00
Chandler Carruth ba4c5179a0 [PM] Simplify (ha! ha!) the way that instcombine calls the
SimplifyLibCalls utility by sinking it into the specific call part of
the combiner.

This will avoid us needing to do any contortions to build this object in
a subsequent refactoring I'm doing and seems generally better factored.
We don't need this utility everywhere and it carries no interesting
state so we might as well build it on demand.

llvm-svn: 226654
2015-01-21 11:23:40 +00:00
David Majnemer 5310c1e954 Analysis: Reformulate WillNotOverflowUnsignedAdd for reusability
WillNotOverflowUnsignedAdd's smarts will live in ValueTracking as
computeOverflowForUnsignedAdd.  It now returns a tri-state result:
never overflows, always overflows and sometimes overflows.

llvm-svn: 225329
2015-01-07 00:39:50 +00:00
David Majnemer 3b83b3fa0b InstCombine: Just a small tidy-up
llvm-svn: 225328
2015-01-07 00:39:42 +00:00
David Majnemer 9b6b822814 InstCombine: Bitcast call arguments from/to pointer/integer type
Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing
arguments and return values when DataLayout says it is safe to do so.

llvm-svn: 225254
2015-01-06 08:41:31 +00:00
Chandler Carruth 66b3130cda [PM] Split the AssumptionTracker immutable pass into two separate APIs:
a cache of assumptions for a single function, and an immutable pass that
manages those caches.

The motivation for this change is twofold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating a separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.

Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.

For now, this adds boilerplate to the various passes, but this kind of
boilerplate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.

llvm-svn: 225131
2015-01-04 12:03:27 +00:00
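The "vector of weak value handles" idea can be illustrated with a small standalone model. Here `std::weak_ptr` stands in for LLVM's weak value handles, which is an assumption made purely for illustration; `AssumeCall` and `liveAssumptionIds` are hypothetical names.

```cpp
#include <memory>
#include <vector>

// Stand-in for an @llvm.assume call instruction.
struct AssumeCall {
  int id;
};

// Walk the cached handles and skip any whose instruction has gone away,
// mirroring "callers can easily skip the handles that are null".
std::vector<int>
liveAssumptionIds(const std::vector<std::weak_ptr<AssumeCall>> &handles) {
  std::vector<int> ids;
  for (const auto &handle : handles)
    if (auto call = handle.lock()) // empty when the call was deleted
      ids.push_back(call->id);
  return ids;
}
```

The cache never has to be notified of deletions in this model: a dead entry simply reads as null and is ignored by the caller.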
David Majnemer c8a576b5c0 InstCombine: Detect when llvm.umul.with.overflow always overflows
We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero
and LHSKnownOne * RHSKnownOne overflow.

llvm-svn: 225077
2015-01-02 07:29:47 +00:00
David Majnemer 491331aca8 Analysis: Reformulate WillNotOverflowUnsignedMul for reusability
WillNotOverflowUnsignedMul's smarts will live in ValueTracking as
computeOverflowForUnsignedMul.  It now returns a tri-state result:
never overflows, always overflows and sometimes overflows.

llvm-svn: 225076
2015-01-02 07:29:43 +00:00
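The tri-state result and the known-bits condition from the two commits above can be modelled as follows. This is a minimal sketch for 32-bit values, not the ValueTracking code: the `KnownBits32` struct, the enum, and `classifyUnsignedMul` are illustrative names. The smallest possible value of an operand is its known-one mask, and the largest is the complement of its known-zero mask.

```cpp
#include <cstdint>

enum class OverflowResult { NeverOverflows, AlwaysOverflows, MayOverflow };

// Bits known to be zero / known to be one in a 32-bit value.
struct KnownBits32 {
  std::uint32_t zero;
  std::uint32_t one;
};

OverflowResult classifyUnsignedMul(KnownBits32 lhs, KnownBits32 rhs) {
  constexpr std::uint64_t kMax32 = 0xFFFFFFFFull;
  // Largest possible operands: every bit not known to be zero is set.
  const std::uint64_t lhsMax = static_cast<std::uint32_t>(~lhs.zero);
  const std::uint64_t rhsMax = static_cast<std::uint32_t>(~rhs.zero);
  // Smallest possible operands: only the known-one bits are set.
  const std::uint64_t lhsMin = lhs.one;
  const std::uint64_t rhsMin = rhs.one;
  if (lhsMax * rhsMax <= kMax32)
    return OverflowResult::NeverOverflows; // even the largest product fits
  if (lhsMin * rhsMin > kMax32)
    return OverflowResult::AlwaysOverflows; // even the smallest product overflows
  return OverflowResult::MayOverflow;
}
```

Doing the arithmetic in 64 bits keeps the model itself free of overflow; if the minimum product overflows, the maximum product necessarily does too, so checking the minimum alone is sufficient for the "always" case.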
Philip Reames 9db26ffc9a Carry facts about nullness and undef across GC relocation
This change implements four basic optimizations:

1. If a relocated value isn't used, it doesn't need to be relocated.
2. If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.)
3. If the value being relocated is undef, the relocation is meaningless.
4. If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.)

I outlined other planned work in comments.

Differential Revision: http://reviews.llvm.org/D6600

llvm-svn: 224968
2014-12-29 23:27:30 +00:00
Erik Eckstein a451b9b0b5 Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.

llvm-svn: 224417
2014-12-17 07:29:19 +00:00
Andrea Di Biagio 72b05aa59c [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
This patch teaches the instruction combiner how to fold a call to 'insertqi' if
the 'length field' (3rd operand) is set to zero, and if the sum between
field 'length' and 'bit index' (4th operand) is bigger than 64.

From the AMD64 Architecture Programmer's Manual:
1. If the sum of the bit index + length field is greater than 64, then the
   results are undefined;
2. A value of zero in the field length is defined as a length of 64.

This patch improves the existing combining logic for intrinsic 'insertqi'
adding extra checks to address both point 1. and point 2.

Differential Revision: http://reviews.llvm.org/D6583

llvm-svn: 224054
2014-12-11 20:44:59 +00:00
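The two rules quoted from the AMD manual can be captured in a small bit-field model. This is a hedged standalone sketch, not the InstCombine fold: `insertBitField` is a hypothetical helper that returns `std::nullopt` for the case the manual calls undefined.

```cpp
#include <cstdint>
#include <optional>

// Insert the low `length` bits of `src` into `dst` starting at bit `index`.
// Rule 2: a length field of zero means a length of 64.
// Rule 1: index + length > 64 is undefined, modelled here as nullopt.
std::optional<std::uint64_t> insertBitField(std::uint64_t dst,
                                            std::uint64_t src,
                                            unsigned lengthField,
                                            unsigned index) {
  const unsigned length = (lengthField == 0) ? 64u : lengthField;
  if (index + length > 64)
    return std::nullopt;
  // length == 64 implies index == 0 here, so no shift below can reach 64.
  const std::uint64_t mask =
      (length == 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << length) - 1);
  return (dst & ~(mask << index)) | ((src & mask) << index);
}
```

With a length field of zero and an index of zero the whole destination is replaced by the source, which is the kind of case a combiner can fold away entirely.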
Erik Eckstein 096ff7dcd6 Refactor creation of overflow result tuples in InstCombineCalls.
Extract the creation of overflow result tuples in a separate function. NFC.

llvm-svn: 224006
2014-12-11 08:02:30 +00:00
Duncan P. N. Exon Smith 5bf8fef580 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin and
I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
Philip Reames 1a1bdb22bf [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  

With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into its own files and anyone not working on the statepoint support itself should be able to ignore it.

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and matches closely the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.

In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principle, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stopgap measure.

Reviewed by: atrick, ributzka

llvm-svn: 223137
2014-12-02 18:50:36 +00:00
Bill Schmidt 729547847f [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
stxvd2x, and stxvw4x instructions.

New LLVM intrinsics are provided to represent these four instructions
in IntrinsicsPowerPC.td.  These are patterned after the similar
intrinsics for lvx and stvx (Altivec).  In PPCInstrVSX.td, these
intrinsics are tied to the code gen patterns, with additional patterns
to allow plain vanilla loads and stores to still generate these
instructions.

At -O1 and higher the intrinsics are immediately converted to loads
and stores in InstCombineCalls.cpp.  This will open up more
optimization opportunities while still allowing the correct
instructions to be generated.  (Similar code exists for aligned
Altivec loads and stores.)

The new intrinsics are added to the code that checks for consecutive
loads and stores in PPCISelLowering.cpp, as well as to
PPCTargetLowering::getTgtMemIntrinsic().

There's a new test to verify the correct instructions are generated.
The loads and stores tend to be reordered, so the test just counts
their number.  It runs at -O2, as it's not very effective to test this
at -O0, when many unnecessary loads and stores are generated.

I ended up having to modify vsx-fma-m.ll.  It turns out this test case
is slightly unreliable, but I don't know a good way to prevent
problems with it.  The xvmaddmdp instructions read and write the same
register, which is one of the multiplicands.  Commutativity allows
either to be chosen.  If the FMAs are reordered differently than
expected by the test, the register assignment can be different as a
result.  Hopefully this doesn't change often.

There is a companion patch for Clang.

llvm-svn: 221767
2014-11-12 04:19:40 +00:00
Philip Reames 66c6de61ee Canonicalize an assume(load != null) into !nonnull metadata
We currently have two ways of informing the optimizer that the result of a load is never null: metadata and assume. This change converts the latter into the former. This avoids a need to implement optimizations using both forms.

We should probably extend this basic idea to metadata of other forms; in particular, range metadata. Our view is that assumes should be considered a "last resort" for when there isn't a more canonical way to represent something.

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5951

llvm-svn: 221737
2014-11-11 23:33:19 +00:00
Duncan P. N. Exon Smith de36e8040f Revert "IR: MDNode => Value"
Instead, we're going to separate metadata from the Value hierarchy.  See
PR21532.

This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.

llvm-svn: 221711
2014-11-11 21:30:22 +00:00
Duncan P. N. Exon Smith 3872d0084c IR: MDNode => Value: Instruction::getMetadata()
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.

Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.

llvm-svn: 221024
2014-11-01 00:10:31 +00:00
Frederic Riss c1892e2d48 Assert that ValueHandleBase::ValueIsRAUWd doesn't change the tracked Value type.
This invariant is enforced in Value::replaceAllUsesWith, thus it seems
logical to apply it also to ValueHandles. This commit fixes InstCombine
to not trigger the assertion during the removal of constant bitcasts in
call instructions.

Differential Revision: http://reviews.llvm.org/D5828

llvm-svn: 220468
2014-10-23 04:08:42 +00:00
Matt Arsenault d6511b49ac Add minnum / maxnum intrinsics
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is NaN,
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.

llvm-svn: 220341
2014-10-21 23:00:20 +00:00
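The two NaN conventions contrasted above can be written out explicitly. This is a small standalone model, not the intrinsic's implementation; `minNumModel` and `minPropagatingNaN` are hypothetical names, and the treatment of signed zeros is deliberately ignored.

```cpp
#include <cmath>
#include <limits>

// IEEE-754 minNum-style: a NaN operand is treated as missing data,
// so the other operand is returned.
double minNumModel(double a, double b) {
  if (std::isnan(a))
    return b;
  if (std::isnan(b))
    return a;
  return a < b ? a : b;
}

// Error-propagating alternative: any NaN operand poisons the result.
double minPropagatingNaN(double a, double b) {
  if (std::isnan(a) || std::isnan(b))
    return std::numeric_limits<double>::quiet_NaN();
  return a < b ? a : b;
}
```

The two functions agree whenever neither operand is NaN and differ only when exactly one operand is NaN.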
Hal Finkel 4564688806 [InstCombine] Simplify the logic from r219067 using ValueTracking
Joerg suggested on IRC that I look at generalizing the logic from r219067 to
handle more general redundancies (like removing an assume(x > 3) dominated by
an assume(x > 5)). The way to do this would be to ask ValueTracking to
determine the value of the i1 argument. It turns out that ValueTracking is not
very good at this right now (although it does get the trivial redundancy case)
because it does not understand ICmps. Nevertheless, the resulting code in
InstCombine is simpler than r219067, so we might as well do it now.

llvm-svn: 219070
2014-10-05 00:53:02 +00:00
Hal Finkel 04a156139e [InstCombine] Remove redundant @llvm.assume intrinsics
For any @llvm.assume intrinsic, if there is another which dominates it and uses
the same condition, then it is redundant and can be removed. While this does
not alter the semantics of the @llvm.assume intrinsics, it makes subsequent
handling more efficient (and the resulting IR easier to read).

llvm-svn: 219067
2014-10-04 21:27:06 +00:00
Hal Finkel 60db05896a Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.)
This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.

As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.

The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.

Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This is to prevent a query using, for
example, the assume(a == b), to recurse on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.

This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).

llvm-svn: 217342
2014-09-07 18:57:58 +00:00
Hal Finkel 74c2f355d2 Add an Assumption-Tracking Pass
This adds an immutable pass, AssumptionTracker, which keeps a cache of
@llvm.assume call instructions within a module. It uses callback value handles
to keep stale functions and intrinsics out of the map, and it relies on any
code that creates new @llvm.assume calls to notify it of the new instructions.
The benefit is that code needing to find @llvm.assume intrinsics can do so
directly, without scanning the function, thus allowing the cost of @llvm.assume
handling to be negligible when none are present.

The current design is intended to be lightweight. We don't keep track of
anything until we need a list of assumptions in some function. The first time
this happens, we scan the function. After that, we add/remove @llvm.assume
calls from the cache in response to registration calls and ValueHandle
callbacks.

There are no new direct test cases for this pass, but because it calls its
validation function upon module finalization, we'll pick up detectable
inconsistencies from the other tests that touch @llvm.assume calls.

This pass will be used by follow-up commits that make use of @llvm.assume.

llvm-svn: 217334
2014-09-07 12:44:26 +00:00
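The lazy design described in this commit (scan once on first query, then rely on registration) can be sketched with a toy model. Everything here is hypothetical: `Inst`, `Function`, and `AssumptionCacheModel` are stand-ins, a `std::list` of opcode strings plays the role of a function body, and removal via ValueHandle callbacks is left out.

```cpp
#include <list>
#include <string>
#include <vector>

struct Inst {
  std::string opcode;
};
using Function = std::list<Inst>; // list keeps element addresses stable

class AssumptionCacheModel {
public:
  explicit AssumptionCacheModel(const Function &fn) : fn_(fn) {}

  // The first call scans the function; later calls reuse the cached list.
  const std::vector<const Inst *> &assumptions() {
    if (!scanned_) {
      for (const Inst &inst : fn_)
        if (inst.opcode == "assume")
          assumes_.push_back(&inst);
      scanned_ = true;
      ++scanCount_;
    }
    return assumes_;
  }

  // Code that creates a new assume must notify the cache. Before the first
  // scan nothing is tracked, because the scan will find the call anyway.
  void registerAssumption(const Inst *inst) {
    if (scanned_)
      assumes_.push_back(inst);
  }

  int scanCount() const { return scanCount_; }

private:
  const Function &fn_;
  std::vector<const Inst *> assumes_;
  bool scanned_ = false;
  int scanCount_ = 0;
};
```

In this model a function with no queries costs nothing, and a function that is queried repeatedly is scanned exactly once.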