Commit Graph

17752 Commits

Author SHA1 Message Date
Reid Kleckner 04b69f89aa Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion."
This reverts commit r224351. It causes assertion failures when building
ICU.

llvm-svn: 224397
2014-12-17 00:29:23 +00:00
Hans Wennborg 224cb82a39 SelectionDAG switch lowering: use 'unsigned' to count destination popularity
SwitchInst::getNumCases() returns unsinged, so using uint64_t to count cases
seems unnecessary.

Also fix a missing CHECK in the test case.

llvm-svn: 224393
2014-12-16 23:41:59 +00:00
Sanjay Patel 7129c10cae merge consecutive loads that are offset from a base address
SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads
when the first load was offset from a base address. 

This patch recognizes that pattern and subtracts the offset before comparing
the second load to see if it is consecutive.

The codegen change in the new test case improves from:

vmovsd	32(%rdi), %xmm0
vmovsd	48(%rdi), %xmm1 
vmovhpd	56(%rdi), %xmm1, %xmm1
vmovhpd	40(%rdi), %xmm0, %xmm0
vinsertf128	$1, %xmm1, %ymm0, %ymm0

To:

vmovups	32(%rdi), %ymm0

An existing test case is also improved from:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovsd	24(%rdi), %xmm2
vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
vmovhpd	8(%rdi), %xmm1, %xmm3

To:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovhpd	24(%rdi), %xmm0, %xmm0
vmovhpd	8(%rdi), %xmm1, %xmm1

This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ).

Differential Revision: http://reviews.llvm.org/D6642

llvm-svn: 224379
2014-12-16 21:57:18 +00:00
Matt Arsenault dd3b77d64c Move lowerConstant to AsmPrinter
This was a static function before, and NVPTX duplicated it
because it wasn't exposed.

llvm-svn: 224354
2014-12-16 19:16:14 +00:00
Quentin Colombet d5e57b731f [CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %es      # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224351
2014-12-16 19:09:03 +00:00
Aaron Ballman 0d6a010c13 Fixing -Wsign-compare warnings; NFC.
llvm-svn: 224337
2014-12-16 14:04:11 +00:00
Matthias Braun 1aed6ffa35 LiveRangeCalc: Rewrite subrange calculation
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)

llvm-svn: 224313
2014-12-16 04:03:38 +00:00
Adrian Prantl b9fa945d51 ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224294
2014-12-16 00:20:49 +00:00
Matthias Braun c3a72c2e5f Revert "LiveRangeCalc: Rewrite subrange calculation"
Revert until I find out why non-subreg enabled targets break.

This reverts commit 6097277eefb9c5fb35a7f493c783ee1fd1b9d6a7.

llvm-svn: 224278
2014-12-15 21:36:35 +00:00
Matthias Braun 0352201e15 LiveRangeCalc: Rewrite subrange calculation
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)

llvm-svn: 224272
2014-12-15 21:16:21 +00:00
Matthias Braun 42fab34ffb LiveRangeCalc: use more range based for loops; NFC
llvm-svn: 224263
2014-12-15 19:40:46 +00:00
Michael Ilseman addddc441f Silence more static analyzer warnings.
Add in definedness checks for shift operators, null checks when
pointers are assumed by the code to be non-null, and explicit
unreachables.

llvm-svn: 224255
2014-12-15 18:48:43 +00:00
Akira Hatanaka 7ba78302b5 Rename argument strings of codegen passes to avoid collisions with command line
options.

This commit changes the command line arguments (PassInfo::PassArgument) of two
passes, MachineFunctionPrinter and MachineScheduler, to avoid collisions with
command line options that have the same argument strings.

This bug manifests when the PassList construct (defined in opt.cpp) is used
in a tool that links with codegen passes. To reproduce the bug, paste the
following lines into llc.cpp and run llc.

#include "llvm/IR/LegacyPassNameParser.h"
static llvm:🆑:list<const llvm::PassInfo*, bool, llvm::PassNameParser>
PassList(llvm:🆑:desc("Optimizations available:"));

rdar://problem/19212448

llvm-svn: 224186
2014-12-13 04:52:04 +00:00
Andrea Di Biagio d65fd9facd Reapply "[MachineScheduler] Fix for PR21807: minor code difference building with/without -g."
This reapplies r224118 with a fix for test 'misched-code-difference-with-debug.ll'.
That test was failing on some buildbots because it was x86 specific but it was
missing a target triple.
Added an explicit triple to test misched-code-difference-with-debug.ll.

llvm-svn: 224126
2014-12-12 15:09:58 +00:00
Andrea Di Biagio 5634a54efc Revert: [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
Test 'misched-code-difference-with-debug.ll' was failing on some buildbots.

llvm-svn: 224121
2014-12-12 13:34:03 +00:00
Andrea Di Biagio 01236e3eca [MachineScheduler] Fix for PR21807: minor code difference building with/without -g.
This patch fixes the issue reported as PR21807. There was a minor difference
in the generated code depending on the -g flag.

The cause was that with -g the machine scheduler used a different
scheduling strategy. This decision was based on the number of instructions
in a schedule region and included debug instructions in that count.

This patch fixes the issue in MISched and provides a test.

Patch by Russell Gallop!

llvm-svn: 224118
2014-12-12 12:41:22 +00:00
Ekaterina Romanova 90ff20d8f5 A fix for PR21176.
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address. 
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value. 
Added DW_OP_stack_value to the stack. 

Marked incorrect-variable-debugloc1.ll to xfail for PowerPC64, while the the failure (PR21881) 
is being investigated. 

llvm-svn: 224098
2014-12-12 05:11:47 +00:00
Philip Reames 60de8b29f7 Comment and minor code cleanup for GCStrategy (NFC)
Updating comments to reflect the current state of the world after my recent changes to ownership structure and generally better describe what a GCStrategy is and how it works.

llvm-svn: 224086
2014-12-12 00:49:03 +00:00
Matt Arsenault 810cb62962 Add target hook for whether it is profitable to reduce load widths
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.

llvm-svn: 224084
2014-12-12 00:00:24 +00:00
Duncan P. N. Exon Smith d6f8e4b03c CodeGen: Stop using LeakDetector for MachineInstr
Since `MachineInstr` is required to have a trivial destructor, it cannot
remove itself from `LeakDetection`.  Remove the calls.

As it happens, this requirement is because `MachineFunction` allocates
all `MachineInstr`s in a custom allocator; when the `MachineFunction` is
destroyed they're dropped of the edge.  There's no benefit to detecting
leaks.

llvm-svn: 224061
2014-12-11 21:51:37 +00:00
Matthias Braun 7e37a5f523 [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

This is the 2nd attempt at this after realizing that PassManager::add() may
actually delete the pass.

llvm-svn: 224059
2014-12-11 21:26:47 +00:00
Rafael Espindola 01c73610d0 This reverts commit r224043 and r224042.
check-llvm was failing.

llvm-svn: 224045
2014-12-11 20:03:57 +00:00
Matthias Braun a7c82a9f1d [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

llvm-svn: 224042
2014-12-11 19:42:05 +00:00
Matthias Braun a4e932db16 [CodeGen] Let MachineVerifierPass own its banner string
llvm-svn: 224041
2014-12-11 19:41:51 +00:00
Patrik Hagglund cb06a36c9a Bugfix in InlineSpiller::traceSiblingValue().
Properly determine whether or not a phi was added by splitting.
Check against the current VNInfo of OrigLI instead of against the
OrigVNI argument.

Patch provided by Jonas Paulsson. Reviewed by Quentin Colombet.

llvm-svn: 224009
2014-12-11 10:40:17 +00:00
Ekaterina Romanova 75fd123967 Reverting commit 223981, because the test that I added (incorrect-variable-debugloc1.ll) failed for llvm-ppc64.
The test is failing for llvm-ppc64 because for this platform the location list is not being generated at all (most likely because of the bug in PPC code optimization or generation). I will file a bug agains PPC compiler, but meanwhile, until PPC bug is fixed, I will have to revert my change.  

llvm-svn: 224000
2014-12-11 06:22:35 +00:00
Philip Reames 1e30897497 GCStrategy should not own GCFunctionInfo
This change moves the ownership and access of GCFunctionInfo (the object which describes the safepoints associated with a safepoint under GCRoot) to GCModuleInfo. Previously, this was owned by GCStrategy which was in turned owned by GCModuleInfo. This made GCStrategy module specific which is 'surprising' given it's name and other purposes.

There's a few more changes needed, but we're getting towards the point we can reuse GCStrategy for gc.statepoint as well.

p.s. The style of this code ends up being a mess. I was trying to move code around without otherwise changing much. Once I get the ownership structure rearranged, I will go through and fixup spacing, naming, comments etc.

Differential Revision: http://reviews.llvm.org/D6587

llvm-svn: 223994
2014-12-11 01:47:23 +00:00
Matthias Braun 09afa1ea74 LiveInterval: Use range based for loops for subregister ranges.
llvm-svn: 223991
2014-12-11 00:59:06 +00:00
Ekaterina Romanova ceeaba7932 A fix for PR21176.
DW_OP_const <const> doesn't describe a constant value, but a value at a constant address.
The proper way to describe a constant value is DW_OP_constu <const>, DW_OP_stack_value.

Added DW_OP_stack_value to the stack.

-This line, and those below, will be ignored--

M    lib/CodeGen/AsmPrinter/DwarfDebug.cpp
A    test/DebugInfo/incorrect-variable-debugloc1.ll

llvm-svn: 223981
2014-12-10 23:19:56 +00:00
Matthias Braun 96761959d4 LiveInterval: Use more range based for loops for value numbers and segments.
llvm-svn: 223978
2014-12-10 23:07:54 +00:00
Aaron Ballman e5a2a0c9a8 Silencing a -Wsequence-point warning, and the resulting undefined behavior. NFC.
llvm-svn: 223926
2014-12-10 14:14:54 +00:00
Matthias Braun 96d7732b08 MachineVerifier: Allow physreg use if just a subreg is defined.
We can't mark partially undefined registers, so we have to allow reading
a register in the machine verifier if just parts of a register are
defined.

llvm-svn: 223896
2014-12-10 01:13:13 +00:00
Matthias Braun 21554d9b30 MachineVerifier: Allow LiveInterval segments to end at a partial write.
In the subregister liveness tracking case we do not create implicit
reads on partial register writes anymore, still we need to produce a new
SSA value for partial writes so the live segment has to end.

llvm-svn: 223895
2014-12-10 01:13:11 +00:00
Matthias Braun 279f83645c VirtRegMap: Improve block live-in info if subregister liveness is available.
llvm-svn: 223894
2014-12-10 01:13:08 +00:00
Matthias Braun d70caaf5a5 VirtRegMap: No implicit defs/uses for super registers with subreg liveness tracking.
Adding the implicit defs/uses to the superregisters is semantically questionable
but was not dangerous before as the register allocator never assigned the same
register to two overlapping LiveIntervals even when the actually live
subregisters do not overlap. With subregister liveness tracking enabled this
does actually happen and leads to subsequent bugs if we don't stop adding
the superregister defs/uses.

llvm-svn: 223892
2014-12-10 01:13:04 +00:00
Matthias Braun 587e27415d LiveRegMatrix: Respect subregister liveness when allocating registers.
llvm-svn: 223891
2014-12-10 01:13:01 +00:00
Matthias Braun a0f0c1f013 LiveIntervalUnion: Allow specification of liverange when unifying/extracting.
This allows it to add subregister ranges into the union.

llvm-svn: 223890
2014-12-10 01:12:59 +00:00
Matthias Braun 14f764c872 RegisterCoalescer: Preserve subregister liveranges.
llvm-svn: 223888
2014-12-10 01:12:52 +00:00
Matthias Braun 2079aa9140 LiveInterval: Add removeEmptySubRanges().
llvm-svn: 223887
2014-12-10 01:12:40 +00:00
Matthias Braun 8970d847c4 LiveIntervalAnalysis: Add subregister aware variants pruneValue().
llvm-svn: 223886
2014-12-10 01:12:36 +00:00
Matthias Braun e3d3b88cb9 Add a flag to enable/disable subregister liveness.
llvm-svn: 223884
2014-12-10 01:12:30 +00:00
Matthias Braun e5f861b781 LiveIntervalAnalysis: Adapt repairIntervalsInRange() to subregister liveness.
llvm-svn: 223883
2014-12-10 01:12:26 +00:00
Matthias Braun fe896c703c LiveRangeEdit: Adapt eliminateDeadDef() to subregister liveness.
llvm-svn: 223882
2014-12-10 01:12:23 +00:00
Matthias Braun 7044d69e87 LiveIntervalAnalysis: Adapt handleMove() to subregister ranges.
llvm-svn: 223881
2014-12-10 01:12:20 +00:00
Matthias Braun 20e1f38a41 LiveIntervalAnalysis: Update SubRanges in shrinkToUses().
llvm-svn: 223880
2014-12-10 01:12:18 +00:00
Matthias Braun 2f66232bde LiveIntervalAnalysis: Compute subregister ranges.
llvm-svn: 223878
2014-12-10 01:12:12 +00:00
Matthias Braun 3f1d8fdd33 LiveInterval: Add support to track liveness of subregisters.
This code adds the required data structures. Algorithms to compute it follow.

llvm-svn: 223877
2014-12-10 01:12:10 +00:00
Matthias Braun e62c207092 LiveInterval: Add a 'covers' operation to LiveRange.
llvm-svn: 223876
2014-12-10 01:12:06 +00:00
Philip Reames de226055ca Remove the Module pointer from GCStrategy and GCMetadataPrinter
In the current implementation, GCStrategy is a part of the ownership structure for the gc metadata which describes a Module. It also contains a reference to the module in question. As a result, GCStrategy instances are essentially Module specific.

I plan to transition away from this design. Instead, a GCStrategy will be owned by the LLVMContext. It will be a lightweight policy object which contains no information about the Modules or Functions involved, but can be easily reached given a Function.

The first step in this transition is to remove the direct Module reference from GCStrategy. This also requires removing the single user of this reference, the GCMetadataPrinter hierarchy. In theory, this will allow the lifetime of the printers to be scoped to the LLVMContext as well, but in practice, I'm not actually changing that. (Yet?)

An alternate design would have been to move the direct Module reference into the GCMetadataPrinter and change the keying of the owning maps to explicitly key off both GCStrategy and Module. I'm open to doing it that way instead, but didn't see much value in preserving the per Module association for GCMetadataPrinters.

The next change in this sequence will be to start unwinding the intertwined ownership between GCStrategy, GCModuleInfo, and GCFunctionInfo.

Differential Revision: http://reviews.llvm.org/D6566

llvm-svn: 223859
2014-12-09 23:57:54 +00:00
Duncan P. N. Exon Smith 5bf8fef580 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin
I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
Juergen Ributzka 8bda738221 [CGP] Rewrite pattern match for splitBranchCondition to work with Values instead.
Rewrite the pattern match code to work also with Values instead with
Instructions only. Also remove the no longer need matcher (m_Instruction).

llvm-svn: 223797
2014-12-09 17:50:10 +00:00
Juergen Ributzka 194350a936 Revert "Move function to obtain branch weights into the BranchInst class. NFC."
This reverts commit r223784 and copies the 'ExtractBranchMetadata' to CodeGenPrepare.

llvm-svn: 223795
2014-12-09 17:32:12 +00:00
Juergen Ributzka c1bbcbbd32 [CodeGenPrepare] Split branch conditions into multiple conditional branches.
This optimization transforms code like:
bb1:
  %0 = icmp ne i32 %a, 0
  %1 = icmp ne i32 %b, 0
  %or.cond = or i1 %0, %1
  br i1 %or.cond, label %TrueBB, label %FalseBB

into a multiple branch instructions like:

bb1:
  %0 = icmp ne i32 %a, 0
  br i1 %0, label %TrueBB, label %bb2
bb2:
  %1 = icmp ne i32 %b, 0
  br i1 %1, label %TrueBB, label %FalseBB

This optimization is already performed by SelectionDAG, but not by FastISel.
FastISel cannot perform this optimization, because it cannot generate new
MachineBasicBlocks.

Performing this optimization at CodeGenPrepare time makes it available to both -
SelectionDAG and FastISel - and the implementation in SelectiuonDAG could be
removed. There are currenty a few differences in codegen for X86 and PPC, so
this commmit only enables it for FastISel.

Reviewed by Jim Grosbach

This fixes rdar://problem/19034919.

llvm-svn: 223786
2014-12-09 16:36:13 +00:00
Owen Anderson 558012a3fc Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64.
llvm-svn: 223760
2014-12-09 06:50:39 +00:00
Hal Finkel c8cf2b88bc Handle early-clobber registers in the aggressive anti-dep breaker
The aggressive anti-dep breaker, used by the PowerPC backend during post-RA
scheduling (but is available to all targets), did not handle early-clobber MI
operands (at all). When constructing the list of available registers for the
replacement of some def operand, check the using instructions, and remove
registers assigned to early-clobbered defs from the set.

Fixes PR21452.

llvm-svn: 223727
2014-12-09 01:00:59 +00:00
Tom Stellard 3e01d47d98 MISched: Fix moving stores across barriers
This fixes an issue with ScheduleDAGInstrs::buildSchedGraph
where stores without an underlying object would not be added
as a predecessor to the current BarrierChain.

llvm-svn: 223717
2014-12-08 23:36:48 +00:00
Justin Bogner 61ba2e3996 InstrProf: An intrinsic and lowering for instrumentation based profiling
Introduce the ``llvm.instrprof_increment`` intrinsic and the
``-instrprof`` pass. These provide the infrastructure for writing
counters for profiling, as in clang's ``-fprofile-instr-generate``.

The implementation of the instrprof pass is ported directly out of the
CodeGenPGO classes in clang, and with the followup in clang that rips
that code out to use these new intrinsics this ends up being NFC.

Doing the instrumentation this way opens some doors in terms of
improving the counter performance. For example, this will make it
simple to experiment with alternate lowering strategies, and allows us
to try handling profiling specially in some optimizations if we want
to.

Finally, this drastically simplifies the frontend and puts all of the
lowering logic in one place.

llvm-svn: 223672
2014-12-08 18:02:35 +00:00
Hans Wennborg 08de833c1c SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.

SimplifyCFG currently does this transformation, but I'm working towards changing
that so we can optimize harder based on unreachable defaults.

Differential Revision: http://reviews.llvm.org/D6510

llvm-svn: 223566
2014-12-06 01:28:50 +00:00
Eric Christopher d1fb7e4590 These two calls were grabbing the same register info. Unify them.
llvm-svn: 223502
2014-12-05 19:23:55 +00:00
Ahmed Bougacha 55e3c2d9cf [CodeGenPrepare] Use variables for reused values. NFC.
llvm-svn: 223491
2014-12-05 18:04:40 +00:00
Hal Finkel 66d7791176 Revert "r223440 - Consider subregs when calling MI::registerDefIsDead for phys deps"
Reverting this because, while it fixes the problem in the reduced test case, it
does not fix the problem in the full test case from the bug report.

llvm-svn: 223442
2014-12-05 02:07:35 +00:00
Hal Finkel d013d99fe0 Consider subregs when calling MI::registerDefIsDead for phys deps
The scheduling dependency graph is built bottom-up within each scheduling
region, and ScheduleDAGInstrs::addPhysRegDeps is called to add output/anti
dependencies, based on physical registers, to the SUs for instructions
based on those that come before them.

In the test case, we start before post-RA scheduling with a block that looks
like this:

...
	INLINEASM <...
andc $0,$0,$2
stdcx. $0,0,$3
bne- 1b
> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>, $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>, $3:[reguse:G8RC], %X3, $4:[mem], %X3, $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>>
	...
	%X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def>
	...
	%R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill>

where it is relevant that %CC is an alias to %CR0, and that %CR0GT is a
subregister of %CR0. However, for post-RA scheduling, no dependency was added
to prevent the INLINEASM from being scheduled in between the ANDIo8 and the
ISEL (which communicate via the %CR0GT register).

In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand, we'd
iterate over all of its aliases (which include %CC itself and also %CR0), and
look for previously-encountered defs of those registers. We'd find the ANDIo8,
but decide not to add a dependency between the INLINEASM and the ANDIo8 because
both the INLINEASM's def of %CC is dead, and also the ANDIo8 def of %CR0 is
dead. This ignores, however, that ANDIo8 has a non-dead def of %CR0GT, a
subregister of %CR0, and thus a dependency still must exist.

To fix this problem, when calling registerDefIsDead on the SU with the def, we
also check all subregisters for possible non-dead defs, and add the dependency
if any are found.

Fixes PR21742.

llvm-svn: 223440
2014-12-05 01:57:22 +00:00
Adrian Prantl ab255fcd09 Cleanup: Calls to getDwarfRegNum() may actually fail, if there is
no DWARF register number mapping, or if the register was a virtual
register that was never materialized. Previously, we would just emit a
bogus location, after this patch we don't emit a location at all by
doing an early exit.

After my bugfix in r223401 today, this doesn't actually happen on any
target that I tested this with, but it's still preferable to make the
possibility of a failure explicit.

llvm-svn: 223428
2014-12-05 01:02:46 +00:00
Adrian Prantl da7e03f1bf Simplify implementation and testcase of r223401 based on feedback from dblaikie.
llvm-svn: 223405
2014-12-04 22:58:41 +00:00
Adrian Prantl a3ae0b3b5b Debug info: If the RegisterCoalescer::reMaterializeTrivialDef() is
eliminating all uses of a vreg, update any DBG_VALUE describing that vreg
to point to the rematerialized register instead.

llvm-svn: 223401
2014-12-04 22:29:04 +00:00
Patrik Hagglund d06de4b954 Use DomTree in MachineSink to sink over diamonds.
According to a previous FIXME comment we now not only look at MBB
successors, but also handle code sinking past them:

  x = computation
  if () {} else {}
  use x

The instruction could be sunk over the whole diamond for the
if/then/else (or loop, etc), allowing it to be sunk into other blocks
after that.

Modified test added in r204522, due to one spill less present.

Minor fixes in comments.

Patch provided by Jonas Paulsson. Reviewed by Hal Finkel.

llvm-svn: 223350
2014-12-04 10:36:42 +00:00
Simon Pilgrim be24ab367b [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Elena Demikhovsky f1de34b84d Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 223348
2014-12-04 09:40:44 +00:00
Matt Arsenault 4e27343eec Allow target to specify prefix for labels
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.

llvm-svn: 223323
2014-12-04 00:06:57 +00:00
Quentin Colombet 079aba733a [RegAllocFast] Handle implicit definitions conservatively.
Prior to this commit, physical registers defined implicitly were considered free
right after their definition, i.e.. like dead definitions. Therefore, their uses
had to immediately follow their definitions, otherwise the related register may
be reused to allocate a virtual register.

This commit fixes this assumption by keeping implicit definitions alive until
they are actually used. The downside is that if the implicit definition was dead
(and not marked at such), we block an otherwise available register. This is
however conservatively correct and makes the fast register allocator much more
robust in particular regarding the scheduling of the instructions.

Fixes PR21700.

llvm-svn: 223317
2014-12-03 23:38:08 +00:00
Peter Collingbourne 51d2de7b9e Prologue support
Patch by Ben Gamari!

This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute.  There are a two primary usecases
that these attributes aim to serve,

  1. Function prologue sigils

  2. Function hot-patching: Enable the user to insert `nop` operations
     at the beginning of the function which can later be safely replaced
     with a call to some instrumentation facility

  3. Runtime metadata: Allow a compiler to insert data for use by the
     runtime during execution. GHC is one example of a compiler that
     needs this functionality for its tables-next-to-code functionality.

Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data was valid executable code.

Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.

The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.

The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.

References
----------

This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html

Test Plan: testsuite

Differential Revision: http://reviews.llvm.org/D6454

llvm-svn: 223189
2014-12-03 02:08:38 +00:00
Hal Finkel bbdee93638 [PowerPC] Implement readcyclecounter for PPC32
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).

This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).

Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.

llvm-svn: 223161
2014-12-02 22:01:00 +00:00
Philip Reames 72fbe7a6f0 Restructure some assertion checking based on post commit feedback by Aaron and Tom.
llvm-svn: 223150
2014-12-02 21:01:48 +00:00
Philip Reames f814a511da Appease a build bot complaining about an unused variable that's used in an assertion.
llvm-svn: 223142
2014-12-02 19:28:57 +00:00
Philip Reames 1a1bdb22bf [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  

With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it.  

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.  

In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principal, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure.  

Reviewed by: atrick, ributzka

llvm-svn: 223137
2014-12-02 18:50:36 +00:00
Ahmed Bougacha 54b7d334c7 [MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction.
Go through implicit defs of CSMI and MI, and clear the kill flags on
their uses in all the instructions between CSMI and MI.
We might have made some of the kill flags redundant, consider:
  subs  ... %NZCV<imp-def>        <- CSMI
  csinc ... %NZCV<imp-use,kill>   <- this kill flag isn't valid anymore
  subs  ... %NZCV<imp-def>        <- MI, to be eliminated
  csinc ... %NZCV<imp-use,kill>
Since we eliminated MI, and reused a register imp-def'd by CSMI
(here %NZCV), that register, if it was killed before MI, should have
that kill flag removed, because it's lifetime was extended.

Also, add an exhaustive testcase for the motivating example.

Reviewed by: Juergen Ributzka <juergen@apple.com>

llvm-svn: 223133
2014-12-02 18:09:51 +00:00
Philip Reames 0365f1a376 [Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend
This is the second patch in a small series.  This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints.  It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code.  I will be submitting the SelectionDAG parts within the next 24-48 hours.  Since those pieces are by far the most complicated, I wanted to minimize the size of that patch.  That patch will include the tests which exercise the functionality in this patch.  The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.

The STATEPOINT psuedo node is generated after all gc values are explicitly spilled to stack slots.  The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes.  The STATEPOINT is modeled as modifing all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT.  (Doing so would break relocation semantics for collectors which wish to relocate roots.)

The implementation of STATEPOINT is closely modeled on PATCHPOINT.  Eventually, much of the code in this patch will be removed.  The long term plan is to merge the functionality provided by statepoints and patchpoints.  Merging their implementations in the backend is likely to be a good starting point.

Reviewed by: atrick, ributzka

llvm-svn: 223085
2014-12-01 22:52:56 +00:00
Ahmed Bougacha fb6eeb74c5 [MachineVerifier] Accept a MBB with a single landing pad successor.
The MachineVerifier used to check that there was always exactly one
unconditional branch to a non-landingpad (normal) successor.
If that normal successor to an invoke BB is unreachable, it seems
reasonable to only have one successor, the landing pad.
On targets other than AArch64 (and on AArch64 with a different testcase),
the branch folder turns the branch to the landing pad into a fallthrough.
The MachineVerifier, which relies on AnalyzeBranch, is unable to check
the condition, and doesn't complain. However, it does in this specific
testcase, where the branch to the landing pad remained.
Make the MachineVerifier accept it.

llvm-svn: 223059
2014-12-01 18:43:53 +00:00
Hans Wennborg 5bef5b522b Revert r223049, r223050 and r223051 while investigating test failures.
I didn't foresee affecting the Clang test suite :/

llvm-svn: 223054
2014-12-01 17:36:43 +00:00
Hans Wennborg 1571336fb2 SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.

llvm-svn: 223049
2014-12-01 17:08:32 +00:00
Akira Hatanaka b9991a2656 [stack protector] Set edge weights for newly created basic blocks.
This commit fixes a bug in stack protector pass where edge weights were not set
when new basic blocks were added to lists of successor basic blocks.

Differential Revision: http://reviews.llvm.org/D5766

llvm-svn: 222987
2014-12-01 04:27:03 +00:00
Hans Wennborg 6dfb041ffc Switch lowering: reformat some for loops etc. NFC
llvm-svn: 222962
2014-11-29 21:24:12 +00:00
Hans Wennborg 6c42d1a5de Switch lowering: Fix broken 'Figure out which block is next' code
This doesn't seem to have worked in a long time, but other optimizations
would clean it up.

llvm-svn: 222961
2014-11-29 21:17:05 +00:00
Simon Pilgrim 2bfd9129f4 Target triple OS detection tidyup. NFC
Use Triple::isOS*() helpers where possible.

llvm-svn: 222960
2014-11-29 19:18:21 +00:00
Duncan P. N. Exon Smith 9bc81fbe92 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
Elena Demikhovsky 6de72e5b0c Converted back to Unix format (after my last commit 222632)
llvm-svn: 222636
2014-11-23 15:21:53 +00:00
Elena Demikhovsky 9e5089a938 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Manman Ren f0a582bada Debug Info: revert r222195, r222210 and r222239.
This is no longer needed after David's fix at r222377 + r222485.
rdar://18958417

llvm-svn: 222563
2014-11-21 19:55:23 +00:00
Manman Ren c98ec0e70a [Objective-C] Support a new special module flag that will be put into the
objc_imageinfo struct.

rdar://17954668

llvm-svn: 222558
2014-11-21 19:24:55 +00:00
Sanjay Patel eb4a4d5aeb Don't repeat class/function/variable names in comments. NFC.
llvm-svn: 222555
2014-11-21 18:58:38 +00:00
Sanjay Patel b06441aded Less space; NFC
llvm-svn: 222546
2014-11-21 18:05:59 +00:00
Andrea Di Biagio 0225b5bf6f [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.

This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.

llvm-svn: 222536
2014-11-21 14:32:06 +00:00
Andrea Di Biagio 26e8f4d166 [DAG] Refactor the shuffle combining logic in DAGCombiner. NFC.
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.

llvm-svn: 222522
2014-11-21 11:33:07 +00:00
Hao Liu 44e5d7a131 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Matthias Braun 745ea43687 RegisterCoalescer: Improve debug messages
- Show "Considering..." message after flipping so you actually see the final
  destination vreg as destination.
- Add a message on final join, so you can grep for "Success" messages to obtain
  a list of which register got merged with which.

llvm-svn: 222382
2014-11-19 19:46:17 +00:00
Matthias Braun d2f4c77800 Add a print and verify pass after the RegisterCoalescer
llvm-svn: 222381
2014-11-19 19:46:15 +00:00
Matthias Braun 47760d9667 MachineVerifier: Report register for bad liveranges
llvm-svn: 222380
2014-11-19 19:46:13 +00:00
Matthias Braun 9f87d75060 Introduce register dump helper
llvm-svn: 222379
2014-11-19 19:46:11 +00:00
Simon Pilgrim 3ac3b251a9 [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.

Differential Revision: http://reviews.llvm.org/D5699

llvm-svn: 222340
2014-11-19 10:06:49 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
David Blaikie 5106ce7897 Remove StringMap::GetOrCreateValue in favor of StringMap::insert
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already has) seems like
it'll make the code easier to understand to anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)

Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.

llvm-svn: 222319
2014-11-19 05:49:42 +00:00
Owen Anderson b5a259935c Fix an incorrect chain operand when expanding INSERT_VECTOR operations through the stack.
Patch by Daniil Troshkov!

llvm-svn: 222254
2014-11-18 20:50:19 +00:00
Frederic Riss fdccfc1e19 Allow DwarfCompileUnit::constructImportedEntityDIE to instanciate a GlobalVariable DIE.
Usually global variables are in a retain list and instanciated before
any call to constructImportedEntityDIE is made. This isn't true for
forward declarations though.
The testcase for this change is generated by a clang patched to emit
such forward declarations (patch at http://reviews.llvm.org/D6173
which will land soon). The updated testcase tests more than just
global variables, it now tests every type of 'using' clause we
support.

llvm-svn: 222217
2014-11-18 02:46:11 +00:00
Manman Ren 554865da5b Debug Info: In DIBuilder, the context field of a global variable is updated to
use DIScopeRef.

A paired commit at clang will follow to show cases where we will use an
identifer for the context of a global variable.

rdar://18958417

llvm-svn: 222195
2014-11-18 00:29:08 +00:00
Oliver Stannard d29db9b949 Fix optimisations of SELECT_CC which assumed result is boolean
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.

llvm-svn: 222123
2014-11-17 10:49:31 +00:00
Craig Topper f98c606479 Add missing semicolon from r222118.
llvm-svn: 222119
2014-11-17 05:58:26 +00:00
Craig Topper cf0444ba2a Move register class name strings to a single array in MCRegisterInfo to reduce static table size and number of relocation entries.
Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table.

llvm-svn: 222118
2014-11-17 05:50:14 +00:00
Craig Topper 6438fc3d05 Replace a couple asserts with static_asserts.
llvm-svn: 222114
2014-11-17 00:26:50 +00:00
Craig Topper 7f416c8acb Convert some EVTs to MVTs where only a SimpleValueType is needed.
llvm-svn: 222109
2014-11-16 21:17:18 +00:00
Andrea Di Biagio e13a0b81f4 [DAG] Improved target independent vector shuffle folding logic.
This patch teaches the DAGCombiner how to combine shuffles according to rules:
   shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)

llvm-svn: 222090
2014-11-15 22:56:25 +00:00
Reid Kleckner c2291f3905 Rename EH related stuff to be more precise
Summary:
The current "WinEH" exception handling type is more about Itanium-style
LSDA tables layered on top of the Windows native unwind info format
instead of .eh_frame tables or EHABI unwind info. Use the name
"ItaniumWinEH" to better reflect the hybrid nature of the design.

Also rename isExceptionHandlingDWARF to usesItaniumLSDAForExceptions,
since the LSDA is part of the Itanium C++ ABI document, and not the
DWARF standard.

Reviewers: echristo

Subscribers: llvm-commits, compnerd

Differential Revision: http://reviews.llvm.org/D6279

llvm-svn: 222062
2014-11-14 23:31:07 +00:00
Reid Kleckner 283bc2ed28 Allow the use of functions as typeinfo in landingpad clauses
This is one step towards supporting SEH filter functions in LLVM.

llvm-svn: 221954
2014-11-14 00:35:50 +00:00
Reid Kleckner 971c3ea67b Use nullptr instead of NULL for variadic sentinels
Windows defines NULL to 0, which when used as an argument to a variadic
function, is not a null pointer constant. As a result, Clang's
-Wsentinel fires on this code. Using '0' would be wrong on most 64-bit
platforms, but both MSVC and Clang make it work on Windows. Sidestep the
issue with nullptr.

llvm-svn: 221940
2014-11-13 22:55:19 +00:00
Aditya Nandakumar 3053155652 We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed.
llvm-svn: 221926
2014-11-13 21:29:21 +00:00
Aditya Nandakumar a27193297f This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively
llvm-svn: 221878
2014-11-13 09:26:31 +00:00
Frederic Riss 0f7abef2cf Add an assert and a test that verify r221709's fix.
llvm-svn: 221854
2014-11-13 03:20:23 +00:00
Quentin Colombet f5485bb008 [CodeGenPrepare] Handle zero extensions in the TypePromotionHelper.
Prior to this patch the TypePromotionHelper was promoting only sign extensions.
Supporting zero extensions changes:
- How constants are extended.
- How sign extensions, zero extensions, and truncate are composed together.
- How the type of the extended operation is recorded. Now we need to know the
  kind of the extension as well as its type.

Each change is fairly small, unlike the diff.
Most of the diff are comments/variable renaming to say "extension" instead of
"sign extension".

The performance improvements on the test suite are within the noise.

Related to <rdar://problem/18310086>.

llvm-svn: 221851
2014-11-13 01:44:51 +00:00
Frederic Riss 3a6b354b3e Fix emission of Dwarf accelerator table when there are multiple CUs.
The DIE offset in the accel tables is an offset relative to the start
of the debug_info section, but we were encoding the offset to the
start of the containing CU.

llvm-svn: 221837
2014-11-12 23:48:14 +00:00
Ahmed Bougacha 026600d967 [CodeGenPrepare] Replace other uses of EVT::getEVT with TL::getValueType.
r221820 fixed a problem (PR21548) where an iPTR was used in TLI legality checks,
which isn't valid and resulted in a failed assertion.
The solution was to lower pointer types into the correct target's VT, by
using TL::getValueType instead of EVT::getEVT.

This commit changes 3 other uses of EVT::getEVT, but without any tests:
- One of these non-lowered EVTs is passed to allowsMisalignedMemoryAccesses,
which goes into target's TL implementation and doesn't cause any problem (yet.)
- Two others are passed to TLI.isOperationLegalOrCustom:
  - one only looks at extensions, so doesn't concern pointers.
  - one only looks at binary operators, so also isn't a problem.

The latter might some day be exposed to pointers and cause the same assert as
the original PR, because there's a comment hinting at also supporting cast ops.

For consistency, update all of them and be done with it.

llvm-svn: 221827
2014-11-12 23:05:03 +00:00
Ahmed Bougacha 0788d49a40 [CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead.
Fixes PR21548.  Related to PR20474.

llvm-svn: 221820
2014-11-12 22:16:55 +00:00
Timur Iskhodzhanov 0e76a16200 Temporary fix for PR21528 - use mangled C++ function names in COFF debug info to un-break ASan on Windows
llvm-svn: 221813
2014-11-12 20:21:20 +00:00
Timur Iskhodzhanov a11b32b7e5 [COFF] Make it clearer that the symbols subsection holds function display name rather than just name
llvm-svn: 221812
2014-11-12 20:10:09 +00:00
Duncan P. N. Exon Smith de36e8040f Revert "IR: MDNode => Value"
Instead, we're going to separate metadata from the Value hierarchy.  See
PR21532.

This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.

llvm-svn: 221711
2014-11-11 21:30:22 +00:00
Tom Roeder 6312f4a422 Fix build break: remove unused variable in FCFI.
llvm-svn: 221710
2014-11-11 21:26:33 +00:00
Frederic Riss 8ad4f498fb Totally forget deallocated SDNodes in SDDbgInfo.
What would happen before that commit is that the SDDbgValues associated with
a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep
a map entry keyed by the SDNode pointer pointing to this list of invalidated
SDDbgNodes. As the memory gets reused, the list might get wrongly associated
with another new SDNode. As the SDDbgValues are cloned when they are transfered,
this can lead to an exponential number of SDDbgValues being produced during
DAGCombine like in http://llvm.org/bugs/show_bug.cgi?id=20893

Note that the previous behavior wasn't really buggy as the invalidation made
sure that the SDDbgValues won't be used. This commit can be considered a
memory optimization and as such is really hard to validate in a unit-test.

llvm-svn: 221709
2014-11-11 21:21:08 +00:00
Tom Roeder eb7a303d1b Add Forward Control-Flow Integrity.
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.

This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.

Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.

Review: http://reviews.llvm.org/D4167
llvm-svn: 221708
2014-11-11 21:08:02 +00:00
Oliver Stannard 8c2c67e63c LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.

llvm-svn: 221693
2014-11-11 17:36:01 +00:00
Saleem Abdulrasool d2c5d7f6da Transforms: address some late comments
We already use the llvm namespace.  Remove the unnecessary prefix.  Use the
StringRef::equals method to compare with C strings rather than instantiating
std::strings.

Addresses late review comments from David Majnemer.

llvm-svn: 221564
2014-11-08 00:00:50 +00:00
Saleem Abdulrasool 5898e09057 Transform: add SymbolRewriter pass
This introduces the symbol rewriter. This is an IR->IR transformation that is
implemented as a CodeGenPrepare pass. This allows for the transparent
adjustment of the symbols during compilation.

It provides a clean, simple, elegant solution for symbol inter-positioning. This
technique is often used, such as in the various sanitizers and performance
analysis.

The control of this is via a custom YAML syntax map file that indicates source
to destination mapping, so as to avoid having the compiler to know the exact
details of the source to destination transformations.

llvm-svn: 221548
2014-11-07 21:32:08 +00:00
Lang Hames cdd9077f3a [RegAlloc] Kill off the trivial spiller - nobody is using it any more.
llvm-svn: 221474
2014-11-06 19:12:38 +00:00
Rafael Espindola 26cfbea738 Compute the correct jump table entries on 32 bit windows.
On 32 bit windows we use label differences and .set does not suppress
rolocations, a combination that was not used before r220256.

This fixes PR21497.

llvm-svn: 221456
2014-11-06 14:39:49 +00:00
Rafael Espindola 83f0ea8982 Add three other sections when L symbols are allowed.
llvm-svn: 221436
2014-11-06 05:01:21 +00:00
Rafael Espindola bf77ed6826 Allow L symbols in no_dead_strip sections.
If a section cannot be dead stripped, it is safe to use L symbols, since
the linker will keep all of it in the end.

llvm-svn: 221431
2014-11-06 02:42:03 +00:00
Duncan P. N. Exon Smith c5754a65e6 IR: MDNode => Value: NamedMDNode::getOperator()
Change `NamedMDNode::getOperator()` from returning `MDNode *` to
returning `Value *`.  To reduce boilerplate at some call sites, add a
`getOperatorAsMDNode()` for named metadata that's expected to only
return `MDNode` -- for now, that's everything, but debug node named
metadata (such as llvm.dbg.cu and llvm.dbg.sp) will soon change.  This
is part of PR21433.

Note that there's a follow-up patch to clang for the API change.

llvm-svn: 221375
2014-11-05 18:16:03 +00:00
Andrea Di Biagio ce46b97b48 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Craig Topper 12f0d9ef2c Improve logic that decides if its profitable to commute when some of the virtual registers involved have uses/defs chains connecting them to physical register. Fix up the tests that this change improves.
llvm-svn: 221336
2014-11-05 06:43:02 +00:00
David Blaikie 3a443c29b9 Provide gmlt-like inline scope information in the skeleton CU to facilitate symbolication without needing the .dwo files
Clang -gsplit-dwarf self-host -O0, binary increases by 0.0005%, -O2,
binary increases by 25%.

A large binary inside Google, split-dwarf, -O0, and other internal flags
(GDB index, etc) increases by 1.8%, optimized build is 35%.

The size impact may be somewhat greater in .o files (I haven't measured
that much - since the linked executable -O0 numbers seemed low enough)
due to relocations. These relocations could be removed if we taught the
llvm-symbolizer to handle indexed addressing in the .o file (GDB can't
cope with this just yet, but GDB won't be reading this info anyway).
Also debug_ranges could be shared between .o and .dwo, though ideally
debug_ranges would get a schema that could used index(+offset)
addressing, and move to the .dwo file, then we'd be back to sharing
addresses in the address pool again.

But for now, these sizes seem small enough to go ahead with this.

Verified that no other DW_TAGs are produced into the .o file other than
subprograms and inlined_subroutines.

llvm-svn: 221306
2014-11-04 22:12:25 +00:00
David Blaikie 9bfd7a9f43 Move cross-unit DIE caching to the DwarfFile level, so it doesn't interfere with fission-gmlt data and produce skeleton<>full unit cross referencing.
llvm-svn: 221305
2014-11-04 22:12:18 +00:00
Arnaud A. de Grandmaison a11cab3120 [PBQP] Callee saved regs should have a higher cost than scratch regs
Registers are not all equal. Some are not allocatable (infinite cost),
some have to be preserved but can be used, and some others are just free
to use.

Ensure there is a cost hierarchy reflecting this fact, so that the
allocator will favor scratch registers over callee-saved registers.

llvm-svn: 221293
2014-11-04 20:51:29 +00:00
Arnaud A. de Grandmaison 829dd81377 [PBQP] Tweak spill costs and coalescing benefits
This patch improves how the different costs (register, interference, spill
and coalescing) relates together. The assumption is now that:
 - coalescing (or any other "side effect" of reg alloc) is negative, and
   instead of being derived from a spill cost, they use the block
   frequency info.
 - spill costs are in the [MinSpillCost:+inf( range
 - register or interference costs are in [0.0:MinSpillCost( or +inf

The current MinSpillCost is set to 10.0, which is a random value high
enough that the current constraint builders do not need to worry about
when settings costs. It would however be worth adding a normalization
step for register and interference costs as the last step in the
constraint builder chain to ensure they are not greater than SpillMinCost
(unless this has some sense for some architectures). This would work well
with the current builder pipeline, where all costs are tweaked relatively
to each others, but could grow above MinSpillCost if the pipeline is
deep enough.

The current heuristic is tuned to depend rather on the number of uses of
a live interval rather than a density of uses, as used by the greedy
allocator. This heuristic provides a few percent improvement on a number
of benchmarks (eembc, spec, ...) and will definitely need to change once
spill placement is implemented: the current spill placement is really
ineficient, so making the cost proportionnal to the number of use is a
clear win.

llvm-svn: 221292
2014-11-04 20:51:24 +00:00
David Majnemer b925715c56 CodeGen: Enable DWARF emission for MS ABI targets
This is experimental, just barely enough to get things to not
immediately combust.

A note for those who are curious:
Only lld can successfully link the object files, other linkers truncate
the section names making the debug sections illegible to debuggers.

Even with this in mind, we believe we are having trouble with SECREL
relocations.

llvm-svn: 221245
2014-11-04 08:03:31 +00:00
Sanjoy Das e839965faa The patchpoint lowering logic would crash with live constants equal to
the tombstone or empty keys of a DenseMap<int64_t, T>.  This patch
fixes the issue (and adds a tests case).

llvm-svn: 221214
2014-11-04 00:59:21 +00:00
Sanjoy Das 429c9cae09 Change logic in StackMaps::recordStackMapOpers to use the isInt<32>
predicate instead of bitwise operations.

This is not a functional change.

llvm-svn: 221209
2014-11-04 00:06:57 +00:00
David Blaikie 5b02a19f90 Use common range handling for the CU's ranges
This generalizes the range handling for ranges in both the skeleton and
full unit, laying the foundation for the addition of more ranges (rather
than just the CU's special case) in the skeleton CU with fission+gmlt.

llvm-svn: 221202
2014-11-03 23:10:59 +00:00
David Blaikie 542616d47c Push the CURangeList down into the skeleton CU (where available) rather than the full CU
So that it may be shared between skeleton/full compile unit, for CU
ranges and other ranges to be added for fission+gmlt.

(at some point we might want some kind of object shared between the
skeleton and full compile units for all those things we only want one of
in that scope, rather than having the full unit always look through to
the skeleton... - alternatively, we might be able to have the skeleton
pointer (or another, separate pointer) point to the skeleton or to the
unit itself in non-fission, so we don't have to special case its
absence)

llvm-svn: 221186
2014-11-03 21:52:56 +00:00
David Blaikie ce343492ee Add DwarfCompileUnit::BaseAddress to track the base address used by relative addressing in debug_ranges and debug_loc
This is one of a few steps to generalize range handling to include the
CU range (thus the CU's range list will be moved into the range list
list, losing track of the base address in the process), which means
generalizing ranges from both the skeleton and full unit under fission.

And... then I can used that generalized support for ranges in
fission+gmlt where there'll be a bunch more ranges in the skeleton.

llvm-svn: 221182
2014-11-03 21:15:30 +00:00
Paul Robinson ad06e430ce Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.

Commit includes the test that found this, plus another one that got
missed in the original optnone work.

llvm-svn: 221168
2014-11-03 18:19:26 +00:00
David Blaikie 077ad48447 Cleanup some unused or trivial functions in DwarfCompileUnit
llvm-svn: 221164
2014-11-03 17:10:38 +00:00
David Blaikie bc532b44a0 Sink DwarfUnit::CURanges into DwarfCompileUnit
llvm-svn: 221161
2014-11-03 16:40:43 +00:00
Oliver Stannard cf6bfb1dd0 Revert r221150, as it broke sanitizer tests
llvm-svn: 221151
2014-11-03 12:19:03 +00:00
Oliver Stannard 652ec6ee89 Emit .eh_frame with relocations to functions, rather than sections
When LLVM emits DWARF call frame information, it currently creates a local,
section-relative symbol in the code section, which is pointed to by a
relocation on the .eh_frame section. However, for C++ we emit some functions in
section groups, and the SysV ABI has some rules to make it easier to remove
these sections
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_group_rules):

  A symbol table entry with STB_LOCAL binding that is defined relative to one
  of a group's sections, and that is contained in a symbol table section that is
  not part of the group, must be discarded if the group members are discarded.
  References to this symbol table entry from outside the group are not allowed.

This means that we need to use the function symbol for the relocation, not a
temporary symbol.

There was a comment in the code claiming that the local symbol was used to
avoid creating a relocation, but a relocation must be created anyway as the
code and CFI are in different sections.

llvm-svn: 221150
2014-11-03 12:02:51 +00:00
David Blaikie 89a26f012a Sink range list handling down from DwarfUnit into its only use, in DwarfCompileUnit.
llvm-svn: 221123
2014-11-03 02:41:49 +00:00
David Blaikie 4aa49b2054 Formatting
llvm-svn: 221095
2014-11-02 08:52:37 +00:00
David Blaikie cafd962d97 Add DwarfUnit::isDwoUnit and use it to generalize string creation
Currently we only need to emit skeleton strings into the CU header and
we do this by explicitly calling "addLocalString". With gmlt-in-fission,
we'll be emitting a bunch of other strings from other codepaths where
it's not statically known that these strings will be local or not.

Introduce a virtual function to indicate whether this unit is a DWO unit
or not (I'm not sure if we have a good term for this, the
opposite/alternative to 'skeleton' unit) and use that to generalize the
string emission logic so that strings can be correctly emitted in both
the skeleton and dwo unit when in split dwarf mode.

And to demonstrate that this works, switch the existing special callers
of addLocalString in the skeleton builder to addString - and they still
work. Yay.

llvm-svn: 221094
2014-11-02 08:51:37 +00:00
David Blaikie 279c451c0b Remove the last mention of LineTablesOnly from DwarfUnit, sinking it into DwarfCompileUnit
This is a useful distinction/invariant/delination to make because
LineTablesOnly mode is never relevant to type units, so it's clear that
we're not doing weird line-tables-only-with-types by making this API
choice.

It also lays the foundations nicely for adding gmlt-like data to fission
skeleton CUs while limiting the effects to CUs and not TUs.

llvm-svn: 221093
2014-11-02 08:18:06 +00:00
David Blaikie 3363a57c8e Sink DwarfUnit::applySubprogramAttributesToDefinition into DwarfCompileUnit
llvm-svn: 221092
2014-11-02 08:09:09 +00:00
David Blaikie 978020807a Sink DwarfUnit::addExpr into DwarfCompileUnit
llvm-svn: 221090
2014-11-02 07:11:55 +00:00
David Blaikie 8c485b5d74 Fix the build from the last commit
llvm-svn: 221089
2014-11-02 07:08:12 +00:00
David Blaikie 02a6333ba7 Sink DwarfUnit::applyVariableAttributes into DwarfCompileUnit
llvm-svn: 221088
2014-11-02 07:06:51 +00:00
David Blaikie 4bc0881ac7 Sink DwarfUnit::addLocationList down into DwarfCompileUnit
llvm-svn: 221087
2014-11-02 07:03:19 +00:00
David Blaikie 77895fb276 Sink DwarfUnit::addComplexAddress down into DwarfCompileUnit
llvm-svn: 221086
2014-11-02 06:58:44 +00:00
David Blaikie f7435ee6ce Push DwarfUnit::addAddress down into DwarfCompileUnit
llvm-svn: 221085
2014-11-02 06:46:40 +00:00
David Blaikie 7d48be2b7b Sink DwarfUnit::addVariableAddress into DwarfCompileUnit since type units don't have variables
llvm-svn: 221084
2014-11-02 06:37:23 +00:00
David Blaikie 192b45c1ef DebugInfo: Sink accelerator table lists down (GlobalNames/Types) into DwarfCompileUnit
llvm-svn: 221083
2014-11-02 06:16:39 +00:00
David Blaikie 98cf172175 Add DwarfUnit::addGlobalType to match DwarfUnit::addGlobalName
(these will shortly become virtual, with a null implementation in
DwarfUnit (since type units don't have accelerator tables in the current
schema) and the current implementation down in DwarfCompileUnit, moving
the actual maps there too)

llvm-svn: 221082
2014-11-02 06:06:14 +00:00
David Blaikie 871c2d9d63 DebugInfo: Refactor index type DIE initialization by rolling it into the accessor
llvm-svn: 221080
2014-11-02 03:09:13 +00:00
David Blaikie ce47366150 Be sure to initialize DwarfCompileUnit::LabelBegin now that it may be skipped in initSection
llvm-svn: 221079
2014-11-02 02:40:26 +00:00
David Blaikie b6726a9ece Don't bother creating LabelBegin for .dwo units
This would help catch cases where we might otherwise try to reference a
dwo CU label, which would be weird - because without relocations in the
dwo file it's not generally meaningful to talk about the CU offsets
there (or, if it is, we can do so in absolute terms without using a
relocation to compute it).

llvm-svn: 221078
2014-11-02 02:26:24 +00:00
David Blaikie 27e35f2302 Drop DwarfCompileUnit::getLocalLabel* in favor of just mapping through the skeleton explicitly.
Confusing to do this two different ways - I'm not too wedded to either
one, but here goes.

llvm-svn: 221076
2014-11-02 01:21:43 +00:00
David Blaikie f4bdc31271 Sink DwarfUnit::LabelBegin down into DwarfCompileUnit since that's the only place it's needed.
llvm-svn: 221075
2014-11-02 01:21:40 +00:00
David Blaikie ae57e66e36 Sink dwarf unit length emission down into DwarfUnit::emitHeader
This allows the CU label to be emitted only for compile units, as
they're the only ones that need it (so they can be referenced from
pubnames)

llvm-svn: 221072
2014-11-01 23:59:23 +00:00
David Blaikie 983bfea0d0 Remove DwarfUnit::LabelEnd in favor of computing the length of the section directly
This was a compile-unit specific label (unused in type units) and seems
unnecessary anyway when we can more easily directly compute the size of
the compile unit.

llvm-svn: 221067
2014-11-01 23:07:14 +00:00
David Blaikie a34568b220 Sink DwarfUnit::SectionSym into DwarfCompileUnit as it's only needed/used there.
llvm-svn: 221062
2014-11-01 20:06:28 +00:00
David Blaikie a5437b6bf6 Make DwarfCompileUnit::Skeleton more narrowly typed (DwarfCompileUnit* instead of DwarfUnit*) now that it's specific to DwarfCompileUnit anyway.
llvm-svn: 221060
2014-11-01 19:26:05 +00:00
David Blaikie 7cbf58af15 Sink DwarfUnit::Skeleton down into DwarfCompileUnit
Type units no longer have skeletons and it's misleading to be able to
query for a type unit's skeleton (it might incorrectly lead one to
conclude that if a unit doesn't have a skeleton it's not in a .dwo
file... ).

llvm-svn: 221055
2014-11-01 18:18:07 +00:00
David Blaikie f6dac29a83 Sink DwarfDebug::AbstractSPDies down into DwarfFile
This is the first big step to allowing gmlt-like inline scope
information in the skeleton CU. While this commit doesn't change the
functionality, it's only a small step to call
"constructAbstractSubprogramDIE" on both the InfoHolder and the
SkeletonHolder (when in use) and that will at least create the abstract
SP dies in that case, though still not creating the other subprograms.

llvm-svn: 221051
2014-11-01 17:21:26 +00:00
David Blaikie 2f08093dda Remove unused function
llvm-svn: 221037
2014-11-01 01:15:26 +00:00
David Blaikie a8be0a794f And... fix the build some more.
llvm-svn: 221036
2014-11-01 01:15:24 +00:00
David Blaikie 1998a2e789 Just iterate the DwarfCompileUnits rather than trying to filter them out of the list of all units.
llvm-svn: 221034
2014-11-01 01:11:19 +00:00
David Blaikie bbad0bbb2f Add '*' to auto variable that is a pointer, as per the coding conventions.
llvm-svn: 221033
2014-11-01 01:03:39 +00:00
David Blaikie a33cd6aa27 Add DwarfCompileUnit::getSkeleton that returns DwarfCompileUnit* to avoid having to cast from DwarfUnit* on every call.
llvm-svn: 221031
2014-11-01 00:50:34 +00:00
Duncan P. N. Exon Smith 3872d0084c IR: MDNode => Value: Instruction::getMetadata()
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.

Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.

llvm-svn: 221024
2014-11-01 00:10:31 +00:00
David Blaikie 1d96cc2ff3 Sink some of DwarfDebug::collectDeadVariables down into DwarfCompileUnit.
llvm-svn: 221010
2014-10-31 22:30:30 +00:00
David Blaikie 49be5b357b Sink most of DwarfDebug::constructAbstractSubprogramScopeDIE into DwarfCompileUnit
llvm-svn: 221005
2014-10-31 21:57:02 +00:00
Quentin Colombet c32615dfef [CodeGenPrepare] Move extractelement close to store if they can be combined.
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.


** Context **

Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.


** Motivating Example **

Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}

As it is, this IR generates the following assembly on armv7:
  vldr  d16, [r0]            @vector load  
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1           @ scalar bitwise or
  str r0, [r1]               @ scalar store
  bx  lr

Whereas we could generate much faster code:
  vldr  d16, [r0]               @ vector load
  vorr.i32  d16, #0x1     @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32] @ vector extract + store
  bx  lr

Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.


** Proposed Solution **

To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.

Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.

For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with  other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.

Differential Revision: http://reviews.llvm.org/D5921

<rdar://problem/14170854>

llvm-svn: 220978
2014-10-31 17:52:53 +00:00
David Blaikie fb2efde341 Correct assert text from r220923
Noticed in post-commit review by Adrian Prantl.

llvm-svn: 220967
2014-10-31 16:45:36 +00:00
Hao Liu e02b1a068f PR20557: Fix the bug that bogus cpu parameter crashes llc on AArch64 backend.
Initial patch by Oleg Ranevskyy.

llvm-svn: 220945
2014-10-31 02:35:34 +00:00
Ahmed Bougacha 9f336c4ec5 [SelectionDAG] When scalarizing trunc, don't assert for legal operands.
r212242 introduced a legalizer hook, originally to let AArch64 widen
v1i{32,16,8} rather than scalarize, because the legalizer expected, when
scalarizing the result of a conversion operation, to already have
scalarized the operands.  On AArch64, v1i64 is legal, so that commit
ensured operations such as v1i32 = trunc v1i64 wouldn't assert.

It did that by choosing to widen v1 types whenever possible.  However,
v1i1 types, for which there's no legal widened type, would still trigger
the assert.

This commit fixes that, by only scalarizing a trunc's result when the
operand has already been scalarized, and introducing an extract_elt
otherwise.  
This is similar to r205625.

Fixes PR20777.

llvm-svn: 220937
2014-10-30 23:46:50 +00:00
Louis Gerbarg e8f9c78247 Fix incorrect invariant check in DAG Combine
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.

Tests included.

rdar://18816719

llvm-svn: 220933
2014-10-30 22:21:03 +00:00
David Blaikie 76fd43c653 PR21408: Workaround the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site.
llvm-svn: 220923
2014-10-30 20:20:11 +00:00
NAKAMURA Takumi f51a34ec1f Whitespace.
llvm-svn: 220857
2014-10-29 15:23:11 +00:00
David Blaikie ff468a5e0b Minimize the scope of some variables, NFC.
llvm-svn: 220759
2014-10-28 02:57:26 +00:00
Lang Hames 5fe30ca56f [PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these
sets as keys into a cache of interference matrice values in the Interference
constraint adder.

Creating interference matrices was one of the large remaining time-sinks in
PBQP. Caching them reduces the total compile time (when using PBQP) on the
nightly test suite by ~10%.

llvm-svn: 220688
2014-10-27 17:44:25 +00:00
David Blaikie df1cca3a24 Remove some unnecessary casts.
llvm-svn: 220658
2014-10-26 23:37:04 +00:00
Frederic Riss 987fe22789 Sink DwarfUnit::constructImportedEntityDIE into DwarfCompileUnit.
So that it has access to getOrCreateGlobalVariableDIE. If we ever support
decsribing using directive in C++ classes (thus requiring support in type
units), it will certainly use another mechanism anyway.

Differential Revision: http://reviews.llvm.org/D5975

llvm-svn: 220594
2014-10-24 21:31:09 +00:00
Matt Arsenault 4b7bd2d52f Fix copy paste comment
llvm-svn: 220581
2014-10-24 18:13:10 +00:00
David Blaikie 80e5b1ebd1 DebugInfo: Sink DwarfDebug::ScopeVariables down into DwarfFile
(part of refactoring to allow subprogram emission in both the skeleton
and main units to enable -gmlt-like data to be included in the skeleton
for live inlined backtracing purposes)

llvm-svn: 220578
2014-10-24 17:57:34 +00:00
David Blaikie 0dc623c232 Remove DwarfDebug::FirstCU as it has no use
It was only being used as a flag to identify the lack of debug info from
within endModule - use the section labels for that instead.

llvm-svn: 220575
2014-10-24 17:53:38 +00:00
Sanjay Patel 957efc23bb Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Marcello Maggioni 2259400f8e Added reset of LexicalScope in LiveDebugVariables reset function.
llvm-svn: 220545
2014-10-24 02:46:50 +00:00
Timur Iskhodzhanov 2bc90fdbdc Fix PR21189 -- Emit symbol subsection required to debug LLVM-built binaries with VS2012+
Reviewed at http://reviews.llvm.org/D5772

llvm-svn: 220544
2014-10-24 01:27:45 +00:00
David Blaikie f4c504e03c DebugInfo: Remove DwarfDebug::addScopeVariable now that it's just a trivial wrapper
llvm-svn: 220542
2014-10-24 00:43:47 +00:00
Ahmed Bougacha 7daf3b89f9 [SelectionDAG] Teach the vector scalarizer about FP conversions.
This adds support for legalization of instructions of the form:

  [fp_conv] <1 x i1> %op to <1 x double>

where fp_conv is one of fpto[us]i, [us]itofp.  This used to assert
because they were simply missing from the vector operand scalarizer.

A similar problem arose in r190830, with trunc instead.

Fixes PR20778.

Differential Revision: http://reviews.llvm.org/D5810

llvm-svn: 220533
2014-10-23 22:49:25 +00:00
Ahmed Bougacha e05afd4d1e Update comment and fix typos in assert message. (NFC)
llvm-svn: 220531
2014-10-23 22:40:34 +00:00
Tim Northover e4c7be56bf ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS
dependency because it was represented by a pair of CopyFromReg(EFLAGS) ->
CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an
implicit-def on the instruction, where the result numbers in the DAG and the
Uses list in TableGen matched up precisely.

The Copy notation seems much more robust, so this patch extends ScheduleDAG
rather than refactoring x86.

Should fix PR20376.

llvm-svn: 220529
2014-10-23 22:31:48 +00:00
David Blaikie 1dd573db45 DebugInfo: Remove DwarfDebug::CurrentFnArguments since we have to handle argument ordering of other arguments (abstract arguments) in the same way and already have code for that too.
While refactoring this code I was confused by both the name I had
introduced (addNonArgumentVariable... but it has all this logic to
handle argument numbering and keep things in order?) and by the
redundancy. Seems when I fixed the misordered inlined argument handling,
I didn't realize it was mostly redundant with the argument ordering code
(which I may've also written, I'm not sure). So let's just rely on the
more general case.

The only oddity in output this produces is that it means when we emit
all the variables for the current function, we don't track when we've
finished the argument variables and are about to start the local
variables and insert DW_AT_unspecified_parameters (for varargs
functions) there. Instead it ends up after the local variables, scopes,
etc. But this isn't invalid and doesn't cause DWARF consumers problems
that I know of... so we'll just go with that because it makes the code
nice & simple.

(though, let's see what the buildbots have to say about this - *crosses
fingers*)

There will be some cleanup commits to follow to remove the now trivial
wrappers, etc.

llvm-svn: 220527
2014-10-23 22:27:50 +00:00
David Blaikie 3b5c84008d DebugInfo: Sink DwarfDebug::addNonArgumentScopeVariable into DwarfFile.
llvm-svn: 220520
2014-10-23 22:04:30 +00:00
David Blaikie ff4181adec DebugInfo: Remove DwarfDebug::addCurrentFnArgument declaration now that it's moved to DwarfFile.
llvm-svn: 220515
2014-10-23 21:53:17 +00:00
David Blaikie 49cfc8ca8e DebugInfo: Simplify/tidy/correct global variable decl/def emission handling.
This fixes a bug (introduced by fixing the IR emitted from Clang where
the definition of a static member would be scoped within the class,
rather than within its lexical decl context) where the definition of a
static variable would be placed inside a class.

It also improves source fidelity by scoping static class member
definitions inside the lexical decl context in which tehy are written
(eg: namespace n { class foo { static int i; } int foo::i; } - the
definition of 'i' will be within the namespace 'n' in the DWARF output
now).

Lastly, and the original goal, this reduces debug info size slightly
(and makes debug info easier to read, etc) by placing the definitions of
non-member global variables within their namespace, rather than using a
separate namespace-scoped declaration along with a definition at global
scope.

Based on patches and discussion with Frédéric.

llvm-svn: 220497
2014-10-23 19:12:43 +00:00
David Blaikie bcfa8a56b1 Remove explicit (void) use of DwarfFile::DD that was accidentally left in r220452.
Caught in post-commit review by Frédéric.

llvm-svn: 220487
2014-10-23 16:12:58 +00:00
David Blaikie f299947bfa [DebugInfo] Sink DwarfDebug::addCurrentFnArgument down into DwarfFile.
Variable handling will be sunk into DwarfFile so that abstract variables
and the like can be shared across multiple CUs (to handle cross-CU
inlining, for example).

llvm-svn: 220453
2014-10-23 00:16:05 +00:00
David Blaikie 2b22b1e9a2 [DebugInfo] Add DwarfDebug& to DwarfFile.
Use the DwarfDebug in one function that previously took it as a
parameter, and lay the foundation for use this for other operations
coming soon.

llvm-svn: 220452
2014-10-23 00:16:03 +00:00
David Blaikie 263a008525 [DebugInfo] Remove LexicalScopes::isCurrentFunctionScope and CSE a use of LexicalScopes::getCurrentFunctionScope
Now that we're sure the only root (non-abstract) scope is the current
function scope, there's no need for isCurrentFunctionScope, the property
can be tested directly instead.

llvm-svn: 220451
2014-10-23 00:06:27 +00:00
Benjamin Kramer 7ad22403fb Strength reduce constant-sized vectors into arrays. No functionality change.
llvm-svn: 220412
2014-10-22 19:55:26 +00:00
Matt Arsenault 8226ca29e2 Fix typo
llvm-svn: 220353
2014-10-22 00:28:59 +00:00
Matt Arsenault 7c93690be0 Add minnum / maxnum codegen
llvm-svn: 220342
2014-10-21 23:01:01 +00:00
Arnaud A. de Grandmaison 5c7fe7e97b Pacify bots and simplify r220321
llvm-svn: 220335
2014-10-21 21:50:49 +00:00
Arnaud A. de Grandmaison a61262f989 [PBQP] Teach PassConfig to tell if the default register allocator is used.
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.

llvm-svn: 220321
2014-10-21 20:47:22 +00:00
Arnaud A. de Grandmaison d3648d0cbe [PBQP] Fix coalescing benefits
As coalescing registers is a benefit, the cost should be improved (i.e. made smaller) when coalescing is possible.

llvm-svn: 220302
2014-10-21 16:24:15 +00:00
Rafael Espindola c606bfe660 Fix a bit of confusion about .set and produce more readable assembly.
Every target we support has support for assembly that looks like

a = b - c
.long a

What is special about MachO is that the above combination suppresses the
production of a relocation.

With this change we avoid producing the intermediary labels when they don't
add any value.

llvm-svn: 220256
2014-10-21 01:17:30 +00:00
Rafael Espindola 74dd8547db Make AsmPrinter::EmitLabelOffsetDifference a static helper and simplify.
It had exactly one caller in a position where we know hasSetDirective is true.

llvm-svn: 220250
2014-10-21 00:25:49 +00:00
Philip Reames 5a3f5f751b Introduce enum values for previously defined metadata types. (NFC)
Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well.  Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison.  This change adds enum value for three existing metadata types:
+    MD_nontemporal = 9, // "nontemporal"
+    MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access"
+    MD_nonnull = 11 // "nonnull"

I went through an updated various uses as well.  I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate.  For example, there were several items in LoopInfo.cpp I chose not to update.

llvm-svn: 220248
2014-10-21 00:13:20 +00:00
Lang Hames ad0962aec5 [PBQP] Replace the interference-constraints algorithm with a faster version
loosely based on linear scan.

On x86-64 this is good for a ~2% drop in compile time on the nightly test suite.

llvm-svn: 220143
2014-10-18 17:26:07 +00:00
Pete Cooper 230332f4fe Check for dynamic alloca's when selecting lifetime intrinsics.
TL;DR: Indexing maps with [] creates missing entries.

The long version:

When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime.  Trouble is, we don't first check to see if this is a dynamic alloca.

On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime.  0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots.

PEI would later trigger an assert because it expects the stack protector to not be dead.

This fix ensures that we only get frame indices for static allocas, ie, those in the map.  Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken.

rdar://problem/18672951

llvm-svn: 220099
2014-10-17 22:59:33 +00:00
Juergen Ributzka ad2363f9ee [Stackmaps] Enable invoking the patchpoint intrinsic.
Patch by Kevin Modzelewski
Reviewers: atrick, ributzka
Reviewed By: ributzka
Subscribers: llvm-commits, reames

Differential Revision: http://reviews.llvm.org/D5634

llvm-svn: 220055
2014-10-17 17:39:00 +00:00
Jan Vesely af62cf4db0 SelectionDAG: Add sext_inreg optimizations
v2: use dyn_cast
    fixup comments
v3: use cast

Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
2014-10-17 14:45:25 +00:00
Juergen Ributzka fd4633e1a5 Reduce code duplication between patchpoint and non-patchpoint lowering. NFC.
This is in preparation for another patch that makes patchpoints invokable.

Reviewers: atrick, ributzka
Reviewed By: ributzka
Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5657

llvm-svn: 219967
2014-10-16 21:26:35 +00:00
Robin Morisset e2de06bef6 Erase fence insertion from SelectionDAGBuilder.cpp (NFC)
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
montonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
  exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it
  does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
  It happens to mostly work for the other targets because they are extremely
  conservative, but Power for example had to switch to AtomicExpand to be
  able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound ! This is noted
  in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the litterature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
  x.store(1);
Thread 1:
  y.store(1);
Thread 2:
  r1 = x.load();
  r2 = y.load();
Thread 3:
  r3 = y.load();
  r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it..

This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erase this logic from SelectionDAGBuilder as it is dead-code.

Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more agressively on these
platforms, the long comment should make it clear why he should first override
emitLeading/TrailingFence.

Test Plan: make check-all, no functional change

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5474

llvm-svn: 219957
2014-10-16 20:34:57 +00:00
Eric Christopher 2181fb2ff3 Avoid caching the MachineFunction, we don't use it outside of
runOnMachineFunction.

llvm-svn: 219847
2014-10-15 21:06:25 +00:00
Rafael Espindola 7b61ddfa6e Simplify handling of --noexecstack by using getNonexecutableStackSection.
llvm-svn: 219799
2014-10-15 16:12:52 +00:00
Jingyue Wu 2954280f6a [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
The old heuristics caused instructions to sink unnecessarily, and might create
register pressure.

This is the second try of the fix. The first one (D4814) caused a performance
regression due to failing to sink instructions out of loops (PR21115). This
patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
one regardless of whether the target block post-dominates the source.

Thanks Alexey Volkov for reporting PR21115! 

Test Plan:
Added a NVPTX codegen test to verify that our change prevents the backend from
over-sinking. It also shows the unnecessary register pressure caused by
over-sinking.

Added an X86 test to verify we can sink instructions out of loops regardless of
the dominance relationship. This test is reduced from Alexey's test in PR21115.

Updated an affected test in X86.

Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
performance. Results are attached separately in the review thread.

Reviewers: Jiangning, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5633

llvm-svn: 219773
2014-10-15 03:27:43 +00:00
Gerolf Hoflehner a4c96d02a2 [AAarch64] Optimize CSINC-branch sequence
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.

Examples:

1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
   to b.<invCC>

2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
   to b.<CC>


rdar://problem/18506500

llvm-svn: 219742
2014-10-14 23:07:53 +00:00
Rafael Espindola 76936ebc49 Remove unused member variable.
Fixes pr20904.

llvm-svn: 219706
2014-10-14 18:53:16 +00:00
David Blaikie 3dfe4788ae DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself.
Let me tell you a tale...

Originally committed in r211723 after discovering a nasty case of weird
scoping due to inlining, this was reverted in r211724 after it fired in
ASan/compiler-rt.

(minor diversion where I accidentally committed/reverted again in
r211871/r211873)

After further testing and fixing bugs in ArgumentPromotion (r211872) and
Inlining (r212065) it was recommitted in r212085. Reverted in r212089
after the sanitizer buildbots still showed problems.

Fixed another bug in ArgumentPromotion (r212128) found by this
assertion.

Recommitted in r212205, reverted in r212226 after it crashed some more
on sanitizer buildbots.

Fix clang some more in r212761.

Recommitted in r212776, reverted in r212793. ASan failures.
Recommitted in r213391, reverted in r213432, trying to reproduce flakey
ASan build failure.

Fixed bugs in r213805 (ArgPromo + DebugInfo), r213952
(LiveDebugVariables strips dbg_value intrinsics in functions not
described by debug info).

Recommitted in r214761, reverted in r214999, flakey failure on Windows
buildbot.

Fixed DeadArgElimination + DebugInfo bug in r219210.

Recommitted in r219215, reverted in r219512, failure on ObjC++ atomic
properties in the test-suite on Darwin.

Fixed ObjC++ atomic properties issue in Clang in r219690.

[This commit is provided 'as is' with no hope that this is the last time
I commit this change either expressed or implied]

llvm-svn: 219702
2014-10-14 18:22:52 +00:00
David Blaikie 24026502d5 Revert "Fix stuff... again."
Accidental commit.

This reverts commit r219693.

llvm-svn: 219695
2014-10-14 17:13:09 +00:00
David Blaikie e75f963c61 Revert some parts of r196288 that were confusing and untested.
If we figure out why they should be here, let's add some testing of some
kind so we can better demonstrate why it's needed.

llvm-svn: 219694
2014-10-14 17:12:02 +00:00
David Blaikie 27549023b0 Fix stuff... again.
llvm-svn: 219693
2014-10-14 17:11:59 +00:00
Eric Christopher 307c2cb26f Remove unnecessary TargetMachine.h includes.
llvm-svn: 219672
2014-10-14 07:22:08 +00:00
Eric Christopher 6062180203 Grab the subtarget and subtarget dependent variables off of
MachineFunction rather than TargetMachine.

llvm-svn: 219671
2014-10-14 07:22:00 +00:00
Eric Christopher b66367a891 Grab the subtarget and subtarget dependent variables off of
MachineFunction rather than TargetMachine.

llvm-svn: 219670
2014-10-14 07:17:23 +00:00
Eric Christopher 92b4bcbbee Instead of the TargetMachine cache the MachineFunction
and TargetRegisterInfo in the peephole optimizer. This
makes it easier to grab subtarget dependent variables off
of the MachineFunction rather than the TargetMachine.

llvm-svn: 219669
2014-10-14 07:17:20 +00:00
Eric Christopher eb9e87f6e3 Access subtarget specific variables off of the MachineFunction's
cached subtarget and not the TargetMachine.

llvm-svn: 219668
2014-10-14 07:00:33 +00:00
Eric Christopher 99556d77ef Access the subtarget off of the MachineFunction via the DAG
scheduler or via the SelectionDAG if available. Otherwise
grab the subtarget off of the MachineFunction by going up
the parent chain.

llvm-svn: 219666
2014-10-14 06:56:25 +00:00
Eric Christopher b65c7b919c Remove the use and member variable of the TargetMachine from
MachineLICM as we can get the same data off of the MachineFunction.

llvm-svn: 219663
2014-10-14 06:26:57 +00:00
Eric Christopher 20c98938bb Have MachineInstrBundle use the MachineFunction for subtarget
access rather than the TargetMachine.

llvm-svn: 219662
2014-10-14 06:26:55 +00:00
Eric Christopher d3fa440d08 Access the subtarget off of the MachineFunction rather than
through the TargetMachine.

llvm-svn: 219661
2014-10-14 06:26:53 +00:00
Eric Christopher 2a321f74f0 Remove the TargetMachine from DFAPacketizer since it was only
being used to grab subtarget specific things that we can grab
from the MachineFunction anyhow.

llvm-svn: 219650
2014-10-14 01:03:16 +00:00
Eric Christopher 1c5fce0ebb Migrate another set of getSubtargetImpl away.
llvm-svn: 219636
2014-10-13 21:57:44 +00:00
Adrian Prantl 049d21caea Add an assertion about the integrity of the iterator.
Broken parent scope pointers in inlined DIVariables can cause
ensureAbstractVariableIsCreated to insert new abstract scopes, thus
invalidating the iterator in this loop and leading to hard-to-debug
crashes. Useful when manually reducing IR for testcases.

llvm-svn: 219628
2014-10-13 20:44:58 +00:00
Adrian Prantl 13c58820f8 constify the getters in SDNodeDbgValue.
llvm-svn: 219627
2014-10-13 20:43:47 +00:00