Commit Graph

72254 Commits

Author SHA1 Message Date
Akira Hatanaka 452ea6698f [X86, X87 stackifier] Do not mark an operand of a debug instruction as kill.
<rdar://problem/16952634>

llvm-svn: 215962
2014-08-19 02:09:57 +00:00
Duncan P. N. Exon Smith dfd04582ae IR: Reduce RAUW traffic in ConstantExpr
Avoid RAUW-ing `ConstantExpr` when an operand changes unless the new
`ConstantExpr` already has users.  This prevents the RAUW from rippling
up the expression tree unnecessarily.

This commit indirectly adds test coverage for r215953 (this is how I
came across the bug).

This is part of PR20515.

llvm-svn: 215960
2014-08-19 01:12:53 +00:00
Duncan P. N. Exon Smith 344ebf7c9a IR: Replace uses of ConstantAggrUniqueMap with ConstantUniqueMap
Now that `ConstantAggrUniqueMap` and `ConstantUniqueMap` work the same
way, change the aggregates to use the new one.

llvm-svn: 215959
2014-08-19 01:02:18 +00:00
Duncan P. N. Exon Smith 864ccec435 Remove extraneous typenames from r215957
llvm-svn: 215958
2014-08-19 00:55:34 +00:00
Duncan P. N. Exon Smith 8d12558bad IR: Rewrite ConstantUniqueMap
Rewrite `ConstantUniqueMap` to be more similar to
`ConstantAggrUniqueMap`.

  - Use a `DenseMap` with custom MapInfo instead of a `std::map` with
    linear lookups and deletion.
  - Don't waste memory explicitly storing (heavyweight) keys.

Only `ConstantExpr` and `InlineAsm` actually use this data structure, so
I also updated them to use it.

This code cleanup is a precursor to reducing RAUW traffic on
`ConstantExpr` -- I felt badly adding a new (linear) call to
`ConstantUniqueMap::FindExistingKey`, so this designs away the concern.

A follow-up commit will transition the users of `ConstantAggrUniqueMap`
over.

llvm-svn: 215957
2014-08-19 00:42:32 +00:00
Duncan P. N. Exon Smith 3fa26b7661 IR: Declare LookupKey right before its use, NFC
llvm-svn: 215956
2014-08-19 00:24:26 +00:00
Duncan P. N. Exon Smith 1cd4aebefe IR: ArrayRef-ize {Insert,Extract}ValueConstantExpr constructors
No functionality change.

llvm-svn: 215955
2014-08-19 00:23:17 +00:00
Duncan P. N. Exon Smith 25fb110f21 Prevent clang-format from moving the namespace closing brace, NFC
llvm-svn: 215954
2014-08-19 00:21:04 +00:00
Duncan P. N. Exon Smith 7b859ff22c NVPTX: Use RAUW instead of reinventing the wheel
This code had a homemade RAUW that was incorrect when a user was a
constant: instead of calling `replaceUsersWithOnConstant()` it would
incorrectly update the operand in-place, invalidating
`LLVMContextImpl::ExprConstants`.  RAUW does the job better.

The ValueHandle that `GVMap` is holding onto needs to be removed first,
so this commit also removes each variable from the map on-the-fly.

Since deletions from `ExprConstants` use a linear search that compares
directly on the pointer value (instead of using the key), there isn't an
obvious way to expose this with a testcase.

llvm-svn: 215953
2014-08-19 00:20:02 +00:00
Duncan P. N. Exon Smith 171699083d LLParser: Handle BlockAddresses on-the-fly
Previously all `blockaddress()` constants were treated as forward
references.  They were resolved twice:  once at the end of the function
in question, and again at the end of the module.  Furthermore, if the
same blockaddress was referenced N times, the parser created N distinct
`GlobalVariable`s (one for each reference).

Instead, resolve all block addresses at the beginning of the function,
creating the standard `BasicBlock` forward references used for all other
basic block references.  After the function, all references can be
resolved immediately.  To check for the condition of parsing block
addresses from within the same function, I created a reference to the
current per-function-state in `BlockAddressPFS`.

Also, create only one forward-reference per basic block.  Because
forward references to block addresses are rare, the data structure here
shouldn't matter.  If somehow it does someday, this can be pretty easily
changed to a `DenseMap<std::pair<ValID, ValID>, GV>`.

This is part of PR20515.

llvm-svn: 215952
2014-08-19 00:13:19 +00:00
Rafael Espindola 57d9ab648c Use a range loop. NFC.
llvm-svn: 215948
2014-08-18 23:15:59 +00:00
Rafael Espindola 43d22b83dc These classes only need a StringRef, not a MemoryBuffer.
llvm-svn: 215945
2014-08-18 22:28:28 +00:00
Rafael Espindola 7342ca4155 Delete unused method.
llvm-svn: 215944
2014-08-18 22:20:18 +00:00
Robin Morisset 9e98e7f7fc Answer to Philip Reames comments
- add check for volatile (probably unneeded, but I agree that we should be conservative about it).
- strengthen condition from isUnordered() to isSimple(), as I don't understand well enough Unordered semantics (and it also matches the comment better this way) to be confident in the previous behaviour (thanks for catching that one, I had missed the case Monotonic/Unordered).
- separate a condition in two.
- lengthen comment about aliasing and loads
- add tests in GVN/atomic.ll

llvm-svn: 215943
2014-08-18 22:18:14 +00:00
Robin Morisset 4ffe8aaa69 Weak relaxing of the constraints on atomics in MemoryDependencyAnalysis
Monotonic accesses do not have to kill the analysis, as long as the QueryInstr is not
itself atomic.

llvm-svn: 215942
2014-08-18 22:18:11 +00:00
Lang Hames 4f867bfada [MCJIT] Respect target endianness in RuntimeDyldMachO and RuntimeDyldChecker.
This patch may address some of the issues described in http://llvm.org/PR20640.

llvm-svn: 215938
2014-08-18 21:43:16 +00:00
Kevin Enderby ec5ca03674 Make llvm-objdump handle both arm and thumb disassembly from the same Mach-O
file with -macho, the Mach-O specific object file parser option.

After some discussion I chose to do this implementation contained in the logic
of llvm-objdump’s MachODump.cpp using a second disassembler for thumb when
needed and with updates mostly contained in the MachOObjectFile class.

llvm-svn: 215931
2014-08-18 20:21:02 +00:00
Quentin Colombet 7e939fb431 [X86][Haswell][SchedModel] Tidy up.
<rdar://problem/15607571>

llvm-svn: 215924
2014-08-18 17:56:01 +00:00
Quentin Colombet 95e053119e [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Floating Point XMM and YMM instructions.
Sub-group: Other instructions.

<rdar://problem/15607571>

llvm-svn: 215923
2014-08-18 17:55:59 +00:00
Quentin Colombet 81db56d931 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Floating Point XMM and YMM instructions.
Sub-group: Logic instructions.

<rdar://problem/15607571>

llvm-svn: 215922
2014-08-18 17:55:56 +00:00
Quentin Colombet c13c50e0f3 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Floating Point XMM and YMM instructions.
Sub-group: Math instructions.

<rdar://problem/15607571>

llvm-svn: 215921
2014-08-18 17:55:53 +00:00
Quentin Colombet 45c469c0c3 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Floating Point XMM and YMM instructions.
Sub-group: Arithmetic instructions.

<rdar://problem/15607571>

llvm-svn: 215920
2014-08-18 17:55:51 +00:00
Quentin Colombet ca74f23df7 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Floating Point XMM and YMM instructions.
Sub-group: Conversion instructions.

<rdar://problem/15607571>

llvm-svn: 215919
2014-08-18 17:55:49 +00:00
Quentin Colombet 71cdecd73c [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Floating Point XMM and YMM instructions.
Sub-group: Move instructions.

<rdar://problem/15607571>

llvm-svn: 215918
2014-08-18 17:55:46 +00:00
Quentin Colombet bd11563742 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer MMX and XMM instructions.
Sub-group: Other instructions.

<rdar://problem/15607571>

llvm-svn: 215917
2014-08-18 17:55:43 +00:00
Quentin Colombet 91513d9522 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer MMX and XMM instructions.
Sub-group: Logic instructions.

<rdar://problem/15607571>

llvm-svn: 215916
2014-08-18 17:55:41 +00:00
Quentin Colombet e9f8b4b7ac [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer MMX and XMM instructions.
Sub-group: Arithmetic instructions.

<rdar://problem/15607571>

llvm-svn: 215915
2014-08-18 17:55:39 +00:00
Quentin Colombet f68e09418c [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer MMX and XMM instructions.
Sub-group: Move instructions.

<rdar://problem/15607571>

llvm-svn: 215914
2014-08-18 17:55:36 +00:00
Quentin Colombet 33b0bf200d [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Floating Point x87 instructions.
Sub-group: Math instructions.

<rdar://problem/15607571>

llvm-svn: 215913
2014-08-18 17:55:32 +00:00
Quentin Colombet 456c991fb4 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Floating Point x87 instructions.
Sub-group: Arithmetic instructions.

<rdar://problem/15607571>

llvm-svn: 215912
2014-08-18 17:55:29 +00:00
Quentin Colombet 0bc907e5e8 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Floating Point x87 instructions.
Sub-group: Move instructions.

<rdar://problem/15607571>

llvm-svn: 215911
2014-08-18 17:55:26 +00:00
Quentin Colombet 6e62be2f5a [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer instructions.
Sub-group: Other instructions.

<rdar://problem/15607571>

llvm-svn: 215910
2014-08-18 17:55:23 +00:00
Quentin Colombet a6c56f5072 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer instructions.
Sub-group: Synchronization instructions.

<rdar://problem/15607571>

llvm-svn: 215909
2014-08-18 17:55:21 +00:00
Quentin Colombet c58fc449fd [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer instructions.
Sub-group: String instructions.

<rdar://problem/15607571>

llvm-svn: 215908
2014-08-18 17:55:19 +00:00
Quentin Colombet e1b17768a0 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer instructions.
Sub-group: Control transfer instructions.

<rdar://problem/15607571>

llvm-svn: 215907
2014-08-18 17:55:16 +00:00
Quentin Colombet fb887b1c05 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer instructions.
Sub-group: Logic instructions.

<rdar://problem/15607571>

llvm-svn: 215906
2014-08-18 17:55:13 +00:00
Quentin Colombet df26059e13 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer instructions.
Sub-group: Arithmetic instructions.

<rdar://problem/15607571>

llvm-svn: 215905
2014-08-18 17:55:11 +00:00
Quentin Colombet 35d37b7571 [X86][Haswell][SchedModel] Add architecture specific scheduling models.
Group: Integer instructions.
Sub-group: Move instructions.

<rdar://problem/15607571>

llvm-svn: 215904
2014-08-18 17:55:08 +00:00
Robin Morisset b155f529fc Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends
Summary:
Make use of isAtLeastRelease/Acquire in the ARM/AArch64 backends
These helper functions are introduced in D4844.
Depends D4844

Test Plan: make check-all passes

Reviewers: jfb

Subscribers: aemerson, llvm-commits, mcrosier, reames

Differential Revision: http://reviews.llvm.org/D4937

llvm-svn: 215902
2014-08-18 16:48:58 +00:00
Oliver Stannard f5469bec97 Teach the AArch64 backend to handle f16
This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv
operations on f16 (half-precision) types by promoting to f32.

llvm-svn: 215891
2014-08-18 14:22:39 +00:00
Oliver Stannard 12993dd916 [ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.

llvm-svn: 215890
2014-08-18 12:42:15 +00:00
Elena Demikhovsky 34d2d76d25 AVX-512: Fixed a bug in emitting compare for MVT:i1 type.
Added a test.

llvm-svn: 215889
2014-08-18 11:59:06 +00:00
Aaron Ballman f12dc9c802 Silencing an MSVC warning about loop variable conflicting with a variable from an outer scope. NFC.
llvm-svn: 215888
2014-08-18 11:51:41 +00:00
Tim Northover 26bb14e6a7 TableGen: allow use of uint64_t for available features mask.
ARM in particular is getting dangerously close to exceeding 32 bits worth of
possible subtarget features. When this happens, various parts of MC start to
fail inexplicably as masks get truncated to "unsigned".

Mostly just refactoring at present, and there's probably no way to test.

llvm-svn: 215887
2014-08-18 11:49:42 +00:00
Abramo Bagnara 9c2f73ed20 Added forgotten noexcept.
llvm-svn: 215886
2014-08-18 07:48:18 +00:00
Craig Topper 6230691c91 Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size."
Getting a weird buildbot failure that I need to investigate.

llvm-svn: 215870
2014-08-18 00:24:38 +00:00
Craig Topper 5229cfd163 Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 215868
2014-08-17 23:47:00 +00:00
Rafael Espindola 3a1623f5e4 Use copy initialization to initialize std::unique_ptr.
Thanks to David Blaikie for the suggestion.

llvm-svn: 215867
2014-08-17 23:38:08 +00:00
Saleem Abdulrasool 3fd996ef5c ARM: mark missing functions from RTABI
Simply indicate the functions that are part of the runtime library that we do
not setup libcalls for.  This is merely for ease of identification.  NFC.

llvm-svn: 215863
2014-08-17 22:51:04 +00:00
Saleem Abdulrasool 017bd57fce ARM: improve RTABI 4.2 conformance on Linux
The set of functions defined in the RTABI was separated for no real reason.
This brings us closer to proper utilisation of the functions defined by the
RTABI.  It also sets the ground for correctly emitting function calls to AEABI
functions on all AEABI conforming platforms.

The previously existing lie on the behaviour of __ldivmod and __uldivmod is
propagated as it is beyond the scope of the change.

The changes to the test are due to the fact that we now use the divmod functions
which return both the quotient and remainder and thus we no longer need to
invoke two functions on Linux (making it closer to EABI's behaviour).

llvm-svn: 215862
2014-08-17 22:51:02 +00:00
Saleem Abdulrasool 740be89f51 ARM: whitespace
Whitespace fix, NFC.

llvm-svn: 215861
2014-08-17 22:50:59 +00:00
Rafael Espindola f43a94e75e Remove unused member variable.
llvm-svn: 215860
2014-08-17 22:48:55 +00:00
Rafael Espindola a6d0b2f801 Return a std::uinque_ptr. Every caller was already using one.
llvm-svn: 215858
2014-08-17 22:37:39 +00:00
Rafael Espindola b16ecf8224 Convert an ownership comment with std::uinque_ptr.
llvm-svn: 215855
2014-08-17 22:20:33 +00:00
Rafael Espindola f7aed8017a Pass a std::uinque_ptr to ParseAssembly to make the ownership explicit. NFC.
llvm-svn: 215852
2014-08-17 21:36:47 +00:00
Rafael Espindola 2bb0c94d42 getLazyIRModule always takes ownership. Make that explicit.
llvm-svn: 215851
2014-08-17 21:22:19 +00:00
Daniel Sanders f28bf76d88 Revert: r215698 - Current implementation of c.cond.fmt instructions only accept default cc0 register...
It causes a number of regressions when -fintegrated-as is enabled. This happens
because there are codegen-only instructions that incorrectly uses the first
operand as the encoding for the $fcc register. The regressions do not occur when
-via-file-asm is also given.

llvm-svn: 215847
2014-08-17 19:47:47 +00:00
Saleem Abdulrasool 78c44725f8 ARM: correct toggling behaviour
This was a thinko.  The intent was to flip the explicit bits that need toggling
rather than all bits.  This would result in incorrect behaviour (which now is
tested).

Thanks to Nico Weber for pointing this out!

llvm-svn: 215846
2014-08-17 19:20:38 +00:00
Rafael Espindola c66d761b97 llvm-objdump: don't print relocations in non-relocatable files.
This matches the behavior of GNU objdump.

llvm-svn: 215844
2014-08-17 19:09:37 +00:00
Rafael Espindola ab73774c47 Add a non-templated ELFObjectFileBase class.
Use it to implement some ELF only virtual interfaces instead of using error
prone series of dyn_casts.

llvm-svn: 215838
2014-08-17 17:52:10 +00:00
Elena Demikhovsky 67d9500444 Reverted last commit
llvm-svn: 215828
2014-08-17 09:39:48 +00:00
Elena Demikhovsky c0b420fdf5 Reverted last commit
llvm-svn: 215827
2014-08-17 09:36:07 +00:00
Elena Demikhovsky 2bb991a0c5 Added a table for intrinsics on X86.
It should remove dosens of lines in handling instrinsics (in a huge switch) and give an easy way to add new intrinsics.
I did not completed to move al intrnsics to the table, I'll do this in the upcomming commits.

llvm-svn: 215826
2014-08-17 09:00:20 +00:00
Owen Anderson a4428aa484 Remove an InstCombine that transformed patterns like (x * uitofp i1 y) to (select y, x, 0.0) when the multiply has fast math flags set.
While this might seem like an obvious canonicalization, there is one subtle problem with it.  The result of the original expression
is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN.  This means that the
new expression is strictly more defined than the original one.  One unfortunate consequence of this is that the transform is not reversible!
It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it.  Thus, targets that prefer the original
form of the expression cannot reverse the transform to recover it.  Another way to think of it is that the transform has lost source-level
information (the fast math flags), which is undesirable.

llvm-svn: 215825
2014-08-17 03:51:29 +00:00
Chandler Carruth 5a37dd6ff4 [x86] Fix an indentation goof in a prior commit. Should have re-run
clang-format.

llvm-svn: 215824
2014-08-17 00:40:34 +00:00
Matt Arsenault 6cc00429ff Fix fmul combines with constant splat vectors
Fixes things like fmul x, 2 -> fadd x, x

llvm-svn: 215820
2014-08-16 10:14:19 +00:00
Chandler Carruth e020f117ce [x86] Teach lots of the new vector shuffle lowering to use UNPCK
instructions for blend operations at 128 bits. This was a serious hole
in our prior blend lowering.

llvm-svn: 215819
2014-08-16 09:42:15 +00:00
David Majnemer 1a0bbc8a5c InstCombine: Fix a potential bug in 0 - (X sdiv C) -> (X sdiv -C)
While *most* (X sdiv 1) operations will get caught by InstSimplify, it
is still possible for a sdiv to appear in the worklist which hasn't been
simplified yet.

This means that it is possible for 0 - (X sdiv 1) to get transformed
into (X sdiv -1); dividing by -1 can make the transform produce undef
values instead of the proper result.

Sorry for the lack of testcase, it's a bit problematic because it relies
on the exact order of operations in the worklist.

llvm-svn: 215818
2014-08-16 09:23:42 +00:00
David Majnemer f9a095d606 InstCombine: Combine mul with div.
We can combne a mul with a div if one of the operands is a multiple of
the other:

%mul = mul nsw nuw %a, C1
%ret = udiv %mul, C2
  =>
%ret = mul nsw %a, (C1 / C2)

This can expose further optimization opportunities if we end up
multiplying or dividing by a power of 2.

Consider this small example:

define i32 @f(i32 %a) {
  %mul = mul nuw i32 %a, 14
  %div = udiv exact i32 %mul, 7
  ret i32 %div
}

which gets CodeGen'd to:

    imull       $14, %edi, %eax
    imulq       $613566757, %rax, %rcx
    shrq        $32, %rcx
    subl        %ecx, %eax
    shrl        %eax
    addl        %ecx, %eax
    shrl        $2, %eax
    retq

We can now transform this into:
define i32 @f(i32 %a) {
  %shl = shl nuw i32 %a, 1
  ret i32 %shl
}

which gets CodeGen'd to:

    leal        (%rdi,%rdi), %eax
    retq

This fixes PR20681.

llvm-svn: 215815
2014-08-16 08:55:06 +00:00
Nico Weber ae050bb057 arm asm: Let .fpu enable instructions, PR20447.
I'm not very happy with duplicating the fpu->feature mapping in ARMAsmParser.cpp
and in clang's driver. See the bug for a patch that doesn't do that, and the
review thread [1] for why this duplication exists.

1: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140811/231052.html
llvm-svn: 215811
2014-08-16 05:37:51 +00:00
Duncan P. N. Exon Smith 5a5fd7b1b3 BitcodeReader: Only create one basic block for each blockaddress
Block address forward-references are implemented by creating a
`BasicBlock` ahead of time that gets inserted in the `Function` when
it's eventually encountered.

However, if the same blockaddress was used in two separate functions
that were parsed *before* the referenced function (and the blockaddress
was never used at global scope), two separate basic blocks would get
created, one of which would be forgotten creating invalid IR.

This commit changes the forward-reference logic to create only one basic
block (and always return the same blockaddress).

llvm-svn: 215805
2014-08-16 01:54:37 +00:00
Duncan P. N. Exon Smith 1318364e3e UseListOrder: Correctly count the number of uses
This is an off-by-one bug I found by inspection, which would only
trigger if the bitcode writer sees more uses of a `Value` than the
reader.  Since this is only relevant when an instruction gets upgraded
somehow, there unfortunately isn't a reasonable way to add test
coverage.

llvm-svn: 215804
2014-08-16 01:54:34 +00:00
Duncan P. N. Exon Smith 0c39d40c8b IR: Don't add inbounds to GEPs of extern_weak variables
Global variables that have `extern_weak` linkage may be null, so it's
incorrect to add `inbounds` when constant folding.

This also fixes a bug when parsing global aliases, whose forward
reference placeholders are global variables with `extern_weak` linkage.
If GEPs to these aliases are encountered before the alias itself, the
GEPs would incorrectly gain the `inbounds` keyword as well.

llvm-svn: 215803
2014-08-16 01:54:32 +00:00
Andrea Di Biagio b23bad11e7 [DAGCombiner] Improve the folding of target independet shuffles to Undef.
When combining a pair of shuffle nodes, check if the combined shuffle mask is
trivially Undef. In case, immediately fold that pair of shuffles to Undef.

The lack of checks for undef masks was the root-cause of a poor-codegen bug
in the dag combiner.

Example:
  %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
  %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>

Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire
sequence to Undef value and therefore we generated:
  shufps $-123, %xmm1, $xmm0
  pshufd $-46, %xmm0, %xmm0

With this patch, the entire shuffle sequence is folded to Undef and no
shuffles are generated in the output assembly.

Added new test cases to test 'combine-vec-shuffle-5.ll'.

llvm-svn: 215797
2014-08-16 00:29:44 +00:00
Hal Finkel 41a55ad0a5 [PowerPC] Mark fixed-offset byvals as pointed-to by IR values
A byval object, even if allocated at a fixed offset (prescribed by the ABI) is
pointed to by IR values. Most fixed-offset stack objects are not pointed-to by
IR values, so the default is to assume this is not possible. However, we need
to override the default in this case (instruction scheduling can cause
miscompiles otherwise).

Fixes PR20280.

llvm-svn: 215795
2014-08-16 00:17:05 +00:00
Hal Finkel 0815a05fd7 Make isAliased property for fixed-offset stack objects adjustable
We used to assume that any fixed-offset stack object was not aliased. This
meant that no IR value could point to the memory contained in such an object.
This is a reasonable default, but is not a universally-correct
target-independent fact. For example, on PowerPC (both Darwin and non-Darwin),
some byval arguments are allocated at fixed offsets by the ABI. These, however,
certainly can be pointed to by IR values. This change moves the 'isAliased'
logic out of FixedStackPseudoSourceValue and into MFI, and allows the isAliased
property to be overridden for fixed-offset objects.

This will be used by an upcoming commit to the PowerPC backend to fix PR20280.

No functionality change intended (the behavior of
FixedStackPseudoSourceValue::isAliased has been made more conservative for
callers that don't pass an MFI object, but I don't see any in-tree callers that
do that).

llvm-svn: 215794
2014-08-16 00:17:02 +00:00
Hal Finkel dda588cdc1 [PowerPC] Darwin byval arguments are not immutable
On PPC/Darwin, byval arguments occur at fixed stack offsets in the callee's
frame, but are not immutable -- the pointer value is directly available to the
higher-level code as the address of the argument, and the value of the byval
argument can be modified at the IR level.

This is necessary, but not sufficient, to fix PR20280. When PR20280 is fixed in
a follow-up commit, its test case will cover this change.

llvm-svn: 215793
2014-08-16 00:16:29 +00:00
Sean Silva db79484998 Revert "[Support] Promote cl::StringSaver to a separate utility"
This reverts commit r215784 / 3f8a26f6fe16cc76c98ab21db2c600bd7defbbaa.

LLD has 3 StringSaver's, one of which takes a lock when saving the
string... Need to investigate more closely.

llvm-svn: 215790
2014-08-15 23:39:01 +00:00
Robin Morisset d539f866ac Get rid of dead code: SelectAtomic64 in X86ISelDAGtoDAG.cpp
llvm-svn: 215789
2014-08-15 23:36:00 +00:00
Sean Silva 42ec6fdf58 [Support] Promote cl::StringSaver to a separate utility
This class is generally useful.

In breaking it out, the primary change is that it has been made
non-virtual. It seems like being abstract led to there being 3 different
(2 in llvm + 1 in clang) concrete implementations which disagreed about
the ownership of the saved strings (see the manual call to free() in the
unittest StrDupSaver; yes this is different from the CommandLine.cpp
StrDupSaver which owns the stored strings; which is different from
Clang's StringSetSaver which just holds a reference to a
std::set<std::string> which owns the strings).

I've identified 2 other places in the
codebase that are open-coding this pattern:

  memcpy(Alloc.Allocate<char>(strlen(S)+1), S, strlen(S)+1)

I'll be switching them over. They are
* llvm::sys::Process::GetArgumentVector
* The StringAllocator member of YAMLIO's Input class
This also will allow simplifying Clang's driver.cpp quite a bit.

Let me know if there are any other places that could benefit from
StringSaver. I'm also thinking of adding a saveStringRef member for
getting a stable StringRef.

llvm-svn: 215784
2014-08-15 23:18:33 +00:00
Robin Morisset d18cda620c Fix typos in comments
llvm-svn: 215777
2014-08-15 22:17:28 +00:00
Chad Rosier b1bbf6f8ce [AArch32] Add support for FP rounding operations for ARMv8/AArch32.
Phabricator Revision: http://reviews.llvm.org/D4935

llvm-svn: 215772
2014-08-15 21:38:16 +00:00
Nick Kledzik bcd6e2a943 [Option] Support MultiArg in --help
Currently, if you use a MultiArg<> option, then printing out the help/usage
message will cause an assert.  This fixes getOptionHelpName() to work with
MultiArg Options.

llvm-svn: 215770
2014-08-15 21:35:07 +00:00
Rafael Espindola 3931c28b03 Set comdats when lazily linking functions.
We were setting the comdat when functions were copied in the initial pass, but
not when they were linked only when we found out that they are needed.

llvm-svn: 215765
2014-08-15 20:17:08 +00:00
Juergen Ributzka 6597d319fe [FastISel][AArch64] Fix a latent bug in floating-point materialization.
The floating-point value positive zero (+0.0) is a valid immedate value
according to isFPImmLegal. As a result AArch64 FastISel went ahead and
used the immediate version of fmov to materialize the constant.

The problem is that the immediate version of fmov cannot encode an imediate for
postive zero. Instead a fmov from the zero register was supposed to be used in
this case.

This fix adds handling for this special case and uses fmov from the zero
register to materialize a positive zero (negative zeroes go to the constant
pool).

There is no test case for this, because this code is currently dead. It will be
enabled in a future commit and I will add a test case in a separate commit
after that.

This fixes <rdar://problem/18027157>.

llvm-svn: 215753
2014-08-15 18:55:55 +00:00
Juergen Ributzka 6bca986ef1 Reapplying [FastISel][AArch64] Cleanup constant materialization code. NFCI.
Note: This reapplies r215582 without any modifications. The refactoring wasn't
responsible for the buildbot failures.

Original commit message:
Cleanup and prepare constant materialization code for future commits.

llvm-svn: 215752
2014-08-15 18:55:52 +00:00
Matt Arsenault fabf545299 R600/SI: Move all fabs / fneg handling to patterns
llvm-svn: 215749
2014-08-15 18:42:22 +00:00
Matt Arsenault 13623d0e28 R600/SI: Use source modifiers for f64 fneg
llvm-svn: 215748
2014-08-15 18:42:18 +00:00
Matt Arsenault a147438e37 R600/SI: Use source modifier for f64 fabs
llvm-svn: 215747
2014-08-15 18:42:15 +00:00
Matt Arsenault 9e7cf548ea R600/SI: Refactor fneg / fabs patterns
llvm-svn: 215746
2014-08-15 18:42:11 +00:00
Reid Kleckner a6b86bef4d Fix the build with MSVC 2013 after new shuffle code
MSVC gives this awesome diagnostic:

..\lib\Target\X86\X86ISelLowering.cpp(7085) : error C2971: 'llvm::VariadicFunction1' : template parameter 'Func' : 'isShuffleEquivalentImpl' : a local variable cannot be used as a non-type argument
        ..\include\llvm/ADT/VariadicFunction.h(153) : see declaration of 'llvm::VariadicFunction1'
        ..\lib\Target\X86\X86ISelLowering.cpp(7061) : see declaration of 'isShuffleEquivalentImpl'

Using an anonymous namespace makes the problem go away.

llvm-svn: 215744
2014-08-15 18:03:58 +00:00
Matt Arsenault b2baffaffd R600/SI: Fix offset folding in some cases with shifted pointers.
Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
is only done if the add has one use. If the resulting constant
add can be folded into an addressing mode, force this to happen
for the pointer operand.

This ends up happening a lot because of how LDS objects are allocated.
Since the globals are allocated next to each other, acessing the first
element of the second object is directly indexed by a shifted pointer.

llvm-svn: 215739
2014-08-15 17:49:05 +00:00
Chandler Carruth 03f456abbe [x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions
where applicable for blending.

llvm-svn: 215737
2014-08-15 17:42:00 +00:00
Juergen Ributzka 5b1dbec1b4 [FastISel] Remove an performance debugging assert.
As Jim pointed out this assert isn't really needed to test for correctness,
because the code right afterwards does the same check and falls-back to
SelectionDAG - as intended.

llvm-svn: 215735
2014-08-15 17:36:30 +00:00
Matt Arsenault 2e7cc48baa R600/SI: Add intrinsic for ldexp
llvm-svn: 215734
2014-08-15 17:30:25 +00:00
Matt Arsenault 5015a89aa5 R600/SI: Implement isLegalAddressingMode
The default assumes that a 16-bit signed offset is used.
LDS instruction use a 16-bit unsigned offset, so it wasn't
being used in some cases where it was assumed a negative offset
could be used.

More should be done here, but first isLegalAddressingMode needs
to gain an addressing mode argument. For now, copy most of the rest
of the default implementation with the immediate offset change.

llvm-svn: 215732
2014-08-15 17:17:07 +00:00
Moritz Roth 8f3765625e ARM: Fix and re-enable load/store optimizer for Thumb1.
In a previous iteration of the pass, we would try to compensate for
writeback by updating later instructions and/or inserting a SUBS to
reset the base register if necessary.
Since such a SUBS sets the condition flags it's not generally safe to do
this. For now, only merge LDR/STRs if there is no writeback to the base
register (LDM that loads into the base register) or the base register is
killed by one of the merged instructions. These cases are clear wins
both in terms of instruction count and performance.

Also add three new test cases, and update the existing ones accordingly.

llvm-svn: 215729
2014-08-15 17:00:30 +00:00
Moritz Roth 378a43bfe0 ARM load/store optimizer: Compute BaseKill correctly.
This adds some code back that was deleted in r92053. The location of the
last merged memory operation needs to be kept up-to-date since MemOps
may be in a different order to the original instruction stream to
allow merging (since registers need to be in ascending order). Also
simplify the logic to determine BaseKill using findRegisterUseOperandIdx
to use an equivalent function call instead.

llvm-svn: 215728
2014-08-15 17:00:20 +00:00
Juergen Ributzka 5df8603dfd [FastISel][ARM] Fix a think-o in my previous commit (r215682).
We actually need to return the register into which we materialized the constant
and not just "true" for success. This code is currently partially dead, that is
why it didn't trigger any failures yet. Once I change the order of the constant
materialization this code will be fully exercised.

llvm-svn: 215727
2014-08-15 16:59:46 +00:00
Rafael Espindola ea46c32f81 Introduce a helper to combine instruction metadata.
Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use
it.

Patch by Björn Steinbrink!

llvm-svn: 215723
2014-08-15 15:46:38 +00:00
Rafael Espindola c23174b990 Make EmitAbsValue an static helper.
llvm-svn: 215721
2014-08-15 15:12:13 +00:00
Rafael Espindola 7bb91d942b Delete dead code. NFC.
llvm-svn: 215720
2014-08-15 14:58:22 +00:00
Rafael Espindola 5e955fbd27 Make EmitDwarfSetLineAddr an static helper. NFC.
llvm-svn: 215718
2014-08-15 14:43:02 +00:00
Rafael Espindola 3851808e40 Make BuildSymbolDiff an static helper.
llvm-svn: 215717
2014-08-15 14:31:47 +00:00
Amara Emerson 82da7d07a1 [AArch64] Narrow arguments passed in wrong position on the stack in
big-endian mode.

Patch by Asiri Rathnayake.

Differential Revision: http://reviews.llvm.org/D4922

llvm-svn: 215716
2014-08-15 14:29:57 +00:00
Rafael Espindola ed735d3360 Make ForceExpAbs an static helper.
llvm-svn: 215715
2014-08-15 14:24:41 +00:00
Rafael Espindola adbe02435d Add a helper to MCExpr for when an expression is know to be absolute.
llvm-svn: 215713
2014-08-15 14:20:32 +00:00
Rafael Espindola d610ba99cb Remove HasLEB128.
We already require CFI, so it should be safe to require .leb128 and .uleb128.

llvm-svn: 215712
2014-08-15 14:01:07 +00:00
Benjamin Kramer 769989c4e9 PPC: Clean up pointer casting, no functionality change.
Silences GCC's -Wcast-qual.

llvm-svn: 215703
2014-08-15 11:05:45 +00:00
Chandler Carruth f88612581e [x86] Add the initial skeleton of type-based dispatch for AVX vectors in
the new shuffle lowering and an implementation for v4 shuffles.

This allows us to handle non-half-crossing shuffles directly for v4
shuffles, both integer and floating point. This currently misses places
where we could perform the blend via UNPCK instructions, but otherwise
generates equally good or better code for the test cases included to the
existing vector shuffle lowering. There are a few cases that are
entertainingly better. ;]

llvm-svn: 215702
2014-08-15 11:01:40 +00:00
Chandler Carruth 0288620f67 [x86] Teach the instruction printer to decode immediate operands to
BLENDPS, BLENDPD, and PBLENDW instructions into pretty shuffle comments.

These will be used in my next commit as part of test cases for AVX
shuffles which can directly use blend in more places.

llvm-svn: 215701
2014-08-15 11:01:37 +00:00
Tim Northover ee843ef0fa ARM: implement MRS/MSR (banked reg) system instructions.
These are system-only instructions for CPUs with virtualization
extensions, allowing a hypervisor easy access to all of the various
different AArch32 registers.

rdar://problem/17861345

llvm-svn: 215700
2014-08-15 10:47:12 +00:00
Erik Verbruggen ccc517c47f Remove testcase from README which we didn't get. We do get it now.
llvm-svn: 215699
2014-08-15 10:33:03 +00:00
Vladimir Medic 8d380fa37a Current implementation of c.cond.fmt instructions only accept default cc0 register. This patch enables the instruction to accept other fcc registers. The aliases with default fcc0 registers are also defined.
llvm-svn: 215698
2014-08-15 09:29:30 +00:00
Chandler Carruth 6649f53207 [x86] Remove the duplicated code for testing whether we can widen the
elements of a shuffle mask and simplify how it works. No functionality
changed now that the bug that was here has been fixed.

llvm-svn: 215696
2014-08-15 07:41:57 +00:00
Chandler Carruth 17fd848bfa [x86] Fix the very broken formation of vpunpck instructions in the
target-specific shuffl DAG combines.

We were recognizing the paired shuffles backwards. This code needs to be
replaced anyways as we have the same functionality elsewhere, but I'll
do the refactoring in a follow-up, this is the minimal fix to the
behavior.

In addition to fixing miscompiles with the new vector shuffle lowering,
it also causes the canonicalization to kick in much better, selecting
the smaller encoding variants in lots of places in the new AVX path.
This still isn't quite ideal as we don't need both the shufpd and the
punpck instructions, but that'll get fixed in a follow-up patch.

llvm-svn: 215690
2014-08-15 03:54:49 +00:00
Rafael Espindola fe963b1764 Don't print comments to an object streamer :-)
llvm-svn: 215689
2014-08-15 03:07:13 +00:00
Rafael Espindola 91f6621edb EmitAbsValue is the same as EmitValue on non-darwin. NFC.
llvm-svn: 215688
2014-08-15 02:51:31 +00:00
Chandler Carruth 372c143c2f [x86] Fix PR20540 where the x86 shuffle DAG combiner had completely
broken logic for merging shuffle masks in the face of SM_SentinelZero
mask operands.

While these are '-1' they don't mean 'undef' the way '-1' means in the
pre-legalized shuffle masks. Instead, they mean that the shuffle
operation is forcibly zeroing that lane. Reflect this and explicitly
handle it in a bunch of places. In one place the effect is equivalent
but much more clear. In the rest it was really weirdly broken.

Also, rewrite the entire merging thing to be a more directy operation
with a single loop and just doing math to map the indices through the
various masks.

Also add a bunch of asserts to try to make in extremely clear what the
different masks can possibly look like.

Finally, add some comments to clarify that we're merging shuffle masks
*up* here rather than *down* as we do everywhere else, and thus the
logic is quite confusing.

Thanks to several different people for sending test cases, and for
Robert Khasanov for an initial attempt at fixing.

llvm-svn: 215687
2014-08-15 02:43:18 +00:00
Bill Schmidt 3755e17dec [PPC64] Add missing dependency on X2 to LDinto_toc.
The LDinto_toc pattern has been part of 64-bit PowerPC for a long
time, and represents loading from a memory location into the TOC
register (X2).  However, this pattern doesn't explicitly record that
it modifies that register.  This patch adds the missing dependency.

It was very surprising to me that this has never shown up as a problem
in the past, and that we only saw this problem recently in a single
scenario when building a self-hosted clang.  It turns out that in most
cases we have another dependency present that keeps the LDinto_toc
instruction tied in place.  LDinto_toc is used for TOC restore
following a call site, so this is a typical sequence:

   BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
   LDinto_toc 24, %X1
   ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>

Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there
is a natural anti-dependency between the two that keeps it in place.

Therefore we don't usually see a problem.  However, in one particular
case, one call is followed immediately by another call, and the second
call requires a parameter that is a TOC-relative address.  This is the
code sequence:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1
  ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
  ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use>
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

Note that the back-to-back stack adjustments are the same size!  The
back end is smart enough to recognize this and optimize them away:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

Now there is nothing to prevent the ADDIStocHA instruction from moving
ahead of the LDinto_toc instruction, and because of the longest-path
heuristic, this is what happens.

With the accompanying patch, %X2 is represented as an implicit def:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1, %X2<imp-def,dead>
  ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
  ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

So now when the two stack adjustments are removed, ADDIStocHA is
prevented from being moved above LDinto_toc.

I have not yet created a test case for this, because the original
failure occurs on a relatively large function that needs reduction.
However, this is a fairly serious bug, despite its infrequency, and I
wanted to get this patch onto the list as soon as possible so that it
can be considered for a 3.5 backport.  I'll work on whittling down a
test case.

Have we missed the boat for 3.5 at this point?

Thanks,
Bill

llvm-svn: 215685
2014-08-15 01:25:26 +00:00
Juergen Ributzka 81db58e177 [FastISel][ARM] Fall-back to constant pool loads when materializing an i32 constant.
FastEmit_i won't always succeed to materialize an i32 constant and just fail.
This would trigger a fall-back to SelectionDAG, which is really not necessary.

This fix will first fall-back to a constant pool load to materialize the constant
before giving up for good.

This fixes <rdar://problem/18022633>.

llvm-svn: 215682
2014-08-14 23:29:49 +00:00
Hal Finkel 61c386126b Copy noalias metadata from call sites to inlined instructions
When a call site with noalias metadata is inlined, that metadata can be
propagated directly to the inlined instructions (only those that might access
memory because it is not useful on the others). Prior to inlining, the noalias
metadata could express that a call would not alias with some other memory
access, which implies that no instruction within that called function would
alias. By propagating the metadata to the inlined instructions, we preserve
that knowledge.

This should complete the enhancements requested in PR20500.

llvm-svn: 215676
2014-08-14 21:09:37 +00:00
Juergen Ributzka 790bacf232 Revert several FastISel commits to track down a buildbot error.
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."

llvm-svn: 215673
2014-08-14 19:56:28 +00:00
Duncan P. N. Exon Smith e85df16564 Fix whitespace error from r215279, NFC
llvm-svn: 215667
2014-08-14 17:18:26 +00:00
Adam Nemet 6fa675754a [AVX512] Switch FMA intrinsics to the masking version
This does the renaming and updates the lowering logic.

Part of <rdar://problem/17688758>

llvm-svn: 215664
2014-08-14 17:13:30 +00:00
Adam Nemet 9b4f08c729 [X86] Break out logic to map FMA Intrinsic number to Opcode
No functional change.  Will be used to lower AVX512 masking FMA intrinsics.

llvm-svn: 215663
2014-08-14 17:13:27 +00:00
Adam Nemet 50b83f0bb8 [AVX512] Add enum for the static rounding types
No functional change.  This will be used by the new FMA intrinsic lowering
code.

We can probably add NO_EXC here as well, I am just not too familiar with this
part of AVX512 yet.  We can add that later.

llvm-svn: 215662
2014-08-14 17:13:26 +00:00
Adam Nemet 6f31063dcf [AVX512] Break out the logic to lower masking intrinsics
No functional change.  This will be used by the FMA intrinsic lowering as well
and hopefully many more.

llvm-svn: 215661
2014-08-14 17:13:24 +00:00
Adam Nemet 2e91ee58fe [AVX512] Add masking variant for the FMA instructions
This change further evolves the base class AVX512_masking in order to make it
suitable for the masking variants of the FMA instructions.

Besides AVX512_masking there is now a new base class that instructions
including FMAs can use: AVX512_masking_3src.  With three-source (destructive)
instructions one of the sources is already tied to the destination.  This
difference from AVX512_masking is captured by this new class.  The common bits
between _masking and _masking_3src are broken out into a new super class
called AVX512_masking_common.

As with valign, there is some corresponding restructuring of the underlying
format classes.  The idea is the same we want to derive from two classes
essentially: one providing the format bits and another format-independent
multiclass supplying the various masking and non-masking instruction variants.

Existing fma tests in avx512-fma*.ll provide coverage here for the non-masking
variants.  For masking, the next patches in the series will add intrinsics and
intrinsic tests.

For AVX512_masking_3src to work, the (ins ...) dag has to be passed *without*
the leading source operand that is tied to dst ($src1).  This is necessary to
properly construct the (ins ...) for the different variants.  For the record,
I did check that if $src is mistakenly included, you do get a fairly intuitive
error message from the tablegen backend.

Part of <rdar://problem/17688758>

llvm-svn: 215660
2014-08-14 17:13:19 +00:00
Juergen Ributzka 34ed422c42 Revert "[FastISel][AArch64] Add support for more addressing modes."
This reverts commits r215597, because it might have broken the build bots.

llvm-svn: 215659
2014-08-14 17:10:54 +00:00
Hal Finkel d2dee16c27 Add noalias metadata for general calls (not just memory intrinsics) during inlining
When preserving noalias function parameter attributes by adding noalias
metadata in the inliner, we should do this for general function calls (not just
memory intrinsics). The logic is very similar to what already existed (except
that we want to add this metadata even for functions taking no relevant
parameters). This metadata can be used by ModRef queries in the caller after
inlining.

This addresses the first part of PR20500. Adding noalias metadata during
inlining is still turned off by default.

llvm-svn: 215657
2014-08-14 16:44:03 +00:00
Moritz Roth 6f257cf27b Testing commit access.
Remove a trailing whitespace.

llvm-svn: 215653
2014-08-14 16:20:50 +00:00
Chad Rosier 11ab941644 [Reassociation] Add support for reassociation with unsafe algebra.
Vector instructions are (still) not supported for either integer or floating
point.  Hopefully, that work will be landed shortly.

llvm-svn: 215647
2014-08-14 15:23:01 +00:00
Sanjay Patel 35d3133650 optimize vector fneg of bitcasted integer value
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.

This patch is very similar to a fabs patch committed at r214892.

Differential Revision: http://reviews.llvm.org/D4852

llvm-svn: 215646
2014-08-14 15:15:28 +00:00
Rafael Espindola 36d3ee7c32 Delete support for AuroraUX.
auroraux.org is not resolving.

I will add this to the release notes as soon as I figure out where to put the
3.6 release notes :-)

llvm-svn: 215645
2014-08-14 15:15:09 +00:00
Aaron Ballman 61acc22129 Silencing an MSVC C4334 warning ('<<' : result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)). NFC.
llvm-svn: 215642
2014-08-14 13:43:57 +00:00
Chandler Carruth a8311b3681 [x86] Begin stubbing out the AVX support in the new vector shuffle
lowering scheme.

Currently, this just directly bails to the fallback path of splitting
the 256-bit vector into two 128-bit vectors, operating there, and then
joining the results back together. While the results are far from
perfect, they are *shockingly* good for what we're doing here. I'll be
layering the rest of the functionality on top of this piece by piece and
updating tests as I go.

Note that 256-bit vectors in this mode are still somewhat WIP. While
I think the code paths that I'm adding here are clean and good-to-go,
there are still a lot of 128-bit assumptions that I'll need to stomp out
as I march through the functional spread here.

llvm-svn: 215637
2014-08-14 12:13:59 +00:00
Zoran Jovanovic 73ff948746 [mips][microMIPS] MicroMIPS Compact Branch Instructions BEQZC and BNEZC
Differential Revision: http://reviews.llvm.org/D3545

llvm-svn: 215636
2014-08-14 12:09:10 +00:00
Toma Tabacu 0d64b20c03 [mips] Add assembler support for the "la $reg,symbol" pseudo-instruction.
Summary:
This pseudo-instruction allows the programmer to load an address from a symbolic expression into a register.

Patch by David Chisnall.
His work was sponsored by: DARPA, AFRL

I've made some minor changes to the original, such as improving the formatting and adding some comments, and I've also added a test case.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D4808

llvm-svn: 215630
2014-08-14 10:29:17 +00:00
Daniel Sanders cdb45fa391 [mips] Rename [gs]etCanHaveModuleDir to more natural names
Summary:
getCanHaveModuleDir() is renamed to isModuleDirectiveAllowed(), and
setCanHaveModuleDir() is renamed to forbidModuleDirective() since it is only
ever given a false argument.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4885

llvm-svn: 215628
2014-08-14 09:18:14 +00:00
Chandler Carruth 7cd15be784 [SDAG] Fix a bug in the DAG combiner where we would fail to return the
input node after manually adding it to the worklist and using CombineTo.

Once we use CombineTo the input node may have been deleted. Despite this
being *completely confusing* and somewhat broken, the only way to
"correctly" return from a DAG combine after potentially deleting the
input node is to return *that exact node*....

But really, this code should just never have used CombineTo. It won't do
what it wants (returning the node as mentioned above just causes the
combine to infloop). The correct way to combine away a casted load to
a load of the correct type is to RAUW the chain directly and then return
the loaded value to replace the actual value node.

I managed to find this with the vector shuffle fuzzer even though it
clearly has nothing at all to do with vector shuffles and rather those
happen to trigger a load of a constant pool that hits this combine *just
right*. I've included the test as it is small and a nice stress test
that the infrastructure isn't asserting.

llvm-svn: 215622
2014-08-14 08:18:34 +00:00
David Majnemer 698dca0b95 InstCombine: ((A | ~B) ^ (~A | B)) to A ^ B
Proof using CVC3 follows:
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR((A | ~B),(~A |B)) = BVXOR(A,B);
$ cvc3 t.cvc
Valid.

Patch by Mayur Pandey!

Differential Revision: http://reviews.llvm.org/D4883

llvm-svn: 215621
2014-08-14 06:46:25 +00:00
David Majnemer c307c66765 AArch64: Silence warning in AArch64FastISel
GCC was emitting a signed vs unsigned comparison warning.

llvm-svn: 215620
2014-08-14 06:44:51 +00:00
David Majnemer f1eda23514 Added InstCombine Transform for ((B | C) & A) | B -> B | (A & C)
Transform ((B | C) & A) | B --> B | (A & C)

Z3 Link: http://rise4fun.com/Z3/hP6p

Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D4865

llvm-svn: 215619
2014-08-14 06:41:38 +00:00
Saleem Abdulrasool bb67af44e1 MC: AsmLexer: handle multi-character CommentStrings correctly
As X86MCAsmInfoDarwin uses '##' as CommentString although a single '#' starts a
comment a workaround for this special case is added.

Fixes divisions in constant expressions for the AArch64 assembler and other
targets which use '//' as CommentString.

Patch by Janne Grunau!

llvm-svn: 215615
2014-08-14 02:51:43 +00:00
Lang Hames ea800ca586 [MCJIT] Support DisableSymbolSearching and InstallLazyFunctionCreator in MCJIT.
Patch by Anthony Pesch. Thanks Anthony!

llvm-svn: 215613
2014-08-14 02:38:20 +00:00
Chandler Carruth 8039b16de7 [SDAG] Fix a case where we would iteratively legalize a node during
combining by replacing it with something else but not re-process the
node afterward to remove it.

In a truly remarkable stroke of bad luck, this would (in the test case
attached) end up getting some other node combined into it without ever
getting re-processed. By adding it back on to the worklist, in addition
to deleting the dead nodes more quickly we also ensure that if it
*stops* being dead for any reason it makes it back through the
legalizer. Without this, the test case will end up failing during
instruction selection due to an and node with a type we don't have an
instruction pattern for.

It took many million runs of the shuffle fuzz tester to find this.

llvm-svn: 215611
2014-08-14 01:07:37 +00:00
Quentin Colombet 57fb040beb [X86] Fix the value of the low mask for the lowering of MUL_LOHI for v4i32.
Found by code inspection.

llvm-svn: 215604
2014-08-13 23:49:24 +00:00
Akira Hatanaka b74db09c97 [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls.
Certain functions such as objc_autoreleaseReturnValue have to be called as
tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls,
we have to fall back to SelectionDAG to select calls that are marked as tail.

<rdar://problem/17991614>

llvm-svn: 215600
2014-08-13 23:23:58 +00:00
Juergen Ributzka 98347d902e [FastISel][AArch64] Add support for more addressing modes.
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]

llvm-svn: 215597
2014-08-13 22:53:29 +00:00
Juergen Ributzka 0f8bc043c5 [FastISel][X86] Add large code model support for materializing floating-point constants.
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.

Fixes <rdar://problem/17674628>.

llvm-svn: 215595
2014-08-13 22:25:35 +00:00
Juergen Ributzka ba8b79e932 [FastISel][X86] Use XOR to materialize the "0" value.
llvm-svn: 215594
2014-08-13 22:22:17 +00:00
Juergen Ributzka 230494b399 [FastISel][X86] Emit more efficient instructions for integer constant materialization.
This mostly affects the i64 value type, which always resulted in an 15byte
mobavsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.

This fixes <rdar://problem/17420988>.

llvm-svn: 215593
2014-08-13 22:18:11 +00:00
Juergen Ributzka 24080d60fa [FastISel][AArch64] Make use of the zero register when possible.
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.

Fixes <rdar://problem/17924413>.

llvm-svn: 215591
2014-08-13 22:13:14 +00:00
Juergen Ributzka 7cee768e55 [FastISel] Let the target decide first if it wants to materialize a constant.
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.

On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.

On AArch64 it would materialize the constant 0 in a register even the
architecture has an actual "zero" register.

On ARM it would generate unnecessary mov instructions or not use mvn.

This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.

Related to <rdar://problem/17420988>.

llvm-svn: 215588
2014-08-13 22:08:02 +00:00
Gerolf Hoflehner fe2c11ffd6 [MachineCombiner] Removal of dangling DBG_VALUES after combining [20598]
This is a cleaner solution to the problem described in r215431.
When instructions are combined a dangling DBG_VALUE is removed.
This resolves bug 20598.

llvm-svn: 215587
2014-08-13 22:07:36 +00:00
Juergen Ributzka 2b98e393f2 [FastISel][X86] Refactor constant materialization. NFCI.
Split the constant materialization code into three separate helper functions for
Integer-, Floating-Point-, and GlobalValue-Constants.

llvm-svn: 215586
2014-08-13 22:01:55 +00:00
Juergen Ributzka a5b083853c [FastISel][ARM] Use MOVT/MOVW if the subtarget requests it.
This change is also in preparation for a future change to make sure that
the constant materialization uses MOVT/MOVW when available and not a load
from the constant pool.

llvm-svn: 215584
2014-08-13 21:42:19 +00:00
Juergen Ributzka 2cbcf7aad9 [FastISel][ARM] Fix a bug in the integer materialization code.
getRegClassFor returns the incorrect register class when in Thumb2 mode.
This fix simply manually selects the register class as in the code just a few
lines above.

There is no test case for this code, because the code is currently
unreachable. This will be changed in a future commit and existing test
cases will exercise this code.

llvm-svn: 215583
2014-08-13 21:39:18 +00:00
Juergen Ributzka 5ae43a136b [FastISel][AArch64] Cleanup constant materialization code. NFCI.
Cleanup and prepare constant materialization code for future commits.

llvm-svn: 215582
2014-08-13 21:34:04 +00:00
Gerolf Hoflehner caa8bfd13b [Cleanup] Utility function to erase instruction and mark DBG_Values
New function to erase a machine instruction and mark DBG_VALUE
for removal. A DBG_VALUE is marked for removal when it references
an operand defined in the instruction.
Use the new function to cleanup code in dead machine instruction
removal pass.

llvm-svn: 215580
2014-08-13 21:15:23 +00:00
Quentin Colombet abea99f65a [MachineDominatorTree] Provide a method to inform a MachineDominatorTree that a
critical edge has been split. The MachineDominatorTree will when lazy update the
underlying dominance properties when require.

** Context **

This is a follow-up of r215410.
Each time a critical edge is split this invalidates the dominator tree
information. Thus, subsequent queries of that interface will be slow until the
underlying information is actually recomputed (costly).

** Problem **

Prior to this patch, splitting a critical edge needed to query the dominator
tree to update the dominator information.
Therefore, splitting a bunch of critical edges will likely produce poor
performance as each query to the dominator tree will use the slow query path.
This happens a lot in passes like MachineSink and PHIElimination.

** Proposed Solution **

Splitting a critical edge is a local modification of the CFG. Moreover, as soon
as a critical edge is split, it is not critical anymore and thus cannot be a
candidate for critical edge splitting anymore. In other words, the predecessor
and successor of a basic block inserted on a critical edge cannot be inserted by
critical edge splitting.

Using these observations, we can pile up the splitting of critical edge and
apply then at once before updating the DT information.

The core of this patch moves the update of the MachineDominatorTree information
from MachineBasicBlock::SplitCriticalEdge to a lazy MachineDominatorTree.

** Performance **

Thanks to this patch, the motivating example compiles in 4- minutes instead of
6+ minutes. No test case added as the motivating example as nothing special but
being huge!

The binaries are strictly identical for all the llvm test-suite + SPECs with and
without this patch for both Os and O3.

Regarding compile time, I observed only noise, although on average I saw a
small improvement.

<rdar://problem/17894619>

llvm-svn: 215576
2014-08-13 21:00:07 +00:00
Jan Vesely 0cd3ec6cfa utils: Fix segfault in flattencfg
v2: continue iterating through the rest of the bb
    use for loop

v3: initialize FlattenCFG pass in ScalarOps
    add test

v4: split off initializing flattencfg to a separate patch
    add comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215574
2014-08-13 20:31:53 +00:00
Jan Vesely 5a956d49f7 Initialize FlattenCFG pass
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215573
2014-08-13 20:31:52 +00:00
Matt Arsenault 74ef277774 R600: Correctly set the src value offset for scalarized kernel args
This for some reason fixes v1i64 kernel arguments on pre-SI. This
currently breaks some other cases in the kernel-args.ll test for R600,
but I'm not particularly confident in the new output. VTX_READ_* are not
used for some of the scalarized cases, and the code reading from the
constant buffer doesn't make much sense to me.

llvm-svn: 215564
2014-08-13 18:14:11 +00:00
Benjamin Kramer a7c40ef022 Canonicalize header guards into a common format.
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)

Changes made by clang-tidy with minor tweaks.

llvm-svn: 215558
2014-08-13 16:26:38 +00:00
Andrea Di Biagio ace8e1e3d4 [DAGCombiner] Improved target independent vector shuffle combine rule.
This patch improves the existing algorithm in DAGCombiner that
attempts to fold shuffles according to rule:
  shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)

Before this change, there were cases where the DAGCombiner conservatively
avoided folding shuffles even if the resulting mask would have been legal.
That is because the algorithm wrongly assumed that commuting
an illegal shuffle mask would always produce an illegal mask.

With this change, we now correctly compute the commuted shuffle mask before
calling method 'isShuffleMaskLegal' on it.
On X86, this improves for example the codegen for the following function:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
  ret <4 x i32> %2
}

Before this change the X86 backend (-mcpu=corei7) generated
the following assembly code for function @test:
  shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
  movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
  movaps %xmm1, %xmm0

Now we produce:
  movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]

Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly
fold according to the above-mentioned rule.

llvm-svn: 215555
2014-08-13 16:09:40 +00:00
Toma Tabacu 88f05ce30e [mips] Refactor calls to setCanHaveModuleDir.
Summary:
Moved some calls to setCanHaveModuleDir to the MipsTargetStreamer base class and removed the resulting empty functions from the MipsTargetELFStreamer class.

Also fixed a missing call to setCanHaveModuleDir in MipsTargetELFStreamer::emitDirectiveSetMicroMips.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: tomatabacu

Differential Revision: http://reviews.llvm.org/D4781

llvm-svn: 215542
2014-08-13 12:48:12 +00:00
Chandler Carruth 0fb998110a [optnone] Make the optnone attribute effective at suppressing function
attribute and function argument attribute synthesizing and propagating.

As with the other uses of this attribute, the goal remains a best-effort
(no guarantees) attempt to not optimize the function or assume things
about the function when optimizing. This is particularly useful for
compiler testing, bisecting miscompiles, triaging things, etc. I was
hitting specific issues using optnone to isolate test code from a test
driver for my fuzz testing, and this is one step of fixing that.

llvm-svn: 215538
2014-08-13 10:49:33 +00:00
Aaron Ballman 1013b6b60c Silence a -Wparenthesis warning with these asserts. NFC.
llvm-svn: 215537
2014-08-13 10:49:07 +00:00
Robert Khasanov ed8829703f [SKX] Extended non-temporal load/store instructions for AVX512VL subsets.
Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction.
Added encoding and lowering tests.

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 215536
2014-08-13 10:46:00 +00:00
Daniel Sanders d97a634f12 Re-commit: [mips] Implement .ent, .end, .frame, .mask and .fmask.
Patch by Matheus Almeida and Toma Tabacu

The lld test failure on the previous attempt to commit was caused by the
addition of the .pdr section causing the offsets it was checking to change.
This has been fixed by removing the .ent/.end directives from that test since
they weren't really needed.

llvm-svn: 215535
2014-08-13 10:07:34 +00:00
Chandler Carruth 3f92ecc2a0 Revert r215415 which causse MSan to crash on a great deal of C++ code.
I've followed up on the original commit as well.

llvm-svn: 215532
2014-08-13 09:19:39 +00:00
Elena Demikhovsky 51bbd011c3 AVX-512: Fixed a bug in shufflevector lowering.
PALIGNR instruction does not exist in AVX-512F set.
Added a test.

llvm-svn: 215526
2014-08-13 07:58:43 +00:00
Karthik Bhat a4a4db91be InstCombine: Combine (xor (or %a, %b) (xor %a, %b)) to (add %a, %b)
Correctness proof of the transform using CVC3-

$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR(A | B, BVXOR(A,B) ) = A & B;

$ cvc3 t.cvc
Valid.

llvm-svn: 215524
2014-08-13 05:13:14 +00:00
Hal Finkel b216ca55af [NVPTX] Remove MemIntrinsicSDNode/MemSDNode duplicate checking
As of r214452, isa<MemSDNode> will return true for nodes for which
isa<MemIntrinsicSDNode> will return true (classof now respects the actual class
hierarchy). So we no longer need to check for both MemIntrinsicSDNode and
MemSDNode separately.

No functionality change intended.

llvm-svn: 215523
2014-08-13 04:59:51 +00:00
Chandler Carruth b7eda21bb0 [x86] Rewrite a core part of the new vector shuffle lowering to handle
one pesky test case correctly.

This test case caused the old code to infloop occilating between solving
the low-half and the high-half. The 'side balancing' part of
single-input v8 shuffle lowering didn't handle the one pattern which can
cause it to occilate. Fortunately the fuzz testing found this case.
Unfortuately it was *terrible* to handle. I'm really sorry for the
amount and density of the code here, I'd love suggestions on how to
simplify it. I feel like there *must* be a simpler form here, but after
a lot of days I've not found it. This is the only one I've found that
even works. I've added the one pesky test case along with some nice
comments explaining the core problem that we have to solve here.

So far this has survived approximately 32k test cases. More strenuous
fuzzing commencing.

llvm-svn: 215519
2014-08-13 01:25:45 +00:00
Hal Finkel 46ef7ce283 [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic
This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store
intrinsics. As with the construction of the MachineMemOperands for the
intrinsic calls used for unaligned load/store lowering, the only slight
complication is that we need to represent a larger memory range than the
loaded/stored value-type size (because the address is rounded down to an
aligned address, and we need to conservatively represent the entire possible
range of the actual access). This required adding an extra size field to
TargetLowering::IntrinsicInfo, and this was done in a way that required no
modifications to other targets (the size defaults to the store size of the
provided memory data type).

This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).

llvm-svn: 215512
2014-08-13 01:15:40 +00:00
Adrian Prantl 5e1fa85ec6 Remove a condition that can never be true, as wittnessed by the assert
above.

llvm-svn: 215477
2014-08-12 21:55:58 +00:00
Adam Nemet cee9d0a460 [AVX512] Handle valign masking intrinsic via C++ lowering
I think that this will scale better in most cases than adding a Pat<> for each
mapping from the intrinsic DAG to the intruction (i.e. rri, rrik, rrikz).  We
can just lower to the SDNode and have the resulting DAG be matches by the DAG
patterns.

Alternatively (long term), we could keep the Pat<>s but generate them via the
new AVX512_masking multiclass.  The difficulty is that in order to formulate
that we would have to concatenate DAGs.  Currently this is only supported if
the operators of the input DAGs are identical.

llvm-svn: 215473
2014-08-12 21:13:12 +00:00
Matt Arsenault 4815f09bbe Allwo bitcast + struct GEP transform to work with addrspacecast
llvm-svn: 215467
2014-08-12 19:46:13 +00:00
Jan Vesely e5ca27d716 R600: Use optimized 24bit path in udivrem
v2: drop enum keyword
    use correct extension mode
    don't bother computing the sign in unsinged case

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215462
2014-08-12 17:31:20 +00:00
Jan Vesely e377a6b59a R600: Remove unused code.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215461
2014-08-12 17:31:19 +00:00
Jan Vesely 4a33bc6206 R600: Use i24 optimized path for SREM
v2: add tests
    rename LowerSDIV24 to LowerSDIVREM24
    handle the rem part in this function

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215460
2014-08-12 17:31:17 +00:00
Quentin Colombet 8427df974e Fix a parentheses warning introduced in r215394.
llvm-svn: 215459
2014-08-12 17:11:26 +00:00
Duncan P. N. Exon Smith 09d84addb7 Don't upgrade global constructors when reading bitcode
An optional third field was added to `llvm.global_ctors` (and
`llvm.global_dtors`) in r209015.  Most of the code has been changed to
deal with both versions of the variables.  Users of the C API might
create either version, the helper functions in LLVM create the two-field
version, and clang now creates the three-field version.

However, the BitcodeReader was changed to always upgrade to the
three-field version.  This created an unnecessary inconsistency in the
IR before/after serializing to bitcode.

This commit resolves the inconsistency by making the third field truly
optional (and not upgrading in the bitcode reader).  Since `llvm-link`
was relying on this upgrade code, rather than deleting it I've moved it
into `ModuleLinker`, where it upgrades these arrays as necessary to
resolve inconsistencies between modules.

The ideal resolution would be to remove the 2-field version and make the
third field required.  I filed PR20506 to track that.

I changed `test/Bitcode/upgrade-global-ctors.ll` to a negative test and
duplicated the `llvm-link` check in `test/Linker/global_ctors.ll` to
check both upgrade directions.

Since I came across this as part of PR5680 (serializing use-list order),
I've also added the missing `verify-uselistorder` RUN line to
`test/Bitcode/metadata-2.ll`.

llvm-svn: 215457
2014-08-12 16:46:37 +00:00
Sanjay Patel 8687f320e0 fixed typos
llvm-svn: 215451
2014-08-12 16:00:06 +00:00
Toma Tabacu 9ea5582816 Reverted my "Testing commit access" commit.
llvm-svn: 215441
2014-08-12 12:41:44 +00:00
Toma Tabacu 2bf228eb47 Testing commit access.
llvm-svn: 215440
2014-08-12 12:29:40 +00:00
Eric Christopher ce40dbcbaa Have MachineRegisterInfo take and store the MachineFunction it
was created for rather than the TargetMachine since we only
needed the TM for the subtarget and we can get that from the
MF.

llvm-svn: 215432
2014-08-12 08:00:56 +00:00
Gerolf Hoflehner eb90500d06 [MachineCombiner] Fix for ICE bug 20598
The combiner ignored DBG nodes when checking
the uses of a virtual register.

It combined a sequence like
   %vreg1 = madd %vreg2, %vreg3,...
   DBG_VALUE (%vreg1 ...)
   %vreg4 = add %vreg1,...
to
  %vreg4 = madd %vreg2, %vreg3

leaving behind a dangling DBG_VALUE with
a definition. This triggered an assertion
in the MachineTraceMetrics.cpp module.

llvm-svn: 215431
2014-08-12 07:54:12 +00:00
Justin Bogner c0087f3611 IR: Print a newline when dumping Types
Type::dump() doesn't print a newline, which makes for a poor
experience in a debugger. This looks like it was an ommission
considering Value::dump() two lines above, so I've changed Type to add
a newline as well.

Of the two in-tree callers, one added a newline anyway, and I've
updated the other one to use Type::print instead.

llvm-svn: 215421
2014-08-12 03:24:59 +00:00
Peter Zotov b19f78f01d [LLVM-C] Expose User::getOperandUse as LLVMGetOperandUse.
Patch by Gabriel Radanne <drupyog@zoho.com>

llvm-svn: 215419
2014-08-12 02:55:40 +00:00
Adrian Prantl 9724b5c9a4 DebugLocEntry: Restore the comparison predicate from before the
refactoring in 215384. This way it can unique multiple entries describing
the same piece even if they don't have the exact same location.
(The same piece may get merged in and be added from OpenRanges).
There ought to be a more elegant solution for this, though.

llvm-svn: 215418
2014-08-12 01:07:53 +00:00
Reid Kleckner 3ae6e1528a msan: Handle musttail calls
First, avoid calling setTailCall(false) on musttail calls.  The funciton
prototypes should be "congruent", so the shadow layout should be exactly
the same.

Second, avoid inserting instrumentation after a musttail call to
propagate the return value shadow.  We don't need to propagate the
result of a tail call, it should already be in the right place.

Reviewed By: eugenis

Differential Revision: http://reviews.llvm.org/D4331

llvm-svn: 215415
2014-08-12 00:12:43 +00:00
Reid Kleckner e31acf239a Move helper for getting a terminating musttail call to BasicBlock
No functional change.  To be used in future commits that need to look
for such instructions.

Reviewed By: rafael

Differential Revision: http://reviews.llvm.org/D4504

llvm-svn: 215413
2014-08-12 00:05:15 +00:00
David Blaikie f73ae4fbf6 Revert "Partially revert r214761 that asserted that all concrete debug info variables had DIEs, due to a failure on Darwin."
I believe this was addressed by r215157 and r215227, so let's have
another go at the bots, etc.

This reverts commit r214880.

llvm-svn: 215412
2014-08-12 00:00:31 +00:00
Quentin Colombet 5cded89d12 [MachineSink] Improve the compile time by preserving the dominance information
as long as possible.

** Context **

Each time the dominance information is modified, the dominator tree analysis
switches in a slow query mode. After a few queries without any modification on
the dominator tree, it performs an expensive update of its internal structure to
provide fast queries again.

** Problem **

Prior to this patch, the MachineSink pass was splitting the critical edges on
demand while relying heavy on the dominator tree information. In some cases,
this leads to pathological behavior where:
- We end up in the slow query mode right after splitting an edge.
- We update the dominance information.
- We break the dominance information again, thus ending up in the slow query
  mode and so on.

** Proposed Solution **

To mitigate this effect, this patch postpones all the splitting of the edges at
the end of each iteration of the main loop.
The benefits are:
- The dominance information is valid for the life time of an iteration.
- This simplifies the code as we do not have to special treat instructions that
  are sunk on critical edges. Indeed, the related block will be available
  through the next iteration.

The downside is that when edges splitting is required, this incurs an additional
iteration of the main loop compared to the previous scheme.

** Performance **

Thanks to this patch, the motivating example compiles in 6+ minutes instead of
10+ minutes. No test case added as the motivating example as nothing special but
being huge!

I have measured only noise for both the compile time and the runtime on the llvm
test-suite + SPECs with Os and O3.

Note: The current implementation of MachineBasicBlock::SplitCriticalEdge also
uses the dominance information and therefore, hits this problem. A subsequent
patch will address that.

<rdar://problem/17894619>

llvm-svn: 215410
2014-08-11 23:52:01 +00:00
Michael J. Spencer 6b2f5b47d2 [x86] Fold extract_vector_elt of a load into the Load's address computation.
llvm-svn: 215409
2014-08-11 23:49:33 +00:00
Adrian Prantl 76502d8417 Add a couple of convenience accessors to DebugLocEntry::Value to further
simplify common usage patterns.

llvm-svn: 215407
2014-08-11 23:22:59 +00:00
NAKAMURA Takumi 5f79ee53ec R600/SIInstrInfo.cpp: Suppress an warning. [-Wunused-variable]
llvm-svn: 215406
2014-08-11 23:03:38 +00:00
Quentin Colombet 55fd3ba33e [ARM] Mark VMOVDRR with the RegSequence property and implement the related
target hook.

This patch teaches the compiler that:
dX = VMOVDRR rY, rZ
is the same as:
dX = REG_SEQUENCE rY, ssub_0, rZ, ssub_1

<rdar://problem/12702965>

llvm-svn: 215404
2014-08-11 22:56:22 +00:00
Adrian Prantl e8bde9f070 Make these DebugLocEntry::Value comparison operators friend functions
as suggested by dblaikie in a comment on r215384.

llvm-svn: 215403
2014-08-11 22:52:56 +00:00
Jim Grosbach 1eee3dfded Add missing closing namespace comment.
llvm-svn: 215402
2014-08-11 22:42:31 +00:00
Jim Grosbach 8e810ba411 AArch64: Tidy up a few comments.
Have the comments match the actual parameter names. Found via clang-tidy.

llvm-svn: 215401
2014-08-11 22:42:28 +00:00
David Majnemer ab07f00c64 InstCombine: Combine (add (and %a, %b) (or %a, %b)) to (add %a, %b)
What follows bellow is a correctness proof of the transform using CVC3.

$ < t.cvc
A, B : BITVECTOR(32);

QUERY BVPLUS(32, A & B, A | B) = BVPLUS(32, A, B);

$ cvc3 < t.cvc
Valid.

llvm-svn: 215400
2014-08-11 22:32:02 +00:00
Tom Stellard 155bbb7713 R600/SI: Add a ComplexPattern for selecting MUBUF _OFFSET variant
This saves us from having to copy a 64-bit 0 value into VGPRs for
BUFFER_* instruction which only have a 12-bit immediate offset.

llvm-svn: 215399
2014-08-11 22:18:17 +00:00
Tom Stellard ddea48673f R600/SI: Add an _OFFEN variant MUBUF_STORE_* and use it for scratch writes
llvm-svn: 215398
2014-08-11 22:18:14 +00:00
Tom Stellard 93ba12f163 R600/SI: Clear lds bit on MUBUF instructions used for private stores
This bit was left uninitialized, which was causing some random failures
of piglit tests.

NOTE: This is a candidate for the 3.5 branch.
llvm-svn: 215396
2014-08-11 22:18:09 +00:00
Quentin Colombet d533cdf26f Add isRegSequence property.
This patch adds a new property: isRegSequence and the related target hooks: 
TargetIntrInfo::getRegSequenceInputs and 
TargetInstrInfo::getRegSequenceLikeInputs to specify that a target specific
instruction is a (kind of) REG_SEQUENCE.

<rdar://problem/12702965>

llvm-svn: 215394
2014-08-11 22:17:14 +00:00
Quentin Colombet c64c175173 [AArch64] Fix registerAllocator assigns same register for base and wback in
pre/post-index load and store.

Patch by Steven Wu <stevenwu@apple.com>

llvm-svn: 215390
2014-08-11 21:39:53 +00:00
Adrian Prantl be4b5171d3 Debug info: Remove an obsolete constructor from DebugLocEntry.
llvm-svn: 215387
2014-08-11 21:06:03 +00:00
Adrian Prantl 1c6f2ec112 Debug info: Modify DebugLocEntry::addValue to take multiple values so it
only has to sort/unique values once per batch.

llvm-svn: 215386
2014-08-11 21:06:00 +00:00
Adrian Prantl caaf053c79 Debug info: Further simplify the implementation of buildLocationList by
getting rid of the redundant DIVariable in the OpenRanges pair.

llvm-svn: 215385
2014-08-11 21:05:57 +00:00
Adrian Prantl 293dd93f95 Debug Info: Move the sorting and uniqueing of pieces from emitLocPieces()
into buildLocationList(). By keeping the list of Values sorted,
DebugLocEntry::Merge can also merge multi-piece entries.

llvm-svn: 215384
2014-08-11 21:05:55 +00:00
Adrian Prantl e09ee3faaf Debug info: Refactor DebugLocEntry's Merge function to make
buildLocationLists easier to read.

The previous implementation conflated the merging of individual pieces
and the merging of entire DebugLocEntries.

By splitting this functionality into two separate functions the intention
of the code should be clearer.

llvm-svn: 215383
2014-08-11 20:59:28 +00:00
Saleem Abdulrasool 27c78bf131 ARM: try harder to detect non-IT eligible instructions
For many Thumb-1 register register instructions, setting the CPSR is not
permitted inside an IT block.  We would not correctly flag those instructions.
The previous change to identify this scenario was insufficient as it did not
actually catch all the instances.  The current list is formed by manual
inspection of the ARMv6M ARM.

The change to the Thumb2 IT block test is due to the fact that the new more
stringent checking of the MIs results in the If Conversion pass being prevented
from executing (since not all the instructions in the BB are predicable).  This
results in code gen changes.

Thanks to Tim Northover for pointing out that the previous patch was
insufficient and hinting that the use of the v6M ARM would be much easier to use
than the v7 or v8!

llvm-svn: 215382
2014-08-11 20:13:25 +00:00
Reid Kleckner 21aedc4b83 MC: Diagnose an unexpected token in COFF .section instead of asserting
This can easily arise when trying to assemble and ELF style .section
directive for a COFF object file.

llvm-svn: 215373
2014-08-11 18:34:43 +00:00
Sylvestre Ledru 469de19a09 Fix typos:
* libaries => libraries
* avaiable => available

llvm-svn: 215366
2014-08-11 18:04:46 +00:00
Rafael Espindola b16196a3e0 Fix use of uninitialized variable.
Fixes linking bitcode files that use the new style comdats for constructors
with ones that don't.

llvm-svn: 215364
2014-08-11 17:07:34 +00:00
Rafael Espindola 2ef3f299d5 Use an early return. NFC.
llvm-svn: 215363
2014-08-11 16:55:42 +00:00
Daniel Sanders b9bc75b625 Revert r215359 - [mips] Implement .ent, .end, .frame, .mask and .fmask assembler directives
It seems to cause an lld test (elf/Mips/hilo16-3.test) to fail. Reverted while we investigate.

llvm-svn: 215361
2014-08-11 16:10:19 +00:00
Daniel Sanders 21cf026893 [mips] Implement .ent, .end, .frame, .mask and .fmask assembler directives
Patch by Matheus Almeida and Toma Tabacu

Differential Revision: http://reviews.llvm.org/D4179

llvm-svn: 215359
2014-08-11 15:28:56 +00:00
Hans Wennborg 97a59ae589 PeepholeOptimizer: make parameter ref to SmallPtrSetImpl
This makes the function type independent of the in-line size
of LocalMIs.

llvm-svn: 215356
2014-08-11 13:52:46 +00:00
Hans Wennborg 5f5b8cc04f Make this SmallVector size a power of two as suggested by Chandler
llvm-svn: 215355
2014-08-11 13:47:57 +00:00
Tim Northover c532bbd647 AArch64: add support for dynamic-loader relocations
LLD needs them, and it's good to be able to print them properly when
our object dumpers encounter them.

Patch by Daniel Stewart.

llvm-svn: 215352
2014-08-11 10:10:27 +00:00
Elena Demikhovsky 40a77144a4 AVX-512: added a missing bitcast from v16f32 to v16i32
llvm-svn: 215351
2014-08-11 09:59:08 +00:00
Oliver Stannard 11790b2dac ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention
By default, LLVM uses the "C" calling convention for all runtime
library functions. The half-precision FP conversion functions use the
soft-float calling convention, and are needed for some targets which
use the hard-float convention by default, so must have their calling
convention explicitly set.

llvm-svn: 215348
2014-08-11 09:12:32 +00:00
Jiangning Liu dd6e12d71c In Machine CSE pass, the source register of a COPY machine instruction can
be propagated to all its users, and this propagation could increase the 
probability of finding common subexpressions. If the COPY has only one user,
the COPY itself can be removed.

llvm-svn: 215344
2014-08-11 05:17:19 +00:00
Jiangning Liu 40b04fd994 In LVI(Lazy Value Info), originally value on a BB can only be caculated once,
and the lattice will be updated to be a state other than "undefined". This
limiation could miss some opportunities of lowering "overdefined" to be an
even accurate value. So this patch ask the algorithm to try to lower the
lattice value again even if the value has been lowered to be "overdefined".

llvm-svn: 215343
2014-08-11 05:02:04 +00:00
Hans Wennborg 941a5709dc Re-commit "Increase the size of this SmallVector in PeepholeOptimizer." (r215340)
This time, also update the function that receives a reference to the SmallPtrSet as
a parameter.

llvm-svn: 215342
2014-08-11 02:50:43 +00:00
Hans Wennborg 98b3cf8594 Revert "Increase the size of this SmallVector in PeepholeOptimizer." (r215340)
That broke the build:

/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:729:46: error: non-const lvalue reference to type 'SmallPtrSet<[...], 8>' cannot bind to a value of unrelated type 'SmallPtrSet<[...], 16>'
        Changed |= optimizeExtInstr(MI, MBB, LocalMIs);
                                             ^~~~~~~~
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:265:49: note: passing argument to parameter 'LocalMIs' here
                 SmallPtrSet<MachineInstr*, 8> &LocalMIs) {
                                                ^

llvm-svn: 215341
2014-08-11 02:34:52 +00:00
Hans Wennborg 5b439f9c8a Increase the size of this SmallVector in PeepholeOptimizer.
During a Clang build, the median size of this was 9

llvm-svn: 215340
2014-08-11 02:21:34 +00:00
Hans Wennborg 4fa2fd12ca SubTargetFeature.cpp: it seems the size of this SmallVector should be 3
because some subtarget feature strings have three components.

llvm-svn: 215339
2014-08-11 02:21:32 +00:00
Hans Wennborg 12b996da88 Increase the size of SpillPlacement::BlockFrequencies.
This SmallVector's median size during a Clang build was 7.

llvm-svn: 215338
2014-08-11 02:21:30 +00:00
Hans Wennborg 21b20cb11a Increase the size of these SmallVectors in X86ISelLowering.cpp.
In a Clang bootstrap, their sizes were always 12, 16 and 16, respectively.

llvm-svn: 215336
2014-08-11 02:21:22 +00:00
Hans Wennborg 01416e66b6 Increase the size of this SmallVector in CloneNodeWithValues.
In a Clang bootstrap, the size of this vector was always 6.

llvm-svn: 215335
2014-08-11 02:21:19 +00:00
Hans Wennborg b8cff696cd Increase the size of DwarfAccelTable::TableHeaderData::Atoms.
During a Clang bootstrap, it seems this SmallVector always contains 3 elements.

llvm-svn: 215334
2014-08-11 02:18:15 +00:00
Petar Jovanovic 3a908a0bfc Add support for scalarizing cttz_zero_undef
Follow up to r214266. Add missing case in ScalarizeVectorResult() for
cttz_zero_undef.

Differential Revision: http://reviews.llvm.org/D4813

llvm-svn: 215330
2014-08-10 22:49:54 +00:00
Saleem Abdulrasool ed8885b402 ARM: correct isPredicable for MULS in ThHUMB mode
The ARM ARM states that CPSR may not be updated by a MUL in thumb mode.  Due to
an ordering of Thumb 2 Size Reduction and If Conversion, we would end up
generating a THUMB MULS inside an IT block.

The If Conversion pass uses the TTI isPredicable method to ensure that it can
transform a Basic Block.  However, because we only check for IT handling on
Thumb2 functions, we may miss some cases.  Even then, it only validates that the
CPSR is not *live* rather than it is not accessed.  This corrects the handling
for that particular case since the same restriction does not hold on the vast
majority of the instructions.

This does prevent the IfConversion optimization from kicking in in certain
cases, but generating correct code is more valuable.  Addresses PR20555.

llvm-svn: 215328
2014-08-10 22:20:37 +00:00
Joerg Sonnenberger bfef1dd694 @l and friends adjust their value depending the context used in.
For ori, they are unsigned, for addi, signed. Create a new target
expression type to handle this and evaluate Fixups accordingly.

llvm-svn: 215315
2014-08-10 12:41:50 +00:00
Joerg Sonnenberger b696d46045 Fix tabs.
llvm-svn: 215311
2014-08-10 11:37:07 +00:00
Joerg Sonnenberger 752b91bd82 If available, pass down the Fixup object to EvaluateAsRelocatable.
At least on PowerPC, the interpretation of certain modifiers depends on
the context they appear in.

llvm-svn: 215310
2014-08-10 11:35:12 +00:00
Saleem Abdulrasool 0e1b31c2fd ADT: remove MinGW32 and Cygwin OSType enum
Remove the MinGW32 and Cygwin types from the OSType enumeration.  These values
are represented via environments of Windows.  It is a source of confusion and
needlessly clutters the code.  The cost of doing this is that we must sink the
check for them into the normalization code path along with the spelling.

Addresses PR20592.

llvm-svn: 215303
2014-08-09 23:12:20 +00:00
Sanjay Patel b48d1cfa53 fixed typos
llvm-svn: 215299
2014-08-09 22:23:02 +00:00
Aaron Ballman 18ac683fe5 Resolving some type truncation warnings in MSVC (enum to bool in this case). No functional changes intended.
llvm-svn: 215293
2014-08-09 19:53:34 +00:00
Saleem Abdulrasool 1d0d00f4e4 MC: remove duplicated code
This removes the duplicate definition of GetXDataSection.  This function is
available as a static method and is identical to the previous implementation.
This just cleans up the unnecessary duplication.

llvm-svn: 215289
2014-08-09 17:21:36 +00:00
Saleem Abdulrasool 87a54c46b0 MC: cleanup includes
Cleanup Win64EH header inclusion.  NFC.

llvm-svn: 215288
2014-08-09 17:21:33 +00:00
Saleem Abdulrasool f158ca353f CodeGen: switch to a range based for loop
Use a range based for loop instead of manual iteration.  NFC.

llvm-svn: 215287
2014-08-09 17:21:29 +00:00
Joerg Sonnenberger 5aab5afcd2 Allow the third argument for the subi family to be an expression.
llvm-svn: 215286
2014-08-09 17:10:26 +00:00