Commit Graph

75490 Commits

Author SHA1 Message Date
Peter Collingbourne 06e8543cd0 Make the __morestack function available to the JIT memory manager under Linux.
This function's implementation lives in libgcc, a static library, so we need
to expose it explicitly, like the other such functions.

Differential Revision: http://reviews.llvm.org/D6788

llvm-svn: 224993
2014-12-30 18:06:52 +00:00
Colin LeMahieu 82fb8cba16 [Hexagon] Dropping old combine instructions without encodings.
llvm-svn: 224992
2014-12-30 17:53:54 +00:00
Colin LeMahieu 377ac65340 [Hexagon] Adding compare byte/halfword reg-reg/reg-imm forms. Adding compare to general register reg-imm form.
llvm-svn: 224991
2014-12-30 17:39:24 +00:00
Colin LeMahieu d7a56fd9ff [Hexagon] Updating constant extender def, adding alu-not instructions, compare to general register, and inverted compares.
llvm-svn: 224989
2014-12-30 15:44:17 +00:00
Elena Demikhovsky 84d1997b95 Some code improvements in Masked Load/Store.
No functional changes.

llvm-svn: 224986
2014-12-30 14:28:14 +00:00
Rafael Espindola b22d5aa49a Remove doesSectionRequireSymbols.
In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

llvm-svn: 224985
2014-12-30 13:13:27 +00:00
Philip Reames 4750dd18ea Add IRBuilder routines for gc.statepoints, gc.results, and gc.relocates
Nothing particularly interesting, just adding infrastructure for use by in tree users and out of tree users.

Note: These were extracted out of a working frontend, but they have not been well tested in isolation.

Differential Revision: http://reviews.llvm.org/D6807

llvm-svn: 224981
2014-12-30 05:55:58 +00:00
Philip Reames 9db26ffc9a Carry facts about nullness and undef across GC relocation
This change implements four basic optimizations:

    If a relocated value isn't used, it doesn't need to be relocated.
    If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.)
    If the value being relocated is undef, the relocation is meaningless.
    If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.)

I outlined other planned work in comments.

Differential Revision: http://reviews.llvm.org/D6600

llvm-svn: 224968
2014-12-29 23:27:30 +00:00
Philip Reames b35f46ce06 Refine the notion of MayThrow in LICM to include a header specific version
In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up.

This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loops header block - will not be lifted by subsequent LICM runs.

define void @nothrow_header(i64 %x, i64 %y, i1 %cond) {
; CHECK-LABEL: nothrow_header
; CHECK-LABEL: entry
; CHECK: %div = udiv i64 %x, %y
; CHECK-LABEL: loop
; CHECK: call void @use(i64 %div)
entry:
  br label %loop
loop: ; preds = %entry, %for.inc
  %div = udiv i64 %x, %y
  br i1 %cond, label %loop-if, label %exit
loop-if:
  call void @use(i64 %div)
  br label %loop
exit:
  ret void
}

The current patch really only helps with non-memory instructions (i.e. divs, etc..) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load.  The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invarant.load metadata of tbaa 'is constant memory' metadata.

Differential Revision: http://reviews.llvm.org/D6725

llvm-svn: 224965
2014-12-29 23:00:57 +00:00
Philip Reames 5ad26c353c Loading from null is valid outside of addrspace 0
This patches fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen.  This transform is perfectly legal in address space 0, but is not neccessarily legal in other address spaces.

We really should introduce a hook to control this property on a per target per address space basis.  We may be loosing valuable optimizations in some address spaces by being too conservative.

Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me.

llvm-svn: 224961
2014-12-29 22:46:21 +00:00
Colin LeMahieu 651b72095b [Hexagon] Adding allocframe, post-increment circular immediate stores, post-increment circular register stores, and bit reversed post-increment stores.
llvm-svn: 224957
2014-12-29 21:33:45 +00:00
Colin LeMahieu 488b6f7bbc [Hexagon] Fixing 224952 where an addressing mode update was missed.
llvm-svn: 224955
2014-12-29 21:18:02 +00:00
Alexey Samsonov 4c79845125 Remove unnecessary StringRef->std::string conversion.
llvm-svn: 224953
2014-12-29 20:59:02 +00:00
Colin LeMahieu bda31b42a0 [Hexagon] Adding post-increment register form stores and register-immediate form stores with tests.
llvm-svn: 224952
2014-12-29 20:44:51 +00:00
Colin LeMahieu 9a3cd3f58c [Hexagon] Replacing the remaining postincrement stores with versions that have encoding bits.
llvm-svn: 224951
2014-12-29 20:00:43 +00:00
Colin LeMahieu 3d34afb32d [Hexagon] Renaming old multiclass for removal. Adding post-increment store classes and instruction defs.
llvm-svn: 224949
2014-12-29 19:42:14 +00:00
Craig Topper 31d6d9a0fb [X86] Fix some cases where some 8-bit instructions were marked as being convertible to three address instructions, but aren't really.
llvm-svn: 224940
2014-12-29 16:25:26 +00:00
Craig Topper 874a1966ae [X86] Add the 0x82 instructions to the disassebmler. They are identical in functionality to the 0x80 opcode instructions, but are not valid in 64-bit mode.
llvm-svn: 224939
2014-12-29 16:25:23 +00:00
Craig Topper c51b7993b8 [x86] Refactor some tablegen instruction info classes slightly to prepare for another change. NFC.
llvm-svn: 224938
2014-12-29 16:25:22 +00:00
Craig Topper 56c8e05c24 [x86] Remove unused classes from tablegen instruction info.
llvm-svn: 224937
2014-12-29 16:25:19 +00:00
Rafael Espindola 44eae72c40 Add segmented stack support for DragonFlyBSD.
Patch by Michael Neumann.

llvm-svn: 224936
2014-12-29 15:47:28 +00:00
Rafael Espindola bed67f3adc Refactor duplicated code.
No intended functionality change.

llvm-svn: 224935
2014-12-29 15:18:31 +00:00
Keno Fischer fd22c6693b [X86][ISel] Fix a regression I introduced in r224884
The else case ResultReg was not checked for validity.
To my surprise, this case was not hit in any of the
existing test cases. This includes a new test cases
that tests this path.

Also drop the `target triple` declaration from the
original test as suggested by H.J. Lu, because
apparently with it the test won't be run on Linux

llvm-svn: 224901
2014-12-28 15:20:57 +00:00
Michael Kuperstein 683c3cde43 [X86] Add missing memory variants to AVX false dependency breaking
Adds missing memory instruction variants to AVX false dependency breaking handling. (SSE was handled in r224246)

Differential Revision: http://reviews.llvm.org/D6780

llvm-svn: 224900
2014-12-28 13:15:05 +00:00
Andrea Di Biagio 22ee3f63b9 [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.
If the control flow is modelling an if-statement where the only instruction in
the 'then' basic block (excluding the terminator) is a call to cttz/ctlz,
CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control
flow graph.

Example:
\code
entry:
  %cmp = icmp eq i64 %val, 0
  br i1 %cmp, label %end.bb, label %then.bb

then.bb:
  %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
  br label %end.bb

end.bb:
  %cond = phi i64 [ %c, %then.bb ], [ 64, %entry]
\code

In this example, basic block %then.bb is taken if value %val is not zero.
Also, the phi node in %end.bb would propagate the size-of in bits of %val
only if %val is equal to zero.

With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb
into basic block %entry only if cttz is cheap to speculate for the target.

Added two new hooks in TargetLowering.h to let targets customize the behavior
(i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The
two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'.
By default, both methods return 'false'.
On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has
LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.

Differential Revision: http://reviews.llvm.org/D6728

llvm-svn: 224899
2014-12-28 11:07:35 +00:00
Elena Demikhovsky 87700a734d Scalarizer for masked load and store intrinsics.
Masked vector intrinsics are a part of common LLVM IR, but they are really supported on AVX2 and AVX-512 targets. I added a code that translates masked intrinsic for all other targets. The masked vector intrinsic is converted to a chain of scalar operations inside conditional basic blocks.

http://reviews.llvm.org/D6436

llvm-svn: 224897
2014-12-28 08:54:45 +00:00
Craig Topper 6e3a582809 [x86] Prevent instruction selection of AVX512 cmp.ps/pd/ss/sd intrinsics with illegal immediates. Correctly this time. I did the wrong patterns the first time.
llvm-svn: 224891
2014-12-27 20:08:45 +00:00
David Majnemer d0bcef2040 PowerPC: CTR shouldn't fire if a TLS call is in the loop
Determining the address of a TLS variable results in a function call in
certain TLS models.  This means that a simple ICmpInst might actually
result in invalidating the CTR register.

In such cases, do not attempt to rely on the CTR register for loop
optimization purposes.

This fixes PR22034.

Differential Revision: http://reviews.llvm.org/D6786

llvm-svn: 224890
2014-12-27 19:45:38 +00:00
Aaron Ballman 4eb5c2e089 Fixing another -Wunused-variable warning, this time in release builds without asserts. NFC.
llvm-svn: 224889
2014-12-27 19:17:53 +00:00
Aaron Ballman b66d54c549 Removing a variable that is set but never used, to silence a -Wunused-but-set-variable warning; NFC.
llvm-svn: 224888
2014-12-27 19:01:19 +00:00
Craig Topper 1113fb343e [x86] Prevent instruction selection of AVX512 cmp.ps/pd/ss/sd intrinsics with illegal immediates. Forgot to do this when I did SSE/SSE2/AVX/AVX2.
llvm-svn: 224887
2014-12-27 18:51:06 +00:00
Craig Topper 53f75b9dc0 [x86] Assert on invalid immediates in the instruction printer for cmp.ps/pd/ss/sd instead of truncating the immediate. The assembly parser and instruction selection shouldn't generate invalid immediates.
llvm-svn: 224886
2014-12-27 18:11:00 +00:00
Craig Topper acc73445b7 [x86] Prevent llvm.x86.cmp.ps/pd/ss/sd from being selected with bad immediates. The frontend now checks this when the builtin is used. This will allow the instruction printer to not have to deal with invalid immediates on these instructions.
llvm-svn: 224885
2014-12-27 18:10:56 +00:00
Keno Fischer 8438b08663 [FastIsel][X86] Fix invalid register replacement for bool args
Summary:
Consider the following IR:

  %3 = load i8* undef
  %4 = trunc i8 %3 to i1
  %5 = call %jl_value_t.0* @foo(..., i1 %4, ...)
  ret %jl_value_t.0* %5

Bools (that are the result of direct truncs) are lowered as whatever
the argument to the trunc was and a "and 1", causing the part of the
MBB responsible for this argument to look something like this:

  %vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7

Later, when the load is lowered, it will insert

  %vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14

but remember to (at the end of isel) replace vreg7 by vreg15. Now for
the bug. In fast isel lowering, we mistakenly mark vreg8 as the result
of the load instead of the trunc. This adds a fixup to have
vreg8 replaced by whatever the result of the load is as well, so
we end up with

  %vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15

which is an SSA violation and causes problems later down the road.

This fixes PR21557.

Test Plan: Test test case from PR21557 is added to the test suite.

Reviewers: ributzka

Reviewed By: ributzka

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6245

llvm-svn: 224884
2014-12-27 13:10:15 +00:00
Colin LeMahieu 8233fb002d [Hexagon] Adding auto-incrementing loads with and without byte reversal.
llvm-svn: 224871
2014-12-26 21:09:25 +00:00
Colin LeMahieu 0a721cd4e1 [Hexagon] Adding locked loads.
llvm-svn: 224870
2014-12-26 20:42:27 +00:00
Colin LeMahieu ff370ed90e [Hexagon] Adding deallocframe and circular addressing loads.
llvm-svn: 224869
2014-12-26 20:30:58 +00:00
Colin LeMahieu c83cbbf6a1 [Hexagon] Adding remaining post-increment instruction variants. Removing unused classes.
llvm-svn: 224868
2014-12-26 19:31:46 +00:00
Colin LeMahieu fe9612e09d [Hexagon] Adding post-increment unsigned byte loads.
llvm-svn: 224867
2014-12-26 19:12:11 +00:00
Colin LeMahieu 96976a10a3 [Hexagon] Adding post-increment signed byte loads with tests.
llvm-svn: 224866
2014-12-26 18:57:13 +00:00
Craig Topper c4b12166f2 [X86] Add the debug registers DR8-DR15 so we can assemble and disassemble references to them.
llvm-svn: 224862
2014-12-26 18:20:05 +00:00
Craig Topper d5b39237a1 [X86] Don't fail disassembly if REX.R/REX.B is used on an MMX register. Similar fix to not fail to disassembler CR9-CR15 references.
llvm-svn: 224861
2014-12-26 18:19:44 +00:00
Timur Iskhodzhanov b6fa52f274 Band-aid fix for PR22032: don't emit DWARF debug info if AddressSanitizer is enabled on Windows
llvm-svn: 224860
2014-12-26 17:00:51 +00:00
David Majnemer b1296ec0fd InstCombine: Infer nuw for multiplies
A multiply cannot unsigned wrap if there are bitwidth, or more, leading
zero bits between the two operands.

llvm-svn: 224849
2014-12-26 09:50:35 +00:00
David Majnemer a55027f68a ValueTracking: Small cleanup in ComputeNumSignBits
Constant contains the isAllOnesValue and isNullValue predicates, not
ConstantInt.

llvm-svn: 224848
2014-12-26 09:20:17 +00:00
David Majnemer 54c2ca2539 InstCombe: Infer nsw for multiplies
We already utilize this logic for reducing overflow intrinsics, it makes
sense to reuse it for normal multiplies as well.

llvm-svn: 224847
2014-12-26 09:10:14 +00:00
Craig Topper ee9eef2fd8 Teach disassembler to handle illegal immediates on (v)cmpps/pd/ss/sd instructions. Instead of rejecting we'll just generate the _alt forms that don't try to alter the mnemonic. While I'm here, merge some common code in the Instruction printers for the condition code replacement and fix the mask on SSE to be 3-bits instead of 4.
llvm-svn: 224846
2014-12-26 06:36:28 +00:00
Craig Topper 2e44492b1d Use MCPhysReg for table of register encodings.
llvm-svn: 224845
2014-12-26 06:36:23 +00:00
Hal Finkel 0c505b08a5 [PowerPC] [FastISel] i1 constants must be zero extended
When materializing constant i1 values, they must be zero extended. We represent
i1 values as [0, 1], not [0, -1], in i32 registers. As it turns out, this code
path was dead for i1 values prior to r216006 (which is why this did not manifest in
miscompiles until recently).

Fixes -O0 self-hosting on PPC64/Linux.

llvm-svn: 224842
2014-12-25 23:08:25 +00:00
David Majnemer 25b383ac66 Silence GCC's -Wparentheses warning
No functionality change intended.

llvm-svn: 224833
2014-12-25 10:03:23 +00:00
Elena Demikhovsky fb81b93e17 Masked Load/Store - Changed the order of parameters in intrinsics.
No functional changes.
The documentation is coming.

llvm-svn: 224829
2014-12-25 07:49:20 +00:00
David Majnemer 2913eca4e2 CodeGen: Error on redefinitions instead of asserting
It's possible to have a prior definition of a symbol in module asm.
Raise an error instead of crashing.

llvm-svn: 224828
2014-12-24 23:06:55 +00:00
David Majnemer 8e92dfee20 CodeGen: Allow aliases to be overridden by variables
llvm-svn: 224827
2014-12-24 22:44:29 +00:00
Saleem Abdulrasool 747ec2dda3 MC: address some comments in deprecation checks
Bob Wilson pointed out the unnecessary checks that had been committed to the
instruction check predicates.  The check was meant to ensure that the check was
not accidentally applied to non-ARM instructions.  This is better served as an
assertion rather than a condition check.

llvm-svn: 224825
2014-12-24 18:40:42 +00:00
David Majnemer 58cb80c940 MC: Label definitions are permitted after .set directives
.set directives may be overridden by other .set directives as well as
label definitions.

This fixes PR22019.

llvm-svn: 224811
2014-12-24 10:27:50 +00:00
Saleem Abdulrasool 4d6ed7c778 IAS: correct debug line info for asm macros
Correct the line information generation for preprocessed assembly.  Although we
tracked the source information for the macro instantiation, we failed to account
for the fact that we were instantiating a macro, which is populated into a new
buffer and that the line information would be relative to the definition rather
than the actual instantiation location.  This could cause the line number
associated with the statement to be very high due to wrapping of the difference
calculated for the preprocessor line information emitted into the stream.
Properly calculate the line for the macro instantiation, referencing the line
where the macro is actually used as GCC/gas do.

The test case uses x86, though the same problem exists on any other target using
the LLVM IAS.

llvm-svn: 224810
2014-12-24 06:32:43 +00:00
Craig Topper b86338f7b2 [X86] Remove the single AdSize indicator and replace it with separate AdSize16/32/64 flags.
This removes a hardcoded list of instructions in the CodeEmitter. Eventually I intend to remove the predicates on the affected instructions since in any given mode two of them are valid if we supported addr32/addr16 prefixes in the assembler.

llvm-svn: 224809
2014-12-24 06:05:22 +00:00
David Majnemer 0fe246e079 MC: Don't emit .no_dead_strip on targets which don't support it
llvm-svn: 224808
2014-12-24 04:11:42 +00:00
Matthias Braun 51ca510094 LiveInterval: Remove accidentally committed debug code.
llvm-svn: 224807
2014-12-24 02:35:07 +00:00
Matthias Braun dbcca0dbb4 LiveInterval: Introduce createMainRangeFromSubranges().
This function constructs the main liverange by merging all subranges if
subregister liveness tracking is available. This should be slightly
faster to compute instead of performing the liveness calculation again
for the main range. More importantly it avoids cases where the main
liverange would cover positions where no subrange was live. These cases
happened for partial definitions where the actual defined part was dead
and only the undefined parts used later.

The register coalescing requires that every part covered by the main
live range has at least one subrange live.

I also expect this function to become usefull later for places where the
subranges are modified in a way that it is hard to correctly fix the
main liverange in the machine scheduler, we can simply reconstruct it
from subranges then.

llvm-svn: 224806
2014-12-24 02:11:51 +00:00
Matthias Braun 7030dda8d5 RegisterCoalescer: With subrange liveness there may be no RedefVNI for unused lanes.
llvm-svn: 224805
2014-12-24 02:11:48 +00:00
Matthias Braun 36768c684f LiveRangeEdit: Check for completely empy subranges after removing ValNos.
Completely empty subranges are not allowed and must be removed when
subreg liveness is enabled.

llvm-svn: 224804
2014-12-24 02:11:46 +00:00
Matthias Braun f603c88d13 LiveIntervalAnalysis: Fix performance bug that I introduced in r224663.
Without a reference the code did not remember when moving the iterators
of the subranges/registerunit ranges forward and instead would scan from
the beginning again at the next position.

llvm-svn: 224803
2014-12-24 02:11:43 +00:00
Chandler Carruth ffb7ce56a6 [SROA] Update the documentation and names for accessing the slices
within a partition of an alloca in SROA.

This reflects the fact that the organization of the slices isn't really
ideal for analysis, but is the naive way in which the slices are
available while we're processing them in the core partitioning
algorithm.

It is possible we could improve matters, and I've left a FIXME with
one of my ideas for how to do this, but it is a lot of work, the benefit
is somewhat minor, and it isn't clear that it would be strictly better.
=/ Not really satisfying, but I'm out of really good ideas.

This also improves one place where the debug logging failed to mark some
split partitions. Now we log in one place, slightly later, and with
accurate information about whether the slice is split by the partition
being rewritten.

llvm-svn: 224800
2014-12-24 01:48:09 +00:00
Adrian Prantl 3026a54aa2 Debug Info: In symmetry to DW_TAG_pointer_type, do not emit the byte size
of a DW_TAG_ptr_to_member_type.
This restores the behavior from before r224780-r224781.

llvm-svn: 224799
2014-12-24 01:17:51 +00:00
Chandler Carruth 5031bbe86a [SROA] Refactor the integer and vector promotion testing logic to
operate in terms of the new Partition class, and generally have a more
clear set of arguments. No functionality changed.

The most notable improvements here are consistently using the
terminology of 'partition' for a collection of slices that will be
rewritten together and 'slice' for a region of an alloca that is used by
a particular instruction.

This also makes it more clear that the split things are actually slices
as well, just ones that will be split by the proposed partition.

This doesn't yet address the confusing aspects of the partition's
interface where slices that will be split by the partition and start
prior to the partition are accesssed via Partition::splitSlices() while
the core range of slices exposed by a Partition includes both unsplit
slices and slices which will be split by the end, but started within the
offset range of the partition. This is particularly hard to address
because the algorithm which computes partitions quite literally doesn't
know which slices these will end up being until too late. I'm looking at
whether I can fix that or not, but I'm not optimistic. I'll update the
comments and/or names to further explain this either way. I've also
added one FIXME in this patch relating to this confusion so that I don't
forget about it.

llvm-svn: 224798
2014-12-24 01:05:14 +00:00
Colin LeMahieu e193e1c48b [Hexagon] Removing old classes.
llvm-svn: 224795
2014-12-24 00:43:00 +00:00
Kevin Enderby 48ef534b74 Add printing the LC_THREAD load commands with llvm-objdump’s -private-headers.
llvm-svn: 224792
2014-12-23 22:56:39 +00:00
Kostya Serebryany 9fdeb37bd3 [asan] change the coverage collection scheme so that we can easily emit coverage for the entire process as a single bit set, and if coverage_bitset=1 actually emit that bitset
llvm-svn: 224789
2014-12-23 22:32:17 +00:00
Hal Finkel fc096c98f3 [PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64
On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl
instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction
sequence is interpreted by the unwinding code in libgcc. To make sure these
occur as a pair, as with other pairings interpreted by the linker, fuse the two
instructions into one instruction (for code generation only).

In the future, we might wish to do this by emitting CFI directives instead,
but this solution is simpler, and mirrors what GCC does. Additional discussion
on this point is contained in the PR.

Fixes PR22015.

llvm-svn: 224788
2014-12-23 22:29:40 +00:00
Colin LeMahieu 947cd70413 [Hexagon] Adding doubleword load.
llvm-svn: 224787
2014-12-23 20:44:59 +00:00
Colin LeMahieu 026e88d317 [Hexagon] Reapplying 224775 load words.
llvm-svn: 224786
2014-12-23 20:02:16 +00:00
Jozef Kolek ab6d1cce3e [mips][microMIPS] Implement CACHE, PREF, SSNOP, EHB and PAUSE instructions
Differential Revision: http://reviews.llvm.org/D5204

llvm-svn: 224785
2014-12-23 19:55:34 +00:00
Colin LeMahieu 20be15718b Reverting 224775 until mayLoad flag is addressed.
llvm-svn: 224783
2014-12-23 19:22:59 +00:00
Rafael Espindola c6c58d5e71 Finish removing DestroySource.
Fixes pr21901.

llvm-svn: 224782
2014-12-23 19:16:45 +00:00
Adrian Prantl 48af2ef40f DIBuilder: Similar to createPointerType, make createMemberPointerType take
a size and alignment. Several assertions in DwarfDebug rely on all variable
types to report back a size, or to be derived from a type with a size.

Tested in CFE.

llvm-svn: 224780
2014-12-23 19:11:47 +00:00
Mehdi Amini d38920891e Always assert in DAGCombine and not only when -debug is enabled
Right now in DAG Combine check the validity of the returned type 
only when -debug is given on the command line. However usually 
the test cases in the validation does not use -debug. 
An Assert build should always check this.

llvm-svn: 224779
2014-12-23 18:59:02 +00:00
Colin LeMahieu 122aeaafea [Hexagon] Adding word loads.
llvm-svn: 224775
2014-12-23 18:06:56 +00:00
Colin LeMahieu 8e39cad934 [Hexagon] Adding signed halfword loads.
llvm-svn: 224774
2014-12-23 17:25:57 +00:00
Colin LeMahieu a9386d28a5 [Hexagon] Adding unsigned halfword load.
llvm-svn: 224772
2014-12-23 16:42:57 +00:00
Jozef Kolek 12c6982b3b [mips][microMIPS] Implement LWSP and SWSP instructions
Differential Revision: http://reviews.llvm.org/D6416

llvm-svn: 224771
2014-12-23 16:16:33 +00:00
Michael Kuperstein be8032c875 [ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits()
GlobalAlias handling used to be after GlobalValue handling, which meant it was, in practice, dead code. r220165 moved GlobalAlias handling to be before GlobalValue handling, but also moved it to be before the max depth check, causing an assert due to a recursion depth limit violation. 

This moves GlobalAlias handling forward to where it's safe, and changes the GlobalValue handling to only look at GlobalObjects.

Differential Revision: http://reviews.llvm.org/D6758

llvm-svn: 224765
2014-12-23 11:33:41 +00:00
Elena Demikhovsky fcea06acb5 AVX-512: Added FMA instructions, intrinsics an tests for KNL and SKX targets
by Asaf Badouh

http://reviews.llvm.org/D6456

llvm-svn: 224764
2014-12-23 10:30:39 +00:00
Hal Finkel 6e27c6d450 [PowerPC] Don't mark the return-address slot as immutable
It is tempting to mark the fixed stack slot used to store the return address as
immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within the
function, it is not completely immutable: it is written during the function
prologue. When using post-RA instruction scheduling, the prologue instructions
are available for scheduling, and we're not free to interchange the order of a
particular store in the prologue with loads from that stack location.

Fixes PR21976.

llvm-svn: 224761
2014-12-23 09:45:06 +00:00
Elena Demikhovsky 3121449f0b AVX-512: BLENDM - fixed encoding of the broadcast version
Added more intrinsics and encoding tests.

llvm-svn: 224760
2014-12-23 09:36:28 +00:00
Michael Kuperstein f4536ea6e8 [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
This partially fixes PR21943.

For AVX, we go from:

vmovq   (%rsi), %xmm0
vmovq   (%rdi), %xmm1
vpermilps       $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps       $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps       $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps       $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]

To the expected:

vmovq   (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq

Fixing this for AVX2 is still open.

Differential Revision: http://reviews.llvm.org/D6749

llvm-svn: 224759
2014-12-23 08:59:45 +00:00
Hal Finkel 04b16b51ec [PowerPC] Don't attempt a 64-bit pow2 division on PPC32
In r224033, in moving the signed power-of-2 division expansion into
BuildSDIVPow2, I accidentally made it possible to attempt the lowering for a
64-bit division on PPC32. This later asserts.

Fixes PR21928.

llvm-svn: 224758
2014-12-23 08:38:50 +00:00
Michael Liao 5313da3263 [SimplifyCFG] Revise common code sinking
- Fix the case where more than 1 common instructions derived from the same
  operand cannot be sunk. When a pair of value has more than 1 derived values
  in both branches, only 1 derived value could be sunk.
- Replace BB1 -> (BB2, PN) map with joint value map, i.e.
  map of (BB1, BB2) -> PN, which is more accurate to track common ops.

llvm-svn: 224757
2014-12-23 08:26:55 +00:00
Michael Kuperstein 0bf33ffde4 Remove a bad cast in CloneModule()
A cast that was introduced in r209007 was accidentally left in after the changes made to GlobalAlias rules in r210062. This crashes if the aliasee is a now-leggal ConstantExpr.

llvm-svn: 224756
2014-12-23 08:23:45 +00:00
Ahmed Bougacha 4553bff412 [ARM] Don't break alignment when combining base updates into load/stores.
r223862/r224203 tried to also combine base-updating load/stores.
There was a mistake there: the alignment was added as is as an operand to
the ARMISD::VLD/VST node.  However, the VLD/VST selection logic doesn't care
about less-than-standard alignment attributes.
For example, no matter the alignment of a v2i64 load (say 1), SelectVLD picks
VLD1q64 (because of the memory type).  But VLD1q64 ("vld1.64 {dXX, dYY}") is
8-aligned, per ARMARMv7a 3.2.1.
For the 1-aligned load, what we really want is VLD1q8.

This commit introduces bitcasts if necessary, and changes the vld/vst type to
one whose standard alignment matches the original load/store alignment.

Differential Revision: http://reviews.llvm.org/D6759

llvm-svn: 224754
2014-12-23 06:07:31 +00:00
Alexey Samsonov 2c55974da5 Fix UBSan bootstrap: replace shift of negative value with multiplication.
llvm-svn: 224752
2014-12-23 04:15:53 +00:00
Chandler Carruth c7d1e24b34 Revert r224739: Debug info: Teach SROA how to update debug info for
fragmented variables.

This caused codegen to start crashing when we built somewhat large
programs with debug info and optimizations. 'check-msan' hit in, and
I suspect a bootstrap would as well. I mailed a test case to the
review thread.

llvm-svn: 224750
2014-12-23 02:58:14 +00:00
Jim Grosbach 1bd0f3530e X86: Don't over-align combined loads.
When combining consecutive loads+inserts into a single vector load,
we should keep the alignment of the base load. Doing otherwise can, and does,
lead to using overly aligned instructions. In the included test case, for
example, using a 32-byte vmovaps on a 16-byte aligned value. Oops.

rdar://19190968

llvm-svn: 224746
2014-12-23 00:35:23 +00:00
Reid Kleckner ce0093344f Make musttail more robust for vector types on x86
Previously I tried to plug musttail into the existing vararg lowering
code. That turned out to be a mistake, because non-vararg calls use
significantly different register lowering, even on x86. For example, AVX
vectors are usually passed in registers to normal functions and memory
to vararg functions.  Now musttail uses a completely separate lowering.

Hopefully this can be used as the basis for non-x86 perfect forwarding.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6156

llvm-svn: 224745
2014-12-22 23:58:37 +00:00
David Blaikie ea37c1173e Remove dynamic allocation/indirection from GCOVBlocks owned by GCOVFunction
Since these are all created in the DenseMap before they are referenced,
there's no problem with pointer validity by the time it's required. This
removes another use of DeleteContainerSeconds/manual memory management
which I'm cleaning up from time to time.

llvm-svn: 224744
2014-12-22 23:12:42 +00:00
Adrian Prantl d9e64b6c08 Thumb1 frame lowering: Mark CFI instructions with the FrameSetup flag.
Followup to r224294:

ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224743
2014-12-22 23:09:14 +00:00
Chandler Carruth e2f66ceed9 [SROA] Lift the logic for traversing the alloca slices one partition at
a time into a partition iterator and a Partition class.

There is a lot of knock-on simplification that this enables, largely
stemming from having a Partition object to refer to in lots of helpers.
I've only done a minimal amount of that because enoguh stuff is changing
as-is in this commit.

This shouldn't change any observable behavior. I've worked hard to
preserve the *exact* traversal semantics which were originally present
even though some of them make no sense. I'll be changing some of this in
subsequent commits now that the logic is carefully factored into
a reusable place.

The primary motivation for this change is to break the rewriting into
phases in order to support more intelligent rewriting. For example, I'm
planning to change how split loads and stores are rewritten to remove
the significant overuse of integer bit packing in the resulting code and
allow more effective secondary splitting of aggregates. For any of this
to work, they have to share the exact traversal logic.

llvm-svn: 224742
2014-12-22 22:46:00 +00:00
Bruno Cardoso Lopes bad65c3b70 [LCSSA] Handle PHI insertion in disjoint loops
Take two disjoint Loops L1 and L2.

LoopSimplify fails to simplify some loops (e.g. when indirect branches
are involved). In such situations, it can happen that an exit for L1 is
the header of L2. Thus, when we create PHIs in one of such exits we are
also inserting PHIs in L2 header.

This could break LCSSA form for L2 because these inserted PHIs can also
have uses in L2 exits, which are never handled in the current
implementation. Provide a fix for this corner case and test that we
don't assert/crash on that.

Differential Revision: http://reviews.llvm.org/D6624

rdar://problem/19166231

llvm-svn: 224740
2014-12-22 22:35:46 +00:00
Adrian Prantl a47ace5901 Debug info: Teach SROA how to update debug info for fragmented variables.
This allows us to generate debug info for extremely advanced code such as

  typedef struct { long int a; int b;} S;

  int foo(S s) {
    return s.b;
  }

which at -O1 on x86_64 is codegen'd into

  define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
    ret i32 %s.coerce1, !dbg !24
  }

with this patch we emit the following debug info for this

  TAG_formal_parameter [3]
    AT_location( 0x00000000
                 0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
                 0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
                 AT_name( "s" )
                 AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )

Thanks to chandlerc, dblaikie, and echristo for their feedback on all
previous iterations of this patch!

llvm-svn: 224739
2014-12-22 22:26:00 +00:00
Reid Kleckner 5aabc06c16 Fix Windows unwind info for functions in sections other than .text
Previously we assumed the section name had the form .text$foo, which is
what we used to do for inline functions. If the dollar wasn't present,
we'd put unwind data in the .pdata and .xdata sections for the main
.text section, which is incorrect.

Fixes PR22001.

llvm-svn: 224738
2014-12-22 22:10:08 +00:00
Colin LeMahieu 4b1eac4dda [Hexagon] Adding memb instruction. Fixing whitespace in test from 224730.
llvm-svn: 224735
2014-12-22 21:40:43 +00:00
Colin LeMahieu af1e5de141 [Hexagon] Adding classes and load unsigned byte instruction, updating usages.
llvm-svn: 224730
2014-12-22 21:20:03 +00:00
Bruno Cardoso Lopes 811c173523 [x86] Add vector @llvm.ctpop intrinsic custom lowering
Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and individual
calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used
for the scalar calls.

Local haswell measurements show that we can improve vector @llvm.ctpop.vXiY
expansion in some cases by using a using a vector parallel bit twiddling
approach, based on:

v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = ((v + (v >> 4) & 0xF0F0F0F)
v = v + (v >> 8)
v = v + (v >> 16)
v = v & 0x0000003F
(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)

When scalar ctpop isn't supported, the approach above performs better for
v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop
is supported, this approach performs ~2x better for v8i32.

Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop
support with -march=core-avx2.

== [x86_64h - new]
v8i32: 0.661685
v4i32: 0.514678
v4i64: 0.652009
v2i64: 0.324289
== [x86_64h - old]
v8i32: 1.29578
v4i32: 0.528807
v4i64: 0.65981
v2i64: 0.330707

== [x86_64 - new]
v8i32: 1.003
v4i32: 0.656273
v4i64: 1.11711
v2i64: 0.754064
== [x86_64 - old]
v8i32: 2.34886
v4i32: 1.72053
v4i64: 1.41086
v2i64: 1.0244

More work for other vector types will come next.

llvm-svn: 224725
2014-12-22 19:45:43 +00:00
Juergen Ributzka a3ca1b8823 Remove unused header. NFC.
llvm-svn: 224722
2014-12-22 19:09:15 +00:00
Peter Zotov c433cd7bfa [C API] Expose LLVMGetGlobalValueAddress and LLVMGetFunctionAddress.
Patch by Ramkumar Ramachandra <artagnon@gmail.com>

llvm-svn: 224720
2014-12-22 18:53:11 +00:00
Quentin Colombet 84f89ccd45 [CodeGenPrepare] Handle properly the promotion of operands when this does not
generate instructions.

Fixes PR21978.
Related to <rdar://problem/18310086>

llvm-svn: 224717
2014-12-22 18:11:52 +00:00
Elena Demikhovsky 949b0d46bf AVX-512: Added all forms of BLENDM instructions,
intrinsics, encoding tests for AVX-512F and skx instructions.

llvm-svn: 224707
2014-12-22 13:52:48 +00:00
Karthik Bhat bf662901c1 Lower multiply-negate operation to mneg on AArch64
This patch pattern matches code such as-
neg	 w8, w8
mul	 w8, w9, w8
to
mneg	 w8, w8, w9

Review: http://reviews.llvm.org/D6754
llvm-svn: 224706
2014-12-22 13:38:58 +00:00
Rafael Espindola 366e5c1bf1 The leak detector is dead, long live asan and valgrind.
In resent times asan and valgrind have found way more memory management bugs
in llvm than the special purpose leak detector.

llvm-svn: 224703
2014-12-22 13:00:36 +00:00
Saleem Abdulrasool 90c224a143 CodeGen: minor style tweaks to SSP
Clean up some style related things in the StackProtector CodeGen.  NFC.

llvm-svn: 224693
2014-12-21 21:52:38 +00:00
Craig Topper 23fd69560b [X86] Add hasSideEffects = 0 to CALLpcrel16. This matches what is inferred from patterns for the 32-bit version.
llvm-svn: 224692
2014-12-21 20:05:06 +00:00
Matt Arsenault 22b4c256e1 Enable (sext x) == C --> x == (trunc C) combine
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.

llvm-svn: 224691
2014-12-21 16:48:42 +00:00
Craig Topper 01dcd8a31d [X86] Swap operand order in Intel syntax on a bunch of aliases.
llvm-svn: 224687
2014-12-20 23:05:59 +00:00
Craig Topper 643a11268f [X86] Swap operand order of imul aliases in Intel syntax. Also disable printing of the alias instead of the real instruction.
llvm-svn: 224686
2014-12-20 23:05:57 +00:00
Craig Topper 4b8a47050f [X86] Remove '*' from asm strings in far call/jump aliases for Intel syntax.
llvm-svn: 224685
2014-12-20 23:05:55 +00:00
Craig Topper 3564080896 [X86] Don't swap the order of segment and offset in immediate form of far call/jump in Intel syntax.
llvm-svn: 224684
2014-12-20 23:05:52 +00:00
Saleem Abdulrasool 57b5fe57e3 CodeGen: constify and use range loop for SSP
Use range-based for loop and constify the iterators.  NFC.

llvm-svn: 224683
2014-12-20 21:37:51 +00:00
Saleem Abdulrasool 0fa832002c ARM: further improve deprecated diagnosis (LDM)
The ARM ARM states:
  LDM/LDMIA/LDMFD:
    The SP can be in the list. However, ARM deprecates using these instructions
    with SP in the list.

    ARM deprecates using these instructions with both the LR and the PC in the
    list.

  LDMDA/LDMFA/LDMDB/LDMEA/LDMIB/LDMED:
    The SP can be in the list. However, instructions that include the SP in the
    list are deprecated.

    Instructions that include both the LR and the PC in the list are deprecated.

  POP:
    The SP can only be in the list before ARMv7. ARM deprecates any use of ARM
    instructions that include the SP, and the value of the SP after such an
    instruction is UNKNOWN.

    ARM deprecates the use of this instruction with both the LR and the PC in
    the list.

Attempt to diagnose use of deprecated forms of these instructions.  This mirrors
the previous changes to diagnose use of the deprecated forms of STM in ARM mode.

llvm-svn: 224682
2014-12-20 20:25:36 +00:00
Craig Topper 35545fa20a [X86] Immediate forms of far call/jump are not valid in x86-64.
llvm-svn: 224678
2014-12-20 07:43:27 +00:00
David Majnemer b0362e4ee6 InstCombine: Squash an icmp+select into bitwise arithmetic
(X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
(X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX

This fixes PR21993.

llvm-svn: 224676
2014-12-20 04:45:35 +00:00
David Majnemer 147f8586be InstSimplify: Don't bother if getScalarSizeInBits returns zero
getScalarSizeInBits returns zero when the comparison operands are not
integral.  No functionality change intended.

llvm-svn: 224675
2014-12-20 04:45:33 +00:00
David Majnemer 7bd7144e44 Simplify the code
No functionality change intended.

llvm-svn: 224673
2014-12-20 03:29:59 +00:00
David Majnemer 0b6a0b0257 InstSimplify: Optimize away pointless comparisons
(X & INT_MIN) ? X & INT_MAX : X  into  X & INT_MAX
(X & INT_MIN) ? X : X & INT_MAX  into  X
(X & INT_MIN) ? X | INT_MIN : X  into  X
(X & INT_MIN) ? X : X | INT_MIN  into  X | INT_MIN

llvm-svn: 224669
2014-12-20 03:04:38 +00:00
Chandler Carruth 113dc64c67 [SROA] Run clang-format over the entire SROA pass as I wrote it before
much of the glory of clang-format, and now any time I touch it I risk
introducing formatting changes as part of a functional commit.

Also, clang-format is *way* better at formatting my code than I am.
Most of this is a huge improvement although I reverted a couple of
places where I hit a clang-format bug with lambdas that has been filed
but not (fully) fixed.

llvm-svn: 224666
2014-12-20 02:39:18 +00:00
Matthias Braun 714c494ca1 LiveIntervalAnalysis: No kill flags for partially undefined uses.
We must not add kill flags when reading a vreg with some undefined
subregisters, if subreg liveness tracking is enabled.  This is because
the register allocator may reuse these undefined subregisters for other
values which are not killed.

llvm-svn: 224664
2014-12-20 01:54:50 +00:00
Matthias Braun 7f8dece1d7 LiveIntervalAnalysis: cleanup addKills(), NFC
- Use more const modifiers
- Use references for things that can't be nullptr
- Improve some variable names

llvm-svn: 224663
2014-12-20 01:54:48 +00:00
Eric Christopher 3ab98895bc Remove unused variable and initialization.
llvm-svn: 224655
2014-12-20 00:07:09 +00:00
Eric Christopher 8985ba912f Remove unused variable, initializer, and accessor.
llvm-svn: 224650
2014-12-19 23:46:53 +00:00
Matt Arsenault 013ddaf18c R600: Remove outdated comment
llvm-svn: 224648
2014-12-19 23:29:13 +00:00
Elena Demikhovsky fb73ca516b Masked load and store codegen - fixed 128-bit vectors
The codegen failed on 128-bit types on AVX2.
I added patterns and in td files and tests.

llvm-svn: 224647
2014-12-19 23:27:57 +00:00
Matt Arsenault dc10307524 R600/SI: Only form min/max with 1 use.
If the condition is used for something else, this increases
the number of instructions.

llvm-svn: 224646
2014-12-19 23:15:30 +00:00
Reid Kleckner f2acbbaf22 EH: Sink computation of local PadMap variable into function that uses it
No functionality change.

llvm-svn: 224635
2014-12-19 22:30:08 +00:00
Kevin Enderby 52e4ce4a53 Add printing the LC_ROUTINES load commands with llvm-objdump’s -private-headers.
llvm-svn: 224627
2014-12-19 22:25:22 +00:00
Reid Kleckner 93acac6cfc Add the ExceptionHandling::MSVC enumeration
It is intended to be used for a family of personality functions that
have similar IR preparation requirements. Typically when interoperating
with MSVC personality functions, bits of functionality need to be
outlined from the main function into helper functions. There is also
usually more than one landing pad per invoke, which does not match the
LLVM IR landingpad representation.

None of this is implemented yet. This change just adds a new enum that
is active for *-windows-msvc and delegates to the EH removal preparation
pass.  No functionality change for other targets.

llvm-svn: 224625
2014-12-19 22:19:48 +00:00
Sanjay Patel 1da5f1645b Model sqrtss as a binary operation with one source operand tied to the destination (PR14221)
This is a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).
That patch started to fix PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ), but it was not completed. 

Differential Revision: http://reviews.llvm.org/D6330

llvm-svn: 224624
2014-12-19 22:16:28 +00:00
Tom Stellard 5352f35a89 R600/SI: isLegalOperand() shouldn't check constant bus for SALU instructions
The constant bus restrictions only apply to VALU instructions.  This
enables SIFoldOperands to fold immediates into SALU instructions.

llvm-svn: 224623
2014-12-19 22:15:37 +00:00
Tom Stellard c3d7eeb6e5 R600/SI: Make sure non-inline constants aren't folded into mubuf soffset operand
mubuf instructions now define the soffset field using the SCSrc_32
register class which indicates that only SGPRs and inline constants
are allowed.

llvm-svn: 224622
2014-12-19 22:15:30 +00:00
Yaron Keren e8270225a2 Remove isSubroutineType test for isCompositeType, getTag() is enough.
llvm-svn: 224621
2014-12-19 22:15:09 +00:00
Kevin Enderby 186eac3c0c Add printing the LC_SUB_CLIENT load command with llvm-objdump’s -private-headers.
llvm-svn: 224616
2014-12-19 21:06:24 +00:00
Colin LeMahieu 0f850bde0e [Hexagon] Removing old variants of instructions and updating references.
llvm-svn: 224612
2014-12-19 20:29:29 +00:00
Sanjay Patel 0428a5786e merge consecutive stores of extracted vector elements
Add a path to DAGCombiner::MergeConsecutiveStores() 
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

For the new test case, codegen improves from:

   vmovss  %xmm0, (%rdi)
   vextractps      $1, %xmm0, 4(%rdi)
   vextractps      $2, %xmm0, 8(%rdi)
   vextractps      $3, %xmm0, 12(%rdi)
   vextractf128    $1, %ymm0, %xmm0
   vmovss  %xmm0, 16(%rdi)
   vextractps      $1, %xmm0, 20(%rdi)
   vextractps      $2, %xmm0, 24(%rdi)
   vextractps      $3, %xmm0, 28(%rdi)
   vzeroupper
   retq

To:

   vmovups	%ymm0, (%rdi)
   vzeroupper
   retq

Patch reviewed by Nadav Rotem.

Differential Revision: http://reviews.llvm.org/D6698

llvm-svn: 224611
2014-12-19 20:23:41 +00:00
Colin LeMahieu 38ce8cd2e2 [Hexagon] Adding bit extraction and table indexing instructions.
llvm-svn: 224610
2014-12-19 20:01:08 +00:00
Colin LeMahieu 3c7f664d5a [Hexagon] Adding bit insertion instructions.
llvm-svn: 224609
2014-12-19 19:54:38 +00:00
Colin LeMahieu d63ef93b4b [Hexagon] Adding more xtype shift instructions.
llvm-svn: 224608
2014-12-19 19:51:35 +00:00
Kevin Enderby 36c8d3ae63 Add printing the LC_SUB_LIBRARY load command with llvm-objdump’s -private-headers.
llvm-svn: 224607
2014-12-19 19:48:16 +00:00
Colin LeMahieu cc09d1ccc5 [Hexagon] Adding xtype shift instructions.
llvm-svn: 224604
2014-12-19 19:34:50 +00:00
Colin LeMahieu f3db884efb [Hexagon] Adding transfers to and from control registers.
llvm-svn: 224599
2014-12-19 19:06:32 +00:00
Colin LeMahieu 402f772b82 [Hexagon] Adding doubleregs for control registers. Renaming control register class.
llvm-svn: 224598
2014-12-19 18:56:10 +00:00
Frederic Riss b88adbdeb0 [DebugInfo] Move all DWARF headers to the public include directory.
dsymutil needs access to DWARF specific inforamtion, the small DIContext
wrapper isn't sufficient. Other DWARF consumers might want to use it too
(I'm looking at you lldb).

Differential Revision: http://reviews.llvm.org/D6694

llvm-svn: 224594
2014-12-19 18:26:33 +00:00
Tilmann Scheller be98f3c4ec [BBVectorize] Remove two more redundant assignments.
Found by the Clang static analyzer.

llvm-svn: 224590
2014-12-19 17:21:38 +00:00
Tilmann Scheller 945ce0ae00 [BBVectorize] Remove redundant assignment.
Found by the Clang static analyzer.

llvm-svn: 224589
2014-12-19 17:13:12 +00:00
Bruno Cardoso Lopes f6cf8ad4e5 Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also, fix code to also return the modified switch when only
the truncation is performed.

This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224588
2014-12-19 17:12:35 +00:00
Tilmann Scheller b811030b47 [LoopVectorize] Remove redundant assignment.
Found by the Clang static analyzer.

llvm-svn: 224587
2014-12-19 17:02:31 +00:00
Tilmann Scheller e24bb41bad [ARM] Remove dead assignment.
Found by the Clang static analyzer.

llvm-svn: 224586
2014-12-19 16:57:33 +00:00
Sanjay Patel ea3c802887 use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. Eg, ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr

Differential Revision: http://reviews.llvm.org/D6723

llvm-svn: 224583
2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes 3be15b2fa6 Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:

The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

llvm-svn: 224576
2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes c9005f2f2b [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224574
2014-12-19 14:23:15 +00:00
Tilmann Scheller 3f7f3341b9 Remove redundant assignment.
Found with the Clang static analyzer.

llvm-svn: 224570
2014-12-19 11:29:34 +00:00
Duncan P. N. Exon Smith 46d7af5729 Rename MapValue(Metadata*) to MapMetadata()
Instead of reusing the name `MapValue()` when mapping `Metadata`, use
`MapMetadata()`.  The old name doesn't make much sense after the
`Metadata`/`Value` split.

llvm-svn: 224566
2014-12-19 06:06:18 +00:00
Juergen Ributzka 4d7f70d47e [Object] Don't crash on empty export lists.
Summary: This fixes the exports iterator if the export list is empty.

Reviewers: Bigcheese, kledzik

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6732

llvm-svn: 224563
2014-12-19 02:31:01 +00:00
Matthias Braun aeb50b3805 RegisterCoalescer: rewrite eliminateUndefCopy().
This also fixes problems with undef copies of subregisters. I can't
attach a testcase for that as none of the targets in trunk has
subregister liveness tracking enabled.

llvm-svn: 224560
2014-12-19 01:39:46 +00:00
Colin LeMahieu 5ccbb1298b [Hexagon] Adding loop0/1 sp0/1/2loop0 instructions.
llvm-svn: 224556
2014-12-19 00:06:53 +00:00
Adrian Prantl cf44e7870b Explain why LLVM is emitting a DW_AT_containing_type inside of a class.
llvm-svn: 224555
2014-12-19 00:01:20 +00:00
David Majnemer 824e011ad7 ConstantFold: Shifting undef by zero results in undef
llvm-svn: 224553
2014-12-18 23:54:43 +00:00
Colin LeMahieu 174476ed96 Reverting 224550, was not ready for commit.
llvm-svn: 224552
2014-12-18 23:36:15 +00:00
Colin LeMahieu 9000481cda [Hexagon] Adding loop0/1 sp0/1/2loop0 instructions.
llvm-svn: 224550
2014-12-18 23:27:51 +00:00
Kevin Enderby a2bd8d98a1 Add printing the LC_SUB_UMBRELLA load command with llvm-objdump’s -private-headers.
llvm-svn: 224548
2014-12-18 23:13:26 +00:00
Roman Divacky a93d002321 Instead of explicitely comparing both lowercase and uppercase variants.
.lower() the Name and compare only the lowecase. Removing 81 compares/lines of
code. This changes the accepted string to be mixed lower/upper case but it
should be ok.

Discussed with Jim Grosbach.

llvm-svn: 224547
2014-12-18 23:12:34 +00:00
Sanjay Patel c242dbb3b6 fix formatting; NFC
llvm-svn: 224542
2014-12-18 21:11:09 +00:00
Matthias Braun 15abf3743c LiveIntervalAnalysis: Cleanup computeDeadValues
- This also fixes a bug introduced in r223880 where values were not
  correctly marked as Dead anymore.
- Cleanup computeDeadValues(): split up SubRange code variant, simplify
  arguments.

llvm-svn: 224538
2014-12-18 19:58:52 +00:00
Kevin Enderby b4b7931748 Add printing the LC_SUB_FRAMEWORK load command with llvm-objdump’s -private-headers.
llvm-svn: 224534
2014-12-18 19:24:35 +00:00
Juergen Ributzka 84ba342ba9 Add missing implementation of 'sys::path::is_other' to the support library.
The header claims that this function exists, but the linker wasn't too happy
about it not being in the library.

llvm-svn: 224527
2014-12-18 18:19:47 +00:00
Jozef Kolek 2f27d571c8 [mips][microMIPS] Fix bugs related to atomic SC/LL instructions
Fix bugs related to atomic microMIPS SC/LL instructions: While expanding atomic
operations the mips32r2 encoding was emitted instead of microMIPS.

Differential Revision: http://reviews.llvm.org/D6659

llvm-svn: 224524
2014-12-18 16:39:29 +00:00
Saleem Abdulrasool 0b5a8520ac ARM: fix an off-by-one in the register list access
Fix an off-by-one access introduced in 224502 for push.w and pop.w with single
register operands.  Add test cases for both scenarios.

Thanks to Asiri Rathnayake for pointing out the failure!

llvm-svn: 224521
2014-12-18 16:16:53 +00:00
Robert Khasanov 79fb7292d7 [AVX512] Enable FP arithmetic lowering for AVX512VL subsets.
Added RegOp2MemOpTable4 to transform 4th operand from register to memory in merge-masked versions of instructions. 
Added lowering tests.

llvm-svn: 224516
2014-12-18 12:28:22 +00:00
Viktor Kutuzov b4ffb5d5e9 [Msan] Generalize instrumentation code to support FreeBSD mapping
Differential Revision: http://reviews.llvm.org/D6666

llvm-svn: 224514
2014-12-18 12:12:59 +00:00
Yaron Keren 06d6930b68 Fix Visual C++ error "'llvm::make_unique' : ambiguous call to overloaded function".
llvm-svn: 224506
2014-12-18 10:03:35 +00:00
Saleem Abdulrasool 3a23917d48 ARM: improve instruction validation for thumb mode
The ARM Architecture Reference Manual states the following:
  LDM{,IA,DB}:
    The SP cannot be in the list.
    The PC can be in the list.
    If the PC is in the list:
      • the LR must not be in the list
      • the instruction must be either outside any IT block, or the last
        instruction in an IT block.
  POP:
    The PC can be in the list.
    If the PC is in the list:
      • the LR must not be in the list
      • the instruction must be either outside any IT block, or the last
        instruction in an IT block.
  PUSH:
    The SP and PC can be in the list in ARM instructions, but not in Thumb
    instructions.
  STM:{,IA,DB}:
    The SP and PC can be in the list in ARM instructions, but not in Thumb
    instructions.

llvm-svn: 224502
2014-12-18 05:24:38 +00:00
Chandler Carruth 68ea415d04 [SROA] Cleanup - remove the use of std::mem_fun_ref nonsense and use
a lambda now that we have them.

llvm-svn: 224500
2014-12-18 05:19:47 +00:00
Rafael Espindola 7d727b5f11 Modernize the getStreamedBitcodeModule interface a bit. NFC.
llvm-svn: 224499
2014-12-18 05:08:43 +00:00
Craig Topper f7df7221d1 [PowerPC] Use MCPhysReg for tables of registers. Const-correct the tables. Only put the anonymous namespace around classes. NFC.
llvm-svn: 224498
2014-12-18 05:02:14 +00:00
Craig Topper 5645f6b45b [X86] Use correct opsize on indirect call and jump aliases.
llvm-svn: 224497
2014-12-18 05:02:12 +00:00
Craig Topper 9480732be2 [X86] Don't use PS prefix on LDMXCSR/STMXCSR.
Near as I can tell prefixes are ignored on these instructions except for a comment in the Intel docs about 0xf3. Binutils disassembler seems to ignore prefixes on these instructions. Our disassembler still doesn't distinguish PS and "no prefix" well enough for this to make a functional change, but it helps with experiments I'm doing on a potential new disassembler table builder.

llvm-svn: 224496
2014-12-18 05:02:10 +00:00
Craig Topper 2e2aee0cd6 [X86] Remove unnecessary 'In64BitMode' predicate for instructions that already indicate use of REX.W.
llvm-svn: 224495
2014-12-18 05:02:08 +00:00
Justin Hibbits 88030b94c6 Add a corresponding '@LOCAL' parse to match r224415.
Pointed out by Jim Grosbach.

llvm-svn: 224494
2014-12-18 03:06:37 +00:00
Eric Christopher 661f2d1ca1 Add a new string member to the TargetOptions struct for the name
of the abi we should be using. For targets that don't use the
option there's no change, otherwise this allows external users
to set the ABI via string and avoid some of the -backend-option
pain in clang.

Use this option to move the ABI for the ARM port from the
Subtarget to the TargetMachine and update the testcases
accordingly since it's no longer valid to set via -mattr.

llvm-svn: 224492
2014-12-18 02:20:58 +00:00
Eric Christopher 1971c3508a Model ARM backend ABI selection after the front end code doing the
same. This will change the "bare metal" ABI from APCS to AAPCS.

The only difference between the front and back end code is that
the code for Triple::GNU was added for environment. That will migrate
to the front end shortly.

Tests updated with the ABI they were originally testing in the case
of bare metal (e.g. -mtriple armv7) or with a -gnu for arm-linux
triples.

llvm-svn: 224489
2014-12-18 02:08:45 +00:00
Duncan P. N. Exon Smith fda0cee7c6 Reapply "Linker: Drop superseded subprograms"
This reverts commit r224416, reapplying r224389.  The buildbots hadn't
recovered after my revert, waiting until David reverted a couple of his
commits.  It looks like it was just bad timing (where we were both
modifying code related to the same assertion).  Trying again...

Here's the original text:

    When a function gets replaced by `ModuleLinker`, drop superseded
    subprograms.  This ensures that the "first" subprogram pointing at a
    function is the same one that `!dbg` references point at.

    This is a stop-gap fix for PR21910.  Notably, this fixes Release+Asserts
    bootstraps that are currently asserting out in
    `LexicalScopes::initialize()` due to the explicit instantiations in
    `lib/IR/Dominators.cpp` eventually getting replaced by -argpromotion.

llvm-svn: 224487
2014-12-18 01:05:33 +00:00
Kevin Enderby d0b6b7fb7f Add printing the LC_LINKER_OPTION load command with llvm-objdump’s -private-headers.
Also corrected the name of the load command to not end in an ’S’ as well as corrected
the name of the MachO::linker_option_command struct and other places that had the
word option as plural which did not match the Mac OS X headers.

llvm-svn: 224485
2014-12-18 00:53:40 +00:00
Duncan P. N. Exon Smith 97f07c2778 IR: Handle self-referencing DICompositeTypes in DIBuilder
Add API to DIBuilder to handle self-referencing `DICompositeType`s.

Self-references aren't expected in the debug info graph, and we take
advantage of that by only calling `resolveCycles()` on nodes that were
once forward declarations (otherwise, DIBuilder needs an expensive
tracking reference to every unresolved node it creates, which in cyclic
graphs is *all of them*).

However, clang seems to create self-referencing `DICompositeType`s.  Add
API to manage this safely.  The paired commit to clang will include the
regression test.

I'll make the `DICompositeType` API `private` in a follow-up to prevent
misuse (I've separated that to prevent build failures from missing the
clang commit).

llvm-svn: 224482
2014-12-18 00:46:16 +00:00
Duncan P. N. Exon Smith 140d41b791 LTO: Lazy-load LTOModule in local contexts
Start lazy-loading `LTOModule`s that own their contexts.  These can only
really be used for parsing symbols, so its unnecessary to ever
materialize their functions.

I looked into using `IRObjectFile::create()` and optionally calling
`materializAllPermanently()` afterwards, but this turned out to be
awkward.

  - The default target triple and data layout logic needs to happen
    *before* the call to `IRObjectFile::IRObjectFile()`, but after
    `Module` was created.

  - I tried passing a lambda in to do the module initialization, but
    this seemed to require threading the error message from
    `TargetRegistry::lookupTarget()` through `std::error_code`.

  - I also looked at setting `errMsg` directly from within the lambda,
    but this didn't look any better.

(I guess there's a reason we weren't already using that function.)

llvm-svn: 224466
2014-12-17 22:05:42 +00:00
Kostya Serebryany fea4fb404e [sanitizer] allow -fsanitize-coverage=N w/ -fsanitize=leak, llvm part
llvm-svn: 224463
2014-12-17 21:50:04 +00:00
Matthias Braun 0a410f6243 RegisterCoalescer: Fix stripCopies() picking up main range instead of subregister range
This fixes a problem where stripCopies() would switch to values in the
main liverange when it crossed a copy instruction. However when joining
subranges we need to stay in the respective subregister ranges.

llvm-svn: 224461
2014-12-17 21:25:20 +00:00
Matt Arsenault 303011a005 R600/SI: Fix f64 inline immediates
llvm-svn: 224458
2014-12-17 21:04:08 +00:00
Colin LeMahieu 2055538edb [Hexagon] Reconfiguring register alternate names.
llvm-svn: 224455
2014-12-17 20:35:11 +00:00
Will Schmidt 428488c594 Enable the P8Model entry
This was missed last time around, for the P8 Instruction Scheduling
changes (223257). This will hook the P8Model entry in so those
changes will actually be used.

llvm-svn: 224452
2014-12-17 19:56:29 +00:00
Matthias Braun 8142efa8ea ExecutionDepsFix: Correctly handle wide registers.
The ExecutionDepsFix previously mapped each register to 1 or zero
registers of the register class it was called with and therefore
simulating liveness for.  This was problematic for cases involving wider
registers like Q0 on ARM where ExecutionDepsFix gets invoked for the Dxx
registers. In these cases the wide register would get mapped to the last
matching D register, while it should have been all matching D registers.
This commit changes the AliasMap to use a SmallVector to map registers
to potentially multiple destination regclass registers. This is required
to avoid regressions with subregister liveness tracking enabled.

llvm-svn: 224447
2014-12-17 19:13:47 +00:00
JF Bastien e6acbdc487 Random Number Generator Refactoring (removing from Module)
This patch removes the RNG from Module. Passes should instead create a new RNG for their use as needed.

Patch by Stephen Crane @rinon.

Differential revision: http://reviews.llvm.org/D4377

llvm-svn: 224444
2014-12-17 18:12:10 +00:00
Jingyue Wu e4c9cf04f5 [NVPTX] Fix bugs related to isSingleValueType
Summary:
With isSingleValueType starting to treat vector types as single-value types,
code that uses this interface needs to be updated.

Test Plan:
vector-global.ll
nvcl-param-align.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D6573

llvm-svn: 224440
2014-12-17 17:59:04 +00:00
Saleem Abdulrasool 1ce7d31f33 ARM: correct an off-by-one in an assert
The assert was off-by-one, resulting in failures for valid input.

Thanks to Asiri Rathnayake for pointing out the failure!

llvm-svn: 224432
2014-12-17 16:17:44 +00:00
Michael Kuperstein 047b1a0400 [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.

This fixes PR15872 and provides a partial fix for PR21711.

Differential Revision: http://reviews.llvm.org/D6678

llvm-svn: 224429
2014-12-17 12:32:17 +00:00
Vladimir Medic 636fefe252 MipsABIInfo class is used in different libraries. Moving the files to MCTargetDesc folder(LLVMMipsDesc library) prevents linkage errors. There are no functional changes.
llvm-svn: 224427
2014-12-17 11:49:56 +00:00
Toma Tabacu a23f13c3b0 [mips] Set GCC-compatible MIPS asssembler options before inline asm blocks.
Summary:
When generating MIPS assembly, LLVM always overrides the default assembler options by emitting the '.set noreorder', '.set nomacro' and '.set noat' directives,
while GCC uses the default options if an assembly-level function contains inline assembly code.

This becomes a problem when the code generated by LLVM is interleaved with inline assembly which assumes GCC-like assembler options (from Linux, for example).

This patch fixes these conflicts by setting the appropriate assembler options at the beginning of an inline asm block and popping them at the end.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6637

llvm-svn: 224425
2014-12-17 10:56:16 +00:00
Suyog Sarda 43fae93da8 Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads,
and vectorizes it." 

This was re-ordering floating point data types resulting in mismatch in output.

llvm-svn: 224424
2014-12-17 10:34:27 +00:00
Erik Eckstein a451b9b0b5 Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.

llvm-svn: 224417
2014-12-17 07:29:19 +00:00
Duncan P. N. Exon Smith 92731d26bc Revert "Linker: Drop superseded subprograms"
This reverts commit r224389.  Based on feedback from the bots, the
assertion seems to be going off *more* often, not less (previously I was
just seeing it in an internal bootstrap, now it's happening in public
builds too).

http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build/936/
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/5325

Reverting in order to investigate.

llvm-svn: 224416
2014-12-17 07:27:31 +00:00
Justin Hibbits 0c0d5deff1 Add parsing of 'foo@local".
Summary:
Currently, it supports generating, but not parsing, this expression.
Test added as well.

Test Plan: New test added, no regressions due to this.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6672

llvm-svn: 224415
2014-12-17 06:23:35 +00:00
Rafael Espindola 5f06030989 Remove a debugging assert.
Sorry for the noise, I have no idea how it survived to the final version.

llvm-svn: 224414
2014-12-17 03:38:04 +00:00
Rafael Espindola 81adfb5c2e Fix the windows build.
llvm-svn: 224412
2014-12-17 02:42:20 +00:00
Rafael Espindola 97935a9123 Refactor and simplify the code reading /proc/cpuinfo. NFC.
llvm-svn: 224410
2014-12-17 02:32:44 +00:00
Matthias Braun f4a72cd06e RegisterCoalescer: Sprinkle some const modifiers.
llvm-svn: 224409
2014-12-17 02:18:13 +00:00
Nick Lewycky 52ee5e446b Delete debugging cruft that crept in with r223802.
llvm-svn: 224407
2014-12-17 01:56:51 +00:00
David Majnemer 65c52ae8ca InstSimplify: shl nsw/nuw undef, %V -> undef
We can always choose an value for undef which might cause %V to shift
out an important bit except for one case, when %V is zero.

However, shl behaves like an identity function when the right hand side
is zero.

llvm-svn: 224405
2014-12-17 01:54:33 +00:00
Nick Lewycky ee0a3a7a2f Make ValueEnumerator::print use OS for metadata too. Noticed by inspection.
llvm-svn: 224404
2014-12-17 01:52:08 +00:00
Quentin Colombet fc2201e922 [CodeGenPrepare] Reapply r224351 with a fix for the assertion failure:
The type promotion helper does not support vector type, so when make
such it does not kick in in such cases.

Original commit message:
[CodeGenPrepare] Move sign/zero extensions near loads using type promotion.

This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %es      # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224402
2014-12-17 01:36:17 +00:00
Kevin Enderby 57538299e8 Add printing the LC_ENCRYPTION_INFO_64 load command with llvm-objdump’s -private-headers
and add tests for the two AArch64 binaries.

llvm-svn: 224400
2014-12-17 01:01:30 +00:00
David Blaikie 8b979f01c6 PR21875: codegen for non-type template parameters of nullptr_t type
llvm-svn: 224399
2014-12-17 00:43:22 +00:00
Reid Kleckner 04b69f89aa Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion."
This reverts commit r224351. It causes assertion failures when building
ICU.

llvm-svn: 224397
2014-12-17 00:29:23 +00:00
Hans Wennborg 224cb82a39 SelectionDAG switch lowering: use 'unsigned' to count destination popularity
SwitchInst::getNumCases() returns unsinged, so using uint64_t to count cases
seems unnecessary.

Also fix a missing CHECK in the test case.

llvm-svn: 224393
2014-12-16 23:41:59 +00:00
Colin LeMahieu aa1bade7b4 [Hexagon] Updating doubleword shift usages to new versions.
llvm-svn: 224391
2014-12-16 23:36:15 +00:00
Kevin Enderby 0804f467f2 Add printing the LC_ENCRYPTION_INFO load command with llvm-objdump’s -private-headers.
llvm-svn: 224390
2014-12-16 23:25:52 +00:00
Duncan P. N. Exon Smith 8759026893 Linker: Drop superseded subprograms
When a function gets replaced by `ModuleLinker`, drop superseded
subprograms.  This ensures that the "first" subprogram pointing at a
function is the same one that `!dbg` references point at.

This is a stop-gap fix for PR21910.  Notably, this fixes Release+Asserts
bootstraps that are currently asserting out in
`LexicalScopes::initialize()` due to the explicit instantiations in
`lib/IR/Dominators.cpp` eventually getting replaced by -argpromotion.

llvm-svn: 224389
2014-12-16 23:23:41 +00:00
Simon Pilgrim bf1e079005 [X86][SSE] Vector double -> float conversion memory folding (cvtpd2ps)
Added a missing memory folding relationship for the (V)CVTPD2PS instruction - we can safely fold these for stack reloads.

Differential Revision: http://reviews.llvm.org/D6663

llvm-svn: 224383
2014-12-16 22:30:10 +00:00
Rafael Espindola 9573a9cf9d Make the assert a bit stronger.
We should get no declarations in here.

llvm-svn: 224382
2014-12-16 22:29:43 +00:00
Colin LeMahieu 7fc90fc7e9 [Hexagon] Removing old XTYPE/BIT instructions and replacing usages.
llvm-svn: 224381
2014-12-16 22:17:09 +00:00
Sanjay Patel 7129c10cae merge consecutive loads that are offset from a base address
SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads
when the first load was offset from a base address. 

This patch recognizes that pattern and subtracts the offset before comparing
the second load to see if it is consecutive.

The codegen change in the new test case improves from:

vmovsd	32(%rdi), %xmm0
vmovsd	48(%rdi), %xmm1 
vmovhpd	56(%rdi), %xmm1, %xmm1
vmovhpd	40(%rdi), %xmm0, %xmm0
vinsertf128	$1, %xmm1, %ymm0, %ymm0

To:

vmovups	32(%rdi), %ymm0

An existing test case is also improved from:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovsd	24(%rdi), %xmm2
vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
vmovhpd	8(%rdi), %xmm1, %xmm3

To:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovhpd	24(%rdi), %xmm0, %xmm0
vmovhpd	8(%rdi), %xmm1, %xmm1

This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ).

Differential Revision: http://reviews.llvm.org/D6642

llvm-svn: 224379
2014-12-16 21:57:18 +00:00
Colin LeMahieu f5acc8c625 [Hexagon] Adding tstbit/bitclr/bitset instructions.
llvm-svn: 224374
2014-12-16 21:28:58 +00:00
Kostya Serebryany 7376294086 [sanitizer] prevent function call merging for sanitizer-coverage callbacks
llvm-svn: 224372
2014-12-16 21:24:15 +00:00
Colin LeMahieu 615757f2f1 [Hexagon] Adding bit count and twiddling instructions.
llvm-svn: 224367
2014-12-16 20:57:56 +00:00
Colin LeMahieu 6fce46baf6 [Hexagon] Adding asr/lsr/asl reg/imm, asl with saturation, asr with rounding. Doubleword abs/neg/not. Interleave and deinterleave instructions.
llvm-svn: 224365
2014-12-16 20:40:23 +00:00
JF Bastien 5d3280c7a7 x86-32: PUSHF/POPF use/def EFLAGS
Summary: As a side-quest for D6629 jvoung pointed out that I should use -verify-machineinstrs and this found a bug in x86-32's handling of EFLAGS for PUSHF/POPF. This patch fixes the use/def, and adds -verify-machineinstrs to all x86 tests which contain 'EFLAGS'. One exception: this patch leaves inline-asm-fpstack.ll as-is because it fails -verify-machineinstrs in a way unrelated to EFLAGS. This patch also modifies cmpxchg-clobber-flags.ll along the lines of what D6629 already does by also testing i386.

Test Plan: ninja check

Reviewers: t.p.northover, jvoung

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6687

llvm-svn: 224359
2014-12-16 20:15:45 +00:00
Rafael Espindola a4a94f1b55 Use CastInst::castIsValid to simplify the verifier.
Also delete a dead member variable.

llvm-svn: 224356
2014-12-16 19:29:29 +00:00
Matt Arsenault 31a52ad48c NVPTX: Remove duplicate of AsmPrinter::lowerConstant
llvm-svn: 224355
2014-12-16 19:16:17 +00:00
Matt Arsenault dd3b77d64c Move lowerConstant to AsmPrinter
This was a static function before, and NVPTX duplicated it
because it wasn't exposed.

llvm-svn: 224354
2014-12-16 19:16:14 +00:00
Quentin Colombet d5e57b731f [CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %es      # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>

llvm-svn: 224351
2014-12-16 19:09:03 +00:00
Robert Khasanov d04cd2fbfe [AVX512] Enable integer arithmetic lowering for AVX512BW/VL subsets.
Added lowering tests.

llvm-svn: 224349
2014-12-16 18:24:07 +00:00
Colin LeMahieu 1944a8cd04 [Hexagon] Adding absolute value, and negate with saturation
llvm-svn: 224346
2014-12-16 17:44:49 +00:00
Sanjay Patel e46d54f0bf combine consecutive subvector 16-byte loads into one 32-byte load
This is a fix for PR21709 ( http://llvm.org/bugs/show_bug.cgi?id=21709 ).
When we have 2 consecutive 16-byte loads that are merged into one 32-byte vector,
we can use a single 32-byte load instead. 
But we don't do this for SandyBridge / IvyBridge because they have slower 32-byte memops.
We also don't bother using 32-byte *integer* loads on a machine that only has AVX1 (btver2)
because those operands would have to be split in half anyway since there is no support for
32-byte integer math ops.

Differential Revision: http://reviews.llvm.org/D6492

llvm-svn: 224344
2014-12-16 16:30:01 +00:00
Colin LeMahieu 455f24aa77 [Hexagon] Adding saturate and swizzle instructions.
llvm-svn: 224343
2014-12-16 16:27:17 +00:00
Robert Khasanov 8d9b93eac8 [AVX512] Add a comment for avx512_broadcast_pat multiclass
llvm-svn: 224341
2014-12-16 16:12:11 +00:00
Colin LeMahieu d9b23509bf [Hexagon] Removing old multiply defs and updating references to new versions.
llvm-svn: 224340
2014-12-16 16:10:01 +00:00
Vladimir Medic e88609388a The single check for N64 inside MipsDisassemblerBase's subclasses is actually wrong. It should be testing for FeatureGP64bit.There are no functional changes.
llvm-svn: 224339
2014-12-16 15:29:12 +00:00
Zoran Jovanovic 2deca34803 [mips][microMIPS] Implement SWP and LWP instructions
Differential Revision: http://reviews.llvm.org/D5667

llvm-svn: 224338
2014-12-16 14:59:10 +00:00
Aaron Ballman 0d6a010c13 Fixing -Wsign-compare warnings; NFC.
llvm-svn: 224337
2014-12-16 14:04:11 +00:00
Elena Demikhovsky f5b72afff4 Masked Load and Store Intrinsics in loop vectorizer.
The loop vectorizer optimizes loops containing conditional memory
accesses by generating masked load and store intrinsics.
This decision is target dependent.

http://reviews.llvm.org/D6527

llvm-svn: 224334
2014-12-16 11:50:42 +00:00
Bradley Smith ececb7f6e2 [ARM] Prevent PerformVCVTCombine from combining a vmul/vcvt with 8 lanes
This would result in a crash since the vcvt used does not support v8i32 types.

llvm-svn: 224332
2014-12-16 10:59:27 +00:00
Elena Demikhovsky a79fc16bb0 X86: Added FeatureVectorUAMem for all AVX architectures.
According to AVX specification:

"Most arithmetic and data processing instructions encoded using the VEX prefix and
performing memory accesses have more flexible memory alignment requirements
than instructions that are encoded without the VEX prefix. Specifically,
With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions,
most VEX-encoded, arithmetic and data processing instructions operate in
a flexible environment regarding memory address alignment, i.e. VEX-encoded
instruction with 32-byte or 16-byte load semantics will support unaligned load
operation by default. Memory arguments for most instructions with VEX prefix
operate normally without causing #GP(0) on any byte-granularity alignment
(unlike Legacy SSE instructions)."

The same for AVX-512.

This change does not affect anything right now, because only the "memop pattern fragment"
depends on FeatureVectorUAMem and it is not used in AVX patterns.
All AVX patterns are based on the "unaligned load" anyway.

llvm-svn: 224330
2014-12-16 09:10:08 +00:00
Duncan P. N. Exon Smith bb7d2fb1e5 IR: Stop printing 'metadata' in Metadata::print()
Stop printing `metadata` in `Metadata::print()` and
`Metadata::printAsOperand()`.

llvm-svn: 224327
2014-12-16 07:40:31 +00:00
Duncan P. N. Exon Smith fee167fcc0 IR: Make MDNode::dump() useful by adding addresses
It's horrible to inspect `MDNode`s in a debugger.  All of their operands
that are `MDNode`s get dumped as `<badref>`, since we can't assign
metadata slots in the context of a `Metadata::dump()`.  (Why not?  Why
not assign numbers lazily?  Because then each time you called `dump()`,
a given `MDNode` could have a different lazily assigned number.)

Fortunately, the C memory model gives us perfectly good identifiers for
`MDNode`.  Add pointer addresses to the dumps, transforming this:

    (lldb) e N->dump()
    !{i32 662302, i32 26, <badref>, null}

    (lldb) e ((MDNode*)N->getOperand(2))->dump()
    !{i32 4, !"foo"}

into:

    (lldb) e N->dump()
    !{i32 662302, i32 26, <0x100706ee0>, null}

    (lldb) e ((MDNode*)0x100706ee0)->dump()
    !{i32 4, !"foo"}

and this:

    (lldb) e N->dump()
    0x101200248 = !{<badref>, <badref>, <badref>, <badref>, <badref>}

    (lldb) e N->getOperand(0)
    (const llvm::MDOperand) $0 = {
      MD = 0x00000001012004e0
    }
    (lldb) e N->getOperand(1)
    (const llvm::MDOperand) $1 = {
      MD = 0x00000001012004e0
    }
    (lldb) e N->getOperand(2)
    (const llvm::MDOperand) $2 = {
      MD = 0x0000000101200058
    }
    (lldb) e N->getOperand(3)
    (const llvm::MDOperand) $3 = {
      MD = 0x00000001012004e0
    }
    (lldb) e N->getOperand(4)
    (const llvm::MDOperand) $4 = {
      MD = 0x0000000101200058
    }
    (lldb) e ((MDNode*)0x00000001012004e0)->dump()
    !{}

    (lldb) e ((MDNode*)0x0000000101200058)->dump()
    !{null}

into:

    (lldb) e N->dump()
    !{<0x1012004e0>, <0x1012004e0>, <0x101200058>, <0x1012004e0>, <0x101200058>}

    (lldb) e ((MDNode*)0x1012004e0)->dump()
    !{}

    (lldb) e ((MDNode*)0x101200058)->dump()
    !{null}

llvm-svn: 224325
2014-12-16 07:09:37 +00:00
Saleem Abdulrasool 417fc6b303 ARM: diagnose deprecated syntax
The use of SP and PC in the register list for stores is deprecated on ARM
(ARM ARM A.8.8.199):

  ARM deprecates the use of ARM instructions that include the SP or the PC in
  the list.

Provide a deprecation warning from the assembler in the case that the syntax is
ever seen.

llvm-svn: 224319
2014-12-16 05:53:25 +00:00
Hal Finkel 8adf2254ef [PowerPC] Improve instruction selection bit-permuting operations (32-bit)
The PowerPC backend, somewhat embarrassingly, did not generate an
optimal-length sequence of instructions for a 32-bit bswap. While adding a
pattern for the bswap intrinsic to fix this would not have been terribly
difficult, doing so would not have addressed the real problem: we had been
generating poor code for many bit-permuting operations (by which I mean things
like byte-swap that permute the bits of one or more inputs around in various
ways). Here are some initial steps toward solving this deficiency.

Bit-permuting operations are represented, at the SDAG level, using ISD::ROTL,
SHL, SRL, AND and OR (mostly with constant second operands). Looking back
through these operations, we can build up a description of the bits in the
resulting value in terms of bits of one or more input values (and constant
zeros). For each bit, we compute the rotation amount from the original value,
and then group consecutive (value, rotation factor) bits into groups. Groups
sharing these attributes are then collected and sorted, and we can then
instruction select the entire permutation using a combination of masked
rotations (rlwinm), imm ands (andi/andis), and masked rotation inserts
(rlwimi).

The result is that instead of lowering an i32 bswap as:

	rlwinm 5, 3, 24, 16, 23
	rlwinm 4, 3, 24, 0, 7
	rlwimi 4, 3, 8, 8, 15
	rlwimi 5, 3, 8, 24, 31
	rlwimi 4, 5, 0, 16, 31

we now produce:

	rlwinm 4, 3, 8, 0, 31
	rlwimi 4, 3, 24, 16, 23
	rlwimi 4, 3, 24, 0, 7

and for the 'test6' example in the PowerPC/README.txt file:

 unsigned test6(unsigned x) {
   return ((x & 0x00FF0000) >> 16) | ((x & 0x000000FF) << 16);
 }

we used to produce:

	lis 4, 255
	rlwinm 3, 3, 16, 0, 31
	ori 4, 4, 255
	and 3, 3, 4

and now we produce:

	rlwinm 4, 3, 16, 24, 31
	rlwimi 4, 3, 16, 8, 15

and, as a nice bonus, this fixes the FIXME in
test/CodeGen/PowerPC/rlwimi-and.ll.

This commit does not include instruction-selection for i64 operations, those
will come later.

llvm-svn: 224318
2014-12-16 05:51:41 +00:00
Saleem Abdulrasool 08408ea86e ARM: 80-column
clang-format a function with an overly long string constant.  NFC.

llvm-svn: 224314
2014-12-16 04:10:10 +00:00
Matthias Braun 1aed6ffa35 LiveRangeCalc: Rewrite subrange calculation
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)

llvm-svn: 224313
2014-12-16 04:03:38 +00:00
Rafael Espindola a23008ad4b Remove the last unnecessary member variable of mapped_file_region. NFC.
llvm-svn: 224312
2014-12-16 03:10:29 +00:00
Rafael Espindola 369d514616 Convert a member variable to a local variable. NFC.
llvm-svn: 224311
2014-12-16 02:53:35 +00:00
Rafael Espindola 986f5adf8d Remove unused member and simplify. NFC.
llvm-svn: 224309
2014-12-16 02:19:26 +00:00
Rafael Espindola 9d1020648c Start adding thin archive support.
This is just sufficient for 'ar t' to work.

llvm-svn: 224307
2014-12-16 01:43:41 +00:00
Adrian Prantl b9fa945d51 ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.

llvm-svn: 224294
2014-12-16 00:20:49 +00:00
Colin LeMahieu d9a00a9c38 [Hexagon] Adding doubleword multiplies with and without accumulation.
llvm-svn: 224293
2014-12-16 00:07:24 +00:00
Michael Ilseman 8210281d13 Sink the isa into the assert
llvm-svn: 224291
2014-12-15 23:41:21 +00:00
Colin LeMahieu 18c927620a [Hexagon] Adding halfword to doubleword multiplies.
llvm-svn: 224289
2014-12-15 23:29:37 +00:00
Colin LeMahieu 64ffd52943 [Hexagon] Adding logical-logical accumulation instructions and tests.
llvm-svn: 224288
2014-12-15 23:19:07 +00:00
Sanjoy Das 4555b6d870 Teach ScalarEvolution to exploit min and max expressions when proving
isKnownPredicate.

The motivation for this change is to optimize away checks in loops
like this:

    limit = min(t, len)
    for (i = 0 to limit)
      if (i >= len || i < 0) throw_array_of_of_bounds();
      a[i] = ...

Differential Revision: http://reviews.llvm.org/D6635

llvm-svn: 224285
2014-12-15 22:50:15 +00:00
JF Bastien 388b8794c9 x86: Emit LOCK prefix after DATA16
Summary: x86 allows either ordering for the LOCK and DATA16 prefixes, but using GCC+GAS leads to different code generation than using LLVM. This change matches the order that GAS emits the x86 prefixes when a semicolon isn't used in inline assembly (see tc-i386.c comment before define LOCK_PREFIX), and helps simplify tooling that operates on the instruction's byte sequence (such as NaCl's validator). This change shouldn't have any performance impact.

Test Plan: ninja check

Reviewers: craig.topper, jvoung

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D6630

llvm-svn: 224283
2014-12-15 22:34:58 +00:00
Colin LeMahieu 71e11a1d0d [Hexagon] Adding a number of additional multiply forms with tests.
llvm-svn: 224282
2014-12-15 22:10:37 +00:00
Michael Ilseman 00a6087f6b Clean up warning about unused variable
llvm-svn: 224281
2014-12-15 21:47:09 +00:00
Matthias Braun c3a72c2e5f Revert "LiveRangeCalc: Rewrite subrange calculation"
Revert until I find out why non-subreg enabled targets break.

This reverts commit 6097277eefb9c5fb35a7f493c783ee1fd1b9d6a7.

llvm-svn: 224278
2014-12-15 21:36:35 +00:00
Michael Ilseman 1c38396db7 Revert of r223763, in spirit.
r223763 was made to work around a temporary issue where a user of the
JIT was passing down a declaration (incorrectly). This shouldn't
occur, so assert rather than silently continue.

llvm-svn: 224277
2014-12-15 21:36:29 +00:00
Mark Heffernan acbed5e530 Clarify HowFarToZero computation when the step is a positive power of two. Functionally this should be identical to the existing code except for the case where Step is maximally negative (eg, INT_MIN). We now punt in that one corner case to make reasoning about the code easier.
llvm-svn: 224274
2014-12-15 21:19:53 +00:00
Colin LeMahieu 4a46429305 [Hexagon] Adding misc multiply encodings and tests.
llvm-svn: 224273
2014-12-15 21:17:03 +00:00
Matthias Braun 0352201e15 LiveRangeCalc: Rewrite subrange calculation
This changes subrange calculation to calculate subranges sequentially
instead of in parallel. The code is easier to understand that way and
addresses the code review issues raised about LiveOutData being
hard to understand/needing more comments by removing them :)

llvm-svn: 224272
2014-12-15 21:16:21 +00:00
Colin LeMahieu 26f884aedf [Hexagon] Adding doubleworld accumulating multiplies of halfwords.
llvm-svn: 224267
2014-12-15 20:17:46 +00:00
Colin LeMahieu 572c53e258 [Hexagon] Adding accumulating half word multiplies.
llvm-svn: 224266
2014-12-15 20:10:28 +00:00
Colin LeMahieu d1704cdc07 [Hexagon] Adding multiply with rnd/sat/rndsat
llvm-svn: 224265
2014-12-15 20:01:59 +00:00
Matthias Braun 42fab34ffb LiveRangeCalc: use more range based for loops; NFC
llvm-svn: 224263
2014-12-15 19:40:46 +00:00
Colin LeMahieu fe4012a969 [Hexagon] Adding encoding bits for halfword multiplies.
llvm-svn: 224261
2014-12-15 19:22:07 +00:00
Ahmed Bougacha c2a87ddf01 [X86] Also pretty-print shuffle mask for INSERTPS rm variants.
llvm-svn: 224260
2014-12-15 19:17:54 +00:00
Duncan P. N. Exon Smith be7ea19b58 IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

llvm-svn: 224257
2014-12-15 19:07:53 +00:00
Michael Ilseman addddc441f Silence more static analyzer warnings.
Add in definedness checks for shift operators, null checks when
pointers are assumed by the code to be non-null, and explicit
unreachables.

llvm-svn: 224255
2014-12-15 18:48:43 +00:00
Vladimir Medic d7ecf49e97 Add disassembler tests for mips3 platform. There are no functional changes.
llvm-svn: 224253
2014-12-15 16:19:34 +00:00
Aaron Ballman 2d67fd6d64 Changing a cast from unsigned to uint64_t, should be NFC in practice.
llvm-svn: 224249
2014-12-15 14:25:12 +00:00
Elena Demikhovsky a5599bfd72 Sink store based on alias analysis
- by Ella Bolshinsky
The alias analysis is used define whether the given instruction
is a barrier for store sinking. For 2 identical stores, following
instructions are checked in the both basic blocks, to determine
whether they are sinking barriers.

http://reviews.llvm.org/D6420

llvm-svn: 224247
2014-12-15 14:09:53 +00:00
Michael Kuperstein 47c97157ef [X86] Break false dependencies before partial register updates when the source operand is in memory
Adds the various "rm" instruction variants into the list of instructions that have a partial register update. Also adds all variants of SQRTSD that were missing in the original list.

Differential Revision: http://reviews.llvm.org/D6620

llvm-svn: 224246
2014-12-15 13:18:21 +00:00
Elena Demikhovsky 72860c341e AVX-512: Added EXPAND instructions and intrinsics.
llvm-svn: 224241
2014-12-15 10:03:52 +00:00
Alexey Bataev 48be28ef1a Fix line mapping information in LLVM JIT profiling with Vtune
The line mapping information for dynamic code is reported incorrectly. It causes VTune to map LLVM generated code to source lines incorrectly. This patch fix this issue.
Patch by Denis Pravdin.
Differential Revision: http://reviews.llvm.org/D6603

llvm-svn: 224229
2014-12-15 04:45:43 +00:00
David Majnemer c175dd2ea5 ThreadLocal: Move Unix-specific code out of Support/ThreadLocal.cpp
Just a cleanup, no functionality change is intended.

llvm-svn: 224227
2014-12-15 01:19:53 +00:00
David Majnemer 421c89debc ThreadLocal: Return a mutable pointer if templated with a non-const type
It makes more sense for ThreadLocal<const T>::get to return a const T*
and ThreadLocal<T>::get to return a T*.

llvm-svn: 224225
2014-12-15 01:04:45 +00:00
Elena Demikhovsky 3fcafa2cdb Loop Vectorizer minor changes in the code -
some comments, function names, identation.

Reviewed here: http://reviews.llvm.org/D6527

llvm-svn: 224218
2014-12-14 09:43:50 +00:00
David Majnemer 7f03920dad APInt: udivrem should use machine instructions for single-word APInts
This mirrors the behavior of APInt::udiv and APInt::urem.  Some
architectures, like X86, have a single instruction which can compute
both division and remainder.

llvm-svn: 224217
2014-12-14 09:41:56 +00:00
David Majnemer 4e87936d2f ScalarEvolution: Remove SCEVUDivision, it's unused
This is just a code simplification, no functionality change is intended.

llvm-svn: 224216
2014-12-14 09:12:33 +00:00
Hal Finkel 4104a1a346 [PowerPC] Handle cmp op promotion for SELECT[_CC] nodes in PPCTL::DAGCombineExtBoolTrunc
PPCTargetLowering::DAGCombineExtBoolTrunc contains logic to remove unwanted
truncations and extensions when dealing with nodes of the form:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)

There was a FIXME in the implementation (now removed) regarding the fact that
the function would abort the transformations if any of the non-output operands
of a SELECT or SELECT_CC node would need to be promoted (because they were
also output operands, for example). As a result, we continued to generate
unnecessary zero-extends for code such as this:

  unsigned foo(unsigned a, unsigned b) {
    return  (a <= b) ? a : b;
  }

which would produce:

  cmplw 0, 3, 4
  isel 3, 4, 3, 1
  rldicl 3, 3, 0, 32
  blr

and now we produce:

  cmplw 0, 3, 4
  isel 3, 4, 3, 1
  blr

which is better in the obvious way.

llvm-svn: 224213
2014-12-14 05:53:19 +00:00
Ahmed Bougacha 0cb861634b Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores."
r223862 tried to also combine base-updating load/stores.
r224198 reverted it, as "it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown."
Reapply, with a fix to ignore non-normal load/stores.
Truncstores are handled elsewhere (you can actually write a pattern for
those, whereas for postinc loads you can't, since they return two values),
but it should be possible to also combine extloads base updates, by checking
that the memory (rather than result) type is of the same size as the addend.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585

llvm-svn: 224203
2014-12-13 23:22:12 +00:00
Renato Golin df8f9b6dc9 Revert "[ARM] Combine base-updating/post-incrementing vector load/stores."
This reverts commit r223862, as it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown. We'll investigate the issue and re-apply
when safe.

llvm-svn: 224198
2014-12-13 20:23:18 +00:00
Aaron Ballman 786a86cb13 Silencing a -Wsign-compare warning; NFC.
llvm-svn: 224195
2014-12-13 16:55:02 +00:00
Akira Hatanaka 7ba78302b5 Rename argument strings of codegen passes to avoid collisions with command line
options.

This commit changes the command line arguments (PassInfo::PassArgument) of two
passes, MachineFunctionPrinter and MachineScheduler, to avoid collisions with
command line options that have the same argument strings.

This bug manifests when the PassList construct (defined in opt.cpp) is used
in a tool that links with codegen passes. To reproduce the bug, paste the
following lines into llc.cpp and run llc.

#include "llvm/IR/LegacyPassNameParser.h"
static llvm:🆑:list<const llvm::PassInfo*, bool, llvm::PassNameParser>
PassList(llvm:🆑:desc("Optimizations available:"));

rdar://problem/19212448

llvm-svn: 224186
2014-12-13 04:52:04 +00:00
Hal Finkel 4c6658feb0 [PowerPC] Add a DAGToDAG peephole to remove unnecessary zero-exts
On PPC64, we end up with lots of i32 -> i64 zero extensions, not only from all
of the usual places, but also from the ABI, which specifies that values passed
are zero extended. Almost all 32-bit PPC instructions in PPC64 mode are defined
to do *something* to the higher-order bits, and for some instructions, that
action clears those bits (thus providing a zero-extended result). This is
especially common after rotate-and-mask instructions. Adding an additional
instruction to zero-extend the results of these instructions is unnecessary.

This PPCISelDAGToDAG peephole optimization examines these zero-extensions, and
looks back through their operands to see if all instructions will implicitly
zero extend their results. If so, we convert these instructions to their 64-bit
variants (which is an internal change only, the actual encoding of these
instructions is the same as the original 32-bit ones) and remove the
unnecessary zero-extension (changing where the INSERT_SUBREG instructions are
to make everything internally consistent).

llvm-svn: 224169
2014-12-12 23:59:36 +00:00
David Majnemer 9b6097586c ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume
Respect the MaxDepth recursion limit, doing otherwise will trigger an
assert in computeKnownBits.

This fixes PR21891.

llvm-svn: 224168
2014-12-12 23:59:29 +00:00
Chad Rosier 620fb2206d [ARMConstantIsland] Insert tbb/tbh optimization where previous jump table resided.
llvm-svn: 224165
2014-12-12 23:27:40 +00:00
Yaron Keren a604d6c84d Pass EC by reference to MemoryBufferMMapFile to return error code.
Patch by Kim Grasman!

llvm-svn: 224159
2014-12-12 22:27:53 +00:00
Michael Ilseman 5be22a12c2 Clean up static analyzer warnings.
Clang's static analyzer found several potential cases of undefined
behavior, use of un-initialized values, and potentially null pointer
dereferences in tablegen, Support, MC, and ADT. This cleans them up
with specific assertions on the assumptions of the code.

llvm-svn: 224154
2014-12-12 21:48:03 +00:00