Commit Graph

38679 Commits

Author SHA1 Message Date
Ron Lieberman 8123b966cb [Hexagon] Generate vector printing instructions
llvm-svn: 277370
2016-08-01 19:36:39 +00:00
George Burgess IV 5f0e76dca6 [CFLAA] Remove modref queries from CFLAA.
As it turns out, modref queries are broken with CFLAA. Specifically,
the data source we were using for determining modref behaviors
explicitly ignores operations on non-pointer values. So, it wouldn't
note e.g. storing an i32 to an i32* (or loading an i64 from an i64*).
It also ignores external function calls, rather than acting
conservatively for them.

(N.B. These operations, where necessary, *are* tracked by CFLAA; we just
use a different mechanism to do so. Said mechanism is relatively
imprecise, so it's unlikely that we can provide reasonably good modref
answers with it as implemented.)

Patch by Jia Chen.

Differential Revision: https://reviews.llvm.org/D22978

llvm-svn: 277366
2016-08-01 18:47:28 +00:00
Evandro Menezes 82e245a202 [AArch64] Add support for Samsung Exynos M2 (NFC).
llvm-svn: 277364
2016-08-01 18:39:45 +00:00
David Majnemer d1548eaa17 Included test for r277360.
llvm-svn: 277361
2016-08-01 18:07:19 +00:00
David Majnemer ba6665d88a [Verifier] Resume instructions can only be in functions w/ a personality
This fixes PR28799.

llvm-svn: 277360
2016-08-01 18:06:34 +00:00
Krzysztof Parzyszek ddafa2cd5f [Hexagon] Check for offset overflow when reserving scavenging slots
Scavenging slots were only reserved when pseudo-instruction expansion in
frame lowering created new virtual registers. It is possible to still
need a scavenging slot even if no virtual registers were created, in cases
where the stack is large enough to overflow instruction offsets.

llvm-svn: 277355
2016-08-01 17:15:30 +00:00
Nirav Dave 6e0b732009 Add removed inline-assembly-comment test from r277146
llvm-svn: 277349
2016-08-01 15:36:10 +00:00
Daniel Sanders b3ae33c7a6 [mips][fastisel] Correct argument lowering for (f64, f64, i32) and similar.
Summary:
Allocating an AFGR64 shadows two GPR32's instead of just one.

This fixes an LNT regression detected by our internal buildbots.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D23012

llvm-svn: 277348
2016-08-01 15:32:51 +00:00
Valery Pykhtin 902db3101b [AMDGPU] refactor DS instruction definitions. NFC.
Differential revision: https://reviews.llvm.org/D22522

llvm-svn: 277344
2016-08-01 14:21:30 +00:00
Simon Pilgrim 46f119a59f [X86] Use implicit masking of SHLD/SHRD shift double instructions
Similar to the regular shift instructions, SHLD/SHRD only use the bottom bits of the shift value

llvm-svn: 277341
2016-08-01 12:11:43 +00:00
Simon Pilgrim 7fd4ad6849 Fixed test check ordering issue on windows buildbots
llvm-svn: 277337
2016-08-01 10:40:15 +00:00
James Molloy bade86cedc [SimplifyCFG] Fix nasty RAUW bug from r277325
Using RAUW was wrong here; if we have a switch transform such as:
  18 -> 6 then
  6 -> 0

If we use RAUW, while performing the second transform the  *transformed* 6
from the first will be also replaced, so we end up with:
  18 -> 0
  6 -> 0

Found by clang stage2 bootstrap; testcase added.

llvm-svn: 277332
2016-08-01 09:34:48 +00:00
Diana Picus ab5a4c7dbb [AArch64] Return the correct size for TLSDESC_CALLSEQ
The branch relaxation pass is computing the wrong offsets because it assumes
TLSDESC_CALLSEQ eats up 4 bytes, when in fact it is lowered to an instruction
sequence taking up 16 bytes. This can become a problem in huge files with lots
of TLS accesses, as it may slowly move branch targets out of the range computed
by the branch relaxation pass.

Fixes PR24234 https://llvm.org/bugs/show_bug.cgi?id=24234

Differential Revision: https://reviews.llvm.org/D22870

llvm-svn: 277331
2016-08-01 08:38:49 +00:00
Craig Topper d2b2d745ff [AVX-512] Fix a test missed in r277327.
llvm-svn: 277330
2016-08-01 08:15:30 +00:00
James Molloy 91821bd0b4 [SimplifyCFG] Try and pacify buildbots after r277325
It looks like the two independent parts of the rotate operation (a lshr and shl) are being reordered on some bots. Add CHECK-DAGs to account for this.

llvm-svn: 277329
2016-08-01 08:09:55 +00:00
Craig Topper c48c029610 [AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types.
llvm-svn: 277327
2016-08-01 07:55:33 +00:00
Craig Topper ddc96cd33d [X86] Regenerate a test to pick up shuffle comments that were added at some point.
llvm-svn: 277326
2016-08-01 07:55:24 +00:00
James Molloy b2e436de42 [SimplifyCFG] Range reduce switches
If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:

    switch (i) {
    case 5: ...
    case 9: ...
    case 13: ...
    case 17: ...
    }

can become:

    if ( (i - 5) % 4 ) goto default;
    switch ((i - 5) / 4) {
    case 0: ...
    case 1: ...
    case 2: ...
    case 3: ...
    }

or even better:

   switch ( ROTR(i - 5, 2) {
   case 0: ...
   case 1: ...
   case 2: ...
   case 3: ...
   }

The division and remainder operations could be costly so we only do this if the factor is a power of two, and emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.

llvm-svn: 277325
2016-08-01 07:45:11 +00:00
Hrvoje Varga 00d96ee7b9 [mips] Clang generates unaligned offset for MSA instruction st.d
Differential Revision: https://reviews.llvm.org/D19475

llvm-svn: 277323
2016-08-01 06:46:20 +00:00
Diana Picus 850043b25a [AArch64] Register passes so they can be run by llc
Initialize all AArch64-specific passes in the TargetMachine so they can be run
by llc. This can lead to conflicts in opt with some command line options that
share the same name as the pass, so I took this opportunity to do some cleanups:
* rename all relevant command line options from "aarch64-blah" to
  "aarch64-enable-blah" and update the tests accordingly
* run clang-format on their declarations
* move all these declarations to a common place (the TargetMachine) as opposed
  to having them scattered around (AArch64BranchRelaxation and
  AArch64AddressTypePromotion were the only offenders)

llvm-svn: 277322
2016-08-01 05:56:57 +00:00
Craig Topper 749a111f1e [AVX-512] Teach X86InstrInfo::getLargestLegalSuperClass to inflate to FR32X/FR64X if AVX512 is supported and VR128X/VR256X if VLX is supported.
Had to update a stack folding test to clobber the other 16 registers since this now made them get used instead of spilling.

llvm-svn: 277321
2016-08-01 05:31:50 +00:00
Craig Topper 9161e4ec22 [AVX512] Replace scalar fp arithmetic intrinsics with native IR in an AVX512 test. The intrinsics aren't lowered to AVX512 instructions.
The intrinsics really should be removed and autoupgraded.

llvm-svn: 277320
2016-08-01 04:29:16 +00:00
Sean Silva 423c7149dc Revert r277313 and r277314.
They seem to trigger an LSan failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/15140/steps/check-llvm%20asan/logs/stdio

Revert "Add the tests for r277313"

This reverts commit r277314.

Revert "CodeExtractor : Add ability to preserve profile data."

This reverts commit r277313.

llvm-svn: 277317
2016-08-01 04:16:09 +00:00
Sean Silva e5a5c966cd Move this test to x86-specific directory.
No bots have yelled yet, but this test references an x86 intrinsic.
Also, it invokes llc on x86 IR.

Fixup to r277315.

llvm-svn: 277316
2016-08-01 03:22:05 +00:00
Sean Silva a0a802abe3 Fix - CodeExtractor : Inherit Target Dependent Attributes from the parent function.
When extracting a set of blocks make sure to inherit all of the target
dependent attributes to make sure that the function will be valid for
lowering. One example is the "target-features" attribute for x86, if the
extracted region has functionality that relies on a specific feature it
will fail to be lowered.
This also allows for extracted functions to be valid for inlining, at
least back into the parent function, as the target attributes are tested
when inlining for compatibility.

Patch by River Riddle!

Differential Revision: https://reviews.llvm.org/D22713

llvm-svn: 277315
2016-08-01 03:15:32 +00:00
Sean Silva 72be9a6937 Add the tests for r277313
Forgot to `git add` them.

llvm-svn: 277314
2016-08-01 03:04:34 +00:00
Simon Pilgrim 8ae4354df6 [X86][SSE] Regenerate frem tests
llvm-svn: 277311
2016-07-31 21:59:23 +00:00
Simon Pilgrim b089ba4c65 [X86][SSE] Regenerate fpext tests
llvm-svn: 277310
2016-07-31 21:55:33 +00:00
Craig Topper 7afdc0fb25 [AVX512] Always use EVEX encodings for 128/256-bit move instructions in getLoadStoreRegOpcode if VLX is supported.
llvm-svn: 277305
2016-07-31 20:20:05 +00:00
Craig Topper 4c53e60360 [AVX512] Add VLX packed move instructions to the execution dependency fix pass and update tests.
llvm-svn: 277304
2016-07-31 20:20:01 +00:00
Craig Topper 338ec9a0cb [AVX512] Stop treating VR512 specially in getLoadStoreRegOpcode and use the regular switch which already tried to handle it, but was unreachable. This has the added benefit of enabling aligned loads/stores if the stack is aligned.
llvm-svn: 277302
2016-07-31 20:19:53 +00:00
Craig Topper 2a6bbb8203 [AVX512] Add X86::VR512RegClassID to X86RegisterInfo::getLargestLegalSuperClass.
llvm-svn: 277301
2016-07-31 20:19:50 +00:00
Simon Pilgrim 6be48e4aa7 [X86] Improve 64-bit shifts on 32-bit targets (PR14593)
As discussed on PR14593, this patch adds support for lowering to SHLD/SHRD from the patterns generated by DAGTypeLegalizer::ExpandShiftWithKnownAmountBit.

Differential Revision: https://reviews.llvm.org/D23000

llvm-svn: 277299
2016-07-31 19:50:45 +00:00
Simon Pilgrim 64845bb8b4 [X86] Add tests for the lowering SHLD/SHRD from manual pattern similar to those generated by ExpandShiftWithKnownAmountBit
Test for add(v,v) as well as shl(v,1)

llvm-svn: 277293
2016-07-31 17:51:37 +00:00
Craig Topper 00d34ed64f [AVX-512] Don't let ExeDependencyFix pass convert VPANDD/Q to VPANDPS/PD unless DQI instructions are supported. Same for ANDN, OR, and XOR.
Thanks to Igor Breger for pointing out my mistake.

llvm-svn: 277292
2016-07-31 17:15:07 +00:00
Simon Pilgrim 1e096b3a7a [X86] Add tests for the lowering SHLD/SHRD from manual patterns
As discussed on D23000

llvm-svn: 277291
2016-07-31 17:11:49 +00:00
Simon Pilgrim 9e201eac32 [SLPVectorizer][X86] Added vXi8/vXi16 sitofp/uitofp tests
Dropped useless 2i32-2f32 test

llvm-svn: 277281
2016-07-30 21:01:34 +00:00
Simon Pilgrim fbe0fcb009 [X86][SSE] Regenerate vshift tests
llvm-svn: 277278
2016-07-30 20:28:02 +00:00
Simon Pilgrim f5134a2867 [SLPVectorizer][X86] Added SITOFP/UITOFP vectorization tests
llvm-svn: 277275
2016-07-30 18:43:30 +00:00
Simon Pilgrim 2e9de1cebb [X86][AVX] Added signum example test functions from PR13248
These are good examples of missed combine opportunities with zero/all bit vector compare results
 

llvm-svn: 277274
2016-07-30 16:29:19 +00:00
Simon Pilgrim b9e1f9c5c5 [X86][X87] Add vector arithmetic tests for targets with sse disabled
To make sure the X86_64 target isn't doing anything stupid

llvm-svn: 277272
2016-07-30 16:01:30 +00:00
Simon Pilgrim cf49fa3251 [X86][SSE] Let 64-bit targets use the fast 2i32-2f32 UINT_TO_FP conversion as well as 32-bit
The 2i32-2i64 legalization means that we can use the slightly quicker double bits + fptrunc approach for the same results

llvm-svn: 277271
2016-07-30 14:06:59 +00:00
Matt Arsenault 749035b7b1 AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior
This should really be true for any immediate, not just
inline ones.

llvm-svn: 277260
2016-07-30 01:40:36 +00:00
Weiming Zhao 812fde3603 DAG: avoid duplicated truncating for sign extended operand
Summary:
When performing cmp for EQ/NE and the operand is sign extended, we can
avoid the truncaton if the bits to be tested are no less than origianl
bits.

Reviewers: eli.friedman

Subscribers: eli.friedman, aemerson, nemanjai, t.p.northover, llvm-commits

Differential Revision: https://reviews.llvm.org/D22933

llvm-svn: 277252
2016-07-29 23:33:48 +00:00
Tim Northover 5fc93b75d9 GlobalISel: translate "unreachable" (into nothing)
Easiest instruction ever!

llvm-svn: 277225
2016-07-29 22:41:55 +00:00
Tim Northover 5fb414d870 GlobalISel: support translation of intrinsic calls.
These come in two variants for now: G_INTRINSIC and G_INTRINSIC_W_SIDE_EFFECTS.
We may decide to split the latter up with finer-grained restrictions later, if
necessary.

llvm-svn: 277224
2016-07-29 22:32:36 +00:00
Kevin Enderby 31b07f1445 Think this will fix issues with the error messages generated for malformed-archives.test
in r277177 and added back this test which was deleted in r277196 while
I tracked down these problems.

Changed from constructing Twine's to std::string's as Twine's don't work
across statements.  Also removed a few unneeded Twine() constructions.

Fix the write_escaped() calls to not pass the unintended second argument
fixing the warning on the ld-x86_64-win7 bot.

llvm-svn: 277223
2016-07-29 22:32:02 +00:00
Michael Kuperstein f396b4c40d [X86] Match PSADBW in straight-line code
Up until now, we only had code to match PSADBW patterns that look like what
comes out of the loop vectorizer - a partial reduction inside the loop body
that gets fed into a horizontal operation in a different basic block.

This adds support for straight-line patterns, like those generated by the
SLP vectorizer.

Differential Revision: https://reviews.llvm.org/D22889

llvm-svn: 277219
2016-07-29 21:45:51 +00:00
Michael Kuperstein e9ac9b9aaf [Hexagon] Fix test that uses -debug-only to require asserts.
llvm-svn: 277218
2016-07-29 21:44:33 +00:00
Rui Ueyama 7a5cdc6225 pdbdump: Dump Free Page Map contents.
Differential Revision: https://reviews.llvm.org/D22974

llvm-svn: 277216
2016-07-29 21:38:00 +00:00
Simon Pilgrim f107ffa8f0 [X86][AVX] Fix VBROADCASTF128 selection bug (PR28770)
Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors.

llvm-svn: 277214
2016-07-29 21:05:10 +00:00
Tim Northover 6b3bd61283 CodeGen: add new "intrinsic" MachineOperand kind.
This will be used during GlobalISel, where we need a more robust and readable
way to write tests than a simple immediate ID.

llvm-svn: 277209
2016-07-29 20:32:59 +00:00
Eli Bendersky c08ff1267f Add a REQUIRES: assert on a Lanai test that uses a -debug-only flag
llvm-svn: 277204
2016-07-29 19:35:22 +00:00
Adam Nemet 12937c361f [LoopUnroll] Include hotness of region in opt remark
LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter
is added to the common function analysis passes that loop passes
depend on.

The BFI and indirectly BPI used in this pass is computed lazily so no
overhead should be observed unless -pass-remarks-with-hotness is used.

This is how the patch affects the O3 pipeline:

         Dominator Tree Construction
         Natural Loop Information
         Canonicalize natural loops
         Loop-Closed SSA Form Pass
         Basic Alias Analysis (stateless AA impl)
         Function Alias Analysis Results
         Scalar Evolution Analysis
+        Lazy Branch Probability Analysis
+        Lazy Block Frequency Analysis
+        Optimization Remark Emitter
         Loop Pass Manager
           Rotate Loops
           Loop Invariant Code Motion
           Unswitch loops
         Simplify the CFG
         Dominator Tree Construction
         Basic Alias Analysis (stateless AA impl)
         Function Alias Analysis Results
         Combine redundant instructions
         Natural Loop Information
         Canonicalize natural loops
         Loop-Closed SSA Form Pass
         Scalar Evolution Analysis
+        Lazy Branch Probability Analysis
+        Lazy Block Frequency Analysis
+        Optimization Remark Emitter
         Loop Pass Manager
           Induction Variable Simplification
           Recognize loop idioms
           Delete dead loops
           Unroll loops
...

llvm-svn: 277203
2016-07-29 19:29:47 +00:00
Simon Pilgrim 455460f310 Fixed line endings
llvm-svn: 277199
2016-07-29 18:58:57 +00:00
David Majnemer 718da3d1f6 [ConstantFolding] Handle bitcasts of undef fp vector elements
We used the wrong type for constructing a zero vector element which led
to type mismatches.

This fixes PR28771.

llvm-svn: 277197
2016-07-29 18:48:27 +00:00
Kevin Enderby 5207b5dcae Remove the test/tools/llvm-objdump/malformed-archives.test for
now while I investagate the bot failures with this test.

llvm-svn: 277196
2016-07-29 18:46:24 +00:00
Andrew Kaylor b99d1cc7ed Recommitting r275284: add support to inline __builtin_mempcpy
Patch by Sunita Marathe

Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)

llvm-svn: 277189
2016-07-29 18:23:18 +00:00
Kyle Butt 02d8d054ab Codegen: MachineBlockPlacement Improve probability layout.
The following pattern was being layed out poorly:

              A
             / \
            B   C
           / \ / \
          D   E   ? (Doesn't matter)

Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)

The current algorithm gives:
A,B,C,E (D goes on worklist)

It does this even if C has a frequency count of 0. This patch
adjusts the layout calculation so that if freq(B->E) >> freq(C->E)
then we go ahead and layout E rather than C. Fallthrough half the time
is better than fallthrough never, or fallthrough very rarely. The
resulting layout is:

A,B,E, (C and D are in a worklist)

llvm-svn: 277187
2016-07-29 18:09:28 +00:00
Kyle Butt af324f76ad Tests: Add branch weights to non-layout tests.
Add branch weights to a few tests that aren't testing layout to make them less
sensitive to changes in the layout algorithm.

llvm-svn: 277186
2016-07-29 18:09:25 +00:00
Tim Northover 69c2ba546f GlobalISel: add generic conditional branch.
Just the basic equivalent to DAG's condbr for now, we'll get to things like
br_cc when we start doing more legalization.

llvm-svn: 277184
2016-07-29 17:58:00 +00:00
Krzysztof Parzyszek b0c4376697 [Hexagon] Testcase for not merging stores into a misaligned store
The DAG combiner will try to merge consecutive stores into a bigger
store, unless the resulting store is not fast. Misaligned vector stores
are allowed on Hexagon, but are not fast. Add a testcase to make sure
this type of merging does not occur.

Patch by Pranav Bhandarkar.

llvm-svn: 277182
2016-07-29 17:55:37 +00:00
Krzysztof Parzyszek 3e137e3429 Revert r277178, the actual change had already been applied
Will submit another patch with the testcase only.

llvm-svn: 277180
2016-07-29 17:50:47 +00:00
Krzysztof Parzyszek 68fe439d06 [Hexagon] Misaligned loads and stores are not fast
The DAG combiner tries to merge stores to adjacent vector wide memory
locations by creating stores which are integral multiples of the vector
width. Discourage this by informing it that this is slow. This should
not affect legalization passes, because all of them ignore the "Fast"
argument.

Patch by Pranav Bhandarkar.

llvm-svn: 277178
2016-07-29 17:45:16 +00:00
Kevin Enderby f4586039f6 The next step along the way to getting good error messages for bad archives.
As mentioned in commit log for r276686 this next step is adding a new
method in the ArchiveMemberHeader class to get the full name that
does proper error checking, and can be use for error messages.

To do this the name of ArchiveMemberHeader::getName() is changed to
ArchiveMemberHeader::getRawName() to be consistent with
Archive::Child::getRawName().  Then the “new” method is the addition
of a new implementation of ArchiveMemberHeader::getName() which gets
the full name and provides proper error checking.  Which is mostly a rewrite
of what was Archive::Child::getName() and cleaning up incorrect uses of
llvm_unreachable() in the code which were actually just cases of errors
in the input Archives.

Then Archive::Child::getName() is changed to return Expected<> and use
the new implementation of ArchiveMemberHeader::getName() .

Also needed to change Archive::getMemoryBufferRef() with these
changes to return Expected<> as well to propagate Errors up.
As well as changing Archive::isThinMember() to return Expected<> .

llvm-svn: 277177
2016-07-29 17:44:13 +00:00
Ahmed Bougacha 6db3cfe2da [AArch64][GlobalISel] Select G_XOR.
llvm-svn: 277173
2016-07-29 16:56:25 +00:00
Ahmed Bougacha 784e3423e6 [GlobalISel] Add G_XOR.
llvm-svn: 277172
2016-07-29 16:56:20 +00:00
Ahmed Bougacha 7adfac56b3 [AArch64][GlobalISel] Select G_LOAD/G_STORE.
Mostly straightforward as we ignore addressing modes and just
use the base + unsigned immediate offset (always 0) variants.

This currently fails to select extloads because we have yet to
agree on a representation.

llvm-svn: 277171
2016-07-29 16:56:16 +00:00
Brendon Cahoon 254f889dc5 MachinePipeliner pass that implements Swing Modulo Scheduling
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.

This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.

This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.

Differential Review: http://reviews.llvm.org/D16829

llvm-svn: 277169
2016-07-29 16:44:44 +00:00
Krzysztof Parzyszek 0bd55a7608 [Hexagon] Custom lower VECTOR_SHUFFLE and EXTRACT_SUBVECTOR for HVX
If the mask of a vector shuffle has alternating odd or even numbers
starting with 1 or 0 respectively up to the largest possible index
for the given type in the given HVX mode (single of double) we can
generate vpacko or vpacke instruction respectively.

E.g.
  %42 = shufflevector <32 x i16> %37, <32 x i16> %41,
                      <32 x i32> <i32 1, i32 3, ..., i32 63>
  is %42.h = vpacko(%41.w, %37.w)

Patch by Pranav Bhandarkar.

llvm-svn: 277168
2016-07-29 16:44:27 +00:00
Matt Masten a6669a1e05 Initial support for vectorization using svml (short vector math library).
Differential Revision: https://reviews.llvm.org/D19544

llvm-svn: 277166
2016-07-29 16:42:44 +00:00
Paul Robinson fd0fb094be Reinstate optnone test for GVN Hoisting, removed in r276479.
llvm-svn: 277158
2016-07-29 16:05:50 +00:00
Nirav Dave b20c34edf3 Remove inline-comment-2.ll until I can debug why it fails on some builds
llvm-svn: 277152
2016-07-29 15:24:06 +00:00
Krzysztof Parzyszek 0006e1afdd [Hexagon] Improve balancing of address calculation
Rebalances address calculation trees and applies Hexagon-specific
optimizations to the trees to improve instruction selection.

Patch by Tobias Edler von Koch.

llvm-svn: 277151
2016-07-29 15:15:35 +00:00
Nirav Dave 3853f608be Fix inline-comment-2.ll triple
llvm-svn: 277149
2016-07-29 15:12:00 +00:00
David L Kreitzer 8b959e5cfa Avoid unnecessary 32-bit to 64-bit zero extensions following
32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly
zero extends.

Differential Revision: https://reviews.llvm.org/D22941

llvm-svn: 277148
2016-07-29 15:09:54 +00:00
Nirav Dave 8b3dc876ea [MC] When emitting output hash comments always use standard line comment seperator
llvm-svn: 277146
2016-07-29 14:42:00 +00:00
Daniel Sanders cbaca42a03 Re-commit: [mips][fastisel] Handle 0-4 arguments without SelectionDAG.
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.

This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.

The previous commit had an uninitialized variable that caused the incoming
argument region to have undefined size. This has been fixed.

Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D22680

llvm-svn: 277136
2016-07-29 12:27:28 +00:00
Simon Pilgrim cb780b32a3 [X86][SSE] Optimize the truncation of vector comparison results with PACKSS
We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element.

Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently.

We avoid performing this on AVX512 as it should have better alternative truncation instructions.

Differential Revision: https://reviews.llvm.org/D22814

llvm-svn: 277132
2016-07-29 10:23:10 +00:00
Prakhar Bahuguna d1233e857e [Thumb] Emit Thumb move in both Thumb modes for struct_byval predicates
Summary:
The MOV/MOVT instructions being chosen for struct_byval predicates was
conditional only on Thumb2, resulting in an ARM MOV/MOVT instruction
being incorrectly emitted in Thumb1 mode. This is especially apparent
with v8-m.base targets. This patch ensures that Thumb instructions are
emitted in both Thumb modes.

Reviewers: rengolin, t.p.northover

Subscribers: llvm-commits, aemerson, rengolin

Differential Revision: https://reviews.llvm.org/D22865

llvm-svn: 277128
2016-07-29 09:16:46 +00:00
Craig Topper e4f868ea16 [AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable.
llvm-svn: 277120
2016-07-29 06:06:04 +00:00
Craig Topper 07aa37039e [AVX512] Add AVX512 run lines to some tests for scalar fma/add/sub/mul/div and regenerate. Follow up commits will bring AVX512 code up to the same quality as AVX/SSE.
llvm-svn: 277118
2016-07-29 06:05:58 +00:00
David Majnemer 130b9f99d6 [EarlyCSE] Correctly handle simplified, but live, instructions
Some instructions may have their uses replaced with a symbolic constant.
However, the instruction may still have side effects which percludes it
from being removed from the function.  EarlyCSE treated such an
instruction as if it were removed, resulting in PR28763.

llvm-svn: 277114
2016-07-29 05:39:21 +00:00
David Majnemer e4218cf11e [ConstantFolding] Fold bitcasts of vectors w/ undef elements
An undef vector element can be treated as if it had any value.  Folding
such a vector element to 0 in a bitcast can open up further folding
opportunities.

llvm-svn: 277104
2016-07-29 04:06:09 +00:00
David Majnemer 57b94c8d6a [ConstantFolding] Use ConstantExpr::getWithOperands
ConstantExpr::getWithOperands does much of the hard work that
ConstantFoldInstOperandsImpl tries to do but more completely.

This lets us fold ExtractValue/InsertValue expressions.

llvm-svn: 277100
2016-07-29 03:27:31 +00:00
David Majnemer d536f2328e [ConstnatFolding] Teach the folder how to fold ConstantVector
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.

Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.

llvm-svn: 277099
2016-07-29 03:27:26 +00:00
Craig Topper c7de3a1018 [AVX512] Remove the intrinsic forms of VMOVSS/VMOVSD. We don't need two different forms of 'rr' and 'rm'. This matches SSE/AVX.
I'm not convinced the patterns for the rm_Int was correct anyway. It had a tied source that should't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently.

llvm-svn: 277098
2016-07-29 02:49:08 +00:00
Teresa Johnson 7de70738df Capture stderr when checking for gold version
On MacOS the ld version is emitted to stderr, resulting in lots of
messages in the ninja check output.

llvm-svn: 277092
2016-07-29 00:39:56 +00:00
Piotr Padlewski 84abc74f2c Added ThinLTO inlining statistics
Summary:
copypasta doc of ImportedFunctionsInliningStatistics class
 \brief Calculate and dump ThinLTO specific inliner stats.
 The main statistics are:
 (1) Number of inlined imported functions,
 (2) Number of imported functions inlined into importing module (indirect),
 (3) Number of non imported functions inlined into importing module
 (indirect).
 The difference between first and the second is that first stat counts
 all performed inlines on imported functions, but the second one only the
 functions that have been eventually inlined to a function in the importing
 module (by a chain of inlines). Because llvm uses bottom-up inliner, it is
 possible to e.g. import function `A`, `B` and then inline `B` to `A`,
 and after this `A` might be too big to be inlined into some other function
 that calls it. It calculates this statistic by building graph, where
 the nodes are functions, and edges are performed inlines and then by marking
 the edges starting from not imported function.

 If `Verbose` is set to true, then it also dumps statistics
 per each inlined function, sorted by the greatest inlines count like
 - number of performed inlines
 - number of performed inlines to importing module

Reviewers: eraman, tejohnson, mehdi_amini

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D22491

llvm-svn: 277089
2016-07-29 00:27:16 +00:00
Sanjoy Das c6af5ead86 [IR] Introduce a non-integral pointer type
Summary:
This change adds a `ni` specifier in the `datalayout` string to denote
pointers in some given address spaces as "non-integral", and adds some
typing rules around these special pointers.

Reviewers: majnemer, chandlerc, atrick, dberlin, eli.friedman, tstellarAMD, arsenm

Subscribers: arsenm, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22488

llvm-svn: 277085
2016-07-28 23:43:38 +00:00
Adam Nemet aa3506c5f0 [BPI] Add new LazyBPI analysis
Summary:
The motivation is the same as in D22141: In order to add the hotness
attribute to optimization remarks we need BFI to be available in all
passes that emit optimization remarks.  BFI depends on BPI so unless we
make this lazy as well we would still compute BPI unconditionally.

The solution is to use the new LazyBPI pass in LazyBFI and only compute
BPI when computation of BFI is requested by the client.

I extended the laziness test using a LoopDistribute test to also cover
BPI.

Reviewers: hfinkel, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22835

llvm-svn: 277083
2016-07-28 23:31:12 +00:00
Changpeng Fang 26fb9d268b AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.
Differential Revision: http://reviews.llvm.org/D22021

Reviewed by: arsenm

llvm-svn: 277073
2016-07-28 23:01:45 +00:00
Vitaly Buka 0ab23cf1c8 Do not remove empty lifetime.start/lifetime.end ranges
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.

This is possible for code like this:
for (int i = 0; i < 3; i++) {
  int x;
  p = &x;
}

"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.

PR27453

Reviewers: eugenis

Differential Revision: https://reviews.llvm.org/D22842

llvm-svn: 277072
2016-07-28 22:59:03 +00:00
Vitaly Buka 2fae6a7702 Should be committed as one CL.
This reverts commits r277068 r277067 r277066.

llvm-svn: 277071
2016-07-28 22:59:01 +00:00
Vitaly Buka f0500b6ae5 Do not remove empty lifetime.start/lifetime.end ranges
Summary:
Asan stack-use-after-scope check should poison alloca even if there is
no access between start and end.

This is possible for code like this:
for (int i = 0; i < 3; i++) {
  int x;
  p = &x;
}

"Loop Invariant Code Motion" will move "p = &x;" out of the loop, making
start/end range empty.

PR27453

Reviewers: eugenis

Differential Revision: https://reviews.llvm.org/D22842

llvm-svn: 277068
2016-07-28 22:50:48 +00:00
Vitaly Buka 3645793872 maned
llvm-svn: 277067
2016-07-28 22:50:45 +00:00
Michael Kuperstein e45d4d9b35 [PM] Port LowerGuardIntrinsic to the new PM.
llvm-svn: 277057
2016-07-28 22:08:41 +00:00
David Majnemer 3d32b7ed0d [coroutines] Part 3 of N: Adding Boilerplate for Coroutine Passes
This adds boilerplate code for all coroutine passes,
the passes are no-ops for now.
Also, a small test has been added to verify that passes execute in
the expected order or not at all if coroutine support is disabled.

Patch by Gor Nishanov!

Differential Revision: https://reviews.llvm.org/D22847

llvm-svn: 277033
2016-07-28 21:04:31 +00:00
Krzysztof Parzyszek 6400dec5ab Fix build breaks after r277028
llvm-svn: 277031
2016-07-28 20:25:21 +00:00
Krzysztof Parzyszek 167d918225 [Hexagon] Implement MI-level constant propagation
llvm-svn: 277028
2016-07-28 20:01:59 +00:00
Krzysztof Parzyszek c43644d332 [Hexagon] Insert CFI instructions before throwing calls
Normally, CFI instructions should be inserted after allocframe, but
if allocframe is in the same packet with a call, the CFI instructions
should be inserted before that packet.

llvm-svn: 277020
2016-07-28 19:13:46 +00:00
Ahmed Bougacha 8550509b64 [AArch64][GlobalISel] Select G_BR.
This is the first unsized instruction we support; move down the
'sized' check to binops.

llvm-svn: 277007
2016-07-28 17:15:15 +00:00
Ahmed Bougacha d760de0b32 [MIRParser] Accept unsized generic instructions.
Since r276158, we require generic instructions to have a sized type.
G_BR doesn't; relax the restriction.

llvm-svn: 277006
2016-07-28 17:15:12 +00:00
Ahmed Bougacha d7748d6491 [AArch64][GlobalISel] Select GPR G_SUB.
llvm-svn: 277003
2016-07-28 16:58:35 +00:00
Ahmed Bougacha 61a7928dde [AArch64][GlobalISel] Select GPR G_AND.
llvm-svn: 277002
2016-07-28 16:58:31 +00:00
Ahmed Bougacha 46c05fc861 [GlobalISel] Remove types on selected insts instead of using LLT().
LLT() has a particular meaning: it's one invalid type. But we really
want selected instructions to have no type whatsoever.

Also verify that types don't linger after ISel, and enable the verifier
on the AArch64 select test.

llvm-svn: 277001
2016-07-28 16:58:27 +00:00
Ahmed Bougacha 07994ec39b [AArch64][GlobalISel] Remove 'alignment' from MIR tests. NFC.
llvm-svn: 277000
2016-07-28 16:58:21 +00:00
Wei Ding 07e03712d3 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Daniel Sanders 6e74651658 Revert r276982 and r276984: [mips][fastisel] Handle 0-4 arguments without SelectionDAG
It seems that the stack offset in callabi.ll varies between machines. I'll look
into it.

llvm-svn: 276989
2016-07-28 15:37:42 +00:00
Craig Topper 7e27885f69 [X86] Remove CustomInserter for FMA3 instructions. Looks like since we got full commuting support for FMAs after this was added, the coalescer can now get this right on its own.
Differential Revision: https://reviews.llvm.org/D22799

llvm-svn: 276987
2016-07-28 15:28:56 +00:00
Daniel Sanders e0b529f619 [mips][fastisel] Handle 0-4 arguments without SelectionDAG.
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.

This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.

Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D22680

llvm-svn: 276982
2016-07-28 14:55:28 +00:00
Nicolai Haehnle 3b572002a2 AMDGPU: add execfix flag to SI_ELSE
Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_savexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.

Finally: It also occured to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

llvm-svn: 276972
2016-07-28 11:39:24 +00:00
David Majnemer 19d024b2fd [ConstantFolding] Don't bail on folding if ConstantFoldConstantExpression fails
When folding an expression, we run ConstantFoldConstantExpression on
each operand of that expression.
However, ConstantFoldConstantExpression can fail and retur nullptr.

Previously, we would bail on further refining the expression.
Instead, use the original operand and see if we can refine a later
operand.

llvm-svn: 276959
2016-07-28 06:39:48 +00:00
David Majnemer 67f684e18e [CodeView] Don't crash on functions without subprograms
A function may have instructions annotated with debug info without
having a subprogram.

This fixes PR28747.

llvm-svn: 276956
2016-07-28 05:03:22 +00:00
David Majnemer 0be7155350 [InstCombine] Handle failures from ConstantFoldConstantExpression
ConstantFoldConstantExpression returns null when folding fails.

This fixes PR28745.

llvm-svn: 276952
2016-07-28 02:29:06 +00:00
Wei Mi 315bb33f27 Fix the assertion error in collectLoopUniforms caused by empty Worklist before expanding.
Contributed-by: David Callahan

Differential Revision: https://reviews.llvm.org/D22886

llvm-svn: 276943
2016-07-27 23:53:58 +00:00
George Burgess IV dbd35c44d4 [CFLAA] Add getModRefBehavior to CFLAnders.
This patch lets CFLAnders respond to mod-ref queries. It also includes
a small bugfix to CFLSteens.

Patch by Jia Chen.

Differential Revision: https://reviews.llvm.org/D22823

llvm-svn: 276939
2016-07-27 23:07:07 +00:00
Vedant Kumar fc07e8b428 [llvm-cov] Add a debug mode for source range highlighting (in html)
llvm-cov's `-dump' option now emits information which helps debug source
range highlighting in html mode.

llvm-svn: 276924
2016-07-27 21:57:15 +00:00
Justin Lebar 23a9686011 [LSV] Don't assume that bitcast ops are Instructions.
Summary:
When we ask the builder to create a bitcast on a constant, we get back a
constant, not an instruction.

Reviewers: asbirlea

Subscribers: jholewinski, mzolotukhin, llvm-commits, arsenm

Differential Revision: https://reviews.llvm.org/D22878

llvm-svn: 276922
2016-07-27 21:45:48 +00:00
Krzysztof Parzyszek 06a2b6b1ee [Hexagon] Find speculative loop preheader in hardware loop generation
Before adding a new preheader block, check if there is a candidate block
where the loop setup could be placed speculatively. This will be off by
default.

llvm-svn: 276919
2016-07-27 21:20:54 +00:00
Krzysztof Parzyszek 5241b8efcf [Hexagon] Do not optimize volatile stack spill slots
llvm-svn: 276916
2016-07-27 20:50:42 +00:00
Andrew Kaylor 9155354ff2 Revert EH-specific checks in BranchFolding that were causing blow ups in compile time.
Differential Revision: https://reviews.llvm.org/D22839

llvm-svn: 276898
2016-07-27 17:55:33 +00:00
Tim Northover 8d2f52e035 GlobalISel: support zero-sized allocas
All allocas must be at least 1 byte at the MachineIR level so we allocate just
one byte.

llvm-svn: 276897
2016-07-27 17:47:54 +00:00
Nirav Dave 06a99a46e2 [MC][X86] Fix Intel Operand assembly parsing for .set ids
Fix intel syntax special case identifier operands that refer to a constant
(e.g. .set <ID> n) to be interpreted as immediate not memory in parsing.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22585

llvm-svn: 276895
2016-07-27 17:39:41 +00:00
Simon Pilgrim 7efafbc583 [X86][SSE] Updated test so that both are applying the post-multiply
This is to ensure that there are no diffs other than due to buildvector/legalization

llvm-svn: 276882
2016-07-27 15:30:20 +00:00
Renato Golin 5267d53779 [ARM] Check that the thumb COFF segment flag gets set on thumb windows
Patch by Martin Storsjö.

llvm-svn: 276877
2016-07-27 14:37:18 +00:00
Ahmed Bougacha 6756a2c953 [GlobalISel] Introduce an instruction selector.
And implement it for AArch64, supporting x/w ADD/OR.

Differential Revision: https://reviews.llvm.org/D22373

llvm-svn: 276875
2016-07-27 14:31:55 +00:00
Daniel Sanders c5537427c2 [mips][ias] Check '$rs = $rd' constraints when both registers are in AsmText.
Summary:
This is one possible solution to the problem of ignoring constraints that Simon
raised in D21473 but it's a bit of a hack.

The integrated assembler currently ignores violations of the tied register
constraints when the operands involved in a tie are both present in the AsmText.
For example, 'dati $rs, $rt, $imm' with the '$rs = $rt' will silently replace
$rt with $rs. So 'dati $2, $3, 1' is processed as if the user provided
'dati $2, $2, 1' without any diagnostic being emitted.

This is difficult to solve properly because there are multiple parts of the
matcher that are silently forcing these constraints to be met. Tied operands are
rendered to instructions by cloning previously rendered operands but this is
unnecessary because the matcher was already instructed to render the operand it
would have cloned. This is also unnecessary because earlier code has already
replaced the MCParsedOperand with the one it was tied to (so the parsed input
is matched as if it were 'dati <RegIdx 2>, <RegIdx 2>, <Imm 1>'). As a result,
it looks like fixing this properly amounts to a rewrite of the tied operand
handling which affects all targets.

This patch however, merely inserts a checking hook just before the
substitution of MCParsedOperands and the Mips target overrides it. It's not
possible to accurately check the registers are the same this early (because
numeric registers haven't been bound to a register class yet) so it cheats a
bit and checks that the tokens that produced the operand are lexically
identical. This works because tied registers need to have the same register
class but it does have a flaw. It will reject 'dati $4, $a0, 1' for violating
the constraint even though $a0 ends up as the same register as $4.

Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D21994

llvm-svn: 276867
2016-07-27 13:49:44 +00:00
Teresa Johnson d92012d51c [test/gold] Add gold test subdirectory tests needing v1.12 (or higher)
Summary:
As discussed in the review for D22677, added a subdirectory to
enable tests that require at least version 1.12 of gold.

Add an initial test requiring this version.

Reviewers: davidxl, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D22827

llvm-svn: 276860
2016-07-27 12:59:51 +00:00
Renato Golin 80e58869f8 [ARM] Set a non-conflicting comment character for assembly in MSVC mode
Currently, for ARMCOFFMCAsmInfoMicrosoft, no comment character is set, thus the
idefault, '#', is used.

The hash character doesn't work as comment character in ARM assembly, since '#'
is used for immediate values.

The comment character is set to ';', which is the comment character used by MS
armasm.exe. (The microsoft armasm.exe uses a different directive syntax than
what LLVM currently supports though, similar to ARM's armasm.)

This allows inline assembly with immediate constants to be built (and brings the
assembly output from clang -S closer to being possible to assemble).

A test is added that verifies that ';' is correctly interpreted as comments in
this mode, and verifies that assembling code that includes literal constants
with a '#' works.

Patch by Martin Storsjö.

llvm-svn: 276859
2016-07-27 12:31:58 +00:00
Renato Golin b9edb5c9b0 [ARM] Adds test for immediate encoding
The encoding of expressions as immediates wasn't correct, and was reported in
PR23000. However, we have done some refactoring on how immediates are handled
and now it seems the problem is fixed. This is a test just to make sure it
won't regress again.

llvm-svn: 276858
2016-07-27 12:15:26 +00:00
Simon Pilgrim 10bf0ff879 [DAGCombiner] Use APInt directly to detect out of range shift constants
Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match

As mentioned on D22726

llvm-svn: 276855
2016-07-27 10:30:55 +00:00
Davide Italiano 7c9fc738b1 [MC] Add command-line option to choose the max nest level in asm macros.
Submitted by: t83wCSLq
Differential Revision:  https://reviews.llvm.org/D22313

llvm-svn: 276842
2016-07-27 05:51:56 +00:00
Sebastian Pop 55c3007b88 GVN-hoist: improve code generation for recursive GEPs
When loading or storing in a field of a struct like "a.b.c", GVN is able to
detect the equivalent expressions, and GVN-hoist would fail in the code
generation.  This is because the GEPs are not hoisted as scalar operations to
avoid moving the GEPs too far from their ld/st instruction when the ld/st is not
movable.  So we end up having to generate code for the GEP of a ld/st when we
move the ld/st.  In the case of a GEP referring to another GEP as in "a.b.c" we
need to code generate all the GEPs necessary to make all the operands available
at the new location for the ld/st.  With this patch we recursively walk through
the GEP operands checking whether all operands are available, and in the case of
a GEP operand, it recursively makes all its operands available. Code generation
happens from the inner GEPs out until reaching the GEP that appears as an
operand of the ld/st.

Differential Revision: https://reviews.llvm.org/D22599

llvm-svn: 276841
2016-07-27 05:48:12 +00:00
Vedant Kumar 90be9db7d9 [llvm-cov] Escape '\' in strings when emitting JSON
Test that Windows path separators are escaped properly. Add a round-trip
test to verify the JSON produced by the exporter.

llvm-svn: 276832
2016-07-27 04:08:32 +00:00
David Majnemer bc36b15253 [ConstantFolding] Correctly handle failures in ConstantFoldConstantExpressionImpl
Failures in ConstantFoldConstantExpressionImpl were ignored causing
crashes down the line.

This fixes PR28725.

llvm-svn: 276827
2016-07-27 02:39:16 +00:00
Andrew Kaylor f990fa5f7b Reverting r276771 due to MSan failures.
llvm-svn: 276824
2016-07-27 01:19:24 +00:00
Matt Arsenault e3862cdc93 AMDGPU: Use rcp for fdiv 1, x with fpmath metadata
Using rcp should be OK for safe math usually, so this
should not be replacing the original fdiv.

llvm-svn: 276823
2016-07-26 23:25:44 +00:00
Hans Wennborg 685e8ff953 Revert r276136 "Use ValueOffsetPair to enhance value reuse during SCEV expansion."
It causes Clang tests to fail after Windows self-host (PR28705).

(Also reverts follow-up r276139.)

llvm-svn: 276822
2016-07-26 23:25:13 +00:00
Matt Arsenault 918e81c3d4 AMDGPU: Add more tests for LDS size with occupancy
llvm-svn: 276821
2016-07-26 23:15:59 +00:00
Vedant Kumar 7101d73c71 Retry: [llvm-cov] Add support for exporting coverage data to JSON
This enables users to export coverage information as portable JSON for use by
analysis tools and storage in document based databases.

The export sub-command is invoked just like the others:

  llvm-cov export -instr-profile path/to/foo.profdata path/to/foo.binary

The resulting JSON contains a list of files and functions. Every file object
contains a list of segments, expansions, and a summary of the file's region,
function, and line coverage. Every function object contains the function's name
and regions. There is also a total summary for the entire object file.

Changes since the initial commit (r276813):

  - Fixed the regexes in the tests to handle Windows filepaths.

Patch by Eddie Hurtig!

Differential Revision: https://reviews.llvm.org/D22651

llvm-svn: 276818
2016-07-26 22:50:58 +00:00
Vedant Kumar e85353b849 Revert "[llvm-cov] Add support for exporting coverage data to JSON"
This reverts commit r276813. The Windows bots are complaining about some
of the filename regexes in the tests:

  http://bb.pgr.jp/builders/ninja-clang-i686-msc19-R/builds/5299

llvm-svn: 276816
2016-07-26 21:55:39 +00:00
Matthias Braun 333e468d15 MIRParser: Use dot instead of colon to mark subregisters
Change the syntax to use `%0.sub8` to denote a subregister.

This seems like a more natural fit to denote subregisters; I also plan
to introduce a new ":classname" syntax in upcoming patches to denote the
register class of a vreg.

Note that this commit disallows plain identifiers to start with a '.'
character.  This shouldn't affect anything as external names/IR
references are all prefixed with '$'/'%', plain identifiers are only
used for instruction names, register mask names and subreg indexes.

Differential Revision: https://reviews.llvm.org/D22390

llvm-svn: 276815
2016-07-26 21:49:34 +00:00
Vedant Kumar d5b7436c1f [llvm-cov] Add support for exporting coverage data to JSON
This enables users to export coverage information as portable JSON for use by
analysis tools and storage in document based databases.

The export sub-command is invoked just like the others:

  llvm-cov export -instr-profile path/to/foo.profdata path/to/foo.binary

The resulting JSON contains a list of files and functions. Every file object
contains a list of segments, expansions, and a summary of the file's region,
function, and line coverage. Every function object contains the function's name
and regions. There is also a total summary for the entire object file.

Patch by Eddie Hurtig!

Differential Revision: https://reviews.llvm.org/D22651

llvm-svn: 276813
2016-07-26 21:35:43 +00:00
Krzysztof Parzyszek 2a480599bb [Hexagon] Post-increment loads/stores enhancements
- Generate vector post-increment stores more aggressively.
- Predicate post-increment and vector stores in early if-conversion.

llvm-svn: 276800
2016-07-26 20:30:30 +00:00
Tim Northover ad2b717f2c GlobalISel: add generic load and store instructions.
Pretty straightforward, the only oddity is the MachineMemOperand (which it's
surprisingly difficult to share code for).

llvm-svn: 276799
2016-07-26 20:23:26 +00:00
Krzysztof Parzyszek 57c3ddddec [Hexagon] Gracefully handle reg class mismatch in HexagonLoopReschedule
llvm-svn: 276793
2016-07-26 19:17:13 +00:00
Krzysztof Parzyszek 6eba5b8c37 [Hexagon] Rerun bit tracker on new instructions in RIE
Consider this case:
  vreg1 = A2_zxth vreg0   (1)
  ...
  vreg2 = A2_zxth vreg1   (2)

Redundant instruction elimination could delete the instruction (1)
because the user (2) only cares about the low 16 bits. Then it could
delete (2) because the input is already zero-extended. The problem
is that the properties allowing each individual instruction to be
deleted depend on the existence of the other instruction, so either
one can be deleted, but not both.
The existing check for this situation in RIE was insufficient. The
fix is to update all dependent cells when an instruction is removed
(replaced via COPY) in RIE.

llvm-svn: 276792
2016-07-26 19:08:45 +00:00
Krzysztof Parzyszek 1adca30c39 [Hexagon] Bitwise operations for insert/extract word not simplified
Change the bit simplifier to generate REG_SEQUENCE instructions in
addition to COPY, which will handle cases of word insert/extract.

llvm-svn: 276787
2016-07-26 18:30:11 +00:00
Justin Lebar 16da82f4d2 Fix NVPTX/call-with-alloca-buffer.ll after r276777.
r276777 makes InstSimplify stronger, letting it see through some
unnecessary addrspace casts.

llvm-svn: 276786
2016-07-26 18:28:33 +00:00
Matthias Braun ee0679207b MIRParser: Use shorter cfi identifiers
In an instruction like:
	CFI_INSTRUCTION .cfi_def_cfa ...
we can drop the '.cfi_' prefix since that should be obvious by the
context:
	CFI_INSTRUCTION def_cfa ...

While being a terser and cleaner syntax this also prepares to dropping
support for identifiers starting with a dot character so we can use it
for expressions.

Differential Revision: http://reviews.llvm.org/D22388

llvm-svn: 276785
2016-07-26 18:20:00 +00:00
Davide Italiano f17d48e58a [MC] Don't crash when trying to emit a relocation against .bss.
Turn that into an error instead.

llvm-svn: 276783
2016-07-26 18:16:33 +00:00
David Majnemer 6774d612d4 [InstSimplify] Cast folding can be made more generic
Use isEliminableCastPair to determine if a pair of casts are foldable.

llvm-svn: 276777
2016-07-26 17:58:05 +00:00
Tim Northover ab395cb071 GlobalISel: add correct operand type to G_FRAME_INDEX instrs.
Frame indices should use "addFrameIndex", not "addImm".

llvm-svn: 276775
2016-07-26 17:42:40 +00:00
Krzysztof Parzyszek 29c567a3f0 [Hexagon] Add support for proper handling of H and L constraints
H -> High part of reg pair.
L -> Low part of reg pair.

Patch by Sundeep Kushwaha.

llvm-svn: 276773
2016-07-26 17:31:02 +00:00
Tim Northover 26e40bdb9b GlobalISel: omit braces on MachineInstr types when there's only one.
Tidies up the representation a bit in the common case.

llvm-svn: 276772
2016-07-26 17:28:01 +00:00
Andrew Kaylor 3104a6bad0 Re-committing r275284: add support to inline __builtin_mempcpy
Patch by Sunita Marathe

Differential Revision: http://reviews.llvm.org/D21920

llvm-svn: 276771
2016-07-26 17:23:13 +00:00
Matt Arsenault 07f65718bb AMDGPU: Add missing tests for xnack option for HSA
llvm-svn: 276765
2016-07-26 16:45:50 +00:00
Matt Arsenault 32fc527c65 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Oliver Stannard 1c6e591457 [ARM] Improve error messages for .arch_extension directive
- More informative message when extension name is not an identifier token.
- Stop parsing directive if extension is unknown (avoid duplicate error
  messages).
- Report unsupported extensions with a source location, rather than
  report_fatal_error.

Differential Revision: https://reviews.llvm.org/D22806

llvm-svn: 276748
2016-07-26 14:24:43 +00:00
Oliver Stannard 2171828a49 [ARM] Implement -mimplicit-it assembler option
This option, compatible with gas's -mimplicit-it, controls the
generation/checking of implicit IT blocks in ARM/Thumb assembly.

This option allows two behaviours that were not possible before:
- When in ARM mode, emit a warning when assembling a conditional
  instruction that is not in an IT block. This is enabled with
  -mimplicit-it=never and -mimplicit-it=thumb.
- When in Thumb mode, automatically generate IT instructions when an
  instruction with a condition code appears outside of an IT block. This
  is enabled with -mimplicit-it=thumb and -mimplicit-it=always.

The default option is -mimplicit-it=arm, which matches the existing
behaviour (allow conditional ARM instructions outside IT blocks without
warning, and error if a conditional Thumb instruction is outside an IT
block).

The general strategy for generating IT blocks in Thumb mode is to keep a
small list of instructions which should be in the IT block, and only
emit them when we encounter something in the input which means we cannot
continue the block.  This could be caused by:
- A non-predicable instruction
- An instruction with a condition not compatible with the IT block
- The IT block already contains 4 instructions
- A branch-like instruction (including ALU instructions with the PC as
  the destination), which cannot appear in the middle of an IT block
- A label (branching into an IT block is not legal)
- A change of section, architecture, ISA, etc
- The end of the assembly file.

Some of these, such as change of section and end of file, are parsed
outside of the ARM asm parser, so I've added a new virtual function to
AsmParser to ensure any previously-parsed instructions have been
emitted. The ARM implementation of this flushes the currently pending IT
block.

We now have to try instruction matching up to 3 times, because we cannot
know if the current IT block is valid before matching, and instruction
matching changes depending on the IT block state (due to the 16-bit ALU
instructions, which set the flags iff not in an IT block). In the common
case of not having an open implicit IT block and the instruction being
matched not needing one, we still only have to run the matcher once.

I've removed the ITState.FirstCond variable, because it does not store
any information that isn't already represented by CurPosition. I've also
updated the comment on CurPosition to accurately describe it's meaning
(which this patch doesn't change).

Differential Revision: https://reviews.llvm.org/D22760

llvm-svn: 276747
2016-07-26 14:19:47 +00:00
Simon Pilgrim 0280959c0d [X86][SSE] Added extra memory folding tests for cvtsd2ss intrinsic
SSE only fold partial reg update instructions when optsize is enabled

llvm-svn: 276743
2016-07-26 12:44:50 +00:00
Simon Pilgrim 019e102426 [X86][SSE] Fixed issue with memory folding of (v)cvtsd2ss intrinsics
Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding.

This was only unearthed when rL276102 started using the intrinsic again.....

llvm-svn: 276740
2016-07-26 10:41:28 +00:00
Simon Dardis 68a204ddc1 [mips] MIPS64R6 compact branch support
MIPS64R6 compact branch support. As the MIPS LLVM backend uses distinct
MachineInstrs for certain 32 and 64 bit instructions (e.g. BEQ & BEQ64) that
map to the same instruction, extend compact branch support for the
corresponding 64bit branches.

Reviewers: dsanders

Differential Revision: https://reviews.llvm.org/D20164

llvm-svn: 276739
2016-07-26 10:25:07 +00:00
Simon Dardis 273fc26b79 [mips] sgtu, s[rl]l, sra, dnegu, neg instruction aliases
Add the instruction alias sgtu (register form only), two operand forms of
s[rl]l and sra, and missing single/two operand forms of dnegu/neg.

Reviewers: dsanders

Differential Revision: https://reviews.llvm.org/D22752

llvm-svn: 276736
2016-07-26 09:13:46 +00:00
David Majnemer a90a621d1e Reapply: [InstSimplify] Add support for bitcasts"
This reverts commit r276700 and reapplies r276698.
The relevant clang tests have been updated.

llvm-svn: 276727
2016-07-26 05:52:29 +00:00
Sebastian Pop 91d4a30159 GVN-hoist: use a DFS numbering of instructions (PR28670)
Instead of DFS numbering basic blocks we now DFS number instructions that avoids
the costly operation of which instruction comes first in a basic block.

Patch mostly written by Daniel Berlin.

Differential Revision: https://reviews.llvm.org/D22777

llvm-svn: 276714
2016-07-26 00:15:10 +00:00
Evgeniy Stepanov 906f6fb565 [safestack] Fix stack guard live range.
Stack guard slot is live throughout the function.

llvm-svn: 276712
2016-07-26 00:05:14 +00:00
Adam Nemet 39e039c606 [lit] Don't match tool names within new PM's <> markers
For example, stop expanding 'opt' in -passes='require<opt-remark-emit>'.

llvm-svn: 276707
2016-07-25 23:09:10 +00:00
Renato Golin 32b165f561 [ARM] Saturation instructions are DSP-only
The saturation instructions appeared in v6T2, with DSP extensions, but they
were being accepted / generated on any, with the new introduction of the
saturation detection in the back-end. This commit restricts the usage to
DSP-enable only cores.

Fixes PR28607.

llvm-svn: 276701
2016-07-25 22:25:25 +00:00
David Majnemer 6e06b577cc Revert "[InstSimplify] Add support for bitcasts"
This reverts commit r276698.  Clang has tests which rely on the
optimizer :(

llvm-svn: 276700
2016-07-25 22:24:59 +00:00
David Majnemer 62611fd3f7 [InstSimplify] Add support for bitcasts
BitCasts of BitCasts can be folded away as can BitCasts which don't
change the type of the operand.

llvm-svn: 276698
2016-07-25 22:04:58 +00:00
Simon Pilgrim fa863fb2f1 [X86] Regenerate v2i256 shift legalization tests
llvm-svn: 276692
2016-07-25 21:14:22 +00:00
Simon Pilgrim 2d7735e428 [X86] Regenerate i64 shift legalization tests
llvm-svn: 276691
2016-07-25 21:11:45 +00:00
Tim Northover 7c9eba90ff GlobalISel: add generic casts to IRTranslator
This adds LLVM's 3 main cast instructions (inttoptr, ptrtoint, bitcast) to the
IRTranslator. The first two are direct translations (with 2 MachineInstr types
each). Since LLT discards information, a bitcast might become trivial and we
emit a COPY in those cases instead.

llvm-svn: 276690
2016-07-25 21:01:29 +00:00
Tim Northover e2e0067352 GlobalISel[AArch64]: support pointer types in argument lowering.
They're basically i64 for AArch64, but we'll leave them intact for stranger
targets. Also add some tests for the (very few) other cases we can handle right
now.

llvm-svn: 276689
2016-07-25 21:01:17 +00:00
Michael Kuperstein 39feb6290c [PM] Port SymbolRewriter to the new PM
Differential Revision: https://reviews.llvm.org/D22703

llvm-svn: 276687
2016-07-25 20:52:00 +00:00
Kevin Enderby 95b0842e64 Next step along the way to getting good error messages for bad archives.
I consulted with Lang Hames on this work, and the goal was to add a bit
of "where" in the archive the error occurred along with what the error was.

So this step changes ArchiveMemberHeader into a class with a pointer
to the archive header and the parent archive.  Which allows the methods
in the ArchiveMemberHeader to determine which member the header is
for to include that information in the error message.

For this first step the "where" is just the offset to the member in the
archive.  The next step will be a new method on ArchiveMemberHeader
to get the full name, if possible, to be use in the error message.  Which
will now be possible as ArchiveMemberHeader contains a pointer to
the Archive with its string table and its size, etc. so the full name can
be determined from the header if it is valid.

Also this change adds the missing checks the archive header is actually
contained in the buffer and is not truncated, as well as if the terminating
characters are correct in the header.

And changes one error message in Archive::Child::getNext() where the
name or offset to member is now added.

llvm-svn: 276686
2016-07-25 20:36:36 +00:00
Jan Vesely b64c8925e9 AMDGPU: Remove read_workdim intrinsic
Differential revision: https://reviews.llvm.org/D22732

llvm-svn: 276682
2016-07-25 20:17:02 +00:00
Matt Arsenault 7cddfed7e8 Scalarizer: Support scalarizing intrinsics
llvm-svn: 276681
2016-07-25 20:02:54 +00:00
Matt Arsenault e047b2598e AMDGPU: Fix missing verify-machineinstrs in control flow test
llvm-svn: 276679
2016-07-25 19:39:06 +00:00
Evgeniy Stepanov 8d78bd5041 Fix invalid iterator use in safestack coloring.
llvm-svn: 276676
2016-07-25 19:25:40 +00:00
Rong Xu 705f7775bb [PGO] Fix profile mismatch in COMDAT function with pre-inliner
Pre-instrumentation inline (pre-inliner) greatly improves the IR
instrumentation code performance, among other benefits. One issue of the
pre-inliner is it can introduce CFG-mismatch for COMDAT functions. This
is due to the fact that the same COMDAT function may have different early
inline decisions across different modules -- that means different copies
of COMDAT functions will have different CFG checksum.

In this patch, we propose a partially renaming the COMDAT group and its
member function/variable so we have different profile counter for each
version. We will post-fix the COMDAT function and the group name with its
FunctionHash.

Differential Revision: http://reviews.llvm.org/D22600

llvm-svn: 276673
2016-07-25 18:45:37 +00:00
Simon Pilgrim ce8d82775c [X86][SSE] Added 2048-bit vector comparison tests
Upper limit of what can be held in a <32 x i8> result

llvm-svn: 276666
2016-07-25 17:56:01 +00:00
Elena Demikhovsky 64e5f929d0 AVX-512: Fixed [US]INT_TO_FP selection for i1 vectors.
It failed with assertion before this patch.

Differential Revision: https://reviews.llvm.org/D22735

llvm-svn: 276648
2016-07-25 16:51:00 +00:00
Wei Mi 97de034e18 Remove useless pass from the pipeline in test/Analysis/Dominators/2007-01-14-BreakCritEdges.ll.
llvm-svn: 276644
2016-07-25 16:27:34 +00:00
Krzysztof Parzyszek 080bebd212 [Hexagon] Add target feature to generate long calls
llvm-svn: 276638
2016-07-25 14:42:11 +00:00
Sam Parker d5ca0a65b5 [ARM] Improve longMAC codegen test
Added thumb targets and dataflow checks to the longMAC test.

Differential Revision: https://reviews.llvm.org/D22684

llvm-svn: 276629
2016-07-25 10:11:00 +00:00
Simon Dardis 618975206e [mips] Optimize materialization of i64 constants
Avoid MipsAnalyzeImmediate usage if the constant fits in an 32-bit
integer. This allows us to generate the same instructions for the
materialization of the same constants regardless the width of their
type.

Patch by: Vasileios Kalintiris

Contributions by: Simon Dardis

Reviewers: Daniel Sanders

Differential Review: https://reviews.llvm.org/D21689

llvm-svn: 276628
2016-07-25 09:57:28 +00:00
Sam Parker 68c71cd1e4 [ARM] Enable ISel of SMMLS for ARM and Thumb2
Use ISelDAGToDAG to recognise the SMMLS instruction pattern.

Differential Revision: https://reviews.llvm.org/D22562

llvm-svn: 276624
2016-07-25 09:20:20 +00:00
Craig Topper ce415ff9c5 [AVX512] Add load folding support for the unmasked forms of the FMA instructions.
llvm-svn: 276615
2016-07-25 07:20:35 +00:00
Craig Topper 318e40b6f7 [AVX512] Add some additional patterns so that we can fold broadcast loads in the first argument of an FMADD/FMSUB/FNMADD/FNMSUB/FMADDSUB/FMSUBADD node. Also add patterns to support all combinations of the broadcast input and the preserved input for masked versions.
llvm-svn: 276614
2016-07-25 07:20:31 +00:00
Craig Topper 6bcbf5338c [AVX512] Cleanup FMA operand order in patterns to match the VEX versions and to really be 213, 231, and 132.
llvm-svn: 276613
2016-07-25 07:20:28 +00:00
Sean Silva fe5abd5e0c Fix : Partial Inliner requires AssumptionCacheTracker
The public InlineFunction utility assumes that the passed in
InlineFunctionInfo has a valid AssumptionCacheTracker.

Patch by River Riddle!

Differential Revision: https://reviews.llvm.org/D22706

llvm-svn: 276609
2016-07-25 05:00:00 +00:00
David Majnemer 68623a0e9f [GVNHoist] Merge metadata on hoisted instructions less conservatively
We can combine metadata from multiple instructions intelligently for
certain metadata nodes.

llvm-svn: 276602
2016-07-25 02:21:25 +00:00
David Majnemer 4728569d0a [GVNHoist] Properly merge alignments when hoisting
If we two loads of two different alignments, we must use the minimum of
the two alignments when hoisting.  Same deal for stores.

For allocas, use the maximum of the two allocas.

llvm-svn: 276601
2016-07-25 02:21:23 +00:00
Simon Pilgrim a6878bdc0f [X86][SSE] Added PR27854 tests
llvm-svn: 276571
2016-07-24 16:39:50 +00:00
Simon Pilgrim b7d75fee74 [X86] Add shift double tests for PR14593
llvm-svn: 276570
2016-07-24 16:10:21 +00:00
Simon Pilgrim 381a0ade5a [X86] Add 'FeatureSlowSHLD' to cpu 'bdver4'
As with all AMD CPUs, excavator has poor SHLD/SHRD performance. Also added bdver3 to the test as it was missing.

llvm-svn: 276569
2016-07-24 16:00:53 +00:00
Simon Pilgrim b2df2dc298 [X86] Add SHRD shift combine tests
llvm-svn: 276568
2016-07-24 15:47:44 +00:00
Simon Pilgrim 0ededcb344 [X86] Regenerate shift by parts tests
llvm-svn: 276567
2016-07-24 15:38:51 +00:00
Simon Pilgrim 114205076d [X86][SSE] Regenerate shifts tests
llvm-svn: 276566
2016-07-24 15:25:36 +00:00
Simon Pilgrim 30a7cc2e1f [X86][SSE] Regenerate SSE copysign tests
llvm-svn: 276565
2016-07-24 15:17:50 +00:00
Simon Pilgrim 7f1c1db73c [X86][AVX512VL] Added AVX512VL half2float vector conversions tests to demonstrate PR23941
llvm-svn: 276563
2016-07-24 13:01:51 +00:00
Craig Topper 2dca3b287b [X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions.
This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in the their name. This new format should be consistent with the general naming of instructions.

llvm-svn: 276559
2016-07-24 08:26:38 +00:00
Elena Demikhovsky 376a18bd92 [Loop Vectorizer] Handling loops FP induction variables.
Allowed loop vectorization with secondary FP IVs. Like this:
float *A;
float x = init;
for (int i=0; i < N; ++i) {
  A[i] = x;
  x -= fp_inc;
}

The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute.

Differential Revision: https://reviews.llvm.org/D21330

llvm-svn: 276554
2016-07-24 07:24:54 +00:00
Simon Pilgrim 9e99dd9e8b [X86][SSE] Added float widened broadcast tests
llvm-svn: 276535
2016-07-23 21:24:02 +00:00
Simon Pilgrim 8d18969716 [X86][SSE] Added more widened broadcast tests
Added more vXi16 and vXi8 tests

llvm-svn: 276534
2016-07-23 21:15:31 +00:00
Simon Pilgrim b9e47a8cd0 [X86][SSE] Added tests where we should be trying to widen a load+splat into a broadcast
llvm-svn: 276527
2016-07-23 16:19:17 +00:00
Simon Pilgrim 8aa6f34455 [X86][SSE] Regenerated uitofp <2 x i32> -> <2 x float> conversion tests
Demonstrate difference in codegen discussed on PR14760

llvm-svn: 276526
2016-07-23 15:55:42 +00:00
Sanjay Patel 1271bf9178 [InstCombine] allow icmp (bit-manipulation-intrinsic(), C) folds for vectors
llvm-svn: 276523
2016-07-23 13:06:49 +00:00
Craig Topper b6519db90d [AVX512] Implement commuting support for EVEX encoded FMA3 instructions.
llvm-svn: 276521
2016-07-23 07:16:56 +00:00
Xinliang David Li 9239245401 [Profile] Use explicit flag to enable IR PGO
Patch by Jake VanAdrighem

Differential Revision: http://reviews.llvm.org/D22607

llvm-svn: 276516
2016-07-23 04:28:52 +00:00
Sanjoy Das a7d9ec8751 [SCEV] Make isImpliedCondOperandsViaRanges smarter
This change lets us prove things like

  "{X,+,10} s< 5000" implies "{X+7,+,10} does not sign overflow"

It does this by replacing replacing getConstantDifference by
computeConstantDifference (which is smarter) in
isImpliedCondOperandsViaRanges.

llvm-svn: 276505
2016-07-23 00:54:36 +00:00
Sanjay Patel 8d8594acb9 auto-generate checks
llvm-svn: 276501
2016-07-23 00:09:54 +00:00
Tom Stellard b8253c88b6 Revert "[AMDGPU] Emit read-only data to .rodata for hsa"
This reverts commit r276298.

Data stored in .rodata can have a negative offset from .text, but we
don't support negative values in relocations yet.

This caused a regression in one of the amp conformance tests:
5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01

llvm-svn: 276498
2016-07-22 23:46:40 +00:00
Adam Nemet 9e6e63fba2 [LoopDataPrefetch] Include hotness of region in opt remark
llvm-svn: 276488
2016-07-22 22:53:17 +00:00
Sanjay Patel e063ddb347 add tests for icmp vector folds
llvm-svn: 276482
2016-07-22 22:19:52 +00:00
Tim Northover 98a56eb7f4 GlobalISel: allow multiple types on MachineInstrs.
llvm-svn: 276481
2016-07-22 22:13:36 +00:00
Vitaly Buka e3a032a740 Unpoison stack before resume instruction
Summary:
Clang inserts cleanup code before resume similar way as before return instruction.
This makes asan poison local variables causing false use-after-scope reports.

__asan_handle_no_return does not help here as it was executed before
llvm.lifetime.end inserted into resume block.

To avoid false report we need to unpoison stack for resume same way as for return.

PR27453

Reviewers: kcc, eugenis

Differential Revision: https://reviews.llvm.org/D22661

llvm-svn: 276480
2016-07-22 22:04:38 +00:00
Alina Sbirlea ba21ffebff Add flag to PassManagerBuilder to disable GVN Hoist Pass.
Summary:
Adding a flag to diable GVN Hoisting by default.
Note: The GVN Hoist Pass causes some Halide tests to hang. Halide will disable the pass while investigating.

Reviewers: llvm-commits, chandlerc, spop, dberlin

Subscribers: mehdi_amini

Differential Revision: https://reviews.llvm.org/D22639

llvm-svn: 276479
2016-07-22 22:02:19 +00:00
Michael Kuperstein 38e7298093 [SLPVectorizer] Vectorize reverse-order loads in horizontal reductions
When vectorizing a tree rooted at a store bundle, we currently try to sort the
stores before building the tree, so that the stores can be vectorized. For other
trees, the order of the root bundle - which determines the order of all other
bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads
happens to appear in the wrong order, we will not vectorize it.

This is partially mitigated when the root is a binary operator, by trying to
build a "reversed" tree when that's considered profitable. This patch extends the
workaround we have for binops to trees rooted in a horizontal reduction.

This fixes PR28474.

Differential Revision: https://reviews.llvm.org/D22554

llvm-svn: 276477
2016-07-22 21:28:48 +00:00
Sanjay Patel cbc4377af1 add tests for icmp vector folds
llvm-svn: 276476
2016-07-22 21:28:20 +00:00
Sanjay Patel 97e61dcc2d add tests for icmp vector folds
llvm-svn: 276475
2016-07-22 21:13:08 +00:00
Sanjay Patel b73d7aed71 add tests for icmp vector folds
llvm-svn: 276472
2016-07-22 21:02:33 +00:00
Vedant Kumar 127d0502a0 [llvm-cov] Don't copy stylesheets into index files
Just link in the stylesheet from the toplevel dir of the report.

llvm-svn: 276468
2016-07-22 20:49:23 +00:00
Sanjay Patel 859278005d update to use FileCheck and auto-generate checks
llvm-svn: 276466
2016-07-22 20:39:07 +00:00
Sanjay Patel 296a776a5b add tests for icmp vector folds
llvm-svn: 276464
2016-07-22 20:11:08 +00:00
Tim Northover 33b07d6725 GlobalISel: implement legalization pass, with just one transformation.
This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.

llvm-svn: 276461
2016-07-22 20:03:43 +00:00
Teresa Johnson f432c9cefa [ThinLTO/gold] Remove thin archive part of new test due to bot failures
I am getting a bot failure from the thin archive part of this test:

From
http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/40468/steps/test_llvm/logs/LLVM%20%3A%3A%20tools__gold__X86__thinlto_emit_linked_objects.ll:

Command Output (stderr):
--
/home/bb/cmake-llvm-x86_64-linux/build/./bin/llvm-ar: creating
/home/bb/cmake-llvm-x86_64-linux/build/test/tools/gold/X86/Output/thinlto_emit_linked_objects.ll.tmp2.a
/usr/bin/ld.gold: internal error in add_writer, at
../../gold/token.h:124

--

This appears to be an issue with an older version of gold. The test case
passes for me locally when I use the gold v1.12 I was testing with, but
when I tried the gold installed on my system which is v1.11 I get the
same error.

Remove the thin archive version of the test, since there isn't a way to
predicate it on gold version.

llvm-svn: 276453
2016-07-22 18:32:30 +00:00
Jun Bum Lim 6a7dc5c430 Recommit - [DSE]Enhance shorthening MemIntrinsic based on OverlapIntervals
Recommiting r275571 after fixing crash reported in PR28270.
Now we erase elements of IOL in deleteDeadInstruction().

Original Summary:
This change use the overlap interval map built from partial overwrite tracking to perform shortening MemIntrinsics.
Add test cases which was missing opportunities before.

llvm-svn: 276452
2016-07-22 18:27:24 +00:00
Sanjay Patel beaea95a0d add tests for vector bit manipulation intrinsics
llvm-svn: 276451
2016-07-22 18:22:25 +00:00
Teresa Johnson 1e2708c9e0 [ThinLTO/gold] Support for getting list of included objects from gold
Summary:
In the distributed backend case, the ThinLink step and the final native
object link are separate processes. This can be problematic when archive
libraries are involved in the link (e.g. via --start-lib/--end-lib
pairs). The linker only includes objects from libraries when
there is a strong reference to them, and depending on the intervening
ThinLTO backend processes' importing/inlining, the strong references
may appear different in the two link steps. See D22356 and D22467
for two scenarios where this causes issues.

To ensure that the final link includes the same objects, this patch
adds support for an "=filename" form of the thinlto-index-only plugin
option, in which case objects gold included in the link are emitted to
the given filename. This should be used as input to the final link (e.g.
via the @filename option to gold), instead of listing all the objects
within --start-lib/--end-lib pairs again.

Note that the support for the gold callback that identifies included
objects was added in gold version 1.12.

Reviewers: davidxl, mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D22677

llvm-svn: 276450
2016-07-22 18:20:22 +00:00
Wei Mi e04d0eff29 [PM] Port BreakCriticalEdges to the new PM.
Differential Revision: https://reviews.llvm.org/D22688

llvm-svn: 276449
2016-07-22 18:04:25 +00:00
Anna Thomas 0be4a0e6a4 Invariant start/end intrinsics overloaded for address space
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address
space.

With this change, these intrinsics are overloaded for any adddress space
for memory objects
and we can use these llvm invariant intrinsics in non-default address
spaces.

Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)

This overloaded intrinsic is needed for representing final or invariant
memory in managed languages.

Reviewers: apilipenko, reames

Subscribers: llvm-commits
llvm-svn: 276447
2016-07-22 17:49:40 +00:00
Matt Arsenault e2fe67b951 AMDGPU: Remove redundant test
llvm-svn: 276439
2016-07-22 17:01:36 +00:00
Matt Arsenault 3c07c813c0 AMDGPU: Fix groupstaticsize for large LDS
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.

This should probably be merged to the release branch.

llvm-svn: 276438
2016-07-22 17:01:33 +00:00
Matt Arsenault 8d718dcfda AMDGPU: Add HSA dispatch id intrinsic
llvm-svn: 276437
2016-07-22 17:01:30 +00:00
Matt Arsenault 7fb961f3e6 AMDGPU: Fix i1 fp_to_int
R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.

llvm-svn: 276435
2016-07-22 17:01:21 +00:00
Tim Northover bd5054602e GlobalISel: implement alloca instruction
llvm-svn: 276433
2016-07-22 16:59:52 +00:00
Simon Pilgrim 820f87a72d [SelectionDAG] Optimization of BITREVERSE legalization for power-of-2 integer scalar/vector types
An extension of D19978, this patch replaces the default BITREVERSE evaluation of individual bit masks+shifts with block mask+shifts when we have integer elements of power-of-2 bits in size.

After calling BSWAP to reverse the order of the constituent bytes (which typically follows a similar approach), every neighbouring 4-bits, 2-bits and finally 1-bit pairs are masked off and swapped over with shifts.

In doing so we can significantly reduce the number of operations required.

Differential Revision: https://reviews.llvm.org/D21578

llvm-svn: 276432
2016-07-22 16:46:25 +00:00
Krzysztof Parzyszek d3d0a4bda3 [Hexagon] Use loop data prefetch on Hexagon
llvm-svn: 276422
2016-07-22 14:22:43 +00:00
Simon Pilgrim ea0d4f9962 [X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 (reapplied)
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.

This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.

We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).

Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be be submitted shortly).

Differential Revision: https://reviews.llvm.org/D22460

llvm-svn: 276416
2016-07-22 13:58:44 +00:00
Ahmed Bougacha 29333c9de6 [FastISel] Ignore @llvm.assume.
llvm-svn: 276410
2016-07-22 12:54:53 +00:00
Ying Yi e59ee43cf1 [llvm-cov] - Add the coverage of lines in the summary report.
The llvm-cov ‘report' command displays a summary of the coverage of a binary file.
The summary report currently only includes covered regions and covered functions.
This patch adds the coverage of lines in the summary report.

Differential Revision: https://reviews.llvm.org/D22569

llvm-svn: 276409
2016-07-22 12:46:13 +00:00
Benjamin Kramer a81f4728f3 [llvm-profdata] Bring back reading profile data from STDIN.
This feature was lost in r276197.

llvm-svn: 276407
2016-07-22 12:39:55 +00:00
Benjamin Kramer 5ba0e20315 Revert "[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128"
It caused PR28657.

This reverts commit r276281.

llvm-svn: 276405
2016-07-22 11:03:10 +00:00
Ying Yi 60a3da3f4c [llvm-cov] - Improve llvm-cov error message
Summary:

When giving the following command:
% llvm-cov report -instr-profile=default.profraw

llvm-cov will give the following error message:

>llvm-cov report: Not enough positional command line arguments specified!
>Must specify at least 1 positional arguments: See: orbis-llvm-cov report -help

This patch changes the error message from  '1 positional arguments'
to '1 positional argument'.

Differential Revision: https://reviews.llvm.org/D22621

llvm-svn: 276404
2016-07-22 10:52:21 +00:00
Hrvoje Varga 2db00ce4b6 [mips][microMIPS] Implement SLT, SLTI, SLTIU, SLTU microMIPS32r6 instructions
Differential Revision: https://reviews.llvm.org/D19906

llvm-svn: 276397
2016-07-22 07:18:33 +00:00
Craig Topper 52e2e8381b [AVX512] Add ExeDomain to vector extend and truncate instructions.
llvm-svn: 276394
2016-07-22 05:46:44 +00:00
Craig Topper f4151bea72 [AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions.
llvm-svn: 276393
2016-07-22 05:00:52 +00:00
Craig Topper 0b90756b0a [AVX512] Add load folding for some AVX512VL logic and arithmetic instructions.
llvm-svn: 276391
2016-07-22 05:00:39 +00:00
Craig Topper ab13b33ded [AVX512] Update X86InstrInfo::foldMemoryOperandCustom to handle the EVEX encoded instructions too.
llvm-svn: 276390
2016-07-22 05:00:35 +00:00
David Majnemer 522a91181a Don't remove side effecting instructions due to ConstantFoldInstruction
Just because we can constant fold the result of an instruction does not
imply that we can delete the instruction.  It may have side effects.

This fixes PR28655.

llvm-svn: 276389
2016-07-22 04:54:44 +00:00
Vitaly Buka 53054a7024 Fix detection of stack-use-after scope for char arrays.
Summary:
Clang inserts GetElementPtrInst so findAllocaForValue was not
able to find allocas.

PR27453

Reviewers: kcc, eugenis

Differential Revision: https://reviews.llvm.org/D22657

llvm-svn: 276374
2016-07-22 00:56:17 +00:00
Sanjoy Das aae623f4c2 [IRCE] Don't misuse CHECK-LABEL; NFC
llvm-svn: 276373
2016-07-22 00:41:02 +00:00
Sanjoy Das bb969791b4 [IRCE] Add an option to skip profitability checks
If `-irce-skip-profitability-checks` is passed in, IRCE will kick in in
all cases where it is legal for it to kick in.  This flag is intended to
help diagnose and analyse performance issues.

llvm-svn: 276372
2016-07-22 00:40:56 +00:00
Vedant Kumar fa522ca3b3 [llvm-cov] Strengthen a test case
Check that stylesheets work when we're not using -output-dir.

llvm-svn: 276363
2016-07-21 23:31:26 +00:00
Vedant Kumar c076c49076 [llvm-cov] Use relative paths to the stylesheet (for html reports)
This makes it easy to swap out the default stylesheet for a custom one.
It also shaves ~6.62 MB out of the report directory for a full coverage
build of llvm+clang.

While we're at it, prune the CSS and add tests for it.

llvm-svn: 276359
2016-07-21 23:26:15 +00:00
Sebastian Pop 31fd506623 GVH-hoist: only clone GEPs (PR28606)
Do not clone stored values unless they are GEPs that are special cased to avoid
hoisting them without hoisting their associated ld/st.

Differential revision: https://reviews.llvm.org/D22652

llvm-svn: 276358
2016-07-21 23:22:10 +00:00
Wei Mi 1cf58f8996 [PM] Port NaryReassociate to the new PM
Differential Revision: https://reviews.llvm.org/D22648

llvm-svn: 276349
2016-07-21 22:28:52 +00:00
Quentin Colombet ecd81a3d1b [MIRTesting] Abort when failing to parse a function.
When we failed to parse a function in the mir parser, we should abort
the whole compilation instead of continuing in a weird state. Indeed,
this was creating strange machine function passes failures that were
hard to understand, until we notice that the function actually did not
get parsed correctly!

llvm-svn: 276348
2016-07-21 22:25:57 +00:00
Michael Kuperstein c523333bbf [X86] Do not use AND8ri8 in AVX512 pattern
This variant is (as documented in the TD) for disassembler use only, and should
not be used in patterns - it is longer, and is broken on 64-bit.

llvm-svn: 276347
2016-07-21 22:24:08 +00:00
Sanjay Patel e9fc79bb13 [InstSimplify] don't crash handling a pointer or aggregate type
llvm-svn: 276345
2016-07-21 21:56:00 +00:00
Akira Hatanaka b8d2873d93 [AArch64][Inline-Asm] Return the 32-bit floating point register class
when constraint "w" is used on a 32-bit operand.

This enables compiling the following code, which used to error out in
the backend:

void foo1(int a) {
  asm volatile ("sqxtn h0, %s0\n" : : "w"(a):);
}

Fixes PR28633.

llvm-svn: 276344
2016-07-21 21:39:05 +00:00
Sanjay Patel a3bfb4e313 [InstSimplify] recognize trunc + icmp sgt/slt variants of select simplifications (PR28466)
rL245171 exposed a hole in InstSimplify that manifested in a strange way in PR28466:
https://llvm.org/bugs/show_bug.cgi?id=28466

It's possible to use trunc + icmp sgt/slt in place of an and + icmp eq/ne, so we need to
recognize that pattern to eliminate selects that are choosing between some value and some
bitmasked version of that value.

Note that there is significant room for improvement (refactoring) and enhancement (more
patterns, possibly in InstCombine rather than here).

Differential Revision: https://reviews.llvm.org/D22537

llvm-svn: 276341
2016-07-21 21:26:45 +00:00
Adam Nemet 84a6425d61 [OptDiag,LDist] Convert remaining opt remarks to use the new API
llvm-svn: 276340
2016-07-21 21:21:34 +00:00
Matthew Simpson 102729cf1b [LV] Move vector int induction update to end of latch
This patch moves the update instruction for vectorized integer induction phi
nodes to the end of the latch block. This ensures consistent placement of all
induction updates across all the kinds of int inductions we create (scalar,
splat vector, or vector phi).

Differential Revision: https://reviews.llvm.org/D22416

llvm-svn: 276339
2016-07-21 21:20:15 +00:00
Sanjay Patel 9eec550a2b add vector tests and a simpler version of the negative tests
llvm-svn: 276328
2016-07-21 20:11:08 +00:00
Anna Thomas c858faa244 Revert "Invariant start/end intrinsics overloaded for address space"
This reverts commit r276316.

llvm-svn: 276320
2016-07-21 19:06:28 +00:00
Anna Thomas 29b24dfe44 Invariant start/end intrinsics overloaded for address space
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address space.

With this change, these intrinsics are overloaded for any adddress space for memory objects
and we can use these llvm invariant intrinsics in non-default address spaces.

Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)

This overloaded intrinsic is needed for representing final or invariant memory in managed languages.

Reviewers: tstellarAMD, reames, apilipenko

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22519

llvm-svn: 276316
2016-07-21 18:41:44 +00:00
Quentin Colombet 2b59eab79f [IRTranslator] Add G_SUB opcode.
This commit adds a generic SUB opcode to global-isel.

llvm-svn: 276308
2016-07-21 17:26:50 +00:00
Konstantin Zhuravlyov 3c0d8d22fe [AMDGPU] Emit read-only data to .rodata for hsa
Differential Revision: https://reviews.llvm.org/D22538

llvm-svn: 276298
2016-07-21 15:59:23 +00:00
Quentin Colombet 7bcc921dd8 [IRTranslator] Add G_AND opcode.
This commit adds a generic AND opcode to global-isel.

llvm-svn: 276297
2016-07-21 15:50:42 +00:00
Konstantin Zhuravlyov 155626238b AMDGPU/SI: Add support for R_AMDGPU_ABS32
Differential Revision: https://reviews.llvm.org/D21646

llvm-svn: 276294
2016-07-21 15:29:19 +00:00
Geoff Berry 4ff2e36d32 [AArch64] Load/store opt: Don't count transient instructions towards search limits.
Summary:
This change also changes findMatchingInsn and
findMatchingUpdateInsnForward to take DBG_VALUE opcodes into account
when tracking register defs and uses, which could potentially inhibit
these optimizations in the presence of debug information.

Reviewers: mcrosier

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22582

llvm-svn: 276293
2016-07-21 15:20:25 +00:00
Simon Pilgrim 88e0940d3b [X86][SSE] Allow folding of store/zext with PEXTRW of 0'th element
Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW.

But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register.

This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor prevent a store from being folded (on SSSE41).

Fix for PR27265.

Differential Revision: https://reviews.llvm.org/D22509

llvm-svn: 276289
2016-07-21 14:54:17 +00:00
Simon Pilgrim 4caefdf834 Fixed line endings
llvm-svn: 276287
2016-07-21 14:36:41 +00:00
Simon Pilgrim c8e20b1150 [X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.

This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.

We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).

Differential Revision: https://reviews.llvm.org/D22460

llvm-svn: 276281
2016-07-21 14:10:54 +00:00
Marina Yatsina c1fa163392 ExecutionDepsFix - Fix bug in clearance calculation
The clearance calculation did not take into account registers defined as outputs or clobbers in inline assembly machine instructions because these register defs are implicit.

Differential Revision: http://reviews.llvm.org/D22580

llvm-svn: 276266
2016-07-21 12:37:07 +00:00
Matt Arsenault f0ba86a4d5 AMDGPU: Fix phis from blocks split due to register indexing
llvm-svn: 276257
2016-07-21 09:40:57 +00:00
David Majnemer 825e4ab9e3 [GVNHoist] Preserve optimization hints which agree
If we have optimization hints with agree with each other along different
paths, preserve them.

llvm-svn: 276248
2016-07-21 07:16:26 +00:00
David Majnemer 4808f26422 [GVNHoist] Don't wrongly preserve TBAA
We hoisted loads/stores without taking into account which can cause
miscompiles.

llvm-svn: 276240
2016-07-21 05:59:53 +00:00
Matthias Braun d9fdad72ae IPRA: Fix RegMask calculation for alias registers
This patch fixes a very subtle bug in regmask calculation. Thanks to zan
jyu Wong <zyfwong@gmail.com> for bringing this to notice.
For example if CL is only clobbered than CH should not be marked
clobbered but CX, RCX and ECX should be mark clobbered. Previously for
each modified register all of its aliases are marked clobbered by
markRegClobbred() in RegUsageInfoCollector.cpp but that is wrong because
when CL is clobbered then MRI::isPhysRegModified() will return true for
CL, CX, ECX, RCX which is correct behavior but then for CX, EXC, RCX we
mark CH also clobbered as CH is aliased to CX,ECX,RCX so
markRegClobbred() is not required because isPhysRegModified already take
cares of proper aliasing register. A very simple test case has been
added to verify this change.
Please find relevant bug report here :
http://llvm.org/PR28567

Patch by Vivek Pandya <vivekvpandya@gmail.com>

Differential Revision: https://reviews.llvm.org/D22400

llvm-svn: 276235
2016-07-21 03:50:39 +00:00
Adam Nemet 7cfd5971ab [OptDiag,LV] Add hotness attribute to applied-optimization remarks
Test coverage is provided by modifying the function in the FP-math
testcase that we are allowed to vectorize.

llvm-svn: 276223
2016-07-21 01:07:13 +00:00
Sanjay Patel 0753c06d9c [InstCombine] LogicOpc (zext X), C --> zext (LogicOpc X, C) (PR28476)
The benefits of this change include:
1. Remove DeMorgan-matching code that was added specifically to work-around 
   the missing transform in http://reviews.llvm.org/rL248634.
2. Makes the DeMorgan transform work for vectors too.
3. Fix PR28476: https://llvm.org/bugs/show_bug.cgi?id=28476

Extending this transform to other casts and other associative operators may
be useful too. See https://reviews.llvm.org/D22421 for a prerequisite for
doing that though.

Differential Revision: https://reviews.llvm.org/D22271

llvm-svn: 276221
2016-07-21 00:24:18 +00:00
Adam Nemet 0e0e2d5d26 [OptDiag,LV] Add hotness attribute to the derived analysis remarks
This includes FPCompute and Aliasing.

Testcase is based on no_fpmath.ll.

llvm-svn: 276211
2016-07-20 23:50:32 +00:00
Sanjay Patel 5f3c70307d [InstSimplify][InstCombine] don't crash when folding vector selects of icmp
Differential Revision: https://reviews.llvm.org/D22602

llvm-svn: 276209
2016-07-20 23:40:01 +00:00
Xinliang David Li fb64ebe313 Fix test failure on Win
llvm-svn: 276202
2016-07-20 22:53:39 +00:00
Xinliang David Li 9a1bfcfa16 Reapply r276185
Fix the test case that should not depend on dir iteration order.

llvm-svn: 276197
2016-07-20 22:24:52 +00:00
Justin Lebar cd564c6b46 [NVPTX] Enable the load-store vectorizer on nvptx.
Reviewers: tra

Subscribers: jholewinski, arsenm, asbirlea

Differential Revision: https://reviews.llvm.org/D22592

llvm-svn: 276196
2016-07-20 22:11:36 +00:00
Xinliang David Li ce3f385eeb Revert r276185 -- build bot failure
llvm-svn: 276194
2016-07-20 21:50:38 +00:00
Adam Nemet 5b3a5cf6b0 [OptDiag,LV] Add hotness attribute to analysis remarks
The earlier change added hotness attribute to missed-optimization
remarks.  This follows up with the analysis remarks (the ones explaining
the reason for the missed optimization).

llvm-svn: 276192
2016-07-20 21:44:26 +00:00
Artem Belevich 7e9c9a6582 [NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.
After r276153 the pass applies to both kernels and regular functions.

Differential Revision: https://reviews.llvm.org/D22583

llvm-svn: 276189
2016-07-20 21:44:07 +00:00
Xinliang David Li d0b867e3e5 [Profile] support directory reading in profile merging
Differential Revision:  http://reviews.llvm.org/D22560

llvm-svn: 276185
2016-07-20 21:31:29 +00:00
Ahmed Bougacha a0cdd79070 [AArch64][FastISel] Select -O0 legal cmpxchg.
At -O0, cmpxchg survives AtomicExpand: it's mostly straightforward
to select it in fast-isel, and let the pseudo be expanded later.

extractvalues on the result are the tricky part: the generic logic
only works for legal types (and it would be painful to make it
support illegal types), so we can only support i32/i64 cmpxchg.

llvm-svn: 276183
2016-07-20 21:12:32 +00:00
Ahmed Bougacha b0674d1143 [AArch64][FastISel] Select atomic stores into STLR.
llvm-svn: 276182
2016-07-20 21:12:27 +00:00
David Majnemer bd21012c6c [GVNHoist] Don't hoist PHI nodes
We hoisted PHIs without respecting their special insertion point in the
block, leading to verfier errors.

This fixes PR28626.

llvm-svn: 276181
2016-07-20 21:05:01 +00:00
Davide Italiano 15ff2d6d0c [SCCP] Zap multiple return values.
We can replace the return values with undef if we replaced all
the call uses with a constant/undef.

Differential Revision:  https://reviews.llvm.org/D22336

llvm-svn: 276174
2016-07-20 20:17:13 +00:00
Justin Lebar a272c12b73 [LSV] Don't move stores across may-load instrs, and loosen restrictions on moving loads.
Summary:
Previously we wouldn't move loads/stores across instructions that had
side-effects, where that was defined as may-write or may-throw.  But
this is not sufficiently restrictive: Stores can't safely be moved
across instructions that may load.

This patch also adds a DEBUG check that all instructions in our chain
are either loads or stores.

Reviewers: asbirlea

Subscribers: llvm-commits, jholewinski, arsenm, mzolotukhin

Differential Revision: https://reviews.llvm.org/D22547

llvm-svn: 276171
2016-07-20 20:07:37 +00:00